API: Utility Functions
These miscellaneous functions are independent of experiment and data handle objects.
They are useful for logistics such as canceling running operations, downloading cluster logs, and ad hoc file handling on remote storage.
The file handling functions can be obtained with a single import statement: from rapidfire.util import file_utils.
Cancel Current
This function cancels an ongoing, potentially long-running operation on the cluster.
For instance, if you launch a run_fit() op and then realize you made a mistake in its config or want to try fewer configs, or if you launch a download() of a data handle and then realize you wanted a smaller sample, this function saves you from waiting for that operation to finish.
Example:
from rapidfire import cancel_current
# Cancel whatever operation is still running on the cluster
cancel_current()
Notes:
Note that RapidFire AI requires Jupyter notebook cells to be run synchronously (one after another).
So, first click “Stop” on the Jupyter menu for the cell of the op you want to cancel; then run cancel_current() in a new cell, perhaps right below the cancelled op’s cell.
Under the hood, cancel_current() cleanly resets the relevant states of the distributed execution engine so that all machines move to a consistent state and are ready for new ops without conflicts with the cancelled op.
You can also use cancel_current() to cancel a running run_test() or run_predict(). It is not applicable to delete_local() of a data handle.
Note that if you need to change your locators and/or any code in your MLSpec, it does NOT suffice to just run cancel_current(). You must also run end() and then create a new experiment with a new name for your new code. This also helps you track which runs came from which code.
Download Logs
This function lets you see the detailed logs on the cluster from all RapidFire AI processes, including from the controller and worker machines. This can help with deeper debugging in case the error messages displayed on the notebook are insufficient.
- download_logs(experiment_name: str | None = None) → None
- Parameters:
experiment_name (str, optional) – Name of an experiment if logs are needed only for its operations
- Returns:
None
- Return type:
None
Example:
from rapidfire import download_logs
# Download detailed log files from the cluster
download_logs()
Notes:
It produces a single zipped folder that is placed in the Jupyter server home directory. A message will be printed in the notebook along with a link that you can click to download that packet.
File Handling Utilities
These functions allow you to list, read, write, or delete objects on remote storage (S3 for now) and the local Jupyter filesystem. This can be useful for multiple scenarios, e.g., uploading modified ESFs for your data, reading raw data examples for visualization, reading model checkpoints for processing outside RapidFire AI API, etc.
The path
argument for the functions below must be an absolute path as follows:
Amazon S3 bucket: “s3://bucket/prefix”
Local file on Jupyter server: “file:///folder/path”
The examples below are taken from the COCO tutorial notebook. We plan to expand this API to support more remote storage options based on feedback.
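The two accepted forms can be distinguished by their URL scheme. As a local illustration only (the storage_kind helper below is hypothetical, not part of the RapidFire AI API), the standard-library urllib.parse module can classify such paths:

```python
from urllib.parse import urlparse

def storage_kind(path: str) -> str:
    """Hypothetical helper: classify a path by its URL scheme,
    mirroring the two absolute forms these utilities accept."""
    scheme = urlparse(path).scheme
    if scheme == "s3":
        return "s3"       # e.g., "s3://bucket/prefix"
    if scheme == "file":
        return "local"    # e.g., "file:///folder/path"
    raise ValueError(f"unsupported path: {path!r}")

print(storage_kind("s3://my-bucket/coco/metadata"))  # s3
print(storage_kind("file:///home/user/data.csv"))    # local
```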
Get Object
- get_object(path: str) → BytesIO | None
- Parameters:
path (str) – The bucket and prefix of object to get from remote storage (for S3) or full path to local file (on Jupyter server)
- Returns:
A byte stream with the object that can be used with any file type specific reader, e.g.,
pandas.read_csv()
- Return type:
BytesIO or None if there was a problem retrieving object
Example:
from os import getenv

import pandas
from PIL import Image

from rapidfire.util import file_utils

# Obtain the ESF for COCO validation partition and open as DataFrame
dataset_bucket = getenv("DATASET_BUCKET")
valesf = pandas.read_csv(file_utils.get_object(f"s3://{dataset_bucket}/coco/metadata/coco-val.csv"))

# Obtain an image from COCO validation partition and open as Image
imgfilename = "000000000632.jpg"
image = Image.open(file_utils.get_object(f"s3://{dataset_bucket}/coco/data/val2017/{imgfilename}"))
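Because get_object() returns a byte stream, any reader that accepts a file-like object works on the result. A self-contained illustration of the same pattern, using an in-memory CSV in place of the downloaded object:

```python
import io

import pandas

# Stand-in for the BytesIO that get_object() would return
csv_bytes = io.BytesIO(b"image_id,filename\n632,000000000632.jpg\n")

# Any file-like reader, such as pandas.read_csv(), consumes it directly
df = pandas.read_csv(csv_bytes)
print(df.shape)  # (1, 2)
```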
Put Object
- put_object(path: str, data: BytesIO) → None
- Parameters:
path (str) – The bucket and prefix of object to put on remote storage (for S3) or full path to a local filename (on Jupyter server)
data (BytesIO) – A byte stream of the object to put into file
- Returns:
None
- Return type:
None
Example:
# Write a modified ESF for COCO validation partition read above into a new user bucket location
user_bucket = getenv("USER_BUCKET")
valesfnew = ... # Modify this DataFrame that was read above
# Write DataFrame to BytesIO as CSV and put on remote storage
buffer = io.BytesIO()
valesfnew.to_csv(buffer, index=False)
buffer.seek(0)
file_utils.put_object(f"s3://{user_bucket}/coco/metadata/valesfnew.csv", buffer)
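The to_csv()/seek(0) sequence above is the generic way to turn a DataFrame into a byte stream for put_object(). A local round-trip, with no remote storage involved, shows why the seek(0) matters:

```python
import io

import pandas

df = pandas.DataFrame({"image_id": [632], "filename": ["000000000632.jpg"]})

# Serialize to an in-memory CSV; the write leaves the cursor at the end
buffer = io.BytesIO()
df.to_csv(buffer, index=False)
buffer.seek(0)  # rewind so the next reader starts at byte 0

# Anything reading the buffer now sees the full CSV
df2 = pandas.read_csv(buffer)
print(df2.equals(df))  # True
```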
List Objects
- list_objects(path: str) → List[str]
- Parameters:
path (str) – The bucket and prefix of source location on remote storage to recursively list objects under it (for S3) or a file system path to begin search under (on Jupyter server)
- Returns:
A list of objects in the specified path or an empty list if the path is None
- Return type:
List[str]
Example:
from os import getenv

from rapidfire.util import file_utils

# Obtain all objects under COCO metadata on S3
dataset_bucket = getenv("DATASET_BUCKET")
coco_metadata_files = file_utils.list_objects(f"s3://{dataset_bucket}/coco/metadata")
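Since the listing is recursive, the returned list may mix file types. A sketch of narrowing the result with plain Python (the keys below are made up, standing in for an actual list_objects() result):

```python
# Hypothetical listing result from list_objects()
keys = [
    "s3://my-bucket/coco/metadata/coco-train.csv",
    "s3://my-bucket/coco/metadata/coco-val.csv",
    "s3://my-bucket/coco/metadata/README.txt",
]

# Keep only the CSV objects
csv_keys = [k for k in keys if k.endswith(".csv")]
print(len(csv_keys))  # 2
```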
Delete Object
- delete_object(path: str) → None
- Parameters:
path (str) – The bucket and prefix of object to delete on remote storage (for S3) or full path to local file (on Jupyter server)
- Returns:
None
- Return type:
None
Example:
from os import getenv

from rapidfire.util import file_utils

# Delete the new ESF for COCO validation partition that was written above
user_bucket = getenv("USER_BUCKET")
filename = f"s3://{user_bucket}/coco/metadata/valesfnew.csv"
file_utils.delete_object(filename)
Notes:
Note that you do not have write access to data buckets owned by RapidFire AI, but you are given full read-write access to your user account-specific buckets.