API: Utility Functions
These miscellaneous functions are independent of experiment and data handle objects.
They are useful for logistics such as canceling running operations, downloading cluster logs, and ad hoc file handling on remote storage.
The file handling functions can be obtained with a single import statement: from rapidfire.util import file_utils.
Cancel Current
This function cancels an ongoing, potentially long-running operation on the cluster.
For instance, if you launch a run_fit() op and then realize you made a mistake in its config or want to try fewer configs, or if you launch a download() of a data handle and then realize you wanted a smaller sample, this function saves you from waiting for that operation to finish.
Example:
from rapidfire import cancel_current
# Cancel whatever operation is still running on the cluster
cancel_current()
Notes:
Note that RapidFire AI requires Jupyter notebook cells to be run synchronously (one after another).
So, first click “Stop” on the Jupyter menu for the cell of the op you want to cancel; then run cancel_current() in a new cell, perhaps right below the cancelled op’s cell.
Under the hood, cancel_current() cleanly resets the relevant states of the distributed execution engine so that all machines move to a consistent state and are ready for new ops without conflicts with the cancelled op.
You can also use cancel_current() to cancel a running run_test() or run_predict(). It is not applicable to delete_local() of a data handle.
Note that if you need to change your locators and/or any code in your MLSpec, it does NOT suffice to just run cancel_current(). You must also run end() and then create a new experiment with a new name for your new code. This also helps you track which runs came from which code.
Download Logs
This function lets you see the detailed logs on the cluster from all RapidFire AI processes, including from the controller and worker machines. This can help with deeper debugging in case the error messages displayed on the notebook are insufficient.
- download_logs(experiment_name: str | None = None) → None
- Parameters:
experiment_name (str, optional) – Name of an experiment if logs are needed only for its operations
- Returns:
None
- Return type:
None
Example:
from rapidfire import download_logs
# Download detailed log files from the cluster
download_logs()
Notes:
It produces a single zipped folder that is placed in the Jupyter server home directory. A message will be printed in the notebook along with a link that you can click to download that packet.
File Handling Utilities
These functions allow you to list, read, write, or delete objects on remote storage (S3 for now) and the local Jupyter filesystem. This can be useful for multiple scenarios, e.g., uploading modified ESFs for your data, reading raw data examples for visualization, reading model checkpoints for processing outside RapidFire AI API, etc.
The path
argument for the functions below must be an absolute path as follows:
Amazon S3 bucket: “s3://bucket/prefix”
Local file on Jupyter server: “file:///folder/path”
The examples below are taken from the COCO tutorial notebook. We plan to expand this API to support more remote storage options based on feedback.
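The two accepted forms can be distinguished by their URL scheme. As a local illustration only (the storage_kind helper below is hypothetical, not part of the RapidFire AI API), the standard-library urllib.parse module can classify such paths:

```python
from urllib.parse import urlparse

def storage_kind(path: str) -> str:
    """Hypothetical helper: classify a path by its URL scheme,
    mirroring the two absolute forms these utilities accept."""
    scheme = urlparse(path).scheme
    if scheme == "s3":
        return "s3"       # e.g., "s3://bucket/prefix"
    if scheme == "file":
        return "local"    # e.g., "file:///folder/path"
    raise ValueError(f"unsupported path: {path!r}")

print(storage_kind("s3://my-bucket/coco/metadata"))  # s3
print(storage_kind("file:///home/user/data.csv"))    # local
```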
Get Object
- get_object(path: str) → BytesIO | None
- Parameters:
path (str) – The bucket and prefix of object to get from remote storage (for S3) or full path to local file (on Jupyter server)
- Returns:
A byte stream with the object that can be used with any file type specific reader, e.g.,
pandas.read_csv()
- Return type:
BytesIO or None if there was a problem retrieving object
Example:
from os import getenv

import pandas
from PIL import Image

from rapidfire.util import file_utils

# Obtain the ESF for COCO validation partition and open as DataFrame
dataset_bucket = getenv("DATASET_BUCKET")
valesf = pandas.read_csv(file_utils.get_object(f"s3://{dataset_bucket}/coco/metadata/coco-val.csv"))

# Obtain an image from COCO validation partition and open as Image
imgfilename = "000000000632.jpg"
image = Image.open(file_utils.get_object(f"s3://{dataset_bucket}/coco/data/val2017/{imgfilename}"))
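Because get_object() returns a byte stream, any reader that accepts a file-like object works on the result. A self-contained illustration of the same pattern, using an in-memory CSV in place of the downloaded object:

```python
import io

import pandas

# Stand-in for the BytesIO that get_object() would return
csv_bytes = io.BytesIO(b"image_id,filename\n632,000000000632.jpg\n")

# Any file-like reader, such as pandas.read_csv(), consumes it directly
df = pandas.read_csv(csv_bytes)
print(df.shape)  # (1, 2)
```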
Put Object
- put_object(path: str, data: BytesIO) → None
- Parameters:
path (str) – The bucket and prefix of object to put on remote storage (for S3) or full path to a local filename (on Jupyter server)
data (BytesIO) – A byte stream of the object to put into file
- Returns:
None
- Return type:
None
Example:
# Write a modified ESF for COCO validation partition read above into a new user bucket location
user_bucket = getenv("USER_BUCKET")
valesfnew = ... # Modify this DataFrame that was read above
# Write DataFrame to BytesIO as CSV and put on remote storage
buffer = io.BytesIO()
valesfnew.to_csv(buffer, index=False)
buffer.seek(0)
file_utils.put_object(f"s3://{user_bucket}/coco/metadata/valesfnew.csv", buffer)
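The to_csv()/seek(0) sequence above is the generic way to turn a DataFrame into a byte stream for put_object(). A local round-trip, with no remote storage involved, shows why the seek(0) matters:

```python
import io

import pandas

df = pandas.DataFrame({"image_id": [632], "filename": ["000000000632.jpg"]})

# Serialize to an in-memory CSV; the write leaves the cursor at the end
buffer = io.BytesIO()
df.to_csv(buffer, index=False)
buffer.seek(0)  # rewind so the next reader starts at byte 0

# Anything reading the buffer now sees the full CSV
df2 = pandas.read_csv(buffer)
print(df2.equals(df))  # True
```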
List Objects
- list_objects(path: str) → List[str]
- Parameters:
path (str) – The bucket and prefix of source location on remote storage to recursively list objects under it (for S3) or a file system path to begin search under (on Jupyter server)
- Returns:
A list of objects in the specified path or an empty list if the path is None
- Return type:
List[str]
Example:
from os import getenv

from rapidfire.util import file_utils

# Obtain all objects under COCO metadata on S3
dataset_bucket = getenv("DATASET_BUCKET")
coco_metadata_files = file_utils.list_objects(f"s3://{dataset_bucket}/coco/metadata")
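Since the listing is recursive, the returned list may mix file types. A sketch of narrowing the result with plain Python (the keys below are made up, standing in for an actual list_objects() result):

```python
# Hypothetical listing result from list_objects()
keys = [
    "s3://my-bucket/coco/metadata/coco-train.csv",
    "s3://my-bucket/coco/metadata/coco-val.csv",
    "s3://my-bucket/coco/metadata/README.txt",
]

# Keep only the CSV objects
csv_keys = [k for k in keys if k.endswith(".csv")]
print(len(csv_keys))  # 2
```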
Delete Object
- delete_object(path: str) → None
- Parameters:
path (str) – The bucket and prefix of object to delete on remote storage (for S3) or full path to local file (on Jupyter server)
- Returns:
None
- Return type:
None
Example:
from os import getenv

from rapidfire.util import file_utils

# Delete the new ESF for COCO validation partition that was written above
user_bucket = getenv("USER_BUCKET")
filename = f"s3://{user_bucket}/coco/metadata/valesfnew.csv"
file_utils.delete_object(filename)
Notes:
Note that you do not have write access to data buckets owned by RapidFire AI, but you are given full read-write access to your user account-specific buckets.