Tutorial Use Case: ImageNet

Dataset

The ImageNet dataset is a pioneering benchmark dataset in computer vision for object recognition. The input is an image sourced from the public Web. The target is one of 1000 object class labels drawn from the WordNet hierarchy (car, bus, cat, etc.).

The raw dataset files are sourced from this Kaggle page.

There are 3 top-level directories, but we only need the first two of these:

  • Data

  • Metadata

  • Annotations

The Data directory has all the raw image files stored under subdirectories as follows:

  • train: This has another level of subdirectories grouped by label, e.g., “n01440764/”, and within that are the image files, e.g., “n01440764_10026.JPEG”

  • val: This has another level of subdirectories grouped by label, e.g., “n01440764/”, and within that are the image files, e.g., “ILSVRC2012_val_00000293.JPEG”

  • test: The image files sit directly under this directory, e.g., “ILSVRC2012_test_00000001.JPEG” (a small path-listing sketch of this layout follows below)
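
To make this layout concrete, here is a minimal sketch that lists the files under each partition. The root path is a hypothetical placeholder; point it at wherever you extracted the Kaggle files.

```python
from pathlib import Path

DATA_ROOT = Path("/path/to/imagenet/Data")  # placeholder; adjust to your extracted Kaggle files

# train/ and val/ group images into one subdirectory per label, e.g., n01440764/
train_files = sorted((DATA_ROOT / "train").glob("*/*.JPEG"))
val_files = sorted((DATA_ROOT / "val").glob("*/*.JPEG"))

# test/ holds the image files directly, with no label subdirectories
test_files = sorted((DATA_ROOT / "test").glob("*.JPEG"))

print(len(train_files), len(val_files), len(test_files))
```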

The Metadata directory has the Example Structure Files (ESFs) for the three data partitions. Please read API: Data Ingestion and Locators for more ESF-related details. The files are as follows:

  • train.csv: A labeled set with the following column names: (id, filename, filepath, original_label, label).

  • valid.csv: Also a labeled set with the same column names: (id, filename, filepath, original_label, label).

  • test.csv: An unlabeled set with the following column names: (id, filename, filepath).

Note that this is actually the “predict” partition in our nomenclature, since it is not labeled.
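
For orientation, here is a minimal sketch of loading the ESFs with pandas; META_ROOT is a hypothetical placeholder path, and the column comments mirror the lists above.

```python
import pandas as pd

META_ROOT = "/path/to/imagenet/Metadata"  # placeholder; adjust to your extracted Kaggle files

train_esf = pd.read_csv(f"{META_ROOT}/train.csv")  # columns: id, filename, filepath, original_label, label
valid_esf = pd.read_csv(f"{META_ROOT}/valid.csv")  # same columns as train.csv
test_esf = pd.read_csv(f"{META_ROOT}/test.csv")    # columns: id, filename, filepath (no labels)

print(train_esf.columns.tolist())
print(test_esf.head())
```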

Model(s)

This tutorial notebook illustrates both model architecture comparison and hyperparameter tuning in a single run_fit(). It showcases the following models:

  • ResNet-50 from the torchvision library.

  • VGG16 from the torchvision library.

  • ViT from this HF models page.

All these models were already trained on this dataset. But since we are continuing training on a smaller sample, with potentially different hyperparameters than they were originally trained with, you are likely to see diverse learning behaviors. Such diversity is useful for understanding the utility of our Interactive Control operations.

If you’d like to retrain any model from scratch, you can reinitialize its weights in your create_model() function in MLSpec, as in the sketch below.
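
For a rough sense of what that looks like, here is a sketch of such a create_model(); the config-dict signature, the knob value strings, and the ViT checkpoint name are assumptions made for illustration, and the actual definitions are in the tutorial notebook’s MLSpec.

```python
import torchvision.models as tvm
from transformers import ViTForImageClassification

def create_model(config):
    """Build the architecture selected by the model_type knob for one config (sketch)."""
    model_type = config["model_type"]  # assumed knob name; see Config Knobs below
    if model_type == "resnet50":
        # pass weights=None instead to reinitialize and retrain from scratch
        return tvm.resnet50(weights="IMAGENET1K_V2")
    if model_type == "vgg16":
        return tvm.vgg16(weights="IMAGENET1K_V1")
    if model_type == "vit":
        # placeholder checkpoint; substitute the one from the HF models page
        return ViTForImageClassification.from_pretrained("google/vit-base-patch16-224")
    raise ValueError(f"Unknown model_type: {model_type}")
```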

Config Knobs

As mentioned above, we perform both architecture comparison and hyperparameter tuning with GridSearch(). We compare 2 architectures using a user knob named model_type; for each architecture, we try 2 values each for batch_size and lr (learning rate).
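
Conceptually, the knob grid looks like the sketch below (2 × 2 × 2 = 8 configs in one run_fit()); the specific values are placeholders rather than the ones used in the notebook.

```python
# Illustrative knob grid; GridSearch() enumerates the cross product of these lists.
search_space = {
    "model_type": ["resnet50", "vit"],  # user knob for architecture comparison (placeholder values)
    "batch_size": [32, 64],             # placeholder values
    "lr": [1e-4, 1e-3],                 # placeholder values
}
```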

We also specify 2 named metrics, top1_accuracy and top5_accuracy, to be plotted out of the box; these are the original benchmark metrics for this multi-class classification task.
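
For reference, these metrics have their conventional definitions: top-k accuracy is the fraction of examples whose true label appears among the model’s k highest-scoring classes. RapidFire AI plots them out of the box; the sketch below only clarifies what is being measured.

```python
import torch

def topk_accuracy(logits: torch.Tensor, targets: torch.Tensor, k: int = 1) -> float:
    """Fraction of examples whose true label is among the top-k predicted classes."""
    topk_preds = logits.topk(k, dim=1).indices              # (N, k) predicted class indices
    hits = (topk_preds == targets.unsqueeze(1)).any(dim=1)  # (N,) whether the true label was hit
    return hits.float().mean().item()

# e.g., over a batch of 1000-way logits:
# top1 = topk_accuracy(logits, targets, k=1)
# top5 = topk_accuracy(logits, targets, k=5)
```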

Step-by-Step Code in our API

Please check out the notebook named rf-tutorial-imagenet.ipynb in the Jupyter home directory of your RapidFire AI cluster.