Tutorial Use Case: ImageNet
===========================

Dataset
-------

The ImageNet dataset is a pioneering benchmark dataset in computer vision for object recognition. The input is an image sourced from the public Web. The target is a 1000-class label vocabulary of objects drawn from the WordNet hierarchy (car, bus, cat, etc.).

The raw dataset files are sourced from `this Kaggle page `_. There are 3 top-level directories, but we only need the first two of these:

* Data
* Metadata
* Annotations

The Data directory has all the raw image files stored under subdirectories as follows:

* train: This has another level of subdirectories grouped by label, e.g., "n01440764/", and within that are the image files, e.g., "n01440764_10026.JPEG"
* val: This has another level of subdirectories grouped by label, e.g., "n01440764/", and within that are the image files, e.g., "ILSVRC2012_val_00000293.JPEG"
* test: Directly under this are the image files, e.g., "ILSVRC2012_test_00000001.JPEG"

The Metadata directory has the Example Structure Files (ESFs) for the three data partitions. Please read :doc:`API: Data Ingestion and Locators ` for more ESF-related details. The files are as follows:

* train.csv: A labeled set with the following column names: (id, filename, filepath, original_label, label).
* valid.csv: Also a labeled set with the same column names: (id, filename, filepath, original_label, label).
* test.csv: An unlabeled set with the following column names: (id, filename, filepath). Note that this is actually the "predict" partition in our nomenclature, since it is not labeled.

Model(s)
--------

This tutorial notebook illustrates both model architecture comparison and hyperparameter tuning in a single :func:`run_fit()`. It showcases the following models:

* ResNet-50 from the :code:`torchvision` library.
* VGG16 from the :code:`torchvision` library.
* ViT from `this HF models page `_.

All these models were already pretrained on this dataset.
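As a rough illustration of how such backbones are typically constructed, here is a minimal sketch of a builder function keyed on an architecture name. This is not the RapidFire AI API itself; the function name :code:`create_backbone`, the :code:`pretrained` flag, and the ViT checkpoint name are assumptions for illustration. The :code:`weights` enums are torchvision's, and passing :code:`weights=None` is the standard way to get randomly initialized weights for training from scratch.

.. code-block:: python

    import torchvision.models as tvm

    def create_backbone(model_type: str, pretrained: bool = True):
        """Build one of the tutorial's candidate architectures.

        `model_type` mirrors the grid-search knob of the same name.
        With pretrained=False the weights are randomly initialized,
        i.e., the model would be retrained from scratch.
        """
        if model_type == "resnet50":
            weights = tvm.ResNet50_Weights.IMAGENET1K_V2 if pretrained else None
            return tvm.resnet50(weights=weights)
        if model_type == "vgg16":
            weights = tvm.VGG16_Weights.IMAGENET1K_V1 if pretrained else None
            return tvm.vgg16(weights=weights)
        if model_type == "vit":
            # ViT comes from the Hugging Face hub rather than torchvision;
            # this checkpoint name is illustrative, not from the tutorial.
            from transformers import ViTForImageClassification
            return ViTForImageClassification.from_pretrained(
                "google/vit-base-patch16-224"
            )
        raise ValueError(f"unknown model_type: {model_type!r}")

All three models end in a 1000-way classification head, matching ImageNet's label vocabulary.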
But since we are continuing training on a smaller sample, potentially with different hyperparameters than what they were originally trained with, you are likely to see diverse learning behaviors. Such diversity is useful to help understand the utility of our :doc:`Interactive Control operations `. If you'd like to retrain any model from scratch, you can reinitialize its weights in your :func:`create_model()` function in :code:`MLSpec`.

Config Knobs
------------

As mentioned above, we perform both architecture comparison and hyperparameter tuning with :func:`GridSearch()`. We compare 2 architectures using a user knob named :code:`model_type`; for each architecture, we try 2 different values each for :code:`batch_size` and :code:`lr` (learning rate). We also indicate 2 named metrics, :code:`top1_accuracy` and :code:`top5_accuracy`, to be plotted out of the box; these are the original benchmark metrics for this multi-class classification task.

Step by Step Code in our API
----------------------------

Please check out the notebook named :code:`rf-tutorial-imagenet.ipynb` in the Jupyter home directory of your RapidFire AI cluster.
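Before opening the notebook, it may help to see how the grid from the Config Knobs section expands. The sketch below is plain Python, not the :func:`GridSearch()` API, and the specific knob values are illustrative assumptions; the actual values live in the tutorial notebook.

.. code-block:: python

    from itertools import product

    # Illustrative knob values: 2 architectures x 2 batch sizes x 2 LRs.
    grid = {
        "model_type": ["resnet50", "vgg16"],
        "batch_size": [32, 64],
        "lr": [1e-3, 1e-4],
    }

    # Cross product of all knob values: one dict per candidate config.
    configs = [dict(zip(grid, values)) for values in product(*grid.values())]
    print(len(configs))  # 8 configurations in total

Each of the 8 resulting configurations is one run in the comparison, and each is plotted against the two named metrics out of the box.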