A basic structure with the essential components of a PyTorch model is provided inside the *models/model_name* directory.

All of the following section titles are paths relative to this directory.
## run.py

The entry point of the model. It accepts these command-line arguments:

* *--config* → path to the [configuration file](https://gitlab.fbk.eu/dsip/templates/dl_setup/-/wikis/Configuration-file);
* *--train* → flag to train the model;
* *--inference* → flag to use the model for inference.
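As a sketch, this command-line interface could be built with *argparse* (the function name *build_parser* is illustrative, not the actual template code):

```python
import argparse

def build_parser():
    # Build the command-line interface described above.
    parser = argparse.ArgumentParser(description="Entry point of the model.")
    parser.add_argument("--config", required=True,
                        help="Path to the configuration file.")
    parser.add_argument("--train", action="store_true",
                        help="Train the model.")
    parser.add_argument("--inference", action="store_true",
                        help="Use the model for inference.")
    return parser
```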
Inside the *run.py* file there are 4 functions:

* *train* → sets up some initial components before the training loop (loads the config, creates the dataloaders, etc.). Here you can decide whether to use a scheduler, an early stopper, and so on;
* *inference* → same as *train*, but for the inference operation;
* *get_device* → gets the device to use (GPU or CPU);
* *set_threads* → sets the number of threads that will be used by PyTorch.
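The last two helpers are small; a plausible sketch, assuming only that *torch* is available (the actual implementations may differ):

```python
import torch

def get_device():
    # Prefer the GPU when CUDA is available, otherwise fall back to the CPU.
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")

def set_threads(num_threads):
    # Limit the number of CPU threads PyTorch uses for intra-op parallelism.
    torch.set_num_threads(num_threads)
```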
## train.py

Inside this file there are two functions:

* *compute_loss* → self-explanatory. Called by *train_model*;
* *train_model* → the training loop, called inside the *train* function (*run.py* file).

#### train_model function
##### Parameters

The *train_model* function accepts these required parameters:

* *aim_session* → session used to track the experiment, for example, the loss and other metrics;
* *model* → the model to train;
* *optimizer* → the optimizer;
* *epochs* → the number of epochs;
* *data_loaders* → a dictionary whose keys are strings representing the phases of the training process (train, validation, test) and whose values are DataLoader instances;
* *device* → the device that will be used;
* *net_weights_dir* → path to the directory in which the model weights will be saved (loaded from the [configuration file](https://gitlab.fbk.eu/dsip/templates/dl_setup/-/wikis/Configuration-file) in the *train* function inside the *run.py* file).

There are also some optional parameters:

* *scheduler* → a scheduler for the optimizer;
* *early_stopper* → used to periodically save the best weights and to stop the training when the model starts to overfit (more information in the related section below).

Optional parameters can be set to *None* if you don't want to use one or more of them.
##### Body

The body of the function iterates over the epochs. For each epoch, there is a phase for each one of the *data_loaders*.
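A minimal sketch of such a training loop (the *aim_session* tracking, weight saving, and early stopping are omitted for brevity; names are illustrative, not the exact template code):

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

def compute_loss(criterion, outputs, targets):
    # Thin wrapper around the loss criterion, mirroring compute_loss in train.py.
    return criterion(outputs, targets)

def train_model(model, optimizer, epochs, data_loaders, device,
                scheduler=None, early_stopper=None):
    criterion = nn.MSELoss()
    history = []
    for epoch in range(epochs):
        # One phase per entry of data_loaders (e.g. train, validation, test).
        for phase, loader in data_loaders.items():
            if phase == "train":
                model.train()
            else:
                model.eval()
            running_loss = 0.0
            for inputs, targets in loader:
                inputs, targets = inputs.to(device), targets.to(device)
                optimizer.zero_grad()
                # Gradients are only needed during the training phase.
                with torch.set_grad_enabled(phase == "train"):
                    outputs = model(inputs)
                    loss = compute_loss(criterion, outputs, targets)
                    if phase == "train":
                        loss.backward()
                        optimizer.step()
                running_loss += loss.item() * inputs.size(0)
            history.append((epoch, phase, running_loss / len(loader.dataset)))
        if scheduler is not None:
            scheduler.step()
    return history
```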
## inference.py

Inside this file there is only one function, which iterates over the dataset. The data retrieved from the dataloader is passed to the model in order to compute the inference output.
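A sketch of what that function might look like (the name *run_inference* is illustrative):

```python
import torch

def run_inference(model, data_loader, device):
    # Collect the model outputs for every batch in the dataset.
    model.eval()
    outputs = []
    with torch.no_grad():
        for inputs in data_loader:
            inputs = inputs.to(device)
            outputs.append(model(inputs))
    return torch.cat(outputs)
```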
## data_processing/data_loading.py

In this file, two simple Dataset classes, one for training and one for inference, are defined. There are also two functions for loading data from a file or a database.
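A minimal sketch of what the two Dataset classes could look like (class names are illustrative, not the actual template code):

```python
import torch
from torch.utils.data import Dataset

class TrainDataset(Dataset):
    # Pairs of inputs and targets, as needed for supervised training.
    def __init__(self, inputs, targets):
        self.inputs = inputs
        self.targets = targets

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self, idx):
        return self.inputs[idx], self.targets[idx]

class InferenceDataset(Dataset):
    # Inputs only: no targets are available at inference time.
    def __init__(self, inputs):
        self.inputs = inputs

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self, idx):
        return self.inputs[idx]
```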
## model/early_stopper.py

The early stopper is a component that checks the validation loss every epoch and memorizes the best one. When the loss drops below the best value registered so far, the early stopper saves the net weights to a file.

It is possible to set a patience value. The patience is an integer that tells the early stopper how many epochs without a loss decrease it must wait before stopping the training loop.

To summarize:

* it avoids overfitting by saving the best validation model;
* it saves training time by stopping the loop before the end.
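The logic above can be sketched as follows (a minimal illustration; the real class also saves the model weights when a new best loss is found, and names may differ):

```python
class EarlyStopper:
    # Tracks the best validation loss and stops training after
    # `patience` epochs without improvement.
    def __init__(self, patience):
        self.patience = patience
        self.best_loss = float("inf")
        self.epochs_without_improvement = 0

    def step(self, val_loss):
        # Returns True when training should stop.
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.epochs_without_improvement = 0
            # The real implementation would save the net weights here.
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience
```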
Net weights will be saved inside the directory specified in the [configuration file](https://gitlab.fbk.eu/dsip/templates/dl_setup/-/wikis/Configuration-file). The directory structure will be:

```
net_weights_dir
├──┐experiment_name
│  ├── run_name_1.pth
│  ├── run_name_1_early_stop.pth
│  ├── run_name_2.pth
│  └── run_name_2_early_stop.pth
└──┐another_experiment
   ├── a_run_name.pth
   └── a_run_name_early_stop.pth
```