3.1.2.3. Training a machine learning model
In this example, we perform the “action” of training a single ML model using the training and validation ML datasets generated from the action described in this page.
NOTE: Users are advised to read the remainder of the current page in its entirety before trying to execute this action.
To execute the action, first we need to change into the directory
<root>/examples/modelling/cbed/distortion/estimation/scripts, where <root> is
the root of the emicroml repository. Then, we need to run the Python script
./execute_action.py via the terminal command:
python execute_action.py --action=<action> --use_slurm=<use_slurm>
where <action> must be equal to train_ml_model_set, and <use_slurm> is either
yes or no. If <use_slurm> equals yes and a SLURM workload manager is available
on the server from which you intend to run the script, then the action will be
performed as a SLURM job. If <use_slurm> is equal to no, then the action will
be performed locally without using a SLURM workload manager.
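As an illustration, the terminal command above can be assembled programmatically. The following is only a minimal sketch: the helper name build_command is hypothetical and is not part of the emicroml repository, and the only command line arguments used are the two described above.

```python
import shlex


def build_command(use_slurm):
    """Return the argument list for the terminal command described above.

    NOTE: ``build_command`` is a hypothetical helper for illustration only.
    """
    # For this action, <action> must be equal to "train_ml_model_set".
    action = "train_ml_model_set"
    # <use_slurm> is either "yes" or "no".
    use_slurm_value = "yes" if use_slurm else "no"
    return [
        "python",
        "execute_action.py",
        "--action={}".format(action),
        "--use_slurm={}".format(use_slurm_value),
    ]


# Print the command for a local run that does not use SLURM.
print(shlex.join(build_command(use_slurm=False)))
```

To actually launch the action from Python, the returned list could be passed to subprocess.run with cwd set to the scripts directory named above.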
If the action is to be performed locally without using a SLURM workload manager, then prior to executing the above Python script, a set of Python libraries needs to be installed in the Python environment within which said Python script is to be executed. See this page for instructions on how to do so. If the action is being performed as a SLURM job, then prior to executing any Python commands that do not belong to Python’s standard library, a customizable sequence of commands is executed. This sequence is expected to either activate an existing Python virtual environment, or create then activate one, in which the Python libraries needed to complete the action successfully are installed. See this page for instructions on how to customize the sequence of commands.
The action described at the beginning of the current page automatically takes
as input the output data generated by the action described in the page
Combining then splitting machine learning datasets for training and validation,
hence one must execute the latter action before the former. Upon successful
completion of the former action, a dictionary representation of the ML model
after training is saved to a file at the file path
<top_level_data_dir>/ml_models/ml_model_0/ml_model_at_lr_step_<last_lr_step>.pth,
where <last_lr_step> is an integer indicating the last learning rate step in
the ML model training procedure, and <top_level_data_dir> is
<root>/examples/modelling/cbed/distortion/estimation/data. Additionally, the
ML model training summary output data file is saved to
<top_level_data_dir>/ml_models/ml_model_0/ml_model_training_summary_output_data.h5.
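Since <last_lr_step> is not known before training completes, it can be convenient to locate the saved model file by pattern rather than by exact name. The following is a minimal sketch under stated assumptions: the helper name find_last_checkpoint is hypothetical, and the sketch allows for more than one file matching the naming pattern in the directory, which the text above does not confirm.

```python
import re
from pathlib import Path

# File name pattern taken from the output path described above.
CHECKPOINT_PATTERN = re.compile(r"^ml_model_at_lr_step_(\d+)\.pth$")


def find_last_checkpoint(ml_model_dir):
    """Return (lr_step, path) for the file with the largest learning rate step.

    Returns None if no matching file exists. NOTE: hypothetical helper, not
    part of emicroml.
    """
    candidates = []
    for path in Path(ml_model_dir).glob("ml_model_at_lr_step_*.pth"):
        match = CHECKPOINT_PATTERN.match(path.name)
        if match:
            candidates.append((int(match.group(1)), path))
    if not candidates:
        return None
    # Pick the entry whose integer step is the largest.
    return max(candidates, key=lambda item: item[0])
```

For example, calling find_last_checkpoint on <top_level_data_dir>/ml_models/ml_model_0 after a successful run should return the path of the saved model file. How that file is best loaded is not covered here; consult the documentation of emicroml for that.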
In executing the action described at the beginning of the current page, multiple
scripts are executed. The particular scripts that are executed depend on the
command line arguments of the parent Python script introduced at the beginning
of this page. If <use_slurm> equals yes, then the following scripts are
executed in the order that they appear directly below:
<root>/examples/modelling/cbed/distortion/estimation/scripts/execute_action.py
<root>/examples/modelling/cbed/common/scripts/train_ml_model_set/execute_all_action_steps.py
<root>/examples/modelling/cbed/common/scripts/train_ml_model/execute_all_action_steps.py
<root>/examples/modelling/cbed/common/scripts/train_ml_model/prepare_and_submit_slurm_job.sh
<root>/examples/modelling/cbed/common/scripts/train_ml_model/execute_main_action_steps.py
Otherwise, if <use_slurm> equals no, then the fourth script, i.e. the one with
the basename prepare_and_submit_slurm_job.sh, is not executed. See the contents
of the scripts listed above for implementation details. The last script uses
the module emicroml.modelling.cbed.distortion.estimation. It is recommended
that you consult the documentation of said module as you explore said script.
Lastly, if the action is being performed as a SLURM job, then the default
sbatch options, which are specified in the file with the basename
prepare_and_submit_slurm_job.sh, can be overridden by following the
instructions in this page.
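The dependence of the executed scripts on <use_slurm> can be summarized in a few lines of code. The following is a sketch for illustration only: the function name scripts_executed is hypothetical, and the function merely lists the paths given above without running anything.

```python
from pathlib import PurePosixPath

SLURM_SCRIPT_BASENAME = "prepare_and_submit_slurm_job.sh"


def scripts_executed(root, use_slurm):
    """List, in execution order, the scripts named on this page.

    NOTE: hypothetical helper; it only builds path strings.
    """
    cbed = PurePosixPath(root) / "examples" / "modelling" / "cbed"
    common = cbed / "common" / "scripts"
    scripts = [
        cbed / "distortion" / "estimation" / "scripts" / "execute_action.py",
        common / "train_ml_model_set" / "execute_all_action_steps.py",
        common / "train_ml_model" / "execute_all_action_steps.py",
        common / "train_ml_model" / SLURM_SCRIPT_BASENAME,
        common / "train_ml_model" / "execute_main_action_steps.py",
    ]
    if not use_slurm:
        # Without SLURM, the job submission script is skipped.
        scripts = [s for s in scripts if s.name != SLURM_SCRIPT_BASENAME]
    return [str(s) for s in scripts]


# Show the scripts for a local run, using the placeholder <root> as-is.
for script in scripts_executed("<root>", use_slurm=False):
    print(script)
```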