3.1.2.7. Running the RGM test set #1
In this example, we perform the “action” of running the first set of tests of the radial gradient maximization (RGM) approach to estimating the distortion of CBED patterns. The RGM tests use the machine learning (ML) datasets generated from the action described in this page.
NOTE: Users are advised to read the remainder of the current page in its entirety before trying to execute this action.
To execute the action, first change into the directory <root>/examples/modelling/cbed/distortion/estimation/scripts, where <root> is the root of the emicroml repository. Then, run the Python script ./execute_action.py via the terminal command:
python execute_action.py --action=<action> --use_slurm=<use_slurm>
where <action> must be equal to run_rgm_test_set_1, and <use_slurm> is either yes or no. If <use_slurm> equals yes and a SLURM workload manager is available on the server from which you intend to run the script, then the action will be performed as multiple SLURM jobs. If <use_slurm> is equal to no, then the action will be performed locally without using a SLURM workload manager.
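As a minimal sketch, the terminal command above can also be assembled and launched from Python. The helper names build_command and run_action are hypothetical and are not part of emicroml; only the script name, the two command line arguments, and the directory layout come from the text above.

```python
import subprocess
from pathlib import Path

# The only values of <use_slurm> mentioned in the text above.
VALID_USE_SLURM_VALUES = ("yes", "no")


def build_command(use_slurm):
    """Return the argument list for running RGM test set #1.

    Hypothetical helper: it only mirrors the documented terminal command.
    """
    if use_slurm not in VALID_USE_SLURM_VALUES:
        raise ValueError("use_slurm must be 'yes' or 'no'")
    return [
        "python",
        "execute_action.py",
        "--action=run_rgm_test_set_1",
        "--use_slurm={}".format(use_slurm),
    ]


def run_action(root, use_slurm="no"):
    """Run the action from the scripts directory of the repository at `root`."""
    scripts_dir = (
        Path(root) / "examples/modelling/cbed/distortion/estimation/scripts"
    )
    # check=True raises CalledProcessError if the script exits with an error.
    return subprocess.run(build_command(use_slurm), cwd=scripts_dir, check=True)
```

Calling run_action("/path/to/emicroml", use_slurm="no") would then be equivalent to changing into the scripts directory and issuing the command by hand.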
If the action is to be performed locally without using a SLURM workload manager, then prior to executing the above Python script, a set of Python libraries needs to be installed in the Python environment within which said Python script is to be executed. See this page for instructions on how to do so. If the action is being performed as multiple SLURM jobs, then prior to executing any Python commands that do not belong to Python’s standard library, a customizable sequence of commands is executed. These commands are expected to either activate an existing Python virtual environment, or create and then activate one, in which the Python libraries needed to complete the action successfully are installed. See this page for instructions on how to customize the sequence of commands.
The action described at the beginning of the current page automatically takes as input the output data generated by the action described in the page Combining machine learning datasets for the machine learning model test set #1. Hence, one must execute the latter action before the former.
At this point, it is worth noting that every ML data instance stored in every
valid ML dataset encodes data about a “fake” CBED pattern. See the documentation
for the class fakecbed.discretized.CBEDPattern for a full discussion of
fake CBED patterns and the context relevant to the discussion below. In
constructing a fake CBED pattern, one needs to specify a set of circular,
undistorted CBED disk supports, defined in
\(\left(u_{x},u_{y}\right)\)-space, sharing a common disk radius, with their
centers being specified in \(\left(u_{x},u_{y}\right)\) coordinates. One
also needs to specify a distortion field to construct a fake CBED pattern. The
supports of the distorted CBED disks that appear in the fake CBED pattern are
obtained by distorting the aforementioned set of undistorted disk supports
according to the aforementioned distortion field. Let \(\left\{
\left(u_{x;c;\text{C};i},u_{y;c;\text{C};i}\right)\right\}_{i}\) be the
\(\left(u_{x},u_{y}\right)\) coordinates of the undistorted disk support
centers, and \(\left\{
\left(q_{x;c;\text{C};i},q_{y;c;\text{C};i}\right)\right\}_{i}\) be the
corresponding coordinates in \(\left(q_{x},q_{y}\right)\)-space according to
the distortion field. The RGM approach to estimating the distortion of the fake
CBED pattern can be described as follows:
1. Use the RGM technique, described in Ref. [Mahr1], to estimate the subset of \(\left\{ \left(q_{x;c;\text{C};i},q_{y;c;\text{C};i}\right)\right\}_{i}\) corresponding to the distorted CBED disks in the fake CBED pattern that are not clipped.
2. Determine iteratively, via non-linear least squares, the distortion field that minimizes the mean-square error between the disk center coordinates it predicts and the coordinates estimated in step 1.
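Step 2 can be illustrated with a small, self-contained sketch. It is not the emicroml implementation: the distortion model below is a toy radial distortion with a hypothetical amplitude A and distortion center (c_x, c_y), the solver is a plain Gauss-Newton iteration with a finite-difference Jacobian, the coordinates are assumed to be fractional, and the "estimated" disk centers are synthetic and noise-free rather than the output of the RGM technique.

```python
import numpy as np


def distort(u, params):
    """Toy radial distortion: q = c + (1 + A*|u-c|^2) * (u-c).

    `u` has shape (N, 2); `params` is (A, c_x, c_y). This model is an
    assumption made for illustration only.
    """
    A, c_x, c_y = params
    c = np.array([c_x, c_y])
    d = u - c
    r_sq = np.sum(d**2, axis=1, keepdims=True)
    return c + (1.0 + A * r_sq) * d


def residuals(params, u_centers, q_estimates):
    """Flattened differences between predicted and estimated disk centers."""
    return (distort(u_centers, params) - q_estimates).ravel()


def numerical_jacobian(params, u_centers, q_estimates, h=1e-6):
    """Central finite-difference Jacobian of the residuals."""
    jacobian = np.empty((2 * len(u_centers), len(params)))
    for k in range(len(params)):
        step = np.zeros(len(params))
        step[k] = h
        r_plus = residuals(params + step, u_centers, q_estimates)
        r_minus = residuals(params - step, u_centers, q_estimates)
        jacobian[:, k] = (r_plus - r_minus) / (2.0 * h)
    return jacobian


def fit_distortion(u_centers, q_estimates, initial_params,
                   max_iter=50, tol=1e-10):
    """Gauss-Newton iteration minimizing the mean-square residual."""
    params = np.array(initial_params, dtype=float)
    for _ in range(max_iter):
        r = residuals(params, u_centers, q_estimates)
        jacobian = numerical_jacobian(params, u_centers, q_estimates)
        # Solve the linearized least-squares problem for the update.
        delta, *_ = np.linalg.lstsq(jacobian, -r, rcond=None)
        params = params + delta
        if np.linalg.norm(delta) < tol:
            break
    return params


# Synthetic example: undistorted disk centers on a 4x4 grid.
grid = np.linspace(0.2, 0.8, 4)
u_centers = np.array([(x, y) for x in grid for y in grid])
true_params = np.array([0.3, 0.52, 0.48])
q_estimates = distort(u_centers, true_params)  # stand-in for step 1 output

fitted_params = fit_distortion(u_centers, q_estimates,
                               initial_params=(0.2, 0.5, 0.5))
mse = np.mean(residuals(fitted_params, u_centers, q_estimates) ** 2)
```

With real data, the estimates from step 1 are noisy and only the unclipped disks contribute, so the minimum mean-square error would generally be nonzero.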
Upon successful completion of the action described briefly at the beginning of
the current page, for every string <disk_size> in the sequence (small, medium,
large), the RGM approach will have been tested against the ML testing dataset
stored in the HDF5 file at the file path
<top_level_data_dir>/ml_datasets/ml_datasets_for_ml_model_test_set_1/ml_datasets_with_cbed_patterns_of_MoS2_on_amorphous_C/ml_dataset_with_<disk_size>_sized_disks.h5,
with the output having been saved in an HDF5 file generated at the file path
<top_level_data_dir>/rgm_test_set_1_results/results_for_cbed_patterns_of_MoS2_on_amorphous_C_with_<disk_size>_sized_disks/rgm_testing_summary_output_data.h5,
where <top_level_data_dir> is
<root>/examples/modelling/cbed/distortion/estimation/data. Each output HDF5 file
is guaranteed to contain the following HDF5 objects:
- path_to_ml_testing_dataset: <HDF5 1D dataset>
- total_num_ml_testing_data_instances: <HDF5 0D dataset>
- ml_data_instance_metrics: <HDF5 group>
  - testing: <HDF5 group>
    - epes_of_adjusted_distortion_fields: <HDF5 1D dataset>
      - dim_0: “ml testing data instance idx”
Note that the sub-bullet points listed immediately below a given HDF5 dataset
display the HDF5 attributes associated with said HDF5 dataset. Some HDF5
datasets have attributes with names of the form "dim_{}".format(i) with i
being an integer. Attribute "dim_{}".format(i) of a given HDF5 dataset
labels the ith dimension of the underlying array of the dataset.
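The three output file paths described earlier on this page can be enumerated with a short helper. This is only a sketch: the function name rgm_summary_paths is hypothetical, while the directory and file names are copied from the paths given above.

```python
from pathlib import Path

# The values of <disk_size> listed on this page.
DISK_SIZES = ("small", "medium", "large")


def rgm_summary_paths(root):
    """Map each disk size to its RGM testing summary output file.

    `root` is the root of the emicroml repository. Hypothetical helper.
    """
    top_level_data_dir = (
        Path(root) / "examples/modelling/cbed/distortion/estimation/data"
    )
    results_dir = top_level_data_dir / "rgm_test_set_1_results"
    dirname_template = (
        "results_for_cbed_patterns_of_MoS2_on_amorphous_C"
        "_with_{}_sized_disks"
    )
    return {
        disk_size: (
            results_dir
            / dirname_template.format(disk_size)
            / "rgm_testing_summary_output_data.h5"
        )
        for disk_size in DISK_SIZES
    }
```

Checking path.exists() for each returned path is one simple way to confirm that the action completed for all three disk sizes.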
The HDF5 dataset at the HDF5 path "/path_to_ml_testing_dataset" stores the
path, as a string, to the ML testing dataset used for the test.
The HDF5 dataset at the HDF5 path
"/ml_data_instance_metrics/testing/epes_of_adjusted_distortion_fields"
stores the end-point errors (EPEs) of the “adjusted” standard distortion fields
specified by the predicted standard coordinate transformation parameter sets,
during testing. For every nonnegative integer m less than the total number of
ML testing data instances, the mth element of the aforementioned HDF5 dataset
is the EPE of the adjusted standard distortion field specified by the mth
predicted standard coordinate transformation parameter set, during testing. See
the summary documentation of the class
emicroml.modelling.cbed.distortion.estimation.MLModelTrainer for a
definition of an adjusted standard distortion field, and how the EPE is
calculated exactly.
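The output file layout described above can be read back with h5py, for example as follows. This is a minimal sketch that assumes h5py and NumPy are installed; the function name read_rgm_summary and the choice of summary statistics are illustrative. The dataset at "/path_to_ml_testing_dataset" is left out because its string encoding is not specified on this page.

```python
import h5py
import numpy as np

EPES_PATH = (
    "/ml_data_instance_metrics/testing/epes_of_adjusted_distortion_fields"
)


def read_rgm_summary(path_to_summary_file):
    """Read the EPEs and basic metadata from an RGM testing summary file.

    Hypothetical helper; only the HDF5 paths come from the documentation.
    """
    with h5py.File(path_to_summary_file, "r") as h5_file:
        epes_dataset = h5_file[EPES_PATH]
        epes = epes_dataset[()]  # load the full 1D array into memory
        dim_0_label = epes_dataset.attrs["dim_0"]
        total = int(h5_file["/total_num_ml_testing_data_instances"][()])

    # Older h5py versions may return attribute strings as bytes.
    if isinstance(dim_0_label, bytes):
        dim_0_label = dim_0_label.decode()

    return {
        "epes": epes,
        "dim_0_label": dim_0_label,
        "total_num_ml_testing_data_instances": total,
        "mean_epe": float(np.mean(epes)),
        "max_epe": float(np.max(epes)),
    }
```

The element at index m of the returned "epes" array corresponds to the mth ML testing data instance, as described above.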
In executing the action described at the beginning of the current page, multiple
scripts are executed. The particular scripts that are executed depend on the
command line arguments of the parent Python script introduced at the beginning
of this page. If <use_slurm> equals yes, then the following scripts are
executed in the order that they appear directly below:
<root>/examples/modelling/cbed/distortion/estimation/scripts/execute_action.py
<root>/examples/modelling/cbed/distortion/estimation/scripts/run_rgm_test_set_1/execute_all_action_steps.py
<root>/examples/modelling/cbed/distortion/estimation/scripts/run_rgm_test_from_set_1/execute_all_action_steps.py
<root>/examples/modelling/cbed/distortion/estimation/scripts/run_rgm_test_from_set_1/prepare_and_submit_slurm_job.sh
<root>/examples/modelling/cbed/distortion/estimation/scripts/run_rgm_test_from_set_1/execute_main_action_steps.py
Otherwise, if <use_slurm> equals no, then the fourth script, i.e. the
one with the basename prepare_and_submit_slurm_job.sh, is not executed. See
the contents of the scripts listed above for implementation details. Lastly, if
the action is being performed as multiple SLURM jobs, then the default sbatch
options, which are specified in the file with the basename
prepare_and_submit_slurm_job.sh, can be overridden by following the
instructions in this page.
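The dependence of the script sequence on <use_slurm> can be summarized in a few lines of Python. This is a sketch only: the function name scripts_executed is hypothetical, and the script paths are given relative to <root>/examples/modelling/cbed/distortion/estimation/scripts, exactly as listed above.

```python
# Script paths relative to the scripts directory, in execution order.
ALL_SCRIPTS = (
    "execute_action.py",
    "run_rgm_test_set_1/execute_all_action_steps.py",
    "run_rgm_test_from_set_1/execute_all_action_steps.py",
    "run_rgm_test_from_set_1/prepare_and_submit_slurm_job.sh",
    "run_rgm_test_from_set_1/execute_main_action_steps.py",
)


def scripts_executed(use_slurm):
    """Return the scripts executed for a given value of <use_slurm>."""
    if use_slurm not in ("yes", "no"):
        raise ValueError("use_slurm must be 'yes' or 'no'")
    if use_slurm == "yes":
        return list(ALL_SCRIPTS)
    # Without SLURM, the job submission script is skipped.
    return [
        script for script in ALL_SCRIPTS
        if not script.endswith("prepare_and_submit_slurm_job.sh")
    ]
```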