3.1.2.7. Running the RGM test set #1

In this example, we perform the “action” of running the “first” set of tests of the radial gradient maximization (RGM) approach to the distortion estimation of CBED patterns. The RGM tests use the machine learning (ML) datasets generated by the action described in this page.

NOTE: Users are advised to read the remainder of the current page in its entirety before trying to execute this action.

To execute the action, first we need to change into the directory <root>/examples/modelling/cbed/distortion/estimation/scripts, where <root> is the root of the emicroml repository. Then, we need to run the Python script ./execute_action.py via the terminal command:

python execute_action.py --action=<action> --use_slurm=<use_slurm>

where <action> must be equal to run_rgm_test_set_1, and <use_slurm> is either yes or no. If <use_slurm> equals yes and a SLURM workload manager is available on the server from which you intend to run the script, then the action will be performed as multiple SLURM jobs. If <use_slurm> is equal to no, then the action will be performed locally without using a SLURM workload manager.
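When the action is launched from another Python script, the command above can be assembled programmatically. The following is a minimal sketch: the helper name build_command is hypothetical and not part of emicroml, and it only checks the argument values described above.

```python
VALID_USE_SLURM_VALUES = ("yes", "no")


def build_command(action="run_rgm_test_set_1", use_slurm="no"):
    """Return the argument list for the terminal command shown above.

    Hypothetical helper for illustration only; not part of emicroml.
    """
    if use_slurm not in VALID_USE_SLURM_VALUES:
        raise ValueError("use_slurm must be either 'yes' or 'no'")
    return [
        "python",
        "execute_action.py",
        "--action={}".format(action),
        "--use_slurm={}".format(use_slurm),
    ]


# Print the command instead of running it, so the sketch has no side effects.
command = build_command(use_slurm="no")
print(" ".join(command))
```

The resulting list can be passed to subprocess.run, with the cwd argument set to the scripts directory mentioned above.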

If the action is to be performed locally without using a SLURM workload manager, then prior to executing the above Python script, a set of Python libraries needs to be installed in the Python environment within which said Python script is to be executed. See this page for instructions on how to do so. If the action is being performed as multiple SLURM jobs, then prior to executing any Python commands that do not belong to Python’s standard library, a customizable sequence of commands is executed. This sequence is expected to try to either activate an existing Python virtual environment, or create then activate one, in which the Python libraries needed to complete the action successfully are installed. See this page for instructions on how to customize the sequence of commands.

The action described at the beginning of the current page automatically takes as input the output data generated by the action described in the page Combining machine learning datasets for the machine learning model test set #1. Hence, one must execute the latter action before the former.

At this point, it is worth noting that every ML data instance stored in every valid ML dataset encodes data about a “fake” CBED pattern. See the documentation for the class fakecbed.discretized.CBEDPattern for a full discussion on fake CBED patterns and the context relevant to the discussion below. In constructing a fake CBED pattern, one needs to specify a set of circular, undistorted CBED disk supports, defined in \(\left(u_{x},u_{y}\right)\)-space, sharing a common disk radius, with their centers being specified in \(\left(u_{x},u_{y}\right)\) coordinates. One also needs to specify a distortion field to construct a fake CBED pattern. The supports of the distorted CBED disks that appear in the fake CBED pattern are obtained by distorting the aforementioned set of undistorted disk supports according to the aforementioned distortion field. Let \(\left\{ \left(u_{x;c;\text{C};i},u_{y;c;\text{C};i}\right)\right\}_{i}\) be the \(\left(u_{x},u_{y}\right)\) coordinates of the undistorted disk support centers, and \(\left\{ \left(q_{x;c;\text{C};i},q_{y;c;\text{C};i}\right)\right\}_{i}\) be the corresponding coordinates in \(\left(q_{x},q_{y}\right)\)-space according to the distortion field. The RGM approach to estimating the distortion of the fake CBED pattern can be described as follows:

1. Use the RGM technique, described in Ref. [Mahr1], to estimate the subset of \(\left\{ \left(q_{x;c;\text{C};i},q_{y;c;\text{C};i}\right)\right\}_{i}\) corresponding to the distorted CBED disks in the fake CBED pattern that are not clipped.

2. Determine iteratively, via non-linear least squares, the distortion field that minimizes the mean-square error with respect to the coordinates estimated in step 1.
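Step 2 can be illustrated with a small, self-contained sketch. Everything in it is an assumption made for illustration: the distortion model is a toy quadratic radial distortion about an unknown center (not the distortion model used by emicroml), the "estimated" centers are synthetic rather than produced by the RGM technique, and the solver is a plain Gauss-Newton iteration with a finite-difference Jacobian, which is one of several ways to carry out an iterative non-linear least-squares fit.

```python
import numpy as np

FINITE_DIFFERENCE_STEP = 1e-6


def distort(params, u):
    """Toy model: quadratic radial distortion about a center (x_c, y_c)."""
    amplitude, x_c, y_c = params
    center = np.array([x_c, y_c])
    delta = u - center
    r_sq = np.sum(delta**2, axis=1, keepdims=True)
    return center + delta * (1.0 + amplitude * r_sq)


def residuals(params, u, q_estimated):
    """Differences between modelled and estimated distorted disk centers."""
    return (distort(params, u) - q_estimated).ravel()


def fit_distortion(u, q_estimated, initial_params, max_num_iterations=50):
    """Gauss-Newton iterations using a forward-difference Jacobian."""
    params = np.asarray(initial_params, dtype=float)
    for _ in range(max_num_iterations):
        r = residuals(params, u, q_estimated)
        jacobian = np.empty((r.size, params.size))
        for k in range(params.size):
            step = np.zeros_like(params)
            step[k] = FINITE_DIFFERENCE_STEP
            shifted = residuals(params + step, u, q_estimated)
            jacobian[:, k] = (shifted - r) / FINITE_DIFFERENCE_STEP
        # Solve the linearized least-squares problem for the parameter update.
        update, *_ = np.linalg.lstsq(jacobian, -r, rcond=None)
        params = params + update
        if np.linalg.norm(update) < 1e-12:
            break
    return params


# Synthetic undistorted disk centers on a 5x5 grid in the unit square.
grid = np.linspace(0.1, 0.9, 5)
u_centers = np.array([(x, y) for x in grid for y in grid])

# Synthetic stand-in for the centers that step 1 would estimate.
true_params = np.array([0.3, 0.52, 0.47])
q_centers = distort(true_params, u_centers)

fitted_params = fit_distortion(u_centers, q_centers, initial_params=(0.2, 0.5, 0.5))
mse = np.mean(residuals(fitted_params, u_centers, q_centers) ** 2)
print(fitted_params, mse)
```

Because the synthetic centers are noise-free, the fit should recover the parameters used to generate them. With real RGM estimates the minimized mean-square error would generally be nonzero.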

Upon successful completion of the action described briefly at the beginning of the current page, for every string <disk_size> in the sequence (small, medium, large), the RGM approach will have been tested against the ML testing dataset stored in the HDF5 file at the file path <top_level_data_dir>/ml_datasets/ml_datasets_for_ml_model_test_set_1/ml_datasets_with_cbed_patterns_of_MoS2_on_amorphous_C/ml_dataset_with_<disk_size>_sized_disks.h5. The output of each test will have been saved in an HDF5 file generated at the file path <top_level_data_dir>/rgm_test_set_1_results/results_for_cbed_patterns_of_MoS2_on_amorphous_C_with_<disk_size>_sized_disks/rgm_testing_summary_output_data.h5. Here, <top_level_data_dir> is <root>/examples/modelling/cbed/distortion/estimation/data. Each output HDF5 file is guaranteed to contain the following HDF5 objects:

path_to_ml_testing_dataset: <HDF5 1D dataset>
total_num_ml_testing_data_instances: <HDF5 0D dataset>

ml_data_instance_metrics: <HDF5 group>
- testing: <HDF5 group>
  - epes_of_adjusted_distortion_fields: <HDF5 1D dataset>
    - dim_0: “ml testing data instance idx”

Note that the sub-bullet points listed immediately below a given HDF5 dataset display the HDF5 attributes associated with said HDF5 dataset. Some HDF5 datasets have attributes with names of the form "dim_{}".format(i) with i being an integer. Attribute "dim_{}".format(i) of a given HDF5 dataset labels the \(i^{\text{th}}\) dimension of the underlying array of the dataset.
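As a minimal sketch of how the layout above might be read back, the following uses the h5py library. The function name load_rgm_test_summary is hypothetical and not part of emicroml, and the sketch assumes only the HDF5 object names and the "dim_0" attribute listed above.

```python
import h5py


def load_rgm_test_summary(path_to_summary_file):
    """Read the per-instance EPEs and related metadata from a summary file.

    Hypothetical helper for illustration only; not part of emicroml.
    """
    hdf5_dataset_path = (
        "/ml_data_instance_metrics/testing/epes_of_adjusted_distortion_fields"
    )
    with h5py.File(path_to_summary_file, "r") as hdf5_file:
        hdf5_dataset = hdf5_file[hdf5_dataset_path]
        epes = hdf5_dataset[()]  # Load the full 1D array into memory.
        dim_0_label = hdf5_dataset.attrs["dim_0"]  # Label of dimension 0.
        total = int(hdf5_file["/total_num_ml_testing_data_instances"][()])
    return epes, dim_0_label, total
```

The function would be called with the path to one of the rgm_testing_summary_output_data.h5 files described above.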

The HDF5 dataset at the HDF5 path "/path_to_ml_testing_dataset" stores the path, as a string, to the ML testing dataset used for the test.

The HDF5 dataset at the HDF5 path "/ml_data_instance_metrics/testing/epes_of_adjusted_distortion_fields" stores the end-point errors (EPEs) of the “adjusted” standard distortion fields specified by the predicted standard coordinate transformation parameter sets, during testing. For every nonnegative integer \(m\) less than the total number of ML testing data instances, the \(m^{\text{th}}\) element of the aforementioned HDF5 dataset is the EPE of the adjusted standard distortion field specified by the \(m^{\text{th}}\) predicted standard coordinate transformation parameter set, during testing. See the summary documentation of the class emicroml.modelling.cbed.distortion.estimation.MLModelTrainer for a definition of an adjusted standard distortion field, and how the EPE is calculated exactly.
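For orientation, the sketch below computes an end-point error in its generic sense: the mean Euclidean distance between corresponding vectors of two 2D vector fields. This is an assumption made for illustration. The exact calculation used by emicroml, including the “adjustment” of the distortion field, is defined in the class documentation cited above and is not reproduced here. The function name and the array layout (2, H, W) are likewise hypothetical.

```python
import numpy as np


def end_point_error(predicted_field, target_field):
    """Mean Euclidean distance between two vector fields of shape (2, H, W)."""
    difference = np.asarray(predicted_field) - np.asarray(target_field)
    return float(np.mean(np.sqrt(np.sum(difference**2, axis=0))))


# Every predicted vector is offset from its target by (3, 4).
target = np.zeros((2, 4, 4))
predicted = np.zeros((2, 4, 4))
predicted[0] = 3.0
predicted[1] = 4.0

epe = end_point_error(predicted, target)
print(epe)
```

Since each vector differs from its target by (3, 4), every per-pixel distance is 5, and so is their mean.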

In executing the action described at the beginning of the current page, multiple scripts are executed. The particular scripts that are executed depend on the command line arguments of the parent Python script introduced at the beginning of this page. If <use_slurm> equals yes, then the following scripts are executed in the order that they appear directly below:

1. <root>/examples/modelling/cbed/distortion/estimation/scripts/execute_action.py

2. <root>/examples/modelling/cbed/distortion/estimation/scripts/run_rgm_test_set_1/execute_all_action_steps.py

3. <root>/examples/modelling/cbed/distortion/estimation/scripts/run_rgm_test_from_set_1/execute_all_action_steps.py

4. <root>/examples/modelling/cbed/distortion/estimation/scripts/run_rgm_test_from_set_1/prepare_and_submit_slurm_job.sh

5. <root>/examples/modelling/cbed/distortion/estimation/scripts/run_rgm_test_from_set_1/execute_main_action_steps.py

Otherwise, if <use_slurm> equals no, then the fourth script, i.e. the one with the basename prepare_and_submit_slurm_job.sh, is not executed. See the contents of the scripts listed above for implementation details. Lastly, if the action is being performed as multiple SLURM jobs, then the default sbatch options, which are specified in the file with the basename prepare_and_submit_slurm_job.sh, can be overridden by following the instructions in this page.