Loading the MATLAB module
The current default MATLAB version on the HEC is 2013a. To use MATLAB, the MATLAB compiler, or standalone executables created with the MATLAB compiler, you must first load the matlab module via the following command:
module add matlab
In normal operation, MATLAB instances require a permanent connection to the University's MATLAB licence server. Long-running jobs may be at risk from occasional network outages, so it is recommended to compile MATLAB workloads into standalones, as these do not depend on the licence server and are not subject to licence counts.
For a full description of how to use the MATLAB compiler, please refer to MathWorks' MATLAB Compiler User's Guide. The rest of this section gives a brief guide to creating and running standalones on the HEC.
Step 1: Code Preparation
Script M-files cannot be compiled directly. Instead, they must be converted into function M-files. Normally, this is simply a case of wrapping the main section of code within a function. See the Converting Script M-Files section of the MATLAB Compiler Getting Started guide for more details.
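As a rough illustration of the wrapping step, the sketch below writes a small function M-file from the shell. The file name example_main.m and the two body lines are placeholders chosen for this example, not part of any real workload; the function is given the same name as the file.

```shell
# Sketch: create a function M-file that wraps the body of a former script.
# example_main.m and its contents are hypothetical placeholders.
cat > example_main.m <<'EOF'
function example_main()
% The statements from the original script go here, unchanged.
data = rand(10, 1);
disp(mean(data));
EOF

# Show the first line to confirm the file now starts with a function declaration.
head -n 1 example_main.m
```

A script that relied on variables already present in the workspace would need those values passed in as function arguments instead.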
Step 2: Compiling
The MATLAB compiler tool is mcc. It can be invoked from within MATLAB itself, or on the command line:
mcc -R -singleCompThread -m myscript.m
If you have more than one M-file, list them all at the end of the mcc command, ensuring that the 'main' M-file is the first file listed.
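The ordering rule can be sketched with a small dry-run helper that only prints the mcc command line rather than running it. The helper name mcc_command and the file names main.m, helper1.m and helper2.m are made up for this example.

```shell
# Sketch: build (but do not run) an mcc command for several M-files.
# mcc_command is a hypothetical helper; the file names are placeholders.
mcc_command() {
    # The first argument must be the 'main' M-file; the rest follow in any order.
    echo mcc -R -singleCompThread -m "$@"
}

mcc_command main.m helper1.m helper2.m
```

Once the printed command looks right, it can be typed at the prompt after loading the matlab module.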
Compilation produces a number of files. The standalone itself is named after the first M-file used in compilation (in the example above, myscript). A standalone wrapper script is also created, named after the M-file with the prefix run_ (in the example above, run_myscript.sh). This file can be ignored when using the HEC; its purpose is to enable the standalone to be run in an environment where MATLAB isn't installed.
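Before submitting a job it can be useful to confirm that compilation produced the standalone. The following is a minimal sketch of such a check; check_standalone is a hypothetical helper written for this example and is not provided by MATLAB or the HEC.

```shell
# Sketch: check that the expected build products exist in the current directory.
# check_standalone is a hypothetical helper, not a MATLAB or HEC command.
check_standalone() {
    name=$1
    if [ -x "./$name" ]; then
        echo "found standalone: ./$name"
    else
        echo "missing standalone: ./$name" >&2
        return 1
    fi
    # The wrapper is not needed on the HEC, so its absence is not treated as an error.
    if [ -f "./run_$name.sh" ]; then
        echo "found wrapper (not needed on the HEC): ./run_$name.sh"
    fi
}
```

For the example above this would be called as check_standalone myscript from the directory where mcc was run.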
Step 3: Running the application
MATLAB standalone applications still require the matlab module to be loaded, including from within a batch job script. Once the module is loaded, the standalone can be run just like any other executable.
Most of the time, though, you'll want to run standalones from batch job scripts.
The MATLAB compiler generates a fair amount of output, including some error and warning messages. The example below walks through compiling and running a simple MATLAB standalone, so you know what to expect. The program takes a single number from the command line, doubles it and displays the result.
The program is a simple two-line MATLAB M-File:
> cat twotimes.m
function twotimes (x)
2 * str2num(x)
Now to compile it:
mcc -R -singleCompThread -m twotimes.m
To run it as a standalone from the command line (this is only recommended for testing):
> ./twotimes 3.1
ans =
    6.2000
And finally, a sample job script to run twotimes:
#$ -S /bin/bash
#$ -N matlab_job
#$ -l h_vmem=2.5G

source /etc/profile
module add matlab/2013a

export MCR_CACHE_ROOT="$TMPDIR/mcrCache"
mkdir -p $MCR_CACHE_ROOT

./twotimes 3.1
There are two additional lines here compared to normal job scripts:
export MCR_CACHE_ROOT="$TMPDIR/mcrCache"
mkdir -p $MCR_CACHE_ROOT
These lines direct the standalone to use a different directory to unpack itself into, unique for each job run. This prevents concurrency issues that can occur when multiple standalone jobs launch simultaneously. The directory is created within a unique temporary directory created by the job scheduler, and will be cleaned up automatically when the job finishes.
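When testing a standalone outside a batch job, TMPDIR may not be set by the scheduler. The sketch below shows one possible variation for that case; the fallback to /tmp and the use of the shell's process ID to make the directory name unique are assumptions made for this example, not HEC requirements.

```shell
# Sketch: choose a cache directory that also works outside a batch job.
# Assumption: fall back to /tmp when the scheduler has not set TMPDIR,
# and append the shell's process ID ($$) so concurrent runs do not collide.
cache_base="${TMPDIR:-/tmp}"
export MCR_CACHE_ROOT="$cache_base/mcrCache.$$"
mkdir -p "$MCR_CACHE_ROOT"
echo "MCR cache directory: $MCR_CACHE_ROOT"
```

Outside a job nothing removes this directory automatically, so it would need to be deleted by hand after testing.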
It is strongly recommended to run a standalone using the exact same version of MATLAB that was used to create it. Best practice is to load the module by its full name, including the version (for example matlab/2013a), rather than just the default matlab module, as the version chosen by default is likely to change as newer versions are released.
Modern versions of MATLAB come with multi-threaded support, which allows them to make use of multiple cores on a single machine. Any function MATLAB identifies as being able to benefit from parallelisation will be parallelised. While this is a very handy feature for desktop use, in a multi-user environment it can cause problems, as MATLAB neither informs the user if their application will run multi-threaded nor indicates how many threads (cores) it is best to use.
For this reason, we recommend compiling with the -singleCompThread argument by default, as above, which limits MATLAB to a single thread for computational tasks. This prevents MATLAB jobs from inadvertently running multi-threaded on compute nodes shared with other users' jobs and taking more than their fair share of CPU resource.
Advanced users may wish to test the benefits of multi-threaded versions of their jobs. The following is an example job script for a MATLAB standalone, which assumes that the standalone benefits from 16-core multi-threaded use and has been compiled without the -singleCompThread option:
#$ -S /bin/bash
#$ -N matlab_job
#$ -q parallel
#$ -l nodes=1

source /etc/profile
module add matlab

export MCR_CACHE_ROOT="$TMPDIR/mcrCache"
mkdir -p $MCR_CACHE_ROOT

cd matlab_files
./twotimes 3.1
Always test multi-threaded runs against a single-threaded version to ensure that a particular type of job can make good use of multiple cores. Using 16 cores to obtain only a 2x speedup would be very inefficient, and reduces the amount of CPU resource available for other HEC users.
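One way to put a number on this comparison is to compute the speedup and the parallel efficiency (speedup divided by cores used) from two measured run times. The sketch below does this with awk; parallel_efficiency is a hypothetical helper, and the timings passed to it are illustrative values rather than measurements from the HEC.

```shell
# Sketch: speedup and parallel efficiency from two wall-clock times in seconds.
# parallel_efficiency is a hypothetical helper; supply your own measured timings.
parallel_efficiency() {
    # $1 = single-threaded time, $2 = multi-threaded time, $3 = cores used
    awk -v t1="$1" -v tn="$2" -v n="$3" 'BEGIN {
        speedup = t1 / tn
        printf "speedup %.2fx, efficiency %.1f%%\n", speedup, 100 * speedup / n
    }'
}

# Illustrative timings: 3600 s single-threaded, 1800 s on 16 cores.
parallel_efficiency 3600 1800 16
```

With these illustrative timings the helper prints "speedup 2.00x, efficiency 12.5%", which corresponds to the 2x speedup on 16 cores described above.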
MATLAB users whose scripts call system() or unix() should take care when submitting batch jobs. If the unix command to be called does not already use standard input redirection, the command should redirect standard input from the special device /dev/null. For example, a simple command such as:
unix ('gzip myfile')
should instead be written as:
unix ('gzip myfile < /dev/null')
Many unix commands, when they encounter unusual circumstances, will prompt the user for input, but only if they believe they are being used in an interactive context. When not used in an interactive context, most of these tools will take a default safe option without prompting, and then exit. A bug in MATLAB fools these commands into believing they are being used in an interactive context. In an HEC batch context, however, no user input is possible, and so a call which requires user input will cause the job to hang indefinitely.
In the above gzip example, if a file matching the gzipped version's name already exists, gzip will ask an interactive user whether they wish to overwrite it. The default action for a non-interactive session is for gzip to choose not to overwrite the file and exit.
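The non-interactive behaviour can be tried directly from the shell, outside MATLAB. The sketch below sets up the clash in a scratch directory and runs gzip with standard input redirected from /dev/null; the directory and file contents are illustrative only.

```shell
# Sketch: gzip with stdin redirected from /dev/null does not wait for an answer.
# The scratch directory and file contents are illustrative only.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
echo "new data" > myfile
echo "old archive" > myfile.gz   # stands in for an existing compressed file

# With no terminal on standard input, gzip declines to overwrite and returns.
gzip myfile < /dev/null || echo "gzip left myfile.gz untouched"

# Both files should still be present.
ls
```

The same redirection inside a unix() or system() call gives the command the non-interactive context it would normally detect for itself.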