Cluster

Overview

 

Hardware

The Reslab cluster consists of 37 machines, named clsnn001-clsnn030 and digilab01-digilab07. They are running Ubuntu 10.04 on the following hardware:

clsnn*:

  • Intel Core 2 quad-core @ 2.83 GHz
  • 8 GiB of memory
  • 80 GB hard drive

 

digilab*:

  • Intel Core 2 quad-core @ 2.83 GHz
  • 4 GiB of memory
  • 160 GB hard drive

 

Software

 

Anaconda

 

Anaconda is a self-contained Python installation with many scientific packages included, in particular a NumPy build compiled against the multi-threaded Intel MKL numerical libraries. It can be found under /mnt/storage/software/anaconda. The executable is /mnt/storage/software/anaconda/bin/python. The commands apython and aipython (for ipython) are available as shortcuts on all cluster nodes. Alternatively, add /mnt/storage/software/anaconda/bin/ to your $PATH variable in your ~/.bash_profile. If you are running Python code on the cluster, using Anaconda is strongly recommended: it is considerably faster for numerical operations and it ensures that all available cores are used.
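For example, the following line in your ~/.bash_profile puts the Anaconda executables first on your $PATH:

export PATH=/mnt/storage/software/anaconda/bin:$PATH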

If you were previously using EPD (see below), switching to Anaconda is recommended because it features newer versions of almost all packages. In particular, the versions of NumPy and scikit-learn are much more recent.

Enthought Python distribution (EPD)

Like Anaconda, EPD is a self-contained Python installation with many scientific computing packages included; however, it is no longer being updated. It can be found under /opt/epd-7.3-2. The executable is /opt/epd-7.3-2/bin/python. The commands epython and eipython (for ipython) are available as shortcuts on all cluster nodes. Alternatively, add /opt/epd-7.3-2/bin/ to your $PATH variable in your ~/.bash_profile.

 

Shared file storage

Two drives are available for shared file storage: /mnt/snn_gluster and /mnt/storage. They are accessible from all cluster nodes. Note that these drives are not backed up; you are personally responsible for keeping backup copies of your files. If you wish to mount these locally, you are strongly advised to do so at the same mount point. This will make your scripts more portable and avoid problems when submitting jobs from your local machine.

 

Glusterfs (/mnt/snn_gluster)

A shared filesystem is available to all nodes under /mnt/snn_gluster, offering close to 1 TiB of disk space distributed over the clsnn* nodes with 2x replication. The filesystem can be mounted on your local machine using either the glusterfs client or NFS. With glusterfs, it can be mounted as follows (provided that glusterfs is installed):

sudo mount -t glusterfs thepiratebay:/snn_gluster /mnt/snn_gluster

Alternatively, you can mount the filesystem through NFS with

sudo mount -t nfs -o nfsvers=3 thepiratebay:/snn_gluster /mnt/snn_gluster

Every cluster user has a directory under /mnt/snn_gluster/users/loginname.

 

Users with Mac laptops can use Disk Utility to automatically mount the gluster filesystem when connecting to the ELIS network. Go to File -> NFS Mounts and add a new mount with the following settings:

- Remote NFS URL: nfs://thepiratebay/snn_gluster
- Mount location: /Network/snn_gluster
- Advanced mount parameters : noasync noowners nosuid sync noatime resvport noac locallocks nolocks no_root_squash

Choose Verify, then click Cancel Verify, and Save. Then, create a symbolic link:

sudo mkdir /mnt
sudo ln -s /Network/snn_gluster /mnt/snn_gluster

 

NFS storage (/mnt/storage)

A shared 3 TB drive is available to all nodes under /mnt/storage. It is hosted on the clsnn101 file server and can be mounted as follows:

sudo mount -t nfs4 clsnn101:/storage /mnt/storage

Or on Mac OS X:

sudo mount -o vers=4.0alpha clsnn101:/storage /Network/storage

 

Usage

The cluster is managed using Torque, which handles job scheduling. thepiratebay is the machine on which the scheduler runs.

You can install the Torque client on your local machine, or alternatively log in to thepiratebay to submit and manage your jobs. Binary packages are available for Ubuntu; for OS X 10.6, download the tarball here, extract it, and issue:

./configure --disable-gcc-warnings
make
sudo make install

After installing, change the contents of the server_name file (in /var/lib/torque/ for Ubuntu, /var/spool/torque for OS X) to thepiratebay.elis.ugent.be. Check if you can see the compute nodes using 

qnodes -a
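On Ubuntu, for example, the server_name change described above can be made with a one-liner like this (a sketch, assuming the default path):

echo thepiratebay.elis.ugent.be | sudo tee /var/lib/torque/server_name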
 
Submitting a job

If that works, the next step is to submit a simple job. Jobs consist of bash scripts, in which you can call whichever executable you need (python, matlab, ...). Standard output and standard error for each job are collected and copied to the directory from which you submit the job. Since the jobs do not have access to your Kerberos tokens, and by extension to the AFS filesystem, all file transfer happens over the gluster filesystem (see above). You should therefore only submit jobs from your user directory on the gluster filesystem; otherwise you will not be able to view the stdout and stderr streams.

Here is a simple test job: it prints the time to standard output, sleeps for 10 seconds and prints the time again (a minimal sketch is shown after the submit command below). If you save this script as test_job in /mnt/snn_gluster/users/your_login, you can submit a single job with the qsub command, using

qsub test_job
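For reference, a test_job script matching the description above could be as simple as the following sketch (the attached script may differ slightly):

#!/bin/bash
date        # print the current time
sleep 10    # wait ten seconds
date        # print the time again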

When successful, this command prints a job name to the screen. Around ten seconds later, two files of the form jobname.o* and jobname.e* should appear in your directory, containing the standard output and standard error streams of your job.

The next step is to submit multiple jobs at once, using

qsub -t 0-9 test_job

You can query the status of your jobs in the queue using 

qstat

Jobs are started with /home/loginname as the working directory, where they have no tokens and therefore cannot read or write files. If your script does file I/O of its own, it is a good idea to specify the working directory of the job by adding a line to the job submit script similar to:

#PBS -d /mnt/snn_gluster/users/loginname

or to always use absolute paths.

 
Jobs with high memory requirements

If your job requires a machine with more than 4 GiB of RAM, and therefore should not run on the digilab* nodes, you can add the nodes=1:bigmem property when submitting, for example:

qsub -t 0-9 -l nodes=1:bigmem test_job

Alternatively, you can add the line

#PBS -l nodes=1:bigmem

to your jobscript.

 
Long jobs

Every job has a default 'walltime' of 72 hours. The walltime is the predicted running time of the job (in real hours, not CPU hours). If the job exceeds this walltime, it will be killed. This mechanism allows the resource manager to free up resources allocated to jobs that are hanging or have been killed without notifying the resource manager. If you do have a job that runs longer, you can assign a longer walltime, e.g. 100 hours:

qsub -l walltime=100:0:0 jobname
 
Running Python scripts

Within a job script you can call the Python executable with the Python script as a command line parameter. The environment variable $PBS_ARRAYID is available in the bash script; you can use it, for instance, to index a list of parameter values. See the documentation here for more options you can give to the qsub command.

As an example, here is a Python script which simply prints its command line parameters, and an accompanying job script. Save these in your user directory on the gluster filesystem and submit them with

qsub -t 0-9 test_python

and check the output with 

cat test_python.o*

Note that the full paths to both the Python executable and the Python script should be given in the job script.
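For illustration, the test_python job script might look roughly like the sketch below; print_args.py is a hypothetical name for the argument-printing Python script and loginname stands for your own login (the attached example scripts may differ):

#!/bin/bash
# Call the Anaconda Python with the full path to the script, passing the
# array index of this job as a command line parameter.
/mnt/storage/software/anaconda/bin/python /mnt/snn_gluster/users/loginname/print_args.py $PBS_ARRAYID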

 
Submitting jobs from outside the ELIS network

Unfortunately, it is not possible to mount the gluster filesystem or to submit and monitor jobs from outside the ELIS network (not even with VPN or SSH tunnels; too many ports are involved in the communication). However, all compute nodes (clsnn*) can also be used to submit jobs, so you can ssh to ssh.elis.ugent.be and from there to one of the clsnn* nodes.
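For example, with loginname your ELIS login and clsnn001 standing in for any compute node:

ssh loginname@ssh.elis.ugent.be
ssh clsnn001

From there, qsub and qstat can be used as described above.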

 

Submitting jobs with Python (and using CMA-ES)

It is possible to use the cluster from within Python without writing job scripts and manually launching qsub. Please check out the mercurial repository at /mnt/snn_gluster/pbs_python and refer to the README for more information and sample code.

 

Some reasons jobs may fail
  • If you import matplotlib, you might get the error: RuntimeError: '/home/login' is not a writable dir; you must set /home/login/.matplotlib to be a writable dir. This happens because matplotlib wants to write configuration settings to your home directory, which is not accessible due to the lack of tokens. The solution is to add the line:
export MPLCONFIGDIR=/tmp

at the beginning of your job script.

  • If your scripts give 'Permission denied' errors when writing to a directory on the gluster filesystem: the scripts run under your ELIS username (and, more importantly, your ELIS UID). If you mounted the gluster filesystem locally and created the directory there, the error is probably due to a mismatch between your local UID and your ELIS UID, which means the scripts cannot write to the directory. The solution is to remove the directory and re-create it on one of the cluster machines.

 

Monitoring

Use qstat to view the current job queue. Use qstat -n to also list the machines on which the currently running jobs are being executed. qnodes lists the available machines. qnodes -l lists the machines that are currently down or offline. The digilab* machines are often offline because they are regularly in use for courses. Various system parameters such as load and memory usage are monitored by Munin. You can find an overview of the cluster usage at munin.elis.ugent.be.

 

HPC UGent grid

Another grid available to researchers is Ghent University's own High Performance Computing (HPC) grid. Its homepage, with background, infrastructure, and support details, can be found here. The presentation slides found here also contain a lot of essential material.

More importantly for new users, there is also a wiki containing a lot of important information, including how to request a new account, how to access the clusters, and how to run job scripts. It can be found here; it is strongly recommended that you read it and obtain a VSC account before continuing with the demo below.

 

Running an array job on the HPC

The following demo shows how to run a simple job array script on the UGent HPC, store data, and look at the standard output and standard error.

For reference, when you sign up for a new account you will be assigned a username, vscxxxxx, where xxxxx is your unique ID number. You are also assigned disk space in three different locations, which can be accessed from any machine on the grid:

  • VSC_HOME: 3 GB, home directory
  • VSC_DATA: 25 GB, long-term storage
  • VSC_SCRATCH: 25 GB, input/output and temporary files
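These locations are exposed as environment variables on the HPC nodes, so you can check where they point with, for example:

$ echo $VSC_HOME $VSC_DATA $VSC_SCRATCH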

 

Running the demo

This demo consists of a small Python package which creates random input, runs it through five different reservoirs, and stores the reservoir states to disk. Each reservoir is created using a unique random seed.

The package contains:

  • pre_reservoir_grid.py - creates the input and five seeds for the five different reservoirs and writes them to disk.
  • run_reservoir_grid.sh - bash script which loads the Enthought python module and runs reservoir_grid.py (a sketch is shown after this list).
  • reservoir_grid.py - takes the written experiment data, finds the seed unique to the job, creates a reservoir, runs the input through it, and saves the reservoir states to disk.
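For orientation, the job script in the second bullet might look roughly like the following sketch; the packaged run_reservoir_grid.sh may differ, and passing the array index on the command line is an assumption:

#!/bin/bash
# Move to the directory the job was submitted from (assumption).
cd $PBS_O_WORKDIR
# Load the Enthought Python distribution module.
module load EPD/7.2-1-rh5
# Run the reservoir script; the array index lets it pick its unique seed.
python reservoir_grid.py $PBS_ARRAYID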

To run the demo, first download reservoir_hpc.zip (found at the bottom of this page) and unzip it.

Next, copy the reservoir_hpc directory over to the HPC grid using scp.

$  scp -r reservoir_hpc vscXXXXX@gengar.ugent.be:./

This copies the code to your $VSC_HOME directory on the grid. Here XXXXX is to be replaced with your own user number. 

Then ssh in to the HPC login nodes. 

$ ssh vscXXXXX@gengar.ugent.be

Create directories to hold the job errors, output, and calculated reservoir states.

$ mkdir $VSC_SCRATCH/error  $VSC_SCRATCH/output  $VSC_SCRATCH/states

Change to the reservoir directory and load the Enthought distribution locally (this is needed because pre_reservoir_grid.py uses numpy).

$ cd reservoir_hpc
$ module load EPD/7.2-1-rh5

Run pre_reservoir_grid.py:

$ python pre_reservoir_grid.py

This creates a file $VSC_SCRATCH/reservoir_data which contains the input and parameters for the different jobs.

Then run the qsub command which will run the five jobs across the grid.

$ qsub -N reservoir_hpc -o $VSC_SCRATCH/output/output -e $VSC_SCRATCH/error/error \
    -m abe -l nodes=1:ppn=1,walltime=0:05:00 -t 0-4 run_reservoir_grid.sh

 

Looking at this command in detail:

-N reservoir_hpc, specifies the name of the array job

-o $VSC_SCRATCH/output/output, stdout of the job

-e $VSC_SCRATCH/error/error, stderr of the job

-m abe mailing options, in this case, mail the user when the job aborts (a), begins (b), and ends (e).

-l nodes=1:ppn=1,walltime=0:05:00 specifies to run this job on one core of one machine for a maximum of 5 minutes.

-t 0-4 specifies to run the script 5 times with job ids 0,1,2,3, and 4.

run_reservoir_grid.sh, the job script to run; it just loads the Enthought module and starts reservoir_grid.py.

 

Job results

If the jobs have run successfully, then the command

$ cat $VSC_SCRATCH/output/*

will output:

JOB_ID: 0 reservoir states written
JOB_ID: 1 reservoir states written
JOB_ID: 2 reservoir states written
JOB_ID: 3 reservoir states written
JOB_ID: 4 reservoir states written

and running

$ ls $VSC_SCRATCH/states/*

outputs

states_job_0  states_job_1  states_job_2  states_job_3  states_job_4

If there were any problems during execution you can check the error files with:

$ cat $VSC_SCRATCH/error/*