Salk



System Configuration

Hardware

Salk is an SGI Altix 4700 shared-memory NUMA system comprising 36 blades. Each blade holds 2 Itanium2 Montvale 9130M dual-core processors. The four cores on a blade share 8 Gbytes of local memory. The processors are connected by a NUMAlink interconnect. Through this interconnect the local memory on each processor is accessible to all the other processors. Each processor runs an enhanced version of the SuSE Linux operating system.

There are multiple frontend processors, which are also Itanium2 processors and which run the same version of SuSE Linux as the compute processors. You log in to one of these frontend processors, not to the compute processors.

Software

The Intel C, C++ and Fortran compilers and the Gnu C and C++ compilers are installed on salk, as are the facilities to enable you to run OpenMP, MPI and hybrid OpenMP and MPI programs.

Access

Connecting

To connect to salk you must ssh to salk.psc.edu. When you are prompted for a password enter your PSC Kerberos password.

Changing your password

Use the kpasswd command to change your PSC Kerberos password, not the passwd command. You have the same password on all PSC production platforms. If you change your password on one PSC system using kpasswd you change it on all other PSC systems.

You must change your salk password within 30 days of the date on your initial password form or your password will be disabled. We will also disable your password if you do not change it at least once a year. We will send you an email warning you that your password is about to be disabled in the latter case. See the PSC password policies for more information. If your password is disabled send email to remarks@psc.edu to have it reset.

Changing your login shell

You can use the chsh command to change your login shell. When doing so, specify a shell from the /usr/psc/shells directory.

Storing Files

File Systems

File systems are file storage spaces directly connected to a system. There are currently two such areas available to you on salk.

$HOME

This is your home directory. Your $HOME directory has a 5-Gbyte quota. $HOME is visible to all of salk's compute and frontend processors. $HOME is backed up daily, although it is still a good idea to store your important $HOME files to golem. Golem, PSC's file archival system, is discussed below.

$SCRATCH

This is salk's scratch area to be used as a working space for your running jobs. $SCRATCH is visible to all of salk's compute and frontend processors. You should use the name $SCRATCH to refer to your scratch area since we may change its implementation.

$SCRATCH is not a permanent storage space. Files can only remain on $SCRATCH for up to 7 days and then we will delete them. In addition, we will delete $SCRATCH files if we need to free up space to keep jobs running. Finally, $SCRATCH is not backed up. For these three reasons, you should store copies of your $SCRATCH files to your local site or to golem as soon as you can after you create them. Golem, PSC's file archival system, is discussed below.
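Because files are removed from $SCRATCH after 7 days, it can help to list the files that are approaching that limit so you can copy them elsewhere first. The following is a minimal sketch in Bourne shell. The function name list_aging_scratch_files and the default threshold of 5 days are hypothetical, and the sketch assumes that file age is judged by modification time, which this document does not state.

```shell
# List regular files under DIR (default: $SCRATCH) that were last modified
# more than DAYS days ago (default: 5).
# Assumption: the 7-day limit is judged by modification time.
# Note: find counts whole 24-hour periods, so "+5" matches files that are
# at least 6 days old.
list_aging_scratch_files() {
    dir=${1:-$SCRATCH}
    days=${2:-5}
    find "$dir" -type f -mtime +"$days" -print
}
```

For example, list_aging_scratch_files "$SCRATCH" 5 prints one path per line, and the list can then be stored to golem or copied to your local site.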

File Repositories

File repositories are file storage spaces which are not directly connected to a frontend or compute processor. You cannot, for example, open a file that resides in a file repository. You must use explicit file copy commands to move files to and from a repository. You currently have one file repository available to you on salk: golem, PSC's file archival system.

golem

Golem is a combination tape-and-disk archival system. Use the far program to transfer files between golem and salk. You should transfer files between golem and salk outside of your batch jobs. Otherwise your jobs will be holding compute processors while your files are being transferred. You can use scp or kftp to transfer files between golem and your remote machine. If you need to store a file to golem that is 2 Tbytes or larger, first send email to remarks@psc.edu so that special arrangements can be made to store your file.

Transferring Files

You can use either the scp or the kftp program to transfer files between your remote machine and salk and between your remote machine and golem. Which method will perform better varies based on location. Therefore you should try both approaches and see which performs better for you. If you want assistance in improving the performance of your file transfers send email to remarks@psc.edu.

Creating Programs

The Intel C, C++ and Fortran compilers and the Gnu C and C++ compilers are installed on salk and they can be used to create OpenMP, MPI, hybrid and serial programs. The commands you should use to create each of these types of programs are shown in the table below.

               OpenMP                    MPI                          Hybrid                                  Serial
Intel Fortran  ifort -openmp myopenmp.f  ifort mympi.f -lmpi          ifort -openmp myhybrid.f -lmpi          ifort myserial.f
Intel C        icc -openmp myopenmp.c    icc mympi.c -lmpi            icc -openmp myhybrid.c -lmpi            icc myserial.c
Intel C++      icpc -openmp myopenmp.cc  icpc mympi.cc -lmpi -lmpi++  icpc -openmp myhybrid.cc -lmpi -lmpi++  icpc myserial.cc
Gnu C          gcc -fopenmp myopenmp.c   gcc mympi.c -lmpi            gcc -fopenmp myhybrid.c -lmpi           gcc myserial.c
Gnu C++        g++ -fopenmp myopenmp.cc  g++ mympi.cc -lmpi -lmpi++   g++ -fopenmp myhybrid.cc -lmpi -lmpi++  g++ myserial.cc

Man pages are available for ifort, icc and icpc and for gcc and g++.

The UPC compiler is also installed on salk. Online instructions for its use are available.

Running Jobs

Queue structure

Torque, an open source version of the Portable Batch System (PBS), controls all access to salk's compute processors, for both batch and interactive jobs. Currently salk has two queues: the batch queue and the debug queue. Interactive jobs can run in the batch queue and the debug queue and the method for doing so is discussed below.

The maximum walltime for the batch queue is 24 hours and the maximum number of cores you can request is 132. The maximum walltime for the debug queue is 30 minutes and the maximum number of cores you can request is 8.
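The queue limits above can be expressed as a small check to run before you submit. This is only a sketch in Bourne shell: the function name pick_queue is hypothetical, and choosing the debug queue whenever a request fits within it is an assumption made for illustration, not a PSC policy.

```shell
# Print the queue ("debug" or "batch") that fits a request, using the limits
# quoted above: debug = 30 minutes and 8 cores, batch = 24 hours and 132 cores.
# Usage: pick_queue CORES MINUTES
pick_queue() {
    cores=$1
    minutes=$2
    if [ "$cores" -le 8 ] && [ "$minutes" -le 30 ]; then
        echo debug
    elif [ "$cores" -le 132 ] && [ "$minutes" -le 1440 ]; then
        echo batch
    else
        echo "request exceeds the limits of both queues" >&2
        return 1
    fi
}
```

The limits are hard-coded here, so the function would need updating if the queue structure changes.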

We plan to create several other queues to meet user needs. If you would like to make a suggestion about salk's queue structure send email to remarks@psc.edu.

Scheduling policies

The batch and debug queues are currently FIFO queues with mechanisms in place to prevent a single user from dominating either queue. We will modify the scheduling policies on salk to meet user needs. If you have suggestions or comments about the scheduling policies on salk or find that they do not meet your needs send email to remarks@psc.edu.

Sample batch jobs

To run a batch job on salk you submit a batch script to the PBS system. A PBS job script consists of PBS directives, comments and executable commands. The last line of your batch script must end with a newline.
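One way to confirm that a script ends with a newline before you submit it is sketched below in Bourne shell. The function name ends_with_newline is hypothetical.

```shell
# Succeed if FILE is non-empty and its last byte is a newline.
# Command substitution strips a trailing newline, so the captured last byte
# is an empty string exactly when that byte was a newline.
ends_with_newline() {
    [ -s "$1" ] && [ -z "$(tail -c 1 "$1")" ]
}
```

For example, ends_with_newline myscript.job || echo "add a final newline" warns you before the script reaches PBS.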

A sample job script to run an OpenMP program is

#!/bin/csh
#PBS -l nodes=1:ppn=4
#PBS -l walltime=5:00
#PBS -j oe
#PBS -q batch

set echo

ja

#move to my $SCRATCH directory
cd $SCRATCH

#copy executable to $SCRATCH
cp $HOME/myopenmp .

#run my executable
setenv OMP_NUM_THREADS 4
./myopenmp

ja -chlst

The first line in the script cannot be a PBS directive. Any PBS directive in the first line is ignored. Here, the first line identifies which shell should be used for your batch job.

The next four lines are PBS directives.

#PBS -l nodes=1:ppn=4

This directive specifies the number of cores to allocate for the job. For performance reasons the actual allocation of cores is done by blades, with each blade containing four cores. You must request cores in multiples of four. Jobs do not share blades.

In this directive the value of nodes is always '1'. The value of ppn is the number of cores requested. The number of cores must be a multiple of four, or the job will fail. Within your batch script the environment variable PBS_PPN is set to the number of cores you requested.

Each blade has 8 Gbytes of physical memory, so the memory available to your job is determined by the number of blades it runs on. For example, a job requesting 16 cores will run on 4 blades and thus have 32 Gbytes of memory available to it. If your job exceeds the amount of physical memory available to it, it will be killed by the system with a message similar to

    PBS: Job killed: cpuset memory_pressure X reached/exceeded limit Y

written to its stderr. In this message 'X' and 'Y' are integers.

If this happens to your job you should resubmit it and ask for more cores. The output from the ja command, which is discussed below, can help you determine how many blades your job needs.
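The arithmetic for turning a memory requirement into a core request can be sketched as follows in Bourne shell. The function name cores_for_memory is hypothetical, and the sketch assumes that all 8 Gbytes on each blade are usable by your job, which may be optimistic in practice.

```shell
BLADE_MEM_GB=8      # physical memory per blade, from the text above
CORES_PER_BLADE=4   # cores per blade, from the text above

# Print the smallest ppn value whose blades together hold MEM_GB Gbytes.
# Usage: cores_for_memory MEM_GB   (whole Gbytes)
cores_for_memory() {
    mem_gb=$1
    blades=$(( (mem_gb + BLADE_MEM_GB - 1) / BLADE_MEM_GB ))   # round up
    echo $(( blades * CORES_PER_BLADE ))
}

cores_for_memory 32   # prints 16
cores_for_memory 20   # prints 12
```

The result is always a multiple of four, so it can be used directly as the ppn value in your PBS nodes directive.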

#PBS -l walltime=5:00

The second directive requests 5 minutes of walltime. Specify the time in the format HH:MM:SS. At most two digits can be used for minutes and seconds. Do not use leading zeroes in your walltime specification.

#PBS -j oe

The next directive combines your .o and .e output into one file, in this case your .o file. This makes your job easier to debug.

Your stdout and stderr files are each limited to 20 Mbytes. If your job exceeds either of these limits it will be killed by the system. If you have a program that you think will exceed either of these limits you should redirect your stdout or stderr output to a $SCRATCH file.
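A minimal sketch of such a redirection, for batch scripts written in the Bourne shell or one of its descendants, is shown below. The function name run_logged and the file name in the usage note are hypothetical.

```shell
# Run a command with stdout and stderr sent to LOG_FILE instead of the
# job's .o and .e files.
# Usage: run_logged LOG_FILE COMMAND [ARGS...]
run_logged() {
    log_file=$1
    shift
    "$@" > "$log_file" 2>&1
}
```

For example, run_logged "$SCRATCH/myprog.log" ./myprog keeps the program's output out of the 20-Mbyte stdout and stderr files. Remember that $SCRATCH is not permanent storage, so copy the log elsewhere if you need to keep it.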

#PBS -q batch

The final PBS directive requests that your job be run in the batch queue.

The remaining lines in the script are comments and command lines.

set echo

This command causes your batch output to display each command next to its corresponding output. This makes your job easier to debug. If you are using the Bourne shell or one of its descendants use

set -x

instead.

ja

The ja command turns on job accounting for your job. This allows you to obtain information on the elapsed time and memory and IO usage of your program, plus other data.

You must pair this command with another ja command at the end of your job. The option -t to this second ja command turns off job accounting and writes your accounting data to stdout. The other options to the second example ja command determine what output you will receive from ja. We recommend these options because we think they will provide detailed and useful information about your job's processes. However, you can look at the man page for ja to see what reporting options you want to use.

There is no overhead to using ja. We strongly recommend that you use ja so you can understand the resource usage of your jobs, which you can use when you submit future jobs. The output from ja can also be used for debugging and performance improvement purposes.

Comment lines

The other lines in the sample script that begin with '#' are comment lines. The '#' for comments and PBS directives must be in column one of your scripts.
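The two placement rules for PBS directives described above (not on the first line, and '#' in column one) can be checked before submission. The following Bourne shell sketch is one way to do so. The function name check_pbs_directives is hypothetical, and it checks only lines containing #PBS, not ordinary comments.

```shell
# Report PBS directives that would be ignored or misread:
#   - a directive on line 1 (ignored, per the text above)
#   - a directive whose '#' is not in column one
# Prints the offending lines; succeeds only if none are found.
check_pbs_directives() {
    script=$1
    rc=0
    if head -n 1 "$script" | grep -q '^#PBS'; then
        echo "line 1 is a PBS directive and will be ignored"
        rc=1
    fi
    # grep -n prints each indented directive with its line number
    if grep -n '^[[:space:]][[:space:]]*#PBS' "$script"; then
        rc=1
    fi
    return $rc
}
```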

setenv OMP_NUM_THREADS 4

This command sets the number of threads you want your OpenMP program to use. You should set this value to the number of cores you requested with your PBS nodes directive so each of your threads will run on its own core.
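Since PBS_PPN holds the number of cores you requested, the thread count can be taken from it rather than typed twice. The sketch below is for batch scripts that use the Bourne shell or one of its descendants. The function name use_all_cores is hypothetical, and the fallback value of 4 (one blade) for use outside a batch job is an assumption.

```shell
# Set OMP_NUM_THREADS to the number of cores PBS granted to this job.
# PBS_PPN is set inside a batch job; the fallback of 4 is an assumption
# for running the script outside of PBS.
use_all_cores() {
    OMP_NUM_THREADS=${PBS_PPN:-4}
    export OMP_NUM_THREADS
}
```

In a csh script such as the sample above, the equivalent would be setenv OMP_NUM_THREADS $PBS_PPN.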

./myopenmp

This command runs your executable.

A sample job to run an MPI program is

#!/bin/csh
#PBS -l nodes=1:ppn=4
#PBS -l walltime=5:00
#PBS -j oe
#PBS -q batch

set echo

ja

#move to my $SCRATCH directory
cd $SCRATCH

#copy executable to $SCRATCH
cp $HOME/mympi .

#run my executable
mpirun -np 4 ./mympi

ja -chlst

This script is identical to the OpenMP script except when you run your executable. You do not have to set the variable OMP_NUM_THREADS, but you have to use the mpirun command to launch your executable on salk's compute processors. The value for the -np option is the number of cores you want your program to run on. You should set -np to the number of cores you requested with your PBS nodes directive. You must use mpirun to run your MPI executable or it will run on a frontend and degrade overall system performance.

A sample job to run a hybrid OpenMP and MPI program is

#!/bin/csh
#PBS -l nodes=1:ppn=64
#PBS -l walltime=5:00
#PBS -j oe
#PBS -q batch

set echo

ja

#move to my $SCRATCH directory
cd $SCRATCH

#copy executable to $SCRATCH
cp $HOME/myhybrid .

#run my executable
mpirun -np 16 omplace -nt 4 ./myhybrid

ja -chlst

This script is identical to the above two scripts except when you run your executable. You use a combination of the mpirun and omplace commands to run your hybrid program. The value of the -np option to the mpirun command is the number of your MPI tasks. The value of the -nt option to the omplace command is the number of your OpenMP threads per MPI task. The product of these two values should be the total number of cores you requested with your PBS nodes specification.

The omplace command ensures that each of your OpenMP threads runs on its own core. You must use mpirun to run your hybrid executable or it will run on a frontend and degrade overall system performance.
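The rule that the number of MPI tasks times the number of threads per task should equal the cores requested can be sketched as a helper that builds the launch line. This Bourne shell function only prints the command and does not run it. The name hybrid_command is hypothetical.

```shell
# Print the mpirun line for a hybrid job so that
# (MPI tasks) x (OpenMP threads per task) = cores requested.
# Usage: hybrid_command CORES THREADS_PER_TASK EXECUTABLE
hybrid_command() {
    cores=$1
    threads=$2
    exe=$3
    if [ $(( cores % threads )) -ne 0 ]; then
        echo "cores ($cores) is not a multiple of threads ($threads)" >&2
        return 1
    fi
    echo "mpirun -np $(( cores / threads )) omplace -nt $threads $exe"
}

hybrid_command 64 4 ./myhybrid   # prints: mpirun -np 16 omplace -nt 4 ./myhybrid
```

Within a batch job the first argument could be $PBS_PPN, so the launch line always matches the cores you were granted.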

Qsub command

After you create your batch script you submit it to PBS with the qsub command.

    qsub myscript.job

Your batch output--your .o and .e files--is returned to the directory from which you issued the qsub command after your job finishes.

You can also specify PBS directives as command-line options. Thus, you could omit the PBS directives from the above sample scripts and submit the scripts with the command

    qsub -l nodes=1:ppn=4 -l walltime=5:00 -j oe -q batch myscript.job

Command-line directives override directives in your scripts.

Interactive access

A form of interactive access is available on salk by using the -I option to qsub. For example, the command

    qsub -I -l nodes=1:ppn=4 -l walltime=5:00 -q debug

requests interactive access to 4 cores for 5 minutes in the debug queue. Your qsub -I request will wait until it can be satisfied. If you want to cancel your request you should type ^C.

When you get your shell prompt back your interactive job is ready to start. At this point any commands you enter will be run as if you had entered them in a batch script. Stdin, stdout, and stderr are connected to your terminal. To run an MPI or hybrid program you must use the mpirun command just as you would in a batch script.

When you finish your interactive session type ^D. When you use qsub -I you are charged for the entire time you hold your processors whether you are computing or not. Thus, as soon as you are done executing commands you should type ^D.

Using the module command in a batch script

Depending on your login shell and the shell you use in your batch script you may have to make changes to your batch script if you want to use the module command in your batch script.

If your login shell is csh and your batch script uses csh as its shell then if you need to use the module command in your batch script you must include the commands

    source /usr/share/modules/init/csh
    source /etc/csh.cshrc.psc

in your batch script after your PBS specifications. If you use tcsh as your login shell and as your batch shell you must include the commands

    source /usr/share/modules/init/tcsh
    source /etc/csh.cshrc.psc

in your script.

If your login shell is csh or tcsh and you use sh or bash in your batch script you must start your job with the line

    #!/bin/sh -l

or

    #!/bin/bash -l

depending on whether you want to use sh or bash in your batch script.

If your login shell is sh or bash and you use csh or tcsh as your batch shell you must include the two source commands in your batch script that were described above in the first case.

If your login shell is sh or bash and you use either sh or bash as your batch shell you do not need to make any changes to your batch scripts.
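The cases above can be summarized as a lookup from login shell and batch shell to the required change. The Bourne shell sketch below prints what the text prescribes for each documented combination. The function name module_setup is hypothetical. Combinations the text does not cover, such as a csh login shell with a tcsh batch script, are reported as such rather than guessed. The sh or bash login with csh or tcsh batch case is read literally as using the csh lines from the first case.

```shell
# Print what a batch script needs before it can call `module`, following
# the cases listed above.
# Usage: module_setup LOGIN_SHELL BATCH_SHELL
module_setup() {
    login=$1
    batch=$2
    case "$login:$batch" in
        csh:csh|sh:csh|sh:tcsh|bash:csh|bash:tcsh)
            # first case, also prescribed for sh/bash logins with csh/tcsh jobs
            echo "source /usr/share/modules/init/csh"
            echo "source /etc/csh.cshrc.psc"
            ;;
        tcsh:tcsh)
            echo "source /usr/share/modules/init/tcsh"
            echo "source /etc/csh.cshrc.psc"
            ;;
        csh:sh|csh:bash|tcsh:sh|tcsh:bash)
            # the job must start with a login-shell interpreter line
            echo "#!/bin/$batch -l"
            ;;
        sh:sh|sh:bash|bash:sh|bash:bash)
            ;;   # nothing extra is needed
        *)
            echo "combination not covered by the text above" >&2
            return 1
            ;;
    esac
}
```

For example, module_setup tcsh bash prints the interpreter line to use as the first line of the batch script, while module_setup bash bash prints nothing.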

Using the module command in an interactive job

You do not need to issue any special commands if you want to use the module command in an interactive session, but you should not switch your shell from your login shell during your interactive session.

Monitoring and Killing Jobs

The qstat -a command displays the status of the PBS queues. It shows running and queued jobs. For each job it shows the amount of walltime and the number of cores and processors requested. For running jobs it shows the amount of walltime the job has already used. The qstat -f command, which takes a jobid as an argument, provides more extensive information for a single job.

The qdel command is used to kill queued and running jobs. An example is the command

    qdel 54

The argument to qdel is the jobid of the job you want to kill. The jobid is shown when you submit your job, and you can also get it with the qstat command. If you cannot kill a job you want to kill send email to remarks@psc.edu.

Software Packages

A list of software packages installed on salk is available. If you would like us to install a package that is not in this list send email to remarks@psc.edu.

Stay Informed

As a user of salk, it is imperative that you stay informed of changes to the machine's environment. Refer to this document frequently. In addition, important information is posted to the PSC's Web page of bboard posts.

You will also periodically receive email from PSC with information about salk. In order to ensure that you receive this email, you should make sure that your email forwarding is set properly by following the instructions for setting your email forwarding.

Reporting a Problem

You have two options for reporting problems on salk.

  • You can call the User Services Hotline at 1-800-221-1641 from 9:00 a.m. until 8:00 p.m., Eastern time, on weekdays, and from 9:00 a.m. until 4:00 p.m., Eastern time, on Saturdays.

  • You can send email to remarks@psc.edu.