UNCC ChargerNet and MPI
About
- Author: Kevin Hammond
- Last Modified: 02/26/2009
- Notes: This tutorial briefly explains how to install and run the NAS Parallel Benchmarks (http://www.nas.nasa.gov/Resources/Software/npb.html) on the UNCC ChargerNet (https://chargernet.uncc.edu/).
Summary
- The NAS Parallel Benchmarks (NPB) are a small set of programs designed to help evaluate the performance of parallel supercomputers. The benchmarks, which are derived from computational fluid dynamics (CFD) applications, consist of five kernels and three pseudo-applications. The NPB come in several "flavors."
ChargerNet
- You first need to request an account on ChargerNet at https://chargernet.uncc.edu/portal/requestaccount.
- Once you obtain a login, reset your password to one of your choosing.
Installing the NAS Benchmarks onto your account.
- Follow the download instructions on the NASA website. (http://www.nas.nasa.gov/Resources/Software/swinstructions.html)
# NOTE: You will also need to register with the NASA site to download software.
- I downloaded NPB 3.3 because it is the latest version. Earlier versions are available as well.
- I downloaded the archive to my local PC. Because I was working from home, I could not scp it directly to ChargerNet, so I copied it to my home directory on Homer and then copied it to ChargerNet from there.
- Log in to your ChargerNet account, navigate to the directory where you want to install NPB, and type:
tar -xzvf NPB3.3.tar.gz
- This will extract all of the source code files into the proper directory tree.
Setting EXPORT Variables
- When I started to make my programs, mpicc was not in the path. I did some searching and found an instance of MPI installed on the filesystem, and I edited my .bash_profile to include the bin directory of that mpicc. I believe it belonged to an earlier version of MPICH.
- That was a bad idea.
- What I should have done was simply source the environment variables for Open MPI. ChargerNet provides a different settings file for each compiler you might want to use.
. /apps/usr/env/openmpi-1.2.4-gnu.sh    # To use the GNU compiler
. /apps/usr/env/openmpi-1.2.4-intel.sh  # To use the Intel compiler
. /apps/usr/env/openmpi-1.2.4-pgi.sh    # To use the PGI compiler
- This puts the proper mpicc and mpif77 in the PATH and sets the proper library locations as well.
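- After sourcing one of these files, it is worth confirming that the compiler wrappers really are on the PATH before trying to build anything. The following is only a minimal sketch: check_tools is a hypothetical helper name of my own, not something ChargerNet provides.

```shell
# Report whether each named tool can be found on the current PATH.
# check_tools is a hypothetical helper, not part of ChargerNet or Open MPI.
# Returns non-zero if any of the tools is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool -> $(command -v "$tool")"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}
```

Calling check_tools mpicc mpif77 right after sourcing the environment file should list both wrappers as found. If either one is reported missing, the environment file was probably not sourced in the current shell.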
Setting the compiler options
- I was interested only in the MPI enabled versions of these benchmark programs, so I concentrated on them mostly.
- Assuming you unpacked the .tar file directly in your home directory,
cd ~/NPB3.3/NPB3.3-MPI/config
cp make.def.template make.def
nano make.def (or vi, or emacs, or whatever you use to edit text files)
- change the line:
MPIF77 = f77
to
MPIF77 = mpif77
- and
MPICC = cc
to
MPICC = mpicc
- You should now be able to compile each of the individual programs.
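- If you would rather not edit make.def by hand, the same two changes can be scripted. This is a sketch under the assumption that the MPIF77 and MPICC assignments each start at the beginning of a line, as they do in the lines quoted above. patch_make_def is a hypothetical helper name.

```shell
# Rewrite the MPIF77 and MPICC assignments in a make.def file.
# patch_make_def is a hypothetical helper; it assumes each assignment
# starts at the beginning of its line.
# Usage: patch_make_def path/to/make.def
patch_make_def() {
    def_file=$1
    tmp_file=$def_file.tmp
    # Write the edited copy to a temporary file, then replace the original.
    sed -e 's/^MPIF77[[:space:]]*=.*/MPIF77 = mpif77/' \
        -e 's/^MPICC[[:space:]]*=.*/MPICC = mpicc/' \
        "$def_file" > "$tmp_file" && mv "$tmp_file" "$def_file"
}
```

For example, patch_make_def ~/NPB3.3/NPB3.3-MPI/config/make.def would apply both changes and leave every other line untouched.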
A brief description of the actual benchmark programs
The benchmarks are divided into two types of programs: kernel and CFD programs.
- KERNELS
- CG: A conjugate gradient method is used to compute an approximation to the smallest eigenvalue of a large, sparse, symmetric positive definite matrix. This kernel is typical of unstructured grid computations in that it tests irregular long distance communication, employing unstructured matrix vector multiplication.
- EP: An "embarrassingly parallel" kernel. It provides an estimate of the upper achievable limits for floating point performance, i.e., the performance without significant interprocessor communication.
- FT: A 3-D partial differential equation solution using FFTs. This kernel performs the essence of many "spectral" codes. It is a rigorous test of long-distance communication performance.
- IS: A large integer sort. This kernel performs a sorting operation that is important in "particle method" codes. It tests both integer computation speed and communication performance.
- MG: A simplified multigrid kernel. It requires highly structured long distance communication and tests both short and long distance data communication.
- APPLICATIONS
- BT: A simulated CFD application that uses an implicit algorithm to solve 3-dimensional (3D) compressible Navier-Stokes equations. The finite differences solution to the problem is based on an Alternating Direction Implicit (ADI) approximate factorization that decouples the x, y and z dimensions. The resulting systems are Block-Tridiagonal of 5×5 blocks and are solved sequentially along each dimension.
- LU: A simulated CFD application that uses the symmetric successive over-relaxation (SSOR) method to solve a seven-block-diagonal system resulting from finite-difference discretization of the Navier-Stokes equations in 3-D by splitting it into block Lower and Upper triangular systems.
- SP: A simulated CFD application that has a similar structure to BT. The finite differences solution to the problem is based on a Beam-Warming approximate factorization that decouples the x, y and z dimensions. The resulting system has Scalar Pentadiagonal bands of linear equations that are solved sequentially along each dimension.
Making the programs
- In your ~/NPB3.3/NPB3.3-MPI directory, type:
make
the following will be displayed.
=========================================
= NAS Parallel Benchmarks 3.3 =
= MPI/F77/C =
=========================================
To make a NAS benchmark type
make <benchmark-name> NPROCS=<number> CLASS=<class> [SUBTYPE=<type>]
where <benchmark-name> is "bt", "cg", "ep", "ft", "is", "lu",
"mg", or "sp"
<number> is the number of processors
<class> is "S", "W", "A", "B", "C", or "D"
Only when making the I/O benchmark:
<benchmark-name> is "bt"
<number>, <class> as above
<type> is "full", "simple", "fortran", or "epio"
To make a set of benchmarks, create the file config/suite.def
according to the instructions in config/suite.def.template and type
make suite
***************************************************************
* Remember to edit the file config/make.def for site specific *
* information as described in the README file *
***************************************************************
- Obviously, the <benchmark-name> is one of the above types of benchmark that you are trying to compile.
- The <class> of a benchmark is defined as follows:
- S, W, A, B, C, D, E (listed in order of increasing problem size)
- Each type of program is only valid for a certain <number> of parallel processes (NPROCS):
- BT, SP, & EP must be compiled with a square (n^2) number of processes, i.e., 1, 4, 9, 16, 25, 36, 49, 64.
- BT also includes 4 subtypes: full, simple, fortran and epio.
- FT, IS, CG, LU & MG must be compiled using a power of 2 (2^n) number of processes, i.e., 1, 2, 4, 8, 16, 32, 64.
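- The process-count rules above can be checked in the shell before calling make. This is a sketch only: the function names are hypothetical, and the grouping of benchmarks simply follows the rules as stated in this tutorial.

```shell
# Succeeds if $1 is a perfect square (1, 4, 9, ...).
is_square() {
    n=$1
    i=1
    while [ $((i * i)) -lt "$n" ]; do
        i=$((i + 1))
    done
    [ $((i * i)) -eq "$n" ]
}

# Succeeds if $1 is a power of two (1, 2, 4, 8, ...).
is_power_of_two() {
    n=$1
    [ "$n" -ge 1 ] || return 1
    while [ $((n % 2)) -eq 0 ]; do
        n=$((n / 2))
    done
    [ "$n" -eq 1 ]
}

# Succeeds if benchmark $1 may be built for $2 processes,
# according to the rules listed in this tutorial.
valid_nprocs() {
    case $1 in
        bt|sp|ep) is_square "$2" ;;
        ft|is|cg|lu|mg) is_power_of_two "$2" ;;
        *) return 1 ;;
    esac
}
```

For instance, valid_nprocs bt 16 succeeds while valid_nprocs bt 8 fails, which makes it easy to skip invalid combinations in a loop.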
A bash script to compile all the programs
This script should walk through all the possible combinations of application, data class and processor number, 'make'-ing all of them.
#!/bin/sh
#
# Script to submit all possible NPB make combinations to "make"
#
for APP in bt
do
    echo "#$APP submit files"
    for CLASS in S W A B C D
    do
        for NODE in 1 4 9 16 25 36 49 64
        do
            make $APP NPROCS=$NODE CLASS=$CLASS
            for SUBTYPE in full simple fortran epio
            do
                make $APP NPROCS=$NODE CLASS=$CLASS SUBTYPE=$SUBTYPE
            done
        done
    done
done
for APP in ep sp
do
    echo "#$APP submit files"
    for CLASS in S W A B C D E
    do
        for NODE in 1 4 9 16 25 36 49 64
        do
            make $APP NPROCS=$NODE CLASS=$CLASS
        done
    done
done
for APP in ft is cg lu mg
do
    echo "#$APP submit files"
    for CLASS in S W A B C D E
    do
        for NODE in 1 2 4 8 16 32 64
        do
            make $APP NPROCS=$NODE CLASS=$CLASS
        done
    done
done
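With this many combinations, some builds may fail, and the errors scroll past quickly. One possible approach is to wrap each make call so that failures are collected and reported at the end. The sketch below is an illustration only: try_build, report_failures, FAILED and BUILD_LOG are hypothetical names, not part of NPB.

```shell
# Newline-separated list of build commands that failed.
FAILED=""

# Run one build command. Its output is appended to $BUILD_LOG
# (discarded if BUILD_LOG is unset). A failing command is recorded
# in FAILED instead of stopping the whole script.
try_build() {
    if "$@" >> "${BUILD_LOG:-/dev/null}" 2>&1; then
        return 0
    fi
    FAILED="$FAILED
$*"
    return 1
}

# Print a summary of everything that failed.
report_failures() {
    if [ -z "$FAILED" ]; then
        echo "all builds succeeded"
    else
        echo "failed builds:$FAILED"
    fi
}
```

To use it, replace each make line in the loops with try_build make $APP NPROCS=$NODE CLASS=$CLASS and call report_failures once after the last loop.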
ChargerNet and Condor
- ChargerNet uses Condor to handle cluster load balancing and job scheduling. Refer to http://www.cs.wisc.edu/condor/ for more information about Condor. The most important thing to realize is that you need to generate a submit file for each of the program builds above. Quite a daunting task!
- If you type:
which condor_submit
you should get
/apps/condor/bin/condor_submit
this is the program you will use to submit jobs to the cluster.
- Some other useful condor commands:
condor_q - prints the current condor queue.
condor_rm <job_no> - removes the selected job from the queue.
Building a Submit File
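- We are already talking about a lot of different programs that need to be run. Scripts are here to help. I use the following to generate all the job submission files.
- Copy the make script and add the following lines near the top. Because the function keyword is a bash extension, also change the first line of the copy to #!/bin/bash: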
NPB=NPB3.3/NPB3.3-MPI
OUT_DIR=NPB.Output
SUB_DIR=NPB.Submit
UNIVERSE=parallel
EXECUTABLE=/apps/condor/scripts/openmpi-64
CONDOR=/apps/condor/bin/condor_submit
function build_submit {
    S_APP=$2.$3.$4
    echo "Building Submit File for $S_APP"
    echo "Universe = $UNIVERSE" > $1
    echo "Executable = $EXECUTABLE" >> $1
    echo "arguments = ~/$NPB/bin/$S_APP" >> $1
    echo "machine_count = $4" >> $1
    echo "output = $OUT_DIR/out.$S_APP" >> $1
    echo "error = $OUT_DIR/err.$S_APP" >> $1
    echo "log = $OUT_DIR/log.$S_APP" >> $1
    echo "getenv = true" >> $1
    echo "queue" >> $1
    echo "" >> $1
}
then instead of
make $APP NPROCS=$NODE CLASS=$CLASS
change to
SUBFILE=$SUB_DIR/submit.$APP.$CLASS.$NODE
build_submit $SUBFILE $APP $CLASS $NODE
you could also add
$CONDOR -n mpi $SUBFILE
after the build_submit line to submit each job as soon as its submission file is generated!
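- For reference, here is one way the submit-file generator could be written as a self-contained function. Treat it as a sketch rather than the script I actually ran. It differs from the version above in three ways, all of them my own assumptions: it creates the output and submit directories if they do not exist, it writes the file with a here-document, and it uses $HOME in place of ~ because the shell does not expand a tilde inside double quotes.

```shell
# Settings copied from the tutorial; override them in the environment if needed.
NPB=${NPB:-NPB3.3/NPB3.3-MPI}
OUT_DIR=${OUT_DIR:-NPB.Output}
UNIVERSE=${UNIVERSE:-parallel}
EXECUTABLE=${EXECUTABLE:-/apps/condor/scripts/openmpi-64}

# Usage: build_submit <submit-file> <app> <class> <nprocs>
build_submit() {
    sub_file=$1
    s_app=$2.$3.$4
    # Make sure the directories for the job output and the submit file exist.
    mkdir -p "$OUT_DIR" "$(dirname "$sub_file")"
    cat > "$sub_file" <<EOF
Universe = $UNIVERSE
Executable = $EXECUTABLE
arguments = $HOME/$NPB/bin/$s_app
machine_count = $4
output = $OUT_DIR/out.$s_app
error = $OUT_DIR/err.$s_app
log = $OUT_DIR/log.$s_app
getenv = true
queue
EOF
}
```

Calling build_submit NPB.Submit/submit.cg.A.4 cg A 4 writes a submit file for the 4-process, class A build of CG.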
Sit Back & Watch
- ChargerNet is quite busy these days, so even a small program can take some time to get through the queue. Passing the correct MPI parameters to condor_submit (-n mpi for MPI jobs) greatly increases the chance that your job will run.
- Check the status of your jobs at
https://chargernet.uncc.edu/8001/jobstatus/
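- Once jobs finish, their results land in the out.* files named in the submit files. The sketch below pulls a one-line summary out of each file. It assumes that each benchmark prints result lines of the form "Verification = SUCCESSFUL" and "Mop/s total = <value>"; check that against your own output files before relying on it. summarize_results is a hypothetical helper name.

```shell
# Print "<file> <verification status> <Mop/s total>" for every out.* file
# in the given directory. The two line formats matched here are an
# assumption about the benchmark output, not something guaranteed.
# Usage: summarize_results NPB.Output
summarize_results() {
    out_dir=$1
    for f in "$out_dir"/out.*; do
        [ -f "$f" ] || continue
        status=$(awk -F= '/^ *Verification *=/ {gsub(/[ \t]/, "", $2); print $2; exit}' "$f")
        mops=$(awk -F= '/^ *Mop\/s total *=/ {gsub(/[ \t]/, "", $2); print $2; exit}' "$f")
        echo "$(basename "$f") ${status:-UNKNOWN} ${mops:-n/a}"
    done
}
```

Files that do not contain the expected lines (for example, jobs that are still queued or that crashed) show up as UNKNOWN n/a.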
References
- http://www.nas.nasa.gov/Resources/Software/npb.html
- H. Jin, M. Frumkin, and J. Yan. The OpenMP Implementation of NAS Parallel Benchmarks and Its Performance. NAS Technical Report NAS-99-011, NASA Ames Research Center, 1999. http://www.nas.nasa.gov/News/Techreports/1999/PDF/nas-99-011.pdf
- Condor http://www.cs.wisc.edu/condor/
