At login, the programming environment loads the Intel compiler and Intel MPI modules by default. The GNU (gcc), PGI and Open64 compilers are also available and can be loaded using modules.
For additional help on modules, please see Using Modules.
Using the appropriate compiler options can improve the performance of your application. Generally, the highest impact comes from selecting an appropriate optimization level, targeting the architecture of the processor (CPU, cache, memory system), and allowing for interprocedural analysis (inlining, etc.). No single set of options gives the highest speed-up for all applications, so different combinations should be explored.
The simplest way to control the optimization that the compiler performs is through the -On options, where n takes the values explained below.
n = 0: Fast compilation, full debugging support (the level implied by -g)
n = 1, 2: Low to moderate optimization, partial debugging support
n = 3: Aggressive optimization; compile time/space intensive and/or of marginal effectiveness. May change code semantics and results (and sometimes even break code!)
The following table lists some of the more important compiler options that affect application performance, based on the target architecture and application behavior.
-xHost    Generates code with the highest level of streaming SIMD extensions (e.g. SSE4) available on the host processor
-g        Generates debugging information and a symbol table
-mp1      Improves floating-point precision (with less speed impact than -mp)
-ip       Enables single-file interprocedural (IP) optimizations (within files)
-prefetch Enables data prefetching (requires -O3)
-openmp   Generates multi-threaded code based on OpenMP directives
For additional information on other compiler options, see the man pages for each compiler or use the --help option:
% icpc --help # C++ source code.
% ifort --help # Fortran source code.
% man icc # man page for the C compiler
% man ifort # man page for the Fortran compiler
Some simple examples for building/linking serial codes with different optimization levels are provided below:
% icpc -c foo.cpp -O2 -g                 # compiles a C++ source file for debugging
% icpc -o failed.exe -g -O2 foo.o bar.o  # links an executable from two object files for debugging
% icc -o foobar.exe -O3 -xT foo.c bar.c  # builds and links two files to create an executable
Building MPI Programs
Many parallel codes running on Peregrine are written in Fortran, C or C++ to run in SPMD (Single Program Multiple Data) mode using explicit communication between tasks via MPI (the Message Passing Interface).
To facilitate building MPI programs, a set of compiler scripts (mpicc, mpiCC and mpif90) are provided, which remove the need to specify the location of the MPI header files and library. The default environment uses compiler scripts/wrappers that invoke the Intel compilers and the Intel MPI library.
Below are some simple examples of building/linking codes that use MPI:
% mpif90 -c simple.f90 -O3 # builds a simple Fortran code that calls MPI routines
% mpif90 -o simple.x simple.o # creates an executable from an object file
% mpiCC -o simple.x -O3 simple.cpp # builds and links an executable from a C++ source file
Building OpenMP Programs
Each node on Peregrine has 16 cores that share memory. Shared-memory programming models can be used to parallelize an application across all the cores on a node. OpenMP is a pragma/directive-based shared-memory programming method used by scientific and engineering applications. It can be used on its own or in combination with MPI.
With the Intel compilers, use the -openmp option to instruct the compiler to process the OpenMP directives that control this form of parallel programming.
For an application that will run on a single node, use
% icc -o hello.omp -openmp hello.c
For an application that will use both MPI and OpenMP, use
% mpicc -o hello.hybrid -openmp hello_hybrid.c