Debugging and Performance Analysis Tools
NREL Linux HPC systems host a wide variety of debugging and performance analysis tools. An overview of each tool is given below.
TotalView is a graphical debugger for parallel applications. It can be used in any of the following ways:
- Start the TotalView GUI from within an interactive batch job on a compute node.
- Use the TotalView Remote Display Client (RDC) on your laptop or workstation and connect to the Linux cluster on which your application runs.
- Use scripted debugging.
PGDBG is a graphical debugger from the Portland Group. It is included with the PGI compiler suite. It may be used to debug parallel programs including MPI, OpenMP and hybrid MPI/OpenMP applications written in C, C++ or Fortran.
The Allinea tool suite is installed on Peregrine. It includes the debugger DDT, the profiler MAP, and the newer Allinea Performance Reports. Performance Reports is an easy-to-use, low-overhead tool for analyzing application performance. It reports the time spent in computation, I/O and MPI communication, the peak memory usage, and information about why your program spends so much time in a particular type of processing. Output is provided as a simple one-page text or HTML file.
HPCToolkit is an integrated suite of tools for measurement and analysis of application performance on systems ranging from desktops to large Linux clusters. By using statistical sampling of timers and hardware performance counters, HPCToolkit measures a program's work, resource consumption and efficiency and attributes them to the full calling context in which they occur. Because HPCToolkit uses sampling, measurement has low overhead (1-5%) and scales to large parallel systems.
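The idea of statistical sampling with calling-context attribution can be illustrated with a minimal Python sketch. It is not HPCToolkit and does not use its API. It assumes a Unix system, uses the standard library's `signal.setitimer` to interrupt the program at regular intervals of CPU time, and records the call stack at each interrupt. The names `profile_by_sampling`, `inner_kernel` and `workload` are hypothetical and exist only for this illustration.

```python
import collections
import signal
import traceback

# Hypothetical illustration of timer-based sampling; not HPCToolkit's API.
samples = collections.Counter()

def _on_sample(signum, frame):
    # Record the full calling context (outermost caller first)
    # at the moment the profiling timer fires.
    stack = tuple(f.name for f in traceback.extract_stack(frame))
    samples[stack] += 1

def profile_by_sampling(func, interval=0.005):
    """Run func() and sample its call stack every `interval` seconds of CPU time."""
    samples.clear()
    old_handler = signal.signal(signal.SIGPROF, _on_sample)
    signal.setitimer(signal.ITIMER_PROF, interval, interval)
    try:
        func()
    finally:
        signal.setitimer(signal.ITIMER_PROF, 0, 0)  # stop the timer first
        signal.signal(signal.SIGPROF, old_handler)  # then restore the handler
    return samples

def inner_kernel(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

def workload():
    return sum(inner_kernel(200_000) for _ in range(20))

if __name__ == "__main__":
    result = profile_by_sampling(workload)
    total = sum(result.values()) or 1
    for stack, count in result.most_common(5):
        print(f"{count / total:6.1%}  {' > '.join(stack)}")
```

Because the program is only interrupted at each timer tick, the cost of measurement depends on the sampling interval rather than on how many functions are called, which is why sampling-based tools can keep overhead low.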
The TAU Performance System is a profiling and tracing toolkit for performance analysis of parallel programs written in Fortran, C, C++, Java and Python. Instrumentation can be inserted in the source code using an automatic instrumenter tool based on the Program Database Toolkit (PDT), dynamically using DyninstAPI, at runtime in the Java Virtual Machine or manually using the instrumentation API.
TAU's profile visualization tool, called paraprof, provides graphical displays of all performance analysis results, in aggregate and single node/context/thread forms, which allows the user to quickly identify sources of performance bottlenecks.
TAU can generate event traces that can be displayed with Vampir, Paraver or JumpShot trace visualization tools.
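The manual instrumentation approach mentioned above can be sketched in a few lines of Python. This is a generic illustration of the technique, not TAU's instrumentation API: the names `region_stats`, `timer`, `compute` and `main` are hypothetical, and the sketch only records call counts and inclusive wall-clock time for named regions.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Hypothetical names for illustration; this is not TAU's instrumentation API.
region_stats = defaultdict(lambda: {"calls": 0, "seconds": 0.0})

@contextmanager
def timer(name):
    """Accumulate call count and inclusive wall-clock time for a named region."""
    start = time.perf_counter()
    try:
        yield
    finally:
        entry = region_stats[name]
        entry["calls"] += 1
        entry["seconds"] += time.perf_counter() - start

def compute(n):
    with timer("compute"):
        return sum(i * i for i in range(n))

def main():
    with timer("main"):
        for _ in range(3):
            compute(100_000)

if __name__ == "__main__":
    main()
    ordered = sorted(region_stats.items(), key=lambda kv: -kv[1]["seconds"])
    for name, entry in ordered:
        print(f"{name:10s} calls={entry['calls']:3d} inclusive={entry['seconds']:.4f}s")
```

Manual instrumentation gives exact call counts for the regions you mark, at the cost of editing the source. Automatic instrumenters insert equivalent probes for you.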
Intel VTune Amplifier XE 2013
Intel VTune Amplifier XE is a performance profiler for C, C++, C#, Fortran, Assembly and Java code. Hotspots analysis provides a list of functions sorted by the CPU time they consume, and other features help the user quickly find common causes of slow performance in parallel programs, including long waits at locks and load imbalance among threads and processes. VTune Amplifier XE uses the Performance Monitoring Unit (PMU) on Intel processors to collect data with very low overhead.
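To show what a hotspots report looks like in principle, the sketch below builds a sorted list of functions by self time using Python's standard `cProfile` and `pstats` modules. It does not use VTune, and `cProfile` traces every call rather than sampling hardware counters, so it only illustrates the shape of the output. The functions `cheap`, `expensive`, `workload` and `hotspots` are hypothetical names chosen for this example.

```python
import cProfile
import pstats

def cheap(n):
    return sum(range(n))

def expensive(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

def workload():
    cheap(1_000)
    expensive(300_000)

def hotspots(func, top=5):
    """Return (function name, self time in seconds) pairs, hottest first."""
    profiler = cProfile.Profile()
    profiler.runcall(func)
    stats = pstats.Stats(profiler)
    # stats.stats maps (file, line, name) -> (cc, nc, tottime, cumtime, callers)
    rows = [(key[2], value[2]) for key, value in stats.stats.items()]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:top]

if __name__ == "__main__":
    for name, seconds in hotspots(workload):
        print(f"{seconds:10.6f}s  {name}")
```

The function at the top of the list is the first candidate for optimization, which is how a hotspots view is typically read.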
Intel Trace Analyzer and Collector
Intel Trace Analyzer and Collector is a tool for understanding the behavior of MPI applications. Use this tool to visualize and understand MPI parallel application behavior, evaluate load balancing, learn more about communication patterns and identify communication hotspots. This tool is similar to Vampir (Intel acquired Pallas Vampir and created this tool from it).
Intel Inspector XE 2013
Intel Inspector XE is an easy to use memory checker and thread checker for serial and parallel applications written in C, C++, C#, F# and Fortran. It takes you to the source locations of threading and memory errors and provides a call stack to help you determine how you got there. This tool has a GUI and a command line interface.
PGPROF is a performance profiler for MPI and OpenMP applications, including applications that use PGI Accelerator directives and CUDA Fortran. Use PGPROF to visualize and diagnose the performance of your program. It associates execution time with source code and allows profiling at the function, source code line and assembly instruction level for PGI-compiled Fortran, C and C++ programs. It provides views of the performance data for analysis of MPI communication, multi-process and multi-thread load balancing, and scalability.
PapiEx provides highly accurate per-thread summaries of hardware execution in terms of meaningful metrics, and profiles I/O, communication, synchronization and preemption. PapiEx can also measure precisely controlled sections of the application.