My job crashed - where to look for clues about what happened
There are a few places you can look for information about a job that didn't run properly.
By default, Peregrine creates two log files in the directory from which the job was submitted with qsub:
- <job_name or submit_script_name>.o<job_id> contains the job's standard output
- <job_name or submit_script_name>.e<job_id> contains the job's standard error
These two logs are the first place to look for information about why a job failed to run properly. Many applications also write their own logs, which may hold additional clues.
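As a rough illustration of checking the two default log files, the Python sketch below looks in a submit directory for files ending in .o<job_id> and .e<job_id> and returns their last few lines. The function names find_job_logs and tail are hypothetical helpers made up for this example, not Peregrine tools, and the sketch assumes the logs were left in their default location with their default names.

```python
from pathlib import Path


def find_job_logs(submit_dir, job_id):
    """Return (stdout_logs, stderr_logs) for job_id found in submit_dir.

    Assumes the default naming described above: <name>.o<job_id> for
    standard output and <name>.e<job_id> for standard error.
    """
    directory = Path(submit_dir)
    stdout_logs = sorted(directory.glob(f"*.o{job_id}"))
    stderr_logs = sorted(directory.glob(f"*.e{job_id}"))
    return stdout_logs, stderr_logs


def tail(path, n=20):
    """Return the last n lines of a text file as a list of strings."""
    if n <= 0:
        return []
    # errors="replace" keeps the helper from failing on odd bytes in a log.
    with open(path, errors="replace") as handle:
        return handle.readlines()[-n:]
```

For example, calling find_job_logs(".", 12345) from the directory where the job was submitted and passing each returned path to tail shows the end of each log, which is often where the failure message appears. Here 12345 is a placeholder job ID.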
If you need help understanding the information in the log files, or if nothing in them explains why the job didn't run properly, you can open a ticket by sending email to email@example.com. When you open a ticket, please provide the following information:
- directory where the job was started (often where the submit script is located)
- directory where job logs appear
- job_id number(s)
With this information, we can investigate whether a system problem likely contributed to the job failure.
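If it helps to gather these details before writing the email, the following Python sketch formats them as plain text that can be pasted into a ticket. The function name ticket_summary and the label wording are assumptions made for this example; they are not a format required by the support team.

```python
import os


def ticket_summary(start_dir, log_dir, job_ids):
    """Format the three items requested above as plain text.

    start_dir: directory where the job was started
    log_dir:   directory where the job logs appear
    job_ids:   iterable of job ID numbers
    """
    lines = [
        f"Job start directory: {os.path.abspath(start_dir)}",
        f"Job log directory: {os.path.abspath(log_dir)}",
        "Job ID(s): " + ", ".join(str(job_id) for job_id in job_ids),
    ]
    return "\n".join(lines)
```

Calling print(ticket_summary(".", ".", [12345])) from the submit directory prints the three lines with absolute paths filled in, where 12345 stands in for the real job ID.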