I use Slurm to run jobs on a cluster. I would like to get stats about a job, such as used memory, number of processors and wall-time, written into the job's log file. I believe this was possible with LSF (if I remember correctly and am not confusing it with another platform).
1 Answer
You can get this information from the Slurm accounting database; see https://slurm.schedmd.com/sacct.html or the question "Find out the CPU time and memory usage of a slurm job". E.g. sacct --jobs=12345 --format=NCPUS,MaxRSS,CPUTime.
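If you want the stats to land in the job's log file, one option is to call sacct on the job itself as the last step of the batch script. A minimal sketch (the job name, output pattern and program name are illustrative, not from the original answer):

#!/bin/bash
#SBATCH --job-name=stats-demo
#SBATCH --output=stats-demo.%j.log

# ... the actual workload ...
srun ./my_program

# Append this job's accounting stats to the log file (stdout).
# Stats for the final step may still be incomplete at this point,
# since the job has not finished when sacct runs.
sacct --jobs=${SLURM_JOB_ID} --format=JobID,JobName,NCPUS,MaxRSS,Elapsed,CPUTime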
Note: you can also put this in an epilog script, so it runs automatically for every job. Here is an example epilog.srun:
#!/bin/sh
TMPDIR="/local"
# Find the job's stdout file and append its usage info to it
stdoutfname=`scontrol -d show job ${SLURM_JOB_ID} | grep "StdOut=" | sed -e 's/.*StdOut=\([^ ][^ ]*\)/\1/'`
if [ -w "${stdoutfname}" ] && [ "${TMPDIR}" != "" ]; then
    sacct --format JobID,JobName,AveCPUFreq,AveDiskRead,AveRSS,CPUTime,MaxDiskWrite -j ${SLURM_JOB_ID} >> ${stdoutfname}
fi
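For the epilog to run automatically, it has to be registered in slurm.conf, which requires admin access; the path below is illustrative:

Epilog=/etc/slurm/epilog.srun

Slurm then runs the script on each compute node when a job completes, with SLURM_JOB_ID set in its environment.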
Alternatively, you can use /usr/bin/time -v <your command> inside your script (use the full path so you get GNU time rather than the shell builtin; see https://stackoverflow.com/a/774601/6352677). The resource usage then appears in the logs, but will not exactly match Slurm's accounting values.
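A minimal sketch of that variant (script contents and program name are illustrative):

#!/bin/bash
#SBATCH --output=myjob.%j.log

# GNU time's -v flag reports wall-clock time, CPU time and maximum RSS.
# It writes to stderr, which sbatch merges into the same log file
# as stdout unless --error is set.
/usr/bin/time -v ./my_program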