Highest Voted 'slurm' Questions

19

votes

3 answers

How can I find out how long my slurm job took to execute?

One idea I have for finding out how long my slurm job is taking is to use squeue --job. But how do I find out how long my job took once it is complete?

slurm

asked Nov 02 '14 at 00:22

demongolem

667

5

votes

1 answer

remove slurm sacct command double entries: "extern"

Jobs currently running show two entries, one of which has an .extern suffix. Completed (or failed) jobs also have a third entry: .batch. Is there a way to remove (or not show) these in the sacct output? What are these entries?

cluster parallel-processing slurm

asked Nov 17 '16 at 21:39

DilithiumMatrix

579

4

votes

2 answers

Error code 140 in command running through Nextflow on SLURM

[Note: question heavily edited to correspond to the actual problem] I'm trying to debug a command that fails only under specific conditions. The failure has exit code 140, but I have no other information. This command is cat in_file | tr "\t"…

bash slurm

asked Oct 09 '23 at 21:10

Alexlok

143

4

votes

3 answers

How to cancel a job that is in the completing (CG) state?

I submitted some jobs using sbatch and later canceled some of them using scancel. However, they are in state CG and I cannot remove the jobs from my list. Is there any way to get rid of those CG jobs? Sadly, I'm not the administrator of…

cluster slurm

asked Jun 15 '19 at 14:09

Iago Carvalho

141

4

votes

1 answer

Slurm initialization fails in a Raspberry Pi cluster with Raspbian 9.4

I am trying to set up Slurm in a Raspberry Pi cluster with Raspbian 9.4. I am able to start slurmctld, but when I try to launch slurmd I get the following output: pi@node1:~ $ slurmd -Dvvvc slurmd: debug: Log file re-opened slurmd: error: Domain…

raspberry-pi raspbian slurm

asked Jul 16 '18 at 11:30

Bub Espinja

150

3

votes

1 answer

How to use slurm to request only one core instead of a node or socket?

I wrote Perl scripts to analyze my simulation data. The program is not concurrent. The cluster has eight nodes. Each node has 2 sockets with 10 cores. I want to submit my job using Slurm and request only one core to…

slurm

asked Feb 12 '19 at 03:17

Leon

131

2

votes

1 answer

Slurm on AWS returns slurmstepd: error: execve(): : No such file or directory

I have installed a Burstable and Event-driven HPC Cluster on AWS Using Slurm according to this tutorial. With this installation I can burst instances and run jobs in the Slurm environment on EC2. After running: #!/bin/bash #SBATCH --nodes=2 #SBATCH…

amazon-web-services amazon-ec2 slurm

asked Jun 14 '19 at 12:59

Serialchiller

41

2

votes

0 answers

How to use SLURM's --dependency=expand: correctly

I have 1 slurm job unfinished out of 5 that's been running 19 hours and I'm concerned that it will hit walltime before it finishes. I'm not the admin and it's the weekend, so I would like to try using this feature I discovered recently shown in…

slurm

asked Nov 03 '18 at 12:32

hepcat72

155

1

vote

0 answers

Sorting, merging, and deduplicating large txt files on HPC?

I am looking for some advice. I am currently creating a k-mer database and want to merge, sort, and take the unique lines from 47 sample.txt.gz files, which are 16 GB each. What would be the fastest way to do this? I am currently running this: zcat…

bash slurm hpc

asked Feb 10 '25 at 23:32

user29589776

1

vote

1 answer

What is the path of the slurmd binary on an HPC system?

I want to find the path of the slurmd binary on an HPC system. I used which slurmd but there was an error: /usr/bin/which: no slurmd…

bash path slurm

asked Nov 04 '24 at 18:57

Martin

11

1

vote

0 answers

How to make a host file in SLURM with $SLURM_JOB_NODELIST

I have access to an HPC system with 40 cores on each node. I have a batch file to run a total of 35 codes, which are in separate folders. Each code is an OpenMP code that requires 4 cores. So how do I allocate resources such that each code gets 4…

bash hpc slurm

asked May 29 '21 at 13:50

Libin Varghese

11

1

vote

1 answer

SLURM setting nodes to drain due to low socket-core-thread-cpu count

I have SLURM set up with a couple of workstations. There are different kinds, but let's take one with a CPU which has 4 cores and no additional SMT, so 4 threads in total. lscpu shows me the following: $ lscpu Architecture: x86_64 CPU…

slurm

asked Nov 14 '19 at 13:34

Martin Ueding

2,485

1

vote

1 answer

slurmd: Invalid job credential

I'm having some problems with a test configuration of Slurm on my laptop. I'm trying to run four slurmd instances on one machine, which is also the same machine as slurmctld runs on. I have a local munged running as user munge. slurmd and slurmctld…

hpc slurm

asked Oct 25 '19 at 13:45

lukas

11

1

vote

0 answers

Slurm - GPU enforcement with cgroups

I am running slurm 19.05 on a single machine (Ubuntu 18.04) for scheduling GPU tasks. However, I am having trouble setting up GPU enforcement with cgroups. If I set ConstrainDevices=yes in my cgroup.conf file, TensorFlow is not able to access my…

gpu slurm

asked Sep 10 '19 at 07:06

Jonas

11

1

vote

1 answer

Ubuntu 18.10 and modify installed package - OpenMPI

I've installed openmpi-bin (OpenMPI 3.1) on Ubuntu 18.10. I also run slurm on the same machine and would like to recompile or reconfigure my installation of OpenMPI to support Slurm. If one installs OpenMPI from source, there is a setting…

apt slurm ubuntu-18.10

asked Jun 13 '19 at 18:01

Paer

21
