Questions tagged [slurm]

32 questions
19 votes, 3 answers

How can I find out how long my slurm job took to execute?

One idea I have for finding out how long my Slurm job is taking is to use squeue --job. But how do I find out how long my job took once it has completed?
5 votes, 1 answer

Remove Slurm sacct command double entries: "extern"

Jobs that are currently running show two entries, one of which has an .extern suffix. Completed (or failed) jobs also have a third entry: .batch. Is there a way to remove these entries from the sacct output, or to hide them? What are these entries?
4 votes, 2 answers

Error code 140 in command running through Nextflow on SLURM

[Note: question heavily edited to correspond to the actual problem] I'm trying to debug a command that fails only under specific conditions. It fails with exit code 140, but I have no other information. This command is cat in_file | tr "\t"…
Alexlok
4 votes, 3 answers

How to cancel a job that is in completing (CG) state?

I submitted some jobs with sbatch as usual and later canceled some of them with scancel. However, they are stuck in state CG and I cannot remove them from my list. Is there any way to get rid of these CG jobs? Sadly, I'm not the administrator of…
4 votes, 1 answer

Slurm initialization fails in a Raspberry Pi cluster with Raspbian 9.4

I am trying to set up Slurm in a Raspberry Pi cluster with Raspbian 9.4. I am able to start slurmctld, but when I try to launch slurmd I get the following output: pi@node1:~ $ slurmd -Dvvvc slurmd: debug: Log file re-opened slurmd: error: Domain…
3 votes, 1 answer

How to make Slurm request only one core instead of a whole node or socket?

I wrote Perl scripts to analyze my simulation data. This is not a concurrent program. The cluster has eight nodes. Each node has 2 sockets, each of which has 10 cores. I want to submit my job using Slurm and request only one core to…
Leon
2 votes, 1 answer

Slurm on AWS returns slurmstepd: error: execve(): : No such file or directory

I have installed a Burstable and Event-driven HPC Cluster on AWS Using Slurm according to this tutorial. With this installation I can burst instances and run jobs in the Slurm environment on EC2. After running: #!/bin/bash #SBATCH --nodes=2 #SBATCH…
2 votes, 0 answers

How to use SLURM's --dependency=expand: correctly

One of my 5 Slurm jobs is still unfinished after running for 19 hours, and I'm concerned that it will hit walltime before it finishes. I'm not the admin and it's the weekend, so I would like to try using a feature I discovered recently, shown in…
hepcat72
1 vote, 0 answers

Sorting, merging, and deduplicating large txt files on HPC?

I am looking for some advice. I am currently creating a k-mer database and want to merge, sort, and keep only the unique lines from 47 sample.txt.gz files, each 16 GB. What would be the fastest way to do this? I am currently running this: zcat…
user29589776
1 vote, 1 answer

Where is the slurmd binary located on an HPC system?

I want to find the path of the slurmd binary on an HPC system. I ran which slurmd, but got an error: /usr/bin/which: no slurmd…
Martin
1 vote, 0 answers

How to make a host file in SLURM with $SLURM_JOB_NODELIST

I have access to an HPC cluster with 40 cores per node. I have a batch file that runs a total of 35 codes, each in a separate folder. Each code is an OpenMP code that requires 4 cores. So how do I allocate resources such that each code gets 4…
1 vote, 1 answer

SLURM setting nodes to drain due to low socket-core-thread-cpu count

I have SLURM set up on a couple of workstations. They are of different kinds, but let's take one with a CPU that has 4 cores and no additional SMT, so 4 threads in total. lscpu shows me the following: $ lscpu Architecture: x86_64 CPU…
1 vote, 1 answer

slurmd: Invalid job credential

I'm having some problems with a test configuration of Slurm on my laptop. I'm trying to run four slurmd instances on one machine, the same machine that slurmctld runs on. I have a local munged running as user munge. slurmd and slurmctld…
lukas
1 vote, 0 answers

Slurm - GPU enforcement with cgroups

I am running Slurm 19.05 on a single machine (Ubuntu 18.04) to schedule GPU tasks. However, I am having trouble setting up GPU enforcement with cgroups. If I set ConstrainDevice=yes in my cgroup.conf file, tensorflow is not able to access my…
Jonas
1 vote, 1 answer

Ubuntu 18.10 and modify installed package - OpenMPI

I've installed openmpi-bin (OpenMPI 3.1) on Ubuntu 18.10. I also run Slurm on the same machine and would like to recompile or reconfigure my OpenMPI installation to support Slurm. If one installs OpenMPI from source, there is a setting…
Paer