How to Make a Cluster Computer | Part 06 - Installing Slurm on Computer Nodes/Worker Nodes

Published: 20 January 2024
on channel: Wisdom Center

This video covers the installation and configuration of the Slurm queuing system on Linux (Ubuntu here). You'll also learn how to set up Slurm on a cluster computer and submit jobs through it.
In this playlist, I talk about how to set up a cluster computing system using Ubuntu (Linux) and how to set up a queuing system for submitting calculations.

The commands described in this video are given below:
Installing Slurm on the Compute Nodes ###
export MUNGEUSER=1001
sudo groupadd -g $MUNGEUSER munge
sudo useradd -m -c "MUNGE Uid 'N' Gid Emporium" -d /var/lib/munge -u $MUNGEUSER -g munge -s /sbin/nologin munge
export SLURMUSER=1002
sudo groupadd -g $SLURMUSER slurm
sudo useradd -m -c "SLURM workload manager" -d /var/lib/slurm -u $SLURMUSER -g slurm -s /bin/bash slurm
sudo apt-get install -y munge
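
MUNGE and Slurm expect the munge and slurm accounts to have the same UID on every node, which is why the IDs are fixed at 1001 and 1002 above. A minimal sketch of a check that could be run on each node afterwards; the helper name check_uid is made up for this example and is not part of the video.

```shell
#!/usr/bin/env bash
# check_uid NAME EXPECTED_UID
# Succeeds only if account NAME exists and has exactly EXPECTED_UID.
check_uid() {
    local name="$1" expected="$2" actual
    # id -u prints the numeric UID, or fails if the account does not exist
    actual="$(id -u "$name" 2>/dev/null)" || return 1
    [ "$actual" = "$expected" ]
}

# Values taken from the commands above; report a problem instead of aborting.
check_uid munge 1001 || echo "munge: account missing or UID is not 1001"
check_uid slurm 1002 || echo "slurm: account missing or UID is not 1002"
```

If either line prints a message, it is probably easier to fix the account now than after the daemons have created files under the wrong owner.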

Now copy the munge authentication key from /nfs/slurm/ to /etc/munge/ on every node.
sudo scp /nfs/slurm/munge.key /etc/munge/
sudo chown munge:munge /etc/munge/munge.key
sudo chmod 400 /etc/munge/munge.key
sudo systemctl enable munge
sudo systemctl start munge
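
The munge daemon is strict about the key file, so it can help to confirm the mode and owner before looking elsewhere when the service fails to start. A rough sketch, assuming GNU stat as shipped with Ubuntu; key_ok is a hypothetical helper name.

```shell
# key_ok PATH OWNER
# Succeeds if PATH is a regular file with mode 400 owned by OWNER.
key_ok() {
    local path="$1" owner="$2"
    [ -f "$path" ] || return 1
    [ "$(stat -c '%a' "$path")" = "400" ] || return 1
    [ "$(stat -c '%U' "$path")" = "$owner" ]
}

# Run as root, since /etc/munge is normally not readable by ordinary users:
#   key_ok /etc/munge/munge.key munge && echo "key looks fine"
# Round-trip test of the running daemon on this node:
#   munge -n | unmunge
```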
Start Installation of SLURM (On all compute nodes)
sudo apt-get install slurm-wlm

Copy slurm.conf and slurmdbd.conf from the login node to /etc/slurm on each compute node.
sudo scp /nfs/slurm/slurm.conf /etc/slurm
sudo scp /nfs/slurm/slurmdbd.conf /etc/slurm
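
Slurm expects slurm.conf to be the same on every node, so a quick comparison against the shared copy can catch a stale file. A small sketch using cmp; same_file is a hypothetical helper, and the paths are the ones used in the commands above.

```shell
# same_file A B: succeeds only if both files exist and are byte-identical.
same_file() {
    [ -f "$1" ] && [ -f "$2" ] && cmp -s "$1" "$2"
}

# Compare each local config with the copy on the shared NFS directory.
for f in slurm.conf slurmdbd.conf; do
    if same_file "/nfs/slurm/$f" "/etc/slurm/$f"; then
        echo "$f: matches the shared copy"
    else
        echo "$f: missing or differs from /nfs/slurm/$f"
    fi
done
```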
On the compute nodes (log in as root, then run all of the commands below):
mkdir /var/spool/slurmd
chown slurm: /var/spool/slurmd
chmod 755 /var/spool/slurmd
mkdir /var/log/slurm/
touch /var/log/slurm/slurmd.log
chown -R slurm:slurm /var/log/slurm/slurmd.log
chmod 755 /var/log/slurm
mkdir /run/slurm
touch /run/slurm/slurmd.pid  # for the compute node
chown slurm /run/slurm
chown slurm:slurm /run/slurm
chmod -R 770 /run/slurm
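
Plain mkdir stops with an error if a directory already exists, for example when these steps are repeated on a node. As an alternative sketch, install -d creates a directory and sets its mode in one step and also succeeds when the directory is already there; ensure_dir is a hypothetical wrapper, not something from the video.

```shell
# ensure_dir PATH MODE [OWNER:GROUP]
# Creates PATH if needed and sets its mode; safe to run repeatedly.
# Changing the owner requires root, so that argument is optional.
ensure_dir() {
    local path="$1" mode="$2" owner="${3:-}"
    install -d -m "$mode" "$path" || return 1
    if [ -n "$owner" ]; then
        chown "$owner" "$path"
    fi
}

# As root on a compute node, mirroring the commands above:
#   ensure_dir /var/spool/slurmd 755 slurm:slurm
#   ensure_dir /var/log/slurm    755 slurm:slurm
#   ensure_dir /run/slurm        770 slurm:slurm
```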

nano /usr/lib/systemd/system/slurmd.service
echo CgroupMountpoint=/sys/fs/cgroup >> /etc/slurm/cgroup.conf
slurmd -C
systemctl enable slurmd.service
systemctl start slurmd.service
systemctl status slurmd.service
If the service is active, you are all good. Otherwise, reboot the node, reconnect the NFS share, and check the status of slurmd.service again. In any case, a reboot at this stage is necessary.
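
The slurmd -C command used above prints the hardware that slurmd detects on the node as a NodeName=... line in slurm.conf format, which can be compared with the node's entry in slurm.conf. A minimal sketch for pulling one field out of that line; node_field is a hypothetical helper and the sample line contains made-up values.

```shell
# node_field KEY: read text on stdin and print the value of the first
# KEY=value token found.
node_field() {
    tr ' ' '\n' | awk -F= -v k="$1" '$1 == k { print $2; exit }'
}

# Illustrative line in the format slurmd -C prints (values are made up):
sample='NodeName=node01 CPUs=8 Boards=1 SocketsPerBoard=1 CoresPerSocket=4 ThreadsPerCore=2 RealMemory=15885'
echo "$sample" | node_field CPUs
echo "$sample" | node_field RealMemory

# On a real node:
#   slurmd -C | node_field CPUs
```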
The following command can check the connectivity with the controller node:
scontrol ping
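
For scripted checks, the output of scontrol ping can be tested for an UP report. This is only a sketch: it assumes the "Slurmctld(primary) at <host> is UP" wording, which may differ between Slurm releases, and controller_is_up is a hypothetical helper name.

```shell
# controller_is_up: read `scontrol ping` output on stdin and succeed only
# if a line reports the primary controller as UP.
# Assumption: the wording "Slurmctld(primary) at <host> is UP".
controller_is_up() {
    grep -q '^Slurmctld(primary) at .* is UP'
}

# On a compute node:
#   scontrol ping | controller_is_up && echo "controller reachable"
```

Once the controller answers, running sinfo on the login node should list this compute node along with its state.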

For more informative videos about computational chemistry and other important software tools like Gaussian, MS Word, Excel, PowerPoint, EndNote, ChemDraw, etc., subscribe to my channel: / wisdomcenter
Facebook Page: / muhammadali.hashmi.33
Instagram: / hashmi_photography
Email: muhammad.hashmi [at sign] ue.edu.pk

