How to Make a Cluster Computer | Part 05 - Installing Slurm on Login Node/Head Node

Published: 20 January 2024
Channel: Wisdom Center
Views: 6,195
Likes: 66

This video covers the installation and configuration of the Slurm queuing system on Linux (Ubuntu here). You'll also learn how to set up Slurm on a cluster computer and submit jobs through it.
In this playlist, I talk about how to set up a cluster computing system using Ubuntu (Linux) and how to set up a queuing system for submitting calculations.

The commands described in this video are given below:
### Installing SLURM ###
Installing Slurm on the Login Node

export MUNGEUSER=1001
sudo groupadd -g $MUNGEUSER munge
sudo useradd -m -c "MUNGE Uid 'N' Gid Emporium" -d /var/lib/munge -u $MUNGEUSER -g munge -s /sbin/nologin munge
export SLURMUSER=1002
sudo groupadd -g $SLURMUSER slurm
sudo useradd -m -c "SLURM workload manager" -d /var/lib/slurm -u $SLURMUSER -g slurm -s /bin/bash slurm
sudo apt-get install -y munge
sudo chown -R munge: /etc/munge/ /var/log/munge/ /var/lib/munge/ /run/munge/
sudo chmod 0700 /etc/munge/ /var/log/munge/ /var/lib/munge/ /run/munge/
sudo scp /etc/munge/munge.key /nfs/slurm/
sudo systemctl enable munge
sudo systemctl start munge
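Before moving on, it can help to confirm that munge is actually working. The sketch below round-trips an empty credential through the local daemon with munge -n and unmunge. The function name check_munge is my own choice for illustration and is not part of the video.

```shell
# Round-trip an empty credential through the local munged daemon.
# check_munge is a hypothetical helper name, not a munge command.
check_munge() {
    # Bail out early if the munge tools are not on the PATH.
    if ! command -v munge >/dev/null 2>&1 || ! command -v unmunge >/dev/null 2>&1; then
        echo "munge tools not installed" >&2
        return 2
    fi
    # munge -n encodes a credential with no payload; unmunge decodes it.
    if munge -n | unmunge >/dev/null; then
        echo "munge OK"
    else
        echo "munge FAILED" >&2
        return 1
    fi
}
```

Run check_munge on the login node after munge has been started. A failure here usually points at the key file or the daemon rather than at Slurm itself.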
Install slurm and associated components on slurm controller (Login) node
sudo apt-get install mariadb-server
sudo apt-get install slurmdbd
sudo apt-get install slurm-wlm

Create and configure the slurm_acct_db database: (Login Node)
sudo -i (log in as root; the su command may also be used)
mysql
grant all on slurm_acct_db.* TO 'slurm'@'localhost' identified by 'hashmi12' with grant option;
create database slurm_acct_db;
exit
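If you prefer not to type the SQL interactively, the same database setup can be sketched as a small function that prints the statements, so they can be piped into mysql. This assumes MariaDB (as installed above) and a password that contains no single quotes. The function name slurm_db_sql and the IF NOT EXISTS / FLUSH PRIVILEGES additions are mine, not from the video.

```shell
# Print the SQL that creates the accounting database and the slurm DB user.
# $1 = database password for the 'slurm' user (must not contain single quotes).
# slurm_db_sql is a hypothetical helper name.
slurm_db_sql() {
    pass=$1
    cat <<EOF
CREATE DATABASE IF NOT EXISTS slurm_acct_db;
GRANT ALL ON slurm_acct_db.* TO 'slurm'@'localhost' IDENTIFIED BY '${pass}' WITH GRANT OPTION;
FLUSH PRIVILEGES;
EOF
}
```

As root, something like slurm_db_sql 'your-password' | mysql would apply it. Whatever password you pass must be the same one you put in StoragePass in slurmdbd.conf.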
sudo mkdir -p /etc/slurm-llnl
sudo nano /etc/slurm-llnl/slurmdbd.conf (add the lines below to the file and save)

AuthType=auth/munge
DbdAddr=localhost
#DbdHost=master0
DbdHost=localhost
DbdPort=6819
SlurmUser=slurm
DebugLevel=4
LogFile=/var/log/slurm/slurmdbd.log
PidFile=/run/slurm/slurmdbd.pid
StorageType=accounting_storage/mysql
StorageHost=localhost
StorageLoc=slurm_acct_db
StoragePass=hashmi12
StorageUser=slurm
###Setting database purge parameters
PurgeEventAfter=12months
PurgeJobAfter=12months
PurgeResvAfter=2months
PurgeStepAfter=2months
PurgeSuspendAfter=1month
PurgeTXNAfter=12months
PurgeUsageAfter=12months

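Now give the slurm user ownership of this file and restrict access to that user. Use the directory in which you created slurmdbd.conf above (/etc/slurm-llnl here; some Ubuntu releases use /etc/slurm instead):
chown slurm:slurm /etc/slurm-llnl/slurmdbd.conf
chmod 600 /etc/slurm-llnl/slurmdbd.conf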
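Since slurmdbd.conf holds the database password, it is worth double-checking the permissions after changing them. Here is a minimal sketch that reads the mode with GNU stat (the default on Ubuntu). The function name check_conf_mode is hypothetical.

```shell
# Report whether a config file has mode 600 (owner read/write only).
# $1 = path to the file. check_conf_mode is a hypothetical helper name.
check_conf_mode() {
    file=$1
    # stat -c '%a' prints the octal permission bits (GNU coreutils).
    mode=$(stat -c '%a' "$file") || return 2
    if [ "$mode" = "600" ]; then
        echo "$file: mode OK"
    else
        echo "$file: mode is $mode, expected 600" >&2
        return 1
    fi
}
```

Call it with the full path of the slurmdbd.conf you created above. To see the owner as well, stat -c '%U:%G' on the same path should print slurm:slurm.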

Configuration file /etc/slurm-llnl/slurm.conf:
Visit the website (https://slurm.schedmd.com/configurato...) to generate a Slurm configuration file, then save the result in:
sudo nano /etc/slurm-llnl/slurm.conf

Allow the Slurm ports through the firewall (by default 6817 for slurmctld, 6818 for slurmd and 6819 for slurmdbd):
sudo ufw allow 6817
sudo ufw allow 6818
sudo ufw allow 6819
On the master node (log in as root, then run all of the commands below):
mkdir /var/spool/slurmctld
chown slurm:slurm /var/spool/slurmctld
chmod 755 /var/spool/slurmctld
mkdir /var/log/slurm
touch /var/log/slurm/slurmctld.log
touch /var/log/slurm/slurm_jobacct.log /var/log/slurm/slurm_jobcomp.log
chown -R slurm:slurm /var/log/slurm/
chmod 755 /var/log/slurm

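# Find the service unit files, then edit each one to change the location of its PID file (it should match the PID file path in the Slurm configuration, e.g. PidFile=/run/slurm/slurmdbd.pid in slurmdbd.conf above)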
find / -name "slurmctld.service"
find / -name "slurmd.service"
find / -name "slurmdbd.service"

nano /usr/lib/systemd/system/slurmctld.service
nano /usr/lib/systemd/system/slurmdbd.service
nano /usr/lib/systemd/system/slurmd.service
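After editing the unit files, you may want to check that the PIDFile= line in a unit really matches the PID file path in the corresponding Slurm config. The sketch below compares the two with sed. The function name pid_paths_match is hypothetical, and it assumes each key appears at the start of a line without surrounding spaces.

```shell
# Compare PIDFile= in a systemd unit with a PID-file key in a Slurm config.
# $1 = unit file, $2 = config file, $3 = config key (e.g. PidFile).
# pid_paths_match is a hypothetical helper name.
pid_paths_match() {
    # Take the first matching line from each file and strip the key.
    unit_pid=$(sed -n 's/^PIDFile=//p' "$1" | head -n 1)
    conf_pid=$(sed -n "s/^$3=//p" "$2" | head -n 1)
    if [ -n "$unit_pid" ] && [ "$unit_pid" = "$conf_pid" ]; then
        echo "match: $unit_pid"
    else
        echo "mismatch: unit='$unit_pid' config='$conf_pid'"
        return 1
    fi
}
```

For slurmdbd the key is PidFile in slurmdbd.conf. For slurmctld and slurmd the keys in slurm.conf are SlurmctldPidFile and SlurmdPidFile.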
Run the following as root:
echo "CgroupMountpoint=/sys/fs/cgroup" >> /etc/slurm-llnl/cgroup.conf

slurmd -C
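The slurmd -C command prints the hardware configuration of the node it runs on, in slurm.conf format, followed by an uptime line. Only the NodeName line is useful for slurm.conf, so a small filter like the one below can pick it out. The function name node_line is my own.

```shell
# Keep only the NodeName=... line from `slurmd -C` output read on stdin.
# node_line is a hypothetical helper name.
node_line() {
    grep '^NodeName='
}
```

Running slurmd -C | node_line gives a line you can compare against (or paste into) the node definitions in slurm.conf.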
Start SLURM Services on Login Node
systemctl daemon-reload
systemctl enable slurmdbd
systemctl start slurmdbd
systemctl enable slurmctld
systemctl start slurmctld
At this point, check the status of the started services:
systemctl status slurmdbd
systemctl status slurmctld
If any of the services is not active, try rebooting the machine and then check again, which may resolve it.
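To check several services in one go, a short loop around systemctl is-active can be used. This is only a sketch, and the function name report_services is hypothetical.

```shell
# Print whether each named systemd service is active.
# Returns non-zero if at least one of them is not.
# report_services is a hypothetical helper name.
report_services() {
    rc=0
    for svc in "$@"; do
        # is-active --quiet prints nothing and reports via exit status.
        if systemctl is-active --quiet "$svc"; then
            echo "$svc: active"
        else
            echo "$svc: NOT active"
            rc=1
        fi
    done
    return $rc
}
```

For example, report_services munge slurmdbd slurmctld. For a service that is not active, journalctl -u followed by the service name shows its recent log messages, which is usually more informative than the status summary.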

For more informative videos about computational chemistry and other important software tools like Gaussian, MS Word, Excel, PowerPoint, EndNote, ChemDraw, etc., subscribe to my channel: / wisdomcenter
Facebook Page: / muhammadali.hashmi.33
Instagram: / hashmi_photography
Email: muhammad.hashmi [at sign] ue.edu.pk

