<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><div style="margin: 0.5em 0px; line-height: inherit; font-variant-ligatures: normal; orphans: 2; widows: 2; background-color: rgb(255, 255, 255); text-decoration-thickness: initial;" class=""><font color="#252525" face="sans-serif" class=""><span style="caret-color: rgb(37, 37, 37);" class="">We are running a small cluster using Slurm 20.11.2.</span></font></div><div style="margin: 0.5em 0px; line-height: inherit; font-variant-ligatures: normal; orphans: 2; widows: 2; background-color: rgb(255, 255, 255); text-decoration-thickness: initial;" class=""><font color="#252525" face="sans-serif" class=""><span style="caret-color: rgb(37, 37, 37);" class="">Our application is built with MPICH 3.3.2.</span></font></div><div style="margin: 0.5em 0px; line-height: inherit; font-variant-ligatures: normal; orphans: 2; widows: 2; background-color: rgb(255, 255, 255); text-decoration-thickness: initial;" class=""><span style="caret-color: rgb(37, 37, 37); color: rgb(37, 37, 37); font-family: sans-serif;" class=""> </span></div><div style="margin: 0.5em 0px; line-height: inherit; font-variant-ligatures: normal; orphans: 2; widows: 2; background-color: rgb(255, 255, 255); text-decoration-thickness: initial;" class=""><font color="#252525" face="sans-serif" class=""><span style="caret-color: rgb(37, 37, 37);" class="">If we submit 2 batch jobs, each on its own single node, one </span></font></div><div style="margin: 0.5em 0px; line-height: inherit; font-variant-ligatures: normal; orphans: 2; widows: 2; background-color: rgb(255, 255, 255); text-decoration-thickness: initial;" class=""><font color="#252525" face="sans-serif" class=""><span style="caret-color: rgb(37, 37, 37);" class="">using srun and the </span></font><span style="caret-color: rgb(37, 37, 37); color: rgb(37, 37, 37); 
font-family: sans-serif;" class="">other using mpiexec to run the application, </span></div><div style="margin: 0.5em 0px; line-height: inherit; font-variant-ligatures: normal; orphans: 2; widows: 2; background-color: rgb(255, 255, 255); text-decoration-thickness: initial;" class=""><span style="caret-color: rgb(37, 37, 37); color: rgb(37, 37, 37); font-family: sans-serif;" class="">the mpiexec job </span><span style="caret-color: rgb(37, 37, 37); color: rgb(37, 37, 37); font-family: sans-serif;" class="">takes </span><u style="caret-color: rgb(37, 37, 37); color: rgb(37, 37, 37); font-family: sans-serif;" class="">much</u><span style="caret-color: rgb(37, 37, 37); color: rgb(37, 37, 37); font-family: sans-serif;" class=""> longer. This is an 8-process job</span></div><div style="margin: 0.5em 0px; line-height: inherit; font-variant-ligatures: normal; orphans: 2; widows: 2; background-color: rgb(255, 255, 255); text-decoration-thickness: initial;" class=""><span style="caret-color: rgb(37, 37, 37); color: rgb(37, 37, 37); font-family: sans-serif;" class="">on a single 32-core node.</span></div><div style="margin: 0.5em 0px; line-height: inherit; font-variant-ligatures: normal; orphans: 2; widows: 2; background-color: rgb(255, 255, 255); text-decoration-thickness: initial;" class=""><font color="#252525" face="sans-serif" class=""><span style="caret-color: rgb(37, 37, 37);" class=""><br class=""></span></font></div><div style="margin: 0.5em 0px; line-height: inherit; font-variant-ligatures: normal; orphans: 2; widows: 2; background-color: rgb(255, 255, 255); text-decoration-thickness: initial;" class=""><font color="#252525" face="sans-serif" class=""><span style="caret-color: rgb(37, 37, 37);" class="">Comparison times, for the same </span></font><span style="caret-color: rgb(37, 37, 37); color: rgb(37, 37, 37); font-family: sans-serif;" class="">input file and the same </span></div><div style="margin: 0.5em 0px; line-height: inherit; font-variant-ligatures: normal; 
orphans: 2; widows: 2; background-color: rgb(255, 255, 255); text-decoration-thickness: initial;" class=""><span style="caret-color: rgb(37, 37, 37); color: rgb(37, 37, 37); font-family: sans-serif;" class="">number of processes (8), are:</span></div><div style="margin: 0.5em 0px; line-height: inherit; font-variant-ligatures: normal; orphans: 2; widows: 2; background-color: rgb(255, 255, 255); text-decoration-thickness: initial;" class="">srun: 0.006366 s<br class="">mpiexec: 22.21 s<br class=""></div><div style="margin: 0.5em 0px; line-height: inherit; font-variant-ligatures: normal; orphans: 2; widows: 2; background-color: rgb(255, 255, 255); text-decoration-thickness: initial;" class="">Something is clearly wrong.</div><div style="margin: 0.5em 0px; line-height: inherit; font-variant-ligatures: normal; orphans: 2; widows: 2; background-color: rgb(255, 255, 255); text-decoration-thickness: initial;" class=""><br class=""></div><div style="margin: 0.5em 0px; line-height: inherit; font-variant-ligatures: normal; orphans: 2; widows: 2; background-color: rgb(255, 255, 255); text-decoration-thickness: initial;" class="">The MPICH FAQ says that the default build of MPICH</div><div style="margin: 0.5em 0px; line-height: inherit; font-variant-ligatures: normal; orphans: 2; widows: 2; background-color: rgb(255, 255, 255); text-decoration-thickness: initial;" class="">will work fine in SLURM environments:</div><div style="margin: 0.5em 0px; line-height: inherit; font-variant-ligatures: normal; orphans: 2; widows: 2; background-color: rgb(255, 255, 255); text-decoration-thickness: initial;" class=""><font color="#252525" face="sans-serif" class=""><span style="caret-color: rgb(37, 37, 37); font-size: 14px;" class=""><a href="https://wiki.mpich.org/mpich/index.php/Frequently_Asked_Questions#Note_that_the_default_build_of_MPICH_will_work_fine_in_SLURM_environments._No_extra_steps_are_needed" 
class="">https://wiki.mpich.org/mpich/index.php/Frequently_Asked_Questions#Note_that_the_default_build_of_MPICH_will_work_fine_in_SLURM_environments._No_extra_steps_are_needed</a>.</span></font></div><div style="margin: 0.5em 0px; line-height: inherit; color: rgb(37, 37, 37); font-family: sans-serif; font-size: 14px; font-variant-ligatures: normal; orphans: 2; widows: 2; background-color: rgb(255, 255, 255); text-decoration-thickness: initial;" class=""><br class=""></div><div style="margin: 0.5em 0px; line-height: inherit; color: rgb(37, 37, 37); font-family: sans-serif; font-size: 14px; font-variant-ligatures: normal; orphans: 2; widows: 2; background-color: rgb(255, 255, 255); text-decoration-thickness: initial;" class="">SLURM is an external process manager that uses MPICH's PMI interface as well.</div><h5 style="background-image: none; background-color: rgb(255, 255, 255); margin: 0.3em 0px 0px; overflow: hidden; padding-top: 0.5em; padding-bottom: 0px; border-bottom-style: none; font-size: 14px; line-height: 1.6; font-family: sans-serif; font-variant-ligatures: normal; orphans: 2; widows: 2; text-decoration-thickness: initial; background-position: initial initial; background-repeat: initial initial;" class=""><span class="mw-headline" id="Note_that_the_default_build_of_MPICH_will_work_fine_in_SLURM_environments._No_extra_steps_are_needed.">Note that the default build of MPICH will work fine in SLURM environments. No extra steps are needed.</span></h5><div class=""><br class=""></div><div class="">——————</div><div class=""><br class=""></div><div class="">If I look at the process tree (ps axjf) when the mpiexec job is running:</div><div class=""><br class=""></div><div class=""><div style="margin: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo;" class=""><span style="font-variant-ligatures: no-common-ligatures" class=""> 1 1427683 1427682 1427682 ? 
-1 Sl 0 0:00 slurmstepd: [5435.batch]</span></div><div style="margin: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">1427683 1427687 1427687 1427682 ? -1 S 0 0:00 \_ /bin/bash /var/spool/slurmd/job05435/slurm_script</span></div><div style="margin: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">1427687 1427702 1427687 1427682 ? -1 S 0 0:00 \_ mpiexec -n 8 -machinefile ./machinefile myapp -i electron.in</span></div><div style="margin: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">1427702 1427703 1427703 1427703 ? -1 Ssl 0 0:00 \_ /bin/srun --nodelist ne04 -N 1 -n 1 --input none /home/common/myapp/myapp-11.0-2021-</span></div><div style="margin: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">1427703 1427704 1427703 1427703 ? -1 S 0 0:00 \_ /bin/srun --nodelist ne04 -N 1 -n 1 --input none /home/common/myapp/myapp-11.0-2</span></div><div style="margin: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo;" class=""><span style="font-variant-ligatures: no-common-ligatures" class=""> 1 1427712 1427711 1427711 ? -1 Sl 0 0:00 slurmstepd: [5435.0]</span></div><div style="margin: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">1427712 1427718 1427718 1427711 ? 
-1 S 0 0:00 \_ /home/common/myapp/myapp-11.0-2021-03-30-05.02/Contents/engine/bin/hydra_pmi_proxy --control</span></div><div style="margin: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">1427718 1427719 1427719 1427719 ? -1 Rsl 0 0:10 \_ myapp -i electron.in</span></div><div style="margin: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">1427718 1427720 1427720 1427720 ? -1 Rsl 0 0:11 \_ myapp -i electron.in</span></div><div style="margin: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">1427718 1427721 1427721 1427721 ? -1 Rsl 0 0:06 \_ myapp -i electron.in</span></div><div style="margin: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">1427718 1427722 1427722 1427722 ? -1 Rsl 0 0:06 \_ myapp -i electron.in</span></div><div style="margin: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">1427718 1427723 1427723 1427723 ? -1 Rsl 0 0:06 \_ myapp -i electron.in</span></div><div style="margin: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">1427718 1427724 1427724 1427724 ? -1 Rsl 0 0:10 \_ myapp -i electron.in</span></div><div style="margin: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">1427718 1427725 1427725 1427725 ? 
-1 Rsl 0 0:08 \_ myapp -i electron.in</span></div><div style="margin: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">1427718 1427726 1427726 1427726 ? -1 Rsl 0 0:11 \_ myapp -i electron.in</span></div></div><div style="margin: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo;" class=""><span style="font-variant-ligatures: no-common-ligatures" class=""><br class=""></span></div><div style="margin: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">I see that mpiexec is calling /bin/srun with -N 1 -n 1. I don’t know if this matters, because</span></div><div style="margin: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">there are 8 myapp processes running under slurmstepd.</span></div><div style="margin: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo;" class=""><span style="font-variant-ligatures: no-common-ligatures" class=""><br class=""></span></div><div style="margin: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">The problem is that mpiexec is taking so much more time.</span></div><div style="margin: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo;" class=""><span style="font-variant-ligatures: no-common-ligatures" class=""><br class=""></span></div><div style="margin: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo;" class="">mpich 3.3.2 says</div><div style="margin: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo;" class=""><div 
style="margin: 0px; font-stretch: normal; line-height: normal;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">Process Manager: pmi</span></div><div style="margin: 0px; font-stretch: normal; line-height: normal;" class=""><span style="font-variant-ligatures: no-common-ligatures" class=""><br class=""></span></div><div style="margin: 0px; font-stretch: normal; line-height: normal;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">Is pmi2 different from pmi? </span></div></div><div style="margin: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo;" class=""><span style="font-variant-ligatures: no-common-ligatures" class=""><br class=""></span></div><div style="margin: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo;" class="">The issue could also be with slurm. In /etc/slurm/slurm.conf, we define MpiDefault to be pmi2:</div><div style="margin: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo;" class=""><div style="margin: 0px; font-stretch: normal; line-height: normal;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">MpiDefault=pmi2</span></div><div class=""><br class=""></div></div><div style="margin: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">If anyone can point me where to look to solve this problem, I would be very thankful.</span></div><div style="margin: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo;" class=""><span style="font-variant-ligatures: no-common-ligatures" class=""><br class=""></span></div><div style="margin: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">Thanks in advance,</span></div><div 
style="margin: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo;" class=""><span style="font-variant-ligatures: no-common-ligatures" class="">Anne Hammond</span></div><div style="margin: 0px; font-stretch: normal; font-size: 11px; line-height: normal; font-family: Menlo;" class=""><span style="font-variant-ligatures: no-common-ligatures" class=""><br class=""></span></div><div class=""><span style="font-variant-ligatures: no-common-ligatures" class=""><br class=""></span></div><div class=""><br class=""></div><div class="">
</div>
<br class=""></body></html>