[mpich-discuss] MPICH taking MUCH longer to run same job compared to srun
Anne Hammond
hammond at txcorp.com
Tue Mar 30 17:50:38 CDT 2021
We are running a small cluster using slurm 20.11.2.
Our application is built with mpich 3.3.2.
If we submit 2 batch jobs, each on its own single node, one
using srun and the other using mpiexec to run the application,
the mpiexec job takes much longer. This is an 8-process job
on a single 32-core node.
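For reference, the two batch scripts look roughly like this (a sketch;
the srun script is my reconstruction, while the mpiexec line matches
the process tree further down):

    #!/bin/bash
    #SBATCH -N 1
    #SBATCH -n 8
    # srun version of the job (sketch):
    srun -n 8 myapp -i electron.in

    #!/bin/bash
    #SBATCH -N 1
    #SBATCH -n 8
    # mpiexec version (matches the process tree below):
    mpiexec -n 8 -machinefile ./machinefile myapp -i electron.in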
Comparison times, for the same input file and the same
number of processes (8), are
srun 0.006366 s
mpiexec 22.21 s
Something is clearly wrong.
The MPICH FAQ says the default build of MPICH will work fine:
https://wiki.mpich.org/mpich/index.php/Frequently_Asked_Questions#Note_that_the_default_build_of_MPICH_will_work_fine_in_SLURM_environments._No_extra_steps_are_needed.
SLURM is an external process manager that uses MPICH's PMI interface as well.
Note that the default build of MPICH will work fine in SLURM environments. No extra steps are needed.
——————
If I look at the process tree (ps axjf) when the mpiexec job is running:
1 1427683 1427682 1427682 ? -1 Sl 0 0:00 slurmstepd: [5435.batch]
1427683 1427687 1427687 1427682 ? -1 S 0 0:00 \_ /bin/bash /var/spool/slurmd/job05435/slurm_script
1427687 1427702 1427687 1427682 ? -1 S 0 0:00 \_ mpiexec -n 8 -machinefile ./machinefile myapp -i electron.in
1427702 1427703 1427703 1427703 ? -1 Ssl 0 0:00 \_ /bin/srun --nodelist ne04 -N 1 -n 1 --input none /home/common/myapp/myapp-11.0-2021-
1427703 1427704 1427703 1427703 ? -1 S 0 0:00 \_ /bin/srun --nodelist ne04 -N 1 -n 1 --input none /home/common/myapp/myapp-11.0-2
1 1427712 1427711 1427711 ? -1 Sl 0 0:00 slurmstepd: [5435.0]
1427712 1427718 1427718 1427711 ? -1 S 0 0:00 \_ /home/common/myapp/myapp-11.0-2021-03-30-05.02/Contents/engine/bin/hydra_pmi_proxy --control
1427718 1427719 1427719 1427719 ? -1 Rsl 0 0:10 \_ myapp -i electron.in
1427718 1427720 1427720 1427720 ? -1 Rsl 0 0:11 \_ myapp -i electron.in
1427718 1427721 1427721 1427721 ? -1 Rsl 0 0:06 \_ myapp -i electron.in
1427718 1427722 1427722 1427722 ? -1 Rsl 0 0:06 \_ myapp -i electron.in
1427718 1427723 1427723 1427723 ? -1 Rsl 0 0:06 \_ myapp -i electron.in
1427718 1427724 1427724 1427724 ? -1 Rsl 0 0:10 \_ myapp -i electron.in
1427718 1427725 1427725 1427725 ? -1 Rsl 0 0:08 \_ myapp -i electron.in
1427718 1427726 1427726 1427726 ? -1 Rsl 0 0:11 \_ myapp -i electron.in
I see that mpiexec is calling /bin/srun with -N 1 -n 1. I don't know whether this matters,
since all 8 myapp processes are running under slurmstepd.
The problem is that the mpiexec job takes so much more time.
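One experiment I may try to isolate the launcher (my own guess, not
something from the FAQ): hydra lets the bootstrap launcher be
overridden, so on a single node the internal srun call can be bypassed
entirely:

    # Fork the proxies locally instead of bootstrapping through srun;
    # only suitable for a single-node run like this one.
    mpiexec -launcher fork -n 8 myapp -i electron.in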
mpich 3.3.2 says
Process Manager: pmi
Is pmi2 different from pmi?
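(For completeness: the "Process Manager: pmi" line above is from the
HYDRA build details, which I can print with

    mpiexec --version

assuming the mpiexec on PATH is the hydra one that ships with this
mpich build; the same output also lists the launchers hydra was built
with.)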
The issue could also be with slurm. In /etc/slurm/slurm.conf, we define MpiDefault to be pmi2:
MpiDefault=pmi2
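I can check which MPI plugin types this slurm build actually offers,
and override the default for a single run without touching slurm.conf:

    # List the MPI plugin types slurm supports on this system
    srun --mpi=list
    # Force a particular plugin for one run, e.g.:
    srun --mpi=pmi2 -n 8 myapp -i electron.in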
If anyone can point me where to look to solve this problem, I would be very thankful.
Thanks in advance,
Anne Hammond