[mpich-discuss] wrong number of processes on host

bruno.guerraz at orange.com bruno.guerraz at orange.com
Thu Nov 17 08:35:52 CST 2022


Is this log OK for you?

[pgid: 0] got PMI command: cmd=barrier_in
[pgid: 0] got PMI command: cmd=put 
sharedFilename[0]=/dev/shm/mpich_shar_tmpAOPuur
[pgid: 0] got PMI command: cmd=barrier_in
[pgid: 0] got PMI command: cmd=put 
sharedFilename[4]=/dev/shm/mpich_shar_tmp76zj4w
[pgid: 0] got PMI command: cmd=barrier_in
PMI response to fd 13 pid 4: cmd=keyval_cache 
sharedFilename[2]=/dev/shm/mpich_shar_tmppFa2Wf 
sharedFilename[0]=/dev/shm/mpich_shar_tmpAOPuur 
sharedFilename[4]=/dev/shm/mpich_shar_tmp76zj4w
PMI response to fd 12 pid 4: cmd=keyval_cache 
sharedFilename[2]=/dev/shm/mpich_shar_tmppFa2Wf 
sharedFilename[0]=/dev/shm/mpich_shar_tmpAOPuur 
sharedFilename[4]=/dev/shm/mpich_shar_tmp76zj4w
PMI response to fd 14 pid 4: cmd=keyval_cache 
sharedFilename[2]=/dev/shm/mpich_shar_tmppFa2Wf 
sharedFilename[0]=/dev/shm/mpich_shar_tmpAOPuur 
sharedFilename[4]=/dev/shm/mpich_shar_tmp76zj4w
PMI response to fd 13 pid 4: cmd=barrier_out
PMI response to fd 12 pid 4: cmd=barrier_out
PMI response to fd 14 pid 4: cmd=barrier_out
[pgid: 0] got PMI command: cmd=put 
P2-businesscard=description#l-neobi-1$port#39551$ifname#10.193.21.24$ 
P3-businesscard=description#l-neobi-1$port#49083$ifname#10.193.21.24$
[pgid: 0] got PMI command: cmd=barrier_in
[pgid: 0] got PMI command: cmd=put 
P0-businesscard=description#l-neobi-4$port#57647$ifname#10.193.21.65$ 
P1-businesscard=description#l-neobi-4$port#46287$ifname#10.193.21.65$
[pgid: 0] got PMI command: cmd=barrier_in
[pgid: 0] got PMI command: cmd=put 
P4-businesscard=description#l-neobi-3$port#48597$ifname#10.193.21.26$ 
P5-businesscard=description#l-neobi-3$port#47573$ifname#10.193.21.26$
[pgid: 0] got PMI command: cmd=barrier_in
PMI response to fd 13 pid 5: cmd=keyval_cache 
P2-businesscard=description#l-neobi-1$port#39551$ifname#10.193.21.24$ 
P3-businesscard=description#l-neobi-1$port#49083$ifname#10.193.21.24$ 
P0-businesscard=description#l-neobi-4$port#57647$ifname#10.193.21.65$ 
P1-businesscard=description#l-neobi-4$port#46287$ifname#10.193.21.65$ 
P4-businesscard=description#l-neobi-3$port#48597$ifname#10.193.21.26$ 
P5-businesscard=description#l-neobi-3$port#47573$ifname#10.193.21.26$
PMI response to fd 12 pid 5: cmd=keyval_cache 
P2-businesscard=description#l-neobi-1$port#39551$ifname#10.193.21.24$ 
P3-businesscard=description#l-neobi-1$port#49083$ifname#10.193.21.24$ 
P0-businesscard=description#l-neobi-4$port#57647$ifname#10.193.21.65$ 
P1-businesscard=description#l-neobi-4$port#46287$ifname#10.193.21.65$ 
P4-businesscard=description#l-neobi-3$port#48597$ifname#10.193.21.26$ 
P5-businesscard=description#l-neobi-3$port#47573$ifname#10.193.21.26$
PMI response to fd 14 pid 5: cmd=keyval_cache 
P2-businesscard=description#l-neobi-1$port#39551$ifname#10.193.21.24$ 
P3-businesscard=description#l-neobi-1$port#49083$ifname#10.193.21.24$ 
P0-businesscard=description#l-neobi-4$port#57647$ifname#10.193.21.65$ 
P1-businesscard=description#l-neobi-4$port#46287$ifname#10.193.21.65$ 
P4-businesscard=description#l-neobi-3$port#48597$ifname#10.193.21.26$ 
P5-businesscard=description#l-neobi-3$port#47573$ifname#10.193.21.26$
PMI response to fd 13 pid 5: cmd=barrier_out
PMI response to fd 12 pid 5: cmd=barrier_out
PMI response to fd 14 pid 5: cmd=barrier_out
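
One way to read the placement out of this log is to count the business cards published per host. Below is a minimal Python sketch, assuming the `P<rank>-businesscard=description#<host>$...` layout seen above holds in general and that the description field names the host the rank runs on. The function name `ranks_per_host` and the `SAMPLE` string are illustrative only, not part of MPICH.

```python
import re
from collections import defaultdict

# Matches entries such as
#   P2-businesscard=description#l-neobi-1$port#39551$ifname#10.193.21.24$
# Assumption: the host name is the text between "description#" and the next "$".
CARD_RE = re.compile(r"P(\d+)-businesscard=description#([^$]+)\$")


def ranks_per_host(log_text):
    """Map each host name to the sorted list of ranks whose card names it."""
    hosts = defaultdict(set)
    for rank, host in CARD_RE.findall(log_text):
        # A set removes the repeats that come from the keyval_cache responses.
        hosts[host].add(int(rank))
    return {host: sorted(ranks) for host, ranks in hosts.items()}


# The six business cards copied from the log above.
SAMPLE = """
P2-businesscard=description#l-neobi-1$port#39551$ifname#10.193.21.24$
P3-businesscard=description#l-neobi-1$port#49083$ifname#10.193.21.24$
P0-businesscard=description#l-neobi-4$port#57647$ifname#10.193.21.65$
P1-businesscard=description#l-neobi-4$port#46287$ifname#10.193.21.65$
P4-businesscard=description#l-neobi-3$port#48597$ifname#10.193.21.26$
P5-businesscard=description#l-neobi-3$port#47573$ifname#10.193.21.26$
"""

for host, ranks in sorted(ranks_per_host(SAMPLE).items()):
    print(f"{host}: {len(ranks)} process(es), ranks {ranks}")
```

Under those assumptions, the cards in this log give two ranks per host (0 and 1 on l-neobi-4, 2 and 3 on l-neobi-1, 4 and 5 on l-neobi-3), so this particular run appears to be one where the placement came out as expected.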

On 17/11/2022 15:30, Zhou, Hui wrote:
> That is strange. Try passing the -v option (i.e. mpiexec -v ...) to
> obtain a console log when that happens.
>
> -- 
> Hui
> ------------------------------------------------------------------------
> *From:* bruno via discuss <discuss at mpich.org>
> *Sent:* Thursday, November 17, 2022 8:11 AM
> *To:* discuss at mpich.org <discuss at mpich.org>
> *Cc:* bruno.guerraz at orange.com <bruno.guerraz at orange.com>
> *Subject:* [mpich-discuss] wrong number of processes on host
> Hi, I am using MPICH on a Hadoop cluster with YARN. It is not a smooth
> integration, but it is working.
> Following an old post, I am using the manual launcher and the option
> -disable-hostname-propagation
> (https://lists.mpich.org/mailman/htdig/devel/2016-July/000717.html)
>
> The command line to launch my binary is:
>
> mpiexec -launcher manual -disable-hostname-propagation -n 6 -f ./hosts
> /path/to/my/bin
>
> And the file hosts contains
>
> host1:2
> host2:2
> host3:2
>
> I expect to have 2 processes on each host, but most of the time it
> launches 3 processes on host1, 1 on host2 and 2 on host3.
> Any ideas?
>
> Bruno
>
> _________________________________________________________________________________________________________________________
>
> This message and its attachments may contain confidential or 
> privileged information that may be protected by law;
> they should not be distributed, used or copied without authorisation.
> If you have received this email in error, please notify the sender and 
> delete this message and its attachments.
> As emails may be altered, Orange is not liable for messages that have 
> been modified, changed or falsified.
> Thank you.
>

