[mpich-discuss] How does MPICH spawn processes, and is there a way to configure it?

Lu, Huiwei huiweilu at mcs.anl.gov
Mon Jun 2 16:37:19 CDT 2014


Thanks for reporting this, Scott.

—
Huiwei

On Jun 2, 2014, at 4:29 PM, Scott O'Malley <scottyomalley at gmail.com> wrote:

> After extensive research, it turns out that my slave nodes weren't configured correctly. Only the master node was able to access the slaves, so when mpiexec told a slave to spawn a new process, an error was thrown. The error didn't appear with the original parameter I was passing because that forced the master node to distribute all tasks. It was an odd one to track down, but hopefully anyone else who hits this problem can find this information here.
> 
> 
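A minimal sketch of how the root cause described above could be checked: it builds one nested ssh command per ordered pair of nodes, so that connectivity between the slaves is tested as well as from the master. The host names and the helper names (`pair_check_command`, `all_pair_commands`, `run_checks`) are hypothetical and not from the thread. `BatchMode=yes` and `ConnectTimeout` are standard OpenSSH options; batch mode makes ssh fail instead of prompting.

```python
import itertools
import subprocess

# Hypothetical host names; replace with the entries from your own host file.
HOSTS = ["master", "slave1", "slave2", "slave3"]

# Fail instead of prompting, and give up after 5 seconds.
SSH_OPTS = ["-o", "BatchMode=yes", "-o", "ConnectTimeout=5"]


def pair_check_command(src, dst):
    """Build an ssh command that logs in to src and, from there, to dst."""
    inner = " ".join(["ssh", *SSH_OPTS, dst, "true"])
    return ["ssh", *SSH_OPTS, src, inner]


def all_pair_commands(hosts):
    """Return one command per ordered (src, dst) pair with src != dst."""
    return [pair_check_command(s, d)
            for s, d in itertools.permutations(hosts, 2)]


def run_checks(hosts):
    """Run every check and return the (src, dst) pairs whose command failed.

    A pair is also reported when the first hop (to src) fails.
    """
    failed = []
    for src, dst in itertools.permutations(hosts, 2):
        result = subprocess.run(pair_check_command(src, dst),
                                capture_output=True, text=True)
        if result.returncode != 0:
            failed.append((src, dst))
    return failed
```

Calling `run_checks(HOSTS)` from the master node should return an empty list when every node can reach every other node without a prompt. In a setup like the one described above, pairs such as `("slave1", "slave2")` would be expected to show up in the result.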
> On 2 June 2014 22:09, Lu, Huiwei <huiweilu at mcs.anl.gov> wrote:
> I am not sure what tree mode is or why you need to disable it. In general, users do not need to specify which mode Hydra should use to run MPI. You can simply run it using: mpiexec -f hosts -n 4 ./app, as explained in http://wiki.mpich.org/mpich/index.php/Using_the_Hydra_Process_Manager#Quick_Start
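To make the quick-start invocation concrete, here is a small sketch that renders a host file and assembles the matching mpiexec command. It assumes the `host:count` line format described in the Hydra documentation; the node names, the slot counts, and the helper names (`render_hostfile`, `mpiexec_command`) are hypothetical.

```python
import shlex


def render_hostfile(slots):
    """Return host file text with one 'host:count' line per node."""
    return "".join(f"{host}:{count}\n" for host, count in slots.items())


def mpiexec_command(hostfile, nprocs, app):
    """Build the argument list for the quick-start style invocation."""
    return ["mpiexec", "-f", hostfile, "-n", str(nprocs), app]


# Hypothetical cluster: one process per node on four nodes.
slots = {"master": 1, "slave1": 1, "slave2": 1, "slave3": 1}

hostfile_text = render_hostfile(slots)
command = mpiexec_command("hosts", sum(slots.values()), "./app")

print(hostfile_text, end="")
print(shlex.join(command))
```

Saving the rendered text as `hosts` and running the printed command from the directory that contains `./app` corresponds to the invocation quoted above.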
> 
> To understand your failed run, can you give us your host file and the command line you use to launch your application?
> 
> Huiwei
> 
> On May 29, 2014, at 9:16 AM, Scott O'Malley <scottyomalley at gmail.com> wrote:
> 
> > I've looked through the Hydra docs, but I haven't found a flag or anything else that would let me disable spawning in tree mode, if MPICH even does that.
> >
> > I was able to more or less work around the error in Open MPI by specifying the "-mca plm_rsh_no_tree_spawn" flag.
> >
> >
> > On 29 May 2014 14:54, Lu, Huiwei <huiweilu at mcs.anl.gov> wrote:
> > Hi Scott,
> >
> > Check https://wiki.mpich.org/mpich/index.php/Using_the_Hydra_Process_Manager to see if it is what you want.
> >
> > Best,
> > —
> > Huiwei
> >
> > On May 29, 2014, at 6:37 AM, Scott O'Malley <scottyomalley at gmail.com> wrote:
> >
> > > I have recently built a cluster of Raspberry Pi computers. When it was made up of a master node and 2 slave nodes, there was no issue calling mpiexec across the cluster. When I add an additional node, bringing the size to 1 master and 3 slave nodes, I get the error "Host File Verification Failed". I've tested password-less SSH on all nodes and it's working correctly.
> > >
> > > I've done some googling of the error and my setup, and the only mention I've found of it is at this link relating to Open MPI - http://www.open-mpi.org/community/lists/users/2013/11/22940.php
> > >
> > > Is there a similar setup for MPICH and if so can someone link me to the correct documentation page?
> > >
> > > Cheers
> > >
> > > Scott
> > > _______________________________________________
> > > discuss mailing list     discuss at mpich.org
> > > To manage subscription options or unsubscribe:
> > > https://lists.mpich.org/mailman/listinfo/discuss
> >
> >
> >
> >
> > --
> > - Scott
> 
> 
> 
> 
> -- 
> - Scott



