[mpich-discuss] MPICH Connection to Self Rejected

Melissa Romanus melissa.romanus at rutgers.edu
Mon May 1 14:30:12 CDT 2017


Oh okay - thank you!

-Melissa

On Mon, May 1, 2017 at 3:29 PM Kenneth Raffenetti <raffenet at mcs.anl.gov>
wrote:

> Hi Melissa,
>
> Probably best to post this question to
> mvapich-discuss at cse.ohio-state.edu and go from there.
>
> Thanks,
> Ken
>
> On 05/01/2017 01:15 PM, Melissa Romanus wrote:
> > Hello,
> >
> > I am experiencing issues on the SDSC Comet system when using the Intel
> > compilers with mvapich2. The scheduling system on Comet is slurm. The
> > code appears to segfault inside MPI_Comm_dup, but before that it
> > seems to reject a connection request to "self" (i.e., same IP to
> > same IP).
> >
> > The modules loaded are:
> >
> > ```
> > $ module list
> >
> > Currently Loaded Modulefiles:
> >   1) intel/2013_sp1.2.144   2) mvapich2_ib/2.1        3) gnutools/2.69
> > ```
> >
> > I am attempting to use the `ib0` interface. In my job script, I am
> > launching 3 different applications. I am **not** using slurm
> > `--multi-prog`. I am instead using 3 different `srun` commands. My job
> > has to be launched this way.
> >
> > Using OpenMPI, I can set the MCA parameters to allow connections from
> > `self` at the byte-transfer layer, i.e., `OMPI_MCA_btl="self,openib"`
> > and specify to slurm that I would like to use `--mpi=pmi2`.
> >
> > I think the mvapich errors that I am experiencing stem from the fact
> > that the "self" connection is rejected (i.e., node to itself). Is there
> > a way to tell MVAPICH to allow the self connection? I think I want the
> > `--with-device=ch3:nemesis:ib` configure option in some capacity, but
> > I'm not sure if that would be enough to allow the connection from the
> > node to itself. Is the self connection inherently a TCP connection? Or do I
> > still need `--mpi=pmi2` for srun? Can I use srun or do I need to use
> > `mpiexec` explicitly?
> >
> > Could this also be the cause of the error described by this FAQ?
> >
> > https://wiki.mpich.org/mpich/index.php/Frequently_Asked_Questions#Q:_All_my_processes_get_rank_0
> >
> > Any help you can provide is greatly appreciated.
> >
> > -Melissa
> >
> >
> > _______________________________________________
> > discuss mailing list     discuss at mpich.org
> > To manage subscription options or unsubscribe:
> > https://lists.mpich.org/mailman/listinfo/discuss
> >
>
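
For reference, the launch pattern described in the quoted message (three separate `srun` commands rather than `--multi-prog`, with `--mpi=pmi2` and the Open MPI `OMPI_MCA_btl="self,openib"` setting) might look roughly like the sketch below. The application names, node and task counts, and job name are hypothetical placeholders, and running the three job steps concurrently in the background is an assumption; the original message does not say how the steps are ordered. The script only prints the planned commands unless `LAUNCH=1` is set, so it can be checked outside an allocation.

```shell
#!/bin/bash
#SBATCH --job-name=three-apps      # hypothetical job name
#SBATCH --nodes=3                  # assumed: one node per application
#SBATCH --ntasks-per-node=1
#SBATCH --time=00:10:00

# Open MPI variant from the message: allow the "self" BTL next to openib.
# This variable is read by Open MPI only; it has no effect on MVAPICH2.
export OMPI_MCA_btl="self,openib"

# Hypothetical application names; substitute the real binaries.
apps="./app_a ./app_b ./app_c"

# Build one srun command line per application (no --multi-prog).
plan=""
for app in $apps; do
    plan="${plan}srun --mpi=pmi2 -N 1 -n 1 ${app}
"
done

# Print the plan by default; set LAUNCH=1 inside a real allocation to run it.
printf '%s' "$plan"
if [ "${LAUNCH:-0}" = "1" ]; then
    for app in $apps; do
        # Assumption: each application runs as its own concurrent job step.
        srun --mpi=pmi2 -N 1 -n 1 "$app" &
    done
    wait   # block until all three job steps finish
fi
```

Submitted with `sbatch` and `LAUNCH=1` exported, each application would start as an independent job step. Whether MVAPICH2 accepts the same-node connection in this layout is exactly the open question in the thread, so this sketch documents the setup rather than a fix.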