[mpich-discuss] MPICH Connection to Self Rejected

Melissa Romanus melissa.romanus at rutgers.edu
Mon May 1 13:15:45 CDT 2017


I am experiencing issues on the SDSC Comet system when using the Intel
compilers with MVAPICH2. The scheduler on Comet is Slurm. The code appears
to be seg-faulting inside MPI_Comm_dup, but prior to that, it seems to be
rejecting a connection request to "self" (i.e., from a node's IP to that
same IP).

The modules loaded are:

$ module list

Currently Loaded Modulefiles:
  1) intel/2013_sp1.2.144   2) mvapich2_ib/2.1        3) gnutools/2.69

I am attempting to use the `ib0` interface. In my job script, I launch 3
different applications. I am **not** using Slurm's `--multi-prog`; I am
instead using 3 separate `srun` commands. My job has to be launched this
way.
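For context, the job script is structured roughly like this (the application
names, node counts, and task counts below are placeholders, not my real
binaries):

```shell
#!/bin/bash
#SBATCH --nodes=3

# Hypothetical layout: three cooperating applications, each launched
# with its own srun inside a single allocation (not --multi-prog).
srun --nodes=1 --ntasks=16 ./app_a &
srun --nodes=1 --ntasks=16 ./app_b &
srun --nodes=1 --ntasks=16 ./app_c &
wait   # keep the allocation alive until all three steps finish
```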

Using Open MPI, I can work around this by setting the MCA parameter that
allows connections from `self` at the byte-transfer layer, i.e.,
`OMPI_MCA_btl="self,openib"`, and telling Slurm that I would like to use
`--mpi=pmi2`.
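Concretely, the working Open MPI setup looks like this (the binary name and
task count are placeholders):

```shell
# Enable the loopback ("self") and InfiniBand byte-transfer layers,
# then launch through Slurm with the PMI2 plugin.
export OMPI_MCA_btl="self,openib"
srun --mpi=pmi2 --ntasks=16 ./my_app
```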

I think the MVAPICH2 errors I am experiencing stem from the fact that the
"self" connection (node to itself) is rejected. Is there a way to tell
MVAPICH2 to allow the self connection? I think I want the command in some
capacity, but I'm not sure whether that would be enough to allow the
connection from the node to itself. Is the self connection inherently a
TCP connection? Do I still need `--mpi=pmi2` for srun? Can I use srun, or
do I need to invoke `mpiexec` explicitly?

Could this also be the cause of the error described by this FAQ?

Any help you can provide is greatly appreciated.
