[mpich-discuss] A possible bug in HYD_pmcd_pmi_allocate_kvs

Raffenetti, Kenneth J. raffenet at mcs.anl.gov
Thu Jun 6 08:25:21 CDT 2019


On 6/6/19 12:52 AM, Xiaopeng Duan wrote:
> Thank you, Ken.
> 
> We were having another problem with 3.3, and will try it once we have
> fixed our issue.
> 
> Just out of curiosity, why was a random number chosen for the fix
> instead of a hostname or address? It looks to me like a random number
> can still repeat (although very rarely), whereas hostnames and
> addresses should be unique within a system.

I had the same thought when looking back at this patch. Maybe Giuseppe 
can share why that was added. I'm fairly sure it can be safely removed.

Ken

> 
> Regards,
> Xiaopeng
> 
> On Wed, Jun 5, 2019, 8:29 AM Raffenetti, Kenneth J.
> <raffenet at mcs.anl.gov> wrote:
> 
>     We added a similar fix in https://github.com/pmodels/mpich/pull/2788.
>     This was included in the MPICH 3.3 release.
> 
>     Ken
> 
>     On 6/4/19 11:27 PM, Xiaopeng Duan via discuss wrote:
>      > Hi, MPICH experts,
>      >
>      > We are working on a dynamic master-worker flow using
>      > MPI_Comm_connect/MPI_Comm_accept. In some cases, when the total
>      > number of worker processes is large, two of them may get the same
>      > kvs_name, which confuses the internal group identifiers. We traced
>      > this to the naming convention in HYD_pmcd_pmi_allocate_kvs(), which
>      > considers only the process ID, while two processes on different
>      > machines may have the same PID. I tried adding the host name (from
>      > gethostname() in <unistd.h>) to the name, i.e.
>      > 'kvs_HOSTNAME_PID_pgid', and then everything works fine in our
>      > testing.
>      >
>      > So I'm wondering if this change is safe (we may need it for our
>      > release) and if it would go into the official MPICH release some
>      > time.
>      >
>      > Thank you very much.
>      >
>      > Xiaopeng
>      >
> 
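
For readers following this thread, the naming scheme proposed in the
original report ('kvs_HOSTNAME_PID_pgid') can be sketched roughly as
below. This is only an illustration under stated assumptions, not the
code in HYD_pmcd_pmi_allocate_kvs() and not the fix merged in the pull
request: the helper names format_kvs_name() and make_local_kvs_name(),
the buffer sizes, and the return-code convention are all hypothetical.

```c
#define _POSIX_C_SOURCE 200112L  /* expose gethostname() under strict -std modes */

#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not MPICH code): writes "kvs_HOST_PID_PGID" into buf.
 * Returns 0 on success, -1 if the result was truncated or formatting failed. */
static int format_kvs_name(char *buf, size_t len, const char *host,
                           long pid, int pgid)
{
    assert(buf != NULL && host != NULL);

    int n = snprintf(buf, len, "kvs_%s_%ld_%d", host, pid, pgid);
    return (n < 0 || (size_t) n >= len) ? -1 : 0;
}

/* Hypothetical helper: builds the name for the calling process by combining
 * the local host name with its PID, so that equal PIDs on different hosts
 * no longer produce the same string. */
static int make_local_kvs_name(char *buf, size_t len, int pgid)
{
    char host[256];

    if (gethostname(host, sizeof(host)) != 0)
        return -1;
    /* gethostname() may leave the buffer unterminated on truncation. */
    host[sizeof(host) - 1] = '\0';

    return format_kvs_name(buf, len, host, (long) getpid(), pgid);
}
```

With this sketch, two processes that share a PID but run on hosts with
different names get different strings. It still depends on host names
being unique across the machines involved, which is the assumption
raised in the reply above.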

