[mpich-discuss] A possible bug in HYD_pmcd_pmi_allocate_kvs

Raffenetti, Kenneth J. raffenet at mcs.anl.gov
Wed Jun 5 08:29:09 CDT 2019


We added a similar fix in https://github.com/pmodels/mpich/pull/2788. 
This was included in the MPICH 3.3 release.

Ken

On 6/4/19 11:27 PM, Xiaopeng Duan via discuss wrote:
> Hi, MPICH experts,
> 
> We are working on a dynamic master-worker flow using 
> MPI_Comm_connect/MPI_Comm_accept. In some cases, when the total number 
> of worker processes is large, two of them may get the same kvs_name, 
> which confuses the internal group identifiers. We traced this to the 
> naming convention in HYD_pmcd_pmi_allocate_kvs(), which considers only 
> the process ID, while two processes on different machines may have the 
> same PID. I tried adding the host name (from gethostname() in 
> <unistd.h>) to the name, i.e. 'kvs_HOSTNAME_PID_pgid', and everything 
> then works fine in our testing.
> 
> So I'm wondering whether this change is safe (we may need it for our 
> release) and whether it could go into an official MPICH release at some 
> point.
> 
> Thank you very much.
> 
> Xiaopeng
> 
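
For readers following the thread, the naming scheme described in the quoted message can be sketched roughly as below. This is only an illustration of the 'kvs_HOSTNAME_PID_pgid' idea under stated assumptions, not the code in HYD_pmcd_pmi_allocate_kvs() and not the change made in the pull request above. The helper name make_kvs_name and the HOST_BUF_LEN bound are hypothetical.

```c
#define _POSIX_C_SOURCE 200112L  /* expose gethostname() under strict -std modes */

#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define HOST_BUF_LEN 256  /* assumed upper bound for this sketch */

/* Hypothetical helper, not the MPICH implementation: writes
 * "kvs_<hostname>_<pid>_<pgid>" into buf. Returns 0 on success,
 * -1 if the name did not fit into len bytes. */
static int make_kvs_name(char *buf, size_t len, int pgid)
{
    char host[HOST_BUF_LEN];
    int n;

    assert(buf != NULL && len > 0);

    /* Fall back to a placeholder if the host name cannot be read. */
    if (gethostname(host, sizeof(host)) != 0)
        strcpy(host, "unknown");
    /* gethostname() may leave the result unterminated on truncation. */
    host[sizeof(host) - 1] = '\0';

    n = snprintf(buf, len, "kvs_%s_%d_%d", host, (int) getpid(), pgid);
    return (n < 0 || (size_t) n >= len) ? -1 : 0;
}
```

Two processes with the same PID on different hosts then get different names, provided the hosts report different host names. The sketch does not address hosts that share a host name or a truncated host name, so it should not be read as a uniqueness guarantee.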

