[mpich-discuss] A possible bug in HYD_pmcd_pmi_allocate_kvs
Raffenetti, Kenneth J.
raffenet at mcs.anl.gov
Wed Jun 5 08:29:09 CDT 2019
We added a similar fix in https://github.com/pmodels/mpich/pull/2788.
This was included in the MPICH 3.3 release.
Ken
On 6/4/19 11:27 PM, Xiaopeng Duan via discuss wrote:
> Hi, MPICH experts,
>
> We are working on a dynamic master-worker flow using
> MPI_Comm_connect/MPI_Comm_accept. In some cases when the total number of
> worker processes is large, they may get the same kvs_name and confuse the
> internal group identifiers. This was traced to the naming convention in
> HYD_pmcd_pmi_allocate_kvs() that considers only process id, while two
> processes on different machines may have the same pid. I tried adding the
> host name (from gethostname() in <unistd.h>) to the name, i.e.
> 'kvs_HOSTNAME_PID_pgid', and with that change everything works in our testing.
>
> So I'm wondering if this change is safe (we may need it for our release)
> and if it would go into the official MPICH release some time.
>
> Thank you very much.
>
> Xiaopeng
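The collision described in the quoted message, and the hostname-based workaround, can be sketched in a few lines of C. This is only an illustration, not the code from Hydra or from the pull request mentioned above: the function names are hypothetical, the "kvs_PID_pgid" layout of the old name is inferred from the message, and the new layout follows the 'kvs_HOSTNAME_PID_pgid' pattern the message proposes.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: old-style name built from PID and pgid only
 * (layout assumed from the message). Returns 0 on success, -1 if the
 * name does not fit in buf. */
int kvs_name_pid_only(char *buf, size_t len, int pid, int pgid)
{
    assert(buf != NULL && len > 0);
    int n = snprintf(buf, len, "kvs_%d_%d", pid, pgid);
    return (n >= 0 && (size_t) n < len) ? 0 : -1;
}

/* Hypothetical helper: name that also carries the host name, following
 * the 'kvs_HOSTNAME_PID_pgid' pattern proposed in the message. */
int kvs_name_with_host(char *buf, size_t len, const char *host,
                       int pid, int pgid)
{
    assert(buf != NULL && len > 0 && host != NULL);
    int n = snprintf(buf, len, "kvs_%s_%d_%d", host, pid, pgid);
    return (n >= 0 && (size_t) n < len) ? 0 : -1;
}

/* Convenience wrapper for the calling process: takes the host name from
 * gethostname() and the PID from getpid(). */
int kvs_name_for_this_process(char *buf, size_t len, int pgid)
{
    char host[256];
    if (gethostname(host, sizeof(host)) != 0)
        return -1;
    /* gethostname() may leave the buffer unterminated on truncation. */
    host[sizeof(host) - 1] = '\0';
    return kvs_name_with_host(buf, len, host, (int) getpid(), pgid);
}
```

With the PID-only helper, two processes that happen to share PID 4242 on different machines both get "kvs_4242_0". With the host-aware helper they get "kvs_node01_4242_0" and "kvs_node02_4242_0" (the host names here are made up). This still assumes host names are unique across the machines involved.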