[mpich-discuss] A possible bug in HYD_pmcd_pmi_allocate_kvs
xpduan12 at gmail.com
Tue Jun 4 23:27:43 CDT 2019
Hi, MPICH experts,
We are working on a dynamic master-worker flow using
MPI_Comm_connect/MPI_Comm_accept. In some cases, when the total number of
worker processes is large, two of them may get the same kvs_name, which
confuses the internal group identifiers. We traced this to the naming
convention in HYD_pmcd_pmi_allocate_kvs(), which considers only the process
ID, while two processes on different machines may have the same PID. I tried
adding the host name (from gethostname() in <unistd.h>) to the name, i.e.
'kvs_HOSTNAME_PID_pgid', and with that change everything works fine in our
testing.
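
To make the idea concrete, here is a minimal sketch of how such a name could be
built. It is not the actual MPICH source: make_kvs_name(), KVS_NAME_MAX and
HOST_BUF_LEN are hypothetical names for illustration only, and it assumes that
host names are unique across the machines involved.

```c
#define _POSIX_C_SOURCE 200112L  /* for gethostname() under strict C modes */

#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define KVS_NAME_MAX 256  /* hypothetical limit, not MPICH's own constant */
#define HOST_BUF_LEN 256  /* hypothetical host name buffer size */

/*
 * Hypothetical helper (not part of MPICH): writes "kvs_HOSTNAME_PID_pgid"
 * into buf. Returns 0 on success, -1 if the host name cannot be read or
 * the result does not fit into len bytes.
 */
static int make_kvs_name(char *buf, size_t len, int pgid)
{
    char host[HOST_BUF_LEN];
    int n;

    assert(buf != NULL && len > 0);

    if (gethostname(host, sizeof(host)) != 0)
        return -1;
    /* gethostname() may leave the buffer unterminated if it truncates. */
    host[sizeof(host) - 1] = '\0';

    n = snprintf(buf, len, "kvs_%s_%d_%d", host, (int) getpid(), pgid);
    if (n < 0 || (size_t) n >= len)
        return -1;  /* treat truncation as an error so names stay unique */

    return 0;
}
```

Note that the sketch fails on truncation instead of silently cutting the name,
since a truncated name could bring back the same collision. It also assumes no
two machines in the job report the same host name.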
So I'm wondering whether this change is safe (we may need it for our release)
and whether it could go into an official MPICH release at some point.
Thank you very much.