[mpich-discuss] A possible bug in HYD_pmcd_pmi_allocate_kvs

Xiaopeng Duan xpduan12 at gmail.com
Thu Jun 6 15:34:13 CDT 2019


Got it.
Thank you, Giuseppe and Ken.

Regards,
Xiaopeng

On Thu, Jun 6, 2019, 2:56 PM Congiu, Giuseppe <gcongiu at anl.gov> wrote:

> Actually, the fix uses a combination of the hostname and a random number
> whose seed is a timestamp.
> I don’t remember exactly why we didn’t go with the hostname only, but I
> suspect it is because the hostname might not be
> unique. Adding the random number with a timestamp seed should be robust
> enough against collisions.
>
> Giuseppe
>
> > On Jun 6, 2019, at 8:25 AM, Raffenetti, Kenneth J. via discuss
> > <discuss at mpich.org> wrote:
> >
> > On 6/6/19 12:52 AM, Xiaopeng Duan wrote:
> >> Thank you, Ken.
> >>
> >> We were having another problem with 3.3, and will try it once we have
> >> fixed our issue.
> >>
> >> Just out of curiosity, why was a random number chosen for the fix instead
> >> of the hostname or address? It looks to me like the random number still
> >> has some chance of repeating (although very small), but hostnames and
> >> addresses should be unique in a system.
> >
> > I had the same thought when looking back at this patch. Maybe Giuseppe
> > can share why that was added. I'm fairly sure it can be safely removed.
> >
> > Ken
> >
> >>
> >> Regards,
> >> Xiaopeng
> >>
> >> On Wed, Jun 5, 2019, 8:29 AM Raffenetti, Kenneth J.
> >> <raffenet at mcs.anl.gov> wrote:
> >>
> >>    We added a similar fix in https://github.com/pmodels/mpich/pull/2788 .
> >>    This was included in the MPICH 3.3 release.
> >>
> >>    Ken
> >>
> >>    On 6/4/19 11:27 PM, Xiaopeng Duan via discuss wrote:
> >>> Hi, MPICH experts,
> >>>
> >>> We are working on a dynamic master-worker flow using
> >>> MPI_Comm_connect/MPI_Comm_accept. In some cases, when the total number
> >>> of worker processes is large, two of them may get the same kvs_name,
> >>> which confuses the internal group identifiers. This was traced to the
> >>> naming convention in HYD_pmcd_pmi_allocate_kvs(), which considers only
> >>> the process id, while two processes on different machines may have the
> >>> same pid. I tried adding the host name (from gethostname() in unistd.h)
> >>> to the name, i.e. 'kvs_HOSTNAME_PID_pgid', and with that change
> >>> everything works fine in our testing.
> >>>
> >>> So I'm wondering if this change is safe (we may need it for our
> >>> release) and if it would go into the official MPICH release some time.
> >>>
> >>> Thank you very much.
> >>>
> >>> Xiaopeng
> >>>
> >>> _______________________________________________
> >>> discuss mailing list     discuss at mpich.org
> >>> To manage subscription options or unsubscribe:
> >>> https://lists.mpich.org/mailman/listinfo/discuss
> >>>
> >>
>
>
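
For readers following the thread, here is a minimal sketch in C of the naming
idea discussed above: combining the hostname, the pid, and a random number
seeded from a timestamp into one KVS name. It is not the code from the MPICH
pull request. The helper names make_kvs_name and make_local_kvs_name, the
buffer sizes, and the exact field order are assumptions made for illustration
only.

```c
#define _POSIX_C_SOURCE 200112L  /* for gethostname() and getpid() */

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define KVS_NAME_MAX 256  /* hypothetical size, not an MPICH constant */
#define HOST_MAX 64       /* hypothetical size, not an MPICH constant */

/* Hypothetical helper: write "kvs_<host>_<pid>_<rnd>_<pgid>" into buf.
 * Returns 0 on success, -1 if the name did not fit or formatting failed. */
static int make_kvs_name(char *buf, size_t len, const char *host,
                         long pid, unsigned rnd, int pgid)
{
    assert(buf != NULL && host != NULL);
    int n = snprintf(buf, len, "kvs_%s_%ld_%u_%d", host, pid, rnd, pgid);
    return (n < 0 || (size_t) n >= len) ? -1 : 0;
}

/* Hypothetical wrapper: gather the live hostname, pid, and a random number,
 * then build the name. Returns 0 on success, -1 on failure. */
static int make_local_kvs_name(char *buf, size_t len, int pgid)
{
    char host[HOST_MAX];

    memset(host, 0, sizeof(host));
    if (gethostname(host, sizeof(host)) != 0)
        return -1;
    /* gethostname() may leave the buffer unterminated if it truncates. */
    host[sizeof(host) - 1] = '\0';

    /* Timestamp seed, as described in the thread. Two calls within the
     * same second get the same seed, so the hostname and pid still matter. */
    srand((unsigned) time(NULL));

    return make_kvs_name(buf, len, host, (long) getpid(),
                         (unsigned) rand(), pgid);
}
```

A caller would declare char name[KVS_NAME_MAX] and call
make_local_kvs_name(name, sizeof(name), pgid). The point of combining the
three fields is that each one covers a gap left by the others: the pid can
repeat across machines, the hostname may not be unique, and the random number
alone can repeat.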