[mpich-discuss] Assertion failed in file src/mpid/ch3/src/ch3u_handle_send_req.c at line 61 (RMA && Derived datatypes)

Zhao, Xin xinzhao3 at illinois.edu
Mon Nov 24 14:12:12 CST 2014


Hi Victor,

The bug was recently fixed in mpich/master (see https://trac.mpich.org/projects/mpich/ticket/2204).
Could you try tonight's nightly snapshot?

Thanks,
Xin
________________________________________
From: Victor Vysotskiy [victor.vysotskiy at teokem.lu.se]
Sent: Wednesday, November 19, 2014 3:00 AM
To: Zhao, Xin
Subject: RE: [mpich-discuss] Assertion failed in file src/mpid/ch3/src/ch3u_handle_send_req.c at line 61 (RMA && Derived datatypes)

Hi Xin,

>I think it is due to a bug in our MPICH RMA

Thanks for your email! It is great news that you have found the problematic place inside MPICH.

>I created a ticket for this: https://trac.mpich.org/projects/mpich/ticket/2204, you can track the progress of this bug on it.

I will keep an eye on it.

With best regards,
Victor.

________________________________________
From: Zhao, Xin [xinzhao3 at illinois.edu]
Sent: Wednesday, November 19, 2014 4:40 AM
To: discuss at mpich.org
Cc: Victor Vysotskiy
Subject: RE: [mpich-discuss] Assertion failed in file src/mpid/ch3/src/ch3u_handle_send_req.c at line 61 (RMA && Derived datatypes)

Hi Victor,

I looked at your test code, and I think it is due to a bug in our MPICH RMA: the request handler is not re-entrant safe, which causes win_ptr->at_completion_counter to be decremented twice for one GET operation. We will fix it as soon as possible. I created a ticket for this: https://trac.mpich.org/projects/mpich/ticket/2204, you can track the progress of this bug on it.
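
To illustrate the class of bug described above, here is a minimal sketch in plain C. It is not the MPICH source and not necessarily how ticket 2204 was fixed: window_t, request_t, handler_unsafe, handler_guarded, and run_one_get are hypothetical names, and only at_completion_counter is borrowed from the message. It assumes the handler is entered a second time for the same request before the first call finishes, and shows how marking the request as completed up front keeps the counter from being decremented twice.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; these are not MPICH types. */
typedef struct {
    int at_completion_counter;   /* operations still outstanding on the window */
} window_t;

typedef struct {
    window_t *win;
    bool completed;              /* true once completion bookkeeping has started */
} request_t;

/* Not re-entrant safe: if the handler is entered again for the same
 * request (simulated here by the nested call), both invocations
 * decrement the counter. */
static void handler_unsafe(request_t *req, bool reenter)
{
    if (reenter) {
        handler_unsafe(req, false);   /* simulated re-entry */
    }
    req->win->at_completion_counter--;
}

/* Guarded variant: the request is marked completed before anything that
 * could re-enter, so a nested call returns without touching the counter. */
static void handler_guarded(request_t *req, bool reenter)
{
    if (req->completed) {
        return;
    }
    req->completed = true;
    if (reenter) {
        handler_guarded(req, false);  /* simulated re-entry, now a no-op */
    }
    req->win->at_completion_counter--;
    assert(req->win->at_completion_counter >= 0);
}

/* Simulates one GET with a single outstanding operation and returns the
 * final value of the counter. */
int run_one_get(bool guarded)
{
    window_t win = { 1 };
    request_t req = { &win, false };

    if (guarded) {
        handler_guarded(&req, true);
    } else {
        handler_unsafe(&req, true);
    }
    return win.at_completion_counter;
}
```

In this sketch run_one_get(false) leaves the counter at -1 (one decrement too many), while run_one_get(true) leaves it at 0, which is the value a completion check would expect after a single GET.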

Thanks,
Xin

________________________________________
From: Victor Vysotskiy [victor.vysotskiy at teokem.lu.se]
Sent: Tuesday, November 18, 2014 6:00 AM
To: discuss at mpich.org
Subject: Re: [mpich-discuss] Assertion failed in file src/mpid/ch3/src/ch3u_handle_send_req.c at line 61 (RMA && Derived datatypes)

Hi Pavan,

>FYI, this is what we heard back from the Mellanox folks:
>I think, issue could be because of older MXM (part of MOFED) being used.  We can ask him to try  latest MXM from HPCX (http://bgate.mellanox.com/products/hpcx)

Indeed, I just checked the latest software stack, including:

- hpcx-v1.2.0-258-icc-OFED-3.12-redhat6.5;
- MPICH v3.2a2 ('--with-device=ch3:nemesis:mxm');

And there is no longer any problem with 'assertion failed in ch3u_handle_send_req.c'!

Many thanks for your help and support!

With best regards,
Victor.
_______________________________________________
discuss mailing list     discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss

