[mpich-discuss] mpich3 error with ch3

Alim Akhtar alim.akhtar at gmail.com
Thu May 27 09:19:29 CDT 2021


Hi Hui


On Thu, May 27, 2021 at 7:28 PM Zhou, Hui <zhouh at anl.gov> wrote:
>
> Similar-looking issues can have very different causes. Checking the referenced discussion, I wasn't sure what the original issue was. We suggested trying ch4 to get more data points rather than as a solution. Nevertheless, ch4 is the currently recommended device, as it is more actively developed.
>
>
>
> Ch3 is not broken as far as we know. Could you describe your issue in more detail?
>
After some number of loop iterations I get:
Assertion failed in file
src/mpid/ch3/channels/nemesis/src/ch3_progress.c at line 530:
payload_len >= sizeof (MPIDI_CH3_Pkt_t)
Sometimes I see this instead:
Assertion failed in file
src/mpid/ch3/channels/nemesis/src/ch3_progress.c at line 567:
!vc_ch->recv->active

The number of loop iterations that pass before a failure depends on the
number of CPUs used (the more CPUs, the more failures).
With one CPU there is no failure.
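
For anyone trying to reproduce the comparison, here is a rough sketch of the two builds involved. The ch4 flag is the one quoted further down in this thread; ch3:nemesis is taken from the file path in the assertions. The install prefixes, the process count, and the name ./mpi_bench are placeholders, not my exact setup.

```shell
# Sketch only: run from the top of an unpacked MPICH source tree.
# The install prefixes below are placeholders.

# Build 1: ch3 with the nemesis channel (the code path named in the assertions).
mkdir build-ch3 && cd build-ch3
../configure --with-device=ch3:nemesis --prefix="$HOME/mpich-ch3"
make -j4 && make install
cd ..

# Build 2: ch4 with the OFI netmod, as suggested on the devel list.
mkdir build-ch4 && cd build-ch4
../configure --with-device=ch4:ofi --prefix="$HOME/mpich-ch4"
make -j4 && make install
cd ..

# Confirm which device an install was built with.
"$HOME/mpich-ch3/bin/mpichversion"

# Run the benchmark against one install, varying the process count.
# ./mpi_bench is a hypothetical name; the benchmark has to be recompiled
# with the mpicc from the same install that provides mpiexec.
"$HOME/mpich-ch3/bin/mpiexec" -n 4 ./mpi_bench
```

Keeping the two installs under separate prefixes makes it possible to switch between ch3 and ch4 on the same machine without rebuilding MPICH each time.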


>
>
> --
> Hui Zhou
>
>
>
>
>
> From: Alim Akhtar via discuss <discuss at mpich.org>
> Date: Wednesday, May 26, 2021 at 10:51 PM
> To: discuss at mpich.org <discuss at mpich.org>
> Cc: Alim Akhtar <alim.akhtar at gmail.com>
> Subject: [mpich-discuss] mpich3 error with ch3
>
> Hi mpich dev team,
>
> I am facing an issue similar to the one discussed in the thread below:
>
> https://lists.mpich.org/pipermail/devel/2021-January/000826.html
>
> Someone on the mailing list suggested recompiling the MPI bench using
> CH4, as below:
>
> https://lists.mpich.org/pipermail/devel/2021-January/000828.html
>
> "MPICH with ch4, with `--with-device=ch4:ofi`"
>
> Actually this fixes the failure on this CPU architecture.
>
> Questions:
> 1. Is this a known issue with MPI bench on recent CPU architectures
> (I am running on ARM's Cortex-A75)? In other words, is ch3 broken?
> 2. With no error after using CH4, does this mean the CPU is all good?
>
> Note: ch3 was working fine on our previous CPU (Cortex-A72).
>
> Any input will be really appreciated.
>
>
> --
> Regards,
> Alim



-- 
Regards,
Alim

