[mpich-discuss] Help with Cray MPICH

Marcin Zalewski marcin.zalewski at gmail.com
Wed Mar 5 10:31:22 CST 2014


Nick,

We got MPT 6.2.2 installed, and indeed, the initial tests don't cause
the same problem anymore. Thank you for alerting me to that bug. It
really saved me a lot of hair pulling.

-m

On Mon, Feb 3, 2014 at 2:46 PM, Nick Radcliffe <nradclif at cray.com> wrote:
> Hi Marcin,
>
> I can't be certain, but this looks like a bug we fixed recently in Cray MPICH. The fix should be available in MPT 6.2.2 by Feb. 20.
>
> -Nick Radcliffe,
> Cray MPT Team
>
> ________________________________________
> From: discuss-bounces at mpich.org [discuss-bounces at mpich.org] on behalf of Marcin Zalewski [marcin.zalewski at gmail.com]
> Sent: Monday, February 03, 2014 1:27 PM
> To: discuss at mpich.org
> Subject: [mpich-discuss] Help with Cray MPICH
>
> I have an application I am trying to run on a Cray machine composed of
> XE6 nodes. I have run this application previously using Open MPI and
> MVAPICH on a few different machines, so I think it should be more or
> less free of major bugs. However, when I run it on the Cray machine, I
> get a segmentation fault with a stack trace that ends in this:
>
> #0  0x00002aaaafe265e6 in memcpy () from /lib64/libc.so.6
> #1  0x00002aaaae8fe023 in MPID_Segment_index_m2m () from
> /opt/cray/lib64/libmpich_gnu_48.so.2
> #2  0x00002aaaae8fca18 in MPID_Segment_manipulate () from
> /opt/cray/lib64/libmpich_gnu_48.so.2
> #3  0x00002aaaae9049d1 in MPID_Segment_unpack () from
> /opt/cray/lib64/libmpich_gnu_48.so.2
> #4  0x00002aaaae8edd38 in MPID_nem_gni_complete_rdma_get () from
> /opt/cray/lib64/libmpich_gnu_48.so.2
> #5  0x00002aaaae8e11c8 in MPID_nem_gni_check_localCQ () from
> /opt/cray/lib64/libmpich_gnu_48.so.2
> #6  0x00002aaaae8e29fa in MPID_nem_gni_poll () from
> /opt/cray/lib64/libmpich_gnu_48.so.2
> #7  0x00002aaaae8c3515 in MPIDI_CH3I_Progress () from
> /opt/cray/lib64/libmpich_gnu_48.so.2
> #8  0x00002aaaae9a3d4d in PMPI_Testsome () from
> /opt/cray/lib64/libmpich_gnu_48.so.2
>
> I understand that this is very little information to go on, but I am
> stuck on this problem, so I thought I would try this list while I
> solicit help from our support team. Currently, I do not have access to
> a debug build of the library (or to its source code). Has anyone here
> seen a similar error? In general, what could cause a segmentation
> fault in memcpy inside MPICH during a call to MPI_Testsome? I do have
> asserts in my code to make sure that the output arrays passed to
> MPI_Testsome are large enough, and the same code works with other MPI
> implementations, so I am at a loss as to what the problem might be.
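>
> For context, the communication pattern in question is essentially the
> following (a minimal, self-contained sketch with illustrative buffer
> sizes and request counts, not my actual code; each rank simply
> messages itself so the example runs on its own):
>
>     #include <assert.h>
>     #include <mpi.h>
>
>     #define NREQ    4      /* illustrative number of outstanding receives */
>     #define BUFSIZE 1024   /* illustrative message size in bytes */
>
>     int main(int argc, char **argv)
>     {
>         char        recvbufs[NREQ][BUFSIZE];
>         char        sendbuf[BUFSIZE] = {0};
>         MPI_Request reqs[NREQ];
>         int         indices[NREQ];   /* sized for all NREQ requests */
>         MPI_Status  statuses[NREQ];  /* sized for all NREQ requests */
>         int         rank, done = 0;
>
>         MPI_Init(&argc, &argv);
>         MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>
>         /* Post all nonblocking receives up front. */
>         for (int i = 0; i < NREQ; ++i)
>             MPI_Irecv(recvbufs[i], BUFSIZE, MPI_CHAR, MPI_ANY_SOURCE,
>                       i, MPI_COMM_WORLD, &reqs[i]);
>
>         /* Matching sends; self-sends are safe here because the
>          * receives are already posted. */
>         for (int i = 0; i < NREQ; ++i)
>             MPI_Send(sendbuf, BUFSIZE, MPI_CHAR, rank, i, MPI_COMM_WORLD);
>
>         /* Drive progress with MPI_Testsome until all receives
>          * complete; the crash occurs inside a call like this one. */
>         while (done < NREQ) {
>             int outcount;
>             MPI_Testsome(NREQ, reqs, &outcount, indices, statuses);
>             if (outcount != MPI_UNDEFINED) {
>                 assert(outcount <= NREQ); /* output arrays large enough */
>                 done += outcount;
>             }
>         }
>
>         MPI_Finalize();
>         return 0;
>     }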
>
> Thank you,
> Marcin
> _______________________________________________
> discuss mailing list     discuss at mpich.org
> To manage subscription options or unsubscribe:
> https://lists.mpich.org/mailman/listinfo/discuss


