[mpich-discuss] Help with Cray MPICH

Marcin Zalewski marcin.zalewski at gmail.com
Mon Feb 3 13:48:18 CST 2014


Thanks Nick, great to know.

-m

On Mon, Feb 3, 2014 at 2:46 PM, Nick Radcliffe <nradclif at cray.com> wrote:
> Hi Marcin,
>
> I can't be certain, but this looks like a bug we fixed recently in Cray MPICH. The fix should be available in MPT 6.2.2 by Feb. 20.
>
> -Nick Radcliffe,
> Cray MPT Team
>
> ________________________________________
> From: discuss-bounces at mpich.org [discuss-bounces at mpich.org] on behalf of Marcin Zalewski [marcin.zalewski at gmail.com]
> Sent: Monday, February 03, 2014 1:27 PM
> To: discuss at mpich.org
> Subject: [mpich-discuss] Help with Cray MPICH
>
> I have an application I am trying to run on a Cray machine composed of
> XE6 nodes. I have run this application previously using Open MPI and
> MVAPICH on a few different machines, so I think it should be more or
> less free of major bugs. However, when I run it on the Cray machine, I
> get a segmentation fault with a stack trace that ends in this:
>
> #0  0x00002aaaafe265e6 in memcpy () from /lib64/libc.so.6
> #1  0x00002aaaae8fe023 in MPID_Segment_index_m2m () from
> /opt/cray/lib64/libmpich_gnu_48.so.2
> #2  0x00002aaaae8fca18 in MPID_Segment_manipulate () from
> /opt/cray/lib64/libmpich_gnu_48.so.2
> #3  0x00002aaaae9049d1 in MPID_Segment_unpack () from
> /opt/cray/lib64/libmpich_gnu_48.so.2
> #4  0x00002aaaae8edd38 in MPID_nem_gni_complete_rdma_get () from
> /opt/cray/lib64/libmpich_gnu_48.so.2
> #5  0x00002aaaae8e11c8 in MPID_nem_gni_check_localCQ () from
> /opt/cray/lib64/libmpich_gnu_48.so.2
> #6  0x00002aaaae8e29fa in MPID_nem_gni_poll () from
> /opt/cray/lib64/libmpich_gnu_48.so.2
> #7  0x00002aaaae8c3515 in MPIDI_CH3I_Progress () from
> /opt/cray/lib64/libmpich_gnu_48.so.2
> #8  0x00002aaaae9a3d4d in PMPI_Testsome () from
> /opt/cray/lib64/libmpich_gnu_48.so.2
>
> I understand that this is very little information to go on, but I am
> stuck on this problem, so I thought I would try this list while I
> solicit help from our support team. Currently, I do not have access
> to a debug version of the library (or the source code). Has anyone
> here seen a similar error? What, in general, could cause a
> segmentation fault in memcpy inside MPICH during a call to
> MPI_Testsome? I do have asserts in my code to verify that the output
> arrays are large enough, and the application runs correctly on other
> MPI implementations, so I am at a loss as to what the problem might be.
>
> Thank you,
> Marcin
> _______________________________________________
> discuss mailing list     discuss at mpich.org
> To manage subscription options or unsubscribe:
> https://lists.mpich.org/mailman/listinfo/discuss


