[mpich-discuss] Help with Cray MPICH

Nick Radcliffe nradclif at cray.com
Mon Feb 3 13:46:11 CST 2014


Hi Marcin,

I can't be certain, but this looks like a bug we fixed recently in Cray MPICH. The fix should be available in MPT 6.2.2 by Feb. 20.

-Nick Radcliffe,
Cray MPT Team

________________________________________
From: discuss-bounces at mpich.org [discuss-bounces at mpich.org] on behalf of Marcin Zalewski [marcin.zalewski at gmail.com]
Sent: Monday, February 03, 2014 1:27 PM
To: discuss at mpich.org
Subject: [mpich-discuss] Help with Cray MPICH

I have an application I am trying to run on a Cray machine composed of
XE6 nodes. I have run this application previously using Open MPI and
MVAPICH on a few different machines, so I think it should be more or
less free of major bugs. However, when I run it on the Cray machine, I
get a segmentation fault with a stack trace that ends in this:

#0  0x00002aaaafe265e6 in memcpy () from /lib64/libc.so.6
#1  0x00002aaaae8fe023 in MPID_Segment_index_m2m () from /opt/cray/lib64/libmpich_gnu_48.so.2
#2  0x00002aaaae8fca18 in MPID_Segment_manipulate () from /opt/cray/lib64/libmpich_gnu_48.so.2
#3  0x00002aaaae9049d1 in MPID_Segment_unpack () from /opt/cray/lib64/libmpich_gnu_48.so.2
#4  0x00002aaaae8edd38 in MPID_nem_gni_complete_rdma_get () from /opt/cray/lib64/libmpich_gnu_48.so.2
#5  0x00002aaaae8e11c8 in MPID_nem_gni_check_localCQ () from /opt/cray/lib64/libmpich_gnu_48.so.2
#6  0x00002aaaae8e29fa in MPID_nem_gni_poll () from /opt/cray/lib64/libmpich_gnu_48.so.2
#7  0x00002aaaae8c3515 in MPIDI_CH3I_Progress () from /opt/cray/lib64/libmpich_gnu_48.so.2
#8  0x00002aaaae9a3d4d in PMPI_Testsome () from /opt/cray/lib64/libmpich_gnu_48.so.2

I understand that this is very little information to go on, but I am
stuck on this problem, and I thought I would try this list while I
also ask our support team for help. Currently, I do not have access
to a debug version of the library (or the source code). Has anyone
here seen a similar error? In general, what could cause a crash in
memcpy inside MPICH during a call to MPI_Testsome? I do have asserts
in my code to make sure that the output arrays are large enough, and
the application works with other implementations of MPI, so I am
really clueless as to what the problem might be.
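
As a general illustration of the question above (not a diagnosis of this crash, which the reply attributes to a likely Cray MPICH bug): the MPID_Segment_index_m2m frame appears to be the unpack path for an indexed datatype, so one application-side cause of a fault in memcpy would be a receive posted with an indexed layout that reaches past the end of the receive buffer. The sketch below is a minimal check under that assumption. The helper name indexed_layout_fits is hypothetical and not part of MPI; its blocklens and displs arguments are assumed to mirror the arrays passed to MPI_Type_indexed, with non-negative displacements counted in elements.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of MPI or MPICH.
 * Returns true when every block of an indexed layout lies inside a buffer
 * of buf_bytes bytes. blocklens[i] and displs[i] are counted in elements
 * of elem_size bytes, as in the arrays given to MPI_Type_indexed.
 * Assumption: negative displacements are treated as invalid here, although
 * MPI itself permits them relative to the buffer address. */
bool indexed_layout_fits(const int *blocklens, const int *displs, int count,
                         size_t elem_size, size_t buf_bytes)
{
    for (int i = 0; i < count; ++i) {
        if (blocklens[i] < 0 || displs[i] < 0)
            return false;
        /* One past the last byte touched by block i. */
        size_t end = ((size_t)displs[i] + (size_t)blocklens[i]) * elem_size;
        if (end > buf_bytes)
            return false;
    }
    return true;
}
```

A check like this would sit next to the existing asserts, evaluated before the receive is posted, with elem_size set to the size of the base type and buf_bytes to the allocated size of the receive buffer.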

Thank you,
Marcin