[mpich-discuss] Help with Cray MPICH

Marcin Zalewski marcin.zalewski at gmail.com
Mon Feb 3 13:27:48 CST 2014


I have an application I am trying to run on a Cray machine composed of
XE6 nodes. I have run this application previously using Open MPI and
MVAPICH on a few different machines, so I think it should be more or
less free of major bugs. However, when I run it on the Cray machine, I
get a segmentation fault with a stack trace that ends in this:

#0  0x00002aaaafe265e6 in memcpy () from /lib64/libc.so.6
#1  0x00002aaaae8fe023 in MPID_Segment_index_m2m () from /opt/cray/lib64/libmpich_gnu_48.so.2
#2  0x00002aaaae8fca18 in MPID_Segment_manipulate () from /opt/cray/lib64/libmpich_gnu_48.so.2
#3  0x00002aaaae9049d1 in MPID_Segment_unpack () from /opt/cray/lib64/libmpich_gnu_48.so.2
#4  0x00002aaaae8edd38 in MPID_nem_gni_complete_rdma_get () from /opt/cray/lib64/libmpich_gnu_48.so.2
#5  0x00002aaaae8e11c8 in MPID_nem_gni_check_localCQ () from /opt/cray/lib64/libmpich_gnu_48.so.2
#6  0x00002aaaae8e29fa in MPID_nem_gni_poll () from /opt/cray/lib64/libmpich_gnu_48.so.2
#7  0x00002aaaae8c3515 in MPIDI_CH3I_Progress () from /opt/cray/lib64/libmpich_gnu_48.so.2
#8  0x00002aaaae9a3d4d in PMPI_Testsome () from /opt/cray/lib64/libmpich_gnu_48.so.2

I understand that this is very little information to go on, but I
have not been able to figure this problem out, so I thought I would
try this list while I also solicit help from our support team.
Currently, I do not have access to a debug version of the library (or
the source code). Has anyone here seen a similar error? In general,
what could cause a fault in memcpy inside MPICH during a call to
MPI_Testsome? I do have asserts in my code to make sure that the
output arrays are large enough, and the application works with other
implementations of MPI, so I am really clueless as to what the
problem might be.

Thank you,
Marcin


