[mpich-discuss] first attempts with neighbor collectives

Kokron, Daniel S. (GSFC-610.1)[Computer Sciences Corporation] daniel.s.kokron at nasa.gov
Fri Nov 22 22:33:46 CST 2013


mpich-3.1rc2 gives the same result.

Daniel Kokron
NASA Ames (ARC-TN)
SciCon group
301-286-3959

________________________________________
From: discuss-bounces at mpich.org [discuss-bounces at mpich.org] on behalf of Kokron, Daniel S. (GSFC-610.1)[Computer Sciences Corporation] [daniel.s.kokron at nasa.gov]
Sent: Friday, November 22, 2013 3:20 PM
To: discuss at mpich.org
Subject: [mpich-discuss] first attempts with neighbor collectives

I've started playing with the neighbor collectives in mpich-3.0.4.  My first attempt was to convert the existing ~/test/mpi/topo/neighb_coll.c from 1D to 2D.  That went fine.  Now I want to convert that 2D C code to Fortran.  Everything works except the call to MPI_Neighbor_alltoallw, which fails with a SEGV.  Any ideas what I'm doing wrong?

My 3.0.4 was configured and compiled with the Intel 13.1.3.192 compiler suite under Linux kernel 2.6.32.54 x86_64.
./configure CC=icc CXX=icpc FC=ifort F77=ifort --prefix=~/install/intel-13.1.3.192 --enable-f77 --enable-fc --enable-g=all --enable-debuginfo --enable-shared

The attached reproducer was compiled with the same suite
mpif90 -g -O0 -traceback -debug -check -o neighb_coll2Df neighb_coll2Df.f90

and run with
mpirun -np 12 neighb_coll2Df


valgrind-3.8.1 had this to say
==55631== Conditional jump or move depends on uninitialised value(s)
==55631==    at 0x62A9F42: vfprintf (in /lib64/libc-2.11.1.so)
==55631==    by 0x62CD288: vsprintf (in /lib64/libc-2.11.1.so)
==55631==    by 0x62B2BD7: sprintf (in /lib64/libc-2.11.1.so)
==55631==    by 0x47288B: stackwalk_cb (in /home1/dkokron/play/MPICH3/mpich-3.0.4/test/mpi/topo/neighb_coll2Df)
==55631==    by 0x473B94: tbk_trace_stack (in /home1/dkokron/play/MPICH3/mpich-3.0.4/test/mpi/topo/neighb_coll2Df)
==55631==    by 0x4725E5: tbk_string_stack_signal (in /home1/dkokron/play/MPICH3/mpich-3.0.4/test/mpi/topo/neighb_coll2Df)
==55631==    by 0x42ADE1: tbk_stack_trace (in /home1/dkokron/play/MPICH3/mpich-3.0.4/test/mpi/topo/neighb_coll2Df)
==55631==    by 0x40B89A: for__issue_diagnostic (in /home1/dkokron/play/MPICH3/mpich-3.0.4/test/mpi/topo/neighb_coll2Df)
==55631==    by 0x40F034: for__signal_handler (in /home1/dkokron/play/MPICH3/mpich-3.0.4/test/mpi/topo/neighb_coll2Df)
==55631==    by 0x5BFD5CF: ??? (in /lib64/libpthread-2.11.1.so)
==55631==    by 0x4C2B01F: _intel_fast_memcpy (mc_replace_strmem.c:889)
==55631==    by 0x5138DD5: MPIUI_Memcpy (mpiimpl.h:162)
==55631==
==55631== Warning: bad signal number 0 in sigaction()
==55631== Conditional jump or move depends on uninitialised value(s)
==55631==    at 0x47263A: tbk_string_stack_signal (in /home1/dkokron/play/MPICH3/mpich-3.0.4/test/mpi/topo/neighb_coll2Df)
==55631==    by 0x42ADE1: tbk_stack_trace (in /home1/dkokron/play/MPICH3/mpich-3.0.4/test/mpi/topo/neighb_coll2Df)
==55631==    by 0x40B89A: for__issue_diagnostic (in /home1/dkokron/play/MPICH3/mpich-3.0.4/test/mpi/topo/neighb_coll2Df)
==55631==    by 0x40F034: for__signal_handler (in /home1/dkokron/play/MPICH3/mpich-3.0.4/test/mpi/topo/neighb_coll2Df)
==55631==    by 0x5BFD5CF: ??? (in /lib64/libpthread-2.11.1.so)
==55631==    by 0x4C2B01F: _intel_fast_memcpy (mc_replace_strmem.c:889)
==55631==    by 0x5138DD5: MPIUI_Memcpy (mpiimpl.h:162)
==55631==    by 0x5139960: MPID_nem_mpich_sendv_header (mpid_nem_inline.h:307)
==55631==    by 0x5137847: MPIDI_CH3_iSendv (ch3_isendv.c:74)
==55631==    by 0x5119825: MPIDI_CH3_EagerContigIsend (ch3u_eager.c:550)
==55631==    by 0x5123A47: MPID_Isend (mpid_isend.c:131)
==55631==    by 0x518A974: MPID_Sched_start (mpid_sched.c:155)



