[mpich-discuss] Help with debugging

Jeff Hammond jeff.science at gmail.com
Tue Jun 19 21:20:35 CDT 2018


You can rule out MPICH shm bugs if the code runs fine with Open MPI, or by disabling shm in MPICH, e.g. by building with the ch3:sock device instead of the default ch3:nemesis.

The fact that small messages work suggests a rendezvous deadlock is the issue, which is often, but not always, an application error.
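
For illustration only, here is a hypothetical two-rank example in C (not your code) of the classic way a rendezvous deadlock shows up: both sides post a blocking send of a large message before the matching receive. Small messages complete eagerly, so the same code appears to work until a message crosses the rendezvous threshold.

#include <mpi.h>
#include <stdlib.h>

/* Hypothetical sketch: each rank sends a large message to the other
 * before posting its receive.  Eager delivery hides the error for
 * small messages; above the rendezvous threshold both ranks block in
 * MPI_Send waiting for a clear-to-send that never comes. */
int main(int argc, char **argv)
{
    int rank, peer, n = 1 << 20;   /* large enough to force rendezvous */
    double *sbuf, *rbuf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    peer = 1 - rank;               /* assumes exactly two ranks */

    sbuf = calloc(n, sizeof(double));
    rbuf = calloc(n, sizeof(double));

    MPI_Send(sbuf, n, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD);   /* deadlocks here */
    MPI_Recv(rbuf, n, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    free(sbuf);
    free(rbuf);
    MPI_Finalize();
    return 0;
}

Posting the receives before the sends (or using MPI_Sendrecv) removes this particular cycle.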

It’s impossible to reason about the precise nature of the bug without source code. 

Jeff

Sent from my iPhone

> On Jun 19, 2018, at 5:52 PM, Kochunas, Brendan <bkochuna at umich.edu> wrote:
> 
> We've been dealing with a particularly nasty bug in our code and are having trouble debugging it.
> 
> We are using RHEL 7 with the default GNU Compiler Collection (4.8.5) and the yum package for MPICH (mpich-3.2-x86_64).
> 
> The rest of this email briefly describes:
> 
> * our communication pattern
> * what we observe
> * what we've tested (and what we think we know)
> 
> Description of communication pattern:
> This is point-to-point communication using MPI_Isend, MPI_Irecv, and MPI_Waitall. It's effectively a structured Cartesian grid, and we send multiple messages per face; the messages are on the large side (a few KB each).
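>
> A stripped-down C sketch of the pattern (our actual code is Fortran; the names, counts, and sizes here are made up for illustration):
>
> #include <mpi.h>
>
> #define NFACES 4           /* 2D structured grid: W, E, S, N */
> #define MSGS_PER_FACE 8    /* several messages per face (made-up count) */
> #define MSG_LEN 512        /* a few KB of doubles per message */
>
> /* nbr[f] is the rank across face f, or MPI_PROC_NULL on a physical
>  * boundary.  All receives are posted, then all sends, then a single
>  * MPI_Waitall completes everything.  The tag m distinguishes the
>  * messages within a face; (source, tag) is unambiguous because each
>  * neighbor appears on exactly one face. */
> void exchange(MPI_Comm comm, const int nbr[NFACES],
>               double sbuf[NFACES][MSGS_PER_FACE][MSG_LEN],
>               double rbuf[NFACES][MSGS_PER_FACE][MSG_LEN])
> {
>     MPI_Request req[2 * NFACES * MSGS_PER_FACE];
>     int f, m, n = 0;
>
>     for (f = 0; f < NFACES; f++)
>         for (m = 0; m < MSGS_PER_FACE; m++)
>             MPI_Irecv(rbuf[f][m], MSG_LEN, MPI_DOUBLE, nbr[f], m,
>                       comm, &req[n++]);
>     for (f = 0; f < NFACES; f++)
>         for (m = 0; m < MSGS_PER_FACE; m++)
>             MPI_Isend(sbuf[f][m], MSG_LEN, MPI_DOUBLE, nbr[f], m,
>                       comm, &req[n++]);
>
>     /* buffers are not touched between posting and completion */
>     MPI_Waitall(n, req, MPI_STATUSES_IGNORE);
> }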
> 
> What we observe:
> The problem we run into is that when we get to the MPI_Waitall we receive something like:
> 
> Fatal error in PMPI_Waitall: Other MPI error, error stack:
> PMPI_Waitall(323)...............: MPI_Waitall(count=1152, req_array=0x13493b90, status_array=0x1) failed
> MPIR_Waitall_impl(166)..........:
> MPIDI_CH3I_Progress(422)........:
> MPID_nem_handle_pkt(642)........:
> pkt_CTS_handler(321)............:
> MPID_nem_lmt_shm_start_send(270):
> MPID_nem_delete_shm_region(923).:
> MPIU_SHMW_Seg_detach(707).......: unable to remove shared memory - unlink No such file or directory
> 
> Our test case is 9 processors (a 3x3 grid)
> 
> We have found several things that let us avoid the problem (though not reliably):
> * sending multiple smaller messages (roughly 50x as many) instead of one large message
> * adding debug print statements near the MPI_Waitall
> * removing compiler optimizations
> * running under valgrind (e.g. mpirun -np <n> valgrind <exe>)
> 
> We have also observed this error on multiple machines (a Cray XK7, workstations with Intel Xeons, and an SGI ICE-X with MVAPICH).
> 
> The output from valgrind is:
> 
> ==210233== Invalid read of size 8
> ==210233==    at 0xB3B7060: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1022)
> ==210233==    by 0xE7F8FBB: MPID_nem_mpich_sendv_header (mpid_nem_inline.h:363)
> ==210233==    by 0xE7F8FBB: MPIDI_CH3I_Shm_send_progress (ch3_progress.c:207)
> ==210233==    by 0xE7FBD93: MPIDI_CH3I_Progress (ch3_progress.c:583)
> ==210233==    by 0xE7276B4: MPIR_Waitall_impl (waitall.c:164)
> ==210233==    by 0xE727BB5: PMPI_Waitall (waitall.c:378)
> ==210233==    by 0xDFE4DFD: mpi_waitall (waitallf.c:275)
> ==210233==    by 0x49FF27F: __updatebc_MOD_finish_updatebc_base (UpdateBC.f90:361)
> ==210233==  Address 0x142ec268 is 152 bytes inside a block of size 216 free'd
> ==210233==    at 0xB3B3CBD: free (vg_replace_malloc.c:530)
> ==210233==    by 0xE83FB41: MPL_trfree (in /home/bkochuna/sw/lib/libmpi.so.12.1.0)
> ==210233==    by 0xE7837D8: MPIU_trfree (trmem.c:37)
> ==210233==    by 0xE80DC08: MPIU_SHMW_Hnd_free (mpiu_shm_wrappers.h:247)
> ==210233==    by 0xE80DC08: MPIU_SHMW_Hnd_finalize (mpiu_shm_wrappers.h:443)
> ==210233==    by 0xE80DC08: MPID_nem_delete_shm_region (mpid_nem_lmt_shm.c:963)
> ==210233==    by 0xE80DC08: MPID_nem_lmt_shm_start_send (mpid_nem_lmt_shm.c:270)
> ==210233==    by 0xE8097E0: pkt_CTS_handler (mpid_nem_lmt.c:352)
> ==210233==    by 0xE7FB3C5: MPID_nem_handle_pkt (ch3_progress.c:760)
> ==210233==    by 0xE7FBF8D: MPIDI_CH3I_Progress (ch3_progress.c:570)
> ==210233==    by 0xE7276B4: MPIR_Waitall_impl (waitall.c:164)
> ==210233==    by 0xE727BB5: PMPI_Waitall (waitall.c:378)
> ==210233==    by 0xDFE4DFD: mpi_waitall (waitallf.c:275)
> ==210233==    by 0x49FF27F: __updatebc_MOD_finish_updatebc_base (UpdateBC.f90:361)
> ==210233==  Block was alloc'd at
> ==210233==    at 0xB3B2BC3: malloc (vg_replace_malloc.c:299)
> ==210233==    by 0xE83F38F: MPL_trmalloc (in /home/bkochuna/sw/lib/libmpi.so.12.1.0)
> ==210233==    by 0xE783526: MPIU_trmalloc (trmem.c:29)
> ==210233==    by 0xE80CF74: MPIU_SHMW_Ghnd_alloc (mpiu_shm_wrappers.h:188)
> ==210233==    by 0xE80CF74: MPIU_SHMW_Seg_create_attach_templ (mpiu_shm_wrappers.h:622)
> ==210233==    by 0xE80CF74: MPIU_SHMW_Seg_create_and_attach (mpiu_shm_wrappers.h:894)
> ==210233==    by 0xE80CF74: MPID_nem_allocate_shm_region (mpid_nem_lmt_shm.c:885)
> ==210233==    by 0xE80CF74: MPID_nem_lmt_shm_start_recv (mpid_nem_lmt_shm.c:180)
> ==210233==    by 0xE8094AF: do_cts (mpid_nem_lmt.c:560)
> ==210233==    by 0xE809EBE: pkt_RTS_handler (mpid_nem_lmt.c:276)
> ==210233==    by 0xE7FB3C5: MPID_nem_handle_pkt (ch3_progress.c:760)
> ==210233==    by 0xE7FBF8D: MPIDI_CH3I_Progress (ch3_progress.c:570)
> ==210233==    by 0xE7276B4: MPIR_Waitall_impl (waitall.c:164)
> ==210233==    by 0xE727BB5: PMPI_Waitall (waitall.c:378)
> ==210233==    by 0xDFE4DFD: mpi_waitall (waitallf.c:275)
> ==210233==    by 0x49FF27F: __updatebc_MOD_finish_updatebc_base (UpdateBC.f90:361)
> 
> So my main questions to the list are:
> * Are there any known issues with mpich-3.2 and the shared-memory communication within nemesis that might affect non-blocking communication?
> * Does this look more like an issue in our code or within MPICH?
> * If it looks like an issue in our code, what is the best way to debug it? Our current approach of adding basic print statements changes the behavior and prevents us from identifying the problem. I suspect there may be a memory overwrite beyond some array bounds, but compiling in debug with -fbounds-check does not expose it (and given the nature of the MPI interfaces, I would not expect it to; see the sketch after this list). We've stared at the code for the MPI_Isend and MPI_Irecv calls and everything looks correct (e.g. we don't touch the buffers while the requests are outstanding, the sizes match up, etc.).
> * Are there limits on the number of simultaneous messages (we are below the maximum number of MPI requests) or on message sizes (I don't believe there is a limit beyond integer overflow)?
> * We've tried changing from MPI_Waitall to individual MPI_Waits and MPI_Tests, but the problem still happens with the individual waits.
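>
> To make the bounds-checking point concrete, here is a hypothetical C example (not our code, and in C only for brevity since our code is Fortran): if a receive is posted with a count larger than the buffer and a message of that size actually arrives, the overwrite happens inside the MPI library through a raw pointer, so compiler bounds checking in the application never sees it.
>
> #include <mpi.h>
> #include <stdlib.h>
>
> int main(int argc, char **argv)
> {
>     int rank;
>     double *buf = NULL;
>     MPI_Request req = MPI_REQUEST_NULL;
>
>     MPI_Init(&argc, &argv);
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>
>     if (rank == 0) {
>         buf = malloc(100 * sizeof(double));   /* room for only 100 doubles */
>         /* count says 200, so when 200 doubles arrive the library writes
>          * past the end of buf; -fbounds-check never sees it because the
>          * copy happens inside libmpi */
>         MPI_Irecv(buf, 200, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, &req);
>     } else if (rank == 1) {
>         buf = calloc(200, sizeof(double));
>         MPI_Isend(buf, 200, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &req);
>     }
>     MPI_Wait(&req, MPI_STATUS_IGNORE);
>
>     free(buf);
>     MPI_Finalize();
>     return 0;
> }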
> 
> Any insights are welcome.
> 
> Thanks,
> -Brendan
> 
_______________________________________________
discuss mailing list     discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss

