[mpich-discuss] Assertion failed in file src/mpid/ch3/src/ch3u_handle_send_req.c at line 61 (RMA && Derived datatypes)
Victor Vysotskiy
victor.vysotskiy at teokem.lu.se
Wed Nov 12 04:28:51 CST 2014
Dear Pavan,
Dear Xin,
I just downloaded the latest nightly MPICH3 tarball ('mpich-master-v3.1.3-174-gb0f5772f') and compiled it on our IB cluster using the Intel compilers v15.0:
%mpichversion
MPICH Version: 3.1.3
MPICH Release date: Wed Nov 12 00:00:34 CST 2014
MPICH Device: ch3:nemesis
MPICH configure: --prefix=/nobackup/global/x_vicvy/mpich3-dev.ib --with-device=ch3:nemesis:ib CC=icc CXX=icpc FC=ifort F77=ifort
MPICH CC: icc -O2
MPICH CXX: icpc -O2
MPICH F77: ifort -O2
MPICH FC: ifort -O2
Unfortunately, the test-bed code still crashes with the same error message:
%mpirun -np 8 ./mpi_tvec2_rma 64 400000
Allocating memory: win_buf=195 (Mb), loc_buf=24 (Mb)
Assertion failed in file src/mpid/ch3/src/ch3u_handle_send_req.c at line 71: win_ptr->at_completion_counter >= 0
internal ABORT - process 0
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 65516 RUNNING AT n3
= EXIT CODE: 1
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
Here I should mention one observed change: the code always crashes with 8 processes, but it sometimes works with 4 processes (!). However, even with 4 processes you can reproduce the problem by running the test-bed several times in a row, e.g.:
for i in {1..5}; do mpirun -np 4 ./mpi_tvec2_rma 64 400000; done
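Since the crash is intermittent with 4 processes, it may help to count how many of the repeated runs fail. Below is a minimal sketch, not part of the test-bed: `count_failures` is a hypothetical helper name, and it assumes that mpirun returns a non-zero exit status whenever a process aborts.

```shell
# Run a command N times and print how many runs exited with non-zero status.
# Usage: count_failures N command [args...]
# (count_failures is a hypothetical helper, not part of MPICH or the test-bed.)
count_failures() {
    cf_runs=$1
    shift
    cf_failures=0
    cf_i=1
    while [ "$cf_i" -le "$cf_runs" ]; do
        # Discard the program's output; only the exit status matters here.
        if ! "$@" >/dev/null 2>&1; then
            cf_failures=$((cf_failures + 1))
        fi
        cf_i=$((cf_i + 1))
    done
    echo "$cf_failures"
}

# Example, with the same command line as above:
# count_failures 5 mpirun -np 4 ./mpi_tvec2_rma 64 400000
```

A result greater than zero means that at least one of the runs aborted.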
To be sure that the problem is generic, I also fetched the latest commits from the mpich/master repo and recompiled MPICH3 on my laptop using GCC v4.9:
%git log --pretty=oneline --abbrev-commit -5
b0f5772 Revert RMA ADI change for req-based RMA operations.
eedd51e Delete unused variable.
9107068 Delete no longer needed file.
c235c75 Delete no longer used epoch states.
f695c96 Bug-fixing: set window state to MPIDI_RMA_NONE when UNLOCK finishes
%mpichversion
MPICH Version: 3.1.3
MPICH Release date: unreleased development copy
MPICH Device: ch3:nemesis
MPICH configure: --prefix=/opt/mpi/mpich3-dev/ --no-create --no-recursion
MPICH CC: gcc -O2
MPICH CXX: g++ -O2
MPICH F77: gfortran -O2
MPICH FC: gfortran -O2
Even on my laptop, the problem still remains:
mpirun -np 4 ./mpi_tvec2_rma 64 400000
Allocating memory: win_buf=195 (Mb), loc_buf=48 (Mb)
Allocating memory: win_buf=195 (Mb), loc_buf=48 (Mb)
Allocating memory: win_buf=195 (Mb), loc_buf=48 (Mb)
Allocating memory: win_buf=195 (Mb), loc_buf=48 (Mb)
Assertion failed in file src/mpid/ch3/src/ch3u_handle_send_req.c at line 71: win_ptr->at_completion_counter >= 0
internal ABORT - process 1
...
It would be great if you could check the issue and fix it, if needed.
With best regards,
Victor.
P.S. Just in case, please see the first message for the test-bed:
http://lists.mpich.org/pipermail/discuss/attachments/20141110/b5b34433/attachment.obj