[mpich-discuss] Assertion in MX netmod

Kuleshov Aleksey rndfax at yandex.ru
Sat Nov 22 13:52:58 CST 2014


And the same problem shows up with a different approach:
I downloaded mpi2test.tar.gz from http://www.mcs.anl.gov/research/projects/mpi/mpi-test/tsuite.html, built it and tried
to run the pingping test:

> MPITEST_VERBOSE=1 MPICH_NEMESIS_NETMOD=mx mpiexec -f /tmp/m -n 2 /tests/pingping
[stdout]
Get new datatypes: send = MPI_INT, recv = MPI_INT
Get new datatypes: send = MPI_INT, recv = MPI_INT
Sending count = 1 of sendtype MPI_INT of total size 4 bytes
Sending count = 1 of sendtype MPI_INT of total size 4 bytes
Get new datatypes: send = MPI_DOUBLE, recv = MPI_DOUBLE
Get new datatypes: send = MPI_DOUBLE, recv = MPI_DOUBLE
Sending count = 1 of sendtype MPI_DOUBLE of total size 8 bytes
Sending count = 1 of sendtype MPI_DOUBLE of total size 8 bytes
Get new datatypes: send = MPI_FLOAT_INT, recv = MPI_FLOAT_INT
Sending count = 1 of sendtype MPI_FLOAT_INT of total size 8 bytes
Get new datatypes: send = MPI_FLOAT_INT, recv = MPI_FLOAT_INT
Sending count = 1 of sendtype MPI_FLOAT_INT of total size 8 bytes
Get new datatypes: send = dup of MPI_INT, recv = dup of MPI_INT
Get new datatypes: send = dup of MPI_INT, recv = dup of MPI_INT
Sending count = 1 of sendtype dup of MPI_INT of total size 4 bytes
Sending count = 1 of sendtype dup of MPI_INT of total size 4 bytes
Get new datatypes: send = int-vector, recv = MPI_INT
Sending count = 1 of sendtype int-vector of total size 4 bytes
Assertion failed in file ../src/mpid/ch3/channels/nemesis/netmod/mx/mx_send.c at line 435: n_iov > 0
internal ABORT - process 0
[/stdout]
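
The abort happens right after the first non-contiguous transfer (the int-vector type sent into a plain MPI_INT receive), so the n_iov > 0 assertion seems tied to sends that have to be described with an iovec. Below is a small stand-alone reduction of that pattern I put together; the vector parameters are my own guess, not taken from the test source, so it may or may not hit the same assertion:

/* Rough reduction of the failing step: a strided "int-vector" sent from
 * rank 0 into a plain MPI_INT buffer on rank 1.  The vector parameters
 * are assumed, not taken from the test suite. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank;
    int sendbuf[4] = {1, -1, 2, -1};   /* elements 0 and 2 are the payload */
    int recvbuf[2] = {0, 0};
    MPI_Datatype vec;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* 2 blocks of 1 int with stride 2: a non-contiguous layout the netmod
     * has to express as an iovec. */
    MPI_Type_vector(2, 1, 2, MPI_INT, &vec);
    MPI_Type_commit(&vec);

    if (rank == 0)
        MPI_Send(sendbuf, 1, vec, 1, 0, MPI_COMM_WORLD);
    else if (rank == 1) {
        MPI_Recv(recvbuf, 2, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("received %d %d\n", recvbuf[0], recvbuf[1]);
    }

    MPI_Type_free(&vec);
    MPI_Finalize();
    return 0;
}

I build it with mpicc and run it with the same mpiexec line as above (-f /tmp/m, -n 2, MPICH_NEMESIS_NETMOD=mx).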

22.11.2014, 18:39, "Kuleshov Aleksey" <rndfax at yandex.ru>:
> Hello! Can you please help me with a problem?
>
> I'm working on a custom myriexpress library and I'm using the MX netmod in MPICH v3.1.2.
> For testing purposes I built the OSU Micro-Benchmarks v3.8.
>
> To run it on 7 nodes I execute the osu_alltoall test as follows:
>>  MPICH_NEMESIS_NETMOD=mx mpiexec -f /tmp/m -n 7 /osu_alltoall
>
> It passed successfully (I also tried it on 2, 3, 4, 5 and 6 nodes - everything works fine).
>
> But now I want to run it on 8 nodes:
>>  MPICH_NEMESIS_NETMOD=mx mpiexec -f /tmp/m -n 8 /osu_alltoall
>
> [stdout]
> # OSU MPI All-to-All Personalized Exchange Latency Test v3.8
> # Size       Avg Latency(us)
> Assertion failed in file ../src/mpid/ch3/channels/nemesis/netmod/mx/mx_poll.c at line 784: n_iov > 0
> Assertion failed in file ../src/mpid/ch3/channels/nemesis/netmod/mx/mx_poll.c at line 784: n_iov > 0
> Assertion failed in file ../src/mpid/ch3/channels/nemesis/netmod/mx/mx_poll.c at line 784: n_iov > 0
> Assertion failed in file ../src/mpid/ch3/channels/nemesis/netmod/mx/mx_poll.c at line 784: n_iov > 0
> Assertion failed in file ../src/mpid/ch3/channels/nemesis/netmod/mx/mx_poll.c at line 784: n_iov > 0
> Assertion failed in file ../src/mpid/ch3/channels/nemesis/netmod/mx/mx_poll.c at line 784: n_iov > 0
> Assertion failed in file ../src/mpid/ch3/channels/nemesis/netmod/mx/mx_poll.c at line 784: n_iov > 0
> Assertion failed in file ../src/mpid/ch3/channels/nemesis/netmod/mx/mx_poll.c at line 784: n_iov > 0
> internal ABORT - process 4
> internal ABORT - process 7
> internal ABORT - process 2
> internal ABORT - process 6
> internal ABORT - process 3
> internal ABORT - process 0
> internal ABORT - process 5
> internal ABORT - process 1
> [/stdout]
>
> So, what do these assertions mean?
> Is something wrong with the MX netmod?
> Or with the myriexpress library?
> Or with the osu_alltoall test itself?
>
> BTW, osu_alltoall on 8 nodes passes successfully with the TCP netmod.
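
To help separate the netmod from the benchmark, I also have a bare MPI_Alltoall of one int per rank (my own minimal sketch below, not the OSU code) that should exercise the same collective when run with -n 8:

/* Bare MPI_Alltoall of one int per rank; my own minimal sketch,
 * not taken from the OSU benchmark sources. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size, i;
    int *sendbuf, *recvbuf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    sendbuf = malloc(size * sizeof(int));
    recvbuf = malloc(size * sizeof(int));
    for (i = 0; i < size; i++)
        sendbuf[i] = rank * 100 + i;   /* distinct value for every destination */

    MPI_Alltoall(sendbuf, 1, MPI_INT, recvbuf, 1, MPI_INT, MPI_COMM_WORLD);

    printf("rank %d: recvbuf[0] = %d, recvbuf[%d] = %d\n",
           rank, recvbuf[0], size - 1, recvbuf[size - 1]);

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}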