[mpich-discuss] Assertion in MX netmod

Kuleshov Aleksey rndfax at yandex.ru
Sat Nov 22 09:39:14 CST 2014


Hello! Could you please help me with a problem?

I'm working on a custom myriexpress library and am using the MX netmod in MPICH v3.1.2.
For testing purposes, I built OSU Micro-Benchmarks v3.8.

To run the osu_alltoall test on 7 nodes, I execute:
> MPICH_NEMESIS_NETMOD=mx mpiexec -f /tmp/m -n 7 /osu_alltoall

It passes successfully (I also tried it on 2, 3, 4, 5, and 6 nodes, and everything works).

But when I run it on 8 nodes, it fails:

> MPICH_NEMESIS_NETMOD=mx mpiexec -f /tmp/m -n 8 /osu_alltoall
[stdout]
# OSU MPI All-to-All Personalized Exchange Latency Test v3.8
# Size       Avg Latency(us)
Assertion failed in file ../src/mpid/ch3/channels/nemesis/netmod/mx/mx_poll.c at line 784: n_iov > 0
Assertion failed in file ../src/mpid/ch3/channels/nemesis/netmod/mx/mx_poll.c at line 784: n_iov > 0
Assertion failed in file ../src/mpid/ch3/channels/nemesis/netmod/mx/mx_poll.c at line 784: n_iov > 0
Assertion failed in file ../src/mpid/ch3/channels/nemesis/netmod/mx/mx_poll.c at line 784: n_iov > 0
Assertion failed in file ../src/mpid/ch3/channels/nemesis/netmod/mx/mx_poll.c at line 784: n_iov > 0
Assertion failed in file ../src/mpid/ch3/channels/nemesis/netmod/mx/mx_poll.c at line 784: n_iov > 0
Assertion failed in file ../src/mpid/ch3/channels/nemesis/netmod/mx/mx_poll.c at line 784: n_iov > 0
Assertion failed in file ../src/mpid/ch3/channels/nemesis/netmod/mx/mx_poll.c at line 784: n_iov > 0
internal ABORT - process 4
internal ABORT - process 7
internal ABORT - process 2
internal ABORT - process 6
internal ABORT - process 3
internal ABORT - process 0
internal ABORT - process 5
internal ABORT - process 1
[/stdout]

So, what do these assertions mean?
Is something wrong with the MX netmod?
Or with the myriexpress library?
Or with the osu_alltoall test itself?

BTW, osu_alltoall on 8 nodes passes successfully with the TCP netmod.


