[mpich-discuss] Assertion in MX netmod

"Antonio J. Peña" apenya at mcs.anl.gov
Mon Nov 24 09:09:15 CST 2014


Dear Kuleshov,

To free up resources for more recent networking APIs, we 
dropped support for the mx netmod, which has in fact been completely 
removed in our most recent 3.2 releases. So, unfortunately, we are not 
able to assist you with this issue.

Best regards,
   Antonio


On 11/22/2014 01:52 PM, Kuleshov Aleksey wrote:
> And here is the same problem with a different approach:
> I downloaded mpi2test.tar.gz from http://www.mcs.anl.gov/research/projects/mpi/mpi-test/tsuite.html, built it, and tried
> to run the pingping test:
>
>> MPITEST_VERBOSE=1 MPICH_NEMESIS_NETMOD=mx mpiexec -f /tmp/m -n 2 /tests/pingping
> [stdout]
> Get new datatypes: send = MPI_INT, recv = MPI_INT
> Get new datatypes: send = MPI_INT, recv = MPI_INT
> Sending count = 1 of sendtype MPI_INT of total size 4 bytes
> Sending count = 1 of sendtype MPI_INT of total size 4 bytes
> Get new datatypes: send = MPI_DOUBLE, recv = MPI_DOUBLE
> Get new datatypes: send = MPI_DOUBLE, recv = MPI_DOUBLE
> Sending count = 1 of sendtype MPI_DOUBLE of total size 8 bytes
> Sending count = 1 of sendtype MPI_DOUBLE of total size 8 bytes
> Get new datatypes: send = MPI_FLOAT_INT, recv = MPI_FLOAT_INT
> Sending count = 1 of sendtype MPI_FLOAT_INT of total size 8 bytes
> Get new datatypes: send = MPI_FLOAT_INT, recv = MPI_FLOAT_INT
> Sending count = 1 of sendtype MPI_FLOAT_INT of total size 8 bytes
> Get new datatypes: send = dup of MPI_INT, recv = dup of MPI_INT
> Get new datatypes: send = dup of MPI_INT, recv = dup of MPI_INT
> Sending count = 1 of sendtype dup of MPI_INT of total size 4 bytes
> Sending count = 1 of sendtype dup of MPI_INT of total size 4 bytes
> Get new datatypes: send = int-vector, recv = MPI_INT
> Sending count = 1 of sendtype int-vector of total size 4 bytes
> Assertion failed in file ../src/mpid/ch3/channels/nemesis/netmod/mx/mx_send.c at line 435: n_iov > 0
> internal ABORT - process 0
> [/stdout]
>
> 22.11.2014, 18:39, "Kuleshov Aleksey" <rndfax at yandex.ru>:
>> Hello! Can you please help me with a problem?
>>
>> I'm working on a custom myriexpress library and I'm using the MX netmod in MPICH v3.1.2.
>> For testing purposes I built OSU Micro Benchmarks v3.8.
>>
>> To run the osu_alltoall test on 7 nodes, I execute it as follows:
>>>   MPICH_NEMESIS_NETMOD=mx mpiexec -f /tmp/m -n 7 /osu_alltoall
>> It passed successfully (I also tried it on 2, 3, 4, 5, and 6 nodes, and everything was fine).
>>
>> But now I want to run it on 8 nodes:
>>>   MPICH_NEMESIS_NETMOD=mx mpiexec -f /tmp/m -n 8 /osu_alltoall
>> [stdout]
>> # OSU MPI All-to-All Personalized Exchange Latency Test v3.8
>> # Size       Avg Latency(us)
>> Assertion failed in file ../src/mpid/ch3/channels/nemesis/netmod/mx/mx_poll.c at line 784: n_iov > 0
>> Assertion failed in file ../src/mpid/ch3/channels/nemesis/netmod/mx/mx_poll.c at line 784: n_iov > 0
>> Assertion failed in file ../src/mpid/ch3/channels/nemesis/netmod/mx/mx_poll.c at line 784: n_iov > 0
>> Assertion failed in file ../src/mpid/ch3/channels/nemesis/netmod/mx/mx_poll.c at line 784: n_iov > 0
>> Assertion failed in file ../src/mpid/ch3/channels/nemesis/netmod/mx/mx_poll.c at line 784: n_iov > 0
>> Assertion failed in file ../src/mpid/ch3/channels/nemesis/netmod/mx/mx_poll.c at line 784: n_iov > 0
>> Assertion failed in file ../src/mpid/ch3/channels/nemesis/netmod/mx/mx_poll.c at line 784: n_iov > 0
>> Assertion failed in file ../src/mpid/ch3/channels/nemesis/netmod/mx/mx_poll.c at line 784: n_iov > 0
>> internal ABORT - process 4
>> internal ABORT - process 7
>> internal ABORT - process 2
>> internal ABORT - process 6
>> internal ABORT - process 3
>> internal ABORT - process 0
>> internal ABORT - process 5
>> internal ABORT - process 1
>> [/stdout]
>>
>> So, what do these assertions mean?
>> Is something wrong with the MX netmod?
>> Or with the myriexpress library?
>> Or with the osu_alltoall test itself?
>>
>> BTW, osu_alltoall on 8 nodes passed successfully with the TCP netmod.
>> _______________________________________________
>> discuss mailing list     discuss at mpich.org
>> To manage subscription options or unsubscribe:
>> https://lists.mpich.org/mailman/listinfo/discuss


-- 
Antonio J. Peña
Postdoctoral Appointee
Mathematics and Computer Science Division
Argonne National Laboratory
9700 South Cass Avenue, Bldg. 240, Of. 3148
Argonne, IL 60439-4847
apenya at mcs.anl.gov
www.mcs.anl.gov/~apenya
