[mpich-discuss] Missing Nemesis ib
Jason Collins
jasoncollinsw at gmail.com
Thu Sep 21 03:04:21 CDT 2017
I tried running the test in another way to get more information:
# mpirun -n 10 ./mxm_perftest
--------------------------------------------------------------------------
Failed to register memory region (MR):
Hostname: compute1
Address: 1d14000
Length: 20480
Error: No space left on device
--------------------------------------------------------------------------
--------------------------------------------------------------------------
Open MPI has detected that there are UD-capable Verbs devices on your
system, but none of them were able to be setup properly. This may
indicate a problem on this system.
You job will continue, but Open MPI will ignore the "ud" oob component
in this run.
Hostname: compute1
--------------------------------------------------------------------------
[1505980772.899531] [compute1:59182:0] sys.c:744 MXM WARN
Conflicting CPU frequencies detected, using: 3599.84
Waiting for connection...
[1505980772.900947] [compute1:59183:0] sys.c:744 MXM WARN
Conflicting CPU frequencies detected, using: 3599.84
[1505980772.902329] [compute1:59184:0] sys.c:744 MXM WARN
Conflicting CPU frequencies detected, using: 3599.84
[1505980772.903490] [compute1:59185:0] sys.c:744 MXM WARN
Conflicting CPU frequencies detected, using: 3599.84
[1505980772.904984] [compute1:59186:0] sys.c:744 MXM WARN
Conflicting CPU frequencies detected, using: 3599.84
[1505980772.906288] [compute1:59187:0] sys.c:744 MXM WARN
Conflicting CPU frequencies detected, using: 3599.84
[1505980772.907957] [compute1:59188:0] sys.c:744 MXM WARN
Conflicting CPU frequencies detected, using: 3599.84
[1505980772.909023] [compute1:59189:0] sys.c:744 MXM WARN
Conflicting CPU frequencies detected, using: 3599.84
bind() failed: Address already in use
bind() failed: Address already in use
bind() failed: Address already in use
bind() failed: Address already in use
bind() failed: Address already in use
bind() failed: Address already in use
bind() failed: Address already in use
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
bind() failed: Address already in use
[1505980772.910503] [compute1:59190:0] sys.c:744 MXM WARN
Conflicting CPU frequencies detected, using: 3599.84
bind() failed: Address already in use
[1505980772.911893] [compute1:59191:0] sys.c:744 MXM WARN
Conflicting CPU frequencies detected, using: 3599.84
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[55900,1],1]
Exit code: 255
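
Two separate things seem to be going on in that output. The "Failed to
register memory region ... No space left on device" message usually points
at the locked-memory (memlock) limit being too low for verbs to register
memory, and mxm_perftest is a standalone client/server benchmark, so
launching 10 copies of it under mpirun makes every copy try to bind the
same listening port, which explains the repeated "bind() failed: Address
already in use" lines. (Note also that, judging by the help texts, the
mpirun used here is Open MPI's, not MPICH's mpiexec.) A minimal check for
the memlock side, assuming that limit is the culprit (not verified on this
system):

# ulimit -l
(should print "unlimited", or at least a large value, for verbs memory
registration; if it is small, typical /etc/security/limits.conf entries are
"* soft memlock unlimited" and "* hard memlock unlimited", applied after a
fresh login)
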
On Thu, Sep 21, 2017 at 7:58, Jason Collins (<jasoncollinsw at gmail.com>)
wrote:
> I ran the test and the result was the following:
>
> # ./mxm_perftest
>
> [1505976675.346380] [compute1:55801:0] sys.c:744 MXM WARN
> Conflicting CPU frequencies detected, using: 3600.52
>
> Waiting for connection...
>
>
> It does nothing else; it just sits there waiting to establish a connection.
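
For what it is worth, "Waiting for connection..." is the expected behaviour
when mxm_perftest is started with no arguments: it acts as the server side
and waits for a client. A two-sided run would look roughly like the
following (a sketch; the hostname is illustrative and the tool's built-in
help lists the exact test names and options):

# ./mxm_perftest
(on the server node, as above)

# ./mxm_perftest compute1
(on a second node, pointing the client at the server's hostname)
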
>
> On Wed, Sep 20, 2017 at 17:12, Halim Amer (<aamer at anl.gov>) wrote:
>
>> It seems you have a mismatch in the OFED stack. Try installing the Mellanox
>> OFED stack if you are using the bundled OFED stack right now.
>>
>> Make sure MXM works before trying MPICH. Use the mxm/bin/mxm_perftest
>> tool from your MXM installation to test that things work properly. If it
>> doesn't work, contact your admin or Mellanox, because it is not an
>> MPICH problem.
>>
>> Halim
>> www.mcs.anl.gov/~aamer
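
As a quick way to see which verbs/OFED stack is actually installed and
whether the HCA port is up, something like the following can help (a
sketch; ofed_info ships with Mellanox OFED, while ibv_devinfo and ibstat
come from the standard libibverbs/infiniband-diags utilities):

# ofed_info -s
(prints the installed OFED release string; Mellanox OFED only)

# ibv_devinfo
(lists the HCA and its ports; the port state should be PORT_ACTIVE)

# ibstat
(shows per-port link state and rate)
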
>>
>> On 9/19/17 7:14 AM, Jason Collins wrote:
>> > Thank you very much.
>> >
>> > I have compiled with "CH3:nemesis:mxm". The compilation was successful.
>> >
>> > Now I have a new problem. I ran the "./cpi" test and got the
>> > following error.
>> >
>> > # mpiexec -f hosts -n 4 ./cpi
>> > [1505822776.546898] [compute1:16212:0] sys.c:744 MXM WARN Conflicting
>> > CPU frequencies detected, using: 3459.84
>> > [1505822776.546898] [compute1:16213:0] sys.c:744 MXM WARN Conflicting
>> > CPU frequencies detected, using: 3459.84
>> > [1505822776.546951] [compute1:16216:0] sys.c:744 MXM WARN Conflicting
>> > CPU frequencies detected, using: 3459.84
>> > [1505822776.547039] [compute1:16214:0] sys.c:744 MXM WARN Conflicting
>> > CPU frequencies detected, using: 3459.84
>> > [1505822776.561357] [compute1:16214:0] ib_dev.c:533 MXM WARN failed call
>> > to ibv_exp_use_priv_env(): Function not implemented
>> > [1505822776.561371] [compute1:16214:0] ib_dev.c:544 MXM ERROR
>> > ibv_query_device() returned 38: Function not implemented
>> > [1505822776.561386] [compute1:16218:0] ib_dev.c:533 MXM WARN failed call
>> > to ibv_exp_use_priv_env(): Function not implemented
>> > [1505822776.561396] [compute1:16218:0] ib_dev.c:544 MXM ERROR
>> > ibv_query_device() returned 38: Function not implemented
>> > [1505822776.561426] [compute1:16225:0] ib_dev.c:533 MXM WARN failed call
>> > to ibv_exp_use_priv_env(): Function not implemented
>> > [1505822776.561442] [compute1:16225:0] ib_dev.c:544 MXM ERROR
>> > ibv_query_device() returned 38: Function not implemented
>> > Fatal error in MPI_Init: Other MPI error, error stack:
>> > MPIR_Init_thread(474).........:
>> > MPID_Init(190)................: channel initialization failed
>> > MPIDI_CH3_Init(89)............:
>> > MPID_nem_init(320)............:
>> > MPID_nem_mxm_init(158)........:
>> > MPID_nem_mxm_get_ordering(464): mxm_init failed (Input/output error)
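
If more detail is needed from MXM itself at this point, raising its log
level is one option (a sketch; MXM_LOG_LEVEL is the logging knob documented
for Mellanox MXM/HPC-X, and -genv is Hydra's way of exporting an environment
variable to all ranks):

# mpiexec -genv MXM_LOG_LEVEL debug -f hosts -n 4 ./cpi
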
>> >
>> >
>> > On Fri, Sep 15, 2017 at 16:01, Halim Amer (<aamer at anl.gov>) wrote:
>> >
>> >     The "nemesis:ib" netmod does not exist anymore. Try "ch3:nemesis:mxm"
>> >     with a dependency on Mellanox's MXM library (which can be obtained from
>> >     the HPCX package at www.mellanox.com/products/hpcx) or "ch3:nemesis:ofi"
>> >     with a dependency on libfabric (which would be built to support the IB
>> >     or MXM providers; see https://ofiwg.github.io/libfabric/).
>> >
>> >     Halim
>> >     www.mcs.anl.gov/~aamer
>> >
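
To make the two suggestions above concrete, the configure invocations would
look roughly like this (a sketch; the install paths are placeholders, and
the --with-mxm / --with-ofi option names are worth confirming against
./configure --help for the MPICH version in use):

# ./configure --prefix=/my/path --with-device=ch3:nemesis:mxm \
      --with-mxm=/path/to/mxm
(MXM netmod, pointing at the MXM installation, e.g. the one shipped in HPC-X)

# ./configure --prefix=/my/path --with-device=ch3:nemesis:ofi \
      --with-ofi=/path/to/libfabric
(OFI netmod, pointing at a libfabric build that includes the verbs or mxm
provider)
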
>> > On 9/15/17 4:20 AM, Jason Collins wrote:
>> > > Hello everyone.
>> > >
>> > > Recently, I downloaded MPICH 3.2.
>> > >
>> > > I want to configure it with support for InfiniBand. I used the following
>> > > command:
>> > >
>> > > # ./configure --prefix=/my/path --with-device=ch3:nemesis:ib
>> > >
>> > > And I get the following error:
>> > >
>> > > configure: error: Network module ib is unknown
>> > > "./src/mpid/ch3/channels/nemesis/netmod/ib"
>> > >
>> > > When I check that path, I can confirm that the "ib" folder does not
>> > > exist inside the "netmod" folder. How can this be solved?
>> > >
>> > > Many thanks.
>> > >
_______________________________________________
discuss mailing list discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss