[mpich-discuss] Missing Nemesis ib

Jason Collins jasoncollinsw at gmail.com
Thu Sep 21 03:04:21 CDT 2017


I tried running the test another way to get more information:

# mpirun -n 10 ./mxm_perftest

--------------------------------------------------------------------------
Failed to register memory region (MR):

Hostname: compute1
Address:  1d14000
Length:   20480
Error:    No space left on device
--------------------------------------------------------------------------
--------------------------------------------------------------------------
Open MPI has detected that there are UD-capable Verbs devices on your
system, but none of them were able to be setup properly.  This may
indicate a problem on this system.

You job will continue, but Open MPI will ignore the "ud" oob component
in this run.

Hostname: compute1
--------------------------------------------------------------------------
[1505980772.899531] [compute1:59182:0]         sys.c:744  MXM  WARN  Conflicting CPU frequencies detected, using: 3599.84
Waiting for connection...
[1505980772.900947] [compute1:59183:0]         sys.c:744  MXM  WARN  Conflicting CPU frequencies detected, using: 3599.84
[1505980772.902329] [compute1:59184:0]         sys.c:744  MXM  WARN  Conflicting CPU frequencies detected, using: 3599.84
[1505980772.903490] [compute1:59185:0]         sys.c:744  MXM  WARN  Conflicting CPU frequencies detected, using: 3599.84
[1505980772.904984] [compute1:59186:0]         sys.c:744  MXM  WARN  Conflicting CPU frequencies detected, using: 3599.84
[1505980772.906288] [compute1:59187:0]         sys.c:744  MXM  WARN  Conflicting CPU frequencies detected, using: 3599.84
[1505980772.907957] [compute1:59188:0]         sys.c:744  MXM  WARN  Conflicting CPU frequencies detected, using: 3599.84
[1505980772.909023] [compute1:59189:0]         sys.c:744  MXM  WARN  Conflicting CPU frequencies detected, using: 3599.84
bind() failed: Address already in use
bind() failed: Address already in use
bind() failed: Address already in use
bind() failed: Address already in use
bind() failed: Address already in use
bind() failed: Address already in use
bind() failed: Address already in use
-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
bind() failed: Address already in use
[1505980772.910503] [compute1:59190:0]         sys.c:744  MXM  WARN  Conflicting CPU frequencies detected, using: 3599.84
bind() failed: Address already in use
[1505980772.911893] [compute1:59191:0]         sys.c:744  MXM  WARN  Conflicting CPU frequencies detected, using: 3599.84
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[55900,1],1]
  Exit code:    255

On Thu, Sep 21, 2017 at 7:58, Jason Collins (<jasoncollinsw at gmail.com>)
wrote:

> I ran the test and the result was the following:
>
> # ./mxm_perftest
>
> [1505976675.346380] [compute1:55801:0]         sys.c:744  MXM  WARN  Conflicting CPU frequencies detected, using: 3600.52
>
> Waiting for connection...
>
>
> It does nothing else; it just keeps waiting for a connection.
>
> On Wed, Sep 20, 2017 at 17:12, Halim Amer (<aamer at anl.gov>) wrote:
>
>> It seems you have a mismatch in the OFED stack. Try installing the Mellanox
>> OFED stack if you are currently using the bundled OFED stack.
>>
>> Make sure MXM works before trying MPICH. Use mxm/bin/mxm_perftest from
>> your MXM installation to check that things work properly. If it does not
>> work, contact your admin or Mellanox, because it is not an MPICH problem.
>>
>> Halim
>> www.mcs.anl.gov/~aamer
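A minimal sketch of the MXM sanity check suggested above, assuming mxm_perftest follows the usual client/server convention for point-to-point benchmarks: started with no arguments it acts as a server (which matches the "Waiting for connection..." line in the test output), and a second copy given the server's hostname connects as the client. The install path and the second host name compute2 are hypothetical placeholders; check the tool's own help output for the exact options. If this convention holds, launching it with `mpirun -n 10` starts ten servers on one host that compete for the same port, which would plausibly explain the repeated `bind() failed: Address already in use` lines.

```shell
# Sketch only: /path/to/mxm and the host compute2 are hypothetical placeholders.

# On compute1: start the server side (no arguments); it should print
# "Waiting for connection..." and block until a client connects.
/path/to/mxm/bin/mxm_perftest

# On compute2 (assumed second node): run the client, passing the server's
# hostname so the two instances can exchange messages over MXM.
/path/to/mxm/bin/mxm_perftest compute1
```

Only after this pair completes without MXM errors is it worth retrying the MPICH build.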
>>
>> On 9/19/17 7:14 AM, Jason Collins wrote:
>> > Thank you very much.
>> >
>> > I have compiled with "ch3:nemesis:mxm". The compilation was successful.
>> >
>> > Now I have a new problem. I ran the "./cpi" test and got the
>> > following error.
>> >
>> > # mpiexec -f hosts -n 4 ./cpi
>> > [1505822776.546898] [compute1:16212:0] sys.c:744 MXM WARN Conflicting
>> > CPU frequencies detected, using: 3459.84
>> > [1505822776.546898] [compute1:16213:0] sys.c:744 MXM WARN Conflicting
>> > CPU frequencies detected, using: 3459.84
>> > [1505822776.546951] [compute1:16216:0] sys.c:744 MXM WARN Conflicting
>> > CPU frequencies detected, using: 3459.84
>> > [1505822776.547039] [compute1:16214:0] sys.c:744 MXM WARN Conflicting
>> > CPU frequencies detected, using: 3459.84
>> > [1505822776.561357] [compute1:16214:0] ib_dev.c:533 MXM WARN failed call
>> > to ibv_exp_use_priv_env(): Function not implemented
>> > [1505822776.561371] [compute1:16214:0] ib_dev.c:544 MXM ERROR
>> > ibv_query_device() returned 38: Function not implemented
>> > [1505822776.561386] [compute1:16218:0] ib_dev.c:533 MXM WARN failed call
>> > to ibv_exp_use_priv_env(): Function not implemented
>> > [1505822776.561396] [compute1:16218:0] ib_dev.c:544 MXM ERROR
>> > ibv_query_device() returned 38: Function not implemented
>> > [1505822776.561426] [compute1:16225:0] ib_dev.c:533 MXM WARN failed call
>> > to ibv_exp_use_priv_env(): Function not implemented
>> > [1505822776.561442] [compute1:16225:0] ib_dev.c:544 MXM ERROR
>> > ibv_query_device() returned 38: Function not implemented
>> > Fatal error in MPI_Init: Other MPI error, error stack:
>> > MPIR_Init_thread(474).........:
>> > MPID_Init(190)................: channel initialization failed
>> > MPIDI_CH3_Init(89)............:
>> > MPID_nem_init(320)............:
>> > MPID_nem_mxm_init(158)........:
>> > MPID_nem_mxm_get_ordering(464): mxm_init failed (Input/output error)
>> >
>> >
>> > On Fri, Sep 15, 2017 at 16:01, Halim Amer (<aamer at anl.gov>) wrote:
>> >
>> >     The "nemesis:ib" netmod does not exist anymore. Try "ch3:nemesis:mxm",
>> >     which depends on Mellanox's MXM library (available in the HPCX
>> >     package at www.mellanox.com/products/hpcx), or "ch3:nemesis:ofi",
>> >     which depends on libfabric built with support for the IB or MXM
>> >     providers (see https://ofiwg.github.io/libfabric/).
>> >
>> >     Halim
>> >     www.mcs.anl.gov/~aamer
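A minimal sketch of how the suggested netmod switch might look at configure time. The `--with-mxm` option name is an assumption here, as are the placeholder paths; the first command lists the MXM- and OFI-related options that the MPICH 3.2 configure script actually accepts, so the exact names can be confirmed before building.

```shell
# Sketch only: /my/path and /path/to/hpcx/mxm are placeholders.

# Show which MXM/OFI options this MPICH release really supports.
./configure --help | grep -i -e mxm -e ofi

# Configure the ch3:nemesis:mxm netmod against an MXM installation
# (for example, the one shipped inside HPCX). --with-mxm is assumed;
# confirm it in the help output above.
./configure --prefix=/my/path --with-device=ch3:nemesis:mxm --with-mxm=/path/to/hpcx/mxm
make && make install
```

The ch3:nemesis:ofi route follows the same pattern, with the libfabric location option taken from the same help output.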
>> >
>> >     On 9/15/17 4:20 AM, Jason Collins wrote:
>> >      > Hello everyone.
>> >      >
>> >      > I recently downloaded MPICH 3.2.
>> >      >
>> >      > I want to configure it with InfiniBand support. I used the following
>> >      > command:
>> >      >
>> >      > # ./configure --prefix=/my/path --with-device=ch3:nemesis:ib
>> >      >
>> >      > And I get the following error:
>> >      >
>> >      > configure: error: Network module ib is unknown
>> >      > "./src/mpid/ch3/channels/nemesis/netmod/ib"
>> >      >
>> >      > When I checked the path, I confirmed that there is no "ib" folder
>> >      > inside the "netmod" folder. How can this be solved?
>> >      >
>> >      > Many thanks.
>> >      >
>> >      >
>> >      >
>> >      > _______________________________________________
>> >      > discuss mailing list discuss at mpich.org <mailto:discuss at mpich.org>
>> >      > To manage subscription options or unsubscribe:
>> >      > https://lists.mpich.org/mailman/listinfo/discuss
>> >      >
>> >
>> >
>> >
>> >
>
>