<meta http-equiv="Content-Type" content="text/html; charset=utf-8"><div dir="ltr">I also ran the test under mpirun to get more information:<br><br><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"># mpirun -n 10 ./mxm_perftest <br></p>
<p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures">--------------------------------------------------------------------------</span></p>
<p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures">Failed to register memory region (MR):</span></p>
<p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo;min-height:13px"><span style="font-variant-ligatures:no-common-ligatures"></span><br></p>
<p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures">Hostname: compute1</span></p>
<p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures">Address: 1d14000</span></p>
<p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures">Length: 20480</span></p>
<p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures">Error: No space left on device</span></p>
<p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures">--------------------------------------------------------------------------</span></p>
<p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures">--------------------------------------------------------------------------</span></p>
<p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures">Open MPI has detected that there are UD-capable Verbs devices on your</span></p>
<p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures">system, but none of them were able to be setup properly. This may</span></p>
<p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures">indicate a problem on this system.</span></p>
<p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo;min-height:13px"><span style="font-variant-ligatures:no-common-ligatures"></span><br></p>
<p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures">You job will continue, but Open MPI will ignore the "ud" oob component</span></p>
<p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures">in this run.</span></p>
<p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo;min-height:13px"><span style="font-variant-ligatures:no-common-ligatures"></span><br></p>
<p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures">Hostname: </span>compute1</p>
<p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures">--------------------------------------------------------------------------</span></p>
<p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures">[1505980772.899531] [compute1:59182:0] sys.c:744 MXM WARN Conflicting CPU frequencies detected, using: 3599.84</span></p>
<p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures">Waiting for connection...</span></p>
<p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures">[1505980772.900947] [compute1:59183:0] sys.c:744 MXM WARN Conflicting CPU frequencies detected, using: 3599.84</span></p>
<p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures">[1505980772.902329] [compute1:59184:0] sys.c:744 MXM WARN Conflicting CPU frequencies detected, using: 3599.84</span></p>
<p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures">[1505980772.903490] [compute1:59185:0] sys.c:744 MXM WARN Conflicting CPU frequencies detected, using: 3599.84</span></p>
<p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures">[1505980772.904984] [compute1:59186:0] sys.c:744 MXM WARN Conflicting CPU frequencies detected, using: 3599.84</span></p>
<p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures">[1505980772.906288] [compute1:59187:0] sys.c:744 MXM WARN Conflicting CPU frequencies detected, using: 3599.84</span></p>
<p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures">[1505980772.907957] [compute1:59188:0] sys.c:744 MXM WARN Conflicting CPU frequencies detected, using: 3599.84</span></p>
<p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures">[1505980772.909023] [compute1:59189:0] sys.c:744 MXM WARN Conflicting CPU frequencies detected, using: 3599.84</span></p>
<p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures">bind() failed: Address already in use</span></p>
<p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures">bind() failed: Address already in use</span></p>
<p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures">bind() failed: Address already in use</span></p>
<p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures">bind() failed: Address already in use</span></p>
<p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures">bind() failed: Address already in use</span></p>
<p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures">bind() failed: Address already in use</span></p>
<p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures">bind() failed: Address already in use</span></p>
<p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures">-------------------------------------------------------</span></p>
<p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures">Primary job terminated normally, but 1 process returned</span></p>
<p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures">a non-zero exit code.. Per user-direction, the job has been aborted.</span></p>
<p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures">-------------------------------------------------------</span></p>
<p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures">bind() failed: Address already in use</span></p>
<p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures">[1505980772.910503] [compute1:59190:0] sys.c:744 MXM WARN Conflicting CPU frequencies detected, using: 3599.84</span></p>
<p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures">bind() failed: Address already in use</span></p>
<p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures">[1505980772.911893] [compute1:59191:0] sys.c:744 MXM WARN Conflicting CPU frequencies detected, using: 3599.84</span></p>
<p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures">--------------------------------------------------------------------------</span></p>
<p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures">mpirun detected that one or more processes exited with non-zero status, thus causing</span></p>
<p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures">the job to be terminated. The first process to do so was:</span></p>
<p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo;min-height:13px"><span style="font-variant-ligatures:no-common-ligatures"></span><br></p>
<p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures"> Process name: [[55900,1],1]</span></p>
<p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures"> Exit code: 255</span></p></div><br><div class="gmail_quote"><div dir="ltr">On Thu, Sep 21, 2017 at 7:58, Jason Collins (<<a href="mailto:jasoncollinsw@gmail.com">jasoncollinsw@gmail.com</a>>) wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">I ran the test and the result was the following:<div><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures"># ./mxm_perftest </span></p>
<p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures">[1505976675.346380] [compute1:55801:0] sys.c:744 MXM WARN Conflicting CPU frequencies detected, using: 3600.52</span></p>
<p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures">Waiting for connection...</span></p><p style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo"><span style="font-variant-ligatures:no-common-ligatures"><br></span></p>It does nothing else; it just keeps waiting for a connection.<br></div></div><br><div class="gmail_quote"><div dir="ltr">On Wed, Sep 20, 2017 at 17:12, Halim Amer (<<a href="mailto:aamer@anl.gov" target="_blank">aamer@anl.gov</a>>) wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">It seems you have a mismatch in the OFED stack. Try installing the Mellanox<br>
OFED stack if you are using the bundled OFED stack right now.<br>
<br>
Make sure MXM works before trying MPICH. Use the mxm/bin/mxm_perftest<br>
from your MXM installation to test that things work properly. If it<br>
doesn't work, contact your admin or Mellanox, because it is not an<br>
MPICH problem.<br>
<br>
Halim<br>
<a href="http://www.mcs.anl.gov/~aamer" rel="noreferrer" target="_blank">www.mcs.anl.gov/~aamer</a><br>
<br>
On 9/19/17 7:14 AM, Jason Collins wrote:<br>
> Thank you very much.<br>
><br>
> I have compiled with "CH3:nemesis:mxm". The compilation was successful.<br>
><br>
> Now I have a new problem. I ran the "./cpi" test and got the<br>
> following error:<br>
><br>
> # mpiexec -f hosts -n 4 ./cpi<br>
> [1505822776.546898] [compute1:16212:0] sys.c:744 MXM WARN Conflicting<br>
> CPU frequencies detected, using: 3459.84<br>
> [1505822776.546898] [compute1:16213:0] sys.c:744 MXM WARN Conflicting<br>
> CPU frequencies detected, using: 3459.84<br>
> [1505822776.546951] [compute1:16216:0] sys.c:744 MXM WARN Conflicting<br>
> CPU frequencies detected, using: 3459.84<br>
> [1505822776.547039] [compute1:16214:0] sys.c:744 MXM WARN Conflicting<br>
> CPU frequencies detected, using: 3459.84<br>
> [1505822776.561357] [compute1:16214:0] ib_dev.c:533 MXM WARN failed call<br>
> to ibv_exp_use_priv_env(): Function not implemented<br>
> [1505822776.561371] [compute1:16214:0] ib_dev.c:544 MXM ERROR<br>
> ibv_query_device() returned 38: Function not implemented<br>
> [1505822776.561386] [compute1:16218:0] ib_dev.c:533 MXM WARN failed call<br>
> to ibv_exp_use_priv_env(): Function not implemented<br>
> [1505822776.561396] [compute1:16218:0] ib_dev.c:544 MXM ERROR<br>
> ibv_query_device() returned 38: Function not implemented<br>
> [1505822776.561426] [compute1:16225:0] ib_dev.c:533 MXM WARN failed call<br>
> to ibv_exp_use_priv_env(): Function not implemented<br>
> [1505822776.561442] [compute1:16225:0] ib_dev.c:544 MXM ERROR<br>
> ibv_query_device() returned 38: Function not implemented<br>
> Fatal error in MPI_Init: Other MPI error, error stack:<br>
> MPIR_Init_thread(474).........:<br>
> MPID_Init(190)................: channel initialization failed<br>
> MPIDI_CH3_Init(89)............:<br>
> MPID_nem_init(320)............:<br>
> MPID_nem_mxm_init(158)........:<br>
> MPID_nem_mxm_get_ordering(464): mxm_init failed (Input/output error)<br>
><br>
><br>
> On Fri, Sep 15, 2017 at 16:01, Halim Amer (<<a href="mailto:aamer@anl.gov" target="_blank">aamer@anl.gov</a><br>
> <mailto:<a href="mailto:aamer@anl.gov" target="_blank">aamer@anl.gov</a>>>) wrote:<br>
><br>
> The "nemesis:ib" netmod does not exist anymore. Try "ch3:nemesis:mxm"<br>
> with a dependency on Mellanox's MXM library (can be obtained from the<br>
> HPCX package at <a href="http://www.mellanox.com/products/hpcx" rel="noreferrer" target="_blank">www.mellanox.com/products/hpcx</a><br>
> <<a href="http://www.mellanox.com/products/hpcx" rel="noreferrer" target="_blank">http://www.mellanox.com/products/hpcx</a>>) or "ch3:nemesis:ofi"<br>
> with a dependency on libfabric (which would be built to support the IB<br>
> or MXM providers; see <a href="https://ofiwg.github.io/libfabric/" rel="noreferrer" target="_blank">https://ofiwg.github.io/libfabric/</a>).<br>
><br>
> Halim<br>
> <a href="http://www.mcs.anl.gov/~aamer" rel="noreferrer" target="_blank">www.mcs.anl.gov/~aamer</a> <<a href="http://www.mcs.anl.gov/~aamer" rel="noreferrer" target="_blank">http://www.mcs.anl.gov/~aamer</a>><br>
><br>
> On 9/15/17 4:20 AM, Jason Collins wrote:<br>
> > Hello everyone.<br>
> ><br>
> > I recently downloaded MPICH 3.2.<br>
> ><br>
> > I want to configure it with support for InfiniBand. I ran the following<br>
> > command:<br>
> ><br>
> > # ./configure --prefix=/my/path --with-device=ch3:nemesis:ib<br>
> ><br>
> > And I get the following error:<br>
> ><br>
> > configure: error: Network module ib is unknown<br>
> > "./src/mpid/ch3/channels/nemesis/netmod/ib"<br>
> ><br>
> > When I check the path, I can confirm that the "ib" folder does not<br>
> > exist inside the "netmod" folder. How can this be solved?<br>
> ><br>
> > Many thanks.<br>
> ><br>
> ><br>
> ><br>
> > _______________________________________________<br>
> > discuss mailing list <a href="mailto:discuss@mpich.org" target="_blank">discuss@mpich.org</a> <mailto:<a href="mailto:discuss@mpich.org" target="_blank">discuss@mpich.org</a>><br>
> > To manage subscription options or unsubscribe:<br>
> > <a href="https://lists.mpich.org/mailman/listinfo/discuss" rel="noreferrer" target="_blank">https://lists.mpich.org/mailman/listinfo/discuss</a><br>
> ><br>
><br>
</blockquote></div></blockquote></div>
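<div dir="ltr">A note on the repeated "bind() failed: Address already in use" lines in the mpirun output above. One plausible reading, assuming that mxm_perftest started without arguments acts as a server listening on a fixed TCP port (the "Waiting for connection..." line suggests this, but it has not been checked against the MXM sources): "mpirun -n 10" launches ten such servers on compute1, and only one of them can own the port. The Python sketch below is not MXM code. It only reproduces the generic socket behaviour, and the helper name try_bind is hypothetical.</div>

```python
import errno
import socket


def try_bind(port):
    """Bind and listen on 127.0.0.1:port.

    Returns (socket, None) on success or (None, errno_value) on failure.
    Hypothetical helper for illustration only; not part of MXM or MPICH.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind(("127.0.0.1", port))
        sock.listen(1)
        return sock, None
    except OSError as exc:
        sock.close()
        return None, exc.errno


# The first "server" asks the kernel for any free port (port 0).
first, first_err = try_bind(0)
port = first.getsockname()[1]

# A second "server" tries the same port while the first is still listening,
# which is the situation assumed for the extra mxm_perftest processes.
second, second_err = try_bind(port)

if second_err is not None:
    print("second bind failed:", errno.errorcode[second_err])

first.close()
```

<div dir="ltr">If that assumption holds, the bind() failures would be a side effect of how the test was launched rather than a separate fault, and they say nothing about the earlier "Failed to register memory region" message.</div>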