No subject


Tue Jun 18 13:52:11 CDT 2019


as less latency than 4KB.

I looked for an explanation of this behavior but did not find one.


  1.  MPIR_CVAR_CH3_EAGER_MAX_MSG_SIZE is set to 128KB, so none of the above message sizes uses the rendezvous protocol. Is there any partitioning inside the eager protocol (e.g. 0 - 512 bytes, 1KB - 8KB, 16KB - 64KB)? If so, what are the boundaries? Can I log them with debug-event-logging?
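
To make the threshold reasoning above concrete, here is a rough sketch (not MPICH code). It assumes the protocol choice is a plain comparison of the payload size against MPIR_CVAR_CH3_EAGER_MAX_MSG_SIZE. The name `protocol_for` and the simple `<=` test are assumptions for illustration only; the real check inside MPICH may also count packet header bytes.

```python
# Hypothetical sketch: classify the tested message sizes against the
# eager threshold. This is NOT MPICH's implementation; the real check
# may also account for packet header overhead.
EAGER_MAX_MSG_SIZE = 131072  # 128KB, the value of MPIR_CVAR_CH3_EAGER_MAX_MSG_SIZE

# Message sizes used in the osu_latency runs (label -> bytes).
TEST_SIZES = {
    "1KB": 1024,
    "2KB": 2048,
    "4KB": 4096,
    "8KB": 8192,
    "16KB": 16384,
    "32KB": 32768,
    "64KB": 65536,
}


def protocol_for(size_bytes, eager_max=EAGER_MAX_MSG_SIZE):
    """Return 'eager' if the size fits under the threshold, else 'rendezvous'."""
    return "eager" if size_bytes <= eager_max else "rendezvous"


if __name__ == "__main__":
    for label, size in TEST_SIZES.items():
        print(f"{label:>5}: {protocol_for(size)}")
```

Under that assumption every tested size, up to 64KB, stays on the eager path, so the 2KB and 8KB dips cannot come from an eager/rendezvous switch.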


Setup I am using:

- two nodes, each with an Intel Core i7; one has 16GB of memory, the other 8GB

- MPICH 3.2.1, configured and built to use nemesis tcp

- 1Gb Ethernet connection

- NFS is used for sharing

- osu_latency: uses MPI_Send and MPI_Recv

- MPIR_CVAR_CH3_EAGER_MAX_MSG_SIZE = 131072 (128KB)
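
For reference, the ping-pong pattern that osu_latency times with MPI_Send and MPI_Recv can be sketched without MPI. The following is only an illustration under stated assumptions: it uses a local `socket.socketpair()` instead of the Ethernet link, and the helper names (`recv_exact`, `echo_peer`, `pingpong_latency_us`) and the iteration counts are made up for this example. Like the benchmark, it reports half of the average round-trip time.

```python
import socket
import threading
import time


def recv_exact(sock, n):
    """Read exactly n bytes; stream sockets may return partial reads."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)


def echo_peer(sock, size, rounds):
    """Receiver side: read each message and send it straight back."""
    for _ in range(rounds):
        sock.sendall(recv_exact(sock, size))


def pingpong_latency_us(size, iterations=1000, warmup=100):
    """Return the average one-way latency in microseconds for `size` bytes."""
    a, b = socket.socketpair()
    rounds = iterations + warmup
    peer = threading.Thread(target=echo_peer, args=(b, size, rounds))
    peer.start()
    payload = b"x" * size
    start = time.perf_counter()
    for i in range(rounds):
        if i == warmup:
            # Restart the clock once the warm-up rounds are done.
            start = time.perf_counter()
        a.sendall(payload)
        recv_exact(a, size)
    elapsed = time.perf_counter() - start
    peer.join()
    a.close()
    b.close()
    # One round trip is two messages, so halve the per-iteration time.
    return elapsed / (2 * iterations) * 1e6


if __name__ == "__main__":
    for size in (1024, 2048, 4096, 8192, 16384):
        print(f"{size:>6} bytes: {pingpong_latency_us(size):.2f} us")
```

Running the same loop between the two machines over plain TCP (an assumption about how one would adapt it) would show whether the 2KB/8KB pattern appears without MPI in the picture.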


Can anyone help me with this? Thanks in advance.




Best Regards,

Abu Naser

_______________________________________________
discuss mailing list     discuss at mpich.org<mailto:discuss at mpich.org>
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss




--
Jeff Hammond
jeff.science at gmail.com<mailto:jeff.science at gmail.com>
http://jeffhammond.github.io/







<html><head>
<meta http-equiv=3D"Content-Type" content=3D"text/html; charset=3DWindows-1=
252">
<style type=3D"text/css" style=3D"display:none;"><!-- P {margin-top:0;margi=
n-bottom:0;} --></style>
</head>
<body dir=3D"ltr">
<div id=3D"divtagdefaultwrapper" style=3D"font-size: 12pt; color: rgb(0, 0,=
 0); font-family: Calibri, Helvetica, sans-serif, "EmojiFont", &q=
uot;Apple Color Emoji", "Segoe UI Emoji", NotoColorEmoji, &q=
uot;Segoe UI Symbol", "Android Emoji", EmojiSymbols;" dir=3D=
"ltr">
<div id=3D"divtagdefaultwrapper" style=3D"font-size: 12pt; color: rgb(0, 0,=
 0); font-family: Calibri, Helvetica, sans-serif, "EmojiFont", &q=
uot;Apple Color Emoji", "Segoe UI Emoji", NotoColorEmoji, &q=
uot;Segoe UI Symbol", "Android Emoji", EmojiSymbols;" dir=3D=
"ltr">
<p style=3D"margin-top:0;margin-bottom:0">Hello Min,</p>
<p style=3D"margin-top:0;margin-bottom:0"><br>
</p>
<p style=3D"margin-top:0;margin-bottom:0">I have downloaded it from <a href=
=3D"http://www.mpich.org/static/downloads/3.2.1/mpich-3.2.1.tar.gz" class=
=3D"OWAAutoLink" id=3D"LPlnk943697" previewremoved=3D"true">
http://www.mpich.org/static/downloads/3.2.1/mpich-3.2.1.tar.gz</a>, but it did not work. I received almost the same error, except that this time there is no process information from my remote machine.</p>
<p style="margin-top:0;margin-bottom:0"><b>Previously I received this -</b>
<br>
</p>
<p style=3D"margin-top:0;margin-bottom:0"></p>
<div style=""><i><span style="font-size: 10pt; color: rgb(255, 0, 0);">Process 3 of 4 is on dhcp16194</span></i></div>
<div style=""><i><span style="font-size: 10pt; color: rgb(255, 0, 0);">Process 1 of 4 is on dhcp16194</span></i></div>
<div style=""><i><span style="font-size: 10pt; color: rgb(255, 0, 0);">Process 0 of 4 is on dhcp16198</span></i></div>
<div style=""><i><span style="font-size: 10pt; color: rgb(255, 0, 0);">Process 2 of 4 is on dhcp16198</span></i></div>
<p></p>
<p style="margin-top:0;margin-bottom:0"><b>With the new source code -</b></p>
<p style=3D"margin-top:0;margin-bottom:0"></p>
<div><i><span style="font-size: 10pt; color: rgb(255, 0, 0);">Process 0 of 4 is on dhcp16198</span></i></div>
<div><i><span style="font-size: 10pt; color: rgb(255, 0, 0);">Process 2 of 4 is on dhcp16198</span></i></div>
<p></p>
<p style=3D"margin-top:0;margin-bottom:0"><br>
</p>
<p style="margin-top:0;margin-bottom:0"><b>The entire error message is:</b></p>
<p style=3D"margin-top:0;margin-bottom:0"></p>
<div><i><span style="font-size: 10pt;">Process 0 of 4 is on dhcp16198</span></i></div>
<div><i><span style="font-size: 10pt;">Process 2 of 4 is on dhcp16198</span></i></div>
<div><i><span style="font-size: 10pt;">Fatal error in PMPI_Bcast: Unknown error class, error stack:</span></i></div>
<div><i><span style="font-size: 10pt;">PMPI_Bcast(1600)............................: MPI_Bcast(buf=0x7ffd1ee145f0, count=1, MPI_INT, root=0, MPI_COMM_WORLD) failed</span></i></div>
<div><i><span style="font-size: 10pt;">MPIR_Bcast_impl(1452).......................: </span></i></div>
<div><i><span style="font-size: 10pt;">MPIR_Bcast(1476)............................: </span></i></div>
<div><i><span style="font-size: 10pt;">MPIR_Bcast_intra(1249)......................: </span></i></div>
<div><i><span style="font-size: 10pt;">MPIR_SMP_Bcast(1081)........................: </span></i></div>
<div><i><span style="font-size: 10pt;">MPIR_Bcast_binomial(285)....................: </span></i></div>
<div><i><span style="font-size: 10pt;">MPIC_Send(303)..............................: </span></i></div>
<div><i><span style="font-size: 10pt;">MPIC_Wait(226)..............................: </span></i></div>
<div><i><span style="font-size: 10pt;">MPIDI_CH3i_Progress_wait(242)...............: an error occurred while handling an event returned by MPIDU_Sock_Wait()</span></i></div>
<div><i><span style="font-size: 10pt;">MPIDI_CH3I_Progress_handle_sock_event(698)..: </span></i></div>
<div><i><span style="font-size: 10pt;">MPIDI_CH3_Sockconn_handle_connect_event(597): [ch3:sock] failed to connnect to remote process</span></i></div>
<div><i><span style="font-size: 10pt;">MPIDU_Socki_handle_connect(808).............: connection failure (set=0,sock=1,errno=111:Connection refused)</span></i></div>
<div><i><span style="font-size: 10pt;">MPIR_SMP_Bcast(1088)........................: </span></i></div>
<div><i><span style="font-size: 10pt;">MPIR_Bcast_binomial(310)....................: Failure during collective</span></i></div>
<div><i><span style="font-size: 10pt;">Fatal error in PMPI_Bcast: Other MPI error, error stack:</span></i></div>
<div><i><span style="font-size: 10pt;">PMPI_Bcast(1600)........: MPI_Bcast(buf=0x7ffe2eeb90f0, count=1, MPI_INT, root=0, MPI_COMM_WORLD) failed</span></i></div>
<div><i><span style="font-size: 10pt;">MPIR_Bcast_impl(1452)...: </span></i></div>
<div><i><span style="font-size: 10pt;">MPIR_Bcast(1476)........: </span></i></div>
<div><i><span style="font-size: 10pt;">MPIR_Bcast_intra(1249)..: </span></i></div>
<div><i><span style="font-size: 10pt;">MPIR_SMP_Bcast(1088)....: </span></i></div>
<div><i><span style="font-size: 10pt;">MPIR_Bcast_binomial(310): Failure during collective</span></i></div>
<br>
<p></p>
<p style="margin-top:0;margin-bottom:0">Again, if I configure the new source with tcp, it works fine.</p>
<p style=3D"margin-top:0;margin-bottom:0"><br>
</p>
<p style=3D"margin-top:0;margin-bottom:0">Thank You.<br>
</p>
<div id=3D"Signature">
<div id=3D"divtagdefaultwrapper" dir=3D"ltr" style=3D"font-size:12pt; color=
:rgb(0,0,0); font-family:Calibri,Helvetica,sans-serif,"EmojiFont"=
,"Apple Color Emoji","Segoe UI Emoji",NotoColorEmoji,&q=
uot;Segoe UI Symbol","Android Emoji",EmojiSymbols">
<p><br>
</p>
<p align=3D"left"><span style=3D"font-size:10pt; font-family:Calibri,Helvet=
ica,sans-serif">Best Regards,</span></p>
<span style=3D"font-family:Calibri,Helvetica,sans-serif; font-size:10pt"></=
span>
<div align=3D"left"><span style=3D"font-size:11pt; font-family:Calibri,Helv=
etica,sans-serif"></span></div>
<span style=3D"font-family:Calibri,Helvetica,sans-serif; font-size:10pt"></=
span>
<p align=3D"left"><span style=3D"font-size:10pt; font-family:Calibri,Helvet=
ica,sans-serif">Abu Naser</span><br>
</p>
</div>
</div>
</div>
<hr style=3D"display:inline-block;width:98%" tabindex=3D"-1">
<div id=3D"divRplyFwdMsg" dir=3D"ltr"><font style=3D"font-size:11pt" face=
=3D"Calibri, sans-serif" color=3D"#000000"><b>From:</b> Min Si <msi at anl.=
gov><br>
<b>Sent:</b> Monday, July 2, 2018 11:56:51 AM<br>
<b>To:</b> discuss at mpich.org<br>
<b>Subject:</b> Re: [mpich-discuss] osu_latency test: why 8KB takes less time than 4KB and 2KB takes less time than 1KB?</font>
<div> </div>
</div>
<meta content=3D"text/html; charset=3DWindows-1252">
<div style=3D"background-color:#FFFFFF">Hi Abu,<br>
<br>
Thanks for reporting this. Can you please try the latest release with ch3/sock and see if you still have this error?
<br>
<br>
Min<br>
<div class=3D"x_moz-cite-prefix">On 2018/07/01 21:47, Abu Naser wrote:<br>
</div>
<blockquote type=3D"cite">
<div id=3D"x_divtagdefaultwrapper" dir=3D"ltr" style=3D"font-size: 12pt; co=
lor: rgb(0, 0, 0); font-family: Calibri, Helvetica, sans-serif, "Emoji=
Font", "Apple Color Emoji", "Segoe UI Emoji", Noto=
ColorEmoji, "Segoe UI Symbol", "Android Emoji", EmojiSy=
mbols;">
<div id=3D"x_divtagdefaultwrapper" dir=3D"ltr" style=3D"">
<p style=3D"">Hello Min,</p>
<p style=3D""><br>
</p>
<p style="">After compiling my mpich-3.2.1 with sock, when I tried to run any program, including the OSU benchmarks or examples/cpi, on two machines, I received the following error -</p>
<p style=3D""><br>
</p>
<div style=""><i><span style="font-size:10pt">Process 3 of 4 is on dhcp16194</span></i></div>
<div style=""><i><span style="font-size:10pt">Process 1 of 4 is on dhcp16194</span></i></div>
<div style=""><i><span style="font-size:10pt">Process 0 of 4 is on dhcp16198</span></i></div>
<div style=""><i><span style="font-size:10pt">Process 2 of 4 is on dhcp16198</span></i></div>
<div style=""><i><span style="font-size:10pt">Fatal error in PMPI_Bcast: Unknown error class, error stack:</span></i></div>
<div style=""><i><span style="font-size:10pt">PMPI_Bcast(1600)............................: MPI_Bcast(buf=0x7ffc1808542c, count=1, MPI_INT, root=0, MPI_COMM_WORLD) failed</span></i></div>
<div style=""><i><span style="font-size:10pt">MPIR_Bcast_impl(1452).......................: </span></i></div>
<div style=""><i><span style="font-size:10pt">MPIR_Bcast(1476)............................: </span></i></div>
<div style=""><i><span style="font-size:10pt">MPIR_Bcast_intra(1249)......................: </span></i></div>
<div style=""><i><span style="font-size:10pt">MPIR_SMP_Bcast(1081)........................: </span></i></div>
<div style=""><i><span style="font-size:10pt">MPIR_Bcast_binomial(285)....................: </span></i></div>
<div style=""><i><span style="font-size:10pt">MPIC_Send(303)..............................: </span></i></div>
<div style=""><i><span style="font-size:10pt">MPIC_Wait(226)..............................: </span></i></div>
<div style=""><i><span style="font-size:10pt">MPIDI_CH3i_Progress_wait(242)...............: an error occurred while handling an event returned by MPIDU_Sock_Wait()</span></i></div>
<div style=""><i><span style="font-size:10pt">MPIDI_CH3I_Progress_handle_sock_event(698)..: </span></i></div>
<div style=""><i><span style="font-size:10pt">MPIDI_CH3_Sockconn_handle_connect_event(597): [ch3:sock] failed to connnect to remote process</span></i></div>
<div style=""><i><span style="font-size:10pt">MPIDU_Socki_handle_connect(808).............: connection failure (set=0,sock=1,errno=111:Connection refused)</span></i></div>
<div style=""><i><span style="font-size:10pt">MPIR_SMP_Bcast(1088)........................: </span></i></div>
<div style=""><i><span style="font-size:10pt">MPIR_Bcast_binomial(310)....................: Failure during collective</span></i></div>
<div style=""><i><span style="font-size:10pt">Fatal error in PMPI_Bcast: Other MPI error, error stack:</span></i></div>
<div style=""><i><span style="font-size:10pt">PMPI_Bcast(1600)........: MPI_Bcast(buf=0x7ffd9eeebdac, count=1, MPI_INT, root=0, MPI_COMM_WORLD) failed</span></i></div>
<div style=""><i><span style="font-size:10pt">MPIR_Bcast_impl(1452)...: </span></i></div>
<div style=""><i><span style="font-size:10pt">MPIR_Bcast(1476)........: </span></i></div>
<div style=""><i><span style="font-size:10pt">MPIR_Bcast_intra(1249)..: </span></i></div>
<div style=""><i><span style="font-size:10pt">MPIR_SMP_Bcast(1088)....: </span></i></div>
<div style=""><i><span style="font-size:10pt">MPIR_Bcast_binomial(310): Failure during collective</span></i></div>
<br style=3D"">
<p style=3D""></p>
<p style=""><span style="font-size:12pt">I checked the MPICH FAQ and the MPICH discussion list. Based on those, I checked the following and found that they are fine on my machines -</span><br>
</p>
<p style=""><span style="font-size:12pt">- the firewall is disabled on both machines</span></p>
<p style=""><span style="font-size:12pt">- I can do passwordless ssh between both machines</span></p>
<p style=""><span style="font-size:12pt">- /etc/hosts on both machines is configured properly with IP addresses and names</span></p>
<p style=""><span style="font-size:12pt">- I have updated the library path and used an absolute path for mpiexec</span></p>
<p style=""><span style="font-size:12pt">- Most importantly, when I configure and build mpich with tcp, it works fine.</span></p>
<p style=3D""><span style=3D"font-size:12pt"><br>
</span></p>
<p style=""><span style="font-size:12pt">I think I am missing something but have not figured it out yet. Any help would be appreciated.</span></p>
<p style=3D""><span style=3D"font-size:12pt"><br>
</span></p>
<p style=3D""><span style=3D"font-size:12pt">Thank you.</span></p>
<br>
<p style=3D""><br>
</p>
<p style=3D""><br>
</p>
<p style=3D""><br>
</p>
<div id=3D"x_Signature" style=3D"">
<div id=3D"x_divtagdefaultwrapper" dir=3D"ltr" style=3D"">
<p><br>
</p>
<p align=3D"left"><span style=3D"font-size:10pt; font-family:Calibri,Helvet=
ica,sans-serif">Best Regards,</span></p>
<span style=3D"font-family:Calibri,Helvetica,sans-serif; font-size:10pt"></=
span>
<div align=3D"left"><span style=3D"font-size:11pt; font-family:Calibri,Helv=
etica,sans-serif"></span></div>
<span style=3D"font-family:Calibri,Helvetica,sans-serif; font-size:10pt"></=
span>
<p align=3D"left"><span style=3D"font-size:10pt; font-family:Calibri,Helvet=
ica,sans-serif">Abu Naser</span><br>
</p>
</div>
</div>
</div>
<hr style=3D"display:inline-block; width:98%" tabindex=3D"-1">
<div id=3D"x_divRplyFwdMsg" dir=3D"ltr"><font style=3D"font-size:11pt" face=
=3D"Calibri, sans-serif" color=3D"#000000"><b>From:</b> Min Si
<a class=3D"x_moz-txt-link-rfc2396E OWAAutoLink" href=3D"mailto:msi at anl.gov=
" id=3D"LPlnk191149" previewremoved=3D"true">
<msi at anl.gov></a><br>
<b>Sent:</b> Tuesday, June 26, 2018 12:54:29 PM<br>
<b>To:</b> <a class=3D"x_moz-txt-link-abbreviated OWAAutoLink" href=3D"mail=
to:discuss at mpich.org" id=3D"LPlnk414203" previewremoved=3D"true">
discuss at mpich.org</a><br>
<b>Subject:</b> Re: [mpich-discuss] osu_latency test: why 8KB takes less time than 4KB and 2KB takes less time than 1KB?</font>
<div> </div>
</div>
<meta content=3D"text/html; charset=3DWindows-1252">
<div style=3D"background-color:#FFFFFF">Hi Abu,<br>
<br>
I think the results are stable enough. Perhaps you could also try the following tests and see whether a similar trend exists:<br>
- MPICH/socket (set `--with-device=ch3:sock` at configure)<br>
- A socket-based pingpong test without MPI. <br>
<br>
At this point, I cannot think of any MPI-specific design for 2k/8k messages. My guess is that it is related to your network connection.
<br>
<br>
Min<br>
<br>
<div class="x_x_moz-cite-prefix">On 2018/06/24 11:09, Abu Naser wrote:<br>
</div>
<blockquote type=3D"cite">
<meta content=3D"text/html; charset=3DWindows-1252">
<div id=3D"x_x_divtagdefaultwrapper" dir=3D"ltr">
<div id=3D"x_x_divtagdefaultwrapper" dir=3D"ltr">
<p>Hello Min and Jeff,</p>
<p><br>
</p>
<p>Here are my experiment results. The default number of iterations in osu_latency for 0B – 8KB is 10,000. With that setting I ran osu_latency 100 times and found a standard deviation of 33 for the 8KB message size.</p>
<p><br>
</p>
<p>I then set the number of iterations to 50,000 and 100,000 for the 1KB – 16KB message sizes, ran osu_latency 100 times for each setting, and took the average and standard deviation.</p>
<p><br>
</p>
<table width="665">
<colgroup><col width="99"><col width="112"><col width="118"><col width="154"><col width="140"></colgroup>
<tbody>
<tr>
<td width="99"><p><b>Msg Size in Bytes</b></p></td>
<td width="112"><p><b>Avg time in us (50K iterations)</b></p></td>
<td width="118"><p><b>Avg time in us (100K iterations)</b></p></td>
<td width="154"><p><b>Standard deviation (50K iterations)</b></p></td>
<td width="140"><p><b>Standard deviation (100K iterations)</b></p></td>
</tr>
<tr>
<td width="99"><p>1k</p></td>
<td width="112"><p>85.10</p></td>
<td width="118"><p>84.9</p></td>
<td width="154"><p>0.55</p></td>
<td width="140"><p>0.45</p></td>
</tr>
<tr>
<td width="99"><p>2k</p></td>
<td width="112"><p>75.79</p></td>
<td width="118"><p>74.63</p></td>
<td width="154"><p>5.09</p></td>
<td width="140"><p>4.44</p></td>
</tr>
<tr>
<td width="99"><p>4k</p></td>
<td width="112"><p>273.80</p></td>
<td width="118"><p>274.71</p></td>
<td width="154"><p>4.18</p></td>
<td width="140"><p>2.45</p></td>
</tr>
<tr>
<td width="99"><p>8k</p></td>
<td width="112"><p>258.56</p></td>
<td width="118"><p>249.83</p></td>
<td width="154"><p>21.14</p></td>
<td width="140"><p>28</p></td>
</tr>
<tr>
<td width="99" height="24"><p>16k</p></td>
<td width="112"><p>281.31</p></td>
<td width="118"><p>281.02</p></td>
<td width="154"><p>3.22</p></td>
<td width="140"><p>4.10</p></td>
</tr>
</tbody>
</table>
<p><br>
</p>
<p><br>
</p>
<p>The standard deviation for the 8K message is high, which implies that it is not producing a consistent latency. That looks like the reason 8K appears to take less time than 4K.</p>
<p><br>
</p>
<p>Meanwhile, 2K has a standard deviation of about 5, while the 1K latency timings are much more tightly clustered than the 2K ones. So this is probably the explanation for the lower 2K latency.</p>
<p><br>
</p>
<p>Thank you for your suggestions.</p>
<br>
<p><br>
</p>
<div id=3D"x_x_Signature">
<div id=3D"x_x_divtagdefaultwrapper" dir=3D"ltr">
<p><br>
</p>
<p><span>Best Regards,</span></p>
<span></span>
<div><span></span></div>
<span></span>
<p><span>Abu Naser</span><br>
</p>
</div>
</div>
</div>
<hr tabindex=3D"-1">
<div id=3D"x_x_divRplyFwdMsg" dir=3D"ltr"><b>From:</b> Abu Naser<br>
<b>Sent:</b> Wednesday, June 20, 2018 1:48:53 PM<br>
<b>To:</b> <a class=3D"x_x_moz-txt-link-abbreviated x_OWAAutoLink" href=3D"=
mailto:discuss at mpich.org" id=3D"LPlnk729146" previewremoved=3D"true">
discuss at mpich.org</a><br>
<b>Subject:</b> Re: [mpich-discuss] osu_latency test: why 8KB takes less time than 4KB and 2KB takes less time than 1KB?
<div> </div>
</div>
<meta content=3D"text/html; charset=3Diso-8859-1">
<div dir=3D"ltr">
<div id=3D"x_x_x_divtagdefaultwrapper" dir=3D"ltr">
<div id=3D"x_x_x_divtagdefaultwrapper" dir=3D"ltr">
<p>Hello Min,</p>
<p><br>
</p>
<p>Thanks for the clarification.  I will do the experiment.<br>
</p>
<p><br>
</p>
<div id=3D"x_x_x_Signature">
<div id=3D"x_x_x_divtagdefaultwrapper" dir=3D"ltr">
<p>Thanks.</p>
<p><span>Best Regards,</span></p>
<span></span>
<div><span></span></div>
<span></span>
<p><span>Abu Naser</span><br>
</p>
</div>
</div>
</div>
<hr tabindex=3D"-1">
<div id=3D"x_x_x_divRplyFwdMsg" dir=3D"ltr"><b>From:</b> Min Si <a class=3D=
"x_x_moz-txt-link-rfc2396E x_OWAAutoLink" href=3D"mailto:msi at anl.gov" id=3D=
"LPlnk558260" previewremoved=3D"true">
<msi at anl.gov></a><br>
<b>Sent:</b> Wednesday, June 20, 2018 1:39:30 PM<br>
<b>To:</b> <a class=3D"x_x_moz-txt-link-abbreviated x_OWAAutoLink" href=3D"=
mailto:discuss at mpich.org" id=3D"LPlnk472728" previewremoved=3D"true">
discuss at mpich.org</a><br>
<b>Subject:</b> Re: [mpich-discuss] osu_latency test: why 8KB takes less time than 4KB and 2KB takes less time than 1KB?
<div> </div>
</div>
<meta content=3D"text/html; charset=3DWindows-1252">
<div>Hi Abu,<br>
<br>
I think Jeff means that you should run your experiment with more iterations in order to get stable results.<br>
- Increase the number of iterations of the for loop in each execution (I think the osu benchmark allows you to set it)<br>
- Run the experiments 10 or 100 times, and take the average and standard deviation.<br>
<br>
If you see a very small standard deviation (e.g., &lt;=5%), then the trend is stable and you might not see such gaps.<br>
<br>
Best regards,<br>
Min<br>
<div class="x_x_x_x_moz-cite-prefix">On 2018/06/20 12:14, Abu Naser wrote:<br>
</div>
<blockquote type=3D"cite">
<div id=3D"x_x_x_x_divtagdefaultwrapper" dir=3D"ltr">
<p>Hello Jeff,</p>
<p><br>
</p>
<p>Yes, I am using a switch, and other machines are also connected to that switch.
<br>
</p>
<p>If I remove the other machines and just use my two nodes with the switch, will it improve the performance by 200 ~ 400 iterations?</p>
<p>Meanwhile, I will try a single dedicated cable. <span></span><br>
</p>
<p><br>
</p>
<p>Thank you.<br>
</p>
<div id=3D"x_x_x_x_Signature">
<div id=3D"x_x_x_x_divtagdefaultwrapper" dir=3D"ltr">
<p><br>
</p>
<p><span>Best Regards,</span></p>
<span></span>
<div><span></span></div>
<span></span>
<p><span>Abu Naser</span><br>
</p>
</div>
</div>
</div>
<hr tabindex=3D"-1">
<div id=3D"x_x_x_x_divRplyFwdMsg" dir=3D"ltr"><b>From:</b> Jeff Hammond <a =
class=3D"x_x_x_x_moz-txt-link-rfc2396E x_x_x_OWAAutoLink" href=3D"mailto:je=
ff.science at gmail.com" id=3D"LPlnk983157" previewremoved=3D"true">
<jeff.science at gmail.com></a><br>
<b>Sent:</b> Wednesday, June 20, 2018 12:52:06 PM<br>
<b>To:</b> MPICH<br>
<b>Subject:</b> Re: [mpich-discuss] osu_latency test: why 8KB takes less time than 4KB and 2KB takes less time than 1KB?
<div> </div>
</div>
<meta content=3D"text/html; charset=3Dutf-8">
<div>
<div dir="ltr">Is the Ethernet connection a single dedicated cable between the two machines, or are you running through a switch that handles other traffic?
<div><br>
</div>
<div>My best guess is that this is noise and that you may be able to avoid it by running a very long time, e.g. 10000 iterations.</div>
<div><br>
</div>
<div>Jeff</div>
</div>
<div class=3D"x_x_x_x_x_gmail_extra"><br>
<div class="x_x_x_x_x_gmail_quote">On Wed, Jun 20, 2018 at 6:53 AM, Abu Naser <span dir="ltr">
&lt;<a href="mailto:an16e at my.fsu.edu" target="_blank" id="LPlnk305789" class="x_x_x_OWAAutoLink" previewremoved="true">an16e at my.fsu.edu</a>&gt;</span> wrote:<br>
<blockquote class=3D"x_x_x_x_x_gmail_quote">
<div dir=3D"ltr">
<div id=3D"x_x_x_x_x_m_6077755676379859201divtagdefaultwrapper" dir=3D"ltr"=
>
<p><br>
</p>
<p>Good day to all,</p>
<p><br>
</p>
<p>I ran the point-to-point osu_latency test on two nodes 200 times. The following are the average times in microseconds for various message sizes -</p>
<div>1KB    84.8514 us<br>
<span>2KB    73.52535</span> us<br>
4KB    272.55275 us<br>
<span>8KB    234.86385</span> us<br>
16KB    288.88 us<br>
32KB    523.3725 us<br>
64KB    910.4025 us</div>
<p><br>
</p>
<p>From the above, it looks like the 2KB message has lower latency than 1KB, and 8KB has lower latency than 4KB.
<br>
</p>
<p>I looked for an explanation of this behavior but did not find one.</p>
<p><br>
</p>
<ol>
<li><span>MPIR_CVAR_CH3_EAGER_MAX_MSG_<wbr>SIZE</span><span> is set to 128KB, so none of the above message sizes uses the rendezvous protocol. Is there any partitioning inside the eager protocol (e.g. 0 - 512 bytes, 1KB - 8KB, 16KB - 64KB)? If so, what are the boundaries? Can I log them with debug-event-logging? </span><br>
</li></ol>
<p><br>
</p>
<p>Setup I am using:</p>
<p>- two nodes, each with an Intel Core i7; one has 16GB of memory, the other 8GB</p>
<p>- MPICH 3.2.1, configured and built to use nemesis tcp</p>
<p>- 1Gb Ethernet connection</p>
<p>- NFS is used for sharing<br>
</p>
<p>- osu_latency: uses MPI_Send and MPI_Recv</p>
<p>- <span>MPIR_CVAR_CH3_EAGER_MAX_MSG_<wbr>SIZE</span> = <span>131072</span> (128KB)<br>
</p>
<p><br>
</p>
<p>Can anyone help me with this? Thanks in advance.<br>
</p>
<p><br>
</p>
<p><br>
</p>
<div id=3D"x_x_x_x_x_m_6077755676379859201Signature">
<div id=3D"x_x_x_x_x_m_6077755676379859201divtagdefaultwrapper" dir=3D"ltr"=
>
<p><br>
</p>
<p><span>Best Regards,</span></p>
<span></span>
<div><span></span></div>
<span></span>
<p><span>Abu Naser</span><br>
</p>
</div>
</div>
</div>
</div>
<br>
______________________________<wbr>_________________<br>
discuss mailing list     <a href=3D"mailto:discuss at mpich.org=
" id=3D"LPlnk816471" class=3D"x_x_x_OWAAutoLink" previewremoved=3D"true">di=
scuss at mpich.org</a><br>
To manage subscription options or unsubscribe:<br>
<a href=3D"https://lists.mpich.org/mailman/listinfo/discuss" rel=3D"norefer=
rer" target=3D"_blank" id=3D"LPlnk624595" class=3D"x_x_x_OWAAutoLink" previ=
ewremoved=3D"true">https://lists.mpich.org/<wbr>mailman/listinfo/discuss</a=
><br>
<br>
</blockquote>
</div>
<br>
<br>
<div><br>
</div>
-- <br>
<div class=3D"x_x_x_x_x_gmail_signature">Jeff Hammond<br>
<a href=3D"mailto:jeff.science at gmail.com" target=3D"_blank" id=3D"LPlnk3149=
93" class=3D"x_x_x_OWAAutoLink" previewremoved=3D"true">jeff.science at gmail.=
com</a><br>
<a href=3D"http://jeffhammond.github.io/" target=3D"_blank" id=3D"LPlnk8614=
34" class=3D"x_x_x_OWAAutoLink" previewremoved=3D"true">http://jeffhammond.=
github.io/</a></div>
</div>
</div>
<br>
</blockquote>
<br>
</div>
</div>
</div>
</div>
<br>
</blockquote>
<br>
</div>
</div>
<br>
</blockquote>
<br>
</div>
</div>
</body>
</html>
