No subject


Tue Jun 18 13:52:11 CDT 2019


as less latency than 4KB.

I was looking for an explanation of this behavior but did not find any.


  1.  MPIR_CVAR_CH3_EAGER_MAX_MSG_SIZE is set to 128KB, so none of the above
      message sizes use the rendezvous protocol. Is there any partitioning
      inside the eager protocol (e.g. 0-512 bytes, 1KB-8KB, 16KB-64KB)? If so,
      what are the boundaries? Can I log them with debug-event-logging?


Setup I am using:

- two nodes with Intel Core i7 CPUs, one with 16 GB of memory and the other with 8 GB

- MPICH 3.2.1, configured and built to use nemesis tcp

- 1 Gb Ethernet connection

- NFS is used for sharing

- osu_latency: uses MPI_Send and MPI_Recv

- MPIR_CVAR_CH3_EAGER_MAX_MSG_SIZE=131072 (128KB)


Can anyone help me with this? Thanks in advance.




Best Regards,

Abu Naser

_______________________________________________
discuss mailing list     discuss at mpich.org<mailto:discuss at mpich.org>
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss




--
Jeff Hammond
jeff.science at gmail.com<mailto:jeff.science at gmail.com>
http://jeffhammond.github.io/




<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">
<style type="text/css" style="display:none;"><!-- P {margin-top:0;margin-bottom:0;} --></style>
</head>
<body dir="ltr">
<div id="divtagdefaultwrapper" style="font-size:12pt;color:#000000;font-family:Calibri,Helvetica,sans-serif;" dir="ltr">
<p style="margin-top:0;margin-bottom:0">Hello Min,</p>
<p style="margin-top:0;margin-bottom:0"><br>
</p>
<p style="margin-top:0;margin-bottom:0">Now it works in some cases and not in others.</p>
<p style="margin-top:0;margin-bottom:0"><br>
</p>
<p style="margin-top:0;margin-bottom:0"><b>Cases where it worked:</b></p>
<p style="margin-top:0;margin-bottom:0">- When the application binary (e.g. cpi, osu_bw, osu_latency) is compiled with the other mpicc (generated when configured with tcp), the mpiexec (<span>generated when</span> configured with sock) could run it.</p>
<p style="margin-top:0;margin-bottom:0"><br>
</p>
<p style="margin-top:0;margin-bottom:0"><b>Cases where it did not work:</b></p>
<p style="margin-top:0;margin-bottom:0">- When the application binary (e.g. cpi) is compiled with the mpicc (<span>generated when</span> configured with sock), the mpiexec (<span>generated when</span> configured with sock) could not run it and produced the same error message.
 [The library path was set.]</p>
<p style="margin-top:0;margin-bottom:0"><br>
</p>
<p style="margin-top:0;margin-bottom:0">Thank you.<br>
</p>
<p style="margin-top:0;margin-bottom:0"><br>
</p>
<div id="Signature">
<div id="divtagdefaultwrapper" dir="ltr" style="font-size:12pt; color:rgb(0,0,0); font-family:Calibri,Helvetica,sans-serif,&quot;EmojiFont&quot;,&quot;Apple Color Emoji&quot;,&quot;Segoe UI Emoji&quot;,NotoColorEmoji,&quot;Segoe UI Symbol&quot;,&quot;Android Emoji&quot;,EmojiSymbols">
<p><br>
</p>
<p align="left"><span style="font-size:10pt; font-family:Calibri,Helvetica,sans-serif">Best Regards,</span></p>
<span style="font-family:Calibri,Helvetica,sans-serif; font-size:10pt"></span>
<div align="left"><span style="font-size:11pt; font-family:Calibri,Helvetica,sans-serif"></span></div>
<span style="font-family:Calibri,Helvetica,sans-serif; font-size:10pt"></span>
<p align="left"><span style="font-size:10pt; font-family:Calibri,Helvetica,sans-serif">Abu Naser</span><br>
</p>
</div>
</div>
</div>
<hr style="display:inline-block;width:98%" tabindex="-1">
<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> Min Si &lt;msi at anl.gov&gt;<br>
<b>Sent:</b> Monday, July 2, 2018 2:10:23 PM<br>
<b>To:</b> discuss at mpich.org<br>
<b>Subject:</b> Re: [mpich-discuss] osu_latency test: why 8KB takes less time than 4KB and 2KB takes less time than 1KB?</font>
<div> </div>
</div>
<meta content="text/html; charset=Windows-1252">
<div style="background-color:#FFFFFF">Could you please try mpich-3.3b3?<br>
<a class="x_moz-txt-link-freetext" href="http://www.mpich.org/static/downloads/3.3b3/mpich-3.3b3.tar.gz">http://www.mpich.org/static/downloads/3.3b3/mpich-3.3b3.tar.gz</a><br>
<br>
Min<br>
<div class="x_moz-cite-prefix">On 2018/07/02 13:01, Abu Naser wrote:<br>
</div>
<blockquote type="cite"><style type="text/css" style="display:none">
<!--
p
	{margin-top:0;
	margin-bottom:0}
-->
</style>
<div id="x_divtagdefaultwrapper" dir="ltr" style="">
<div id="x_divtagdefaultwrapper" dir="ltr" style="">
<p style="margin-top:0; margin-bottom:0">Hello Min,</p>
<p style="margin-top:0; margin-bottom:0"><br>
</p>
<p style="margin-top:0; margin-bottom:0">I downloaded it from <a href="http://www.mpich.org/static/downloads/3.2.1/mpich-3.2.1.tar.gz" class="x_OWAAutoLink" id="LPlnk943697">
http://www.mpich.org/static/downloads/3.2.1/mpich-3.2.1.tar.gz</a>, but it did not work. I received almost the same error, except that this time there is no process information from my remote machine.</p>
<p style="margin-top:0; margin-bottom:0"><b>Previously I received this:</b>
<br>
</p>
<div style=""><i><span style="font-size:10pt; color:rgb(255,0,0)">Process 3 of 4 is on dhcp16194</span></i></div>
<div style=""><i><span style="font-size:10pt; color:rgb(255,0,0)">Process 1 of 4 is on dhcp16194</span></i></div>
<div style=""><i><span style="font-size:10pt; color:rgb(255,0,0)">Process 0 of 4 is on dhcp16198</span></i></div>
<div style=""><i><span style="font-size:10pt; color:rgb(255,0,0)">Process 2 of 4 is on dhcp16198</span></i></div>
<p style="margin-top:0; margin-bottom:0"><b>With the new source code:</b></p>
<div><i><span style="font-size:10pt; color:rgb(255,0,0)">Process 0 of 4 is on dhcp16198</span></i></div>
<div><i><span style="font-size:10pt; color:rgb(255,0,0)">Process 2 of 4 is on dhcp16198</span></i></div>
<p style="margin-top:0; margin-bottom:0"><br>
</p>
<p style="margin-top:0; margin-bottom:0"><b>The entire error message is:</b></p>
<div><i><span style="font-size:10pt">Process 0 of 4 is on dhcp16198</span></i></div>
<div><i><span style="font-size:10pt">Process 2 of 4 is on dhcp16198</span></i></div>
<div><i><span style="font-size:10pt">Fatal error in PMPI_Bcast: Unknown error class, error stack:</span></i></div>
<div><i><span style="font-size:10pt">PMPI_Bcast(1600)............................: MPI_Bcast(buf=0x7ffd1ee145f0, count=1, MPI_INT, root=0, MPI_COMM_WORLD) failed</span></i></div>
<div><i><span style="font-size:10pt">MPIR_Bcast_impl(1452).......................: </span></i></div>
<div><i><span style="font-size:10pt">MPIR_Bcast(1476)............................: </span></i></div>
<div><i><span style="font-size:10pt">MPIR_Bcast_intra(1249)......................: </span></i></div>
<div><i><span style="font-size:10pt">MPIR_SMP_Bcast(1081)........................: </span></i></div>
<div><i><span style="font-size:10pt">MPIR_Bcast_binomial(285)....................: </span></i></div>
<div><i><span style="font-size:10pt">MPIC_Send(303)..............................: </span></i></div>
<div><i><span style="font-size:10pt">MPIC_Wait(226)..............................: </span></i></div>
<div><i><span style="font-size:10pt">MPIDI_CH3i_Progress_wait(242)...............: an error occurred while handling an event returned by MPIDU_Sock_Wait()</span></i></div>
<div><i><span style="font-size:10pt">MPIDI_CH3I_Progress_handle_sock_event(698)..: </span></i></div>
<div><i><span style="font-size:10pt">MPIDI_CH3_Sockconn_handle_connect_event(597): [ch3:sock] failed to connnect to remote process</span></i></div>
<div><i><span style="font-size:10pt">MPIDU_Socki_handle_connect(808).............: connection failure (set=0,sock=1,errno=111:Connection refused)</span></i></div>
<div><i><span style="font-size:10pt">MPIR_SMP_Bcast(1088)........................: </span></i></div>
<div><i><span style="font-size:10pt">MPIR_Bcast_binomial(310)....................: Failure during collective</span></i></div>
<div><i><span style="font-size:10pt">Fatal error in PMPI_Bcast: Other MPI error, error stack:</span></i></div>
<div><i><span style="font-size:10pt">PMPI_Bcast(1600)........: MPI_Bcast(buf=0x7ffe2eeb90f0, count=1, MPI_INT, root=0, MPI_COMM_WORLD) failed</span></i></div>
<div><i><span style="font-size:10pt">MPIR_Bcast_impl(1452)...: </span></i></div>
<div><i><span style="font-size:10pt">MPIR_Bcast(1476)........: </span></i></div>
<div><i><span style="font-size:10pt">MPIR_Bcast_intra(1249)..: </span></i></div>
<div><i><span style="font-size:10pt">MPIR_SMP_Bcast(1088)....: </span></i></div>
<div><i><span style="font-size:10pt">MPIR_Bcast_binomial(310): Failure during collective</span></i></div>
<br>
<p style="margin-top:0; margin-bottom:0">Again, if I configure the new source with tcp, it works fine.</p>
<p style="margin-top:0; margin-bottom:0"><br>
</p>
<p style="margin-top:0; margin-bottom:0">Thank you.<br>
</p>
<div id="x_Signature">
<div id="x_divtagdefaultwrapper" dir="ltr" style="">
<p><br>
</p>
<p align="left"><span style="font-size:10pt; font-family:Calibri,Helvetica,sans-serif">Best Regards,</span></p>
<span style="font-family:Calibri,Helvetica,sans-serif; font-size:10pt"></span>
<div align="left"><span style="font-size:11pt; font-family:Calibri,Helvetica,sans-serif"></span></div>
<span style="font-family:Calibri,Helvetica,sans-serif; font-size:10pt"></span>
<p align="left"><span style="font-size:10pt; font-family:Calibri,Helvetica,sans-serif">Abu Naser</span><br>
</p>
</div>
</div>
</div>
<hr tabindex="-1" style="display:inline-block; width:98%">
<div id="x_divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" color="#000000" style="font-size:11pt"><b>From:</b> Min Si
<a class="x_moz-txt-link-rfc2396E" href="mailto:msi at anl.gov">&lt;msi at anl.gov&gt;</a><br>
<b>Sent:</b> Monday, July 2, 2018 11:56:51 AM<br>
<b>To:</b> <a class="x_moz-txt-link-abbreviated" href="mailto:discuss at mpich.org">
discuss at mpich.org</a><br>
<b>Subject:</b> Re: [mpich-discuss] osu_latency test: why 8KB takes less time than 4KB and 2KB takes less time than 1KB?</font>
<div> </div>
</div>
<meta content="text/html; charset=Windows-1252">
<div style="background-color:#FFFFFF">Hi Abu,<br>
<br>
Thanks for reporting this. Can you please try the latest release with ch3/sock and see if you still have this error?
<br>
<br>
Min<br>
<div class="x_x_moz-cite-prefix">On 2018/07/01 21:47, Abu Naser wrote:<br>
</div>
<blockquote type="cite">
<div id="x_x_divtagdefaultwrapper" dir="ltr" style="">
<div id="x_x_divtagdefaultwrapper" dir="ltr" style="">
<p style="">Hello Min,</p>
<p style=""><br>
</p>
<p style="">After compiling my mpich-3.2.1 with sock, I receive the following error whenever I try to run any program, including the OSU benchmarks or examples/cpi, on two machines:</p>
<p style=""><br>
</p>
<div style=""><i><span style="font-size:10pt">Process 3 of 4 is on dhcp16194</span></i></div>
<div style=""><i><span style="font-size:10pt">Process 1 of 4 is on dhcp16194</span></i></div>
<div style=""><i><span style="font-size:10pt">Process 0 of 4 is on dhcp16198</span></i></div>
<div style=""><i><span style="font-size:10pt">Process 2 of 4 is on dhcp16198</span></i></div>
<div style=""><i><span style="font-size:10pt">Fatal error in PMPI_Bcast: Unknown error class, error stack:</span></i></div>
<div style=""><i><span style="font-size:10pt">PMPI_Bcast(1600)............................: MPI_Bcast(buf=0x7ffc1808542c, count=1, MPI_INT, root=0, MPI_COMM_WORLD) failed</span></i></div>
<div style=""><i><span style="font-size:10pt">MPIR_Bcast_impl(1452).......................: </span></i></div>
<div style=""><i><span style="font-size:10pt">MPIR_Bcast(1476)............................: </span></i></div>
<div style=""><i><span style="font-size:10pt">MPIR_Bcast_intra(1249)......................: </span></i></div>
<div style=""><i><span style="font-size:10pt">MPIR_SMP_Bcast(1081)........................: </span></i></div>
<div style=""><i><span style="font-size:10pt">MPIR_Bcast_binomial(285)....................: </span></i></div>
<div style=""><i><span style="font-size:10pt">MPIC_Send(303)..............................: </span></i></div>
<div style=""><i><span style="font-size:10pt">MPIC_Wait(226)..............................: </span></i></div>
<div style=""><i><span style="font-size:10pt">MPIDI_CH3i_Progress_wait(242)...............: an error occurred while handling an event returned by MPIDU_Sock_Wait()</span></i></div>
<div style=""><i><span style="font-size:10pt">MPIDI_CH3I_Progress_handle_sock_event(698)..: </span></i></div>
<div style=""><i><span style="font-size:10pt">MPIDI_CH3_Sockconn_handle_connect_event(597): [ch3:sock] failed to connnect to remote process</span></i></div>
<div style=""><i><span style="font-size:10pt">MPIDU_Socki_handle_connect(808).............: connection failure (set=0,sock=1,errno=111:Connection refused)</span></i></div>
<div style=""><i><span style="font-size:10pt">MPIR_SMP_Bcast(1088)........................: </span></i></div>
<div style=""><i><span style="font-size:10pt">MPIR_Bcast_binomial(310)....................: Failure during collective</span></i></div>
<div style=""><i><span style="font-size:10pt">Fatal error in PMPI_Bcast: Other MPI error, error stack:</span></i></div>
<div style=""><i><span style="font-size:10pt">PMPI_Bcast(1600)........: MPI_Bcast(buf=0x7ffd9eeebdac, count=1, MPI_INT, root=0, MPI_COMM_WORLD) failed</span></i></div>
<div style=""><i><span style="font-size:10pt">MPIR_Bcast_impl(1452)...: </span></i></div>
<div style=""><i><span style="font-size:10pt">MPIR_Bcast(1476)........: </span></i></div>
<div style=""><i><span style="font-size:10pt">MPIR_Bcast_intra(1249)..: </span></i></div>
<div style=""><i><span style="font-size:10pt">MPIR_SMP_Bcast(1088)....: </span></i></div>
<div style=""><i><span style="font-size:10pt">MPIR_Bcast_binomial(310): Failure during collective</span></i></div>
<br style="">
<p style=""><span style="font-size:12pt">I checked the MPICH FAQ and the MPICH discussion list. Based on those, I checked </span>the following<span style="font-size:12pt"> </span><span style="font-size:12pt">and found that they are fine on my machines:</span><br>
</p>
<p style=""><span style="font-size:12pt">- the firewall is disabled on both machines</span></p>
<p style=""><span style="font-size:12pt">- I can do </span>passwordless<span style="font-size:12pt"> ssh on both machines</span></p>
<p style=""><span style="font-size:12pt">- /etc/hosts on both machines is configured properly with IP addresses and names</span></p>
<p style=""><span style="font-size:12pt">- I have updated the library path and used the absolute path for mpiexec</span></p>
<p style=""><span style="font-size:12pt">- Most importantly, when I configured and built MPICH with tcp, it worked fine.</span></p>
<p style=""><span style="font-size:12pt"><br>
</span></p>
<p style=""><span style="font-size:12pt">I think I am </span><span style="font-size:12pt">missing something but have not figured it out yet. Any help would be
</span>appreciated<span style="font-size:12pt">.</span></p>
<p style=""><span style="font-size:12pt"><br>
</span></p>
<p style=""><span style="font-size:12pt">Thank you.</span></p>
<br>
<p style=""><br>
</p>
<p style=""><br>
</p>
<p style=""><br>
</p>
<div id="x_x_Signature" style="">
<div id="x_x_divtagdefaultwrapper" dir="ltr" style="">
<p><br>
</p>
<p align="left"><span style="font-size:10pt; font-family:Calibri,Helvetica,sans-serif">Best Regards,</span></p>
<span style="font-family:Calibri,Helvetica,sans-serif; font-size:10pt"></span>
<div align="left"><span style="font-size:11pt; font-family:Calibri,Helvetica,sans-serif"></span></div>
<span style="font-family:Calibri,Helvetica,sans-serif; font-size:10pt"></span>
<p align="left"><span style="font-size:10pt; font-family:Calibri,Helvetica,sans-serif">Abu Naser</span><br>
</p>
</div>
</div>
</div>
<hr tabindex="-1" style="display:inline-block; width:98%">
<div id="x_x_divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" color="#000000" style="font-size:11pt"><b>From:</b> Min Si
<a class="x_x_moz-txt-link-rfc2396E x_OWAAutoLink" href="mailto:msi at anl.gov" id="LPlnk191149">
&lt;msi at anl.gov&gt;</a><br>
<b>Sent:</b> Tuesday, June 26, 2018 12:54:29 PM<br>
<b>To:</b> <a class="x_x_moz-txt-link-abbreviated x_OWAAutoLink" href="mailto:discuss at mpich.org" id="LPlnk414203">
discuss at mpich.org</a><br>
<b>Subject:</b> Re: [mpich-discuss] osu_latency test: why 8KB takes less time than 4KB and 2KB takes less time than 1KB?</font>
<div> </div>
</div>
<meta content="text/html; charset=Windows-1252">
<div style="background-color:#FFFFFF">Hi Abu,<br>
<br>
I think the results are stable enough. Perhaps you could also try the following tests and see whether a similar trend exists:<br>
- MPICH/socket (set `--with-device=ch3:sock` at configure)<br>
- A socket-based pingpong test without MPI.<br>
<br>
At this point, I cannot think of any MPI-specific design for 2k/8k messages. My guess is that it is related to your network connection.
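<p>The second suggestion, a socket ping-pong test without MPI, could look roughly like the following Python sketch. It is an illustration under stated assumptions, not code from the thread: it uses plain blocking TCP sockets with Nagle's algorithm disabled, and the helper names recv_exact, serve_echo and pingpong_latency_us are hypothetical. Like osu_latency, it reports half of the average round-trip time. Python adds its own per-message overhead, so only the shape of the curve from 1KB to 16KB is comparable with the MPI numbers, not the absolute values.</p>
<pre>
```python
import socket
import threading
import time


def recv_exact(sock, size):
    """Read exactly `size` bytes from a connected stream socket."""
    chunks = []
    remaining = size
    while remaining:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed the connection")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def serve_echo(listener, size):
    """Accept one client and echo `size`-byte messages until it disconnects."""
    conn, _ = listener.accept()
    with conn:
        conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        while True:
            try:
                message = recv_exact(conn, size)
            except ConnectionError:
                return  # client finished and closed its socket
            conn.sendall(message)


def pingpong_latency_us(host, port, size, iterations=10000, warmup=100):
    """One-way latency in microseconds: half of the mean round-trip time."""
    payload = b"x" * size
    with socket.create_connection((host, port)) as sock:
        # Disable Nagle's algorithm so each message is sent immediately.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        for _ in range(warmup):  # untimed warm-up round trips
            sock.sendall(payload)
            recv_exact(sock, size)
        start = time.perf_counter()
        for _ in range(iterations):
            sock.sendall(payload)
            recv_exact(sock, size)
        elapsed = time.perf_counter() - start
    return elapsed * 1e6 / (2 * iterations)


if __name__ == "__main__":
    # Loopback smoke test. Across two nodes, bind the listener to the
    # server node's address and call pingpong_latency_us from the other node.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    for size in (1024, 2048, 4096, 8192, 16384):
        server = threading.Thread(target=serve_echo, args=(listener, size))
        server.start()
        latency = pingpong_latency_us("127.0.0.1", port, size, iterations=2000)
        server.join()
        print(f"{size:6d} bytes: {latency:8.2f} us")
    listener.close()
```
</pre>
<p>For a real comparison, run serve_echo on one node and pingpong_latency_us against that node from the other, over the same switch or cable used for osu_latency. If the 2KB and 8KB dips also appear here, that would point to the network or TCP stack rather than MPICH.</p>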
<br>
<br>
Min<br>
<br>
<div class="x_x_x_moz-cite-prefix">On 2018/06/24 11:09, Abu Naser wrote:<br>
</div>
<blockquote type="cite">
<meta content="text/html; charset=Windows-1252">
<div id="x_x_x_divtagdefaultwrapper" dir="ltr">
<div id="x_x_x_divtagdefaultwrapper" dir="ltr">
<p>Hello Min and Jeff,</p>
<p><br>
</p>
<p>Here are my experiment results. The default number of iterations in osu_latency for message sizes from 0B to 8KB is 10,000. With that setting, I ran osu_latency 100 times and found a standard deviation of 33 for the 8KB message size.</p>
<p><br>
</p>
<p>So I then set the number of iterations to 50,000 and 100,000 for message sizes from 1KB to 16KB, ran osu_latency 100 times for each setting, and took the average and standard deviation.</p>
<p><br>
</p>
<table width="665">
<colgroup><col width="99"><col width="112"><col width="118"><col width="154"><col width="140"></colgroup>
<tbody>
<tr>
<td width="99">
<p><b>Message size</b></p>
</td>
<td width="112">
<p><b>Avg latency in us (50K iterations)</b></p>
</td>
<td width="118">
<p><b>Avg latency in us (100K iterations)</b></p>
</td>
<td width="154">
<p><b>Standard deviation in us (50K iterations)</b></p>
</td>
<td width="140">
<p><b>Standard deviation in us (100K iterations)</b></p>
</td>
</tr>
<tr>
<td width="99">
<p>1KB</p>
</td>
<td width="112">
<p>85.10</p>
</td>
<td width="118">
<p>84.9</p>
</td>
<td width="154">
<p>0.55</p>
</td>
<td width="140">
<p>0.45</p>
</td>
</tr>
<tr>
<td width="99">
<p>2KB</p>
</td>
<td width="112">
<p>75.79</p>
</td>
<td width="118">
<p>74.63</p>
</td>
<td width="154">
<p>5.09</p>
</td>
<td width="140">
<p>4.44</p>
</td>
</tr>
<tr>
<td width="99">
<p>4KB</p>
</td>
<td width="112">
<p>273.80</p>
</td>
<td width="118">
<p>274.71</p>
</td>
<td width="154">
<p>4.18</p>
</td>
<td width="140">
<p>2.45</p>
</td>
</tr>
<tr>
<td width="99">
<p>8KB</p>
</td>
<td width="112">
<p>258.56</p>
</td>
<td width="118">
<p>249.83</p>
</td>
<td width="154">
<p>21.14</p>
</td>
<td width="140">
<p>28</p>
</td>
</tr>
<tr>
<td height="24" width="99">
<p>16KB</p>
</td>
<td width="112">
<p>281.31</p>
</td>
<td width="118">
<p>281.02</p>
</td>
<td width="154">
<p>3.22</p>
</td>
<td width="140">
<p>4.10</p>
</td>
</tr>
</tbody>
</table>
<p><br>
</p>
<p><br>
</p>
<p>The standard deviation for the 8K message is so high that the 8K runs are not producing a consistent latency. That looks like the reason 8K appears to take less time than 4K.</p>
<p><br>
</p>
<p>Meanwhile, 2K has a standard deviation of less than 5, but the 1K latency timings are more tightly clustered than the 2K timings. So this is probably the explanation for the lower 2K latency.</p>
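<p>One way to put these rows on the same scale as the "standard deviation of at most 5% of the mean" rule of thumb in Min's June 20 reply is to compute the coefficient of variation (standard deviation divided by mean) for each row. The short Python sketch below uses the table values exactly as reported; the names TABLE, coefficient_of_variation and unstable are only illustrative.</p>
<pre>
```python
# Values as reported in the table, all in microseconds:
# (size, mean_50k, stdev_50k, mean_100k, stdev_100k)
TABLE = [
    ("1KB", 85.10, 0.55, 84.9, 0.45),
    ("2KB", 75.79, 5.09, 74.63, 4.44),
    ("4KB", 273.80, 4.18, 274.71, 2.45),
    ("8KB", 258.56, 21.14, 249.83, 28.0),
    ("16KB", 281.31, 3.22, 281.02, 4.10),
]


def coefficient_of_variation(mean, stdev):
    """Standard deviation expressed as a percentage of the mean."""
    return 100.0 * stdev / mean


# Sizes whose spread exceeds 5% of the mean in either iteration setting.
unstable = [
    size
    for size, m50, s50, m100, s100 in TABLE
    if max(coefficient_of_variation(m50, s50),
           coefficient_of_variation(m100, s100)) > 5.0
]

for size, m50, s50, m100, s100 in TABLE:
    print(f"{size:>5}: CV {coefficient_of_variation(m50, s50):5.2f}% (50K), "
          f"{coefficient_of_variation(m100, s100):5.2f}% (100K)")
print("above 5%:", unstable)
```
</pre>
<p>With these numbers only 2KB (about 6-7%) and 8KB (about 8-11%) exceed 5%, while 1KB, 4KB and 16KB all stay below 2%. This is consistent with treating the 8KB result in particular as noisy.</p>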
<p><br>
</p>
<p>Thank you for your suggestions.</p>
<br>
<p><br>
</p>
<div id="x_x_x_Signature">
<div id="x_x_x_divtagdefaultwrapper" dir="ltr">
<p><br>
</p>
<p><span>Best Regards,</span></p>
<span></span>
<div><span></span></div>
<span></span>
<p><span>Abu Naser</span><br>
</p>
</div>
</div>
</div>
<hr tabindex="-1">
<div id="x_x_x_divRplyFwdMsg" dir="ltr"><b>From:</b> Abu Naser<br>
<b>Sent:</b> Wednesday, June 20, 2018 1:48:53 PM<br>
<b>To:</b> <a class="x_x_x_moz-txt-link-abbreviated x_x_OWAAutoLink" href="mailto:discuss at mpich.org" id="LPlnk729146">
discuss at mpich.org</a><br>
<b>Subject:</b> Re: [mpich-discuss] osu_latency test: why 8KB takes less time than 4KB and 2KB takes less time than 1KB?
<div> </div>
</div>
<meta content="text/html; charset=iso-8859-1">
<div dir="ltr">
<div id="x_x_x_x_divtagdefaultwrapper" dir="ltr">
<div id="x_x_x_x_divtagdefaultwrapper" dir="ltr">
<p>Hello Min,</p>
<p><br>
</p>
<p>Thanks for the clarification.  I will do the experiment.<br>
</p>
<p><br>
</p>
<div id="x_x_x_x_Signature">
<div id="x_x_x_x_divtagdefaultwrapper" dir="ltr">
<p>Thanks.</p>
<p><span>Best Regards,</span></p>
<span></span>
<div><span></span></div>
<span></span>
<p><span>Abu Naser</span><br>
</p>
</div>
</div>
</div>
<hr tabindex="-1">
<div id="x_x_x_x_divRplyFwdMsg" dir="ltr"><b>From:</b> Min Si <a class="x_x_x_moz-txt-link-rfc2396E x_x_OWAAutoLink" href="mailto:msi at anl.gov" id="LPlnk558260">
&lt;msi at anl.gov&gt;</a><br>
<b>Sent:</b> Wednesday, June 20, 2018 1:39:30 PM<br>
<b>To:</b> <a class="x_x_x_moz-txt-link-abbreviated x_x_OWAAutoLink" href="mailto:discuss at mpich.org" id="LPlnk472728">
discuss at mpich.org</a><br>
<b>Subject:</b> Re: [mpich-discuss] osu_latency test: why 8KB takes less time than 4KB and 2KB takes less time than 1KB?
<div> </div>
</div>
<meta content="text/html; charset=Windows-1252">
<div>Hi Abu,<br>
<br>
I think Jeff means that you should run your experiment with more iterations in order to get stable results.<br>
- Increase the number of iterations of the for loop in each execution (I think the OSU benchmarks let you set it)<br>
- Run the experiments 10 or 100 times, and take the average and standard deviation.<br>
<br>
If you see a very small standard deviation (e.g., &lt;=5%), then the trend is stable and you might not see such gaps.<br>
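<p>This recipe (repeat the whole run 10 or 100 times, then compare the standard deviation with the mean) can be sketched as a small Python helper. It is an illustration, not part of the OSU benchmarks: summarize_runs is a hypothetical name, it uses the sample standard deviation from the standard-library statistics module, and its 5% default simply mirrors the threshold suggested here.</p>
<pre>
```python
import statistics


def summarize_runs(latencies_us, threshold=0.05):
    """Summarize repeated osu_latency results for one message size.

    Returns (mean, sample standard deviation, coefficient of variation,
    stable), where `stable` is True when the standard deviation is at most
    `threshold` (5% by default) of the mean. Needs at least two runs.
    """
    mean = statistics.mean(latencies_us)
    stdev = statistics.stdev(latencies_us)  # sample (n - 1) standard deviation
    cv = stdev / mean
    return mean, stdev, cv, threshold >= cv


if __name__ == "__main__":
    # Hypothetical latencies (us) for one message size from four runs.
    runs = [84.9, 85.3, 84.7, 85.1]
    mean, stdev, cv, stable = summarize_runs(runs)
    print(f"mean {mean:.2f} us, sd {stdev:.2f} us, cv {cv:.1%}, stable: {stable}")
```
</pre>
<p>Pass it the osu_latency result for one message size from each repeated run, and treat that size as noisy when the last element of the returned tuple is False.</p>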
<br>
Best regards,<br>
Min<br>
<div class="x_x_x_x_x_moz-cite-prefix">On 2018/06/20 12:14, Abu Naser wrote:<br>
</div>
<blockquote type="cite">
<div id="x_x_x_x_x_divtagdefaultwrapper" dir="ltr">
<p>Hello Jeff,</p>
<p><br>
</p>
<p>Yes, I am using a switch, and other machines are also connected to that switch.
<br>
</p>
<p>If I remove the other machines and use only my two nodes with the switch, will that improve the performance by 200 ~ 400 iterations?</p>
<p>Meanwhile, I will give it a try with a single dedicated cable. <span></span><br>
</p>
<p><br>
</p>
<p>Thank you.<br>
</p>
<div id="x_x_x_x_x_Signature">
<div id="x_x_x_x_x_divtagdefaultwrapper" dir="ltr">
<p><br>
</p>
<p><span>Best Regards,</span></p>
<span></span>
<div><span></span></div>
<span></span>
<p><span>Abu Naser</span><br>
</p>
</div>
</div>
</div>
<hr tabindex="-1">
<div id="x_x_x_x_x_divRplyFwdMsg" dir="ltr"><b>From:</b> Jeff Hammond <a class="x_x_x_x_x_moz-txt-link-rfc2396E x_x_x_x_OWAAutoLink" href="mailto:jeff.science at gmail.com" id="LPlnk983157">
&lt;jeff.science at gmail.com&gt;</a><br>
<b>Sent:</b> Wednesday, June 20, 2018 12:52:06 PM<br>
<b>To:</b> MPICH<br>
<b>Subject:</b> Re: [mpich-discuss] osu_latency test: why 8KB takes less time than 4KB and 2KB takes less time than 1KB?
<div> </div>
</div>
<meta content="text/html; charset=utf-8">
<div>
<div dir="ltr">Is the Ethernet connection a single dedicated cable between the two machines, or are you running through a switch that handles other traffic?
<div><br>
</div>
<div>My best guess is that this is noise and that you may be able to avoid it by running for a very long time, e.g. 10000 iterations.</div>
<div><br>
</div>
<div>Jeff</div>
</div>
<div class="x_x_x_x_x_x_gmail_extra"><br>
<div class="x_x_x_x_x_x_gmail_quote">On Wed, Jun 20, 2018 at 6:53 AM, Abu Naser <span dir="ltr">
&lt;<a href="mailto:an16e at my.fsu.edu" target="_blank" id="LPlnk305789" class="x_x_x_x_OWAAutoLink">an16e at my.fsu.edu</a>&gt;</span> wrote:<br>
<blockquote class="x_x_x_x_x_x_gmail_quote">
<div dir="ltr">
<div id="x_x_x_x_x_x_m_6077755676379859201divtagdefaultwrapper" dir="ltr">
<p><br>
</p>
<p>Good day to all,</p>
<p><br>
</p>
<p>I ran the point-to-point osu_latency test on two nodes 200 times. The following are the average times in microseconds for various message sizes:</p>
<div>1KB    84.8514 us<br>
<span>2KB    73.52535</span> us<br>
4KB    272.55275 us<br>
<span>8KB    234.86385</span> us<br>
16KB    288.88 us<br>
32KB    523.3725 us<br>
64KB    910.4025 us</div>
<p><br>
</p>
<p>From the above, it looks like the 2KB message has lower latency than 1KB, and 8KB has lower latency than 4KB.
<br>
</p>
<p>I was looking for an explanation of this behavior but did not find any.</p>
<p><br>
</p>
<ol>
<li><span>MPIR_CVAR_CH3_EAGER_MAX_MSG_<wbr>SIZE</span><span> is set to 128KB, so none of the above message sizes use the rendezvous protocol. Is there any partitioning inside the eager protocol (e.g. 0-512 bytes, 1KB-8KB, 16KB-64KB)? If so, what are the
 boundaries? Can I log them with debug-event-logging? </span><br>
</li></ol>
<p><br>
</p>
<p>Setup I am using:</p>
<p>- two nodes with Intel Core i7 CPUs, one with 16 GB of memory and the other with 8 GB</p>
<p>- MPICH 3.2.1, configured and built to use nemesis tcp</p>
<p>- 1 Gb Ethernet connection</p>
<p>- NFS is used for sharing<br>
</p>
<p>- osu_latency: uses MPI_Send and MPI_Recv</p>
<p>- <span>MPIR_CVAR_CH3_EAGER_MAX_MSG_<wbr>SIZE</span>=<span>131072</span> (128KB)<br>
</p>
<p><br>
</p>
<p>Can anyone help me with this? Thanks in advance.<br>
</p>
<p><br>
</p>
<p><br>
</p>
<div id="x_x_x_x_x_x_m_6077755676379859201Signature">
<div id="x_x_x_x_x_x_m_6077755676379859201divtagdefaultwrapper" dir="ltr">
<p><br>
</p>
<p><span>Best Regards,</span></p>
<span></span>
<div><span></span></div>
<span></span>
<p><span>Abu Naser</span><br>
</p>
</div>
</div>
</div>
</div>
</blockquote>
</div>
<br>
<br>
<div><br>
</div>
-- <br>
<div class="x_x_x_x_x_x_gmail_signature">Jeff Hammond<br>
<a href="mailto:jeff.science at gmail.com" target="_blank" id="LPlnk314993" class="x_x_x_x_OWAAutoLink">jeff.science at gmail.com</a><br>
<a href="http://jeffhammond.github.io/" target="_blank" id="LPlnk861434" class="x_x_x_x_OWAAutoLink">http://jeffhammond.github.io/</a></div>
</div>
</div>
</blockquote>
<br>
</div>
</div>
</div>
</div>
</blockquote>
<br>
</div>
</div>
</blockquote>
<br>
</div>
</div>
</blockquote>
<br>
</div>
</body>
</html>
