<div dir="ltr"><div class="gmail_quote"><div dir="ltr">Hello everyone,<div><br></div><div>I use MPICH2 on a small cluster of 3 nodes, each running Ubuntu 12.04. I use this cluster to do the following:</div><div>
<br></div><div>1. The <i>master </i>node sends some matrices to the 2 <i>workers</i>.</div>
<div>2. The workers perform some calculations and send the resulting matrices back to the master.</div><div>3. The master performs some final calculations.</div><div><br></div><div>Code snippet:</div><div><br></div><div><br></div>
<div><font face="courier new, monospace">//n*n is the matrix size</font></div><div><font face="courier new, monospace">//master(taskid=0)</font></div>
<div><font face="courier new, monospace"><br></font></div><div><font face="courier new, monospace"> MPI_Send(ha11, n / 2 * n / 2, MPI_DOUBLE, 1, 1, MPI_COMM_WORLD); //to worker 1</font></div><div><font face="courier new, monospace"> MPI_Send(ha11, n / 2 * n / 2, MPI_DOUBLE, 2, 1, MPI_COMM_WORLD); //to worker 2<br>
</font></div><div><font face="courier new, monospace"><br></font></div><div><font face="courier new, monospace"><br></font></div><div><div><font face="courier new, monospace"> MPI_Recv(hM1, n / 2 * n / 2, MPI_DOUBLE, 1, 2, MPI_COMM_WORLD,&status); </font><span style="font-family:'courier new',monospace">//from worker 1</span></div>
</div><div><font face="courier new, monospace"> MPI_Recv(hM2, n / 2 * n / 2, MPI_DOUBLE, 2, 2, MPI_COMM_WORLD,&status);</font><span style="font-family:'courier new',monospace">//from worker 2</span><font face="courier new, monospace"><br>
</font></div><div><font face="courier new, monospace"><br></font></div><div><font face="courier new, monospace">//final calculations using hM1,hM2</font></div><div><font face="courier new, monospace"><br></font></div><div>
<font face="courier new, monospace">//worker 1 (taskid=1)</font></div><div><font face="courier new, monospace"><br></font></div><div><div><font face="courier new, monospace">MPI_Recv(ha11, n / 2 * n / 2, MPI_DOUBLE, 0, 1, MPI_COMM_WORLD,</font><span style="font-family:'courier new',monospace">&status);</span></div>
<div><span style="font-family:'courier new',monospace">//does some calculations</span><br></div><div><span style="font-family:'courier new',monospace">MPI_Send(hM1, n / 2 * n / 2, MPI_DOUBLE, 0, 2, MPI_COMM_WORLD); //sends back</span><br>
</div></div><div><font face="courier new, monospace"><br></font></div><div><font face="courier new, monospace"> <br clear="all"></font><div><font face="courier new, monospace">//worker 2(taskid=2)</font></div><div><div><font face="courier new, monospace">MPI_Recv(ha11, n / 2 * n / 2, MPI_DOUBLE, 0, 1, MPI_COMM_WORLD,</font><span style="font-family:'courier new',monospace">&status);</span></div>
<div><span style="font-family:'courier new',monospace">//does some calculations</span><br></div><div><span style="font-family:'courier new',monospace">MPI_Send(hM2, n / 2 * n / 2, MPI_DOUBLE, 0, 2, MPI_COMM_WORLD); //sends back</span><br>
</div></div><div><font face="courier new, monospace"><br></font></div><div><br></div><div>This worked fine at first, for n=128 to n=2048. But after I pushed n beyond 2048, I got a segmentation fault from worker 1. </div>
<div><br></div><div>Since then, the code works fine for small values of n. But whenever I set n=128 or greater, worker 1 is delayed indefinitely while the rest of the nodes work fine.</div><div><br></div>
<div>
What could be the reason for this, and how can I resolve it? If I have made any mistakes, please point them out. Thanks in advance.</div><span class="HOEnZb"><font color="#888888"><div><br></div>-- <br><div dir="ltr"><div><font face="tahoma, sans-serif" color="#333333">Regards,</font></div>
<b><font face="tahoma, sans-serif" color="#333333">H.K. Madhawa Bandara</font></b><div><div style="font-family:arial;font-size:small"><br></div></div><div><br></div></div></font></span></div></div></div>
</div>