<div dir="ltr"><div>Hello,<br><br></div><div>Sorry for posting such a long and simple question.<br></div>I found that MPI_Barrier sometimes does not seem to hold back all the processes. I wrote a simple test program that creates a struct datatype, shown below:<br>
<br><span style="color:rgb(166,77,121)">#include &lt;stdlib.h&gt;<br>#include &lt;stdio.h&gt;<br>#include "mpi.h"<br><br>typedef struct <br>{<br> int a;<br> char b;<br> int c;<br> int d; <br>} foo;<br>
<br>int main(int argc, char *argv[])<br>{<br> <br> int rank, size;<br> int i;<br> <br> foo x;<br><br> MPI_Init(&argc, &argv);<br> MPI_Comm_size(MPI_COMM_WORLD, &size);<br> MPI_Comm_rank(MPI_COMM_WORLD, &rank);<br>
<br> char processor_name[MPI_MAX_PROCESSOR_NAME];<br> int name_len;<br> MPI_Get_processor_name(processor_name, &name_len);<br> printf("-- processor %s, rank %d out of %d processors\n", processor_name, rank, size);<br>
<br> <b> MPI_Barrier(MPI_COMM_WORLD);</b><br><br> int count=4; <br><br> MPI_Datatype testtype; <br> MPI_Datatype types[4] = {MPI_INT, MPI_CHAR, MPI_INT, MPI_DOUBLE};<br> int len[4] = {1, 1, 1, 1};<br>
MPI_Aint disp[4];<br> long int base;<br> <br> MPI_Address(&x, disp);<br> MPI_Address(&(x.a), disp+1);<br> MPI_Address(&(x.b), disp+2);<br> MPI_Address(&(x.c), disp+3);<br> base = disp[0];<br>
for(i=0; i<4; i++) disp[i] -= base;<br> <br> MPI_Type_struct(count, len, disp, types, &testtype);<br> MPI_Type_commit(&testtype);<br><br> if(rank == 0){<br> x.a = 2;<br> x.b = 0;<br>
x.c = 10;<br> x.d = 3;<br> }<br> <br> printf("rank %d(before): x value is %d, %d, %d, %d\n", rank, x.a, x.b, x.c, x.d);<br> <b>MPI_Barrier(MPI_COMM_WORLD);</b><br> MPI_Bcast(&x, 1, testtype, 0, MPI_COMM_WORLD);<br>
<br> printf("rank %d(after): x value is %d, %d, %d, %d\n", rank, x.a, x.b, x.c, x.d);<br> <br> MPI_Finalize();<br><br> return 0;<br>}</span><br clear="all"><div><div><br></div><div>The output should look like this:<br>
<div style="margin-left:40px"><span style="color:rgb(166,77,121)">-- processor <a href="http://iocfccd3.aps.anl.gov">iocfccd3.aps.anl.gov</a>, rank 0 out of 4 processors<br>-- processor <a href="http://iocfccd3.aps.anl.gov">iocfccd3.aps.anl.gov</a>, rank 3 out of 4 processors<br>
-- processor <a href="http://iocfccd3.aps.anl.gov">iocfccd3.aps.anl.gov</a>, rank 1 out of 4 processors<br>-- processor <a href="http://iocfccd3.aps.anl.gov">iocfccd3.aps.anl.gov</a>, rank 2 out of 4 processors<br>rank 0(before): x value is 2, 0, 10, 3<br>
rank 1(before): x value is 1197535864, -1, 4994901, 0<br>rank 2(before): x value is 1591464488, -1, 4994901, 0<br>rank 3(before): x value is 1851622184, -1, 4994901, 0<br>rank 0(after): x value is 2, 0, 10, 3<br>rank 3(after): x value is 2, 0, 10, 3<br>
rank 1(after): x value is 2, 0, 10, 3<br>rank 2(after): x value is 2, 0, 10, 3</span><br></div></div><div><br></div><div>But sometimes it shows up as:<br><span style="color:rgb(166,77,121)">-- processor <a href="http://iocfccd3.aps.anl.gov">iocfccd3.aps.anl.gov</a>, rank 0 out of 4 processors<br>
rank 0(before): x value is 2, 0, 10, 3<br>rank 0(after): x value is 2, 0, 10, 3<br>-- processor <a href="http://iocfccd3.aps.anl.gov">iocfccd3.aps.anl.gov</a>, rank 1 out of 4 processors<br>rank 1(before): x value is -464731256, -1, 4994901, 0<br>
rank 1(after): x value is 2, 0, 10, 3<br>-- processor <a href="http://iocfccd3.aps.anl.gov">iocfccd3.aps.anl.gov</a>, rank 2 out of 4 processors<br>rank 2(before): x value is 1863042488, -1, 4994901, 0<br>rank 2(after): x value is 2, 0, 10, 3<br>
-- processor <a href="http://iocfccd3.aps.anl.gov">iocfccd3.aps.anl.gov</a>, rank 3 out of 4 processors<br>rank 3(before): x value is 1721065144, -1, 4994901, 0<br>rank 3(after): x value is 2, 0, 10, 3</span><br></div><div>
<br>or<br><span style="color:rgb(166,77,121)">-- processor <a href="http://iocfccd3.aps.anl.gov">iocfccd3.aps.anl.gov</a>, rank 0 out of 4 processors<br>-- processor <a href="http://iocfccd3.aps.anl.gov">iocfccd3.aps.anl.gov</a>, rank 1 out of 4 processors<br>
rank 1(before): x value is -1883169624, -1, 4994901, 0<br>-- processor <a href="http://iocfccd3.aps.anl.gov">iocfccd3.aps.anl.gov</a>, rank 2 out of 4 processors<br>rank 2(before): x value is -451256152, -1, 4994901, 0<br>
-- processor <a href="http://iocfccd3.aps.anl.gov">iocfccd3.aps.anl.gov</a>, rank 3 out of 4 processors<br>rank 3(before): x value is 1715067240, -1, 4994901, 0<br>rank 0(before): x value is 2, 0, 10, 3<br>rank 0(after): x value is 2, 0, 10, 3<br>
rank 1(after): x value is 2, 0, 10, 3<br>rank 2(after): x value is 2, 0, 10, 3<br>rank 3(after): x value is 2, 0, 10, 3</span><br><br></div><div>The ordering is random from run to run, and I am not sure where the problem is. <br></div><div><br></div>
<div>The second issue: I use MPI_Datatype to create an MPI struct type for the broadcast. However, in the program shown above, if I change the struct from:<br><span style="color:rgb(166,77,121)">typedef struct <br>{<br> int a;<br> char b;<br>
int c;<br> <b>int d;</b> <br>} foo;</span><br></div><div>to <br><span style="color:rgb(166,77,121)">typedef struct <br>{<br> int a;<br> char b;<br> int c;<br> <b>double d</b>; <br>} foo;</span><br>
</div><div><br></div><div>I found the result is:<br><span style="color:rgb(166,77,121)">-- processor <a href="http://ephesus.ece.iit.edu">ephesus.ece.iit.edu</a>, rank 1 out of 4 processors<br>-- processor <a href="http://ephesus.ece.iit.edu">ephesus.ece.iit.edu</a>, rank 2 out of 4 processors<br>
-- processor <a href="http://ephesus.ece.iit.edu">ephesus.ece.iit.edu</a>, rank 3 out of 4 processors<br>-- processor <a href="http://ephesus.ece.iit.edu">ephesus.ece.iit.edu</a>, rank 0 out of 4 processors<br>rank 0(before): x value is 2, 0, 10, 3.250000<br>
rank 3(before): x value is 0, 0, 547474368, 0.000000<br>rank 1(before): x value is 0, 0, 547474368, 0.000000<br>rank 2(before): x value is 0, 0, 547474368, 0.000000<br>rank 0(after): x value is 2, 0, 10, 3.250000<br><b>rank 2(after): x value is 2, 0, 10, 0.000000 <span style="color:rgb(0,0,0)"><- should be 3.25</span><br>
rank 3(after): x value is 2, 0, 10, 0.000000 <span style="color:rgb(0,0,0)"><- should be 3.25</span><br>rank 1(after): x value is 2, 0, 10, 0.000000</b></span><b> <- should be 3.25<br></b><br></div><div>Do you have any clues on why this happened? Thanks a lot!<br>
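<div>A possible explanation, offered as a sketch rather than a confirmed diagnosis: the program takes the addresses of x, x.a, x.b and x.c, but never of x.d. The types array {MPI_INT, MPI_CHAR, MPI_INT, MPI_DOUBLE} is therefore paired with the displacements of x, a, b and c, so the MPI_DOUBLE entry sits at the offset of c and d itself is never transmitted. With int d, a double-sized block starting at c probably happens to cover both c and d on a platform with 4-byte int and 8-byte double, which would explain why the first variant looked correct. The plain C code below needs no MPI and compares the two sets of displacements. The function names are hypothetical.<br></div>

```c
#include <assert.h>
#include <stddef.h>

/* Second variant of the struct from the post (d is a double). */
typedef struct {
    int a;
    char b;
    int c;
    double d;
} foo;

/* Displacements as the posted program computes them: the addresses of
 * x, x.a, x.b and x.c, each minus the address of x. x.d is never used. */
static void posted_displacements(size_t disp[4])
{
    foo x;
    const char *base = (const char *)&x;

    disp[0] = 0;                                   /* &x   - &x */
    disp[1] = (size_t)((const char *)&x.a - base); /* &x.a - &x */
    disp[2] = (size_t)((const char *)&x.b - base); /* &x.b - &x */
    disp[3] = (size_t)((const char *)&x.c - base); /* &x.c - &x */
}

/* Displacements of the four members themselves, in declaration order. */
static void member_displacements(size_t disp[4])
{
    /* The first member always sits at the start of the struct. */
    assert(offsetof(foo, a) == 0);

    disp[0] = offsetof(foo, a);
    disp[1] = offsetof(foo, b);
    disp[2] = offsetof(foo, c);
    disp[3] = offsetof(foo, d);
}
```

<div>The values from member_displacements, or equivalently the result of MPI_Get_address on each of the four members minus the address of x, are what would be passed as the displacement array to MPI_Type_create_struct. MPI_Address and MPI_Type_struct are the older, deprecated names for these two calls, and base would be better declared as MPI_Aint than as long int.<br></div>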
</div><div><br></div><div>-- <br>Best Regards,<div>Sufeng Niu</div><div>ECASP lab, ECE department, Illinois Institute of Technology</div><div>Tel: 312-731-7219</div>
</div></div></div>