<meta http-equiv="Content-Type" content="text/html; charset=utf-8"><div dir="ltr"><div>I'm trying to simulate a cluster setup where only the I/O aggregators have access to the working directory. Is this feasible to do with ROMIO hints?</div><div><br></div><div>Setup:</div><div> A) The working directory is on Node0's local hard drive.</div><div> B) The remote Node1 has no access to Node0's hard drive.</div><div> C) Run a case where only rank 0 on Node0 serves as the I/O aggregator.</div><div><br></div><div>Simple MPI I/O code:</div><div><br></div><div><div><font face="monospace, monospace">#include "mpi.h"</font></div><div><font face="monospace, monospace"><br></font></div><div><font face="monospace, monospace">#include &lt;stdio.h&gt;</font></div><div><font face="monospace, monospace">#include &lt;string.h&gt;</font></div><div><font face="monospace, monospace"><br></font></div><div><font face="monospace, monospace">int main(int argc, char *argv[])</font></div><div><font face="monospace, monospace">{</font></div><div><font face="monospace, monospace">    MPI_File fh;</font></div><div><font face="monospace, monospace">    MPI_Info info;</font></div><div><font face="monospace, monospace">    int amode, mpi_ret;</font></div><div><font face="monospace, monospace">    char filename[64];</font></div><div><font face="monospace, monospace"><br></font></div><div><font face="monospace, monospace">    MPI_Init(&argc, &argv);</font></div><div><font face="monospace, monospace"><br></font></div><div><font face="monospace, monospace">    MPI_Info_create(&info);</font></div><div><font face="monospace, monospace">    MPI_Info_set(info, "cb_nodes", "1");</font></div><div><font face="monospace, monospace">    MPI_Info_set(info, "no_indep_rw", "true");</font></div><div><font face="monospace, monospace"><br></font></div><div><font face="monospace, monospace">    strcpy(filename, "mpio_test.out");</font></div><div><font face="monospace, monospace">    amode = MPI_MODE_CREATE | MPI_MODE_WRONLY;</font></div><div><font face="monospace, monospace">    mpi_ret = MPI_File_open(MPI_COMM_WORLD, filename, amode, info, &fh);</font></div><div><font face="monospace, monospace">    MPI_Info_free(&info);  // the info object is no longer needed once the file is open</font></div><div><font face="monospace, monospace"><br></font></div><div><font face="monospace, monospace">    if (mpi_ret != MPI_SUCCESS)</font></div><div><font face="monospace, monospace">    {</font></div><div><font face="monospace, monospace">        char mpi_err_buf[MPI_MAX_ERROR_STRING];</font></div><div><font face="monospace, monospace">        int mpi_err_len;</font></div><div><font face="monospace, monospace"><br></font></div><div><font face="monospace, monospace">        MPI_Error_string(mpi_ret, mpi_err_buf, &mpi_err_len);</font></div><div><font face="monospace, monospace">        fprintf(stderr, "Failed MPI_File_open. Filename = %s, error = %s\n", filename, mpi_err_buf);</font></div><div><font face="monospace, monospace">        MPI_Abort(MPI_COMM_WORLD, -1);  // exiting without finalizing or aborting MPI is erroneous</font></div><div><font face="monospace, monospace">    }</font></div><div><font face="monospace, monospace"><br></font></div><div><font face="monospace, monospace">    // Force I/O errors associated with this file to abort</font></div><div><font face="monospace, monospace">    MPI_File_set_errhandler(fh, MPI_ERRORS_ARE_FATAL);</font></div><div><font face="monospace, monospace"><br></font></div><div><font face="monospace, monospace">    MPI_File_close(&fh);</font></div><div><font face="monospace, monospace"><br></font></div><div><font face="monospace, monospace">    printf("SUCCESS\n");</font></div><div><font face="monospace, monospace"><br></font></div><div><font face="monospace, monospace">    MPI_Finalize();</font></div><div><font face="monospace, monospace"><br></font></div><div><font face="monospace, monospace">    return 0;</font></div><div><font face="monospace, monospace">}</font></div></div><div><br></div><div>Note the hardcoded <font face="monospace, monospace">MPI_Info_set(info, "cb_nodes", "1");</font> and <font face="monospace, monospace">MPI_Info_set(info, "no_indep_rw", "true");</font> calls.</div><div><br></div><div>The 
above code runs fine when run on multiple cores of Node0. As soon as I add Node1 to the mix, I get the following error:</div><div><br></div><div><div><font face="monospace, monospace">jpovich@crane mini_test> mpirun -np 2 -hosts crane,node1 /work/jpovich/box_testing/1970/1970_test</font></div><div><font face="monospace, monospace">[mpiexec@crane.csi.com] control_cb (pm/pmiserv/pmiserv_cb.c:200): assert (!closed) failed</font></div><div><font face="monospace, monospace">[mpiexec@crane.csi.com] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:76): callback returned error status</font></div><div><font face="monospace, monospace">[mpiexec@crane.csi.com] HYD_pmci_wait_for_completion (pm/pmiserv/pmiserv_pmci.c:198): error waiting for event</font></div><div><font face="monospace, monospace">[mpiexec@crane.csi.com] main (ui/mpich/mpiexec.c:344): process manager error waiting for completion</font></div></div><div><br></div><div>The cb_nodes setting seems to have no impact on this behavior; I get the same error if I comment out both the cb_nodes and no_indep_rw settings.</div><div><br></div><div>Any help is greatly appreciated,</div><div><br></div><div>Jon</div><div><br></div><div><br></div>
</div>