[mpich-discuss] Single node IO Aggregator setup

Jon Povich jon.povich at convergecfd.com
Tue Jun 21 12:48:46 CDT 2016

I'm trying to simulate a cluster setup in which only the I/O aggregators have
access to the working directory. Is this feasible with ROMIO hints?

  A) Work dir is on Node0's local hard drive.
  B) Remote Node1 has no access to Node0's hard drive
  C) Run a case where only rank 0 on Node0 serves as the I/O aggregator
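For reference, a sketch of the hint setup I have in mind (the hostname "node0" is just a placeholder, and I believe `cb_config_list` is the ROMIO hint for choosing which hosts run aggregators, with `romio_no_indep_rw` being the documented spelling of the deferred-open hint — please correct me if I have the names wrong):

```c
#include <mpi.h>

/* Sketch: restrict collective buffering to a single aggregator on Node0.
 * "node0" is a placeholder hostname. cb_config_list is ROMIO's hint for
 * selecting aggregator hosts; romio_no_indep_rw requests deferred open,
 * so only aggregators actually open the file. */
static MPI_Info make_aggregator_hints(void)
{
   MPI_Info info;
   MPI_Info_create(&info);
   MPI_Info_set(info, "cb_nodes", "1");              /* one aggregator total */
   MPI_Info_set(info, "cb_config_list", "node0:1");  /* ...placed on node0   */
   MPI_Info_set(info, "romio_no_indep_rw", "true");  /* deferred open        */
   return info;
}
```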

Simple MPI I/O Code:

#include "mpi.h"

#include <string.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
   MPI_File fh;
   MPI_Info info;
   int amode, mpi_ret;
   char filename[64];

   MPI_Init(&argc, &argv);

   MPI_Info_create(&info);
   MPI_Info_set(info, "cb_nodes", "1");
   MPI_Info_set(info, "no_indep_rw", "true");

   amode = MPI_MODE_CREATE | MPI_MODE_WRONLY;
   strcpy(filename, "mpio_test.out");
   mpi_ret = MPI_File_open(MPI_COMM_WORLD, filename, amode, info, &fh);

   if (mpi_ret != MPI_SUCCESS) {
      char mpi_err_buf[MPI_MAX_ERROR_STRING];
      int  mpi_err_len;

      MPI_Error_string(mpi_ret, mpi_err_buf, &mpi_err_len);
      fprintf(stderr, "Failed MPI_File_open. Filename = %s, error = %s\n",
              filename, mpi_err_buf);
      return -1;
   }

   /* Force I/O errors associated with this file to abort */
   MPI_File_set_errhandler(fh, MPI_ERRORS_ARE_FATAL);

   MPI_File_close(&fh);
   MPI_Info_free(&info);
   MPI_Finalize();

   return 0;
}

Note the hardcoded "MPI_Info_set(info, "cb_nodes", "1");" and
"MPI_Info_set(info, "no_indep_rw", "true");".

The above code runs fine when run from multiple cores on Node0. As soon as
I add a Node1 to the mix, I get the following error:

jpovich at crane mini_test> mpirun -np 2 -hosts crane,node1
[mpiexec at crane.csi.com] control_cb (pm/pmiserv/pmiserv_cb.c:200): assert
(!closed) failed
[mpiexec at crane.csi.com] HYDT_dmxu_poll_wait_for_event
(tools/demux/demux_poll.c:76): callback returned error status
[mpiexec at crane.csi.com] HYD_pmci_wait_for_completion
(pm/pmiserv/pmiserv_pmci.c:198): error waiting for event
[mpiexec at crane.csi.com] main (ui/mpich/mpiexec.c:344): process manager
error waiting for completion

The cb_nodes setting seems to have no impact on behavior. I get the same
error if I comment out the cb_nodes and no_indep_rw settings.
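For what it's worth, since MPI implementations silently drop hints they don't recognize, the hints that actually took effect can be read back after the open with `MPI_File_get_info`. A sketch of how I've been checking (assuming a successfully opened `MPI_File`):

```c
#include <mpi.h>
#include <stdio.h>

/* Sketch: print the hints ROMIO actually applied to an open file.
 * Unknown or unsupported hints are silently ignored at open time, so
 * reading them back is the way to confirm cb_nodes really took effect. */
static void dump_hints(MPI_File fh)
{
   MPI_Info used;
   int nkeys, i, flag;
   char key[MPI_MAX_INFO_KEY], value[MPI_MAX_INFO_VAL];

   MPI_File_get_info(fh, &used);
   MPI_Info_get_nkeys(used, &nkeys);
   for (i = 0; i < nkeys; i++) {
      MPI_Info_get_nthkey(used, i, key);
      MPI_Info_get(used, key, MPI_MAX_INFO_VAL, value, &flag);
      if (flag)
         printf("hint %s = %s\n", key, value);
   }
   MPI_Info_free(&used);
}
```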

Any help is greatly appreciated,
