[mpich-discuss] Single node IO Aggregator setup
Thakur, Rajeev
thakur at mcs.anl.gov
Tue Jun 21 13:01:22 CDT 2016
Try running the cpi example from the examples directory across nodes and see if that works. There may be some issue unrelated to MPI-IO.
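If cpi works across nodes, note that cb_nodes only caps the number of aggregators; ROMIO's cb_config_list hint is what names which hosts they run on. As a sketch (assuming the aggregator host is named crane, as in your transcript), a hints file passed via the ROMIO_HINTS environment variable might look like:

    cb_config_list crane:1
    cb_nodes 1

Then export ROMIO_HINTS=/path/to/that/file before mpirun. With no_indep_rw set, ROMIO can defer the physical open to the aggregators only, which is what your setup relies on.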
Rajeev
> On Jun 21, 2016, at 12:48 PM, Jon Povich <jon.povich at convergecfd.com> wrote:
>
> I'm trying to simulate a cluster setup where only the I/O aggregators have access to the working directory. Is this feasible to do with romio hints?
>
> Setup:
> A) Work dir is on Node0's local hard drive.
> B) Remote Node1 has no access to Node0's hard drive
> C) Run a case where only rank 0 on Node0 serves as the I/O aggregator
>
> Simple MPI I/O Code:
>
> #include "mpi.h"
>
> #include <string.h>
> #include <stdio.h>
> #include <unistd.h>
>
> int main(int argc, char *argv[])
> {
>     MPI_File fh;
>     MPI_Info info;
>     int amode, mpi_ret;
>     char filename[64];
>
>     MPI_Init(&argc, &argv);
>
>     MPI_Info_create(&info);
>     MPI_Info_set(info, "cb_nodes", "1");
>     MPI_Info_set(info, "no_indep_rw", "true");
>
>     strcpy(filename, "mpio_test.out");
>     amode = MPI_MODE_CREATE | MPI_MODE_WRONLY;
>     mpi_ret = MPI_File_open(MPI_COMM_WORLD, filename, amode, info, &fh);
>     MPI_Info_free(&info);
>
>     if (mpi_ret != MPI_SUCCESS)
>     {
>         char mpi_err_buf[MPI_MAX_ERROR_STRING];
>         int mpi_err_len;
>
>         MPI_Error_string(mpi_ret, mpi_err_buf, &mpi_err_len);
>         fprintf(stderr, "Failed MPI_File_open. Filename = %s, error = %s\n", filename, mpi_err_buf);
>         /* Abort all ranks rather than returning from just one */
>         MPI_Abort(MPI_COMM_WORLD, 1);
>     }
>
>     // Force I/O errors associated with this file to abort
>     MPI_File_set_errhandler(fh, MPI_ERRORS_ARE_FATAL);
>
>     MPI_File_close(&fh);
>
>     printf("SUCCESS\n");
>
>     MPI_Finalize();
>
>     return 0;
> }
>
> Note the hardcoded "MPI_Info_set(info, "cb_nodes", "1");" and "MPI_Info_set(info, "no_indep_rw", "true");".
>
> The above code runs fine when launched on multiple cores on Node0. As soon as I add Node1 to the mix, I get the following error:
>
> jpovich at crane mini_test> mpirun -np 2 -hosts crane,node1 /work/jpovich/box_testing/1970/1970_test
> [mpiexec at crane.csi.com] control_cb (pm/pmiserv/pmiserv_cb.c:200): assert (!closed) failed
> [mpiexec at crane.csi.com] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:76): callback returned error status
> [mpiexec at crane.csi.com] HYD_pmci_wait_for_completion (pm/pmiserv/pmiserv_pmci.c:198): error waiting for event
> [mpiexec at crane.csi.com] main (ui/mpich/mpiexec.c:344): process manager error waiting for completion
>
> The cb_nodes setting seems to have no impact on behavior. I get the same error if I comment out the cb_nodes and no_indep_rw settings.
>
> Any help is greatly appreciated,
>
> Jon
>
>
> _______________________________________________
> discuss mailing list discuss at mpich.org
> To manage subscription options or unsubscribe:
> https://lists.mpich.org/mailman/listinfo/discuss