[mpich-discuss] Single node IO Aggregator setup

Thakur, Rajeev thakur at mcs.anl.gov
Tue Jun 21 13:01:22 CDT 2016


Try running the cpi example from the examples directory across nodes and see if that works. There may be some issue unrelated to MPI-IO.
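
For example, something along these lines (a sketch, assuming the cpi binary has been built in the examples directory of your MPICH build tree and that the path is visible on both hosts), with the same host list you use below:

   mpirun -np 2 -hosts crane,node1 /path/to/mpich-build/examples/cpi

If that fails the same way, the problem is likely in process startup or inter-node connectivity rather than in the MPI-IO hints.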

Rajeev

> On Jun 21, 2016, at 12:48 PM, Jon Povich <jon.povich at convergecfd.com> wrote:
> 
> I'm trying to simulate a cluster setup where only the I/O aggregators have access to the working directory. Is this feasible to do with ROMIO hints?
> 
> Setup:
>   A) Work dir is on Node0's local hard drive.
>   B) Remote Node1 has no access to Node0's hard drive
>   C) Run a case where only rank 0 on Node0 serves as the I/O aggregator
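> 
> Roughly what I have in mind, as a sketch (assuming ROMIO's cb_config_list hint takes a comma-separated list of hostname:max-aggregators entries, and using "crane" as Node0's hostname):
> 
>    MPI_Info info;
> 
>    /* Sketch only: one aggregator in total, placed on Node0, with deferred open
>       so that non-aggregator ranks never touch the file themselves. */
>    MPI_Info_create(&info);
>    MPI_Info_set(info, "cb_nodes", "1");              /* single aggregator */
>    MPI_Info_set(info, "cb_config_list", "crane:1");  /* place it on Node0 (crane) */
>    MPI_Info_set(info, "romio_no_indep_rw", "true");  /* deferred open */
> 
> As I understand deferred open, with romio_no_indep_rw set and collective buffering enabled, only the aggregator ranks open the file at MPI_File_open time, which is what setup (B) requires.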
> 
> Simple MPI I/O Code:
> 
> #include "mpi.h"
> 
> #include <string.h>
> #include <stdio.h>
> #include <unistd.h>
> 
> int main(int argc, char *argv[])
> {
>    MPI_File fh;
>    MPI_Info info;
>    int amode, mpi_ret;
>    char filename[64];
> 
>    MPI_Init(&argc, &argv);
> 
> 
>    MPI_Info_create(&info);
>    MPI_Info_set(info, "cb_nodes", "1");
>    MPI_Info_set(info, "no_indep_rw", "true");
> 
>    strcpy(filename, "mpio_test.out");
>    amode = MPI_MODE_CREATE | MPI_MODE_WRONLY;
>    mpi_ret = MPI_File_open(MPI_COMM_WORLD, filename, amode, info, &fh);
> 
>    if(mpi_ret != MPI_SUCCESS)
>    {
>       char       mpi_err_buf[MPI_MAX_ERROR_STRING];
>       int        mpi_err_len;
> 
>       MPI_Error_string(mpi_ret, mpi_err_buf, &mpi_err_len);
>       fprintf(stderr, "Failed MPI_File_open. Filename = %s, error = %s", filename, mpi_err_buf);
>       return -1;
>    }
> 
>    // Force I/O errors associated with this file to abort
>    MPI_File_set_errhandler(fh, MPI_ERRORS_ARE_FATAL);
>   
>    MPI_File_close(&fh);
> 
>    printf("SUCCESS\n");
> 
>    MPI_Finalize();
> 
>    return 0;
> }
> 
> Note the hardcoded MPI_Info_set(info, "cb_nodes", "1"); and MPI_Info_set(info, "no_indep_rw", "true"); calls.
> 
> The above code runs fine when launched on multiple cores on Node0. As soon as I add Node1 to the mix, I get the following error:
> 
> jpovich at crane mini_test> mpirun -np 2 -hosts crane,node1 /work/jpovich/box_testing/1970/1970_test 
> [mpiexec at crane.csi.com] control_cb (pm/pmiserv/pmiserv_cb.c:200): assert (!closed) failed
> [mpiexec at crane.csi.com] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:76): callback returned error status
> [mpiexec at crane.csi.com] HYD_pmci_wait_for_completion (pm/pmiserv/pmiserv_pmci.c:198): error waiting for event
> [mpiexec at crane.csi.com] main (ui/mpich/mpiexec.c:344): process manager error waiting for completion
> 
> The cb_nodes setting seems to have no impact on behavior. I get the same error if I comment out the cb_nodes and no_indep_rw settings.
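> 
> As a sanity check (my own sketch, not part of the failing test), something like the following after a successful open should print the hints ROMIO actually kept:
> 
>    /* Sketch: dump the hints in effect for this file handle. */
>    MPI_Info used;
>    int nkeys, i, flag;
>    char key[MPI_MAX_INFO_KEY + 1], value[MPI_MAX_INFO_VAL + 1];
> 
>    MPI_File_get_info(fh, &used);
>    MPI_Info_get_nkeys(used, &nkeys);
>    for (i = 0; i < nkeys; i++) {
>       MPI_Info_get_nthkey(used, i, key);
>       MPI_Info_get(used, key, MPI_MAX_INFO_VAL, value, &flag);
>       if (flag)
>          printf("hint %s = %s\n", key, value);
>    }
>    MPI_Info_free(&used);
> 
> On the single-node runs, at least, that would confirm whether cb_nodes = 1 is actually being honored.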
> 
> Any help is greatly appreciated,
> 
> Jon
> 
> 

_______________________________________________
discuss mailing list     discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss

