[mpich-discuss] Single node IO Aggregator setup

Jon Povich jon.povich at convergecfd.com
Tue Jun 21 12:48:46 CDT 2016


I'm trying to simulate a cluster setup where only the I/O aggregators have
access to the working directory. Is this feasible with ROMIO hints?

Setup:
  A) Work dir is on Node0's local hard drive.
  B) Remote Node1 has no access to Node0's hard drive
  C) Run a case where only rank 0 on Node0 serves as the I/O aggregator
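
(As a side note, to confirm which ranks can actually see and write the work
dir, the quick per-rank check below is the kind of thing I have in mind.
getcwd() and access() are plain POSIX calls; this check is separate from the
MPI-IO test program that follows.)

#include "mpi.h"

#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
   char cwd[1024];
   int rank;

   MPI_Init(&argc, &argv);
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);

   if (getcwd(cwd, sizeof(cwd)) == NULL)
      cwd[0] = '\0';

   /* W_OK on "." shows whether this rank could create files in the work dir */
   printf("rank %d: cwd = %s, writable = %s\n",
          rank, cwd, access(".", W_OK) == 0 ? "yes" : "no");

   MPI_Finalize();
   return 0;
}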

Simple MPI I/O Code:

#include "mpi.h"

#include <string.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
   MPI_File fh;
   MPI_Info info;
   int amode, mpi_ret;
   char filename[64];

   MPI_Init(&argc, &argv);

   /* ROMIO hints: ask for a single I/O aggregator (cb_nodes) and promise
      that all I/O on this file will be collective (no_indep_rw) */
   MPI_Info_create(&info);
   MPI_Info_set(info, "cb_nodes", "1");
   MPI_Info_set(info, "no_indep_rw", "true");

   strcpy(filename, "mpio_test.out");
   amode = MPI_MODE_CREATE | MPI_MODE_WRONLY;
   mpi_ret = MPI_File_open(MPI_COMM_WORLD, filename, amode, info, &fh);
   MPI_Info_free(&info);   /* the hints have been passed to the file handle */

   if(mpi_ret != MPI_SUCCESS)
   {
      char       mpi_err_buf[MPI_MAX_ERROR_STRING];
      int        mpi_err_len;

      MPI_Error_string(mpi_ret, mpi_err_buf, &mpi_err_len);
      fprintf(stderr, "Failed MPI_File_open. Filename = %s, error = %s\n",
              filename, mpi_err_buf);
      MPI_Abort(MPI_COMM_WORLD, -1);   /* don't return without cleaning up MPI */
   }

   // Force I/O errors associated with this file to abort
   MPI_File_set_errhandler(fh, MPI_ERRORS_ARE_FATAL);

   MPI_File_close(&fh);

   printf("SUCCESS\n");

   MPI_Finalize();

   return 0;
}

Note the hardcoded hints: MPI_Info_set(info, "cb_nodes", "1") and
MPI_Info_set(info, "no_indep_rw", "true").

The above code runs fine across multiple cores on Node0. As soon as I add
Node1 to the mix, I get the following error:

jpovich at crane mini_test> mpirun -np 2 -hosts crane,node1
/work/jpovich/box_testing/1970/1970_test
[mpiexec at crane.csi.com] control_cb (pm/pmiserv/pmiserv_cb.c:200): assert
(!closed) failed
[mpiexec at crane.csi.com] HYDT_dmxu_poll_wait_for_event
(tools/demux/demux_poll.c:76): callback returned error status
[mpiexec at crane.csi.com] HYD_pmci_wait_for_completion
(pm/pmiserv/pmiserv_pmci.c:198): error waiting for event
[mpiexec at crane.csi.com] main (ui/mpich/mpiexec.c:344): process manager
error waiting for completion

The cb_nodes setting seems to have no impact on behavior. I get the same
error if I comment out the cb_nodes and no_indep_rw settings.
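
(For what it's worth, to check which hints ROMIO actually applies, the sort of
routine I would drop into the test program between the open and the close is
sketched below; it only uses the standard info-query calls, and fh is the file
handle returned by MPI_File_open above.)

   /* Sketch: print the hints attached to an open file so I can tell, on the
      single-node runs that work, whether cb_nodes/no_indep_rw took effect */
   void print_effective_hints(MPI_File fh)
   {
      MPI_Info info_used;
      int nkeys, i, flag;
      char key[MPI_MAX_INFO_KEY + 1], value[MPI_MAX_INFO_VAL + 1];

      MPI_File_get_info(fh, &info_used);
      MPI_Info_get_nkeys(info_used, &nkeys);
      for (i = 0; i < nkeys; i++)
      {
         MPI_Info_get_nthkey(info_used, i, key);
         MPI_Info_get(info_used, key, MPI_MAX_INFO_VAL, value, &flag);
         if (flag)
            printf("hint %s = %s\n", key, value);
      }
      MPI_Info_free(&info_used);
   }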

Any help is greatly appreciated,

Jon