[mpich-discuss] Use of MPI derived data types / MPI file IO
Wei-keng Liao
wkliao at ece.northwestern.edu
Sun Nov 18 12:27:04 CST 2012
Hi, John
If your I/O is simply appending one process's data after another and the I/O buffers in memory
are all contiguous, then you can do the following without defining MPI
derived data types or setting a file view.
MPI_File_write_at_all(f, offset, &atoms[0], (int)(atoms.size() * sizeof(struct atom)), MPI_BYTE, &stat);
Derived data types are usually needed when you have a noncontiguous buffer in memory or
want to access noncontiguous data in the file.
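
For illustration, here is a minimal (untested) sketch of that approach, reusing the struct atom,
rank, nranks, and fpath from your write() routine below; the name write_contiguous is just for
this example:

void write_contiguous( char *fpath, std::vector<struct atom> &atoms, int rank, int nranks )
{
    MPI_File   f;
    MPI_Status stat;

    // How many bytes this rank writes, and how many every other rank writes.
    int local_bytes = (int)( atoms.size() * sizeof(struct atom) );
    std::vector<int> all_bytes( nranks );
    MPI_Allgather( &local_bytes, 1, MPI_INT, &all_bytes[0], 1, MPI_INT, MPI_COMM_WORLD );

    // Header value: total number of structures across all ranks.
    int global_N = 0;
    for( int i = 0; i < nranks; i++ ) global_N += all_bytes[i] / (int)sizeof(struct atom);

    // Byte offset for this rank: the integer header plus everything written by
    // the preceding ranks.
    MPI_Offset offset = sizeof( int );
    for( int i = 0; i < rank; i++ ) offset += all_bytes[i];

    MPI_File_open( MPI_COMM_WORLD, fpath, MPI_MODE_CREATE | MPI_MODE_WRONLY,
                   MPI_INFO_NULL, &f );
    if( rank == 0 )
        MPI_File_write_at( f, 0, &global_N, 1, MPI_INT, &stat );  // the integer header

    // Explicit-offset collective write; no file view or derived data type needed.
    MPI_File_write_at_all( f, offset, &atoms[0], local_bytes, MPI_BYTE, &stat );
    MPI_File_close( &f );
}
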
Wei-keng
On Nov 18, 2012, at 11:52 AM, <jgrime at uchicago.edu> wrote:
> Hi all,
>
> I'm having some problems using derived data types and MPI parallel IO, and
> was wondering if anyone could help. I tried to search the archives in case this
> was covered earlier, but that just gave me "ht://Dig error" messages.
>
> Outline: I have written a C++ program where each MPI rank acts on data stored
> in a local array of structures. The arrays are typically of different lengths on each
> rank. I wish to write and read the contents of these arrays to disk using MPI's
> parallel IO routines. The file format is simply an initial integer giving the number
> of "structures" in the file, followed by the data representing the "structure
> information" from all ranks (i.e. the total data set).
>
> So far, I've tried two different approaches: the first consists of each rank
> serialising the contents of the local array of structures into a byte array, which is
> then saved to file "f" using MPI_File_set_view( f, offset, MPI_CHAR, MPI_CHAR,
> "native", MPI_INFO_NULL ) to skip the initial integer "header", and then a call to
> MPI_File_write_all( f, local_bytearray, local_n_bytes, MPI_CHAR, &status ). Here,
> "offset" is simply the size of an integer (in bytes) plus the sum of the number of
> bytes each preceding rank wishes to write to the file (received via an earlier
> MPI_Allgather call). This seems to work: when I read the file back in on a single
> MPI rank and deserialise the data into an array of structures, I get the results I
> expect.
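>
> In outline, with "f" already opened on MPI_COMM_WORLD and "all_n_bytes" standing in
> (just an illustrative name) for the per-rank byte counts from the earlier MPI_Allgather,
> the first approach boils down to:
>
> MPI_Offset offset = sizeof( int );                        // skip the integer header
> for( int i = 0; i < rank; i++ ) offset += all_n_bytes[i]; // skip preceding ranks' data
>
> MPI_File_set_view( f, offset, MPI_CHAR, MPI_CHAR, (char *)"native", MPI_INFO_NULL );
> MPI_File_write_all( f, local_bytearray, local_n_bytes, MPI_CHAR, &status );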
>
> The second approach is to use MPI's derived data types to create MPI
> representations of the structures, and then treat the arrays of structures as MPI
> data types. This allows me to avoid copying the local data into an intermediate
> buffer etc, and seems the more elegant approach. I cannot, however, seem to
> make this approach work.
>
> I'm pretty sure the problem lies in my use of the file views, but I'm not sure
> where I'm going wrong. The reading of the integer "header" always works fine,
> but the data that follows is garbled. I'm using the "native" data representation for
> testing, but will likely change that to something more portable when I get this
> code working.
>
> I've included the important excerpts of the test code I'm trying to use below
> (with some printf()s and error handling etc removed to make it a little more
> concise). I have previously verified that std::vector allocates a contiguous flat
> array of the appropriate data type in memory, so passing a pointer/reference to
> the first element of such a vector behaves the same way as passing a
> conventional array:
>
> struct atom
> {
> int global_id;
> double xyz[3];
> };
>
> void write( char * fpath, std::vector<struct atom> &atoms, int rank, int nranks )
> {
> /*
> Memory layout information for the structure we wish to convert into an
> MPI derived data type.
> */
> std::vector<int> s_blocklengths;
> std::vector<MPI_Aint> s_displacements;
> std::vector<MPI_Datatype> s_datatypes;
> MPI_Aint addr_start, addr;
> MPI_Datatype mpi_atom_type, mpi_atom_type_resized;
> int type_size;
>
> struct atom a;
>
> MPI_File f;
> MPI_Status stat;
> MPI_Offset offset;
> char *datarep = (char *)"native";
>
> std::vector<int> all_N;
> int local_N, global_N;
>
> /*
> Set up the structure data type: single integer, and 3 double precision floats.
> We use the temporary "a" structure to determine the layout of memory inside
> atom structures.
> */
> MPI_Get_address( &a, &addr_start );
>
> s_blocklengths.push_back( 1 );
> s_datatypes.push_back( MPI_INT );
> MPI_Get_address( &a.global_id, &addr );
> s_displacements.push_back( addr - addr_start );
>
> s_blocklengths.push_back( 3 );
> s_datatypes.push_back( MPI_DOUBLE );
> MPI_Get_address( &a.xyz[0], &addr );
> s_displacements.push_back( addr - addr_start );
>
> MPI_Type_create_struct( (int)s_blocklengths.size(), &s_blocklengths[0],
> &s_displacements[0], &s_datatypes[0], &mpi_atom_type );
> MPI_Type_commit( &mpi_atom_type );
>
> /*
> Take into account any compiler padding in creating an array of
> structures.
> */
> MPI_Type_create_resized( mpi_atom_type, 0, sizeof(struct atom),
> &mpi_atom_type_resized );
> MPI_Type_commit( &mpi_atom_type_resized );
>
> MPI_Type_size( mpi_atom_type_resized, &type_size );
>
> local_N = (int)atoms.size();
> all_N.resize( nranks );
>
> MPI_Allgather( &local_N, 1, MPI_INT, &all_N[0], 1, MPI_INT,
> MPI_COMM_WORLD );
>
> global_N = 0;
> for( size_t i=0; i<all_N.size(); i++ ) global_N += all_N[i];
>
> offset = 0;
> for( int i=0; i<rank; i++ ) offset += all_N[i];
>
> offset *= type_size; // convert from structure counts -> bytes into file for true structure size
> offset += sizeof( int ); // skip leading integer (global_N) in file.
>
> MPI_File_open( MPI_COMM_WORLD, fpath, MPI_MODE_CREATE |
> MPI_MODE_WRONLY, MPI_INFO_NULL, &f );
> if( rank == 0 )
> {
> MPI_File_write( f, &global_N, 1, MPI_INT, &stat );
> }
> MPI_File_set_view( f, offset, mpi_atom_type_resized,
> mpi_atom_type_resized, datarep, MPI_INFO_NULL );
>
> MPI_File_write_all( f, &atoms[0], (int)atoms.size(), mpi_atom_type_resized,
> &stat );
> MPI_File_close( &f );
>
> MPI_Type_free( &mpi_atom_type );
> MPI_Type_free( &mpi_atom_type_resized );
>
> return;
> }
>
> void read( char * fpath, std::vector<struct atom> &atoms )
> {
> std::vector<int> s_blocklengths;
> std::vector<MPI_Aint> s_displacements;
> std::vector<MPI_Datatype> s_datatypes;
> MPI_Datatype mpi_atom_type, mpi_atom_type_resized;
>
> struct atom a;
> MPI_Aint addr_start, addr;
>
> MPI_File f;
> MPI_Status stat;
>
> int global_N;
> char *datarep = (char *)"native";
>
> int type_size;
>
> /*
> Set up the structure data type
> */
> MPI_Get_address( &a, &addr_start );
>
> s_blocklengths.push_back( 1 );
> s_datatypes.push_back( MPI_INT );
> MPI_Get_address( &a.global_id, &addr );
> s_displacements.push_back( addr - addr_start );
>
> s_blocklengths.push_back( 3 );
> s_datatypes.push_back( MPI_DOUBLE );
> MPI_Get_address( &a.xyz[0], &addr );
> s_displacements.push_back( addr - addr_start );
>
> MPI_Type_create_struct( (int)s_blocklengths.size(), &s_blocklengths[0],
> &s_displacements[0], &s_datatypes[0], &mpi_atom_type );
> MPI_Type_commit( &mpi_atom_type );
>
> /*
> Take into account any compiler padding in creating an array of
> structures.
> */
> MPI_Type_create_resized( mpi_atom_type, 0, sizeof(struct atom),
> &mpi_atom_type_resized );
> MPI_Type_commit( &mpi_atom_type_resized );
>
> MPI_Type_size( mpi_atom_type_resized, &type_size );
>
> MPI_File_open( MPI_COMM_SELF, fpath, MPI_MODE_RDONLY,
> MPI_INFO_NULL, &f );
>
> MPI_File_read( f, &global_N, 1, MPI_INT, &stat );
>
> atoms.clear();
> atoms.resize( global_N );
>
> MPI_File_set_view( f, sizeof(int), mpi_atom_type_resized,
> mpi_atom_type_resized, datarep, MPI_INFO_NULL );
> MPI_File_read( f, &atoms[0], global_N, mpi_atom_type_resized,
> &stat );
> MPI_File_close( &f );
>
> MPI_Type_free( &mpi_atom_type );
> MPI_Type_free( &mpi_atom_type_resized );
>
> return;
> }
>
> Calling MPI_Type_get_extent() and MPI_Type_get_true_extent() for both
> mpi_atom_type and mpi_atom_type_resized returns (lb, extent) = (0, 32) bytes in
> all cases. Calling MPI_Type_size() on both derived data types returns 28 bytes.
>
> If I call MPI_File_get_type_extent() on both derived data types after opening the
> file, they both resolve to 32 bytes - so I think the problem is in the difference
> between the data representation in memory and on disk. If I explicitly use 32
> bytes in the offset calculation in the write() routine above, it still doesn't work.
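>
> For reference, those numbers come from queries of this form, with "f" being the
> open file handle:
>
> MPI_Aint lb, extent, file_extent;
> int size;
> MPI_Type_get_extent( mpi_atom_type_resized, &lb, &extent );          // lb = 0, extent = 32
> MPI_Type_size( mpi_atom_type_resized, &size );                       // size = 28 (data only, no padding)
> MPI_File_get_type_extent( f, mpi_atom_type_resized, &file_extent );  // 32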
>
> I'm finding it remarkably difficult to do something very simple using MPI's
> derived data types and the parallel IO, and hence I'm guessing that I have
> fundamentally misunderstood one or more aspects of this. If anyone can help
> clarify where I'm going wrong, that would be much appreciated!
>
> Cheers,
>
> John.
> _______________________________________________
> discuss mailing list discuss at mpich.org
> To manage subscription options or unsubscribe:
> https://lists.mpich.org/mailman/listinfo/discuss