[mpich-discuss] Use of MPI derived data types / MPI file IO
Wei-keng Liao
wkliao at ece.northwestern.edu
Sun Nov 18 12:27:04 CST 2012
Hi, John
If your I/O is simply appending one process's data after another and the I/O buffers in memory
are all contiguous, then you can do the following without defining MPI
derived data types or setting a file view.
MPI_File_write_at_all(f, offset, &atoms[0], (int)(atoms.size() * sizeof(struct atom)), MPI_BYTE, &stat);
Derived data types are usually needed when you have a noncontiguous buffer in memory or
want to access noncontiguous data in the file.
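
For illustration, here is a minimal (untested) sketch of that approach, reusing the struct atom,
rank, nranks, and fpath from your write() routine below; the name write_contiguous is just for
this example:

void write_contiguous( char *fpath, std::vector<struct atom> &atoms, int rank, int nranks )
{
    MPI_File   f;
    MPI_Status stat;

    // How many bytes this rank writes, and how many every other rank writes.
    int local_bytes = (int)( atoms.size() * sizeof(struct atom) );
    std::vector<int> all_bytes( nranks );
    MPI_Allgather( &local_bytes, 1, MPI_INT, &all_bytes[0], 1, MPI_INT, MPI_COMM_WORLD );

    // Header value: total number of structures across all ranks.
    int global_N = 0;
    for( int i = 0; i < nranks; i++ ) global_N += all_bytes[i] / (int)sizeof(struct atom);

    // Byte offset for this rank: the integer header plus everything written by
    // the preceding ranks.
    MPI_Offset offset = sizeof( int );
    for( int i = 0; i < rank; i++ ) offset += all_bytes[i];

    MPI_File_open( MPI_COMM_WORLD, fpath, MPI_MODE_CREATE | MPI_MODE_WRONLY,
                   MPI_INFO_NULL, &f );
    if( rank == 0 )
        MPI_File_write_at( f, 0, &global_N, 1, MPI_INT, &stat );  // the integer header

    // Explicit-offset collective write; no file view or derived data type needed.
    MPI_File_write_at_all( f, offset, &atoms[0], local_bytes, MPI_BYTE, &stat );
    MPI_File_close( &f );
}
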
Wei-keng
On Nov 18, 2012, at 11:52 AM, <jgrime at uchicago.edu> wrote:
> Hi all,
>
> I'm having some problems using derived data types and MPI parallel IO, and
> was wondering if anyone could help. I tried to search the archives in case this
> was covered earlier, but that just gave me "ht://Dig error" messages.
>
> Outline: I have written a C++ program where each MPI rank acts on data stored
> in a local array of structures. The arrays are typically of different lengths on each
> rank. I wish to write and read the contents of these arrays to disk using MPI's
> parallel IO routines. The file format is simply an initial integer giving the number
> of "structures" in the file, followed by the data representing the "structure
> information" from all ranks (i.e. the total data set).
>
> So far, I've tried two different approaches: the first consists of each rank
> serialising the contents of the local array of structures into a byte array, which is
> then saved to file "f" using MPI_File_set_view( f, offset, MPI_CHAR, MPI_CHAR,
> "native", MPI_INFO_NULL ) to skip the initial integer "header", and then a call to
> MPI_File_write_all( f, local_bytearray, local_n_bytes, MPI_CHAR, &status ). Here,
> "offset" is simply the size of an integer (in bytes) plus the sum of the number of
> bytes each preceding rank wishes to write to the file (received via an earlier
> MPI_Allgather call). This seems to work: when I read the file back in on a single
> MPI rank and deserialise the data into an array of structures, I get the results I
> expect.
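>
> In outline, with "f" already opened on MPI_COMM_WORLD and "all_n_bytes" standing in
> (just an illustrative name) for the per-rank byte counts from the earlier MPI_Allgather,
> the first approach boils down to:
>
> MPI_Offset offset = sizeof( int );                        // skip the integer header
> for( int i = 0; i < rank; i++ ) offset += all_n_bytes[i]; // skip preceding ranks' data
>
> MPI_File_set_view( f, offset, MPI_CHAR, MPI_CHAR, (char *)"native", MPI_INFO_NULL );
> MPI_File_write_all( f, local_bytearray, local_n_bytes, MPI_CHAR, &status );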
>
> The second approach is to use MPI's derived data types to create MPI
> representations of the structures, and then treat the arrays of structures as MPI
> data types. This allows me to avoid copying the local data into an intermediate
> buffer etc, and seems the more elegant approach. I cannot, however, seem to
> make this approach work.
>
> I'm pretty sure the problem lies in my use of the file views, but I'm not sure
> where I'm going wrong. The reading of the integer "header" always works fine,
> but the data that follows is garbled. I'm using the "native" data representation for
> testing, but will likely change that to something more portable when I get this
> code working.
>
> I've included the important excerpts of the test code I'm trying to use below
> (with some printf()s and error handling etc removed to make it a little more
> concise). I have previously verified that std::vector allocates a contiguous flat
> array of the appropriate data type in memory, so passing a pointer/reference to
> the first element of such a vector behaves the same way as passing a
> conventional array:
>
> struct atom
> {
> int global_id;
> double xyz[3];
> };
>
> void write( char * fpath, std::vector<struct atom> &atoms, int rank, int nranks )
> {
> /*
> Memory layout information for the structure we wish to convert into an
> MPI derived data type.
> */
> std::vector<int> s_blocklengths;
> std::vector<MPI_Aint> s_displacements;
> std::vector<MPI_Datatype> s_datatypes;
> MPI_Aint addr_start, addr;
> MPI_Datatype mpi_atom_type, mpi_atom_type_resized;
> int type_size;
>
> struct atom a;
>
> MPI_File f;
> MPI_Status stat;
> MPI_Offset offset;
> char *datarep = (char *)"native";
>
> std::vector<int> all_N;
> int local_N, global_N;
>
> /*
> Set up the structure data type: single integer, and 3 double precision floats.
> We use the temporary "a" structure to determine the layout of memory inside
> atom structures.
> */
> MPI_Get_address( &a, &addr_start );
>
> s_blocklengths.push_back( 1 );
> s_datatypes.push_back( MPI_INT );
> MPI_Get_address( &a.global_id, &addr );
> s_displacements.push_back( addr - addr_start );
>
> s_blocklengths.push_back( 3 );
> s_datatypes.push_back( MPI_DOUBLE );
> MPI_Get_address( &a.xyz[0], &addr );
> s_displacements.push_back( addr - addr_start );
>
> MPI_Type_create_struct( (int)s_blocklengths.size(), &s_blocklengths[0],
> &s_displacements[0], &s_datatypes[0], &mpi_atom_type );
> MPI_Type_commit( &mpi_atom_type );
>
> /*
> Take into account any compiler padding in creating an array of
> structures.
> */
> MPI_Type_create_resized( mpi_atom_type, 0, sizeof(struct atom),
> &mpi_atom_type_resized );
> MPI_Type_commit( &mpi_atom_type_resized );
>
> MPI_Type_size( mpi_atom_type_resized, &type_size );
>
> local_N = (int)atoms.size();
> all_N.resize( nranks );
>
> MPI_Allgather( &local_N, 1, MPI_INT, &all_N[0], 1, MPI_INT,
> MPI_COMM_WORLD );
>
> global_N = 0;
> for( size_t i=0; i<all_N.size(); i++ ) global_N += all_N[i];
>
> offset = 0;
> for( int i=0; i<rank; i++ ) offset += all_N[i];
>
> offset *= type_size; // convert from structure counts -> bytes into file for true structure size
> offset += sizeof( int ); // skip leading integer (global_N) in file.
>
> MPI_File_open( MPI_COMM_WORLD, fpath, MPI_MODE_CREATE |
> MPI_MODE_WRONLY, MPI_INFO_NULL, &f );
> if( rank == 0 )
> {
> MPI_File_write( f, &global_N, 1, MPI_INT, &stat );
> }
> MPI_File_set_view( f, offset, mpi_atom_type_resized,
> mpi_atom_type_resized, datarep, MPI_INFO_NULL );
>
> MPI_File_write_all( f, &atoms[0], (int)atoms.size(), mpi_atom_type_resized,
> &stat );
> MPI_File_close( &f );
>
> MPI_Type_free( &mpi_atom_type );
> MPI_Type_free( &mpi_atom_type_resized );
>
> return;
> }
>
> void read( char * fpath, std::vector<struct atom> &atoms )
> {
> std::vector<int> s_blocklengths;
> std::vector<MPI_Aint> s_displacements;
> std::vector<MPI_Datatype> s_datatypes;
> MPI_Datatype mpi_atom_type, mpi_atom_type_resized;
>
> struct atom a;
> MPI_Aint addr_start, addr;
>
> MPI_File f;
> MPI_Status stat;
>
> int global_N;
> char *datarep = (char *)"native";
>
> int type_size;
>
> /*
> Set up the structure data type
> */
> MPI_Get_address( &a, &addr_start );
>
> s_blocklengths.push_back( 1 );
> s_datatypes.push_back( MPI_INT );
> MPI_Get_address( &a.global_id, &addr );
> s_displacements.push_back( addr - addr_start );
>
> s_blocklengths.push_back( 3 );
> s_datatypes.push_back( MPI_DOUBLE );
> MPI_Get_address( &a.xyz[0], &addr );
> s_displacements.push_back( addr - addr_start );
>
> MPI_Type_create_struct( (int)s_blocklengths.size(), &s_blocklengths[0],
> &s_displacements[0], &s_datatypes[0], &mpi_atom_type );
> MPI_Type_commit( &mpi_atom_type );
>
> /*
> Take into account any compiler padding in creating an array of
> structures.
> */
> MPI_Type_create_resized( mpi_atom_type, 0, sizeof(struct atom),
> &mpi_atom_type_resized );
> MPI_Type_commit( &mpi_atom_type_resized );
>
> MPI_Type_size( mpi_atom_type_resized, &type_size );
>
> MPI_File_open( MPI_COMM_SELF, fpath, MPI_MODE_RDONLY,
> MPI_INFO_NULL, &f );
>
> MPI_File_read( f, &global_N, 1, MPI_INT, &stat );
>
> atoms.clear();
> atoms.resize( global_N );
>
> MPI_File_set_view( f, sizeof(int), mpi_atom_type_resized,
> mpi_atom_type_resized, datarep, MPI_INFO_NULL );
> MPI_File_read( f, &atoms[0], global_N, mpi_atom_type_resized,
> &stat );
> MPI_File_close( &f );
>
> MPI_Type_free( &mpi_atom_type );
> MPI_Type_free( &mpi_atom_type_resized );
>
> return;
> }
>
> Calling MPI_Type_get_extent() and MPI_Type_get_true_extent() for both
> mpi_atom_type and mpi_atom_type_resized returns (lb, extent) = (0, 32) bytes in
> all cases. Calling MPI_Type_size() on both derived data types returns 28 bytes.
>
> If I call MPI_File_get_type_extent() on both derived data types after opening the
> file, they both resolve to 32 bytes - so I think the problem is in the difference
> between the data representation in memory and on disk. If I explicitly use 32
> bytes in the offset calculation in the write() routine above, it still doesn't work.
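>
> For reference, those numbers come from queries of this form, with "f" being the
> open file handle:
>
> MPI_Aint lb, extent, file_extent;
> int size;
> MPI_Type_get_extent( mpi_atom_type_resized, &lb, &extent );          // lb = 0, extent = 32
> MPI_Type_size( mpi_atom_type_resized, &size );                       // size = 28 (data only, no padding)
> MPI_File_get_type_extent( f, mpi_atom_type_resized, &file_extent );  // 32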
>
> I'm finding it remarkably difficult to do something very simple using MPI's
> derived data types and the parallel IO, and hence I'm guessing that I have
> fundamentally misunderstood one or more aspects of this. If anyone can help
> clarify where I'm going wrong, that would be much appreciated!
>
> Cheers,
>
> John.
> _______________________________________________
> discuss mailing list discuss at mpich.org
> To manage subscription options or unsubscribe:
> https://lists.mpich.org/mailman/listinfo/discuss