[mpich-devel] odd timings in type create/free (related to handle pool?)

William Gropp wgropp at illinois.edu
Mon Oct 6 10:03:00 CDT 2014


I’ve thought about that, but the real fix is to use the new code that we’re working on for better datatype performance.  It has a compact representation that can be sent to the target process (sending these flattened structs will kill performance at the target *and* take a lot of bandwidth, unnecessarily).
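
To illustrate the difference (a sketch with made-up layouts, not MPICH's actual internals): a flattened type carries one (displacement, length) entry per contiguous segment, so it grows with the count, while a compact representation records just the constructor and its parameters.

   /* Sketch only: hypothetical layouts; assumes <mpi.h> and <stddef.h>. */
   typedef struct {
       MPI_Aint offset;             /* displacement from the type's origin */
       MPI_Aint length;             /* bytes in this contiguous segment    */
   } flat_segment;

   typedef struct {
       size_t        nsegments;     /* O(count) for a type like the one below */
       flat_segment *segments;      /* all of this must reach the target   */
   } flat_type;

   typedef struct {
       int      kind;               /* e.g. CONTIG, VECTOR, STRUCT         */
       MPI_Aint count, blocklength, stride;
       int      child;              /* descriptor of the inner type        */
   } compact_type;                  /* O(1) no matter how large the count  */

Shipping the first to the target costs bandwidth and unpacking time proportional to the count; shipping the second is constant.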

Bill

On Oct 5, 2014, at 7:50 PM, Jeff Hammond <jeff.science at gmail.com> wrote:

> Hi Bill,
> 
> Indeed, you're right.  My benchmark unnecessarily beats on the
> flattening code.  The intended use (in BigMPI) will not (because the
> chunk size is INT_MAX and thus one needs quite a bit of DRAM to
> require more than 10-way flattening).
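> 
> (To put a number on that: with MPI_DOUBLE, a single INT_MAX-element
> chunk is already about 17 GB, so more than 10-way flattening implies
> a buffer of 170+ GB.)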
> 
> As for the flattening code, why not take a lazy approach and only
> flatten datatypes upon their first use by RMA?  MPICH would need to
> have the ability to see them in both representations, but at least
> you'd avoid unnecessary flattening in the common case of two-sided,
> and I doubt that RMA with user-defined datatypes would feel the pain
> that much anyway.  ARMCI-MPI is perhaps the world's largest consumer
> of this feature, and it will avoid this issue except in some highly
> unlikely (and perhaps strictly not-by-default) cases.
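> 
> Something like this, conceptually (every name here is made up, not a
> patch):
> 
>    /* NULL needs <stddef.h>; hypothetical fields and helpers only */
>    typedef struct datatype {
>        /* ... the existing typemap representation ... */
>        void *flattened;                /* NULL until RMA first touches it */
>    } datatype;
> 
>    extern void *flatten_typemap(datatype *dt);  /* hypothetical helper */
> 
>    void *get_flattened(datatype *dt)   /* called only on the RMA path */
>    {
>        if (dt->flattened == NULL)      /* flatten lazily, exactly once */
>            dt->flattened = flatten_typemap(dt);
>        return dt->flattened;
>    }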
> 
> Best,
> 
> Jeff
> 
> On Sat, Oct 4, 2014 at 4:15 PM, William Gropp <wgropp at illinois.edu> wrote:
>> Jeff,
>> 
>> You are creating different datatypes with each call - if you call the constructor with the same argument (e.g., “10” instead of “i”), the time is just about constant.  The time appears to be proportional to the size, and I think it is due to the unconditional flattening of the struct representation, which is a known problem.  For this datatype, the flattened representation is inefficient, and it uses huge amounts of memory for large “i”.  Unfortunately, the current RMA code incorrectly assumes a flattened representation, so fixing this has turned out to be more involved than expected.
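>> 
>> For a sense of scale: with bigmpi_int_max = 3 in the test below, a count of 250000 turns into a vector of roughly 83000 blocks, so a flattener that emits one (displacement, length) entry per block has to allocate and walk all of them on every create/commit; that is consistent with the roughly linear growth in the per-call times.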
>> 
>> Bill
>> 
>> On Oct 3, 2014, at 5:36 PM, Jeff Hammond <jeff.science at gmail.com> wrote:
>> 
>>> I wanted to time how long it took to create a datatype.  Obviously,
>>> timing a series of calls is the normal way to get reasonable data.
>>> However, I find that my test shows that the time per call is in some
>>> way proportional to the number of calls in the series, even when I
>>> reuse the same handle.  I previously timed on a vector of handles and
>>> saw the same result.
>>> 
>>> I can only assume this is related to how MPICH does handle allocation
>>> internally.  Can you confirm?  Is there any way to get MPICH to
>>> garbage collect the internal handle pool so that the time per call
>>> goes back down again?  An increase from 4 us to 112 us per call is
>>> pretty substantial if I have a library that is going to use a lot of
>>> derived datatypes and has no reasonable way to cache them.
>>> 
>>> Thanks,
>>> 
>>> Jeff
>>> 
>>> OUTPUT
>>> 
>>> create, commit (free?) 100 Type_contig_x in 0.000393 s (3.926220 us per call)
>>> create, commit (free?) 1000 Type_contig_x in 0.006105 s (6.105444 us per call)
>>> create, commit (free?) 10000 Type_contig_x in 0.082496 s (8.249623 us per call)
>>> create, commit (free?) 25000 Type_contig_x in 0.341852 s (13.674085 us per call)
>>> create, commit (free?) 50000 Type_contig_x in 1.280882 s (25.617630 us per call)
>>> create, commit (free?) 100000 Type_contig_x in 4.565911 s (45.659108 us per call)
>>> create, commit (free?) 250000 Type_contig_x in 27.989672 s (111.958686 us per call)
>>> 
>>> 
>>> SOURCE
>>> 
>>> #include <stdint.h>
>>> #include <stdio.h>
>>> #include <stdlib.h>
>>> #include <string.h>
>>> #include <limits.h>
>>> #include <math.h>
>>> #include <mpi.h>
>>> 
>>> /* there is a reason for this silliness */
>>> static volatile int bigmpi_int_max = 3;
>>> 
>>> /* Build a contiguous datatype of 'count' copies of oldtype as a vector
>>>  * of bigmpi_int_max-sized chunks plus a contiguous remainder, so that
>>>  * counts beyond the int limit of MPI_Type_contiguous can be expressed. */
>>> int MPIX_Type_contiguous_x(MPI_Count count, MPI_Datatype oldtype, MPI_Datatype * newtype)
>>> {
>>>   MPI_Count c = count/bigmpi_int_max;   /* number of full chunks */
>>>   MPI_Count r = count%bigmpi_int_max;   /* leftover elements     */
>>> 
>>>   MPI_Datatype chunks;
>>>   MPI_Type_vector(c, bigmpi_int_max, bigmpi_int_max, oldtype, &chunks);
>>> 
>>>   MPI_Datatype remainder;
>>>   MPI_Type_contiguous(r, oldtype, &remainder);
>>> 
>>>   MPI_Aint lb /* unused */, extent;
>>>   MPI_Type_get_extent(oldtype, &lb, &extent);
>>> 
>>>   MPI_Aint remdisp          = (MPI_Aint)c*bigmpi_int_max*extent; /* byte offset of the remainder */
>>>   int blocklengths[2]       = {1,1};
>>>   MPI_Aint displacements[2] = {0,remdisp};
>>>   MPI_Datatype types[2]     = {chunks,remainder};
>>>   MPI_Type_create_struct(2, blocklengths, displacements, types, newtype);
>>> 
>>>   MPI_Type_free(&chunks);
>>>   MPI_Type_free(&remainder);
>>> 
>>>   return MPI_SUCCESS;
>>> }
>>> 
>>> int main(int argc, char* argv[])
>>> {
>>>   int rank=0, size=1;
>>>   MPI_Init(&argc, &argv);
>>>   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>   MPI_Comm_size(MPI_COMM_WORLD, &size);
>>> 
>>>   int n = (argc>1) ? atoi(argv[1]) : 10000;
>>>   //MPI_Datatype * dtout = malloc(n*sizeof(MPI_Datatype));
>>>   MPI_Datatype dtout;
>>>   double t0 = MPI_Wtime();
>>>   for (int i=0; i<n; i++) {
>>>       //MPIX_Type_contiguous_x((MPI_Count)i, MPI_DOUBLE, &(dtout[i]));
>>>       //MPI_Type_commit(&(dtout[i]));
>>>       MPIX_Type_contiguous_x((MPI_Count)i, MPI_DOUBLE, &dtout);
>>>       MPI_Type_commit(&dtout);
>>>       MPI_Type_free(&dtout);
>>>   }
>>>   double t1 = MPI_Wtime();
>>>   double dt = t1-t0;
>>>   printf("create, commit (free?) %d Type_contig_x in %lf s (%lf us per call)\n",
>>>           n, dt, 1.e6*dt/(double)n);
>>> 
>>>   //for (int i=0; i<n; i++) {
>>>   //    MPI_Type_free(&(dtout[i]));
>>>   //}
>>>   //free(dtout);
>>> 
>>>   MPI_Finalize();
>>>   return 0;
>>> }
>>> 
>>> 
>>> --
>>> Jeff Hammond
>>> jeff.science at gmail.com
>>> http://jeffhammond.github.io/
>> 
> 
> 
> 
> -- 
> Jeff Hammond
> jeff.science at gmail.com
> http://jeffhammond.github.io/
> _______________________________________________
> To manage subscription options or unsubscribe:
> https://lists.mpich.org/mailman/listinfo/devel


