[mpich-devel] odd timings in type create/free (related to handle pool?)
William Gropp
wgropp at illinois.edu
Mon Oct 6 10:03:00 CDT 2014
I’ve thought about that, but the real fix is to use the new code that we’re working on for better datatype performance. It has a compact representation that can be sent to the target process (sending these flattened structs will kill performance at the target *and* take a lot of bandwidth, unnecessarily).
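
Roughly, the contrast looks like this (illustrative declarations only, not the actual MPICH internals):

/* Flattened form: one (displacement, length) pair per contiguous run,
 * so a type with many runs costs that many entries to store and to
 * ship to the target. */
struct flat_entry { MPI_Aint disp; MPI_Aint len; };

/* Compact form: a fixed-size descriptor per constructor (vector,
 * contig, struct, ...), independent of the count, so it stays small
 * on the wire and is cheap to interpret at the target. */
struct compact_vector { MPI_Count count; int blocklen; MPI_Aint stride; };
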
Bill
On Oct 5, 2014, at 7:50 PM, Jeff Hammond <jeff.science at gmail.com> wrote:
> Hi Bill,
>
> Indeed, you're right. My benchmark unnecessarily beats on the
> flattening code. The intended use (in BigMPI) will not (because the
> chunk size is INT_MAX and thus one needs quite a bit of DRAM to
> require more than 10-way flattening).
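> (Concretely: with MPI_DOUBLE, 10 chunks of INT_MAX elements is
> 10 * 2147483647 * 8 bytes, i.e. on the order of 170 GB.)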
>
> As for the flattening code, why not take a lazy approach and only
> flatten datatypes upon their first use by RMA? MPICH would need to
> have the ability to see them in both representations, but at least
> you'd avoid unnecessary flattening in the common case of 2-sided and I
> doubt that RMA with user-defined datatypes would feel the pain that
> much anyways. ARMCI-MPI is perhaps the world's largest consumer of
> this feature and it will avoid this issue except in some highly
> unlikely (and perhaps strictly not-by-default) cases.
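>
> In sketch form (hypothetical names throughout -- none of these are
> MPICH's actual internals):
>
> /* Cache the flattened form inside the datatype object and build it
>  * only when RMA first asks for it; the 2-sided path never pays. */
> typedef struct { MPI_Aint disp, len; } flat_entry;
> typedef struct { flat_entry *entries; size_t n; } flat_rep;
>
> typedef struct {
>     /* ... compact representation used by the 2-sided path ... */
>     flat_rep *flat;              /* NULL until first RMA use */
> } dtype_internal;
>
> flat_rep *flatten_compact_rep(dtype_internal *dt); /* hypothetical */
>
> static flat_rep *dtype_get_flat(dtype_internal *dt)
> {
>     if (dt->flat == NULL)        /* first RMA use: flatten once, reuse after */
>         dt->flat = flatten_compact_rep(dt);
>     return dt->flat;
> }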
>
> Best,
>
> Jeff
>
> On Sat, Oct 4, 2014 at 4:15 PM, William Gropp <wgropp at illinois.edu> wrote:
>> Jeff,
>>
>> You are creating different datatypes with each call - if you call the constructor with the same argument (e.g., “10” instead of “i”), the time is just about constant. The time appears to be proportional to the size, and I think it is due to the unconditional flattening of the struct representation, which is a known problem. For this datatype, the flattened representation is inefficient, and it uses huge amounts of memory for large “i”. Unfortunately, the current RMA code incorrectly assumes a flattened representation, so fixing this has turned out to be more involved than expected.
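>>
>> In loop form (your benchmark loop with the varying “i” replaced by a
>> constant):
>>
>> for (int i=0; i<n; i++) {
>>     MPIX_Type_contiguous_x((MPI_Count)10, MPI_DOUBLE, &dtout);  /* fixed count */
>>     MPI_Type_commit(&dtout);
>>     MPI_Type_free(&dtout);
>> }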
>>
>> Bill
>>
>> On Oct 3, 2014, at 5:36 PM, Jeff Hammond <jeff.science at gmail.com> wrote:
>>
>>> I wanted to time how long it took to create a datatype. Obviously,
>>> timing a series of calls is the normal way to get reasonable data.
>>> However, I find that my test shows that the time per call is in some
>>> way proportional to the number of calls in the series, even when I
>>> reuse the same handle. I previously timed on a vector of handles and
>>> saw the same result.
>>>
>>> I can only assume this is related to how MPICH does handle allocation
>>> internally. Can you confirm? Is there any way to get MPICH to
>>> garbage collect the internal handle pool so that the time per call
>>> goes back down again? An increase from 4 us to 112 us per call is
>>> pretty substantial if I have a library that is going to use a lot of
>>> derived datatypes and has no reasonable way to cache them.
>>>
>>> Thanks,
>>>
>>> Jeff
>>>
>>> OUTPUT
>>>
>>> create, commit (free?) 100 Type_contig_x in 0.000393 s (3.926220 us per call)
>>> create, commit (free?) 1000 Type_contig_x in 0.006105 s (6.105444 us per call)
>>> create, commit (free?) 10000 Type_contig_x in 0.082496 s (8.249623 us per call)
>>> create, commit (free?) 25000 Type_contig_x in 0.341852 s (13.674085 us per call)
>>> create, commit (free?) 50000 Type_contig_x in 1.280882 s (25.617630 us per call)
>>> create, commit (free?) 100000 Type_contig_x in 4.565911 s (45.659108 us per call)
>>> create, commit (free?) 250000 Type_contig_x in 27.989672 s (111.958686 us per call)
>>>
>>>
>>> SOURCE
>>>
>>> #include <stdint.h>
>>> #include <stdio.h>
>>> #include <stdlib.h>
>>> #include <string.h>
>>> #include <limits.h>
>>> #include <math.h>
>>> #include <mpi.h>
>>>
>>> /* there is a reason for this silliness */
>>> static volatile int bigmpi_int_max = 3;
>>>
>>> int MPIX_Type_contiguous_x(MPI_Count count, MPI_Datatype oldtype,
>>>                            MPI_Datatype * newtype)
>>> {
>>>     /* decompose count into c full chunks of bigmpi_int_max elements
>>>      * plus a remainder of r elements; e.g. count=10 with
>>>      * bigmpi_int_max=3 gives c=3 chunks and r=1 leftover */
>>>     MPI_Count c = count/bigmpi_int_max;
>>>     MPI_Count r = count%bigmpi_int_max;
>>>
>>>     /* (int) casts: c and r fit in int for the counts used here */
>>>     MPI_Datatype chunks;
>>>     MPI_Type_vector((int)c, bigmpi_int_max, bigmpi_int_max, oldtype, &chunks);
>>>
>>>     MPI_Datatype remainder;
>>>     MPI_Type_contiguous((int)r, oldtype, &remainder);
>>>
>>>     MPI_Aint lb /* unused */, extent;
>>>     MPI_Type_get_extent(oldtype, &lb, &extent);
>>>
>>>     /* the remainder starts right after the c full chunks */
>>>     MPI_Aint remdisp = (MPI_Aint)c*bigmpi_int_max*extent;
>>>     int blocklengths[2] = {1,1};
>>>     MPI_Aint displacements[2] = {0,remdisp};
>>>     MPI_Datatype types[2] = {chunks,remainder};
>>>     MPI_Type_create_struct(2, blocklengths, displacements, types, newtype);
>>>
>>>     MPI_Type_free(&chunks);
>>>     MPI_Type_free(&remainder);
>>>
>>>     return MPI_SUCCESS;
>>> }
>>>
>>> int main(int argc, char* argv[])
>>> {
>>>     int rank=0, size=1;
>>>     MPI_Init(&argc, &argv);
>>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>     MPI_Comm_size(MPI_COMM_WORLD, &size);
>>>
>>>     int n = (argc>1) ? atoi(argv[1]) : 10000;
>>>     //MPI_Datatype * dtout = malloc(n*sizeof(MPI_Datatype));
>>>     MPI_Datatype dtout;
>>>     double t0 = MPI_Wtime();
>>>     for (int i=0; i<n; i++) {
>>>         //MPIX_Type_contiguous_x((MPI_Count)i, MPI_DOUBLE, &(dtout[i]));
>>>         //MPI_Type_commit(&(dtout[i]));
>>>         MPIX_Type_contiguous_x((MPI_Count)i, MPI_DOUBLE, &dtout);
>>>         MPI_Type_commit(&dtout);
>>>         MPI_Type_free(&dtout);
>>>     }
>>>     double t1 = MPI_Wtime();
>>>     double dt = t1-t0;
>>>     printf("create, commit (free?) %d Type_contig_x in %lf s (%lf us per call)\n",
>>>            n, dt, 1.e6*dt/(double)n);
>>>
>>>     //for (int i=0; i<n; i++) {
>>>     //    MPI_Type_free(&(dtout[i]));
>>>     //}
>>>     //free(dtout);
>>>
>>>     MPI_Finalize();
>>>     return 0;
>>> }
>>>
>>>
>>> --
>>> Jeff Hammond
>>> jeff.science at gmail.com
>>> http://jeffhammond.github.io/
>>
>
>
>
> --
> Jeff Hammond
> jeff.science at gmail.com
> http://jeffhammond.github.io/