[mpich-discuss] Bug fix for dims_create

Jeff Hammond jhammond at alcf.anl.gov
Sat Dec 8 14:46:03 CST 2012


Hi Ian,

I have created https://trac.mpich.org/projects/mpich/ticket/1765 on
your behalf so that this issue can be tracked by the developers.

Best,

Jeff

On Sat, Dec 8, 2012 at 2:07 PM, Ian Hutchinson <hutch at psfc.mit.edu> wrote:
>
> The file src/mpi/topo/dims_create.c contains the code that determines the
> result of
>
> MPI_DIMS_CREATE
>
> It contains a bug that causes it to produce distributions of processes
> among dimensions that do not satisfy the objective of being "as close to
> each other as possible". For example, if called in 3 dimensions with 16
> nodes, the topology returned is 4, 4, 1. It ought to be 4, 2, 2.
>
> This bug is caused by some longstanding cobbled-together code that is called
> when all the prime factors of nnodes are 2 (which is not an unusual case).
>
> I attach (and include below) a patch to correct this bug. It would be great
> if it could find its way into the distribution.
>
> Thanks
>         Ian Hutchinson
>         http://www.psfc.mit.edu/people/hutch/
>
> =========================================================================
>
> --- dims_create.c.dist  2012-12-08 13:46:46.000000000 -0500
> +++ dims_create.c       2012-12-08 13:48:02.000000000 -0500
> @@ -317,28 +317,22 @@
>             int cnt    = factors[0].cnt; /* Numver of factors left */
>             int cnteach = ( cnt + dims_needed - 1 ) / dims_needed;
>             int factor_each;
> -
> -           factor_each = factor;
> -           for (i=1; i<cnteach; i++) factor_each *= factor;
>
> -           for (i=0; i<ndims; i++) {
> -               if (dims[i] == 0) {
> -                   if (cnt > cnteach) {
> -                       dims[i] = factor_each;
> -                       cnt -= cnteach;
> -                   }
> -                   else if (cnt > 0) {
> -                       factor_each = factor;
> -                       for (j=1; j<cnt; j++)
> -                           factor_each *= factor;
> -                       dims[i] = factor_each;
> -                       cnt = 0;
> -                   }
> -                   else {
> -                       dims[i] = 1;
> -                   }
> +           for (i=0;i<ndims;i++){
> +               if(dims[i]==0)dims[i]=-1;
> +           }
> +           i=0;
> +           while(cnt > 0){
> +               if(dims[i] < 0){
> +                   dims[i]=dims[i]*factor;
> +                   cnt--;
>                 }
> +               if(++i >= ndims)i=0;
> +           }
> +           for (i=0;i<ndims;i++){
> +               if(dims[i] < 0)dims[i]=-dims[i];
>             }
> +
>         }
>         else {
>             /* Here is the general case.  */



-- 
Jeff Hammond
Argonne Leadership Computing Facility
University of Chicago Computation Institute
jhammond at alcf.anl.gov / (630) 252-5381
http://www.linkedin.com/in/jeffhammond
https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond


