[mpich-discuss] MPI Reduce with MPI_IN_PLACE fails with non-0 root rank for message sizes over 256 with MPI version 4 and after

Solomonik, Edgar solomon2 at illinois.edu
Thu Jun 8 16:01:09 CDT 2023


Thanks, indeed the env variable fixes the issue, and I'm glad to hear it's fixed in the latest version.

Best,
Edgar
________________________________
From: Raffenetti, Ken via discuss <discuss at mpich.org>
Sent: Thursday, June 8, 2023 3:50 PM
To: discuss at mpich.org <discuss at mpich.org>
Cc: Raffenetti, Ken <raffenet at anl.gov>
Subject: Re: [mpich-discuss] MPI Reduce with MPI_IN_PLACE fails with non-0 root rank for message sizes over 256 with MPI version 4 and after


FWIW, you can work around the bug in older versions by setting MPIR_CVAR_DEVICE_COLLECTIVES=none in your environment.
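For example, the variable can be set for a single run on the mpiexec command line (the binary name below is just a placeholder for the test program):

  MPIR_CVAR_DEVICE_COLLECTIVES=none mpiexec -n 2 ./reduce_test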



Ken



From: "Raffenetti, Ken via discuss" <discuss at mpich.org>
Reply-To: "discuss at mpich.org" <discuss at mpich.org>
Date: Thursday, June 8, 2023 at 3:45 PM
To: "discuss at mpich.org" <discuss at mpich.org>
Cc: "Raffenetti, Ken" <raffenet at anl.gov>
Subject: Re: [mpich-discuss] MPI Reduce with MPI_IN_PLACE fails with non-0 root rank for message sizes over 256 with MPI version 4 and after



Hi,



I believe this bug was recently fixed in https://github.com/pmodels/mpich/pull/6543. The fix is part of the MPICH 4.1.2 release just posted to our website and GitHub. I confirmed that your test program now works as expected, whereas it fails with an older 4.1 release.



Ken



From: "Solomonik, Edgar via discuss" <discuss at mpich.org>
Reply-To: "discuss at mpich.org" <discuss at mpich.org>
Date: Thursday, June 8, 2023 at 3:37 PM
To: "discuss at mpich.org" <discuss at mpich.org>
Cc: "Solomonik, Edgar" <solomon2 at illinois.edu>
Subject: [mpich-discuss] MPI Reduce with MPI_IN_PLACE fails with non-0 root rank for message sizes over 256 with MPI version 4 and after



Hello,



Our library's autobuild (CTF, which uses MPI extensively and in relatively sophisticated ways) started failing on multiple architectures after GitHub workflows moved to later OS versions (and therefore later MPI versions). I believe I have narrowed the issue down to an MPI bug triggered by very basic usage of MPI_Reduce. The following test code hits a segmentation fault inside MPI when run with 2 MPI processes against the latest Ubuntu MPI build and MPI 4.0. It works for smaller values of the message size (n) or if the root is rank 0. The usage of MPI_IN_PLACE adheres to the MPI standard.



Best,

Edgar Solomonik



#include <mpi.h>
#include <cstdlib>  // malloc/free

int main(int argc, char ** argv){
  int64_t n = 257;  // fails for n > 256 when the root is not rank 0

  MPI_Init(&argc, &argv);
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  double * A = (double*)malloc(sizeof(double)*n);
  for (int i=0; i<n; i++){
    A[i] = (double)i;
  }

  // Reduce to root rank 1; the root supplies MPI_IN_PLACE as its send buffer,
  // non-root ranks may pass NULL as the receive buffer.
  if (rank == 1){
    MPI_Reduce(MPI_IN_PLACE, A, n, MPI_DOUBLE, MPI_SUM, 1, MPI_COMM_WORLD);
  } else {
    MPI_Reduce(A, NULL, n, MPI_DOUBLE, MPI_SUM, 1, MPI_COMM_WORLD);
  }

  free(A);

  MPI_Finalize();

  return 0;
}
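For reference, a minimal sketch of how the reproducer above can be built and run (the compiler wrapper and file/binary names are assumptions, not from the original report):

  mpicxx reduce_test.cpp -o reduce_test
  mpiexec -n 2 ./reduce_test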




