[mpich-discuss] Error with mpich-3.3.2 and ucx-1.8.0

Raffenetti, Kenneth J. raffenet at mcs.anl.gov
Thu Mar 4 11:37:24 CST 2021


Hi Junchao,

We do not have any plans to provide another MPICH release in the 3.3.x series. It is good to know this bug is fixed in the latest version of MPICH.

Ken

On 3/2/21, 8:17 PM, "Junchao Zhang via discuss" <discuss at mpich.org> wrote:

    Hello, we encountered an error with mpich-3.3.2 and ucx-1.8.0. See the attached example, which uses a user-defined datatype in MPI_Startall. MPICH was configured with --with-device=ch4:ucx --with-ucx=/path/to/ucx-1.8.0. The error stack is:
    
    $ mpirun -n 2 ./dtype
    Assertion failed in file src/mpi/datatype/type_free.c at line 38: (((datatype_ptr)))->ref_count >= 0
    /home/jczhang/soft/lib/libmpi.so.12(+0x48fa1f) [0x7f4019b6ba1f]
    /home/jczhang/soft/lib/libmpi.so.12(MPL_backtrace_show+0x18) [0x7f4019b6baff]
    /home/jczhang/soft/lib/libmpi.so.12(+0x441e27) [0x7f4019b1de27]
    /home/jczhang/soft/lib/libmpi.so.12(+0xfebc2) [0x7f40197dabc2]
    /home/jczhang/soft/lib/libmpi.so.12(MPI_Type_free+0x5fd) [0x7f40197db239]
    
    It does not happen with mpich-3.4.1.
     
    --Junchao Zhang


