[mpich-discuss] [petsc-dev] Is mpich/master:a8a2b30fd21 tested with Petsc?

Min Si msi at anl.gov
Tue Apr 17 15:58:20 CDT 2018


Hi all,

Thanks for narrowing down the problem. I checked the MPICH code and 
believe this is a bug in MPICH. I just created a PR to fix it:
https://github.com/pmodels/mpich/pull/3097

It should be merged into the MPICH master branch soon.

Thanks,
Min

On 2018/04/17 14:10, Eric Chamberland wrote:
> Hi,
>
> Are we talking about the "tag" passed to MPI_Isend, for example?
>
> But does that mean something has to change for every MPI call that
> uses tags, or is it only PETSc's "bad" tag usage?
>
> Thanks, Satish, for finding this!
>
> Eric
>
> On 16/04/18 11:31 PM, Satish Balay wrote:
>> On Tue, 13 Mar 2018, Eric Chamberland wrote:
>>
>>> Hi,
>>>
>>> Each night we test mpich/master with our PETSc-based code. I don't
>>> know whether the PETSc team does the same with mpich/master (maybe it
>>> would be a good idea?).
>>>
>>> Everything was fine (except for issue
>>> https://github.com/pmodels/mpich/issues/2892) up to commit 7b8d64debd,
>>> but since commit mpich:a8a2b30fd21, every parallel nightly test
>>> segfaults.
>>
>> I bisected the above range of commits and narrowed it down to:
>>
>>>>>>>>>
>> db11d4c4a70e39a28be88ed32f00542301699e08 is the first bad commit
>> <<<<<<<
>>>>>>>>>>
>> balay at asterix /home/balay/soft/build/mpich ((db11d4c4a...)|BISECTING)
>> $ git show db11d4c4a70e39a28be88ed32f00542301699e08
>> commit db11d4c4a70e39a28be88ed32f00542301699e08 (HEAD, refs/bisect/bad)
>> Author: Ken Raffenetti <raffenet at mcs.anl.gov>
>> Date:   Thu Feb 15 11:37:59 2018 -0600
>>
>>      init: Fix tag upper limit initialization
>>
>>      The starting point for this value is equivalent to the usable tag bits
>>      macro. This value should be set before device initialization,
>>      otherwise devices will assume they have more bits than are actually
>>      available.
>>
>>      Signed-off-by: Wesley Bland <wesley.bland at intel.com>
>>
>> diff --git a/src/mpi/init/initthread.c b/src/mpi/init/initthread.c
>> index cbc41f4d5..b31ae2f07 100644
>> --- a/src/mpi/init/initthread.c
>> +++ b/src/mpi/init/initthread.c
>> @@ -403,7 +403,7 @@ int MPIR_Init_thread(int *argc, char ***argv, int required, int *provided)
>>       MPIR_Process.attrs.host = MPI_PROC_NULL;
>>       MPIR_Process.attrs.io = MPI_PROC_NULL;
>>       MPIR_Process.attrs.lastusedcode = MPI_ERR_LASTCODE;
>> -    MPIR_Process.attrs.tag_ub = 0;
>> +    MPIR_Process.attrs.tag_ub = MPIR_TAG_USABLE_BITS;
>>       MPIR_Process.attrs.universe = MPIR_UNIVERSE_SIZE_NOT_SET;
>>       MPIR_Process.attrs.wtime_is_global = 0;
>>
>> @@ -531,13 +531,6 @@ int MPIR_Init_thread(int *argc, char ***argv, int required, int *provided)
>>       MPIR_Assert(((unsigned) MPIR_Process.
>>                    attrs.tag_ub & ((unsigned) MPIR_Process.attrs.tag_ub + 1)) == 0);
>>
>> -    /* Set aside tag space for tagged collectives and failure notification */
>> -#ifdef HAVE_TAG_ERROR_BITS
>> -    MPIR_Process.attrs.tag_ub >>= 3;
>> -#else
>> -    MPIR_Process.attrs.tag_ub >>= 1;
>> -#endif
>> -
>>       /* Assert: tag_ub is at least the minimum asked for in the MPI spec */
>>       MPIR_Assert(MPIR_Process.attrs.tag_ub >= 32767);
>> <<<<<<<<<<<<<<<<<
>>
>> Reverting this patch gets mpich-3.3b2 working with PETSc.
>>
>> Satish
>>

_______________________________________________
discuss mailing list     discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss

