[mpich-discuss] [petsc-dev] Is mpich/master:a8a2b30fd21 tested with Petsc?
Eric Chamberland
Eric.Chamberland at giref.ulaval.ca
Tue Oct 2 08:55:18 CDT 2018
Hi,
mainly for PETSc users: please do not waste your time using the MPI
released with Intel Parallel Studio 2019, since it is the buggy
MPICH 3.3b2 for which this thread was originally created...
I just wrote a reminder about this on the Intel forum as well:
https://software.intel.com/en-us/forums/intel-clusters-and-hpc-technology/topic/797761
Eric
On 19/04/18 09:01 AM, Eric Chamberland wrote:
> Hi,
>
> this morning, mpich/master with PETSc is 100% working again for us.
>
> Thanks to both commits:
>
> https://github.com/pmodels/mpich/commit/c597c8d79deea220a42751fda0f01ce70764c260
>
>
> https://github.com/pmodels/mpich/commit/8edabc7373b82dd660019e53d246131765819294
>
>
> and thanks to everybody who helped:
>
> Satish
> Min
> Wesley
> Ken
> Rob
> Jed
>
> :)
>
> Eric
>
> On 17/04/18 04:58 PM, Min Si wrote:
>> Hi all,
>>
>> Thanks for narrowing down the problem. I checked the MPICH code and
>> believe this is a bug in MPICH. I just created a PR to fix it:
>> https://github.com/pmodels/mpich/pull/3097
>>
>> It should be merged into MPICH master branch soon.
>>
>> Thanks,
>> Min
>>
>> On 2018/04/17 14:10, Eric Chamberland wrote:
>>> Hi,
>>>
>>> are we talking about the "tag" passed to MPI_Isend for example?
>>>
>>> but does that mean there is something to change for any MPI call
>>> which involves tags usage or is it only a PETSc "bad" tag usage?
>>>
>>> thanks Satish for your finding!
>>>
>>> Eric
>>>
>>> On 16/04/18 11:31 PM, Satish Balay wrote:
>>>> On Tue, 13 Mar 2018, Eric Chamberland wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> each night we are testing mpich/master with our petsc-based code.
>>>>> I don't know if PETSc team is doing the same thing with
>>>>> mpich/master? (Maybe it is a good idea?)
>>>>>
>>>>> Everything was fine (except the issue
>>>>> https://github.com/pmodels/mpich/issues/2892) up to commit
>>>>> 7b8d64debd, but since commit mpich:a8a2b30fd21, I have a segfault
>>>>> on any parallel nightly test.
>>>>
>>>> I attempted a bisect of the above range of commits - and narrowed down to:
>>>>
>>>>>>>>>>>
>>>> db11d4c4a70e39a28be88ed32f00542301699e08 is the first bad commit
>>>> <<<<<<<
>>>>>>>>>>>>
>>>> balay at asterix /home/balay/soft/build/mpich ((db11d4c4a...)|BISECTING)
>>>> $ git show db11d4c4a70e39a28be88ed32f00542301699e08
>>>> commit db11d4c4a70e39a28be88ed32f00542301699e08 (HEAD, refs/bisect/bad)
>>>> Author: Ken Raffenetti <raffenet at mcs.anl.gov>
>>>> Date: Thu Feb 15 11:37:59 2018 -0600
>>>>
>>>> init: Fix tag upper limit initialization
>>>> The starting point for this value is equivalent to the
>>>> usable tag bits
>>>> macro. This value should be set before device initialization,
>>>> otherwise devices will assume they have more bits than are
>>>> actually
>>>> available.
>>>> Signed-off-by: Wesley Bland <wesley.bland at intel.com>
>>>>
>>>> diff --git a/src/mpi/init/initthread.c b/src/mpi/init/initthread.c
>>>> index cbc41f4d5..b31ae2f07 100644
>>>> --- a/src/mpi/init/initthread.c
>>>> +++ b/src/mpi/init/initthread.c
>>>> @@ -403,7 +403,7 @@ int MPIR_Init_thread(int *argc, char ***argv, int required, int *provided)
>>>> MPIR_Process.attrs.host = MPI_PROC_NULL;
>>>> MPIR_Process.attrs.io = MPI_PROC_NULL;
>>>> MPIR_Process.attrs.lastusedcode = MPI_ERR_LASTCODE;
>>>> - MPIR_Process.attrs.tag_ub = 0;
>>>> + MPIR_Process.attrs.tag_ub = MPIR_TAG_USABLE_BITS;
>>>> MPIR_Process.attrs.universe = MPIR_UNIVERSE_SIZE_NOT_SET;
>>>> MPIR_Process.attrs.wtime_is_global = 0;
>>>> @@ -531,13 +531,6 @@ int MPIR_Init_thread(int *argc, char ***argv, int required, int *provided)
>>>> MPIR_Assert(((unsigned) MPIR_Process.
>>>> attrs.tag_ub & ((unsigned)
>>>> MPIR_Process.attrs.tag_ub + 1)) == 0);
>>>> - /* Set aside tag space for tagged collectives and failure notification */
>>>> -#ifdef HAVE_TAG_ERROR_BITS
>>>> - MPIR_Process.attrs.tag_ub >>= 3;
>>>> -#else
>>>> - MPIR_Process.attrs.tag_ub >>= 1;
>>>> -#endif
>>>> -
>>>> /* Assert: tag_ub is at least the minimum asked for in the MPI spec */
>>>> MPIR_Assert(MPIR_Process.attrs.tag_ub >= 32767);
>>>> <<<<<<<<<<<<<<<<<
>>>>
>>>> Reverting this patch gets mpich-3.3b2 working with petsc
>>>>
>>>> Satish
>>>>
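
For readers following the quoted diff, here is a minimal C sketch of the two checks it touches. It is not MPICH code: the helper names is_low_bit_mask and reserve_tag_bits are hypothetical, and the sketch only assumes what the diff itself shows, namely that tag_ub is asserted to have the form 2^k - 1, that reserved bits are removed with a right shift, and that the result must stay at or above 32767.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper (not an MPICH function): returns true when v has the
 * form 2^k - 1, i.e. every bit below some position is set. This is the same
 * test as the MPIR_Assert on tag_ub in the quoted diff. */
static bool is_low_bit_mask(unsigned v)
{
    return (v & (v + 1u)) == 0u;
}

/* Hypothetical helper (not an MPICH function): gives up `reserved_bits` high
 * bits of the tag space, mirroring the `tag_ub >>= 3` and `tag_ub >>= 1`
 * lines that the quoted diff removes from MPIR_Init_thread. */
static unsigned reserve_tag_bits(unsigned tag_ub, unsigned reserved_bits)
{
    unsigned reduced = tag_ub >> reserved_bits;

    /* Same lower bound as the final assertion in the quoted diff. */
    assert(reduced >= 32767u);
    return reduced;
}
```

With an illustrative starting value of 0x7fffffff, reserve_tag_bits(0x7fffffffu, 3) yields 0x0fffffff, which is still a low-bit mask. Any tag an application passes to a call such as MPI_Isend has to stay at or below the bound that the library finally advertises.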