<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body bgcolor="#FFFFFF" text="#000000">
Hi Junchao,<br>
<br>
    This is a great idea. We will add large-tag tests to our test
    suite!<br>
<br>
Min<br>
<br>
<div class="moz-cite-prefix">On 2018/04/17 18:17, Junchao Zhang
wrote:<br>
</div>
<blockquote type="cite" cite="mid:CA+MQGp8CMmZZtbT9ABU3GyyafXzVVOOptWFQPkRwhWsYJAPG=Q@mail.gmail.com">
<div dir="ltr">Min,
          <div> I suggest that MPICH add tests that exercise the maximum
            MPI tag (obtained through the MPI_TAG_UB attribute). </div>
          <div> PETSc allocates tags starting from the maximum and working
            downwards. I guess the MPICH tests use only small tags, which
            is why the bug showed up only with PETSc.<br>
<div class="gmail_extra"><br clear="all">
<div>
<div class="m_-2405801840177781509gmail_signature" data-smartmail="gmail_signature">
<div dir="ltr">--Junchao Zhang</div>
</div>
</div>
<br>
<div class="gmail_quote">On Tue, Apr 17, 2018 at 3:58 PM,
Min Si <span dir="ltr"><<a href="mailto:msi@anl.gov" target="_blank" moz-do-not-send="true">msi@anl.gov</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">Hi
all,<br>
<br>
Thanks for narrowing down the problem. I checked the
MPICH code and believe this is a bug in MPICH. I just
created a PR to fix it:<br>
                <a href="https://github.com/pmodels/mpich/pull/3097" rel="noreferrer" target="_blank" moz-do-not-send="true">https://github.com/pmodels/mpich/pull/3097</a><br>
<br>
                It should be merged into the MPICH master branch soon.<br>
<br>
Thanks,<br>
Min
<div class="m_-2405801840177781509HOEnZb">
<div class="m_-2405801840177781509h5"><br>
<br>
On 2018/04/17 14:10, Eric Chamberland wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
Hi,<br>
<br>
                      Are we talking about the "tag" passed to MPI_Isend,
                      for example?<br>
                      <br>
                      Does that mean something has to change for every
                      MPI call that takes a tag, or is this only a case
                      of "bad" tag usage in PETSc?<br>
                      <br>
                      Thanks, Satish, for your finding!<br>
<br>
Eric<br>
<br>
On 16/04/18 11:31 PM, Satish Balay wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0
0 .8ex;border-left:1px #ccc
solid;padding-left:1ex">
On Tue, 13 Mar 2018, Eric Chamberland wrote:<br>
<br>
<blockquote class="gmail_quote" style="margin:0
0 0 .8ex;border-left:1px #ccc
solid;padding-left:1ex">
Hi,<br>
<br>
                          Each night we test mpich/master with
                          our PETSc-based code. I don't<br>
                          know whether the PETSc team does the same
                          with mpich/master. (Maybe it would be a<br>
                          good idea?)<br>
<br>
                          Everything was fine (except the issue<br>
                          <a href="https://github.com/pmodels/mpich/issues/2892" rel="noreferrer" target="_blank" moz-do-not-send="true">https://github.com/pmodels/mpich/issues/2892</a>)
                          up to commit 7b8d64debd, but<br>
                          since commit mpich:a8a2b30fd21, I have a
                          segfault on every parallel nightly<br>
                          test.<br>
</blockquote>
<br>
                        I bisected the above range of commits and
                        narrowed it down to:<br>
<br>
                        >>>>>>><br>
                        db11d4c4a70e39a28be88ed32f00542301699e08 is
                        the first bad commit<br>
<<<<<<<<br>
                        >>>>>>>><br>
balay@asterix /home/balay/soft/build/mpich
((db11d4c4a...)|BISECTING)<br>
                        $ git show db11d4c4a70e39a28be88ed32f00542301699e08<br>
                        commit db11d4c4a70e39a28be88ed32f00542301699e08
                        (HEAD, refs/bisect/bad)<br>
Author: Ken Raffenetti <<a href="mailto:raffenet@mcs.anl.gov" target="_blank" moz-do-not-send="true">raffenet@mcs.anl.gov</a>><br>
Date: Thu Feb 15 11:37:59 2018 -0600<br>
<br>
init: Fix tag upper limit initialization<br>
The starting point for this value is
equivalent to the usable tag bits<br>
macro. This value should be set before
device initialization,<br>
otherwise devices will assume they have
more bits than are actually<br>
available.<br>
Signed-off-by: Wesley Bland <<a href="mailto:wesley.bland@intel.com" target="_blank" moz-do-not-send="true">wesley.bland@intel.com</a>><br>
<br>
diff --git a/src/mpi/init/initthread.c
b/src/mpi/init/initthread.c<br>
index cbc41f4d5..b31ae2f07 100644<br>
--- a/src/mpi/init/initthread.c<br>
+++ b/src/mpi/init/initthread.c<br>
@@ -403,7 +403,7 @@ int MPIR_Init_thread(int
*argc, char ***argv, int required, int
*provided)<br>
MPIR_Process.attrs.host = MPI_PROC_NULL;<br>
                        MPIR_Process.attrs.io = MPI_PROC_NULL;<br>
                        MPIR_Process.attrs.lastusedcode =
                        MPI_ERR_LASTCODE;<br>
- MPIR_Process.attrs.tag_ub = 0;<br>
+ MPIR_Process.attrs.tag_ub =
MPIR_TAG_USABLE_BITS;<br>
MPIR_Process.attrs.universe =
MPIR_UNIVERSE_SIZE_NOT_SET;<br>
                        MPIR_Process.attrs.wtime_is_global =
                        0;<br>
@@ -531,13 +531,6 @@ int MPIR_Init_thread(int
*argc, char ***argv, int required, int
*provided)<br>
                        MPIR_Assert(((unsigned) MPIR_Process.attrs.tag_ub &
                        ((unsigned) MPIR_Process.attrs.tag_ub + 1)) ==
                        0);<br>
- /* Set aside tag space for tagged
collectives and failure notification */<br>
-#ifdef HAVE_TAG_ERROR_BITS<br>
- MPIR_Process.attrs.tag_ub >>= 3;<br>
-#else<br>
- MPIR_Process.attrs.tag_ub >>= 1;<br>
-#endif<br>
-<br>
/* Assert: tag_ub is at least the minimum
asked for in the MPI spec */<br>
                        MPIR_Assert(MPIR_Process.attrs.tag_ub
                        >= 32767);<br>
<<<<<<<<<<<<<<<<<<br>
<br>
                        Reverting this patch gets mpich-3.3b2 working
                        with PETSc.<br>
<br>
Satish<br>
<br>
</blockquote>
</blockquote>
<br>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</div>
</div>
</blockquote>
<br>
</body>
</html>