Anyone with a GitHub account should be able to create an issue here:
https://github.com/pmodels/mpich/issues/

Best,

Jeff

On Thu, Dec 14, 2017 at 8:21 AM, Eric Chamberland <Eric.Chamberland@giref.ulaval.ca> wrote:

Hi,

Can I have an account so I can report this bug in the bug tracker?

Thanks,

Eric

On 08/12/17 10:14 AM, Guo, Yanfei wrote:

Hi Eric,

Sorry about the delay. I am able to reproduce the problem with your example (even with the master branch of MPICH). Maybe Rob can take a look at the problem.

Yanfei Guo
Assistant Computer Scientist
MCS Division, ANL

On 12/7/17, 1:59 PM, "Eric Chamberland" <Eric.Chamberland@giref.ulaval.ca> wrote:

Hi,

I first posted this bug to the list on Nov 15 and I still have no reply.

Is there something I should know, or is there a better place to post an MPICH bug?

Thanks,

Eric

On 04/12/17 08:18 PM, Eric Chamberland wrote:

Hi,

I have taken some time to produce a relatively "small" code that reproduces the problem.

The good thing is that I can now extract the exact call sequence of almost any MPI I/O call we make in our code into pure MPI calls in a simple C program, independent of our in-house code.

To reproduce the bug, just compile the attached file with any mpich/master since commit b4ab2f118d (Nov 8), and launch the resulting executable with 3 processes, with the second attachment (file_for_bug.data) saved in the working directory on an *NFS* path.

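For reference, here is a minimal sketch of this kind of call sequence. This is NOT the attached reproducer; the filename, counts, and the strided view are hypothetical. A split collective read through a non-contiguous view is what ends up in ROMIO's NFS strided-read path.

/*
 * Minimal sketch (not the attached mpich_mpiio_nfs_bug_read.c); the
 * counts and the strided view below are made up for illustration.
 * Build and run from an NFS-mounted directory, for example:
 *   mpicc repro.c -o repro && mpiexec -n 3 ./repro
 */
#include <mpi.h>

#define BLK 4   /* ints per block (hypothetical) */

int main(int argc, char **argv)
{
    MPI_File fh;
    MPI_Status status;
    MPI_Datatype filetype;
    int rank, nprocs, err;
    int buf[2 * BLK];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    err = MPI_File_open(MPI_COMM_WORLD, "file_for_bug.data",
                        MPI_MODE_RDONLY, MPI_INFO_NULL, &fh);
    if (err != MPI_SUCCESS) MPI_Abort(MPI_COMM_WORLD, 1);

    /* Strided file view: each rank sees two BLK-int blocks, interleaved
     * across ranks, which exercises the non-contiguous read path. */
    MPI_Type_vector(2, BLK, nprocs * BLK, MPI_INT, &filetype);
    MPI_Type_commit(&filetype);
    MPI_File_set_view(fh, (MPI_Offset)rank * BLK * sizeof(int),
                      MPI_INT, filetype, "native", MPI_INFO_NULL);

    /* Split collective read: this is where the error shows up on NFS. */
    err = MPI_File_read_all_begin(fh, buf, 2 * BLK, MPI_INT);
    if (err != MPI_SUCCESS) MPI_Abort(MPI_COMM_WORLD, 1);
    err = MPI_File_read_all_end(fh, buf, &status);
    if (err != MPI_SUCCESS) MPI_Abort(MPI_COMM_WORLD, 1);

    MPI_Type_free(&filetype);
    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}
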
You should see something like this:

ERROR Returned by MPI: 604040736
ERROR_string Returned by MPI: Other I/O error , error stack:
ADIOI_NFS_READSTRIDED(523): Other I/O error Bad file descriptor
ERROR Returned by MPI: 268496416
ERROR_string Returned by MPI: Other I/O error , error stack:
ADIOI_NFS_READSTRIDED(523): Other I/O error Operation now in progress
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 2

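(As an aside, output in that format can come from a small check after each I/O call; a sketch follows, with a hypothetical helper name. On file handles the default error handler is MPI_ERRORS_RETURN, so the calls return an error code instead of aborting.)

#include <mpi.h>
#include <stdio.h>

/* Print and abort on a non-successful MPI I/O return code
 * (sketch only; the helper name is hypothetical). */
static void check_mpi(int err)
{
    if (err != MPI_SUCCESS) {
        char msg[MPI_MAX_ERROR_STRING];
        int len;
        printf("ERROR Returned by MPI: %d\n", err);
        MPI_Error_string(err, msg, &len);
        printf("ERROR_string Returned by MPI: %s\n", msg);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
}

/* usage: check_mpi(MPI_File_read_all_begin(fh, buf, 2 * BLK, MPI_INT)); */
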
If you launch it on a local drive, it works.

Can someone confirm they can reproduce the problem, please?

Moreover, if I launch it with valgrind, even on a local disk, it complains like this on process 0:

==99023== Warning: invalid file descriptor -1 in syscall read()
==99023==    at 0x53E2CB0: __read_nocancel (in /lib64/libc-2.19.so)
==99023==    by 0x5041606: file_to_info_all (system_hints.c:101)
==99023==    by 0x5041606: ADIOI_process_system_hints (system_hints.c:150)
==99023==    by 0x50311B8: ADIO_Open (ad_open.c:123)
==99023==    by 0x50161FD: PMPI_File_open (open.c:154)
==99023==    by 0x400F91: main (mpich_mpiio_nfs_bug_read.c:42)

Thanks,

Eric

On 21/11/17 02:49 PM, Eric Chamberland wrote:

Hi Mr. Latham,

I have more information now.

When I try to run my example on NFS, I get the following error code:

error #812707360
Other I/O error , error stack:
ADIOI_NFS_READSTRIDED(523): Other I/O error Success

which is returned by MPI_File_read_all_begin.

When I try on a local disk, everything is fine.

Here are all the files describing my current build:

http://www.giref.ulaval.ca/~cmpgiref/dernier_mpich/2017.11.21.05h40m02s_config.log

http://www.giref.ulaval.ca/~cmpgiref/dernier_mpich/2017.11.21.05h40m02s_c.txt

http://www.giref.ulaval.ca/~cmpgiref/dernier_mpich/2017.11.21.05h40m02s_m.txt

http://www.giref.ulaval.ca/~cmpgiref/dernier_mpich/2017.11.21.05h40m02s_mi.txt

http://www.giref.ulaval.ca/~cmpgiref/dernier_mpich/2017.11.21.05h40m02s_mpl_config.log

http://www.giref.ulaval.ca/~cmpgiref/dernier_mpich/2017.11.21.05h40m02s_pm_hydra_config.log

http://www.giref.ulaval.ca/~cmpgiref/dernier_mpich/2017.11.21.05h40m02s_mpiexec_info.txt

Hope this helps to dig further into this issue.

Thanks,

Eric

On 15/11/17 03:55 PM, Eric Chamberland wrote:

Hi,

We have been compiling with mpich/master every night since August 2016...

Since Nov 8, the mpich/master branch has been failing our nightly build tests.

Here is the Nov 8 config.log:

<a href="http://www.giref.ulaval.ca/~cmpgiref/dernier_mpich/2017.11.08.05h36m02s_config.log" rel="noreferrer" target="_blank">http://www.giref.ulaval.ca/~cm<wbr>pgiref/dernier_mpich/2017.11.<wbr>08.05h36m02s_config.log</a><br>
<br>
<br>
For nov 7 the configure log:<br>
<br>
<a href="http://www.giref.ulaval.ca/~cmpgiref/dernier_mpich/2017.11.07.05h36m01s_config.log" rel="noreferrer" target="_blank">http://www.giref.ulaval.ca/~cm<wbr>pgiref/dernier_mpich/2017.11.<wbr>07.05h36m01s_config.log</a><br>
<br>
<br>
<br>
Since Nov 8, a specific ROMIO test hangs indefinitely in optimized mode, and in DEBUG mode I get a strange (yet to be debugged) assertion in our code.

I reran the test manually: when I write the results to a local disk, everything is fine.

However, when I write over *NFS*, the test is faulty.

I have not yet debugged far enough into this, but I suspect something related to one of the following calls (a write-side sketch of how they typically combine follows the list):

MPI_File_write_all_begin
MPI_File_write_all_end
MPI_File_read_all_begin
MPI_File_read_all_end
MPI_File_set_view
MPI_Type_free

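Here is that write-side sketch, showing only the typical ordering of those calls; the output file name, sizes, and strided view are hypothetical and this is not our in-house code.

#include <mpi.h>

#define N 8   /* ints written per rank (hypothetical) */

/* Typical ordering of the suspect calls on the write side (sketch). */
static void write_strided(MPI_Comm comm, int rank, int nprocs)
{
    MPI_File fh;
    MPI_Datatype filetype;
    MPI_Status status;
    int buf[N] = {0};

    MPI_File_open(comm, "out.data", MPI_MODE_CREATE | MPI_MODE_WRONLY,
                  MPI_INFO_NULL, &fh);

    /* Strided view: each rank owns two (N/2)-int blocks, interleaved
     * across ranks. */
    MPI_Type_vector(2, N / 2, nprocs * (N / 2), MPI_INT, &filetype);
    MPI_Type_commit(&filetype);
    MPI_File_set_view(fh, (MPI_Offset)rank * (N / 2) * sizeof(int),
                      MPI_INT, filetype, "native", MPI_INFO_NULL);

    /* Split collective write; computation can overlap the two halves. */
    MPI_File_write_all_begin(fh, buf, N, MPI_INT);
    MPI_File_write_all_end(fh, buf, &status);

    MPI_Type_free(&filetype);
    MPI_File_close(&fh);
}
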
Am I alone in seeing these problems?

Thanks,
Eric

_______________________________________________
discuss mailing list     discuss@mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss

--
Jeff Hammond
jeff.science@gmail.com
http://jeffhammond.github.io/