<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">
<style type="text/css" style="display:none;"><!-- P {margin-top:0;margin-bottom:0;} --></style>
</head>
<body dir="ltr">
<div id="divtagdefaultwrapper" style="font-size:12pt;color:#000000;font-family:Calibri,Arial,Helvetica,sans-serif;" dir="ltr">
<p><br>
</p>
<div dir="ltr">
<div id="x_divtagdefaultwrapper" dir="ltr" style="font-size:12pt; color:#000000; font-family:Calibri,Arial,Helvetica,sans-serif">
<p>Hi Rob,</p>
<p>In my previous email, I mentioned that the failure in MPI_File_set_view that we've observed and reported might be the effect of some prior data corruption, e.g. a buffer overrun, but I haven't found evidence of that yet. Even more curious is that the
application crash occurs within an internal Barrier operation in MPI_File_set_view, even though an MPI_Barrier call that I added immediately before the set_view completes without error (though I used MPI_COMM_WORLD for that "test").
<br>
</p>
<p>Here's a more detailed trace which shows me stepping through the MPI_File_set_view/Barrier code:</p>
<p></p>
<div><span style="font-size:8pt">112 my_rank = MPID_nem_mem_region.rank;
</span><br>
<span style="font-size:8pt">(gdb)
</span><br>
<span style="font-size:8pt">119 if (MPID_nem_fbox_is_full((MPID_nem_fbox_common_ptr_t)pbox))
</span><br>
<span style="font-size:8pt">(gdb)
</span><br>
<span style="font-size:8pt">OPA_load_acquire_int (ptr=0x7ffff5a91040) at /home/riwarren/Sandbox/mpich-3.2/src/openpa/src/primitives/opa_gcc_intel_32_64_ops.h:65
</span><br>
<span style="font-size:8pt">65 /home/riwarren/Sandbox/mpich-3.2/src/openpa/src/primitives/opa_gcc_intel_32_64_ops.h: No such file or directory.
</span><br>
<span style="font-size:8pt">(gdb)
</span><br>
<span style="font-size:8pt">MPID_nem_mpich_send_header (size=48, again=<synthetic pointer>, vc=0xb1b838, buf=0x7fffffffcf10) at ./src/mpid/ch3/channels/nemesis/include/mpid_nem_inline.h:119
</span><br>
<span style="font-size:8pt">119 if (MPID_nem_fbox_is_full((MPID_nem_fbox_common_ptr_t)pbox))
</span><br>
<span style="font-size:8pt">(gdb)
</span><br>
<span style="font-size:8pt">122 pbox->cell.pkt.mpich.source = MPID_nem_mem_region.local_rank;
</span><br>
<span style="font-size:8pt">(gdb)
</span><br>
<span style="font-size:8pt">128 MPIU_Memcpy((void *)pbox->cell.pkt.mpich.p.payload, buf, size);
</span><br>
<span style="font-size:8pt">(gdb)
</span><br>
<span style="font-size:8pt">122 pbox->cell.pkt.mpich.source = MPID_nem_mem_region.local_rank;
</span><br>
<span style="font-size:8pt">(gdb)
</span><br>
<span style="font-size:8pt">123 pbox->cell.pkt.mpich.datalen = size;
</span><br>
<span style="font-size:8pt">(gdb)
</span><br>
<span style="font-size:8pt">124 pbox->cell.pkt.mpich.seqno = vc_ch->send_seqno++;
</span><br>
<span style="font-size:8pt">(gdb)
</span><br>
<span style="font-size:8pt">128 MPIU_Memcpy((void *)pbox->cell.pkt.mpich.p.payload, buf, size);
</span><br>
<span style="font-size:8pt">(gdb) n
</span><br>
<span style="font-size:8pt">130 <b><span style="color: rgb(255, 0, 0);">OPA_store_release_int(&pbox->flag.value, 1); </span></b>
</span><br>
<span style="font-size:8pt">(gdb) where
</span><br>
<span style="font-size:8pt">#0 MPID_nem_mpich_send_header (size=48, again=<synthetic pointer>, vc=0xb1b838, buf=0x7fffffffcf10) at ./src/mpid/ch3/channels/nemesis/include/mpid_nem_inline.h:130
</span><br>
<span style="font-size:8pt">#1 MPIDI_CH3_iSend (vc=vc@entry=0xb1b838, sreq=0x7ffff769fe88 <MPID_Request_direct+904>, hdr=hdr@entry=0x7fffffffcf10, hdr_sz=48, hdr_sz@entry=32) at src/mpid/ch3/channels/nemesis/src/ch3_isend.c:56
</span><br>
<span style="font-size:8pt">#2 0x00007ffff736e6eb in MPID_Isend (buf=buf@entry=0x0, count=count@entry=0, datatype=datatype@entry=1275068685, rank=rank@entry=1, tag=tag@entry=1, comm=comm@entry=0x7ffff7e27520, context_offset=1,
</span><br>
<span style="font-size:8pt"> request=0x7fffffffcf98) at src/mpid/ch3/src/mpid_isend.c:115
</span><br>
<span style="font-size:8pt">#3 0x00007ffff73127a4 in MPIC_Sendrecv (sendbuf=sendbuf@entry=0x0, sendcount=sendcount@entry=0, sendtype=sendtype@entry=1275068685, dest=dest@entry=1, sendtag=sendtag@entry=1, recvbuf=recvbuf@entry=0x0,
</span><br>
<span style="font-size:8pt"> recvcount=0, recvtype=1275068685, source=1, recvtag=1, comm_ptr=0x7ffff7e27520, status=0x7fffffffcfa0, errflag=0x7fffffffd14c) at src/mpi/coll/helper_fns.c:481
</span><br>
<span style="font-size:8pt">#4 0x00007ffff726f18c in MPIR_Barrier_intra (comm_ptr=0x7ffff7e27520, errflag=0x7fffffffd14c) at src/mpi/coll/barrier.c:162
</span><br>
<span style="font-size:8pt">#5 0x00007ffff726f7b2 in MPIR_Barrier (comm_ptr=<optimized out>, errflag=<optimized out>) at src/mpi/coll/barrier.c:291
</span><br>
<span style="font-size:8pt">#6 0x00007ffff726f095 in MPIR_Barrier_impl (comm_ptr=<optimized out>, errflag=errflag@entry=0x7fffffffd14c) at src/mpi/coll/barrier.c:326
</span><br>
<span style="font-size:8pt">#7 0x00007ffff726f26a in barrier_smp_intra (errflag=0x7fffffffd14c, comm_ptr=0x7ffff7e27370) at src/mpi/coll/barrier.c:81
</span><br>
<span style="font-size:8pt">#8 MPIR_Barrier_intra (comm_ptr=0x7ffff7e27370, errflag=0x7fffffffd14c) at src/mpi/coll/barrier.c:146
</span><br>
<span style="font-size:8pt">#9 0x00007ffff726f7b2 in MPIR_Barrier (comm_ptr=<optimized out>, errflag=<optimized out>) at src/mpi/coll/barrier.c:291
</span><br>
<span style="font-size:8pt">#10 0x00007ffff726f095 in MPIR_Barrier_impl (comm_ptr=comm_ptr@entry=0x7ffff7e27370, errflag=errflag@entry=0x7fffffffd14c) at src/mpi/coll/barrier.c:326
</span><br>
<span style="font-size:8pt">#11 0x00007ffff726f9e2 in PMPI_Barrier (comm=-1006632958) at src/mpi/coll/barrier.c:410
</span><br>
<span style="font-size:8pt">#12 0x00007ffff73b6bf1 in PMPI_File_set_view (fh=0xb99e78, disp=0, etype=etype@entry=1275068685, filetype=-1946157049, datarep=<optimized out>, datarep@entry=0xae9b08 <H5FD_mpi_native_g> "native",
</span><br>
<span style="font-size:8pt"> info=-1677721596) at mpi-io/set_view.c:188
</span><br>
<span style="font-size:8pt">#13 0x000000000075cce1 in H5FD_mpio_write (_file=_file@entry=0xbb21d0, type=type@entry=H5FD_MEM_DEFAULT, dxpl_id=<optimized out>, addr=addr@entry=0, size=size@entry=1, buf=buf@entry=0xbcb830)
</span><br>
<span style="font-size:8pt"> at H5FDmpio.c:1781
</span><br>
<span style="font-size:8pt">#14 0x000000000055e285 in H5FD_write (file=0xbb21d0, dxpl=0xb84210, type=type@entry=H5FD_MEM_DEFAULT, addr=addr@entry=0, size=size@entry=1, buf=buf@entry=0xbcb830) at H5FDint.c:294
</span><br>
<span style="font-size:8pt">#15 0x000000000054aa12 in H5F__accum_write (fio_info=fio_info@entry=0x7fffffffd350, map_type=map_type@entry=H5FD_MEM_DEFAULT, addr=addr@entry=0, size=size@entry=1, buf=buf@entry=0xbcb830) at H5Faccum.c:821
</span><br>
<span style="font-size:8pt">#16 0x000000000054c5fc in H5F_block_write (f=f@entry=0xbb2290, type=type@entry=H5FD_MEM_DEFAULT, addr=addr@entry=0, size=size@entry=1, dxpl_id=dxpl_id@entry=720575940379279375, buf=buf@entry=0xbcb830)
</span><br>
<span style="font-size:8pt"> at H5Fio.c:195
</span><br>
<span style="font-size:8pt">#17 0x0000000000752a11 in H5C__collective_write (f=f@entry=0xbb2290, dxpl_id=dxpl_id@entry=720575940379279375) at H5Cmpio.c:1454
</span><br>
<span style="font-size:8pt">#18 0x0000000000754267 in H5C_apply_candidate_list (f=f@entry=0xbb2290, dxpl_id=dxpl_id@entry=720575940379279375, cache_ptr=cache_ptr@entry=0x7ffff455a040, num_candidates=1,
</span><br>
<span style="font-size:8pt"> candidates_list_ptr=<optimized out>, mpi_rank=<optimized out>, mpi_size=2) at H5Cmpio.c:760
</span><br>
<span style="font-size:8pt">#19 0x0000000000750676 in H5AC__rsp__dist_md_write__flush (f=f@entry=0xbb2290, dxpl_id=dxpl_id@entry=720575940379279375) at H5ACmpio.c:1707
</span><br>
<span style="font-size:8pt">#20 0x0000000000751f7f in H5AC__run_sync_point (f=f@entry=0xbb2290, dxpl_id=dxpl_id@entry=720575940379279375, sync_point_op=sync_point_op@entry=1) at H5ACmpio.c:2158
</span><br>
<span style="font-size:8pt">#21 0x000000000075205e in H5AC__flush_entries (f=f@entry=0xbb2290, dxpl_id=dxpl_id@entry=720575940379279375) at H5ACmpio.c:2301
</span><br>
<span style="font-size:8pt">#22 0x00000000004bfe16 in H5AC_dest (f=f@entry=0xbb2290, dxpl_id=dxpl_id@entry=720575940379279375) at H5AC.c:582
</span><br>
<span style="font-size:8pt">#23 0x0000000000543ef1 in H5F_dest (f=f@entry=0xbb2290, dxpl_id=720575940379279375, flush=flush@entry=true) at H5Fint.c:964
</span><br>
<span style="font-size:8pt">#24 0x0000000000544ae2 in H5F_try_close (f=f@entry=0xbb2290, was_closed=was_closed@entry=0x0) at H5Fint.c:1800
</span><br>
<span style="font-size:8pt">#25 0x0000000000544ee2 in H5F_close (f=0xbb2290) at H5Fint.c:1626
</span><br>
<span style="font-size:8pt">#26 0x00000000005b65cd in H5I_dec_ref (id=id@entry=72057594037927936) at H5I.c:1308
</span><br>
<span style="font-size:8pt">#27 0x00000000005b669e in H5I_dec_app_ref (id=id@entry=72057594037927936) at H5I.c:1353
</span><br>
<span style="font-size:8pt">#28 0x000000000053d058 in H5Fclose (file_id=72057594037927936) at H5F.c:769
</span><br>
<span style="font-size:8pt">#29 0x0000000000487353 in ADFH_Database_Close (root=4.7783097267364807e-299, status=0x7fffffffdb24) at adfh/ADFH.c:2447
</span><br>
<span style="font-size:8pt">#30 0x0000000000481633 in cgio_close_file (cgio_num=1) at cgns_io.c:817
</span><br>
<span style="font-size:8pt">#31 0x00000000004060d7 in cg_close (file_number=1) at cgnslib.c:636
</span><br>
<span style="font-size:8pt">#32 0x0000000000437417 in cgp_close (fn=1) at pcgnslib.c:288
</span><br>
<span style="font-size:8pt">#33 0x0000000000403524 in main (argc=1, argv=0x7fffffffdd28) at benchmark_hdf5.c:186 </span></div>
<br>
The code fails when I attempt to step over the OPA_store_release_int function. Might you have some explanation of what *COULD* go wrong here? Not knowing the current details of the MPICH collectives, why would a Barrier operation which immediately
precedes that call to MPI_File_set_view work correctly, and then that same operation fail from within?<br>
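<br>
One possibility, offered as a toy model rather than a diagnosis: MPI matches messages per communicator, so a barrier on MPI_COMM_WORLD would not see a stale message left on the file's communicator by an earlier mismatched collective, whereas the barrier inside MPI_File_set_view (which, judging by the comm value in the trace, runs on a different communicator) would. The C sketch below only illustrates that idea; toy_msg, toy_queue, toy_send, toy_recv, toy_barrier and the two context ids are hypothetical names and do not correspond to MPICH internals.<br>

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy model only: these types and functions are hypothetical and do not
 * correspond to MPICH internals. */
enum { WORLD_CTX = 0, FILE_CTX = 1 };   /* stand-ins for two communicators */
enum { TOY_OK = 0, TOY_TRUNCATED = -1, TOY_EMPTY = -2 };
#define TOY_QUEUE_MAX 8

typedef struct { int context_id; int nbytes; } toy_msg;
typedef struct { toy_msg msgs[TOY_QUEUE_MAX]; int count; } toy_queue;

/* Append a message of nbytes sent on the given communicator context. */
void toy_send(toy_queue *q, int context_id, int nbytes)
{
    assert(q->count < TOY_QUEUE_MAX);
    q->msgs[q->count].context_id = context_id;
    q->msgs[q->count].nbytes = nbytes;
    q->count++;
}

/* Consume the oldest message on context_id; messages on other contexts
 * are never matched. Reports truncation if the message exceeds bufsize. */
int toy_recv(toy_queue *q, int context_id, int bufsize)
{
    for (int i = 0; i < q->count; i++) {
        if (q->msgs[i].context_id != context_id)
            continue;
        int nbytes = q->msgs[i].nbytes;
        /* Remove the matched entry, keeping the remaining order intact. */
        memmove(&q->msgs[i], &q->msgs[i + 1],
                (size_t)(q->count - i - 1) * sizeof(toy_msg));
        q->count--;
        if (nbytes > bufsize) {
            printf("Message truncated; %d bytes received but buffer size is %d\n",
                   nbytes, bufsize);
            return TOY_TRUNCATED;
        }
        return TOY_OK;
    }
    return TOY_EMPTY;
}

/* Toy barrier: the peer posts a 1-byte token, which is then received
 * into a 1-byte buffer on the same context. */
int toy_barrier(toy_queue *q, int context_id)
{
    toy_send(q, context_id, 1);
    return toy_recv(q, context_id, 1);
}
```

In this model, if a stale 4-byte message is first queued with toy_send(&q, FILE_CTX, 4), then toy_barrier(&q, WORLD_CTX) returns TOY_OK while toy_barrier(&q, FILE_CTX) returns TOY_TRUNCATED. That would be consistent with a barrier on MPI_COMM_WORLD succeeding just before a barrier on another communicator fails, but it is only an assumption about what may be happening here.<br>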
<br>
Thanks for any insight you have on this issue!<br>
Best regards,<br>
Richard<br>
<p></p>
<p><br>
</p>
</div>
<hr tabindex="-1" style="display:inline-block; width:98%">
<div id="x_divRplyFwdMsg" dir="ltr"><font style="font-size:11pt" color="#000000" face="Calibri, sans-serif"><b>From:</b> Richard Warren <Richard.Warren@hdfgroup.org><br>
<b>Sent:</b> Wednesday, January 11, 2017 11:27:42 AM<br>
<b>To:</b> discuss@mpich.org<br>
<b>Subject:</b> Re: [mpich-discuss] FW: Potential MPICH problem</font>
<div> </div>
</div>
</div>
<font size="2"><span style="font-size:10pt;">
<div class="PlainText">
<br>
<br>
Hi Rob,<br>
I did put together an initial simplified test case to attempt to reproduce the issue, but that code does NOT fail! I’ve since focused my efforts not so much on the file closing operations, but on other collective operations that take place PRIOR to the
close. I suspect that some operation(s) preceding the file close are getting things out of sync, and hence we observe the failure “downstream”. At this time, I’m attempting to trace these other collective
operations in order to gain a better understanding of what’s happening. The code that we’re testing has not been released yet, so I’m not convinced that you could reproduce it directly from the currently available HDF5 and CGNS downloads. I could send you
the actual test, since it links to the static libraries (.a) for both HDF5 and CGNS. Would that be of interest to you?<br>
Thanks,<br>
Richard<br>
<br>
<br>
On 1/11/17, 11:02 AM, "Rob Latham" <robl@mcs.anl.gov> wrote:<br>
<br>
<br>
<br>
On 12/29/2016 08:28 AM, Richard Warren wrote:<br>
><br>
><br>
> Hi All,<br>
><br>
> I’m writing to get some advice and possibly report a bug. The<br>
> circumstances are that we are currently working on updating HDF5<br>
> functionality and have run into an issue running a parallel test of a<br>
> CFD code (benchmark.hdf) from the CGNS code base<br>
> <<a href="https://github.com/CGNS/CGNS.git" id="LPlnk32406" previewremoved="true">https://github.com/CGNS/CGNS.git</a>>. I’ve debugged enough to see that
<br>
> our failure occurs during a call to MPI_File_set_view, with the failure<br>
> signature as follows:<br>
<br>
I'm having a heck of a time building CGNS. When I make cgns,<br>
cgnsconvert fails to find the hdf5 symbols despite my telling<br>
cmake where to find libhdf5.<br>
<br>
Did you get that test case you mentioned?<br>
<br>
<br>
==rob<br>
<br>
><br>
><br>
><br>
> [brtnfld@jelly] ~/scratch/CGNS/CGNS/src/ptests % mpirun -n 2 benchmark_hdf5<br>
><br>
> Fatal error in PMPI_Barrier: Message truncated, error stack:<br>
><br>
> PMPI_Barrier(430)...................: MPI_Barrier(comm=0x84000006) failed<br>
><br>
> MPIR_Barrier_impl(337)..............: Failure during collective<br>
><br>
> MPIR_Barrier_impl(330)..............:<br>
><br>
> MPIR_Barrier(294)...................:<br>
><br>
> MPIR_Barrier_intra(151).............:<br>
><br>
> barrier_smp_intra(111)..............:<br>
><br>
> MPIR_Bcast_impl(1462)...............:<br>
><br>
> MPIR_Bcast(1486)....................:<br>
><br>
> MPIR_Bcast_intra(1295)..............:<br>
><br>
> MPIR_Bcast_binomial(241)............:<br>
><br>
> MPIC_Recv(352)......................:<br>
><br>
> MPIDI_CH3U_Request_unpack_uebuf(608): Message truncated; 4 bytes<br>
> received but buffer size is 1<br>
><br>
> [cli_1]: aborting job:<br>
><br>
> Fatal error in PMPI_Barrier: Message truncated, error stack:<br>
><br>
> PMPI_Barrier(430)...................: MPI_Barrier(comm=0x84000006) failed<br>
><br>
> MPIR_Barrier_impl(337)..............: Failure during collective<br>
><br>
> MPIR_Barrier_impl(330)..............:<br>
><br>
> MPIR_Barrier(294)...................:<br>
><br>
> MPIR_Barrier_intra(151).............:<br>
><br>
> barrier_smp_intra(111)..............:<br>
><br>
> MPIR_Bcast_impl(1462)...............:<br>
><br>
> MPIR_Bcast(1486)....................:<br>
><br>
> MPIR_Bcast_intra(1295)..............:<br>
><br>
> MPIR_Bcast_binomial(241)............:<br>
><br>
> MPIC_Recv(352)......................:<br>
><br>
> MPIDI_CH3U_Request_unpack_uebuf(608): Message truncated; 4 bytes<br>
> received but buffer size is 1<br>
><br>
> benchmark_hdf5: /mnt/hdf/brtnfld/hdf5/trunk/hdf5/src/H5Fint.c:1465:<br>
> H5F_close: Assertion `f->file_id > 0' failed.<br>
><br>
> Fatal error in PMPI_Allgather: Unknown error class, error stack:<br>
><br>
> PMPI_Allgather(1002)......................:<br>
> MPI_Allgather(sbuf=0x7ffdfdaf9b10, scount=1, MPI_LONG_LONG_INT,<br>
> rbuf=0x1d53ed8, rcount=1, MPI_LONG_LONG_INT, comm=0xc4000002) failed<br>
><br>
> MPIR_Allgather_impl(842)..................:<br>
><br>
> MPIR_Allgather(801).......................:<br>
><br>
> MPIR_Allgather_intra(216).................:<br>
><br>
> MPIC_Sendrecv(475)........................:<br>
><br>
> MPIC_Wait(243)............................:<br>
><br>
> MPIDI_CH3i_Progress_wait(239).............: an error occurred while<br>
> handling an event returned by MPIDU_Sock_Wait()<br>
><br>
> MPIDI_CH3I_Progress_handle_sock_event(451):<br>
><br>
> MPIDU_Socki_handle_read(649)..............: connection failure<br>
> (set=0,sock=1,errno=104:Connection reset by peer)<br>
><br>
> [cli_0]: aborting job:<br>
><br>
> Fatal error in PMPI_Allgather: Unknown error class, error stack:<br>
><br>
> PMPI_Allgather(1002)......................:<br>
> MPI_Allgather(sbuf=0x7ffdfdaf9b10, scount=1, MPI_LONG_LONG_INT,<br>
> rbuf=0x1d53ed8, rcount=1, MPI_LONG_LONG_INT, comm=0xc4000002) failed<br>
><br>
> MPIR_Allgather_impl(842)..................:<br>
><br>
> MPIR_Allgather(801).......................:<br>
><br>
> MPIR_Allgather_intra(216).................:<br>
><br>
> MPIC_Sendrecv(475)........................:<br>
><br>
> MPIC_Wait(243)............................:<br>
><br>
> MPIDI_CH3i_Progress_wait(239).............: an error occurred while<br>
> handling an event returned by MPIDU_Sock_Wait()<br>
><br>
> MPIDI_CH3I_Progress_handle_sock_event(451):<br>
><br>
> MPIDU_Socki_handle_read(649)..............: connection failure<br>
> (set=0,sock=1,errno=104:Connection reset by peer)<br>
><br>
> benchmark_hdf5: /mnt/hdf/brtnfld/hdf5/trunk/hdf5/src/H5Fint.c:1465:<br>
> H5F_close: Assertion `f->file_id > 0' failed.<br>
><br>
><br>
><br>
> Please note that the above trace was the original stacktrace which<br>
> appears to utilize sockets, though I’ve reproduced the same problem by<br>
> running on an SMP with shared memory.<br>
><br>
><br>
><br>
> While it’s not definitive that the issue has anything to do with the<br>
> above stack trace, the very same benchmark runs perfectly well utilizing<br>
> PHDF5 built with OpenMPI. My own testing is with MPICH version 3.2<br>
> available from your download site and with OpenMPI 2.0.1 (also their<br>
> latest download). Both MPI releases were built from source on my Fedora<br>
> 25 Linux distribution using GCC 6.2.1 20160916 (Red Hat 6.2.1-2).<br>
><br>
><br>
><br>
> Given that the synchronous calls into MPI_File_set_view appear to be<br>
> coded correctly AND that there isn’t much in the way of input parameters<br>
> that would cause problems (other than incorrect coding), we tend to<br>
> believe that the internal message queues between processes may somehow<br>
> be corrupted. This impression is strengthened by the fact that our<br>
> recent codebase changes (which are unrelated to the actual calls to<br>
> MPI_File_set_view) may have introduced this issue. Note too, that the<br>
> code paths to MPI_File_set_view have been taken many times previously<br>
> and those function calls have all succeeded.<br>
><br>
><br>
><br>
> Are there any suggestions out there as to how to further debug this<br>
> potential corruption issue?<br>
><br>
> Many thanks,<br>
><br>
> Richard A. Warren<br>
><br>
><br>
><br>
><br>
><br>
> _______________________________________________<br>
> discuss mailing list discuss@mpich.org<br>
> To manage subscription options or unsubscribe:<br>
> <a href="https://lists.mpich.org/mailman/listinfo/discuss" id="LPlnk358762" previewremoved="true">
https://lists.mpich.org/mailman/listinfo/discuss</a>
<br>
><br>
<br>
<br>
<br>
<br>
</div>
</span></font></div>
</body>
</html>