[mpich-discuss] [Fwd: MPI_File_read_at_all external32 bug in MV2-2.1]
Balaji, Pavan
balaji at anl.gov
Fri Nov 6 19:13:08 CST 2015
Adam,
Not yet. We were working heads-down on the mpich-3.2 release.
I'm adding the mpich discuss list to cc, and poking @robl as well. Do you have a test program that reproduces the error? (sorry if you already sent it, I'm still catching up on the pending issues).
-- Pavan
> On Nov 6, 2015, at 2:10 PM, Adam T. Moody <moody20 at llnl.gov> wrote:
>
> Hi Pavan,
> Did you have a chance to look into this in MPICH?
>
> I'm pretty certain there is a lurking ROMIO bug here.
> -Adam
>
>
> Adam T. Moody wrote:
>
>> Hi Pavan and Howard,
>> FYI, it looks like this same bug is in MPICH-3.2rc1 and Open MPI-1.10.0.
>>
>> I guess no one uses collective I/O with external32 :-)
>> -Adam
>>
>> ------------------------------------------------------------------------
>>
>> Subject: MPI_File_read_at_all external32 bug in MV2-2.1
>> From: "Adam T. Moody" <moody20 at llnl.gov>
>> Date: Fri, 30 Oct 2015 11:48:35 -0700
>> To: "mvapich-discuss at cse.ohio-state.edu" <mvapich-discuss at cse.ohio-state.edu>
>>
>>
>> Hello MVAPICH team,
>> I've hit a bug in MPI_File_read_at_all in MVAPICH2-2.1. I have an application that reads and writes files in external32 format. It writes the file just fine, but it throws the following error when reading the file back:
>>
>> internal ABORT - process 0
>> srun: error: rzmerl2: task 0: Exited with exit code 1
>> Assertion failed in file src/mpid/common/datatype/mpid_ext32_segment.c at line 277: FALSE
>> memcpy argument memory ranges overlap, dst_=0x2aaab5c160a8 src_=0x2aaab5c160a8 len_=176
>>
>> Above, MPI is trying to do a memcpy where the source and destination buffer are the same address. Looking through the code for MVAPICH2-2.1, the problem seems to be at line 132 in src/mpi/romio/mpi-io/read_all.c:
>>
>>     if (e32_buf != NULL) {
>>         error_code = MPIU_read_external32_conversion_fn(xbuf, datatype,
>>                                                         count, e32_buf);
>>         ADIOI_Free(e32_buf);
>>     }
>>
>> I think the fix is to change "xbuf" above to "buf" as it is in read.c below:
>>
>>     if (e32_buf != NULL) {
>>         error_code = MPIU_read_external32_conversion_fn(buf, datatype,
>>                                                         count, e32_buf);
>>         ADIOI_Free(e32_buf);
>>     }
>>
>> When in external32 mode, xbuf == e32_buf, which acts as a temporary buffer in which to read the data. The code is then meant to unpack and convert the data from the temporary buffer into the user buffer at buf.
>>
>> It's probably worth checking the other external32 code paths to look for similar bugs.
>> Thanks,
>> -Adam
>>
>
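To make the aliasing in Adam's report concrete, here is a minimal standalone C sketch. It does not use MPI or ROMIO: convert_from_external32 and read_external32 are hypothetical stand-ins, the data is assumed to be plain 32-bit unsigned integers, and the "file read" is simulated with a memcpy. It only models the pattern described above, where xbuf and e32_buf are the same temporary buffer and the conversion must target the user buffer buf instead.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the external32 read conversion: unpack `count`
 * big-endian 32-bit integers from `src` into native-order values at `dst`.
 * Like the assertion in the report, it refuses to run when source and
 * destination are the same address. */
static int convert_from_external32(uint32_t *dst, const unsigned char *src,
                                   size_t count)
{
    if ((const void *)dst == (const void *)src)
        return -1; /* overlapping ranges: the case the abort complains about */

    for (size_t i = 0; i < count; i++) {
        dst[i] = ((uint32_t)src[4 * i] << 24) |
                 ((uint32_t)src[4 * i + 1] << 16) |
                 ((uint32_t)src[4 * i + 2] << 8) |
                 (uint32_t)src[4 * i + 3];
    }
    return 0;
}

/* Simulated external32 read (assumption: not the real ROMIO code path).
 * use_user_buf != 0 models the proposed fix (convert into buf);
 * use_user_buf == 0 models the reported bug (convert into xbuf). */
static int read_external32(uint32_t *buf, const unsigned char *file_data,
                           size_t count, int use_user_buf)
{
    assert(buf != NULL && file_data != NULL);

    size_t nbytes = count * 4;
    unsigned char *e32_buf = malloc(nbytes);
    if (e32_buf == NULL)
        return -2;

    /* In external32 mode the read lands in the temporary buffer, so
     * xbuf and e32_buf refer to the same memory. */
    uint32_t *xbuf = (uint32_t *)e32_buf;
    memcpy(e32_buf, file_data, nbytes); /* stands in for the file read */

    uint32_t *dst = use_user_buf ? buf : xbuf;
    int rc = convert_from_external32(dst, e32_buf, count);

    free(e32_buf);
    return rc;
}
```

In this sketch, calling read_external32 with use_user_buf set to 1 fills the caller's array with the converted values, while setting it to 0 makes the conversion see identical source and destination pointers and return an error without touching the caller's array.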