[mpich-discuss] Bug (?) report: potential division by zero in ADIOI_LUSTRE_Docollect
Constantine Khroulev
ckhroulev at alaska.edu
Fri Oct 30 15:36:34 CDT 2015
Dear MPICH developers,
I am writing to you to report what I think is a bug in ADIO (which, if
I understand it correctly, is a part of ROMIO, which is a part of
MPICH).
The function int ADIOI_LUSTRE_Docollect(ADIO_File, int, ADIO_Offset *,
int) defined in src/mpi/romio/adio/ad_lustre/ad_lustre_aggregate.c
(MPICH version 3.1.4 and several earlier versions) contains an
unprotected division:
    /* estimate average req_size */
    avg_req_size = (int)(total_req_size / total_access_count);
I suggest adding an if statement that guards against division by zero
(total_access_count == 0) and stops, if appropriate.
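A minimal sketch of the kind of guard I have in mind. This is not a patch against ROMIO: the helper name estimate_avg_req_size is hypothetical, and the use of int64_t for both arguments is an assumption (a stand-in for ADIO_Offset and the counter type). What should happen in the zero-count case is for the developers to decide; here the helper just reports failure and leaves the output untouched.

```c
#include <stdint.h>

/* Hypothetical helper, not part of ROMIO.
 * Computes the average request size without dividing by zero.
 * Returns 1 and stores the result in *avg_req_size on success;
 * returns 0 and leaves *avg_req_size untouched if there were no
 * accesses (total_access_count <= 0). */
static int estimate_avg_req_size(int64_t total_req_size,
                                 int64_t total_access_count,
                                 int *avg_req_size)
{
    if (total_access_count <= 0) {
        /* Nothing to average; the caller decides how to proceed. */
        return 0;
    }

    /* Same expression as in the original code, now guarded. */
    *avg_req_size = (int)(total_req_size / total_access_count);
    return 1;
}
```

The caller would check the return value and either skip the heuristic that depends on avg_req_size or fall back to a default decision.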
I hope that this may save somebody a chunk of time later.
Some context: I am debugging a failure of PISM [1] on NASA
Pleiades [2], which uses the SGI MPI implementation, which also uses
ROMIO. PISM crashed with SIGFPE in ADIOI_LUSTRE_Docollect deep inside
HDF5 and NetCDF. I am not asking for help with this [3] here; I will
contact NASA's support as soon as I have a way of reproducing the
issue outside of PISM.
Thank you!
* Footnotes
[1] http://pism-docs.org/wiki/doku.php
[2] http://www.nas.nasa.gov/hecc/resources/pleiades.html
[3] Any advice is greatly appreciated, though.
--
Constantine