[mpich-discuss] Bug (?) report: potential division by zero in ADIOI_LUSTRE_Docollect

Constantine Khroulev ckhroulev at alaska.edu
Fri Oct 30 15:36:34 CDT 2015


Dear MPICH developers,

I am writing to you to report what I think is a bug in ADIO (which, if
I understand it correctly, is a part of ROMIO, which is a part of
MPICH).

The function int ADIOI_LUSTRE_Docollect(ADIO_File, int, ADIO_Offset *,
int) defined in src/mpi/romio/adio/ad_lustre/ad_lustre_aggregate.c
(MPICH version 3.1.4 and several earlier versions) contains a division
that is not guarded against a zero divisor:

    /* estimate average req_size */
    avg_req_size = (int)(total_req_size / total_access_count);

I suggest adding an if statement that checks whether total_access_count
is zero before dividing, and stopping at that point (if appropriate).
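
A minimal sketch of such a guard, written as a standalone helper so it
can be compiled outside of ROMIO. The names example_offset and
guarded_avg_req_size, and the fallback parameter, are hypothetical and
exist only for illustration. What the real function should do when the
count is zero (return early, pick a default, or report an error) is a
decision for the developers.

```c
#include <assert.h>

/* Stand-in for ROMIO's ADIO_Offset so the sketch compiles on its own
 * (assumption: a signed 64-bit integer type). */
typedef long long example_offset;

/* Hypothetical helper, not part of ROMIO. Returns the average request
 * size, or `fallback` when no accesses were counted, so the division
 * is never attempted with a zero divisor. */
static int guarded_avg_req_size(example_offset total_req_size,
                                example_offset total_access_count,
                                int fallback)
{
    if (total_access_count <= 0) {
        return fallback;
    }
    return (int)(total_req_size / total_access_count);
}

/* Small self-check exercising both the normal and the guarded path. */
static void guarded_avg_req_size_check(void)
{
    assert(guarded_avg_req_size(100, 4, -1) == 25);
    assert(guarded_avg_req_size(100, 0, -1) == -1);
}
```

With a guard like this, a zero total_access_count yields the fallback
value instead of raising SIGFPE.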

I hope that this may save somebody a chunk of time later.

Some context: I am debugging a failure of PISM [1] on NASA
Pleiades [2], which uses the SGI MPI implementation, which also uses
ROMIO. PISM crashed with SIGFPE in ADIOI_LUSTRE_Docollect deep inside
HDF5 and NetCDF. I am not asking for help with this [3] here; I will
contact NASA's support as soon as I have a way of reproducing the
issue outside of PISM.

Thank you!

* Footnotes

[1] http://pism-docs.org/wiki/doku.php

[2] http://www.nas.nasa.gov/hecc/resources/pleiades.html

[3] Any advice is greatly appreciated, though.

-- 
Constantine