[mpich-discuss] crash on 2^31 size in MPI_Win_allocate_shared(...)

Jeff Hammond jhammond at alcf.anl.gov
Fri Jun 7 09:00:40 CDT 2013


First, "long unsigned int window_size=2147483648" is not correct.
The type you need to use there is MPI_Aint.  The syntax of this
function is

int MPI_Win_allocate_shared(MPI_Aint size, int disp_unit, MPI_Info
info, MPI_Comm comm, void *baseptr, MPI_Win *win)

It may be true that "long unsigned int" can be safely cast to MPI_Aint,
but that's a very dangerous way to write code and it may be broken on
some platforms.
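
As a rough illustration of why an unchecked cast is risky, here is a
minimal sketch of a range-checked conversion.  It assumes MPI_Aint
behaves like a signed, pointer-sized integer (true of common
implementations, but an assumption here), so intptr_t stands in for it
and no MPI header is needed.  The names aint_like and to_aint_checked
are hypothetical and are not part of MPI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for MPI_Aint (an assumption of this sketch): a signed
 * integer type wide enough to hold an address. */
typedef intptr_t aint_like;

/* Hypothetical helper, not an MPI call.  Stores n in *out and returns 1
 * if n is representable in aint_like; otherwise returns 0 and leaves
 * *out untouched. */
int to_aint_checked(unsigned long long n, aint_like *out)
{
    assert(out != NULL);
    if (n > (unsigned long long)INTPTR_MAX)
        return 0;               /* a plain cast would wrap or go negative */
    *out = (aint_like)n;
    return 1;
}
```

In real code you would declare the size as MPI_Aint from the start; a
check like this only matters where the value arrives in some other
integer type.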

In any case, everything above 2^31 is probably not okay.  Unless
absolutely every integer type used in the code paths you are hitting
is size_t (or equivalent) and not int, you're going to hit overflow
somewhere.  Maybe I'm wrong, but you should verify (as a debugging
mechanism, not in general) that MPI_Win_allocate_shared is behaving as
desired by memset-ing the resulting data (mem) to verify that you're
actually getting e.g. 2^34 bytes back.  If /dev/shm is 4G, I'm not
sure how that's possible but maybe the implementation doesn't use
that.
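
The memset check could look something like the sketch below.  It is
written against a plain pointer and byte count so it stands alone; in
your test the pointer would be the base address returned by
MPI_Win_allocate_shared and the count would be the size you asked for.
The helper name region_is_writable and the 0xA5 fill pattern are my own
choices for illustration, not anything from MPI.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical debugging helper.  Writes a pattern over all nbytes
 * starting at base, then reads back the first, middle, and last bytes.
 * Returns 1 if all three hold the pattern, 0 otherwise.  If the mapping
 * is shorter than nbytes, the memset itself will most likely fault,
 * which is also a useful answer. */
int region_is_writable(void *base, size_t nbytes)
{
    unsigned char *p = base;

    assert(p != NULL && nbytes > 0);
    memset(p, 0xA5, nbytes);    /* size_t count, so no int truncation here */
    return p[0] == 0xA5 &&
           p[nbytes / 2] == 0xA5 &&
           p[nbytes - 1] == 0xA5;
}
```

Keeping the count in size_t all the way through is the point: if it
passes through an int anywhere, the check tests a different size than
the one you requested.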

I'm going to be on a plane today but I'll run your code on my machine
and try to figure out more about how "count-safe"
MPI_Win_allocate_shared is.

Jeff

PS Installing MPICH in ~/git/openmpi is just dirty :-)

On Fri, Jun 7, 2013 at 5:50 AM, Weise Steffen
<Steffen.Weise at iec.tu-freiberg.de> wrote:
> Dear mailing-list,
>
> This is my first time posting here. I found that with version 3.0.4, MPI_Win_allocate_shared gives an error when the size is exactly 2^31; everything below and above is OK, though I also had the same issue with 2^34. Some kind of division or type conversion seems to be off. (/dev/shm has 4G, so it is not a size issue; I know what those errors look like.)
>
> I attach my code and the output I get on a Linux (Debian 6.0) 64-bit machine (same issue on a Mac, though).
>
> I'll be happy to provide more machine details or anything else you need to analyse what's going on.
>
> with kind regards,
> Steffen Weise
>



-- 
Jeff Hammond
Argonne Leadership Computing Facility
University of Chicago Computation Institute
jhammond at alcf.anl.gov / (630) 252-5381
http://www.linkedin.com/in/jeffhammond
https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond
ALCF docs: http://www.alcf.anl.gov/user-guides


