[mpich-discuss] crash on 2^31 size in MPI_Win_allocate_shared(...)

Weise Steffen Steffen.Weise at iec.tu-freiberg.de
Sat Jun 8 03:02:07 CDT 2013


Dear Jeff,

Thanks for taking care of the issue.
Sure, you're right about the signed vs. unsigned int stuff, but I also tried it with long int: same result. Every integer in my code (a memory manager) is long unsigned int; I see no reason to deal with negative sizes (at least from my perspective). I have an extended version of this example which actually uses the window, sets it and checks it afterwards (I didn't post it to keep the source file small, and the issue is not in that part of the code). All sizes other than exactly 2^31 and some multiples of it work very well, and "df -h" on /dev/shm clearly shows that it is used correctly. Memory sizes returned by MPI_Win_shared_query also match the requested amount.

My installation is in /opt/mpi/ (which would mean I consider MPICH to be THE MPI) ;)
I still have to rename my source repo ~/git/openmpi, though ;)

regards,
Steffen Weise



First, "long unsigned int window_size=2147483648" is not correct.
The type you need to use there is MPI_Aint.  The syntax of this
function is

int MPI_Win_allocate_shared(MPI_Aint size, int disp_unit, MPI_Info
info, MPI_Comm comm, void *baseptr, MPI_Win *win)

It may be true that "long unsigned int" is safely cast to MPI_Aint,
but that's a very dangerous way to write code and it may be broken on
some platforms.
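
To make the point about casting concrete, here is a minimal sketch of a
checked conversion from an unsigned size to a signed, address-sized
integer. It is only an illustration: intptr_t stands in for MPI_Aint so
that the snippet compiles without mpi.h (that substitution is an
assumption), and checked_to_address_sized is a hypothetical helper name,
not part of MPI.

```c
#include <stdint.h>

/* Hypothetical helper (not an MPI function).
 * intptr_t is used here as a stand-in for MPI_Aint, which is a signed
 * integer type large enough to hold an address; this is an assumption
 * made so the sketch builds without mpi.h.
 *
 * Returns 1 and stores the converted value in *out if `size` is
 * representable, or 0 (leaving *out untouched) if it is too large. */
int checked_to_address_sized(uintmax_t size, intptr_t *out)
{
    if (size > (uintmax_t)INTPTR_MAX) {
        return 0;               /* would not fit in the signed type */
    }
    *out = (intptr_t)size;      /* safe: value is in range */
    return 1;
}
```

In real code one would declare the size as MPI_Aint from the start and
pass it straight to MPI_Win_allocate_shared, so that no conversion from
"long unsigned int" is needed at the call site.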

In any case, everything above 2^31 is probably not okay.  Unless
absolutely every integer type used in the code paths you are hitting
is size_t (or equivalent) and not int, you're going to hit overflow
somewhere.  Maybe I'm wrong, but you should verify (as a debugging
mechanism, not in general) that MPI_Win_allocate_shared is behaving as
desired by memset-ing the resulting data (mem) to verify that you're
actually getting e.g. 2^34 bytes back.  If /dev/shm is 4G, I'm not
sure how that's possible but maybe the implementation doesn't use
that.
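
The memset check suggested above could look roughly like this. It is a
sketch for debugging only: fill_and_verify is a hypothetical helper, and
`base` is assumed to be the pointer returned through baseptr (or by
MPI_Win_shared_query), with `nbytes` the size that was requested.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical debugging helper (not an MPI function).
 * Writes `pattern` to every one of the nbytes starting at base, then
 * reads each byte back.
 * Returns 1 if all nbytes hold the pattern, 0 otherwise. */
int fill_and_verify(unsigned char *base, size_t nbytes, unsigned char pattern)
{
    memset(base, pattern, nbytes);      /* touch every byte */
    for (size_t i = 0; i < nbytes; i++) {
        if (base[i] != pattern) {
            return 0;                   /* read-back mismatch */
        }
    }
    return 1;
}
```

Note that if the mapping is actually smaller than the requested size, the
likely outcome is a crash (e.g. SIGBUS or a segfault) inside the memset
rather than a 0 return value; either result indicates that the full size
was not really available.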

I'm going to be on a plane today but I'll run your code on my machine
and try to figure out more about how "count-safe"
MPI_Win_allocate_shared is.

Jeff

PS Installing MPICH in ~/git/openmpi is just dirty :-)

On Fri, Jun 7, 2013 at 5:50 AM, Weise Steffen
<Steffen.Weise at iec.tu-freiberg.de> wrote:
Dear mailing-list,

This is my first time posting here. I found that with version 3.0.4, MPI_Win_allocate_shared gives an error when the size is exactly 2^31; everything below and above is OK, though I also had the same issue with 2^34. Some kind of division or type conversion seems to be off. (/dev/shm has 4G, so it is not a size issue; I know what those errors look like.)
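
As an illustration of the kind of type conversion being suspected, the
sketch below computes what a size would become if it were squeezed
through a 32-bit two's-complement int somewhere along the way. This is
only a hypothesis about a possible failure mode, not a statement about
where (or whether) the library does this; wrapped_int32 is a hypothetical
helper name.

```c
#include <stdint.h>

/* Hypothetical helper: returns the value a 32-bit two's-complement int
 * would hold if `size` were truncated to its low 32 bits.
 * The arithmetic is done explicitly so the result does not depend on
 * implementation-defined signed conversions. */
int64_t wrapped_int32(uint64_t size)
{
    uint32_t low = (uint32_t)(size & 0xFFFFFFFFu);  /* keep low 32 bits */
    if (low >= 0x80000000u) {
        return (int64_t)low - 4294967296LL;         /* sign bit set: negative */
    }
    return (int64_t)low;
}
```

With this helper, 2^31 maps to -2147483648 and 2^34 maps to 0, while
2^31 - 1 is unchanged. It does not by itself explain why sizes just above
2^31 work, since those would also come out negative.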

I attach my code and the output I get on a 64-bit Linux (Debian 6.0) machine (same issue on a Mac, though).

I'll be happy to provide more machine details or anything else you need to analyse what's going on.

with kind regards,
Steffen Weise


