[mpich-discuss] Issues with shared memory window
Jonathan A Blair
qbit at utexas.edu
Sat Jul 26 16:49:12 CDT 2014
Thank you both, Pavan and rr.

I thought working with a pointer to the window would be safer, since my code
should _never_ work with the window itself. But leaving the pointer
uninitialized is bad practice, and nullifying it trips hard-coded errors in the
MPI source (I'm not sure why). I have modified the design to pass a reference
to the window to MPI_Win_allocate_shared() and the window itself to
MPI_Win_shared_query().

Does either of you know why MPI_Win_shared_query() requires the MPI_Win
variable to exist before the space is allocated (or can you point me to an
appropriate resource)? The MPI 3.0 specification seems to suggest that the
function might allocate the window itself and pass a pointer back to the user.
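
For reference, the relevant signatures, as defined in the MPI-3 standard, are:

    int MPI_Win_allocate_shared( MPI_Aint size, int disp_unit, MPI_Info info,
                                 MPI_Comm comm, void *baseptr, MPI_Win *win );
    int MPI_Win_shared_query( MPI_Win win, int rank, MPI_Aint *size,
                              int *disp_unit, void *baseptr );

So the window handle is an OUT parameter of MPI_Win_allocate_shared(), which
creates the window collectively, while MPI_Win_shared_query() takes an
already-created window as an IN parameter and only translates it into a local
base address, size, and displacement unit; it never allocates anything itself.
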
My barriers did indeed need repositioning - thanks, rr.

I'm not sure how I overlooked passing my array pointer by reference; that was
the most egregious issue. The complaints from MPI_Finalize() arose because I
did not free the window and the communicator as I should have. Thanks, Pavan.
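
Concretely, the cleanup before MPI_Finalize() is now just the following two
calls (note that MPI_Win_free() also releases the memory handed out by
MPI_Win_allocate_shared(), so arr itself must not be freed):

    MPI_Win_free( &win );
    MPI_Comm_free( &comm );
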
I've included the modified test example below; it works as intended.

The point of allocating shared memory was to remove all synchronization except
at initialization. I never perform RMA through MPI calls - I opt for direct
load/store access instead. This is particularly important for my application,
since it is a hybrid of memory-bound and CPU-bound functions.
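
In case it helps anyone else, here is a minimal (untested) sketch of what I
understand the strictly compliant pattern to be for this kind of direct
access, based on Pavan's advice about MPI_Win_sync: synchronize once around
initialization inside a passive-target epoch, then use plain loads and stores.
The variables are the same as in the test program below.

    MPI_Win_lock_all( MPI_MODE_NOCHECK, win ); /* open passive-target epoch */

    if (rank == 0){
        /* one-time initialization via direct stores */
        for (MPI_Aint i = 0; i < size / (MPI_Aint) sizeof(int); i++)
            arr[i] = (int) i;
    }

    MPI_Win_sync( win );  /* flush my stores to the shared segment       */
    MPI_Barrier( comm );  /* everyone's writes complete before any reads */
    MPI_Win_sync( win );  /* pick up the other processes' writes         */

    /* from here on, all processes read arr[] with plain loads */

    MPI_Win_unlock_all( win );  /* close the epoch before MPI_Win_free */
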
Thanks,
Jonathan
[begin file test.cpp]
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]){
    int rank;
    int color = 1;
    int ierr;
    int *arr = NULL;
    int disp_unit;
    MPI_Aint size = 2048;                    /* window size in BYTES */
    MPI_Aint nElems = size / (MPI_Aint) sizeof(int);
    MPI_Aint reportedSize = 0;
    MPI_Comm comm;
    MPI_Win win;

    ierr = MPI_Init( &argc, &argv );
    ierr = MPI_Comm_rank( MPI_COMM_WORLD, &rank );
    ierr = MPI_Comm_split( MPI_COMM_WORLD, color, rank, &comm );

    if (rank == 0){
        /* Rank 0 allocates the whole segment; MPI returns the base
           address in arr, so arr is passed by reference. */
        ierr = MPI_Win_allocate_shared( size, (int) sizeof(int),
                                        MPI_INFO_NULL, comm, &arr, &win );
        printf( "Rank: %i ierr from MPI_Win_allocate_shared = %i\n", rank, ierr );

        /* size is in bytes, so only size/sizeof(int) ints fit. */
        for (MPI_Aint i = 0; i < nElems; i++){
            arr[i] = (int) i;
        }
        printf( "Rank: %i arr[0] = %i, arr[1] = %i\n", rank, arr[0], arr[1] );

        ierr = MPI_Barrier( comm );  /* writes done; readers may proceed      */
        ierr = MPI_Barrier( comm );  /* readers done; safe to free the window */
    }
    else{
        /* Other ranks allocate zero bytes and attach to rank 0's segment. */
        ierr = MPI_Win_allocate_shared( (MPI_Aint) 0, (int) sizeof(int),
                                        MPI_INFO_NULL, comm, &arr, &win );
        printf( "Rank: %i ierr from MPI_Win_allocate_shared = %i\n", rank, ierr );

        /* Query rank 0's segment for its size, displacement unit,
           and local base address. */
        ierr = MPI_Win_shared_query( win, 0, &reportedSize, &disp_unit, &arr );
        printf( "Rank: %i ierr from MPI_Win_shared_query = %i\n", rank, ierr );
        printf( "Rank: %i reportedSize = %i\n", rank, (int) reportedSize );
        printf( "Rank: %i disp_unit = %i\n", rank, disp_unit );

        ierr = MPI_Barrier( comm );  /* wait until rank 0 has written */
        printf( "Rank: %i arr[0] = %i, arr[1] = %i\n", rank, arr[0], arr[1] );
        ierr = MPI_Barrier( comm );  /* tell rank 0 the reads are done */
    }

    MPI_Win_free( &win );   /* also frees the shared memory */
    MPI_Comm_free( &comm );
    ierr = MPI_Finalize();
    return 0;
}
[end file test.cpp]
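
For reference, I built this with mpicxx (the MPICH C++ compiler wrapper), as in
mpicxx test.cpp -o test, and ran it exactly as before with mpirun -n 2 ./test.
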
On Sat, Jul 26, 2014 at 9:46 AM, Balaji, Pavan <balaji at anl.gov> wrote:
> Jonathan,
>
> There are several errors in the code.
>
> Why is the window malloc(0)’ed? The window is created by
> Win_allocate_shared.
>
> You need to do a barrier after all accesses to the window. Otherwise,
> another process might free the shared memory while you are accessing it.
>
> You need to have a bunch of Win_sync’s after each process has done its
> writes and before processes do their reads.
>
> You need to free the Window using MPI_Win_free.
>
> You should free your communicator, otherwise it’s a resource leak.
>
> The argument to be passed for your array should be &arr, not arr, since
> arr is an OUT parameter.
>
> I’d recommend taking one of the code examples in test/mpi/rma and
> modifying it to what you need. That way the code would be standard
> compliant.
>
> Regards,
>
> — Pavan
>
> On Jul 25, 2014, at 11:07 PM, Jonathan Blair <qbit at utexas.edu> wrote:
>
> > Hi MPICH users,
> >
> > I've been having issues with MPI_Win_allocate_shared(). I believe my use
> case is compliant with the standard, but I am not ruling out ignorance on
> my part as the fault.
> >
> > In my project, one task allocates the memory to be shared, and the other
> tasks attach to the shared memory. The allocation function returns
> MPI_SUCCESS, as do all calls of MPI_Win_shared_query(). The size and
> displacement unit match expected values. However, the picture of the memory
> is nonuniform.
> >
> > I'm running this on a shared memory system (the communicator is
> intra-node, currently being tested on a desktop), with MPICH 3.1.2
> installed, passing all internal tests during installation.
> >
> > I notice that MPI_Free_mem() reports errors and I believe MPI_Finalize()
> causes a segfault, but I'm not sure if this is specifically related to the
> issue at hand.
> >
> > I have included a minimal test case below. Does anyone have any insight
> into my problem?
> >
> > Thanks for your input,
> > Jonathan
> >
> >
> > [begin file test.cpp]
> > #include <stdlib.h>
> > #include <stdio.h>
> > #include <mpi.h>
> >
> > using namespace std;
> >
> > int main(int argc, char *argv[]){
> >
> >     int rank;
> >     int color = 1;
> >     int ierr;
> >     int *arr = (int *) malloc( 0 );
> >     int disp_unit;
> >
> >     MPI_Aint size = 2048;
> >     MPI_Aint reportedSize = 0;
> >     MPI_Comm comm;
> >
> >     ierr = MPI_Init( &argc, &argv );
> >     ierr = MPI_Comm_rank( MPI_COMM_WORLD, &rank );
> >     ierr = MPI_Comm_split( MPI_COMM_WORLD, color, rank, &comm );
> >
> >     MPI_Win *win = (MPI_Win *) malloc( 0 );
> >
> >     if (rank == 0){
> >         ierr = MPI_Win_allocate_shared( \
> >             size, \
> >             (int) sizeof(int), \
> >             MPI_INFO_NULL, \
> >             comm, \
> >             (void *) arr, \
> >             win );
> >
> >         printf( "Rank: %i ierr from MPI_Win_allocate_shared = %i\n", rank, ierr );
> >
> >         ierr = MPI_Barrier( comm );
> >
> >         for (int i=0; i < size; i++){
> >             arr[i] = i;
> >         }
> >
> >         printf( "Rank: %i arr[0] = %i, arr[1] = %i\n", rank, arr[0], arr[1] );
> >
> >         ierr = MPI_Barrier( comm );
> >     }
> >     else{
> >         ierr = MPI_Win_allocate_shared( \
> >             (MPI_Aint) 0, \
> >             (int) sizeof(int), \
> >             MPI_INFO_NULL, \
> >             comm, \
> >             (void *) arr, \
> >             win );
> >
> >         printf( "Rank: %i ierr from MPI_Win_allocate_shared = %i\n", rank, ierr );
> >
> >         ierr = MPI_Win_shared_query( \
> >             *win, \
> >             (int) 0, \
> >             &reportedSize, \
> >             &disp_unit, \
> >             (void *) arr );
> >
> >         printf( "Rank: %i ierr from MPI_Win_shared_query = %i\n", rank, ierr );
> >         printf( "Rank: %i reportedSize = %i\n", rank, (int) reportedSize );
> >         printf( "Rank: %i disp_unit = %i\n", rank, disp_unit );
> >
> >         ierr = MPI_Barrier( comm );
> >
> >         ierr = MPI_Barrier( comm );
> >         printf( "Rank: %i arr[0] = %i, arr[1] = %i\n", rank, arr[0], arr[1] );
> >     }
> >
> >     MPI_Free_mem((void *) win);
> >     ierr = MPI_Finalize();
> >     return 0;
> > }
> > [end file test.cpp]
> >
> >
> >
> > [begin shell output]
> > $ mpirun -n 2 ./test
> > Rank: 0 ierr from MPI_Win_allocate_shared = 0
> > Rank: 1 ierr from MPI_Win_allocate_shared = 0
> > Rank: 1 ierr from MPI_Win_shared_query = 0
> > Rank: 1 reportedSize = 2048
> > Rank: 1 disp_unit = 4
> > Rank: 0 arr[0] = 0, arr[1] = 1
> > Rank: 1 arr[0] = 1996775424, arr[1] = 32592
> > [0] Block at address 0x000000000093f190 is corrupted; cannot free;
> > may be block not allocated with MPL_trmalloc or MALLOC
> > called in /path/to/mpich-3.1.2/src/mpid/ch3/src/ch3u_rma_ops.c at line 493
> > [1] Block at address 0x0000000000ec8190 is corrupted; cannot free;
> > may be block not allocated with MPL_trmalloc or MALLOC
> > called in /path/to/mpich-3.1.2/src/mpid/ch3/src/ch3u_rma_ops.c at line 493
> > [1] 56 at [0x0000000000eccb78], ich-3.1.2/src/util/wrappers/mpiu_shm_wrappers.h[188]
> > [1] 24 at [0x0000000000eccab8], ich-3.1.2/src/util/wrappers/mpiu_shm_wrappers.h[217]
> > [1] 56 at [0x0000000000ecc9d8], ich-3.1.2/src/util/wrappers/mpiu_shm_wrappers.h[188]
> > [1] 24 at [0x0000000000ecc918], ich-3.1.2/src/util/wrappers/mpiu_shm_wrappers.h[217]
> > [1] 8 at [0x0000000000ecc788], src/mpid/ch3/channels/nemesis/src/ch3_win_fns.c[131]
> > [1] 8 at [0x0000000000eca3a8], src/mpid/ch3/channels/nemesis/src/ch3_win_fns.c[127]
> > [1] 8 at [0x0000000000ec74e8], src/mpid/ch3/channels/nemesis/src/ch3_win_fns.c[123]
> > [1] 16 at [0x0000000000ecc6c8], src/mpid/ch3/channels/nemesis/src/ch3_win_fns.c[120]
> > [1] 16 at [0x0000000000ecc608], src/mpid/ch3/channels/nemesis/src/ch3_win_fns.c[117]
> > [1] 16 at [0x0000000000ecc548], src/mpid/ch3/channels/nemesis/src/ch3_win_fns.c[113]
> > [1] 48 at [0x0000000000ecbe98], h/mpich/mpich-3.1.2/src/mpid/ch3/src/mpid_rma.c[301]
> > [1] 32 at [0x0000000000ecc478], ch/mpich/mpich-3.1.2/src/mpid/ch3/src/mpid_vc.c[122]
> > [1] 8 at [0x0000000000ecc2f8], mpich/mpich-3.1.2/src/util/procmap/local_proc.c[93]
> > [1] 8 at [0x0000000000ecc248], mpich/mpich-3.1.2/src/util/procmap/local_proc.c[92]
> > [1] 32 at [0x0000000000ecc3a8], ch/mpich/mpich-3.1.2/src/mpid/ch3/src/mpid_vc.c[122]
> > [1] 8 at [0x0000000000ecc198], mpich/mpich-3.1.2/src/util/procmap/local_proc.c[93]
> > [1] 8 at [0x0000000000ecc0e8], mpich/mpich-3.1.2/src/util/procmap/local_proc.c[92]
> > [1] 32 at [0x0000000000ecc018], ch/mpich/mpich-3.1.2/src/mpid/ch3/src/mpid_vc.c[122]
> > [1] 504 at [0x0000000000ecafc8], earch/mpich/mpich-3.1.2/src/mpi/comm/commutil.c[281]
> > [1] 504 at [0x0000000000ecaa88], earch/mpich/mpich-3.1.2/src/mpi/comm/commutil.c[281]
> >
> > ===================================================================================
> > = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> > = PID 8822 RUNNING AT Machine
> > = EXIT CODE: 139
> > = CLEANING UP REMAINING PROCESSES
> > = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> > ===================================================================================
> > YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
> > This typically refers to a problem with your application.
> > Please see the FAQ page for debugging suggestions
> > [end shell output]
> > _______________________________________________
> > discuss mailing list discuss at mpich.org
> > To manage subscription options or unsubscribe:
> > https://lists.mpich.org/mailman/listinfo/discuss
>
> --
> Pavan Balaji
> http://www.mcs.anl.gov/~balaji
>
> _______________________________________________
> discuss mailing list discuss at mpich.org
> To manage subscription options or unsubscribe:
> https://lists.mpich.org/mailman/listinfo/discuss
>
--
Jonathan Blair
College of Natural Sciences | Physics - Computation | Mathematics
The University of Texas at Austin
512-230-0543 | qbit at utexas.edu