[mpich-discuss] MPI_Win_fence failed
Sufeng Niu
sniu at hawk.iit.edu
Wed Jul 10 12:08:19 CDT 2013
Oh, yeah, that would be an easier way. I just created a repository on
GitHub; you can clone it with
git clone https://github.com/sufengniu/mpi_app_test.git
to get the program. You need to install a TIFF library; on Ubuntu that is
sudo apt-get install libtiff4-dev
After you download the code, just run make; it will produce two binaries.
Please change the hostfile to match your machines, then first start the MPI
side with ./run.perl main and then run ./udp_client 55Fe_run5_dark.tif
Thanks a lot!
Sufeng
On Wed, Jul 10, 2013 at 11:57 AM, <discuss-request at mpich.org> wrote:
> Send discuss mailing list submissions to
> discuss at mpich.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
> https://lists.mpich.org/mailman/listinfo/discuss
> or, via email, send a message with subject or body 'help' to
> discuss-request at mpich.org
>
> You can reach the person managing the list at
> discuss-owner at mpich.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of discuss digest..."
>
>
> Today's Topics:
>
> 1. Re: MPI_Win_fence failed (Jeff Hammond)
> 2. Re: MPI_Win_fence failed (Sufeng Niu)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Wed, 10 Jul 2013 11:46:08 -0500
> From: Jeff Hammond <jeff.science at gmail.com>
> To: discuss at mpich.org
> Subject: Re: [mpich-discuss] MPI_Win_fence failed
> Message-ID:
> <CAGKz=
> uLiq6rur+15MBip5U-_AS2JWefYOHfX07b1dkR8POOk6A at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Just post the code so we can run it.
>
> Jeff
>
> On Wed, Jul 10, 2013 at 11:39 AM, Sufeng Niu <sniu at hawk.iit.edu> wrote:
> > Sorry, I forgot to add the screenshot of the backtrace; it is attached.
> >
> > Thanks a lot!
> >
> > Sufeng
> >
> >
> >
> > On Wed, Jul 10, 2013 at 11:30 AM, <discuss-request at mpich.org> wrote:
> >>
> >>
> >>
> >> Today's Topics:
> >>
> >> 1. Re: MPI_Win_fence failed (Sufeng Niu)
> >>
> >>
> >> ----------------------------------------------------------------------
> >>
> >> Message: 1
> >> Date: Wed, 10 Jul 2013 11:30:36 -0500
> >> From: Sufeng Niu <sniu at hawk.iit.edu>
> >> To: discuss at mpich.org
> >> Subject: Re: [mpich-discuss] MPI_Win_fence failed
> >> Message-ID:
> >>
> >> <CAFNNHkyLj8CbYMmc_w2DA9_+q2Oe3kyus+g6c99ShPk6ZXVkdA at mail.gmail.com>
> >> Content-Type: text/plain; charset="iso-8859-1"
> >>
> >>
> >> Hi Jim,
> >>
> >> Thanks a lot for your reply. The basic way I debug is barrier+printf;
> >> right now I only have an evaluation version of TotalView. The backtrace
> >> from TotalView is shown below. The udp process does the UDP collection
> >> and creates the RMA window; image_rms does MPI_Get to access the window.
> >>
> >> There is a segmentation violation, but I don't know why the program
> >> stopped at MPI_Win_fence.
> >>
> >> Thanks a lot!
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> On Wed, Jul 10, 2013 at 10:12 AM, <discuss-request at mpich.org> wrote:
> >>
> >> >
> >> >
> >> > Today's Topics:
> >> >
> >> > 1. Re: MPICH3.0.4 make fails with "No rule to make target..."
> >> > (Wesley Bland)
> >> > 2. Re: Error in MPI_Finalize on a simple ring test over TCP
> >> > (Wesley Bland)
> >> > 3. Restrict number of cores, not threads (Bob Ilgner)
> >> > 4. Re: Restrict number of cores, not threads (Wesley Bland)
> >> > 5. Re: Restrict number of cores, not threads (Wesley Bland)
> >> > 6. Re: Error in MPI_Finalize on a simple ring test over TCP
> >> > (Thomas Ropars)
> >> > 7. MPI_Win_fence failed (Sufeng Niu)
> >> > 8. Re: MPI_Win_fence failed (Jim Dinan)
> >> >
> >> >
> >> > ----------------------------------------------------------------------
> >> >
> >> > Message: 1
> >> > Date: Wed, 10 Jul 2013 08:29:06 -0500
> >> > From: Wesley Bland <wbland at mcs.anl.gov>
> >> > To: discuss at mpich.org
> >> > Subject: Re: [mpich-discuss] MPICH3.0.4 make fails with "No rule to
> >> > make target..."
> >> > Message-ID: <F48FC916-31F7-4F82-95F8-2D6A6C45264F at mcs.anl.gov>
> >> > Content-Type: text/plain; charset="iso-8859-1"
> >> >
> >> > Unfortunately, due to the lack of developer resources and interest,
> >> > the last version of MPICH supported on Windows was 1.4.1p. You can
> >> > find that version on the downloads page:
> >> >
> >> > http://www.mpich.org/downloads/
> >> >
> >> > Alternatively, Microsoft maintains a derivative of MPICH which should
> >> > provide the features you need. You will also find a link to it on the
> >> > downloads page above.
> >> >
> >> > Wesley
> >> >
> >> > On Jul 10, 2013, at 1:16 AM, Don Warren <don.warren at gmail.com> wrote:
> >> >
> >> > > Hello,
> >> > >
> >> > > As requested in the installation guide, I'm informing this list of
> >> > > a failure to correctly make MPICH 3.0.4 on a Win7 system. The
> >> > > specific error encountered is:
> >> > > "make[2]: *** No rule to make target
> >> > > `/cygdrive/c/FLASH/mpich-3.0.4/src/mpi/romio/Makefile.am', needed by
> >> > > `/cygdrive/c/FLASH/mpich-3.0.4/src/mpi/romio/Makefile.in'. Stop."
> >> > >
> >> > > I have confirmed that both Makefile.am and Makefile.in exist in the
> >> > > directory listed. I'm attaching the c.txt and m.txt files.
> >> > >
> >> > > Possibly of interest is that the command "make clean" fails at
> >> > > exactly the same folder, with exactly the same error message as
> >> > > shown in m.txt and above.
> >> > >
> >> > > Any advice you can give would be appreciated. I'm attempting to get
> >> > > FLASH running on my computer, which seems to require MPICH.
> >> > >
> >> > > Regards,
> >> > > Don Warren
> >> > >
> >> > >
> <config-make-outputs.zip>_______________________________________________
> >>
> >> > > discuss mailing list discuss at mpich.org
> >> > > To manage subscription options or unsubscribe:
> >> > > https://lists.mpich.org/mailman/listinfo/discuss
> >> >
> >> >
> >> > ------------------------------
> >> >
> >> > Message: 2
> >> > Date: Wed, 10 Jul 2013 08:39:47 -0500
> >> > From: Wesley Bland <wbland at mcs.anl.gov>
> >> > To: discuss at mpich.org
> >> > Subject: Re: [mpich-discuss] Error in MPI_Finalize on a simple ring
> >> > test over TCP
> >> > Message-ID: <D5999106-2A75-4091-8B0F-EAFA22880862 at mcs.anl.gov>
> >> > Content-Type: text/plain; charset=us-ascii
> >> >
> >> > The value of previous for rank 0 in your code is -1. MPICH is
> >> > complaining because all of the requests to receive a message from -1
> >> > are still pending when you try to finalize. You need to make sure
> >> > that you are receiving from valid ranks.
> >> >
> >> > On Jul 10, 2013, at 7:50 AM, Thomas Ropars <thomas.ropars at epfl.ch>
> >> > wrote:
> >> >
> >> > > Yes sure. Here it is.
> >> > >
> >> > > Thomas
> >> > >
> >> > > On 07/10/2013 02:23 PM, Wesley Bland wrote:
> >> > >> Can you send us the smallest chunk of code that still exhibits this
> >> > error?
> >> > >>
> >> > >> Wesley
> >> > >>
> >> > >> On Jul 10, 2013, at 6:54 AM, Thomas Ropars <thomas.ropars at epfl.ch>
> >> > wrote:
> >> > >>
> >> > >>> Hi all,
> >> > >>>
> >> > >>> I get the following error when I try to run a simple application
> >> > implementing a ring (each process sends to rank+1 and receives from
> >> > rank-1). More precisely, the error occurs during the call to
> >> > MPI_Finalize():
> >> > >>>
> >> > >>> Assertion failed in file
> >> > src/mpid/ch3/channels/nemesis/netmod/tcp/socksm.c at line 363:
> >> > sc->pg_is_set
> >> > >>> internal ABORT - process 0
> >> > >>>
> >> > >>> Does anybody else also noticed the same error?
> >> > >>>
> >> > >>> Here are all the details about my test:
> >> > >>> - The error is generated with mpich-3.0.2 (but I noticed the
> >> > >>>   exact same error with mpich-3.0.4).
> >> > >>> - I am using IPoIB for communication between nodes (the same
> >> > >>>   thing happens over Ethernet).
> >> > >>> - The problem comes from the TCP links. When all processes are on
> >> > >>>   the same node, there is no error. As soon as one process is on
> >> > >>>   a remote node, the failure occurs.
> >> > >>> - Note also that the failure does not occur if I run a more
> >> > >>>   complex code (e.g., a NAS benchmark).
> >> > >>>
> >> > >>> Thomas Ropars
> >>
> >> > >>> _______________________________________________
> >> > >>> discuss mailing list discuss at mpich.org
> >> > >>> To manage subscription options or unsubscribe:
> >> > >>> https://lists.mpich.org/mailman/listinfo/discuss
> >> > >> _______________________________________________
> >> > >> discuss mailing list discuss at mpich.org
> >> > >> To manage subscription options or unsubscribe:
> >> > >> https://lists.mpich.org/mailman/listinfo/discuss
> >> > >>
> >> > >>
> >> > >
> >> > > <ring_clean.c>_______________________________________________
> >>
> >> > > discuss mailing list discuss at mpich.org
> >> > > To manage subscription options or unsubscribe:
> >> > > https://lists.mpich.org/mailman/listinfo/discuss
> >> >
> >> >
> >> >
> >> > ------------------------------
> >> >
> >> > Message: 3
> >> > Date: Wed, 10 Jul 2013 16:41:27 +0200
> >> > From: Bob Ilgner <bobilgner at gmail.com>
> >> > To: mpich-discuss at mcs.anl.gov
> >> > Subject: [mpich-discuss] Restrict number of cores, not threads
> >> > Message-ID:
> >> > <
> >> > CAKv15b-QgmHkVkoiTFmP3EZXvyy6sc_QeqHQgbMUhnr3Xh9ecA at mail.gmail.com>
> >> > Content-Type: text/plain; charset="iso-8859-1"
> >> >
> >> > Dear all,
> >> >
> >> > I am working on a shared memory processor with 256 cores. I am working
> >> > from
> >> > the command line directly.
> >> >
> >> > Can I restrict the number of cores that I deploy? The command
> >> >
> >> > mpirun -n 100 myprog
> >> >
> >> > will automatically start on 100 cores. I wish to use only 10 cores
> >> > and have 10 threads on each core. Can I do this with MPICH? Remember
> >> > that this is an SMP and I cannot identify each core individually (as
> >> > in a cluster).
> >> >
> >> > Regards, bob
> >> >
> >> > ------------------------------
> >> >
> >> > Message: 4
> >> > Date: Wed, 10 Jul 2013 09:46:38 -0500
> >> > From: Wesley Bland <wbland at mcs.anl.gov>
> >> > To: discuss at mpich.org
> >> > Cc: mpich-discuss at mcs.anl.gov
> >> > Subject: Re: [mpich-discuss] Restrict number of cores, not threads
> >> > Message-ID: <2FAF588E-2FBE-45E4-B53F-E6BC931E3D51 at mcs.anl.gov>
> >> > Content-Type: text/plain; charset=iso-8859-1
> >> >
> >> > Threads in MPI are not ranks. When you say you want to launch with
> >> > -n 100, you will always get 100 processes, not threads. If you want
> >> > 10 threads on 10 cores, you will need to launch with -n 10, then add
> >> > your threads according to your threading library.
> >> >
> >> > Note that threads in MPI do not get their own rank currently. They
> >> > all share the same rank as the process in which they reside, so if
> >> > you need to be able to handle things with different ranks, you'll
> >> > need to use actual processes.
> >> >
> >> > Wesley
> >> >
> >> > On Jul 10, 2013, at 9:41 AM, Bob Ilgner <bobilgner at gmail.com> wrote:
> >> >
> >> > > Dear all,
> >> > >
> >> > > I am working on a shared memory processor with 256 cores. I am
> working
> >> > from the command line directly.
> >> > >
> >> > > Can I restrict the number of cores that I deploy? The command
> >> > >
> >> > > mpirun -n 100 myprog
> >> > >
> >> > > will automatically start on 100 cores. I wish to use only 10 cores
> >> > > and have 10 threads on each core. Can I do this with MPICH? Remember
> >> > > that this is an SMP and I cannot identify each core individually
> >> > > (as in a cluster).
> >> > >
> >> > > Regards, bob
> >>
> >> > > _______________________________________________
> >> > > discuss mailing list discuss at mpich.org
> >> > > To manage subscription options or unsubscribe:
> >> > > https://lists.mpich.org/mailman/listinfo/discuss
> >> >
> >> >
> >> >
> >> > ------------------------------
> >> >
> >> > Message: 6
> >> > Date: Wed, 10 Jul 2013 16:50:36 +0200
> >> > From: Thomas Ropars <thomas.ropars at epfl.ch>
> >> > To: discuss at mpich.org
> >> > Subject: Re: [mpich-discuss] Error in MPI_Finalize on a simple ring
> >> > test over TCP
> >> > Message-ID: <51DD74BC.3020009 at epfl.ch>
> >> > Content-Type: text/plain; charset=UTF-8; format=flowed
> >> >
> >> > Yes, you are right, sorry for disturbing.
> >> >
> >> > On 07/10/2013 03:39 PM, Wesley Bland wrote:
> >> > > The value of previous for rank 0 in your code is -1. MPICH is
> >> > complaining because all of the requests to receive a message from -1
> are
> >> > still pending when you try to finalize. You need to make sure that you
> >> > are
> >> > receiving from valid ranks.
> >> > >
> >> > > On Jul 10, 2013, at 7:50 AM, Thomas Ropars <thomas.ropars at epfl.ch>
> >> > wrote:
> >> > >
> >> > >> Yes sure. Here it is.
> >> > >>
> >> > >> Thomas
> >> > >>
> >> > >> On 07/10/2013 02:23 PM, Wesley Bland wrote:
> >> > >>> Can you send us the smallest chunk of code that still exhibits
> this
> >> > error?
> >> > >>>
> >> > >>> Wesley
> >> > >>>
> >> > >>> On Jul 10, 2013, at 6:54 AM, Thomas Ropars <thomas.ropars at epfl.ch
> >
> >> > wrote:
> >> > >>>
> >> > >>>> Hi all,
> >> > >>>>
> >> > >>>> I get the following error when I try to run a simple application
> >> > implementing a ring (each process sends to rank+1 and receives from
> >> > rank-1). More precisely, the error occurs during the call to
> >> > MPI_Finalize():
> >> > >>>>
> >> > >>>> Assertion failed in file
> >> > src/mpid/ch3/channels/nemesis/netmod/tcp/socksm.c at line 363:
> >> > sc->pg_is_set
> >> > >>>> internal ABORT - process 0
> >> > >>>>
> >> > >>>> Does anybody else also noticed the same error?
> >> > >>>>
> >> > >>>> Here are all the details about my test:
> >> > >>>> - The error is generated with mpich-3.0.2 (but I noticed the
> exact
> >> > same error with mpich-3.0.4)
> >> > >>>> - I am using IPoIB for communication between nodes (The same
> thing
> >> > happens over Ethernet)
> >> > >>>> - The problem comes from TCP links. When all processes are on the
> >> > same node, there is no error. As soon as one process is on a remote
> >> > node,
> >> > the failure occurs.
> >> > >>>> - Note also that the failure does not occur if I run a more
> complex
> >> > code (eg, a NAS benchmark).
> >> > >>>>
> >> > >>>> Thomas Ropars
> >>
> >> > >>>> _______________________________________________
> >> > >>>> discuss mailing list discuss at mpich.org
> >> > >>>> To manage subscription options or unsubscribe:
> >> > >>>> https://lists.mpich.org/mailman/listinfo/discuss
> >> > >>> _______________________________________________
> >> > >>> discuss mailing list discuss at mpich.org
> >> > >>> To manage subscription options or unsubscribe:
> >> > >>> https://lists.mpich.org/mailman/listinfo/discuss
> >> > >>>
> >> > >>>
> >> > >> <ring_clean.c>_______________________________________________
> >>
> >> > >> discuss mailing list discuss at mpich.org
> >> > >> To manage subscription options or unsubscribe:
> >> > >> https://lists.mpich.org/mailman/listinfo/discuss
> >> > > _______________________________________________
> >> > > discuss mailing list discuss at mpich.org
> >> > > To manage subscription options or unsubscribe:
> >> > > https://lists.mpich.org/mailman/listinfo/discuss
> >> > >
> >> > >
> >> >
> >> >
> >> >
> >> > ------------------------------
> >> >
> >> > Message: 7
> >> > Date: Wed, 10 Jul 2013 10:07:21 -0500
> >> > From: Sufeng Niu <sniu at hawk.iit.edu>
> >> > To: discuss at mpich.org
> >> > Subject: [mpich-discuss] MPI_Win_fence failed
> >> > Message-ID:
> >> > <
> >> > CAFNNHkz_1gC7hfpx0G9j24adO-gDabdmwZ4VuT6jip-fDMhS9A at mail.gmail.com>
> >> > Content-Type: text/plain; charset="iso-8859-1"
> >>
> >> >
> >> > Hello,
> >> >
> >> > I used MPI RMA in my program, but the program stops at
> >> > MPI_Win_fence. I have a master process that receives data from a UDP
> >> > socket; the other processes use MPI_Get to access the data.
> >> >
> >> > master process:
> >> >
> >> > MPI_Win_create(...);
> >> > for (...) {
> >> >     /* udp recv operation */
> >> >
> >> >     MPI_Barrier(...);   /* let the other processes know the udp data is ready */
> >> >
> >> >     MPI_Win_fence(0, win);
> >> >     MPI_Win_fence(0, win);
> >> > }
> >> >
> >> > other processes:
> >> >
> >> > for (...) {
> >> >     MPI_Barrier(...);   /* sync for udp data ready */
> >> >
> >> >     MPI_Win_fence(0, win);
> >> >     MPI_Get(...);
> >> >     MPI_Win_fence(0, win);   <-- program stopped here
> >> >
> >> >     /* other operation */
> >> > }
> >> >
> >> > I found that the program stopped at the second MPI_Win_fence; the
> >> > terminal output is:
> >> >
> >> > ===================================================================
> >> > =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> >> > =   EXIT CODE: 11
> >> > =   CLEANING UP REMAINING PROCESSES
> >> > =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> >> > ===================================================================
> >> > YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault
> >> > (signal 11)
> >> > This typically refers to a problem with your application.
> >> > Please see the FAQ page for debugging suggestions
> >> > Do you have any suggestions? Thank you very much!
> >> >
> >> > --
> >> > Best Regards,
> >> > Sufeng Niu
> >> > ECASP lab, ECE department, Illinois Institute of Technology
> >> > Tel: 312-731-7219
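[For comparison, here is a minimal self-contained sketch of the fence/Get pattern the pseudocode above describes. It is an editorial illustration under assumptions, not the poster's code: `NELEMS` and the buffer layout are invented, and the extra MPI_Barrier is dropped because MPI_Win_fence already synchronizes the window's group. One common cause of a segfault at the fence that completes the Gets is a window or origin buffer smaller than the count passed to MPI_Get.]

```c
#include <mpi.h>
#include <stdlib.h>

#define NELEMS 1024   /* assumed window size in doubles */

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Allocate the window memory through MPI so every rank really owns
     * NELEMS doubles; exposing fewer bytes than the targets read can end
     * in a segfault when the Gets are completed at the closing fence. */
    double *win_buf;
    MPI_Win win;
    MPI_Win_allocate(NELEMS * sizeof(double), sizeof(double),
                     MPI_INFO_NULL, MPI_COMM_WORLD, &win_buf, &win);

    double *local = malloc(NELEMS * sizeof(double));

    /* On the master, the UDP receive would go here, before the fence. */
    MPI_Win_fence(0, win);               /* open the epoch (all ranks)      */
    if (rank != 0)                       /* workers read rank 0's buffer    */
        MPI_Get(local, NELEMS, MPI_DOUBLE, 0, 0, NELEMS, MPI_DOUBLE, win);
    MPI_Win_fence(0, win);               /* Gets complete here (all ranks)  */

    free(local);
    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```

Every rank must call both fences; the data fetched by MPI_Get is only valid after the closing fence returns.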
> >> >
> >> > ------------------------------
> >> >
> >> > Message: 8
> >> > Date: Wed, 10 Jul 2013 11:12:45 -0400
> >> > From: Jim Dinan <james.dinan at gmail.com>
> >> > To: discuss at mpich.org
> >> > Subject: Re: [mpich-discuss] MPI_Win_fence failed
> >> > Message-ID:
> >> > <CAOoEU4F3hX=y3yrJKYKucNeiueQYBeR_3OQas9E+mg+GM6Rz=
> >> > w at mail.gmail.com>
> >> > Content-Type: text/plain; charset="iso-8859-1"
> >>
> >> >
> >> > It's hard to tell where the segmentation fault is coming from. Can
> you
> >> > use
> >> > a debugger to generate a backtrace?
> >> >
> >> > ~Jim.
> >> >
> >> >
> >> >
> >> > ------------------------------
> >>
> >> >
> >> > _______________________________________________
> >> > discuss mailing list
> >> > discuss at mpich.org
> >> > https://lists.mpich.org/mailman/listinfo/discuss
> >> >
> >> > End of discuss Digest, Vol 9, Issue 27
> >> > **************************************
> >>
> >> >
> >>
> >>
> >>
> >> --
> >> Best Regards,
> >> Sufeng Niu
> >> ECASP lab, ECE department, Illinois Institute of Technology
> >> Tel: 312-731-7219
> >>
> >> ------------------------------
> >>
> >>
> >> _______________________________________________
> >> discuss mailing list
> >> discuss at mpich.org
> >> https://lists.mpich.org/mailman/listinfo/discuss
> >>
> >> End of discuss Digest, Vol 9, Issue 28
> >> **************************************
> >
> >
> >
> >
> > --
> > Best Regards,
> > Sufeng Niu
> > ECASP lab, ECE department, Illinois Institute of Technology
> > Tel: 312-731-7219
> >
> > _______________________________________________
> > discuss mailing list discuss at mpich.org
> > To manage subscription options or unsubscribe:
> > https://lists.mpich.org/mailman/listinfo/discuss
>
>
>
> --
> Jeff Hammond
> jeff.science at gmail.com
>
>
> ------------------------------
>
> Message: 2
> Date: Wed, 10 Jul 2013 11:57:31 -0500
> From: Sufeng Niu <sniu at hawk.iit.edu>
> To: discuss at mpich.org
> Subject: Re: [mpich-discuss] MPI_Win_fence failed
> Message-ID:
> <
> CAFNNHkzKmAg8B6hamyrr7B2anU9EP_0yxmajxePVr35UnHVavw at mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Sorry, I found that this discussion list cannot accept figures or
> attachments.
>
> The backtrace information is below:
>
>     Processes                                   PC            Host           Rank  ID          Status
>     7   _start                                  0x00402399
>     `-7   __libc_start_main                     0x3685c1ecdd
>       `-7   main                                0x00402474
>         `-7   dkm                               ...
>           |-6   image_rms                       0x004029bb
>           |  `-6   rms                          0x00402d44
>           |    `-6   PMPI_Win_fence             0x0040c389
>           |      `-6   MPIDI_Win_fence          0x004a45f4
>           |        `-6   MPIDI_CH3I_RMAListComplete  0x004a27d3
>           |          `-6   MPIDI_CH3I_Progress  ...
>           `-1   udp                             0x004035cf
>             `-1   PMPI_Win_fence               0x0040c389
>               `-1   MPIDI_Win_fence            0x004a45a0
>                 `-1   MPIDI_CH3I_Progress      0x004292f5
>                   `-1   MPIDI_CH3_PktHandler_Get   0x0049f3f9
>                     `-1   MPIDI_CH3_iSendv     0x004aa67c
>                       `-    memcpy             0x3685c89329  164.54.54.122  0     20.1-13994  Stopped
>
>
>
> On Wed, Jul 10, 2013 at 11:39 AM, <discuss-request at mpich.org> wrote:
>
> > Send discuss mailing list submissions to
> > discuss at mpich.org
> >
> > To subscribe or unsubscribe via the World Wide Web, visit
> > https://lists.mpich.org/mailman/listinfo/discuss
> > or, via email, send a message with subject or body 'help' to
> > discuss-request at mpich.org
> >
> > You can reach the person managing the list at
> > discuss-owner at mpich.org
> >
> > When replying, please edit your Subject line so it is more specific
> > than "Re: Contents of discuss digest..."
> >
> >
> > Today's Topics:
> >
> > 1. Re: MPI_Win_fence failed (Sufeng Niu)
> >
> >
> > ----------------------------------------------------------------------
> >
> > Message: 1
> > Date: Wed, 10 Jul 2013 11:39:39 -0500
> > From: Sufeng Niu <sniu at hawk.iit.edu>
> > To: discuss at mpich.org
> > Subject: Re: [mpich-discuss] MPI_Win_fence failed
> > Message-ID:
> > <CAFNNHkz8pBfX33icn=+3rdXvqDfWqeu58odpd=
> > mOXLciysHgfg at mail.gmail.com>
> > Content-Type: text/plain; charset="iso-8859-1"
> >
> > Sorry I forget to add screen shot for backtrace. the screen shot is
> > attached.
> >
> > Thanks a lot!
> >
> > Sufeng
> >
> >
> >
> > On Wed, Jul 10, 2013 at 11:30 AM, <discuss-request at mpich.org> wrote:
> >
> > > Send discuss mailing list submissions to
> > > discuss at mpich.org
> > >
> > > To subscribe or unsubscribe via the World Wide Web, visit
> > > https://lists.mpich.org/mailman/listinfo/discuss
> > > or, via email, send a message with subject or body 'help' to
> > > discuss-request at mpich.org
> > >
> > > You can reach the person managing the list at
> > > discuss-owner at mpich.org
> > >
> > > When replying, please edit your Subject line so it is more specific
> > > than "Re: Contents of discuss digest..."
> > >
> > >
> > > Today's Topics:
> > >
> > > 1. Re: MPI_Win_fence failed (Sufeng Niu)
> > >
> > >
> > > ----------------------------------------------------------------------
> > >
> > > Message: 1
> > > Date: Wed, 10 Jul 2013 11:30:36 -0500
> > > From: Sufeng Niu <sniu at hawk.iit.edu>
> > > To: discuss at mpich.org
> > > Subject: Re: [mpich-discuss] MPI_Win_fence failed
> > > Message-ID:
> > > <
> > > CAFNNHkyLj8CbYMmc_w2DA9_+q2Oe3kyus+g6c99ShPk6ZXVkdA at mail.gmail.com>
> > > Content-Type: text/plain; charset="iso-8859-1"
> > >
> > > Hi Jim,
> > >
> > > Thanks a lot for your reply. the basic way for me to debugging is
> > > barrier+printf, right now I only have an evaluation version of
> totalview.
> > > the backtrace using totalview shown below. the udp is the udp
> collection
> > > and create RMA window, image_rms doing MPI_Get to access the window
> > >
> > > There is a segment violation, but I don't know why the program stopped
> > at
> > > MPI_Win_fence.
> > >
> > > Thanks a lot!
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > On Wed, Jul 10, 2013 at 10:12 AM, <discuss-request at mpich.org> wrote:
> > >
> > > > Send discuss mailing list submissions to
> > > > discuss at mpich.org
> > > >
> > > > To subscribe or unsubscribe via the World Wide Web, visit
> > > > https://lists.mpich.org/mailman/listinfo/discuss
> > > > or, via email, send a message with subject or body 'help' to
> > > > discuss-request at mpich.org
> > > >
> > > > You can reach the person managing the list at
> > > > discuss-owner at mpich.org
> > > >
> > > > When replying, please edit your Subject line so it is more specific
> > > > than "Re: Contents of discuss digest..."
> > > >
> > > >
> > > > Today's Topics:
> > > >
> > > > 1. Re: MPICH3.0.4 make fails with "No rule to make target..."
> > > > (Wesley Bland)
> > > > 2. Re: Error in MPI_Finalize on a simple ring test over TCP
> > > > (Wesley Bland)
> > > > 3. Restrict number of cores, not threads (Bob Ilgner)
> > > > 4. Re: Restrict number of cores, not threads (Wesley Bland)
> > > > 5. Re: Restrict number of cores, not threads (Wesley Bland)
> > > > 6. Re: Error in MPI_Finalize on a simple ring test over TCP
> > > > (Thomas Ropars)
> > > > 7. MPI_Win_fence failed (Sufeng Niu)
> > > > 8. Re: MPI_Win_fence failed (Jim Dinan)
> > > >
> > > >
> > > >
> ----------------------------------------------------------------------
> > > >
> > > > Message: 1
> > > > Date: Wed, 10 Jul 2013 08:29:06 -0500
> > > > From: Wesley Bland <wbland at mcs.anl.gov>
> > > > To: discuss at mpich.org
> > > > Subject: Re: [mpich-discuss] MPICH3.0.4 make fails with "No rule to
> > > > make target..."
> > > > Message-ID: <F48FC916-31F7-4F82-95F8-2D6A6C45264F at mcs.anl.gov>
> > > > Content-Type: text/plain; charset="iso-8859-1"
> > > >
> > > > Unfortunately, due to the lack of developer resources and interest,
> the
> > > > last version of MPICH which was supported on Windows was 1.4.1p. You
> > can
> > > > find that version on the downloads page:
> > > >
> > > > http://www.mpich.org/downloads/
> > > >
> > > > Alternatively, Microsoft maintains a derivative of MPICH which should
> > > > provide the features you need. You also find a link to that on the
> > > > downloads page above.
> > > >
> > > > Wesley
> > > >
> > > > On Jul 10, 2013, at 1:16 AM, Don Warren <don.warren at gmail.com>
> wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > As requested in the installation guide, I'm informing this list of a
> > > > > failure to correctly make MPICH 3.0.4 on a Win7 system. The specific
> > > > > error encountered is:
> > > > > "make[2]: *** No rule to make target
> > > > > `/cygdrive/c/FLASH/mpich-3.0.4/src/mpi/romio/Makefile.am', needed by
> > > > > `/cygdrive/c/FLASH/mpich-3.0.4/src/mpi/romio/Makefile.in'. Stop."
> > > > >
> > > > > I have confirmed that both Makefile.am and Makefile.in exist in the
> > > > > directory listed. I'm attaching the c.txt and the m.txt files.
> > > > >
> > > > > Possibly of interest: the command "make clean" fails in exactly the
> > > > > same folder, with exactly the same error message as shown in m.txt
> > > > > and above.
> > > > >
> > > > > Any advice you can give would be appreciated. I'm attempting to get
> > > > > FLASH running on my computer, which seems to require MPICH.
> > > > >
> > > > > Regards,
> > > > > Don Warren
> > > > >
> > >
> <config-make-outputs.zip>_______________________________________________
> > > > > discuss mailing list discuss at mpich.org
> > > > > To manage subscription options or unsubscribe:
> > > > > https://lists.mpich.org/mailman/listinfo/discuss
> > > >
> > > > -------------- next part --------------
> > > > An HTML attachment was scrubbed...
> > > > URL: <http://lists.mpich.org/pipermail/discuss/attachments/20130710/69b497f1/attachment-0001.html>
> > > >
> > > > ------------------------------
> > > >
> > > > Message: 2
> > > > Date: Wed, 10 Jul 2013 08:39:47 -0500
> > > > From: Wesley Bland <wbland at mcs.anl.gov>
> > > > To: discuss at mpich.org
> > > > Subject: Re: [mpich-discuss] Error in MPI_Finalize on a simple ring
> > > > test over TCP
> > > > Message-ID: <D5999106-2A75-4091-8B0F-EAFA22880862 at mcs.anl.gov>
> > > > Content-Type: text/plain; charset=us-ascii
> > > >
> > > > The value of previous for rank 0 in your code is -1. MPICH is
> > > > complaining because all of the requests to receive a message from -1
> > > > are still pending when you try to finalize. You need to make sure that
> > > > you are receiving from valid ranks.
> > > >
> > > > On Jul 10, 2013, at 7:50 AM, Thomas Ropars <thomas.ropars at epfl.ch>
> > > wrote:
> > > >
> > > > > Yes sure. Here it is.
> > > > >
> > > > > Thomas
> > > > >
> > > > > On 07/10/2013 02:23 PM, Wesley Bland wrote:
> > > > >> Can you send us the smallest chunk of code that still exhibits
> this
> > > > error?
> > > > >>
> > > > >> Wesley
> > > > >>
> > > > >> On Jul 10, 2013, at 6:54 AM, Thomas Ropars <thomas.ropars at epfl.ch
> >
> > > > wrote:
> > > > >>
> > > > >>> Hi all,
> > > > >>>
> > > > >>> I get the following error when I try to run a simple application
> > > > implementing a ring (each process sends to rank+1 and receives from
> > > > rank-1). More precisely, the error occurs during the call to
> > > MPI_Finalize():
> > > > >>>
> > > > >>> Assertion failed in file
> > > > src/mpid/ch3/channels/nemesis/netmod/tcp/socksm.c at line 363:
> > > sc->pg_is_set
> > > > >>> internal ABORT - process 0
> > > > >>>
> > > > >>> Has anybody else noticed the same error?
> > > > >>>
> > > > >>> Here are all the details about my test:
> > > > >>> - The error is generated with mpich-3.0.2 (but I noticed the
> exact
> > > > same error with mpich-3.0.4)
> > > > >>> - I am using IPoIB for communication between nodes (The same
> thing
> > > > happens over Ethernet)
> > > > >>> - The problem comes from TCP links. When all processes are on the
> > > same
> > > > node, there is no error. As soon as one process is on a remote node,
> > the
> > > > failure occurs.
> > > > >>> - Note also that the failure does not occur if I run a more complex
> > > > >>> code (e.g., a NAS benchmark).
> > > > >>>
> > > > >>> Thomas Ropars
> > > > >>> _______________________________________________
> > > > >>> discuss mailing list discuss at mpich.org
> > > > >>> To manage subscription options or unsubscribe:
> > > > >>> https://lists.mpich.org/mailman/listinfo/discuss
> > > > >> _______________________________________________
> > > > >> discuss mailing list discuss at mpich.org
> > > > >> To manage subscription options or unsubscribe:
> > > > >> https://lists.mpich.org/mailman/listinfo/discuss
> > > > >>
> > > > >>
> > > > >
> > > > > <ring_clean.c>_______________________________________________
> > > > > discuss mailing list discuss at mpich.org
> > > > > To manage subscription options or unsubscribe:
> > > > > https://lists.mpich.org/mailman/listinfo/discuss
> > > >
> > > >
> > > >
> > > > ------------------------------
> > > >
> > > > Message: 3
> > > > Date: Wed, 10 Jul 2013 16:41:27 +0200
> > > > From: Bob Ilgner <bobilgner at gmail.com>
> > > > To: mpich-discuss at mcs.anl.gov
> > > > Subject: [mpich-discuss] Restrict number of cores, not threads
> > > > Message-ID:
> > > > <
> > > > CAKv15b-QgmHkVkoiTFmP3EZXvyy6sc_QeqHQgbMUhnr3Xh9ecA at mail.gmail.com>
> > > > Content-Type: text/plain; charset="iso-8859-1"
> > > >
> > > > Dear all,
> > > >
> > > > I am working on a shared memory processor with 256 cores. I am
> working
> > > from
> > > > the command line directly.
> > > >
> > > > Can I restrict the number of cores that I deploy? The command
> > > >
> > > > mpirun -n 100 myprog
> > > >
> > > >
> > > > will automatically start on 100 cores. I wish to use only 10 cores and
> > > > have 10 threads on each core. Can I do this with mpich? Remember that
> > > > this is an SMP and I cannot identify each core individually (as in a
> > > > cluster).
> > > >
> > > > Regards, bob
> > > > -------------- next part --------------
> > > > An HTML attachment was scrubbed...
> > > > URL: <http://lists.mpich.org/pipermail/discuss/attachments/20130710/ec659e91/attachment-0001.html>
> > > >
> > > > ------------------------------
> > > >
> > > > Message: 4
> > > > Date: Wed, 10 Jul 2013 09:46:38 -0500
> > > > From: Wesley Bland <wbland at mcs.anl.gov>
> > > > To: discuss at mpich.org
> > > > Cc: mpich-discuss at mcs.anl.gov
> > > > Subject: Re: [mpich-discuss] Restrict number of cores, not threads
> > > > Message-ID: <2FAF588E-2FBE-45E4-B53F-E6BC931E3D51 at mcs.anl.gov>
> > > > Content-Type: text/plain; charset=iso-8859-1
> > > >
> > > > Threads in MPI are not ranks. When you say you want to launch with -n
> > > > 100, you will always get 100 processes, not threads. If you want 10
> > > > threads on 10 cores, you will need to launch with -n 10, then add your
> > > > threads according to your threading library.
> > > >
> > > > Note that threads in MPI do not currently get their own rank. They all
> > > > share the same rank as the process in which they reside, so if you need
> > > > to be able to handle things with different ranks, you'll need to use
> > > > actual processes.
> > > >
> > > > Wesley
> > > >
> > > > On Jul 10, 2013, at 9:41 AM, Bob Ilgner <bobilgner at gmail.com> wrote:
> > > >
> > > > > Dear all,
> > > > >
> > > > > I am working on a shared memory processor with 256 cores. I am
> > working
> > > > from the command line directly.
> > > > >
> > > > > Can I restrict the number of cores that I deploy? The command
> > > > >
> > > > > mpirun -n 100 myprog
> > > > >
> > > > >
> > > > > will automatically start on 100 cores. I wish to use only 10 cores
> > > > > and have 10 threads on each core. Can I do this with mpich? Remember
> > > > > that this is an SMP and I cannot identify each core individually (as
> > > > > in a cluster).
> > > > >
> > > > > Regards, bob
> > > > > _______________________________________________
> > > > > discuss mailing list discuss at mpich.org
> > > > > To manage subscription options or unsubscribe:
> > > > > https://lists.mpich.org/mailman/listinfo/discuss
> > > >
> > > >
> > > >
> > > > ------------------------------
> > > >
> > > > Message: 6
> > > > Date: Wed, 10 Jul 2013 16:50:36 +0200
> > > > From: Thomas Ropars <thomas.ropars at epfl.ch>
> > > > To: discuss at mpich.org
> > > > Subject: Re: [mpich-discuss] Error in MPI_Finalize on a simple ring
> > > > test over TCP
> > > > Message-ID: <51DD74BC.3020009 at epfl.ch>
> > > > Content-Type: text/plain; charset=UTF-8; format=flowed
> > > >
> > > > Yes, you are right; sorry for the noise.
> > > >
> > > > On 07/10/2013 03:39 PM, Wesley Bland wrote:
> > > > > The value of previous for rank 0 in your code is -1. MPICH is
> > > > complaining because all of the requests to receive a message from -1
> > are
> > > > still pending when you try to finalize. You need to make sure that
> you
> > > are
> > > > receiving from valid ranks.
> > > > >
> > > > > On Jul 10, 2013, at 7:50 AM, Thomas Ropars <thomas.ropars at epfl.ch>
> > > > wrote:
> > > > >
> > > > >> Yes sure. Here it is.
> > > > >>
> > > > >> Thomas
> > > > >>
> > > > >> On 07/10/2013 02:23 PM, Wesley Bland wrote:
> > > > >>> Can you send us the smallest chunk of code that still exhibits
> this
> > > > error?
> > > > >>>
> > > > >>> Wesley
> > > > >>>
> > > > >>> On Jul 10, 2013, at 6:54 AM, Thomas Ropars <
> thomas.ropars at epfl.ch>
> > > > wrote:
> > > > >>>
> > > > >>>> Hi all,
> > > > >>>>
> > > > >>>> I get the following error when I try to run a simple application
> > > > implementing a ring (each process sends to rank+1 and receives from
> > > > rank-1). More precisely, the error occurs during the call to
> > > MPI_Finalize():
> > > > >>>>
> > > > >>>> Assertion failed in file
> > > > src/mpid/ch3/channels/nemesis/netmod/tcp/socksm.c at line 363:
> > > sc->pg_is_set
> > > > >>>> internal ABORT - process 0
> > > > >>>>
> > > > >>>> Has anybody else noticed the same error?
> > > > >>>>
> > > > >>>> Here are all the details about my test:
> > > > >>>> - The error is generated with mpich-3.0.2 (but I noticed the
> exact
> > > > same error with mpich-3.0.4)
> > > > >>>> - I am using IPoIB for communication between nodes (The same
> thing
> > > > happens over Ethernet)
> > > > >>>> - The problem comes from TCP links. When all processes are on
> the
> > > > same node, there is no error. As soon as one process is on a remote
> > node,
> > > > the failure occurs.
> > > > >>>> - Note also that the failure does not occur if I run a more
> > > > >>>> complex code (e.g., a NAS benchmark).
> > > > >>>>
> > > > >>>> Thomas Ropars
> > > > >>>> _______________________________________________
> > > > >>>> discuss mailing list discuss at mpich.org
> > > > >>>> To manage subscription options or unsubscribe:
> > > > >>>> https://lists.mpich.org/mailman/listinfo/discuss
> > > > >>> _______________________________________________
> > > > >>> discuss mailing list discuss at mpich.org
> > > > >>> To manage subscription options or unsubscribe:
> > > > >>> https://lists.mpich.org/mailman/listinfo/discuss
> > > > >>>
> > > > >>>
> > > > >> <ring_clean.c>_______________________________________________
> > > > >> discuss mailing list discuss at mpich.org
> > > > >> To manage subscription options or unsubscribe:
> > > > >> https://lists.mpich.org/mailman/listinfo/discuss
> > > > > _______________________________________________
> > > > > discuss mailing list discuss at mpich.org
> > > > > To manage subscription options or unsubscribe:
> > > > > https://lists.mpich.org/mailman/listinfo/discuss
> > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > ------------------------------
> > > >
> > > > Message: 7
> > > > Date: Wed, 10 Jul 2013 10:07:21 -0500
> > > > From: Sufeng Niu <sniu at hawk.iit.edu>
> > > > To: discuss at mpich.org
> > > > Subject: [mpich-discuss] MPI_Win_fence failed
> > > > Message-ID:
> > > > <
> > > > CAFNNHkz_1gC7hfpx0G9j24adO-gDabdmwZ4VuT6jip-fDMhS9A at mail.gmail.com>
> > > > Content-Type: text/plain; charset="iso-8859-1"
> > > >
> > > > Hello,
> > > >
> > > > I am using MPI RMA in my program, but the program stops at an
> > > > MPI_Win_fence. I have a master process that receives data from a UDP
> > > > socket; the other processes use MPI_Get to access the data.
> > > >
> > > > master process:
> > > >
> > > > MPI_Win_create(...)
> > > > for(...){
> > > > /* udp recv operation */
> > > >
> > > > MPI_Barrier // to let other processes know the data received from UDP is ready
> > > >
> > > > MPI_Win_fence(0, win);
> > > > MPI_Win_fence(0, win);
> > > >
> > > > }
> > > >
> > > > other processes:
> > > >
> > > > for(...){
> > > >
> > > > MPI_Barrier // sync for udp data ready
> > > >
> > > > MPI_Win_fence(0, win);
> > > >
> > > > MPI_Get();
> > > >
> > > > MPI_Win_fence(0, win); <-- program stopped here
> > > >
> > > > /* other operation */
> > > > }
> > > >
> > > > I found that the program stopped at the second MPI_Win_fence; the
> > > > terminal output is:
> > > >
> > > > ===================================================================================
> > > > = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> > > > = EXIT CODE: 11
> > > > = CLEANING UP REMAINING PROCESSES
> > > > = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> > > > ===================================================================================
> > > > YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
> > > > This typically refers to a problem with your application.
> > > > Please see the FAQ page for debugging suggestions
> > > >
> > > > Do you have any suggestions? Thank you very much!
> > > >
> > > > --
> > > > Best Regards,
> > > > Sufeng Niu
> > > > ECASP lab, ECE department, Illinois Institute of Technology
> > > > Tel: 312-731-7219
> > > > -------------- next part --------------
> > > > An HTML attachment was scrubbed...
> > > > URL: <http://lists.mpich.org/pipermail/discuss/attachments/20130710/375a95ac/attachment-0001.html>
> > > >
> > > > ------------------------------
> > > >
> > > > Message: 8
> > > > Date: Wed, 10 Jul 2013 11:12:45 -0400
> > > > From: Jim Dinan <james.dinan at gmail.com>
> > > > To: discuss at mpich.org
> > > > Subject: Re: [mpich-discuss] MPI_Win_fence failed
> > > > Message-ID:
> > > > <CAOoEU4F3hX=y3yrJKYKucNeiueQYBeR_3OQas9E+mg+GM6Rz=
> > > > w at mail.gmail.com>
> > > > Content-Type: text/plain; charset="iso-8859-1"
> > > >
> > > > It's hard to tell where the segmentation fault is coming from. Can you
> > > > use a debugger to generate a backtrace?
> > > >
> > > > ~Jim.
> > > >
> > > >
> > > > On Wed, Jul 10, 2013 at 11:07 AM, Sufeng Niu <sniu at hawk.iit.edu>
> > wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > I am using MPI RMA in my program, but the program stops at an
> > > > > MPI_Win_fence. I have a master process that receives data from a UDP
> > > > > socket; the other processes use MPI_Get to access the data.
> > > > >
> > > > > master process:
> > > > >
> > > > > MPI_Win_create(...)
> > > > > for(...){
> > > > > /* udp recv operation */
> > > > >
> > > > > MPI_Barrier // to let other processes know the data received from UDP is ready
> > > > >
> > > > > MPI_Win_fence(0, win);
> > > > > MPI_Win_fence(0, win);
> > > > >
> > > > > }
> > > > >
> > > > > other processes:
> > > > >
> > > > > for(...){
> > > > >
> > > > > MPI_Barrier // sync for udp data ready
> > > > >
> > > > > MPI_Win_fence(0, win);
> > > > >
> > > > > MPI_Get();
> > > > >
> > > > > MPI_Win_fence(0, win); <-- program stopped here
> > > > >
> > > > > /* other operation */
> > > > > }
> > > > >
> > > > > I found that the program stopped at the second MPI_Win_fence; the
> > > > > terminal output is:
> > > > >
> > > > > ===================================================================================
> > > > > = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> > > > > = EXIT CODE: 11
> > > > > = CLEANING UP REMAINING PROCESSES
> > > > > = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> > > > > ===================================================================================
> > > > > YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
> > > > > This typically refers to a problem with your application.
> > > > > Please see the FAQ page for debugging suggestions
> > > > >
> > > > > Do you have any suggestions? Thank you very much!
> > > > >
> > > > > --
> > > > > Best Regards,
> > > > > Sufeng Niu
> > > > > ECASP lab, ECE department, Illinois Institute of Technology
> > > > > Tel: 312-731-7219
> > > > >
> > > > > _______________________________________________
> > > > > discuss mailing list discuss at mpich.org
> > > > > To manage subscription options or unsubscribe:
> > > > > https://lists.mpich.org/mailman/listinfo/discuss
> > > > >
> > > > -------------- next part --------------
> > > > An HTML attachment was scrubbed...
> > > > URL: <http://lists.mpich.org/pipermail/discuss/attachments/20130710/48c5f337/attachment.html>
> > > >
> > > > ------------------------------
> > > >
> > > > _______________________________________________
> > > > discuss mailing list
> > > > discuss at mpich.org
> > > > https://lists.mpich.org/mailman/listinfo/discuss
> > > >
> > > > End of discuss Digest, Vol 9, Issue 27
> > > > **************************************
> > > >
> > >
> > >
> > >
> > > --
> > > Best Regards,
> > > Sufeng Niu
> > > ECASP lab, ECE department, Illinois Institute of Technology
> > > Tel: 312-731-7219
> > > -------------- next part --------------
> > > An HTML attachment was scrubbed...
> > > URL: <http://lists.mpich.org/pipermail/discuss/attachments/20130710/57a5e76f/attachment.html>
> > >
> > > ------------------------------
> > >
> > > _______________________________________________
> > > discuss mailing list
> > > discuss at mpich.org
> > > https://lists.mpich.org/mailman/listinfo/discuss
> > >
> > > End of discuss Digest, Vol 9, Issue 28
> > > **************************************
> > >
> >
> >
> >
> > --
> > Best Regards,
> > Sufeng Niu
> > ECASP lab, ECE department, Illinois Institute of Technology
> > Tel: 312-731-7219
> > -------------- next part --------------
> > An HTML attachment was scrubbed...
> > URL: <http://lists.mpich.org/pipermail/discuss/attachments/20130710/48296a33/attachment.html>
> > -------------- next part --------------
> > A non-text attachment was scrubbed...
> > Name: Screenshot.png
> > Type: image/png
> > Size: 131397 bytes
> > Desc: not available
> > URL: <http://lists.mpich.org/pipermail/discuss/attachments/20130710/48296a33/attachment.png>
> >
> > ------------------------------
> >
> > _______________________________________________
> > discuss mailing list
> > discuss at mpich.org
> > https://lists.mpich.org/mailman/listinfo/discuss
> >
> > End of discuss Digest, Vol 9, Issue 29
> > **************************************
> >
>
>
>
> --
> Best Regards,
> Sufeng Niu
> ECASP lab, ECE department, Illinois Institute of Technology
> Tel: 312-731-7219
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL: <http://lists.mpich.org/pipermail/discuss/attachments/20130710/7c5cb5bf/attachment.html>
>
> ------------------------------
>
> _______________________________________________
> discuss mailing list
> discuss at mpich.org
> https://lists.mpich.org/mailman/listinfo/discuss
>
> End of discuss Digest, Vol 9, Issue 30
> **************************************
>
--
Best Regards,
Sufeng Niu
ECASP lab, ECE department, Illinois Institute of Technology
Tel: 312-731-7219
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mpich.org/pipermail/discuss/attachments/20130710/2de2b7a5/attachment.html>
More information about the discuss mailing list