[mpich-discuss] MPI_Win_fence failed

Jeff Hammond jeff.science at gmail.com
Wed Jul 10 12:05:09 CDT 2013


Use Dropbox, Pastebin, etc. for attachments.  It makes life a lot
easier for everyone.

Jeff

On Wed, Jul 10, 2013 at 11:57 AM, Sufeng Niu <sniu at hawk.iit.edu> wrote:
> Sorry, I found that this discussion list does not accept figures or attachments.
>
> The backtrace information is below:
>
> Processes                Location                        PC            Host           Rank  ID          Status
> 7                        _start                          0x00402399
> `-7                      __libc_start_main               0x3685c1ecdd
>    `-7                   main                            0x00402474
>       `-7                dkm                             ...
>         |-6              image_rms                       0x004029bb
>         | `-6            rms                             0x00402d44
>         |   `-6          PMPI_Win_fence                  0x0040c389
>         |     `-6        MPIDI_Win_fence                 0x004a45f4
>         |       `-6      MPIDI_CH3I_RMAListComplete      0x004a27d3
>         |         `-6    MPIDI_CH3I_Progress             ...
>         `-1              udp                             0x004035cf
>           `-1            PMPI_Win_fence                  0x0040c389
>             `-1          MPIDI_Win_fence                 0x004a45a0
>               `-1        MPIDI_CH3I_Progress             0x004292f5
>                 `-1      MPIDI_CH3_PktHandler_Get        0x0049f3f9
>                   `-1    MPIDI_CH3_iSendv                0x004aa67c
>                     `-   memcpy                          0x3685c89329  164.54.54.122  0     20.1-13994  Stopped
>
>
>
> On Wed, Jul 10, 2013 at 11:39 AM, <discuss-request at mpich.org> wrote:
>>
>>
>> Message: 1
>> Date: Wed, 10 Jul 2013 11:39:39 -0500
>>
>> From: Sufeng Niu <sniu at hawk.iit.edu>
>> To: discuss at mpich.org
>> Subject: Re: [mpich-discuss] MPI_Win_fence failed
>> Message-ID:
>>
>> <CAFNNHkz8pBfX33icn=+3rdXvqDfWqeu58odpd=mOXLciysHgfg at mail.gmail.com>
>> Content-Type: text/plain; charset="iso-8859-1"
>>
>>
>> Sorry, I forgot to add the screenshot of the backtrace; it is attached.
>>
>> Thanks a lot!
>>
>> Sufeng
>>
>>
>>
>> On Wed, Jul 10, 2013 at 11:30 AM, <discuss-request at mpich.org> wrote:
>>
>> >
>> > Message: 1
>> > Date: Wed, 10 Jul 2013 11:30:36 -0500
>> > From: Sufeng Niu <sniu at hawk.iit.edu>
>> > To: discuss at mpich.org
>> > Subject: Re: [mpich-discuss] MPI_Win_fence failed
>> > Message-ID:
>> >         <
>> > CAFNNHkyLj8CbYMmc_w2DA9_+q2Oe3kyus+g6c99ShPk6ZXVkdA at mail.gmail.com>
>> > Content-Type: text/plain; charset="iso-8859-1"
>> >
>> > Hi Jim,
>> >
>> > Thanks a lot for your reply. My basic way of debugging is barrier + printf;
>> > right now I only have an evaluation version of TotalView. The backtrace from
>> > TotalView is shown below. The udp function does the UDP collection and
>> > creates the RMA window; image_rms does MPI_Get to access the window.
>> >
>> > There is a segmentation violation, but I don't know why the program stopped
>> > at MPI_Win_fence.
>> >
>> > Thanks a lot!
>> >
>> > On Wed, Jul 10, 2013 at 10:12 AM, <discuss-request at mpich.org> wrote:
>> >
>> > >
>> > >
>> > > Today's Topics:
>> > >
>> > >    1. Re:  MPICH3.0.4 make fails with "No rule to make  target..."
>> > >       (Wesley Bland)
>> > >    2. Re:  Error in MPI_Finalize on a simple ring test  over TCP
>> > >       (Wesley Bland)
>> > >    3.  Restrict number of cores, not threads (Bob Ilgner)
>> > >    4. Re:  Restrict number of cores, not threads (Wesley Bland)
>> > >    6. Re:  Error in MPI_Finalize on a simple ring test over TCP
>> > >       (Thomas Ropars)
>> > >    7.  MPI_Win_fence failed (Sufeng Niu)
>> > >    8. Re:  MPI_Win_fence failed (Jim Dinan)
>> > >
>> > >
>> > > ----------------------------------------------------------------------
>> > >
>> > > Message: 1
>> > > Date: Wed, 10 Jul 2013 08:29:06 -0500
>> > > From: Wesley Bland <wbland at mcs.anl.gov>
>> > > To: discuss at mpich.org
>> > > Subject: Re: [mpich-discuss] MPICH3.0.4 make fails with "No rule to
>> > >         make    target..."
>> > > Message-ID: <F48FC916-31F7-4F82-95F8-2D6A6C45264F at mcs.anl.gov>
>> > > Content-Type: text/plain; charset="iso-8859-1"
>> > >
>> > > Unfortunately, due to the lack of developer resources and interest,
>> > > the
>> > > last version of MPICH which was supported on Windows was 1.4.1p. You
>> > > can
>> > > find that version on the downloads page:
>> > >
>> > > http://www.mpich.org/downloads/
>> > >
>> > > Alternatively, Microsoft maintains a derivative of MPICH which should
>> > > provide the features you need. You can also find a link to that on the
>> > > downloads page above.
>> > >
>> > > Wesley
>> > >
>> > > On Jul 10, 2013, at 1:16 AM, Don Warren <don.warren at gmail.com> wrote:
>> > >
>> > > > Hello,
>> > > >
>> > > > As requested in the installation guide, I'm informing this list of a
>> > > failure to correctly make MPICH3.0.4 on a Win7 system.  The specific
>> > error
>> > > encountered is
>> > > > "make[2]: *** No rule to make target
>> > > `/cygdrive/c/FLASH/mpich-3.0.4/src/mpi/romio/Makefile.am', needed by
>> > > `/cygdrive/c/FLASH/mpich-3.0.4/src/mpi/romio/Makefile.in'.  Stop."
>> > > >
>> > > > I have confirmed that both Makefile.am and Makefile.in exist in the
>> > > directory listed.  I'm attaching the c.txt and the m.txt files.
>> > > >
>> > > > Possibly of interest is that the command "make clean" fails at
>> > > > exactly
>> > > the same folder, with exactly the same error message as shown in m.txt
>> > and
>> > > above.
>> > > >
>> > > > Any advice you can give would be appreciated.  I'm attempting to get
>> > > FLASH running on my computer, which seems to require MPICH.
>> > > >
>> > > > Regards,
>> > > > Don Warren
>> > > >
>> > <config-make-outputs.zip>_______________________________________________
>> > > > discuss mailing list     discuss at mpich.org
>> > > > To manage subscription options or unsubscribe:
>> > > > https://lists.mpich.org/mailman/listinfo/discuss
>> > >
>> > > ------------------------------
>> > >
>> > > Message: 2
>> > > Date: Wed, 10 Jul 2013 08:39:47 -0500
>> > > From: Wesley Bland <wbland at mcs.anl.gov>
>> > > To: discuss at mpich.org
>> > > Subject: Re: [mpich-discuss] Error in MPI_Finalize on a simple ring
>> > >         test    over TCP
>> > > Message-ID: <D5999106-2A75-4091-8B0F-EAFA22880862 at mcs.anl.gov>
>> > > Content-Type: text/plain; charset=us-ascii
>> > >
>> > > The value of previous for rank 0 in your code is -1. MPICH is complaining
>> > > because all of the requests to receive a message from -1 are still pending
>> > > when you try to finalize. You need to make sure that you are receiving
>> > > from valid ranks.
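>> > >
>> > > For example, a minimal sketch of a ring that wraps the neighbor indices
>> > > correctly (hypothetical code, not the attached ring_clean.c) would be:
>> > >
>> > > #include <mpi.h>
>> > > #include <stdio.h>
>> > >
>> > > int main(int argc, char **argv)
>> > > {
>> > >     int rank, size, token;
>> > >     MPI_Init(&argc, &argv);
>> > >     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>> > >     MPI_Comm_size(MPI_COMM_WORLD, &size);
>> > >
>> > >     /* wrap around so rank 0 receives from size-1 rather than -1 */
>> > >     int next = (rank + 1) % size;
>> > >     int prev = (rank + size - 1) % size;
>> > >
>> > >     /* combined send/receive avoids deadlock on the ring */
>> > >     MPI_Sendrecv(&rank, 1, MPI_INT, next, 0,
>> > >                  &token, 1, MPI_INT, prev, 0,
>> > >                  MPI_COMM_WORLD, MPI_STATUS_IGNORE);
>> > >
>> > >     printf("rank %d received %d from rank %d\n", rank, token, prev);
>> > >     MPI_Finalize();
>> > >     return 0;
>> > > }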
>> > >
>> > > On Jul 10, 2013, at 7:50 AM, Thomas Ropars <thomas.ropars at epfl.ch>
>> > wrote:
>> > >
>> > > > Yes sure. Here it is.
>> > > >
>> > > > Thomas
>> > > >
>> > > > On 07/10/2013 02:23 PM, Wesley Bland wrote:
>> > > >> Can you send us the smallest chunk of code that still exhibits this
>> > > error?
>> > > >>
>> > > >> Wesley
>> > > >>
>> > > >> On Jul 10, 2013, at 6:54 AM, Thomas Ropars <thomas.ropars at epfl.ch>
>> > > wrote:
>> > > >>
>> > > >>> Hi all,
>> > > >>>
>> > > >>> I get the following error when I try to run a simple application
>> > > implementing a ring (each process sends to rank+1 and receives from
>> > > rank-1). More precisely, the error occurs during the call to
>> > MPI_Finalize():
>> > > >>>
>> > > >>> Assertion failed in file
>> > > src/mpid/ch3/channels/nemesis/netmod/tcp/socksm.c at line 363:
>> > sc->pg_is_set
>> > > >>> internal ABORT - process 0
>> > > >>>
>> > > >>> Has anybody else noticed the same error?
>> > > >>>
>> > > >>> Here are all the details about my test:
>> > > >>> - The error is generated with mpich-3.0.2 (but I noticed the exact
>> > > same error with mpich-3.0.4)
>> > > >>> - I am using IPoIB for communication between nodes (The same thing
>> > > happens over Ethernet)
>> > > >>> - The problem comes from TCP links. When all processes are on the
>> > same
>> > > node, there is no error. As soon as one process is on a remote node,
>> > > the
>> > > failure occurs.
>> > > >>> - Note also that the failure does not occur if I run a more
>> > > >>> complex
>> > > code (eg, a NAS benchmark).
>> > > >>>
>> > > >>> Thomas Ropars
>> > > >>> _______________________________________________
>> > > >>> discuss mailing list     discuss at mpich.org
>> > > >>> To manage subscription options or unsubscribe:
>> > > >>> https://lists.mpich.org/mailman/listinfo/discuss
>> > > >> _______________________________________________
>> > > >> discuss mailing list     discuss at mpich.org
>> > > >> To manage subscription options or unsubscribe:
>> > > >> https://lists.mpich.org/mailman/listinfo/discuss
>> > > >>
>> > > >>
>> > > >
>> > > > <ring_clean.c>_______________________________________________
>> > > > discuss mailing list     discuss at mpich.org
>> > > > To manage subscription options or unsubscribe:
>> > > > https://lists.mpich.org/mailman/listinfo/discuss
>> > >
>> > >
>> > >
>> > > ------------------------------
>> > >
>> > > Message: 3
>> > > Date: Wed, 10 Jul 2013 16:41:27 +0200
>> > > From: Bob Ilgner <bobilgner at gmail.com>
>> > > To: mpich-discuss at mcs.anl.gov
>> > > Subject: [mpich-discuss] Restrict number of cores, not threads
>> > > Message-ID:
>> > >         <
>> > > CAKv15b-QgmHkVkoiTFmP3EZXvyy6sc_QeqHQgbMUhnr3Xh9ecA at mail.gmail.com>
>> > > Content-Type: text/plain; charset="iso-8859-1"
>> > >
>> > > Dear all,
>> > >
>> > > I am working on a shared memory processor with 256 cores. I am working
>> > from
>> > > the command line directly.
>> > >
>> > > Can I restrict the number of cores that I deploy? The command
>> > >
>> > > mpirun -n 100 myprog
>> > >
>> > > will automatically start on 100 cores. I wish to use only 10 cores and
>> > > have 10 threads on each core. Can I do this with MPICH? Remember that
>> > > this is an SMP and I cannot identify each core individually (as in a
>> > > cluster).
>> > >
>> > > Regards, bob
>> > > ------------------------------
>> > >
>> > > Message: 4
>> > > Date: Wed, 10 Jul 2013 09:46:38 -0500
>> > > From: Wesley Bland <wbland at mcs.anl.gov>
>> > > To: discuss at mpich.org
>> > > Cc: mpich-discuss at mcs.anl.gov
>> > > Subject: Re: [mpich-discuss] Restrict number of cores, not threads
>> > > Message-ID: <2FAF588E-2FBE-45E4-B53F-E6BC931E3D51 at mcs.anl.gov>
>> > > Content-Type: text/plain; charset=iso-8859-1
>> > >
>> > > Threads in MPI are not ranks. When you say you want to launch with -n 100,
>> > > you will always get 100 processes, not threads. If you want 10 threads on
>> > > 10 cores, you will need to launch with -n 10, then add your threads
>> > > according to your threading library.
>> > >
>> > > Note that threads in MPI do not get their own rank currently. They all
>> > > share the same rank as the process in which they reside, so if you need to
>> > > be able to handle things with different ranks, you'll need to use actual
>> > > processes.
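>> > >
>> > > For example, a minimal sketch of the 10-processes-by-10-threads layout,
>> > > assuming OpenMP as the threading library (hypothetical code, launched
>> > > with something like "mpirun -n 10 ./myprog"):
>> > >
>> > > #include <mpi.h>
>> > > #include <omp.h>
>> > > #include <stdio.h>
>> > >
>> > > int main(int argc, char **argv)
>> > > {
>> > >     int provided, rank;
>> > >
>> > >     /* request thread support; every thread shares its process's rank */
>> > >     MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
>> > >     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>> > >
>> > >     #pragma omp parallel num_threads(10)
>> > >     {
>> > >         /* per-thread work goes here; with MPI_THREAD_FUNNELED only
>> > >            the main thread may make MPI calls */
>> > >         printf("rank %d, thread %d\n", rank, omp_get_thread_num());
>> > >     }
>> > >
>> > >     MPI_Finalize();
>> > >     return 0;
>> > > }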
>> > >
>> > > Wesley
>> > >
>> > > On Jul 10, 2013, at 9:41 AM, Bob Ilgner <bobilgner at gmail.com> wrote:
>> > >
>> > > > Dear all,
>> > > >
>> > > > I am working on a shared memory processor with 256 cores. I am
>> > > > working
>> > > from the command line directly.
>> > > >
>> > > > Can I restrict the number of cores that I deploy? The command
>> > > >
>> > > > mpirun -n 100 myprog
>> > > >
>> > > > will automatically start on 100 cores. I wish to use only 10 cores and
>> > > > have 10 threads on each core. Can I do this with MPICH? Remember that
>> > > > this is an SMP and I cannot identify each core individually (as in a
>> > > > cluster).
>> > > >
>> > > > Regards, bob
>> > > > _______________________________________________
>> > > > discuss mailing list     discuss at mpich.org
>> > > > To manage subscription options or unsubscribe:
>> > > > https://lists.mpich.org/mailman/listinfo/discuss
>> > >
>> > >
>> > >
>> > > ------------------------------
>> > >
>> > > Message: 6
>> > > Date: Wed, 10 Jul 2013 16:50:36 +0200
>> > > From: Thomas Ropars <thomas.ropars at epfl.ch>
>> > > To: discuss at mpich.org
>> > > Subject: Re: [mpich-discuss] Error in MPI_Finalize on a simple ring
>> > >         test over TCP
>> > > Message-ID: <51DD74BC.3020009 at epfl.ch>
>> > > Content-Type: text/plain; charset=UTF-8; format=flowed
>> > >
>> > > Yes, you are right; sorry for the disturbance.
>> > >
>> > > On 07/10/2013 03:39 PM, Wesley Bland wrote:
>> > > > The value of previous for rank 0 in your code is -1. MPICH is
>> > > complaining because all of the requests to receive a message from -1
>> > > are
>> > > still pending when you try to finalize. You need to make sure that you
>> > are
>> > > receiving from valid ranks.
>> > > >
>> > > > On Jul 10, 2013, at 7:50 AM, Thomas Ropars <thomas.ropars at epfl.ch>
>> > > wrote:
>> > > >
>> > > >> Yes sure. Here it is.
>> > > >>
>> > > >> Thomas
>> > > >>
>> > > >> On 07/10/2013 02:23 PM, Wesley Bland wrote:
>> > > >>> Can you send us the smallest chunk of code that still exhibits
>> > > >>> this
>> > > error?
>> > > >>>
>> > > >>> Wesley
>> > > >>>
>> > > >>> On Jul 10, 2013, at 6:54 AM, Thomas Ropars <thomas.ropars at epfl.ch>
>> > > wrote:
>> > > >>>
>> > > >>>> Hi all,
>> > > >>>>
>> > > >>>> I get the following error when I try to run a simple application
>> > > implementing a ring (each process sends to rank+1 and receives from
>> > > rank-1). More precisely, the error occurs during the call to
>> > MPI_Finalize():
>> > > >>>>
>> > > >>>> Assertion failed in file
>> > > src/mpid/ch3/channels/nemesis/netmod/tcp/socksm.c at line 363:
>> > sc->pg_is_set
>> > > >>>> internal ABORT - process 0
>> > > >>>>
>> > > >>>> Has anybody else noticed the same error?
>> > > >>>>
>> > > >>>> Here are all the details about my test:
>> > > >>>> - The error is generated with mpich-3.0.2 (but I noticed the
>> > > >>>> exact
>> > > same error with mpich-3.0.4)
>> > > >>>> - I am using IPoIB for communication between nodes (The same
>> > > >>>> thing
>> > > happens over Ethernet)
>> > > >>>> - The problem comes from TCP links. When all processes are on the
>> > > same node, there is no error. As soon as one process is on a remote
>> > > node,
>> > > the failure occurs.
>> > > >>>> - Note also that the failure does not occur if I run a more
>> > > >>>> complex
>> > > code (eg, a NAS benchmark).
>> > > >>>>
>> > > >>>> Thomas Ropars
>> > > >>>> _______________________________________________
>> > > >>>> discuss mailing list     discuss at mpich.org
>> > > >>>> To manage subscription options or unsubscribe:
>> > > >>>> https://lists.mpich.org/mailman/listinfo/discuss
>> > > >>> _______________________________________________
>> > > >>> discuss mailing list     discuss at mpich.org
>> > > >>> To manage subscription options or unsubscribe:
>> > > >>> https://lists.mpich.org/mailman/listinfo/discuss
>> > > >>>
>> > > >>>
>> > > >> <ring_clean.c>_______________________________________________
>> > > >> discuss mailing list     discuss at mpich.org
>> > > >> To manage subscription options or unsubscribe:
>> > > >> https://lists.mpich.org/mailman/listinfo/discuss
>> > > > _______________________________________________
>> > > > discuss mailing list     discuss at mpich.org
>> > > > To manage subscription options or unsubscribe:
>> > > > https://lists.mpich.org/mailman/listinfo/discuss
>> > > >
>> > > >
>> > >
>> > >
>> > >
>> > > ------------------------------
>> > >
>> > > Message: 7
>> > > Date: Wed, 10 Jul 2013 10:07:21 -0500
>> > > From: Sufeng Niu <sniu at hawk.iit.edu>
>> > > To: discuss at mpich.org
>> > > Subject: [mpich-discuss] MPI_Win_fence failed
>> > > Message-ID:
>> > >         <
>> > > CAFNNHkz_1gC7hfpx0G9j24adO-gDabdmwZ4VuT6jip-fDMhS9A at mail.gmail.com>
>> > > Content-Type: text/plain; charset="iso-8859-1"
>> > >
>> > > Hello,
>> > >
>> > > I used MPI RMA in my program, but the program stops at MPI_Win_fence. I
>> > > have a master process that receives data from a UDP socket; the other
>> > > processes use MPI_Get to access the data.
>> > >
>> > > master process:
>> > >
>> > > MPI_Create(...)
>> > > for(...){
>> > > /* udp recv operation */
>> > >
>> > > MPI_Barrier  // to let other process know data received from udp is
>> > > ready
>> > >
>> > > MPI_Win_fence(0, win);
>> > > MPI_Win_fence(0, win);
>> > >
>> > > }
>> > >
>> > > other processes:
>> > >
>> > > for(...){
>> > >
>> > > MPI_Barrier  // sync for udp data ready
>> > >
>> > > MPI_Win_fence(0, win);
>> > >
>> > > MPI_Get();
>> > >
>> > > MPI_Win_fence(0, win);  <-- program stopped here
>> > >
>> > > /* other operation */
>> > > }
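>> > >
>> > > Written out more completely (hypothetical buffer names and sizes), the
>> > > pattern I am trying to implement is roughly:
>> > >
>> > > #include <mpi.h>
>> > >
>> > > #define NELEM 1024
>> > >
>> > > int main(int argc, char **argv)
>> > > {
>> > >     int rank, nprocs;
>> > >     unsigned int *buf;
>> > >     MPI_Win win;
>> > >
>> > >     MPI_Init(&argc, &argv);
>> > >     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>> > >     MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
>> > >
>> > >     /* every process exposes NELEM unsigned ints; rank 0 fills its
>> > >        buffer from the UDP socket */
>> > >     MPI_Alloc_mem(NELEM * sizeof(unsigned int), MPI_INFO_NULL, &buf);
>> > >     MPI_Win_create(buf, NELEM * sizeof(unsigned int), sizeof(unsigned int),
>> > >                    MPI_INFO_NULL, MPI_COMM_WORLD, &win);
>> > >
>> > >     for (int iter = 0; iter < 10; iter++) {
>> > >         if (rank == 0) {
>> > >             /* udp recv operation fills buf here */
>> > >         }
>> > >         MPI_Barrier(MPI_COMM_WORLD);      /* udp data ready */
>> > >
>> > >         MPI_Win_fence(0, win);            /* open the access epoch */
>> > >         if (rank != 0) {
>> > >             unsigned int local[NELEM];
>> > >             /* target displacement and count must stay inside rank 0's
>> > >                exposed window */
>> > >             MPI_Get(local, NELEM, MPI_UNSIGNED, 0, 0, NELEM, MPI_UNSIGNED,
>> > >                     win);
>> > >         }
>> > >         MPI_Win_fence(0, win);            /* complete the Gets */
>> > >     }
>> > >
>> > >     MPI_Win_free(&win);
>> > >     MPI_Free_mem(buf);
>> > >     MPI_Finalize();
>> > >     return 0;
>> > > }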
>> > >
>> > > I found that the program stopped at the second MPI_Win_fence; the
>> > > terminal output is:
>> > >
>> > >
>> > >
>> > >
>> >
>> > ===================================================================================
>> > > =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>> > > =   EXIT CODE: 11
>> > > =   CLEANING UP REMAINING PROCESSES
>> > > =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>> > >
>> > >
>> >
>> > ===================================================================================
>> > > YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault
>> > > (signal 11)
>> > > This typically refers to a problem with your application.
>> > > Please see the FAQ page for debugging suggestions
>> > >
>> > > Do you have any suggestions? Thank you very much!
>> > >
>> > > --
>> > > Best Regards,
>> > > Sufeng Niu
>> > > ECASP lab, ECE department, Illinois Institute of Technology
>> > > Tel: 312-731-7219
>> > > ------------------------------
>> > >
>> > > Message: 8
>> > > Date: Wed, 10 Jul 2013 11:12:45 -0400
>> > > From: Jim Dinan <james.dinan at gmail.com>
>> > > To: discuss at mpich.org
>> > > Subject: Re: [mpich-discuss] MPI_Win_fence failed
>> > > Message-ID:
>> > >         <CAOoEU4F3hX=y3yrJKYKucNeiueQYBeR_3OQas9E+mg+GM6Rz=
>> > > w at mail.gmail.com>
>> > > Content-Type: text/plain; charset="iso-8859-1"
>> > >
>> > > It's hard to tell where the segmentation fault is coming from.  Can you
>> > > use a debugger to generate a backtrace?
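>> > >
>> > > If you don't have a parallel debugger handy, one quick glibc-specific
>> > > trick (a diagnostic hack, not strictly async-signal-safe) is to install
>> > > a handler that dumps a raw backtrace when a rank gets SIGSEGV:
>> > >
>> > > #include <execinfo.h>
>> > > #include <signal.h>
>> > > #include <unistd.h>
>> > >
>> > > /* print the call stack of the faulting rank to stderr, then exit */
>> > > static void segv_handler(int sig)
>> > > {
>> > >     void *frames[64];
>> > >     int n = backtrace(frames, 64);
>> > >     (void)sig;  /* unused */
>> > >     backtrace_symbols_fd(frames, n, STDERR_FILENO);
>> > >     _exit(1);
>> > > }
>> > >
>> > > /* call once in each process, right after MPI_Init */
>> > > void install_segv_backtrace(void)
>> > > {
>> > >     signal(SIGSEGV, segv_handler);
>> > > }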
>> > >
>> > >  ~Jim.
>> > >
>> > >
>> > > On Wed, Jul 10, 2013 at 11:07 AM, Sufeng Niu <sniu at hawk.iit.edu>
>> > > wrote:
>> > >
>> > > > Hello,
>> > > >
>> > > > I used MPI RMA in my program, but the program stops at MPI_Win_fence. I
>> > > > have a master process that receives data from a UDP socket; the other
>> > > > processes use MPI_Get to access the data.
>> > > >
>> > > > master process:
>> > > >
>> > > > MPI_Create(...)
>> > > > for(...){
>> > > > /* udp recv operation */
>> > > >
>> > > > MPI_Barrier  // to let other process know data received from udp is
>> > ready
>> > > >
>> > > > MPI_Win_fence(0, win);
>> > > > MPI_Win_fence(0, win);
>> > > >
>> > > > }
>> > > >
>> > > > other processes:
>> > > >
>> > > > for(...){
>> > > >
>> > > > MPI_Barrier  // sync for udp data ready
>> > > >
>> > > > MPI_Win_fence(0, win);
>> > > >
>> > > > MPI_Get();
>> > > >
>> > > > MPI_Win_fence(0, win);  <-- program stopped here
>> > > >
>> > > > /* other operation */
>> > > > }
>> > > >
>> > > > I found that the program stopped at second MPI_Win_fence, the
>> > > > terminal
>> > > > output is:
>> > > >
>> > > >
>> > > >
>> > > >
>> > >
>> >
>> > ===================================================================================
>> > > > =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>> > > > =   EXIT CODE: 11
>> > > > =   CLEANING UP REMAINING PROCESSES
>> > > > =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>> > > >
>> > > >
>> > >
>> >
>> > ===================================================================================
>> > > > YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault
>> > > > (signal 11)
>> > > > This typically refers to a problem with your application.
>> > > > Please see the FAQ page for debugging suggestions
>> > > >
>> > > > Do you have any suggestions? Thank you very much!
>> > > >
>> > > > --
>> > > > Best Regards,
>> > > > Sufeng Niu
>> > > > ECASP lab, ECE department, Illinois Institute of Technology
>> > > > Tel: 312-731-7219
>> > > >
>> > > > _______________________________________________
>> > > > discuss mailing list     discuss at mpich.org
>> > > > To manage subscription options or unsubscribe:
>> > > > https://lists.mpich.org/mailman/listinfo/discuss
>> > > >
>> > > ------------------------------
>> > >
>> > > _______________________________________________
>> > > discuss mailing list
>> > > discuss at mpich.org
>> > > https://lists.mpich.org/mailman/listinfo/discuss
>> > >
>> > > End of discuss Digest, Vol 9, Issue 27
>> > > **************************************
>> > >
>> >
>> >
>> >
>> > --
>> > Best Regards,
>> > Sufeng Niu
>> > ECASP lab, ECE department, Illinois Institute of Technology
>> > Tel: 312-731-7219
>> > ------------------------------
>> >
>> > _______________________________________________
>> > discuss mailing list
>> > discuss at mpich.org
>> > https://lists.mpich.org/mailman/listinfo/discuss
>> >
>> > End of discuss Digest, Vol 9, Issue 28
>> > **************************************
>> >
>>
>>
>>
>> --
>> Best Regards,
>> Sufeng Niu
>> ECASP lab, ECE department, Illinois Institute of Technology
>> Tel: 312-731-7219
>> -------------- next part --------------
>> A non-text attachment was scrubbed...
>> Name: Screenshot.png
>> Type: image/png
>> Size: 131397 bytes
>> Desc: not available
>> URL:
>> <http://lists.mpich.org/pipermail/discuss/attachments/20130710/48296a33/attachment.png>
>>
>>
>> ------------------------------
>>
>> _______________________________________________
>> discuss mailing list
>> discuss at mpich.org
>> https://lists.mpich.org/mailman/listinfo/discuss
>>
>> End of discuss Digest, Vol 9, Issue 29
>> **************************************
>
>
>
>
> --
> Best Regards,
> Sufeng Niu
> ECASP lab, ECE department, Illinois Institute of Technology
> Tel: 312-731-7219
>
> _______________________________________________
> discuss mailing list     discuss at mpich.org
> To manage subscription options or unsubscribe:
> https://lists.mpich.org/mailman/listinfo/discuss



-- 
Jeff Hammond
jeff.science at gmail.com


