[mpich-discuss] MPI_Win_fence failed

Sufeng Niu sniu at hawk.iit.edu
Wed Jul 10 11:30:36 CDT 2013


Hi Jim,

Thanks a lot for your reply. The basic way for me to debug is
barrier+printf; right now I only have an evaluation version of TotalView.
The backtrace from TotalView is shown below. The udp process does the UDP
collection and creates the RMA window, and image_rms uses MPI_Get to access
the window.

There is a segmentation violation, but I don't know why the program stopped
at MPI_Win_fence.
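
For reference, here is a minimal, self-contained sketch of the
fence-synchronized pattern I am trying to implement (the buffer names, the
element count, and the udp_recv placeholder are illustrative only, not my
actual code):

#include <mpi.h>
#include <stdlib.h>

#define NUM_ELEMS 1024              /* placeholder window size */

int main(int argc, char **argv)
{
    int rank, nprocs, iter;
    double *winbuf = NULL;          /* exposed by the master (udp) process */
    double localbuf[NUM_ELEMS];     /* origin buffer for MPI_Get; must stay
                                       valid until the closing fence       */
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    if (rank == 0) {
        /* master: expose the UDP receive buffer through an RMA window */
        MPI_Alloc_mem(NUM_ELEMS * sizeof(double), MPI_INFO_NULL, &winbuf);
        MPI_Win_create(winbuf, NUM_ELEMS * sizeof(double), sizeof(double),
                       MPI_INFO_NULL, MPI_COMM_WORLD, &win);
    } else {
        /* workers expose no memory but still join the window collectively */
        MPI_Win_create(NULL, 0, 1, MPI_INFO_NULL, MPI_COMM_WORLD, &win);
    }

    for (iter = 0; iter < 10; iter++) {
        if (rank == 0) {
            /* udp_recv(winbuf, NUM_ELEMS);   placeholder for the UDP read */
        }

        /* opening fence: makes the master's local writes visible to the
           RMA epoch that follows, so an extra MPI_Barrier should not be
           needed here */
        MPI_Win_fence(0, win);

        if (rank != 0) {
            /* count and displacement must stay within the window exposed
               by rank 0 */
            MPI_Get(localbuf, NUM_ELEMS, MPI_DOUBLE,
                    0, 0, NUM_ELEMS, MPI_DOUBLE, win);
        }

        /* closing fence: the gets complete here, so localbuf and the
           targeted window region must both still be valid */
        MPI_Win_fence(0, win);
    }

    MPI_Win_free(&win);
    if (rank == 0) MPI_Free_mem(winbuf);
    MPI_Finalize();
    return 0;
}

If the real MPI_Get used a count or displacement beyond what the udp
process exposes, or if the origin buffer were no longer valid at the
closing fence, I would expect a crash at that second MPI_Win_fence, which
seems consistent with what I see.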

Thanks a lot!







On Wed, Jul 10, 2013 at 10:12 AM, <discuss-request at mpich.org> wrote:

> Send discuss mailing list submissions to
>         discuss at mpich.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
>         https://lists.mpich.org/mailman/listinfo/discuss
> or, via email, send a message with subject or body 'help' to
>         discuss-request at mpich.org
>
> You can reach the person managing the list at
>         discuss-owner at mpich.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of discuss digest..."
>
>
> Today's Topics:
>
>    1. Re:  MPICH3.0.4 make fails with "No rule to make  target..."
>       (Wesley Bland)
>    2. Re:  Error in MPI_Finalize on a simple ring test  over TCP
>       (Wesley Bland)
>    3.  Restrict number of cores, not threads (Bob Ilgner)
>    4. Re:  Restrict number of cores, not threads (Wesley Bland)
>    6. Re:  Error in MPI_Finalize on a simple ring test over TCP
>       (Thomas Ropars)
>    7.  MPI_Win_fence failed (Sufeng Niu)
>    8. Re:  MPI_Win_fence failed (Jim Dinan)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Wed, 10 Jul 2013 08:29:06 -0500
> From: Wesley Bland <wbland at mcs.anl.gov>
> To: discuss at mpich.org
> Subject: Re: [mpich-discuss] MPICH3.0.4 make fails with "No rule to
>         make    target..."
> Message-ID: <F48FC916-31F7-4F82-95F8-2D6A6C45264F at mcs.anl.gov>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Unfortunately, due to the lack of developer resources and interest, the
> last version of MPICH which was supported on Windows was 1.4.1p. You can
> find that version on the downloads page:
>
> http://www.mpich.org/downloads/
>
> Alternatively, Microsoft maintains a derivative of MPICH which should
> provide the features you need. You can also find a link to that on the
> downloads page above.
>
> Wesley
>
> On Jul 10, 2013, at 1:16 AM, Don Warren <don.warren at gmail.com> wrote:
>
> > Hello,
> >
> > As requested in the installation guide, I'm informing this list of a
> failure to correctly make MPICH3.0.4 on a Win7 system.  The specific error
> encountered is
> > "make[2]: *** No rule to make target
> `/cygdrive/c/FLASH/mpich-3.0.4/src/mpi/romio/Makefile.am', needed by
> `/cygdrive/c/FLASH/mpich-3.0.4/src/mpi/romio/Makefile.in'.  Stop."
> >
> > I have confirmed that both Makefile.am and Makefile.in exist in the
> directory listed.  I'm attaching the c.txt and the m.txt files.
> >
> > Possibly of interest is that the command "make clean" fails at exactly
> the same folder, with exactly the same error message as shown in m.txt and
> above.
> >
> > Any advice you can give would be appreciated.  I'm attempting to get
> FLASH running on my computer, which seems to require MPICH.
> >
> > Regards,
> > Don Warren
> > <config-make-outputs.zip>_______________________________________________
> > discuss mailing list     discuss at mpich.org
> > To manage subscription options or unsubscribe:
> > https://lists.mpich.org/mailman/listinfo/discuss
>
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL: <
> http://lists.mpich.org/pipermail/discuss/attachments/20130710/69b497f1/attachment-0001.html
> >
>
> ------------------------------
>
> Message: 2
> Date: Wed, 10 Jul 2013 08:39:47 -0500
> From: Wesley Bland <wbland at mcs.anl.gov>
> To: discuss at mpich.org
> Subject: Re: [mpich-discuss] Error in MPI_Finalize on a simple ring
>         test    over TCP
> Message-ID: <D5999106-2A75-4091-8B0F-EAFA22880862 at mcs.anl.gov>
> Content-Type: text/plain; charset=us-ascii
>
> The value of previous for rank 0 in your code is -1. MPICH is complaining
> because all of the requests to receive a message from -1 are still pending
> when you try to finalize. You need to make sure that you are receiving from
> valid ranks.
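
A minimal sketch of the wrap-around neighbor computation described above
(assuming the ring runs over MPI_COMM_WORLD and passes a single integer):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, prev, next, token;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    prev = (rank - 1 + size) % size;   /* rank 0 wraps to size-1, never -1 */
    next = (rank + 1) % size;

    /* combined send/recv avoids deadlock regardless of ring size */
    MPI_Sendrecv(&rank, 1, MPI_INT, next, 0,
                 &token, 1, MPI_INT, prev, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    printf("rank %d got token %d from rank %d\n", rank, token, prev);

    MPI_Finalize();
    return 0;
}
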
>
> On Jul 10, 2013, at 7:50 AM, Thomas Ropars <thomas.ropars at epfl.ch> wrote:
>
> > Yes sure. Here it is.
> >
> > Thomas
> >
> > On 07/10/2013 02:23 PM, Wesley Bland wrote:
> >> Can you send us the smallest chunk of code that still exhibits this
> error?
> >>
> >> Wesley
> >>
> >> On Jul 10, 2013, at 6:54 AM, Thomas Ropars <thomas.ropars at epfl.ch>
> wrote:
> >>
> >>> Hi all,
> >>>
> >>> I get the following error when I try to run a simple application
> implementing a ring (each process sends to rank+1 and receives from
> rank-1). More precisely, the error occurs during the call to MPI_Finalize():
> >>>
> >>> Assertion failed in file
> src/mpid/ch3/channels/nemesis/netmod/tcp/socksm.c at line 363: sc->pg_is_set
> >>> internal ABORT - process 0
> >>>
> >>> Has anybody else noticed the same error?
> >>>
> >>> Here are all the details about my test:
> >>> - The error is generated with mpich-3.0.2 (but I noticed the exact
> same error with mpich-3.0.4)
> >>> - I am using IPoIB for communication between nodes (The same thing
> happens over Ethernet)
> >>> - The problem comes from TCP links. When all processes are on the same
> node, there is no error. As soon as one process is on a remote node, the
> failure occurs.
> >>> - Note also that the failure does not occur if I run a more complex
> code (eg, a NAS benchmark).
> >>>
> >>> Thomas Ropars
> >>> _______________________________________________
> >>> discuss mailing list     discuss at mpich.org
> >>> To manage subscription options or unsubscribe:
> >>> https://lists.mpich.org/mailman/listinfo/discuss
> >> _______________________________________________
> >> discuss mailing list     discuss at mpich.org
> >> To manage subscription options or unsubscribe:
> >> https://lists.mpich.org/mailman/listinfo/discuss
> >>
> >>
> >
> > <ring_clean.c>_______________________________________________
> > discuss mailing list     discuss at mpich.org
> > To manage subscription options or unsubscribe:
> > https://lists.mpich.org/mailman/listinfo/discuss
>
>
>
> ------------------------------
>
> Message: 3
> Date: Wed, 10 Jul 2013 16:41:27 +0200
> From: Bob Ilgner <bobilgner at gmail.com>
> To: mpich-discuss at mcs.anl.gov
> Subject: [mpich-discuss] Restrict number of cores, not threads
> Message-ID:
>         <
> CAKv15b-QgmHkVkoiTFmP3EZXvyy6sc_QeqHQgbMUhnr3Xh9ecA at mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Dear all,
>
> I am working on a shared memory processor with 256 cores. I am working from
> the command line directly.
>
> Can I restrict the number of cores that I deploy? The command
>
> mpirun -n 100 myprog
>
>
> will automatically start on 100 cores. I wish to use only 10 cores and have
> 10 threads on each core. Can I do this with mpich? Remember that this is an
> SMP and I cannot identify each core individually (as in a cluster).
>
> Regards, bob
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL: <
> http://lists.mpich.org/pipermail/discuss/attachments/20130710/ec659e91/attachment-0001.html
> >
>
> ------------------------------
>
> Message: 4
> Date: Wed, 10 Jul 2013 09:46:38 -0500
> From: Wesley Bland <wbland at mcs.anl.gov>
> To: discuss at mpich.org
> Cc: mpich-discuss at mcs.anl.gov
> Subject: Re: [mpich-discuss] Restrict number of cores, not threads
> Message-ID: <2FAF588E-2FBE-45E4-B53F-E6BC931E3D51 at mcs.anl.gov>
> Content-Type: text/plain; charset=iso-8859-1
>
> Threads in MPI are not ranks. When you say you want to launch with -n 100,
> you will always get 100 processes, not threads. If you want 10 threads on
> 10 cores, you will need to launch with -n 10, then add your threads
> according to your threading library.
>
> Note that threads in MPI do not get their own rank currently. They all
> share the same rank as the process in which they reside, so if you need to
> be able to handle things with different ranks, you'll need to use actual
> processes.
>
> Wesley
>
> On Jul 10, 2013, at 9:41 AM, Bob Ilgner <bobilgner at gmail.com> wrote:
>
> > Dear all,
> >
> > I am working on a shared memory processor with 256 cores. I am working
> from the command line directly.
> >
> > Can I restrict the number of cores that I deploy? The command
> >
> > mpirun -n 100 myprog
> >
> >
> > will automatically start on 100 cores. I wish to use only 10 cores and
> > have 10 threads on each core. Can I do this with mpich? Remember that
> > this is an SMP and I cannot identify each core individually (as in a cluster).
> >
> > Regards, bob
> > _______________________________________________
> > discuss mailing list     discuss at mpich.org
> > To manage subscription options or unsubscribe:
> > https://lists.mpich.org/mailman/listinfo/discuss
>
>
>
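
A minimal hybrid sketch along the lines Wesley suggests, assuming OpenMP as
the threading library and a launch such as "mpiexec -n 10 ./myprog" (built
with something like "mpicc -fopenmp"); how the 10 threads of each process
map onto cores is then a matter of process/thread binding:

#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank;

    /* FUNNELED: only the main thread of each process makes MPI calls */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* 10 threads per process; they all share this process's MPI rank */
    #pragma omp parallel num_threads(10)
    {
        int tid = omp_get_thread_num();
        printf("rank %d, thread %d\n", rank, tid);
        /* per-thread compute work goes here; MPI calls stay on thread 0 */
    }

    MPI_Finalize();
    return 0;
}
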
> ------------------------------
>
> Message: 6
> Date: Wed, 10 Jul 2013 16:50:36 +0200
> From: Thomas Ropars <thomas.ropars at epfl.ch>
> To: discuss at mpich.org
> Subject: Re: [mpich-discuss] Error in MPI_Finalize on a simple ring
>         test over TCP
> Message-ID: <51DD74BC.3020009 at epfl.ch>
> Content-Type: text/plain; charset=UTF-8; format=flowed
>
> Yes, you are right, sorry for the noise.
>
> On 07/10/2013 03:39 PM, Wesley Bland wrote:
> > The value of previous for rank 0 in your code is -1. MPICH is
> complaining because all of the requests to receive a message from -1 are
> still pending when you try to finalize. You need to make sure that you are
> receiving from valid ranks.
> >
> > On Jul 10, 2013, at 7:50 AM, Thomas Ropars <thomas.ropars at epfl.ch>
> wrote:
> >
> >> Yes sure. Here it is.
> >>
> >> Thomas
> >>
> >> On 07/10/2013 02:23 PM, Wesley Bland wrote:
> >>> Can you send us the smallest chunk of code that still exhibits this
> error?
> >>>
> >>> Wesley
> >>>
> >>> On Jul 10, 2013, at 6:54 AM, Thomas Ropars <thomas.ropars at epfl.ch>
> wrote:
> >>>
> >>>> Hi all,
> >>>>
> >>>> I get the following error when I try to run a simple application
> implementing a ring (each process sends to rank+1 and receives from
> rank-1). More precisely, the error occurs during the call to MPI_Finalize():
> >>>>
> >>>> Assertion failed in file
> src/mpid/ch3/channels/nemesis/netmod/tcp/socksm.c at line 363: sc->pg_is_set
> >>>> internal ABORT - process 0
> >>>>
> >>>> Has anybody else noticed the same error?
> >>>>
> >>>> Here are all the details about my test:
> >>>> - The error is generated with mpich-3.0.2 (but I noticed the exact
> same error with mpich-3.0.4)
> >>>> - I am using IPoIB for communication between nodes (The same thing
> happens over Ethernet)
> >>>> - The problem comes from TCP links. When all processes are on the
> same node, there is no error. As soon as one process is on a remote node,
> the failure occurs.
> >>>> - Note also that the failure does not occur if I run a more complex
> code (eg, a NAS benchmark).
> >>>>
> >>>> Thomas Ropars
> >>>> _______________________________________________
> >>>> discuss mailing list     discuss at mpich.org
> >>>> To manage subscription options or unsubscribe:
> >>>> https://lists.mpich.org/mailman/listinfo/discuss
> >>> _______________________________________________
> >>> discuss mailing list     discuss at mpich.org
> >>> To manage subscription options or unsubscribe:
> >>> https://lists.mpich.org/mailman/listinfo/discuss
> >>>
> >>>
> >> <ring_clean.c>_______________________________________________
> >> discuss mailing list     discuss at mpich.org
> >> To manage subscription options or unsubscribe:
> >> https://lists.mpich.org/mailman/listinfo/discuss
> > _______________________________________________
> > discuss mailing list     discuss at mpich.org
> > To manage subscription options or unsubscribe:
> > https://lists.mpich.org/mailman/listinfo/discuss
> >
> >
>
>
>
> ------------------------------
>
> Message: 7
> Date: Wed, 10 Jul 2013 10:07:21 -0500
> From: Sufeng Niu <sniu at hawk.iit.edu>
> To: discuss at mpich.org
> Subject: [mpich-discuss] MPI_Win_fence failed
> Message-ID:
>         <
> CAFNNHkz_1gC7hfpx0G9j24adO-gDabdmwZ4VuT6jip-fDMhS9A at mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Hello,
>
> I used MPI RMA in my program, but the program stops at MPI_Win_fence. I
> have a master process that receives data from a UDP socket; the other
> processes use MPI_Get to access the data.
>
> master process:
>
> MPI_Win_create(...)
> for(...){
> /* udp recv operation */
>
> MPI_Barrier(...)  // to let the other processes know the data received from udp is ready
>
> MPI_Win_fence(0, win);
> MPI_Win_fence(0, win);
>
> }
>
> other processes:
>
> for(...){
>
> MPI_Barrier(...)  // sync for udp data ready
>
> MPI_Win_fence(0, win);
>
> MPI_Get();
>
> MPI_Win_fence(0, win);  <-- program stopped here
>
> /* other operation */
> }
>
> I found that the program stopped at the second MPI_Win_fence; the terminal
> output is:
>
>
>
> ===================================================================================
> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> =   EXIT CODE: 11
> =   CLEANING UP REMAINING PROCESSES
> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>
> ===================================================================================
> YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault
> (signal 11)
> This typically refers to a problem with your application.
> Please see the FAQ page for debugging suggestions
>
> Do you have any suggestions? Thank you very much!
>
> --
> Best Regards,
> Sufeng Niu
> ECASP lab, ECE department, Illinois Institute of Technology
> Tel: 312-731-7219
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL: <
> http://lists.mpich.org/pipermail/discuss/attachments/20130710/375a95ac/attachment-0001.html
> >
>
> ------------------------------
>
> Message: 8
> Date: Wed, 10 Jul 2013 11:12:45 -0400
> From: Jim Dinan <james.dinan at gmail.com>
> To: discuss at mpich.org
> Subject: Re: [mpich-discuss] MPI_Win_fence failed
> Message-ID:
>         <CAOoEU4F3hX=y3yrJKYKucNeiueQYBeR_3OQas9E+mg+GM6Rz=
> w at mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> It's hard to tell where the segmentation fault is coming from.  Can you use
> a debugger to generate a backtrace?
>
>  ~Jim.
>
>
> On Wed, Jul 10, 2013 at 11:07 AM, Sufeng Niu <sniu at hawk.iit.edu> wrote:
>
> > Hello,
> >
> > I used MPI RMA in my program, but the program stops at MPI_Win_fence. I
> > have a master process that receives data from a UDP socket; the other
> > processes use MPI_Get to access the data.
> >
> > master process:
> >
> > MPI_Win_create(...)
> > for(...){
> > /* udp recv operation */
> >
> > MPI_Barrier  // to let other process know data received from udp is ready
> >
> > MPI_Win_fence(0, win);
> > MPI_Win_fence(0, win);
> >
> > }
> >
> > other processes:
> >
> > for(...){
> >
> > MPI_Barrier  // sync for udp data ready
> >
> > MPI_Win_fence(0, win);
> >
> > MPI_Get();
> >
> > MPI_Win_fence(0, win);  <-- program stopped here
> >
> > /* other operation */
> > }
> >
> > I found that the program stopped at the second MPI_Win_fence; the terminal
> > output is:
> >
> >
> >
> >
> ===================================================================================
> > =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> > =   EXIT CODE: 11
> > =   CLEANING UP REMAINING PROCESSES
> > =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> >
> >
> ===================================================================================
> > YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault
> > (signal 11)
> > This typically refers to a problem with your application.
> > Please see the FAQ page for debugging suggestions
> >
> > Do you have any suggestions? Thank you very much!
> >
> > --
> > Best Regards,
> > Sufeng Niu
> > ECASP lab, ECE department, Illinois Institute of Technology
> > Tel: 312-731-7219
> >
> > _______________________________________________
> > discuss mailing list     discuss at mpich.org
> > To manage subscription options or unsubscribe:
> > https://lists.mpich.org/mailman/listinfo/discuss
> >
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL: <
> http://lists.mpich.org/pipermail/discuss/attachments/20130710/48c5f337/attachment.html
> >
>
> ------------------------------
>
> _______________________________________________
> discuss mailing list
> discuss at mpich.org
> https://lists.mpich.org/mailman/listinfo/discuss
>
> End of discuss Digest, Vol 9, Issue 27
> **************************************
>



-- 
Best Regards,
Sufeng Niu
ECASP lab, ECE department, Illinois Institute of Technology
Tel: 312-731-7219
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mpich.org/pipermail/discuss/attachments/20130710/57a5e76f/attachment.html>

