[mpich-discuss] MPI_Win_fence failed
Jim Dinan
james.dinan at gmail.com
Wed Jul 10 17:11:38 CDT 2013
From that backtrace, it looks like the displacement/datatype that you gave
in the call to MPI_Get() caused the target process to access an invalid
memory location. MPICH does not check that RMA operations stay within the
bounds of the window at the target process. I would start by making sure
that your gets are contained within the window at the target process.
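Something along these lines (just a sketch; the element type, count, and
window size below are placeholders for whatever your code actually uses)
can catch an out-of-range get before it happens:

    /* assumes the target created its window as
       MPI_Win_create(base, num_elems*sizeof(double), sizeof(double),
                      MPI_INFO_NULL, comm, &win);                      */
    if (target_disp < 0 || target_disp + count > num_elems) {
        fprintf(stderr, "rank %d: MPI_Get out of range (disp=%lld, count=%d)\n",
                rank, (long long)target_disp, count);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    MPI_Get(local_buf, count, MPI_DOUBLE,
            target_rank, target_disp, count, MPI_DOUBLE, win);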
~Jim.
On Wed, Jul 10, 2013 at 1:33 PM, Sufeng Niu <sniu at hawk.iit.edu> wrote:
> Hi Jeff
>
> Sorry for sending so many emails and cluttering the discussion list.
>
> I found that the scientific image is too large to upload to GitHub, so I
> put it on the FTP server:
> ftp://ftp.xray.aps.anl.gov/pub/sector8/ -- the file is 55Fe_run5_dark.tif.
>
> Just put the tif file in the same directory as the source code. Sorry again
> for the frequent emails, and thank you so much for your debugging help.
>
> Sufeng
>
>
> On Wed, Jul 10, 2013 at 12:08 PM, <discuss-request at mpich.org> wrote:
>
>> Send discuss mailing list submissions to
>> discuss at mpich.org
>>
>> To subscribe or unsubscribe via the World Wide Web, visit
>> https://lists.mpich.org/mailman/listinfo/discuss
>> or, via email, send a message with subject or body 'help' to
>> discuss-request at mpich.org
>>
>> You can reach the person managing the list at
>> discuss-owner at mpich.org
>>
>> When replying, please edit your Subject line so it is more specific
>> than "Re: Contents of discuss digest..."
>>
>>
>> Today's Topics:
>>
>> 1. Re: MPI_Win_fence failed (Jeff Hammond)
>> 2. Re: MPI_Win_fence failed (Sufeng Niu)
>>
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Wed, 10 Jul 2013 12:05:09 -0500
>> From: Jeff Hammond <jeff.science at gmail.com>
>> To: discuss at mpich.org
>> Subject: Re: [mpich-discuss] MPI_Win_fence failed
>> Message-ID:
>> <CAGKz=
>> uJ-aoHqK5A_tS6YfWaaxjw5AjhHM7xL1A0XaUSUjKvDcQ at mail.gmail.com>
>> Content-Type: text/plain; charset=ISO-8859-1
>>
>> use dropbox, pastebin, etc. for attachments. it makes life a lot
>> easier for everyone.
>>
>> jeff
>>
>> On Wed, Jul 10, 2013 at 11:57 AM, Sufeng Niu <sniu at hawk.iit.edu> wrote:
>> > Sorry, I found that this discussion list does not accept figures or
>> > attachments.
>> >
>> > The backtrace information is below:
>> >
>> >   Processes  Location                    PC            Host           Rank  ID          Status
>> >   7          _start                      0x00402399
>> >   `-7        _libc_start_main            0x3685c1ecdd
>> >   `-7        main                        0x00402474
>> >   `-7        dkm                         ...
>> >   |-6        image_rms                   0x004029bb
>> >   | `-6      rms                         0x00402d44
>> >   | `-6      PMPI_Win_fence              0x0040c389
>> >   | `-6      MPIDI_Win_fence             0x004a45f4
>> >   | `-6      MPIDI_CH3I_RMAListComplete  0x004a27d3
>> >   | `-6      MPIDI_CH3I_Progress         ...
>> >   `-1        udp                         0x004035cf
>> >   `-1        PMPI_Win_fence              0x0040c389
>> >   `-1        MPIDI_Win_fence             0x004a45a0
>> >   `-1        MPIDI_CH3I_Progress         0x004292f5
>> >   `-1        MPIDI_CH3_PktHandler_Get    0x0049f3f9
>> >   `-1        MPIDI_CH3_iSendv            0x004aa67c
>> >   `-         memcpy                      0x3685c89329  164.54.54.122  0     20.1-13994  Stopped
>> >
>> >
>> >
>> > On Wed, Jul 10, 2013 at 11:39 AM, <discuss-request at mpich.org> wrote:
>> >>
>> >> Today's Topics:
>> >>
>> >> 1. Re: MPI_Win_fence failed (Sufeng Niu)
>> >>
>> >>
>> >> ----------------------------------------------------------------------
>> >>
>> >> Message: 1
>> >> Date: Wed, 10 Jul 2013 11:39:39 -0500
>> >>
>> >> From: Sufeng Niu <sniu at hawk.iit.edu>
>> >> To: discuss at mpich.org
>> >> Subject: Re: [mpich-discuss] MPI_Win_fence failed
>> >> Message-ID:
>> >>
>> >> <CAFNNHkz8pBfX33icn=+3rdXvqDfWqeu58odpd=mOXLciysHgfg at mail.gmail.com>
>> >> Content-Type: text/plain; charset="iso-8859-1"
>> >>
>> >>
>> >> Sorry, I forgot to add the screenshot of the backtrace; it is attached
>> >> now.
>> >>
>> >> Thanks a lot!
>> >>
>> >> Sufeng
>> >>
>> >>
>> >>
>> >> On Wed, Jul 10, 2013 at 11:30 AM, <discuss-request at mpich.org> wrote:
>> >>
>> >> > Today's Topics:
>> >> >
>> >> > 1. Re: MPI_Win_fence failed (Sufeng Niu)
>> >> >
>> >> >
>> >> >
>> ----------------------------------------------------------------------
>> >> >
>> >> > Message: 1
>> >> > Date: Wed, 10 Jul 2013 11:30:36 -0500
>> >> > From: Sufeng Niu <sniu at hawk.iit.edu>
>> >> > To: discuss at mpich.org
>> >> > Subject: Re: [mpich-discuss] MPI_Win_fence failed
>> >> > Message-ID:
>> >> > <
>> >> > CAFNNHkyLj8CbYMmc_w2DA9_+q2Oe3kyus+g6c99ShPk6ZXVkdA at mail.gmail.com>
>> >> > Content-Type: text/plain; charset="iso-8859-1"
>> >> >
>> >> > Hi Jim,
>> >> >
>> >> > Thanks a lot for your reply. My usual way of debugging is
>> >> > barrier+printf; right now I only have an evaluation version of
>> >> > TotalView. The backtrace from TotalView is shown below. The udp
>> >> > process does the UDP collection and creates the RMA window, and
>> >> > image_rms does MPI_Get to access the window.
>> >> >
>> >> > There is a segmentation violation, but I don't know why the program
>> >> > stopped at MPI_Win_fence.
>> >> >
>> >> > Thanks a lot!
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > On Wed, Jul 10, 2013 at 10:12 AM, <discuss-request at mpich.org> wrote:
>> >> >
>> >> > > Today's Topics:
>> >> > >
>> >> > > 1. Re: MPICH3.0.4 make fails with "No rule to make target..."
>> >> > > (Wesley Bland)
>> >> > > 2. Re: Error in MPI_Finalize on a simple ring test over TCP
>> >> > > (Wesley Bland)
>> >> > > 3. Restrict number of cores, not threads (Bob Ilgner)
>> >> > > 4. Re: Restrict number of cores, not threads (Wesley Bland)
>> >> > > 5. Re: Restrict number of cores, not threads (Wesley Bland)
>> >> > > 6. Re: Error in MPI_Finalize on a simple ring test over TCP
>> >> > > (Thomas Ropars)
>> >> > > 7. MPI_Win_fence failed (Sufeng Niu)
>> >> > > 8. Re: MPI_Win_fence failed (Jim Dinan)
>> >> > >
>> >> > >
>> >> > >
>> ----------------------------------------------------------------------
>> >> > >
>> >> > > Message: 1
>> >> > > Date: Wed, 10 Jul 2013 08:29:06 -0500
>> >> > > From: Wesley Bland <wbland at mcs.anl.gov>
>> >> > > To: discuss at mpich.org
>> >> > > Subject: Re: [mpich-discuss] MPICH3.0.4 make fails with "No rule to
>> >> > > make target..."
>> >> > > Message-ID: <F48FC916-31F7-4F82-95F8-2D6A6C45264F at mcs.anl.gov>
>> >> > > Content-Type: text/plain; charset="iso-8859-1"
>> >> > >
>> >> > > Unfortunately, due to the lack of developer resources and interest,
>> >> > > the
>> >> > > last version of MPICH which was supported on Windows was 1.4.1p.
>> You
>> >> > > can
>> >> > > find that version on the downloads page:
>> >> > >
>> >> > > http://www.mpich.org/downloads/
>> >> > >
>> >> > > Alternatively, Microsoft maintains a derivative of MPICH which should
>> >> > > provide the features you need. You can also find a link to that on the
>> >> > > downloads page above.
>> >> > >
>> >> > > Wesley
>> >> > >
>> >> > > On Jul 10, 2013, at 1:16 AM, Don Warren <don.warren at gmail.com>
>> wrote:
>> >> > >
>> >> > > > Hello,
>> >> > > >
>> >> > > > As requested in the installation guide, I'm informing this list
>> of a
>> >> > > failure to correctly make MPICH3.0.4 on a Win7 system. The
>> specific
>> >> > error
>> >> > > encountered is
>> >> > > > "make[2]: *** No rule to make target
>> >> > > `/cygdrive/c/FLASH/mpich-3.0.4/src/mpi/romio/Makefile.am', needed
>> by
>> >> > > `/cygdrive/c/FLASH/mpich-3.0.4/src/mpi/romio/Makefile.in'. Stop."
>> >> > > >
>> >> > > > I have confirmed that both Makefile.am and Makefile.in exist in
>> the
>> >> > > directory listed. I'm attaching the c.txt and the m.txt files.
>> >> > > >
>> >> > > > Possibly of interest is that the command "make clean" fails at
>> >> > > > exactly
>> >> > > the same folder, with exactly the same error message as shown in
>> m.txt
>> >> > and
>> >> > > above.
>> >> > > >
>> >> > > > Any advice you can give would be appreciated. I'm attempting to
>> get
>> >> > > FLASH running on my computer, which seems to require MPICH.
>> >> > > >
>> >> > > > Regards,
>> >> > > > Don Warren
>> >> > > >
>> >> >
>> <config-make-outputs.zip>_______________________________________________
>> >> > > > discuss mailing list discuss at mpich.org
>> >> > > > To manage subscription options or unsubscribe:
>> >> > > > https://lists.mpich.org/mailman/listinfo/discuss
>> >> > >
>> >> > >
>> >> > > ------------------------------
>> >> > >
>> >> > > Message: 2
>> >> > > Date: Wed, 10 Jul 2013 08:39:47 -0500
>> >> > > From: Wesley Bland <wbland at mcs.anl.gov>
>> >> > > To: discuss at mpich.org
>> >> > > Subject: Re: [mpich-discuss] Error in MPI_Finalize on a simple ring
>> >> > > test over TCP
>> >> > > Message-ID: <D5999106-2A75-4091-8B0F-EAFA22880862 at mcs.anl.gov>
>> >> > > Content-Type: text/plain; charset=us-ascii
>> >> > >
>> >> > > The value of previous for rank 0 in your code is -1. MPICH is
>> >> > > complaining
>> >> > > because all of the requests to receive a message from -1 are still
>> >> > pending
>> >> > > when you try to finalize. You need to make sure that you are
>> receiving
>> >> > from
>> >> > > valid ranks.
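>> >> > > For a ring, the usual fix is to wrap the neighbor indices, e.g. (sketch
>> >> > > only; variable names are illustrative):
>> >> > >
>> >> > >     int rank, size;
>> >> > >     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>> >> > >     MPI_Comm_size(MPI_COMM_WORLD, &size);
>> >> > >     int next = (rank + 1) % size;         /* rank to send to              */
>> >> > >     int prev = (rank - 1 + size) % size;  /* rank to receive from, not -1 */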
>> >> > >
>> >> > > On Jul 10, 2013, at 7:50 AM, Thomas Ropars <thomas.ropars at epfl.ch>
>> >> > wrote:
>> >> > >
>> >> > > > Yes sure. Here it is.
>> >> > > >
>> >> > > > Thomas
>> >> > > >
>> >> > > > On 07/10/2013 02:23 PM, Wesley Bland wrote:
>> >> > > >> Can you send us the smallest chunk of code that still exhibits
>> this
>> >> > > error?
>> >> > > >>
>> >> > > >> Wesley
>> >> > > >>
>> >> > > >> On Jul 10, 2013, at 6:54 AM, Thomas Ropars <
>> thomas.ropars at epfl.ch>
>> >> > > wrote:
>> >> > > >>
>> >> > > >>> Hi all,
>> >> > > >>>
>> >> > > >>> I get the following error when I try to run a simple
>> application
>> >> > > implementing a ring (each process sends to rank+1 and receives from
>> >> > > rank-1). More precisely, the error occurs during the call to
>> >> > MPI_Finalize():
>> >> > > >>>
>> >> > > >>> Assertion failed in file
>> >> > > src/mpid/ch3/channels/nemesis/netmod/tcp/socksm.c at line 363:
>> >> > sc->pg_is_set
>> >> > > >>> internal ABORT - process 0
>> >> > > >>>
>> >> > > >>> Does anybody else also noticed the same error?
>> >> > > >>>
>> >> > > >>> Here are all the details about my test:
>> >> > > >>> - The error is generated with mpich-3.0.2 (but I noticed the
>> exact
>> >> > > same error with mpich-3.0.4)
>> >> > > >>> - I am using IPoIB for communication between nodes (The same
>> thing
>> >> > > happens over Ethernet)
>> >> > > >>> - The problem comes from TCP links. When all processes are on
>> the
>> >> > same
>> >> > > node, there is no error. As soon as one process is on a remote
>> node,
>> >> > > the
>> >> > > failure occurs.
>> >> > > >>> - Note also that the failure does not occur if I run a more
>> >> > > >>> complex
>> >> > > code (eg, a NAS benchmark).
>> >> > > >>>
>> >> > > >>> Thomas Ropars
>> >> > > >>> _______________________________________________
>> >> > > >>> discuss mailing list discuss at mpich.org
>> >> > > >>> To manage subscription options or unsubscribe:
>> >> > > >>> https://lists.mpich.org/mailman/listinfo/discuss
>> >> > > >> _______________________________________________
>> >> > > >> discuss mailing list discuss at mpich.org
>> >> > > >> To manage subscription options or unsubscribe:
>> >> > > >> https://lists.mpich.org/mailman/listinfo/discuss
>> >> > > >>
>> >> > > >>
>> >> > > >
>> >> > > > <ring_clean.c>_______________________________________________
>> >> > > > discuss mailing list discuss at mpich.org
>> >> > > > To manage subscription options or unsubscribe:
>> >> > > > https://lists.mpich.org/mailman/listinfo/discuss
>> >> > >
>> >> > >
>> >> > >
>> >> > > ------------------------------
>> >> > >
>> >> > > Message: 3
>> >> > > Date: Wed, 10 Jul 2013 16:41:27 +0200
>> >> > > From: Bob Ilgner <bobilgner at gmail.com>
>> >> > > To: mpich-discuss at mcs.anl.gov
>> >> > > Subject: [mpich-discuss] Restrict number of cores, not threads
>> >> > > Message-ID:
>> >> > > <
>> >> > > CAKv15b-QgmHkVkoiTFmP3EZXvyy6sc_QeqHQgbMUhnr3Xh9ecA at mail.gmail.com
>> >
>> >> > > Content-Type: text/plain; charset="iso-8859-1"
>> >> > >
>> >> > > Dear all,
>> >> > >
>> >> > > I am working on a shared memory processor with 256 cores. I am
>> working
>> >> > from
>> >> > > the command line directly.
>> >> > >
>> >> > > Can I restrict the number of cores that I deploy? The command
>> >> > >
>> >> > > mpirun -n 100 myprog
>> >> > >
>> >> > > will automatically start on 100 cores. I wish to use only 10 cores and
>> >> > > have 10 threads on each core. Can I do this with MPICH? Remember that
>> >> > > this is an SMP and I cannot identify each core individually (as in a
>> >> > > cluster).
>> >> > >
>> >> > > Regards, bob
>> >> > >
>> >> > > ------------------------------
>> >> > >
>> >> > > Message: 4
>> >> > > Date: Wed, 10 Jul 2013 09:46:38 -0500
>> >> > > From: Wesley Bland <wbland at mcs.anl.gov>
>> >> > > To: discuss at mpich.org
>> >> > > Cc: mpich-discuss at mcs.anl.gov
>> >> > > Subject: Re: [mpich-discuss] Restrict number of cores, not threads
>> >> > > Message-ID: <2FAF588E-2FBE-45E4-B53F-E6BC931E3D51 at mcs.anl.gov>
>> >> > > Content-Type: text/plain; charset=iso-8859-1
>> >> > >
>> >> > > Threads in MPI are not ranks. When you say you want to launch with
>> -n
>> >> > 100,
>> >> > > you will always get 100 processes, not threads. If you want 10
>> threads
>> >> > > on
>> >> > > 10 cores, you will need to launch with -n 10, then add your threads
>> >> > > according to your threading library.
>> >> > >
>> >> > > Note that threads in MPI do not get their own rank currently. They
>> all
>> >> > > share the same rank as the process in which they reside, so if you
>> >> > > need
>> >> > to
>> >> > > be able to handle things with different ranks, you'll need to use
>> >> > > actual
>> >> > > processes.
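>> >> > > A minimal sketch of that split, assuming OpenMP for the threading (launch
>> >> > > with -n 10 and set OMP_NUM_THREADS=10; the names here are illustrative):
>> >> > >
>> >> > >     #include <mpi.h>
>> >> > >     #include <omp.h>
>> >> > >     #include <stdio.h>
>> >> > >
>> >> > >     int main(int argc, char **argv) {
>> >> > >         int provided, rank;
>> >> > >         MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
>> >> > >         MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>> >> > >     #pragma omp parallel   /* 10 threads per process, all sharing this rank */
>> >> > >         printf("rank %d, thread %d\n", rank, omp_get_thread_num());
>> >> > >         MPI_Finalize();
>> >> > >         return 0;
>> >> > >     }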
>> >> > >
>> >> > > Wesley
>> >> > >
>> >> > > On Jul 10, 2013, at 9:41 AM, Bob Ilgner <bobilgner at gmail.com>
>> wrote:
>> >> > >
>> >> > > > Dear all,
>> >> > > >
>> >> > > > I am working on a shared memory processor with 256 cores. I am
>> >> > > > working
>> >> > > from the command line directly.
>> >> > > >
>> >> > > > Can I restict the number of cores that I deploy.The command
>> >> > > >
>> >> > > > mpirun -n 100 myprog
>> >> > > >
>> >> > > >
>> >> > > > will automatically start on 100 cores. I wish to use only 10
>> cores
>> >> > > > and
>> >> > > have 10 threads on each core. Can I do this with mpich ? Rememebre
>> >> > > that
>> >> > > this an smp abd I can not identify each core individually(as in a
>> >> > cluster)
>> >> > > >
>> >> > > > Regards, bob
>> >> > > > _______________________________________________
>> >> > > > discuss mailing list discuss at mpich.org
>> >> > > > To manage subscription options or unsubscribe:
>> >> > > > https://lists.mpich.org/mailman/listinfo/discuss
>> >> > >
>> >> > >
>> >> > >
>> >> > > ------------------------------
>> >> > >
>> >> > > Message: 6
>> >> > > Date: Wed, 10 Jul 2013 16:50:36 +0200
>> >> > > From: Thomas Ropars <thomas.ropars at epfl.ch>
>> >> > > To: discuss at mpich.org
>> >> > > Subject: Re: [mpich-discuss] Error in MPI_Finalize on a simple ring
>> >> > > test over TCP
>> >> > > Message-ID: <51DD74BC.3020009 at epfl.ch>
>> >> > > Content-Type: text/plain; charset=UTF-8; format=flowed
>> >> > >
>> >> > > Yes, you are right, sorry for disturbing.
>> >> > >
>> >> > > On 07/10/2013 03:39 PM, Wesley Bland wrote:
>> >> > > > The value of previous for rank 0 in your code is -1. MPICH is
>> >> > > complaining because all of the requests to receive a message from
>> -1
>> >> > > are
>> >> > > still pending when you try to finalize. You need to make sure that
>> you
>> >> > are
>> >> > > receiving from valid ranks.
>> >> > > >
>> >> > > > On Jul 10, 2013, at 7:50 AM, Thomas Ropars <
>> thomas.ropars at epfl.ch>
>> >> > > wrote:
>> >> > > >
>> >> > > >> Yes sure. Here it is.
>> >> > > >>
>> >> > > >> Thomas
>> >> > > >>
>> >> > > >> On 07/10/2013 02:23 PM, Wesley Bland wrote:
>> >> > > >>> Can you send us the smallest chunk of code that still exhibits
>> >> > > >>> this
>> >> > > error?
>> >> > > >>>
>> >> > > >>> Wesley
>> >> > > >>>
>> >> > > >>> On Jul 10, 2013, at 6:54 AM, Thomas Ropars <
>> thomas.ropars at epfl.ch>
>> >> > > wrote:
>> >> > > >>>
>> >> > > >>>> Hi all,
>> >> > > >>>>
>> >> > > >>>> I get the following error when I try to run a simple
>> application
>> >> > > implementing a ring (each process sends to rank+1 and receives from
>> >> > > rank-1). More precisely, the error occurs during the call to
>> >> > MPI_Finalize():
>> >> > > >>>>
>> >> > > >>>> Assertion failed in file
>> >> > > src/mpid/ch3/channels/nemesis/netmod/tcp/socksm.c at line 363:
>> >> > sc->pg_is_set
>> >> > > >>>> internal ABORT - process 0
>> >> > > >>>>
>> >> > > >>>> Does anybody else also noticed the same error?
>> >> > > >>>>
>> >> > > >>>> Here are all the details about my test:
>> >> > > >>>> - The error is generated with mpich-3.0.2 (but I noticed the
>> >> > > >>>> exact
>> >> > > same error with mpich-3.0.4)
>> >> > > >>>> - I am using IPoIB for communication between nodes (The same
>> >> > > >>>> thing
>> >> > > happens over Ethernet)
>> >> > > >>>> - The problem comes from TCP links. When all processes are on
>> the
>> >> > > same node, there is no error. As soon as one process is on a remote
>> >> > > node,
>> >> > > the failure occurs.
>> >> > > >>>> - Note also that the failure does not occur if I run a more
>> >> > > >>>> complex
>> >> > > code (eg, a NAS benchmark).
>> >> > > >>>>
>> >> > > >>>> Thomas Ropars
>> >> > > >>>> _______________________________________________
>> >> > > >>>> discuss mailing list discuss at mpich.org
>> >> > > >>>> To manage subscription options or unsubscribe:
>> >> > > >>>> https://lists.mpich.org/mailman/listinfo/discuss
>> >> > > >>> _______________________________________________
>> >> > > >>> discuss mailing list discuss at mpich.org
>> >> > > >>> To manage subscription options or unsubscribe:
>> >> > > >>> https://lists.mpich.org/mailman/listinfo/discuss
>> >> > > >>>
>> >> > > >>>
>> >> > > >> <ring_clean.c>_______________________________________________
>> >> > > >> discuss mailing list discuss at mpich.org
>> >> > > >> To manage subscription options or unsubscribe:
>> >> > > >> https://lists.mpich.org/mailman/listinfo/discuss
>> >> > > > _______________________________________________
>> >> > > > discuss mailing list discuss at mpich.org
>> >> > > > To manage subscription options or unsubscribe:
>> >> > > > https://lists.mpich.org/mailman/listinfo/discuss
>> >> > > >
>> >> > > >
>> >> > >
>> >> > >
>> >> > >
>> >> > > ------------------------------
>> >> > >
>> >> > > Message: 7
>> >> > > Date: Wed, 10 Jul 2013 10:07:21 -0500
>> >> > > From: Sufeng Niu <sniu at hawk.iit.edu>
>> >> > > To: discuss at mpich.org
>> >> > > Subject: [mpich-discuss] MPI_Win_fence failed
>> >> > > Message-ID:
>> >> > > <
>> >> > > CAFNNHkz_1gC7hfpx0G9j24adO-gDabdmwZ4VuT6jip-fDMhS9A at mail.gmail.com
>> >
>> >> > > Content-Type: text/plain; charset="iso-8859-1"
>> >> > >
>> >> > > Hello,
>> >> > >
>> >> > > I used MPI RMA in my program, but the program stops at MPI_Win_fence. I
>> >> > > have a master process that receives data from a UDP socket; the other
>> >> > > processes use MPI_Get to access the data.
>> >> > >
>> >> > > master process:
>> >> > >
>> >> > > MPI_Win_create(...)   /* create the RMA window over the receive buffer */
>> >> > > for(...){
>> >> > > /* udp recv operation */
>> >> > >
>> >> > > MPI_Barrier // to let other process know data received from udp is
>> >> > > ready
>> >> > >
>> >> > > MPI_Win_fence(0, win);
>> >> > > MPI_Win_fence(0, win);
>> >> > >
>> >> > > }
>> >> > >
>> >> > > other processes:
>> >> > >
>> >> > > for(...){
>> >> > >
>> >> > > MPI_Barrier // sync for udp data ready
>> >> > >
>> >> > > MPI_Win_fence(0, win);
>> >> > >
>> >> > > MPI_Get();
>> >> > >
>> >> > > MPI_Win_fence(0, win); <-- program stopped here
>> >> > >
>> >> > > /* other operation */
>> >> > > }
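>> >> > > (For reference, the get epoch on the reader side looks roughly like the
>> >> > > sketch below; the buffer, count, datatype, and displacement are
>> >> > > placeholders for what the code actually passes:
>> >> > >
>> >> > >     MPI_Win_fence(0, win);
>> >> > >     MPI_Get(local_buf, count, MPI_UNSIGNED_SHORT,
>> >> > >             0 /* master rank */, disp, count, MPI_UNSIGNED_SHORT, win);
>> >> > >     MPI_Win_fence(0, win);   /* <-- the fence where it stops */
>> >> > > )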
>> >> > >
>> >> > > I found that the program stopped at second MPI_Win_fence, the
>> terminal
>> >> > > output is:
>> >> > >
>> >> > >
>> >> > >
>> >> > >
>> >> >
>> >> >
>> ===================================================================================
>> >> > > = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>> >> > > = EXIT CODE: 11
>> >> > > = CLEANING UP REMAINING PROCESSES
>> >> > > = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>> >> > >
>> >> > >
>> >> >
>> >> >
>> ===================================================================================
>> >> > > YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation
>> fault
>> >> > > (signal 11)
>> >> > > This typically refers to a problem with your application.
>> >> > > Please see the FAQ page for debugging suggestions
>> >> > >
>> >> > > Do you have any suggestions? Thank you very much!
>> >> > >
>> >> > > --
>> >> > > Best Regards,
>> >> > > Sufeng Niu
>> >> > > ECASP lab, ECE department, Illinois Institute of Technology
>> >> > > Tel: 312-731-7219
>> >> > >
>> >> > > ------------------------------
>> >> > >
>> >> > > Message: 8
>> >> > > Date: Wed, 10 Jul 2013 11:12:45 -0400
>> >> > > From: Jim Dinan <james.dinan at gmail.com>
>> >> > > To: discuss at mpich.org
>> >> > > Subject: Re: [mpich-discuss] MPI_Win_fence failed
>> >> > > Message-ID:
>> >> > > <CAOoEU4F3hX=y3yrJKYKucNeiueQYBeR_3OQas9E+mg+GM6Rz=
>> >> > > w at mail.gmail.com>
>> >> > > Content-Type: text/plain; charset="iso-8859-1"
>> >> > >
>> >> > > It's hard to tell where the segmentation fault is coming from. Can
>> >> > > you
>> >> > use
>> >> > > a debugger to generate a backtrace?
>> >> > >
>> >> > > ~Jim.
>> >> > >
>> >> > >
>> >> > > On Wed, Jul 10, 2013 at 11:07 AM, Sufeng Niu <sniu at hawk.iit.edu>
>> >> > > wrote:
>> >> > >
>> >> > > > Hello,
>> >> > > >
>> >> > > > I used MPI RMA in my program, but the program stop at the
>> >> > MPI_Win_fence,
>> >> > > I
>> >> > > > have a master process receive data from udp socket. Other
>> processes
>> >> > > > use
>> >> > > > MPI_Get to access data.
>> >> > > >
>> >> > > > master process:
>> >> > > >
>> >> > > > MPI_Win_create(...)   /* create the RMA window over the receive buffer */
>> >> > > > for(...){
>> >> > > > /* udp recv operation */
>> >> > > >
>> >> > > > MPI_Barrier // to let other process know data received from udp
>> is
>> >> > ready
>> >> > > >
>> >> > > > MPI_Win_fence(0, win);
>> >> > > > MPI_Win_fence(0, win);
>> >> > > >
>> >> > > > }
>> >> > > >
>> >> > > > other processes:
>> >> > > >
>> >> > > > for(...){
>> >> > > >
>> >> > > > MPI_Barrier // sync for udp data ready
>> >> > > >
>> >> > > > MPI_Win_fence(0, win);
>> >> > > >
>> >> > > > MPI_Get();
>> >> > > >
>> >> > > > MPI_Win_fence(0, win); <-- program stopped here
>> >> > > >
>> >> > > > /* other operation */
>> >> > > > }
>> >> > > >
>> >> > > > I found that the program stopped at second MPI_Win_fence, the
>> >> > > > terminal
>> >> > > > output is:
>> >> > > >
>> >> > > >
>> >> > > >
>> >> > > >
>> >> > >
>> >> >
>> >> >
>> ===================================================================================
>> >> > > > = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>> >> > > > = EXIT CODE: 11
>> >> > > > = CLEANING UP REMAINING PROCESSES
>> >> > > > = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>> >> > > >
>> >> > > >
>> >> > >
>> >> >
>> >> >
>> ===================================================================================
>> >> > > > YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation
>> fault
>> >> > > > (signal 11)
>> >> > > > This typically refers to a problem with your application.
>> >> > > > Please see the FAQ page for debugging suggestions
>> >> > > >
>> >> > > > Do you have any suggestions? Thank you very much!
>> >> > > >
>> >> > > > --
>> >> > > > Best Regards,
>> >> > > > Sufeng Niu
>> >> > > > ECASP lab, ECE department, Illinois Institute of Technology
>> >> > > > Tel: 312-731-7219
>> >> > > >
>> >> > > > _______________________________________________
>> >> > > > discuss mailing list discuss at mpich.org
>> >> > > > To manage subscription options or unsubscribe:
>> >> > > > https://lists.mpich.org/mailman/listinfo/discuss
>> >> > > >
>> >> > >
>> >> > > ------------------------------
>> >> > >
>> >> > > _______________________________________________
>> >> > > discuss mailing list
>> >> > > discuss at mpich.org
>> >> > > https://lists.mpich.org/mailman/listinfo/discuss
>> >> > >
>> >> > > End of discuss Digest, Vol 9, Issue 27
>> >> > > **************************************
>> >> > >
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > Best Regards,
>> >> > Sufeng Niu
>> >> > ECASP lab, ECE department, Illinois Institute of Technology
>> >> > Tel: 312-731-7219
>> >> >
>> >> > ------------------------------
>> >> >
>> >> > _______________________________________________
>> >> > discuss mailing list
>> >> > discuss at mpich.org
>> >> > https://lists.mpich.org/mailman/listinfo/discuss
>> >> >
>> >> > End of discuss Digest, Vol 9, Issue 28
>> >> > **************************************
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Best Regards,
>> >> Sufeng Niu
>> >> ECASP lab, ECE department, Illinois Institute of Technology
>> >> Tel: 312-731-7219
>> >> -------------- next part --------------
>> >> A non-text attachment was scrubbed...
>> >> Name: Screenshot.png
>> >> Type: image/png
>> >> Size: 131397 bytes
>> >> Desc: not available
>> >> URL:
>> >> <
>> http://lists.mpich.org/pipermail/discuss/attachments/20130710/48296a33/attachment.png
>> >
>> >>
>> >>
>> >> ------------------------------
>> >>
>> >> _______________________________________________
>> >> discuss mailing list
>> >> discuss at mpich.org
>> >> https://lists.mpich.org/mailman/listinfo/discuss
>> >>
>> >> End of discuss Digest, Vol 9, Issue 29
>> >> **************************************
>> >
>> >
>> >
>> >
>> > --
>> > Best Regards,
>> > Sufeng Niu
>> > ECASP lab, ECE department, Illinois Institute of Technology
>> > Tel: 312-731-7219
>> >
>> > _______________________________________________
>> > discuss mailing list discuss at mpich.org
>> > To manage subscription options or unsubscribe:
>> > https://lists.mpich.org/mailman/listinfo/discuss
>>
>>
>>
>> --
>> Jeff Hammond
>> jeff.science at gmail.com
>>
>>
>> ------------------------------
>>
>> Message: 2
>> Date: Wed, 10 Jul 2013 12:08:19 -0500
>> From: Sufeng Niu <sniu at hawk.iit.edu>
>> To: discuss at mpich.org
>> Subject: Re: [mpich-discuss] MPI_Win_fence failed
>> Message-ID:
>> <CAFNNHkzu0GYT0qSdWx1VQz0+V7mg5d=
>> tZFQm-MHPVoCyKfiYSA at mail.gmail.com>
>> Content-Type: text/plain; charset="iso-8859-1"
>>
>> Oh yeah, that would be an easier way. I just created a repository on
>> GitHub; you can
>> git clone https://github.com/sufengniu/mpi_app_test.git
>>
>> To run the program you need to install a TIFF library (on Ubuntu:
>> sudo apt-get install libtiff4-dev).
>> After you download the code, just run make; it will produce 2 binaries.
>>
>> Please change the hostfile to your machine. First run the MPI side:
>> ./run.perl main
>> then run ./udp_client 55Fe_run5_dark.tif
>>
>> Thanks a lot!
>> Sufeng
>>
>>
>>
>> On Wed, Jul 10, 2013 at 11:57 AM, <discuss-request at mpich.org> wrote:
>>
>> > Today's Topics:
>> >
>> > 1. Re: MPI_Win_fence failed (Jeff Hammond)
>> > 2. Re: MPI_Win_fence failed (Sufeng Niu)
>> >
>> >
>> > ----------------------------------------------------------------------
>> >
>> > Message: 1
>> > Date: Wed, 10 Jul 2013 11:46:08 -0500
>> > From: Jeff Hammond <jeff.science at gmail.com>
>> > To: discuss at mpich.org
>> > Subject: Re: [mpich-discuss] MPI_Win_fence failed
>> > Message-ID:
>> > <CAGKz=
>> > uLiq6rur+15MBip5U-_AS2JWefYOHfX07b1dkR8POOk6A at mail.gmail.com>
>> > Content-Type: text/plain; charset=ISO-8859-1
>> >
>> > Just post the code so we can run it.
>> >
>> > Jeff
>> >
>> > On Wed, Jul 10, 2013 at 11:39 AM, Sufeng Niu <sniu at hawk.iit.edu> wrote:
>> > > Sorry I forget to add screen shot for backtrace. the screen shot is
>> > > attached.
>> > >
>> > > Thanks a lot!
>> > >
>> > > Sufeng
>> > >
>> > >
>> > >
>> >
>> >
>> >
>> > --
>> > Jeff Hammond
>> > jeff.science at gmail.com
>> >
>> >
>> > ------------------------------
>> >
>> > Message: 2
>> > Date: Wed, 10 Jul 2013 11:57:31 -0500
>> > From: Sufeng Niu <sniu at hawk.iit.edu>
>> > To: discuss at mpich.org
>> > Subject: Re: [mpich-discuss] MPI_Win_fence failed
>> > Message-ID:
>> > <
>> > CAFNNHkzKmAg8B6hamyrr7B2anU9EP_0yxmajxePVr35UnHVavw at mail.gmail.com>
>> > Content-Type: text/plain; charset="iso-8859-1"
>> >
>> > Sorry, I found that this discussion list does not accept figures or
>> > attachments.
>> >
>> > the backtrace information is below:
>> >
>> > processes  Location                    PC            Host           Rank  ID          Status
>> > 7          _start                      0x00402399
>> > `-7        _libc_start_main            0x3685c1ecdd
>> > `-7        main                        0x00402474
>> > `-7        dkm                         ...
>> > |-6        image_rms                   0x004029bb
>> > | `-6      rms                         0x00402d44
>> > | `-6      PMPI_Win_fence              0x0040c389
>> > | `-6      MPIDI_Win_fence             0x004a45f4
>> > | `-6      MPIDI_CH3I_RMAListComplete  0x004a27d3
>> > | `-6      MPIDI_CH3I_Progress         ...
>> > `-1        udp                         0x004035cf
>> > `-1        PMPI_Win_fence              0x0040c389
>> > `-1        MPIDI_Win_fence             0x004a45a0
>> > `-1        MPIDI_CH3I_Progress         0x004292f5
>> > `-1        MPIDI_CH3_PktHandler_Get    0x0049f3f9
>> > `-1        MPIDI_CH3_iSendv            0x004aa67c
>> > `-         memcpy                      0x3685c89329  164.54.54.122  0     20.1-13994  Stopped
>> >
>> >
>> >
>> > On Wed, Jul 10, 2013 at 11:39 AM, <discuss-request at mpich.org> wrote:
>> >
>> > > Send discuss mailing list submissions to
>> > > discuss at mpich.org
>> > >
>> > > To subscribe or unsubscribe via the World Wide Web, visit
>> > > https://lists.mpich.org/mailman/listinfo/discuss
>> > > or, via email, send a message with subject or body 'help' to
>> > > discuss-request at mpich.org
>> > >
>> > > You can reach the person managing the list at
>> > > discuss-owner at mpich.org
>> > >
>> > > When replying, please edit your Subject line so it is more specific
>> > > than "Re: Contents of discuss digest..."
>> > >
>> > >
>> > > Today's Topics:
>> > >
>> > > 1. Re: MPI_Win_fence failed (Sufeng Niu)
>> > >
>> > >
>> > > ----------------------------------------------------------------------
>> > >
>> > > Message: 1
>> > > Date: Wed, 10 Jul 2013 11:39:39 -0500
>> > > From: Sufeng Niu <sniu at hawk.iit.edu>
>> > > To: discuss at mpich.org
>> > > Subject: Re: [mpich-discuss] MPI_Win_fence failed
>> > > Message-ID:
>> > > <CAFNNHkz8pBfX33icn=+3rdXvqDfWqeu58odpd=
>> > > mOXLciysHgfg at mail.gmail.com>
>> > > Content-Type: text/plain; charset="iso-8859-1"
>> > >
>> > > Sorry, I forgot to add the screenshot of the backtrace; it is attached
>> > > now.
>> > >
>> > > Thanks a lot!
>> > >
>> > > Sufeng
>> > >
>> > >
>> > >
>> > > On Wed, Jul 10, 2013 at 11:30 AM, <discuss-request at mpich.org> wrote:
>> > >
>> > > > Send discuss mailing list submissions to
>> > > > discuss at mpich.org
>> > > >
>> > > > To subscribe or unsubscribe via the World Wide Web, visit
>> > > > https://lists.mpich.org/mailman/listinfo/discuss
>> > > > or, via email, send a message with subject or body 'help' to
>> > > > discuss-request at mpich.org
>> > > >
>> > > > You can reach the person managing the list at
>> > > > discuss-owner at mpich.org
>> > > >
>> > > > When replying, please edit your Subject line so it is more specific
>> > > > than "Re: Contents of discuss digest..."
>> > > >
>> > > >
>> > > > Today's Topics:
>> > > >
>> > > > 1. Re: MPI_Win_fence failed (Sufeng Niu)
>> > > >
>> > > >
>> > > >
>> ----------------------------------------------------------------------
>> > > >
>> > > > Message: 1
>> > > > Date: Wed, 10 Jul 2013 11:30:36 -0500
>> > > > From: Sufeng Niu <sniu at hawk.iit.edu>
>> > > > To: discuss at mpich.org
>> > > > Subject: Re: [mpich-discuss] MPI_Win_fence failed
>> > > > Message-ID:
>> > > > <
>> > > > CAFNNHkyLj8CbYMmc_w2DA9_+q2Oe3kyus+g6c99ShPk6ZXVkdA at mail.gmail.com>
>> > > > Content-Type: text/plain; charset="iso-8859-1"
>> > > >
>> > > > Hi Jim,
>> > > >
>> > > > Thanks a lot for your reply. The basic way for me to debug is
>> > > > barrier + printf; right now I only have an evaluation version of
>> > > > TotalView. The backtrace from TotalView is shown below. The udp
>> > > > function does the UDP collection and creates the RMA window, and
>> > > > image_rms does MPI_Get to access the window.
>> > > >
>> > > > There is a segmentation violation, but I don't know why the program
>> > > > stopped at MPI_Win_fence.
>> > > >
>> > > > Thanks a lot!
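>> > > >
>> > > > A segfault that only shows up at the closing fence is often an MPI_Get
>> > > > whose displacement or count runs past the end of the window exposed by
>> > > > the target, since that is where the target-side copy gets driven. A
>> > > > small illustrative wrapper that checks the bounds on the origin side
>> > > > before calling MPI_Get (the checked_get name, the MPI_INT element type,
>> > > > and the win_elems parameter are assumptions, not from the real code):
>> > > >
>> > > > #include <mpi.h>
>> > > > #include <assert.h>
>> > > >
>> > > > /* win_elems is the number of ints the target passed to MPI_Win_create
>> > > >    (window size divided by disp_unit).  The assert fires on the origin
>> > > >    instead of letting the target run past the end of its window. */
>> > > > static void checked_get(int *origin, int count, int target_rank,
>> > > >                         MPI_Aint target_disp, MPI_Aint win_elems,
>> > > >                         MPI_Win win)
>> > > > {
>> > > >     assert(target_disp >= 0 && target_disp + count <= win_elems);
>> > > >     MPI_Get(origin, count, MPI_INT, target_rank,
>> > > >             target_disp, count, MPI_INT, win);
>> > > > }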
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > > On Wed, Jul 10, 2013 at 10:12 AM, <discuss-request at mpich.org>
>> wrote:
>> > > >
>> > > > > Send discuss mailing list submissions to
>> > > > > discuss at mpich.org
>> > > > >
>> > > > > To subscribe or unsubscribe via the World Wide Web, visit
>> > > > > https://lists.mpich.org/mailman/listinfo/discuss
>> > > > > or, via email, send a message with subject or body 'help' to
>> > > > > discuss-request at mpich.org
>> > > > >
>> > > > > You can reach the person managing the list at
>> > > > > discuss-owner at mpich.org
>> > > > >
>> > > > > When replying, please edit your Subject line so it is more
>> specific
>> > > > > than "Re: Contents of discuss digest..."
>> > > > >
>> > > > >
>> > > > > Today's Topics:
>> > > > >
>> > > > > 1. Re: MPICH3.0.4 make fails with "No rule to make target..."
>> > > > > (Wesley Bland)
>> > > > > 2. Re: Error in MPI_Finalize on a simple ring test over TCP
>> > > > > (Wesley Bland)
>> > > > > 3. Restrict number of cores, not threads (Bob Ilgner)
>> > > > > 4. Re: Restrict number of cores, not threads (Wesley Bland)
>> > > > > 6. Re: Error in MPI_Finalize on a simple ring test over TCP
>> > > > > (Thomas Ropars)
>> > > > > 7. MPI_Win_fence failed (Sufeng Niu)
>> > > > > 8. Re: MPI_Win_fence failed (Jim Dinan)
>> > > > >
>> > > > >
>> > > > >
>> > ----------------------------------------------------------------------
>> > > > >
>> > > > > Message: 1
>> > > > > Date: Wed, 10 Jul 2013 08:29:06 -0500
>> > > > > From: Wesley Bland <wbland at mcs.anl.gov>
>> > > > > To: discuss at mpich.org
>> > > > > Subject: Re: [mpich-discuss] MPICH3.0.4 make fails with "No rule
>> to
>> > > > > make target..."
>> > > > > Message-ID: <F48FC916-31F7-4F82-95F8-2D6A6C45264F at mcs.anl.gov>
>> > > > > Content-Type: text/plain; charset="iso-8859-1"
>> > > > >
>> > > > > Unfortunately, due to the lack of developer resources and
>> interest,
>> > the
>> > > > > last version of MPICH which was supported on Windows was 1.4.1p.
>> You
>> > > can
>> > > > > find that version on the downloads page:
>> > > > >
>> > > > > http://www.mpich.org/downloads/
>> > > > >
>> > > > > Alternatively, Microsoft maintains a derivative of MPICH which
>> should
>> > > > > provide the features you need. You can also find a link to it on the
>> > > > > downloads page above.
>> > > > >
>> > > > > Wesley
>> > > > >
>> > > > > On Jul 10, 2013, at 1:16 AM, Don Warren <don.warren at gmail.com>
>> > wrote:
>> > > > >
>> > > > > > Hello,
>> > > > > >
>> > > > > > As requested in the installation guide, I'm informing this list
>> of
>> > a
>> > > > > failure to correctly make MPICH3.0.4 on a Win7 system. The
>> specific
>> > > > error
>> > > > > encountered is
>> > > > > > "make[2]: *** No rule to make target
>> > > > > `/cygdrive/c/FLASH/mpich-3.0.4/src/mpi/romio/Makefile.am', needed
>> by
>> > > > > `/cygdrive/c/FLASH/mpich-3.0.4/src/mpi/romio/Makefile.in'. Stop."
>> > > > > >
>> > > > > > I have confirmed that both Makefile.am and Makefile.in exist in
>> the
>> > > > > directory listed. I'm attaching the c.txt and the m.txt files.
>> > > > > >
>> > > > > > Possibly of interest is that the command "make clean" fails at
>> > > exactly
>> > > > > the same folder, with exactly the same error message as shown in
>> > m.txt
>> > > > and
>> > > > > above.
>> > > > > >
>> > > > > > Any advice you can give would be appreciated. I'm attempting to
>> > get
>> > > > > FLASH running on my computer, which seems to require MPICH.
>> > > > > >
>> > > > > > Regards,
>> > > > > > Don Warren
>> > > > > >
>> > > >
>> > <config-make-outputs.zip>_______________________________________________
>> > > > > > discuss mailing list discuss at mpich.org
>> > > > > > To manage subscription options or unsubscribe:
>> > > > > > https://lists.mpich.org/mailman/listinfo/discuss
>> > > > >
>> > > > > -------------- next part --------------
>> > > > > An HTML attachment was scrubbed...
>> > > > > URL: <
>> > > > >
>> > > >
>> > >
>> >
>> http://lists.mpich.org/pipermail/discuss/attachments/20130710/69b497f1/attachment-0001.html
>> > > > > >
>> > > > >
>> > > > > ------------------------------
>> > > > >
>> > > > > Message: 2
>> > > > > Date: Wed, 10 Jul 2013 08:39:47 -0500
>> > > > > From: Wesley Bland <wbland at mcs.anl.gov>
>> > > > > To: discuss at mpich.org
>> > > > > Subject: Re: [mpich-discuss] Error in MPI_Finalize on a simple
>> ring
>> > > > > test over TCP
>> > > > > Message-ID: <D5999106-2A75-4091-8B0F-EAFA22880862 at mcs.anl.gov>
>> > > > > Content-Type: text/plain; charset=us-ascii
>> > > > >
>> > > > > The value of previous for rank 0 in your code is -1. MPICH is
>> > > > > complaining because all of the requests to receive a message from -1
>> > > > > are still pending when you try to finalize. You need to make sure
>> > > > > that you are receiving from valid ranks.
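>> > > > >
>> > > > > For a ring, the usual fix is to wrap both neighbors around with a
>> > > > > modulo. A minimal sketch (using MPI_Sendrecv and the rank itself as
>> > > > > the token, which is just one way to write it):
>> > > > >
>> > > > > #include <mpi.h>
>> > > > > #include <stdio.h>
>> > > > >
>> > > > > int main(int argc, char **argv)
>> > > > > {
>> > > > >     int rank, size, token;
>> > > > >
>> > > > >     MPI_Init(&argc, &argv);
>> > > > >     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>> > > > >     MPI_Comm_size(MPI_COMM_WORLD, &size);
>> > > > >
>> > > > >     /* wrap both neighbors around; plain rank - 1 is -1 on rank 0 */
>> > > > >     int next = (rank + 1) % size;
>> > > > >     int prev = (rank - 1 + size) % size;
>> > > > >
>> > > > >     MPI_Sendrecv(&rank, 1, MPI_INT, next, 0,
>> > > > >                  &token, 1, MPI_INT, prev, 0,
>> > > > >                  MPI_COMM_WORLD, MPI_STATUS_IGNORE);
>> > > > >     printf("rank %d received %d from rank %d\n", rank, token, prev);
>> > > > >
>> > > > >     MPI_Finalize();
>> > > > >     return 0;
>> > > > > }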
>> > > > >
>> > > > > On Jul 10, 2013, at 7:50 AM, Thomas Ropars <thomas.ropars at epfl.ch
>> >
>> > > > wrote:
>> > > > >
>> > > > > > Yes sure. Here it is.
>> > > > > >
>> > > > > > Thomas
>> > > > > >
>> > > > > > On 07/10/2013 02:23 PM, Wesley Bland wrote:
>> > > > > >> Can you send us the smallest chunk of code that still exhibits
>> > this
>> > > > > error?
>> > > > > >>
>> > > > > >> Wesley
>> > > > > >>
>> > > > > >> On Jul 10, 2013, at 6:54 AM, Thomas Ropars <
>> thomas.ropars at epfl.ch
>> > >
>> > > > > wrote:
>> > > > > >>
>> > > > > >>> Hi all,
>> > > > > >>>
>> > > > > >>> I get the following error when I try to run a simple
>> application
>> > > > > implementing a ring (each process sends to rank+1 and receives
>> from
>> > > > > rank-1). More precisely, the error occurs during the call to
>> > > > MPI_Finalize():
>> > > > > >>>
>> > > > > >>> Assertion failed in file
>> > > > > src/mpid/ch3/channels/nemesis/netmod/tcp/socksm.c at line 363:
>> > > > sc->pg_is_set
>> > > > > >>> internal ABORT - process 0
>> > > > > >>>
>> > > > > >>> Has anybody else noticed the same error?
>> > > > > >>>
>> > > > > >>> Here are all the details about my test:
>> > > > > >>> - The error is generated with mpich-3.0.2 (but I noticed the
>> > exact
>> > > > > same error with mpich-3.0.4)
>> > > > > >>> - I am using IPoIB for communication between nodes (The same
>> > thing
>> > > > > happens over Ethernet)
>> > > > > >>> - The problem comes from TCP links. When all processes are on
>> the
>> > > > same
>> > > > > node, there is no error. As soon as one process is on a remote
>> node,
>> > > the
>> > > > > failure occurs.
>> > > > > >>> - Note also that the failure does not occur if I run a more
>> > complex
>> > > > > code (e.g., a NAS benchmark).
>> > > > > >>>
>> > > > > >>> Thomas Ropars
>> > > > > >>> _______________________________________________
>> > > > > >>> discuss mailing list discuss at mpich.org
>> > > > > >>> To manage subscription options or unsubscribe:
>> > > > > >>> https://lists.mpich.org/mailman/listinfo/discuss
>> > > > > >> _______________________________________________
>> > > > > >> discuss mailing list discuss at mpich.org
>> > > > > >> To manage subscription options or unsubscribe:
>> > > > > >> https://lists.mpich.org/mailman/listinfo/discuss
>> > > > > >>
>> > > > > >>
>> > > > > >
>> > > > > > <ring_clean.c>_______________________________________________
>> > > > > > discuss mailing list discuss at mpich.org
>> > > > > > To manage subscription options or unsubscribe:
>> > > > > > https://lists.mpich.org/mailman/listinfo/discuss
>> > > > >
>> > > > >
>> > > > >
>> > > > > ------------------------------
>> > > > >
>> > > > > Message: 3
>> > > > > Date: Wed, 10 Jul 2013 16:41:27 +0200
>> > > > > From: Bob Ilgner <bobilgner at gmail.com>
>> > > > > To: mpich-discuss at mcs.anl.gov
>> > > > > Subject: [mpich-discuss] Restrict number of cores, not threads
>> > > > > Message-ID:
>> > > > > <
>> > > > >
>> CAKv15b-QgmHkVkoiTFmP3EZXvyy6sc_QeqHQgbMUhnr3Xh9ecA at mail.gmail.com>
>> > > > > Content-Type: text/plain; charset="iso-8859-1"
>> > > > >
>> > > > > Dear all,
>> > > > >
>> > > > > I am working on a shared-memory processor with 256 cores, directly
>> > > > > from the command line.
>> > > > >
>> > > > > Can I restrict the number of cores that I deploy? The command
>> > > > >
>> > > > > mpirun -n 100 myprog
>> > > > >
>> > > > > will automatically start on 100 cores. I wish to use only 10 cores
>> > > > > and have 10 threads on each core. Can I do this with MPICH? Remember
>> > > > > that this is an SMP and I cannot identify each core individually (as
>> > > > > in a cluster).
>> > > > >
>> > > > > Regards, bob
>> > > > > -------------- next part --------------
>> > > > > An HTML attachment was scrubbed...
>> > > > > URL: <
>> > > > >
>> > > >
>> > >
>> >
>> http://lists.mpich.org/pipermail/discuss/attachments/20130710/ec659e91/attachment-0001.html
>> > > > > >
>> > > > >
>> > > > > ------------------------------
>> > > > >
>> > > > > Message: 4
>> > > > > Date: Wed, 10 Jul 2013 09:46:38 -0500
>> > > > > From: Wesley Bland <wbland at mcs.anl.gov>
>> > > > > To: discuss at mpich.org
>> > > > > Cc: mpich-discuss at mcs.anl.gov
>> > > > > Subject: Re: [mpich-discuss] Restrict number of cores, not threads
>> > > > > Message-ID: <2FAF588E-2FBE-45E4-B53F-E6BC931E3D51 at mcs.anl.gov>
>> > > > > Content-Type: text/plain; charset=iso-8859-1
>> > > > >
>> > > > > Threads in MPI are not ranks. When you say you want to launch with
>> > > > > -n 100, you will always get 100 processes, not threads. If you want
>> > > > > 10 threads on 10 cores, you will need to launch with -n 10, then add
>> > > > > your threads according to your threading library.
>> > > > >
>> > > > > Note that threads in MPI do not get their own rank currently. They
>> > > > > all share the same rank as the process in which they reside, so if
>> > > > > you need to be able to handle things with different ranks, you'll
>> > > > > need to use actual processes.
>> > > > >
>> > > > > Wesley
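>> > > > >
>> > > > > As a concrete illustration, a minimal MPI + OpenMP sketch of that
>> > > > > split, launched with "mpiexec -n 10"; the OpenMP choice and a build
>> > > > > line such as "mpicc -fopenmp" are assumptions, and any threading
>> > > > > library would do:
>> > > > >
>> > > > > #include <mpi.h>
>> > > > > #include <omp.h>
>> > > > > #include <stdio.h>
>> > > > >
>> > > > > int main(int argc, char **argv)
>> > > > > {
>> > > > >     int provided, rank;
>> > > > >
>> > > > >     /* only the main thread calls MPI here, so FUNNELED is enough */
>> > > > >     MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
>> > > > >     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>> > > > >
>> > > > >     /* 10 threads inside each of the 10 processes the launcher started */
>> > > > >     #pragma omp parallel num_threads(10)
>> > > > >     {
>> > > > >         printf("rank %d, thread %d of %d\n",
>> > > > >                rank, omp_get_thread_num(), omp_get_num_threads());
>> > > > >     }
>> > > > >
>> > > > >     MPI_Finalize();
>> > > > >     return 0;
>> > > > > }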
>> > > > >
>> > > > > On Jul 10, 2013, at 9:41 AM, Bob Ilgner <bobilgner at gmail.com>
>> wrote:
>> > > > >
>> > > > > > Dear all,
>> > > > > >
>> > > > > > I am working on a shared-memory processor with 256 cores, directly
>> > > > > > from the command line.
>> > > > > >
>> > > > > > Can I restrict the number of cores that I deploy? The command
>> > > > > >
>> > > > > > mpirun -n 100 myprog
>> > > > > >
>> > > > > > will automatically start on 100 cores. I wish to use only 10 cores
>> > > > > > and have 10 threads on each core. Can I do this with MPICH?
>> > > > > > Remember that this is an SMP and I cannot identify each core
>> > > > > > individually (as in a cluster).
>> > > > > >
>> > > > > > Regards, bob
>> > > > > > _______________________________________________
>> > > > > > discuss mailing list discuss at mpich.org
>> > > > > > To manage subscription options or unsubscribe:
>> > > > > > https://lists.mpich.org/mailman/listinfo/discuss
>> > > > >
>> > > > >
>> > > > >
>> > > > > ------------------------------
>> > > > >
>> > > > >
>> > > > > Message: 6
>> > > > > Date: Wed, 10 Jul 2013 16:50:36 +0200
>> > > > > From: Thomas Ropars <thomas.ropars at epfl.ch>
>> > > > > To: discuss at mpich.org
>> > > > > Subject: Re: [mpich-discuss] Error in MPI_Finalize on a simple
>> ring
>> > > > > test over TCP
>> > > > > Message-ID: <51DD74BC.3020009 at epfl.ch>
>> > > > > Content-Type: text/plain; charset=UTF-8; format=flowed
>> > > > >
>> > > > > Yes, you are right; sorry for the noise.
>> > > > >
>> > > > > On 07/10/2013 03:39 PM, Wesley Bland wrote:
>> > > > > > The value of previous for rank 0 in your code is -1. MPICH is
>> > > > > complaining because all of the requests to receive a message from
>> -1
>> > > are
>> > > > > still pending when you try to finalize. You need to make sure that
>> > you
>> > > > are
>> > > > > receiving from valid ranks.
>> > > > > >
>> > > > > > On Jul 10, 2013, at 7:50 AM, Thomas Ropars <
>> thomas.ropars at epfl.ch>
>> > > > > wrote:
>> > > > > >
>> > > > > >> Yes sure. Here it is.
>> > > > > >>
>> > > > > >> Thomas
>> > > > > >>
>> > > > > >> On 07/10/2013 02:23 PM, Wesley Bland wrote:
>> > > > > >>> Can you send us the smallest chunk of code that still exhibits
>> > this
>> > > > > error?
>> > > > > >>>
>> > > > > >>> Wesley
>> > > > > >>>
>> > > > > >>> On Jul 10, 2013, at 6:54 AM, Thomas Ropars <
>> > thomas.ropars at epfl.ch>
>> > > > > wrote:
>> > > > > >>>
>> > > > > >>>> Hi all,
>> > > > > >>>>
>> > > > > >>>> I get the following error when I try to run a simple
>> application
>> > > > > implementing a ring (each process sends to rank+1 and receives
>> from
>> > > > > rank-1). More precisely, the error occurs during the call to
>> > > > MPI_Finalize():
>> > > > > >>>>
>> > > > > >>>> Assertion failed in file
>> > > > > src/mpid/ch3/channels/nemesis/netmod/tcp/socksm.c at line 363:
>> > > > sc->pg_is_set
>> > > > > >>>> internal ABORT - process 0
>> > > > > >>>>
>> > > > > >>>> Has anybody else noticed the same error?
>> > > > > >>>>
>> > > > > >>>> Here are all the details about my test:
>> > > > > >>>> - The error is generated with mpich-3.0.2 (but I noticed the
>> > exact
>> > > > > same error with mpich-3.0.4)
>> > > > > >>>> - I am using IPoIB for communication between nodes (The same
>> > thing
>> > > > > happens over Ethernet)
>> > > > > >>>> - The problem comes from TCP links. When all processes are on
>> > the
>> > > > > same node, there is no error. As soon as one process is on a
>> remote
>> > > node,
>> > > > > the failure occurs.
>> > > > > >>>> - Note also that the failure does not occur if I run a more
>> > > complex
>> > > > > code (e.g., a NAS benchmark).
>> > > > > >>>>
>> > > > > >>>> Thomas Ropars
>> > > > > >>>> _______________________________________________
>> > > > > >>>> discuss mailing list discuss at mpich.org
>> > > > > >>>> To manage subscription options or unsubscribe:
>> > > > > >>>> https://lists.mpich.org/mailman/listinfo/discuss
>> > > > > >>> _______________________________________________
>> > > > > >>> discuss mailing list discuss at mpich.org
>> > > > > >>> To manage subscription options or unsubscribe:
>> > > > > >>> https://lists.mpich.org/mailman/listinfo/discuss
>> > > > > >>>
>> > > > > >>>
>> > > > > >> <ring_clean.c>_______________________________________________
>> > > > > >> discuss mailing list discuss at mpich.org
>> > > > > >> To manage subscription options or unsubscribe:
>> > > > > >> https://lists.mpich.org/mailman/listinfo/discuss
>> > > > > > _______________________________________________
>> > > > > > discuss mailing list discuss at mpich.org
>> > > > > > To manage subscription options or unsubscribe:
>> > > > > > https://lists.mpich.org/mailman/listinfo/discuss
>> > > > > >
>> > > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > ------------------------------
>> > > > >
>> > > > > Message: 7
>> > > > > Date: Wed, 10 Jul 2013 10:07:21 -0500
>> > > > > From: Sufeng Niu <sniu at hawk.iit.edu>
>> > > > > To: discuss at mpich.org
>> > > > > Subject: [mpich-discuss] MPI_Win_fence failed
>> > > > > Message-ID:
>> > > > > <
>> > > > >
>> CAFNNHkz_1gC7hfpx0G9j24adO-gDabdmwZ4VuT6jip-fDMhS9A at mail.gmail.com>
>> > > > > Content-Type: text/plain; charset="iso-8859-1"
>> > > > >
>> > > > > Hello,
>> > > > >
>> > > > > I use MPI RMA in my program, but the program stops at
>> > > > > MPI_Win_fence. I have a master process that receives data from a
>> > > > > UDP socket; the other processes use MPI_Get to access the data.
>> > > > >
>> > > > > master process:
>> > > > >
>> > > > > MPI_Win_create(...)
>> > > > > for(...){
>> > > > > /* udp recv operation */
>> > > > >
>> > > > > MPI_Barrier // to let other process know data received from udp
>> is
>> > > ready
>> > > > >
>> > > > > MPI_Win_fence(0, win);
>> > > > > MPI_Win_fence(0, win);
>> > > > >
>> > > > > }
>> > > > >
>> > > > > other processes:
>> > > > >
>> > > > > for(...){
>> > > > >
>> > > > > MPI_Barrier // sync for udp data ready
>> > > > >
>> > > > > MPI_Win_fence(0, win);
>> > > > >
>> > > > > MPI_Get();
>> > > > >
>> > > > > MPI_Win_fence(0, win); <-- program stopped here
>> > > > >
>> > > > > /* other operation */
>> > > > > }
>> > > > >
>> > > > > I found that the program stopped at the second MPI_Win_fence; the
>> > > > > terminal output is:
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > >
>> > >
>> >
>> ===================================================================================
>> > > > > = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>> > > > > = EXIT CODE: 11
>> > > > > = CLEANING UP REMAINING PROCESSES
>> > > > > = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>> > > > >
>> > > > >
>> > > >
>> > >
>> >
>> ===================================================================================
>> > > > > YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation
>> fault
>> > > > > (signal 11)
>> > > > > This typically refers to a problem with your application.
>> > > > > Please see the FAQ page for debugging suggestions
>> > > > >
>> > > > > Do you have any suggestions? Thank you very much!
>> > > > >
>> > > > > --
>> > > > > Best Regards,
>> > > > > Sufeng Niu
>> > > > > ECASP lab, ECE department, Illinois Institute of Technology
>> > > > > Tel: 312-731-7219
>> > > > > -------------- next part --------------
>> > > > > An HTML attachment was scrubbed...
>> > > > > URL: <
>> > > > >
>> > > >
>> > >
>> >
>> http://lists.mpich.org/pipermail/discuss/attachments/20130710/375a95ac/attachment-0001.html
>> > > > > >
>> > > > >
>> > > > > ------------------------------
>> > > > >
>> > > > > Message: 8
>> > > > > Date: Wed, 10 Jul 2013 11:12:45 -0400
>> > > > > From: Jim Dinan <james.dinan at gmail.com>
>> > > > > To: discuss at mpich.org
>> > > > > Subject: Re: [mpich-discuss] MPI_Win_fence failed
>> > > > > Message-ID:
>> > > > > <CAOoEU4F3hX=y3yrJKYKucNeiueQYBeR_3OQas9E+mg+GM6Rz=
>> > > > > w at mail.gmail.com>
>> > > > > Content-Type: text/plain; charset="iso-8859-1"
>> > > > >
>> > > > > It's hard to tell where the segmentation fault is coming from. Can
>> > > > > you use a debugger to generate a backtrace?
>> > > > >
>> > > > > ~Jim.
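>> > > > >
>> > > > > For comparison while debugging, here is a minimal sketch of the
>> > > > > fence/Get pattern in question, trimmed to one window of NDATA ints
>> > > > > on rank 0. NDATA, the int element type, and the even split of the
>> > > > > window across the non-master ranks are assumptions, not the real
>> > > > > code:
>> > > > >
>> > > > > #include <mpi.h>
>> > > > > #include <stdio.h>
>> > > > > #include <stdlib.h>
>> > > > >
>> > > > > #define NDATA 1024                /* assumed window length in ints */
>> > > > >
>> > > > > int main(int argc, char **argv)
>> > > > > {
>> > > > >     int rank, size, *buf = NULL;
>> > > > >     MPI_Win win;
>> > > > >
>> > > > >     MPI_Init(&argc, &argv);
>> > > > >     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>> > > > >     MPI_Comm_size(MPI_COMM_WORLD, &size);
>> > > > >     if (size < 2)
>> > > > >         MPI_Abort(MPI_COMM_WORLD, 1);
>> > > > >
>> > > > >     if (rank == 0) {
>> > > > >         /* master: expose NDATA ints (filled from UDP in the real code) */
>> > > > >         buf = malloc(NDATA * sizeof(int));
>> > > > >         for (int i = 0; i < NDATA; i++) buf[i] = i;
>> > > > >         MPI_Win_create(buf, NDATA * sizeof(int), sizeof(int),
>> > > > >                        MPI_INFO_NULL, MPI_COMM_WORLD, &win);
>> > > > >     } else {
>> > > > >         /* other ranks expose nothing but must join the same window */
>> > > > >         MPI_Win_create(NULL, 0, sizeof(int),
>> > > > >                        MPI_INFO_NULL, MPI_COMM_WORLD, &win);
>> > > > >     }
>> > > > >
>> > > > >     MPI_Win_fence(0, win);            /* all ranks open the epoch */
>> > > > >
>> > > > >     if (rank != 0) {
>> > > > >         int chunk = NDATA / (size - 1);
>> > > > >         MPI_Aint disp = (MPI_Aint)(rank - 1) * chunk;
>> > > > >         /* disp + chunk never exceeds NDATA, so the target-side copy
>> > > > >            stays inside the window */
>> > > > >         int *local = malloc(chunk * sizeof(int));
>> > > > >         MPI_Get(local, chunk, MPI_INT, 0, disp, chunk, MPI_INT, win);
>> > > > >         MPI_Win_fence(0, win);        /* the Get completes only here */
>> > > > >         printf("rank %d got %d ints starting at element %ld\n",
>> > > > >                rank, chunk, (long)disp);
>> > > > >         free(local);
>> > > > >     } else {
>> > > > >         MPI_Win_fence(0, win);        /* master matches the fence */
>> > > > >     }
>> > > > >
>> > > > >     MPI_Win_free(&win);
>> > > > >     if (rank == 0) free(buf);
>> > > > >     MPI_Finalize();
>> > > > >     return 0;
>> > > > > }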
>> > > > >
>> > > > >
>> > > > > On Wed, Jul 10, 2013 at 11:07 AM, Sufeng Niu <sniu at hawk.iit.edu>
>> > > wrote:
>> > > > >
>> > > > > > Hello,
>> > > > > >
>> > > > > > I use MPI RMA in my program, but the program stops at
>> > > > > > MPI_Win_fence. I have a master process that receives data from a
>> > > > > > UDP socket; the other processes use MPI_Get to access the data.
>> > > > > >
>> > > > > > master process:
>> > > > > >
>> > > > > > MPI_Win_create(...)
>> > > > > > for(...){
>> > > > > > /* udp recv operation */
>> > > > > >
>> > > > > > MPI_Barrier // to let other process know data received from
>> udp is
>> > > > ready
>> > > > > >
>> > > > > > MPI_Win_fence(0, win);
>> > > > > > MPI_Win_fence(0, win);
>> > > > > >
>> > > > > > }
>> > > > > >
>> > > > > > other processes:
>> > > > > >
>> > > > > > for(...){
>> > > > > >
>> > > > > > MPI_Barrier // sync for udp data ready
>> > > > > >
>> > > > > > MPI_Win_fence(0, win);
>> > > > > >
>> > > > > > MPI_Get();
>> > > > > >
>> > > > > > MPI_Win_fence(0, win); <-- program stopped here
>> > > > > >
>> > > > > > /* other operation */
>> > > > > > }
>> > > > > >
>> > > > > > I found that the program stopped at the second MPI_Win_fence; the
>> > > > > > terminal output is:
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> ===================================================================================
>> > > > > > = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>> > > > > > = EXIT CODE: 11
>> > > > > > = CLEANING UP REMAINING PROCESSES
>> > > > > > = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>> > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> ===================================================================================
>> > > > > > YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation
>> > fault
>> > > > > > (signal 11)
>> > > > > > This typically refers to a problem with your application.
>> > > > > > Please see the FAQ page for debugging suggestions
>> > > > > >
>> > > > > > Do you have any suggestions? Thank you very much!
>> > > > > >
>> > > > > > --
>> > > > > > Best Regards,
>> > > > > > Sufeng Niu
>> > > > > > ECASP lab, ECE department, Illinois Institute of Technology
>> > > > > > Tel: 312-731-7219
>> > > > > >
>> > > > > > _______________________________________________
>> > > > > > discuss mailing list discuss at mpich.org
>> > > > > > To manage subscription options or unsubscribe:
>> > > > > > https://lists.mpich.org/mailman/listinfo/discuss
>> > > > > >
>> > > > > -------------- next part --------------
>> > > > > An HTML attachment was scrubbed...
>> > > > > URL: <
>> > > > >
>> > > >
>> > >
>> >
>> http://lists.mpich.org/pipermail/discuss/attachments/20130710/48c5f337/attachment.html
>> > > > > >
>> > > > >
>> > > > > ------------------------------
>> > > > >
>> > > > > _______________________________________________
>> > > > > discuss mailing list
>> > > > > discuss at mpich.org
>> > > > > https://lists.mpich.org/mailman/listinfo/discuss
>> > > > >
>> > > > > End of discuss Digest, Vol 9, Issue 27
>> > > > > **************************************
>> > > > >
>> > > >
>> > > >
>> > > >
>> > > > --
>> > > > Best Regards,
>> > > > Sufeng Niu
>> > > > ECASP lab, ECE department, Illinois Institute of Technology
>> > > > Tel: 312-731-7219
>> > > > -------------- next part --------------
>> > > > An HTML attachment was scrubbed...
>> > > > URL: <
>> > > >
>> > >
>> >
>> http://lists.mpich.org/pipermail/discuss/attachments/20130710/57a5e76f/attachment.html
>> > > > >
>> > > >
>> > > > ------------------------------
>> > > >
>> > > > _______________________________________________
>> > > > discuss mailing list
>> > > > discuss at mpich.org
>> > > > https://lists.mpich.org/mailman/listinfo/discuss
>> > > >
>> > > > End of discuss Digest, Vol 9, Issue 28
>> > > > **************************************
>> > > >
>> > >
>> > >
>> > >
>> > > --
>> > > Best Regards,
>> > > Sufeng Niu
>> > > ECASP lab, ECE department, Illinois Institute of Technology
>> > > Tel: 312-731-7219
>> > > -------------- next part --------------
>> > > An HTML attachment was scrubbed...
>> > > URL: <
>> > >
>> >
>> http://lists.mpich.org/pipermail/discuss/attachments/20130710/48296a33/attachment.html
>> > > >
>> > > -------------- next part --------------
>> > > A non-text attachment was scrubbed...
>> > > Name: Screenshot.png
>> > > Type: image/png
>> > > Size: 131397 bytes
>> > > Desc: not available
>> > > URL: <
>> > >
>> >
>> http://lists.mpich.org/pipermail/discuss/attachments/20130710/48296a33/attachment.png
>> > > >
>> > >
>> > > ------------------------------
>> > >
>> > > _______________________________________________
>> > > discuss mailing list
>> > > discuss at mpich.org
>> > > https://lists.mpich.org/mailman/listinfo/discuss
>> > >
>> > > End of discuss Digest, Vol 9, Issue 29
>> > > **************************************
>> > >
>> >
>> >
>> >
>> > --
>> > Best Regards,
>> > Sufeng Niu
>> > ECASP lab, ECE department, Illinois Institute of Technology
>> > Tel: 312-731-7219
>> > -------------- next part --------------
>> > An HTML attachment was scrubbed...
>> > URL: <
>> >
>> http://lists.mpich.org/pipermail/discuss/attachments/20130710/7c5cb5bf/attachment.html
>> > >
>> >
>> > ------------------------------
>> >
>> > _______________________________________________
>> > discuss mailing list
>> > discuss at mpich.org
>> > https://lists.mpich.org/mailman/listinfo/discuss
>> >
>> > End of discuss Digest, Vol 9, Issue 30
>> > **************************************
>> >
>>
>>
>>
>> --
>> Best Regards,
>> Sufeng Niu
>> ECASP lab, ECE department, Illinois Institute of Technology
>> Tel: 312-731-7219
>> -------------- next part --------------
>> An HTML attachment was scrubbed...
>> URL: <
>> http://lists.mpich.org/pipermail/discuss/attachments/20130710/2de2b7a5/attachment.html
>> >
>>
>> ------------------------------
>>
>> _______________________________________________
>> discuss mailing list
>> discuss at mpich.org
>> https://lists.mpich.org/mailman/listinfo/discuss
>>
>> End of discuss Digest, Vol 9, Issue 31
>> **************************************
>>
>
>
>
> --
> Best Regards,
> Sufeng Niu
> ECASP lab, ECE department, Illinois Institute of Technology
> Tel: 312-731-7219
>
> _______________________________________________
> discuss mailing list discuss at mpich.org
> To manage subscription options or unsubscribe:
> https://lists.mpich.org/mailman/listinfo/discuss
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mpich.org/pipermail/discuss/attachments/20130710/81d23378/attachment.html>
More information about the discuss mailing list