[mpich-discuss] MPI_Win_fence failed

Jim Dinan james.dinan at gmail.com
Wed Jul 10 17:18:32 CDT 2013


I did a quick grep of your code.  The following looks like it could be a
bug:

---8<---

$ grep MPI_Win_create *
...
udp_server.c: MPI_Win_create(image_buff, 2*image_info->image_size*sizeof(uint16), sizeof(uint16), MPI_INFO_NULL, MPI_COMM_WORLD, win);
$ grep MPI_Get *
...
rms.c: MPI_Get(strip_buff, image_info->buffer_size, MPI_INT, 0, (rank-1)*image_info->buffer_size, image_info->buffer_size, MPI_INT, *win);

---8<---

The window size is "2*image_info->image_size*sizeof(uint16)" and the
displacement is "(rank-1)*image_info->buffer_size".  The displacements
assume a window whose size scales with the number of ranks, but the window
has a fixed size.  It looks like this would cause your gets to wander
outside of the exposed buffer at the target.
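
For reference, here is a minimal sketch of the layout those displacements seem
to assume: one buffer_size slice of uint16 per non-master rank, with the window
sized to cover all of them.  The names mirror the grep output above, but the
per-rank slice layout and the use of MPI_UINT16_T on both sides are
assumptions, not something taken from your code.

---8<---

#include <mpi.h>
#include <stdint.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* hypothetical stand-in for image_info->buffer_size */
    const int buffer_size = 1024;

    /* rank 0 exposes one buffer_size slice of uint16 per other rank,
     * so a displacement of (rank-1)*buffer_size always stays in range */
    uint16_t *image_buff = NULL;
    MPI_Aint win_bytes = 0;
    if (rank == 0) {
        win_bytes = (MPI_Aint)(nprocs - 1) * buffer_size * sizeof(uint16_t);
        image_buff = malloc(win_bytes);
    }

    MPI_Win win;
    MPI_Win_create(image_buff, win_bytes, sizeof(uint16_t),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);
    uint16_t *strip_buff = NULL;
    if (rank != 0) {
        strip_buff = malloc(buffer_size * sizeof(uint16_t));
        /* same element type on both sides, and
         * disp + count stays within the exposed window */
        MPI_Get(strip_buff, buffer_size, MPI_UINT16_T, 0,
                (MPI_Aint)(rank - 1) * buffer_size,
                buffer_size, MPI_UINT16_T, win);
    }
    MPI_Win_fence(0, win);

    MPI_Win_free(&win);
    free(strip_buff);
    free(image_buff);
    MPI_Finalize();
    return 0;
}

---8<---

Note that the original MPI_Get fetches MPI_INT elements through a window whose
displacement unit is sizeof(uint16); if the data really is uint16, keeping the
same element type on both sides, as in the sketch, keeps the displacement
arithmetic consistent.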

 ~Jim.


On Wed, Jul 10, 2013 at 6:11 PM, Jim Dinan <james.dinan at gmail.com> wrote:

> From that backtrace, it looks like the displacement/datatype that you gave
> in the call to MPI_Get() caused the target process to access an invalid
> location in memory.  MPICH does not check whether the window accesses at a
> process targeted by RMA operations are constrained to the window.  I would
> start by making sure that your gets are contained within the window at the
> target process.
>
>  ~Jim.
>
>
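One way to act on that advice is to have the window owner broadcast the size it
actually exposed and have every origin assert that a planned get fits before
issuing it.  The sketch below is standalone debugging code rather than part of
the posted program; image_size and buffer_size are stand-in values, and the
broadcast of the window size is an assumption (the posted code does not do
this), but the check mirrors the access pattern from the udp_server.c / rms.c
lines above.

---8<---

#include <mpi.h>
#include <assert.h>
#include <stdint.h>

/* Check that a contiguous get of `count` elements of `elem_bytes` each,
 * at displacement `target_disp` (in units of `disp_unit`), fits inside a
 * window of `win_bytes` bytes. */
static void check_get_fits(MPI_Aint win_bytes, int disp_unit,
                           MPI_Aint target_disp, int count, int elem_bytes)
{
    MPI_Aint end = target_disp * disp_unit + (MPI_Aint)count * elem_bytes;
    assert(end <= win_bytes && "MPI_Get would run past the target window");
}

int main(int argc, char **argv)
{
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const MPI_Aint image_size  = 4096;   /* stand-in values */
    const int      buffer_size = 1024;

    /* the size the owner exposed, as in udp_server.c */
    MPI_Aint win_bytes = 0;
    if (rank == 0)
        win_bytes = 2 * image_size * sizeof(uint16_t);
    MPI_Bcast(&win_bytes, 1, MPI_AINT, 0, MPI_COMM_WORLD);

    if (rank != 0) {
        /* the access pattern from rms.c: MPI_INT elements at a
         * displacement measured in uint16 units */
        check_get_fits(win_bytes, sizeof(uint16_t),
                       (MPI_Aint)(rank - 1) * buffer_size,
                       buffer_size, sizeof(int));
    }

    MPI_Finalize();
    return 0;
}

---8<---
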
> On Wed, Jul 10, 2013 at 1:33 PM, Sufeng Niu <sniu at hawk.iit.edu> wrote:
>
>> Hi Jeff
>>
>> Sorry for sending so many emails and cluttering the discuss group.
>>
>> I found that the scientific image is too large to upload to GitHub, so I
>> put it on the ftp:
>> ftp://ftp.xray.aps.anl.gov/pub/sector8/ (the file is 55Fe_run5_dark.tif).
>>
>> Just put the tif file next to the source code. Sorry again for the frequent
>> email broadcasts, and thank you so much for your debugging help.
>>
>> Sufeng
>>
>>
>> On Wed, Jul 10, 2013 at 12:08 PM, <discuss-request at mpich.org> wrote:
>>
>>>
>>> Message: 1
>>> Date: Wed, 10 Jul 2013 12:05:09 -0500
>>> From: Jeff Hammond <jeff.science at gmail.com>
>>> To: discuss at mpich.org
>>> Subject: Re: [mpich-discuss] MPI_Win_fence failed
>>>
>>> use dropbox, pastebin, etc. for attachments.  it makes life a lot
>>> easier for everyone.
>>>
>>> jeff
>>>
>>> On Wed, Jul 10, 2013 at 11:57 AM, Sufeng Niu <sniu at hawk.iit.edu> wrote:
>>> > Sorry, I found that this discussion list cannot take figures or attachments.
>>> >
>>> > The backtrace information is below:
>>> >
>>> > processes                 Location                       PC            Host           Rank  ID          Status
>>> > 7                         _start                         0x00402399
>>> > `-7                       _libc_start_main               0x3685c1ecdd
>>> >   `-7                     main                           0x00402474
>>> >     `-7                   dkm                            ...
>>> >       |-6                 image_rms                      0x004029bb
>>> >       | `-6               rms                            0x00402d44
>>> >       |   `-6             PMPI_Win_fence                 0x0040c389
>>> >       |     `-6           MPIDI_Win_fence                0x004a45f4
>>> >       |       `-6         MPIDI_CH3I_RMAListComplete     0x004a27d3
>>> >       |         `-6       MPIDI_CH3I_Progress            ...
>>> >       `-1                 udp                            0x004035cf
>>> >         `-1               PMPI_Win_fence                 0x0040c389
>>> >           `-1             MPIDI_Win_fence                0x004a45a0
>>> >             `-1           MPIDI_CH3I_Progress            0x004292f5
>>> >               `-1         MPIDI_CH3_PktHandler_Get       0x0049f3f9
>>> >                 `-1       MPIDI_CH3_iSendv               0x004aa67c
>>> >                   `-      memcpy                         0x3685c89329  164.54.54.122  0     20.1-13994  Stopped
>>> >
>>> >
>>> >
>>> > On Wed, Jul 10, 2013 at 11:39 AM, <discuss-request at mpich.org> wrote:
>>> >>
>>> >>
>>> >> Message: 1
>>> >> Date: Wed, 10 Jul 2013 11:39:39 -0500
>>> >>
>>> >> From: Sufeng Niu <sniu at hawk.iit.edu>
>>> >> To: discuss at mpich.org
>>> >> Subject: Re: [mpich-discuss] MPI_Win_fence failed
>>> >>
>>> >>
>>> >> Sorry, I forgot to add the screenshot of the backtrace; it is attached.
>>> >>
>>> >> Thanks a lot!
>>> >>
>>> >> Sufeng
>>> >>
>>> >>
>>> >>
>>> >> On Wed, Jul 10, 2013 at 11:30 AM, <discuss-request at mpich.org> wrote:
>>> >>
>>> >> >
>>> >> > Message: 1
>>> >> > Date: Wed, 10 Jul 2013 11:30:36 -0500
>>> >> > From: Sufeng Niu <sniu at hawk.iit.edu>
>>> >> > To: discuss at mpich.org
>>> >> > Subject: Re: [mpich-discuss] MPI_Win_fence failed
>>> >> >
>>> >> > Hi Jim,
>>> >> >
>>> >> > Thanks a lot for your reply. My basic way of debugging is
>>> >> > barrier+printf; right now I only have an evaluation version of
>>> >> > TotalView. The backtrace from TotalView is shown below: udp does the
>>> >> > UDP collection and creates the RMA window, and image_rms does MPI_Get
>>> >> > to access the window.
>>> >> >
>>> >> > There is a segmentation violation, but I don't know why the program
>>> >> > stopped at MPI_Win_fence.
>>> >> >
>>> >> > Thanks a lot!
>>> >> >
>>> >> >
>>> >> >
>>> >> >
>>> >> >
>>> >> >
>>> >> >
>>> >> > On Wed, Jul 10, 2013 at 10:12 AM, <discuss-request at mpich.org>
>>> wrote:
>>> >> >
>>> >> > >
>>> >> > > ------------------------------
>>> >> > >
>>> >> > > Message: 7
>>> >> > > Date: Wed, 10 Jul 2013 10:07:21 -0500
>>> >> > > From: Sufeng Niu <sniu at hawk.iit.edu>
>>> >> > > To: discuss at mpich.org
>>> >> > > Subject: [mpich-discuss] MPI_Win_fence failed
>>> >> > >
>>> >> > > Hello,
>>> >> > >
>>> >> > > I am using MPI RMA in my program, but the program stops at
>>> >> > > MPI_Win_fence. I have a master process that receives data from a UDP
>>> >> > > socket; the other processes use MPI_Get to access the data.
>>> >> > >
>>> >> > > master process:
>>> >> > >
>>> >> > > MPI_Win_create(...)
>>> >> > > for(...){
>>> >> > > /* udp recv operation */
>>> >> > >
>>> >> > > MPI_Barrier  // to let other processes know data received from udp is ready
>>> >> > >
>>> >> > > MPI_Win_fence(0, win);
>>> >> > > MPI_Win_fence(0, win);
>>> >> > >
>>> >> > > }
>>> >> > >
>>> >> > > other processes:
>>> >> > >
>>> >> > > for(...){
>>> >> > >
>>> >> > > MPI_Barrier  // sync for udp data ready
>>> >> > >
>>> >> > > MPI_Win_fence(0, win);
>>> >> > >
>>> >> > > MPI_Get();
>>> >> > >
>>> >> > > MPI_Win_fence(0, win);  <-- program stopped here
>>> >> > >
>>> >> > > /* other operation */
>>> >> > > }
>>> >> > >
>>> >> > > I found that the program stopped at the second MPI_Win_fence; the terminal
>>> >> > > output is:
>>> >> > >
>>> >> > > ===================================================================================
>>> >> > > =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>>> >> > > =   EXIT CODE: 11
>>> >> > > =   CLEANING UP REMAINING PROCESSES
>>> >> > > =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>>> >> > > ===================================================================================
>>> >> > > YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
>>> >> > > This typically refers to a problem with your application.
>>> >> > > Please see the FAQ page for debugging suggestions
>>> >> > >
>>> >> > > Do you have any suggestions? Thank you very much!
>>> >> > >
>>> >> > > --
>>> >> > > Best Regards,
>>> >> > > Sufeng Niu
>>> >> > > ECASP lab, ECE department, Illinois Institute of Technology
>>> >> > > Tel: 312-731-7219
>>> >> > >
>>> >> > > ------------------------------
>>> >> > >
>>> >> > > Message: 8
>>> >> > > Date: Wed, 10 Jul 2013 11:12:45 -0400
>>> >> > > From: Jim Dinan <james.dinan at gmail.com>
>>> >> > > To: discuss at mpich.org
>>> >> > > Subject: Re: [mpich-discuss] MPI_Win_fence failed
>>> >> > >
>>> >> > > It's hard to tell where the segmentation fault is coming from. Can
>>> >> > > you use a debugger to generate a backtrace?
>>> >> > >
>>> >> > >  ~Jim.
>>> >> > >
>>> >> > >
>>> >> > >
>>> >> > >
>>> >> >
>>> >> >
>>> >> >
>>> >> > --
>>> >> > Best Regards,
>>> >> > Sufeng Niu
>>> >> > ECASP lab, ECE department, Illinois Institute of Technology
>>> >> > Tel: 312-731-7219
>>> >> >
>>> >> >
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Best Regards,
>>> >> Sufeng Niu
>>> >> ECASP lab, ECE department, Illinois Institute of Technology
>>> >> Tel: 312-731-7219
>>> >>
>>> >>
>>> >
>>> >
>>> >
>>> >
>>> > --
>>> > Best Regards,
>>> > Sufeng Niu
>>> > ECASP lab, ECE department, Illinois Institute of Technology
>>> > Tel: 312-731-7219
>>> >
>>>
>>>
>>>
>>> --
>>> Jeff Hammond
>>> jeff.science at gmail.com
>>>
>>>
>>> ------------------------------
>>>
>>> Message: 2
>>> Date: Wed, 10 Jul 2013 12:08:19 -0500
>>> From: Sufeng Niu <sniu at hawk.iit.edu>
>>> To: discuss at mpich.org
>>> Subject: Re: [mpich-discuss] MPI_Win_fence failed
>>>
>>> Oh yeah, that would be an easier way. I just created a repository on
>>> GitHub; you can
>>> git clone https://github.com/sufengniu/mpi_app_test.git
>>>
>>> To run the program you need to install a tiff library (on Ubuntu it is
>>> sudo apt-get install libtiff4-dev).
>>> After you download it, just run make;
>>> there will be 2 binaries.
>>>
>>> Please change the hostfile to your machine. First start the MPI side: ./run.perl main
>>>
>>> then run ./udp_client 55Fe_run5_dark.tif
>>>
>>> Thanks a lot!
>>> Sufeng
>>>
>>>
>>>
>>> On Wed, Jul 10, 2013 at 11:57 AM, <discuss-request at mpich.org> wrote:
>>>
>>> >
>>> > Message: 1
>>> > Date: Wed, 10 Jul 2013 11:46:08 -0500
>>> > From: Jeff Hammond <jeff.science at gmail.com>
>>> > To: discuss at mpich.org
>>> > Subject: Re: [mpich-discuss] MPI_Win_fence failed
>>> >
>>> > Just post the code so we can run it.
>>> >
>>> > Jeff
>>> >
>>> > On Wed, Jul 10, 2013 at 11:39 AM, Sufeng Niu <sniu at hawk.iit.edu>
>>> wrote:
>>> > > Sorry I forget to add screen shot for backtrace. the screen shot is
>>> > > attached.
>>> > >
>>> > > Thanks a lot!
>>> > >
>>> > > Sufeng
>>> > >
>>> > >
>>> > >
>>> > > On Wed, Jul 10, 2013 at 11:30 AM, <discuss-request at mpich.org> wrote:
>>> > >>
>>> > >> Send discuss mailing list submissions to
>>> > >>         discuss at mpich.org
>>> > >>
>>> > >> To subscribe or unsubscribe via the World Wide Web, visit
>>> > >>         https://lists.mpich.org/mailman/listinfo/discuss
>>> > >> or, via email, send a message with subject or body 'help' to
>>> > >>         discuss-request at mpich.org
>>> > >>
>>> > >> You can reach the person managing the list at
>>> > >>         discuss-owner at mpich.org
>>> > >>
>>> > >> When replying, please edit your Subject line so it is more specific
>>> > >> than "Re: Contents of discuss digest..."
>>> > >>
>>> > >>
>>> > >> Today's Topics:
>>> > >>
>>> > >>    1. Re:  MPI_Win_fence failed (Sufeng Niu)
>>> > >>
>>> > >>
>>> > >>
>>> ----------------------------------------------------------------------
>>> > >>
>>> > >> Message: 1
>>> > >> Date: Wed, 10 Jul 2013 11:30:36 -0500
>>> > >> From: Sufeng Niu <sniu at hawk.iit.edu>
>>> > >> To: discuss at mpich.org
>>> > >> Subject: Re: [mpich-discuss] MPI_Win_fence failed
>>> > >> Message-ID:
>>> > >>
>>> > >> <CAFNNHkyLj8CbYMmc_w2DA9_+q2Oe3kyus+g6c99ShPk6ZXVkdA at mail.gmail.com
>>> >
>>> > >> Content-Type: text/plain; charset="iso-8859-1"
>>> > >>
>>> > >>
>>> > >> Hi Jim,
>>> > >>
>>> > >> Thanks a lot for your reply. the basic way for me to debugging is
>>> > >> barrier+printf, right now I only have an evaluation version of
>>> > totalview.
>>> > >> the backtrace using totalview shown below. the udp is the udp
>>> collection
>>> > >> and create RMA window, image_rms doing MPI_Get to access the window
>>> > >>
>>> > >>  There is a segment violation, but I don't know why the program
>>> stopped
>>> > at
>>> > >> MPI_Win_fence.
>>> > >>
>>> > >> Thanks a lot!
>>> > >>
>>> > >>
>>> > >>
>>> > >>
>>> > >>
>>> > >>
>>> > >>
>>> > >> On Wed, Jul 10, 2013 at 10:12 AM, <discuss-request at mpich.org>
>>> wrote:
>>> > >>
>>> > >> > Send discuss mailing list submissions to
>>> > >> >         discuss at mpich.org
>>> > >> >
>>> > >> > To subscribe or unsubscribe via the World Wide Web, visit
>>> > >> >         https://lists.mpich.org/mailman/listinfo/discuss
>>> > >> > or, via email, send a message with subject or body 'help' to
>>> > >> >         discuss-request at mpich.org
>>> > >> >
>>> > >> > You can reach the person managing the list at
>>> > >> >         discuss-owner at mpich.org
>>> > >> >
>>> > >> > When replying, please edit your Subject line so it is more
>>> specific
>>> > >> > than "Re: Contents of discuss digest..."
>>> > >> >
>>> > >> >
>>> > >> > Today's Topics:
>>> > >> >
>>> > >> >    1. Re:  MPICH3.0.4 make fails with "No rule to make  target..."
>>> > >> >       (Wesley Bland)
>>> > >> >    2. Re:  Error in MPI_Finalize on a simple ring test  over TCP
>>> > >> >       (Wesley Bland)
>>> > >> >    3.  Restrict number of cores, not threads (Bob Ilgner)
>>> > >> >    4. Re:  Restrict number of cores, not threads (Wesley Bland)
>>> > >> >    5. Re:  Restrict number of cores, not threads (Wesley Bland)
>>> > >> >    6. Re:  Error in MPI_Finalize on a simple ring test over TCP
>>> > >> >       (Thomas Ropars)
>>> > >> >    7.  MPI_Win_fence failed (Sufeng Niu)
>>> > >> >    8. Re:  MPI_Win_fence failed (Jim Dinan)
>>> > >> >
>>> > >> >
>>> > >> >
>>> ----------------------------------------------------------------------
>>> > >> >
>>> > >> > Message: 1
>>> > >> > Date: Wed, 10 Jul 2013 08:29:06 -0500
>>> > >> > From: Wesley Bland <wbland at mcs.anl.gov>
>>> > >> > To: discuss at mpich.org
>>> > >> > Subject: Re: [mpich-discuss] MPICH3.0.4 make fails with "No rule
>>> to
>>> > >> >         make    target..."
>>> > >> > Message-ID: <F48FC916-31F7-4F82-95F8-2D6A6C45264F at mcs.anl.gov>
>>> > >> > Content-Type: text/plain; charset="iso-8859-1"
>>> > >> >
>>> > >> > Unfortunately, due to the lack of developer resources and
>>> interest,
>>> > the
>>> > >> > last version of MPICH which was supported on Windows was 1.4.1p.
>>> You
>>> > can
>>> > >> > find that version on the downloads page:
>>> > >> >
>>> > >> > http://www.mpich.org/downloads/
>>> > >> >
>>> > >> > Alternatively, Microsoft maintains a derivative of MPICH which
>>> should
>>> > >> > provide the features you need. You also find a link to that on the
>>> > >> > downloads page above.
>>> > >> >
>>> > >> > Wesley
>>> > >> >
>>> > >> > On Jul 10, 2013, at 1:16 AM, Don Warren <don.warren at gmail.com>
>>> wrote:
>>> > >> >
>>> > >> > > Hello,
>>> > >> > >
>>> > >> > > As requested in the installation guide, I'm informing this list
>>> of a
>>> > >> > failure to correctly make MPICH3.0.4 on a Win7 system.  The
>>> specific
>>> > >> > error
>>> > >> > encountered is
>>> > >> > > "make[2]: *** No rule to make target
>>> > >> > `/cygdrive/c/FLASH/mpich-3.0.4/src/mpi/romio/Makefile.am', needed
>>> by
>>> > >> > `/cygdrive/c/FLASH/mpich-3.0.4/src/mpi/romio/Makefile.in'.  Stop."
>>> > >> > >
>>> > >> > > I have confirmed that both Makefile.am and Makefile.in exist in
>>> the
>>> > >> > directory listed.  I'm attaching the c.txt and the m.txt files.
>>> > >> > >
>>> > >> > > Possibly of interest is that the command "make clean" fails at
>>> > exactly
>>> > >> > the same folder, with exactly the same error message as shown in
>>> m.txt
>>> > >> > and
>>> > >> > above.
>>> > >> > >
>>> > >> > > Any advice you can give would be appreciated.  I'm attempting
>>> to get
>>> > >> > FLASH running on my computer, which seems to require MPICH.
>>> > >> > >
>>> > >> > > Regards,
>>> > >> > > Don Warren
>>> > >> > >
>>> > >> > >
>>> >
>>> <config-make-outputs.zip>_______________________________________________
>>> > >>
>>> > >> > > discuss mailing list     discuss at mpich.org
>>> > >> > > To manage subscription options or unsubscribe:
>>> > >> > > https://lists.mpich.org/mailman/listinfo/discuss
>>> > >> >
>>> > >> > -------------- next part --------------
>>> > >> > An HTML attachment was scrubbed...
>>> > >> > URL: <
>>> > >> >
>>> > >> >
>>> >
>>> http://lists.mpich.org/pipermail/discuss/attachments/20130710/69b497f1/attachment-0001.html
>>> > >> > >
>>> > >> >
>>> > >> > ------------------------------
>>> > >> >
>>> > >> > Message: 2
>>> > >> > Date: Wed, 10 Jul 2013 08:39:47 -0500
>>> > >> > From: Wesley Bland <wbland at mcs.anl.gov>
>>> > >> > To: discuss at mpich.org
>>> > >> > Subject: Re: [mpich-discuss] Error in MPI_Finalize on a simple
>>> ring
>>> > >> >         test    over TCP
>>> > >> > Message-ID: <D5999106-2A75-4091-8B0F-EAFA22880862 at mcs.anl.gov>
>>> > >> > Content-Type: text/plain; charset=us-ascii
>>> > >> >
>>> > >> > The value of previous for rank 0 in your code is -1. MPICH is
>>> > >> > complaining
>>> > >> > because all of the requests to receive a message from -1 are still
>>> > >> > pending
>>> > >> > when you try to finalize. You need to make sure that you are
>>> receiving
>>> > >> > from
>>> > >> > valid ranks.
>>> > >> >
>>> > >> > On Jul 10, 2013, at 7:50 AM, Thomas Ropars <thomas.ropars at epfl.ch
>>> >
>>> > >> > wrote:
>>> > >> >
>>> > >> > > Yes sure. Here it is.
>>> > >> > >
>>> > >> > > Thomas
>>> > >> > >
>>> > >> > > On 07/10/2013 02:23 PM, Wesley Bland wrote:
>>> > >> > >> Can you send us the smallest chunk of code that still exhibits
>>> this
>>> > >> > error?
>>> > >> > >>
>>> > >> > >> Wesley
>>> > >> > >>
>>> > >> > >> On Jul 10, 2013, at 6:54 AM, Thomas Ropars <
>>> thomas.ropars at epfl.ch>
>>> > >> > wrote:
>>> > >> > >>
>>> > >> > >>> Hi all,
>>> > >> > >>>
>>> > >> > >>> I get the following error when I try to run a simple
>>> application
>>> > >> > implementing a ring (each process sends to rank+1 and receives
>>> from
>>> > >> > rank-1). More precisely, the error occurs during the call to
>>> > >> > MPI_Finalize():
>>> > >> > >>>
>>> > >> > >>> Assertion failed in file
>>> > >> > src/mpid/ch3/channels/nemesis/netmod/tcp/socksm.c at line 363:
>>> > >> > sc->pg_is_set
>>> > >> > >>> internal ABORT - process 0
>>> > >> > >>>
>>> > >> > >>> Does anybody else also noticed the same error?
>>> > >> > >>>
>>> > >> > >>> Here are all the details about my test:
>>> > >> > >>> - The error is generated with mpich-3.0.2 (but I noticed the
>>> exact
>>> > >> > same error with mpich-3.0.4)
>>> > >> > >>> - I am using IPoIB for communication between nodes (The same
>>> thing
>>> > >> > happens over Ethernet)
>>> > >> > >>> - The problem comes from TCP links. When all processes are on
>>> the
>>> > >> > >>> same
>>> > >> > node, there is no error. As soon as one process is on a remote
>>> node,
>>> > the
>>> > >> > failure occurs.
>>> > >> > >>> - Note also that the failure does not occur if I run a more
>>> > complex
>>> > >> > code (eg, a NAS benchmark).
>>> > >> > >>>
>>> > >> > >>> Thomas Ropars
>>> > >>
>>> > >> > >>> _______________________________________________
>>> > >> > >>> discuss mailing list     discuss at mpich.org
>>> > >> > >>> To manage subscription options or unsubscribe:
>>> > >> > >>> https://lists.mpich.org/mailman/listinfo/discuss
>>> > >> > >> _______________________________________________
>>> > >> > >> discuss mailing list     discuss at mpich.org
>>> > >> > >> To manage subscription options or unsubscribe:
>>> > >> > >> https://lists.mpich.org/mailman/listinfo/discuss
>>> > >> > >>
>>> > >> > >>
>>> > >> > >
>>> > >> > > <ring_clean.c>_______________________________________________
>>> > >>
>>> > >> > > discuss mailing list     discuss at mpich.org
>>> > >> > > To manage subscription options or unsubscribe:
>>> > >> > > https://lists.mpich.org/mailman/listinfo/discuss
>>> > >> >
>>> > >> >
>>> > >> >
>>> > >> > ------------------------------
>>> > >> >
>>> > >> > Message: 3
>>> > >> > Date: Wed, 10 Jul 2013 16:41:27 +0200
>>> > >> > From: Bob Ilgner <bobilgner at gmail.com>
>>> > >> > To: mpich-discuss at mcs.anl.gov
>>> > >> > Subject: [mpich-discuss] Restrict number of cores, not threads
>>> > >> > Message-ID:
>>> > >> >         <
>>> > >> >
>>> CAKv15b-QgmHkVkoiTFmP3EZXvyy6sc_QeqHQgbMUhnr3Xh9ecA at mail.gmail.com>
>>> > >> > Content-Type: text/plain; charset="iso-8859-1"
>>> > >> >
>>> > >> > Dear all,
>>> > >> >
>>> > >> > I am working on a shared memory processor with 256 cores. I am
>>> working
>>> > >> > from
>>> > >> > the command line directly.
>>> > >> >
>>> > >> > Can I restict the number of cores that I deploy.The command
>>> > >> >
>>> > >> > mpirun -n 100 myprog
>>> > >> >
>>> > >> >
>>> > >> > will automatically start on 100 cores. I wish to use only 10
>>> cores and
>>> > >> > have
>>> > >> > 10 threads on each core. Can I do this with mpich ?  Rememebre
>>> that
>>> > this
>>> > >> > an
>>> > >> > smp abd I can not identify each core individually(as in a cluster)
>>> > >> >
>>> > >> > Regards, bob
>>> > >> > -------------- next part --------------
>>> > >> > An HTML attachment was scrubbed...
>>> > >> > URL: <
>>> > >> >
>>> > >> >
>>> >
>>> http://lists.mpich.org/pipermail/discuss/attachments/20130710/ec659e91/attachment-0001.html
>>> > >> > >
>>> > >> >
>>> > >> > ------------------------------
>>> > >> >
>>> > >> > Message: 4
>>> > >> > Date: Wed, 10 Jul 2013 09:46:38 -0500
>>> > >> > From: Wesley Bland <wbland at mcs.anl.gov>
>>> > >> > To: discuss at mpich.org
>>> > >> > Cc: mpich-discuss at mcs.anl.gov
>>> > >> > Subject: Re: [mpich-discuss] Restrict number of cores, not threads
>>> > >> > Message-ID: <2FAF588E-2FBE-45E4-B53F-E6BC931E3D51 at mcs.anl.gov>
>>> > >> > Content-Type: text/plain; charset=iso-8859-1
>>> > >> >
>>> > >> > Threads in MPI are not ranks. When you say you want to launch
>>> with -n
>>> > >> > 100,
>>> > >> > you will always get 100 processes, not threads. If you want 10
>>> threads
>>> > >> > on
>>> > >> > 10 cores, you will need to launch with -n 10, then add your
>>> threads
>>> > >> > according to your threading library.
>>> > >> >
>>> > >> > Note that threads in MPI do not get their own rank currently.
>>> They all
>>> > >> > share the same rank as the process in which they reside, so if you
>>> > need
>>> > >> > to
>>> > >> > be able to handle things with different ranks, you'll need to use
>>> > actual
>>> > >> > processes.
>>> > >> >
>>> > >> > Wesley
>>> > >> >
>>> > >> > On Jul 10, 2013, at 9:41 AM, Bob Ilgner <bobilgner at gmail.com>
>>> wrote:
>>> > >> >
>>> > >> > > Dear all,
>>> > >> > >
>>> > >> > > I am working on a shared memory processor with 256 cores. I am
>>> > working
>>> > >> > from the command line directly.
>>> > >> > >
>>> > >> > > Can I restict the number of cores that I deploy.The command
>>> > >> > >
>>> > >> > > mpirun -n 100 myprog
>>> > >> > >
>>> > >> > >
>>> > >> > > will automatically start on 100 cores. I wish to use only 10
>>> cores
>>> > and
>>> > >> > have 10 threads on each core. Can I do this with mpich ?
>>>  Rememebre
>>> > that
>>> > >> > this an smp abd I can not identify each core individually(as in a
>>> > >> > cluster)
>>> > >> > >
>>> > >> > > Regards, bob
>>> > >>
>>> > >> > > _______________________________________________
>>> > >> > > discuss mailing list     discuss at mpich.org
>>> > >> > > To manage subscription options or unsubscribe:
>>> > >> > > https://lists.mpich.org/mailman/listinfo/discuss
>>> > >> >
>>> > >> >
>>> > >> >
>>> > >> > ------------------------------
>>> > >> >
>>> > >> > Message: 5
>>> > >> > Date: Wed, 10 Jul 2013 09:46:38 -0500
>>> > >> > From: Wesley Bland <wbland at mcs.anl.gov>
>>> > >> > To: discuss at mpich.org
>>> > >> > Cc: mpich-discuss at mcs.anl.gov
>>> > >> > Subject: Re: [mpich-discuss] Restrict number of cores, not threads
>>> > >> > Message-ID: <2FAF588E-2FBE-45E4-B53F-E6BC931E3D51 at mcs.anl.gov>
>>> > >> > Content-Type: text/plain; charset=iso-8859-1
>>> > >> >
>>> > >> > Threads in MPI are not ranks. When you say you want to launch
>>> with -n
>>> > >> > 100,
>>> > >> > you will always get 100 processes, not threads. If you want 10
>>> threads
>>> > >> > on
>>> > >> > 10 cores, you will need to launch with -n 10, then add your
>>> threads
>>> > >> > according to your threading library.
>>> > >> >
>>> > >> > Note that threads in MPI do not get their own rank currently.
>>> They all
>>> > >> > share the same rank as the process in which they reside, so if you
>>> > need
>>> > >> > to
>>> > >> > be able to handle things with different ranks, you'll need to use
>>> > actual
>>> > >> > processes.
>>> > >> >
>>> > >> > Wesley
>>> > >> >
>>> > >> > On Jul 10, 2013, at 9:41 AM, Bob Ilgner <bobilgner at gmail.com>
>>> wrote:
>>> > >> >
>>> > >> > > Dear all,
>>> > >> > >
>>> > >> > > I am working on a shared memory processor with 256 cores. I am
>>> > working
>>> > >> > from the command line directly.
>>> > >> > >
>>> > >> > > Can I restict the number of cores that I deploy.The command
>>> > >> > >
>>> > >> > > mpirun -n 100 myprog
>>> > >> > >
>>> > >> > >
>>> > >> > > will automatically start on 100 cores. I wish to use only 10
>>> cores
>>> > and
>>> > >> > have 10 threads on each core. Can I do this with mpich ?
>>>  Rememebre
>>> > that
>>> > >> > this an smp abd I can not identify each core individually(as in a
>>> > >> > cluster)
>>> > >> > >
>>> > >> > > Regards, bob
>>> > >>
>>> > >> > > _______________________________________________
>>> > >> > > discuss mailing list     discuss at mpich.org
>>> > >> > > To manage subscription options or unsubscribe:
>>> > >> > > https://lists.mpich.org/mailman/listinfo/discuss
>>> > >> >
>>> > >> >
>>> > >> >
>>> > >> > ------------------------------
>>> > >> >
>>> > >> > Message: 6
>>> > >> > Date: Wed, 10 Jul 2013 16:50:36 +0200
>>> > >> > From: Thomas Ropars <thomas.ropars at epfl.ch>
>>> > >> > To: discuss at mpich.org
>>> > >> > Subject: Re: [mpich-discuss] Error in MPI_Finalize on a simple
>>> ring
>>> > >> >         test over TCP
>>> > >> > Message-ID: <51DD74BC.3020009 at epfl.ch>
>>> > >> > Content-Type: text/plain; charset=UTF-8; format=flowed
>>> > >> >
>>> > >> > Yes, you are right, sorry for disturbing.
>>> > >> >
>>> > >> > On 07/10/2013 03:39 PM, Wesley Bland wrote:
>>> > >> > > The value of previous for rank 0 in your code is -1. MPICH is
>>> > >> > complaining because all of the requests to receive a message from
>>> -1
>>> > are
>>> > >> > still pending when you try to finalize. You need to make sure
>>> that you
>>> > >> > are
>>> > >> > receiving from valid ranks.
>>> > >> > >
>>> > >> > > On Jul 10, 2013, at 7:50 AM, Thomas Ropars <
>>> thomas.ropars at epfl.ch>
>>> > >> > wrote:
>>> > >> > >
>>> > >> > >> Yes sure. Here it is.
>>> > >> > >>
>>> > >> > >> Thomas
>>> > >> > >>
>>> > >> > >> On 07/10/2013 02:23 PM, Wesley Bland wrote:
>>> > >> > >>> Can you send us the smallest chunk of code that still exhibits
>>> > this
>>> > >> > error?
>>> > >> > >>>
>>> > >> > >>> Wesley
>>> > >> > >>>
>>> > >> > >>> On Jul 10, 2013, at 6:54 AM, Thomas Ropars <
>>> thomas.ropars at epfl.ch
>>> > >
>>> > >> > wrote:
>>> > >> > >>>
>>> > >> > >>>> Hi all,
>>> > >> > >>>>
>>> > >> > >>>> I get the following error when I try to run a simple
>>> application
>>> > >> > implementing a ring (each process sends to rank+1 and receives
>>> from
>>> > >> > rank-1). More precisely, the error occurs during the call to
>>> > >> > MPI_Finalize():
>>> > >> > >>>>
>>> > >> > >>>> Assertion failed in file
>>> > >> > src/mpid/ch3/channels/nemesis/netmod/tcp/socksm.c at line 363:
>>> > >> > sc->pg_is_set
>>> > >> > >>>> internal ABORT - process 0
>>> > >> > >>>>
>>> > >> > >>>> Does anybody else also noticed the same error?
>>> > >> > >>>>
>>> > >> > >>>> Here are all the details about my test:
>>> > >> > >>>> - The error is generated with mpich-3.0.2 (but I noticed the
>>> > exact
>>> > >> > same error with mpich-3.0.4)
>>> > >> > >>>> - I am using IPoIB for communication between nodes (The same
>>> > thing
>>> > >> > happens over Ethernet)
>>> > >> > >>>> - The problem comes from TCP links. When all processes are
>>> on the
>>> > >> > same node, there is no error. As soon as one process is on a
>>> remote
>>> > >> > node,
>>> > >> > the failure occurs.
>>> > >> > >>>> - Note also that the failure does not occur if I run a more
>>> > complex
>>> > >> > code (eg, a NAS benchmark).
>>> > >> > >>>>
>>> > >> > >>>> Thomas Ropars
>>> > >>
>>> > >> > >>>> _______________________________________________
>>> > >> > >>>> discuss mailing list     discuss at mpich.org
>>> > >> > >>>> To manage subscription options or unsubscribe:
>>> > >> > >>>> https://lists.mpich.org/mailman/listinfo/discuss
>>> > >> > >>> _______________________________________________
>>> > >> > >>> discuss mailing list     discuss at mpich.org
>>> > >> > >>> To manage subscription options or unsubscribe:
>>> > >> > >>> https://lists.mpich.org/mailman/listinfo/discuss
>>> > >> > >>>
>>> > >> > >>>
>>> > >> > >> <ring_clean.c>_______________________________________________
>>> > >>
>>> > >> > >> discuss mailing list     discuss at mpich.org
>>> > >> > >> To manage subscription options or unsubscribe:
>>> > >> > >> https://lists.mpich.org/mailman/listinfo/discuss
>>> > >> > > _______________________________________________
>>> > >> > > discuss mailing list     discuss at mpich.org
>>> > >> > > To manage subscription options or unsubscribe:
>>> > >> > > https://lists.mpich.org/mailman/listinfo/discuss
>>> > >> > >
>>> > >> > >
>>> > >> >
>>> > >> >
>>> > >> >
>>> > >> > ------------------------------
>>> > >> >
>>> > >> > Message: 7
>>> > >> > Date: Wed, 10 Jul 2013 10:07:21 -0500
>>> > >> > From: Sufeng Niu <sniu at hawk.iit.edu>
>>> > >> > To: discuss at mpich.org
>>> > >> > Subject: [mpich-discuss] MPI_Win_fence failed
>>> > >> > Message-ID:
>>> > >> >         <
>>> > >> >
>>> CAFNNHkz_1gC7hfpx0G9j24adO-gDabdmwZ4VuT6jip-fDMhS9A at mail.gmail.com>
>>> > >> > Content-Type: text/plain; charset="iso-8859-1"
>>> > >>
>>> > >> >
>>> > >> > Hello,
>>> > >> >
>>> > >> > I used MPI RMA in my program, but the program stop at the
>>> > MPI_Win_fence,
>>> > >> > I
>>> > >> > have a master process receive data from udp socket. Other
>>> processes
>>> > use
>>> > >> > MPI_Get to access data.
>>> > >> >
>>> > >> > master process:
>>> > >> >
>>> > >> > MPI_Create(...)
>>> > >> > for(...){
>>> > >> > /* udp recv operation */
>>> > >> >
>>> > >> > MPI_Barrier  // to let other process know data received from udp
>>> is
>>> > >> > ready
>>> > >> >
>>> > >> > MPI_Win_fence(0, win);
>>> > >> > MPI_Win_fence(0, win);
>>> > >> >
>>> > >> > }
>>> > >> >
>>> > >> > other processes:
>>> > >> >
>>> > >> > for(...){
>>> > >> >
>>> > >> > MPI_Barrier  // sync for udp data ready
>>> > >> >
>>> > >> > MPI_Win_fence(0, win);
>>> > >> >
>>> > >> > MPI_Get();
>>> > >> >
>>> > >> > MPI_Win_fence(0, win);  <-- program stopped here
>>> > >> >
>>> > >> > /* other operation */
>>> > >> > }
>>> > >> >
>>> > >> > I found that the program stopped at the second MPI_Win_fence; the
>>> > >> > terminal output is:
>>> > >> >
>>> > >> >
>>> > >> >
>>> > >> >
>>> > >> >
>>> >
>>> ===================================================================================
>>> > >> > =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>>> > >> > =   EXIT CODE: 11
>>> > >> > =   CLEANING UP REMAINING PROCESSES
>>> > >> > =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>>> > >> >
>>> > >> >
>>> > >> >
>>> >
>>> ===================================================================================
>>> > >> > YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation
>>> fault
>>> > >> > (signal 11)
>>> > >> > This typically refers to a problem with your application.
>>> > >> > Please see the FAQ page for debugging suggestions
>>> > >> >
>>> > >> > Do you have any suggestions? Thank you very much!
>>> > >> >
>>> > >> > --
>>> > >> > Best Regards,
>>> > >> > Sufeng Niu
>>> > >> > ECASP lab, ECE department, Illinois Institute of Technology
>>> > >> > Tel: 312-731-7219
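For reference, a minimal compilable sketch of the fence/Get pattern described in the
message above (this is not the poster's code; the buffer contents, the NELEMS chunk
size, and the choice of rank 0 as the target are illustrative assumptions). The point
it tries to make concrete is that the target's window must be large enough to cover
every origin's displacement:

---8<---
/* rma_fence_sketch.c -- hypothetical example, not from the original post.
 * Rank 0 exposes one NELEMS-sized chunk per rank; every other rank gets
 * "its" chunk between two fences. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define NELEMS 1024               /* per-rank chunk size, chosen arbitrarily */

int main(int argc, char **argv)
{
    int rank, size;
    int *buf = NULL;
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        /* The window is sized proportional to the number of ranks so that
         * every origin's displacement stays inside the exposed buffer. */
        MPI_Alloc_mem((MPI_Aint)size * NELEMS * sizeof(int), MPI_INFO_NULL, &buf);
        for (int i = 0; i < size * NELEMS; i++)
            buf[i] = i;                          /* stand-in for the UDP data */
        MPI_Win_create(buf, (MPI_Aint)size * NELEMS * sizeof(int), sizeof(int),
                       MPI_INFO_NULL, MPI_COMM_WORLD, &win);
    } else {
        MPI_Win_create(NULL, 0, sizeof(int), MPI_INFO_NULL, MPI_COMM_WORLD, &win);
    }

    int *local = malloc(NELEMS * sizeof(int));

    MPI_Win_fence(0, win);                       /* open the exposure epoch */
    if (rank != 0)
        MPI_Get(local, NELEMS, MPI_INT, 0,
                (MPI_Aint)(rank - 1) * NELEMS,   /* displacement in ints */
                NELEMS, MPI_INT, win);
    MPI_Win_fence(0, win);                       /* gets are complete after this */

    if (rank == 1)
        printf("rank 1 got %d ... %d\n", local[0], local[NELEMS - 1]);

    MPI_Win_free(&win);
    if (rank == 0)
        MPI_Free_mem(buf);
    free(local);
    MPI_Finalize();
    return 0;
}
---8<---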
>>> > >> > ------------------------------
>>> > >> >
>>> > >> > Message: 8
>>> > >> > Date: Wed, 10 Jul 2013 11:12:45 -0400
>>> > >> > From: Jim Dinan <james.dinan at gmail.com>
>>> > >> > To: discuss at mpich.org
>>> > >> > Subject: Re: [mpich-discuss] MPI_Win_fence failed
>>> > >> > Message-ID:
>>> > >> >         <CAOoEU4F3hX=y3yrJKYKucNeiueQYBeR_3OQas9E+mg+GM6Rz=
>>> > >> > w at mail.gmail.com>
>>> > >> > Content-Type: text/plain; charset="iso-8859-1"
>>> > >>
>>> > >> >
>>> > >> > It's hard to tell where the segmentation fault is coming from.  Can you
>>> > >> > use a debugger to generate a backtrace?
>>> > >> >
>>> > >> >  ~Jim.
>>> > >> >
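If an interactive debugger is not at hand, one rough alternative is to install a
SIGSEGV handler in the application that dumps the call stack of the faulting rank.
This is a glibc-specific sketch, not an MPICH feature, and the file and handler names
are made up; compiling with -g and linking with -rdynamic helps the symbol names come
out readable:

---8<---
/* segv_bt.c -- hypothetical helper, assumes glibc's <execinfo.h>. */
#include <execinfo.h>
#include <signal.h>
#include <unistd.h>
#include <mpi.h>

static void segv_handler(int sig)
{
    void *frames[64];
    int n = backtrace(frames, 64);            /* capture the current call stack */
    backtrace_symbols_fd(frames, n, STDERR_FILENO);  /* print it straight to stderr */
    _exit(1);
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    signal(SIGSEGV, segv_handler);   /* install before the RMA code runs */
    /* ... application code that may fault goes here ... */
    MPI_Finalize();
    return 0;
}
---8<---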
>>> > >> > ------------------------------
>>> > >>
>>> > >> >
>>> > >> > _______________________________________________
>>> > >> > discuss mailing list
>>> > >> > discuss at mpich.org
>>> > >> > https://lists.mpich.org/mailman/listinfo/discuss
>>> > >> >
>>> > >> > End of discuss Digest, Vol 9, Issue 27
>>> > >> > **************************************
>>> > >>
>>> > >> >
>>> > >>
>>> > >>
>>> > >>
>>> > >> --
>>> > >> Best Regards,
>>> > >> Sufeng Niu
>>> > >> ECASP lab, ECE department, Illinois Institute of Technology
>>> > >> Tel: 312-731-7219
>>> > >>
>>> > >> ------------------------------
>>> > >>
>>> > >>
>>> > >> _______________________________________________
>>> > >> discuss mailing list
>>> > >> discuss at mpich.org
>>> > >> https://lists.mpich.org/mailman/listinfo/discuss
>>> > >>
>>> > >> End of discuss Digest, Vol 9, Issue 28
>>> > >> **************************************
>>> > >
>>> > >
>>> > >
>>> > >
>>> > > --
>>> > > Best Regards,
>>> > > Sufeng Niu
>>> > > ECASP lab, ECE department, Illinois Institute of Technology
>>> > > Tel: 312-731-7219
>>> > >
>>> > > _______________________________________________
>>> > > discuss mailing list     discuss at mpich.org
>>> > > To manage subscription options or unsubscribe:
>>> > > https://lists.mpich.org/mailman/listinfo/discuss
>>> >
>>> >
>>> >
>>> > --
>>> > Jeff Hammond
>>> > jeff.science at gmail.com
>>> >
>>> >
>>> > ------------------------------
>>> >
>>> > Message: 2
>>> > Date: Wed, 10 Jul 2013 11:57:31 -0500
>>> > From: Sufeng Niu <sniu at hawk.iit.edu>
>>> > To: discuss at mpich.org
>>> > Subject: Re: [mpich-discuss] MPI_Win_fence failed
>>> > Message-ID:
>>> >         <
>>> > CAFNNHkzKmAg8B6hamyrr7B2anU9EP_0yxmajxePVr35UnHVavw at mail.gmail.com>
>>> > Content-Type: text/plain; charset="iso-8859-1"
>>> >
>>> > Sorry, I found that this discussion list does not accept figures or
>>> > attachments.
>>> >
>>> > The backtrace information is below:
>>> >
>>> > processes                  Location                      PC            Host           Rank  ID          Status
>>> > 7                          _start                        0x00402399
>>> > `-7                        _libc_start_main              0x3685c1ecdd
>>> >   `-7                      main                          0x00402474
>>> >     `-7                    dkm                           ...
>>> >       |-6                  image_rms                     0x004029bb
>>> >       | `-6                rms                           0x00402d44
>>> >       |   `-6              PMPI_Win_fence                0x0040c389
>>> >       |     `-6            MPIDI_Win_fence               0x004a45f4
>>> >       |       `-6          MPIDI_CH3I_RMAListComplete    0x004a27d3
>>> >       |         `-6        MPIDI_CH3I_Progress           ...
>>> >       `-1                  udp                           0x004035cf
>>> >         `-1                PMPI_Win_fence                0x0040c389
>>> >           `-1              MPIDI_Win_fence               0x004a45a0
>>> >             `-1            MPIDI_CH3I_Progress           0x004292f5
>>> >               `-1          MPIDI_CH3_PktHandler_Get      0x0049f3f9
>>> >                 `-1        MPIDI_CH3_iSendv              0x004aa67c
>>> >                   `-       memcpy                        0x3685c89329  164.54.54.122  0     20.1-13994  Stopped
>>> >
>>> >
>>> >
>>> > On Wed, Jul 10, 2013 at 11:39 AM, <discuss-request at mpich.org> wrote:
>>> >
>>> > > Send discuss mailing list submissions to
>>> > >         discuss at mpich.org
>>> > >
>>> > > To subscribe or unsubscribe via the World Wide Web, visit
>>> > >         https://lists.mpich.org/mailman/listinfo/discuss
>>> > > or, via email, send a message with subject or body 'help' to
>>> > >         discuss-request at mpich.org
>>> > >
>>> > > You can reach the person managing the list at
>>> > >         discuss-owner at mpich.org
>>> > >
>>> > > When replying, please edit your Subject line so it is more specific
>>> > > than "Re: Contents of discuss digest..."
>>> > >
>>> > >
>>> > > Today's Topics:
>>> > >
>>> > >    1. Re:  MPI_Win_fence failed (Sufeng Niu)
>>> > >
>>> > >
>>> > >
>>> ----------------------------------------------------------------------
>>> > >
>>> > > Message: 1
>>> > > Date: Wed, 10 Jul 2013 11:39:39 -0500
>>> > > From: Sufeng Niu <sniu at hawk.iit.edu>
>>> > > To: discuss at mpich.org
>>> > > Subject: Re: [mpich-discuss] MPI_Win_fence failed
>>> > > Message-ID:
>>> > >         <CAFNNHkz8pBfX33icn=+3rdXvqDfWqeu58odpd=
>>> > > mOXLciysHgfg at mail.gmail.com>
>>> > > Content-Type: text/plain; charset="iso-8859-1"
>>> > >
>>> > > Sorry, I forgot to add the screenshot of the backtrace. The screenshot is
>>> > > attached.
>>> > >
>>> > > Thanks a lot!
>>> > >
>>> > > Sufeng
>>> > >
>>> > >
>>> > >
>>> > > On Wed, Jul 10, 2013 at 11:30 AM, <discuss-request at mpich.org> wrote:
>>> > >
>>> > > > Send discuss mailing list submissions to
>>> > > >         discuss at mpich.org
>>> > > >
>>> > > > To subscribe or unsubscribe via the World Wide Web, visit
>>> > > >         https://lists.mpich.org/mailman/listinfo/discuss
>>> > > > or, via email, send a message with subject or body 'help' to
>>> > > >         discuss-request at mpich.org
>>> > > >
>>> > > > You can reach the person managing the list at
>>> > > >         discuss-owner at mpich.org
>>> > > >
>>> > > > When replying, please edit your Subject line so it is more specific
>>> > > > than "Re: Contents of discuss digest..."
>>> > > >
>>> > > >
>>> > > > Today's Topics:
>>> > > >
>>> > > >    1. Re:  MPI_Win_fence failed (Sufeng Niu)
>>> > > >
>>> > > >
>>> > > >
>>> ----------------------------------------------------------------------
>>> > > >
>>> > > > Message: 1
>>> > > > Date: Wed, 10 Jul 2013 11:30:36 -0500
>>> > > > From: Sufeng Niu <sniu at hawk.iit.edu>
>>> > > > To: discuss at mpich.org
>>> > > > Subject: Re: [mpich-discuss] MPI_Win_fence failed
>>> > > > Message-ID:
>>> > > >         <
>>> > > > CAFNNHkyLj8CbYMmc_w2DA9_+q2Oe3kyus+g6c99ShPk6ZXVkdA at mail.gmail.com
>>> >
>>> > > > Content-Type: text/plain; charset="iso-8859-1"
>>> > > >
>>> > > > Hi Jim,
>>> > > >
>>> > > > Thanks a lot for your reply. My basic way of debugging so far has been
>>> > > > barrier + printf; right now I only have an evaluation version of
>>> > > > TotalView. The backtrace from TotalView is shown below. The udp process
>>> > > > does the UDP collection and creates the RMA window; image_rms does
>>> > > > MPI_Get to access the window.
>>> > > >
>>> > > > There is a segmentation violation, but I don't know why the program
>>> > > > stopped at MPI_Win_fence.
>>> > > >
>>> > > > Thanks a lot!
>>> > > >
>>> > > >
>>> > > >
>>> > > >
>>> > > >
>>> > > >
>>> > > >
>>> > > > On Wed, Jul 10, 2013 at 10:12 AM, <discuss-request at mpich.org>
>>> wrote:
>>> > > >
>>> > > > > Send discuss mailing list submissions to
>>> > > > >         discuss at mpich.org
>>> > > > >
>>> > > > > To subscribe or unsubscribe via the World Wide Web, visit
>>> > > > >         https://lists.mpich.org/mailman/listinfo/discuss
>>> > > > > or, via email, send a message with subject or body 'help' to
>>> > > > >         discuss-request at mpich.org
>>> > > > >
>>> > > > > You can reach the person managing the list at
>>> > > > >         discuss-owner at mpich.org
>>> > > > >
>>> > > > > When replying, please edit your Subject line so it is more
>>> specific
>>> > > > > than "Re: Contents of discuss digest..."
>>> > > > >
>>> > > > >
>>> > > > > Today's Topics:
>>> > > > >
>>> > > > >    1. Re:  MPICH3.0.4 make fails with "No rule to make
>>>  target..."
>>> > > > >       (Wesley Bland)
>>> > > > >    2. Re:  Error in MPI_Finalize on a simple ring test  over TCP
>>> > > > >       (Wesley Bland)
>>> > > > >    3.  Restrict number of cores, not threads (Bob Ilgner)
>>> > > > >    4. Re:  Restrict number of cores, not threads (Wesley Bland)
>>> > > > >    5. Re:  Restrict number of cores, not threads (Wesley Bland)
>>> > > > >    6. Re:  Error in MPI_Finalize on a simple ring test over TCP
>>> > > > >       (Thomas Ropars)
>>> > > > >    7.  MPI_Win_fence failed (Sufeng Niu)
>>> > > > >    8. Re:  MPI_Win_fence failed (Jim Dinan)
>>> > > > >
>>> > > > >
>>> > > > >
>>> > ----------------------------------------------------------------------
>>> > > > >
>>> > > > > Message: 1
>>> > > > > Date: Wed, 10 Jul 2013 08:29:06 -0500
>>> > > > > From: Wesley Bland <wbland at mcs.anl.gov>
>>> > > > > To: discuss at mpich.org
>>> > > > > Subject: Re: [mpich-discuss] MPICH3.0.4 make fails with "No rule
>>> to
>>> > > > >         make    target..."
>>> > > > > Message-ID: <F48FC916-31F7-4F82-95F8-2D6A6C45264F at mcs.anl.gov>
>>> > > > > Content-Type: text/plain; charset="iso-8859-1"
>>> > > > >
>>> > > > > Unfortunately, due to the lack of developer resources and interest, the
>>> > > > > last version of MPICH which was supported on Windows was 1.4.1p. You can
>>> > > > > find that version on the downloads page:
>>> > > > >
>>> > > > > http://www.mpich.org/downloads/
>>> > > > >
>>> > > > > Alternatively, Microsoft maintains a derivative of MPICH which should
>>> > > > > provide the features you need. You can also find a link to it on the
>>> > > > > downloads page above.
>>> > > > >
>>> > > > > Wesley
>>> > > > >
>>> > > > > On Jul 10, 2013, at 1:16 AM, Don Warren <don.warren at gmail.com>
>>> > wrote:
>>> > > > >
>>> > > > > > Hello,
>>> > > > > >
>>> > > > > > As requested in the installation guide, I'm informing this
>>> list of
>>> > a
>>> > > > > failure to correctly make MPICH3.0.4 on a Win7 system.  The
>>> specific
>>> > > > error
>>> > > > > encountered is
>>> > > > > > "make[2]: *** No rule to make target
>>> > > > > `/cygdrive/c/FLASH/mpich-3.0.4/src/mpi/romio/Makefile.am',
>>> needed by
>>> > > > > `/cygdrive/c/FLASH/mpich-3.0.4/src/mpi/romio/Makefile.in'.
>>>  Stop."
>>> > > > > >
>>> > > > > > I have confirmed that both Makefile.am and Makefile.in exist
>>> in the
>>> > > > > directory listed.  I'm attaching the c.txt and the m.txt files.
>>> > > > > >
>>> > > > > > Possibly of interest is that the command "make clean" fails at
>>> > > exactly
>>> > > > > the same folder, with exactly the same error message as shown in
>>> > m.txt
>>> > > > and
>>> > > > > above.
>>> > > > > >
>>> > > > > > Any advice you can give would be appreciated.  I'm attempting
>>> to
>>> > get
>>> > > > > FLASH running on my computer, which seems to require MPICH.
>>> > > > > >
>>> > > > > > Regards,
>>> > > > > > Don Warren
>>> > > > > >
>>> > > >
>>> >
>>> <config-make-outputs.zip>_______________________________________________
>>> > > > > > discuss mailing list     discuss at mpich.org
>>> > > > > > To manage subscription options or unsubscribe:
>>> > > > > > https://lists.mpich.org/mailman/listinfo/discuss
>>> > > > >
>>> > > > >
>>> > > > > ------------------------------
>>> > > > >
>>> > > > > Message: 2
>>> > > > > Date: Wed, 10 Jul 2013 08:39:47 -0500
>>> > > > > From: Wesley Bland <wbland at mcs.anl.gov>
>>> > > > > To: discuss at mpich.org
>>> > > > > Subject: Re: [mpich-discuss] Error in MPI_Finalize on a simple
>>> ring
>>> > > > >         test    over TCP
>>> > > > > Message-ID: <D5999106-2A75-4091-8B0F-EAFA22880862 at mcs.anl.gov>
>>> > > > > Content-Type: text/plain; charset=us-ascii
>>> > > > >
>>> > > > > The value of previous for rank 0 in your code is -1. MPICH is
>>> > > > > complaining because all of the requests to receive a message from -1
>>> > > > > are still pending when you try to finalize. You need to make sure that
>>> > > > > you are receiving from valid ranks.
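A minimal sketch of the ring being described, with the wrap-around applied so that
rank 0 receives from size-1 instead of the invalid rank -1 (this is illustrative
only, not the original ring_clean.c; tag and payload are arbitrary):

---8<---
/* ring_sketch.c -- each rank sends to rank+1 and receives from rank-1. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int next = (rank + 1) % size;          /* send to rank+1, wrapping at the end */
    int prev = (rank - 1 + size) % size;   /* receive from rank-1, never -1 */

    int sendval = rank, recvval = -1;
    MPI_Sendrecv(&sendval, 1, MPI_INT, next, 0,
                 &recvval, 1, MPI_INT, prev, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    printf("rank %d received %d from rank %d\n", rank, recvval, prev);

    MPI_Finalize();
    return 0;
}
---8<---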
>>> > > > >
>>> > > > > On Jul 10, 2013, at 7:50 AM, Thomas Ropars <
>>> thomas.ropars at epfl.ch>
>>> > > > wrote:
>>> > > > >
>>> > > > > > Yes sure. Here it is.
>>> > > > > >
>>> > > > > > Thomas
>>> > > > > >
>>> > > > > > On 07/10/2013 02:23 PM, Wesley Bland wrote:
>>> > > > > >> Can you send us the smallest chunk of code that still exhibits
>>> > this
>>> > > > > error?
>>> > > > > >>
>>> > > > > >> Wesley
>>> > > > > >>
>>> > > > > >> On Jul 10, 2013, at 6:54 AM, Thomas Ropars <
>>> thomas.ropars at epfl.ch
>>> > >
>>> > > > > wrote:
>>> > > > > >>
>>> > > > > >>> Hi all,
>>> > > > > >>>
>>> > > > > >>> I get the following error when I try to run a simple
>>> application
>>> > > > > implementing a ring (each process sends to rank+1 and receives
>>> from
>>> > > > > rank-1). More precisely, the error occurs during the call to
>>> > > > MPI_Finalize():
>>> > > > > >>>
>>> > > > > >>> Assertion failed in file
>>> > > > > src/mpid/ch3/channels/nemesis/netmod/tcp/socksm.c at line 363:
>>> > > > sc->pg_is_set
>>> > > > > >>> internal ABORT - process 0
>>> > > > > >>>
>>> > > > > >>> Has anybody else noticed the same error?
>>> > > > > >>>
>>> > > > > >>> Here are all the details about my test:
>>> > > > > >>> - The error is generated with mpich-3.0.2 (but I noticed the
>>> > exact
>>> > > > > same error with mpich-3.0.4)
>>> > > > > >>> - I am using IPoIB for communication between nodes (The same
>>> > thing
>>> > > > > happens over Ethernet)
>>> > > > > >>> - The problem comes from TCP links. When all processes are
>>> on the
>>> > > > same
>>> > > > > node, there is no error. As soon as one process is on a remote
>>> node,
>>> > > the
>>> > > > > failure occurs.
>>> > > > > >>> - Note also that the failure does not occur if I run a more
>>> > complex
>>> > > > > code (eg, a NAS benchmark).
>>> > > > > >>>
>>> > > > > >>> Thomas Ropars
>>> > > > > >>> _______________________________________________
>>> > > > > >>> discuss mailing list     discuss at mpich.org
>>> > > > > >>> To manage subscription options or unsubscribe:
>>> > > > > >>> https://lists.mpich.org/mailman/listinfo/discuss
>>> > > > > >> _______________________________________________
>>> > > > > >> discuss mailing list     discuss at mpich.org
>>> > > > > >> To manage subscription options or unsubscribe:
>>> > > > > >> https://lists.mpich.org/mailman/listinfo/discuss
>>> > > > > >>
>>> > > > > >>
>>> > > > > >
>>> > > > > > <ring_clean.c>_______________________________________________
>>> > > > > > discuss mailing list     discuss at mpich.org
>>> > > > > > To manage subscription options or unsubscribe:
>>> > > > > > https://lists.mpich.org/mailman/listinfo/discuss
>>> > > > >
>>> > > > >
>>> > > > >
>>> > > > > ------------------------------
>>> > > > >
>>> > > > > Message: 3
>>> > > > > Date: Wed, 10 Jul 2013 16:41:27 +0200
>>> > > > > From: Bob Ilgner <bobilgner at gmail.com>
>>> > > > > To: mpich-discuss at mcs.anl.gov
>>> > > > > Subject: [mpich-discuss] Restrict number of cores, not threads
>>> > > > > Message-ID:
>>> > > > >         <
>>> > > > >
>>> CAKv15b-QgmHkVkoiTFmP3EZXvyy6sc_QeqHQgbMUhnr3Xh9ecA at mail.gmail.com>
>>> > > > > Content-Type: text/plain; charset="iso-8859-1"
>>> > > > >
>>> > > > > Dear all,
>>> > > > >
>>> > > > > I am working on a shared-memory processor with 256 cores, working from
>>> > > > > the command line directly.
>>> > > > >
>>> > > > > Can I restrict the number of cores that I deploy? The command
>>> > > > >
>>> > > > > mpirun -n 100 myprog
>>> > > > >
>>> > > > >
>>> > > > > will automatically start on 100 cores. I wish to use only 10 cores and
>>> > > > > have 10 threads on each core. Can I do this with mpich? Remember that
>>> > > > > this is an SMP and I cannot identify each core individually (as in a
>>> > > > > cluster).
>>> > > > >
>>> > > > > Regards, bob
>>> > > > >
>>> > > > > ------------------------------
>>> > > > >
>>> > > > > Message: 4
>>> > > > > Date: Wed, 10 Jul 2013 09:46:38 -0500
>>> > > > > From: Wesley Bland <wbland at mcs.anl.gov>
>>> > > > > To: discuss at mpich.org
>>> > > > > Cc: mpich-discuss at mcs.anl.gov
>>> > > > > Subject: Re: [mpich-discuss] Restrict number of cores, not
>>> threads
>>> > > > > Message-ID: <2FAF588E-2FBE-45E4-B53F-E6BC931E3D51 at mcs.anl.gov>
>>> > > > > Content-Type: text/plain; charset=iso-8859-1
>>> > > > >
>>> > > > > Threads in MPI are not ranks. When you say you want to launch with -n
>>> > > > > 100, you will always get 100 processes, not threads. If you want 10
>>> > > > > threads on 10 cores, you will need to launch with -n 10, then add your
>>> > > > > threads according to your threading library.
>>> > > > >
>>> > > > > Note that threads in MPI do not get their own rank currently. They all
>>> > > > > share the same rank as the process in which they reside, so if you need
>>> > > > > to be able to handle things with different ranks, you'll need to use
>>> > > > > actual processes.
>>> > > > >
>>> > > > > Wesley
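A minimal hybrid sketch of that setup, assuming pthreads and an arbitrary 10 threads
per process: launch it with "mpiexec -n 10 ./myprog" to get 10 processes, and every
thread a process creates shares that process's rank:

---8<---
/* hybrid_sketch.c -- illustrative assumption: pthreads, 10 threads per process.
 * Build with something like: mpicc -pthread hybrid_sketch.c -o myprog */
#include <mpi.h>
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 10

static int world_rank;

static void *worker(void *arg)
{
    long tid = (long)arg;
    /* every thread reports the same MPI rank -- threads are not ranks */
    printf("rank %d, thread %ld\n", world_rank, tid);
    return NULL;
}

int main(int argc, char **argv)
{
    int provided;
    /* MPI_THREAD_FUNNELED: only the main thread makes MPI calls here */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    pthread_t threads[NTHREADS];
    for (long t = 0; t < NTHREADS; t++)
        pthread_create(&threads[t], NULL, worker, (void *)t);
    for (int t = 0; t < NTHREADS; t++)
        pthread_join(threads[t], NULL);

    MPI_Finalize();
    return 0;
}
---8<---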
>>> > > > >
>>> > > > > On Jul 10, 2013, at 9:41 AM, Bob Ilgner <bobilgner at gmail.com>
>>> wrote:
>>> > > > >
>>> > > > > > Dear all,
>>> > > > > >
>>> > > > > > I am working on a shared-memory processor with 256 cores, working
>>> > > > > > from the command line directly.
>>> > > > > >
>>> > > > > > Can I restrict the number of cores that I deploy? The command
>>> > > > > >
>>> > > > > > mpirun -n 100 myprog
>>> > > > > >
>>> > > > > >
>>> > > > > > will automatically start on 100 cores. I wish to use only 10 cores
>>> > > > > > and have 10 threads on each core. Can I do this with mpich? Remember
>>> > > > > > that this is an SMP and I cannot identify each core individually (as
>>> > > > > > in a cluster).
>>> > > > > >
>>> > > > > > Regards, bob
>>> > > > > > _______________________________________________
>>> > > > > > discuss mailing list     discuss at mpich.org
>>> > > > > > To manage subscription options or unsubscribe:
>>> > > > > > https://lists.mpich.org/mailman/listinfo/discuss
>>> > > > >
>>> > > > >
>>> > > > >
>>> > > > > ------------------------------
>>> > > > >
>>> > > > > Message: 6
>>> > > > > Date: Wed, 10 Jul 2013 16:50:36 +0200
>>> > > > > From: Thomas Ropars <thomas.ropars at epfl.ch>
>>> > > > > To: discuss at mpich.org
>>> > > > > Subject: Re: [mpich-discuss] Error in MPI_Finalize on a simple
>>> ring
>>> > > > >         test over TCP
>>> > > > > Message-ID: <51DD74BC.3020009 at epfl.ch>
>>> > > > > Content-Type: text/plain; charset=UTF-8; format=flowed
>>> > > > >
>>> > > > > Yes, you are right; sorry for the noise.
>>> > > > >
>>> > > > > On 07/10/2013 03:39 PM, Wesley Bland wrote:
>>> > > > > > The value of previous for rank 0 in your code is -1. MPICH is
>>> > > > > complaining because all of the requests to receive a message
>>> from -1
>>> > > are
>>> > > > > still pending when you try to finalize. You need to make sure
>>> that
>>> > you
>>> > > > are
>>> > > > > receiving from valid ranks.
>>> > > > > >
>>> > > > > > On Jul 10, 2013, at 7:50 AM, Thomas Ropars <
>>> thomas.ropars at epfl.ch>
>>> > > > > wrote:
>>> > > > > >
>>> > > > > >> Yes sure. Here it is.
>>> > > > > >>
>>> > > > > >> Thomas
>>> > > > > >>
>>> > > > > >> On 07/10/2013 02:23 PM, Wesley Bland wrote:
>>> > > > > >>> Can you send us the smallest chunk of code that still
>>> exhibits
>>> > this
>>> > > > > error?
>>> > > > > >>>
>>> > > > > >>> Wesley
>>> > > > > >>>
>>> > > > > >>> On Jul 10, 2013, at 6:54 AM, Thomas Ropars <
>>> > thomas.ropars at epfl.ch>
>>> > > > > wrote:
>>> > > > > >>>
>>> > > > > >>>> Hi all,
>>> > > > > >>>>
>>> > > > > >>>> I get the following error when I try to run a simple
>>> application
>>> > > > > implementing a ring (each process sends to rank+1 and receives
>>> from
>>> > > > > rank-1). More precisely, the error occurs during the call to
>>> > > > MPI_Finalize():
>>> > > > > >>>>
>>> > > > > >>>> Assertion failed in file
>>> > > > > src/mpid/ch3/channels/nemesis/netmod/tcp/socksm.c at line 363:
>>> > > > sc->pg_is_set
>>> > > > > >>>> internal ABORT - process 0
>>> > > > > >>>>
>>> > > > > >>>> Has anybody else noticed the same error?
>>> > > > > >>>>
>>> > > > > >>>> Here are all the details about my test:
>>> > > > > >>>> - The error is generated with mpich-3.0.2 (but I noticed the
>>> > exact
>>> > > > > same error with mpich-3.0.4)
>>> > > > > >>>> - I am using IPoIB for communication between nodes (The same
>>> > thing
>>> > > > > happens over Ethernet)
>>> > > > > >>>> - The problem comes from TCP links. When all processes are
>>> on
>>> > the
>>> > > > > same node, there is no error. As soon as one process is on a
>>> remote
>>> > > node,
>>> > > > > the failure occurs.
>>> > > > > >>>> - Note also that the failure does not occur if I run a more
>>> > > complex
>>> > > > > code (eg, a NAS benchmark).
>>> > > > > >>>>
>>> > > > > >>>> Thomas Ropars
>>> > > > > >>>> _______________________________________________
>>> > > > > >>>> discuss mailing list     discuss at mpich.org
>>> > > > > >>>> To manage subscription options or unsubscribe:
>>> > > > > >>>> https://lists.mpich.org/mailman/listinfo/discuss
>>> > > > > >>> _______________________________________________
>>> > > > > >>> discuss mailing list     discuss at mpich.org
>>> > > > > >>> To manage subscription options or unsubscribe:
>>> > > > > >>> https://lists.mpich.org/mailman/listinfo/discuss
>>> > > > > >>>
>>> > > > > >>>
>>> > > > > >> <ring_clean.c>_______________________________________________
>>> > > > > >> discuss mailing list     discuss at mpich.org
>>> > > > > >> To manage subscription options or unsubscribe:
>>> > > > > >> https://lists.mpich.org/mailman/listinfo/discuss
>>> > > > > > _______________________________________________
>>> > > > > > discuss mailing list     discuss at mpich.org
>>> > > > > > To manage subscription options or unsubscribe:
>>> > > > > > https://lists.mpich.org/mailman/listinfo/discuss
>>> > > > > >
>>> > > > > >
>>> > > > >
>>> > > > >
>>> > > > >
>>> > > > > ------------------------------
>>> > > > >
>>> > > > > _______________________________________________
>>> > > > > discuss mailing list
>>> > > > > discuss at mpich.org
>>> > > > > https://lists.mpich.org/mailman/listinfo/discuss
>>> > > > >
>>> > > > > End of discuss Digest, Vol 9, Issue 27
>>> > > > > **************************************
>>> > > > >
>>> > > >
>>> > > >
>>> > > >
>>> > > > --
>>> > > > Best Regards,
>>> > > > Sufeng Niu
>>> > > > ECASP lab, ECE department, Illinois Institute of Technology
>>> > > > Tel: 312-731-7219
>>> > > >
>>> > > > ------------------------------
>>> > > >
>>> > > > _______________________________________________
>>> > > > discuss mailing list
>>> > > > discuss at mpich.org
>>> > > > https://lists.mpich.org/mailman/listinfo/discuss
>>> > > >
>>> > > > End of discuss Digest, Vol 9, Issue 28
>>> > > > **************************************
>>> > > >
>>> > >
>>> > >
>>> > >
>>> > > --
>>> > > Best Regards,
>>> > > Sufeng Niu
>>> > > ECASP lab, ECE department, Illinois Institute of Technology
>>> > > Tel: 312-731-7219
>>> > > -------------- next part --------------
>>> > > A non-text attachment was scrubbed...
>>> > > Name: Screenshot.png
>>> > > Type: image/png
>>> > > Size: 131397 bytes
>>> > > Desc: not available
>>> > > URL: <
>>> > >
>>> >
>>> http://lists.mpich.org/pipermail/discuss/attachments/20130710/48296a33/attachment.png
>>> > > >
>>> > >
>>> > > ------------------------------
>>> > >
>>> > > _______________________________________________
>>> > > discuss mailing list
>>> > > discuss at mpich.org
>>> > > https://lists.mpich.org/mailman/listinfo/discuss
>>> > >
>>> > > End of discuss Digest, Vol 9, Issue 29
>>> > > **************************************
>>> > >
>>> >
>>> >
>>> >
>>> > --
>>> > Best Regards,
>>> > Sufeng Niu
>>> > ECASP lab, ECE department, Illinois Institute of Technology
>>> > Tel: 312-731-7219
>>> >
>>> > ------------------------------
>>> >
>>> > _______________________________________________
>>> > discuss mailing list
>>> > discuss at mpich.org
>>> > https://lists.mpich.org/mailman/listinfo/discuss
>>> >
>>> > End of discuss Digest, Vol 9, Issue 30
>>> > **************************************
>>> >
>>>
>>>
>>>
>>> --
>>> Best Regards,
>>> Sufeng Niu
>>> ECASP lab, ECE department, Illinois Institute of Technology
>>> Tel: 312-731-7219
>>>
>>> ------------------------------
>>>
>>> _______________________________________________
>>> discuss mailing list
>>> discuss at mpich.org
>>> https://lists.mpich.org/mailman/listinfo/discuss
>>>
>>> End of discuss Digest, Vol 9, Issue 31
>>> **************************************
>>>
>>
>>
>>
>> --
>> Best Regards,
>> Sufeng Niu
>> ECASP lab, ECE department, Illinois Institute of Technology
>> Tel: 312-731-7219
>>
>> _______________________________________________
>> discuss mailing list     discuss at mpich.org
>> To manage subscription options or unsubscribe:
>> https://lists.mpich.org/mailman/listinfo/discuss
>>
>
>


More information about the discuss mailing list