[mpich-discuss] Nemesis engine query by Viswanath
Halim Amer
aamer at anl.gov
Fri Jul 31 09:43:06 CDT 2015
You can learn more about multithreading support in MPI from the standard
(www.mpi-forum.org/docs/mpi-3.1/mpi31-report.pdf) in section 12.4.
If you need us to help, we need more information, such as the configure
line, the target platform, the compiler version, and, most importantly,
a toy program that reproduces the bug.
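
For illustration, a minimal sketch of the kind of toy program we mean
(two processes, each receiving in a second thread while the main thread
sends; the structure and names here are placeholders, not your code):

#include <mpi.h>
#include <pthread.h>
#include <stdio.h>

static int peer;  /* rank of the other process; assumes exactly 2 ranks */

static void *recv_thread(void *arg)
{
    int msg;
    (void)arg;
    /* Receiving from a non-main thread is legal only with
       MPI_THREAD_MULTIPLE. */
    MPI_Recv(&msg, 1, MPI_INT, peer, 0, MPI_COMM_WORLD,
             MPI_STATUS_IGNORE);
    printf("got %d from rank %d\n", msg, peer);
    return NULL;
}

int main(int argc, char **argv)
{
    int provided, rank;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE) {
        fprintf(stderr, "MPI_THREAD_MULTIPLE not available\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    peer = 1 - rank;

    pthread_t t;
    pthread_create(&t, NULL, recv_thread, NULL);
    MPI_Send(&rank, 1, MPI_INT, peer, 0, MPI_COMM_WORLD);
    pthread_join(t, NULL);

    MPI_Finalize();
    return 0;
}

(Built with mpicc -pthread and run with mpiexec -n 2; something of this
shape, trimmed down from your real code, is what lets us reproduce the
assertion failure.)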
--Halim
On 7/31/15 4:41 AM, Viswanath Krishnamurthy wrote:
> I did try initializing multithreading support:
>
> #include <mpi.h>
> #include <stdio.h>
> #include <stdlib.h>
>
> int main(int argc, char **argv)  /* argc/argv parameters were missing */
> {
>     int provided;
>     MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
>     if (provided < MPI_THREAD_MULTIPLE) {
>         printf("\nTHREAD LIBRARY DOESN'T HAVE MULTITHREADING SUPPORT\n");
>         exit(1);
>     }
>     MPI_Finalize();
>     return 0;
> }
>
> The code compiles but aborts at run time with:
> Assertion failed in file
> /home/viswa/libraries/mpich-3.1.4/src/include/mpiimplthreadpost.h at
> line 163: depth > 0 && depth < 10
> internal ABORT - process 1
> internal ABORT - process 0
>
> Could you please point me to some documentation for MPI_Init_thread
> and MPICH multithreading, as I am relatively new to this?
>
> Thanks,
> Viswanath
>
>
>
> On Fri, Jul 31, 2015 at 2:41 AM, <discuss-request at mpich.org> wrote:
>
>
>
> Today's Topics:
>
> 1. PANFS Remove RAID0 and add RAIDN to MPICH 3.2 (Victorelli, Ron)
> 2. Re: PANFS Remove RAID0 and add RAIDN to MPICH 3.2 (Rob Latham)
> 3. Re: hydra, stdin close(), and SLURM (Aaron Knister)
> 4. Re: Nemesis engine (Viswanath Krishnamurthy)
> 5. Re: Nemesis engine (Halim Amer)
> 6. Active loop in MPI_Waitany? (Dorier, Matthieu)
> 7. Re: Active loop in MPI_Waitany? (Jeff Hammond)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Wed, 29 Jul 2015 12:07:40 +0000
> From: "Victorelli, Ron" <rvictorelli at panasas.com>
> To: discuss at mpich.org
> Subject: [mpich-discuss] PANFS Remove RAID0 and add RAIDN to MPICH 3.2
>
> I am a developer at Panasas, and we would like to provide a patch that
> removes RAID0 support and adds RAIDN support to romio (MPICH 3.2):
>
> src/mpi/romio/adio/ad_panfs/ad_panfs_open.c
>
> I currently do not have an MCS or trac account.
>
> Thank You
>
> Ron Victorelli
> Software Engineer
> Panasas, Inc
> Email: rvictorelli at panasas.com
> Tel: 412-323-6422
> www.panasas.com
>
> ------------------------------
>
> Message: 2
> Date: Wed, 29 Jul 2015 15:52:31 -0500
> From: Rob Latham <robl at mcs.anl.gov>
> To: discuss at mpich.org, rvictorelli at panasas.com
> Subject: Re: [mpich-discuss] PANFS Remove RAID0 and add RAIDN to MPICH 3.2
>
>
>
> On 07/29/2015 07:07 AM, Victorelli, Ron wrote:
> > I am a developer at Panasas, and we would like to provide a patch
> > that removes RAID0 support and adds RAIDN support to romio (MPICH 3.2):
> >
> > src/mpi/romio/adio/ad_panfs/ad_panfs_open.c
> >
> > I currently do not have an MCS or trac account.
>
> Hi Ron. I'm pleased to have contributions from Panasas. It's your
> first since 2007!
>
> If you've got a lot of patches in the works, maybe we should go ahead
> and set you up with a trac account and/or a git tree.
>
> If you're just looking to get this patch into the tree, that's fine too:
> it's definitely easier and you will just need to 'git format-patch' your
> changes and email them to me.
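>
> For instance, assuming the change is a single local commit on top of
> the MPICH tree, 'git format-patch -1 HEAD' produces a patch file you
> can attach; adjust the revision range to match your history.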
>
> ==rob
>
> >
> > Thank You
> >
> > Ron Victorelli
> >
> > Software Engineer
> >
> > Panasas, Inc
> >
> > Email: rvictorelli at panasas.com
> >
> > Tel: 412-323-6422
> >
> > www.panasas.com
> >
> >
> >
> >
>
> --
> Rob Latham
> Mathematics and Computer Science Division
> Argonne National Lab, IL USA
>
>
> ------------------------------
>
> Message: 3
> Date: Wed, 29 Jul 2015 17:14:53 -0400
> From: Aaron Knister <aaron.s.knister at nasa.gov>
> To: discuss at mpich.org
> Subject: Re: [mpich-discuss] hydra, stdin close(), and SLURM
>
> Thanks Pavan!
>
> -Aaron
>
> On 7/28/15 3:23 PM, Balaji, Pavan wrote:
> > Hi Aaron,
> >
> > I've committed it to mpich/master:
> >
> >
> > http://git.mpich.org/mpich.git/commitdiff/6b41775b2056ff18b3c28aab71764e35904c00fa
> >
> > Thanks for the contribution.
> >
> > This should be in tonight's nightlies:
> >
> > http://www.mpich.org/static/downloads/nightly/master/mpich/
> >
> > ... and in the upcoming mpich-3.2rc1 release.
> >
> > -- Pavan
> >
> >
> >
> >
> > On 7/27/15, 1:40 PM, "Balaji, Pavan" <balaji at anl.gov> wrote:
> >
> >> Hi Aaron,
> >>
> >>
> >>
> >> Please send the patch to me directly.
> >>
> >> General guidelines as to the kind of patches we ask for:
> >>
> >> https://wiki.mpich.org/mpich/index.php/Version_Control_Systems_101
> >>
> >> You can ignore the git workflow related text, which is for our
> >> internal testing. I'll take care of that for you.
> >>
> >> Thanks,
> >>
> >> -- Pavan
> >>
> >> On 7/27/15, 1:36 PM, "Aaron Knister" <aaron.s.knister at nasa.gov> wrote:
> >>
> >>> Hi Pavan,
> >>>
> >>> I see your reply in the archives but it didn't make it to my
> >>> inbox, so I'm replying to my post. I don't disagree with you about
> >>> the error being in the SLURM code, but I'm not sure how one would
> >>> prevent this reliably. SLURM has no expectation that an external
> >>> library will open something at file descriptor 0 before it reaches
> >>> the point in the code where it's ready to poll for stdin. Do you
> >>> have any suggestions?
> >>>
> >>> It's been a long while since I've done a git e-mail patch, so it
> >>> might take me a bit to figure out. Should I send the patch to the
> >>> list or to you directly?
> >>>
> >>> Thanks!
> >>>
> >>> -Aaron
> >>>
> >>> On 7/25/15 10:26 PM, Aaron Knister wrote:
> >>>> I sent this off to the mvapich list yesterday and it was
> >>>> suggested I raise it here since this is the upstream project:
> >>>>
> >>>> This is a bit of a cross post from a thread I started on the
> >>>> slurm dev list:
> >>>> http://article.gmane.org/gmane.comp.distributed.slurm.devel/8176
> >>>>
> >>>> I'd like to get feedback on the idea that "--input none" be
> >>>> passed to srun when using the SLURM hydra bootstrap mechanism. I
> >>>> figured it would be inserted somewhere around here:
> >>>> http://trac.mpich.org/projects/mpich/browser/src/pm/hydra/tools/bootstrap/external/slurm_launch.c#L98
> >>>>
> >>>> Without this argument I'm getting spurious job aborts and
> >>>> confusing errors. The gist of it is that mpiexec.hydra closes
> >>>> stdin before it exec's srun. srun then (possibly via the munge
> >>>> libraries) calls some function that does a lookup via nss. We use
> >>>> sssd for AAA, so libnss_sssd will handle this request. Part of
> >>>> the caching mechanism sssd uses will cause the library to open()
> >>>> the cache file. The lowest fd available is 0, so the cache file
> >>>> is opened on fd 0. srun then believes it's got stdin attached,
> >>>> and that causes the issues outlined in the slurm dev post. I
> >>>> think passing "--input none" is the right thing to do here, since
> >>>> hydra has in fact closed stdin to srun. I tested this via the
> >>>> HYDRA_LAUNCHER_EXTRA_ARGS environment variable and it does
> >>>> resolve the errors I described.
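> >>>> (Concretely, the test was along the lines of
> >>>> HYDRA_LAUNCHER_EXTRA_ARGS="--input none" mpiexec.hydra -n 2 ./app,
> >>>> where everything beyond the variable itself is only illustrative.)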
> >>>>
> >>>> Thanks!
> >>>> -Aaron
> >>>>
> >>> --
> >>> Aaron Knister
> >>> NASA Center for Climate Simulation (Code 606.2)
> >>> Goddard Space Flight Center
> >>> (301) 286-2776
> >>>
> >>>
>
> --
> Aaron Knister
> NASA Center for Climate Simulation (Code 606.2)
> Goddard Space Flight Center
> (301) 286-2776
>
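> As an aside, the descriptor-reuse behavior Aaron describes is easy to
> see in isolation: POSIX open() returns the lowest unused descriptor,
> so once stdin is closed, the next open() lands on fd 0. A minimal
> standalone sketch (the file name is just a stand-in):
>
> #include <stdio.h>
> #include <unistd.h>
> #include <fcntl.h>
>
> int main(void)
> {
>     close(0);  /* stdin's slot becomes the lowest free descriptor */
>     int fd = open("/etc/hostname", O_RDONLY);  /* any readable file */
>     printf("file opened on fd %d\n", fd);      /* prints 0 */
>     return 0;
> }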
>
>
> ------------------------------
>
> Message: 4
> Date: Thu, 30 Jul 2015 17:40:35 +0300
> From: Viswanath Krishnamurthy <writetoviswa at gmail.com>
> To: discuss at mpich.org
> Subject: Re: [mpich-discuss] Nemesis engine
>
> Hi All,
>
> I am currently working with MPICH version 3.1.4 on Ubuntu,
> where I get an error stating that
>
> Assertion failed in
> file src/mpid/ch3/channels/nemesis/src/ch3_progress.c at Line 252
>
> The actual problem I face is that even though the MPI_Sends have
> already been dispatched, certain nodes keep waiting for MPI_Recvs that
> never arrive (using multithreading). From what I have read online, my
> understanding is that Nemesis is written to handle receives from only
> one thread. Please let me know about the latest patch for the Nemesis
> engine, or the MPICH version that contains the fix
> (src/mpid/ch3/channels/nemesis/src/ch3_progress.c).
>
> Thanks,
> Viswanath
>
> ------------------------------
>
> Message: 5
> Date: Thu, 30 Jul 2015 09:51:40 -0500
> From: Halim Amer <aamer at anl.gov>
> To: discuss at mpich.org
> Subject: Re: [mpich-discuss] Nemesis engine
>
> Hi Viswanath,
>
> Nemesis supports multithreading. Have you initialized the MPI
> environment with MPI_THREAD_MULTIPLE threading support?
>
> If you still see the problem after the above initialization, please send
> us a minimal example code that reproduces it.
>
> Thank you,
> --Halim
>
> Abdelhalim Amer (Halim)
> Postdoctoral Appointee
> MCS Division
> Argonne National Laboratory
>
> On 7/30/15 9:40 AM, Viswanath Krishnamurthy wrote:
> > Hi All,
> >
> > I am currently working with MPICH version 3.1.4 on Ubuntu,
> > where I get an error stating that
> >
> > Assertion failed in
> > file src/mpid/ch3/channels/nemesis/src/ch3_progress.c at Line 252
> >
> > The actual problem I face is that even though the MPI_Sends have
> > already been dispatched, certain nodes keep waiting for MPI_Recvs
> > that never arrive (using multithreading). From what I have read
> > online, my understanding is that Nemesis is written to handle
> > receives from only one thread. Please let me know about the latest
> > patch for the Nemesis engine, or the MPICH version that contains
> > the fix (src/mpid/ch3/channels/nemesis/src/ch3_progress.c).
> >
> > Thanks,
> > Viswanath
> >
> >
> >
> >
>
>
> ------------------------------
>
> Message: 6
> Date: Thu, 30 Jul 2015 21:09:04 +0000
> From: "Dorier, Matthieu" <mdorier at anl.gov>
> To: discuss at mpich.org
> Subject: [mpich-discuss] Active loop in MPI_Waitany?
>
> Hi,
>
> I have a code that looks like this:
>
> while (true) {
>     do some I/O (HDF5 POSIX output to a remote, parallel file system)
>     wait for communication (MPI_Waitany) from other processes (in the
>     same node and outside the node)
> }
>
> I'm measuring the energy consumption of the node that runs this
> process for the same duration, as a function of the amount of data
> written in each I/O operation.
> Surprisingly, the larger the I/O in proportion to the
> communication, the lower the energy consumption. In other words, the
> longer I wait in MPI_Waitany, the more energy I consume.
>
> Does anyone have a good explanation for that? Is there an active
> loop in MPI_Waitany? Another reason?
>
> Thanks!
>
> Matthieu
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL:
> <http://lists.mpich.org/pipermail/discuss/attachments/20150730/6777b87d/attachment-0001.html>
>
> ------------------------------
>
> Message: 7
> Date: Thu, 30 Jul 2015 19:41:21 -0400
> From: Jeff Hammond <jeff.science at gmail.com>
> To: discuss at mpich.org
> Subject: Re: [mpich-discuss] Active loop in MPI_Waitany?
>
> Seems obvious that Waitany spins on the array of requests until one
> completes. Is that an active loop by your definition?
>
> Jeff
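>
> If that spin is what shows up in your energy numbers, one possible
> workaround (a sketch of a generic back-off loop, not an MPICH tuning
> knob) is to poll with MPI_Testany and sleep between polls, trading
> wakeup latency for idle time:
>
> #include <mpi.h>
> #include <time.h>
>
> /* Poll-and-sleep replacement for a hard MPI_Waitany spin; 'count'
>    and 'requests' are whatever the surrounding code already owns. */
> int waitany_backoff(int count, MPI_Request requests[],
>                     MPI_Status *status)
> {
>     int idx, flag = 0;
>     struct timespec ts = { 0, 100000 };  /* 100 us between polls */
>     while (!flag) {
>         MPI_Testany(count, requests, &idx, &flag, status);
>         if (!flag)
>             nanosleep(&ts, NULL);  /* let the core idle */
>     }
>     return idx;  /* index of the completed request */
> }
>
> Whether this actually reduces consumption depends on how deep the
> idle states on your node are.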
>
> On Thursday, July 30, 2015, Dorier, Matthieu <mdorier at anl.gov> wrote:
>
> > Hi,
> >
> > I have a code that looks like this:
> >
> > while (true) {
> >     do some I/O (HDF5 POSIX output to a remote, parallel file system)
> >     wait for communication (MPI_Waitany) from other processes (in
> >     the same node and outside the node)
> > }
> >
> > I'm measuring the energy consumption of the node that runs this
> > process for the same duration, as a function of the amount of data
> > written in each I/O operation.
> > Surprisingly, the larger the I/O in proportion to the
> > communication, the lower the energy consumption. In other words, the
> > longer I wait in MPI_Waitany, the more energy I consume.
> >
> > Does anyone have a good explanation for that? Is there an active
> > loop in MPI_Waitany? Another reason?
> >
> > Thanks!
> >
> > Matthieu
> >
>
>
> --
> Jeff Hammond
> jeff.science at gmail.com
> http://jeffhammond.github.io/
>
> ------------------------------
>
>
> End of discuss Digest, Vol 33, Issue 10
> ***************************************
>
>
>
>
>