[mpich-discuss] Nemesis engine query by Viswanath

Viswanath Krishnamurthy writetoviswa at gmail.com
Fri Jul 31 04:41:34 CDT 2015


I did try initializing multithreading support:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
        int provided;

        /* Request MPI_THREAD_MULTIPLE; MPI reports the level granted. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
        if (provided < MPI_THREAD_MULTIPLE)
        {
                printf("\nTHREAD LIBRARY DOESN'T HAVE MULTITHREADING SUPPORT\n");
                exit(1);
        }

        MPI_Finalize();
        return 0;
}

The code compiles but aborts at runtime with:

Assertion failed in file
/home/viswa/libraries/mpich-3.1.4/src/include/mpiimplthreadpost.h at line
163: depth > 0 && depth < 10
internal ABORT - process 1
internal ABORT - process 0
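
As a sanity check, the level actually granted can also be queried after
initialization with MPI_Query_thread; a minimal sketch (note that the MPICH
build itself must support threads, which is controlled by the
--enable-threads configure option, though I believe recent MPICH builds
enable it by default):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
        int provided;
        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

        /* MPI_Query_thread reports the thread level the library granted. */
        int level;
        MPI_Query_thread(&level);
        printf("granted = %d (MPI_THREAD_MULTIPLE = %d)\n",
               level, MPI_THREAD_MULTIPLE);

        MPI_Finalize();
        return 0;
}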

Could you please point me to documentation for MPI_Init_thread and
MPICH's multithreading support, as I am relatively new to it?

Thanks,
Viswanath



On Fri, Jul 31, 2015 at 2:41 AM, <discuss-request at mpich.org> wrote:

> Send discuss mailing list submissions to
>         discuss at mpich.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
>         https://lists.mpich.org/mailman/listinfo/discuss
> or, via email, send a message with subject or body 'help' to
>         discuss-request at mpich.org
>
> You can reach the person managing the list at
>         discuss-owner at mpich.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of discuss digest..."
>
>
> Today's Topics:
>
>    1.  PANFS Remove RAID0 and add RAIDN to MPICH 3.2 (Victorelli, Ron)
>    2. Re:  PANFS Remove RAID0 and add RAIDN to MPICH 3.2 (Rob Latham)
>    3. Re:  hydra, stdin close(), and SLURM (Aaron Knister)
>    4. Re:  Nemesis engine (Viswanath Krishnamurthy)
>    5. Re:  Nemesis engine (Halim Amer)
>    6.  Active loop in MPI_Waitany? (Dorier, Matthieu)
>    7. Re:  Active loop in MPI_Waitany? (Jeff Hammond)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Wed, 29 Jul 2015 12:07:40 +0000
> From: "Victorelli, Ron" <rvictorelli at panasas.com>
> To: "discuss at mpich.org" <discuss at mpich.org>
> Subject: [mpich-discuss] PANFS Remove RAID0 and add RAIDN to MPICH 3.2
> Message-ID:
>         <
> BN3PR08MB12888AACA7907A8ACED7531AA18C0 at BN3PR08MB1288.namprd08.prod.outlook.com
> >
>
> Content-Type: text/plain; charset="us-ascii"
>
> I am a developer at Panasas, and we would like to provide a patch that
> removes RAID0 support and adds RAIDN support to romio (MPICH 3.2):
>
> src/mpi/romio/adio/ad_panfs/ad_panfs_open.c
>
> I currently do not have an MCS or trac account.
>
> Thank You
>
> Ron Victorelli
> Software Engineer
> Panasas, Inc
> Email: rvictorelli at panasas.com<mailto:rvictorelli at panasas.com>
> Tel: 412-323-6422
> www.panasas.com<http://www.panasas.com>
>
> ------------------------------
>
> Message: 2
> Date: Wed, 29 Jul 2015 15:52:31 -0500
> From: Rob Latham <robl at mcs.anl.gov>
> To: <discuss at mpich.org>, <rvictorelli at panasas.com>
> Subject: Re: [mpich-discuss] PANFS Remove RAID0 and add RAIDN to MPICH
>         3.2
> Message-ID: <55B93D0F.7040703 at mcs.anl.gov>
> Content-Type: text/plain; charset="windows-1252"; format=flowed
>
>
>
> On 07/29/2015 07:07 AM, Victorelli, Ron wrote:
> > I am a developer at Panasas, and we would like to provide a patch that
> >
> > removes RAID0 support and adds RAIDN support to romio (MPICH 3.2):
> >
> > src/mpi/romio/adio/ad_panfs/ad_panfs_open.c
> >
> > I currently do not have an MCS or trac account.
>
> Hi Ron.  I'm pleased to have contributions from Panasas.   It's your
> first since 2007!
>
> If you've got a lot of patches in the works, maybe we should go ahead
> and set you up with a trac account and/or a git tree.
>
> If you're just looking to get this patch into the tree, that's fine too:
> it's definitely easier and you will just need to 'git format-patch' your
> changes and email them to me.
>
> ==rob
>
> >
> > Thank You
> >
> > Ron Victorelli
> >
> > Software Engineer
> >
> > Panasas, Inc
> >
> > Email: rvictorelli at panasas.com <mailto:rvictorelli at panasas.com>
> >
> > Tel: 412-323-6422
> >
> > www.panasas.com <http://www.panasas.com>
>
> --
> Rob Latham
> Mathematics and Computer Science Division
> Argonne National Lab, IL USA
>
>
> ------------------------------
>
> Message: 3
> Date: Wed, 29 Jul 2015 17:14:53 -0400
> From: Aaron Knister <aaron.s.knister at nasa.gov>
> To: <discuss at mpich.org>
> Subject: Re: [mpich-discuss] hydra, stdin close(), and SLURM
> Message-ID: <55B9424D.6030104 at nasa.gov>
> Content-Type: text/plain; charset="windows-1252"; Format="flowed"
>
> Thanks Pavan!
>
> -Aaron
>
> On 7/28/15 3:23 PM, Balaji, Pavan wrote:
> > Hi Aaron,
> >
> > I've committed it to mpich/master:
> >
> >
> http://git.mpich.org/mpich.git/commitdiff/6b41775b2056ff18b3c28aab71764e35904c00fa
> >
> > Thanks for the contribution.
> >
> > This should be in tonight's nightlies:
> >
> >       http://www.mpich.org/static/downloads/nightly/master/mpich/
> >
> > ... and in the upcoming mpich-3.2rc1 release.
> >
> >    -- Pavan
> >
> >
> >
> >
> > On 7/27/15, 1:40 PM, "Balaji, Pavan" <balaji at anl.gov> wrote:
> >
> >> Hi Aaron,
> >>
> >>
> >>
> >> Please send the patch to me directly.
> >>
> >> General guidelines as to the kind of patches we ask for:
> >>
> >>      https://wiki.mpich.org/mpich/index.php/Version_Control_Systems_101
> >>
> >> You can ignore the git workflow related text, which is for our internal
> testing.  I'll take care of that for you.
> >>
> >> Thanks,
> >>
> >>   -- Pavan
> >>
> >> On 7/27/15, 1:36 PM, "Aaron Knister" <aaron.s.knister at nasa.gov> wrote:
> >>
> >>> Hi Pavan,
> >>>
> >>> I see your reply in the archives but it didn't make it to my inbox so
> >>> I'm replying to my post. I don't disagree with you about the error
> >>> being in the SLURM code, but I'm not sure how one would prevent this
> >>> reliably. SLURM has no expectation that an external library will open
> >>> something at file descriptor 0 before it reaches the point in the code
> >>> where it's ready to poll for stdin. Do you have any suggestions?
> >>>
> >>> It's been a long while since I've done a git e-mail patch so it might
> >>> take me a bit to figure out. Should I send the patch to the list or to
> >>> you directly?
> >>>
> >>> Thanks!
> >>>
> >>> -Aaron
> >>>
> >>> On 7/25/15 10:26 PM, Aaron Knister wrote:
> >>>> I sent this off to the mvapich list yesterday and it was suggested I
> >>>> raise it here since this is the upstream project:
> >>>>
> >>>> This is a bit of a cross post from a thread I started on the slurm dev
> >>>> list:
> http://article.gmane.org/gmane.comp.distributed.slurm.devel/8176
> >>>>
> >>>> I'd like to get feedback on the idea that "--input none" be passed to
> >>>> srun when using the SLURM hydra bootstrap mechanism. I figured it
> >>>> would be inserted somewhere around here
> >>>>
> http://trac.mpich.org/projects/mpich/browser/src/pm/hydra/tools/bootstrap/external/slurm_launch.c#L98
> .
> >>>>
> >>>>
> >>>> Without this argument I'm getting spurious job aborts and confusing
> >>>> errors. The gist of it is mpiexec.hydra closes stdin before it exec's
> >>>> srun. srun then (possibly via the munge libraries) calls some function
> >>>> that does a lookup via NSS. We use sssd for AAA, so libnss_sssd will
> >>>> handle this request. Part of the caching mechanism sssd uses will
> >>>> cause the library to open() the cache file. The lowest fd available is
> >>>> 0 so the cache file is opened on fd 0. srun then believes it's got
> >>>> stdin attached and it causes the issues outlined in the slurm dev
> >>>> post. I think passing "--input none" is the right thing to do here
> >>>> since hydra has in fact closed stdin to srun. I tested this via the
> >>>> HYDRA_LAUNCHER_EXTRA_ARGS environment variable and it does resolve the
> >>>> errors I described.
> >>>>
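> >>>> Since open() always returns the lowest free descriptor, the effect is
> >>>> easy to reproduce outside of SLURM entirely; a minimal sketch, where
> >>>> "cache.db" is just a stand-in for the sssd cache file:
> >>>>
> >>>> #include <fcntl.h>
> >>>> #include <stdio.h>
> >>>> #include <unistd.h>
> >>>>
> >>>> int main(void)
> >>>> {
> >>>>     close(0);  /* stdin closed, as hydra does before exec'ing srun */
> >>>>
> >>>>     /* POSIX: open() returns the lowest unused descriptor, so this
> >>>>        open lands on fd 0 and looks like an attached stdin. */
> >>>>     int fd = open("cache.db", O_RDONLY | O_CREAT, 0600);
> >>>>     printf("fd = %d\n", fd);  /* prints: fd = 0 */
> >>>>     return 0;
> >>>> }
> >>>>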
> >>>> Thanks!
> >>>> -Aaron
> >>>>
> >>> --
> >>> Aaron Knister
> >>> NASA Center for Climate Simulation (Code 606.2)
> >>> Goddard Space Flight Center
> >>> (301) 286-2776
> >>>
> >>>
>
> --
> Aaron Knister
> NASA Center for Climate Simulation (Code 606.2)
> Goddard Space Flight Center
> (301) 286-2776
>
>
>
> ------------------------------
>
> Message: 4
> Date: Thu, 30 Jul 2015 17:40:35 +0300
> From: Viswanath Krishnamurthy <writetoviswa at gmail.com>
> To: discuss at mpich.org
> Subject: Re: [mpich-discuss] Nemesis engine
> Message-ID:
>         <CADhQ-jDZix3e2TmPAPjX3O7GO+Z7vOzSphPgc4Py+B=
> eRGBypA at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hi All,
>
> I am currently working with MPICH version 3.1.4 on Ubuntu, where I get an
> error stating:
>
> Assertion failed in
> file src/mpid/ch3/channels/nemesis/src/ch3_progress.c at Line 252
>
> The actual problem I face is that even after the MPI_Sends have been
> dispatched, certain nodes keep waiting on MPI_Recvs that never arrive
> (using multithreading). From what I found online, my understanding is that
> Nemesis is written to handle receives from only one thread. Please let me
> know about the latest patch for the Nemesis engine, or the MPICH version
> that includes the fix, for src/mpid/ch3/channels/nemesis/src/ch3_progress.c.
>
> Thanks,
> Viswanath
>
> ------------------------------
>
> Message: 5
> Date: Thu, 30 Jul 2015 09:51:40 -0500
> From: Halim Amer <aamer at anl.gov>
> To: <discuss at mpich.org>
> Subject: Re: [mpich-discuss] Nemesis engine
> Message-ID: <55BA39FC.20406 at anl.gov>
> Content-Type: text/plain; charset="windows-1252"; format=flowed
>
> Hi Viswanath,
>
> Nemesis supports multithreading. Have you initialized the MPI
> environment with MPI_THREAD_MULTIPLE threading support?
>
> If you still see the problem after the above initialization, please send
> us a minimal example code that reproduces it.
>
> Thank you,
> --Halim
>
> Abdelhalim Amer (Halim)
> Postdoctoral Appointee
> MCS Division
> Argonne National Laboratory
>
> On 7/30/15 9:40 AM, Viswanath Krishnamurthy wrote:
> > Hi All,
> >
> > I am currently working with MPICH version 3.1.4 on Ubuntu, where I get
> > an error stating:
> >
> > Assertion failed in
> > file src/mpid/ch3/channels/nemesis/src/ch3_progress.c at Line 252
> > The actual problem I face is that even after the MPI_Sends have been
> > dispatched, certain nodes keep waiting on MPI_Recvs that never arrive
> > (using multithreading). From what I found online, my understanding is
> > that Nemesis is written to handle receives from only one thread. Please
> > let me know about the latest patch for the Nemesis engine, or the MPICH
> > version that includes the fix, for
> > src/mpid/ch3/channels/nemesis/src/ch3_progress.c.
> >
> > Thanks,
> > Viswanath
> >
> >
> >
>
>
> ------------------------------
>
> Message: 6
> Date: Thu, 30 Jul 2015 21:09:04 +0000
> From: "Dorier, Matthieu" <mdorier at anl.gov>
> To: "discuss at mpich.org" <discuss at mpich.org>
> Subject: [mpich-discuss] Active loop in MPI_Waitany?
> Message-ID: <37142D5FC373A846ACE4F75AA11EA84D21BA0122 at DITKA.anl.gov>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Hi,
>
> I have a code that looks like this:
>
> while(true) {
>    do some I/O (HDF5 POSIX output to a remote, parallel file system)
>    wait for communication (MPI_Waitany) from other processes (in the same
> node and outside the node)
> }
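>
> Concretely, the communication side of that loop would be something like
> the following sketch (do_io() and the posted receives are placeholders):
>
> #include <mpi.h>
> #define NREQ 8
>
> extern void do_io(void);  /* stands for the HDF5 POSIX output phase */
>
> /* reqs[] is assumed to hold NREQ receives posted earlier with MPI_Irecv. */
> void event_loop(MPI_Request reqs[NREQ])
> {
>     while (1) {
>         do_io();
>         int idx;
>         MPI_Waitany(NREQ, reqs, &idx, MPI_STATUS_IGNORE);
>         /* handle the completed message, then repost reqs[idx] */
>     }
> }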
>
> I'm measuring the energy consumption of the node that runs this process
> for the same duration, as a function of the amount of data written in each
> I/O operation.
> Surprisingly, the larger the I/O in proportion to the communication, the
> lower the energy consumption. In other words, the longer I wait in
> MPI_Waitany, the more I consume.
>
> Does anyone have a good explanation for that? Is there an active loop in
> MPI_Waitany? Another reason?
>
> Thanks!
>
> Matthieu
>
> ------------------------------
>
> Message: 7
> Date: Thu, 30 Jul 2015 19:41:21 -0400
> From: Jeff Hammond <jeff.science at gmail.com>
> To: "discuss at mpich.org" <discuss at mpich.org>
> Subject: Re: [mpich-discuss] Active loop in MPI_Waitany?
> Message-ID:
>         <CAGKz=
> uJr5NmO+csEBDOtk67zz+HDEaax_JoLNzHWswZipcPCyA at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Seems obvious that Waitany spins on the array of requests until one
> completes. Is that an active loop by your definition?
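>
> If the spinning itself is the concern, one common mitigation (a sketch,
> not MPICH-specific advice) is to poll with MPI_Testany and back off
> between polls, trading a little latency for less CPU time:
>
> #include <mpi.h>
> #include <time.h>
>
> /* Like MPI_Waitany, but sleeps ~100 us between polls instead of
>    spinning at full speed. */
> static int waitany_backoff(int n, MPI_Request reqs[])
> {
>     int idx = MPI_UNDEFINED, flag = 0;
>     while (!flag) {
>         MPI_Testany(n, reqs, &idx, &flag, MPI_STATUS_IGNORE);
>         if (!flag) {
>             struct timespec ts = { 0, 100000 };  /* 100 us */
>             nanosleep(&ts, NULL);
>         }
>     }
>     return idx;
> }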
>
> Jeff
>
> On Thursday, July 30, 2015, Dorier, Matthieu <mdorier at anl.gov> wrote:
>
> > Hi,
> >
> > I have a code that looks like this:
> >
> > while(true) {
> >    do some I/O (HDF5 POSIX output to a remote, parallel file system)
> >    wait for communication (MPI_Waitany) from other processes (in the same
> > node and outside the node)
> > }
> >
> > I'm measuring the energy consumption of the node that runs this process
> > for the same duration, as a function of the amount of data written in
> each
> > I/O operation.
> > Surprisingly, the larger the I/O in proportion to the communication, the
> > lower the energy consumption. In other words, the longer I wait in
> > MPI_Waitany, the more I consume.
> >
> > Does anyone have a good explanation for that? Is there an active loop in
> > MPI_Waitany? Another reason?
> >
> > Thanks!
> >
> > Matthieu
> >
>
>
> --
> Jeff Hammond
> jeff.science at gmail.com
> http://jeffhammond.github.io/
>
> ------------------------------
>
> _______________________________________________
> discuss mailing list
> discuss at mpich.org
> https://lists.mpich.org/mailman/listinfo/discuss
>
> End of discuss Digest, Vol 33, Issue 10
> ***************************************
>
_______________________________________________
discuss mailing list     discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss

