[mpich-discuss] Threaded Listener
Jeff Hammond
jeff.science at gmail.com
Sat Jan 17 19:25:02 CST 2015
I see you work for IBM Research (assuming
http://researcher.ibm.com/researcher/view.php?person=br-edrodri is
current). If you have access to your company's digital library, you
might look up the Blue Gene project: a family of low-power
supercomputers built on what are arguably embedded processors. They
have no trouble running MPI on a lightweight kernel that does not
support fork(), oversubscription, or OS scheduling.
The MPI source code for the latest incarnation of Blue Gene (/Q) can
be found in the MPICH repo, no less:
http://git.mpich.org/mpich.git/tree/HEAD:/src/mpid/pamid. MPI for the
previous generation of Blue Gene (/P) can be found at
http://git.mpich.org/mpich.git/tree/fbe1f262b3a56898519750228278617f40bb2f35:/src/mpid/dcmfd.
Documentation of DCMF is available from
http://dcmf.anl-external.org/wiki/index.php/Main_Page. The PAMI
source code is part of the open-source BGQ driver, which you can
download from https://repo.anl-external.org/repos/bgq-driver/V1R2M2/.
That may not be the latest version, but the original maintainer of the
BGQ driver source repo has left the Blue Gene family.
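
On your question about what an MPI job expects from Hydra: as far as I
know, the contract is the PMI interface, through which each process
learns its rank and the job size and exchanges connection information
with its peers. Below is a rough sketch of only the first part, not
MPICH code. The variable names PMI_RANK and PMI_SIZE are what I recall
Hydra exporting to each process, so treat them as assumptions and check
them against the PMI client code in your MPICH tree. The struct
launch_info and all function names are made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical container for what a launcher tells each process. */
struct launch_info {
    int rank; /* this process's rank, 0 <= rank < size */
    int size; /* total number of processes in the job */
};

/* Parse a non-negative decimal integer from `text`.
 * Returns 0 on success, -1 if text is NULL, empty, malformed,
 * negative, or too large for an int. */
static int parse_nonneg_int(const char *text, int *out)
{
    char *end = NULL;
    long value;

    if (text == NULL || *text == '\0')
        return -1;
    errno = 0;
    value = strtol(text, &end, 10);
    if (errno != 0 || *end != '\0' || value < 0 || value > INT_MAX)
        return -1;
    *out = (int)value;
    return 0;
}

/* Validate a rank/size pair given as strings.
 * Returns 0 on success, -1 on any inconsistency. */
int parse_launch_info(const char *rank_text, const char *size_text,
                      struct launch_info *info)
{
    assert(info != NULL);
    if (parse_nonneg_int(rank_text, &info->rank) != 0)
        return -1;
    if (parse_nonneg_int(size_text, &info->size) != 0)
        return -1;
    if (info->size < 1 || info->rank >= info->size)
        return -1;
    return 0;
}

/* Read the pair from the environment.
 * ASSUMPTION: the launcher exports PMI_RANK and PMI_SIZE. */
int read_launch_info(struct launch_info *info)
{
    return parse_launch_info(getenv("PMI_RANK"), getenv("PMI_SIZE"), info);
}
```

A hand-written launcher on your micro-OS would have to provide at least
this much to every rank before it starts. That alone is unlikely to be
enough, because the ranks also need some channel for PMI's key-value
exchange in order to find each other.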
The BGQ driver source should also contain the other system software
used to launch processes on the compute nodes. You might grep for
"runjob", as that is the process launcher for Blue Gene/Q.
Best,
Jeff
On Sat, Jan 17, 2015 at 12:23 PM, Eduardo <erocha.ssa at gmail.com> wrote:
> The system is a micro-OS running on an embedded processor. There will be
> several of these processors, each running a single MPI rank. I'm starting
> to think that I will have to do myself what Hydra does. Is there any
> documentation about what an MPI job expects from Hydra?
>
> Eduardo
>
> On Sat, Jan 17, 2015 at 5:58 PM, Jeff Hammond <jeff.science at gmail.com>
> wrote:
>>
>> How do you create processes on this system? If you are using an MPI
>> implementation that has a 1:1 mapping between MPI processes and OS
>> processes, you've got to be able to create multiple processes somehow.
>>
>> Do you have any platform details you can share?
>>
>> Jeff
>>
>> On Sat, Jan 17, 2015 at 11:46 AM, Eduardo <erocha.ssa at gmail.com> wrote:
>> > I tried to use a newer MPICH but, as you said, Hydra forks as part of
>> > mpiexec. Is it possible to launch a job without mpiexec/mpirun? That
>> > should probably work in my environment (assuming there is no listener
>> > running as a heavy process, as is the case in the default
>> > mpich-1.2.7p1).
>> >
>> > The problem is not that the fork symbol is missing. The problem is that
>> > it is being called. My environment actually has a dummy fork, but it
>> > only causes the program to abort if it is called.
>> >
>> > So, in summary, the problem is that I cannot use mpiexec to launch a
>> > program, because it calls fork. With mpich-1.2.7p1, I can launch the
>> > program myself, as if I were launching it for debugging (as in section
>> > 3.5.6 of the CH_P4 manual). In addition, MPICH cannot issue a fork for
>> > a listener-like task; it can only create threads.
>> >
>> > Eduardo
>> >
>> > On Sat, Jan 17, 2015 at 4:23 PM, Jeff Hammond <jeff.science at gmail.com>
>> > wrote:
>> >>
>> >> MPICH shouldn't fork. Hydra probably uses fork to launch processes as
>> >> part of mpiexec.
>> >>
>> >> Blue Gene doesn't support fork either and MPICH runs there, but the
>> >> process launcher is not Hydra. Same for Cray last time I checked. So
>> >> I'm sure that fork isn't required to use MPICH, at least if dynamic
>> >> processes are not used.
>> >>
>> >> Is the issue that you cannot link an MPI program because the fork
>> >> symbol is missing or that you cannot launch jobs with Hydra?
>> >>
>> >> All mainstream MPI implementations use OS processes to implement MPI
>> >> processes, but the standard doesn't require this. FG-MPI uses threads
>> >> to implement MPI processes, but as it still uses Hydra (AFAIK), it may
>> >> or may not work for you.
>> >>
>> >> Can you describe in detail what happens when you try to build and run
>> >> MPICH-latest on your system?
>> >>
>> >> Jeff
>> >>
>> >> On Jan 17, 2015, at 9:41 AM, Eduardo <erocha.ssa at gmail.com> wrote:
>> >>
>> >> I would use a newer version if I could. However, I cannot issue a fork
>> >> in my embedded environment. I can create threads though.
>> >>
>> >> So, are there any newer versions of MPICH that do not create heavy
>> >> processes? I can live without MPI_Comm_spawn and the like.
>> >>
>> >> Regards,
>> >>
>> >> Eduardo
>> >>
>> >> On Jan 16, 2015 5:42 PM, "Wesley Bland" <wbland at anl.gov> wrote:
>> >>>
>> >>> Can you try using a more recent version of MPICH? The version you are
>> >>> using is years old and we don't support it anymore. Our latest version
>> >>> is 3.1.3. You might see if the issue is still present there.
>> >>>
>> >>> Thanks,
>> >>> Wesley
>> >>>
>> >>> On Fri, Jan 16, 2015 at 10:54 AM, Eduardo <erocha.ssa at gmail.com>
>> >>> wrote:
>> >>>>
>> >>>> Hi,
>> >>>>
>> >>>> I am trying to compile and use mpich-1.2.7p1 with the threaded listener
>> >>>> (i.e. configured with --enable-threaded-listener). However, I cannot
>> >>>> even run a simple MPI example with the resulting MPICH.
>> >>>>
>> >>>> I need to use the threaded listener because the environment I am
>> >>>> compiling for (a kind of embedded environment) does not have fork (no
>> >>>> heavy processes).
>> >>>>
>> >>>> The error I get with the mpich with threaded listener is:
>> >>>>
>> >>>> rm_2889: 1103279872: p4_error: listener select: -1
>> >>>> p4_error: latest msg from perror: Bad file descriptor
>> >>>> p0_2710: (2.097656) net_recv failed for fd = 5
>> >>>> p0_2710: 3778266880: p4_error: net_recv read, errno = : 104
>> >>>>
>> >>>> Has anyone experienced a similar problem?
>> >>>>
>> >>>> Thanks in advance,
>> >>>> Eduardo
>> >>>
>> >>>
>> >>>
>> >>> _______________________________________________
>> >>> discuss mailing list discuss at mpich.org
>> >>> To manage subscription options or unsubscribe:
>> >>> https://lists.mpich.org/mailman/listinfo/discuss
>> >>
>> >
>>
>>
>>
>> --
>> Jeff Hammond
>> jeff.science at gmail.com
>> http://jeffhammond.github.io/
>
--
Jeff Hammond
jeff.science at gmail.com
http://jeffhammond.github.io/