[mpich-discuss] Run mpich on localhost

Martin Ivanov marto1980 at gmail.com
Sat May 9 01:24:17 CDT 2020


Hello Tony,
Thank you very much for your reply. I followed your advice and gave the
hostname 'dragonfly' to my machine. Then, in /etc/hosts I gave the
alias 'dragonfly' to localhost as you suggested:

"
marto at dragonfly% cat /etc/hosts
::1                     localhost dragonfly
127.0.0.1               localhost dragonfly
"

I compiled both the icpi and hellow examples. Now mpiexec seems to work,
although not as reliably as I might wish. Running any of the examples with
one core like this:

mpiexec -n 1 mpich-3.0.4/examples/hellow

is always successful. However, I could never get the command to finish with
2, 3, or 4 cores: it simply hangs. For example, after running the above
command with 2 cores, ps shows:

"
marto at dragonfly% ps x | grep hellow
166493 ??  I6s      0:00.00 mpich-3.0.4/examples/hellow
166494 ??  I6s      0:00.00 mpich-3.0.4/examples/hellow
"

With 5, 6, 7, or 8 cores the command succeeds most of the time, that is,
all processes print as expected, but sometimes it also hangs just as with
2 cores.

Testing with icpi and 10^10 intervals gives the following output with 1
core:

"
marto at dragonfly% mpiexec -n 1 mpich-3.0.4/examples/icpi
Enter the number of intervals: (0 quits) 10000000000
pi is approximately 3.1415926535895782, Error is 0.0000000000002149
wall clock time = 19.997830
"

and the following with 5 cores, when it does not hang:

"
marto at dragonfly% mpiexec -n 5 mpich-3.0.4/examples/icpi
Enter the number of intervals: (0 quits) 10000000000
pi is approximately 3.1415926535899605, Error is 0.0000000000001674
wall clock time = 4.475224
"
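
To put a number on the scaling, here is a minimal shell sketch that computes speedup and parallel efficiency from the two wall clock times above. The `speedup` function is a hypothetical helper written for this example (not part of MPICH); it assumes only a POSIX shell and awk.

```shell
# speedup T1 TN N: hypothetical helper, not an MPICH tool.
# T1 = wall clock time with 1 process, TN = time with N processes.
# Prints speedup (T1/TN) and parallel efficiency (speedup/N, in percent).
speedup() {
  awk -v t1="$1" -v tn="$2" -v n="$3" 'BEGIN {
    s = t1 / tn
    printf "speedup = %.2f, efficiency = %.1f%%\n", s, 100 * s / n
  }'
}

# Timings from the icpi runs above (1 process vs. 5 processes).
speedup 19.997830 4.475224 5
```

With these timings it prints `speedup = 4.47, efficiency = 89.4%`, so the 5-process run is roughly 4.5 times faster than the 1-process run.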

As you can see, the calculation scales nicely. However, I am at a loss to
explain this unreliable behaviour. Could it have something to do with the
hwloc warnings I get when running configure?
I am attaching the output of the configure script.

Thank you very much for your helpful advice. I am looking forward to your
reply.

Best regards,
Martin




On Fri, May 8, 2020 at 2:48 PM Tony Curtis <anthony.curtis at stonybrook.edu> wrote:

>
>
> On May 8, 2020, at 1:41 AM, Martin Ivanov <marto1980 at gmail.com> wrote:
>
> Hello,
> Thank you for your replies. I think another version of MPICH may not work
> as I am following a tutorial (
> https://www2.mmm.ucar.edu/wrf/OnLineTutorial/compilation_tutorial.php)
> that requires specific library versions.
> My /etc/hosts file contains only these two lines:
>
> "
> ::1 localhost localhost.my.domain
> 127.0.0.1 localhost localhost.my.domain
> "
>
> uname -n returns an empty string as I have not set a hostname. I guess I
> do not need one, since I only need to run MPICH on this machine, that is,
> on localhost? uname -a gives the following output:
>
>
> Thing is, though, your initial post showed ssh being used as the launcher.
> In that case you would definitely need a hostname so you can resolve back
> to yourself (your shell prompt was a big clue, by the way). Setting a
> hostname and making it an alias of localhost in /etc/hosts would probably
> do wonders no matter the underlying launcher.
>
> The -info output is pretty much the same as I see on FreeBSD.
>
> If you run a hello world with a sleep in it to allow you to react, what
> does “ps x” show?
>
> (Not familiar much with DragonFly beyond “poking at it” so I don’t have
> any particular insight there.)
>
> Tony
>
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: configure.log
Type: text/x-log
Size: 82119 bytes
Desc: not available
URL: <http://lists.mpich.org/pipermail/discuss/attachments/20200509/43b93443/attachment-0001.bin>

