[mpich-discuss] discuss Digest, Vol 27, Issue 12
Livan Valladares
soundmartell at icloud.com
Mon Jan 19 10:36:49 CST 2015
Thank you very much, Ken, for your help. I am a newbie without the knowledge to recode the script, so I have contacted the HTCondor admins about it. Let me see if I am lucky and they can help.
My best regards,
Livan Valladares Martell.
> On Jan 19, 2015, at 6:24 PM, discuss-request at mpich.org wrote:
>
> Send discuss mailing list submissions to
> discuss at mpich.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
> https://lists.mpich.org/mailman/listinfo/discuss
> or, via email, send a message with subject or body 'help' to
> discuss-request at mpich.org
>
> You can reach the person managing the list at
> discuss-owner at mpich.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of discuss digest..."
>
>
> Today's Topics:
>
> 1. Re: MPICH 2 script to MPICH 3 (Kenneth Raffenetti)
> 2. error installing mpich-master-v3.2a2-113-g7d59d14d on Linux
> (Siegmar Gross)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 19 Jan 2015 10:04:47 -0600
> From: Kenneth Raffenetti <raffenet at mcs.anl.gov>
> To: <discuss at mpich.org>
> Subject: Re: [mpich-discuss] MPICH 2 script to MPICH 3
> Message-ID: <54BD2B1F.7010106 at mcs.anl.gov>
> Content-Type: text/plain; charset="utf-8"; format=flowed
>
> Hi Livan,
>
> It should be possible to convert your script to use Hydra. Hydra has two
> main components. The first is the main Hydra executable, mpiexec; the
> other is hydra_pmi_proxy.
>
> In a normal ssh-launched scenario, mpiexec will launch proxies on all
> nodes being used for the job, and those proxies will launch the MPI
> processes.
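>
> For example, on a cluster where passwordless ssh works between the
> nodes, a single command such as (hostfile name and executable are just
> placeholders):
>
> mpiexec -f hostfile -n 8 ./a.out
>
> is all that is needed; Hydra handles launching the proxies over ssh
> behind the scenes.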
>
> If ssh is not possible in your setup, you should be able to use the
> Hydra manual launcher (mpiexec -launcher manual). When you run mpiexec
> with that option, it will output the proxy launch commands that can
> be used to connect back to it. It will then wait for all the proxies to
> connect. Here's an example:
>
> raffenet at doom:mpich/ $ mpiexec -launcher manual -n 3 -hosts a,b,c /bin/hostname
> HYDRA_LAUNCH: /sandbox/mpich/i/bin/hydra_pmi_proxy --control-port doom:36311 --rmk user --launcher manual --demux poll --pgid 0 --retries 10 --usize -2 --proxy-id 0
> HYDRA_LAUNCH: /sandbox/mpich/i/bin/hydra_pmi_proxy --control-port doom:36311 --rmk user --launcher manual --demux poll --pgid 0 --retries 10 --usize -2 --proxy-id 1
> HYDRA_LAUNCH: /sandbox/mpich/i/bin/hydra_pmi_proxy --control-port doom:36311 --rmk user --launcher manual --demux poll --pgid 0 --retries 10 --usize -2 --proxy-id 2
> HYDRA_LAUNCH_END
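>
> In an HTCondor job like yours, the head-node branch of the script could
> run mpiexec with the manual launcher, collect those HYDRA_LAUNCH lines,
> and hand them to the other slots via condor_chirp, much like your
> current script publishes MPICH_HOST and MPICH_PORT. A minimal, untested
> sketch (the PROXY_CMD_<n> attribute names and the $hosts list are
> placeholders I made up):
>
> # Head node: $hosts is a comma-separated list of the slots' hostnames.
> mpiexec -launcher manual -n $_CONDOR_NPROCS -hosts $hosts "$@" > launch.out 2>&1 &
>
> # Wait until mpiexec has printed all of its proxy launch commands.
> while ! grep -q HYDRA_LAUNCH_END launch.out
> do
>     sleep 1
> done
>
> # Publish one proxy command per slot through the job ClassAd.
> i=0
> grep '^HYDRA_LAUNCH:' launch.out | sed 's/^HYDRA_LAUNCH: //' |
> while read cmd
> do
>     condor_chirp set_job_attr PROXY_CMD_$i "\"$cmd\""
>     i=`expr $i + 1`
> done
>
> # ...then wait for the MPI job to complete.
> wait
>
> # Non-head nodes: fetch this slot's command and run the proxy, e.g.
> # cmd=`condor_chirp get_job_attr PROXY_CMD_$_CONDOR_PROCNO | tr -d '"'`
> # eval $cmd
>
> The proxies connect back to mpiexec over the control port embedded in
> the HYDRA_LAUNCH lines, so the non-head slots only need some way to
> receive their command; no ssh is involved.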
>
> Let me know if you have other questions.
>
> Ken
>
> On 01/19/2015 05:39 AM, Livan Valladares wrote:
>> Hello,
>> I have been using HTCondor with MPICH 2, but I updated MPICH 2 to MPICH 3, where the MPD process manager has been deprecated and Hydra is the default process manager.
>> The script I was using relies on commands like mpdtrace and mpdallexit, which are not supported anymore.
>> Here is my script. Is there any possibility of changing this script so that it works with Hydra?
>> Thank you very much,
>> Livan Valladares Martell
>> Script:
>>
>> #!/bin/sh
>>
>> ##**************************************************************
>> ##
>> ## Copyright (C) 1990-2014, Condor Team, Computer Sciences Department,
>> ## University of Wisconsin-Madison, WI.
>> ##
>> ## Licensed under the Apache License, Version 2.0 (the "License"); you
>> ## may not use this file except in compliance with the License. You may
>> ## obtain a copy of the License at
>> ##
>> ## http://www.apache.org/licenses/LICENSE-2.0
>> ##
>> ## Unless required by applicable law or agreed to in writing, software
>> ## distributed under the License is distributed on an "AS IS" BASIS,
>> ## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>> ## See the License for the specific language governing permissions and
>> ## limitations under the License.
>> ##
>> ##**************************************************************
>>
>>
>> # Set this to the bin directory of MPICH installation
>> MPDIR=/opt/local/bin
>> PATH=$MPDIR:.:$PATH
>> export PATH
>>
>> _CONDOR_PROCNO=$_CONDOR_PROCNO
>> _CONDOR_NPROCS=$_CONDOR_NPROCS
>>
>> # Remove the contact file, so if we are held and released
>> # it can be recreated anew
>>
>> rm -f $CONDOR_CONTACT_FILE
>>
>> PATH=`condor_config_val libexec`/:$PATH
>>
>> # mpd needs a conf file, and it must be
>> # permissions 0700
>> mkdir tmp
>> MPD_CONF_FILE=`pwd`/tmp/mpd_conf_file
>> export MPD_CONF_FILE
>>
>> ulimit -c 0
>>
>> # If you have a shared file system, maybe you
>> # want to put the mpd.conf file in your home
>> # directory
>>
>> echo "password=somepassword" > $MPD_CONF_FILE
>> chmod 0700 $MPD_CONF_FILE
>>
>> # If on the head node, start mpd, get the port and host,
>> # and condor_chirp it back into the ClassAd
>> # so the non-head nodes can find the head node.
>>
>> if [ $_CONDOR_PROCNO -eq 0 ]
>> then
>> mpd > mpd.out.$_CONDOR_PROCNO 2>&1 &
>> sleep 1
>> host=`mpdtrace -l | sed 1q | tr '_' ' ' | awk '{print $1}'`
>> port=`mpdtrace -l | sed 1q | tr '_' ' ' | awk '{print $2}'`
>>
>> condor_chirp set_job_attr MPICH_PORT $port
>> condor_chirp set_job_attr MPICH_HOST \"$host\"
>>
>> num_hosts=1
>> retries=0
>> while [ $num_hosts -ne $_CONDOR_NPROCS ]
>> do
>> num_hosts=`mpdtrace | wc -l`
>> sleep 2
>> retries=`expr $retries + 1`
>> if [ $retries -gt 100 ]
>> then
>> echo "Too many retries, could not start all $_CONDOR_NPROCS nodes, only started $num_hosts, giving up. Here are the hosts I could start "
>> mpdtrace
>> exit 1
>> fi
>> done
>>
>> ## run the actual mpi job, which was the command line argument
>> ## to the invocation of this shell script
>> mpiexec -n $_CONDOR_NPROCS "$@"
>> e=$?
>>
>> mpdallexit
>> sleep 20
>> echo $e
>> else
>> # If NOT the head node, acquire the host and port of
>> # the head node
>> retries=0
>> host=UNDEFINED
>> while [ "$host" = "UNDEFINED" ]
>> do
>> host=`condor_chirp get_job_attr MPICH_HOST`
>> sleep 2
>> retries=`expr $retries + 1`
>> if [ $retries -gt 100 ]; then
>> echo "Too many retries, could not get mpd host from condor_chirp, giving up."
>> exit 1
>> fi
>> done
>>
>> port=`condor_chirp get_job_attr MPICH_PORT`
>> host=`echo $host | tr -d '"'`
>> mpd --host=$host --port=$port > mpd.out.$_CONDOR_PROCNO 2>&1
>> fi
>>
>>
>>
>>
>> _______________________________________________
>> discuss mailing list discuss at mpich.org
>> To manage subscription options or unsubscribe:
>> https://lists.mpich.org/mailman/listinfo/discuss
>>
>
>
> ------------------------------
>
> Message: 2
> Date: Mon, 19 Jan 2015 17:24:22 +0100 (CET)
> From: Siegmar Gross <Siegmar.Gross at informatik.hs-fulda.de>
> To: discuss at mpich.org
> Subject: [mpich-discuss] error installing
> mpich-master-v3.2a2-113-g7d59d14d on Linux
> Message-ID: <201501191624.t0JGOMgd017579 at tyr.informatik.hs-fulda.de>
> Content-Type: text/plain; charset="us-ascii"
>
> Hi,
>
> today I tried to install mpich-master-v3.2a2-113-g7d59d14d on my
> machines (Solaris 10 Sparc, Solaris 10 x86_64, and openSUSE Linux
> 12.1 x86_64) with gcc-4.9.2 and Sun C 5.13. I succeeded on both
> Solaris machines with both compilers, but got the following error
> on Linux with both compilers.
>
> tyr mpich-master-v3.2a2-113-g7d59d14d-Linux.x86_64.64_gcc 304 cat log.make.Linux.x86_64.64_gcc
> if test ! -h ./src/include/mpio.h ; then \
> rm -f ./src/include/mpio.h ; \
> ( cd ./src/include && \
> ln -s ../mpi/romio/include/mpio.h ) ; \
> fi
> make all-recursive
> make[1]: Entering directory `/export2/src/mpich-3.2/mpich-master-v3.2a2-113-g7d59d14d-Linux.x86_64.64_gcc'
> make[1]: execvp: /bin/sh: Argument list too long
> make[1]: *** [all-recursive] Error 127
> make[1]: Leaving directory `/export2/src/mpich-3.2/mpich-master-v3.2a2-113-g7d59d14d-Linux.x86_64.64_gcc'
> make: *** [all] Error 2
> tyr mpich-master-v3.2a2-113-g7d59d14d-Linux.x86_64.64_gcc 305
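>
> The "execvp: /bin/sh: Argument list too long" line suggests that make
> is exceeding the kernel's limit on the combined size of a command line
> and environment when it spawns the shell. As a quick sanity check (just
> a diagnostic idea, not something from the build logs), the limit can be
> compared against the size of the current environment:
>
> # Kernel limit on argv + environment for a single exec call (bytes):
> getconf ARG_MAX
>
> # Approximate size of the current environment:
> env | wc -c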
>
>
> I used the following "configure" command.
>
> ../mpich-master-v3.2a2-113-g7d59d14d/configure --prefix=/usr/local/mpich-3.2_64_gcc \
> --libdir=/usr/local/mpich-3.2_64_gcc/lib64 \
> --includedir=/usr/local/mpich-3.2_64_gcc/include64 \
> CC="gcc" CXX="g++" F77="gfortran" FC="gfortran" \
> CFLAGS="-m64" CXXFLAGS="-m64" FFLAGS="-m64" FCFLAGS="-m64" \
> LDFLAGS="-m64 -L/usr/lib/sparcv9 -Wl,-rpath -Wl,/usr/lib/sparcv9" \
> --enable-f77 --enable-fc --enable-cxx --enable-romio \
> --enable-debuginfo --enable-smpcoll \
> --enable-threads=runtime --with-thread-package=posix \
> --enable-shared \
> |& tee log.configure.$SYSTEM_ENV.$MACHINE_ENV.64_gcc
>
>
> I would be grateful if somebody could fix the problem. Thank you very
> much in advance for any help.
>
>
> Kind regards
>
> Siegmar
> -------------- next part --------------
> A non-text attachment was scrubbed...
> Name: config.tar.gz
> Type: application/octet-stream
> Size: 85796 bytes
> Desc: config.tar.gz
> URL: <http://lists.mpich.org/pipermail/discuss/attachments/20150119/9a9e2e99/attachment.obj>
>
> ------------------------------
>
> _______________________________________________
> discuss mailing list
> discuss at mpich.org
> https://lists.mpich.org/mailman/listinfo/discuss
>
> End of discuss Digest, Vol 27, Issue 12
> ***************************************
_______________________________________________
discuss mailing list discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss