[mpich-discuss] discuss Digest, Vol 27, Issue 12

Livan Valladares soundmartell at icloud.com
Mon Jan 19 10:36:49 CST 2015


Thank you very much, Ken, for your help. I am a newbie without the knowledge to recode the script. I have contacted the HTCondor admin about it. Let me see if I am lucky and they can help.
My best regards, 
Livan Valladares Martell.
> On Jan 19, 2015, at 6:24 PM, discuss-request at mpich.org wrote:
> 
> Today's Topics:
> 
>   1. Re:  MPICH 2 script to MPICH 3 (Kenneth Raffenetti)
>   2.  error installing mpich-master-v3.2a2-113-g7d59d14d	on Linux
>      (Siegmar Gross)
> 
> 
> ----------------------------------------------------------------------
> 
> Message: 1
> Date: Mon, 19 Jan 2015 10:04:47 -0600
> From: Kenneth Raffenetti <raffenet at mcs.anl.gov>
> To: <discuss at mpich.org>
> Subject: Re: [mpich-discuss] MPICH 2 script to MPICH 3
> Message-ID: <54BD2B1F.7010106 at mcs.anl.gov>
> Content-Type: text/plain; charset="utf-8"; format=flowed
> 
> Hi Livan,
> 
> It should be possible to convert your script to use Hydra. Hydra has 2 
> main components. The first is the main hydra executable - mpiexec. The 
> other is the hydra_pmi_proxy.
> 
> In a normal ssh launched scenario, mpiexec will launch proxies on all 
> nodes being used for the job, and those proxies will launch the mpi 
> processes.
> 
> If ssh is not possible in your setup, you should be able to utilize the 
> hydra manual launcher (mpiexec -launcher manual). When you run mpiexec 
> with that parameter, it will output the proxy launch commands that can 
> be used to connect to it. It will then wait for all the proxies to 
> connect. Here's an example:
> 
> raffenet at doom:mpich/ $ mpiexec -launcher manual -n 3 -hosts a,b,c /bin/hostname
> HYDRA_LAUNCH: /sandbox/mpich/i/bin/hydra_pmi_proxy --control-port doom:36311 --rmk user --launcher manual --demux poll --pgid 0 --retries 10 --usize -2 --proxy-id 0
> HYDRA_LAUNCH: /sandbox/mpich/i/bin/hydra_pmi_proxy --control-port doom:36311 --rmk user --launcher manual --demux poll --pgid 0 --retries 10 --usize -2 --proxy-id 1
> HYDRA_LAUNCH: /sandbox/mpich/i/bin/hydra_pmi_proxy --control-port doom:36311 --rmk user --launcher manual --demux poll --pgid 0 --retries 10 --usize -2 --proxy-id 2
> HYDRA_LAUNCH_END
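
For a wrapper script that cannot use ssh, one possible first step is to pick the launch command for a given proxy out of that output, so that each node can run its own proxy. What follows is only a rough sketch, not a tested HTCondor integration. It assumes each HYDRA_LAUNCH entry arrives as a single line of output, and the helper name proxy_cmd_for and the sample variable are made up for illustration. How the command then gets to the other nodes (for example through condor_chirp, as the old script did for the mpd host and port) is left open.

```shell
#!/bin/sh
# Sketch only: extract the launch command for one proxy from the text that
# `mpiexec -launcher manual` prints. Assumes one HYDRA_LAUNCH entry per line.

# proxy_cmd_for ID (hypothetical helper): reads the mpiexec output on stdin
# and prints the command whose --proxy-id equals ID. Returns 1 if not found.
proxy_cmd_for() {
    want=$1
    while IFS= read -r line; do
        case $line in
            HYDRA_LAUNCH_END)
                break ;;
            "HYDRA_LAUNCH: "*)
                # Drop the "HYDRA_LAUNCH: " prefix to get the bare command.
                cmd=${line#"HYDRA_LAUNCH: "}
                # The trailing space keeps id 1 from matching id 10.
                case "$cmd " in
                    *"--proxy-id $want "*)
                        printf '%s\n' "$cmd"
                        return 0 ;;
                esac ;;
        esac
    done
    return 1
}

# Sample input copied from the example output in this message.
sample='HYDRA_LAUNCH: /sandbox/mpich/i/bin/hydra_pmi_proxy --control-port doom:36311 --rmk user --launcher manual --demux poll --pgid 0 --retries 10 --usize -2 --proxy-id 0
HYDRA_LAUNCH: /sandbox/mpich/i/bin/hydra_pmi_proxy --control-port doom:36311 --rmk user --launcher manual --demux poll --pgid 0 --retries 10 --usize -2 --proxy-id 1
HYDRA_LAUNCH: /sandbox/mpich/i/bin/hydra_pmi_proxy --control-port doom:36311 --rmk user --launcher manual --demux poll --pgid 0 --retries 10 --usize -2 --proxy-id 2
HYDRA_LAUNCH_END'

# Print the command that the node acting as proxy 1 would have to run.
printf '%s\n' "$sample" | proxy_cmd_for 1
```

In a real job the input would come from the running mpiexec rather than from a fixed string, and mpiexec would have to stay alive while the proxies connect back to its control port.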
> 
> Let me know if you have other questions.
> 
> Ken
> 
> On 01/19/2015 05:39 AM, Livan Valladares wrote:
>> Hello,
>> I have been using HTCondor with MPICH 2, but I updated MPICH 2 to MPICH 3, and now the MPD process manager has been deprecated and Hydra is the default process manager.
>> The script I was using uses some commands like mpdtrace and mpdallexit, but they are not supported anymore.
>> Here is my script. Is there any possibility of changing this script to work with Hydra?
>> Thank you very much,
>> Livan Valladares Martell
>> Script:
>> 
>> #!/bin/sh
>> 
>> ##**************************************************************
>> ##
>> ## Copyright (C) 1990-2014, Condor Team, Computer Sciences Department,
>> ## University of Wisconsin-Madison, WI.
>> ##
>> ## Licensed under the Apache License, Version 2.0 (the "License"); you
>> ## may not use this file except in compliance with the License.  You may
>> ## obtain a copy of the License at
>> ##
>> ##    http://www.apache.org/licenses/LICENSE-2.0
>> ##
>> ## Unless required by applicable law or agreed to in writing, software
>> ## distributed under the License is distributed on an "AS IS" BASIS,
>> ## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>> ## See the License for the specific language governing permissions and
>> ## limitations under the License.
>> ##
>> ##**************************************************************
>> 
>> 
>> # Set this to the bin directory of MPICH installation
>> MPDIR=/opt/local/bin
>> PATH=$MPDIR:.:$PATH
>> export PATH
>> 
>> _CONDOR_PROCNO=$_CONDOR_PROCNO
>> _CONDOR_NPROCS=$_CONDOR_NPROCS
>> 
>> # Remove the contact file, so if we are held and released
>> # it can be recreated anew
>> 
>> rm -f $CONDOR_CONTACT_FILE
>> 
>> PATH=`condor_config_val libexec`/:$PATH
>> 
>> # mpd needs a conf file, and it must be
>> # permissions 0700
>> mkdir tmp
>> MPD_CONF_FILE=`pwd`/tmp/mpd_conf_file
>> export MPD_CONF_FILE
>> 
>> ulimit -c 0
>> 
>> # If you have a shared file system, maybe you
>> # want to put the mpd.conf file in your home
>> # directory
>> 
>> echo "password=somepassword" > $MPD_CONF_FILE
>> chmod 0700 $MPD_CONF_FILE
>> 
>> # If on the head node, start mpd, get the port and host,
>> # and condor_chirp it back into the ClassAd
>> # so the non-head nodes can find the head node.
>> 
>> if [ $_CONDOR_PROCNO -eq 0 ]
>> then
>> 	mpd > mpd.out.$_CONDOR_PROCNO 2>&1 &
>> 	sleep 1
>> 	host=`mpdtrace -l | sed 1q | tr '_' ' ' | awk '{print $1}'`
>> 	port=`mpdtrace -l | sed 1q | tr '_' ' ' | awk '{print $2}'`
>> 
>> 	condor_chirp set_job_attr MPICH_PORT $port
>> 	condor_chirp set_job_attr MPICH_HOST \"$host\"
>> 	
>> 	num_hosts=1
>> 	retries=0
>> 	while [ $num_hosts -ne $_CONDOR_NPROCS ]
>> 	do
>> 		num_hosts=`mpdtrace | wc -l`
>> 		sleep 2
>> 		retries=`expr $retries + 1`
>> 		if [ $retries -gt 100 ]
>> 		then
>> 			echo "Too many retries, could not start all $_CONDOR_NPROCS nodes, only started $num_hosts, giving up.  Here are the hosts I could start "
>> 			mpdtrace
>> 			exit 1
>> 		fi
>> 	done
>> 
>> 	## run the actual mpi job, which was the command line argument
>> 	## to the invocation of this shell script
>> 	mpiexec -n $_CONDOR_NPROCS "$@"
>> 	e=$?
>> 
>> 	mpdallexit
>> 	sleep 20
>> 	echo $e
>> else
>> 	# If NOT the head node, acquire the host and port of
>> 	# the head node
>> 	retries=0
>> 	host=UNDEFINED
>> 	while [ "$host" = "UNDEFINED" ]
>> 	do
>> 		host=`condor_chirp get_job_attr MPICH_HOST`
>> 		sleep 2
>> 		retries=`expr $retries + 1`
>> 		if [ $retries -gt 100 ]
>> 		then
>> 			echo "Too many retries, could not get mpd host from condor_chirp, giving up."
>> 			exit 1
>> 		fi
>> 	done
>> 
>> 	port=`condor_chirp get_job_attr MPICH_PORT`
>> 	host=`echo $host | tr -d '"'`
>> 	mpd --host=$host --port=$port > mpd.out.$_CONDOR_PROCNO 2>&1
>> fi
>> 
>> 
>> 
>> 
>> _______________________________________________
>> discuss mailing list     discuss at mpich.org
>> To manage subscription options or unsubscribe:
>> https://lists.mpich.org/mailman/listinfo/discuss
>> 
> 
> 
> ------------------------------
> 
> Message: 2
> Date: Mon, 19 Jan 2015 17:24:22 +0100 (CET)
> From: Siegmar Gross <Siegmar.Gross at informatik.hs-fulda.de>
> To: discuss at mpich.org
> Subject: [mpich-discuss] error installing
> 	mpich-master-v3.2a2-113-g7d59d14d	on Linux
> Message-ID: <201501191624.t0JGOMgd017579 at tyr.informatik.hs-fulda.de>
> Content-Type: text/plain; charset="us-ascii"
> 
> Hi,
> 
> today I tried to install mpich-master-v3.2a2-113-g7d59d14d on my
> machines (Solaris 10 Sparc, Solaris 10 x86_64, and openSUSE Linux
> 12.1 x86_64) with gcc-4.9.2 and Sun C 5.13. I succeeded on both
> Solaris machines with both compilers and got the following error
> on Linux for both compilers.
> 
> tyr mpich-master-v3.2a2-113-g7d59d14d-Linux.x86_64.64_gcc 304 cat log.make.Linux.x86_64.64_gcc 
> if test ! -h ./src/include/mpio.h ; then \
>    rm -f ./src/include/mpio.h ; \
>    ( cd ./src/include &&       \
>        ln -s ../mpi/romio/include/mpio.h ) ; \
> fi
> make  all-recursive
> make[1]: Entering directory `/export2/src/mpich-3.2/mpich-master-v3.2a2-113-g7d59d14d-Linux.x86_64.64_gcc'
> make[1]: execvp: /bin/sh: Argument list too long
> make[1]: *** [all-recursive] Error 127
> make[1]: Leaving directory `/export2/src/mpich-3.2/mpich-master-v3.2a2-113-g7d59d14d-Linux.x86_64.64_gcc'
> make: *** [all] Error 2
> tyr mpich-master-v3.2a2-113-g7d59d14d-Linux.x86_64.64_gcc 305 
> 
> 
> I used the following "configure" command.
> 
> ../mpich-master-v3.2a2-113-g7d59d14d/configure --prefix=/usr/local/mpich-3.2_64_gcc \
>  --libdir=/usr/local/mpich-3.2_64_gcc/lib64 \
>  --includedir=/usr/local/mpich-3.2_64_gcc/include64 \
>  CC="gcc" CXX="g++" F77="gfortran" FC="gfortran" \
>  CFLAGS="-m64" CXXFLAGS="-m64" FFLAGS="-m64" FCFLAGS="-m64" \
>  LDFLAGS="-m64 -L/usr/lib/sparcv9 -Wl,-rpath -Wl,/usr/lib/sparcv9" \
>  --enable-f77 --enable-fc --enable-cxx --enable-romio \
>  --enable-debuginfo --enable-smpcoll \
>  --enable-threads=runtime --with-thread-package=posix \
>  --enable-shared \
>  |& tee log.configure.$SYSTEM_ENV.$MACHINE_ENV.64_gcc
> 
> 
> I would be grateful if somebody can fix the problem. Thank you very
> much for any help in advance.
> 
> 
> Kind regards
> 
> Siegmar
> -------------- next part --------------
> A non-text attachment was scrubbed...
> Name: config.tar.gz
> Type: application/octet-stream
> Size: 85796 bytes
> Desc: config.tar.gz
> URL: <http://lists.mpich.org/pipermail/discuss/attachments/20150119/9a9e2e99/attachment.obj>
> 
> ------------------------------
> 
> End of discuss Digest, Vol 27, Issue 12
> ***************************************
