[mpich-discuss] Problems with running make

Reuti reuti at staff.uni-marburg.de
Mon Mar 3 08:13:12 CST 2014


Am 03.03.2014 um 01:34 schrieb Ron Palmer:

> Gus,
> I have just replied with the details of the success, but I will clarify your questions here, if it helps next time.
> 
> Re the actual application to be run, 'inversion', I have only received binaries. I used to run them on gainsborough (without mpi) and that worked fine.
> 
> Home directories are not nfs shared, they are individual and separate, only the name is repeated.
> 
> I had tried password free ssh in all directions and permutations.
> 
> I had iptables down on sargeant and up on the other two.
> 
> Yes, I installed the gcc-gfortran.x86_64 AFTER I took those screenshots, and the post-install output was identical to the top one (sargeant).
> 
> I am unsure about cpi and fortran...
> 
> Stuff remaining to get sorted out:
> 1. Get that hyperthreading set up - what are your suggestions? Disable it and let MPI manage the cores?

Whether you can make use of it depends on your application. Test it with HT switched off and then switched on, increasing the number of processes each time, until you see that some processes no longer run at 100% and you judge the slowdown no longer tolerable.

It's not uncommon to see an improvement of up to 150% with HT turned on, but not 200% (depending on the workload).
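
For example, something along these lines with the cpi test and machine file from your earlier mails, or better with the real 'inversion' binary (the -np values below are only placeholders; compare one process per physical core against one per hardware thread for your actual core counts):

  # "Thread(s) per core: 2" in the output means HT is on
  lscpu | grep -i thread

  # compare runtimes: one process per physical core vs. one per hardware thread
  mpiexec -machinefile all_machines -np 12 ./cpi
  mpiexec -machinefile all_machines -np 24 ./cpi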


> 2. run mpiexec with iptables up, need to figure out what traffic to allow.

https://wiki.mpich.org/mpich/index.php/Frequently_Asked_Questions#Q:_How_do_I_control_which_ports_MPICH_uses.3F
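
In short, something like this (only a sketch; check the FAQ above for the exact environment variable name for your MPICH version, and treat the port numbers as placeholders):

  # pin MPICH to a fixed TCP port range, e.g. in .bashrc on every node
  export MPICH_PORT_RANGE=50000:50100

  # then allow ssh and that port range through iptables on every node
  iptables -I INPUT -p tcp --dport 22 -j ACCEPT
  iptables -I INPUT -p tcp --dport 50000:50100 -j ACCEPT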

-- Reuti


> Great many thanks to all, and Gus, Reuti and Rajeev in particular.
> 
> Cheers,
> Ron
> 
> On 3/03/2014 10:09, Gustavo Correa wrote:
>> Hi Ron
>> On Mar 2, 2014, at 6:50 PM, Ron Palmer wrote:
>> 
>>> Gus,
>>> following my reply to Reuti's email, I thought I would clarify some of the details of my system.
>>> 
>>> I do not have control over the actual software to be run on the cluster. I have asked whether it requires Fortran or not but have not yet received an answer. I do know that they are running the very same version of this application on an MPI 1.4.1 cluster.
>>> 
>>> I have those two statements (PATH and LD_LIBRARY_PATH) in /home/pgcinversion on each of the three computers.
>>> 
>> If they gave you just an executable, i.e. if you didn't compile the code and link it to
>> MPICH yourself, then there is little hope that it will run, simply because the 1.4.1 libraries
>> (assuming they are MPICH libraries, which you didn't say) are not the same as the ones you installed.
>> 
>> On the other hand, if you compiled the code yourself, you must know which MPICH compiler
>> wrapper you used to do this. If it was mpicc, then it is a C program; if it was mpif77 or mpif90,
>> then it is a Fortran program.
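>>
>> If in doubt, running ldd on the 'inversion' binary itself should show which MPI
>> library it expects, e.g. something along the lines of
>>
>>   ldd ./inversion | grep -i mpi
>>
>> (the binary name and path here are just how you referred to it; adjust as needed).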
>> 
>>> Each of the computers has its own users and separate home directories; I just created the same username, pgcinversion, on all three of them.
>>> 
>> Make sure you set up PATH and LD_LIBRARY_PATH in the respective .bashrc/.tcshrc files
>> in the home directory on each of the three computers
>> (as it looks like the home directories are not NFS shared).
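>>
>> For example, something like this in ~/.bashrc on each machine (the /inv/mpich
>> prefix below is only a placeholder; use whatever --prefix you gave MPICH's configure):
>>
>>   export PATH=/inv/mpich/bin:$PATH
>>   export LD_LIBRARY_PATH=/inv/mpich/lib:$LD_LIBRARY_PATH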
>> 
>>> sargeant is the master, most hdd space, and has exported /inv, which is then mounted via nfs in fstab by the others.
>>> 
>>> constable and gainsborough are slaves
>>> 
>>> all three can connect to any of the other two without ssh password
>>> 
>> OK, so you tested that I suppose, on all node pairs, in both directions, right?
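>>
>> A quick way to check all of them is to run something like this on each of the
>> three machines in turn (BatchMode makes ssh fail instead of asking for a
>> password if key-based login is not set up):
>>
>>   for h in sargeant constable gainsborough; do
>>       ssh -o BatchMode=yes $h hostname
>>   done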
>> 
>>> I am not sure whether I have compiled with shared libraries, though in my reply to Reuti I had screenshots of the output of ldd cpi on the three computers; will that answer your question?
>>> 
>> Yes, you did.
>> The ldd cpi screenshots that you just sent to Reuti show that cpi is linked to
>> the MPICH shared libraries (e.g. libmpich.so.12; note "so" = shared object, i.e. shared library).
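>>
>> For instance, something like
>>
>>   ldd ./cpi | grep mpich
>>
>> on each machine should point at the libmpich.so.12 from your MPICH install
>> (the exact path shown will depend on your install prefix).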
>> 
>>> in regards to compiling cpi, I used your suggestion of
>>> mpicc -o cpi cpi.c
>>> mpiexec -machinefile all_machines -np 24 ./cpi
>> Are you sure the "cpi" executable is the same in the compilation and execution command
>> lines above?
>> Can you start fresh, delete the cpi executable, recompile and re-run, perhaps?
>> 
>> I say this because I am still rather confused about why the
>> ldd cpi screenshots you just sent show
>> libgfortran.so.6 as one of the libraries cpi was linked to.
>> If you really compiled with mpicc, Fortran shouldn't play any role, I guess.
>>> should I modify and re-compile? I am really unsure about how to set it up.
>>> 
>> Yes, start fresh, delete cpi, recompile, rerun.
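>>
>> I.e. something along these lines (the same commands you used before, just
>> starting from a clean slate):
>>
>>   rm -f cpi
>>   mpicc -o cpi cpi.c
>>   mpiexec -machinefile all_machines -np 24 ./cpi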
>> 
>>> I saved the version of cpi that came with the download as cpi_orig, and then ran the one I compiled (cpi) and the one that came with the download, cpi_orig. A previous email of mine has those screenshots (happy to copy and paste again if you prune your email history).
>>> 
>> Oh, maybe you are using an old executable, compiled who knows how.
>> Start fresh to avoid confusion, please.
>> 
>>> NOTE: You *must install on ALL computers*, otherwise the runtime will be missing on some of them. Note that rpms install into *local* directories (/usr and friends), not on your NFS share (as opposed to what you did with MPICH). Was this the issue?
>>> Gus, as Reuti suspected, there was (and may still be) a lack of libraries installed; see my reply to Reuti 30 min ago. Installing compilers and getting all the parts right is my weakest link, and where I had to guess a bit at the start (is that called learning?????). Once I hear back from the list on the outputs of ldd cpi, I will install everything listed by yum list | grep gfortran.
>>> 
>> OK, so apparently you took corrective action and installed gfortran on constable and
>> gainsborough *AFTER* you made those three screenshots, right?
>> Please, clarify.
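>>
>> If gfortran is still missing somewhere, something like this on each of the three
>> machines should do it (the package name is taken from your earlier yum output):
>>
>>   yum install gcc-gfortran
>>   gfortran --version    # quick sanity check that it is on the PATH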
>> 
>>> I have shut down iptables on sargeant, but kept them running on the other two. I will log on to the cluster, shut them down, try cpi again and report back.
>>> 
>> I would stop IPtables on all three computers, at least until you have everything sorted out.
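>>
>> On a RHEL/CentOS-style system (which your use of yum suggests) that would be
>> something like this, on each machine:
>>
>>   service iptables stop      # stop the firewall now
>>   chkconfig iptables off     # optional: keep it off across reboots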
>> 
>> I hope this helps,
>> Gus Correa
> 
> _______________________________________________
> discuss mailing list     discuss at mpich.org
> To manage subscription options or unsubscribe:
> https://lists.mpich.org/mailman/listinfo/discuss



