[mpich-discuss] Problems with running make

Ron Palmer ron.palmer at pgcgroup.com.au
Sun Mar 2 18:34:27 CST 2014


Gus,
I have just replied with the details of the success, but I will answer 
your questions here in case it helps next time.

Re the actual application to be run, 'inversion', I have only received 
binaries. I used to run them on gainsborough (without MPI) and that 
worked fine.

Home directories are not NFS shared; they are individual and separate, 
only the username is repeated.

I had tested password-free ssh in all directions and permutations.
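A quick way to verify that, run as pgcinversion on each of the three 
hosts in turn, is a loop like

for h in sargeant constable gainsborough; do ssh $h hostname; done

Each hostname should print back without a password prompt.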

I had iptables down on sargeant and up on the other two.

Yes, I installed gcc-gfortran.x86_64 AFTER I took those screenshots, 
and the post-install output was identical to the top one (sargeant).

I am unsure about cpi and Fortran...

Stuff remaining to get sorted out:
1. Get that hyperthreading set up - what are your suggestions? Disable 
it and let MPI manage the cores?
2. Run mpiexec with iptables up; I need to figure out what traffic to 
allow. (Rough command sketches for both items below.)
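
Sketches of what I have in mind for both, untested here. For core 
binding, something like

mpiexec -machinefile all_machines -np 24 -bind-to core ./cpi

(assuming this version of MPICH's Hydra launcher supports -bind-to). 
For the firewall, either

service iptables stop

on all three machines, or allow the cluster subnet wholesale, e.g.

iptables -I INPUT -s 192.168.1.0/24 -j ACCEPT

where 192.168.1.0/24 is a placeholder for whatever the private subnet 
actually is.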

Many thanks to all, and to Gus, Reuti and Rajeev in particular.

Cheers,
Ron

On 3/03/2014 10:09, Gustavo Correa wrote:
> Hi Ron
> On Mar 2, 2014, at 6:50 PM, Ron Palmer wrote:
>
>> Gus,
>> following my reply to Reuti's email, I thought I would clarify some of the details of my system.
>>
>> I do not have control over the actual software to be run on the cluster. I have asked whether it requires Fortran or not, but have not yet received any answer. I do know that they are running the very same version of this application on an MPI 1.4.1 cluster.
>>
>> I have those two PATH statements (PATH and LD_LIBRARY_PATH) set in /home/pgcinversion on each of the three computers.
>>
> If they gave you just an executable, i.e. if you didn't compile the code and link it to
> MPICH yourself, then there is little hope that it will run, simply because the 1.4.1 libraries
> (assuming they are MPICH libraries, which you didn't say) are not the same as the ones you installed.
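> A quick way to see what the inversion binary itself expects is to run ldd on it too,
> e.g. something like
>
> ldd ./inversion | grep -i mpi
>
> (here "inversion" stands for whatever the binary is actually called), which should
> show which MPI shared library, if any, it was linked against, and whether it resolves.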
>
> On the other hand, if you compiled the code yourself, you must know which MPICH compiler
> wrapper you used to do this. If it was mpicc, then it is a C program; if it was mpif77 or mpif90,
> then it is a Fortran program.
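> For instance ("prog" is just a placeholder name here):
>
> mpicc -o prog prog.c
> mpif90 -o prog prog.f90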
>
>> Each of the computers has its own users and separate home directories; I just created the same username, pgcinversion, on all three of them.
>>
> Make sure you set up PATH and LD_LIBRARY_PATH in all three home directories, on
> each computer, in the respective .bashrc/.tcshrc files
> (as it looks like the home directories are not NFS shared).
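> For example, something like the following in each ~/.bashrc, where /inv/mpich
> is only a placeholder for wherever you actually installed MPICH:
>
> export PATH=/inv/mpich/bin:$PATH
> export LD_LIBRARY_PATH=/inv/mpich/lib:$LD_LIBRARY_PATH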
>
>> sargeant is the master (most HDD space) and exports /inv, which the others then mount via NFS in fstab.
>>
>> constable and gainsborough are slaves
>>
>> all three can connect to either of the other two without an ssh password
>>
> OK, so you tested that I suppose, on all node pairs, in both directions, right?
>
>> I am not sure whether I compiled with shared libraries, though in my reply to Reuti I had screenshots of the output of ldd cpi on the three computers - will that answer your question?
>>
> Yes, you did.
> The ldd cpi screenshots that you just sent to Reuti show that cpi is linked to
> the MPICH shared libraries (e.g. libmpich.so.12; note "so" = shared object, or shared library).
>
>> in regards to compiling cpi, I used your suggestion of
>> mpicc -o cpi cpi.c
>> mpiexec -machinefile all_machines -np 24 ./cpi
> Are you sure the "cpi" executable is the same in the compilation and execution command
> lines above?
> Can you start fresh, delete the cpi executable, recompile and re-run, perhaps?
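> Concretely, something along the lines of:
>
> rm -f cpi
> mpicc -o cpi cpi.c
> mpiexec -machinefile all_machines -np 24 ./cpi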
>
> I say this because I am still rather confused as to why the
> ldd cpi screenshots you just sent show
> libgfortran.so.6 among the libraries cpi was linked to.
> If you really compiled with mpicc, Fortran shouldn't play any role, I guess.
>> should I modify and re-compile? I am really unsure about how to set it up.
>>
> Yes, start fresh, delete cpi, recompile, rerun.
>
>> I saved the version of cpi that came with the download as cpi_orig, and then ran the one I compiled (cpi) and the one that came with the download, cpi_orig. A previous email of mine has those screenshots (happy to copy and paste again if you prune your email history).
>>
> Oh, maybe you are using an old executable, compiled who knows how.
> Start fresh to avoid confusion, please.
>
>> NOTE: You *must install on ALL computers*, otherwise the runtime will be missing on some of them. Note that rpms install into *local* directories (/usr and friends), not on your NFS share (as opposed to what you did with MPICH). Was this the issue?
>> Gus, as Reuti suspected, there was (and may still be) a lack of libraries installed - see my reply to Reuti 30 min ago. Installing compilers and getting all parts right is my weakest link, and where I had to guess a bit at the start (is that called learning?). Once I hear back from the list on the outputs of ldd cpi, I will install everything listed by yum list | grep gfortran.
>>
> OK, so apparently you took corrective action and installed gfortran on constable and
> gainsborough *AFTER* you took those three screenshots, right?
> Please clarify.
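> (On these RHEL-type systems that would be something like
>
> yum install gcc-gfortran
>
> run on each of the three machines, since rpms install into the local /usr.)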
>
>> I have shut down iptables on sargeant but kept it running on the other two. I will log on to the cluster, shut it down there as well, try cpi again and report back.
>>
> I would stop IPtables on all three computers, at least until you have everything sorted out.
>
> I hope this helps,
> Gus Correa



