Sunday, January 16, 2011

CIS(theta) Meeting IX (2010-2011) - try, try, try, try, try, try, ...., again!!!


Aim: 
pelican HPC 2.2 64bit: If at first you don't succeed, try, try, try, try, try, try, ...., again!!!

Attending: 
CIS(theta) 2010-2011: DavidG, HerbertK, JoshG, RyanH

Reading: 
pelican HPC CD 64bit version 2.2: /home/user/*
pelican HPC CD 64bit version 2.2: /usr/bin/pelican*

http://pelicanhpc.788819.n4.nabble.com/pelicanHPC-ReMaster-td3163841.html

It's been a while since I blogged, but that's only because we've been busy trying out several Linux liveCD distributions for setting up High Performance Clusters!  As you will see in the nabble link above, we have a thread on the http://www.pelicanhpc.org forum regarding our efforts to get pelicanHPC working on our LAN even though we have tons of DHCP conflicts.  We also tried out Cluster By Night, which does not suffer from the DHCP problem: it sets up public-key-authenticated ssh to connect the nodes, and that works right out of the box!  I'm also revisiting BCCD, as detailed below.
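For anyone curious what "public-key-authenticated ssh" boils down to, here is the standard pattern as a minimal sketch (CbN automates all of this for you, and the node name below is hypothetical):

```shell
# Generate a passphrase-less keypair (MPI needs unattended logins)
ssh-keygen -t rsa -N "" -f ./cluster_key -q

# On a real cluster you would then push the public key to every worker, e.g.:
#   ssh-copy-id -i ./cluster_key.pub user@node01    (node01 is hypothetical)
# after which "ssh -i ./cluster_key user@node01" logs in with no password prompt.
ls cluster_key cluster_key.pub
```

Once every worker accepts the master's key, MPI launchers can start processes on all the nodes without anyone typing passwords.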

BCCD Update
I just recently revisited the Bootable Cluster CD project sponsored by the University of Northern Iowa.  Years ago, when Paul Gray started the project, he was very helpful to my students in setting up an openMOSIX cluster using BCCD.  That was back in the day when clusterKNOPPIX and QUANTIAN were the only other competitors.  Now, Paul is not so intimately involved with the project.  Take a look at the BCCD links listed above: the main site, the site for version 3.0, and the developers' forum, to which I contributed a little in December and January.  So, it now seems there's a team of developers involved, spanning the littleFe project and Earlham and Contra Costa Colleges.

Last week I downloaded their latest ISO (3.0 for 64bit).  The problem with this distro is that it's slightly too big for a CD, so I had to burn a DVD.  Another issue is that when BCCD boots, it copies itself to RAM.  The image on the disc is compressed, so the 800MB ISO expands to over 1GB in RAM.  I cannot use this ISO because my worker nodes, 64bit Athlon dual-cores, only have 750MB of RAM.  The next step will be to try the 32bit ISO, which is smaller.  In fact, I did try the 32bit version on my Xeon dual-cores, and the system seems to boot all the way but then fails just when the X windowed desktop should appear.  I'll try booting with the linux 4 cheat code next time.



Cluster by Night Screencast from Dept. Dirigible Flightcraft on Vimeo.
Cluster By Night Update
Cluster By Night is proving to be very interesting.  Kevin Lynagh created 2 ISOs for the 32bit arch: one boots the Master, and each Worker node uses a separate ISO.  So, we burned 1 master and 7 workers to test out CbN.  Each ISO is only about 15MB, so it's a fast download, burn and boot!  You get a full openMPI installation, and you can boot up the cluster using public-key-authenticated ssh in a matter of minutes.  There are sample apps on CbN but not much source code or dox.  So, we booted up a 32bit pelicanHPC 2.2 CD and compiled flops.f using mpif77 -o flops flops.f, since doing so in CbN causes errors.  It would seem that the openMPI installation on CbN is incomplete; at the very least, gfortran is missing.  Then I copied the flops executable to the CbN master node via sftp user@10.10.129.21 from pelicanHPC, which was running on worker node #21.  Then we ran cbn_run -np 16 flops from the master and got 8 nodes, 16 cores, running at nearly 7GFLOPS!  Again, openMPI is incomplete, as using mpirun causes a ton of errors too!  All in all, CbN is very promising.
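The workaround above, condensed into one sequence for anyone following along (this is a transcript of cluster commands, not something you can run standalone; the IP is from our LAN and yours will differ):

```shell
# On a booted pelicanHPC 2.2 node (CbN's toolchain lacks gfortran):
mpif77 -o flops flops.f         # compile the Fortran MPI benchmark

# From the CbN master, pull the binary off the pelicanHPC worker
# (10.10.129.21 was worker node #21 on our LAN)
sftp user@10.10.129.21

# Then launch across the cluster: 8 dual-core nodes = 16 processes
cbn_run -np 16 flops
```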


Pelican Update
We are focusing this week's readings on the files on a booted pelicanHPC 2.2 64bit CD.  We printed out all the files in /home/user (including pelican_config) and all the files matching /usr/bin/pelican* (the system scripts for starting the cluster).  Michael Creel says we should be able to modify how pelicanHPC boots by editing pelican_config and putting it on a persistent home volume labeled PELHOME.  In this manner, we hope to disable pelican's DHCP server so we can use the DHCP service that already exists on eth0.  Once we eliminate the DHCP conflicts, we should be able to boot the worker nodes from the one master CD via PXE after all.  Wish us luck!
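If the PELHOME trick works the way Michael Creel describes, setting up the persistent volume would look roughly like this (a sketch under assumptions: the device name is an example, and we haven't yet verified which pelican_config settings actually control the DHCP server):

```shell
# Label a spare ext2 partition PELHOME so pelicanHPC adopts it at boot
# (/dev/sdb1 is an example device; double-check with fdisk -l first!)
sudo mkfs.ext2 /dev/sdb1
sudo e2label /dev/sdb1 PELHOME

# Drop our edited pelican_config onto it so the change persists across boots
sudo mount /dev/sdb1 /mnt
sudo cp pelican_config /mnt/pelican_config
sudo umount /mnt
```

We'll report back on whether pelicanHPC actually picks the edited config up from PELHOME on the next boot.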

Well, that's all for this week!  Please stay tuned.  We hope this record will help others trying to make their own clusters.  Also, if you have any hints or pointers, please leave a comment to let us know.  

Happy Clustering,
