Sunday, January 31, 2010

SHADOWFAX: Many miles left to go! (setting up openMPI clusters)

OK, nobody said it would be easy!  Whoever said "instant super computer, just add water" when referring to setting up an openMPI cluster over a LAN of 25 multi core 64-bit nodes was nuts.  Oh, yeah, that was me...

How about a little bit of history? CIS(theta), the Computing Independent Study class, has been around in one incarnation or another for over a decade:

2009-2010 SHADOWFAX openMPI cluster with dual core 64-bit AMD 2 GHz Athlons (work in progress).  We are trying to emulate pelicanHPC (http://idea.uab.es/mcreel/ParallelKnoppix) on a Linux Partition.  We based this cluster on the Fedora 11 CD 64-bit Linux distribution and gigaE (gigabit Ethernet).

2008-2009 CISTHETA public key authentication cluster with dual core 64-bit AMD 2 GHz Athlons (used bash scripts to scatter/gather jobs on the new architecture).  We did some nice ray tracings using povray (http://www.zazzle.com/cistheta2008).  We based this cluster on the KNOPPIX DVD 32-bit Linux distribution and fastE (100 megabit Ethernet).

1997-2007 CENTAURI openMOSIX cluster with various Pentium class processors (modified load balancing Linux Kernel that automatically sent jobs to other nodes whenever a node was overworked).  We also dabbled with Beowulf, PVM and MPI.  We based this cluster primarily on cluster KNOPPIX (http://clusterknoppix.sw.be), QUANTIAN (http://www.quantian.org) and BCCD (http://bccd.net) and basE (10 megabit Ethernet). For the last few years, we installed the QUANTIAN DVD to a Linux Partition and used a dual boot lab (with WimpDoze).  We did some nice fractals using fork() in C++ code compiled with g++ (http://cistheta2007.deviantart.com).

The difference between CISTHETA and SHADOWFAX is twofold. First, CISTHETA did not implement any clustering environment like MPI last year.  We simply broke our jobs into slices and sent and received results using bash scripts.  Second, we set up public key authenticated ssh clients such that you could log in as "cluster" on any node and be automagically logged into all nodes.  In fact, you could ssh to any node with or without the userid (cluster) and you did not need a password.  Let's call that "static public key authenticated ssh."  The static method is set up once and used many times.  This year we are using something I'll call "dynamic public key authenticated ssh," where the RSA keys are generated on the fly.  Under both schemes, the user could start a parallel job from any node, making that node the master node, and control any number of other nodes as workers.
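
Here's roughly what that static setup amounted to, as a minimal sketch (the "cluster" userid is from our setup, but the node names in the loop are just placeholders):

    # Run once as the shared "cluster" user on any one node.
    # Generate an RSA keypair with an empty passphrase.
    ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa

    # Authorize our own key so local ssh works too.
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    chmod 600 ~/.ssh/authorized_keys

    # Copy the whole ~/.ssh directory to every other node
    # (assumes the same "cluster" account exists everywhere).
    for node in node01 node02 node03; do
        scp -r ~/.ssh cluster@$node:~/
    done

    # Now any node can ssh to any other with no password:
    ssh node02 hostname

The dynamic scheme we're trying this year does that keygen step on the fly instead of once.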

What's frustrating about all this is that most of the clustering software we've tried seems to work great "right out of the box" on any one multi core node.  In other words, SMP works by default, but extending the parallel jobs over a cluster of such nodes is a production (see the sketch after this list)!  These clustering packages are as follows:

parallelJava: http://www.cs.rit.edu/~ark/pj.shtml
parallelPython: http://www.parallelpython.com & http://www.sagemath.org/doc/numerical_sage/mpi4py.html
openMPI: http://www.open-mpi.org
dSAGE: http://www.sagemath.org and http://www.sagemath.org/tour-research.html

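To see the gap, here's a minimal openMPI sketch (the node names, slot counts and hostfile name are made-up assumptions):

    # On any one multi core node, openMPI "just works" --
    # this runs 2 copies of hostname as local MPI processes:
    mpirun -np 2 hostname

    # Spanning the LAN takes more: a hostfile listing each node
    # and its slots (cores), plus passwordless ssh between nodes.
    cat > myhosts << EOF
    node01 slots=2
    node02 slots=2
    EOF

    # With the ssh keys in place, this spreads 4 processes over 2 nodes:
    mpirun -np 4 --hostfile myhosts hostname

That first mpirun is the "instant super computer" part; everything after it is where the production begins.
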
We've gotten openMPI's Torque servers to work with gcc.  However, the Linux distro we based our new cluster on this year was the Fedora 11 CD 64-bit version.  I think we may need to start over with the Fedora 12 DVD 64-bit Linux distribution so as to include g++ as well.  We may all need to re-implement "static public key authenticated ssh," which worked very well last year...
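
For what it's worth, once openMPI is built with Torque support, submitting an MPI job looks roughly like this (a sketch; the job name, node counts and a.out program are assumptions):

    #!/bin/bash
    # Submit with: qsub mpi_job.sh
    #PBS -N mpi_test
    #PBS -l nodes=2:ppn=2

    # Torque hands openMPI the node list, so no --hostfile is
    # needed when openMPI is compiled with Torque (tm) support.
    cd $PBS_O_WORKDIR
    mpirun ./a.out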


Happy Clustering,
