That’s one of my coworker’s favorite lines during staff meetings: “I’ve got a breakthrough scheduled for sometime this week.” Apparently, my breakthrough really was scheduled for this week. Ever since we got our new cluster, we’ve been looking for ways to run our analysis missions at work more efficiently. So Sally gave me the task of finding some batch management software, downloading it, figuring out how to run it, and then testing it with some of our scripts. For the past few weeks, I’ve been immersed in reading documentation, installing the software I found, and beating my head against the monitor in frustration.
The software I found is PBS, the Portable Batch System. The software is great. It lets me schedule jobs, tell it how I want those jobs to run, put jobs into different queues, etc. It was exactly what we were looking for. Unfortunately, the documentation sucks. I muddled my way through the installation documentation, but came to a dead halt when it came time to actually schedule and run jobs through PBS. That was about a week ago.
Yesterday, I finally got a response to my plea for help on the mailing list. It turns out a guy had exactly the type of setup I wanted for the software, and he was able to point out some mistakes I’d made and even gave me a script file to use for some of the scheduling. I was ecstatic. With that, I got PBS to send out 20 jobs to the 5 nodes (each with 2 virtual processors) and manage them so that as soon as one job finished, another one started.
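For anyone curious what "as soon as one job finished, another job was started" looks like, here is a rough sketch in Python. To be clear, this is not PBS and not the script the guy on the mailing list sent me. It is just a toy model that assumes each virtual processor acts as one slot in a pool, with a short `sleep` standing in for a real analysis job. The names `run_job`, `run_all` and `peak_running` are made up for the illustration.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

NODES = 5
SLOTS_PER_NODE = 2                      # the two virtual processors per node
TOTAL_SLOTS = NODES * SLOTS_PER_NODE    # 10 jobs can run at once
NUM_JOBS = 20

_lock = threading.Lock()
_running = 0
peak_running = 0    # highest number of jobs seen running at the same time


def run_job(job_id):
    """Pretend to run one job (hypothetical stand-in for a real script)."""
    global _running, peak_running
    with _lock:
        _running += 1
        peak_running = max(peak_running, _running)
    time.sleep(0.01)    # placeholder for the actual work
    with _lock:
        _running -= 1
    return job_id


def run_all(num_jobs=NUM_JOBS, slots=TOTAL_SLOTS):
    """Queue every job; a waiting job starts whenever a slot frees up."""
    with ThreadPoolExecutor(max_workers=slots) as pool:
        return list(pool.map(run_job, range(num_jobs)))


finished = run_all()
print(f"{len(finished)} jobs finished, at most {peak_running} running at once")
```

The pool never runs more jobs than there are slots, and the queued jobs get picked up as soon as slots free up, which is the behavior I was after.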
Just to show an example of how well this software works… Before, we ran the jobs on an SGI and each one took 1 hour. Then we moved everything over to Linux, and each job took 3 minutes. Now, with the 5 nodes able to run 10 jobs at a time and manage them properly, it takes me 4 minutes and 57 seconds to run all 20 jobs. This is going to save us so much time, it’s not even funny. *happy dance*
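Here is the back-of-the-envelope math, as a small Python sketch. It assumes the 20 jobs used to run one after another on both the SGI and the single Linux box, which is my simplification, and the per-job times are the round numbers quoted above. The variable names are just for illustration.

```python
NUM_JOBS = 20

SGI_MIN_PER_JOB = 60        # 1 hour per job on the SGI
LINUX_MIN_PER_JOB = 3       # 3 minutes per job on Linux
CLUSTER_TOTAL_MIN = 4 + 57 / 60     # 4 minutes, 57 seconds for all 20 jobs

# Total time if the jobs run back to back (assumed, see the note above)
sgi_total_min = NUM_JOBS * SGI_MIN_PER_JOB        # 1200 minutes, i.e. 20 hours
linux_total_min = NUM_JOBS * LINUX_MIN_PER_JOB    # 60 minutes

speedup_vs_sgi = sgi_total_min / CLUSTER_TOTAL_MIN
speedup_vs_linux = linux_total_min / CLUSTER_TOTAL_MIN

print(f"SGI, one at a time:   {sgi_total_min} min")
print(f"Linux, one at a time: {linux_total_min} min")
print(f"Cluster with PBS:     {CLUSTER_TOTAL_MIN:.2f} min")
print(f"Speedup vs SGI:       {speedup_vs_sgi:.1f}x")
print(f"Speedup vs Linux:     {speedup_vs_linux:.1f}x")
```

So under those assumptions, a batch that would have eaten 20 hours on the SGI, or an hour on one Linux machine, now finishes in under five minutes.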