Tuesday, December 28, 2010

First experience with a cluster

I had been thinking about using our cluster but kept procrastinating. So this past week I decided to submit a job myself instead of asking others to do it. Even if others submit jobs for me, I need to know some basics of working with clusters … and I mean the real basics.

So I borrowed a fully written script from a smart colleague, in which I only had to change a few words, basically the locations of the input and output files. And then I took that step … the step of submitting a job to a computer cluster.

Result: complete failure. So far I have been unable to run that job … and I am talking about just one job. Every time I submit it, I get the message “Requeued job is waiting for rescheduling”. I still have not figured out what or where the problem is. I have been asking friends and colleagues, and they have been very forthcoming, but despite their best efforts and my repeated attempts (and prayers) I have gotten nothing but failure.

However, in the process I have learned a few shell commands which I would probably never have learned … at least not this soon.

sh : run your script (on LSF, the job itself is submitted with bsub)

bjobs : shows the status of your submitted jobs

bjobs -lp : see why your job is still pending. This is the command that told me that my job was waiting for rescheduling, or something like that.

bkill <job ID> : get rid of a job that you have submitted

bkill -q <queue name> -u all 0 : this will kill all jobs in that queue (killing other users' jobs requires administrator rights)

bhosts : shows the hosts, their status, and how many jobs each one is running

busers : shows a summary of your jobs (how many are pending, running, and suspended)

busers all : shows the same summary for every user

bqueues : shows the available queues and their status
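
To tie the commands above together, here is a minimal sketch of one submit-and-check round, assuming the cluster runs LSF (which is where the b-commands come from). The job name first_job, the queue name normal, and the file name myjob.lsf are made up for illustration; your cluster's queue names will differ, and bqueues will list them.

```shell
#!/bin/sh
# Write a tiny LSF job script. All names here are hypothetical examples.
cat > myjob.lsf <<'EOF'
#!/bin/sh
#BSUB -J first_job
#BSUB -q normal
#BSUB -o first_job.%J.out
#BSUB -e first_job.%J.err
echo "Running on $(hostname)"
EOF

# Only talk to the scheduler if LSF is actually installed on this machine.
if command -v bsub >/dev/null 2>&1; then
    bsub < myjob.lsf    # submit; the #BSUB lines are read as options
    bjobs               # list my jobs and their current status
    bjobs -l -p         # long listing of pending jobs, with the pending reasons
else
    echo "bsub not found; wrote myjob.lsf but did not submit it"
fi
```

The %J in the output file names is replaced by the job ID, so each attempt leaves its own log files behind, which helps when you are resubmitting the same job over and over.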

Alright, so this is enough learning for now. I have to get back to trying again and again to see if repeating the same silly things changes anything :)