I have been thinking about using our cluster for a while but kept procrastinating. So this past week I decided to submit a job myself instead of asking others to do it. Even if others submit jobs for me, I need to know some basics of working with clusters … and I mean real basics.
So, I borrowed a complete, working script from a smart colleague in which I had to change only a few words, basically the locations of the input and output files. And then I took that step … the step of submitting a job to a computer cluster.
Result: complete failure. So far I have been unable to run that one job … and I am talking about just one. Every time I submit it, I get the message “Requeued job is waiting for rescheduling”. I still have not figured out what or where the problem is. I have been asking friends and colleagues, and they have been very forthcoming, but despite their best efforts and my repeated attempts (and prayers) I have gotten nothing but failure.
However, in the process I have learned a few shell commands which I would probably never have learned otherwise … at least not this soon.
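sh <script name> : run your script (the submission itself is done by bsub, either called from inside the script or directly as bsub < <script name>)
bjobs : show the status of the jobs you have submitted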
bjobs -lp : see why your job is still pending; this is the command that told me that my job was waiting for rescheduling or something … whatever!
bkill <job ID> : get rid of a job that you have submitted
bkill -q <queue name> -u all 0 : this will kill all jobs in that queue, for every user (killing other people's jobs needs administrator rights)
bhosts : shows the hosts, their status and how many jobs they are running
busers : shows a summary of your own jobs (how many are pending, running, suspended)
busers all : shows the same summary for every user
bqueues : shows the available queues
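Here is a minimal sketch of how a few of these commands could be chained together. It assumes the cluster runs LSF and that bsub prints its usual confirmation line, something like "Job <12345> is submitted to queue <normal>.". The function name job_id_from_bsub and the file name job.sh are made up for illustration; they are not part of LSF or of my colleague's script.

```shell
#!/bin/sh
# Pull the numeric job ID out of bsub's confirmation line.
# Assumed format: Job <12345> is submitted to queue <normal>.
# Lines that do not match this pattern produce no output.
job_id_from_bsub() {
    sed -n 's/^Job <\([0-9][0-9]*\)> is submitted.*/\1/p'
}

# Possible usage on a machine where LSF is installed
# (job.sh is a hypothetical job script):
#   id=$(bsub < job.sh | job_id_from_bsub)
#   bjobs -l "$id"    # detailed status, including pending reasons
#   bkill "$id"       # cancel the job if it is stuck
```

Keeping the job ID in a variable means the later bjobs and bkill calls act on that one job only, rather than on everything you have in the queue.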
Alright, that is enough learning for now. I have to get back to trying again and again, to see whether repeating the same silly things changes anything …