Sunday, 3 March 2013

AMICreator

Continuing the work, our next challenge was to give a job a good starting position.
By this I mean a functional AMI that can run a worker, where the worker in turn can execute a given job.
This is what the prologue files are used for.

I first created a basic AMI for the complete CBAS project, starting from Ubuntu 12.04.
I updated the system, installed the default Java version and build-essential, and updated Boto.
Afterwards I compiled the DMTCP version from the latest SVN update they provide, which has the plugin and 'system' support.
During compilation I noticed that some of the tests shipped with DMTCP failed.
So this is by no means a release version, and it could have bugs when checkpointing certain applications.

Next I created and compiled my plugin, which snapshots the virtual machine.
I then took a snapshot and created an AMI from it; this forms the CBAS AMI.

The next step was to create job-specific AMIs using the prologue files provided in the JDL.
To do this I made an AMICreator, which launches an instance and executes all files provided in its user data.
When it has finished, it informs us by posting a message to a temporarily created SQS queue, after which we clean up the used resources.
The files listed in the user data script are presigned URLs for the prologue files, which are uploaded to S3.
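
To make the flow above a bit more concrete, here is a minimal sketch of how such a user data script could be generated. This is not the actual AMICreator code: the function name, the `/tmp/prologue_N` paths and the example URLs are all made up for illustration, and it assumes that `curl` and the AWS CLI are available on the image (the real project may well use Boto for the notification instead).

```python
import shlex


def build_userdata(prologue_urls, queue_url):
    """Return a bash user data script that runs each prologue file in order.

    prologue_urls: presigned S3 URLs of the prologue files (hypothetical values).
    queue_url: URL of the temporary SQS queue to notify when finished.
    """
    lines = ["#!/bin/bash", "set -e"]
    for index, url in enumerate(prologue_urls):
        # Hypothetical download location for each prologue file.
        target = "/tmp/prologue_%d" % index
        # Presigned URLs contain '&' and '?', so they must be shell-quoted.
        lines.append("curl -sSf -o %s %s" % (target, shlex.quote(url)))
        lines.append("chmod +x %s" % target)
        lines.append(target)
    # Assumption: the AWS CLI is installed on the image; this is only one
    # possible way to post the "finished" message to the queue.
    lines.append(
        "aws sqs send-message --queue-url %s --message-body done"
        % shlex.quote(queue_url)
    )
    return "\n".join(lines) + "\n"
```

Because of `set -e`, a failing prologue file stops the script before the message is sent, so in this sketch the side that polls the queue would need a timeout of its own.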

Once a job-specific AMI has been created, we use it in our worker manager to start a new instance.
I was planning to launch an instance and then start a worker on it, using the same approach as the run files in the original CBAS project.
For that reason I created a private Git repository on Bitbucket.

But when testing this scenario, I concluded that either the Git credentials would have to be hardcoded into the script, or SSH information would have to be exchanged between the new worker and Bitbucket.
Both are really ugly solutions, so I think I'll opt for a third: uploading the source code to an S3 bucket and downloading it from there.
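
As a rough idea of what the S3 route could look like on the worker side, here is a small sketch that downloads a gzipped tarball of the source code and unpacks it. It assumes the source is packed as a `.tar.gz` and is reachable through a presigned URL, so no credentials have to live on the instance; both are my assumptions here, and the function name is hypothetical.

```python
import io
import tarfile
import urllib.request


def fetch_and_unpack(source_url, dest_dir):
    """Download a gzipped tarball from source_url and unpack it into dest_dir.

    source_url is assumed to be a presigned S3 URL for the worker source code.
    """
    with urllib.request.urlopen(source_url) as response:
        data = response.read()
    # Unpack from memory; the archive is assumed to be small enough for that.
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as archive:
        archive.extractall(dest_dir)
```

The presigned URL itself could be passed to the instance through the user data, in the same way as the prologue files.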

When the instance is launched, we send an SNS message to the new worker indicating that we want to start a job.
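
The message format is not fixed yet, but a small sketch shows the kind of thing I have in mind. The field names (`action`, `job_id`, `instance_id`) and both function names are purely hypothetical; the actual publishing and receiving through SNS is left out.

```python
import json


def build_start_message(job_id, instance_id):
    """Build the message body asking one specific worker to start a job."""
    payload = {
        "action": "start_job",       # hypothetical action name
        "job_id": job_id,
        "instance_id": instance_id,  # the worker this message is meant for
    }
    return json.dumps(payload, sort_keys=True)


def job_to_start(body, my_instance_id):
    """Return the job id to start, or None if the message is not for this worker."""
    payload = json.loads(body)
    if payload.get("action") != "start_job":
        return None
    if payload.get("instance_id") != my_instance_id:
        return None
    return payload.get("job_id")
```

Including the instance id in the body lets a worker ignore messages that were meant for another instance on the same topic.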

The next part will then be to adapt the worker code to our requirements.
