Sunday, 17 February 2013

Frontend

The next part I tackled was an easy-to-use frontend for the system.
In the implementation I received there was no need for one, since the master and the client ran on the same machine.
That is no longer possible in a proper deployment of the system.
So some way was needed to communicate with the master, to submit new jobs and to view some information about a job.
I followed my own suggestion from two posts ago, with this result:



In the screenshot we are running the frontend locally.
The bogus information entered there is used to generate a JDL file and two archives: one for the prologue files and one for the input sandbox files.
We then request a job ID from the master, which is running on an EC2 instance.
Once we get a reply, we upload everything to S3, after which we send a message to officially add the job.
All information is also written to the DynamoDB database.
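
To make the submission flow concrete, here is a minimal sketch of those steps in Python with boto. The bucket, queue and table names, the message format and the use of SQS as the messaging layer are assumptions for illustration only; the actual frontend and the redesigned messaging system may differ.

import json
import tarfile

import boto
from boto.s3.key import Key
from boto.sqs.message import Message

def submit_job(job_id, jdl_path, prologue_files, inputsandbox_files):
    # job_id has already been obtained from the master (running on EC2).

    # Pack the prologue files and the input sandbox files into two archives.
    with tarfile.open('prologue.tar.gz', 'w:gz') as tar:
        for f in prologue_files:
            tar.add(f)
    with tarfile.open('inputsandbox.tar.gz', 'w:gz') as tar:
        for f in inputsandbox_files:
            tar.add(f)

    # Upload the JDL and both archives to S3 under the job id.
    s3 = boto.connect_s3()
    bucket = s3.get_bucket('jobs-bucket')              # assumed bucket name
    for path in (jdl_path, 'prologue.tar.gz', 'inputsandbox.tar.gz'):
        key = Key(bucket, '%s/%s' % (job_id, path))
        key.set_contents_from_filename(path)

    # Tell the master to officially add the job (assuming an SQS queue here).
    sqs = boto.connect_sqs()
    queue = sqs.get_queue('master-queue')              # assumed queue name
    msg = Message()
    msg.set_body(json.dumps({'type': 'ADD_JOB', 'job_id': job_id}))
    queue.write(msg)

    # Record the job information in DynamoDB as well.
    ddb = boto.connect_dynamodb()
    table = ddb.get_table('jobs')                      # assumed table name
    item = table.new_item(hash_key=job_id,
                          attrs={'status': 'SUBMITTED', 'jdl': jdl_path})
    item.put()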

In the process of implementing this, the AWS layer and the messaging system were redesigned and reimplemented.

The next step to be taken is launching the new job on a worker.
This requires first handling the prologue and the AMI creation, then deploying everything and making sure the DMTCP process works properly.


Monday, 11 February 2013

Snapshotting successful

After struggling with DMTCP I decided to mail the developers, and a lot of help has come from their side.
Together we figured out that I needed the plugin system that DMTCP offers.
Example:

#include <stdio.h>
#include "dmtcpplugin.h"  /* DmtcpEvent_t, NEXT_DMTCP_PROCESS_EVENT */

void dmtcp_process_event(DmtcpEvent_t event, void* data)
{
  /* NOTE:  See warning in plugin/README about calls to printf here. */
  switch (event) {
  case DMTCP_EVENT_INIT:
    printf("The plugin containing %s has been initialized.\n", __FILE__);
    break;
  case DMTCP_EVENT_PRE_CHECKPOINT:
    printf("\n*** The plugin is being called before checkpointing. ***\n");
    break;
  case DMTCP_EVENT_POST_CHECKPOINT:
    printf("*** The plugin has now been checkpointed. ***\n");
    break;
  case DMTCP_EVENT_POST_CHECKPOINT_RESUME:
    printf("The process is now resuming after checkpoint.\n");
    break;
  case DMTCP_EVENT_POST_RESTART_RESUME:
    printf("The plugin is now resuming or restarting from checkpointing.\n");
    break;
  case DMTCP_EVENT_PRE_EXIT:
    printf("The plugin is being called before exiting.\n");
    break;
  /* These events are unused and could be omitted.  See dmtcpplugin.h for
   * complete list.
   */
  case DMTCP_EVENT_POST_RESTART:
  case DMTCP_EVENT_RESET_ON_FORK:
  case DMTCP_EVENT_POST_SUSPEND:
  case DMTCP_EVENT_POST_LEADER_ELECTION:
  case DMTCP_EVENT_POST_DRAIN:
  default:
    break;
  }
  NEXT_DMTCP_PROCESS_EVENT(event, data);
}
But as stated in one of my previous posts, making system calls from these event hooks was impossible.
After I mentioned this to the developers (Kapil Arya in particular), this is now supported.
As a result we can now take a complete snapshot of a VM right after the DMTCP checkpoint files are written to disk and before the program being checkpointed is resumed.
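
To give an idea of what this enables: the DMTCP_EVENT_POST_CHECKPOINT hook in a plugin like the one above can now shell out (for example via system()) to a small script that snapshots the machine while the checkpointed program is still suspended. Below is a rough sketch of such a script in Python with boto, assuming the worker is an EC2 instance backed by EBS volumes; the volume selection and the description string are illustrative, not the actual code.

import boto
from boto.utils import get_instance_metadata

def snapshot_this_instance(description):
    # Find out which EC2 instance we are running on.
    instance_id = get_instance_metadata()['instance-id']

    conn = boto.connect_ec2()

    # Snapshot every EBS volume attached to this instance.  At this point the
    # DMTCP checkpoint files are already on disk and the program has not been
    # resumed yet, so the snapshot captures a consistent checkpoint.
    volumes = conn.get_all_volumes(
        filters={'attachment.instance-id': instance_id})
    return [conn.create_snapshot(v.id, description=description)
            for v in volumes]

if __name__ == '__main__':
    snapshot_this_instance('post-checkpoint snapshot')

Whether the real setup snapshots EBS volumes or creates a full AMI is an implementation choice; the essential part is that the hook can now run such a command at all.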

Many thanks to the DMTCP development team for all their aid!