Thursday 6 June 2013

CT Checkpointing and Other Updates

I have managed to get the CT library working and have started testing DMTCP with it.
So far it works with the standard 1.2.7 release of DMTCP, but not with the latest SVN revision.
When I tried the SVN checkout containing the fix I had supplied to get VM snapshotting working, I also noticed that restarting from a checkpoint did not work.
I suspect a deadlock, since this is the last debug message received:

[40000] TRACE at connectionlist.cpp:387 in refill; REASON='Waiting for Missing Cons'

I have also tested Java further: the main Java test supplied with DMTCP already fails when DMTCP 1.2.6 is used.
This contrasts with my local system, where these tests pass.
Further investigation showed that snapshots seem to work on 32-bit systems but not on 64-bit systems. This explains why they worked on my local machine, which runs a 32-bit OS.
With 1.2.7, both systems fail the Java checkpoint tests.
All tests used the most recent Java SDK, OpenJDK 1.7.


Another problem concerned the user ID used to run the DMTCP commands.
User data supplied to an instance at launch is executed as root.
Consequently, the workers were running as root and were launching their processes as root as well.
DMTCP does not like running as root, though there are ways to work around this.
Those workarounds turned out to be inadequate, so I removed them in favor of switching to an unprivileged user.
The worker and the user processes now run as the default user (ubuntu on Ubuntu AMIs).
Note to self: use "sudo -u username command" rather than "su username command". su hands any arguments after the username to the user's login shell instead of running them as a command.
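The user switch described above can be sketched in Python as follows, assuming the launcher script itself runs as root from the instance user data. The names `as_user`, `launch` and `DEFAULT_USER` are hypothetical helpers for illustration, not part of DMTCP or any existing tool; the sketch only shows how to prefix a command with `sudo -u` when the caller is root.

```python
import os
import subprocess
from typing import List, Optional

# Hypothetical default: the stock login user on Ubuntu AMIs.
DEFAULT_USER = "ubuntu"


def as_user(command: List[str], user: str = DEFAULT_USER,
            euid: Optional[int] = None) -> List[str]:
    """Return an argv list that runs `command` as `user`.

    If the caller is root (euid 0), as user data scripts are, the command
    is prefixed with `sudo -H -u user --`. Otherwise it is returned
    unchanged, on the assumption that the caller already is the intended user.
    """
    if euid is None:
        euid = os.geteuid()  # POSIX only
    if euid == 0:
        # -H sets HOME to the target user's home directory;
        # "--" ends sudo's own options so `command` is passed through verbatim.
        # Unlike `su user cmd`, which hands `cmd` to the user's shell as an
        # argument, `sudo -u user cmd` executes `cmd` directly as that user.
        return ["sudo", "-H", "-u", user, "--"] + list(command)
    return list(command)


def launch(command: List[str], user: str = DEFAULT_USER) -> subprocess.Popen:
    """Start `command` as `user` without waiting for it to finish."""
    return subprocess.Popen(as_user(command, user))
```

For example, `launch(["python3", "/home/ubuntu/worker.py"])` called from a root user-data script would start a (hypothetical) worker script as `ubuntu`; the same wrapper can be put in front of whatever command starts a process under DMTCP, so both the worker and its children stay unprivileged.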


Meanwhile, support for directory snapshotting is nearly complete.
