Tuesday, 18 June 2013

All Java tests failed once again

For some inexplicable reason, all of my Java tests have failed under both the VM checkpointing and the DIRECTORY archiving strategy.

The checkpointing procedure simply stopped, for no apparent reason.
The output shows a lot of restarted calculations.
Something is very odd about them, though: they always restart from the same point, run to the same later point, and then restart once more.

For example, a counter that should go from 0 to 3600 is now stuck in a loop from 360 to around 838.
Then it starts again from 360, with no checkpoint output in between.
This is very strange, because going from 360 to 838 with 5 seconds between counts takes roughly 40 minutes (478 steps × 5 s = 2390 s), while a snapshot should start after 30 minutes.
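
To make the timing argument concrete, here is a minimal sketch of the arithmetic. The class and method names (RestartWindow, elapsed, checkpointExpected) are made up for illustration and are not part of the test suite or of DMTCP; the numbers are the ones observed above, and 838 is only an approximate end point.

```java
import java.time.Duration;

// Hypothetical helper, only used to check the numbers from this post.
public class RestartWindow {

    // Wall-clock time needed to count from `from` to `to`,
    // with one step every `secondsPerStep` seconds.
    static Duration elapsed(int from, int to, int secondsPerStep) {
        return Duration.ofSeconds((long) (to - from) * secondsPerStep);
    }

    // True when the counting window is at least as long as the checkpoint
    // interval, i.e. at least one checkpoint should have been started.
    static boolean checkpointExpected(int from, int to, int secondsPerStep, Duration interval) {
        return elapsed(from, to, secondsPerStep).compareTo(interval) >= 0;
    }

    public static void main(String[] args) {
        Duration window = elapsed(360, 838, 5);      // observed loop, 5 s per count
        Duration interval = Duration.ofMinutes(30);  // snapshot interval from this post

        System.out.println("Window: " + window.toMinutes() + " min "
                + (window.getSeconds() % 60) + " s");
        System.out.println("Checkpoint expected within window: "
                + checkpointExpected(360, 838, 5, interval));
    }
}
```

Under these assumptions the loop lasts longer than the snapshot interval, so at least one checkpoint should have shown up in the output before the counter jumped back to 360.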


Whenever a start and restart are executed manually, there seem to be no problems.

More tests have confirmed that checkpointing somehow stops working after a restart.
I am inclined to declare DMTCP too buggy for further use.
With some luck you can get your application fully operational, but this behavior is a very annoying factor.
No clear indications were found for the observed behavior.
Either more thorough knowledge of DMTCP's internals is required, or it is simply a matter of waiting for a more stable build.
