The backup system is in place and has already gone through several test runs.
Unfortunately, the results are not very good: the problems keep appearing.
I also noticed by accident that taking multiple checkpoints before a single restart seems to greatly reduce the number of errors.
In fact, when taking snapshots every 5 minutes, there was not a single error in the restart procedure. That interval nearly caused problems with the VM checkpointing strategy, since a VM checkpoint takes around 2.5 minutes and could take even longer in the worst case.
However, the runtime with the VM checkpointing strategy doubled.
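As a rough back-of-the-envelope check on that doubling, here is a small sketch. It assumes the 5-minute interval is measured from the start of one checkpoint to the start of the next, and that the VM makes no progress while a checkpoint is being written. I have not verified either assumption, and the function name is only for illustration.

```python
def expected_slowdown(interval_s: float, checkpoint_s: float) -> float:
    """Estimate the runtime multiplier caused by periodic checkpoints.

    Assumption: the interval runs from checkpoint start to checkpoint start,
    and no useful work happens while a checkpoint is in progress.
    """
    if checkpoint_s >= interval_s:
        # The next checkpoint would be due before this one has finished.
        raise ValueError("checkpoint does not finish within the interval")
    useful_s = interval_s - checkpoint_s
    return interval_s / useful_s


# 5-minute interval, 2.5-minute VM checkpoint
print(expected_slowdown(300, 150))  # 2.0
```

Under those assumptions only half of each interval is left for real work, which would match the doubled runtime. It also shows why a checkpoint that takes longer than expected is dangerous at this interval: the multiplier grows quickly as the checkpoint time approaches 5 minutes.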
I also discovered by coincidence that the DMTCP developers have changed their tests to incorporate my proposed solution, but I have not heard much more about this. My current Java tests show no sign of problems with DMTCP when my fix (-XX:-UsePerfData) is in place.
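For reference, this is roughly how a test run with the fix can be started. It is only a sketch: `app.jar` is a placeholder and not one of my actual test programs, the helper name is made up, and the `--interval` option of `dmtcp_launch` should be checked against the installed DMTCP version. The script only builds and prints the command; it does not run it.

```python
import shlex


def build_launch_command(jar_path, interval_s=None):
    """Build a dmtcp_launch command line for a Java program with the fix applied."""
    cmd = ["dmtcp_launch"]
    if interval_s is not None:
        # Periodic checkpointing, interval in seconds (assumed option name).
        cmd += ["--interval", str(interval_s)]
    # -XX:-UsePerfData disables the JVM's performance data feature,
    # which is the fix described in this post.
    cmd += ["java", "-XX:-UsePerfData", "-jar", jar_path]
    return cmd


# "app.jar" is a hypothetical placeholder
print(shlex.join(build_launch_command("app.jar", interval_s=300)))
# dmtcp_launch --interval 300 java -XX:-UsePerfData -jar app.jar
```

Keeping the flag in one helper means every test is launched the same way, so a run cannot accidentally go without the fix.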