Wednesday, 19 June 2013

Manual snapshotting fails

As a final attempt to get trustworthy behavior from DMTCP, I changed the checkpointing system to use dmtcp_command.
Instead of letting DMTCP checkpoint automatically after a given period of time, I now time it myself and issue the checkpoint command.
This can be done with dmtcp_command -c.
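To make the idea concrete, here is a minimal sketch of timing the checkpoints myself. Only the dmtcp_command -c invocation comes from the setup above. The function name, the interval and the number of rounds are made up for illustration, and it assumes the DMTCP coordinator is reachable with its default settings.

```python
import subprocess
import time


def checkpoint_periodically(interval_s, rounds,
                            command=("dmtcp_command", "-c"),
                            sleep=time.sleep):
    """Run the checkpoint command `rounds` times, sleeping before each run.

    `command` defaults to the invocation mentioned in the post; everything
    else (name, parameters) is a hypothetical helper for illustration.
    Returns the exit code of every invocation so failures can be spotted.
    """
    exit_codes = []
    for _ in range(rounds):
        sleep(interval_s)
        # Capture the output so the command never blocks on an unread pipe.
        result = subprocess.run(list(command), capture_output=True, text=True)
        exit_codes.append(result.returncode)
    return exit_codes


if __name__ == "__main__":
    # Example: three checkpoints, one minute apart (values are assumptions).
    print(checkpoint_periodically(60, 3))
```

Returning the exit codes instead of raising makes it easier to log which checkpoint request failed without stopping the loop.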

With this change, restarts sometimes crash with exit code 134.
It is very frustrating not to have consistent behavior, but I have no other way of describing it.
In some cases the restart works; in others it fails with the above exit code.
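One thing that may help with interpreting the number: if the 134 was reported by a shell, the usual convention is that a status above 128 means "killed by signal (status - 128)", and 134 - 128 = 6 is SIGABRT on Linux. That is an assumption about how the code was obtained, not something I verified for DMTCP itself. A small sketch of the decoding, with a hypothetical helper name:

```python
import signal


def describe_exit_status(status):
    """Translate a shell-style exit status into a readable description.

    Assumes the shell convention that 128 + N means "terminated by signal N".
    """
    if status > 128:
        signum = status - 128
        try:
            return f"killed by {signal.Signals(signum).name}"
        except ValueError:
            # The number does not map to a known signal on this platform.
            return f"killed by unknown signal {signum}"
    return f"exited with status {status}"


if __name__ == "__main__":
    print(describe_exit_status(134))
```

If the status really is SIGABRT, the restarted process called abort() (for example through a failed assertion), which would point at an internal check failing rather than a random crash.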

On top of that, I noticed that whenever a restart does work, the process will not take a second snapshot anymore.
So I will remove the new dmtcp_command-based method again.

Whenever I try it manually it seems to work, so perhaps the problem has something to do with output not being read?
I have not found any information about this causing the child process to crash.
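To rule the output theory out, the launcher could continuously read whatever the child prints. An unread pipe normally makes the writer block once the buffer is full rather than crash, so this is a guess at best. The sketch below uses a hypothetical helper that starts a process and drains its output in a background thread; the command passed in is up to the caller and nothing here is specific to DMTCP.

```python
import subprocess
import threading


def start_with_drained_output(argv):
    """Start `argv` and keep reading its combined stdout/stderr.

    Returns the process, the reader thread and the list that collects
    the output lines. Hypothetical helper, for illustration only.
    """
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    lines = []

    def drain():
        # Iterating over the pipe reads until the child closes it.
        for line in proc.stdout:
            lines.append(line)

    reader = threading.Thread(target=drain, daemon=True)
    reader.start()
    return proc, reader, lines
```

After proc.wait() and reader.join(), the collected lines hold everything the child wrote, which can also be logged to see what it printed right before a crash.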
