After some more debugging and a closer look at the DMTCP library, I think I have found a solution to the Java checkpointing problem.
It seems that the culprit was the Java monitoring system enabled by the UsePerfData flag.
This flag lets the JVM use the jvmstat instrumentation for performance testing and problem isolation.
(Source: UsePerfData)
It also writes its data to shared memory and to disk, which DMTCP seemingly cannot handle.
The flag is turned on by default, but disabling it has fixed our checkpoint and restart problems.
This can be done by passing -XX:-UsePerfData to the JVM.
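
To double-check that the flag really reached the JVM before trying a checkpoint, a small test program can inspect its own startup arguments. The following is only a minimal sketch: the class name PerfDataCheck and the helper perfDataDisabled are made up for illustration, and it assumes that when the flag is given more than once the last occurrence takes effect.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class PerfDataCheck {
    // Returns true if the given JVM arguments switch UsePerfData off.
    // Assumption: if the flag is repeated, the last occurrence wins.
    static boolean perfDataDisabled(List<String> jvmArgs) {
        boolean disabled = false; // the flag is on by default
        for (String arg : jvmArgs) {
            if (arg.equals("-XX:-UsePerfData")) {
                disabled = true;
            } else if (arg.equals("-XX:+UsePerfData")) {
                disabled = false;
            }
        }
        return disabled;
    }

    public static void main(String[] args) {
        // Arguments passed to this JVM at startup (not the program's own args).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("UsePerfData disabled: " + perfDataDisabled(jvmArgs));
    }
}
```

Started as `java -XX:-UsePerfData PerfDataCheck`, this should report that the flag is disabled; started without the option, it should report that it is not.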
I have sent this suggestion to the DMTCP developers.