The trace output from DMTCP itself showed the following:
[40000] TRACE at dmtcpworker.cpp:518 in waitForCoordinatorMsg; REASON='waiting for REGISTER_NAME_SERVICE_DATA message'
[40000] TRACE at dmtcpworker.cpp:670 in waitForStage3Refill; REASON='Key Value Pairs registered with the coordinator'
[40000] TRACE at dmtcpworker.cpp:518 in waitForCoordinatorMsg; REASON='waiting for SEND_QUERIES message'
[40000] TRACE at dmtcpworker.cpp:675 in waitForStage3Refill; REASON='Queries sent to the coordinator'
[40000] TRACE at dmtcpworker.cpp:518 in waitForCoordinatorMsg; REASON='waiting for REFILL message'
[40000] TRACE at kernelbufferdrainer.cpp:159 in refillAllSockets; REASON='refilling socket buffers'
     _drainedData.size() = 0
[40000] TRACE at kernelbufferdrainer.cpp:198 in refillAllSockets; REASON='buffers refilled'
[40000] TRACE at dmtcpworker.cpp:689 in waitForStage4Resume; REASON='refilled'
[40000] TRACE at dmtcpworker.cpp:518 in waitForCoordinatorMsg; REASON='waiting for RESUME message'
[40000] TRACE at dmtcpworker.cpp:692 in waitForStage4Resume; REASON='got resume message'
Right after the resume message, the process receives a segmentation fault.
With gdb, the segfault could be traced back to:
#0  0x00007f7386878bf1 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f738720b072 in length (__s=0x7f737bdff000 <Address 0x7f737bdff000 out of bounds>) at /usr/include/c++/4.6/bits/char_traits.h:261
#2  operator<< <std::char_traits<char> > (__s=0x7f737bdff000 <Address 0x7f737bdff000 out of bounds>, __out=...) at /usr/include/c++/4.6/ostream:515
#3  Print<char> (this=<optimized out>, t=<optimized out>) at ../../dmtcp/jalib/jassert.h:145
#4  dmtcp::FileConnList::remapShmMaps (this=0x7f7387665508) at file/fileconnlist.cpp:243
#5  0x00007f73871d6652 in dmtcp::ConnectionList::processEvent (this=0x7f7387665508, event=<optimized out>, data=<optimized out>) at connectionlist.cpp:113
#6  0x00007f73871cabd0 in dmtcp_process_event (event=DMTCP_EVENT_THREADS_RESUME, data=0x7f738549b5a0) at ipc.cpp:43
#7  0x00007f7386f3dfeb in dmtcp::DmtcpWorker::processEvent (event=DMTCP_EVENT_THREADS_RESUME, data=0x7f738549b5a0) at dmtcpworker.cpp:703
#8  0x00007f7386f3dee5 in dmtcp::DmtcpWorker::waitForStage4Resume (this=0x7f73871b0574, isRestart=false) at dmtcpworker.cpp:695
#9  0x00007f7386f517e7 in callbackPostCheckpoint (isRestart=0, mtcpRestoreArgvStartAddr=0x0) at mtcpinterface.cpp:235
#10 0x00007f73859b53d8 in checkpointhread (dummy=0x0) at mtcp.c:1991
#11 0x00007f7386f538f7 in pthread_start (arg=0x7f7387661248) at threadwrappers.cpp:121
#12 0x00007f7386500e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#13 0x00007f7386f53576 in clone_start (arg=0x7f7387661288) at threadwrappers.cpp:71
#14 0x00007f7386cf1154 in clone_start (arg=<optimized out>) at pid_miscwrappers.cpp:100
#15 0x00007f7386809ccd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#16 0x0000000000000000 in ?? ()
Unfortunately, this backtrace did not give me a workable starting point for a patch.
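For readers less used to such backtraces, here is what frames #0 to #2 show on their own. Streaming a `const char*` into a C++ stream first calls `std::char_traits<char>::length`, which scans memory until it finds a terminating NUL byte. If the pointer refers to memory that is no longer mapped (gdb reports 0x7f737bdff000 as out of bounds), that scan itself faults inside libc. The sketch below only demonstrates this length lookup on a valid string. The helper names are hypothetical and not part of DMTCP, and it says nothing about why the pointer in `remapShmMaps` became invalid.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <sstream>
#include <string>

// Hypothetical helper (not DMTCP code): the number of bytes that
// operator<< has to scan before it can write a C string. This is the
// call that appears as frame #1 (char_traits.h) in the backtrace.
std::size_t bytes_scanned_by_stream(const char* s) {
    return std::char_traits<char>::length(s);  // behaves like strlen(s)
}

// Hypothetical helper: streams a C string the same way a logging macro
// built on std::ostream would. The pointer must be valid and
// NUL-terminated, otherwise the length scan reads invalid memory.
std::string stream_c_string(const char* s) {
    std::ostringstream out;
    out << s;  // internally calls char_traits<char>::length(s)
    return out.str();
}
```

Calling `stream_c_string` with a valid string simply returns a copy of it. With a stale pointer, the crash would happen in the length scan, before a single character is written.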
The developers of DMTCP were informed once more of these findings.
Hopefully they can provide some assistance in locating the problem.