Looking for the terminated thread leaving loader lock orphaned
Sometimes a process hang is caused by doing something dangerous while holding the loader lock. If you can get a dump file, it is pretty easy to run the !locks command and find the locks causing the trouble. It gets a little more complicated if one of the locking mechanisms is not a critical section (for example, an event or a mutex), but it is still manageable by looking for similar function calls across the thread stacks. Today, however, I ran into a dump file from a hung process showing that the loader lock was held by a thread that had already been terminated, leaving the lock orphaned:
0:000> !locks
CritSec ntdll!LdrpLoaderLock+0 at 00007ffa0aa3d8a8
WaiterWoken        No
LockCount          9
RecursionCount     1
OwningThread       710
EntryCount         0
ContentionCount    39
*** Locked
Scanned 76 critical sections
0:000> ~~k
       ^ Illegal thread error in '~~k'
Most likely, thread 710 was terminated while it was holding the loader lock. Because of this, many other threads in the same process were blocked waiting for the lock to be released, which unfortunately will never happen. Since thread 710 no longer exists, there is no TEB, so we don’t know what it was running when it was terminated and, most importantly, we don’t know who terminated it. If you have worked on enough crash dumps, you will eventually run into some internal NT structures, one of which is _CLIENT_ID:
0:000> dt _CLIENT_ID
combase!_CLIENT_ID
   +0x000 UniqueProcess    : Ptr64 Void
   +0x008 UniqueThread     : Ptr64 Void
It is simply a structure holding a process id and a thread id, and it is embedded in many other structures such as _TEB. A known technique is to search memory for these values in order to identify the parent structure that contains them. Let’s try it here and see what we get.
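To make the search concrete, here is a minimal Python sketch that mirrors the x64 _CLIENT_ID layout from the dt output with ctypes and prints the 16 bytes such a structure occupies for process 0x608 and thread 0x710, the ids in this dump (WinDbg shows them in hex). The class name CLIENT_ID and the helper client_id_bytes are my own, and c_uint64 stands in for the two Ptr64 fields so the layout stays 16 bytes whatever the host's pointer size.

```python
import ctypes


class CLIENT_ID(ctypes.LittleEndianStructure):
    """Mirror of the x64 _CLIENT_ID layout printed by `dt _CLIENT_ID`."""

    _fields_ = [
        ("UniqueProcess", ctypes.c_uint64),  # +0x000, Ptr64 holding the process id
        ("UniqueThread", ctypes.c_uint64),   # +0x008, Ptr64 holding the thread id
    ]


def client_id_bytes(pid, tid):
    """Return the raw little-endian bytes of a _CLIENT_ID for pid/tid."""
    return bytes(CLIENT_ID(pid, tid))


# WinDbg prints ids in hex: process 608 and thread 710 are 0x608 and 0x710.
pattern = client_id_bytes(0x608, 0x710)
print(pattern.hex(" "))
# prints: 08 06 00 00 00 00 00 00 10 07 00 00 00 00 00 00
```

Each id is a small integer zero-extended to 64 bits, which is why most of the pattern is zero bytes.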
0:000> |
.  0    id: 608    examine    name: C:\Program Files\ABC\xyz.exe
0:000> s 0 L?ffffffffffffffff 00 00 00 00 08 06 00 00 00 00 00 00 10 07 00 00
00007ffa`07eb6214  00 00 00 00 08 06 00 00-00 00 00 00 10 07 00 00  ................
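The “s” command above is essentially a byte-pattern scan. The Python sketch below does the same over a small buffer rebuilt from the first five quadwords that dps prints later in this dump (values copied from that output), using the exact 16-byte pattern passed to “s”. The function name search is hypothetical, and this only illustrates the idea; it is not WinDbg’s implementation.

```python
import struct


def search(memory, base, pattern):
    """Return every address where `pattern` starts inside `memory`,
    treating memory[0] as living at virtual address `base`."""
    hits = []
    pos = memory.find(pattern)
    while pos != -1:
        hits.append(base + pos)
        pos = memory.find(pattern, pos + 1)
    return hits


# First five quadwords shown by dps at 00007ffa`07eb6200, packed little-endian.
BASE = 0x7FFA07EB6200
region = struct.pack("<5Q", 0x1, 0x608, 0xC64, 0x608, 0x710)

# The exact byte string passed to `s` in this post.
needle = bytes.fromhex("00000000" "0806000000000000" "10070000")

for addr in search(region, BASE, needle):
    # The needle begins with 4 zero bytes, so the _CLIENT_ID starts 4 bytes later.
    print(f"hit at {addr:#x}, _CLIENT_ID at {addr + 4:#x}")
# prints: hit at 0x7ffa07eb6214, _CLIENT_ID at 0x7ffa07eb6218
```

The pair 0x608/0xc64 at offset 0x08 does not match because its thread id differs, so the only hit is the same address WinDbg reported.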
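First we use the “|” command to get the process id, 608, and then use the “s” command to search the entire address space for bytes that can be interpreted as a _CLIENT_ID with 608 as the process id and 710 as the thread id (WinDbg displays both in hexadecimal). The pattern begins with four zero bytes, so a matching _CLIENT_ID starts four bytes after the reported address. The “s” command did return a match, so let’s find out what it is: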
0:000> !address 00007ffa`07eb6214
...
Usage:                  Image
Base Address:           00007ffa`07eb3000
End Address:            00007ffa`07eb7000
Region Size:            00000000`00004000
State:                  00001000          MEM_COMMIT
Protect:                00000004          PAGE_READWRITE
Type:                   01000000          MEM_IMAGE
Allocation Base:        00007ffa`07dc0000
Allocation Protect:     00000080          PAGE_EXECUTE_WRITECOPY
Image Path:             C:\Windows\System32\KERNELBASE.dll
Module Name:            KERNELBASE
Loaded Image Name:      KERNELBASE.dll
Mapped Image Name:
More info:              lmv m KERNELBASE
More info:              !lmi KERNELBASE
More info:              ln 0x7ffa07eb6214
More info:              !dh 0x7ffa07dc0000
0:000> ln 0x7ffa07eb6214
(00007ffa`07eb6200)   KERNELBASE!BaseTerminatedLoaderLockOwner+0x14   |  (00007ffa`07eb6268)   KERNELBASE!BaseDataFileHandleTableElementCount
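ln names the address as BaseTerminatedLoaderLockOwner+0x14 by picking the closest symbol at or below it and reporting the distance. A minimal Python sketch of that lookup follows; the two-entry table holds only the symbols ln printed here (a real debugger loads every symbol from the module’s PDB), and nearest_symbol is a hypothetical helper, not WinDbg’s code.

```python
import bisect

# Only the two symbols `ln` printed around the hit, with their addresses.
SYMBOLS = [
    (0x7FFA07EB6200, "KERNELBASE!BaseTerminatedLoaderLockOwner"),
    (0x7FFA07EB6268, "KERNELBASE!BaseDataFileHandleTableElementCount"),
]
ADDRESSES = [addr for addr, _ in SYMBOLS]  # must stay sorted for bisect


def nearest_symbol(addr):
    """Return 'module!symbol+0xoffset' for the closest symbol at or below addr,
    or None when addr lies below every known symbol."""
    i = bisect.bisect_right(ADDRESSES, addr) - 1
    if i < 0:
        return None
    start, name = SYMBOLS[i]
    offset = addr - start
    return name if offset == 0 else f"{name}+{offset:#x}"


print(nearest_symbol(0x7FFA07EB6214))
# prints: KERNELBASE!BaseTerminatedLoaderLockOwner+0x14
```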
This was very surprising: the matching memory belongs to a global data structure in KERNELBASE.dll, and judging by the symbol name, it looks like Windows keeps track of threads that were terminated while holding the loader lock. Could it save the stack as well?
0:000> dps 00007ffa`07eb6200
00007ffa`07eb6200  00000000`00000001
00007ffa`07eb6208  00000000`00000608
00007ffa`07eb6210  00000000`00000c64
00007ffa`07eb6218  00000000`00000608
00007ffa`07eb6220  00000000`00000710
00007ffa`07eb6228  00007ffa`07e64263 KERNELBASE!TerminateThread+0xaf
00007ffa`07eb6230  00007ff9`fa494995 somedll+0x4995
00007ffa`07eb6238  00007ff9`fa492421 somedll+0x2421
00007ffa`07eb6240  00007ff9`fa4924e9 somedll+0x24e9
00007ffa`07eb6248  00007ffa`0a1113d2 kernel32!BaseThreadInitThunk+0x22
00007ffa`07eb6250  00007ffa`0a9803c4 ntdll!RtlUserThreadStart+0x34
00007ffa`07eb6258  00000000`00000000
00007ffa`07eb6260  00000000`00000000
00007ffa`07eb6268  00000000`00000000
00007ffa`07eb6270  00000000`00000000
00007ffa`07eb6278  00000000`00000000
It did! After fixing the symbols for somedll and re-running the dps command, I got the full stack of the thread at the point where it called TerminateThread on the thread that was holding the loader lock. Time to notify the team responsible for somedll.
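To read that dps output at a glance, here is a minimal Python sketch that splits the sixteen quadwords into a header and a zero-terminated list of return addresses. The layout is only my guess from this single dump and is not documented anywhere I know of: the first word is unknown, the pair 608/710 at +0x18 is the _CLIENT_ID the search found, the pair 608/c64 at +0x08 has no confirmed meaning (it might identify the thread that called TerminateThread, but that is an assumption), and the return addresses start at +0x28. decode and its field names are hypothetical.

```python
# The 16 quadwords printed by dps at 00007ffa`07eb6200 in this dump.
WORDS = [
    0x1, 0x608, 0xC64, 0x608, 0x710,
    0x7FFA07E64263,  # KERNELBASE!TerminateThread+0xaf
    0x7FF9FA494995,  # somedll+0x4995
    0x7FF9FA492421,  # somedll+0x2421
    0x7FF9FA4924E9,  # somedll+0x24e9
    0x7FFA0A1113D2,  # kernel32!BaseThreadInitThunk+0x22
    0x7FFA0A9803C4,  # ntdll!RtlUserThreadStart+0x34
] + [0] * 5


def decode(words):
    """Split the words using a HYPOTHETICAL layout guessed from this one dump:
    word 0 is unknown, words 1-2 and 3-4 look like two _CLIENT_IDs, and the
    remaining words are return addresses ending at the first zero."""
    frames = []
    for word in words[5:]:
        if word == 0:
            break
        frames.append(word)
    return {
        "word0": words[0],
        "client_id_1": (words[1], words[2]),  # 608/c64: meaning unconfirmed
        "client_id_2": (words[3], words[4]),  # 608/710: the terminated thread
        "frames": frames,
    }


record = decode(WORDS)
print(f"terminated thread: {record['client_id_2'][1]:#x}")
for frame in record["frames"]:
    print(f"  {frame:#018x}")
```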
Bonus read: Raymond has two other excellent articles about finding ghost threads that caused crashes, using kernel or user dumps: