Blog Archives

Looking for the terminated thread leaving loader lock orphaned

Sometimes a process hang is caused by doing something dangerous while holding the loader lock, if you can get a dump file, it is pretty easy to run !locks command to find out the locks causing the trouble. It can be a little bit complicated if one of the locking mechanism is not a critical section(for example, event or mutex), but it is still manageable by looking for similar function calls among the stacks. However today I ran into a dump file from a hung process which shows the loader lock was locked by a thread that has been terminated and being left orphaned:

0:000> !locks
CritSec ntdll!LdrpLoaderLock+0 at 00007ffa0aa3d8a8
WaiterWoken        No
LockCount          9
RecursionCount     1
OwningThread       710
EntryCount         0
ContentionCount    39
*** Locked

Scanned 76 critical sections

0:000> ~~[710]k
             ^ Illegal thread error in '~~[710]k'

Most likely, thread 710 was terminated while it was holding the loader lock. Because of this, there were many other threads in same process were being blocked and waiting for it to be unlocked, which unfortunately will never happen. Since the thread 710 no longer exists, there was no TEB, we don’t know what it was running when being terminated, and most importantly, we don’t know who terminated it. If you have worked on certain amount of crash dumps, eventually you will run into some internal NT structures, one of which is _CLIENT_ID:

0:000> dt _CLIENT_ID             
   +0x000 UniqueProcess    : Ptr64 Void
   +0x008 UniqueThread     : Ptr64 Void

It is simply a structure of process id and thread id, and it is being used by many other structures like _TEB. It has been a technique to search them in memory in order to identify the parent structure that contains them. Let’s try this here to see what we can get.

0:000> |
.  0	id: 608	examine	name: C:\Program Files\ABC\xyz.exe

0:000> s 0 L?ffffffffffffffff 00 00 00 00 08 06 00 00 00 00 00 00 10 07 00 00
00007ffa`07eb6214  00 00 00 00 08 06 00 00-00 00 00 00 10 07 00 00  ................

First we use “|” command to get process id 608, then use the “s” command to search for memory that can be interpreted as a _CLIENT_ID structure that has 608 as process id and 710 as thread id. “s” command did return an address that contains matching data, let’s find out what it is:

0:000> !address 00007ffa`07eb6214
Usage:                  Image
Base Address:           00007ffa`07eb3000
End Address:            00007ffa`07eb7000
Region Size:            00000000`00004000
State:                  00001000	MEM_COMMIT
Protect:                00000004	PAGE_READWRITE
Type:                   01000000	MEM_IMAGE
Allocation Base:        00007ffa`07dc0000
Allocation Protect:     00000080	PAGE_EXECUTE_WRITECOPY
Image Path:             C:\Windows\System32\KERNELBASE.dll
Module Name:            KERNELBASE
Loaded Image Name:      KERNELBASE.dll
Mapped Image Name:      
More info:              lmv m KERNELBASE
More info:              !lmi KERNELBASE
More info:              ln 0x7ffa07eb6214
More info:              !dh 0x7ffa07dc0000

0:000> ln 0x7ffa07eb6214
(00007ffa`07eb6200)   KERNELBASE!BaseTerminatedLoaderLockOwner+0x14   |  (00007ffa`07eb6268)   KERNELBASE!BaseDataFileHandleTableElementCount

It is very surprising, the memory found belongs to a global data structure of KERNELBASE.dll, and based on the name given by symbol, looks like Windows is tracking the threads if they were terminated when holding the loader lock. Could it save the stack as well?

0:000> dps 00007ffa`07eb6200
00007ffa`07eb6200  00000000`00000001
00007ffa`07eb6208  00000000`00000608
00007ffa`07eb6210  00000000`00000c64
00007ffa`07eb6218  00000000`00000608
00007ffa`07eb6220  00000000`00000710
00007ffa`07eb6228  00007ffa`07e64263 KERNELBASE!TerminateThread+0xaf
00007ffa`07eb6230  00007ff9`fa494995 somedll+0x4995
00007ffa`07eb6238  00007ff9`fa492421 somedll+0x2421
00007ffa`07eb6240  00007ff9`fa4924e9 somedll+0x24e9
00007ffa`07eb6248  00007ffa`0a1113d2 kernel32!BaseThreadInitThunk+0x22
00007ffa`07eb6250  00007ffa`0a9803c4 ntdll!RtlUserThreadStart+0x34
00007ffa`07eb6258  00000000`00000000
00007ffa`07eb6260  00000000`00000000
00007ffa`07eb6268  00000000`00000000
00007ffa`07eb6270  00000000`00000000
00007ffa`07eb6278  00000000`00000000

It did! After fixing symbols for somedll and re-run dps command, I received the full stack of the thread when it is calling TerminateThread when the thread being terminated was holding loader lock, it’s time to notify the team responsible for somedll.

Bonus read: Raymond has another two excellent articles about finding ghost threads causing crashing from kernel or user dumps:

How to view the stack of threads that were terminated as part of process teardown from the kernel debugger

How to view the stack of threads that were terminated as part of process teardown from user mode

Get this Pointer from 64-bit crash dump

64-bit calling convention means that under most scenarios, the first four parameters are passed using rcx, rdx, r8 and r9 registers rather than using stack. It’s difficult to find out actual parameters passed to functions in the calling stack from a 64-bit crash dump. CodeMachine has a great article X64 Deep Dive on this topic, it also talks about x64 compiler optimization, exception handling, parameter passing as well as parameter retrieval. It is a must read if crash dump analysis is part of your job.

There are lots of times we want to find out “this” pointer from the crash dump. “this” pointer is passed using rcx, and if the function accesses class members, very likely rcx will be saved into rsi or rdi after function’s prolog. I created a MFC application using VS2012 project wizard, inserted DebugBreak() at the end of CMFCApplication1App::InitInstance, built it using x64|Release configuration, ran it, save the crash dump and open it with WinDbg, this is what the initial stack looks like:

0:000> kL
Child-SP          RetAddr           Call Site
000000cc`aa65e468 000007fb`01df12d2 ntdll!NtWaitForMultipleObjects+0xa
000000cc`aa65e470 000007fb`0211d20e KERNELBASE!WaitForMultipleObjectsEx+0xe5
000000cc`aa65e750 000007fb`0211cfd2 kernel32!WerpReportFaultInternal+0x1fa
000000cc`aa65e7f0 000007fb`01e6fc87 kernel32!WerpReportFault+0x42
000000cc`aa65e820 000007fb`04db9183 KERNELBASE!UnhandledExceptionFilter+0x1d7
000000cc`aa65e920 000007fb`04ce4fea ntdll! ?? ::FNODOBFM::`string'+0x9e3
000000cc`aa65e950 000007fb`04ce464d ntdll!_C_specific_handler+0x8e
000000cc`aa65e9c0 000007fb`04ce567c ntdll!RtlpExecuteHandlerForException+0xd
000000cc`aa65e9f0 000007fb`04cc4bba ntdll!RtlDispatchException+0x392
000000cc`aa65f100 000007fb`01e9478a ntdll!KiUserExceptionDispatch+0x2e
000000cc`aa65f828 000007f7`5be74c8b KERNELBASE!DebugBreak+0x2
000000cc`aa65f830 000007f7`5c059a09 MFCApplication1!CMFCApplication1App::InitInstance+0x2cb
000000cc`aa65f920 000007f7`5c0417fc MFCApplication1!AfxWinMain+0x75
000000cc`aa65f960 000007fb`020d167e MFCApplication1!__tmainCRTStartup+0x148
000000cc`aa65f9a0 000007fb`04ce3501 kernel32!BaseThreadInitThunk+0x1a
000000cc`aa65f9d0 00000000`00000000 ntdll!RtlUserThreadStart+0x1d

Let’s say I want to know what does MFCApplication1!CMFCApplication1App look like at this time. First let us disassemble first few instructions in CMFCApplication1App::InitInstance:

0:000> u MFCApplication1!CMFCApplication1App::InitInstance l10
MFCApplication1!CMFCApplication1App::InitInstance [d:\temp\mfcapplication1\mfcapplication1\mfcapplication1.cpp @ 73]:
000007f7`5be749c0 488bc4          mov     rax,rsp
000007f7`5be749c3 55              push    rbp
000007f7`5be749c4 488d68a1        lea     rbp,[rax-5Fh]
000007f7`5be749c8 4881ece0000000  sub     rsp,0E0h
000007f7`5be749cf 48c745affeffffff mov     qword ptr [rbp-51h],0FFFFFFFFFFFFFFFEh
000007f7`5be749d7 48895810        mov     qword ptr [rax+10h],rbx
000007f7`5be749db 48897018        mov     qword ptr [rax+18h],rsi
000007f7`5be749df 48897820        mov     qword ptr [rax+20h],rdi
000007f7`5be749e3 488b0536f72b00  mov     rax,qword ptr [MFCApplication1!__security_cookie (000007f7`5c134120)]
000007f7`5be749ea 4833c4          xor     rax,rsp
000007f7`5be749ed 48894547        mov     qword ptr [rbp+47h],rax
000007f7`5be749f1 488bf1          mov     rsi,rcx <-- rcx was saved to rsi here
000007f7`5be749f4 c745b708000000  mov     dword ptr [rbp-49h],8
000007f7`5be749fb c745bbff000000  mov     dword ptr [rbp-45h],0FFh
000007f7`5be74a02 488d4db7        lea     rcx,[rbp-49h]
000007f7`5be74a06 ff157c961f00    call    qword ptr [MFCApplication1!_imp_InitCommonControlsEx (000007f7`5c06e088)]

We know that “this” pointer of MFCApplication1!CMFCApplication1App object was passed using rcx, rcx was saved into rsi not far from the prolog, and since the crash was happening inside this function, let’s restore the crash context record and check whether we can get MFCApplication1!CMFCApplication1App pointer from rsi:

0:000> .ecxr
rax=0000000000000001 rbx=0000000000000000 rcx=000007fb04b04aaa
rdx=0000000000000000 rsi=000007f75c1418b0 rdi=000000ccaa860ba0
rip=000007fb01e9478a rsp=000000ccaa65f828 rbp=000000ccaa65f8b9
 r8=000000ccaa65f828  r9=000000ccaa65f8b9 r10=0000000000000000
r11=0000000000000202 r12=000007f75c1418b0 r13=0000000000000000
r14=000007f75c1418b0 r15=00000000ffffffff
iopl=0         nv up ei pl nz na pe nc
cs=0033  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000202
000007fb`01e9478a cc              int     3

0:000> dt /r MFCApplication1!CMFCApplication1App @rsi
   +0x000 __VFN_table : 0x000007f7`5c0df898 
   =000007f7`5c074d10 classCObject     : CRuntimeClass
      +0x000 m_lpszClassName  : 0x000007f7`5c074d40  "CObject"
      +0x008 m_nObjectSize    : 0n8
      +0x00c m_wSchema        : 0xffff
      +0x010 m_pfnCreateObject : (null) 
      +0x018 m_pBaseClass     : (null) 
      +0x020 m_pNextClass     : (null) 
      +0x028 m_pClassInit     : (null) 
   =000007f7`5c071698 classCCmdTarget  : CRuntimeClass
      +0x000 m_lpszClassName  : 0x000007f7`5c071808  "CCmdTarget"
      +0x008 m_nObjectSize    : 0n64
      +0x00c m_wSchema        : 0xffff
      +0x010 m_pfnCreateObject : (null) 
      +0x018 m_pBaseClass     : 0x000007f7`5c074d10 CRuntimeClass
         +0x000 m_lpszClassName  : 0x000007f7`5c074d40  "CObject"
         +0x008 m_nObjectSize    : 0n8
         +0x00c m_wSchema        : 0xffff
         +0x010 m_pfnCreateObject : (null) 
         +0x018 m_pBaseClass     : (null) 
         +0x020 m_pNextClass     : (null) 
         +0x028 m_pClassInit     : (null) 
      +0x020 m_pNextClass     : (null) 
      +0x028 m_pClassInit     : (null) 
   +0x180 m_strRegSection  : ATL::CStringT<wchar_t,StrTraitMFC<wchar_t,ATL::ChTraitsCRT<wchar_t> > >
      +0x000 m_pszData        : 0x000000cc`aa851548  "Workspace"
   +0x1cc m_bHiColorIcons  : 0n1

It looks good. Of course it is completely possible that after the copying “this” pointer from rcx, rsi was changed again before the crash. The only way be sure about it is to disassemble the function from start to the crash point and read the assembly code. I made a bet here that rsi wan’t changed, and I will dump the complete structure to make sure it looks intact.

Another tricky part of 64-bit crash dump analysis is that executable images are as important as symbols. Many times WinDbg needs access to the binaries to be able to fully restore the call stack, there are lots more function information stored in 64-bit binaries than 32-bit binaries. If images are not available, it is still possible to manually construct the stack by walking through output from dps , but it is going to be very painful since rbp is no longer used as frame base pointer.