Category Archives: debug

Crash from HeapLock

One of my routine work is investigating crash dumps collected by Windows Error Reporting service (also known as Winqual). This morning I found cabs are available for some of high volume buckets, I downloaded the first one, opened it in WinDbg, corrected symbols, and here is the stack after .ecxr and kb command:

0:019> kb
  *** Stack trace for last set context - .thread/.cxr resets it
ChildEBP RetAddr  Args to Child              
01a1ac80 74b76dee 01c70000 01a1ed04 62d54ab3 ntdll!RtlLockHeap+0x16
01a1ac8c 62d54ab3 01c70000 bfb3fcb0 01a1f050 KERNELBASE!HeapLock+0xe
01a1ed04 62d55239 01b42b71 00000200 01a1f064 MyModule!IsAddressOnHeap+0x83
...

Here is the pseudo code for MyModule!IsAddressOnHeap, it takes a parameter of void*, then walk through heaps in current process and returns whether the address might be coming from a heap.

BOOL IsAddressOnHeap(void* p) {
	HANDLE handles[4096];
	int heaps = GetProcessHeaps(sizeof(handles)/sizeof(HANDLE), handles);	
	if(heaps == 0)
		return FALSE;
	for(int i=0 ; i < heaps ; i++) {
		if (HeapLock(handles[i])) {
			__try {
				// use HeapWalk to determine whether the address is on heap
			}
			__finally {
				HeapUnlock(handles[i]);
			}
		}
	}
	return FALSE;
}

I already had a good guess about what went wrong, but let’s confirm it. First, let’s see how the execution was transferred from MyModule!IsAddressOnHeap to KERNELBASE!HeapLock:

0:019> ub 62d54ab3 
MyModule!IsAddressOnHeap+0x62
62d54a92 0f8447010000    je      MyModule!IsAddressOnHeap+0x1af (62d54bdf)
62d54a98 33db            xor     ebx,ebx
62d54a9a 895de4          mov     dword ptr [ebp-1Ch],ebx
62d54a9d 3bdf            cmp     ebx,edi
62d54a9f 0f8d34010000    jge     MyModule!IsAddressOnHeap+0x1a9 (62d54bd9)
62d54aa5 8b8c9da4bfffff  mov     ecx,dword ptr [ebp+ebx*4-405Ch]
62d54aac 51              push    ecx
62d54aad ff155ca0d662    call    dword ptr [MyModule!_imp__HeapLock (62d6a05c)]

The parameter passed to KERNELBASE!HeapLock was pushed to stack, which came from ecx, which in turn came from address at ebp+ebx*4-405Ch. Also we know that ebx is used as the index (variable i) and edi is used as the counter (variable heaps). We can get the value of registers from context record saved at the time of crash, but they might not be the same as they were in MyModule!IsAddressOnHeap considering there are two more functions (KERNELBASE!HeapLock+0xe and ntdll!RtlLockHeap+0x16) on the stack above MyModule!IsAddressOnHeap. Things gets simple in this case as these two functions didn’t run much (offsets are 0xe and 0x16), which means we can disassemble them from start to the offset to make sure ebx and edi were not changed (I did it and they were not changed). In case they were changed and since they are non-volatile registers, caller is responsible in preserving the old values (by pushing/pop them to/from stack) so they will still be retrievable. Since they are not changed in this case, let’s use the registers in crash context record to see what were in handles array:

0:019> .ecxr
eax=3b94b284 ebx=00000004 ecx=01c70000 edx=00000000 esi=01c70000 edi=00000005
eip=77481641 esp=01a1ac54 ebp=01a1ac80 iopl=0         nv up ei pl nz na pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00010206
ntdll!RtlLockHeap+0x16:
77481641 f7464400000001  test    dword ptr [esi+44h],1000000h ds:0023:01c70044=????????

* 01a1ed04 is ChildEBP of MyModule!IsAddressOnHeap from kb command, the output is in sync with first arg from kb
0:019> dd 01a1ed04+4*4-405c l1
01a1acb8  01c70000

* edi has value of 5, dump first 5 elements in array heaps
0:019> dd 01a1ed04-405c l5
01a1aca8  006d0000 004b0000 00690000 01a80000
01a1acb8  01c70000

Now we know what were in array heaps. Let’s take a look at the heaps in PEB:

0:019> dt _PEB @$peb
MyModule!_PEB
   ...
   +0x088 NumberOfHeaps    : 4
   +0x08c MaximumNumberOfHeaps : 0x10
   +0x090 ProcessHeaps     : 0x7754d3a0  -> 0x006d0000 Void
   ...
0:019> dd 0x7754d3a0 l4
7754d3a0  006d0000 004b0000 00690000 01a80000

So there were 4 heaps reported from PEB, while our array had 5, the first 4 were same as in PEB, the fifth one was not in PEB heaps and wasn’t in a committed page:

0:019> !address 01c70000
Usage:                  Free
Base Address:           01b48000
End Address:            5ffd0000
Region Size:            5e488000
State:                  00010000	MEM_FREE
Protect:                00000001	PAGE_NOACCESS
Type:                   <info not present at the target>

So it looks like by the time function MyModule!IsAddressOnHeap walked to the last heap, it had been destroyed and memory had been freed. We need to put HeapLock into another SEH block to catch the access violation, existing block cannot be reused as it sole purpose is to use the __finally block to unlock the heap (if it was locked successfully).

I wrote a simple test application and validated the theory: after calling HeapDestroy on a heap, calling HeapLock with same heap handle will cause access violation. HeapValidate can be used to check if the heap was valid (even with a handle that has been used with HeapDestroy), I though of calling it before HeapLock, but ditched the idea quickly due to the potential racing condition that HeapDestroy could called in-between.

Get this Pointer from 64-bit crash dump

64-bit calling convention means that under most scenarios, the first four parameters are passed using rcx, rdx, r8 and r9 registers rather than using stack. It’s difficult to find out actual parameters passed to functions in the calling stack from a 64-bit crash dump. CodeMachine has a great article X64 Deep Dive on this topic, it also talks about x64 compiler optimization, exception handling, parameter passing as well as parameter retrieval. It is a must read if crash dump analysis is part of your job.

There are lots of times we want to find out “this” pointer from the crash dump. “this” pointer is passed using rcx, and if the function accesses class members, very likely rcx will be saved into rsi or rdi after function’s prolog. I created a MFC application using VS2012 project wizard, inserted DebugBreak() at the end of CMFCApplication1App::InitInstance, built it using x64|Release configuration, ran it, save the crash dump and open it with WinDbg, this is what the initial stack looks like:

0:000> kL
Child-SP          RetAddr           Call Site
000000cc`aa65e468 000007fb`01df12d2 ntdll!NtWaitForMultipleObjects+0xa
000000cc`aa65e470 000007fb`0211d20e KERNELBASE!WaitForMultipleObjectsEx+0xe5
000000cc`aa65e750 000007fb`0211cfd2 kernel32!WerpReportFaultInternal+0x1fa
000000cc`aa65e7f0 000007fb`01e6fc87 kernel32!WerpReportFault+0x42
000000cc`aa65e820 000007fb`04db9183 KERNELBASE!UnhandledExceptionFilter+0x1d7
000000cc`aa65e920 000007fb`04ce4fea ntdll! ?? ::FNODOBFM::`string'+0x9e3
000000cc`aa65e950 000007fb`04ce464d ntdll!_C_specific_handler+0x8e
000000cc`aa65e9c0 000007fb`04ce567c ntdll!RtlpExecuteHandlerForException+0xd
000000cc`aa65e9f0 000007fb`04cc4bba ntdll!RtlDispatchException+0x392
000000cc`aa65f100 000007fb`01e9478a ntdll!KiUserExceptionDispatch+0x2e
000000cc`aa65f828 000007f7`5be74c8b KERNELBASE!DebugBreak+0x2
000000cc`aa65f830 000007f7`5c059a09 MFCApplication1!CMFCApplication1App::InitInstance+0x2cb
000000cc`aa65f920 000007f7`5c0417fc MFCApplication1!AfxWinMain+0x75
000000cc`aa65f960 000007fb`020d167e MFCApplication1!__tmainCRTStartup+0x148
000000cc`aa65f9a0 000007fb`04ce3501 kernel32!BaseThreadInitThunk+0x1a
000000cc`aa65f9d0 00000000`00000000 ntdll!RtlUserThreadStart+0x1d

Let’s say I want to know what does MFCApplication1!CMFCApplication1App look like at this time. First let us disassemble first few instructions in CMFCApplication1App::InitInstance:

0:000> u MFCApplication1!CMFCApplication1App::InitInstance l10
MFCApplication1!CMFCApplication1App::InitInstance [d:\temp\mfcapplication1\mfcapplication1\mfcapplication1.cpp @ 73]:
000007f7`5be749c0 488bc4          mov     rax,rsp
000007f7`5be749c3 55              push    rbp
000007f7`5be749c4 488d68a1        lea     rbp,[rax-5Fh]
000007f7`5be749c8 4881ece0000000  sub     rsp,0E0h
000007f7`5be749cf 48c745affeffffff mov     qword ptr [rbp-51h],0FFFFFFFFFFFFFFFEh
000007f7`5be749d7 48895810        mov     qword ptr [rax+10h],rbx
000007f7`5be749db 48897018        mov     qword ptr [rax+18h],rsi
000007f7`5be749df 48897820        mov     qword ptr [rax+20h],rdi
000007f7`5be749e3 488b0536f72b00  mov     rax,qword ptr [MFCApplication1!__security_cookie (000007f7`5c134120)]
000007f7`5be749ea 4833c4          xor     rax,rsp
000007f7`5be749ed 48894547        mov     qword ptr [rbp+47h],rax
000007f7`5be749f1 488bf1          mov     rsi,rcx <-- rcx was saved to rsi here
000007f7`5be749f4 c745b708000000  mov     dword ptr [rbp-49h],8
000007f7`5be749fb c745bbff000000  mov     dword ptr [rbp-45h],0FFh
000007f7`5be74a02 488d4db7        lea     rcx,[rbp-49h]
000007f7`5be74a06 ff157c961f00    call    qword ptr [MFCApplication1!_imp_InitCommonControlsEx (000007f7`5c06e088)]

We know that “this” pointer of MFCApplication1!CMFCApplication1App object was passed using rcx, rcx was saved into rsi not far from the prolog, and since the crash was happening inside this function, let’s restore the crash context record and check whether we can get MFCApplication1!CMFCApplication1App pointer from rsi:

0:000> .ecxr
rax=0000000000000001 rbx=0000000000000000 rcx=000007fb04b04aaa
rdx=0000000000000000 rsi=000007f75c1418b0 rdi=000000ccaa860ba0
rip=000007fb01e9478a rsp=000000ccaa65f828 rbp=000000ccaa65f8b9
 r8=000000ccaa65f828  r9=000000ccaa65f8b9 r10=0000000000000000
r11=0000000000000202 r12=000007f75c1418b0 r13=0000000000000000
r14=000007f75c1418b0 r15=00000000ffffffff
iopl=0         nv up ei pl nz na pe nc
cs=0033  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000202
KERNELBASE!DebugBreak+0x2:
000007fb`01e9478a cc              int     3

0:000> dt /r MFCApplication1!CMFCApplication1App @rsi
   +0x000 __VFN_table : 0x000007f7`5c0df898 
   =000007f7`5c074d10 classCObject     : CRuntimeClass
      +0x000 m_lpszClassName  : 0x000007f7`5c074d40  "CObject"
      +0x008 m_nObjectSize    : 0n8
      +0x00c m_wSchema        : 0xffff
      +0x010 m_pfnCreateObject : (null) 
      +0x018 m_pBaseClass     : (null) 
      +0x020 m_pNextClass     : (null) 
      +0x028 m_pClassInit     : (null) 
   =000007f7`5c071698 classCCmdTarget  : CRuntimeClass
      +0x000 m_lpszClassName  : 0x000007f7`5c071808  "CCmdTarget"
      +0x008 m_nObjectSize    : 0n64
      +0x00c m_wSchema        : 0xffff
      +0x010 m_pfnCreateObject : (null) 
      +0x018 m_pBaseClass     : 0x000007f7`5c074d10 CRuntimeClass
         +0x000 m_lpszClassName  : 0x000007f7`5c074d40  "CObject"
         +0x008 m_nObjectSize    : 0n8
         +0x00c m_wSchema        : 0xffff
         +0x010 m_pfnCreateObject : (null) 
         +0x018 m_pBaseClass     : (null) 
         +0x020 m_pNextClass     : (null) 
         +0x028 m_pClassInit     : (null) 
      +0x020 m_pNextClass     : (null) 
      +0x028 m_pClassInit     : (null) 
   ...
   +0x180 m_strRegSection  : ATL::CStringT<wchar_t,StrTraitMFC<wchar_t,ATL::ChTraitsCRT<wchar_t> > >
      +0x000 m_pszData        : 0x000000cc`aa851548  "Workspace"
   ...
   +0x1cc m_bHiColorIcons  : 0n1

It looks good. Of course it is completely possible that after the copying “this” pointer from rcx, rsi was changed again before the crash. The only way be sure about it is to disassemble the function from start to the crash point and read the assembly code. I made a bet here that rsi wan’t changed, and I will dump the complete structure to make sure it looks intact.

Another tricky part of 64-bit crash dump analysis is that executable images are as important as symbols. Many times WinDbg needs access to the binaries to be able to fully restore the call stack, there are lots more function information stored in 64-bit binaries than 32-bit binaries. If images are not available, it is still possible to manually construct the stack by walking through output from dps , but it is going to be very painful since rbp is no longer used as frame base pointer.