Reading Notes of Software Debug
How breakpoints work
When the user asks the debugger to set a breakpoint, the debugger saves a copy of the first byte of the instruction and replaces it with INT 3 (0xCC). When execution reaches that address, INT 3 makes the CPU save the current thread context (pushing register values onto the stack) and call the registered exception handler nt!KiTrap03, which decrements the saved EIP value by 1 and dispatches the exception in the kernel. The kernel finds that the process that owns the exception has a nonzero DebugPort, so it sends the event to the user-mode debugger and waits for the debugger's response. When the user asks the debugger to continue execution, the debugger restores the first byte of the instruction, and nt!KiTrap03 eventually uses the IRET/IRETD instruction to have the CPU restore the thread context and registers and continue from there. In some cases the debugger performs a single step and then sets INT 3 again.
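The byte patching described above can be modelled in a few lines. This is only a sketch over a bytearray standing in for code memory, not real debugger code; the class name ToyDebugger, its method names, and the sample bytes are made up for illustration.

```python
INT3 = 0xCC  # one-byte opcode for INT 3


class ToyDebugger:
    """Models software breakpoints over a bytearray that stands in for code memory."""

    def __init__(self, code: bytearray):
        self.code = code
        self.saved = {}  # address -> original first byte of the instruction

    def set_breakpoint(self, address: int) -> None:
        # Save the first byte of the instruction, then overwrite it with INT 3.
        self.saved[address] = self.code[address]
        self.code[address] = INT3

    def trap_address(self, eip_after_int3: int) -> int:
        # After INT 3 executes, EIP points just past the 0xCC byte.
        # Decrementing it by 1 (the step the notes attribute to KiTrap03)
        # gives the address of the patched instruction.
        return eip_after_int3 - 1

    def restore(self, address: int) -> None:
        # Put the original byte back so the real instruction can run.
        self.code[address] = self.saved[address]

    def rearm(self, address: int) -> None:
        # After a single step, write INT 3 again to keep the breakpoint active.
        self.code[address] = INT3


# Hypothetical code bytes: push ebp; mov ebp, esp; pop ebp; ret
code = bytearray([0x55, 0x8B, 0xEC, 0x5D, 0xC3])
dbg = ToyDebugger(code)
dbg.set_breakpoint(1)
patched_byte = code[1]
hit = dbg.trap_address(2)   # EIP was 2 when the trap was taken
dbg.restore(hit)
restored_byte = code[1]
dbg.rearm(hit)
```

The restore/rearm pair mirrors the last sentence of the paragraph: the original byte has to be in place while the instruction executes, and the 0xCC goes back afterwards if the breakpoint should stay set.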
For data access breakpoints, WinDbg provides the ba command; e.g. ba w4 <address> breaks into the debugger whenever any of the 4 bytes starting at <address> is written. The ba command, as well as VS2005’s data access breakpoints, is implemented in hardware using the CPU's debug registers (DR0-DR7).
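To make the hardware side concrete, here is a small sketch of how a request like ba w4 could map onto the DR7 control register, following the x86 DR7 layout (local enable bits in bits 0-7, R/W and LEN fields from bit 16 up). The function name dr7_for, the slot choice, and the sample address are assumptions for illustration; this is not how WinDbg itself is written.

```python
# R/W field encodings in DR7: break on execute, on write, or on read/write.
RW_BITS = {"e": 0b00, "w": 0b01, "r": 0b11}
# LEN field encodings in DR7 for the watched size in bytes.
LEN_BITS = {1: 0b00, 2: 0b01, 4: 0b11, 8: 0b10}


def dr7_for(slot: int, access: str, size: int, address: int) -> int:
    """Return the DR7 bits that arm one breakpoint slot (DR0-DR3)."""
    if not 0 <= slot <= 3:
        raise ValueError("only DR0-DR3 hold breakpoint addresses")
    if address % size != 0:
        raise ValueError("address must be aligned to the watched size")
    value = 1 << (slot * 2)                      # local enable bit L<slot>
    value |= RW_BITS[access] << (16 + slot * 4)  # R/W<slot> field
    value |= LEN_BITS[size] << (18 + slot * 4)   # LEN<slot> field
    return value


# Rough equivalent of "ba w4 0x0012FF40" placed in slot 0 (address is hypothetical).
dr7 = dr7_for(slot=0, access="w", size=4, address=0x0012FF40)
print(hex(dr7))  # 0xd0001
```

Only four address registers exist (DR0-DR3), which is why debuggers limit the number of simultaneous data breakpoints.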
How a debugger attaches to the debuggee process
If the process being debugged is launched by the debugger, the debugger sets a debug flag when it calls CreateProcess(). When the process (and its first thread) is created, the debug object is initialized in the kernel and a debug session is set up with the debugger.
If debugger wants to attach to a process that is already running, it has several options:
1. The debugger can call DebugBreakProcess(), which tries to create a remote thread in the target process with entry point ntdll!DbgUiRemoteBreakin; that routine calls DbgBreakPoint to break into the debugger. Note that this does not work if some of the target process’s threads are already deadlocked on the loader critical section, because the newly created thread must be able to call and return from DllMain(), which is impossible in that state.
2. The debugger can also call ntdll!RtlRemoteCall with entry kernel32!BaseAttachCompleteThunk. RtlRemoteCall suspends the remote thread, modifies the thread context so that EIP points to kernel32!BaseAttachCompleteThunk, and resumes the target thread. kernel32!BaseAttachCompleteThunk then breaks the process into the debugger. If the target thread is being held in the kernel, kernel32!BaseAttachCompleteThunk is called only when ntdll!KiFastSystemCallRet returns.
3. If remote threads cannot return from the kernel, or a new remote thread cannot be started, the debugger can still break in by suspending all remote threads; a special wake-debugger event is then generated and delivered to the debugger. However, continuing or single-stepping is not possible in this case.
OutputDebugString is a relatively expensive API. When a process calls OutputDebugString, it first tries to raise an exception with a special exception code. If a debugger is attached to the process, the debugger handles the exception and displays the string. If nobody handles the exception, OutputDebugString checks whether a debug string monitor (e.g. DBWIN, Debug View etc.) is present. If so, it uses mapped memory to send the data to the monitor. If no monitor is present, OutputDebugString eventually calls DbgPrint.
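The fallback order in this paragraph is easy to lose track of, so here it is as a tiny decision function. This is a plain Python model of the order described in these notes, not Windows code; the function name route_debug_string and its return strings are invented for illustration.

```python
def route_debug_string(debugger_attached: bool, monitor_present: bool) -> str:
    """Return which consumer ends up receiving the string, per the notes above."""
    if debugger_attached:
        # The special exception is handled by the attached debugger.
        return "debugger"
    if monitor_present:
        # Nobody handled the exception: hand the data to the monitor
        # through mapped (shared) memory.
        return "monitor"
    # Neither a debugger nor a monitor: fall back to DbgPrint.
    return "DbgPrint"


outcomes = [
    route_debug_string(True, True),
    route_debug_string(False, True),
    route_debug_string(False, False),
]
```

Every call pays for the exception attempt first, which is one reason the API is relatively expensive.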
When an exception occurs, the CPU looks up the exception handler (a KiTrapXX routine) in the IDT. The handler routine calls CommonDispatchException() to prepare the necessary parameters and then calls KiDispatchException() to dispatch the exception. For each exception, KiDispatchException() makes up to two attempts, offering the exception to the debugger first each time and searching for exception handlers in between. For a user-mode exception, KiDispatchException() modifies the TrapFrame so that KiUserExceptionDispatcher() runs when control returns from kernel mode to user mode; KiUserExceptionDispatcher() calls ntdll!RtlDispatchException() to walk the list of exception handlers. For a kernel-mode exception, KiDispatchException() calls NTOSKRNL!RtlDispatchException() to walk the list of exception handlers on the kernel stack. The head of the list is in the thread information block (TIB). Before an exception handler is called, the system may need to perform global and local stack unwinding to restore the stack to the state that the handler expects. Windows XP introduced vectored exception handling (VEH), which is user-mode only and API based, in contrast to frame-based structured exception handling (SEH).
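The two-attempt dispatch order can be sketched as a small model. It only records who gets offered the exception and in what order; the function name dispatch_exception, the "first chance"/"second chance" labels, and the handler names are illustrative assumptions, and real dispatching involves far more state than this.

```python
def dispatch_exception(debugger_takes_first, handlers, debugger_takes_second):
    """Return the list of parties offered the exception, in order.

    handlers is a list of (name, handles) pairs, ordered from the head of
    the handler list (innermost frame) outward.
    """
    trail = ["debugger (first chance)"]
    if debugger_takes_first:
        return trail
    for name, handles in handlers:
        trail.append(f"handler {name}")
        if handles:
            return trail
    # No handler accepted it: the debugger is offered the exception again.
    trail.append("debugger (second chance)")
    if not debugger_takes_second:
        trail.append("unhandled")
    return trail


handled_by_outer = dispatch_exception(False, [("inner", False), ("outer", True)], False)
nobody_handles = dispatch_exception(False, [("inner", False)], False)
```

The list of (name, handles) pairs stands in for the frame-based handler list whose head is kept in the TIB.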
The OS installs exception handlers in BaseProcessStart() and BaseThreadStart(). Unhandled exceptions are eventually caught by kernel32!UnhandledExceptionFilter(), which is responsible for calling the program-registered unhandled exception handler, collecting error information, reporting the error, and launching the JIT debugger or notifying the user of the error. The implementation of UnhandledExceptionFilter() differs between operating systems; e.g. Vista starts a new process (WerFault.exe) to collect dumps rather than doing it from the original process, which may be in an unstable state. MSVCRT calls kernel32!SetUnhandledExceptionFilter() during CRT initialization to process C++ exceptions, whose exception code is 0xe06d7363 (ASCII .msc). kernel32!UnhandledExceptionFilter() checks whether a debugger is present before calling user-defined unhandled exception handlers, which makes those handlers hard to debug. One workaround is to use the debugger to set a breakpoint at 0x77e99be0 inside kernel32!NtQueryInformationProcess() (it is called by kernel32!UnhandledExceptionFilter), and use the command “ed [ebp-20] 0” to clear the local variable for DebugPort stored at [ebp-20] just before the cmp instruction.
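The "ASCII .msc" remark about the C++ exception code can be checked directly: the low three bytes of 0xe06d7363 spell "msc", and the top byte is 0xE0. A quick sketch (the helper name is_cpp_exception is made up for illustration):

```python
CPP_EXCEPTION_CODE = 0xE06D7363

raw = CPP_EXCEPTION_CODE.to_bytes(4, "big")  # most significant byte first
tag = raw[1:].decode("ascii")                # the three low bytes as text


def is_cpp_exception(code: int) -> bool:
    """True if an exception code is the MSVC C++ exception code."""
    return code == CPP_EXCEPTION_CODE


print(hex(raw[0]), tag)
```

This is handy when a debugger reports an unfamiliar exception code and you want to recognise a C++ throw at a glance.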
User-mode applications can use NtRaiseHardError() to display an application error dialog. NtRaiseHardError() enters kernel mode and calls ExRaiseHardError(), which uses LPC to deliver the event back to CSRSS.exe in user mode; winsrv!UserHardErrorEx() processes the event and calls MessageBoxTimeout() on the appropriate desktop. Kernel-mode modules can use functions such as IopRaiseHardError() and IoRaiseInformationalHardError() to raise a hard error request; one example is the “Diskette required” message. If the faulting process is CSRSS itself, a bugcheck is generated.
Both Application Verifier and Driver Verifier use import address table (IAT) hooking to make applications or drivers call verifier support routines first. Those support routines are implemented in ntdll.dll and ntoskrnl.exe respectively. The NT loader is responsible for replacing the functions in the IAT with those support routines.
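The idea of import address table hooking can be illustrated with a dictionary playing the role of the IAT. This is a conceptual sketch only: the function names, the dictionary layout, and the "check" performed by the thunk are all invented, and the real loader patches function pointers in the module image rather than Python objects.

```python
def real_heap_alloc(size):
    """Stand-in for the real imported routine."""
    return f"block of {size} bytes"


checked_sizes = []  # records what the verifier thunk saw


def verifier_heap_alloc(size):
    """Stand-in for a verifier support routine: check first, then forward."""
    checked_sizes.append(size)
    return real_heap_alloc(size)


def loader_apply_verifier(iat, thunks):
    """Model of the loader step: swap IAT slots for verifier thunks."""
    for name, thunk in thunks.items():
        if name in iat:
            iat[name] = thunk


# The module's import table: import name -> address of the implementation.
iat = {"HeapAlloc": real_heap_alloc}
loader_apply_verifier(iat, {"HeapAlloc": verifier_heap_alloc})

# The application still "calls HeapAlloc" through its IAT slot.
result = iat["HeapAlloc"](16)
```

Because the application always calls through the table, it needs no changes of its own for the verifier to see every call.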
Each process has the following important attributes: pid, EPROCESS (executive process block), kernel handle table, dir base, PEB (process environment block) and access token. Use !process <address of EPROCESS> to view this information. EPROCESS lives in kernel address space, while the PEB lives in the mapped process address space.
Most threads start in kernel mode at KiThreadStartup and PspUserThreadStartup; to execute the user-mode initialization code, these routines insert an APC into the APC queue of the corresponding thread, which then executes ntdll!LdrpInitialize. Except for kernel system threads, each thread has a user stack and a kernel stack, whose details can be obtained from ETHREAD/KTHREAD and TEB/NT_TIB. All threads are first created as non-GUI threads with a fixed-size kernel stack. Once a thread is converted to a GUI thread, its kernel stack is swapped for a large kernel stack that can grow one page at a time. For the user stack, the reserved size and the initial commit size are specified by the PE header. By default, programs built by MSVC reserve 1MB of stack and grow it one page at a time, with one additional guard page. The CALL instruction pushes the return address (the EIP of the next instruction) onto the stack, loads the target address into EIP, and continues execution from there. The RET n instruction pops the return address from the stack into EIP, increases ESP by a further n bytes (to clean the arguments off the stack when the callee is responsible for cleanup, as in __stdcall), and continues execution from there.
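The CALL and RET n mechanics at the end of this paragraph can be traced with a toy 32-bit stack. The sketch below is a model, not an emulator; the class name ToyCpu, the addresses, and the argument values are hypothetical.

```python
class ToyCpu:
    """Tracks only EIP, ESP and a sparse stack of 32-bit slots."""

    def __init__(self, esp=0x1000):
        self.eip = 0
        self.esp = esp
        self.stack = {}  # address -> 32-bit value

    def push(self, value):
        self.esp -= 4                 # the stack grows toward lower addresses
        self.stack[self.esp] = value

    def call(self, target, return_address):
        self.push(return_address)     # CALL saves where to come back to
        self.eip = target             # then jumps to the target

    def ret(self, n=0):
        self.eip = self.stack[self.esp]  # pop the return address into EIP
        self.esp += 4 + n                # RET n also releases n bytes of arguments


cpu = ToyCpu()
esp_before = cpu.esp
cpu.push(2)                                   # second argument (pushed first)
cpu.push(1)                                   # first argument
cpu.call(0x401000, return_address=0x400010)
esp_in_callee = cpu.esp
cpu.ret(8)                                    # callee releases two 4-byte arguments
```

After ret(8) the stack pointer is back where it was before the arguments were pushed, so the caller has nothing left to clean up. With a plain ret (n = 0) the two argument slots would still be on the stack for the caller to release.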
Applications can use several undocumented functions implemented in ntdll.dll to collect back trace and later use dbghelp.dll to resolve symbols:
The default heap of each process can be obtained from the PEB. Each heap has at least one and up to 64 segments. Heap information is kept in the HEAP structure at the beginning of segment 0. Each segment’s information is stored in a HEAP_SEGMENT structure located at the beginning of the segment (except for segment 0), and each heap segment can hold multiple heap entries. Each heap entry (as well as the heap structures) begins with an 8-byte HEAP_ENTRY structure, whose first two bytes contain the size of the entry in units of the heap granularity. With a heap granularity of 8 bytes, each heap entry can be at most 512KB (the actual limit is 508KB, as 4KB is reserved for entry information). For oversized allocation requests, the heap manager allocates virtual memory directly and keeps the pointers in a linked list. In the HEAP structure, Segments is an array whose elements are pointers to HEAP_SEGMENT, and LastSegmentIndex is the index of the last element in Segments. FreeLists is an array of 128 pointers to free heap entries. For each block allocated by HeapAlloc, a HEAP_ENTRY structure sits just before the returned address (except with pageheap, where the returned user buffer is aligned to end just before the guard page, which makes memory overruns easier to detect). A free entry actually begins with a HEAP_FREE_ENTRY structure, whose first 8 bytes are identical to HEAP_ENTRY and which has an additional 8-byte LIST_ENTRY linking it into the linked list of free entries.
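The size arithmetic in this paragraph can be worked through in a few lines. The sketch assumes the plain, unencoded 8-byte header layout described in these notes (size in the first two bytes, little-endian); the function names and the sample pointer are made up for illustration.

```python
import struct

HEAP_GRANULARITY = 8   # bytes per size unit
HEAP_ENTRY_SIZE = 8    # size of the HEAP_ENTRY header in bytes


def entry_size_in_bytes(header: bytes) -> int:
    """Decode the entry size from the first two bytes of a HEAP_ENTRY."""
    (units,) = struct.unpack_from("<H", header, 0)  # 16-bit little-endian field
    return units * HEAP_GRANULARITY


def header_address(user_pointer: int) -> int:
    """The HEAP_ENTRY sits immediately before the pointer HeapAlloc returned."""
    return user_pointer - HEAP_ENTRY_SIZE


# A 16-bit size field counted in 8-byte units gives the 512KB bound.
size_limit = (1 << 16) * HEAP_GRANULARITY

# A header whose size field is 4 units describes a 32-byte entry.
sample_header = bytes([0x04, 0x00, 0, 0, 0, 0, 0, 0])
sample_size = entry_size_in_bytes(sample_header)
```

In a debugger the same subtraction is what lets you go from a pointer returned by HeapAlloc to the entry header that describes it.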
Several calling conventions exist on x86 systems. C calls (__cdecl) pass parameters from right to left, and the caller cleans the stack. Standard calls (__stdcall) also pass parameters from right to left, but the callee cleans the stack. Fast calls (__fastcall) pass the first two parameters (which must be integers of 32 bits or smaller) in ECX and EDX and the remaining parameters on the stack; the callee cleans the stack. this calls (__thiscall) with a fixed number of parameters pass the this pointer in ECX and otherwise behave like __stdcall; this calls with a variable number of parameters use __cdecl, and the this pointer is pushed onto the stack after all the parameters. x64 has just one convention, which is similar to __fastcall: it uses registers for the first four parameters and the stack for the rest, and the caller is responsible for cleaning the stack.
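The differences between the conventions come down to where each argument goes and who cleans up. The sketch below tabulates that for integer arguments only; the function name place_arguments is invented, and the x64 register names (RCX, RDX, R8, R9) come from the Microsoft x64 convention rather than from these notes.

```python
def place_arguments(convention: str, args: list):
    """Return (register assignments, stack arguments right to left, who cleans up).

    Assumes every argument is an integer that fits in one register.
    """
    if convention == "cdecl":
        return {}, list(reversed(args)), "caller"
    if convention == "stdcall":
        return {}, list(reversed(args)), "callee"
    if convention == "fastcall":
        regs = dict(zip(["ECX", "EDX"], args))            # first two in registers
        return regs, list(reversed(args[2:])), "callee"
    if convention == "x64":
        regs = dict(zip(["RCX", "RDX", "R8", "R9"], args))  # first four in registers
        return regs, list(reversed(args[4:])), "caller"
    raise ValueError(f"unknown convention: {convention}")


fast = place_arguments("fastcall", [1, 2, 3, 4])
x64 = place_arguments("x64", [1, 2, 3, 4, 5])
cdecl = place_arguments("cdecl", [1, 2, 3])
```

When reading a disassembly, the cleanup column is often the quickest way to tell the conventions apart: a callee that ends in RET n is cleaning its own arguments, while an ADD ESP, n after the CALL means the caller does it.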
For return values, compiler may use one of following ways on x86:
If the return value is a structure or class, the compiler generates a temporary variable and passes its address as a hidden parameter to the callee; the callee copies the return value into this hidden parameter and returns its address in EAX.
Useful WinDbg Commands
|!idt <number>||View the interrupt vector handler|
|!session||Display sessions and/or change session context|
|!sprocess||Display process list for specified session|
|!process 0 0||Display brief information of running processes|
|!process <EPROCESS>||Display information about a process using EPROCESS|
|.process <EPROCESS>||Set process context|
|!thread 0 0||Display threads in current process|
|!thread <ETHREAD>||Display specified thread information|
|.thread <ETHREAD>||Set thread context|
|.frame <frame number>||Set frame context|
|dv /i /v /t||Display parameters and local variables with type and address information|
|x /t /v <module>!<symbol>||Display symbol information|
|!peb||Display current process’s environment block|
|!teb||Display current thread’s environment block|
|!token <token address>||Display token information|
|!handle 0 0 <pid or EPROCESS>||Display handles of specified process|
|t/p||Single step trace|
|bp/bu/bm||Set software breakpoints; /p and /t can be used to limit the process and thread context|
|ba||Setup hardware breakpoints|
|dt <module>!<symbol> <address>||Display data with type information|
|s||Search in memory|
|!address/!vprot <address>||Display address information|
|dl/!list||Display list, dt can also be used (reference page 990)|
- struct LIST_ENTRY/SINGLE_LIST_ENTRY
- Kernel mutex vs. fast mutex
|Kernel Mutex||Fast Mutex|
|Can re-enter from same thread||Cannot re-enter from same thread|
|Special kernel APC can be delivered||No APC will be delivered unless XxxUnsafe versions were used|
|Can be used by wait functions||Cannot be used with wait functions|
- KeEnterCriticalRegion (prevents delivering normal kernel and user mode APC but not special kernel APC)
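For the LIST_ENTRY item in the list above, a short model of the circular doubly linked list may help when reading dl or !list output. In the real structure the two link fields are named Flink and Blink and the LIST_ENTRY is embedded inside a larger record; the name field here exists only so the walk has something to print, and the helper names are made up.

```python
class ListEntry:
    """Model of LIST_ENTRY: a forward link and a backward link."""

    def __init__(self, name):
        self.name = name     # illustration only; not part of the real structure
        self.flink = self    # an empty list head points at itself
        self.blink = self


def insert_tail(head, entry):
    """Link entry in just before the head, i.e. at the end of the list."""
    last = head.blink
    entry.flink = head
    entry.blink = last
    last.flink = entry
    head.blink = entry


def walk(head):
    """Follow forward links from the head until the walk returns to it."""
    names = []
    node = head.flink
    while node is not head:
        names.append(node.name)
        node = node.flink
    return names


head = ListEntry("head")
insert_tail(head, ListEntry("a"))
insert_tail(head, ListEntry("b"))
order = walk(head)
```

The walk stops when it gets back to the head, which is why an empty list is simply a head whose links point to itself.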