A Fight With ETW and NT Kernel Logger

I am quite familiar with Windows Performance Toolkit and know well on how to use Xperf/XbootMgr to collect events from NT kernel logger and use XperfView/WPA to analyze the result. I have always thought about putting similar functionality of Xperf into our product so that we can control NT kernel logging session programmatically without installing WPT, this will help us collecting valuable information from customers when trouble shooting issues. Since we are already using WPP in our product and I am familiar with ETW already, I thought it shouldn’t be difficult to do it. Little did I know how wrong it is and how bumpy the road will be.

Starting NT kernel logging session

It hasn’t been smooth since beginning. Since my goal is to control NT kernel logging session programmatically, the sample code from MSDN is a good start. Compiled it, copied the executable to a Windows 7 virtual machine, ran it under a command prompt with admin, and… error code 1313,  “a specified privilege does not exist”, nice. I went back to MSDN library and it doesn’t say anything about privilege. Remembering before calling ExitWindowsEx or InitiateSystemShutdown, process token needs to be adjusted to enable SE_SHUTDOWN_NAME, I wonder whether this is same case. I ran Process Explorer and looked through the privileges that are in the process token but disabled by default, and noticed “SeSystemProfilePrivilege” quickly:

SeSystemProfilePrivilege

Do we need this privilege to start NT kernel logging session? Let’s give it a try:

HANDLE hToken = 0;
if (OpenProcessToken(GetCurrentProcess(), TOKEN_ADJUST_PRIVILEGES | TOKEN_QUERY, &hToken))
{
    TOKEN_PRIVILEGES tkp;
    memset(&tkp, 0, sizeof(tkp));
    tkp.PrivilegeCount = 1;
    tkp.Privileges[0].Attributes = SE_PRIVILEGE_ENABLED;
    LookupPrivilegeValue(NULL, SE_SYSTEM_PROFILE_NAME, &tkp.Privileges[0].Luid);

    AdjustTokenPrivileges(hToken, FALSE, &tkp, 0, (PTOKEN_PRIVILEGES)NULL, 0);
    ::CloseHandle(hToken);
}

After adding above code before calling StartTrace(), the sample worked. And by the way, in case the NT kernel logging session has been started, StartTrace() will return error code ERROR_ALREADY_EXISTS and you need to call ControlTrace() with last parameter set to EVENT_TRACE_CONTROL_UPDATE to update the session with the information you provided through the properties.

Believe me, this turns out to be the easiest problem I ran into, actually I might not even call it a problem.

Stopping the NT kernel logging session

Stopping the trace is as simple as calling ControlTrace() with last parameter set to EVENT_TRACE_CONTROL_STOP. Now I can start and stop NT kernel logging session and open the generated ETL file with XperfView.

Enable stackwalk

On Windows Vista SP1 and above, certain kernel events can provide stacks along with the event, this feature can be turned on by Xperf using –stackwalk option. You can run “xperf –help stackwalk” to view all parameters supported. How do we enable stack walk programmatically?

There is an article “What’s New in Event Tracing” on MSDN mentioned:

  • The ability to capture a stack trace for events. To enable stack tracing for kernel events, see the TraceSetInformation function.

That looks promising. Few minutes later I came along with following code to enable stack walk on events of profiling, cswitch and readythread:

CLASSIC_EVENT_ID classes[] = {  {PerfInfoGuid, 46, {0}},
                                {Thread_V2_Guid, 36, {0}},
                                {Thread_V2_Guid, 50, {0}}};
TraceSetInformation(hSessionNT, TraceStackTracingInfo, classes, sizeof(classes);

The GUIDs here can be found in NT Kernel Logger Constants from MSDN, the available classes can be found from MSNT_SystemTrace class, and you can get the type value for cswitch and readythread from Thread_V2 class. Yes, there are some clicks needed to get what you are looking for.

Why the stacks are all “?|?”

After put the call to TraceSetInformation into the code, I got the ETL file with stack enabled for profiling. I was excited until I opened the ETL file with XperfView:

not_good_stack

Oops, where are the module and function names? Half an hour later, I realized that I need to have EVENT_TRACE_FLAG_IMAGE_LOAD in the enabled flags to capture module information. Now it’s looking better:

better_stack

Then quickly I realized that I ran into another problem: I cannot get the name of functions, even if I have setup symbol environment path correctly and asked XperfView to load symbols. I ran Xperf on same machine with same flags and there was no problem, functions are displayed correctly for ETL file from Xperf.

Few hours later, I decided to use “xperf -i input.etl –o output.txt” to dump raw content from both ETL files and compared them manually. One thing caught my attention: the raw content from Xperf has a section named “FileVersion”, it contains information like full path, md5, version number etc. of all binaries, and it is completely missing from the raw content exported from ETL file generated by programmatically controlled NT kernel logging session. Without these image information Xperf cannot load symbols and show correct stack.

How did Xperf put the image information into its ETL file? I ran some trace tool and found out that if no “-f” option was specified in the command line when start tracing, by default Xperf will configure NT kernel logging session to write events into “\kernel.etl”. When Xperf is called again with command line parameter “-d output.etl”, it will stop NT kernel logging session first, then open “\kernel.etl”, and replay all recorded events and rewrite them to output.etl, image information will be merged along the way. The proof? I set a breakpoint on kernel32!CreateFileW and launched Xperf with option “-d output.etl”, after few hits, I got following stack (column of ChildEBP and RetAddr were removed):

001fd594 80000000 00000005 kernel32!CreateFileW
002f9c78 001fd7d0 00000020 kernel32!BasepLoadLibraryAsDataFileInternal+0x113
00000002 00000000 00000020 kernel32!LoadLibraryExW+0x1fb
00000000 002d0388 00000001 VERSION!GetFileVersionInfoSizeExW+0x30
002d0388 0000030c 00000000 perfctrl!SymGetOriginalFilename+0x37
002d0388 0000030c da0e5062 perfctrl!SymGetEstimatedOriginalFilename+0x14
002d0388 0036f5e8 00373b2e perfctrl!_Internal_CvDbgInfoFromImage+0x265
002d0388 00373b2e 00efc718 perfctrl!CvDbgInfoFromImage+0x7e
00000004 002d39d8 00efd5c0 perfctrl!Microsoft::Windows::Performance::CImageIdEventTraceExtender::AddImageIdEvents+0x1ed
00efc120 002d39d8 00000020 perfctrl!Microsoft::Windows::Performance::CImageIdEventTraceExtender::OnBeforeEvent+0x67
00efd380 002d39d8 00000020 perfctrl!Microsoft::Windows::Performance::CEventTraceExtenderBase::OnBeforeEvent+0x1e
00efd380 002d39d8 00000020 perfctrl!Microsoft::Windows::Performance::CEventMetadataEventTraceExtender::OnBeforeEvent+0x53d
00efd5c0 002d39d8 00000020 perfctrl!Microsoft::Windows::Performance::CEventTraceExtenderBase::OnBeforeEvent+0x1e
00efd5c0 002d39d8 00000020 perfctrl!Microsoft::Windows::Performance::CPerfTrackMetadataEventTraceExtender::OnBeforeEvent+0xb8
00f01940 002d39d8 00000020 perfctrl!Microsoft::Windows::Performance::CEventTraceExtenderBase::OnBeforeEvent+0x1e
00f01940 002d39d8 00000020 perfctrl!Microsoft::Windows::Performance::CWinSATEventTraceExtender::OnBeforeEvent+0x3e
00effaf8 002d39d8 00000020 perfctrl!Microsoft::Windows::Performance::CEventTraceExtenderBase::OnBeforeEvent+0x1e
00f01990 002d39d8 00000020 perfctrl!Microsoft::Windows::Performance::CEventTraceExtenderBase::OnBeforeEvent+0x1e
00f01990 002d39d8 00000020 perfctrl!Microsoft::Windows::Performance::CVolumeMappingEventTraceExtender::OnBeforeEvent+0x5c
da0e60dd 002d3b80 002d39c8 perfctrl!Microsoft::Windows::Performance::CEventTraceRelogger::ETW_EventCallback_OtherEvents+0xb8
002f7b78 002d39d8 00000000 ADVAPI32!EtwpDoEventCallbacks+0x3a
001feda4 001feca4 00000001 ADVAPI32!EtwpProcessTraceLogEx+0x9b
001feda4 011feca4 00000001 ADVAPI32!EtwpProcessTraceLog+0x113
002f7ef0 00000001 00000001 ADVAPI32!ProcessTrace+0x2bc
6e04299b da0e656a 00000000 perfctrl!Microsoft::Windows::Performance::CEtwTraceHandleGroup::ProcessTraces+0x60
da0e656a 00000000 00f01a30 perfctrl!Microsoft::Windows::Performance::CEventTraceRelogger::HrProcessTrace+0x5
00000000 001ff138 6e043bac perfctrl!Microsoft::Windows::Performance::CEventTraceRelogger::InternalProcessTraces+0xa3d
00f01a30 0006460a 00000001 perfctrl!Microsoft::Windows::Performance::CEventTraceRelogger::ProcessTraces+0x1d
00000001 002c1fd8 0006460a perfctrl!MergeEtlEx+0x33d
002c1fd8 00000001 000fffff perfctrl!MergeLogFiles+0x23
001ff2e8 00000001 0006460a perfctrl!CreateMergedETLFileEx+0x53
001ff2e8 00000001 0006460a perfctrl!CreateMergedETLFile+0x19
00065f48 da103f37 00000000 xperf!CStopTrace::Execute+0x3d8
001ff974 01223347 00000003 xperf!wmain+0x24b
7ffdf000 001ff9c0 775519bb msvcrt!_initterm+0x13
7ffdf000 77516c4f 00000000 kernel32!BaseThreadInitThunk+0xe
012233b4 7ffdf000 00000000 ntdll!__RtlUserThreadStart+0x23
012233b4 7ffdf000 00000000 ntdll!_RtlUserThreadStart+0x1b
eax=001fd56c ebx=c0150008 ecx=00000000 edx=001fd52c esi=00000000 edi=7ffde000
eip=75ddaecb esp=001fd53c ebp=001fd7a0 iopl=0         nv up ei pl nz na po nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000202
kernel32!CreateFileW:
75ddaecb 8bff            mov     edi,edi

0:000> du 001fd594
001fd594  “C:\Windows\system32\ntkrnlpa.exe

It looks like Xperf called into perfctrl.dll to do some post-trace merging work, and that’s when images information was saved into the ETL. The very first function in the stack belong to perfctrl.dll is CreateMergeETLFile(), and it is actually exported from that DLL. Can we call this function directly? Let’s use IDA Pro to load perfctrl.dll and get the signature of that function:

CreateMergedETLFile

typedef unsigned long (__stdcall *PFN_CreateMergedETLFile)(unsigned short const * * const,unsigned long,unsigned short const *);

It looks like all the parameters are straightforward enough (even though I am not sure why the first one is pointer to pointer) and actually we can call it directly from our code:

typedef unsigned long (__stdcall *PFN_CreateMergedETLFile)(unsigned short const * * const,unsigned long,unsigned short const *);

#ifndef _WIN64
LPCSTR FNNAME_CreateMergeETLFile = “?CreateMergedETLFile@@YGKQAPBGKPBG@Z”;
#else
LPCSTR FNNAME_CreateMergeETLFile = “?CreateMergedETLFile@@YAKQEAPEBGKPEBG@Z”;
#endif

HMODULE hPerfCtrl = LoadLibrary(pszPerfCtrlDll);

PFN_CreateMergedETLFile pfn = (PFN_CreateMergedETLFile)GetProcAddress(hPerfCtrl, FNNAME_CreateMergeETLFile);
if(NULL != pfn)
{

   LPCWSTR pszPathTraceNT = <ETL file specified in NT kernel logging session>;
   (*pfn)((unsigned short const * *)&pszPathTraceNT, 1, (unsigned short const *)pszPathTraceNTMerged);

}

FreeLibrary(hPerfCtrl);

I put above code in, and it worked perfectly. Now I get full stack information, and in the Image Summary Table, the column Checksum now has data which was previously missing.

good_stack

I will take a further look into how to replay recorded events in an ETL file when having time. It probably requires the program to act as consumer and provider at same time when transferring events from one ETL file to another.

Stack walk on Vista

I am aware that TraceSetInformation is only available on Windows 7. The fact that Xperf can collect stack trace on Vista shows that this feature has been built into the Vista kernel, and just there is no API to enable it. I decided to do some debugging to find out how Xperf did it.

I used WinDbg to launch Xperf and start a NT kernel logging session, breakpoints were set on all relevant ETW APIs, and only advapi32!StartTraceW was hit. So this is the only API Xperf called on Vista and actually Xperf passes some very interesting value in the parameters if stack walk was enabled:

00184ec0 00184cb8 0037ece8 ADVAPI32!StartTraceW
00000001 01217e7c 25dab231 perfctrl!KernelLoggingSession::ControlTracing+0x275
25dab231 001848d0 00000005 perfctrl!KernelLoggingSession::StartTracing+0x7
00000000 00000005 00000003 xperf!CLoggerSettings::Start+0x4ac
00185ce8 25dab2e1 00000000 xperf!CStartTrace::Execute+0x34
0029f88c 01223347 00000005 xperf!wmain+0x24b
7ffd5000 0029f8d8 775519bb msvcrt!_initterm+0x13
7ffd5000 7743de91 00000000 kernel32!BaseThreadInitThunk+0xe
012233b4 7ffd5000 00000000 ntdll!__RtlUserThreadStart+0x23
012233b4 7ffd5000 00000000 ntdll!_RtlUserThreadStart+0x1b

0:000> dt /r EVENT_TRACE_PROPERTIES 0037ece8
ole32!EVENT_TRACE_PROPERTIES
   +0x000 Wnode            : _WNODE_HEADER
      +0x000 BufferSize       : 0x5bc
      +0x004 ProviderId       : 0
      +0x008 HistoricalContext : 0
      +0x008 Version          : 0
      +0x00c Linkage          : 0
      +0x010 CountLost        : 0
      +0x010 KernelHandle     : (null)
      +0x010 TimeStamp        : _LARGE_INTEGER 0x0
         +0x000 LowPart          : 0
         +0x004 HighPart         : 0n0
         +0x000 u                : _LARGE_INTEGER::<unnamed-tag>
         +0x000 QuadPart         : 0n0
      +0x018 Guid             : _GUID {9e814aad-3204-11d2-9a82-006008a86939}
         +0x000 Data1            : 0x9e814aad
         +0x004 Data2            : 0x3204
         +0x006 Data3            : 0x11d2
         +0x008 Data4            : [8]  “???”
      +0x028 ClientContext    : 1
      +0x02c Flags            : 0x20000
   +0x030 BufferSize       : 0x40
   +0x034 MinimumBuffers   : 0x40
   +0x038 MaximumBuffers   : 0x140
   +0x03c MaximumFileSize  : 0
   +0x040 LogFileMode      : 1
   +0x044 FlushTimer       : 0
   +0x048 EnableFlags      : 0x80ff0488
   +0x04c AgeLimit         : 0n15
   +0x050 NumberOfBuffers  : 0
   +0x054 FreeBuffers      : 0
   +0x058 EventsLost       : 0
   +0x05c BuffersWritten   : 0
   +0x060 LogBuffersLost   : 0
   +0x064 RealTimeBuffersLost : 0
   +0x068 LoggerThreadId   : (null)
   +0x06c LogFileNameOffset : 0x280
   +0x070 LoggerNameOffset : 0x78

First, the buffer size is unusually big, and secondly, the enabled flags has 0x80000000 set, which is not documented. Evntrace.h has its definition as following which doesn’t really help:

#define EVENT_TRACE_FLAG_EXTENSION          0x80000000  // Indicates more flags

A dissemble of  ADVAPI32!StartTraceW shows that it called another function ADVAPI32!EtwpCheckFlagExtensions internally:

ADVAPI32!StartTraceW+0x243:
76246632 0fb745ec        movzx   eax,word ptr [ebp-14h]
76246636 0fb7ff          movzx   edi,di
76246639 8945e0          mov     dword ptr [ebp-20h],eax
7624663c 03c7            add     eax,edi
7624663e 8d8400b4000000  lea     eax,[eax+eax+0B4h]
76246645 894510          mov     dword ptr [ebp+10h],eax
76246648 8d4510          lea     eax,[ebp+10h]
7624664b 50              push    eax
7624664c 53              push    ebx
7624664d e8e0fcffff      call    ADVAPI32!EtwpCheckFlagExtensions (76246332)
76246652 85c0            test    eax,eax
76246654 8945fc          mov     dword ptr [ebp-4],eax
76246657 0f85d4010000    jne     ADVAPI32!StartTraceW+0x486 (76246831)

Step into it showed following code:

762841b5 0f8499000000    je      ADVAPI32!EtwpCheckFlagExtensions+0xbd (76284254)
762841bb 0fb74148        movzx   eax,word ptr [ecx+48h]   ds:0023:0037ed30=0488
762841bf 6685c0          test    ax,ax
762841c2 0f848c000000    je      ADVAPI32!EtwpCheckFlagExtensions+0xbd (76284254)

It turns out that if EVENT_TRACE_FLAG_EXTENSION was set in enabled flags, ADVAPI32!StartTrace will know that there are undocumented additional data in the Properties parameter which has a minimum size requirement of 0x488. If you call ADVAPI32!StartTrace with EVENT_TRACE_FLAG_EXTENSION set in enabled flags but didn’t supply additional data, ADVAPI32!EtwpCheckFlagExtensions will catch it fail the call with error code 87  (invalid argument):

ADVAPI32!EtwpCheckFlagExtensions+0xbd:
76284254 6a57            push    57h
76284256 58              pop     eax
76284257 e9f020fcff      jmp     ADVAPI32!EtwpCheckFlagExtensions+0x9a (7624634c)

It looks like Xperf used some undocumented structure and flag on Windows Vista to enable stack walk in kernel event tracing. This leaves us only two choices: option 1, dump the memory Xperf provided to StartTrace and use it as template in our code, modify the enabled flags and output file name accordingly. Of course we still need to figure out the undocumented data structure and its meaning if we want to control the stack walk options. Option 2, use Xperf.exe. Xperf.exe and perfctrl.dll are the only two binaries required if you just want to enable trace and get merged result. There is no installation required and they can be run from anywhere as long as they are in same folder.

Enable NT kernel logging session during boot time

To enable NT kernel logging session at boot time, all you need to do is to create couple of registry entries and they are well (sort of) documented in MSDN article: Configuring and Starting the Global Logger Session. Following is an example:

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\WMI\GlobalLogger]
“Start”=dword:00000001
“FileName”=”c:\\test\\boot_PROC_THREAD+LOADER+DISK_IO+DISK_IO_INIT+DPC+CSWITCH+DISPATCHER+PROFILE_1_km_premerge.etl”
“EnableKernelFlags”=hex:07,07,00,00,86,02,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00,00
“StackWalkingFilter”=hex:2e,0f,00,00,24,05,00,00,32,05,00,00
“BufferSize”=dword:00000400
“MinimumBuffers”=dword:0000012c
“MaximumBuffers”=dword:00000258
“ClockType”=dword:00000001
“MaxFileSize”=dword:00000000
“LogFileMode”=dword:00000000
“Status”=dword:00000000

The registry above was actually created by XbootMgr, I exported it after reboot. The filename implies the meaning of EnabledKernelFlags value. “StackWalkingFilter” is not documented anywhere and XbootMge mystically just knows how to set it. I enabled stack walking on profile, cswitch and readythread in this example, do they look familiar with the code example I showed earlier for TraceSetInformation()?

Once the registry was set, and at the next boot, Windows will start NT kernel logging session automatically and write events to the file specified. The status will have value 0 if everything went smoothly, or it will be an error code.

Stopping the trace is again as simple as calling ControlTrace() with last parameter set to EVENT_TRACE_CONTROL_STOP. Another option is to call Xperf with parameter “-d output.etl”, Xperf will stop NT kernel logging session and merge result into output.etl with image information, you are going to need it anyway.

Remember to delete the registry entry and the temporary ETL file.

Enable other event providers at boot time

The ETL files generated by XBootMgr have some other information like ReadyBoost and Services, which are provided by some ETW providers that cannot be put into NT kernel logging session. It does do by putting these providers into an AutoLogger session, it will also generate an ETL file, which will be merged by Xperf into final result at the end of tracing. The following are some registry entries created by XBootMgr to start an autologger session:

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\WMI\Autologger\XBootMgrSession]
“Start”=dword:00000001
“GUID”=”{1a15901b-0686-4433-955f-2ee7f9c61cfc}”
“FileName”=”c:\\test\\boot_PROC_THREAD+LOADER+DISK_IO+DISK_IO_INIT+DPC+CSWITCH+DISPATCHER+PROFILE_1_um_premerge.etl”
“MinimumBuffers”=dword:00000004
“MaximumBuffers”=dword:0000000c
“MaxFileSize”=dword:00000000
“BufferSize”=dword:00000400
“ClockType”=dword:00000001
“Status”=dword:00000000

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\WMI\Autologger\XBootMgrSession\{0063715b-eeda-4007-9429-ad526f62696e}]
“ProviderFriendlyName”=”Service Control Manager”
“Enabled”=dword:00000001
“MatchAnyKeyword”=hex(b):00,00,00,00,00,00,00,00
“EnableLevel”=dword:00000000
“Status”=dword:00000000

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\WMI\Autologger\XBootMgrSession\{a6ad76e3-867a-4635-91b3-4904ba6374d7}]
“ProviderFriendlyName”=”ReadyBoost”
“Enabled”=dword:00000001
“MatchAnyKeyword”=hex(b):40,00,00,00,00,00,00,00
“EnableLevel”=dword:00000000
“Status”=dword:00000000


Once the registry was set, at the next boot, Windows will start an ETW session, enable specified providers and put them into the session, and events will be written to the file specified by registry. You can stop tracing by calling ControlTrace(), then call Xperf with “-merge” parameter to merge the result into another ETL file, e.g. the one from NT kernel logging session.

Summary

Now I am able to enable NT kernel logging session and collect logs with stack walking enabled, and do the same at boot time with bonus of additional data like ReadyBoot. Most of the time were actually spent on debugging and try to reverse engineering what Xperf was actually doing. Anyway the effort is paid off now.

Advertisements

Posted on August 17, 2012, in Uncategorized. Bookmark the permalink. 2 Comments.

  1. Hi,, I am facing some problem with autologger session. Not able to log.
    I have the code changes ready and working fine with logman .. where the messages are logged by dotracemessage/. added registry entry for autologger
    during boot file is getting created but it is only 528kb, and if I parse is, there is no data. can you please let me know what I might be missing..

  2. That is great content. Your article really should be higher ranked by Google. I just stumbled across it while searching for a way to enable a kernel session with xperf which has another name than NT Kernel Logger. Apparently there are some limitation built into xperf. …

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: