The Case of CoCreateInstance Failing Randomly with E_CLASS_NOT_REGISTERED

Last week I investigated an issue in our product where calls to CoCreateInstance failed randomly. The investigation had some twists, the result surprised almost everybody, and the process is worth sharing.

The issue had been showing up for weeks. Developers noticed that CoCreateInstance could randomly return E_CLASS_NOT_REGISTERED. “Random” here means that it could happen on different code paths, with arbitrary in-proc COM objects and on any thread, and that it could recover on its own.

First, we ruled out the possibility that the COM objects were not registered, despite what the error code says. We register these COM objects during product installation, and they stay registered until the product is uninstalled, so the problem had to be somewhere else.

Next, we captured a Process Monitor trace while reproducing the issue. The log showed that the COM library was able to locate the CLSID in the registry, read the InProcServer32 value under it, and open the DLL to map it into memory. That, however, seemed to be where the COM library stopped: nothing in the log indicated that the DLL's DllMain function was ever called.
At this point I started to suspect that the NT loader was somehow failing to load the DLL. The best way to look into an NT loader issue is to enable show loader snaps with gflags. It is as simple as running the following command and then re-running the program under a debugger:

gflags.exe -i program.exe +sls

After enabling show loader snaps and running the product again, this time under WinDbg, I found the following output:

1fac:133c @ 01643484 - LdrpInitializeNode - ERROR: Init routine 000000006CC81844 for DLL "C:\Program Files\Company\abc.dll" failed during DLL_PROCESS_ATTACH
1fac:133c @ 01643484 - LdrpProcessDetachNode - INFO: Uninitializing DLL "C:\Program Files\Company\abc.dll" (Init routine: 000000006CC81844)
1fac:133c @ 01643484 - LdrpUnloadNode - INFO: Unmapping DLL "C:\Program Files\Company\abc.dll"
1fac:133c @ 01643484 - LdrpLoadDllInternal - RETURN: Status: 0xc0000142
1fac:133c @ 01643484 - LdrLoadDll - RETURN: Status: 0xc0000142

So the NT loader was able to map abc.dll into memory, but the DLL's initialization routine returned FALSE to the loader, and the loader in turn failed the load with status 0xc0000142 (STATUS_DLL_INIT_FAILED). Normally in this scenario LoadLibrary returns error 1114 (ERROR_DLL_INIT_FAILED), “A dynamic link library (DLL) initialization routine failed”.

abc.dll was built with Visual Studio using default settings; it statically links the CRT, and its entry point is the CRT function _DllMainCRTStartup. The next logical step was to use WinDbg to step through that function and find out where it failed.
Fortunately _DllMainCRTStartup is not complicated, and I quickly narrowed the failure down to the following function:

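extern "C" bool __cdecl __vcrt_initialize_ptd()
{
    // Reserve a fiber local storage (FLS) index for the VC runtime's per-thread data.
    __vcrt_flsindex = __vcrt_FlsAlloc(&__vcrt_freefls);
    if (__vcrt_flsindex == FLS_OUT_OF_INDEXES)
        return false;   // No FLS slot left: CRT init fails, and so does DLL_PROCESS_ATTACH.

    // Set up the per-thread data for the thread that is loading the DLL.
    if (!store_and_initialize_ptd(&__vcrt_startup_thread_ptd))
        return false;

    return true;
}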

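FlsAlloc returned FLS_OUT_OF_INDEXES, which in turn failed the CRT initialization. A little research showed that Fiber Local Storage has 128 slots per process, and each DLL that statically links the CRT takes at least one of them during CRT initialization. If a process loads enough such DLLs to exhaust the FLS slots, the next DLL that statically links the CRT fails to load, and that is exactly what we ran into. This would also explain why the failure looked random and recovered on its own: whichever DLL happens to be loaded after the slots run out is the one that fails, and slots become available again when DLLs holding them are unloaded.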

Now that we know what is happening, a solution is in progress.


Posted on March 8, 2017, in Uncategorized.
