Nov 272013
 

Hard bugs come in a number of forms. Every experienced programmer has a few war stories in their pocket. Twenty years on, my bloodiest war stories have been replaced a few times as a new, even bloodier stories were written. This is only the most recent to make the cut.

It’s not my fault, I swear!

The C++ code in question came about over a number of years by the hands of many programmers. It wasn’t my fault, but I’m not completely clean of it either. It was in a content-loading system that everyone knew was terrible, but no one had taken the initiative to fix. I’m saying this so when you read this and say, “But if only you had written your code better, you never would have been in this mess in the first place,” I can reply that your’re right, but we don’t all live in a clean, perfect world, and you should get down in the muck with the rest of us.

Symptoms

This code ran fine on all iOS devices (the iPhone 5 was the latest at the time) and almost all Android devices. The only two devices that had problems were the recently-released Galaxy S4 and the HTC-One. Something about those devices was leading us down a dark path where the code would occasionally hang sometime during the initial content load sequence. Oh, and it only happened in optimized release builds – you didn’t think this would be THAT easy, did you?

Initial Investigations

Diving right in, there’s no debugger available, so it’s logging output only. The first observation was that this hang wasn’t actually a hang at all. The main thread was still happily rocketing around it’s little track servicing the data-loading jobs. Strangely, once the bug was happening, there weren’t any more jobs to service, but the main thread was convinced that there was more work to do.

As it happens, the last job to be serviced was always seemed to be data decompression. This particular code had a single worker thread dedicated to decompression. Fixed-sized buffers were read from storage and passed to the decompressor. The resulting decompressed data is then passed back to the main thread in another buffer along with the number of bytes it contains.

The most annoying part of this bug quickly became apparent. As I added logging statements, the bug would sometimes vanish entirely. Adding output inside a certain window of the thread-handling code would cause the bug to vanish.

The inter-thread communication looked something like this:

Main thread

ReadBuffer( mInputBuffer, bufferSize );
mOutputBuffer = GetAvailableBuffer();
mTotalDecompressed = 0;
mDecompressionDone = false;
signalCompressionThread();
// Go do other stuff

Decompression thread

IterateForever
{
    waitForSignal();
    mDecompressedBytes = decompressData( mInputBuffer, mOutputBuffer );
    mDecompressionDone = true;
}

Later, back on the Main thread

// Do lots of job processing
if ( mDecompressionDone )
{
    // The decompression job is done, so process the results
    mTotalDecompressed += mDecompressedBytes;
    if ( mTotalDecompressedBytes == mExpectedDecompressedBytes )
        FetchNextDecompressionJob();
}

The primary gate is the mDecompressionDone flag, and we care about the mDecompressedBytes to move things along. Super-janky, right? But it should work, right? The bug seemed to imply that the sum of all mDecompressedBytes didn’t equal the expected value. Unfortunately, logging the values shows they’re always spot on – even when the hang happens. Weird. A little too weird.

I remembered seeing Bruce Dawson give a talk about the perils of lock-free programming. The primary take-away was that the actual order of reads and writes can be changed by the CPU. So for example, even if you think the logic looks good, your data may not be completely consistent between cores. If that was happening here, the window to see the problem would be exceptionally small, and my usual trick of using a sleep to widen the window would be ineffective. I had no shortage of ways to make the problem go away, but I still needed a way to prove what the problem actually was.

Enter the Network Side-Car

Logging statements are actually pretty expensive relative to the timescales we’re talking about here, so that was right out. I needed a way to log information without invoking the normal logging machinery. Making matters worse, this was also only happening on a mobile phone. My solution was to create a special low-overhead logging object that sat on a network port and waited for a connection.

Bingo!

The formerly-correct mDecompressedBytes was returning zero just before the hang. To prove my case, I added a little more code to detect the zero and then wait until the value became non-zero. And success! This was suddenly a clear-cut case where the memory writes to mDecompressedBytes and then mDecompressionDone were inflight at the same time from one thread on one thread, and then being observed from the main thread in the reverse order.

The fix is really quite simple once you know what’s going on. Memory write barriers are used to ensure that all pending memory writes before the barrier complete before any of the writes after it. Inserting a memory barrier does the trick here. Slightly less exotically, you can use simple atomic-writes on mDecompressedBytes which generally implies a memory barrier.

So why didn’t this problem show up earlier? After all, the problematic code had been around for years and actually shipped in multiple products. It’s hard to say with complete certainty, but I can explain some of it. X86 CPUs generally don’t have the write-pass-write problem, so PC and Mac ports were likely immune. The iOS and Android devices we were using were all ARM CPUs, but until the Galaxy S4 and the HTC One, all the devices were only dual-core. Further, I’ve seen some anecdotal evidence that while Android phones were often multicore, some of them restrict all threads from a single application to a single core which would prevent this issue. Perhaps the quad-core processors lift that restriction.

So why didn’t this show up for other Android devices or the plethora of iOS devices? It turns out the nature of the data becomes important. The compressed data we’ve been dealing with was image data, and the two problematic phones were the first to have super-high 400+ dpi pixel densities. We were using higher-resolution content on those phones, and the fixed-sized decompression input buffer was no longer enough to hold an entire compressed image. Because the decompression required multiple jobs to complete, the book keeping logic was suddenly more susceptible to the memory-write problem that was always there.

Jun 082011
 

Download the source code for this post (19k)

Alright then…Part 4…I think it’s time we tie the first three parts together. We want to use the type of add-in that was introduced in part 3 to fill in the knowledge gaps of the add-ins we played around with in parts 1 & 2. Furthermore, for simplicity, we want to literally combine the two types of add-ins into one DLL.

Filling in the Gaps

In part 2, I spent some time talking about all the things we don’t know when we get a raw pointer from a target process through the AutoExp.Dat file. We don’t know:

  1. The target platform’s endianess
  2. The size of an int or pointer on the target platform
  3. How to access any global symbols because there’s no general way to know where they reside in memory.
  4. The data layout for a type in the target process

When our OnEnterBreakMode handler is triggered, we want to quickly gather all the extra information we might need in anticipation of our autoexp.dat handlers being triggered. We don’t know exactly what pointers will be sent to those handlers, but we can answer our nagging questions that are common to all of them.

Endianess

Unfortunately, endianess isn’t something we can just ask the debugger about, but it is almost always something we can infer. The scope of our work here covers Xbox360 debugging as well as Win32/Win64 debugging. The Windows-targets will use the AutoExp.Dat file in <VisualStudioRoot>\Common7\Packages\Debugger while the Xbox360 will use the one in <XDKROOT>\bin\win32. You can use this difference to link our MyType structure to two different handler functions in our add-in.

That’s probably the simplest way, but there are others. If you dig through the EnvDTE::Debugger object, you’ll see that you can get a Process2 object for the current target process and then ask it about the transport being used. That can also tell you if you’re attached to an Xbox.

Sizes of Types

At the end of part 3 this series, I mentioned that the Debugger object has a GetExpression() function. This thing is pure gold. You can feed it strings, and it returns the evaluations of those strings just like you had typed them into the watch window. It’s a programmatic way to evaluate expressions within the context of the target process. So what if our add-in asked it to evaluate this?

CComBSTR const bsSizeOfVoidPtr(L"(int)sizeof(void*),d");
CComPtr<EnvDTE::Expression> pExpr=NULL;
hr = pDebugger->GetExpression(bsSizeOfVoidPtr,  // our string as a BSTR
                              VARIANT_FALSE,    // Don't use autoexp.dat
                              -1,               // no timeout
                              &pExpr);          // result

It would definitively tell us how large our target platform thinks a pointer is, and allow us to query nested pointer types safely.

Since this is the sort of question that will only ever have exactly one answer for the lifetime of a given process, we only want ask it once and then cache the result for later use. Performance isn’t a huge concern inside the debugger, but we can’t ignore it either.

Global Variables

Globals are fairly similar to data sizes in the way we evaluate them. Let’s say our string table example from part 2 was instead held in a global variable. If we knew the name of that global, then we could feed it into GetExpression and get the current base pointer of the table.

Bear in mind that a table pointer might change if the table is reallocated for any reason, but the pointer to the base pointer will reside in a single place throughout the lifetime of the target process. Depending on the nature and expected lifetime of your data, you should cache it appropriately. Unfortunately, even if a value changes only once in a blue moon, you still need to check it ever time.

Data Layouts

Let’s say you know there is a string data member on your object called StrId that is perfect for displaying in the watch window. You won’t ever change it’s name, but you’re less sure about it being shifted around inside of your object.

If you look elsewhere on this blog, you’ll find an interview question that I ask a lot. I posted it before I posted this entry because the contents are important to what we’re doing here. You can use the answer to the struct-offset question along with the GetExpression function to give you the answer you’re after for the target process.

CComBTR const bsOffsetToStrId(L"(int)&(((MyType*)0)->StrId),d");

If we can figure out the critical part of our data layouts at debug-time, we no longer have to mirror our data in our AutoExp.Dat add-in. We don’t have to worry about things getting out of synch because we can ask the right questions against the actual target every time.

A final word about these strings we’re passing into GetExpression. As I said, they’re evaluated as though you typed them into the watch window. That means they’re subject to the platform’s specific expression evaluator nuances as well as the Hex-display button. Because these things are somewhat unpredictable and difficult to account for, I’ve taken to completely specifying what I want to get back. That’s why I’ve explicitly cast my expressions to integers and suffixed my expressions with “,d”  to force the display to decimal. It’s all about predictability.

Tying It All Together

As I’ve already mentioned, we have we have two types of add-ins across the first three installments. The first add-in is a Visual Studio extension that can extract lots of information from the target process. The second can’t extract much information, but has the ability to augment the watch window’s output. It’s no secret that we want the first add-in type to cache some data that’s useful to the second.

It turns out that sharing data is a pretty simple thing to do. Anywhere that you can stash data in Windows can be used to share this information between the two DLLs. The registry, named-pipes, temporary files, swap-backed files, shared memory, etc are all viable ways to make the connection, but they’re not the way I’ve used in the past. When I originally wrote the FNameAddin, I used a simple assumption and a shared secret.

I assumed that the COM add-in was always loaded at debugging time. When the AutoExp.Dat add-in is loaded, it calls LoadLibrary() on the COM DLL. I only had to know what it was called – not where it was installed – because I could assume it was already loaded. Then I could call GetProcAddress on my special GetTheDataCache() function to get do the data transfer. Everything worked fine, but there were a few caveats that needed attention.

Because I had two DLLs, I had to keep their versions in synch or at least make sure they could handle the edge cases where the versions were out of synch with one another. I also had to make sure I was installing the AutoExp.Dat add-in to an appropriate folder so it would be found by the debugger.

A Key Insight, and An Unintended Benefit!

It’s not hard, but it can be a lot easier – we’re making two DLLs here, why can’t they simply be combined into the same DLL? That way, they’d share the same memory space which makes these coupling shenanigans completely unnecessary. In fact, by combining the DLLs, we don’t even have to worry about putting the AutoExp.Dat DLL in the correct place.

Remember all that messiness in part one where we used an absolute path to our add-in DLL? That requirement just went away. Because our DLL will always be loaded by virtue of being a registered Visual Studio add-in, the AutoExp.Dat mechanism won’t ever have a problem finding it. We can put it anywhere we want as long as it’s registered with Visual Studio to be loaded on start-up. Not too shabby!

The Sample

The sample code is really a re-work of the code from part 2, but it uses all of the Visual Studio extension framework we discussed in part 3. It accomplishes the same fundamental task as part 2, but it doesn’t rely on any shared secrets with respect to types or any fragile linkages. Instead, it derives all of its data at runtime using the methods discussed above. It should also be reasonably portable between Win32, Win64, and Xbox targets.

There’s a lot more code there than in the other samples, but take your time to follow what it’s doing. The sample code also includes a trivial target application. Once you compile the add-in project and use Visual Studio as the debugging target as we discussed in the other parts of this series, load the target-app project in the second Visual Studio instance. It should give you all the breakpoints you need to exercise the add-in.

Be sure to add the correct lines to the right instance of AutoExp.Dat:

VS 2008 %PROGRAMFILES%/Microsoft Visual Studio 9.0\Common7\Packages\Debugger\autoexp.dat
MyType=$ADDIN(MT_Addin4.dll,HandleMyType_WinXX);
VS 2010 %PROGRAMFILES%/Microsoft Visual Studio 10.0\Common7\Packages\Debugger\autoexp.dat
MyType=$ADDIN(MT_Addin4.dll,HandleMyType_WinXX);
Xbox360 %PROGRAMFILES%/Microsoft Xbox 360 SDK\bin\win32\autoexp.dat
MyType=$ADDIN(MT_Addin4.dll,HandleMyType_Xbox360);

Conclusion

That should just about wrap it up for this series on Visual Studio add-ins. I’ve covered enough of the techniques used in the FNameAddin that you should be able to adapt them to your projects. I hope you’ve found it useful. Please let me know if you have any questions or problems with the sample code.

 

May 212011
 

Download the source code for this post (10k)

In the previous installments of this series, we worked completely within the confines of the AutoExp.dat framework which was pretty limiting. That’s not really surprising given that that mechanism dates back at least 15 years as of this writing. That was back before the dot-com bubble happened, I think most people still had non-vestigial tails, and dinosaurs still roamed the earth in the some rural areas. It’s been awhile. In this installment, we’re going to take things up a notch by using mechanisms that are only a mere 10 years old – that’s right, a time after most of you were born. Crazy!

We want to deal with several of the big limitations and unknowns we ran into in the previous two parts. Unfortunately, that will have to wait until part 4. First we need to update the way we hook into the debugger. Microsoft has been developing and expanding the Development Tools Environment framework or ‘DTE’ which is at the heart of Visual Studio extensibility since VS2003. It contains quite a lot of interfaces that allow you to inspect and extend several areas of Visual Studio. The scope of what is available is actually pretty impressive. This is one of the interfaces that make products like Visual Assist X possible, But trust me, there’s a lot more to it!

Within DTE, you can do simple things like add new menus and buttons, or you can write more complex tasks like traversing the solution hierarchy. You can also write code that automates certain coding tasks and even add an entirely new language. Clearly, we’re not going to use most of this stuff. For our part, we really only care about hooking some events in the debugger and then doing some simple expression evaluation.

So what’s the plan here, anyway? First, we want to get a new add-in project up and running that will host our next steps. Thankfully, the new-project wizard has a decent template to accomplish most of that. Then we want to get access to the debugger to tell us when the user is debugging something.

A Brave New Project

If you select File->New->Project and then look through the project templates under “Other Project Types” and then “Extensibility”, you’ll find “Visual Studio Add-in” as an option. There are several options in the ensuing wizard, but most of them are just nice-to-have features. For this project, I selected a C++/ATL project and disabled as many of the add-on features as possible.

Right out of the gate, we get a many-filed project with lots  of inscrutable plumbing. You can download the new project here (10k). Let’s do a quick run-down of what’s in there:

stdafx.h/.cpp – As usual, this is the special header that drives the precompiled headers feature in the compiler. If you look closely, however, you’ll see several #import statements. Depending on your version of Visual Studio, they might contain some messy GUIDs or they might contain references to files like “dte80a.olb”. In either case, these are the large sets of interfaces that we’re going to be using  to integrate with Visual Studio.

Addin.rgs – This odd little file is actually a registry fragment that will be used to register your add-in with Windows. Don’t panic, but we’re actually making a registered component here. The registry is the mechanism that Visual Studio uses to find your add-in.The top section registers your component with Windows while the bottom section registers your add-in with Visual Studio. Don’t worry, this stuff will happen automatically.

Addin.cpp – It might seem like this is where you’ll put the important code in your add-in, but it actually just contains some basic plumbing code that allows your add-in to be a DLL and to register itself as a control.

Addin.idl – IDL stands for interface definition language. It’s used to generate data marshaling code for your control’s type library. Again, don’t worry about this file. We won’t be editing it.

Connect.h – This is where the CConnect class lives. It is the main class for your add-in. It will implement the Visual Studio interfaces that we want to use. Notice that CConnect already derives from something called IDTExtensibility2. This is the main interface that makes a class into a Visual Studio Add-in.

Connect.cpp – THIS is where all your awesome code can go, but notice what’s in there already. It has functions called OnConnection and OnStartupComplete. These functions are part of Visual Studio’s IDTExtensibility2 interface that was mentioned above. These functions will be called at various times as your add-in is loaded. Also note in OnConnection, there is some code to fill in a class variable called m_pDTE. This is the access point to most of the things we care about in Visual Studio.

When you build, two non-standard things will happen. First, a few more files will be generated. This is the IDL file doing its thing and creating its glue code. This code is generated every time you compile, so don’t bother changing it. You don’t even have to add it to the project, so just ignore it unless you’re curious what’s in there. The second thing that will happen was your new control should have been quietly registered with Windows. This is important when you start debugging your code. It doesn’t matter if Visual Studio is pointing to the Debug build profile. It will debug whatever profile was built most recently. Keep that in mind as a possible gotcha when things seem to be going all wrong.

Just like last time, if you run your add-in, you will start a new copy of Visual Studio – it’s like launching an aircraft carrier to test out the galley appliances. Go ahead and look under “Tools->Add-in Manager…”. You should see MT_Addin3 on the list of add-ins and it should be activated. If you put some breakpoints in Connect.cpp before you ran, you should have gotten a break or two in OnConnection, OnStartupComplete, or one of the other IDTExtensibility2 handlers.

Hooking a Debugger

(Easier than debugging a hooker! – hey-o!)

Everything we’ve touched on so far is just to get an add-in working inside of the Visual Studio framework. We won’t get the notification we really want. We want to get the chance to act just before the watch windows are updated. In other words, we want to be told whenever the target program is halted, and the user can inspect variables. That requires that we hook the debugger events interface.

We only need to hang a few new bits of code on out CConnect class in order for it to handle debugger events. For now, we only care about the event called OnEnterBreakMode. This is the event that is fired every time the debugger stops the execution of the target process. The obvious case is when the debugger encounters a breakpoint, but this function will also be called every time the user steps into, out of, or over a line of code.

You can follow along in the code sample for these changes, I’ve denoted each one with the comments

//*** Hooking debugger events ***/

The first thing we need to do is make out CConnect class derive from the debugger events interface. This is as simple as adding the following line of code to the class inheritance in Connect.h:

IDispEventImpl<0, CConnect, &EnvDTE::DIID__dispDebuggerEvents, &EnvDTE::LIBID_EnvDTE, 8, 0>

That’s a bit of a mouthful, so in the sample code, I use a simple typedef to make it easier.

The next thing we need to do is tell CConnect what events it should expect and then what to do with them. For this, we need to add  a sink-map:

BEGIN_SINK_MAP(CConnect)
 SINK_ENTRY_EX(0, EnvDTE::DIID__dispDebuggerEvents, 3, OnEnterBreakMode)
END_SINK_MAP()

This is essentially saying that when the EnvDTE::DIID__dispDebuggerEvents interface sends event #3 our way, we should shunt it to a function called OnEnterBreakMode. But wait, that seems a little voodoo – just a magic number three? Honestly, COM is hardly my strong suit, and Im still trying to track down the origin of this value. Its likely tied up in the definition of the IDispatch interface thats part of the debugger events object.

The other part of this event-sink mechanism is attached and detached at run-time from the Add-in’s OnConnection and OnDisonnection handlers. We use the DTE interface once again to get access to the events interface and then ask for the debugger events interface specifically. Once we have the debugger events interface, we can simply register that we want to be notified about those events.

Finally, we declare the CConnect::OnEnterBreakMode function in Connect.h and make a stub for it in Connect.cpp.

A Quick Test Drive

Set a breakpoint in the OnEnterBreakMode function and run the project. When the other copy of Visual Studio comes up, load up another project in it, set some breakpoints, and then run THAT project. When one of those breakpoints hits, you should get popped all the way back to the first Visual Studio with the Add-in project. Neat, huh? Now we can really do whatever we want within Visual Studio, so take a look at what’s available. Next time we’ll be looking at the GetExpression specifically.

Ok, that’s it for now. My apologies that this posting was mostly just a building block for what’s to come. I promise we’ll tie some stuff together next time out.

 

Apr 182011
 

Download the source code for this post (5k)

Last time, we made an addin that wasnt very useful admittedly. This time we’ll fill it out a little and discuss some of the caveats involved in these sorts of things.

MyType

Now that we want to query memory, we’ll need to fill out the details of MyType. Let’s go with this:

struct MyType
{
    int Integer1;
    int Integer2;
    char * BigString;
    char * StringList[10];
    int StringIndex;
};

Get the latest project here (5k). Ultimately, we’ll look at three different ways to visualize this structure, but for now, we’ll focus only on the first two integers. The rough set of steps we need are:

  1. Get the pointer value from GetReadAddress()
  2. Query Integer1
  3. Check for read errors
  4. Query Integer2
  5. Check for read errors
  6. Compose the output string and return

If you look at the HandleMyType_TwoIntegers() function, you can see the steps called out. Notice that we check for errors a lot – this is very important! Treat your addin code as though you’re literally adding code to Visual Studio becuase that is in essence what you’re doing. If your addin crashes or scribbles on memory, it can destabilize Visual Studio.

On the bright side, we’re finally querying memory, and it’s pretty straight forward. Unfortunately, the code is hiding a lot of bad assumptions. We’ll get into that a little later after we cover the other two examples.

Example: Big String

The second example is a little more complex. We want to read a single string from a character pointer. This means we’re going to dig through a pointer indirection as well as deal with a string of indeterminate size.

C-strings are a common data type that might be read from a debugger addin, but dealing with them can be problematic. First, we have to get the string-pointer out of the structure and then we have to read the character data. Unfortunately, we don’t know how much data to query; it could be a single character or many kilobytes long. My solution is to query 1k at a time, concatenate the data, and then check the new data for the string terminator.

There is an additional caveat that I’ve encountered, however. On some platforms, if you read off the end of a valid memory page and into an invalid one, the read will simple be truncated. Other platforms will simply return a read-failure. As a result, I always read up to 1k at a time and clip the reads to page boundaries. There’s also the possibility that the string isn’t terminated at all. We don’t want to create a dangerous situation where we keep reading memory until we happen upon a terminating null by accident. So regardless of the data, limit the read to a reasonable length. Certainly limit it to the size of the results buffer.

Example: String Table

The truth is that the second example was too complicated. We could have accomplished it in the autoexp.dat file without any addin dll at all. I included it in order to introduce the string-reading code with minimal complications. The third sample on the other hand can’t be accomplished without one. We have an array of strings and an index. The output should be the string that corresponds to the index. This was meant to mimic standard identifier schemes that are used in some game engines.

The steps we have to take are:

  1. Get the pointer value from GetReadAddress()
  2. Query the StringIndex
  3. Check for read errors
  4. Bounds check the index value
  5. Calculate the offset into the string table at the index
  6. Query the pointer value at that offset
  7. Check for read errors
  8. Handle the NULL pointer case
  9. Query the string bytes at the pointer value
  10. Check for read errors
  11. Compose the output string and return

Assumptions and Caveats

The addins mechanism is fairly useful, but it gives you very little solid information on the nature of the target process or the hardware it’s running on. Earlier, we dealt with a potential problem reading strings, but there are several more places where we have incomplete information.

Fundamenal Data sizes

How big is an integer? It may be 32-bits on your development PC, but it could be 64-bits on the target process. More importantly though, how big is a pointer? The same PC might use 32-bits for some processes and 64-bits for others. Having data with potentially different sizes on the target process can also shift other data around inside of structures. While we’re on the topic of structures, we have to be aware of alignment. The example code has a copy of the structure in the addin, but that’s a cop-out. There are no guarantees that the local version and the target version are the same.

All of this is points to a need to be circumspect in how you structure your queries. Fortunately, there is a way to be 100% sure without making assumptions which we’ll cover in an upcoming post.

Endianess

Visual Studio is a Windows PC application, so the debugger is running natively as little-endian. However, you’re only ever reading raw data from target process memory. That means that all data you read is in the target’s endian-format. If you’re drilling through layers of pointers to pointers, you’ll have to read data, swap endian, read-data, swap-endian, etc etc.

Remember that endianess can apply to wide strings as well. Before we can convert a narrow string, we have to flip each character to be local-endian.

Next time

We’ve gone just about as far as we can go with bog-standard addin dlls in the autoexp.dat file. Next time, we’ll look at creating a more versatile kind of debugger addin that give us more certainty in dealing with the caveats, but with a marked increase in complexity.

Apr 152011
 

Download the source code for this post (3k)

Greetings from Seattle! This is the start of a multi-part series about what I’ve learned about Visual Studio debugger addins while writing the FNameAddin.

First, however, we need to take a minute to discuss some basic stuff that most engineers I’ve met are aware of, but few bother to dig into. The autoexp.dat file is a peculiar beast in that you are encouraged to add to it, yet it is squirrelled away deep in the Visual Studio installation folder. It can be customized for your project, but has to be shared among all projects. All in all, it seems poorly thought out. We’ll blast through the really basic stuff first, so anyone who is seeing this for the first time should read up on the supplementary links.

AutoExp.dat

Deep in the Visual Studio install directory, at <VisualStudioInstall>/Common7/Packages/Debugger, there lives the autoexp.dat file. It’s just a text file, so go ahead and open it up and take a look around if you’ve never done that before. This is where you can give the Visual Studio debugger some hints as to how you want your data to be displayed in the watch window. There are three sections: [AutoExpand], [Visualizer], and [hresult]. We’ll only deal with the first one for the time being.

The AutoExpand section is fairly simple and quite well documented at the top of the file, so I won’t bother with it here. Sufficed to say, you can crack lots of simple data types by just adding them to the mix and possibly using one of the type suffixes. Go ahead and play around with this stuff. It’s very safe in that it won’t destabilize Visual Studio if you get it wrong, the debugger reloads the file every time you start a new debugging session, so feel free to edit and play around – just make a copy of the original file before you start. It’s always nice to revert in a pinch.

I’m far more interested in covering the $ADDIN-dlls functionality that is briefly mentioned. The file recommends looking at Microsoft’s EEAddin sample in order to get started, but you should use some caution. It turns out that EEAddin has been broken for as long as I can remember and will crash if you use it as-is – how unfortunate! So let’s take a different route. Let’s make our own sample: MT_Addin.dll. The process looks like this.

  1. Make a new Win32 DLL project called MT_Addin.
    • I ripped out the PCH plumbing to reduce the file count, but you’ll probably want to leave it in.
  2. Write your handler function using the correct function signature. (see below)
  3. Add a .def file so our handler function is exported with a nice undecorated name.
  4. Add an entry in the autoexp.dat file that references our Type, DLL and entry point
    • MyType=$ADDIN(<MT_AddinProjectDir>\Debug\MT_Addin.dll,HandleMyType)

You can download my sample project here (3k)

Ok, now let’s see what we have. Starting with the cpp file, we have our handler function with the following signature:


ADDIN_API  HRESULT  WINAPI HandleMyType( DWORD /*dwAddress*/
                                        , DEBUGHELPER * pHelper
                                        , int nBase
                                        , BOOL /*IsUnicode*/
                                        , char * pResult
                                        , size_t maxResult
                                        , DWORD /*reserved*/ );

There are a few things to note here. First, we declare the entry point as WINAPI which is just a #define for the __stdcall calling convention. This is what’s missing from the EEAddin sample that causes it to crash. Next, notice that we don’t use the dwAddress or bIsUnicode arguments. It might seem a little strange given that the whole point of this exercise is to look up addresses in the target process, but these are really just legacy arguments.

The most important thing here is the DEBUGHELPER. Strangely, you have to define the type for yourself even though the format isn’t under your control. (You can find it in MT_Addin.h) Fortunately, it’s not that complex and contains very little nuance.

dwVersion – Structure version – <0x200000 for VS6, >=0x20000 for VS7 and beyond
GetDebugeeMemory() – Useful for VS6 otherwise deprecated. Doesn’t support 64-bit pointers
GetRealAddress() – Gets the 64-bit value of the variable to be inspected. (Replaces dwAddress)
GetDebugeeMemoryEx()
– Query raw memory from the target process.
GetProcessorType() – Its use seems clear enough, but it always returns 0 for me.

 

Our example doesn’t do any memory queries at all, but it does create a simple string to tell you it’s working.  Finally, it returns S_OK for success – your addin should always return S_OK even when something goes wrong. When data can’t be interpreted correctly, you should return success and set the output string to “Error processing the data!” because returning an error will only display “???” in the watch window. Think of the return code to mean “The addin succeeded/failed” instead of “processing this data type succeeded/failed”

Installing, Testing, and Debugging

In order for Visual Studio to find your DLL, we have two options – specify an absolute path in the autoexp.dat file or copy our DLL into the “<VisualStudioInstall>\Common\IDE” folder. I recommend the former method because it prevents pollution of the Visual Studio folder, and it facilitates debugging the addin.

In order to test the addin, we need a second project that contains a type called “MyType”. At this point, it really doesn’t matter what MyType contains since our addin doesn’t actually query process memory. We just need the match the type name in order to invoke our handler – so go ahead and try it. If you put a MyType variable in the watch window, it should display, “Handled Data Type!!” Don’t worry, we’ll eventually make some memory queries.

Debugging these sorts of addins is fairly straight forward. Edit the addin project’s Debugging properties so that the Command item points to DevStudio IDE binary – it lives at “<VisualStudioInstall>\Common\IDE \devenv.exe”. When you run, your DLL won’t be loaded, so your breakpoints are invalid initially. Don’t worry about it though. The debugger will load your dll just in time and your breakpoints will go active. In fact, it will load and unload your any time it it’s required.

Next Time

Ok, that’s it for now. I think we have the basics pretty well covered. There’s a bit more to do with the standard addins, but hopefully things get a bit more interesting when we get to the Visual Studio extensibility framework.