Download the source code for this post (5k)
Last time, we made an addin that wasnt very useful admittedly. This time we’ll fill it out a little and discuss some of the caveats involved in these sorts of things.
MyType
Now that we want to query memory, we’ll need to fill out the details of MyType. Let’s go with this:
struct MyType
{
int Integer1;
int Integer2;
char * BigString;
char * StringList[10];
int StringIndex;
};
Get the latest project here (5k). Ultimately, we’ll look at three different ways to visualize this structure, but for now, we’ll focus only on the first two integers. The rough set of steps we need are:
- Get the pointer value from GetReadAddress()
- Query Integer1
- Check for read errors
- Query Integer2
- Check for read errors
- Compose the output string and return
If you look at the HandleMyType_TwoIntegers() function, you can see the steps called out. Notice that we check for errors a lot – this is very important! Treat your addin code as though you’re literally adding code to Visual Studio becuase that is in essence what you’re doing. If your addin crashes or scribbles on memory, it can destabilize Visual Studio.
On the bright side, we’re finally querying memory, and it’s pretty straight forward. Unfortunately, the code is hiding a lot of bad assumptions. We’ll get into that a little later after we cover the other two examples.
Example: Big String
The second example is a little more complex. We want to read a single string from a character pointer. This means we’re going to dig through a pointer indirection as well as deal with a string of indeterminate size.
C-strings are a common data type that might be read from a debugger addin, but dealing with them can be problematic. First, we have to get the string-pointer out of the structure and then we have to read the character data. Unfortunately, we don’t know how much data to query; it could be a single character or many kilobytes long. My solution is to query 1k at a time, concatenate the data, and then check the new data for the string terminator.
There is an additional caveat that I’ve encountered, however. On some platforms, if you read off the end of a valid memory page and into an invalid one, the read will simple be truncated. Other platforms will simply return a read-failure. As a result, I always read up to 1k at a time and clip the reads to page boundaries. There’s also the possibility that the string isn’t terminated at all. We don’t want to create a dangerous situation where we keep reading memory until we happen upon a terminating null by accident. So regardless of the data, limit the read to a reasonable length. Certainly limit it to the size of the results buffer.
Example: String Table
The truth is that the second example was too complicated. We could have accomplished it in the autoexp.dat file without any addin dll at all. I included it in order to introduce the string-reading code with minimal complications. The third sample on the other hand can’t be accomplished without one. We have an array of strings and an index. The output should be the string that corresponds to the index. This was meant to mimic standard identifier schemes that are used in some game engines.
The steps we have to take are:
- Get the pointer value from GetReadAddress()
- Query the StringIndex
- Check for read errors
- Bounds check the index value
- Calculate the offset into the string table at the index
- Query the pointer value at that offset
- Check for read errors
- Handle the NULL pointer case
- Query the string bytes at the pointer value
- Check for read errors
- Compose the output string and return
Assumptions and Caveats
The addins mechanism is fairly useful, but it gives you very little solid information on the nature of the target process or the hardware it’s running on. Earlier, we dealt with a potential problem reading strings, but there are several more places where we have incomplete information.
Fundamenal Data sizes
How big is an integer? It may be 32-bits on your development PC, but it could be 64-bits on the target process. More importantly though, how big is a pointer? The same PC might use 32-bits for some processes and 64-bits for others. Having data with potentially different sizes on the target process can also shift other data around inside of structures. While we’re on the topic of structures, we have to be aware of alignment. The example code has a copy of the structure in the addin, but that’s a cop-out. There are no guarantees that the local version and the target version are the same.
All of this is points to a need to be circumspect in how you structure your queries. Fortunately, there is a way to be 100% sure without making assumptions which we’ll cover in an upcoming post.
Endianess
Visual Studio is a Windows PC application, so the debugger is running natively as little-endian. However, you’re only ever reading raw data from target process memory. That means that all data you read is in the target’s endian-format. If you’re drilling through layers of pointers to pointers, you’ll have to read data, swap endian, read-data, swap-endian, etc etc.
Remember that endianess can apply to wide strings as well. Before we can convert a narrow string, we have to flip each character to be local-endian.
Next time
We’ve gone just about as far as we can go with bog-standard addin dlls in the autoexp.dat file. Next time, we’ll look at creating a more versatile kind of debugger addin that give us more certainty in dealing with the caveats, but with a marked increase in complexity.