Visual Studio 2005/2008 debugging with sos.dll
The blog seems to have gone cold, so copying here for good luck.
Vance Morrison's Weblog
Vance Morrison is currently an Architect on the .NET Runtime Team, specializing in performance issues with the runtime or managed code in general.
Digging deeper into managed code with Visual Studio: Using SOS
I have let my blog lapse for too long. I am back to blogging. I realized recently that we have simply not written down many interesting facts about how the runtime actually works. I want to fix this. In future blogs I am going to do a bit of an 'architectural overview' that describes the differences between managed and unmanaged code, but before I do that I realized that I have not even finished a blog entry I started in March.
In my blog How to use Visual Studio to investigate code generation questions in managed code, I talked about how to configure Visual Studio so that you can actually look at optimized code in the debugger (which sadly is not as trivial as you would like), and showed how to look at the disassembly of managed code. Unfortunately managed code is hard to read without a guide, so in this blog I will show you some very useful tips for reading managed assembly code.
In this blog entry I will show you the instructions that ACTUALLY need to be executed to do something as simple as assigning a string to a field of a class. Note that I am assuming a familiarity with X86 assembly code. If you are the type who never wants to read assembly code, you should stop reading now, because most of this blog is a step-by-step explanation of it.
I have attached the file InspectingManageCode.zip, which contains a (trivial) project that I used for this example. You are STRONGLY encouraged to open it (you can browse it; the main file is Program.cs). Copy the files (simply drag the ‘InspectingManagedCode’ directory inside the ZIP to a directory of your choosing), launch the InspectingManagedCode.sln file and run the example. While the project is already set to build and run optimized code, you will still need to turn off ‘just my code’ and turn on JIT optimization as described in my previous blog to follow along.
The code in the attached example is pretty trivial.
class Program
{
    string myString;
    Program() { myString = "foo"; }
    static void Main(string[] args) { Program p = new Program(); }
}
If you were to follow the instructions in the previous blog to see what code was generated for the body of ‘Main’ you would find the following code.
00000000 push esi
00000001 mov ecx,9181F4h
00000006 call FFCB1264
0000000b mov esi,eax
0000000d mov eax,dword ptr ds:[0227307Ch]
00000013 lea edx,[esi+4]
00000016 call 79222B78
0000001b pop esi
At first glance this code has little similarity to the source code: the original source has a call to the constructor ‘Program’ and the assembly code has two calls to strange hex addresses. There are also references to magical numbers like 9181F4H and 0227307CH. In this case the disassembly has not proven to be very valuable. What can we do?
Sadly, if we try to peer into these CALL instructions we cannot; the debugger comes back with the very unhelpful message ‘There is no code at the specified location’. Actually Visual Studio is LYING to you. There really is code there, but it simply will not show it to you. I will show you techniques to get around this.
The key to unlocking the mysteries of managed code is a debug helper called SOS.DLL (it is a DLL that is shipped with the runtime). The DLL is what is called a ‘debugger extension’. Basically it plugs into a debugger and implements functions that are useful for debugging the code it is associated with (in this case the runtime). Other bloggers have also commented on the use of this DLL (do a web search for SOS.DLL for more).
In Visual Studio, you load SOS.DLL by opening the immediate window (Ctrl-D I) and typing
.load sos.dll
If you do this you may get the message
SOS not available while Managed only debugging.
To load SOS, enable unmanaged debugging in your project properties.
This message is actually reasonably helpful. Stop the debugger (Shift-F5), go to Solution Explorer (right-hand pane), right-click on the InspectingManagedCode project file, and select Properties to get the properties pane for the project. If you select the ‘Debug’ tab on the left side you will find three check boxes at the bottom, one of which is labeled ‘Enable unmanaged code debugging’. If you check this, you put the debugger into a mode where it can debug both managed and unmanaged code (which means you can then use SOS.DLL). I have already done this on the InspectingManagedCode project, but you will have to repeat this for any other project where you need to use SOS. (Sadly the instructions for setting the debugger mode are different for C++.) Note that running the debugger to debug both managed and unmanaged code will slow the debugger down a bit (it loads the symbols for all the unmanaged DLLs), so you probably only want to do this on projects like this one where you want to use SOS.DLL.
Now you should be able to set a breakpoint in Main(), run the program (F5), go to the immediate window (Ctrl-D I), type
.load sos.dll
and get the message
extension C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\sos.dll loaded.
If you are curious, SOS.DLL has reasonably good help. If you type the command
!help
it will give you a list of commands, and you can get help on individual commands by specifying the name, e.g.
!help u
will give you help on the ‘u’ (unassemble) command. All SOS commands need to be prefixed by a ! character so that the Visual Studio debugger knows that it is an SOS command and not an immediate value to be interpreted (the normal meaning of text typed in the immediate window).
The unassemble SOS command is the command we are interested in. It will disassemble a managed routine, but do a much better job than Visual Studio does. Unfortunately, we need the address of the routine we want to disassemble, and Visual Studio goes to some length to hide this information. If you look at the disassembly for the code (CTRL-ALT-D), you will see that the address of the routine is never given, only the offset from the beginning of the routine.
The way around this is to use the ‘Registers’ window (Ctrl-D R). I happen to like to put this window just above the immediate window and shrink it so that only the two lines that actually show values are showing. One of the registers is ‘EIP’, which stands for ‘Extended Instruction Pointer’. It holds the address of the current instruction. In my particular invocation EIP has the value 00DE0071, so I can do the command
!u 00DE0071
which will disassemble the ENTIRE routine that the address 00DE0071 lives in. I like to right-click in the immediate window and select ‘Clear All’ before I do this so the only thing in that window is the disassembly. On my machine I get the result
Normal JIT generated code
Begin 00de0070, size 1d
push esi
>>> 00DE0071 B904309100 mov ecx,913004h
call 0090201C (JitHelp: CORINFO_HELP_NEWSFAST)
mov esi,eax
mov eax,dword ptr ds:[022B303Ch]
lea edx,[esi+4]
call 79E73930
pop esi
It is not unlike the version that Visual Studio produced, but there are differences.
You will note that the first ‘call’ instruction is annotated with ‘JitHelp: CORINFO_HELP_NEWSFAST’, which makes it at least a bit clearer that this helper is used to create a new object (and that it is the fast version).
It printed the whole routine that 00DE0071 lives in, and it prints a >>> on the instruction corresponding to the 00DE0071 address.
While it did not print a name for the ‘call 79E73930’, notice that the hex value is different from the value in the Visual Studio disassembly (79222B78). The value in the VS disassembly is simply WRONG (it is a bug no one bothered to fix).
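The relationship between the two displays is plain arithmetic: Visual Studio shows offsets from the start of the routine, !u shows absolute addresses, and the ‘Begin 00de0070, size 1d’ header ties them together. The following is a minimal sketch in C of that arithmetic. The function names are hypothetical helpers invented for illustration; they are not part of SOS or Visual Studio.

```c
#include <assert.h>
#include <stdint.h>

/* Offset of addr from the start of the routine (the number Visual Studio shows). */
uint32_t offset_in_routine(uint32_t begin, uint32_t addr) {
    return addr - begin;
}

/* Absolute address of an offset within the routine (the number !u shows). */
uint32_t absolute_address(uint32_t begin, uint32_t offset) {
    return begin + offset;
}

/* Nonzero if addr falls inside the routine [begin, begin + size). */
int routine_contains(uint32_t begin, uint32_t size, uint32_t addr) {
    return addr >= begin && addr - begin < size;
}

/* Values taken from the !u header and the EIP register in this post. */
void check_example_addresses(void) {
    const uint32_t begin = 0x00DE0070u, size = 0x1Du, eip = 0x00DE0071u;
    assert(routine_contains(begin, size, eip));
    /* EIP is one byte into the routine: the 'mov ecx' line at offset 00000001. */
    assert(offset_in_routine(begin, eip) == 1u);
    /* Offset 16h is the second call in the Visual Studio listing. */
    assert(absolute_address(begin, 0x16u) == 0x00DE0086u);
}
```

Calling check_example_addresses() passes silently, which confirms that the instruction marked with >>> is the one Visual Studio lists at offset 00000001.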
So let’s take a look at the first two instructions
mov ecx,913004h
call 0090201C (JitHelp: CORINFO_HELP_NEWSFAST)
I mentioned that this helper call creates a new object from the GC heap. To do so it needs to know the type of the object to be created. This is what the magic number 913004 is for. Internally in the runtime, types are described by a structure called a MethodTable, and 913004 is the address of the MethodTable for the type to create. We can find out what type 913004 corresponds to by using the !DumpMT (dump Method Table) SOS command.
!DumpMT 913004
Produces the output
Name: Program
(C:\Documents and Settings\vancem\My Documents\Visual Studio 2005\Projects\InspectingManagedCode\bin\Release\InspectingManagedCode.exe)
Number of IFaces in IFaceMap: 0
Slots in VTable: 6
The only part of this output that is interesting at this point is the ‘Name’ field, which, as you can see, indicates that 913004 corresponds to the ‘Program’ type. Thus these first two instructions create a Program object. This object comes back from the helper with all its fields zeroed, so the next instructions in the program are the body of the constructor (the Program() constructor has been inlined into the body of Main()).
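As a rough mental model of what those two instructions do, here is a small C sketch. Everything in it is an assumption made for illustration: the MethodTable struct is a drastically simplified, hypothetical stand-in for the runtime's real structure, calloc stands in for the GC heap allocator, and placing a type pointer at the start of the object is an assumed layout. It only shows the contract described above: a type description goes in, and an object with zeroed fields comes out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-in for the runtime's MethodTable. */
typedef struct MethodTable {
    const char *name;      /* type name, e.g. "Program" */
    size_t      base_size; /* bytes to allocate for one instance */
} MethodTable;

/* Assumed layout: every object starts with a pointer to its type description. */
typedef struct Object {
    const MethodTable *method_table;
} Object;

/* The example's type: the common header followed by its single reference field. */
typedef struct Program {
    Object header;
    void  *myString;
} Program;

/* Models "mov ecx,<MethodTable>; call <new helper>": type in, zeroed object out. */
Object *new_object(const MethodTable *mt) {
    Object *obj = calloc(1, mt->base_size); /* calloc zero-fills the memory */
    if (obj != NULL) {
        obj->method_table = mt;
    }
    return obj;
}

/* Allocates a Program and checks that its field starts out null. */
void check_new_program(void) {
    const MethodTable program_mt = { "Program", sizeof(Program) };
    Program *p = (Program *)new_object(&program_mt);
    assert(p != NULL);
    assert(p->header.method_table == &program_mt);
    assert(p->myString == NULL); /* fields come back zeroed */
    free(p);
}
```

The real helper allocates from the GC heap rather than calling calloc, so treat this purely as a picture of the inputs and outputs.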
The next instructions
mov esi,eax
mov eax,dword ptr ds:[022B303Ch]
Basically implement the statement ‘myString = “foo”’. The helper returns a pointer to the new (not yet constructed) object in the EAX register. The first mov saves this into the ESI register. EAX is then loaded with what is at the address 022B303Ch. This happens to be the string “foo” (more on how it got there in a later blog). You can confirm this by going to the disassembly, setting a breakpoint right after the ‘mov eax,dword ptr ds:[022B303Ch]’ instruction and looking at the value of the EAX register in the ‘Registers’ window. In my example it happens to be the value 012B1D44. You can then use the command
!DumpObj 012B1D44
which will dump the managed object at this address. This will print
Name: System.String
Size: 24(0x18) bytes
String: foo
      MT    Field   Offset                 Type VT     Attr    Value Name
790fa3e0  4000099       10        System.String  0   shared   static Empty
79124670  400009a       14        System.Char[]  0   shared   static WhitespaceChars
Again, most of the output is uninteresting at this point, except the ‘Name’ field (which says it’s a string) and the ‘String’ field (which shows the string value is ‘foo’). So we have confirmed that this instruction loads the address of the ‘foo’ string into the EAX register. What is left is
lea edx,[esi+4]
call 79E73930
The first instruction, ‘LEA’, may not be familiar to you. It stands for Load Effective Address. Basically it works just like a MOV instruction, but instead of moving what was AT the memory specified, it loads the ADDRESS of the memory. Another way of looking at this is to imagine a MOV instruction with the [] dropped (the brackets represent memory fetching). Thus
lea edx,[esi+4]
can be thought of as
mov edx,esi+4
That is, it adds 4 to ESI and places the result in EDX. Now remember ESI points at our newly created ‘Program’ object. We could find out all the fields of this object by dumping it. In my debugger ESI has the value 012B1D5C, so I can do
!DumpObj 012B1D5C
Name: Program
Size: 12(0xc) bytes
(C:\Documents and Settings\vancem\My Documents\Visual Studio 2005\Projects\InspectingManagedCode\bin\Release\InspectingManagedCode.exe)
      MT    Field   Offset                 Type VT     Attr    Value Name
                         4        System.String  0 instance 00000000 myString
Which tells us that ESI points at a ‘Program’ object, that the total size of the object is 12 (more on that in a later blog), and that at offset 4 there is a field called ‘myString’ of type System.String that currently has the value 0 (null). So now we can make a pretty good guess that the LEA instruction is setting EDX to the address of the ‘myString’ field of the Program object.
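The difference between LEA and a MOV from memory can be sketched in C against a toy address space. This is only a model under stated assumptions: the 64-byte memory array, the address 16 for the object, and the value 0xCAFE are all made up for the demonstration, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy 32-bit address space: an "address" is just an index into this array. */
static uint8_t memory[64];

/* Read the 32-bit value stored AT addr. */
uint32_t load32(uint32_t addr) {
    uint32_t value;
    memcpy(&value, &memory[addr], sizeof value);
    return value;
}

/* Write a 32-bit value at addr (only used to set up the demonstration). */
void store32(uint32_t addr, uint32_t value) {
    memcpy(&memory[addr], &value, sizeof value);
}

/* mov dst,[base+disp]: form the address, then fetch what is stored there. */
uint32_t mov_from_memory(uint32_t base, uint32_t disp) {
    return load32(base + disp);
}

/* lea dst,[base+disp]: form the same address, but never touch memory. */
uint32_t lea(uint32_t base, uint32_t disp) {
    return base + disp;
}

void check_lea_vs_mov(void) {
    const uint32_t esi = 16u;   /* pretend the object starts at address 16 */
    store32(esi + 4u, 0xCAFEu); /* recognizable value in the field at offset 4 */
    assert(lea(esi, 4u) == 20u);                 /* the field's ADDRESS */
    assert(mov_from_memory(esi, 4u) == 0xCAFEu); /* the field's CONTENTS */
}
```

Both functions compute base + disp; the only difference is whether the result is used as a pointer to read through or returned as the answer.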
EAX has been set to the ‘foo’ string, and next comes the mysterious
call 79E73930
Ideally SOS would have annotated this helper. It is what we call a ‘WriteBarrier’. More on exactly what a write barrier is later, but for now the important thing to know is that ALL updates to OBJECT REFERENCES that live in the GC heap need to be done by calling a write barrier. Since the Program object lives in the heap, and we are updating an object reference inside it, we need to use the write barrier.
The runtime actually has many write barriers. All of them have an unusual calling convention. They all take the address to be updated in the EDX register. Then, depending on the write barrier, they take the value to store in some other register (this particular write barrier is the most commonly used, and takes its argument in the EAX register). Logically all the write barrier does is (*EDX = EAX), that is, update what EDX points at to be the value in EAX.
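The logical effect of the helper can be written as a tiny C function. This is a sketch under assumptions, not the runtime's code: the two parameters simply play the roles of EDX and EAX, and the call counter is a hypothetical placeholder marking where the real helper's extra GC bookkeeping would go (this post does not describe that bookkeeping).

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder for the helper's GC bookkeeping (an assumption for illustration). */
static unsigned long barrier_calls = 0;

/* dst plays EDX (the address to update); ref plays EAX (the reference to store). */
void write_barrier(void **dst, void *ref) {
    *dst = ref;      /* the logical effect: *EDX = EAX */
    barrier_calls++; /* stand-in for whatever else the real helper does */
}

unsigned long barrier_call_count(void) {
    return barrier_calls;
}

/* Mirrors "lea edx,[esi+4]" followed by the call: store "foo" into the field. */
void check_write_barrier(void) {
    void *my_string_field = NULL; /* the zeroed myString field */
    char foo[] = "foo";
    unsigned long before = barrier_call_count();
    write_barrier(&my_string_field, foo);
    assert(my_string_field == (void *)foo);
    assert(barrier_call_count() == before + 1);
}
```

The point of routing the store through a function is that the runtime gets a chance to observe every reference update, which a bare mov would not give it.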
That is about it for this example. The only instructions we did not cover are the PUSH ESI and POP ESI at the beginning and end of the routine. As anyone who deals with assembly code knows, this is simply saving and restoring ESI, since we used it in the routine itself.
To recap, here are the instructions that actually got executed in the ‘Main’ program and what they do.
push esi                          // save ESI
mov ecx,913004h                   // ECX = MethodTable(Program)
call 0090201C                     // EAX = New Object (Program)
mov esi,eax                       // ESI = this (new object)
mov eax,dword ptr ds:[022B303Ch]  // EAX = “foo”
lea edx,[esi+4]                   // EDX = &this.myString
call 79E73930                     // this.myString = EAX (“foo”)
pop esi                           // restore ESI
We just understood very deeply EXACTLY what happens when a particular piece of managed code executes. Hopefully that wasn’t so bad. Next time we will dig a bit into what this WriteBarrier is and exactly what it does (how expensive is it?). We will also dig into exactly what went on inside the ‘New’ helper. In later blogs I will go into exactly how other runtime features get converted to native code. I hope you are enjoying this peek under the hood of the .NET Runtime.
Published Tuesday, September 05, 2006 7:55 PM by vancem
Filed under: Tools
Great info! Thanks.
BTW, when using windbg + sos to debug, what breakpoint (native: bp / bu) is best
to set in order to use managed breakpoints (thus both !name2ee and !bpmd probably
needed)? With a breakpoint on loading of mscorwks or calling of various CLR functions,
when is the CLR booted up enough so that !name2ee etc. can work?
September 6, 2006 4:14 AM
Using SOS in windbg will be the subject of a future blog; however, I can quickly answer your question.
The !bpmd (Breakpoint MethodDescriptor) command sets a breakpoint on a managed method by name. For example, the command
!bpmd InspectingManagedCode.exe Program.Main
will set a breakpoint in the ‘Main’ routine of the example program in the ZIP file. Note that UNLIKE the !name2ee SOS command (which looks up a method or class by name), the method referenced in the !bpmd command does NOT need to be loaded for the command to work (it sets a ‘deferred’ breakpoint).
However, to use ANY SOS command you need to load SOS, and it turns out that SOS needs the .NET runtime DLL ‘mscorwks.dll’ to be loaded before it can load. There are a variety of techniques you can use. The one I use is to set a native breakpoint at the ‘EEStartup’ method in the .NET runtime DLL ‘mscorwks’. When this breakpoint hits you can do the command
.loadby sos mscorwks
which tells windbg to load sos.dll by searching in the directory where mscorwks lives. Once it is loaded you can execute a !bpmd command.
Finally, if you need !name2ee to work and the module is not yet loaded, you should set a breakpoint (using the !bpmd command) in the module of interest, run to that breakpoint (now it is loaded), and then do the !name2ee command.
September 6, 2006 12:46 PM