Common Intermediate Language (CIL) - What Does It Look Like?

In the previous blog post, we dived into the MSBuild Engine by creating a project file from scratch. The project file is what MSBuild Engine uses to build our application.

In this part of this series, we will review the compiled CIL file that was built previously to understand how the compiler translated our code.

Common Intermediate Language (CIL)

When we compile code written in any .NET language, the associated compiler (for example, the C# or VB compiler) generates a binary called an assembly, which contains IL code. These instructions form a low-level, human-readable language that the run-time compiler converts into machine code the first time the code is executed. The conversion is deferred until execution so that the compiler knows which environment the code is running in and can emit machine code optimized for that platform. This run-time compiler is known as the Just-In-Time (JIT) compiler.

CIL is an object-oriented assembly language. Its instructions are CPU- and platform-independent and can be executed in any environment that supports the Common Language Infrastructure, such as the .NET run time.

Evaluation Stack

Before we look at the IL code, it's important to understand the role the evaluation stack plays in executing CIL instructions.

A stack is a data structure that follows the last-in, first-out (LIFO) principle: the most recently added item is the first one to be removed, as demonstrated in the image below.
[Figure: Last-In-First-Out Data Structure]

The evaluation stack holds values such as local variables or method arguments before they are evaluated. Instructions that copy values from memory to the evaluation stack are called loads, and instructions that copy values from the stack back to memory are called stores. All the opcodes starting with ld load an item onto the stack, and the opcodes starting with st store an item from the stack into memory.
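
To make the load/store idea concrete, here is a minimal sketch in Python. It is a toy model, not how the CLR is implemented: the run function and its tuple-based instruction format are made up for illustration, and only a handful of instruction names (ldc.i4, ldloc, stloc, add) are modeled.

```python
# Toy model of the evaluation stack. Hypothetical helper, not a real CLR API.
def run(instructions, local_count=1):
    """Execute a tiny subset of CIL-like instructions; return (locals, stack)."""
    stack = []                      # the evaluation stack (last in, first out)
    local_vars = [0] * local_count  # "memory": the local variable slots

    for op, *operand in instructions:
        if op == "ldc.i4":          # load: push a constant onto the stack
            stack.append(operand[0])
        elif op == "ldloc":         # load: copy a local variable onto the stack
            stack.append(local_vars[operand[0]])
        elif op == "stloc":         # store: pop the top value into a local variable
            local_vars[operand[0]] = stack.pop()
        elif op == "add":           # pop two values, push their sum
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        else:
            raise ValueError(f"unsupported instruction: {op}")
    return local_vars, stack


program = [
    ("ldc.i4", 2),   # stack: [2]
    ("ldc.i4", 3),   # stack: [2, 3]
    ("add",),        # stack: [5]
    ("stloc", 0),    # stack: []      local 0 = 5
]
final_locals, final_stack = run(program)
print(final_locals, final_stack)  # [5] []
```

Notice that add never names its inputs. It simply works on whatever the preceding ld instructions left on top of the stack.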

At the beginning of a method, we must declare the maximum number of items that can be on the evaluation stack at any one time. This is done using the .maxstack directive. If it is not provided, the value defaults to 8.
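
As a rough illustration of what the .maxstack value represents, the sketch below walks a straight-line sequence of instructions and records the deepest the stack ever gets. It is written in Python as a toy model: the STACK_EFFECT table and the max_stack_depth function are hypothetical names, the table covers only a few instructions, and branches are ignored entirely.

```python
# Net stack effect of a few instructions (items pushed minus items popped).
# Illustrative subset only; call is omitted because its effect depends on the
# signature of the method being called.
STACK_EFFECT = {
    "nop": 0,
    "ldc.i4": +1,
    "ldstr": +1,
    "ldloc": +1,
    "stloc": -1,
    "pop": -1,
    "add": -1,   # pops two operands, pushes one result
}


def max_stack_depth(opcodes):
    """Return the peak stack depth for a straight-line instruction sequence."""
    depth = 0
    deepest = 0
    for op in opcodes:
        depth += STACK_EFFECT[op]
        if depth < 0:
            raise ValueError(f"stack underflow at {op}")
        deepest = max(deepest, depth)
    return deepest


print(max_stack_depth(["ldc.i4", "ldc.i4", "add", "stloc"]))  # 2
print(max_stack_depth(["nop", "ldstr", "pop", "nop"]))        # 1
```

The declared value is an upper bound, so it can be larger than what a method actually needs.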

Now we're ready to look at some code. 👩‍💻

CIL Example

If you followed along with the tutorial in the last part, you should now have the HelloWorld.exe file in the Bin folder. Because the compiler embeds the IL in a binary file, we need a disassembler to view the CIL. Microsoft provides its own disassembler called ILDASM (Intermediate Language Disassembler), which is installed alongside Visual Studio. To use ILDASM, open the Developer Command Prompt for Visual Studio and run the following command:

ildasm Bin\HelloWorld.exe /output:Bin\HelloWorld.il

Let's look at the output HelloWorld.il file. This file is filled with IL code. If you have ever worked with or seen assembly-level programming, you might notice some similarities. Common Intermediate Language is definitely harder to read and closer "to the metal" than regular C# code, but it's not as mysterious as it might look. By stepping through the IL code line by line, you'll see that this is just a different syntax for programming concepts you already know.



//  Microsoft (R) .NET Framework IL Disassembler.  Version 4.8.3928.0
//  Copyright (c) Microsoft Corporation.  All rights reserved.

// Metadata version: v4.0.30319
.assembly extern mscorlib
{
  .publickeytoken = (B7 7A 5C 56 19 34 E0 89 )                         // .z\V.4..
  .ver 4:0:0:0
}
.assembly HelloWorld
{
  .custom instance void [mscorlib]System.Runtime.CompilerServices.CompilationRelaxationsAttribute::.ctor(int32) = ( 01 00 08 00 00 00 00 00 ) 
  .custom instance void [mscorlib]System.Runtime.CompilerServices.RuntimeCompatibilityAttribute::.ctor() = ( 01 00 01 00 54 02 16 57 72 61 70 4E 6F 6E 45 78   // ....T..WrapNonEx
                                                                                                             63 65 70 74 69 6F 6E 54 68 72 6F 77 73 01 )       // ceptionThrows.

  // --- The following custom attribute is added automatically, do not uncomment -------
  //  .custom instance void [mscorlib]System.Diagnostics.DebuggableAttribute::.ctor(valuetype [mscorlib]System.Diagnostics.DebuggableAttribute/DebuggingModes) = ( 01 00 07 01 00 00 00 00 ) 

  .hash algorithm 0x00008004
  .ver 0:0:0:0
}
.module HelloWorld.exe
// MVID: {381571BD-67C6-4919-A3A1-5BAC05B0DDD1}
.imagebase 0x00400000
.file alignment 0x00000200
.stackreserve 0x00100000
.subsystem 0x0003       // WINDOWS_CUI
.corflags 0x00000001    //  ILONLY
// Image base: 0x06F00000


// =============== CLASS MEMBERS DECLARATION ===================

.class private auto ansi beforefieldinit HelloWorld
       extends [mscorlib]System.Object
{
  .method private hidebysig static void  Main(string[] args) cil managed
  {
    .entrypoint
    // Code size       13 (0xd)
    .maxstack  8
    IL_0000:  nop
    IL_0001:  ldstr      "Hello World!"
    IL_0006:  call       void [mscorlib]System.Console::WriteLine(string)
    IL_000b:  nop
    IL_000c:  ret
  } // end of method HelloWorld::Main

  .method public hidebysig specialname rtspecialname 
          instance void  .ctor() cil managed
  {
    // Code size       8 (0x8)
    .maxstack  8
    IL_0000:  ldarg.0
    IL_0001:  call       instance void [mscorlib]System.Object::.ctor()
    IL_0006:  nop
    IL_0007:  ret
  } // end of method HelloWorld::.ctor

} // end of class HelloWorld


// =============================================================

// *********** DISASSEMBLY COMPLETE ***********************
// WARNING: Created Win32 resource file Bin\HelloWorld.res

Let's go over some of the syntax we notice in the code above.

  • CIL Directives, Tokens, and Attributes
    In the above code, we notice some tokens with a . prefix, such as .assembly, .class, and .method (others you may come across include .namespace and .override). These are called CIL Directives. The tokens that accompany a CIL Directive and describe how it should be processed are called CIL Attributes, e.g. private, static, and hidebysig on the .method directive.

  • CIL Opcodes
    Operation codes (opcodes) are the tokens used to build a type's implementation logic. This is the area we are going to focus on for the remainder of this article.

    CIL Opcodes are actually binary codes, but each has a corresponding friendly mnemonic (e.g. the friendly name for 0x01 is "break") to help developers understand, debug, and write code directly in intermediate language. Below are some example Opcodes:

    Opcode  Instruction
    0x00    nop
    0x01    break
    0x02    ldarg.0
    0x73    newobj

  • CIL Code Labels
    The tokens like IL_0000, IL_0001, etc. are called CIL Code Labels. ILDASM generates them from the offset of each instruction within the method body. They are just optional labels that can be replaced with any text of your choice.
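
To see how the opcode bytes and the code labels relate, here is a small Python sketch that decodes a 13-byte method body laid out like Main in the listing above. It is only an illustration: the OPCODES table covers just the opcodes it needs, the disassemble function is a made-up helper, and the two 4-byte metadata token values in main_body are placeholders rather than values read from HelloWorld.exe.

```python
# Opcode byte -> (mnemonic, operand size in bytes). Small illustrative subset.
OPCODES = {
    0x00: ("nop", 0),
    0x01: ("break", 0),
    0x02: ("ldarg.0", 0),
    0x28: ("call", 4),     # followed by a 4-byte metadata token
    0x2A: ("ret", 0),
    0x72: ("ldstr", 4),    # followed by a 4-byte metadata token
    0x73: ("newobj", 4),   # followed by a 4-byte metadata token
}


def disassemble(body):
    """Turn raw method-body bytes into 'IL_xxxx:  mnemonic [token]' lines."""
    lines = []
    offset = 0
    while offset < len(body):
        mnemonic, operand_size = OPCODES[body[offset]]
        text = f"IL_{offset:04x}:  {mnemonic}"
        if operand_size:
            operand = body[offset + 1 : offset + 1 + operand_size]
            token = int.from_bytes(operand, "little")  # tokens are little-endian
            text += f"  0x{token:08X}"
        lines.append(text)
        offset += 1 + operand_size  # one opcode byte plus its operand
    return lines


# nop, ldstr <token>, call <token>, nop, ret. Token values are placeholders.
main_body = bytes.fromhex("00" "7201000070" "280300000A" "00" "2A")
listing = disassemble(main_body)
print(len(main_body))  # 13
print("\n".join(listing))
# IL_0000:  nop
# IL_0001:  ldstr  0x70000001
# IL_0006:  call  0x0A000003
# IL_000b:  nop
# IL_000c:  ret
```

The label jumps from IL_0001 to IL_0006 because ldstr occupies one opcode byte plus a four-byte operand, and the total of 13 bytes matches the "Code size 13 (0xd)" comment in the listing.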

Now that you understand some of the syntax, let's look at the code.

.assembly extern mscorlib
{
  .publickeytoken = (B7 7A 5C 56 19 34 E0 89 )                         // .z\V.4..
  .ver 4:0:0:0
}

This first block of code has the .assembly extern declaration, which is used to reference an external assembly. In this case, it's mscorlib, which contains the definitions of System.Object and System.Console, the only types from outside our assembly that the code uses. The next block of code also has the .assembly directive but without the extern declaration, which is used to declare the name of this program's assembly.

For the remainder of this article, we will focus on the last block of code, which has the Main method that is the "heart" of our simple console application.

// =============== CLASS MEMBERS DECLARATION ===================

.class private auto ansi beforefieldinit HelloWorld
       extends [mscorlib]System.Object
{
  .method private hidebysig static void  Main(string[] args) cil managed
  {
    .entrypoint
    // Code size       13 (0xd)
    .maxstack  8
    IL_0000:  nop
    IL_0001:  ldstr      "Hello World!"
    IL_0006:  call       void [mscorlib]System.Console::WriteLine(string)
    IL_000b:  nop
    IL_000c:  ret
  } // end of method HelloWorld::Main

  .method public hidebysig specialname rtspecialname 
          instance void  .ctor() cil managed
  {
    // Code size       8 (0x8)
    .maxstack  8
    IL_0000:  ldarg.0
    IL_0001:  call       instance void [mscorlib]System.Object::.ctor()
    IL_0006:  nop
    IL_0007:  ret
  } // end of method HelloWorld::.ctor

} // end of class HelloWorld


// =============================================================
  • The .ctor token names the instance-level constructor. A .ctor is always qualified with the specialname and rtspecialname attributes. specialname indicates that the name can be treated differently by different tools, and rtspecialname indicates that the run time itself treats the name specially.

Next, let's look at the Main method, which was declared as private and static.

  • The hidebysig attribute ("hide by signature") means that this method hides only base class members with the same name and signature, rather than every base class member with the same name.
  • The .entrypoint directive marks the method as the entry point of the executable program. When the C# compiler compiles this code, it marks the Main method with the .entrypoint IL directive. In .NET, the Common Language Runtime (CLR) looks for this entry point in the compiled executable and uses it as the application's starting method.
  • The nop instructions are simply debug build artifacts. They make it possible to put breakpoints on the curly braces.
  • The ldstr instruction loads a string onto the stack. In this case, it's the "Hello World!" string value.
  • Next, the call opcode calls System.Console::WriteLine(string), which takes the string from the stack as its argument. (In the .ctor method, the same opcode is used to call the base class constructor, System.Object::.ctor.)
  • Finally, the ret opcode exits the method and returns a value to the caller (if any).
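
The steps above can be traced with a small Python sketch. This is a toy model for illustration only: run_main and the tuple-based instruction format are invented here, the only call it understands is Console.WriteLine(string), and the "console" is just a list.

```python
# The one method this toy model knows how to "call".
WRITE_LINE = "void [mscorlib]System.Console::WriteLine(string)"


def run_main(instructions):
    """Trace a Main-like method; return (lines written, final stack)."""
    stack = []     # the evaluation stack
    console = []   # stands in for the real console output
    for op, *operand in instructions:
        if op == "nop":
            continue                     # does nothing
        if op == "ldstr":
            stack.append(operand[0])     # push the string onto the stack
        elif op == "call" and operand[0] == WRITE_LINE:
            console.append(stack.pop())  # pop the argument and "print" it
        elif op == "ret":
            break                        # leave the method
        else:
            raise ValueError(f"unsupported instruction: {op}")
    return console, stack


main = [
    ("nop",),
    ("ldstr", "Hello World!"),
    ("call", WRITE_LINE),
    ("nop",),
    ("ret",),
]
console, stack = run_main(main)
print(console)  # ['Hello World!']
print(stack)    # []
```

The stack is empty again when ret is reached, which is what we expect for a method that returns void.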

Does this feel too deep in the weeds for you? Don't worry! We won't be learning how to code in assembly language, I promise.😛 The intent is to get a high level understanding of how everything is wired together under the hood. In the next few posts in this series, we will focus on learning about the ASP.NET Core framework. Stay tuned!