Iterators in C# - a deep dive


This article takes an in-depth look at how the C# yield keyword works under the hood.
If you have never used the yield keyword before, check out my post on Iterators in C# first.
Using iterators is easy, but it's always good to know how they work under the hood, right?
To keep things concrete, let's start with a simple C# method that returns a sequence of values.
Here is the code:

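using System.Collections;

public class InDepth
{
    // Iterator block: the compiler rewrites this method into a state machine.
    static IEnumerator DoSomething()
    {
        yield return "start";

        for (int i = 1; i < 3; i++)
        {
            yield return i.ToString();
        }

        yield return "end";
    }
}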
Pretty simple, isn't it? Let's have a look at the compiled code, decompiled back to C#.


using System;
using System.Collections;
using System.Collections.Generic;
using System.Diagnostics;
using System.Runtime.CompilerServices;

namespace YieldDemo
{
  public class InDepth
  {
    public InDepth()
    {
      base..ctor();
    }

    private static IEnumerator DoSomething()
    {
      InDepth.<DoSomething>d__0 doSomethingD0 = new InDepth.<DoSomething>d__0(0);
      return (IEnumerator) doSomethingD0;
    }

    [CompilerGenerated]
    private sealed class <DoSomething>d__0 : IEnumerator<object>, IEnumerator, IDisposable
    {
      private object <>2__current;
      private int <>1__state;
      public int <i>5__1;

      object IEnumerator<object>.Current
      {
        [DebuggerHidden] get
        {
          return this.<>2__current;
        }
      }

      object IEnumerator.Current
      {
        [DebuggerHidden] get
        {
          return this.<>2__current;
        }
      }

      [DebuggerHidden]
      public <DoSomething>d__0(int <>1__state)
      {
        base..ctor();
        this.<>1__state = <>1__state;
      }

      bool IEnumerator.MoveNext()
      {
        switch (this.<>1__state)
        {
          case 0:
            this.<>1__state = -1;
            this.<>2__current = (object) "start";
            this.<>1__state = 1;
            return true;
          case 1:
            this.<>1__state = -1;
            this.<i>5__1 = 1;
            break;
          case 2:
            this.<>1__state = -1;
            ++this.<i>5__1;
            break;
          case 3:
            this.<>1__state = -1;
            goto default;
          default:
            return false;
        }
        if (this.<i>5__1 < 3)
        {
          this.<>2__current = (object) this.<i>5__1.ToString();
          this.<>1__state = 2;
          return true;
        }
        else
        {
          this.<>2__current = (object) "end";
          this.<>1__state = 3;
          return true;
        }
      }

      [DebuggerHidden]
      void IEnumerator.Reset()
      {
        throw new NotSupportedException();
      }

      void IDisposable.Dispose()
      {
      }
    }
  }
}
Surprised? I wrote barely ten lines of code, yet the compiler generated far more. That is because the compiler implements yield by generating a state machine. Let's examine the compiled code.
Overall observations
  1. The code shown is not valid C#. Names such as <DoSomething>d__0 and <>1__state are not legal C# identifiers, and that is deliberate: because we can never declare such names ourselves, the generated members cannot conflict with the methods and variables we write.
  2. The generated class is decorated with the [CompilerGenerated] attribute and some of its members with [DebuggerHidden]. CompilerGenerated distinguishes a compiler-generated element from a user-written one, while DebuggerHidden tells the debugger not to stop in or step into the member.
  3. <DoSomething>d__0 implements three interfaces, IEnumerator<object>, IEnumerator and IDisposable, even though our method only mentions IEnumerator. The compiler implements the generic form even when we ask for the non-generic one, and IEnumerator<T> itself extends both IEnumerator and IDisposable, so implementing IEnumerator<object> brings in the other two.
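These observations can be checked at runtime with a little reflection. The following is only a sketch: the class name ShapeDemo and the helper Create() are made up for this illustration, and the exact generated type name is a compiler implementation detail that may differ between compiler versions.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

public static class ShapeDemo
{
    // Same shape as the iterator in this article: a non-generic IEnumerator.
    static IEnumerator DoSomething()
    {
        yield return "start";

        for (int i = 1; i < 3; i++)
        {
            yield return i.ToString();
        }

        yield return "end";
    }

    // Hypothetical helper: exposes the iterator object so its type can be inspected.
    public static IEnumerator Create()
    {
        return DoSomething();
    }

    public static void Main()
    {
        IEnumerator e = Create();
        Type t = e.GetType();

        // The name is chosen by the compiler and is not a legal C# identifier.
        Console.WriteLine(t.Name);

        // Is the generated class marked with [CompilerGenerated]?
        Console.WriteLine(t.IsDefined(typeof(CompilerGeneratedAttribute), false));

        // Does it implement the generic interface and IDisposable as well?
        Console.WriteLine(e is IEnumerator<object>);
        Console.WriteLine(e is IDisposable);
    }
}
```

Nothing here depends on the decompiler, so it is a handy way to confirm what the compiler produced on your own machine.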


There's a whole lot of magic happening in <DoSomething>d__0. Let's take a closer look at it.
  1. Three fields are declared in the class: <>1__state, <>2__current and <i>5__1. <>1__state tracks how far execution has progressed, <>2__current holds the value that Current returns, and <i>5__1 is the loop variable i, hoisted into a field so that it survives between calls.
  2. In this output the state and current fields are private, while the loop variable is public. If the iterator block takes any parameters, they also become public fields.
  3. There is an important thing to note here. DoSomething() creates an instance of <DoSomething>d__0 and passes 0 to its constructor. This initial value depends on the return type of the iterator block. For example, if we use IEnumerable<int> as the return type, it passes -2 instead of 0.
  4. There are two versions of the Current property, and both return <>2__current. MoveNext(), Reset() and Dispose() are the methods implemented.
  5. The Reset() method always throws NotSupportedException. This is what the C# specification requires of iterator-generated enumerators.
  6. Whatever code you write in the iterator block ends up in the MoveNext() method, here restructured as a switch statement on the current state. The current value, the state and the loop variable are all modified in this method. Based on the state it finds, it runs the next piece of the original body, stores the value to return and records the state to resume from.
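To make the generated class easier to read, here is a hand-written approximation of it in valid C#. It is a sketch, not the compiler's actual output: the names DoSomethingEnumerator, _state, _current, _i and StateMachineDemo are my own stand-ins for <DoSomething>d__0, <>1__state, <>2__current and <i>5__1.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hand-written stand-in for the generated class, using legal identifiers.
public sealed class DoSomethingEnumerator : IEnumerator, IDisposable
{
    private int _state;      // plays the role of <>1__state
    private object _current; // plays the role of <>2__current
    private int _i;          // plays the role of <i>5__1

    public DoSomethingEnumerator(int state)
    {
        _state = state;
    }

    public object Current
    {
        get { return _current; }
    }

    public bool MoveNext()
    {
        switch (_state)
        {
            case 0: // nothing has run yet: produce "start"
                _state = -1;
                _current = "start";
                _state = 1;
                return true;
            case 1: // resuming after "start": initialise the loop variable
                _state = -1;
                _i = 1;
                break;
            case 2: // resuming inside the loop: increment the loop variable
                _state = -1;
                _i++;
                break;
            default: // state 3 (after "end") or -1 (finished): nothing left
                _state = -1;
                return false;
        }

        if (_i < 3)
        {
            _current = _i.ToString();
            _state = 2;
            return true;
        }

        _current = "end";
        _state = 3;
        return true;
    }

    public void Reset()
    {
        throw new NotSupportedException();
    }

    public void Dispose()
    {
    }
}

public static class StateMachineDemo
{
    // Drives the state machine by hand and joins the produced values.
    public static string Drain()
    {
        var e = new DoSomethingEnumerator(0);
        var parts = new List<string>();
        while (e.MoveNext())
        {
            parts.Add((string)e.Current);
        }
        return string.Join(",", parts);
    }

    public static void Main()
    {
        Console.WriteLine(Drain()); // prints "start,1,2,end"
    }
}
```

Stepping through Drain() in a debugger shows the state moving 0, 1, 2, 2, 3 and finally -1, which is the same path the generated MoveNext() takes.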
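The iterator doesn't run on its own. Calling the iterator method only creates the state machine object; none of the body runs yet. The actual work starts with the first call to MoveNext(). Each call runs the body until the next yield return, where it stores the value in Current and returns true, or until a yield break or the end of the method, where it returns false. The consumer keeps calling MoveNext() until it returns false.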
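A small experiment makes this deferred execution visible. The sketch below records a log entry at each step; the names DeferredDemo, Numbers, Log and Run are invented for the example.

```csharp
using System;
using System.Collections.Generic;

public static class DeferredDemo
{
    // Collects messages in the order in which things actually happen.
    public static readonly List<string> Log = new List<string>();

    static IEnumerator<int> Numbers()
    {
        Log.Add("body started");
        yield return 1;
        Log.Add("between yields");
        yield return 2;
        Log.Add("body finished");
    }

    public static List<string> Run()
    {
        Log.Clear();
        using (IEnumerator<int> e = Numbers())
        {
            // The iterator object exists, but its body has not started.
            Log.Add("iterator created");

            while (e.MoveNext())
            {
                Log.Add("got " + e.Current);
            }
        }
        return Log;
    }

    public static void Main()
    {
        foreach (string line in Run())
        {
            Console.WriteLine(line);
        }
    }
}
```

The log reads "iterator created", "body started", "got 1", "between yields", "got 2", "body finished": the body only starts on the first MoveNext(), and each call runs just far enough to reach the next yield return.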
An important restriction on iterators is that you cannot yield return from a try block that has a catch block associated with it, with or without a finally block. You can, however, yield from a try block that has only a finally block.
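Here is a minimal sketch of the allowed form, which also shows when the finally block runs if the consumer stops early. The names FinallyDemo, Guarded, Log and Run are made up for the illustration.

```csharp
using System;
using System.Collections.Generic;

public static class FinallyDemo
{
    public static readonly List<string> Log = new List<string>();

    static IEnumerable<int> Guarded()
    {
        try
        {
            // Allowed: this try block has a finally but no catch.
            yield return 1;
            yield return 2;
        }
        finally
        {
            Log.Add("cleanup");
        }

        // Not allowed (compile-time error):
        // try { yield return 3; } catch (Exception) { }
    }

    public static List<string> Run()
    {
        Log.Clear();
        foreach (int n in Guarded())
        {
            Log.Add("got " + n);
            break; // leave early; foreach disposes the enumerator on the way out
        }
        Log.Add("after loop");
        return Log;
    }

    public static void Main()
    {
        foreach (string line in Run())
        {
            Console.WriteLine(line);
        }
    }
}
```

The log reads "got 1", "cleanup", "after loop". The iterator was suspended inside the try block when the loop ended, and it is the Dispose() call made by foreach that runs the finally block. In the simple examples in this article Dispose() is empty because there is no finally block to run.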
So far we've been returning IEnumerator from the iterator block, and the non-generic version at that. Let's replace it with the generic IEnumerable<string> and implement the iterator block once again. Here is the code after the modification:

static IEnumerable<string> DoSomething()
{
    yield return "start";

    for (int i = 1; i < 3; i++)
    {
        yield return i.ToString();
    }

    yield return "end";

}
Now let's look at the compiled code again and check what's new in the IEnumerable implementation. Here is the code:

using System;
using System.Collections;
using System.Collections.Generic;
using System.Diagnostics;
using System.Runtime.CompilerServices;

namespace YieldDemo
{
  public class InDepth
  {
    public InDepth()
    {
      base..ctor();
    }

    private static IEnumerable<string> DoSomething()
    {
      InDepth.<DoSomething>d__0 doSomethingD0 = new InDepth.<DoSomething>d__0(-2);
      return (IEnumerable<string>) doSomethingD0;
    }

    [CompilerGenerated]
    private sealed class <DoSomething>d__0 : IEnumerable<string>, IEnumerable, IEnumerator<string>, IEnumerator, IDisposable
    {
      private string <>2__current;
      private int <>1__state;
      private int <>l__initialThreadId;
      public int <i>5__1;

      string IEnumerator<string>.Current
      {
        [DebuggerHidden] get
        {
          return this.<>2__current;
        }
      }

      object IEnumerator.Current
      {
        [DebuggerHidden] get
        {
          return (object) this.<>2__current;
        }
      }

      [DebuggerHidden]
      public <DoSomething>d__0(int <>1__state)
      {
        base..ctor();
        this.<>1__state = <>1__state;
        this.<>l__initialThreadId = Environment.CurrentManagedThreadId;
      }

      [DebuggerHidden]
      IEnumerator<string> IEnumerable<string>.GetEnumerator()
      {
        InDepth.<DoSomething>d__0 doSomethingD0;
        if (Environment.CurrentManagedThreadId == this.<>l__initialThreadId && this.<>1__state == -2)
        {
          this.<>1__state = 0;
          doSomethingD0 = this;
        }
        else
          doSomethingD0 = new InDepth.<DoSomething>d__0(0);
        return (IEnumerator<string>) doSomethingD0;
      }

      [DebuggerHidden]
      IEnumerator IEnumerable.GetEnumerator()
      {
        return (IEnumerator) this.System.Collections.Generic.IEnumerable<System.String>.GetEnumerator();
      }

      bool IEnumerator.MoveNext()
      {
        switch (this.<>1__state)
        {
          case 0:
            this.<>1__state = -1;
            this.<>2__current = "start";
            this.<>1__state = 1;
            return true;
          case 1:
            this.<>1__state = -1;
            this.<i>5__1 = 1;
            break;
          case 2:
            this.<>1__state = -1;
            ++this.<i>5__1;
            break;
          case 3:
            this.<>1__state = -1;
            goto default;
          default:
            return false;
        }
        if (this.<i>5__1 < 3)
        {
          this.<>2__current = this.<i>5__1.ToString();
          this.<>1__state = 2;
          return true;
        }
        else
        {
          this.<>2__current = "end";
          this.<>1__state = 3;
          return true;
        }
      }

      [DebuggerHidden]
      void IEnumerator.Reset()
      {
        throw new NotSupportedException();
      }

      void IDisposable.Dispose()
      {
      }
    }
  }
}
Observations:
  1. First, the return type of the DoSomething() method has changed to IEnumerable<string>.
  2. Noticeably, the argument passed to the <DoSomething>d__0 constructor has changed from 0 to -2.
  3. The compiler-generated <DoSomething>d__0 class now implements IEnumerable<string> and IEnumerable in addition to IEnumerator<string>, IEnumerator and IDisposable.
  4. The IEnumerator<string> implementation in the sealed class is almost the same as before: the Current property returns the current value (now typed as string), Reset() throws the same exception and MoveNext() has the same logic.
  5. A private field <>l__initialThreadId has been added; the constructor sets it to the managed thread ID of the thread that created the object.


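Well, what happened? Calling DoSomething() creates the <DoSomething>d__0 instance in state -2. When a consumer such as foreach calls GetEnumerator(), the object checks whether it is still in state -2 and is being called on the thread that created it. If so, it switches itself to state 0 and returns itself as the IEnumerator; otherwise it creates a fresh instance in state 0, so that every enumeration gets its own state. From then on the IEnumerator members do the work. The sequence can only be read, in a forward direction, and it is the MoveNext() method that is called over and over again to produce the values lazily.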
Why did the value passed to the <DoSomething>d__0 constructor change from 0 to -2? These numbers are the codes that tell the state machine which state it is in. Here are the states that the state machine operates on:
0: the work is yet to start (Before).
-1: the work is in progress (Running) or the work is completed (After).
-2: specific to IEnumerable; the initial state before GetEnumerator() is called.
Greater than 0: the iterator is suspended at a yield return and knows where to resume.
Note that the -2 state is specific to IEnumerable; the other states belong to the IEnumerator. When GetEnumerator() is called on the IEnumerable, the state changes to 0, and from then on the object behaves as the IEnumerator it returns.
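The practical effect of this state handling is that each call to GetEnumerator() gets its own position in the sequence. The sketch below checks that; EnumerableDemo and Run are names invented for the example. Whether the first enumerator is the very same object as the enumerable is a compiler implementation detail, so Main only prints that result rather than relying on it.

```csharp
using System;
using System.Collections.Generic;

public static class EnumerableDemo
{
    static IEnumerable<string> DoSomething()
    {
        yield return "start";

        for (int i = 1; i < 3; i++)
        {
            yield return i.ToString();
        }

        yield return "end";
    }

    // Advances two enumerators over the same sequence by different amounts.
    public static string Run()
    {
        IEnumerable<string> seq = DoSomething();

        using (IEnumerator<string> first = seq.GetEnumerator())
        using (IEnumerator<string> second = seq.GetEnumerator())
        {
            first.MoveNext();
            first.MoveNext();  // first is now positioned on "1"
            second.MoveNext(); // second is positioned on "start"

            return first.Current + "/" + second.Current;
        }
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // prints "1/start"

        // Implementation detail: is the enumerable reused as its own first enumerator?
        IEnumerable<string> seq = DoSomething();
        using (IEnumerator<string> first = seq.GetEnumerator())
        using (IEnumerator<string> second = seq.GetEnumerator())
        {
            Console.WriteLine(ReferenceEquals(seq, first));
            Console.WriteLine(ReferenceEquals(seq, second));
        }
    }
}
```

With the generated code shown above, I would expect the first ReferenceEquals check to report that the object was reused and the second to report a fresh instance, since by then the state is no longer -2.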


That's it! At first glance it looks freaky, but once we work through it slowly it turns out to be a lot easier than expected.


Please share your thoughts and reviews on this post! Thanks!
