Monday, May 25, 2020 06:05

Table of contents >> LINQ > IEnumerator and IEnumerable

IEnumerator and IEnumerable

Before we can start delving into LINQ, we need to first understand the underlying principles upon which it is built. LINQ is all about operations over collections, so, you’ve guessed it: we will be dealing with collections.

You already learned that of all data structures, arrays are the fastest, because they are unsorted non-generic data structures. However, arrays have their disadvantages, they are not easily manipulated (searched, resized, sorted, etc). Because Microsoft was stupid and eager to release C# before generics were added to the language, they introduced a few data structures such as ArrayList or HashTable (and others), which were supposed to offer what generics should have offered, had they been released with the first version of C#. When generics were added in C# 2.0, the proper generic data structures were also added, and because of backwards compatibility issues, we are now stuck with non-generic and generic variants of the same things. ArrayList is the non-generic version of List<T>, HashTable corresponds to the generic Dictionary<K, V>, and so on, but since generic variants are always better then their non-generic counterparts, consider the non-generic ones as deprecated and do yourselves a favor by not using them.

Anyway, in this lesson, we will be dealing with just arrays, because they offer the raw speed, and List<T>, because it offers easy manipulation and it is generic.

You already know that the way we iterate collections is by using either the for, foreach and (rarely) while loops. While iterating using a for or a while loop is straightforward (the compiler either uses a control variable, or iterates infinitely until a condition is met), iteration using a foreach loop is actually a lot more than what meets the eye. To demonstrate this, I will try to create my own custom version of a data structure, and use a foreach loop to iterate through its elements.

I will start with a code you are already familiar with:

The result will expectedly look like this:

C# foreach displayThe codes are simple: I just declared a generic List of type int, added some elements to it, and then I displayed its elements on the console using a foreach loop. What you might not know is that all data structures in .NET are based at some point on arrays, if you start looking under the hood. So, when I used a List, I actually used an improved version of an array, which supports sorting, resizing, enumeration, and so on.

So, now let’s try to replicate a generic List. I’ll start by declaring a class, GenericList, of type T, because I also want to make my custom list generic, just like List<T> is, then I will replace List<int> with GenericList<int> in my existing codes:

Of course, now we have errors everywhere, because my GenericList class doesn’t have an Add() method. So, let’s correct that:

At this point, the error underlying the Add() method goes away, because we declared one inside GenericList. We declared an integer that holds the count of how many elements the array contains, initialized as 0, because we have 0 elements at the beginning, then we used that integer to specify the index where we want to add the new item inside the items array. You can also notice that I used count++ and not count alone as the index variable, because we also want to increment the count variable every time we want to add a new item, and we want to increment it AFTER we add that item, because if we increment it before, we would add the new element in a wrong position, since arrays indexes start as 0, but array elements count starts at 1.

You might think that this should suffice, but if you will comment out the foreach loop and try to run the example as it is, you will get a nasty IndexOutOfRangeException exception:

C# array IndexOutOfRangeException

Of course, the explanation is simple: I initialized my items array as an array of 5 elements, and I tried to add 6. This means that now, I must add some codes that will resize my array every time it fills up:

Run the example now, and it should work just fine, no matter how many items we add, the array will resize itself to fit them all.

The last problem that remains is iterating over my GenericList using a loop. If you uncomment the foreach loop, you will still get a squiggly red line indicating an error underneath: Error CS1579 foreach statement cannot operate on variables of type ‘GenericList<int>’ because ‘GenericList<int>’ does not contain a public instance definition for ‘GetEnumerator’.

The foreach loop is the first form of heavy syntactic sugar that Microsoft introduced way back in the first version of C#. The compiler actually converts the foreach loop to this:

Of course, the variable genericList does not contain a method named GetEnumerator(), and you don’t know what IEnumerable is, even though you can probably deduce it is an interface, because its name starts with a capital I, but you don’t need to panic. It all has to do with the concept of enumerating (iterating) over collections, a concept that was introduced even longer than C# existed. The idea is that if you have a sequence of elements, and we do have one – the genericList variable, which holds 6 elements, you could visualize it in memory as a container with multiple compartments, one for each of the stored values. When we want to get an enumerator over that block of elements, we essentially get an object that references a sentinel value at the beginning of the array, a value that doesn’t actually exist. You can imagine it like this:

C# enumerator

Initially, the enumerator points to a sentinel value, a cell at the beginning of the array that does not really exist and has no value. If we want to move the iterator object to the next element and get its value, we have to call the MoveNext() method on it. If MoveNext() returns true, that mean’s there is an element we haven’t looked at yet, and MoveNext() will continue to return true until there are no more items to look at. MoveNext() simply checks if there are items in the array that follow after the current position that the iterator object points at. If there are, it returns true, and it also moves the iterator to point to the next cell in the array. The Current property of the enumerator gives us, obviously, the value of the array cell at which the iterator is currently pointing at. You can test all of this simply by replacing our GenericList variable with a built in .NET List<T>, which does contain a method called GetEnumerator(). You can even debug the program and step through codes, to observe how the iterator advances an array position each time the MoveNext() method is invoked.

With this approach, you should be careful for two things. Take the following piece of code:

I changed my list to a generic List<T>, then, instead of using a foreach loop to iterate its elements, I asked for its enumerator, using GetEnumerator(), as explained. But, as soon as I got the enumerator, I asked for the Current property of it. Think about this for a second: I already said that the enumerator initially points to a sentinel value at the beginning of the array that doesn’t actually exist. What is the value of Current, if I ask for it while the iterator still points at this sentinel cell, which doesn’t have a value? And even weirder, if my list contains only 6 elements, what’s the value of Current, if I ask the iterator to MoveNext() 13 times, meaning 13 array cells? In both cases, you would expect to get an error or an exception of some sort, because any sane programmer would throw an exception under these circumstances. However, Microsoft was stupid once again, and instead of returning an error, their programmers decided that Current should be 0, 0 indicating an error. Well, 0 is also a perfectly valid value for an actual existing array cell, isn’t it? So, imprint on your mind, you will not get errors if you try to get the value of the enumerator before moving to the first array cell, nor will you get one if the iterator is moved next past beyond the bounds of that array, you will get 0.

Even more interesting, if you will use an ArrayList instead of List<T>, and change IEnumerator<T> to its non-generic form IEnumerator, you WILL get exceptions if you try to get the Current property while the iterator points at the sentinel cell value or past beyond the bounds of the collection. So, Microsoft decided to not throw errors in the generic list collection, but to throw them in the non-generic list that came before it. Very smart, Microsoft!

Returning to my initial example, let’s implement our own version of enumerator in our custom generic list:

You will notice that as soon as we added the GetEnumerator() function to our custom list class, all the errors went away, including the one where we tried to iterate over our list using a foreach loop. This means that a foreach loop actually expects that the collection to be iterated implements a function called GetEnumerator(), which returns objects of type IEnumerator<T>. Leaving aside the generic part of that concept, IEnumerator is just an interface, so, the GetEnumerator() function returns objects of type interface IEnumerator, or any object that implements IEnumerator interface. Finally, inside GetEnumerator() function, you can notice that I used yield return, instead of simply return. You definitely know what the return operator does inside a function, it stops the execution immediately, and returns a value to the caller of the function. You don’t know what yield return does, but you can think of it as the same thing as the simple return operator, but which does not stop the execution of the called function. If I had used the return operator, as soon as the first value of the for loop would be returned, the loop would be ended and the execution would step out of the GetEnumerator() function and return to the code that invoked it. Using yield return allowed me to return that value, just like the normal return operator would, but it also did not end the loop, instead, it continue to return values until the for loop ended. I will discuss the yield return operator in greater detail in a future lesson, but for now, think of it just as a way of returning more than one value from a function.

Before yield return existed, programmers would return elements in the GetEnumerator() function by creating a nested class that implements IEnumerator. If you look at the definition of IEnumerator<T>, you will notice that it implements an interface called IDisposable, of which we didn’t learn about yet, but it basically allows the Garbage Collector to recycle the memory of elements that are no longer needed, and it also contains a single read-only property of type T named Current (the one we used to get the value of the array cell that the iterator was pointing at at some moment):

C# generic version of IEnumeratorNow, you might ask yourself, “if this interface only contains a read-only property, where does MoveNext() method comes from?” The answer is also in the above photo, IEnumerator<T> also implements its non-generic version, IEnumerator, which looks like this when previewed:

C# non-generic version of IEnumeratorUnfortunately, Microsoft stroke again, and they decided to add a Reset() method, because back in the day they thought that programmers would call MoveNext() until they reached the end of the array, and they would call Reset() to move the iterator back to the beginning of the array. Personally, I’ve never used Reset() in my entire career as a professional programmer, nor have I ever seen anyone use it. Everyone simple uses foreach loops, or if they deal with iterators manually, they simply iterate through the end of the array, and if they need to iterate again, they simply get a new enumerator. In fact, when you use yield return and also call Reset() on the iterator, you will get a NotSupportedException and your program will crash, because “hey, I don’t support resetting, someone back in the day thought it was a great idea, but then decided it was not such a great idea, so I don’t support it!”. Stupid Microsoft, again. The thing is, if and when you want to create your own enumerator, you are stuck to also implement the Reset() method, even if you don’t want to use it and don’t need it, because the IEnumerator interface declares it, and it must be implemented to fulfill the contract with the interface.

Let’s also create our own iterator by adding a nested class that implements IEnumerator<T>:

Of course, since GenericEnumerator is a nested class to the GenericList<T> class, the type T of the IEnumerator<T> implemented in GenericEnumerator is the same type T of GenericList<T>. If we declare a GenericList<int>, GenericEnumerator will also implement an interface of type IEnumerator<int>.

Next, we also need an int variable that will store the current index in each instance of our custom iterators, so that we know where the enumeration is. Our custom enumerator class also needs to know about the array that it will iterate, so we will also add a constructor to it, which will demand a collection of type GenericList<T>. Finally, we will implement the MoveNext() method and the Current property:

And this is the result:

C# custom GetEnumerator

So, we are done implementing a custom own iterator. I don’t encourage you to do this, the built in .NET one is already there to be used. You should only do something like this if you have a strong motive to implement your own version of iteration, and to understand the processes that happen under the hood of the .NET iterator.

Let’s return to the original code before the custom iterator, where we were still using yield return. You do know that since C# 3.0, we are allowed to add multiple elements to a collection ever since we initialize it, so we don’t have to call the Add() method a zillion times after:

But if you do this at this point, you will get an error in Visual Studio: Error CS1922 Cannot initialize type ‘GenericList<int>’ with a collection initializer because it does not implement ‘System.Collections.IEnumerable’. This basically means that if we want a class to be considered a collection container, we have to make it implement another interface called IEnumerable. That’s because any collection should be… enumerable. Be careful not to confuse IEnumerator with IEnumerable. They are two different interfaces, with two different purposes.

In all honesty, I should have implemented IEnumerable ever since I started created my custom generic list class, because that’s what any container class should do. The reason why I didn’t is because I wanted to show you that the foreach loop does not require IEnumerable to work, it only needs a custom enumerator, or something that implements IEnumerator, such as yield return.

Let’s also implement IEnumerable in our custom generic list:

If you do this, the initial errors under our implicit initialization of the custom generic list with random values will go away, but we still get an error that says we haven’t implemented all the functionality of the IEnumerable<T> interface. Let’s have a look at the signature of this interface, so we know what we need to implement:

C# definition of IEnumerablePay attention, it only has a single function named GetEnumerator(), which returns an interface of type IEnumerator<T>. Don’t we already have such a method, with this exact same signature inside our GenericList? Why is the compiler still complaining we haven’t implemented the interface? That’s also because Microsoft released C# before generics were added, so IEnumerable<T> also implements its non-generic pre-existing version, IEnumerable. Non-generic IEnumerable also contains a non-generic version of the GetEnumerator() method, which we must implement:

If we execute our program now, everything works as expected, and our custom list behaves exactly like a built in .NET generic list, with the only difference of it having many other members and functionality that we are not interested in here.

To conclude this lesson, many beginner to intermediate programmers are often confused about what’s the difference between enumerator and enumerable. Let’s set that once and for all: enumerator is just an object that moves from item to item, so that it gives us the sequence of all items of a collection whenever we want to iterate over that collection using a foreach loop. Enumerable is just a property of a container of elements of being able to give that collection of elements. Enumerable implements an enumerator, so when we make an object enumerable, we know that it must contain an enumerator that can be used to iterate over items.

Finally, you may ask yourselves why this lesson is placed in a chapter called LINQ, when it should have been in the chapter Data Structures, where we dealt with arrays and lists and everything. I already explained in the previous lesson that the real power of LINQ comes from extension methods. There is a class named Enumerable inside the System.Linq namespace that we will use rather heavily in the lessons that will follow in this chapter. Do not confuse the Enumerable CLASS with the IEnumerable INTERFACE. They are two disting things, and, no, Enumerable does not implement IEnumerable. If we take a look at this class, we will observe this:

C# System.Linq.Enumerable class definitionThe Enumerable class is rather huge, it contains a ton of extension methods which all extend… IEnumerable, of which we learned today! And now we are one step away from being ready to start learning about LINQ (after we also analyze yield return operator).



Tags: , , , , , , , ,

Leave a Reply

Do NOT follow this link or you will be banned from the site!