Sunday, November 18, 2018 19:01

String Builder

I explained earlier that string is an immutable type. That means that once a string value has been created, it can no longer be modified. This also means that any string operation, such as Trim(), Replace(), ToUpper(), etc., actually creates a new string in memory to hold the result, while the old value is left behind to be cleaned up once nothing refers to it anymore. This behavior is fairly complex, involving pointers and references, and it has many advantages, but in some cases it can cause performance problems.
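To make the immutability concrete, here is a minimal sketch. The class name ImmutabilityDemo and the helper Shout are made up for illustration; only Trim() and ToUpper() come from the text above.

```csharp
using System;

static class ImmutabilityDemo
{
    // Hypothetical helper: Trim() and ToUpper() each return a NEW string;
    // the string passed in is never changed.
    public static string Shout(string text)
    {
        return text.Trim().ToUpper();
    }

    static void Main()
    {
        string original = "  hello  ";
        string shouted = Shout(original);

        Console.WriteLine("[" + original + "]"); // prints "[  hello  ]"
        Console.WriteLine("[" + shouted + "]");  // prints "[HELLO]"
    }
}
```

The original variable still holds the untrimmed, lowercase text after the calls, because the operations produced new strings instead of editing the existing one.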

The worst example of bad performance I can think of is the concatenation of strings inside a loop. NEVER do that! We haven’t learned about dynamic memory or the garbage collector yet, so I cannot fully explain why this results in such terrible performance, but let’s try to understand the reasons behind it anyway. To do that, we first need to understand what happens when we use the + or += operators on strings. Let’s consider the following code:
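A minimal sketch of the kind of code being discussed, assuming the variable names str1, str2 and result used in the explanation; the literal values, the class name ConcatDemo and the method Build are illustrative only.

```csharp
using System;

static class ConcatDemo
{
    public static string Build()
    {
        string str1 = "Super";
        string str2 = "Star";

        // A third string is created to hold the concatenated value.
        string result = str1 + str2;

        // Changing result does not edit "SuperStar" in place: a fourth string
        // is allocated, and the previous one is no longer referenced.
        result += "!!!";

        return result;
    }

    static void Main()
    {
        Console.WriteLine(Build()); // prints "SuperStar!!!"
    }
}
```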

What happens in memory? When we declare the str1 and str2 variables, their values are stored in a special area of memory called the Heap, and the variables hold references to them. When we concatenate them, we assign the resulting value to a third variable. So now we have three values in memory and three variables pointing to them, which is the expected result. However, when we change the value of the already existing variable result, we actually allocate a new memory area, store the new string in it, and abandon the string that was stored in the previous location, which is left to be cleaned up later. This process takes time, especially when repeated many, many times, as in a loop.

In C#, we don’t have to worry about manually freeing the values that we no longer need, as we would in other languages such as C or C++. There is a special component called the Garbage Collector (GC) that automatically cleans up any unused resources, but this comes at a price: whenever it performs the cleaning, it takes some time and slows down overall execution. So, not only do we force the GC to clean the memory all the time, we also make the program copy characters from one place in memory to another every time a string concatenation is executed, which is slow, especially if the strings are long.

Let’s demonstrate this. Let’s concatenate the numbers from 0 to 200,000 in a string. The usual way of doing this would be like so:
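A sketch of such a loop, assuming DateTime.Now for the timestamps; the class name SlowConcatBenchmark, the method ConcatNumbers and the time format string are illustrative choices. Be warned that with 200,000 iterations this can run for a long time.

```csharp
using System;

static class SlowConcatBenchmark
{
    // Joins the numbers 0 .. count-1 using the += operator.
    // Every iteration allocates a brand new, slightly longer string.
    public static string ConcatNumbers(int count)
    {
        string result = "";
        for (int i = 0; i < count; i++)
        {
            result += i;
        }
        return result;
    }

    static void Main()
    {
        // Print the time before and after, including milliseconds.
        Console.WriteLine(DateTime.Now.ToString("HH:mm:ss.fff"));

        string result = ConcatNumbers(200000); // 200,000 concatenations

        Console.WriteLine(DateTime.Now.ToString("HH:mm:ss.fff"));
        Console.WriteLine("Length of the result: " + result.Length);
    }
}
```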

We display the current time at the moment we start the concatenation (though we haven’t learned about the DateTime type yet), then we join the strings inside the loop, and finally we display the current time again, so that we can work out the elapsed time.

[Figure: C# string concatenation in loop performance benchmark]

As you can see, on an Intel quad-core i5-4590 CPU running at 3.3 GHz, this took almost two minutes. Some of you might say, “Yeah, but still, there are 200,000 operations to be performed! That has to take some time!”, and you would be wrong. Computers are VERY good at performing simple operations repeatedly and extremely fast, especially on modern CPUs.

But most importantly, in 2017, making your users wait 2 minutes for an operation is almost unacceptable, and many will close the program before the operation gets a chance to complete.

The problem with time-consuming loop processing is related to the way strings work in memory. Each iteration creates a new object in the Heap and points the reference to it, as I explained. This process takes a certain amount of time.

Several things happen at each step:

1. An area of memory is allocated to hold the result of the next concatenation. This memory is used only temporarily while concatenating, and is called a buffer.
2. The old string is copied into the new buffer. If the string is long (say 500 KB, 5 MB or 50 MB), this can be quite slow!
3. The next number is appended to the buffer.
4. The buffer is converted to a string.
5. The old string and the temporary buffer become unused. Later they are destroyed by the Garbage Collector. This may also be a slow operation.
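The steps above can be imitated by hand. This is only an illustration of the idea, not how the runtime actually implements the + operator; the class name ConcatSteps and the method Append are made up for this sketch.

```csharp
using System;

static class ConcatSteps
{
    // Hypothetical helper that mimics one concatenation, step by step.
    public static string Append(string oldString, string next)
    {
        // Step 1: allocate a temporary buffer large enough for the result.
        char[] buffer = new char[oldString.Length + next.Length];

        // Step 2: copy the whole old string into the buffer.
        oldString.CopyTo(0, buffer, 0, oldString.Length);

        // Step 3: place the new piece right after it.
        next.CopyTo(0, buffer, oldString.Length, next.Length);

        // Step 5 happens later: once nothing refers to them anymore,
        // oldString and buffer are left for the Garbage Collector.

        // Step 4: convert the buffer into a new string.
        return new string(buffer);
    }

    static void Main()
    {
        string result = "";
        for (int i = 0; i < 5; i++)
        {
            result = Append(result, i.ToString());
        }
        Console.WriteLine(result); // prints "01234"
    }
}
```

Notice that step 2 copies everything built so far on every single iteration, so the work per iteration keeps growing as the string gets longer.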

A much more elegant and appropriate way to concatenate strings in a loop is to use the StringBuilder class. I know, we haven’t talked about classes yet, but don’t worry about that. Let’s just see how it works.

First, StringBuilder is a class that serves to build and change strings. It overcomes the performance problems that arise when concatenating values of type string. Internally, the class works with a buffer of characters, and what we need to know about it is that its content can be freely changed. Changes made to a variable of type StringBuilder are carried out in the same area of memory (the buffer), which saves time and resources. Changing the content does not create a new object but simply changes the current one.

Let’s rewrite the code above, in which we concatenated strings in a loop. Notice that the StringBuilder type is declared in a namespace called System.Text, so you will need to add another using directive. If you remember, the operation previously took 2 minutes. Let’s measure how long the same operation takes if we use StringBuilder:
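A sketch of the StringBuilder version, again assuming DateTime.Now for the timestamps; the class name StringBuilderBenchmark, the method ConcatNumbers and the time format string are illustrative choices.

```csharp
using System;
using System.Text; // StringBuilder lives in this namespace

static class StringBuilderBenchmark
{
    // Joins the numbers 0 .. count-1 using a single StringBuilder.
    public static string ConcatNumbers(int count)
    {
        StringBuilder builder = new StringBuilder();
        for (int i = 0; i < count; i++)
        {
            // Append() adds to the builder's internal buffer instead of
            // creating a new string on every iteration.
            builder.Append(i);
        }
        // One final string is created at the very end.
        return builder.ToString();
    }

    static void Main()
    {
        // Print the time before and after, including milliseconds.
        Console.WriteLine(DateTime.Now.ToString("HH:mm:ss.fff"));

        string result = ConcatNumbers(200000); // 200,000 appends

        Console.WriteLine(DateTime.Now.ToString("HH:mm:ss.fff"));
        Console.WriteLine("Length of the result: " + result.Length);
    }
}
```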

After running the code, we get this:

[Figure: C# StringBuilder benchmark]

I don’t know about you, but 200,000 operations in less than a second, now that’s what I call a performance increase! The required time is actually on the order of milliseconds!

The way we use StringBuilder is by creating a new instance of it, and then using the Append() method to concatenate strings to it. You will understand this process better when you reach the next chapter. For the time being, just remember that StringBuilder is a MUCH more efficient way of concatenating strings.
