Tuesday, March 19, 2024 05:55

Table of contents >> Strings And Text Processing > Strings

Strings

So far, we have often used Console.ReadLine() in our Console programs, in order to get some text input from the user. Whenever we needed to use that text (and even inspecting the Console.ReadLine() method signature), we needed to store that text in a string variable type. At the time I explained the string type, I was saying that string is actually a collection of char variables packed together in a single variable.

Let’s talk a bit about char. In the .NET Framework, each character has a special serial number from the Unicode Table. Unicode is an international encoding standard for use with different languages and scripts, by which each letter, digit, or symbol is assigned a unique numeric value that applies across different platforms and programs. In other words, each character has a unique number which is the same on any platform (Unix, Windows, Linux, MasOS, etc), so that the character can be recognized on any of them. Unicode was established during the late ’80’s and early ’90’s, because its predecessor, ASCII, was able to identify only 128 or 256 characters. That meant that characters such as ü, ç, ô, ã, or even sign characters such as £, ©, ±, µ, mathematical symbols, Russian, Chinese or Greek letters, etc, would not be available or recognizable. By using Unicode encoding, computers were suddenly able to recognize over 100.000 new characters or symbols!

Now that we talked about character encodings, let’s remember how we used to declare a string:

By knowing that a string variable is just a collection of char variables, we can graphically visualize the above string variable as

String variable representation

We already saw this representation in the arrays examples. So, at the end, we can view the string as an array of characters, and we can even use a char[] array instead of a string, but that would have some disadvantages, such as filling the array only one character at a time, having to know the length of the text when declaring the array, and having to do all the text processing manually. Please note the string value too – the quotes are not part of the text, they are just enclosing its value.

Unlike char, which is a value type, string is a reference type. We haven’t learn about Stack and Heap yet, but suffice to know that reference types are stored in a dynamic memory location, and we access them through a special kind of variable called a pointer, which does what its name implies: points to some memory location, where the value is actually stored. This topic is pretty advanced and you don’t need to preoccupy yourselves with it for now. Suffice to say that a string, being a reference type, cannot be modified directly, and it is immutable (the sequence of characters stored are never changing) – whenever we assign a new value to a string variable, we are actually creating a new string variable in the managed memory and make the pointer variable point to this new location, while the contents of the old one are recycled. To demonstrate that a string is an array of characters, but also a reference type, lets take the following code:

Just like arrays, we can access the individual characters of the string using an array index, and just as in the array’s case, we get an exception if we try to access an index that does not exist. We can also see that this kind of array is read only – when we tried to change the value of a certain array element, we got an error as well. This happened because, as I said, string cannot be modified directly: when we tried to change the value of the character inside the string, we tried to modify the string that was originally created when we declared the string variable; but, like I explained earlier, this is not permitted. The reason why we don’t get an error on the last line, when we change the value of the variable, is because doing so, we don’t modify the contents of the original str variable. Instead, the compiler will declare a new variable in the memory, point the pointer variable to the location of this new variable, and delete the old variable.

Starting from the next lesson, we will begin text processing, which mean operations that we can apply on our strings variables. As you will soon learn, text processing is a very common task in programming.

Tags: , , , , ,

Leave a Reply



Follow the white rabbit