Wednesday, June 19, 2019 01:45

Searching for a string within another string

Another very useful operation when dealing with text is searching for a certain string or letter inside another string. There are multiple ways of accomplishing this, each behaving differently.

The first function that we can use to perform a search is Contains(). Its purpose should be self-explanatory:
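A minimal sketch of such a call could look like the one below. The sample text "Follow the White Rabbit" and the variable name str are assumptions, chosen because they match the terms and index values discussed in this article.

```csharp
using System;

class Program
{
    static void Main()
    {
        // Assumed sample text; any string would work here.
        string str = "Follow the White Rabbit";

        // Contains() takes the search term and returns a bool.
        bool found = str.Contains("White");

        Console.WriteLine(found); // True
    }
}
```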

As you can see, the Contains() function requires a parameter of type string, which will be our search term. If the string on which we perform the search contains the string that we are searching for, the method will return True, otherwise False. Be careful, though: the search is case sensitive! This means that it takes into account the difference between uppercase and lowercase letters. The following example will return False, because the casing of the compared string is different:
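As a sketch of the case-sensitive behavior, again assuming the sample text "Follow the White Rabbit" in a variable named str:

```csharp
using System;

class Program
{
    static void Main()
    {
        // Assumed sample text.
        string str = "Follow the White Rabbit";

        // "WHITE" differs in casing from "White", so no match is reported.
        bool found = str.Contains("WHITE");

        Console.WriteLine(found); // False
    }
}
```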

If we want to perform a search that is case insensitive, we can convert the string being searched to a known casing and write the search term in that same casing:
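One way this could look, assuming the same sample text and lowercase as the casing of choice:

```csharp
using System;

class Program
{
    static void Main()
    {
        // Assumed sample text.
        string str = "Follow the White Rabbit";

        // ToLower() returns a new lowercase string, and Contains() is then
        // called on that result; the search term is written in lowercase too.
        bool found = str.ToLower().Contains("white");

        Console.WriteLine(found); // True
    }
}
```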

Notice not only that the search returns True in this case, but also that we can chain string methods one after another, as in the case of ToLower() followed by Contains().

The next way of searching for a string inside another string is by using the IndexOf(), LastIndexOf(), StartsWith() and EndsWith() functions. Unlike Contains(), the first two do not return a boolean result of the search; instead, they return a numerical value indicating the position of the search string inside the searched string. (StartsWith() and EndsWith() do return a boolean, as we will see further down.) Let’s start with IndexOf():
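The searches discussed in the following paragraphs can be sketched as below. The contents of str are an assumption: "Follow the White Rabbit" is a string for which every index quoted in this article holds. The result variable names are only illustrative.

```csharp
using System;

class Program
{
    static void Main()
    {
        // Assumed sample text, consistent with the indexes quoted in the article.
        string str = "Follow the White Rabbit";

        int follow = str.IndexOf("Follow");    // match at the very beginning
        int white = str.IndexOf("White");      // match further inside the string
        int upper = str.IndexOf("WHITE");      // different casing, no match

        int firstO = str.IndexOf("o");         // first "o" in the string
        int fromTwo = str.IndexOf("o", 2);     // start searching at index 2
        int window = str.IndexOf("o", 2, 2);   // examine only indexes 2 and 3

        Console.WriteLine(follow);  // 0
        Console.WriteLine(white);   // 11
        Console.WriteLine(upper);   // -1
        Console.WriteLine(firstO);  // 1
        Console.WriteLine(fromTwo); // 4
        Console.WriteLine(window);  // -1
    }
}
```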

The first obvious thing is that the result is stored in an int variable. Like I said, this function returns a numeric value that represents the location of the found term. In the first usage, we searched our string for “Follow” and obtained 0 as the returned value. This is because “Follow” was found at the exact beginning of the phrase we are searching in. The numeric indexing always starts from 0, not 1! Also, if we are searching for more than a single character and an occurrence is found, the result is the index of the first letter of the search term inside the string being searched. In other words, when we searched for “Follow”, IndexOf() found an occurrence and returned the index of the first character of the search term, “F”, which is located at index 0 inside the str variable. The same goes for the second search, for “White”: when IndexOf() found an occurrence inside str, it returned the index of the first letter of the search term, “W”, which is located at index 11 inside str.

Just like Contains(), IndexOf() is case sensitive. This is why searching for “WHITE” returned -1, a value that always reads as “nothing found”. If we want to perform a case-insensitive search, we can use an overload of the IndexOf() function that accepts a StringComparison value, such as StringComparison.OrdinalIgnoreCase, which tells it to ignore casing.
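A short sketch of that overload, assuming the same sample text; StringComparison.OrdinalIgnoreCase is one of several comparison options that ignore casing:

```csharp
using System;

class Program
{
    static void Main()
    {
        // Assumed sample text.
        string str = "Follow the White Rabbit";

        // Default search: casing matters, so nothing is found.
        int sensitive = str.IndexOf("WHITE");

        // Overload taking a StringComparison: casing is ignored.
        int insensitive = str.IndexOf("WHITE", StringComparison.OrdinalIgnoreCase);

        Console.WriteLine(sensitive);   // -1
        Console.WriteLine(insensitive); // 11
    }
}
```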

If you pay attention to the last three searches, you will realize that our str variable contains two occurrences of “o”. How do we find the location of the second? If we perform the usual search, we will always get 1 as a result, which is expected. What we can do in this case is use an overload of the IndexOf() function that takes a number as a second parameter, indicating the index from which we want to start our search, also called an offset. When we used str.IndexOf("o", 2); we asked the method to find the first occurrence of the letter “o” inside str, BUT starting from position 2 inside it, which is the first character “l”. This returned 4, because the first “o” after that position is located at that index. So, even if we specify an offset for our search, we still get the position inside the whole str variable, not a position relative to the offset we specified! The offset is used only to state the index from which the search should start.
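The offset overload also makes it possible to walk through every occurrence, by restarting the search one position after each hit. This is only a sketch under the same assumption about the contents of str:

```csharp
using System;

class Program
{
    static void Main()
    {
        // Assumed sample text.
        string str = "Follow the White Rabbit";

        // Start with the first occurrence, then keep searching
        // from the position right after the previous match.
        int index = str.IndexOf("o");
        while (index != -1)
        {
            Console.WriteLine(index); // prints 1, then 4
            index = str.IndexOf("o", index + 1);
        }
    }
}
```

The reported values are still positions inside the whole string, which is why index + 1 can be fed straight back in as the next offset.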

Another overload of IndexOf() lets us also specify the number of characters that we want to examine, in addition to the offset index. This is done in the last search, where we look for “o” starting from index 2, but only within the two characters that begin at that index. Inside our str variable, index 2 is occupied by the first letter “l”, and that character counts as part of the examined range. The search therefore covers indexes 2 and 3 and stops just before the second “o” at index 4, so the result is -1, because there is no “o” in that interval.
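To see how the character count affects the result, we can compare a count of 2 with a count of 3. This sketch again assumes that str holds "Follow the White Rabbit":

```csharp
using System;

class Program
{
    static void Main()
    {
        // Assumed sample text.
        string str = "Follow the White Rabbit";

        // Examines indexes 2 and 3 ("ll"): no "o" in that range.
        int twoChars = str.IndexOf("o", 2, 2);

        // Examines indexes 2, 3 and 4 ("llo"): the second "o" is now included.
        int threeChars = str.IndexOf("o", 2, 3);

        Console.WriteLine(twoChars);   // -1
        Console.WriteLine(threeChars); // 4
    }
}
```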

LastIndexOf() works on the same principle as IndexOf(), but it always returns the position of the last occurrence of the search term within the searched string. Keep in mind that the index is the position of the first character of the search term. If you are searching for a string composed of multiple characters and want the position just after the match, you need to add the length of the search term itself. For instance:
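A sketch of such a search, assuming that str holds "Follow the White Rabbit" (the variable names start and after are only illustrative):

```csharp
using System;

class Program
{
    static void Main()
    {
        // Assumed sample text.
        string str = "Follow the White Rabbit";
        string term = "bb";

        // Index of the first character of the last occurrence of the term.
        int start = str.LastIndexOf(term);

        // Position right after the match: add the length of the term.
        int after = start + term.Length;

        Console.WriteLine(start);      // 19
        Console.WriteLine(after);      // 21
        Console.WriteLine(str[after]); // i
    }
}
```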

In the above example, we searched for the last occurrence of “bb”, which was found at index 19. This can be better visualized if we display the character array of the string:
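One simple way to display it is to print every index next to its character. The sample text is, as before, an assumption:

```csharp
using System;

class Program
{
    static void Main()
    {
        // Assumed sample text.
        string str = "Follow the White Rabbit";

        // Print each index together with the character stored there.
        char[] chars = str.ToCharArray();
        for (int i = 0; i < chars.Length; i++)
        {
            Console.WriteLine($"{i}: {chars[i]}");
        }
        // The lines for indexes 19, 20 and 21 read "19: b", "20: b" and "21: i".
    }
}
```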

When we wanted to get the position after the search term (after “bb”), we needed to add 2 to the found index (because we searched for 2 characters), which resulted in index 21, occupied by the letter “i”. As in the case of IndexOf(), we can also specify an offset and a character count; keep in mind that LastIndexOf() searches backward from that offset, toward the beginning of the string.

StartsWith() and EndsWith() check whether a string starts or ends with the string we are searching for. They return a boolean value:
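A minimal sketch of both methods, assuming the same sample text:

```csharp
using System;

class Program
{
    static void Main()
    {
        // Assumed sample text.
        string str = "Follow the White Rabbit";

        Console.WriteLine(str.StartsWith("Follow")); // True
        Console.WriteLine(str.EndsWith("Rabbit"));   // True

        // Like the other searches, the default comparison is case sensitive.
        Console.WriteLine(str.StartsWith("follow")); // False
    }
}
```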
