sub() and gsub() functions in R

Both the sub() and gsub() functions in R can replace strings with others, but how different are they? This article will show you the difference between them and how to use the functions effectively.

What are the sub() and gsub() functions in R?

The sub() function

The sub() function replaces only the first match of a substring in a string with another string.

Syntax: 

sub(target, replacement, string)

Parameters: 

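  • target: The substring (or regular expression) to be replaced. Base R names this argument pattern.
  • replacement: The substring to put in its place.
  • string: The string, or character vector, in which the replacement is made. Base R names this argument x.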

The gsub() function

The gsub() function finds and replaces all the matches of a substring in a string with another string. This makes it the function to use when every occurrence of a character or pattern has to change, not just the first one.

Syntax: 

gsub(target, replacement, string)

Parameters: 

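  • target: The substring (or regular expression) to be replaced. Base R names this argument pattern.
  • replacement: The substring to put in its place.
  • string: The string, or character vector, in which the replacement is made. Base R names this argument x.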
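
The two syntax descriptions above can be compared side by side in a minimal sketch. It calls both functions with the argument names that base R uses (pattern, replacement, x); the word "banana" and the variable names are only illustrative assumptions.

```r
# An illustrative input; any string works here
text <- "banana"

# sub() replaces only the first match of "an"
first_only <- sub(pattern = "an", replacement = "AN", x = text)

# gsub() replaces every match of "an"
every_match <- gsub(pattern = "an", replacement = "AN", x = text)

print(first_only)   # "bANana"
print(every_match)  # "bANANa"

# Neither function modifies its input; text is still "banana"
print(text)
```

Both functions return a new value, so the result has to be assigned to a variable to be kept.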

How to use the sub() and gsub() functions?

Replace a substring in a string

Take a simple example to compare the sub() and gsub() functions. In the example, we replace the lowercase character i with the uppercase character I. With the first string, we use the sub() function, and only the first i is replaced. With the second string, we use the gsub() function, and every i is replaced.

Code:

string1 = "This is a string replaced by the sub() function"
string2 = "This is a string replaced by the gsub() function"
 
# Replace only the first i with I using the sub() function
string1 = sub("i", "I", string1)
 
# Replace every i with I using the gsub() function
string2 = gsub("i", "I", string2)
 
print(string1)
print(string2)

Result:

[1] "ThIs is a string replaced by the sub() function"
[1] "ThIs Is a strIng replaced by the gsub() functIon"

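The target can also be longer than one character, and a string that contains no match is returned unchanged. The short sketch below illustrates both points; the sentence and the variable names are made up for illustration.

```r
sentence <- "This is a string"

# Replace the two-character substring "is"
first_only <- sub("is", "IS", sentence)
every_match <- gsub("is", "IS", sentence)

# "xyz" does not occur in the sentence, so nothing is replaced
no_match <- sub("xyz", "IS", sentence)

print(first_only)   # "ThIS is a string"
print(every_match)  # "ThIS IS a string"
print(no_match)     # "This is a string"
```
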
Replace a substring in a data frame

We can apply the sub() and gsub() functions to a data frame column to replace substrings. Each cell of the column is treated as a separate string. As a result, the sub() function replaces only the first match in each cell, while the gsub() function replaces every match in each cell. Look at the following example and see the difference.

Code:

index = c(0, 1, 2, 3)
char = c("this is", "a", "sample", "string")
 
df1 = data.frame(index, char)
df2 = data.frame(index, char)
 
# Replace only the first s in each cell with S using the sub() function
df1$char = sub("s", "S", df1$char)
 
# Replace every s in each cell with S using the gsub() function
df2$char = gsub("s", "S", df2$char)
 
print("The first data frame")
print(df1)
print("The second data frame")
print(df2)

Result:

[1] "The first data frame"
  index    char
1     0 thiS is
2     1       a
3     2  Sample
4     3  String
[1] "The second data frame"
  index    char
1     0 thiS iS
2     1       a
3     2  Sample
4     3  String
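
The example above changes a single column. When several text columns need the same replacement, one possible approach (a sketch under assumptions, not the only way) is to select the character columns and apply gsub() to each of them with lapply(). The data frame and its column names below are hypothetical.

```r
# A hypothetical data frame with one numeric and two text columns
df <- data.frame(
  index = c(0, 1),
  first = c("this is", "sample"),
  second = c("string", "class"),
  stringsAsFactors = FALSE
)

# Find the columns that hold character data
is_text <- vapply(df, is.character, logical(1))

# Replace every s with S in each of those columns
df[is_text] <- lapply(df[is_text], function(column) gsub("s", "S", column))

print(df)
```

The numeric index column is left untouched because it is not selected by is_text.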

Replace a set of characters with the gsub() function

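The target does not have to be a fixed substring. It is treated as a regular expression, so it can describe a whole set of characters. Both sub() and gsub() accept regular expressions, but gsub() is usually the better fit here because it replaces every match. In this example, we eliminate all digits from the string by using the pattern [0-9], which matches any single digit. The trimws() function then removes the spaces left at the end of the string.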

Code:

string = "Abraham Lincoln was born on February 12 1809"
 
# Eliminate all digits from the string
result = gsub('[0-9]', "", string)
 
print("The strings before and after are:")
print(string)
print(trimws(result))

Result:

[1] "The strings before and after are:"
[1] "Abraham Lincoln was born on February 12 1809"
[1] "Abraham Lincoln was born on February"

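Regular expressions are not limited to gsub(). As a minimal sketch, the snippet below reuses the same sentence with a slightly different pattern, " [0-9]+" (a space followed by one or more digits). The pattern and the variable names are assumptions chosen for illustration.

```r
sentence <- "Abraham Lincoln was born on February 12 1809"

# sub() removes only the first run of digits and the space before it
first_removed <- sub(" [0-9]+", "", sentence)

# gsub() removes every such run
all_removed <- gsub(" [0-9]+", "", sentence)

print(first_removed)  # "Abraham Lincoln was born on February 1809"
print(all_removed)    # "Abraham Lincoln was born on February"
```

If the target contains characters that are special in regular expressions, such as a period, both functions accept fixed = TRUE to match the target literally.
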
Summary

In summary, both the sub() and gsub() functions in R replace a substring with another, but the sub() function replaces only the first match, while the gsub() function replaces all the matches. Both functions accept regular expressions, so the target can be a pattern such as [0-9] rather than a fixed substring.
