To remove URLs from text in Python, we can use the
re.sub() function. Read on to see how it works, along with two other approaches.
Remove URLs from Text in Python
We have a text string and a URL inside the string.
myString = ''' Text1 https://google.com '''
To remove URLs from text, use one of the following approaches:
Use the findall() function
You can use the findall() function to search for URLs and then delete those URLs with the replace() function. Note that the findall() function is in the re module, so you need to import re before calling findall().
re.findall(regex, string)
- regex: regular expression to search for (here, a pattern that matches URLs).
- string: string you want the regular expression to search for.
The findall() function returns a list containing every match of the pattern in the string. If there are no matches, it returns an empty list.
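As a quick illustration of that return value, here is a minimal sketch that calls re.findall() on one string with URLs and one without. The pattern and the sample strings (example.com, example.org) are assumptions made up for this example, and the pattern is deliberately simple rather than a complete URL matcher.

```python
import re

# Assumed pattern for this sketch: "http://" or "https://" followed by non-whitespace
URL_PATTERN = r"https?://\S+"

with_urls = "Docs at https://example.com/docs and http://example.org"
without_urls = "No links here"

found = re.findall(URL_PATTERN, with_urls)
missing = re.findall(URL_PATTERN, without_urls)

print(found)    # ['https://example.com/docs', 'http://example.org']
print(missing)  # []
```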
- Import the re module.
- Create a string with the URL.
- Use the findall() function to find the URLs in the string.
- Use the replace() function to replace each URL with an empty string, which removes it from the text.
import re

# String containing URL
myString = "This is a string with a URL https://learnshareit.com/"

# Use findall() function to search for URLs (the raw string avoids invalid-escape warnings)
search = re.findall(r'http://\S+|https://\S+', myString)

# Start from the original string so every URL found is removed, not only the last one
text = myString
for url in search:
    # Remove that URL with replace() function
    text = text.replace(url, '')

print('String after removing URL:', text)
String after removing URL: This is a string with a URL
Use the re.sub() function
The re module has many functions for working with regular expressions, and one of the most useful is re.sub().
The re.sub() function replaces every match of the pattern in the string with the replacement you pass in and returns the modified string.
re.sub(pattern, replace, string, count)
- pattern: the regular expression to match.
- replace: the replacement for each part of the string that matches the pattern.
- string: the string to search.
- count: the maximum number of replacements. If omitted, it defaults to 0, which means every match is replaced.
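To make the effect of count concrete, the short sketch below runs the same substitution with and without it. The sample text and the pattern are assumptions for illustration only.

```python
import re

text = "a https://one.example b https://two.example c"

# count omitted (defaults to 0): every match is replaced
all_removed = re.sub(r"https?://\S+", "", text)

# count=1: only the first match is replaced
first_removed = re.sub(r"https?://\S+", "", text, count=1)

print(repr(all_removed))
print(repr(first_removed))
```

Note that the spaces on either side of a removed URL stay in the result, so you may want to tidy the whitespace afterwards.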
- Import re module
- Create a string with a URL
- Use the re.sub() function to remove those URLs.
import re

# String of URL
myString = ''' Text1 https://google.com '''

# Use the re.sub function to remove URL from the string
text = re.sub(r"\S*https?:\S*", "", myString)

print('String after removing URL:', text)
String after removing URL: Text1
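If you need this in several places, one option is to wrap the substitution in a small helper. The sketch below is only one possible shape: the name remove_urls and the pattern are assumptions, not part of the re module, and the whitespace cleanup also collapses line breaks, which may not suit every text.

```python
import re

# Assumed pattern: "http://" or "https://" followed by non-whitespace characters
_URL_RE = re.compile(r"https?://\S+")

def remove_urls(text: str) -> str:
    """Return text with http(s) URLs removed and runs of whitespace collapsed."""
    without_urls = _URL_RE.sub("", text)
    # split() with no arguments drops leading, trailing, and repeated whitespace
    return " ".join(without_urls.split())

print(remove_urls(" Text1 https://google.com "))
print(remove_urls("See https://a.example and http://b.example for details"))
```

Compiling the pattern once with re.compile() keeps the helper readable and avoids repeating the pattern string at each call site.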
Use the urllib module
You can use the urlparse() function from the urllib.parse module, whose result has a scheme attribute, together with the split() function to remove the URL from the string.
- The urllib.parse module provides the urlparse() function, which helps with URL parsing.
- Use the scheme attribute of the parsed result to check whether a string looks like a URL. It is an empty string when no scheme such as http or https is present.
- To remove the URL: use the split() function to split the string into a list of words, then check the scheme attribute of each word to see whether it is a URL.
- Finally, use the join() function to join the remaining elements.
from urllib.parse import urlparse

# String containing URL
myString = "This is a text with a URL https://learnshareit.com/"

# Search and delete URL
search = [word for word in myString.split() if not urlparse(word).scheme]

# Merge string after removing URL
text = ' '.join(search)

print('String after removing URL:', text)
String after removing URL: This is a text with a URL
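One caveat with checking only the scheme attribute: urlparse() also reports a scheme for ordinary words that contain a colon, so those words get dropped too. The sketch below compares the scheme-only check with a stricter one. The helper name looks_like_web_url and the sample string are assumptions for illustration.

```python
from urllib.parse import urlparse

def looks_like_web_url(token: str) -> bool:
    """Hypothetical helper: True only for http(s) tokens that include a host."""
    try:
        parts = urlparse(token)
    except ValueError:
        # urlparse() can raise for malformed input such as a broken IPv6 host
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)

my_string = "Note:important text with a URL https://learnshareit.com/"
tokens = my_string.split()

# Scheme-only check: "Note:important" is dropped because "note" parses as a scheme
scheme_only = " ".join(t for t in tokens if not urlparse(t).scheme)

# Stricter check: only real http(s) URLs are dropped
strict = " ".join(t for t in tokens if not looks_like_web_url(t))

print(scheme_only)
print(strict)
```

Whether the stricter check is worth it depends on your input. For text that never contains colons outside URLs, the simpler version behaves the same.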
If you have any questions about how to remove URLs from Text in Python, leave a comment below. I will answer your questions. Thank you for reading!
My name is Jason Wilson, you can call me Jason. My major is information technology, and I am proficient in C++, Python, and Java. I hope my writings are useful to you while you study programming languages.
Name of the university: HHAU
Programming Languages: C++, Python, Java