When we use Python to crawl data from web pages, the returned results include HTML tags. If we don't want those tags to appear in our strings, we need to know how to remove the HTML tags from a string in Python. Today, we will show you how to deal with this challenge. Read on for more details.
Remove the HTML tags from a String in Python – How can we do it?
The first method below relies on regular expressions, so basic regex knowledge helps. If you don't have it yet, don't worry, we'll show you what you need.
Here are three methods to remove the HTML tags from a String in Python.
Using the sub() and compile() functions from the re module
re.sub() syntax:
re.sub(pattern, repl, string, count=0, flags=0)
re.compile() syntax:
re.compile(pattern, flags=0)
Parameters:
- pattern (required): the regex to search for in the string, given as a string or a compiled pattern object.
- repl (required): the string that replaces each match of the pattern.
- string (required): the string to search.
- count (optional): the maximum number of matches to replace. The default, 0, replaces all of them.
- flags (optional): regex flags such as re.IGNORECASE or re.DOTALL.
Before using these functions, we have to import the re module with a line of code like this:
import re
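As a small illustration of the count and flags parameters listed above, here is a minimal sketch. The sample string and the variable names are made up for this example and do not come from the rest of the article.

```python
import re

text = "<B>one</B> <b>two</b>"

# count=1 replaces only the first match; the default count=0 replaces all.
first_only = re.sub(r"<b>", "", text, count=1, flags=re.IGNORECASE)
print(first_only)   # one</B> <b>two</b>

# re.IGNORECASE lets the lowercase pattern match <B> as well as <b>.
all_open = re.sub(r"<b>", "", text, flags=re.IGNORECASE)
print(all_open)     # one</B> two</b>
```

The closing tags stay in place because the pattern here only describes the opening tag.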
The idea here is to use the compile() function to create a regex object, then pass that object to the sub() function to search the HTML string. The sub() function finds every match of the pattern and replaces it with the empty string, so only the text between the tags remains.
import re

htmlStr = '<h2>learn Python programming at learnshareit.com</h2>'

# Remove everything starting with '<' and ending with '>'
pattern = re.compile('<.*?>')
myStr = re.sub(pattern, '', htmlStr)
print(myStr)
Output:
learn Python programming at learnshareit.com
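If the same pattern is applied to many strings, it may be convenient to wrap it in a small function. The sketch below is one possible variation, not the only way to do it: the names TAG_PATTERN and strip_tags are hypothetical, and the pattern is changed to <[^>]*> because the dot in '<.*?>' does not match a newline by default, so a tag split across lines would be left in place.

```python
import re

# Hypothetical helper; the pattern is compiled once and reused.
# [^>]* matches any character except '>', including newlines,
# so a tag whose attributes span several lines is still removed.
TAG_PATTERN = re.compile(r"<[^>]*>")


def strip_tags(html_str):
    """Return html_str with everything between '<' and '>' removed."""
    return TAG_PATTERN.sub("", html_str)


print(strip_tags("<p>Hello <b>world</b></p>"))      # Hello world
print(strip_tags('<a\n   href="/docs">docs</a>'))   # docs
```

Keep in mind that a regex does not understand HTML structure, so unusual input such as a '>' inside an attribute value can still leave fragments behind.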
Using BeautifulSoup() function in BeautifulSoup library
BeautifulSoup is a Python library for pulling data from HTML and XML files. It works with parsers that give you ways to navigate, search, and edit within the parse tree.
Syntax:
soup = BeautifulSoup(html, parser)
Parameters:
- html: a given HTML string.
- parser: the name of the parser to use, such as "html.parser" or "lxml".
BeautifulSoup uses a parser to turn an HTML structure into a soup object.
Before using it, we must import BeautifulSoup with a line of code like this:
from bs4 import BeautifulSoup
Once imported, we use the lxml parser to initialize a soup object from the HTML string, then read its text property to get the content without the HTML tags. Like this:
from bs4 import BeautifulSoup

htmlStr = '<h2>learn Python programming at learnshareit.com</h2>'

myStr = BeautifulSoup(htmlStr, "lxml").text
print(myStr)
Notice: If lxml is not installed, install it from the command line with pip install lxml before using it. Beautiful Soup itself is installed with pip install beautifulsoup4.
Output:
learn Python programming at learnshareit.com
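When the HTML contains several nested elements, the text pieces are joined directly, which can glue words together. The sketch below shows one way to handle that with the get_text() method and its separator and strip arguments. It assumes the "html.parser" parser that ships with Python instead of lxml, and the sample HTML is made up for this example.

```python
from bs4 import BeautifulSoup

html_str = "<div><h2>Title</h2><p>Some <b>bold</b> text</p></div>"

# "html.parser" is Python's built-in parser, so lxml is not required here.
soup = BeautifulSoup(html_str, "html.parser")

# Without a separator, adjacent text pieces are joined directly.
plain = soup.get_text()
print(plain)     # TitleSome bold text

# separator puts a string between pieces; strip=True trims each piece.
spaced = soup.get_text(separator=" ", strip=True)
print(spaced)    # Title Some bold text
```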
Using xml.etree.ElementTree library
ElementTree is an XML parsing and manipulation library that ships with Python. We use the fromstring() function and the itertext() function from this library to solve the problem.
fromstring() syntax:
fromstring(text, parser=None)
Parameters:
- text: a string that contains XML data. An HTML string works only if it is also well-formed XML.
- parser (optional): an XMLParser instance. If omitted, the standard XMLParser is used.
itertext() syntax:
itertext()
Description:
Creates an iterator that walks over the element calling this method and all of its sub-elements, in document order, and returns all the text inside.
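To make the behaviour of itertext() concrete, here is a minimal sketch with a made-up sample string. It shows that the iterator yields the text before, inside, and after a nested element as separate pieces.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<p>Hello <b>big</b> world</p>")

# itertext() yields each piece of text in document order.
pieces = list(root.itertext())
print(pieces)            # ['Hello ', 'big', ' world']

# Joining the pieces gives the text without any tags.
print("".join(pieces))   # Hello big world
```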
We will use fromstring() to create a root element from the HTML string, then use the itertext() function to get all the text inside that root element. Finally, we use join() to build a string from the result of the itertext() function. Like this:
import xml.etree.ElementTree as ET

htmlStr = '<h2>learn Python programming at learnshareit.com</h2>'

# Create root element
root_el = ET.fromstring(htmlStr)  # root_el = h2 element

# Convert to a string
myStr = ''.join(root_el.itertext())
print(myStr)
Notice: don't forget to import xml.etree.ElementTree before using it.
Output:
learn Python programming at learnshareit.com
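Because ElementTree expects well-formed XML, real-world HTML such as an unclosed <br> tag makes fromstring() raise a ParseError. The sketch below is one possible way to guard against that. The function name strip_tags_xml and the choice to return None on failure are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def strip_tags_xml(markup):
    """Return the text of well-formed markup, or None if it cannot be parsed."""
    try:
        root = ET.fromstring(markup)
    except ET.ParseError:
        # Raised for input that is not well-formed XML,
        # for example an unclosed <br> tag.
        return None
    return "".join(root.itertext())


print(strip_tags_xml("<h2>learn <i>Python</i></h2>"))  # learn Python
print(strip_tags_xml("<p>line one<br>line two</p>"))   # None
```

When None comes back, one of the other two methods can be tried instead.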
Summary
You have discovered three ways to remove the HTML tags from a string in Python. If you don't want to install anything extra, use the first method, which needs only the standard library. The third method also needs no installation, but it only accepts well-formed markup. On the other hand, if you want something quick and convenient, use the second method. We hope this article helps you.
Have a beautiful day!

Hi, I'm Cora Lopez. I have a passion for teaching programming languages such as Python, Java, PHP, and JavaScript. I'm creating a free online Python course. I hope this helps you in your learning journey.
Name of the university: HCMUE
Major: IT
Programming Languages: HTML/CSS/Javascript, PHP/sql/laravel, Python, Java