How To Split A String Into Words And Punctuation In Python

There are several ways to split a string into words and punctuation in Python. This article covers three of them: the re.findall() function, the re.split() function, and a list comprehension.

Split a string into words and punctuation in Python

Use the re.findall function

Syntax:

re.findall(regex, string)

Parameters:

  • regex: the regular expression pattern to match.
  • string: the string in which to search for matches.

The findall() function returns a list of all non-overlapping matches of the pattern in the string, in the order they are found. If there are no matches, it returns an empty list.
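As a minimal sketch of the return value described above, the snippet below calls findall() once on a string with no punctuation and once on a string with two punctuation marks. The variable names are chosen for illustration only.

```python
import re

# No punctuation in the input, so the pattern has nothing to match
no_match = re.findall(r'[^\s\w]+', 'learnshareit')
print(no_match)  # []

# Two punctuation marks are found and returned in left-to-right order
matches = re.findall(r'[^\s\w]+', 'visit?learnshareit!')
print(matches)  # ['?', '!']
```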

Example:

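  • Import the re module.
  • Use the findall() function to split the string into words and punctuation marks.
  • In the pattern, \w+ matches a run of word characters, and [^\s\w]+ matches a run of characters that are neither whitespace nor word characters, that is, punctuation.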
import re

test_str = 'visit?learnshareit!website'

# Use the findall() function to collect words and punctuation runs
result = re.findall(r'\w+|[^\s\w]+', test_str)
print('String after splitting into words and punctuation marks:', result)

Output:

String after splitting into words and punctuation marks: ['visit', '?', 'learnshareit', '!', 'website']
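
The same pattern can also be applied to a sentence that contains spaces. The sketch below uses a made-up sentence; whitespace matches neither alternative, so it is skipped, and consecutive punctuation such as '...' is returned as one item.

```python
import re

# Hypothetical sample sentence containing spaces and repeated punctuation
sentence = 'Hello, world! Visit learnshareit...'

# \w+ picks up words; [^\s\w]+ picks up runs of punctuation
tokens = re.findall(r'\w+|[^\s\w]+', sentence)
print(tokens)
# ['Hello', ',', 'world', '!', 'Visit', 'learnshareit', '...']
```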

Use the re.split() function

Syntax:

re.split(RegEx, string, maxsplit)

Parameters:

  • RegEx: the regular expression pattern that matches the delimiters.
  • string: the string you want to split.
  • maxsplit: the maximum number of splits. It defaults to 0, which means there is no limit on the number of splits.

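The re.split() function splits a string wherever the pattern matches. If the pattern contains a capturing group (parentheses), the matched delimiters are also included in the returned list, so you can use it to separate punctuation marks from words.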
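
To illustrate how a capturing group and the maxsplit parameter change the result, here is a small sketch. The variable names are illustrative and not part of the original example.

```python
import re

text = 'vi,sit?learnshareit!website'

# Without a capturing group, the delimiters are dropped
words_only = re.split(r'\W+', text)
print(words_only)  # ['vi', 'sit', 'learnshareit', 'website']

# With maxsplit=1, only the first delimiter is used for splitting;
# the capturing group keeps that delimiter in the result
first_only = re.split(r'(\W+)', text, maxsplit=1)
print(first_only)  # ['vi', ',', 'sit?learnshareit!website']
```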

Example:

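  • Import the re module.
  • Use the re.split() function to split the string into words and punctuation marks.
  • The pattern (\W+) matches runs of non-word characters, which act as the delimiters. The parentheses form a capturing group, so the delimiters are kept in the result.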
import re

test_str = 'vi,sit?learnshareit!website'

# Use the re.split() function to split the string into words and punctuation marks.
# A raw string (r"...") avoids an invalid escape sequence warning for \W.
result = re.split(r"(\W+)", test_str)
print('String after splitting into words and punctuation marks:', result)

Output:

String after splitting into words and punctuation marks: ['vi', ',', 'sit', '?', 'learnshareit', '!', 'website']
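
One thing to watch for: when the string starts or ends with punctuation, re.split() returns empty strings at the edges of the list. The sketch below, using a made-up input, shows this and filters the empty strings out with a list comprehension.

```python
import re

# Hypothetical input that begins and ends with punctuation
text = '!visit?learnshareit!'

parts = re.split(r'(\W+)', text)
print(parts)
# ['', '!', 'visit', '?', 'learnshareit', '!', '']

# Drop the empty strings produced at the start and end
tokens = [p for p in parts if p]
print(tokens)
# ['!', 'visit', '?', 'learnshareit', '!']
```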

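Or you can use the following RegEx, which gives the same result for this string. Note that it matches one character at a time and treats the underscore as a delimiter, so its output can differ from (\W+) on other inputs.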

Example:

import re

test_str = 'vi,sit?learnshareit!website'

# Use the re.split() function to split the string into words and punctuation marks
result = re.split(r'([^a-zA-Z0-9])', test_str)
print('String after splitting into words and punctuation marks:', result)

Output:

String after splitting into words and punctuation marks: ['vi', ',', 'sit', '?', 'learnshareit', '!', 'website']
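
The two patterns agree on the example above but not on every input. As a rough comparison on a made-up string with consecutive punctuation and an underscore:

```python
import re

# Hypothetical input with repeated punctuation and an underscore
text = 'wait...what?snake_case'

# (\W+) treats '...' as one delimiter and keeps '_' inside the word
greedy = re.split(r'(\W+)', text)
print(greedy)
# ['wait', '...', 'what', '?', 'snake_case']

# ([^a-zA-Z0-9]) splits on every single character, including '_',
# which leaves empty strings between adjacent delimiters
single = re.split(r'([^a-zA-Z0-9])', text)
print(single)
# ['wait', '.', '', '.', '', '.', 'what', '?', 'snake', '_', 'case']
```

Which behavior is preferable depends on whether repeated punctuation should stay together and whether underscores count as part of a word.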

Use list comprehension

Example:

  • Split the string on a known delimiter with the split() method.
  • Use a list comprehension to put the delimiter back between the pieces.
test_str = 'visit,learnshareit,website'

# Pair every piece with the delimiter, then drop the extra delimiter
# that the comprehension appends after the last piece
result = [u for x in test_str.split(',') for u in (x, ',')][:-1]
print('String after splitting into words and punctuation marks:')
print(result)

Output:

String after splitting into words and punctuation marks:
['visit', ',', 'learnshareit', ',', 'website']

The list comprehension separates the punctuation from the words. This approach works only when the delimiter is a single string that is known in advance.

Summary

Those are three ways to split a string into words and punctuation in Python. If the delimiter is a single known character, the split() method with a list comprehension may be enough. Otherwise, a regular expression with re.findall() or re.split() is the more general choice.
