How To Split String By Space And Preserve Quoted Strings In Python

Python’s str.split() method can easily split a simple string into a list. However, plain whitespace splitting is not always enough: how can we split a string on spaces while keeping quoted substrings, which may themselves contain spaces, in one piece? So, today we’ll show you three easy ways to split a string by space and preserve quoted strings in Python.

Split string by space and preserve quoted strings in Python

Splitting a string by space and maintaining quoted substrings can help us organize and work with our data more effectively. Let’s see how we can do this with Python.
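
To see why a plain split is not enough, here is a minimal sketch of what str.split() does with the sample string used in the examples of this article. The variable names are only illustrative.

```python
text = 'Learn Share "IT knowledge"'

# str.split() only looks at whitespace, so the quoted phrase is cut in two
naive_tokens = text.split()
print(naive_tokens)
```

The quoted phrase comes back as two separate items, '"IT' and 'knowledge"', which is the problem the methods in this article solve.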

Using shlex.split() function

The shlex module provides a function, shlex.split(), that we can use to split strings by space and preserve quoted strings.

Syntax:

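shlex.split(s, comments=False, posix=True)

Parameters:

  • s: The string to be split.
  • comments: If False (the default), comment parsing is disabled and a # is treated like any other character. If True, everything from an unquoted # to the end of the line is dropped.
  • posix: If True (the default), the string is split in POSIX mode and the quote characters are removed from quoted substrings. If False, the quote characters are kept in the result. In both modes, a quoted substring stays together as a single item.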

To use the shlex module, we first need to import it:

import shlex

After that, we pass the string to shlex.split(). With the default settings, each quoted substring is kept together as one item and its surrounding quote characters are removed. Like this:

# Import the required module
import shlex

string = 'Learn Share "IT knowledge"'

# Split the string and preserve quoted strings using shlex.split()
print(shlex.split(string))

Output:

['Learn', 'Share', 'IT knowledge']

To keep the quote characters in the output as well, set the posix option to False. Like this:

# Import the required module
import shlex

string = 'Learn Share "IT knowledge"'

# Split the string and keep quoted strings using shlex.split()
print(shlex.split(string, posix=False))

Output:

['Learn', 'Share', '"IT knowledge"']
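
The comments parameter and the handling of unbalanced quotes are easier to see in action. The following is a small sketch using only shlex.split(). The sample strings and variable names are made up for illustration.

```python
import shlex

line = 'Learn Share "IT knowledge" # trailing note'

# Default: comment parsing is off, so '#' and the words after it are ordinary tokens
with_note = shlex.split(line)

# comments=True drops everything from the unquoted '#' to the end of the line
without_note = shlex.split(line, comments=True)

print(with_note)
print(without_note)

# A quote that is never closed raises ValueError, so guard input you do not control
try:
    shlex.split('Learn "IT knowledge')
    unbalanced_ok = True
except ValueError:
    unbalanced_ok = False

print(unbalanced_ok)
```

The first call keeps '#', 'trailing', and 'note' as separate items, the second call stops at the comment marker, and the last check prints False because the closing quote is missing.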

Using csv.reader() function

The csv.reader() function is part of the csv module and is used to read CSV data in Python. Besides file objects, it accepts any iterable of strings, which is what lets us use it to split plain strings.

Syntax:

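csv.reader(csvfile, dialect='excel', **fmtparams)

Parameters:

  • csvfile: any iterable that yields one string per iteration, such as a file object returned by open() or a list of strings.
  • dialect: the set of formatting rules to use ('excel' by default).
  • fmtparams: optional keyword arguments, such as delimiter and quotechar, that override individual settings of the dialect.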

In this case, we will use the csv.reader() function to split the strings within a list while preserving quoted strings. To use the csv.reader() function, we first need to import the csv module:

import csv

Then, using the csv.reader() function, we can create a CSV reader object with the delimiter option set to a space, and use a for loop to iterate over the rows it returns. Each row is a list of the items found in one input string. As an example:

# Import the required module
import csv

strings = ['Learn Share "IT knowledge"', 'Learn "Python programming"']

# Use csv.reader() to split the strings within a list and preserve quoted strings
readerObj = csv.reader(strings, delimiter=' ')

# Use a for loop to iterate over the rows produced by the reader
for item in readerObj:
    print(item)

Output:

['Learn', 'Share', 'IT knowledge']
['Learn', 'Python programming']
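
If you only have a single string, it can be wrapped in a list before it is handed to csv.reader(). The sketch below assumes a hypothetical helper named split_with_csv(). It also drops the empty items that csv.reader() produces when two spaces appear in a row, which may or may not be what you want for your data.

```python
import csv

def split_with_csv(text):
    """Split one string on spaces with csv.reader (hypothetical helper)."""
    # csv.reader expects an iterable of lines, so wrap the string in a list
    row = next(csv.reader([text], delimiter=' '))
    # Two spaces in a row produce an empty field; drop those here
    return [field for field in row if field]

print(split_with_csv('Learn Share "IT knowledge"'))
print(split_with_csv('Learn  Share "IT knowledge"'))
```

Both calls print ['Learn', 'Share', 'IT knowledge']. Note that filtering out empty items also removes a deliberately empty quoted field such as "", so skip that step if empty values matter.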

Using re.findall() function

The re (regular expression) module includes a function called findall() that we can use to split strings by space while preserving quoted strings.

To use the findall() function, we first need to import the re module:

import re

Then we’ll use the findall() function with the following regex: [^"\s]\S*|".+?"

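This regex has two alternatives. The first, [^"\s]\S*, matches a run of non-whitespace characters that does not start with a double quote. The second, ".+?", matches a double-quoted substring, including the spaces inside it and the quotes themselves. For a better understanding, consider the example below.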

# Import the required module
import re

string = 'Learn Share "IT knowledge"'

# Use findall() with the regex to split the string and keep quoted strings intact
print(re.findall(r'[^"\s]\S*|".+?"', string))

Output:

['Learn', 'Share', '"IT knowledge"']
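
If you would rather get the quoted text without its quote characters, one option is to capture the inside of the quotes in a group. The pattern and the helper name split_strip_quotes() below are assumptions made for this sketch, not part of the re module, and the pattern only handles double quotes without escaped quotes inside them.

```python
import re

# Either a double-quoted run (captured without its quotes) or a run of non-spaces
TOKEN_PATTERN = re.compile(r'"([^"]*)"|(\S+)')

def split_strip_quotes(text):
    """Split on spaces, keeping quoted phrases whole and dropping their quotes."""
    tokens = []
    for match in TOKEN_PATTERN.finditer(text):
        quoted, bare = match.groups()
        # Exactly one of the two groups matched; prefer the quoted one when present
        tokens.append(quoted if quoted is not None else bare)
    return tokens

print(split_strip_quotes('Learn Share "IT knowledge"'))
```

This prints ['Learn', 'Share', 'IT knowledge'], the same result as shlex.split() with its default settings for this input.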

Summary

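We have shown you three simple ways to split a string by space and preserve quoted strings in Python. shlex.split() and csv.reader() return the quoted text without its quote characters, while shlex.split() with posix=False and the re.findall() regex keep the quotes, so choose the method based on the output you need.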

Have a great day!
