How to remove duplicate words from a given string using regular expressions in Python?
What we’ll do:
Welcome to this blog. In this post we will write a Python program that uses a regular expression to remove duplicate words from a given string, that is, the same word repeated two or more times in a row.
Intro to Regex:
A regular expression (regex or regexp for short) is a powerful tool for searching and manipulating text strings, particularly when processing text files. One line of regex can often replace several dozen lines of hand-written code.
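As a small illustration of searching and manipulating text, here is a minimal sketch using two functions from Python's re module, re.findall() and re.sub(). The sample string and variable names are made up for this example.

```python
import re

# Hypothetical sample text, made up for this illustration
text = "Order 66 shipped on 2021-03-04, order 67 on 2021-03-09"

# Searching: collect every date written as YYYY-MM-DD
dates = re.findall(r"\d{4}-\d{2}-\d{2}", text)
print(dates)

# Manipulating: replace every run of digits with a "#" placeholder
masked = re.sub(r"\d+", "#", text)
print(masked)
```

Each task takes a single pattern, where a hand-written loop would have to track digit runs and separators character by character.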
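The regular expression we will use, written as a Python raw string, is:
regex = r'\b(\w+)(?:\W+\1\b)+'
The parts of this regular expression are:
- “\b” matches a word boundary. The boundaries make sure that only whole words are compared. For example, in “My thesis is great”, the “is” at the end of “thesis” is not treated as a repeat of the following word “is”.
- “\w” matches a word character, i.e. [a-zA-Z0-9_] for ASCII text (in Python 3 it also matches Unicode letters and digits by default), so “(\w+)” captures one whole word as group 1.
- “\W+” matches one or more non-word characters, i.e. [^\w], such as the spaces or punctuation between words.
- “\1” matches whatever was matched by the 1st group of parentheses, which in our case is (\w+).
- “(?:...)” is a non-capturing group: it groups “\W+\1\b” so that it can be repeated, without creating a new numbered group.
- “+” matches whatever it is placed after one or more times.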
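To see why the word boundaries matter, here is a small sketch that runs the pattern with and without them on the example sentence from the list. The variable names are illustrative only, and the second pattern is a deliberately weakened variant, not something the program uses.

```python
import re

# The pattern from this post, as a Python raw string
with_boundaries = r"\b(\w+)(?:\W+\1\b)+"
# Weakened variant with the \b anchors removed (for comparison only)
without_boundaries = r"(\w+)(?:\W+\1)+"

sentence = "My thesis is great"

kept = re.sub(with_boundaries, r"\1", sentence)
broken = re.sub(without_boundaries, r"\1", sentence)

print(kept)    # My thesis is great
print(broken)  # My thesis great
```

Without the boundaries, the “is” at the end of “thesis” is paired with the following word “is”, so a word that was never duplicated gets removed.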
First of all, we import re, the regular expression module. Then we write a function named “removeDuplicates()” which takes the input string as an argument.
Inside the function we define the regex that matches repeated words and pass it to the “sub()” function, which is provided by the “re” module. Along with the regex we pass the replacement “\1” (the first occurrence of the word, which is the part we want to keep), the input string, and the flag re.IGNORECASE, so that repeats differing only in case are also matched.
Then we return the modified sentence, in which the duplicate words have been removed.
Next, we pass each input string to “removeDuplicates()” as a parameter.
Finally, we print the sentence, which no longer has duplicated words in it.
import re  # Regular expression (regex) module

def removeDuplicates(input):
    # Regex matching a word followed by one or more repeats of itself
    regex = r'\b(\w+)(?:\W+\1\b)+'
    # Replace each run of repeated words with its first occurrence
    return re.sub(regex, r'\1', input, flags=re.IGNORECASE)

# Test Case: 1
str1 = "How are are you"
print(removeDuplicates(str1))
# Test Case: 2
str2 = "Guvi is the the best platform to learn"
print(removeDuplicates(str2))
# Test Case: 3
str3 = "Programming is fun fun"
print(removeDuplicates(str3))
Output:
How are you
Guvi is the best platform to learn
Programming is fun
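A few further cases can help show what the pattern does and does not treat as a duplicate. The sketch below repeats the same function so that it runs on its own; the extra input strings are hypothetical and not part of the original test cases.

```python
import re

def removeDuplicates(input):
    # Same pattern and re.sub() call as in the program above
    regex = r'\b(\w+)(?:\W+\1\b)+'
    return re.sub(regex, r'\1', input, flags=re.IGNORECASE)

# Hypothetical extra inputs, made up for this illustration
print(removeDuplicates("This is is is a test"))   # three repeats in a row
print(removeDuplicates("The the cat sat"))        # repeat differing only in case
print(removeDuplicates("hello, hello world"))     # punctuation between the repeats
print(removeDuplicates("the cat and the dog"))    # repeats that are not adjacent
```

The first three inputs collapse to a single occurrence of the repeated word (the comma in the third one is removed along with the repeat), while the last one is left unchanged, because the pattern only matches repeats that directly follow each other.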
What we learnt:
Through this blog, you learnt what regular expressions (regex) are and how to remove duplicate words from a given string using a regular expression in Python.