How to remove duplicate words from a given string using a regular expression in Python?

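regex = r'\b(\w+)(?:\W+\1\b)+'
  • “\b” means a word boundary. Boundaries are needed for special cases. For example, in “My thesis is great”, the “is” at the end of “thesis” won't be paired with the following “is” and treated as a repeated word.
  • “\w” means a word character; for ASCII text, that is [a-zA-Z0-9_]
  • “\W+” means one or more non-word characters; “\W” is [^\w]
  • “\1” matches whatever was matched by the 1st group of parentheses, which in our case is (\w+)
  • “(?: … )” is a non-capturing group: it groups “\W+\1\b” so the final “+” can repeat it, without creating a second numbered group
  • “+” matches whatever it is placed after 1 or more times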
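
To see why the word boundaries matter, here is a minimal sketch that compares the pattern with a variant that has the boundaries stripped out. The variable names and the second sentence ("the theory") are made up for illustration; only the first pattern comes from this post.

```python
import re

# Pattern from the post, plus an illustrative variant without the \b anchors
with_boundaries = r'\b(\w+)(?:\W+\1\b)+'
without_boundaries = r'(\w+)(?:\W+\1)+'

text = "My thesis is great"

# With boundaries, the "is" inside "thesis" cannot start a match
print(re.sub(with_boundaries, r'\1', text))     # My thesis is great

# Without boundaries, the tail of "thesis" pairs with the next "is"
print(re.sub(without_boundaries, r'\1', text))  # My thesis great

# The closing \b matters too: "the" must not pair with the start of "theory"
print(re.sub(with_boundaries, r'\1', "the theory"))     # the theory
print(re.sub(without_boundaries, r'\1', "the theory"))  # theory
```

Both boundaries do a job: the opening one stops a match from starting in the middle of a word, and the closing one stops the repeat from ending in the middle of a word.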

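What’s happening: the pattern finds a word followed by one or more repeats of itself, and re.sub() replaces the whole run with the single word captured in group 1.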
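
One way to watch this happen is to inspect the match object before substituting. This is only a sketch: the names `pattern`, `text`, and `match` are chosen here for illustration, and the sample sentence is the first test case from this post.

```python
import re

# Same regex as in this post, compiled once with case-insensitive matching
pattern = re.compile(r'\b(\w+)(?:\W+\1\b)+', flags=re.IGNORECASE)

text = "How are are you"
match = pattern.search(text)

print(match.group(0))            # are are  (the whole run of repeats)
print(match.group(1))            # are      (the word captured by group 1)
print(match.span())              # (4, 11)  (where the run sits in the text)
print(pattern.sub(r'\1', text))  # How are you
```

The replacement string r'\1' refers to group 1, so the whole run "are are" is swapped for a single "are".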

Source Code:

import re  # Python's regular expression (regex) module

def removeDuplicates(text):
    # Regex that matches a word followed by one or more repeats of itself
    regex = r'\b(\w+)(?:\W+\1\b)+'

    # Replace each run of repeated words with a single occurrence
    return re.sub(regex, r'\1', text, flags=re.IGNORECASE)

# Test Case: 1
str1 = "How are are you"
print(removeDuplicates(str1))
# Test Case: 2
str2 = "Guvi is the the best platform to learn"
print(removeDuplicates(str2))
# Test Case: 3
str3 = "Programming is fun fun"
print(removeDuplicates(str3))


Output:

How are you
Guvi is the best platform to learn
Programming is fun
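
A few more inputs help show what this regex does and does not cover. The sketch below uses a hypothetical helper name, `remove_repeated_words`, and made-up sentences; the pattern and the IGNORECASE flag are the ones from the source code above.

```python
import re

def remove_repeated_words(text):
    # Hypothetical helper; same pattern and flag as in the source code above
    return re.sub(r'\b(\w+)(?:\W+\1\b)+', r'\1', text, flags=re.IGNORECASE)

# IGNORECASE treats "The" and "the" as the same word; the first spelling is kept
print(remove_repeated_words("The the cat"))          # The cat

# A run of three or more repeats collapses to one word
print(remove_repeated_words("very very very good"))  # very good

# \W+ also covers punctuation, so the comma between the repeats is dropped
print(remove_repeated_words("Hello, hello world"))   # Hello world

# Only consecutive repeats are removed; separated duplicates stay
print(remove_repeated_words("the cat and the dog"))  # the cat and the dog
```

Note the last case: the regex removes only words that repeat back to back, not every duplicate word in the string.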




