How to remove duplicate words from a given string using a regular expression in Python?

regex = r'\b(\w+)(?:\W+\1\b)+'
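  • “\b” matches a word boundary. It stops the pattern from matching inside longer words: in “My thesis is great”, the “is” at the end of “thesis” is not treated as a repeat of the following word “is”.
  • “\w” matches a word character, i.e. [a-zA-Z0-9_] for ASCII text (in Python 3, \w also matches Unicode letters and digits by default).
  • “\W+” matches one or more non-word characters ([^\w]), such as the spaces or punctuation between two words.
  • “\1” is a backreference: it matches whatever the first capturing group, (\w+), matched. Because re.sub() is called with re.IGNORECASE, it also matches a repeat that differs only in case.
  • “(?:\W+\1\b)” is a non-capturing group, so it does not shift the group numbering and “\1” still refers to (\w+).
  • “+” matches the preceding item one or more times; here it repeats the whole group, so a run such as “the the the” is matched in one go.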
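To see why the leading “\b” matters, the sketch below runs the article's pattern and a copy with the leading “\b” removed on the “My thesis is great” example. The variable names are illustrative only.

```python
import re

sentence = "My thesis is great"

# The article's pattern: the first word must start at a word boundary.
with_boundary = r'\b(\w+)(?:\W+\1\b)+'

# Same pattern without the leading \b, for comparison only.
without_boundary = r'(\w+)(?:\W+\1\b)+'

kept = re.sub(with_boundary, r'\1', sentence, flags=re.IGNORECASE)
broken = re.sub(without_boundary, r'\1', sentence, flags=re.IGNORECASE)

print(kept)    # My thesis is great
print(broken)  # My thesis great
```

Without the boundary, the regex engine can start the match at the “is” inside “thesis”, so “is is” is collapsed and a real word disappears.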

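What’s happening: re.sub() scans the string for a word followed by one or more copies of itself (separated by non-word characters) and replaces each such run with the first copy, which is what the replacement string r'\1' refers to.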
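One way to inspect this is with re.finditer(), which yields each match so you can compare the whole run (group 0) with the first copy that re.sub() keeps (group 1). The sample sentence is a hypothetical one chosen to include a triple repeat.

```python
import re

# Pattern from this article.
regex = r'\b(\w+)(?:\W+\1\b)+'

# Hypothetical input with a triple repeat and a repeat at the end.
text = "This is is is a test test"

# group(0) is the whole run of repeats; group(1) is the first copy.
matches = [(m.group(0), m.group(1))
           for m in re.finditer(regex, text, flags=re.IGNORECASE)]

for whole, first in matches:
    print(repr(whole), '->', repr(first))

cleaned = re.sub(regex, r'\1', text, flags=re.IGNORECASE)
print(cleaned)  # This is a test
```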

Source Code:

import re  # regular expressions (regex)


def removeDuplicates(text):
    # Regex matching a word followed by one or more repeats of it
    regex = r'\b(\w+)(?:\W+\1\b)+'

    # Replace each run of repeated words with its first occurrence
    return re.sub(regex, r'\1', text, flags=re.IGNORECASE)


# Test Case: 1
str1 = "How are are you"
# Test Case: 2
str2 = "Guvi is the the best platform to learn"
# Test Case: 3
str3 = "Programming is fun fun"

for s in (str1, str2, str3):
    print(removeDuplicates(s))


Output:

How are you
Guvi is the best platform to learn
Programming is fun
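A few extra inputs, made up here for illustration, show two side effects of the pattern worth knowing about: the first copy's capitalization is kept, and because “\W+” also matches punctuation, repeats separated by a comma or a full stop are collapsed too, which may not be what you want. The function below repeats the article's logic so the snippet runs on its own.

```python
import re


def removeDuplicates(text):
    # Same pattern and flags as the article's function.
    regex = r'\b(\w+)(?:\W+\1\b)+'
    return re.sub(regex, r'\1', text, flags=re.IGNORECASE)


# Hypothetical extra inputs (not from the article).
examples = [
    "The the cat",         # copies differ only in case
    "Hello, hello world",  # repeat separated by a comma
    "This is. Is it?",     # repeat spans a sentence boundary
]

results = [removeDuplicates(s) for s in examples]
for r in results:
    print(r)
# The cat
# Hello world
# This is it?
```

If the sentence-boundary case matters, one option is to narrow “\W+” (for example to “\s+”, whitespace only), at the cost of no longer catching repeats separated by punctuation.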




Santhosh Sudhaan
