Live testing regular expressions …
Published:
As some point in your career you’ll find yourself needing to parse large amounts of alpha numeric data and selecting sections which match certain string patterns. For example, maybe you are parsing the contents of a webpage looking for hyperlinks, or summarizing the contents of a log file, or searching for values of parameters in a file heavy with text.
If you hadn’t heard of regular expressions upto this point googling what you are trying to achieve can be frustrating. So let me save you the trouble, you need to use “regular expressions”!! While the concept of regular expressions is quite simple, matching string patterns, mastering their use is no trivial task. They will become, simultaneously, both your best and most frustrating friend.
With that being said, this isn’t a post on how to use regular expressions, there are hundreds of those already, but more about the tools and resources available to you for quick learning.
Note: The phrase regular expression
is often shortened to regex
(or regexp
)
The inevitable workflow usually consists of these steps (we’ve all gone through this!)
- you write a regex and execute it to see if you matched the correct piece of the text
- you realize you got it all wrong
- you try again … muttering under your breath
- you repeat this process until you either get it right (woohoo!) or give up swearing profusely gesturing with your favored hand toward your monitor!
There is a fantastic online tool that allows you test out your regexes live, updating results in real-time as you type, on any piece of custom string with explanations of what your regex is ACTUALLY trying to match (not what you think it is matching). This tool will help speed up the learning process and reduce the level of frustration. Bookmark that site and give it a go!!
Other helpful, but obviously not exhaustive, regex resources.
- A cheat sheet of some of the most common expressions
- Regular expressions for R users
- If you’re a datacamp fan, they have a cheat sheet too
- What some reference as the bible for regular expressions
If you have any other great resources, let me know and i’ll add them!
Regular expressions should be the same across all languages, but there often some slight differences due to interpreters or compilers. For example, my preferred language of choice for data analyis is R. In many cases an additional \
is required to escape special characters. See here for more R specific details
Good luck and have fun!