MN3441 Technology for Managerial Data Analysis

Pattern Matching (video time: 29 minutes)

Pattern Matching is the act of checking a sequence of characters for the presence of a given pattern. Regular expressions are sequences of characters that define the search patterns themselves. The underlying mathematical theory was introduced in 1951 by Stephen Cole Kleene and was implemented in computers for widespread use in the late 1960s and 1970s.

Almost all modern computer applications include a find feature, where you can type a word and identify where that word appears. Applications that are designed for editing data also typically include a replace feature, which overwrites the value searched with a new value provided. Find and replace are examples of pattern matching, where the pattern is simply the words you provide.
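As a minimal sketch of find and replace as pattern matching, the Python snippet below uses the standard `re` module. The sample sentence and the search word are made up for illustration. `re.escape` is used so that the search word is treated as literal text, the way an application's find box would treat it.

```python
import re

# Hypothetical sample text, used only for illustration.
text = "The cat sat on the mat with another cat."
search_word = "cat"

# re.escape makes sure the word is matched literally, character for character.
literal_pattern = re.escape(search_word)

# "Find": record the starting index of every occurrence of the word.
positions = [match.start() for match in re.finditer(literal_pattern, text)]

# "Replace": overwrite every occurrence with a new value.
replaced = re.sub(literal_pattern, "dog", text)

print(positions)  # [4, 36]
print(replaced)   # The dog sat on the mat with another dog.
```

Here the pattern is just the word itself. Regular expressions extend the same idea by letting the pattern describe a whole family of strings rather than one fixed word.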

Motivation

People are pretty good at recognizing patterns. For example, consider the following:

(248)-424-5555
7779995454
323-455-6746
(800) 999 1000

Although these four lines are all formatted differently, you probably spotted the pattern quickly and concluded that they are all phone numbers. If you were given a large file of data and asked to format every phone number consistently, though, it would be a painful task to complete manually.

When it comes to preparing data, regular expressions are one of the most useful tools, because of their ability to find and reformat data. In this lesson, we’re going to understand how they work.
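To make this concrete, here is a rough sketch in Python of how a single regular expression could reformat the four phone numbers shown above. The target format `(XXX) XXX-XXXX`, the pattern itself, and the helper name `normalize_phone` are assumptions chosen for illustration, not a standard. The pattern only covers the four layouts in this lesson.

```python
import re

# The four sample phone numbers from the Motivation section.
raw_numbers = [
    "(248)-424-5555",
    "7779995454",
    "323-455-6746",
    "(800) 999 1000",
]

# Assumed pattern: an optional "(", three digits, an optional ")",
# an optional dash or space, three digits, an optional dash or space,
# then four digits. Each group of digits is captured in parentheses.
PHONE_PATTERN = re.compile(r"\(?(\d{3})\)?[-\s]?(\d{3})[-\s]?(\d{4})")


def normalize_phone(text):
    """Rewrite any matching phone number as (XXX) XXX-XXXX (an assumed format)."""
    # \1, \2, and \3 refer back to the three captured digit groups.
    return PHONE_PATTERN.sub(r"(\1) \2-\3", text)


normalized = [normalize_phone(number) for number in raw_numbers]

for before, after in zip(raw_numbers, normalized):
    print(before, "->", after)
```

All four inputs come out in the same shape, for example `7779995454` becomes `(777) 999-5454`. Real data usually needs a more careful pattern (country codes, extensions, stray text), so treat this as a starting point rather than a complete solution.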

Software Tools

Programming languages include capabilities for pattern matching. Most modern text editors have at least basic support for regular expressions. There are also GUI tools available both online and as desktop applications.

...