UNIX Primer


Regular Expressions

Regular expressions (which you will learn a great deal more about in courses like CSE 262) are a way of representing a string or group of strings that share common characteristics. This is done by creating a 'pattern' to which strings can be compared or 'matched.' For instance, the pattern 'hello' matches the strings 'hello', 'It's nice to say hello', and 'I say hello, you say goodbye.', among others.
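As a minimal sketch of this idea, the snippet below checks the pattern 'hello' against the three strings mentioned above, plus one string that does not contain it. Python and its standard `re` module are assumptions made for illustration only; the primer does not prescribe a language. `re.search` succeeds when the pattern occurs anywhere in the string.

```python
import re

# The pattern from the text: the literal string 'hello'.
pattern = re.compile("hello")

candidates = [
    "hello",
    "It's nice to say hello",
    "I say hello, you say goodbye.",
    "goodbye",  # added here as a counterexample; it does not contain 'hello'
]

# Keep only the strings in which the pattern occurs somewhere.
matched = [s for s in candidates if pattern.search(s)]

for s in candidates:
    print(repr(s), "matches" if s in matched else "does not match")
```

Only the first three strings match, because each contains 'hello' somewhere inside it.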

Matching literal strings alone is not very powerful, so regular expressions provide additional constructs for more versatile string matching. Note that different regular expression engines might use different symbols for these constructs. What follows are six of the more commonly used symbols.

For instance:

Reg. Exp.       Matches Strings...                                  Examples
^(hello)+$      that contain only the word hello 1 or more times    "hello", "hellohello", "hellohellohello", etc.
^(hello)*$      that contain only the word hello 0 or more times    "", "hello", "hellohello", etc.
hello|goodbye   that contain either 'hello' or 'goodbye'            "goodbye", "hello", "hello, goodbye", "I like to say hello", "I hate to say goodbye", etc.
(good|bad)?     that may or may not contain 'good' or 'bad'         "hello", "", "goodbye", "bad dog!", "goodfellows", etc.
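The rows above can be checked mechanically. The following is a sketch under the assumption that Python's standard `re` module is used; it happens to accept the same symbols as these examples, but other engines may differ, as noted earlier. The helper name `matches` is hypothetical and introduced only for this illustration.

```python
import re

# Each pattern from the table, paired with the example strings listed for it.
table = {
    r"^(hello)+$": ["hello", "hellohello", "hellohellohello"],
    r"^(hello)*$": ["", "hello", "hellohello"],
    r"hello|goodbye": ["goodbye", "hello", "hello, goodbye",
                       "I like to say hello", "I hate to say goodbye"],
    r"(good|bad)?": ["hello", "", "goodbye", "bad dog!", "goodfellows"],
}

def matches(pattern, text):
    """Return True if the pattern matches somewhere in text (hypothetical helper)."""
    return re.search(pattern, text) is not None

# For every pattern, confirm that all of its listed examples match.
results = {p: all(matches(p, s) for s in examples)
           for p, examples in table.items()}

for pattern, ok in results.items():
    print(pattern, "-> all examples match" if ok else "-> some example fails")

# Counterexamples: the anchors ^ and $ rule out any extra text.
print(matches(r"^(hello)+$", ""))             # False: + needs at least one 'hello'
print(matches(r"^(hello)+$", "hello world"))  # False: extra text after 'hello'
print(matches(r"^(hello)*$", "hello!"))       # False: '!' is not allowed
```

Note that `(good|bad)?` matches every string, including the empty one, because `?` makes the whole group optional.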

The best way to familiarize yourself with regular expressions is to use them. The following man pages describe how regexps are used in C (two different ways) and Perl, respectively, and give more information about how regular expressions work: 'man regex', 'man regexp', and 'man perlre'.

Designed by D. Kaminsky
Edited by Diana Palsetia
© University of Pennsylvania, 2008