[Overview] [Previous] [Next]

Three Ways of Defining a Language

This page presents an example solved three different ways. No new information is presented.


Problem: Define a language containing all strings over sigma = {a, b, c} where no symbol ever follows itself; that is, no string contains any of the substrings aa, bb, or cc.

Definition by grammar

Define the grammar G = (V, T, S, P) where These should be pretty obvious except for the set V, which we generally make up as we construct P.

Since the empty string belongs to the language, we need the production

S goes to empty string

Some strings belonging to the language begin with the symbol a. The a can be followed by any other string in the language, so long as this other string does not begin with a. So we make up a variable, call it NOTA, to produce these other strings, and add the production

S goes to a NOTA
By similar logic, we add the variables NOTB and NOTC and the productions
S goes to b NOTB
S goes to c NOTc

Now, NOTA is either the empty string, or some string that begins with b, or some string that begins with c. If it begins with b, then it must be followed by a (possibly empty) string that does not begin with b--and we already have a variable for that case, NOTB. Similarly, if NOTA is some string beginning with c, the c must be followed by NOTC. This gives the productions

NOTA goes to empty string
NOTA goes to b NOTB
NOTA goes to c NOTC
Similar logic gives the following productions for NOTB and NOTC:
NOTB goes to empty string
NOTB goes to a NOTA
NOTB goes to c NOTC
NOTC goes toempty string
NOTC goes to a NOTA
NOTC goes to b NOTB

We add NOTA, NOTB, and NOTC to set V, and we're done.

Example derivation:

S directly derives a NOTA directly derives a b NOTB directly derives a b a NOTA directly derives a b a c NOTC directly derives a b a c.

Definition by nfa

Defining the language by an nfa follows almost exactly the same logic as defining the language by a grammar. Whenever an input symbol is read, go to a state that will accept any symbol other than the one read. To emphasize the similarity with the preceding grammar, we will name our states to correspond to variables in the grammar.

nfa

Definition by regular expression

As usual, it is more difficult to find a suitable regular expression to define this language, and the regular expression we do find bears little resemblance to the grammar or to the nfa.

The key insight is that strings of the language can be viewed as consisting of zero or more repetitions of the symbol a, and between them must be strings of the form bcbcbc... or cbcbcb.... So we can start with

X a Y a Y a Y a ... Y a Z
where we have to find suitable expressions for X, Y, and Z. But first, let's get the above expression in a proper form, by getting rid of the "...". This gives
X a (Y a)* Z
and, since we might not have any as at all,
(X a (Y a)* Z) + X

Now X can be empty, a single b, a single c, or can consist of an alternating sequence of bs and cs. This gives

X = (empty string + b + c + (bc)* + (cb)*)
This isn't quite right, because it doesn't allow (bc)*b or (cb)*c. When we include these, we get
X = (empty string + b + c + (bc)* + (cb)* + (bc)*b + (cb)*c)
This is now correct, but could be simplified. The last four terms include the empty string+b+c cases, so we can drop those three terms. Then we can combine the last four terms into
X = (bc)*(b + empty string) + (cb)*(c + empty string)
Now, what about Z? As it happens, there isn't any difference between what we need for Z and what we need for X, so we can also use the above expression for Z.

Finally, what about Y? This is just like the others, except that Y cannot be empty. Luckily, it's easy to adjust the above expression for X and Z so that it can't be empty:

Y = ((bc)*b + (cb)*c)

Substituting into (X a (Y a)* Z) + X, we get

((bc)*(b + empty string) + (cb)*(c + empty string) a (((bc)*b + (cb)*c) a)* (bc)*(b + empty string) + (cb)*(c + empty string)) + (bc)*(b + empty string) + (cb)*(c + empty string)


Copyright 1996 by David Matuszek
Last modified Feb 10, 1996