[Overview] [Previous] [Next]

# Three Ways of Defining a Language

This page presents an example solved three different ways. No new information is presented.

Problem: Define a language containing all strings over = {a, b, c} where no symbol ever follows itself; that is, no string contains any of the substrings aa, bb, or cc.

## Definition by grammar

Define the grammar G = (V, T, S, P) where
• V = `{S, ...some other variables...}.`
• T = = ```{a, b, c}.```
• The start symbol is `S.`
• P is given below.
These should be pretty obvious except for the set V, which we generally make up as we construct P.

Since the empty string belongs to the language, we need the production

`S `

Some strings belonging to the language begin with the symbol `a`. The `a` can be followed by any other string in the language, so long as this other string does not begin with `a`. So we make up a variable, call it `NOTA`, to produce these other strings, and add the production

`S a NOTA`
By similar logic, we add the variables `NOTB` and `NOTC` and the productions
`S b NOTB`
`S c NOTc`

Now, `NOTA` is either the empty string, or some string that begins with `b`, or some string that begins with `c`. If it begins with `b`, then it must be followed by a (possibly empty) string that does not begin with `b`--and we already have a variable for that case, `NOTB`. Similarly, if `NOTA` is some string beginning with `c`, the `c` must be followed by `NOTC`. This gives the productions

`NOTA `
`NOTA b NOTB`
`NOTA c NOTC`
Similar logic gives the following productions for NOTB and NOTC:
```NOTB ```
```NOTB a NOTA ```
`NOTB c NOTC`
`NOTC `
`NOTC a NOTA`
`NOTC b NOTB`

We add `NOTA, NOTB,` and `NOTC` to set V, and we're done.

Example derivation:

```S a NOTA a b NOTB a b a NOTA a b a c NOTC a b a c.```

## Definition by nfa

Defining the language by an nfa follows almost exactly the same logic as defining the language by a grammar. Whenever an input symbol is read, go to a state that will accept any symbol other than the one read. To emphasize the similarity with the preceding grammar, we will name our states to correspond to variables in the grammar.

## Definition by regular expression

As usual, it is more difficult to find a suitable regular expression to define this language, and the regular expression we do find bears little resemblance to the grammar or to the nfa.

The key insight is that strings of the language can be viewed as consisting of zero or more repetitions of the symbol `a`, and between them must be strings of the form `bcbcbc...` or `cbcbcb...`. So we can start with

`X a Y a Y a Y a ... Y a Z`
where we have to find suitable expressions for `X, Y,` and `Z`. But first, let's get the above expression in a proper form, by getting rid of the "...". This gives
`X a (Y a)* Z`
and, since we might not have any `a`s at all,
`(X a (Y a)* Z) + X`

Now X can be empty, a single `b`, a single `c`, or can consist of an alternating sequence of `b`s and `c`s. This gives

```X = ( + b + c + (bc)* + (cb)*)```
This isn't quite right, because it doesn't allow `(bc)*b` or `(cb)*c`. When we include these, we get
```X = ( + b + c + (bc)* + (cb)* + (bc)*b + (cb)*c)```
This is now correct, but could be simplified. The last four terms include the `+b+c` cases, so we can drop those three terms. Then we can combine the last four terms into
```X = (bc)*(b + ) + (cb)*(c + )```
Now, what about `Z`? As it happens, there isn't any difference between what we need for `Z` and what we need for `X`, so we can also use the above expression for `Z`.

Finally, what about `Y`? This is just like the others, except that `Y` cannot be empty. Luckily, it's easy to adjust the above expression for `X` and `Z` so that it can't be empty:

`Y = ((bc)*b + (cb)*c)`

Substituting into `(X a (Y a)* Z) + X`, we get

``` ((bc)*(b + ) + (cb)*(c + ) a (((bc)*b + (cb)*c) a)* (bc)*(b + ) + (cb)*(c + )) + (bc)*(b + ) + (cb)*(c + ) ```