Bernstein + Sons


5 Brewster Lane, Bellport, New York 11713-2803
Phone: +1-631-286-1339

November 26, 2000

Some Comments on Parsing for Computer Programming Languages

© Copyright 2000 All Rights Reserved
Herbert J. Bernstein
Originally published in part within the lecture notes
for a course in Computer Programming Languages in Spring 2000
Mathematics and Computer Science Department, St. Joseph's College,
Patchogue, NY, February 2000

The following are some supplemental notes on defining and parsing of Computer Programming Languages.

The defining and parsing of languages has a long and complex history, but current "industrial" practice by most programmers is based on use of commonly available tools for lexical scanning of tokens and parsing of grammars based on:

Two highly accessible descendants of these programs are



In addition to specifying the valid sentences in a language, we need to specify the meanings of those sentences. We will use the language Pascal to explore issues in the representation of semantics. In order to understand the language, see the Sun Workshop documentation at:

Some of the material we present is from:



  • Two major issues in semantics
  • Static semantics

    Denotational Semantics

    We use the term "state" to refer to the aggregate of information that is stored in the memory of the computer. As instructions are executed this information changes. Therefore we can say that the meaning of instruction, i, is a function M(i,s) which depends on the instruction and the state, s before execution and which returns the state after execution. If we call I the set of possible instructions and S the set of possible states, then we specify the denotational semantics of the instructions by

    M: I x S -> S

    If two instructions i1 and i2 are executed in sequence, denoted by i1;12, then the composition of the mappings is performed by

    M(i1;i2,s) = M(i2,M(i1,s))

    Note that the two instructions appear in the reverse order on the right hand side of the definition.

    In order to write definitions of mappings, we need access to the information stored. For simplicity of expression, we wave our hands a bit and define a function V from states to values which returns the "last" value stored. Thus, if <expr> is a sequence of instructions forming an expression, then V(M(<expr>,s)) is the value of that expression.

    Let us consider the semantics of a C "while" statement:

    M( while (<expr>) <statement>, s ) =
    if ( V(M(<expr>,s)) = true) then
    M( while (<expr>) <statement>, M( <statement>, M(<expr>,s)))

    Note that M(<expr>,s) appears three times. This does not imply three evaluations of the expression. Rather M(<expr>,s) is the state of the computer after evaluation of the expression starting in state s. We could just as well have written:

    M( while (<expr>) <statement>, s ) =
    t = M(<expr>,s)
    if ( V(t) = true) then
    M( while (<expr>) <statement>, M( <statement>, t))

    Thus the meaning of this while statement is operationally the same as the following steps:

    Similarly a C "for" statement is approximately:

    M( for (<expr1>; <expr2>; <expr3>) <statement>, s ) =
    t = M(<expr2>,M(<expr1>, s))
    if ( V(t) = true ) then
    M( for (; <expr2>; <expr3>) <statement>, M (<expr3>, M(<statement>, t)))