I think.
Philip Wadler proposed "views" in a 1986 paper (see References.) This section just explains pattern-matching, views, and their relationship to various programming languages.
ML, Haskell, Aardappel, and many other functional programming languages
(plus Prolog) support discriminated unions ("sum types" or I think
"algebraic data types") that can be used in "pattern matching" that combines
case-analysis control flow with data access. Here's an example program in
OCaml that defines a single type called tree
and uses it to sort a list of
strings.
(* a simple binary tree sort program *)
(* an object of type "foo tree" can be either a Leaf, containing
no data, or a Fork, containing another "foo tree", a "foo", and
another "foo tree". *)
type 'foo tree = Leaf | Fork of 'foo tree * 'foo * 'foo tree ;;
(* Here we define a function whose unnamed second argument is a tree.
If it's a Fork, we bind the names "left", "key", and "right" to its
data contents, and which of the two branches we take is determined
by whether it's a Leaf or a Fork. *)
let rec tree_insert datum = function
Leaf -> Fork (Leaf, datum, Leaf)
| Fork(left, key, right) -> if datum < key
then Fork(tree_insert datum left, key, right)
else Fork( left, key, tree_insert datum right) ;;
(* Here we pattern-match on a list, which can be either the empty
list, or a list consisting of a first item "h" followed by a list
of zero or more other items "t". *)
let rec make_tree old_tree = function
[] -> old_tree
| h :: t -> make_tree (tree_insert h old_tree) t ;;
let sorted_tree = make_tree Leaf ["t"; "r"; "e"; "e"; "t"; "o"; "p"; "!"] ;;
(* Here we use pattern-matching to iterate over the data of the tree, in
what will be sorted order if the tree was built with the tree_insert
function above. *)
let rec tree_iter f = function
Leaf -> ()
| Fork(left, key, right) -> tree_iter f left; f key; tree_iter f right ;;
tree_iter print_endline sorted_tree ;;
The trouble with the pattern-matching is that it breaks data abstraction. The interface to the data structure is the same as its implementation, at least for those who want to pattern-match on it instead of just passing it to other functions. Changing the implementation implies changing all of the code that pattern-matches on it, and possibly changing that code to not use pattern-matching.
Wadler gives several examples:
Succ(Succ(Zero))
) or
explicit binary (Twice(Twiceplusone(Twiceplusone(Zero)))
for 6).Wadler's suggested explicit-binary view looks like this:
view int ::= Zero | Even int | Odd int
in n = Zero, if n = 0
= Even (n div 2), if n > 0 && n mod 2 = 0
= Odd ((n-1) div 2), if n > 0 && n mod 2 = 1
out Zero = 0
out (Even n) = 2 * n, if 2 * n > 0
out (Odd n) = 2 * n + 1, if 2 * n + 1 > 0
Here Even
means "twice", and "Odd" means "twice, plus one". His out
"function" works by pattern-matching, just like the OCaml functions above;
and the in
"function" can do the same. The in
and out
clauses explain
how to translate between the Zero
/Even
/Odd
view of integers and the
language's native representation of integers.
In Wouter van Oortmerssen's Aardappel, the problem is diminished; if you define a new representation of some abstract data type, then in a single place, you can provide new clauses for all the functions that pattern-match on the old representation. However, the problem still exists, since you have to do an amount of work that's proportional to the number of places that pattern-match on the old concrete data type.
In 2007, I wrote about object-oriented equational rewrite rules which are basically nothing but views. I still haven't implemented the system I described in that post.
OMeta is a system intended to dramatically simplify compiler development. Its authors write:
Several popular programming languages --- ML, for instance --- include support for pattern matching. Unfortunately, while ML-style pattern matching is a great tool for processing structured data, it is not expressive enough on its own to support more complex pattern matching tasks such as lexical analysis and parsing.
But what is weak about ML-style pattern-matching? Why can't it do lexical analysis and parsing, if you have (perhaps lazy) lists of characters or tokens?
In fact, I suspect that views, plus non-backtracking pattern matching, plus some syntactic sugar, more or less equals OMeta. Here's the first sample grammar from the OMeta paper, modified to remove left recursion; incidentally, this is roughly Figure 3-1 from Bryan Ford's PEG parsing thesis:
meta E {
dig ::= '0' | ... | '9';
num ::= <dig>+;
fac ::= <num> '*' <fac>
| <num>;
exp ::= <fac> '+' <exp>
| <fac>;
}
We can think of this as a "view" on a list of characters, with the proviso that a particular list of characters may fail to parse with some or all of the nonterminals. Let's augment the grammar with some semantic actions:
meta E {
dig ::= '0' | ... | '9';
num ::= <dig>+:ds => number ds;
fac ::= <num>:x '*' <fac>:y => x * y
| <num>;
exp ::= <fac>:x '+' <exp>:y => x + y
| <fac>;
}
(I'm assuming a number
function from a list of characters to a number.)
So when we ask for the fac
"view" of the character stream, we're hoping to
get a number --- and also, implicitly, the rest of the stream after the
parsed-out factor. In Wadler's notation, borrowing from OCaml where
necessary:
view fac ::= Fac int * char list
in Num (x, '*' :: Fac(y, rest)) = Fac(x * y, rest)
in Num (x, rest) = Fac(x, rest)
(I've omitted the out
clause because it's not relevant to the problem of
parsing.) If the grammar were more OCaml-like, I think it would look like
this. (Incomplete matches are considered bad style in OCaml, and the
compiler warns about them.)
view fac = Fac of int * char list
in Num (x, '*' :: Fac(y, rest)) -> Fac(x * y, rest)
| Num (x, rest) -> Fac(x, rest) ;;
The theory here is that if neither in
clause matches, then the attempt to
view a list of characters as a fac
will fail to pattern-match, which may
cause backtracking of outer parses. In this syntax, it is only two and a
half times as long as the rule in the OMeta-style grammar above.
The +
notation causes a little bit of trouble; it's syntactic sugar. We
can write out a desugared form as follows:
num ::= <digplus>:ds -> number ds;
digplus ::= <dig>:x <digplus>:y => x :: y
| <dig>:x => [x];
Here are the other nonterminals in this grammar, in the OCamly notation:
view dig = Dig of char * char list
in c :: t when '0' <= c && c <= '9' -> Dig (c, t) ;;
view num = Num of int * char list
in Digplus (ds, rest) -> Num (number ds, rest) ;;
view digplus = Digplus of char list * char list
in Dig (x, Digplus (y, rest)) -> Digplus (x :: y, rest)
| Dig (x, rest) -> Digplus ([x], rest) ;;
view exp = Exp of int * char list
in Fac(x, '+' :: Exp(y, rest)) -> Exp(x + y, rest)
| Fac(x, rest) -> Exp(x, rest) ;;
And I think there you have the parser and expression evaluator, in a
hypothetical extension of OCaml. Although this is several times wordier
than the OMeta version above, it's several times more concise than the
Haskell version in Ford's thesis. It's missing a bit of syntactic sugar,
resulting in nested parentheses, unnecessary type definitions, redundant
default cases, and explicit passing-around of the rest
parameter, but it's
much of the way there. ...
You might think that all of these nonterminals should be part of a single
view E
, but the normal case in OCaml and the other languages Wadler was
thinking about that support pattern-matching is that the variants of a sum
type are disjoint, which generally allows the compiler to tell you if you've
missed a case in your case analysis. So here each one is a separate "type".
OMeta introduces four major extensions over normal PEGs:
Of course, if you have a functional programming language with pattern-matching, you have the first three of these four already.
So if OMeta "contains views", can we write Wadler's other examples of uses of "views" in a reasonably homeomorphic way in OMeta?
My unimplemented object-oriented equational rewrite rules language supports views quite straightforwardly and concisely. ...
Bicicleta, although it doesn't have pattern-matching, does have "views" in the sense that accessing data that's stored inside a data structure looks the same as accessing data that's computed on the fly. This is also a feature of Self and Forth....
, and also contains a sort of notion of failure, which I believe can be implemented entirely outside of the core language itself --- it merely requires overloading the infix !! operator differently on non-error objects and error objects.
So I suspect that OMeta's simplistic notion of failure (it doesn't do general backtracking) should be fairly straightforward to implement in Bicicleta with an overloaded '/' operator, something like this:
parsed = {self:
'/' = {op: '()' = self}
}
parse_failure = ...
Wadler's paper, "Views: a way for pattern matching to cohabit with data abstraction", was written in 1986 and published in 1987 in some ACM journal, pp.307--313.
Aardappel is an experimental visual programming language designed by Wouter van Oortmerssen, with a novel concurrency primitive and a very simple side-effect-free subset used to describe sequential computation.
Object-oriented equational rewrite rules is a kragen-tol post I wrote in 2007 about a slightly novel design for a pattern-matching programming language that natively incorporates views. I wrote further on it in 2008 in IRC bots with object-oriented equational rewrite rules, which is IRC bots with object-oriented equational rewrite rules.
OMeta, by Alessandro Warth and Ian Piumarta, is a PEG parsing system generalized so that it can be used for a wider range of tasks. To date it has been used for scanners, parsers, optimizers, code generators, network protocol implementations, and automated code inspections, among other things. They published a paper about it in the Dynamic Language Symposium 2007.
PEG parsing is a new parsing method invented by Bryan Ford in 2002, based on work from the late 1960s on Bob McClure's TMG "TransMoGrifier" program. It can parse a large set of languages, including some non-context-free languages but probably not including all context-free languages, supports scannerless one-level parsing well, and has a provably linear-time parsing algorithm called "packrat parsing", which uses a lot of memory. It has its problems; the non-memory-hungry implementation can require exponential time, the absence of backtracking makes some languages more difficult to parse and presumably makes some languages impossible to parse, and the lack of support for left recursion in the basic algorithm is annoying (although Warth and buddies have a solution to this problem that I haven't read yet.)