This tutorial started as a translation of Monadic Parser Combinators 1, or at least the first half, into common lisp.

The example code in the following document is completely self-contained, and does not require an installation of the `SMUG` library.

No prior experience with functional programming, monads or recursive descent parsing is assumed. The only requirements are a common lisp environment, and a burning desire to find a better way to parse.

## Introduction

This tutorial, like this library, is based on an approach to building parsers using higher-order functions (combinators) that is popular in the functional programming community. Incidentally, these parsers form an instance of something called a monad, which is itself a useful construct with implications beyond parsing.

With great debt to Monadic Parser Combinators 1, the paper from which this library is derived, this tutorial presents a step by step introduction to the topics of parser combinators and monads and their use in common lisp.

### Common Lisp

In some cases, the natural name for a parser conflicts with a name in the `COMMON-LISP` package. Rather then shadow the symbols, the prefix of all parser names is a #\. character. It is thought that this aids usability, as one can simply `(:use :smug)`. It also helps to distinguish parser returning functions from other functions.

## How To Combine Parsers

A Parser for Things is a function from Strings to Lists of Pairs of Things and Strings!

– Fritz Ruehr, Willamette University 2

A parser is something that is familiar to all programmers… a function that, given a series of tokens as input, produces a data structure that relates to the grammatical structure of the input in some way. Or, to put it simply, a function from strings to things.

```;; our fictional parser matches the string "string"
;; and returns a SYMBOL thing
≻ (parse-thing "string") ≕≻ THING
```

In order to combine simple parsers into larger more complex ones, they need a way to communicate between them. First, because any given parser might consume only a part of the input, we'll have our parser return a `CONS` with the result in the `CAR` and the remaining input in the `CDR`.

```≻ (parse-thing "string string") ≕≻ (THING . " string")
```

Because a parser may return multiple results when the grammar is ambiguous, or may return no results all, we'll put our conses in a list, and have the empty list, `NIL`, denote a failed parse.

```≻ (parse-thing "string string")  ≕≻ ((THING . " string"))
≻ (parse-thing "strong string")  ≕≻ NIL
```

So, for our purposes, a parser is just a `FUNCTION` that takes a single value as the input and returns a `LIST` of `CONS`'s of results and unconsumed input.

It is this trivial protocol that allows us to combine small simple parsers into larger more useful ones.

Smug parsers allow infinite look-ahead and backtracking. To support parsing many different things, it's useful to define an input protocol. Smug parsers only require three operations on input : `INPUT-FIRST`, `INPUT-REST` and `INPUT-EMPTY-P`.

We'll define them in terms of strings. It serves our purposes and makes for a nice visual presentation.

```(defgeneric input-empty-p (input)
(:method ((input string)) (zerop (length input))))

(defgeneric input-first (input)
(:method ((input string)) (aref input 0)))

(defgeneric input-rest (input)
(:method ((input string))
(multiple-value-bind (string displacement)
(array-displacement input)
(make-array (1- (length input))
:displaced-to (or string input)
:displaced-index-offset (1+ displacement)
:element-type (array-element-type input)))))
```
```(≻ (input-empty-p "")
≕≻ T)
(≻ (input-empty-p "foo")
≕≻ NIL)
(≻ (input-first "foo")
≕≻ #\f)
(≻ (input-rest "foo")
≕≻ "oo")
```

## The Three Primitive Parsers

There are 3 simple primitive parsers. It it only necessary to understand them, and one sequencing combinator, `.BIND`, to understand all of `SMUG/TUTORIAL`.

### `.IDENTITY`

The first parser is `.IDENTITY`, which always succeeds by returning the value passed to it, and does not consume any input. Because we've earlier defined parsers as functions that take a single argument we'll make a curry3 with the input parameter.

```(defun .identity (value)
(lambda (input)
(list (cons value input))))
```
```(≻ (funcall (.identity :foo) "bar baz")
≕≻ ((:foo . "bar baz")))
```

### `.FAIL`

The second parser, `.FAIL`, is the inverse behaviour of `.IDENTITY`. It simply fails regardless of the input. we could define `.FAIL` as a function that takes a single argument, but then we'd have to access it using `FUNCTION` (#'), and aesthetically that inconsistency is undesirable, so we'll again make curry with the input parameter.

```(defun .fail ()
(lambda (input) (declare (ignore input)) nil))
```
```(test> (funcall (.fail) "foo") => NIL)
```

### `.ITEM`

The last true primitive is `.ITEM`, which is a parser that consumes the first token in the input, or fails in the input is empty.

```(defun .item ()
(lambda (input)
(unless (input-empty-p input)
(list (cons (input-first input)
(input-rest input))))))
```
```(test> (funcall (.item) "foo")
=> ((#\f . "oo")))

(test> (funcall (.item) "")
=> NIL)
```

## `RUN`, `PARSE`: `FUNCALL` and `CAAR` in disguise

All the primitives return a `FUNCTION` that must be `FUNCALL`'ed with `INPUT` in order to run the parser.

There are many reasons to define a `RUN` function. We can `CL:TRACE` it, or change the input parameter `TYPE`, or change what the actual primitives return.

```(defun run (parser input)
(funcall parser input))
```

The `RUN` function returns the entire parse tree. Most of the time we simply want the `CAR` of the `FIRST` result. The `CDR` is the leftover input, and the `REST` of the result alternative outcomes. We might want these as well, so we return them all as `VALUES`.

```(defun parse (parser input)
(let ((result (run parser input)))
(when result
(destructuring-bind ((result . input) &rest rest)
result
(apply #'values result input rest)))))
```

## `.BIND`: Our first primitive combinator

Now that we have our primitive parsers, we need a way to combine them. We'd like to be able to apply parsers in sequence, and it would also come in handy if we could give names to the intermediate results of parsers. Both these requirements are fulfilled by using the monadic sequencing operator, `.BIND`.

`.BIND` is a function that takes as arguments a parser P, and a function F which takes a value and returns a parser P2. `.BIND` returns a parser that first applies P to the input, returning a list of `(VALUE . INPUT)` pairs. The the function `F` is applied to each `VALUE`, and the result `P2` then applied to the `INPUT`. The collected lists of pairs returned from the `P2`'s are then concatenated and the result returned.

```(defun .bind (parser function)
(lambda (input)
(loop :for (value . input) :in (run parser input)
:append (run (funcall function value) input))))
```
```(let ((char-token
(.bind (.item)
(lambda (char)
(.identity (list :char char))))))
(run char-token "foo"))
;; ~> (((:CHAR #\f) . "oo"))
```

Because `.BIND` itself returns a parser, the result of a `.BIND` can be returned as P2. This allows parsers to be chained, and allows us to use `LAMBDA` to provide names for the values of parser results. For example, the following parser uses `.BIND` to return the first two characters as a cons.

```(let ((two-chars
(.bind (.item)
(lambda (char)
(.bind (.item)
(lambda (char2)
(.identity (cons char char2))))))))
(run two-chars "asd"))
;;=> (((#\a . #\s) . "d"))
```

The next section gets into some details about why our parser is a monad. You don't really need to know this, so feel free to skip it if you're in a hurry.

### A quick word on monads

By virtue of having the functions `.BIND` and `.IDENTITY` defined as they are, our parser interface forms a monad. A monad is, essentially, a category of things that provide the functions `.BIND` and `.IDENTITY`.

Of course, just having functions called `.BIND` and `.IDENTITY` does not a monad make. There are other contracts that `.BIND` (also known as pipe, >>~, *, or let) or `.IDENTITY` (aka result, lift, unit, return) must fulfill.

#### The monad laws

In order to be properly categorized as a monad, the thing providing a definition for `.BIND` and `.IDENTITY` must obey three laws (a static functional programmer would say 'must have a certain type', but the word type means something different to a dynamic functional programmer, so we'll avoid it here)

In order to describe those laws we need to define a few terms

a function that, given a value, returns a value in the form expected by the internals of `.BIND`. In our examples above, a parser (taking an input and returning a list of results) is the Monadic Value.
A function that, given a value returns a monadic value encapsulating that value. `.IDENTITY` is the canonical Monadic Function

In Object-Oriented terms, the MF is a constructor, and the MV an object.

The laws which all things must obey in order to be called a monad are simple :

"Left identity"
(bind (result x) MF) = (funcall MF x)
"Right identity"
(bind MV result) = MV
"Associativity"
(bind (bind MV MF) MF2) = (bind MV (lambda (x) (bind (MF x) MF2)))

With static type systems, the compiler will enforce this contract for you. In a dynamic system, we just need to be a little more careful. Proving the monad laws for our `.BIND` and `.IDENTITY` is left as an exercise.

That's really all there is to monads except for syntax, which we'll get to later. There are extended laws that other monads obey, and monads have other uses beyond parsing, but we're reaching the end of our scope already.

## `.SATISFIES` : the parser predicate

Often, we only want to consume input if a certain condition is true. This where `.SATISFIES` comes in.

```(defun .satisfies (predicate &rest args)
(.bind (.item)
(lambda (x)
(if (apply predicate x args)
(.identity x)
(.fail)))))
```
```TEST>
```
```(run (.satisfies #'digit-char-p) "1 and")
;;=> ((#\1 . " and"))
```

If `.ITEM` fails, so will the `.SATISFIES` parser. This is because (bind (fail) MF) will always fail. `.FAIL`, also known as `.ZERO`, is a function belonging to a category of monads knows as "monads with a zero". That's not terribly important for parsing, but interesting if you're into that sort of thing.

### `.IS` and `.IS-NOT`

Imagine we need to parse all characters that come before a `#\;`. The simple way is to have a function that uses `CL:NOT`.

```(.satisfies
(lambda (item)
(not (char= #\; item))))
```

It turns out that `(.satisfies (lambda (i) (not ...)))` is quite common, so we define a parser that has a shorter and more relevant name.

```(defun .is-not (predicate &rest args)
(.satisfies (lambda (i)
(cl:not (apply predicate i args)))))
```

This makes things a lot shorter and easier to read.

```(test>
(run (.is-not #'char= #\;) "foobar;%^*&")
=> ((#\f . "oobar;%^*&")))
```
```(test>
(run (.is-not #'char= #\;) ";%^*&")
=> NIL)
```

For that matter, now that we have `.IS-NOT`, `.SATISFIES` is a bit long, and does not prefix `-NOT` . So we type a few keys in order to save a bundle in the future.

```(defun .is (predicate &rest args)
(apply #'.satisfies predicate args))
```

### Example Parsers for letters and numbers using `.SATISFIES`

`.SATISFIES` allows us to `DEFUN` some simple parsers

```(defun .char= (x)
(.is #'cl:char= x))

(defun .digit-char-p ()
(.is #'cl:digit-char-p))

(defun .lower-case-p ()
(.is #'cl:lower-case-p))

(defun .upper-case-p ()
(.is #'cl:upper-case-p))
```
```(run (.char= #\x) "xyzzy") ;=> ((#\x . "yzzy"))
(run (.digit-char-p) "1234") ;=> ((#\1 . "234"))
(run (.lower-case-p) "abcd") ;=> ((#\a . "bcd"))
(run (.upper-case-p) "Abcd") ;=> ((#\A . "bcd"))
(run (.upper-case-p) "doh!") ;=> NIL
```

## `.PLUS`, the non-deterministic choice combinator

If we want to combine our earlier parsers, say to create an `ALPHANUMERIC-CHAR` from `UPPER-CASE-P` and `LOWER-CASE-P` we need a combinator capable of making the choice between them.

In some cases, it may not be an exclusive choice. There might be multiple ways to parse a string, or a later pass might resolve the ambiguity.

For example, in one of our earlier examples of `.BIND`, we saw a parser that returned the first two characters in a stream. This parser will fail if there is only one character left in the input.

```(let ((two-chars
(.bind (.item)
(lambda (char)
(.bind (.item)
(lambda (char2)
(.identity (cons char char2))))))))
(funcall two-chars "a"))
;;=> NIL
```

If we want to parse one or two characters, or an arbitrarily long series of characters, we need some a way to express that.

Enter the `.PLUS` combinator.

```(defun .plus (first-parser second-parser)
(lambda (input)
(append (funcall first-parser input) (funcall second-parser input))))
```
```(let ((two-chars
(.bind (.item)
(lambda (char)
(.bind (.item)
(lambda (char2)
(.identity (cons char char2))))))))
(funcall (.plus two-chars (.item)) "a")
;;=> ((#\a . ""))
(funcall (.plus two-chars (.item)) "asd")
;;=> (((#\a . #\s) . "d") (#\a . "sd"))
)
```

Note that the second parse returned two pairs, as both parsers were successful. The string parsed as both two chars and a single item.

### Example parsers using PLUS

The examples used in the original paper1 are for letters and alphanumeric characters. There's no good reason to use them over /(.is #'alpha-char-p)/and the like, but they do serve as simple example.

```(defun letter () (plus (lower-case-char) (upper-case-char)))

(funcall (letter) "foo") => ((#\f . "oo"))
(funcall (letter) "1foo") => NIL

(defun alphanumeric () (plus (letter) (.digit-char)))

(funcall (alphanumeric) "1foo") => ((#\1 . "foo"))
(funcall (alphanumeric) "!1foo") => NIL
```

The other example is more illustrative, a parser that returns a series of letters or the empty string.

```(defun word ()
(let ((non-empty-letters
(bind (letter)
(lambda (first-letter)
(bind (word)
(lambda (rest-of-letters)
(result (format nil "~A~A"
first-letter
rest-of-letters))))))))
(plus non-empty-letters (result ""))))

(funcall (word) "asd")
=>
(("asd" . "") ("as" . "d") ("a" . "sd") ("" . "asd"))
```

This is our first recursive parser, but it's a common idiom. Notice that it returns all the possible strings of letters.

This is obviously inefficient when one only requires the first value. required, a deterministic combinator `.OR`, will be introduced later in the tutorial.

### Efficiency

`.FIRST` is the real choice when it comes down to it, as `.PLUS` really does matter.

```(defun .first (parser)
(lambda (input)
(let ((results (run parser input)))
(when results (list (cl:first results))))))
```

## Syntax : `LET*` and the identity monad

If you read the earlier section on monads, you'd know that `.BIND` and `.IDENTITY` are the interface to many different types of monads, of which our parser is but one example. If you didn't, you know now. Again, if you're not at all interested and really just want to keep on parsing, skip down to the macro.

The most basic monad is the identity monad. A definition of its `.BIND` and `.IDENTITY` might look like the following.

```(defun i-bind (mv mf) (funcall mf mv))
(defun i-result (value) value)
```

In Lisp, the identity monad is so trivial as to be useless. In a functional programming language, or any language where the order of operations is not guaranteed, the identity monad serves to sequence operations.

Imagine a silly lisp where the order of evaluation isn't defined as strict left to right4. The following form could have disastrous consequences.

```(progn (remove-gun-from-pants)
(pull-trigger))
```

The identity monad makes the sequencing explicit. In a purely functional lisp, one might sequence the operations as follows.

```(i-bind (remove-gun-from-pants)
(lambda (gun)
(lambda (pointed-gun)
(i-bind (pull-trigger pointed-gun)
(lambda (fired-gun)
(i-result fired-gun)))))))
```

In functional programming languages this pattern is so common that there is special syntax for it. The usual choices are 'do notation' or 'list comprehension syntax'.

First, the previous example rendered in list comprehension notation :

```[fgun | gun <- removeGun
, pgun <- pointGunAtBadGuy gun
, fgun <- pullTrigger pgun]
```

And in do notation :

```do
gun <- removeGun
fgun <- pullTrigger pgun
return fgun
```

The astute lisper might notice that do notation looks a lot like LET*. In fact, that's really all it is. LET* is lisp syntax for the identity monad, and our i-bind using forms above are directly translatable.

```(let* ((gun (remove-gun-from-pants))
(fired-gun (pull-trigger pointed-gun)))
(identity fired-gun))
```

One could legitimately say that the common lisp package is an instance of the identity monad, if one cared for such insights.

### `.LET*`, our version of `LET*` like do notation

A `LET*` like construct is the obvious notation for a lisper to take advantage of the monadic nature of parsers. It's often useful to ignore a value. In haskell, the underscore character is used to denote an ignorable variable, so we'll use the same convention.

```(defmacro .let* (bindings &body body)
(if bindings
(let ((symbol (first (first bindings))))
`(.bind ,@(cdr (first bindings))
(lambda (,symbol)
,@(when (or (string-equal (symbol-name symbol) "_")
(null (symbol-package symbol)))
`((declare (ignorable ,symbol))))
(.let* ,(cdr bindings)
,@body))))
`(progn ,@body)))
```
```(funcall (.let* ((a (.identity 1)))
(.identity a)) "")
```

If we replace `.BIND` with our `I-BIND` function above, we get a macro that is equivalent to `LET*`. `.LET*` binds the results of parsers, and is a much nicer way to work over nesting `.BIND`'s.

### Examples using `.LET*`

#### `.PROGN`, `.PROG1`, `.PROG2`

Using `.LET*`, we can implement the macros `.PROGN` (which is similar `.AND` because it will fail when the parser does), `.PROG1` (which comes in handy for matching things and the end of the line, or when there is no more input) and `.PROG2`, which as we will see is also quite useful.

```(defmacro .progn (&rest parsers)
(if (rest parsers)
(let ((name (gensym)))
`(.let* ((,name ,(first parsers)))
(.progn ,@(rest parsers))))
(first parsers)))

(defmacro .prog1 (parser &rest parsers)
(let ((name (gensym))
(ignore (gensym)))
`(.let* ((,name ,parser)
(,ignore (.progn ,@parsers)))
(.identity ,name))))

(defmacro .prog2 (parser1 parser2 &rest parsers)
(let ((name (gensym))
(ignore (gensym)))
`(.let* ((,ignore ,parser1)
(,name ,parser2)
(,ignore (.progn ,@parsers)))
(.identity ,name))))
```

#### `.STRING=`

Using recursion like we did in our WORD parser, we'll create a parser that matches a specific string.

```(defun .string= (string)
(if (string= string "")
(.identity string)
(.let*
((_ (.is 'char= (aref string 0)))
(_ (.string= (input-rest string))))
(.identity string))))
```
```(run (.string= "asdf")  "asdfjkl") => (("asdf" . "jkl"))
```
```(run (.string= "asdf")  "asd") => NIL
```

Once can see how much nicer `.LET*` notation is, and also how the ignorable `_` comes in handy.

## `.MAP` : The repetition combinator

Earlier, we defined a parser, `.WORD`, using `.BIND` and a recursive call. Lets define a similar parser using `.LET*` that returns a list of letters.

```(defun .letters ()
(.plus (.let* ((x (.letter))
(xs (.letters)))
(.identity (cons x xs)))
(.identity nil)))
```

This pattern can easily be abstracted into a more general combinator, `.ZERO-OR-MORE`

### `.ZERO-OR-MORE`

```(defun .zero-or-more (parser)
(.plus (.let* ((x parser)
(xs (.zero-or-more parser)))
(.identity (cons x xs)))
(.identity ())))
```
```(test>
(run (.zero-or-more (.char= #\a)) "aaaab"
=>  (((#\a #\a #\a #\a) . "b") ((#\a #\a #\a) . "ab") ((#\a #\a) . "aab")
((#\a) . "aaab") (NIL . "aaaab")))

(test>
(run (.zero-or-more (.char= #\a)) "bbbba")
=>
((NIL . "bbbba")))
```

Note that zero or more always succeeds. If one needs a parser that matches one or more items and fails otherwise, we can define one in terms of ZERO-OR-MORE, can call it, appropriately enough, ONE-OR-MORE.

### `.ONE-OR-MORE`

```(defun .one-or-more (parser)
(.let* ((x parser)
(y (.zero-or-more parser)))
(.identity (cons x y))))

(test> (funcall (.one-or-more (.char= #\a)) "aaaab")
=>
(((#\a #\a #\a #\a) .many "b")))

(test> (funcall (.one-or-more (.char= #\a)) "bbbba")
=> NIL)
```

We could now define a `TWO-OR-MORE` and `THREE-OR-MORE` etc., but it is likely better to define a function to rule them all. It needs a base to rule from.

### Function`.MAPC`, `.MAPCAR`

Syntax:

`.mapc` parser => parser

`.mapcar` parser => result-list

Arguments and Values:

parser
The parser that is attempted
result-list
a list

Description:

The mapping operation involves attempting parser many times. Except for `.mapc` and `.mapl`, the result contains the results returned by the parser.

`.MAPCAR` operates on successive results of parser. The iteration terminates when the parser fails. The value returned by mapcar is a list of the results of parser

```(defun .mapcar (parser)
(.plus (.let* ((x parser)
(xs (.mapcar parser)))
(.identity (cons x xs)))
(.identity ())))
```
```\.MAPCAR
```

`.MAPC` is like `.MAPCAR` except that the results of applying function are not accumulated. The parser argument is returned as a result.

```(defun .mapc (parser)
(.plus (.let* ((_ parser)
(_ (.mapc parser)))
(.identity parser))
(.identity parser)))
```
```\.MAPC
```

Examples:

```(test> (parse (.prog1 (.mapcar (.item))
(.char= #\!))
"Yay!")
=> (#\Y #\a #\y))
```
```(let ((/parser/ (.item)))
(test> (parse (.let* ((parser (.prog1 (.mapc /parser/)
(.char= #\!)))
(char parser))
(.identity (cons (eq parser /parser/)
char)))
"Holy Guacamole!?")
=>(T . #\?)))
```

### Function`.MAKE-LIST`, `.MAKE-SEQUENCE`

Often, we know what we want, and how many we want, all at once. So, just like `CL:MAKE-LIST`, we take a number.

Syntax:

`.make-list` size &key initial-element => list

`.make-sequence` result-type size &key initial-element => list

Arguments and Values:

size
a non-negative integer.
initial-element
a parser. The default is `(.item)`.
list
a list.

Description:

Returns a list of length given by size, each of the elements of which is a successful run of the initial-element parser.

```(defun .make-list (size &key (initial-element (.item)))
(if (zerop size)
(.identity nil)
(.let* ((first initial-element)
(rest (.make-list (1- size)
:initial-element initial-element)))
(.identity (list* first rest)))))
```
```(defun .make-sequence (type length &key (initial-element (.item)))
(.let* ((list (.make-list length :initial-element initial-element)))
(.identity (coerce list type))))
```

### Function`.CONCATENATE`

There are often mulitple parsers that are run one after another, and the results joined together at the end. `.CONCATENATE` takes care of that.

```(defun .concatenate (output-type-spec &rest parsers)
(if (not parsers)
(.fail)
(.let* ((first (first parsers))
(rest (if (rest parsers)
(apply
#'.concatenate output-type-spec (rest parsers))
(.identity nil))))
(.identity (cl:concatenate output-type-spec first rest)))))
```

### `.MAP` : ONE FUNCTION TO RULE THEM ALL!

In the end

The `:AT-LEAST` keyword solves the "how many do we want to start with?". We have decided on `1` as the default, as most of the time we do want the parser to succeed, and besides that, `.OPTIONAL` is a better way of saying `:at-least 0`.

So, we can specify the `.MAP` parser as follows.

```(defun .map (result-type parser
&key
(at-least 1))
"=> a ~result-type~ of /parser/ results."
(.let* ((list-1 (.make-list at-least :initial-element parser))
(list-2 (funcall (if result-type #'.mapcar #'.mapc) parser)))
(.identity (when result-type (concatenate result-type list-1 list-2)))))
```

### Examples using `.MAP`

Let's make a parser for standard quoted strings. We'll use the #\' character as the quotes, and the #\| character as the escape character, simply to make it easier to embed in our example text in common lisp strings.

```(defun .quoted-string (&key (quote #\')
(escape #\|))
(.let* ((_ (.char= quote))
(string
(.map 'string
(.plus (.let* ((_ (.char= escape)))
(.item))
(.is-not 'char= quote))))
(_ (.char= quote)))
(.identity string)))
```
 (The quote char is ' and the escape char is . . '')

## `.OR`, `.NOT`, and `.AND` : deterministic logic combinators

### `.OR`

`.OR` is a deterministic `.PLUS`. It take any number of parsers. The first parser is run, and if it succeeds, evaluation short circuits and the result of the parser is returned. Otherwise, the next parser is run, and so on, until one succeeds or there are no more parsers.

If we use `.BIND` or `.LET*` for `.OR` because it would fail if one of its parsers fails. We have `.PLUS`, so we could simply use it determanistacally.

```(let ((fail (gensym)))
(defun .or (parser &rest parsers)
(.let* ((result (.plus parser (.identity fail))))
(if (not (eq result fail))
(.identity result)
(if parsers
(apply #'.or parsers)
(.fail))))))
```

But, to make it simpler, `.OR` can be a primitive. This will be slightly more effeciant and works for our purposes.

```(defun .or (parser &rest parsers)
(lambda (input)
(or (funcall parser input)
(when parsers
(funcall (apply #'.or parsers) input)))))
```

### `.NOT`

Similarly, `.NOT`, which continues parsing only when the parser fails, can be both high-level and primitive as well.

```(let ((succeed (gensym)))
(defun .not (parser)
(.let* ((result (.or parser (.identity succeed))))
(if (eq result succeed)
(.identity t)
(.fail)))))
```
```(defun .not (parser)
(lambda (input)
(let ((result (funcall parser input)))
(if result
nil
(list (cons t input))))))
```

### `.AND`

On the other hand, `.AND` can be defined in terms of `IF`, and doesn't even need to test for failure, as `.BIND` handles failure automatically.

`.AND` (known as '>>' in haskell) sequentially composes parsers, discarding the results of all but the last one, and returning that result. We use `.PROGN` for similar purposes.

```(defun .and (p1 &rest ps)
(.let* ((result p1))
(if ps
(apply #'.and ps)
(.identity result))))
```

### Examples using `.OR`, `.NOT`, and `.AND`

#### `.OPTIONAL`

The OPTIONAL combinator, which allows a parser to fail and still continue, is a natural use of `.OR`.

```(defun .optional (parser)
(.or parser (.identity nil)))
```

#### `.IF` / `.WHEN` / `.UNLESS`

Finally, using `.OR`, `.AND` and `.NOT`, we can make parser versions of the lisp conditionals we all know and love.

```(defun .if (test-parser then-parser
&optional (else-parser (.fail)))
(let ((no (gensym)))
(.let* ((no? (.or test-parser (.identity no))))
(if (not (eq no? no))
then-parser
else-parser))))

(defun .when (test-parser then-parser)
"we define .when in terms of .IF, but it's really just .AND again"
(.if test-parser then-parser))

(defun .unless (test-parser then-parser)
"defined in term of .when, even though it's just (.AND (.NOT ...))"
(.when (.not test-parser) then-parser))
```

#### `.NO-MORE-INPUT`

Now that we have `.NOT`, we can specifically test for failure rather than abort the parse entirely. since the primitive parser `.ITEM` only fails when the input is empty, we can define `.NO-MORE-INPUT` by negating it.

```(defun .no-more-input ()
(.not (.item)))
```

## `.READ-LINE`: `#\Newline` is a decent break up

Lines are quite important for lexing/parsing most text files. For everything before the last line it is very simple, but though `#\Newline` is a decent break up, it does not always end that way.

```(defun .read-line-newline ()
(.let* ((line (.optional (.map 'list (.is-not #'char= #\Newline))))
(newline (.is #'char= #\Newline)))
(.identity (concatenate 'string line (string newline)))))
```

The issue is that the last "line" has text but does does not end with a `#\Newline`, and the first line does not start with one. Beyond that, for testing we often want to try with only one "line" and most likely without the newline at all.

In other words, if we want the entire thing as a list of lines, this works, as the string ends with a `#\Newline`

```(smug/tutorial:≻
(parse (.map 'list (.read-line-newline))
"foo
bar
")
:=> ;; Which gives us :
("foo
"
"bar
"))
```

If there is no newline at the end, it does not.

```(smug/tutorial:≻ (parse (.map 'list (.read-line-newline))
"foo
bar")
;; Which gives us :
=> ("foo
"))
```

In the latter case, there is no "bar" at the end, which makes it almost not worth the journey there.

As luck would have it, there is a `.NOT` which will help us tie up this particular line.

```(defun .last-line ()
(.prog1 (.map 'string (.is-not #'char= #\Newline))
(.not (.item))))
```
```(test> (parse (.last-line) "bar")
=> "bar")

(test> (parse (.last-line) "bar
")
=> "bar")
```

For every line save for the last will fail for `.LAST-LINE`, but every line save for an empty one will succeed until the end of that parser, so using `.PLUS` would be a waste of CPU time.

```(defun .line ()
(.last-line)))
```

Using `.OR` to run the last only when the first fails works great, but that still means but in the end there is a better option. We combine them both into one function.

• Function `.READ-LINE`
```(defun .read-line (&optional
(eof-error-p t)
eof-value)
(.let* ((text (.optional
(.first (.map 'list (.is-not #'char= #\Newline)))))
(newline (.or (.char= #\Newline)
(.and (.not (.item))
(.identity '())))))
(if (or text newline)
(.identity (concatenate 'string text (when newline (string newline))))
(if eof-error-p
(.fail)
(.identity eof-value)))))
```
```(t> (parse (.map 'list (.document-line)) "
")
' ("
"
"* Very important
"
"** Less important
"
"*** A detail
"
"And the rest is text
"
```
```(parse (.map 'list (.line)) "
")
```
```=>
("
"
" * Very important
"
" ** Less important
"
" *** A detail
"
" And the rest is text
"
" between the headers.")
```

There is really one important detail, and that is the use of `.OR` ,which could easily be `.PLUS` without much waste, rather than having the `TEXT` be `.OPTIONAL` or `:AT-LEAST 0`.

What we do not want is the parser to always succeed when there is no input, because then the parser succeeds when there is no input, ad infinitum.

This kind of event happens quite often when using `.NOT`, so often it is best to take another approach, yet `(.not (.item))` does have its uses. Thus, caution is in the air.

## `.STRING-EQUAL`: For case insensitivity

The `#+NAME`, ```# are case insensitive. We have ~.CHAR~``` and `.STRING=` already, so `.CHAR-EQUAL` and `.STRING-EQUAL` are in order5.

```(defun .char-equal (char)
(.is #'cl:char-equal char))
```

For our `.STRING=`, we simply return the string we passed in. Because `CL:STRING-EQUAL` "ignore[s] differences in case"5, we actually need to return that matched string from what we are parsing.

```(defun .string-equal (string)
(labels ((%string-equal (string)
(.let* ((first (.char-equal (aref string 0)))
(rest (if (> (length string) 1)
(%string-equal (subseq string 1))
(.identity nil))))
(.identity (cons first rest)))))
(.let* ((list (%string-equal string)))
(.identity (coerce list 'string)))))
```
```(test>
(run (.string-equal "asd") "AsD")
=> (("AsD" . "")))
```

## The Conditions Dictionary

```(define-condition smug-condition (simple-condition)
:initarg :input))
(:report (lambda (condition stream)
(apply #'format stream
(simple-condition-format-control condition)
(simple-condition-format-arguments condition))
(terpri stream)
(write-string "Input:" stream)
(print (smug-condition-input condition) stream))))

(define-condition smug-error (error smug-condition) ())

(defun .error (datum &rest arguments)
(lambda (input)
(apply #'error 'smug-error
:input input
(if (or (stringp datum)
(functionp datum))
(list :format-control datum
:format-arguments arguments)
arguments))))
```

## Parsing S-Expressions

We should already by quite familiar with sexps because that is the syntax `SMUG` uses.

In the usual parenthesized syntax of Lisp, an s-expression is classically defined inductively as

1. an atom, or
2. an expression of the form (x . y) where x and y are s-expressions. […]

Most modern sexpr notations in addition use an abbreviated notation to represent lists in s-expressions, so that

(x y z) stands for

(x . (y . (z . NIL)))

where NIL is the special end-of-list symbol (written '() in Scheme).

### `.whitespace` from `+whitespace+`

`+WHITESPACE+` is a list of characters that do not matter to the `.READ`'er.

```(defvar *whitespace* '(#\space #\newline #\tab))
```

The Parser itself is very simple, but we do use three `SMUG` parsers so it is a good example of use.

```(defun .whitespace (&optional result-type)
(.first (.map result-type (.is 'member *whitespace*))))
```

In particular, the `RESULT-TYPE` defaults to `NIL`. This means that we can throw the result away which saves on time and memory.

Also, using `.FIRST` prevents a large parse tree which also saves on such things.

### `.READ`

Anything that is not whitespace must be parsed. This is a perfect use of `.PROG2` and `.OPTIONAL`.

```(defun .read (&optional (parser (.sexp)))
(.prog2 (.optional (.whitespace))
parser
(.optional (.whitespace))))
```

Using that we can now parse anything that is not in `.( )`. Strangely enough, that is very close to what we want for an `ATOM`.

```(smug/tutorial:≻ (parse (.read (.map 'string (.is-not 'find ".( )")))
(format nil "~%~T asd()1234"))
:=> (values "asd" "()1234a"))
```

### `.ATOM`

An atom is anything that is not a list save for the empty list `()`, which is also know as `NIL`.

```(defun .nil ()
(.or (.string-equal "NIL")
(.string= "()")))
```

So to make an atom it is ether that or we simply need to know what is a member of the constituent characters and make that into a token. We use `.FIRST` again.

```(defun .atom ()
(.or (.nil) (.first (.token)))
```

#### `.TOKEN`

At this point, anything that is a not a parenthesis or whitespace makes up a token. They are called constituent characters.

```(defun .token ()
(.first (.map 'string (.constituent))))
```

#### `.CONSTITUENT`

constituent n., adj. 1. a. n. the syntax type of a character that is part of a token – http://www.lispworks.com/documentation/HyperSpec/Body/26_glo_c.htm#constituent

Well, we know what does not constitute a character that is part of a token, so anything that is not a not a part of token is part of a token. The double negative works.

```(defvar *non-constituents*
(list* #\( #\) *whitespace*))

(defun .constituent (&optional (non-constituents
*non-constituents*))
(.or (.and (.char= #\\) (.item))
(.is-not 'member non-constituents)))
```
```(defun .read (&optional (parser (.sexp)))
(.prog2 (.optional (.whitespace))
parser
(.optional (.whitespace))))

(defun .sexp ()
(.read (.or (.list) (.atom))))

(defun .constituent ()
(let ((non-constituent
(list* #\( #\) (whitespace))))
(.or (.is-not 'member non-constituent)
(.and (.char= #\\) (.item)))))

(defun .list-first ()
(.error "List has nothing before after ."))
(.sexp)))

(defun .list-rest ()
(.or (.and (.read (.char= #\.))
(.or (.sexp) (.error "List has nothing after .")))
(.list :start nil :end nil)))

(defun .list (&key
(start (.char= #\())
(endchar #\))
(end (.char= endchar)))

(let ((false (gensym "false")))
(.let* ((_ (or start
(.identity nil)))
(first (.or (.list-first) (.identity false)))
(rest (if (not (eq first false))
(.or (.list-rest)
(.identity false))
(.identity false)))
(_ (if (or (listp rest) (eq false rest))
(.progn (.optional (.whitespace))
(or end (.identity nil)))
(.or (.char= endchar)
(.error "More than one object follows . in list.")))))
(.identity (if (not (eq false first))
(list* first (if (eq false rest) nil rest)))))))

(defun .list (&key
(start (.char= #\())
(end (.char=  #\))))
(.or
(.let* ((start (or start
(.identity nil)))
(first (.optional (.sexp)))
(rest (if first
(.optional (.list :start nil :end nil))
(.identity nil)))
(end (.progn (.optional (.whitespace))
(or end (.identity nil)))))
(.identity (if first (list* first rest))))))

(defvar *nil* (gensym))

(defun .nil ()
(.and (.or (.string-equal "NIL")
(.string= "()"))
(.identity *nil*)))

(defun .list (&key delimited &aux (not (gensym)))
(.or (.nil)
(.let* ((_ (if (not delimited)
(prog1 (.char= #\()
(setf delimited #\)))
(.identity nil)))
(first (.sexp))
(dot (.optional (.dot)))
(rest (if dot
(.sexp)
(.or (.list :delimited #\))
(.and (.char= delimited)
(.identity :nil))))))
(.identity (list* first rest)))))

(defun .cons ()
(.let* ((_ (.char= #\())
(car (.sexp))
(_ (.dot))
(cdr (.sexp))
(_ (.char= #\))))
(.identity (cons car cdr))))
```
```(defun .read (&optional (parser (.sexp)))
(.prog2 (.optional (.whitespace))
parser
(.optional (.whitespace))))

(defun .sexp ()
(.read (.or (.list) (.atom))))

(defun .atom ()
(.first (.map 'string (.constituent))))

(defun .constituent ()
(let ((non-constituent
(list* #\( #\) (whitespace))))
(.or (.is-not 'member non-constituent)
(.and (.char= #\\) (.item)))))

(defun .error (message &rest args)
(lambda (input)
(declare (ignore input))
(apply #'error message args)))

(defun .list-first ()
(.error "List has nothing before after ."))
(.sexp)))

(defun .list-rest ()
(.or (.and (.read (.char= #\.))
(.or (.sexp) (.error "List has nothing after .")))
(.list :start nil :end nil)))

(defun .list (&key
(start (.char= #\())
(endchar #\))
(end (.char= endchar)))

(let ((false (gensym "false")))
(.let* ((_ (or start
(.identity nil)))
(first (.or (.list-first) (.identity false)))
(rest (if (not (eq first false))
(.or (.list-rest)
(.identity false))
(.identity false)))
(_ (if (or (listp rest) (eq false rest))
(.progn (.optional (.whitespace))
(or end (.identity nil)))
(.or (.char= endchar)
(.error "More than one object follows . in list.")))))
(.identity (if (not (eq false first))
(list* first (if (eq false rest) nil rest)))))))

(defun .list (&key
(start (.char= #\())
(end (.char=  #\))))
(.or
(.let* ((start (or start
(.identity nil)))
(first (.optional (.sexp)))
(rest (if first
(.optional (.list :start nil :end nil))
(.identity nil)))
(end (.progn (.optional (.whitespace))
(or end (.identity nil)))))
(.identity (if first (list* first rest))))))

(defvar *nil* (gensym))

(defun .nil ()
(.and (.or (.string-equal "NIL")
(.string= "()"))
(.identity *nil*)))

(defun .list (&key delimited &aux (not (gensym)))
(.or (.nil)
(.let* ((_ (if (not delimited)
(prog1 (.char= #\()
(setf delimited #\)))
(.identity nil)))
(first (.sexp))
(dot (.optional (.dot)))
(rest (if dot
(.sexp)
(.or (.list :delimited #\))
(.and (.char= delimited)
(.identity :nil))))))
(.identity (list* first rest)))))

(defun .cons ()
(.let* ((_ (.char= #\())
(car (.sexp))
(_ (.dot))
(cdr (.sexp))
(_ (.char= #\))))
(.identity (cons car cdr))))
```

In the usual parenthesized syntax of Lisp, an s-expression is classically defined inductively as

1. an atom, or
2. an expression of the form (x . y) where x and y are s-expressions.

[…]

Most modern sexpr notations in addition use an abbreviated notation to represent lists in s-expressions, so that

(x y z) stands for

(x . (y . (z . NIL)))

where NIL is the special end-of-list symbol (written '() in Scheme).

## Source Code

### Testing

```  ; emacs : ≻ C-x 8 RET 227b RET
≕ C-x 8 RET 2255 RET
; gnome/X : Ctrl-Shift-u 227b
(let ((defpackage *package*))
(defmacro ≻ (form &key ((≕≻ provided-result) nil result-provided?)
(test ''equal))
(let* ((form-result (gensym))
(form-string (with-output-to-string (s) (print form s)))
(new-form (let ((*package* (find-package (package-name defpackage))))
(result (gensym))
(values (when (and result-provided? (listp provided-result))
(eq (first provided-result) 'cl:values))))

`(progn
(let* ((,form-result (multiple-value-list ,new-form))
(,result (if ,result-provided?
(funcall ,test ,(if values
form-result
`(first ,form-result))
,(if values
`(multiple-value-list ,provided-result)
provided-result))
t)))
(assert ,result
() "~A~% => ~S ~%  ...should be :~%    ~S"
',form ,form-result ',(if result-provided?
provided-result
"Something that evaluates to a non-NIL value"))
(apply #'values ,form-result))))))
```

### `SMUG/TUTORIAL`

#### The `PARSER` itself

```#:.item
```

#### The Testing Reader Macro

```(defmacro smug> (form &key ((:=> expected-result) nil result-given?))
(let ((results (gensym))
(result (gensym)))
`(let* ((,results (multiple-value-list ,form))
(,result (first ,results)))
(multiple-value-prog1 (apply #'values ,results)
, (when result-given?
`(assert (equalp ,result ',expected-result)))))))

(set-macro-character
#\≻
(declare (ignore c))
(set-macro-character #\≻ nil)
(let* ((form (read s))
(char (peek-char t s nil #\null))
(x (gensym))
(expected-result (if (char= char #\≕)
x)))

`(smug> ,form ,@(when (not (eq x expected-result))
`(:=> ,expected-result))))))
```

#### The tutorial.lisp file

```(defpackage :smug/tutorial
(:use :cl)
(:export
#:≻
#:.identity
#:.fail
#:.item
#:.bind

#:input-empty-p
#:input-first
#:input-rest
#:run
#:parse

#:.plus
#:.or
#:.not
#:.let*
#:.map
#:.concatenate
#:.is
#:.is-not
#:.char=
#:.char-equal
#:.string-equal
#:.string=
#:.progn
#:.prog1
#:.prog2
#:.and
#:.or
#:.not
#:.first
#:.optional
))
(in-package :smug/tutorial)

(defmacro .let* (bindings &body body)
(if bindings
(let ((symbol (first (first bindings))))
`(.bind ,@(cdr (first bindings))
(lambda (,symbol)
,@(when (or (string-equal (symbol-name symbol) "_")
(null (symbol-package symbol)))
`((declare (ignorable ,symbol))))
(.let* ,(cdr bindings)
,@body))))
`(progn ,@body)))

(defun run (parser input)
(funcall parser input))

(defun .fail ()
(lambda (input) (declare (ignore input)) nil))

(defun .plus (first-parser second-parser)
(lambda (input)
(append (funcall first-parser input) (funcall second-parser input))))

(defun .identity (value)
(lambda (input)
(list (cons value input))))

(defun .bind (parser function)
(lambda (input)
(loop :for (value . input) :in (run parser input)
:append (run (funcall function value) input))))

(defun .or (parser &rest parsers)
(lambda (input)
(or (funcall parser input)
(when parsers
(funcall (apply #'.or parsers) input)))))

(defun .not (parser)
(lambda (input)
(let ((result (funcall parser input)))
(if result
nil
(list (cons t input))))))

(defgeneric input-empty-p (input)
(:method ((input string)) (zerop (length input))))

(defgeneric input-first (input)
(:method ((input string)) (aref input 0)))

(defgeneric input-rest (input)
(:method ((input string))
(multiple-value-bind (string displacement)
(array-displacement input)
(make-array (1- (length input))
:displaced-to (or string input)
:displaced-index-offset (1+ displacement)
:element-type (array-element-type input)))))

(defun .item ()
(lambda (input)
(unless (input-empty-p input)
(list (cons (input-first input)
(input-rest input))))))

(defun .satisfies (predicate &rest args)
(.bind (.item)
(lambda (x)
(if (apply predicate x args)
(.identity x)
(.fail)))))

(defun .optional (parser)
(.or parser (.identity nil)))

(defun .and (p1 &rest ps)
(.let* ((result p1))
(if ps
(apply #'.and ps)
(.identity result))))

(defmacro .progn (&rest parsers)
(if (rest parsers)
(let ((name (gensym)))
`(.let* ((,name ,(first parsers)))
(.progn ,@(rest parsers))))
(first parsers)))

(defmacro .prog1 (parser &rest parsers)
(let ((name (gensym))
(ignore (gensym)))
`(.let* ((,name ,parser)
(,ignore (.progn ,@parsers)))
(.identity ,name))))

(defmacro .prog2 (parser1 parser2 &rest parsers)
(let ((name (gensym))
(ignore (gensym)))
`(.let* ((,ignore ,parser1)
(,name ,parser2)
(,ignore (.progn ,@parsers)))
(.identity ,name))))

(defun .is-not (predicate &rest args)
(.satisfies (lambda (i)
(cl:not (apply predicate i args)))))

(defun .is (predicate &rest args)
(apply #'.satisfies predicate args))

(defun .mapcar (parser)
(.plus (.let* ((x parser)
(xs (.mapcar parser)))
(.identity (cons x xs)))
(.identity ())))

(defun .mapc (parser)
(.plus (.let* ((_ parser)
(_ (.mapc parser)))
(.identity parser))
(.identity parser)))

(defun .make-list (size &key (initial-element (.item)))
(if (zerop size)
(.identity nil)
(.let* ((first initial-element)
(rest (.make-list (1- size)
:initial-element initial-element)))
(.identity (list* first rest)))))

(defun .concatenate (output-type-spec &rest parsers)
(if (not parsers)
(.fail)
(.let* ((first (first parsers))
(rest (if (rest parsers)
(apply
#'.concatenate output-type-spec (rest parsers))
(.identity nil))))
(.identity (cl:concatenate output-type-spec first rest)))))

(defun .map (result-type parser
&key
(at-least 1))
"=> a ~result-type~ of /parser/ results."
(.let* ((list-1 (.make-list at-least :initial-element parser))
(list-2 (funcall (if result-type #'.mapcar #'.mapc) parser)))
(.identity (when result-type (concatenate result-type list-1 list-2)))))

(defun .char= (x)
(.is #'cl:char= x))

(defun .digit-char-p ()
(.is #'cl:digit-char-p))

(defun .lower-case-p ()
(.is #'cl:lower-case-p))

(defun .upper-case-p ()
(.is #'cl:upper-case-p))

(eof-error-p t)
eof-value)
(.let* ((text (.optional
(.first (.map 'list (.is-not #'char= #\Newline)))))
(newline (.or (.char= #\Newline)
(.and (.not (.item))
(.identity '())))))
(if (or text newline)
(.identity (concatenate 'string text (when newline (string newline))))
(if eof-error-p
(.fail)
(.identity eof-value)))))

(defun .string= (string)
(if (string= string "")
(.identity string)
(.let*
((_ (.is 'char= (aref string 0)))
(_ (.string= (input-rest string))))
(.identity string))))

(defun .char-equal (char)
(.is #'cl:char-equal char))

(defun .string-equal (string)
(labels ((%string-equal (string)
(.let* ((first (.char-equal (aref string 0)))
(rest (if (> (length string) 1)
(%string-equal (subseq string 1))
(.identity nil))))
(.identity (cons first rest)))))
(.let* ((list (%string-equal string)))
(.identity (coerce list 'string)))))

(defun .first (parser)
(lambda (input)
(let ((results (run parser input)))
(when results (list (cl:first results))))))

; emacs : ≻ C-x 8 RET 227b RET
≕ C-x 8 RET 2255 RET
; gnome/X : Ctrl-Shift-u 227b
(let ((defpackage *package*))
(defmacro ≻ (form &key ((≕≻ provided-result) nil result-provided?)
(test ''equal))
(let* ((form-result (gensym))
(form-string (with-output-to-string (s) (print form s)))
(new-form (let ((*package* (find-package (package-name defpackage))))
(result (gensym))
(values (when (and result-provided? (listp provided-result))
(eq (first provided-result) 'cl:values))))

`(progn
(let* ((,form-result (multiple-value-list ,new-form))
(,result (if ,result-provided?
(funcall ,test ,(if values
form-result
`(first ,form-result))
,(if values
`(multiple-value-list ,provided-result)
provided-result))
t)))
(assert ,result
() "~A~% => ~S ~%  ...should be :~%    ~S"
',form ,form-result ',(if result-provided?
provided-result
"Something that evaluates to a non-NIL value"))
(apply #'values ,form-result))))))
```

## Footnotes:

1

Monadic parser combinators (pdf, ps, bibtex) Graham Hutton and Erik Meijer. Technical Report NOTTCS-TR-96-4, Department of Computer Science, University of Nottingham, 1996.

3