Wherein is discussed the lexing, parsing, semantics, etc. of Joy language implemented in Prolog.
Compare and contrast with similar work for a small FP language:
→ Programming Language Semantics in PrologSymbols can be made of any non-blank characters except ``[`` and ``]`` which are fully reserved for list literals (aka "quotes"). ``true`` and ``false`` would be valid symbols but they are reserved for Boolean literals.
First lex a stream of character codes into tokens separated by square brackets or whitespace. We keep the brackets and throw away the blanks.
joy_lex([tok(Token)|Ls]) --> chars(Token), !, joy_lex(Ls).
joy_lex([ lbracket|Ls]) --> "[", !, joy_lex(Ls).
joy_lex([ rbracket|Ls]) --> "]", !, joy_lex(Ls).
joy_lex(Ls) --> [Space], {code_type(Space, space)}, !, joy_lex(Ls).
joy_lex([]) --> [].
The `chars//1` predicate is just a greedy non-blank, non-bracket character code recognizer that scoops them into a list
chars([Ch|Rest]) --> char(Ch), chars(Rest).
chars([Ch]) --> char(Ch).
char(Ch) --> [Ch],
{ Ch \== 0'[,
Ch \== 0'],
code_type(Ch, graph) }.
Then parse the tokens converting them to Prolog values and building up the list structures (if any.) Integers are converted to Prolog integers, symbols and bools to Prolog atoms, and list literals to Prolog lists. Then the values are wrapped in unary "type" predicates:
int/1, symbol/1, bool/1, list/1
The parser just recognizes streams of terms:
joy_parse([J|Js]) --> joy_term(J), joy_parse(Js).
joy_parse([]) --> [].
Terms are either tokens or lists of terms:
joy_term(list(J)) --> [lbracket], joy_parse(J), [rbracket].
joy_term(Atomic) --> [tok(Codes)], {joy_token(Atomic, Codes)}.
Tokens are integers, Boolean values, or symbols (names of functions):
joy_token(int(I), Codes) :- number(I, Codes, []), !. % See dcg/basics.
joy_token(bool(true), "true") :- !.
joy_token(bool(false), "false") :- !.
joy_token(symbol(S), Codes) :- atom_codes(S, Codes).
All that remains is to put the lexer and parser together in a single relation like so
text_to_expression(Text, Expression) :-
phrase(joy_lex(Tokens), Text), !,
phrase(joy_parse(Expression), Tokens).
The fundamental Joy relation involves an expression and two stacks. One stack serves as input and the other as output.
thun(Expression, InputStack, OutputStack)
The null expression (denoted by an empty Prolog list) is effectively an identity function and serves as the end-of-processing marker.
thun([], S, S).
Literals are put on the stack.
thun([Lit|E], Si, So) :- literal(Lit), !, thun(E, [Lit|Si], So).
literal(int(_)).
literal(bool(_)).
literal(list(_)).
Symbols are the names of either functions, combinators, or definitions, and are looked up (as Prolog clauses ``func/3`` or ``combo/5`` or ``def/2``) and evaluated
thun([Func|E], Si, So) :-
func(Func, Si, S),
thun(E, S, So).
thun([Combo|E], Si, So) :-
combo(Combo, Si, S, E, Eo),
thun(Eo, S, So).
thun([Def|E], Si, So) :-
def(Def, Body),
append(Body, E, Eo),
thun(Eo, Si, So).
Note that functions cannot affect the expression while combinators can, that's the semantic difference between them. Technically combinators are functions.
Definitions simply append their body expressions to the current expression and carry on evaluation from there.
(These Prolog relations are all tail-recursive, which is nice.)
Functions are relations ``func/3`` that relate a symbol name to some operation from stacks to stacks.
Basic stack manipulation use logic variables as a kind of "any type" as they do not inspect or otherwise use the values on the stack:
func(swap, [A, B|S], [B, A|S]).
func(dup, [A|S], [A, A|S]).
func(pop, [_|S], S).
Can be described in several ways. Symbolic:
func(add, [int(A), int(B)|S], [int(A+B)|S]).
Note that, in Prolog, the above code does not actually perform arithmetic. The term ``A+B`` is syntactic sugar for ``'+'(A, B)``, it's a plain ol' Prolog term.
We can use CLP(FD) constraints instead:
func(add, [int(A), int(B)|S], [int(C)|S]) :- C #= A + B.
This gets into "Categorical" programing (using different semantics is akin to evaluating over different categories, I think.)
The difference between regular functions and combinators is that functions (as we are using the word here) cannot change the pending expression (continuation) but combinators can, typically by prepending quoted programs to the pending expression. As currently written, there are no combinators that can alter the pending expression in ways other than prepending code.
combo(i, [list(P)|S], S, Ei, Eo) :- append(P, Ei, Eo).
combo(branch, [list(T), list(_), bool(true)|S], S, Ei, Eo) :- append(T, Ei, Eo).
combo(branch, [list(_), list(F), bool(false)|S], S, Ei, Eo) :- append(F, Ei, Eo).
combo(loop, [list(_), bool(false)|S], S, E, E ).
combo(loop, [list(B), bool(true)|S], S, Ei, Eo) :-
append(B, [list(B), symbol(loop)|Ei], Eo).
As you can see, there is nothing to it. The branch and loop combinators are prefectly straightforward, the loop combinator mentions itself showing the underlying unity of looping and recursion.
New functions are just named Joy expressions, and work by prepending their expressions to the pending expression.
swapd := [swap] dip swons := swap cons swoncat := swap concat etc...
And that's it. That's all there is to Joy. It's extremely simple and elegant.