Programming Language Technology DAT151/DIT231 Andreas Abel
Why an interpreter if we can have a compiler?
- Defines the meaning of the language.
- Can serve as a reference implementation.
- (In the end, of a compilation chain, there is always some interpreter.)
An interpreter should be compositional!
We interpret type-checked programs only, since the meaning of overloaded operators depends on their type.
In analogy to type-checking Γ ⊢ e : t
we have judgement
γ ⊢ e ⇓ v
in environment γ
, expression e
has value v
Environments γ
are similar to contexts Γ
, but instead of mapping
variables x
to types t
, they map variables to values v
.
An environment γ
is a stack of blocks δ
, which are finite maps
from variables x
to values v
.
Values are integer, floating-point, and boolean literals, or a special
value null
for being undefined.
This would be a LBNF grammar for our values:
VInt. Value ::= Integer;
VDouble. Value ::= Double;
VBool. Value ::= Bool;
VNull. Value ::= "null";
rules Bool ::= "true" | "false";
(It is possible but not necessary to use BNFC to generate the value representation.)
Variable rule.
---------- γ(x) = v
γ ⊢ x ⇓ v
Note that we take the value of the "topmost" x
--- it may appear in
several blocks δ
of γ
.
{ double z = 3.14; // (z=3.14)
int y = 1; // (z=3.14,y=1)
{ int z = 0; // (z=3.14,y=1).(z=0)
int x = y + z; // (z=3.14,y=1).(z=0,x=1)
printInt(x); // 1
}
}
The meaning of the arithmetic operators depends on the static type.
γ ⊢ e₁ ⇓ v₁ γ ⊢ e₂ ⇓ v₂
--------------------------- v = divide(t,v₁,v₂)
γ ⊢ e₁ /ₜ e₂ ⇓ v
divide(int,v₁,v₂)
is integer division of the integer literalsv₁
byv₂
.divide(double,v₁,v₂)
is floating-point division of the floating-point literalsv₁
byv₂
. Integer literals will be converted to floating-point first.divide(bool,v₁,v₂)
is undefined.
Judgement γ ⊢ e ⇓ v
should be read as a function with inputs γ
and
e
and output v
.
eval(Env γ, Exp e): Value
eval(γ, EId x) = lookup(γ,x)
eval(γ, EInt i) = VInt i
eval(γ, EDouble d) = VDouble d
eval(γ, EDiv t e₁ e₂) = divide(t, eval(γ,e₁), eval(γ,e₂))
In the last clause, eval(γ,e₁)
and eval(γ,e₂)
can be run in any
order, even in parallel!
Expression forms like increment (++x
and x++
) and decrement and
assignment x = e
in general change the values of variables.
This is called a (side) effect.
(In contrast, the type of variables never changes. Typing has no effects.)
We need to update the environment. We return the updated environment along with the value:
γ ⊢ e ⇓ ⟨v; γ'⟩
"In environment γ
, expression e
has value v
and updates environment to γ'
."
We write γ[x=v]
for environment γ'
with γ'(x) = v
and
γ'(y) = γ(y)
when y ≠ x
.
Note that we update the "topmost" x
only --- it may appear in
several blocks δ
of γ
.
Rules.
-------------- γ(x) = v
γ ⊢ x ⇓ ⟨v; γ⟩
-------------------------------- γ(x) = i
γ ⊢ ++(int)x ⇓ ⟨i+1; γ[x = i+1]⟩
γ ⊢ e ⇓ ⟨v; γ'⟩
--------------------------
γ ⊢ (x = e) ⇓ ⟨v; γ'[x=v]⟩
γ ⊢ e₁ ⇓ ⟨v₁;γ₁⟩ γ₁ ⊢ e₂ ⇓ ⟨v₂;γ₂⟩
------------------------------------ v = divide(t,v₁,v₂)
γ ⊢ e₁ /ₜ e₂ ⇓ ⟨v;γ₂⟩
The last rule show: we need to thread the environment through the judgements.
The evaluation order matters now!
eval(Env γ, Exp e): Value × Env
eval(γ, EId x) = ⟨ lookup(γ,x), γ ⟩
eval(γ, EAssign x e) = let
⟨v,γ'⟩ = eval(γ,e)
in ⟨ v, update(γ',x,v) ⟩
eval(γ, EDiv t e₁ e₂) = let
⟨v₁,γ₁⟩ = eval(γ, e₁)
⟨v₂,γ₂⟩ = eval(γ₁,e₂)
in ⟨ divide(t,v₁,v₂), γ₂ ⟩
Consider x=0 ⊢ x + x++
vs x=0 ⊢ x++ + x
.
E.g. in Java, we can use a environment env
global to the interpreter and
mutate it with update.
eval(Exp e): Value
eval(EId x): return env.lookup(x)
eval(EAssign x e):
v ← eval(e)
env.update(x,v)
return v
eval(γ, EDiv t e₁ e₂):
v₁ ← eval(e₁)
v₂ ← eval(e₂)
return divide(t,v₁,v₂)
This keeps the environment implicit.
In Haskell, we can use the state monad for the same purpose.
import Control.Monad.State (State, get, modify, evalState)
eval : Exp → State Env Value
eval (EId x) = do
γ ← get
lookupVar γ x
eval (EAssign x e) = do
v ← eval e
modify (λ γ → updateVar γ x v)
return v
eval (EDiv TInt e₁ e₂) = do
VInt i₁ ← eval e₁
VInt i₂ ← eval e₂
return (VInt (div i₁ i₂))
interpret : Exp → Value
interpret e = evalState (eval e) emptyEnv
The Haskell state monad is just sugar. It is implemented roughly by:
type State s a = s → (a, s)
get :: State s s -- s → (s, s)
get s = (s, s)
modify :: (s → s) → State s () -- s → ((), s)
modify f s = ((), f s)
return :: a → State s a -- s → (a, s)
return a s = (a, s)
(do x ← p; q(x)) s = let
(a, s₁) = p s
(b, s₂) = q(a) s₁
in (b, s₂)
Input and output:
---------------------- read i from stdin
γ ⊢ readInt() ⇓ ⟨i, γ⟩
γ ⊢ e ⇓ ⟨i, γ'⟩
---------------------------- print i on stdout
γ ⊢ printInt(e) ⇓ ⟨null, γ'⟩
We do not model input and output mathematically here.
Note that read i from stdin
can also abort with an exception.
Exercise: Could we model evaluation with input and output as a mathematical function? Hint:
- The input and the output both could be a stream of strings.
- Exceptions need to propagate. (See
return
statement below.)
What is wrong with this rule?
γ ⊢ e₁ ⇓ ⟨b₁;γ₁⟩ γ₁ ⊢ e₂ ⇓ ⟨b₂;γ₂⟩
------------------------------------ b = b₁ ∧ b₂
γ ⊢ e₁ && e₂ ⇓ ⟨b;γ₂⟩
Here are some examples where we would like to shortcut computation:
int b = 1;
if (b == 0 && the_goldbach_conjecture_holds_up_to_10E100) { ... }
int x = 0 * number_of_atoms_on_the_moon;
Short-cutting logical operators like &&
is essentially used in C, e.g.:
if (p != NULL && p[0] > 10)
Without short-cutting, the program would crash in p[0]
by accessing
a NULL
pointer.
Rules with short-cutting:
γ ⊢ e₁ ⇓ ⟨false, γ₁⟩
-------------------------
γ ⊢ e₁ && e₂ ⇓ ⟨false, γ₂⟩
γ ⊢ e₁ ⇓ ⟨true, γ₁⟩ γ₁ ⊢ e₂ ⇓ ⟨b, γ₂⟩
----------------------------------------
γ ⊢ e₁ && e₂ ⇓ ⟨b;γ₂⟩
The effects of e₂
are not executed if e₁
already evaluated to false
.
Example:
while (x < 10 && f(x++)) { ... }
You can circumvent this by defining your own and
:
bool and(bool x, bool y) { return x && y; }
while (and(x < 10, f(x++))) { ... }
Digression on: call-by-name (call-by-need) call-by-value
- call-by-value: evaluate function arguments, pass values to function
- call-by-name: pass expressions to function unevaluated (substitution)
- call-by-need: like call-by-name, only evaluate each argument when it is used the first time. If it is used a second time, reuse the value computed the first time.
Example 1:
double (x) = x + x
double (1+2)
- call-by-value:
... = double (3) = 3 + 3 = 6
- call-by-name:
... = (1+2) + (1+2) = 3 + (1+2) = 3 + 3 = 6
- call-by-need:
... = let x=1+2 in x + x = let x = 3 in x + x = 3 + 3 = 6
Example 2:
zero (x) = 0
zero (1+2)
- call-by-value:
... = double (3) = 0
- call-by-name:
... = 0
- call-by-need:
... = let x=1+2 in 0 = 0
Languages with effects (such as C/C++) mostly use call-by-value. Haskell is pure (no effects) and uses call-by-need, which refines call-by-name.
Statements do not have a value. They just have effects.
γ ⊢ s ⇓ γ'
γ ⊢ ss ⇓ γ'
Sequencing γ ⊢ ss ⇓ γ'
:
γ ⊢ ε ⇓ γ
γ ⊢ s ⇓ γ₁ γ₁ ⊢ ss ⇓ γ₂
---------------------------
γ ⊢ s ss ⇓ γ₂
An empty sequence is interpreted as the identity function id : Env → Env
,
sequence composition as function composition.
Blocks.
γ. ⊢ ss ⇓ γ'.δ
--------------
γ ⊢ {ss} ⇓ γ'
Variable declaration.
γ.δ[x=null] ⊢ e ⇓ ⟨v, γ'.δ'⟩
----------------------------
γ.δ ⊢ t x = e; ⇓ γ'.δ'[x=v]
While.
γ ⊢ e ⇓ ⟨false; γ'⟩
--------------------
γ ⊢ while (e) s ⇓ γ'
γ ⊢ e ⇓ ⟨true; γ₁⟩ γ₁. ⊢ s ⇓ γ₂.δ γ₂ ⊢ while e (s) ⇓ γ₃
-------------------------------------------------------------
γ ⊢ while (e) s ⇓ γ₃
Or:
γ ⊢ e ⇓ ⟨true; γ₁⟩ γ₁ ⊢ { s } while e (s) ⇓ γ₃
-------------------------------------------------------------
γ ⊢ while (e) s ⇓ γ₃
Return?
γ ⊢ e ⇓ ⟨v, γ'⟩
-------------------
γ ⊢ return e; ⇓ ??
Return alters the control flow: We can exit from the middle of a loop, block etc.!
bool prime (int p) {
if (p <= 2) return p == 2;
else {
int q = 3;
while (q * q <= p) {
if (divides(q,p)) return false;
else q = q + 2;
}
}
return true;
}
Similar to return
: break
, continue
.
return
can be modelled as an exception that carries the return
value v
as exception information.
When calling a function, we handle the exception, treating it as
regular return from the function.
γ ⊢ s ⇓ r
r ::= v | γ'
The disjoint union Value ⊎ Env
could be modeled in LBNF as:
Return. Result ::= "return" Value;
Continue. Result ::= "continue" Env;
Return.
γ ⊢ e ⇓ ⟨v, γ'⟩
------------------
γ ⊢ return e; ⇓ v
Statement as expression.
γ ⊢ e ⇓ ⟨v, γ'⟩
---------------
γ ⊢ e; ⇓ γ'
Sequence: Propagate exception.
γ ⊢ s ⇓ v
-------------
γ ⊢ s ss ⇓ v
γ ⊢ s ⇓ γ₁ γ₁ ⊢ ss ⇓ r
-------------------------
γ ⊢ s ss ⇓ r
Exercise: How to change while
?
One solution:
γ ⊢ e ⇓ ⟨true; γ₁⟩ γ₁ ⊢ { s } while e (s) ⇓ r
-------------------------------------------------------------
γ ⊢ while (e) s ⇓ r
Function call.
γ ⊢ e₁ ⇓ ⟨v₁, γ₁⟩
γ₁ ⊢ e₂ ⇓ ⟨v₂, γ₂⟩
...
γₙ₋₁ ⊢ eₙ ⇓ ⟨vₙ, γₙ⟩
(x₁=v₁,...,xₙ=vₙ) ⊢ ss ⇓ v
------------------------------- σ(f) = t f (t₁ x₁,...,tₙ xₙ) { ss }
γ ⊢ f(e₁,...,eₙ) ⇓ ⟨v, γₙ⟩
To implement the side condition, we need a global map σ
from
function names f
to their definition t f (t₁ x₁,...,tₙ xₙ) { ss }
.
In Java, we can use Java's exception mechanism.
eval(ECall f es):
vs ← eval(es)
(Δ,ss) ← lookupFun(f)
γ ← saveEnv()
setEnv(makeEnv(Δ,vs))
try {
execStms(ss)
} catch {
ReturnException v:
setEnv(γ)
return v
}
execStm(SReturn e):
v ← eval(e)
throw new ReturnException(v)
In Haskell, we can use the exception monad.
import Control.Monad.Except
import Control.Monad.Reader
import Control.Monad.State
type M = ReaderT Sig (StateT Env (ExceptT Value IO))
eval :: Exp → M Value
eval (ECall f es) = do
vs ← mapM eval es
(Δ,ss) ← lookupFun f
γ ← get
put (makeEnv Δ vs)
execStms ss `catchError` λ v → do
put γ
return v
execStm :: Stm → M ()
execStm (SReturn e) = do
v ← eval e
throwError v
More in the hands-on lecture...
A program is interpreted by executing the statements of the main
function in an environment with just one empty block.