|
Concept overview
Index
Just add modules in JastAdd
The name JastAdd is meant to indicate that it is easy to
just add various modules written in a language based on java and
the ast (the Abstract Syntax Tree). You can easily extend your
language implementation, both syntactically and computationally.
There are three main ideas that contribute to this modularity and
extensibility: object-orientation, static aspects, and
declarative computations. In addition, you can make use of
imperative computations (ordinary Java code), but those parts of your
system can then not be extended in the same way. This will be
discussed below.
Abstract syntax is a class hierarchy
The abstract syntax is modelled by an object-oriented class
hierarchy in Java, giving the normal benefits of object-orientation.
These classes are called AST classes since they model nodes in
the abstract syntax tree. For example, if your language has
assignments and while loops, you will typically have an abstract
superclass Stmt with concrete subclasses Assignment and WhileLoop.
General and default behavior for statements can be implemented by
adding behavior to the Stmt class, and behavior special for
Assignment and WhileLoop can be implemented in the subclasses.
Static aspects modularize crosscutting
behavior
Our notion of static aspects is essentially the same as
introduction in AspectJ or open classes in MultiJava.
The aspects allow you to add features to AST classes without having
to syntactically edit those classes. For example, when you implement
type checking, you need to add some specific behavior to most of the
AST classes. This behavior can be grouped together into an aspect. If
you want to extend the system with code generation, simply add a new
aspect module for that.
Declarative features allow decoupling of
behavior
JastAdd supports declarative features for implementing behavior in
the form of attributes and rewrites. Attributes are
values attached to the AST nodes. Rewrites allow the AST to be
changed. The declarative implementation of this behavior means that
you don't need to specify in which order attributes are given values
or when rewrites are carried out. The order is instead figured out
automatically by the JastAdd evaluation engine, and follows the
dependencies between the different attributes and rewrites. This
order will depend on which aspect modules are included and also on
the actual program that you are compiling or analyzing.
The important advantage of using these declarative features over
simply programming using ordinary Java code, is that they allow a
high degree of decoupling between the different pieces of behavior.
For example, you don't need to specify that name resolution is
computed before type checking. If the type checker needs the
information computed by the name resolution module, the system will
automatically compute the name resolution behavior first. More
importantly, it may be the case that you actually need some of the
name resolution in order to compute some of the type checking, and
vice versa. The declarative features allow you to write the type
checker assuming that all the names are already resolved, and the
name resolution assuming that all the types are already computed. The
system will figure out in which order to compute the individual names
and types. (For circular dependencies, see the discussion on circular
attributes below.)
Since you do not have to specify the order in which different
pieces of information are computed, you can modularize chunks of
behavior according to how you want to reuse the behavior,
rather than according to in which order the behavior is computed.
An additional advantage of declarative implementation is that it
is more concise than the imperative implementation. This is mainly
because the order of computation is implicit and because
optimizations are generated and need not be hand coded.
Attributes are defined by equations
JastAdd supports attributes in the sense of attribute
grammars: attributes are declared in AST classes, and their
values are defined by equations. As in attribute grammars, an
attribute is either synthesized or inherited depending
on if it is used for propagating information upwards or downwards in
the AST. In addition, JastAdd has features not found in plain
attribute grammars: there is support for reference-valued attributes,
for parameterized attributes, for circularly defined attributes, and
for nonterminal attributes. All of these are explained briefly
below.
There are several possible analogies between attributes and OO
concepts. One is that an attribute corresponds to a field in an AST
node, and an equation corresponds to a method that computes the value
of the field. Similar to OO methods, an equation in a subclass can
override an equation in a superclass. The JastAdd system ensures that
the "equation methods" are called in such an order so that whenever
you try to access an attribute, its value will already be computed.
If you traverse an AST and look at all the attributes, they will all
have values according to their defining equations. I.e., the
equations hold.
In JastAdd, some attributes are indeed represented as fields, but
this is done under the hood. The interface to an attribute is a
method with the same name. E.g., to access the value of an attribute
"a", you write "a()".
An equation is different from an ordinary method in that it must
correspond to a function. I.e., it must compute the same value
each time it is executed, and it must not result in any visible side
effects. The syntax for equations is also slightly different from the
syntax for ordinary methods.
Synthesized attributes are almost like
virtual methods
Synthesized attributes are used for computing information in an
AST node, usually in order to propagate that information upwards in
the AST. A typical example is to compute the type of an expression.
The type is declared as a synthesized attribute of a general AST
class for expressions. Equations then define the value of the
attribute for the different kinds (subclasses) of expressions. For
example, an addition node can have an equation defining the type as
an Integer or Float, depending on the types of its operands.
A synthesized attribute is analogous to an ordinary virtual method
(where the method is a side-effect free function): The attribute
declaration corresponds to the method declaration, and the equations
to the method implementations. General classes can have default
equations that are overridden in subclasses, analogous to method
overriding.
So why use synthesized attributes instead of ordinary virtual
methods? The main reason is that the system can automatically cache
the value of the synthesized attribute (in a private field), so that
the method body (equation) does not have to be executed each time the
attribute is accessed. For many attributes, such caching is essential
in order to get reasonable speed. Another reason is that the syntax
for equations is more concise than for methods, making your aspects
easier to read.
Inherited attributes pass information to
children
Inherited attributes are used for passing information downwards in
an AST, giving child nodes information about their context. A typical
example is that information about visible declarations, usually
called the environment is passed down to each statement, which
in turn passes down this information to the variables inside it. A
variable can use this environment information to find its declaration
and type.
Note that the term "inherited" attribute has different meanings in
attribute grammars and object-orientation. In the attribute grammar
sense, an inherited attribute is an attribute whose value is
defined in the parent AST node. I.e., if an AST class C
declares an inherited a, then each AST class that has a child
of type C must have an equation defining the a
attribute of that C child. This is checked by the JastAdd
system.
The use of inherited attributes decouples an AST node from its
parent: the AST node does not need to know which parent it has. All
the information it needs is in the inherited attributes whose values
are defined by the parent. This allows AST classes with all their
behavior to be reused in many different contexts. For example,
expressions may occur inside many different kinds of statements and
declarations. The behavior of the expression depends only on its
inherited attributes, not on any specific surrounding node.
Reference attributes link AST nodes
together
In traditional plain attribute grammars, all attributes are
functional values without identity. In contrast, JastAdd allows
attributes to have reference values. This means that an
attribute can be a reference to another AST node. This is a very
powerful notion. It allows different places in the AST to be directly
connected. For example, a use of a variable can be connected directly
to its declaration. Attributes can be accessed directly via such
references. For example, the use node can access its type directly
from the declaration. Similarly, a class can be connected directly to
its superclass; a method call directly to the method declaration; and
so on.
Reference attributes are powerful because they allow graph
structure to be superimposed on top of a tree (the AST), and
information to be propagated along those edges. In contrast, plain
AGs only support information propagation along the AST structure, in
effect forcing any desired graph structure to be encoded in large
complex attributes. It is symptomatic of plain AGs to encode symbol
table information into large complex environment attributes, whereas
in JastAdd, the environment can simply be a reference attribute to
another AST node. All the symbol table information is already in the
AST and does not need to be encoded.
Attributes can have parameters - just like
methods
Another difference between plain AGs and JastAdd AGs is that
JastAdd attributes can have parameters. Previously, we made
the analogy between an attribute and a field. But if the attribute
has parameters, the analogy of a method is closer. Parameterized
attributes are powerful in combination with reference attributes.
They allow information in a distant node to be accessed using a
parameterized method call rather than just accessing a value. For
example, to do type checking of an OO language, a class node can have
a parameterized attribute subclassOf(c) which performs a type
comparison with another class node c.
As mentioned earlier, attribute values can be cached for
efficiency. The same goes for parameterized attributes. If a
parameterized attribute is called with a particular set of arguments,
the resulting value can be cached and returned directly when the
attribute is accessed the next time with the same arguments
(memoization).
Rewrite ASTs to improve them
The AST produced by a parser is seldom ideal for all analysis and
compilation. Rewrite rules allow conditional changes to the AST to be
declared in order to obtain an improved AST, better suited for other
computations. For example, a Java parser cannot distinguish between
static method calls and virtual method calls. Suppose the parser
constructs general MethodCall nodes. These nodes can be replaced by
more specific StaticMethodCall or VirtualMethodCall nodes by using a
rewrite rule that accesses attributes to find out if the call is a
static or virtual method call. By introducing these more specialized
nodes, other computations, e.g., code generation, can be simplified
and modularized.
The rewrites are carried out automatically and implicitly as soon
as the AST is traversed: code traversing the AST will only see the
final rewritten AST. This implicit rewriting frees the user of
scheduling the order of the rewrites, and makes it possible to
combine rewrites implemented in different modules.
Another important use of rewrites is for normalizing the AST,
e.g., to replace shorthands used in the language with a normal form.
This makes subsequent compilation simpler since the special cases
with shorthands do not need to be dealt with. An example from Java is
replacing literal strings with character arrays.
Rewriting also allows for making up for deficiencies in the parser
or odd corners in the language. Many languages do not have a clean
context-free grammar that is easy to parse. An infamous example is
the language C. Instead of using ad-hoc techniques in the semantic
actions of the parser, one can define a clean (but undesired)
context-free grammar for parsing, build the (undesired) AST in a
straight-forward manner in the parser's semantic actions, and then
use rewrites to change the AST into the desired form.
It should be noted that rewrites can use and depend on attribute
information, and that performing one rewrite may lead to that another
rewrite becomes applicable. This allows complex rewrites to be broken
down into a collection of small simple rewrites. Again, the order in
which these smaller rewrites are applied does not need to be
specified explicitly.
Circular attributes allow declarative
iteration
Many static analyses require iterative computations: starting out
with a set of initial values, and iterating until a solution is
found. Examples include generation of SSA form, data-flow analysis,
reachability analysis, etc. The normal way of implementing such
computations is to do it imperatively, using an iterative work-list
algorithm. By using circular attributes, it is possible to express
these computations in a declarative manner instead of
imperatively.
A circular attribute is given an initial value and an equation
defining the attribute, possibly in terms of itself (usually
indirectly via other circular attributes). The JastAdd system will
automatically perform the iteration as needed until a solution is
found, i.e., the equations for the circular attributes hold. Such
iteration will terminate if the values that the attributes can take
on can be organized into a lattice of finite height, and all
equations are monotonic, i.e., they can only raise the value to a
higher point in the lattice. Typical examples are booleans (that go
from false to true) and sets (that go from the empty set to a finite
set, e.g., the set of all declared variables in a program).
By using circular attributes to express iteration declaratively,
there is no need for coding when the iteration should take
place. This allows modules containing these computations to be freely
combined with other modules that use the circular attributes. JastAdd
automatically makes sure that a circular attribute is computed if its
value is needed in some other equation or rewrite.
Expand the AST with nonterminal
attributes
It is sometimes useful to expand an AST with additional nodes that
are defined by equations, rather than constructed by the parser or by
a rewrite rule. These nodes are called nonterminal attributes
(NTAs) since they are both similar to nodes (nonterminals) and to
attributes. An NTA is like a node in that it can itself have
attributes and it can be rewritten. It is also like an attribute in
that it is defined by an equation.
There are similarities between NTAs and rewrites. In both cases,
you can declaratively define changes to the AST. The main difference
is that you should use rewrites when you are interested in replacing
some nodes with others, and NTA's when you want to keep existing
nodes and introduce some additional ones. A typical use of NTA's is
for adding AST subtrees corresponding to predefined types and
methods. This allows predefined and user-defined entities to be
treated in the same way. Another use of NTAs is in doing
instantiation-like computations in the compiler. For example, for
macro-expansion or computations on generic types.
Combine imperative and declarative
aspects
JastAdd allows you to use both imperative and declarative aspect
modules. The declarative modules result in an attributed AST where
the attributes and the traversal methods form an API that can be
freely used by imperative modules. For example, you can express name
and type analysis declaratively, and implement the printing of error
messages imperatively. It is in principle possible to also let the
declarative aspects use things computed imperatively. But then you
need to understand how the declarative evaluation works inside
JastAdd.
Typically, you will write your imperative code as aspect modules,
adding ordinary Java methods to the AST classes. You could also use
the Visitor design pattern if you like, but aspects are usually
simpler to write and extend.
Use any parser generator
JastAdd should work with any Java-based parser generator that
supports user-defined semantic actions. In our projects we have used
JavaCC, CUP,
and Beaver. You need to
write semantic actions in the parser specification in order to build JastAdd ASTs.
If you use the JavaCC system you can use its add-on tool
JJTree for building the JastAdd ASTs. Look at the provided JastAdd
examples for implementation examples.
Under the hood
You don't have to understand the underlying evaluation machinery in order to be able to
use JastAdd. However, you might be curious as to how it works. The main idea is that all evaluation
is done on demand. I.e., an attribute is not evaluated until its value is asked for. And a rewrite of
an AST node is not carried out until there is some code that accesses that node through the AST
traversal API. All accesses
to attributes and nodes go via Java method calls, and the implementation machinery performs the
evaluation transparently. The algorithms
are in our published papers, and you can also look at the generated Java code if you are interested
in specific details of the evaluation. The generated code of a JastAdd project is usually in a
subdirectory called AST.
The demand evaluation scheme allows traversing code to regard the AST as a fully evaluated tree:
any attribute will have a value according to its definition, and there will be no
remaining rewrite
rules that apply to the nodes. Initially, the AST is in fact completely unevaluated, coming raw from
the parser. But any access to a node or an attribute will cause a sufficient amount of evaluation to
take place so that the final value of that node
or attribute is returned. An important consequence of the demand evaluation is that it does not cost
anything in execution time to define an additional attribute. There is only a cost if the attribute is
actually accessed.
As a concrete example of demand evaluation, lets say you have implemented a type checker
for a language. All the code for type checking is implemented in declarative aspects where each expression
has a type() attribute and each identifier has a reference attribute decl() which refers to the
appropriate declaration. The decl() attributes are defined in terms of lookup(String) attributes in the
different blocks that return matching local declarations.
In addition to the declarative aspects, there will be an imperative
aspect that simply traverses the AST, looks at the values of the type() attributes and prints possible
error messages. The main program calls the parser to create the AST and then calls the
type-checking traversal code. It is this
traversal code that ultimately drives the order of evaluation. Let's say the traversal has arrived
at an expression "a" and to type-check it, it accesses its type() attribute.
This attribute is defined in terms of the decl() attribute, so this triggers an evaluation of the decl()
attribute for "a". The decl() attribute is defined in terms of an inherited lookup(String) attribute which
is then evaluated, possibly accessing and evaluating other lookup(String) attributes
in other parts of the AST in order to search through the different blocks for matching declarations.
Later on during the type-checking traversal, perhaps another
expression "a" is encountered. A similar sequence of evaluation takes place, but this time
the access to lookup attributes can be short-circuited returning the same (cached) value that was
accessed for the previous use of "a".
The rewrites are also carried out transparently.
Let's say there are rewrite rules that rewrite the expression "a" to a
VariableAccess node if the declaration of "a" is a variable. In this case,
accessing the "a" node will trigger rewriting of it so that the resulting VariableAccess node
will be returned.
In this process, the decl() attribute of "a" will be evaluated and cached, since it is needed to
determine if the declaration of "a" is indeed a variable.
When the traversal code then accesses the type() attribute, the
computation can be short-circuited already at the decl() attribute which is already computed.
Future work includes optimizations of global
rewrites
There is ongoing research and development for improving JastAdd.
In particular, we are looking at how high-level phases can be used in
order to improve performance. Currently, attributes are not cached if
they are evaluated during a rewrite. This is not a problem for many
applications since most rewrites are very local and affect only a
small subtree of the AST at a time. After the rewrite has finished, the attributes can be cached
for subsequent speedups. However, if the application makes
use of many global rewrites (affecting large subtrees of the AST),
efficiency may be a problem. We expect high-level phases to solve
this problem, allowing attributes to be cached earlier.
|