CZT Design Alternatives and Rationale

This document describes some CZT design issues, the solutions that were considered, the pros and cons for those solutions, the criteria used to choose between, and which choice was made.

Mutable

Each AST interface provides setXXX(newVal) methods as well as getXXX().

+	Some visitors/algorithms are more efficient. For example, the typechecker can just add the missing generic instantiation parameters, without copying the tree.
-	Makes sharing of common subtrees difficult/error-prone
-	Programmers must be careful not to mutate AST objects that are used in the domain of Maps etc. This requires global knowledge.

Mutable-with-listeners

The setXXX(newVal) methods maintain a set of listeners. These listeners are notified whenever a change is made. The code generated by Eclipse uses this approach.

-	More complex.
-	Moves the responsibility for handling changes from the AST classes to the listeners.
+	Allows flexibility of the mutable approach.

Immutable

Each AST interface has only getXXX() methods. All data fields have to be set in the constructor. A variant would be to allow annotations to be mutable, while other data fields are immutable.

+	Less error prone to accidental changes.
+	Makes sharing of common subtrees easy (except for annotations like LocAnn!)
+	Section managers must provide methods to update the AST of a section after a visitor has changed it. This tells the section manager when to discard derived products.

Both

Each AST interface provides setXXX(newVal) methods as well as getXXX(), but implementations are allowed to throw an exception instead of implementing the setXXX methods. Different factories would be provided for each style (mutable/immutable). This is similar to the Java List interface.

-	More complex -- more classes -- user must choose which to use.
+	Allows efficiency of the mutable approach or the safety of the immutable one (depending upon which factory is used).

Criteria

Simplicity for programmers...

Conclusions

We have initially implemented mutable. We are considering implementing 'Both' and gradually moving towards immutable.

Scope Problems: how to avoid variable capture?

There are two kinds of scope problems that arise in a Z tool. The first is that the typechecker must sometimes insert a reference to a global type T from within a hole in the scope of T. This is described in Section 4.3 of the ZB2003 paper ZML: XML Support for Standard Z.

The second scope problem is the standard one of variable capture during substitution P[E/x] -- one must be careful that local (bound) variables do not capture any free variables of E.

These problems become more complex because we want to save parts of a specification as XML files, then reload them later. We also want the XML file to contain links between variable declarations and uses (where possible).

There are three common solutions to the second problem, so we describe these first.

Rename during Substitution

Bound variables are renamed when necessary, to avoid capturing free variables.

+	This is a traditional well-known solution.
-	It is incredibly easy to make subtle errors while implementing this. These errors are rarely uncovered, ecause they depend upon certain combinations of free and bound ariable names, so they result in rare and difficult bugs.

Use de Bruijn indices

Bound variables do not have names. Instead, each reference to a bound variable uses a number -- the distance from the reference up to the point of declaration of that bound variable. So for example, \lambda x:T @ (\lambda y:U @ y x) is represented by \lambda T @ (\lambda U @ 0 1). This would need to be extended to handle the complexity of Z, such as multiple variable declarations.

+	This is another traditional well-known solution, which is reportedly easier to get right than the renaming solution.
-	The printer has to re-add names in a way that is readable and that does not create scope holes.
-	Complexity: the defns of substitution etc. are quite complex.

Unique names for all bound variables

Every bound variable is assigned a unique name which is distinct from all global names. Typically this is done by associating a unique number with each bound variable. A variant is to assign unique names to ALL variables, including globals.

Variants:

we could do this only while the AST is in memory (not save the unique names in XML files), or
within the XML files we could use the ID/IDREF attributes to record these unique names (this is only allowed within a single XML file, not between XML files), or
define our own crossref attributes (an arbitrary string) and allow dangling references between XML files.

+	Substitution becomes trivial -- `P[E/x2]` just involves replacing EVERY occurence of x2 by E (because the free vars of E are disjoint from the bound vars of P).
-	Must generate unique names/numbers across all AST trees. This implies that bound variables within AST trees loaded from XML files must be renumbered (if the numbers are saved in the XML) or assigned fresh numbers.
-	The printer must be careful to print the unique name (rather than the original name) if this is necessary to avoid scope holes. One simple approach is to always use the unique name of bound variables that have the same base name as a global name. This is safe, but prints renamed bound variables slightly more often than necessary.

CZT: Community Z Tools

Should the AST be mutable or immutable or both?