The info database provides a functional interface to global information about named things in CMUCL. Information is considered to be global if it must persist between invocations of the compiler. The use of a functional interface eliminates the need for the compiler to worry about the details of the representation. The info database also handles the need to multiple ``global'' environments, which makes it possible to change something in the compiler without trashing the running Lisp environment.
The info database contains arbitrary lisp values, addressed by a combination of name, class and type. The Name is an EQUAL-thing which is the name of the thing that we are recording information about. Class is the kind of object involved: typical classes are Function, Variable, Type. A type names a particular piece of information within a given class. Class and Type are symbols, but are compared with STRING=.
May be worth having a byte-code representation for interpreted code. This way, an entire system could be compiled into byte-code for debugging (the "check-out" compiler?).
Given our current inclination for using a stack machine to interpret IR1, it would be straightforward to layer a byte-code interpreter on top of this.
Instead of having no interpreter, or a more-or-less conventional interpreter, or byte-code interpreter, how about directly executing IR1?
We run through the IR1 passes, possibly skipping optional ones, until we get through environment analysis. Then we run a post-pass that annotates IR1 with information about where values are kept, i.e. the stack slot.
We can lazily convert functions by having FUNCTION make an interpreted function object that holds the code (really a closure over the interpreter). The first time that we try to call the function, we do the conversion and processing. Also, we can easily keep track of which interpreted functions we have expanded macros in, so that macro redefinition automatically invalidates the old expansion, causing lazy reconversion.
Probably the interpreter will want to represent MVs by a recognizable structure that is always heap-allocated. This way, we can punt the stack issues involved in trying to spread MVs. So a continuation value can always be kept in a single cell.
The compiler can have some special frobs for making the interpreter efficient, such as a call operation that extracts arguments from the stack slots designated by a continuation list. Perhaps
(values-mapcar fun . lists) <==> (values-list (mapcar fun . lists))
This would be used with MV-CALL.
This scheme seems to provide nearly all of the advantages of both the compiler and conventional interpretation. The only significant disadvantage with respect to a conventional interpreter is that there is the one-time overhead of conversion, but doing this lazily should make this quite acceptable.
With respect to a conventional interpreter, we have major advantages: + Full syntax checking: safety comparable to compiled code. + Semantics similar to compiled code due to code sharing. Similar diagnostic messages, etc. Reduction of error-prone code duplication. + Potential for full type checking according to declarations (would require running IR1 optimize?) + Simplifies debugger interface, since interpreted code can look more like compiled code: source paths, edit definition, etc.
For all non-run-time symbol annotations (anything other than SYMBOL-FUNCTION and SYMBOL-VALUE), we use the compiler's global database. MACRO-FUNCTION will use INFO, rather than vice-versa.
When doing the IR1 phases for the interpreter, we probably want to suppress optimizations that change user-visible function calls: - Don't do local call conversion of any named functions (even lexical ones). This is so that a call will appear on the stack that looks like the call in the original source. The keyword and optional argument transformations done by local call mangle things quite a bit. Also, note local-call converting prevents unreferenced arguments from being deleted, which is another non-obvious transformation. - Don't run source-transforms, IR1 transforms and IR1 optimizers. This way, TRACE and BACKTRACE will show calls with the original arguments, rather than the "optimized" form, etc. Also, for the interpreter it will actually be faster to call the original function (which is compiled) than to "inline expand" it. Also, this allows implementation-dependent transforms to expand into
There are some problems with stepping, due to our non-syntactic IR1 representation. The source path information is the key that makes this conceivable. We can skip over the stepping of a subform by quietly evaluating nodes whose source path lies within the form being skipped.
One problem with determining what value has been returned by a form. With a function call, it is theoretically possible to precisely determine this, since if we complete evaluation of the arguments, then we arrive at the Combination node whose value is synonymous with the value of the form. We can even detect this case, since the Node-Source will be EQ to the form. And we can also detect when we unwind out of the evaluation, since we will leave the form without having ever reached this node.
But with macros and special-forms, there is no node whose value is the value of the form, and no node whose source is the macro call or special form. We can still detect when we leave the form, but we can't be sure whether this was a normal evaluation result or an explicit RETURN-FROM.
But does this really matter? It seems that we can print the value returned (if any), then just print the next form to step. In the rare case where we did unwind, the user should be able to figure it out.
[We can look at this as a side-effect of CPS: there isn't any difference between a "normal" return and a non-local one.]
[Note that in any control transfer (normal or otherwise), the stepper may need to unwind out of an arbitrary number of levels of stepping. This is because a form in a TR position may yield its to a node arbitrarily far out.]
Another problem is with deciding what form is being stepped. When we start evaluating a node, we dive into code that is nested somewhere down inside that form. So we actually have to do a loop of asking questions before we do any evaluation. But what do we ask about?
If we ask about the outermost enclosing form that is a subform of the the last form that the user said to execute, then we might offer a form that isn't really evaluated, such as a LET binding list.
But once again, is this really a problem? It is certainly different from a conventional stepper, but a pretty good argument could be made that it is superior. Haven't you ever wanted to skip the evaluation of all the LET bindings, but not the body? Wouldn't it be useful to be able to skip the DO step forms?
All of this assumes that nobody ever wants to step through the guts of a macroexpansion. This seems reasonable, since steppers are for weenies, and weenies don't define macros (hence don't debug them). But there are probably some weenies who don't know that they shouldn't be writing macros.
We could handle this by finding the "source paths" in the expansion of each macro by sticking some special frob in the source path marking the place where the expansion happened. When we hit code again that is in the source, then we revert to the normal source path. Something along these lines might be a good idea anyway (for compiler error messages, for example).
The source path hack isn't guaranteed to work quite so well in generated code, though, since macros return stuff that isn't freshly consed. But we could probably arrange to win as long as any given expansion doesn't return two EQ forms.
It might be nice to have a command that skipped stepping of the form, but printed the results of each outermost enclosed evaluated subform, i.e. if you used this on the DO step-list, it would print the result of each new-value form. I think this is implementable. I guess what you would do is print each value delivered to a DEST whose source form is the current or an enclosing form. Along with the value, you would print the source form for the node that is computing the value.
The stepper can also have a "back" command that "unskips" or "unsteps". This would allow the evaluation of forms that are pure (modulo lexical variable setting) to be undone. This is useful, since in stepping it is common that you skip a form that you shouldn't have, or get confused and want to restart at some earlier point.
What we would do is remember the current node and the values of all local variables. heap before doing each step or skip action. We can then back up the state of all lexical variables and the "program counter". To make this work right with set closure variables, we would copy the cell's value, rather than the value cell itself.
[To be fair, note that this could easily be done with our current interpreter: the stepper could copy the environment alists.]
We can't back up the "program counter" when a control transfer leaves the current function, since this state is implicitly represented in the interpreter's state, and is discarded when we exit. We probably want to ask for confirmation before leaving the function to give users a chance to "unskip" the forms in a TR position.
Another question is whether the conventional stepper is really a good thing to imitate... How about an editor-based mouse-driven interface? Instead of "skipping" and "stepping", you would just designate the next form that you wanted to stop at. Instead of displaying return values, you replace the source text with the printed representation of the value.
It would show the "program counter" by highlighting the *innermost* form that we are about to evaluate, i.e. the source form for the node that we are stopped at. It would probably also be useful to display the start of the form that was used to designate the next stopping point, although I guess this could be implied by the mouse position.
Such an interface would be a little harder to implement than a dumb stepper, but it would be much easier to use. [It would be impossible for an evalhook stepper to do this.]
Note: %PRIMITIVE
can only be used in compiled code. It is a
trapdoor into the compiler, not a general syntax for accessing
"sub-primitives". It's main use is in implementation-dependent
compiler transforms. It saves us the effort of defining a "phony
function" (that is not really defined), and also allows direct
communication with the code generator through codegen-info arguments.
Some primitives may be exported from the VM so that %PRIMITIVE
can be used to make it explicit that an escape routine or interpreter
stub is assuming an operation is implemented by the compiler.
Although the compiler's great freedom in choice of function call conventions and variable representations has major efficiency advantages, it also has unfortunate consequences for the debugger. The debug information that we need is even more elaborate than for conventional "compiled" languages, since we cannot even do a simple backtrace without some debug information. However, once having gone this far, it is not that difficult to go the extra distance, and provide full source level debugging of compiled code.
Full debug information has a substantial space penalty, so we allow different levels of debug information to be specified. In the extreme case, we can totally omit debug information.
The Debug-Info structure directly represents information about the source code, and points to other structures that describe the layout of run-time data structures.
Make some sort of minimal debug-info format that would support at least the common cases of level 1 (since that is what we would release), and perhaps level 0. Actually, it seems it wouldn't be hard to crunch nearly all of the debug-function structure and debug-info function map into a single byte-vector. We could have an uncrunch function that restored the current format. This would be used by the debugger, and also could be used by purify to delete parts of the debug-info even when the compiler dumps it in crunched form. [Note that this isn't terribly important if purify is smart about debug-info...]
Compiled source map representation:
[### store in debug-function PC at which env is properly initialized, i.e. args (and return-pc, etc.) in internal locations. This is where a :function-start breakpoint would break.]
[### Note that that we can easily cache the form-number => source-path or form-number => form translation using a vector indexed by form numbers that we build during a walk.]
Instead of using source paths in the debug-info, use "form numbers". The form number of a form is the number of forms that we walk to reach that form when doing a pre-order walk of the source form. [Might want to use a post-order walk, as that would more closely approximate evaluation order.]
We probably want to continue using source-paths in the compiler, since they are quick to compute and to get you to a particular form. [### But actually, I guess we don't have to precompute the source paths and annotate nodes with them: instead we could annotate the nodes with the actual original source form. Then if we wanted to find the location of that form, we could walk the root source form, looking that original form. But we might still need to enter all the forms in a hashtable so that we can tell during IR1 conversion that a given form appeared in the original source.]
Note that form numbers have an interesting property: it is quite efficient to
determine whether an arbitrary form is a subform of some other form, since the
form number of B will be >
than A's number and <
A's next sibling's number iff
B is a subform of A.
This should be quite useful for doing the source=>pc
mapping in the debugger,
since that problem reduces to finding the subset of the known locations that
are for subforms of the specified form.
Assume a byte vector with a standard variable-length integer format, something like this:
0..253 => the integer 254 => read next two bytes for integer 255 => read next four bytes for integer
Then a compiled debug block is just a sequence of variable-length integers in a particular order, something like this:
number of successors ...offsets of each successor in the function's blocks vector... first PC [offset of first top-level form (in forms) (only if not component default)] form number of first source form first live mask (length in bytes determined by number of VARIABLES) ...more <PC, top-level form offset, form-number, live-set> tuples...
We determine the number of locations recorded in a block by finding the start of the next compiled debug block in the blocks vector.
[### Actually, only need 2 bits for number of successors 0,1,2. We might want to use other bits in the first byte to indicate the kind of location.] [### We could support local packing by having a general concept of "alternate locations" instead of just regular and save locations. The location would have a bit indicating that there are alternate locations, in which case we read the number of alternate locations and then that many more SC-OFFSETs. In the debug-block, we would have a second bit mask with bits set for TNs that are in an alternate location. We then read a number for each such TN, with the value being interpreted as an index into the Location's alternate locations.]
It looks like using structures for the compiled-location-info is too bulky. Instead we need some packed binary representation.
First, let's represent an SC/offset pair with an "SC-Offset", which is an integer with the SC in the low 5 bits and the offset in the remaining bits:
---------------------------------------------------- | Offset (as many bits as necessary) | SC (5 bits) | ----------------------------------------------------Probably the result should be constrained to fit in a fixnum, since it will be more efficient and gives more than enough possible offsets.
We can then represent a compiled location like this:
single byte of boolean flags: uninterned name packaged name environment-live has distinct save location has ID (name not unique in this fun) name length in bytes (as var-length integer) ...name bytes... [if packaged, var-length integer that is package name length] ...package name bytes...] [If has ID, ID as var-length integer] SC-Offset of primary location (as var-length integer) [If has save SC, SC-Offset of save location (as var-length integer)]
But for a whizzy breakpoint facility, we would need a good source=>code
map.
Dumping a complete code=>source map
might be as good a way as any to represent
this, due to the one-to-many relationship between source and code locations.
We might be able to get away with just storing the source locations for the beginnings of blocks and maintaining a mapping from code ranges to blocks. This would be fine both for the profiler and for the "where am I running now" indication. Users might also be convinced that it was most interesting to break at block starts, but I don't really know how easily people could develop an understanding of basic blocks.
It could also be a bit tricky to map an arbitrary user-designated source location to some "closest" source location actually in the debug info. This problem probably exists to some degree even with a full source map, since some forms will never appear as the source of any node. It seems you might have to negotiate with the user. He would mouse something, and then you would highlight some source form that has a common prefix (i.e. is a prefix of the user path, or vice-versa.) If they aren't happy with the result, they could try something else. In some cases, the designated path might be a prefix of several paths. This ambiguity might be resolved by picking the shortest path or letting the user choose.
At the primitive level, I guess what this means is that the structure of source
locations (i.e. source paths) must be known, and the source=>code
operation
should return a list of <source,code>
pairs, rather than just a list of code
locations. This allows the debugger to resolve the ambiguity however it wants.
I guess the formal definition of which source paths we would return is: All source paths in the debug info that have a maximal common prefix with the specified path. i.e. if several paths have the complete specified path as a prefix, we return them all. Otherwise, all paths with an equally large common prefix are returned: if the path with the most in common matches only the first three elements, then we return all paths that match in the first three elements. As a degenerate case (which probably shouldn't happen), if there is no path with anything in common, then we return *all* of the paths.
In the DEBUG-SOURCE structure we may ultimately want a vector of the start positions of each source form, since that would make it easier for the debugger to locate the source. It could just open the file, FILE-POSITION to the form, do a READ, then loop down the source path. Of course, it could read each form starting from the beginning, but that might be too slow.
Do XEPs really need Debug-Functions? The only time that we will commonly end up in the debugger on an XEP is when an argument type check fails. But I suppose it would be nice to be able to print the arguments passed...
Note that assembler-level code motion such as pipeline reorganization can cause problems with our PC maps. The assembler needs to know that debug info markers are different from real labels anyway, so I suppose it could inhibit motion across debug markers conditional on policy. It seems unworthwhile to remember the node for each individual instruction.
For tracing block-compiled calls: Info about return value passing locations? Info about where all the returns are?
We definitely need the return-value passing locations for debug-return. The question is what the interface should be. We don't really want to have a visible debug-function-return-locations operation, since there are various value passing conventions, and we want to paper over the differences.
Probably should be a compiler option to initialize stack frame to a special uninitialized object (some random immediate type). This would aid debugging, and would also help GC problems. For the latter reason especially, this should be locally-turn-onable (off of policy? the new debug-info quality?).
What about the interface between the evaluator and the debugger? (i.e. what happens on an error, etc.) Compiler error handling should be integrated with run-time error handling. Ideally the error messages should look the same. Practically, in some cases the run-time errors will have less information. But the error should look the same to the debugger (or at least similar).
How does the debugger interface to the "evaluator" (where the evaluator means all of native code, byte-code and interpreted IR1)? It seems that it would be much more straightforward to have a consistent user interface to debugging all code representations if there was a uniform debugger interface to the underlying stuff, and vice-versa.
Of course, some operations might not be supported by some representations, etc. For example, fine-control stepping might not be available in native code. In other cases, we might reduce an operation to the lowest common denominator, for example fetching lexical variables by string and admitting the possibility of ambiguous matches. [Actually, it would probably be a good idea to store the package if we are going to allow variables to be closed over.]
Some objects we would need:
Location: The constant information about the place where a value is stored, everything but which particular frame it is in. Operations: location name, type, etc. location-value frame location (setf'able) monitor-location location function Function is called whenever location is set with the location, frame and old value. If active values aren't supported, then we dummy the effect using breakpoints, in which case the change won't be noticed until the end of the block (and intermediate changes will be lost.) debug info: All the debug information for a component. Frame: frame-changed-locations frame => location* Return a list of the locations in frame that were changed since the last time this function was called. Or something. This is for displaying interesting state changes at breakpoints. save-frame-state frame => frame-state restore-frame-state frame frame-state These operations allow the debugger to back up evaluation, modulo side-effects and non-local control transfers. This copies and restores all variables, temporaries, etc, local to the frame, and also the current PC and dynamic environment (current catch, etc.) At the time of the save, the frame must be for the running function (not waiting for a call to return.) When we restore, the frame becomes current again, effectively exiting from any frames on top. (Of course, frame must not already be exited.) Thread: Representation of which stack to use, etc. Block: What successors the block has, what calls there are in the block. (Don't need to know where calls are as long as we know called function, since can breakpoint at the function.) Whether code in this block is wildly out of order due to being the result of loop-invariant optimization, etc. Operations: block-successors block => code-location* block-forms block => (source-location code-location)* Return the corresponding source locations and code locations for all forms (and form fragments) in the block.
There are about five things that the debugger might want to know about a variable:
Name Although a lexical variable's name is "really" a symbol (package and all), in practice it doesn't seem worthwhile to require all the symbols for local variable names to be retained. There is much less VM and GC overhead for a constant string than for a symbol. (Also it is useful to be able to access gensyms in the debugger, even though they are theoretically ineffable).
ID Which variable with the specified name is this? It is possible to have multiple variables with the same name in a given function. The ID is something that makes Name unique, probably a small integer. When variables aren't unique, we could make this be part of the name, e.g. "FOO#1", "FOO#2". But there are advantages to keeping this separate, since in many cases lifetime information can be used to disambiguate, making qualification unnecessary.
SC When unboxed representations are in use, we must have type information to properly read and write a location. We only need to know the SC for this, which would be amenable to a space-saving numeric encoding.
Location Simple: the offset in SC. [Actually, we need the save location too.]
Lifetime In what parts of the program does this variable hold a meaningful value? It seems prohibitive to record precise lifetime information, both in space and compiler effort, so we will have to settle for some sort of approximation.
The finest granularity at which it is easy to determine liveness is the block: we can regard the variable lifetime as the set of blocks that the variable is live in. Of course, the variable may be dead (and thus contain meaningless garbage) during arbitrarily large portions of the block.
Note that this subsumes the notion of which function a variable belongs to. A given block is only in one function, so the function is implicit.
The variable map should represent this information space-efficiently and with adequate computational efficiency.
The SC and ID can be represented as small integers. Although the ID can in principle be arbitrarily large, it should be 100 in practice. The location can be represented by just the offset (a moderately small integer), since the SB is implicit in the SC.
The lifetime info can be represented either as a bit-vector indexed by block numbers, or by a list of block numbers. Which is more compact depends both on the size of the component and on the number of blocks the variable is live in. In the limit of large component size, the sparse representation will be more compact, but it isn't clear where this crossover occurs. Of course, it would be possible to use both representations, choosing the more compact one on a per-variable basis. Another interesting special case is when the variable is live in only one block: this may be common enough to be worth picking off, although it is probably rarer for named variables than for TNs in general.
If we dump the type, then a normal list-style type descriptor is fine: the space overhead is small, since the shareability is high.
We could probably save some space by cleverly representing the var-info as parallel vectors of different types, but this would be more painful in use. It seems better to just use a structure, encoding the unboxed fields in a fixnum. This way, we can pass around the structure in the debugger, perhaps even exporting it from the low-level debugger interface.
[### We need the save location too. This probably means that we need two slots of bits, since we need the save offset and save SC. Actually, we could let the save SC be implied by the normal SC, since at least currently, we always choose the same save SC for a given SC. But even so, we probably can't fit all that stuff in one fixnum without squeezing a lot, so we might as well split and record both SCs.
In a localized packing scheme, we would have to dump a different var-info whenever either the main location or the save location changes. As a practical matter, the save location is less likely to change than the main location, and should never change without the main location changing.
One can conceive of localized packing schemes that do saving as a special case of localized packing. If we did this, then the concept of a save location might be eliminated, but this would require major changes in the IR2 representation for call and/or lifetime info. Probably we will want saving to continue to be somewhat magical.]
How about:
(defstruct var-info ;; ;; This variable's name. (symbol-name of the symbol) (name nil :type simple-string) ;; ;; The SC, ID and offset, encoded as bit-fields. (bits nil :type fixnum) ;; ;; The set of blocks this variable is live in. If a bit-vector, then it has ;; a 1 when indexed by the number of a block that it is live in. If an ;; I-vector, then it lists the live block numbers. If a fixnum, then that is ;; the number of the sole live block. (lifetime nil :type (or vector fixnum)) ;; ;; The variable's type, represented as list-style type descriptor. type)
Then the debug-info holds a simple-vector of all the var-info structures for that component. We might as well make it sorted alphabetically by name, so that we can binary-search to find the variable corresponding to a particular name.
We need to be able to translate PCs to block numbers. This can be done by an I-Vector in the component that contains the start location of each block. The block number is the index at which we find the correct PC range. This requires that we use an emit-order block numbering distinct from the IR2-Block-Number, but that isn't any big deal. This seems space-expensive, but it isn't too bad, since it would only be a fraction of the code size if the average block length is a few words or more.
An advantage of our per-block lifetime representation is that it directly supports keeping a variable in different locations when in different blocks, i.e. multi-location packing. We use a different var-info for each different packing, since the SC and offset are potentially different. The Name and ID are the same, representing the fact that it is the same variable. It is here that the ID is most significant, since the debugger could otherwise make same-name variables unique all by itself.
[### Probably not worth trying to make the stack parseable from the bottom up. There are too many complications when we start having variable sized stuff on the stack. It seems more profitable to work on making top-down parsing robust. Since we are now planning to wire the bottom-up linkage info, scanning from the bottom to find the top frame shouldn't be too inefficient, even when there was a runaway recursion. If we somehow jump into hyperspace, then the debugger may get confused, but we can debug this sort of low-level system lossage using ADB.]
There are currently three relevant context pointers: - The PC. The current PC is wired (implicit in the machine). A saved PC (RETURN-PC) may be anywhere in the current frame. - The current stack context (CONT). The current CONT is wired. A saved CONT (OLD-CONT) may be anywhere in the current frame. - The current code object (ENV). The current ENV is wired. When saved, this is extra-difficult to locate, since it is saved by the caller, and is thus at an unknown offset in OLD-CONT, rather than anywhere in the current frame.
We must have all of these to parse the stack.
With the proposed Debug-Function, we parse the stack (starting at the top) like this: 1] Use ENV to locate the current Debug-Info 2] Use the Debug-Info and PC to determine the current Debug-Function. 3] Use the Debug-Function to find the OLD-CONT and RETURN-PC. 4] Find the old ENV by searching up the stack for a saved code object containing the RETURN-PC. 5] Assign old ENV to ENV, OLD-CONT to CONT, RETURN-PC to PC and goto 1.
If we changed the function representation so that the code and environment were a single object, then the location of the old ENV would be simplified. But we still need to represent ENV as separate from PC, since interrupts and errors can happen when the current PC isn't positioned at a valid return PC.
It seems like it might be a good idea to save OLD-CONT, RETURN-PC and ENV at the beginning of the frame (before any stack arguments). Then we wouldn't have to search to locate ENV, and we also have a hope of parsing the stack even if it is damaged. As long as we can locate the start of some frame, we can trace the stack above that frame. We can recognize a probable frame start by scanning the stack for a code object (presumably a saved ENV).
Probably we want some fairly general mechanism for specifying that a TN should be considered to be live for the duration of a specified environment. It would be somewhat easier to specify that the TN is live for all time, but this would become very space-inefficient in large block compilations.
This mechanism could be quite useful for other debugger-related things. For example, when debuggability is important, we could make the TNs holding arguments live for the entire environment. This would guarantee that a backtrace would always get the right value (modulo setqs).
Note that in this context, "environment" means the Environment structure (one per non-let function). At least according to current plans, even when we do inter-routine register allocation, the different functions will have different environments: we just "equate" the environments. So the number of live per-environment TNs is bounded by the size of a "function", and doesn't blow up in block compilation.
The implementation is simple: per-environment TNs are flagged by the :Environment kind. :Environment TNs are treated the same as :Normal TNs by everyone except for lifetime/conflict analysis. An environment's TNs are also stashed in a list in the IR2-Environment structure. During the conflict analysis post-pass, we look at each block's environment, and make all the environment's TNs always-live in that block.
We can implement the "fixed save location" concept needed for lazy frame creation by allocating the save TNs as wired TNs at IR2 conversion time. We would use the new "environment lifetime" concept to specify the lifetimes of the save locations. There isn't any run-time overhead if we never get around to using the save TNs. [Pack would also have to notice TNs with pre-allocated save TNs, packing the original TN in the stack location if its FSC is the stack.]
We want a standard (recognizable) format for an "escape" frame. We must make an escape frame whenever we start running another function without the current function getting a chance to save its registers. This may be due either to a truly asynchronous event such as a software interrupt, or due to an "escape" from a miscop. An escape frame marks a brief conversion to a callee-saves convention.
Whenever a miscop saves registers, it should make an escape frame. This ensures that the "current" register contents can always be located by the debugger. In this case, it may be desirable to be able to indicate that only partial saving has been done. For example, we don't want to have to save all the FP registers just so that we can use a couple extra general registers.
When the debugger see an escape frame, it knows that register values are located in the escape frame's "register save" area, rather than in the normal save locations.
It would be nice if there was a better solution to this internal error concept. One problem is that it seems there is a substantial space penalty for emitting all that error code, especially now that we don't share error code between errors because we want to preserve the source context in the PC. But this probably isn't really all that bad when considered as a fraction of the code. For example, the check part of a type check is 12 bytes, whereas the error part is usually only 6. In this case, we could never reduce the space overhead for type checks by more than 1/3, thus the total code size reduction would be small. This will be made even less important when we do type check optimizations to reduce the number of type checks.
Probably we should stick to the same general internal error mechanism, but make it interact with the debugger better by allocating linkage registers and allowing proceedable errors. We could support shared error calls and non-proceedable errors when space is more important than debuggability, but this is probably more complexity than is worthwhile.
We jump or trap to a routine that saves the context (allocating at most the return PC register). We then encode the error and context in the code immediately following the jump/trap. (On the MIPS, the error code can be encoded in the trap itself.) The error arguments would be encoded as SC-offsets relative to the saved context. This could solve both the arg-trashing problem and save space, since we could encode the SC-offsets more tersely than the corresponding move instructions.
The following is a key of the three bit low-tagging scheme:
This tagging scheme forces a dual-word alignment of data-blocks on the heap, but this can be pretty negligible:
Data-Blocks comprise only descriptors, or they contain immediate data and raw bits interpreted by the system. GC must skip the latter when scanning the heap, so it does not look at a word of raw bits and interpret it as a pointer descriptor. These data-blocks require headers for GC as well as for operations that need to know how to interpret the raw bits. When GC is scanning, and it sees a header-word, then it can determine how to skip that data-block if necessary. Header-Words are tagged as other-immediates. See the sections "Other-Immediates" and "Data-Blocks and Header-Words" for comments on distinguishing header-words from other-immediate data. This distinction is necessary since we scan through data-blocks containing only descriptors just as we scan through the heap looking for header-words introducing data-blocks.
Data-Blocks containing only descriptors do not require header-words for GC since the entire data-block can be scanned by GC a word at a time, taking whatever action is necessary or appropriate for the data in that slot. For example, a cons is referenced by a descriptor with a specific tag, and the system always knows the size of this data-block. When GC encounters a pointer to a cons, it can transport it into the new space, and when scanning, it can simply scan the two words manifesting the cons interpreting each word as a descriptor. Actually there is no cons tag, but a list tag, so we make sure the cons is not nil when appropriate. A header may still be desired if the pointer to the data-block does not contain enough information to adequately maintain the data-block. An example of this is a simple-vector containing only descriptor slots, and we attach a header-word because the descriptor pointing to the vector lacks necessary information - the type of the vector's elements, its length, etc.
There is no need for a major tag for GC forwarding pointers. Since the tag bits are in the low end of the word, a range check on the start and end of old space tells you if you need to move the thing. This is all GC overhead.
A structure descriptor has the structure lowtag type code, making structurep a fast operation. A structure data-block has the following format:
------------------------------------------------------- | length (24 bits) | Structure header type (8 bits) | ------------------------------------------------------- | structure type name (a symbol) | ------------------------------------------------------- | structure slot 0 | ------------------------------------------------------- | ... structure slot length - 2 | -------------------------------------------------------
The header word contains the structure length, which is the number of words (other than the header word.) The length is always at least one, since the first word of the structure data is the structure type name.
A fixnum has one of the following formats in 32 bits:
------------------------------------------------------- | 30 bit 2's complement even integer | 0 0 0 | -------------------------------------------------------or
------------------------------------------------------- | 30 bit 2's complement odd integer | 1 0 0 | -------------------------------------------------------
Effectively, there is one tag for immediate integers, two zeros. This buys one more bit for fixnums, and now when these numbers index into simple-vectors or offset into memory, they point to word boundaries on 32-bit, byte-addressable machines. That is, no shifting need occur to use the number directly as an offset.
This format has another advantage on byte-addressable machines when fixnums are offsets into vector-like data-blocks, including structures. Even though we previously mentioned data-blocks are dual-word aligned, most indexing and slot accessing is word aligned, and so are fixnums with effectively two tag bits.
Two tags also allow better usage of special instructions on some machines that can deal with two low-tag bits but not three.
Since the two bits are zeros, we avoid having to mask them off before using the words for arithmetic, but division and multiplication require special shifting.
As for fixnums, there are two different three-bit lowtag codes for other-immediate, allowing 64 other-immediate types:
---------------------------------------------------------------- | Data (24 bits) | Type (8 bits with low-tag) | 1 0 | ----------------------------------------------------------------
The type-code for an other-immediate type is considered to include the two lowtag bits. This supports the concept of a single "type code" namespace for all descriptors, since the normal lowtag codes are disjoint from the other-immediate codes.
For other-pointer objects, the full eight bits of the header type code are used as the type code for that kind of object. This is why we use two lowtag codes for other-immediate types: each other-pointer object needs a distinct other-immediate type to mark its header.
The system uses the other-immediate format for characters, the symbol-value unbound trap value, and header-words for data-blocks on the heap. The type codes are laid out to facilitate range checks for common subtypes; for example, all numbers will have contiguous type codes which are distinct from the contiguous array type codes. See section 38.7 for details.
Pointers to data-blocks have the following format:
---------------------------------------------------------------- | Dual-word address of data-block (29 bits) | 1 1 1 | ----------------------------------------------------------------
The word pointed to by the above descriptor is a header-word, and it has the same format as an other-immediate:
---------------------------------------------------------------- | Data (24 bits) | Type (8 bits with low-tag) | 0 1 0 | ----------------------------------------------------------------This is convenient for scanning the heap when GC'ing, but it does mean that whenever GC encounters an other-immediate word, it has to do a range check on the low byte to see if it is a header-word or just a character (for example). This is easily acceptable performance hit for scanning.
The system interprets the data portion of the header-word for non-vector data-blocks as the word length excluding the header-word. For example, the data field of the header for ratio and complex numbers is two, one word each for the numerator and denominator or for the real and imaginary parts.
For vectors and data-blocks representing Lisp objects stored like vectors, the system ignores the data portion of the header-word:
---------------------------------------------------------------- | Unused Data (24 bits) | Type (8 bits with low-tag) | 0 1 0 | ---------------------------------------------------------------- | Element Length of Vector (30 bits) | 0 0 | ----------------------------------------------------------------
Using a separate word allows for much larger vectors, and it allows length to simply access a single word without masking or shifting. Similarly, the header for complex arrays and vectors has a second word, following the header-word, the system uses for the fill pointer, so computing the length of any array is the same code sequence.
These are the other-immediate types. We specify them including all low eight bits, including the other-immediate tag, so we can think of the type bits as one type - not an other-immediate major type and a subtype. Also, fetching a byte and comparing it against a constant is more efficient than wasting even a small amount of time shifting out the other-immediate tag to compare against a five bit constant.
Number (< 36) bignum 10 ratio 14 single-float 18 double-float 22 complex 26 (complex single-float) 30 (complex double-float) 34 Array (>= 38 code 118) Simple-Array (>= 38 code 102) simple-array 38 Vector (>= 42 code 114) simple-string 42 simple-bit-vector 46 simple-vector 50 (simple-array (unsigned-byte 2) (*)) 54 (simple-array (unsigned-byte 4) (*)) 58 (simple-array (unsigned-byte 8) (*)) 62 (simple-array (unsigned-byte 16) (*)) 66 (simple-array (unsigned-byte 32) (*)) 70 (simple-array (signed-byte 8) (*)) 74 (simple-array (signed-byte 16) (*)) 78 (simple-array (signed-byte 30) (*)) 82 (simple-array (signed-byte 32) (*)) 86 (simple-array single-float (*)) 90 (simple-array double-float (*)) 94 (simple-array (complex single-float) (*) 98 (simple-array (complex double-float) (*) 102 complex-string 106 complex-bit-vector 110 (array * (*)) -- general complex vector. 114 complex-array 118 code-header-type 122 function-header-type 126 closure-header-type 130 funcallable-instance-header-type 134 unused-function-header-1-type 138 unused-function-header-2-type 142 unused-function-header-3-type 146 closure-function-header-type 150 return-pc-header-type (a.k.a LRA) 154 value-cell-header-type 158 symbol-header-type 162 base-character-type 166 system-area-pointer-type (header type) 170 unbound-marker 174 weak-pointer-type 178 structure-header-type 182 fdefn-type 186
All strings in the system are C-null terminated. This saves copying the bytes when calling out to C. The only time this wastes memory is when the string contains a multiple of eight characters, and then the system allocates two more words (since Lisp objects are dual-word aligned) to hold the C-null byte. Since the system will make heavy use of C routines for systems calls and libraries that save reimplementation of higher level operating system functionality (such as pathname resolution or current directory computation), saving on copying strings for C should make C call out more efficient.
The length word in a string header, see section "Data-Blocks and Header-Word Format", counts only the characters truly in the Common Lisp string. Allocation and GC will have to know to handle the extra C-null byte, and GC already has to deal with rounding up various objects to dual-word alignment.
Symbol data-block has the following format:
------------------------------------------------------- | 7 (data-block words) | Symbol Type (8 bits) | ------------------------------------------------------- | Value Descriptor | ------------------------------------------------------- | Function Pointer | ------------------------------------------------------- | Raw Function Address | ------------------------------------------------------- | Setf Function | ------------------------------------------------------- | Property List | ------------------------------------------------------- | Print Name | ------------------------------------------------------- | Package | -------------------------------------------------------
Most of these slots are self-explanatory given what symbols must do in Common Lisp, but a couple require comments. We added the Raw Function Address slot to speed up named call which is the most common calling convention. This is a non-descriptor slot, but since objects are dual word aligned, the value inherently has fixnum low-tag bits. The GC method for symbols must know to update this slot. The Setf Function slot is currently unused, but we had an extra slot due to adding Raw Function Address since objects must be dual-word aligned.
The issues with nil are that we want it to act like a symbol, and we need list operations such as CAR and CDR to be fast on it. CMU Common Lisp solves this by putting nil as the first object in static space, where other global values reside, so it has a known address in the system:
------------------------------------------------------- <-- space | 0 | start ------------------------------------------------------- | 7 (data-block words) | Symbol Type (8 bits) | ------------------------------------------------------- <-- nil | Value/CAR | ------------------------------------------------------- | Definition/CDR | ------------------------------------------------------- | Raw Function Address | ------------------------------------------------------- | Setf Function | ------------------------------------------------------- | Property List | ------------------------------------------------------- | Print Name | ------------------------------------------------------- | Package | ------------------------------------------------------- | ... | -------------------------------------------------------In addition, we make the list typed pointer to nil actually point past the header word of the nil symbol data-block. This has usefulness explained below. The value and definition of nil are nil. Therefore, any reference to nil used as a list has quick list type checking, and CAR and CDR can go right through the first and second words as if nil were a cons object.
When there is a reference to nil used as a symbol, the system adds offsets to the address the same as it does for any symbol. This works due to a combination of nil pointing past the symbol header-word and the chosen list and other-pointer type tags. The list type tag is four less than the other-pointer type tag, but nil points four additional bytes into its symbol data-block.
;;;; Array Headers.
The array-header data-block has the following format:
---------------------------------------------------------------- | Header Len (24 bits) = Array Rank +5 | Array Type (8 bits) | ---------------------------------------------------------------- | Fill Pointer (30 bits) | 0 0 | ---------------------------------------------------------------- | Fill Pointer p (29 bits) -- t or nil | 1 1 1 | ---------------------------------------------------------------- | Available Elements (30 bits) | 0 0 | ---------------------------------------------------------------- | Data Vector (29 bits) | 1 1 1 | ---------------------------------------------------------------- | Displacement (30 bits) | 0 0 | ---------------------------------------------------------------- | Displacedp (29 bits) -- t or nil | 1 1 1 | ---------------------------------------------------------------- | Range of First Index (30 bits) | 0 0 | ---------------------------------------------------------------- . . .The array type in the header-word is one of the eight-bit patterns from section "Data-Blocks and Other-immediates Typing", indicating that this is a complex string, complex vector, complex bit-vector, or a multi-dimensional array. The data portion of the other-immediate word is the length of the array header data-block. Due to its format, its length is always five greater than the array's number of dimensions. The following words have the following interpretations and types:
Bignum data-blocks have the following format:
------------------------------------------------------- | Length (24 bits) | Bignum Type (8 bits) | ------------------------------------------------------- | least significant bits | ------------------------------------------------------- . . .The elements contain the two's complement representation of the integer with the least significant bits in the first element or closer to the header. The sign information is in the high end of the last element.
A code data-block is the run-time representation of a "component". A component is a connected portion of a program's flow graph that is compiled as a single unit, and it contains code for many functions. Some of these functions are callable from outside of the component, and these are termed "entry points".
Each entry point has an associated user-visible function data-block (of type function). The full call convention provides for calling an entry point specified by a function object.
Although all of the function data-blocks for a component's entry points appear to the user as distinct objects, the system keeps all of the code in a single code data-block. The user-visible function object is actually a pointer into the middle of a code data-block. This allows any control transfer within a component to be done using a relative branch.
Besides a function object, there are other kinds of references into the middle of a code data-block. Control transfer into a function also occurs at the return-PC for a call. The system represents a return-PC somewhat similarly to a function, so GC can also recognize a return-PC as a reference to a code data-block. This representation is known as a Lisp Return Address (LRA).
It is incorrect to think of a code data-block as a concatenation of "function data-blocks". Code for a function is not emitted in any particular order with respect to that function's function-header (if any). The code following a function-header may only be a branch to some other location where the function's "real" definition is.
The following are the three kinds of pointers to code data-blocks:
Information about functions that is only useful for entry points is kept in some descriptors following the function's self-pointer descriptor. All of these together with the function's header-word are known as the "function header". GC must be able to locate the function header. We provide for this by chaining together the function headers in a NIL terminated list kept in a known slot in the code data-block.
A code data-block has the following format:
A --> **************************************************************** | Header-Word count (24 bits) | Code-Type (8 bits) | ---------------------------------------------------------------- | Number of code words (fixnum tag) | ---------------------------------------------------------------- | Pointer to first function header (other-pointer tag) | ---------------------------------------------------------------- | Debug information (structure tag) | ---------------------------------------------------------------- | First constant (a descriptor) | ---------------------------------------------------------------- | ... | ---------------------------------------------------------------- | Last constant (and last word of code header) | ---------------------------------------------------------------- | Some instructions (non-descriptor) | ---------------------------------------------------------------- | (pad to dual-word boundary if necessary) | B --> **************************************************************** | Word offset from code header (24) | Return-PC-Type (8) | ---------------------------------------------------------------- | First instruction after return | ---------------------------------------------------------------- | ... more code and LRA header-words | ---------------------------------------------------------------- | (pad to dual-word boundary if necessary) | C --> **************************************************************** | Offset from code header (24) | Function-Header-Type (8) | ---------------------------------------------------------------- | Self-pointer back to previous word (with other-pointer tag) | ---------------------------------------------------------------- | Pointer to next function (other-pointer low-tag) or NIL | ---------------------------------------------------------------- | Function name (a string or a symbol) | ---------------------------------------------------------------- | Function debug arglist (a string) | ---------------------------------------------------------------- | Function type (a list-style function type specifier) | ---------------------------------------------------------------- | Start of instructions for function (non-descriptor) | ---------------------------------------------------------------- | More function headers and instructions and return PCs, | | until we reach the total size of header-words + code | | words. | ----------------------------------------------------------------
The following are detailed slot descriptions:
(function (fixnum fixnum fixnum) fixnum)or
(function (string &key (:start unsigned-byte)) string)This information is intended for machine readablilty, such as by the compiler.
A closure data-block has the following format:
---------------------------------------------------------------- | Word size (24 bits) | Closure-Type (8 bits) | ---------------------------------------------------------------- | Pointer to function header (other-pointer low-tag) | ---------------------------------------------------------------- | . | | Environment information | | . | ----------------------------------------------------------------
A closure descriptor has function low-tag bits. This means that a descriptor with function low-tag bits may point to either a function header or to a closure. The idea is that any callable Lisp object has function low-tag bits. Insofar as call is concerned, we make the format of closures and non-closure functions compatible. This is the reason for the self-pointer in a function header. Whenever you have a callable object, you just jump through the second word, offset some bytes, and go.
Due to alignment requirements and low-tag codes, it is not possible to use a hardware call instruction to compute the LRA. Instead the LRA for a call is computed by doing an add-immediate to the start of the code data-block.
An advantage of using a single data-block to represent both the descriptor and non-descriptor parts of a function is that both can be represented by a single pointer. This reduces the number of memory accesses that have to be done in a full call. For example, since the constant pool is implicit in an LRA, a call need only save the LRA, rather than saving both the return PC and the constant pool.
CMUCL has four spaces, read-only, static, dynamic-0, and dynamic-1. Read-only contains objects that the system never modifies, moves, or reclaims. Static space contains some global objects necessary for the system's runtime or performance (since they are located at a known offset at a known address), and the system never moves or reclaims these. However, GC does need to scan static space for references to moved objects. Dynamic-0 and dynamic-1 are the two heap areas for stop-and-copy GC algorithms.
What global objects are at the head of static space???
NIL eval::*top-of-stack* lisp::*current-catch-block* lisp::*current-unwind-protect* FLAGS (RT only) BSP (RT only) HEAP (RT only)
In addition to the above spaces, the system has a control stack, binding stack, and a number stack. The binding stack contains pairs of descriptors, a symbol and its previous value. The number stack is the same as the C stack, and the system uses it for non-Lisp objects such as raw system pointers, saving non-Lisp registers, parts of bignum computations, etc.
The system pointers reference raw allocated memory, data returned by foreign function calls, etc. The system uses these when you need a pointer to a non-Lisp block of memory, using an other-pointer. This provides the greatest flexibility by relieving contraints placed by having more direct references that require descriptor type tags.
A system area pointer data-block has the following format:
------------------------------------------------------- | 1 (data-block words) | SAP Type (8 bits) | ------------------------------------------------------- | system area pointer | -------------------------------------------------------
"SAP" means "system area pointer", and much of our code contains this naming scheme. We don't currently restrict system pointers to one area of memory, but if they do point onto the heap, it is up to the user to prevent being screwed by GC or whatever.
The linkage table feature is based on how dynamic libraries dispatch. A table of functions is used which is filled in with the appropriate code to jump to the correct address.
For CMUCL, this table is stored at target-foreign-linkage-space-start. Each entry is target-foreign-linkage-entry-size bytes long.
At startup, the table is initialized with default values in os_foreign_linkage_init. On x86 platforms, the first entry is code to call the routine resolve_linkage_tramp. All other entries jump to the first entry. The function resolve_linkage_tramp looks at where it was called from to figure out which entry in the table was used. It calls lazy_resolve_linkage with the address of the linkage entry. This routine then fills in the appropriate linkage entry with code to jump to where the real routine is located, and returns the address of the entry. On return, resolve_linkage_tramp then just jumps to the returned address to call the desired function. On all subsequent calls, the entry no longer points to resolve_linkage_tramp but to the real function.
This describes how function calls are made. For foreign data, lazy_resolve_linkage stuffs the address of the actual foreign data into the linkage table. The lisp code then just loads the address from there to get the actual address of the foreign data.
For sparc, the linkage table is slightly different. The first entry is the entry for call_into_c so we never have to look this up. All other entries are for resolve_linkage_tramp. This has the advantage that resolve_linkage_tramp can be much simpler since all calls to foreign code go through call_into_c anyway, and that means all live Lisp registers have already been saved. Also, to make life simpler, we lie about closure_tramp and undefined_tramp in the Lisp code. These are really functions, but we treat them as foreign data since these two routines are only used as addresses in the Lisp code to stuff into a lisp function header.
On the Lisp side, there are two supporting data structures for the linkage table: *linkage-table-data* and *foreign-linkage-symbols*. The latter is a hash table whose key is the foriegn symbol (a string) and whose value is an index into *linkage-table-data*.
*linkage-table-data* is a vector with an unlispy layout. Each entry has 3 parts:
Whenever a new foreign symbol is defined, a new *linkage-table-data* entry is created. *foreign-linkage-symbols* is updated with the symbol and the entry number into *linkage-table-data*.
The *linkage-table-data* is accessed from C (hence the unlispy layout), to figure out the symbol name and the type so that the address of the symbol can be determined. The type tells the C code how to fill in the entry in the linkage-table itself.
The purpose of Fasload files is to allow concise storage and rapid loading of Lisp data, particularly function definitions. The intent is that loading a Fasload file has the same effect as loading the source file from which the Fasload file was compiled, but accomplishes the tasks more efficiently. One noticeable difference, of course, is that function definitions may be in compiled form rather than S-expression form. Another is that Fasload files may specify in what parts of memory the Lisp data should be allocated. For example, constant lists used by compiled code may be regarded as read-only.
In some Lisp implementations, Fasload file formats are designed to allow sharing of code parts of the file, possibly by direct mapping of pages of the file into the address space of a process. This technique produces great performance improvements in a paged time-sharing system. Since the Mach project is to produce a distributed personal-computer network system rather than a time-sharing system, efficiencies of this type are explicitly not a goal for the CMU Common Lisp Fasload file format.
On the other hand, CMU Common Lisp is intended to be portable, as it will eventually run on a variety of machines. Therefore an explicit goal is that Fasload files shall be transportable among various implementations, to permit efficient distribution of programs in compiled form. The representations of data objects in Fasload files shall be relatively independent of such considerations as word length, number of type bits, and so on. If two implementations interpret the same macrocode (compiled code format), then Fasload files should be completely compatible. If they do not, then files not containing compiled code (so-called "Fasdump" data files) should still be compatible. While this may lead to a format which is not maximally efficient for a particular implementation, the sacrifice of a small amount of performance is deemed a worthwhile price to pay to achieve portability.
The primary assumption about data format compatibility is that all implementations can support I/O on finite streams of eight-bit bytes. By "finite" we mean that a definite end-of-file point can be detected irrespective of the content of the data stream. A Fasload file will be regarded as such a byte stream.
A Fasload file may be regarded as a human-readable prefix followed by code in a funny little language. When interpreted, this code will cause the construction of the encoded data structures. The virtual machine which interprets this code has a stack and a table, both initially empty. The table may be thought of as an expandable register file; it is used to remember quantities which are needed more than once. The elements of both the stack and the table are Lisp data objects. Operators of the funny language may take as operands following bytes of the data stream, or items popped from the stack. Results may be pushed back onto the stack or pushed onto the table. The table is an indexable stack that is never popped; it is indexed relative to the base, not the top, so that an item once pushed always has the same index.
More precisely, a Fasload file has the following macroscopic
organization. It is a sequence of zero or more groups concatenated
together. End-of-file must occur at the end of the last group. Each
group begins with a series of seven-bit ASCII characters terminated
by one or more bytes of all ones #xFF
; this is called the
header. Following the bytes which terminate the header is the
body, a stream of bytes in the funny binary language. The body
of necessity begins with a byte other than #xFF
. The body is
terminated by the operation FOP-END-GROUP.
The first nine characters of the header must be FASL FILE
in
upper-case letters. The rest may be any ASCII text, but by
convention it is formatted in a certain way. The header is divided
into lines, which are grouped into paragraphs. A paragraph begins
with a line which does not begin with a space or tab character,
and contains all lines up to, but not including, the next such line.
The first word of a paragraph, defined to be all characters up to but
not including the first space, tab, or end-of-line character, is the
name of the paragraph. A Fasload file header might look something like
this:
FASL FILE >SteelesPerq>User>Guy>IoHacks>Pretty-Print.Slisp Package Pretty-Print Compiled 31-Mar-1988 09:01:32 by some random luser Compiler Version 1.6, Lisp Version 3.0. Functions: INITIALIZE DRIVER HACK HACK1 MUNGE MUNGE1 GAZORCH MINGLE MUDDLE PERTURB OVERDRIVE GOBBLE-KEYBOARD FRY-USER DROP-DEAD HELP CLEAR-MICROCODE %AOS-TRIANGLE %HARASS-READTABLE-MAYBE Macros: PUSH POP FROB TWIDDLEone or more bytes of
#xFF
The particular paragraph names and contents shown here are only intended as suggestions.
Each operation in the binary Fasload language is an eight-bit (one-byte) opcode. Each has a name beginning with "FOP-". In the following descriptions, the name is followed by operand descriptors. Each descriptor denotes operands that follow the opcode in the input stream. A quantity in parentheses indicates the number of bytes of data from the stream making up the operand. Operands which implicitly come from the stack are noted in the text. The notation " stack" means that the result is pushed onto the stack; " table" similarly means that the result is added to the table. A construction like "n(1) value(n)" means that first a single byte n is read from the input stream, and this byte specifies how many bytes to read as the operand named value. All numeric values are unsigned binary integers unless otherwise specified. Values described as "signed" are in two's-complement form unless otherwise specified. When an integer read from the stream occupies more than one byte, the first byte read is the least significant byte, and the last byte read is the most significant (and contains the sign bit as its high-order bit if the entire integer is signed).
Some of the operations are not necessary, but are rather special cases of or combinations of others. These are included to reduce the size of the file or to speed up important cases. As an example, nearly all strings are less than 256 bytes long, and so a special form of string operation might take a one-byte length rather than a four-byte length. As another example, some implementations may choose to store bits in an array in a left-to-right format within each word, rather than right-to-left. The Fasload file format may support both formats, with one being significantly more efficient than the other for a given implementation. The compiler for any implementation may generate the more efficient form for that implementation, and yet compatibility can be maintained by requiring all implementations to support both formats in Fasload files.
Measurements are to be made to determine which operation codes are worthwhile; little-used operations may be discarded and new ones added. After a point the definition will be "frozen", meaning that existing operations may not be deleted (though new ones may be added; some operations codes will be reserved for that purpose).
return (node, vops) safe, safety saving (of registers, costs) SB SC (restriction) semi-inline side-effect in ICR in VMR sparse set splitting (of VMR blocks) SSET SUBPRIMITIVE successor block tail recursion tail recursive tail recursive loop user tail recursion
template TN TNBIND TN-REF transform (source, ICR) type assertion inference top-down, bottom-up assertion propagation derived, asserted descriptor, specifier, intersection, union, member type check type-check (in continuation) UNBOXED (boxed) descriptor unknown values continuation unset variable unwind-block, unwinding used value (dest) value passing VAR VM VOP