This paper discusses in detail the optimizations which were made to achieve an approximately 42% decrease in class memory footprint from PersonalJava 1.0 to PersonalJava 1.1. It is assumed that the reader is familiar with Java and the workings of the Java Virtual Machine as specified in "The Java Virtual Machine Specification" by Lindholm and Yellin. Familiarity with a Sun Virtual Machine is also helpful.
For small and embedded systems lacking fast dynamic class loading facilities such as a network connection, a local file system, or other permanent storage, a "class preloader" that loads classes offline makes sense. PersonalJava has a class preloader called JavaCodeCompact which performs this task. JavaCodeCompact creates the internal VM data structures representing a class offline, and lays them out in mostly read-only memory.
Changes were made that reduce the memory allocations required per class. Some of these changes apply only to JavaCodeCompact, thus affecting only the ROM footprint. Some of the other techniques apply both to preloaded and dynamically loaded classes, affecting both the ROM and RAM usage.
During the preloading process, all symbolic references are resolved in constant pools, and all bytecodes that make references to these entries are quickened. Therefore NameAndType and Utf8 constants no longer need to be kept in class constant pools since Java bytecodes never have direct references to these constants. Method bytecodes are modified to refer to the updated constant pool indices after the deletion of the entries.
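As a rough illustration of this remapping, the sketch below (invented names, not the JavaCodeCompact source) builds an old-to-new index map for a constant pool from which some entries have been deleted; the operands of the quickened bytecodes would then be rewritten through this map.

```java
// Illustrative sketch only: after NameAndType and Utf8 entries are dropped,
// build an old-index -> new-index map so that quickened bytecodes can be
// patched to refer to the compacted constant pool.
class PoolCompactor {
    /** keep[i] is true if constant pool entry i survives compaction. */
    static int[] buildIndexMap(boolean[] keep) {
        int[] map = new int[keep.length];
        int next = 0;
        for (int old = 0; old < keep.length; old++) {
            map[old] = keep[old] ? next++ : -1;   // -1 marks a deleted entry
        }
        return map;
    }
}
```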
Constant pools whose references have been resolved have also had the referring bytecode instructions quickened. Therefore, there is no need for type information on these constant pools since the VM is never going to be resolving their entries.
Each Java class file has occurrences of Utf8 strings in its constant pool. When JavaCodeCompact preloads multiple classes, it keeps track of all occurrences of Utf8 strings in its input classes and outputs a shared Utf8 table which individual preloaded classes point to. Moreover, for any two strings A and B where B is a suffix of A, only string A is emitted into the global string table; B simply points into the middle of A. In this manner, common string suffixes are shared in order to save space.
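The following sketch illustrates the suffix-sharing idea under simplified assumptions (the class and method names are hypothetical, and this is not the JavaCodeCompact implementation): candidate strings are processed longest first, and any string that is a suffix of an already-emitted string is represented as a reference into that string instead of a new copy.

```java
import java.util.*;

// Hypothetical sketch of suffix sharing for a global Utf8 table.
class SuffixSharer {
    /** Maps each input string to the emitted string whose tail stores it. */
    static Map<String, String> share(Collection<String> input) {
        List<String> byLength = new ArrayList<>(input);
        byLength.sort((a, b) -> b.length() - a.length());    // longest first
        List<String> emitted = new ArrayList<>();
        Map<String, String> storage = new HashMap<>();
        for (String s : byLength) {
            String host = null;
            for (String e : emitted) {
                if (e.endsWith(s)) { host = e; break; }       // reuse a suffix
            }
            if (host == null) { emitted.add(s); host = s; }   // emit a new entry
            storage.put(s, host);   // offset into host = host.length() - s.length()
        }
        return storage;
    }
}
```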
JavaCodeCompact resolves all class constant pool entries, including those of type String. It therefore creates Java Strings with handles as if the runtime memory allocator had created them. Each Java String consists of a character array, an offset field, and a count field. When creating each String and its associated handle, JavaCodeCompact has to generate the correct String object fields. In PersonalJava 1.0, a separate handle was created for each embedded character array. To improve upon this in PersonalJava 1.1, one large character array is created instead that contains the bodies of all Java Strings. Each Java String is laid out to refer to a portion of this shared array, with the appropriate offset and count fields initialized. The separate handles for the separate character arrays are thereby eliminated.
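A minimal sketch of this layout, with invented names and no claim to match the actual JavaCodeCompact code, is shown below: all String bodies are appended to one shared character array, and each String keeps only an offset and a count into that array, so no per-String array handle is needed.

```java
import java.util.*;

// Conceptual sketch: pack every preloaded String body into one shared array.
class SharedStringBodies {
    final char[] chars;            // one array holding all String bodies
    final int[] offset, count;     // per-String view into the shared array

    SharedStringBodies(List<String> strings) {
        StringBuilder all = new StringBuilder();
        offset = new int[strings.size()];
        count = new int[strings.size()];
        for (int i = 0; i < strings.size(); i++) {
            offset[i] = all.length();
            count[i] = strings.get(i).length();
            all.append(strings.get(i));
        }
        chars = all.toString().toCharArray();
    }
}
```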
Many of the internal VM data structures are short tables formed of small integers, like constant pool indices. Examples are tables listing the exceptions that a method throws, or the interfaces that a class implements. JavaCodeCompact tracks the contents of each of these tables, and shares them where appropriate.
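One simple way to picture this sharing, using a hypothetical helper rather than the actual JavaCodeCompact implementation, is to intern each table by its contents so that identical tables collapse to a single copy:

```java
import java.util.*;

// Hypothetical sketch: identical short tables (e.g. thrown-exception or
// implemented-interface index lists) are mapped to one canonical copy.
class TableSharer {
    private final Map<List<Integer>, short[]> canonical = new HashMap<>();

    short[] intern(short[] table) {
        List<Integer> key = new ArrayList<>(table.length);
        for (short s : table) key.add((int) s);
        return canonical.computeIfAbsent(key, k -> table.clone());
    }
}
```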
JavaCodeCompact generates most of its preloaded class data structures in read-only segments. There are, however, certain kinds of data, such as static Java fields, which need to be stored in read-write segments in order to allow the VM to write to them. The disadvantage of this in PersonalJava 1.0 was that a system employing preloaded classes needed to copy the VM-writable data from persistent storage to RAM, essentially duplicating that segment. Changes were made in the VM and JavaCodeCompact to place as much data as possible in read-only memory or in BSS (the uninitialized global data segment of the executable program) to reduce the memory size impact. The BSS section does not suffer from the extra-copy problem, since it is only allocated at load time and does not occupy any space in permanent storage.
Java does not have a way to express initialized data in the class file format. For example, to initialize a large static constant array, the Java compiler needs to emit class initializer code that allocates an array and fills it in element by element. For large arrays typically encountered in code related to internationalization, these static initializers can get quite large. An example is java.lang.Character where the bulk of the class file size consists of such static initializer code. Classes with large static initializers were identified in the PersonalJava VM and changed to make the initializers unnecessary, thus reducing their size drastically.
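As a purely hypothetical example of the pattern, a class such as the one below cannot express its array contents directly in the class file, so the compiler emits class initializer bytecode that allocates the array and stores each element individually, making the initializer code far larger than the data it produces.

```java
// Hypothetical example (made-up data): the compiler turns this declaration
// into <clinit> bytecode that allocates the array and stores every element
// one by one, so the initializer dominates the class file size.
class CharProps {
    static final int[] TYPE_TABLE = new int[] {
        0x1F, 0x2A, 0x03, 0x11, /* ...hundreds more entries... */ 0x00
    };
}
```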
The VM and JavaCodeCompact have been changed to ignore all debugging information in classes, such as the source file name attribute, line number tables, and local variable tables. This information is only processed when a debugging option is passed to the VM and JavaCodeCompact at build time, and ignored otherwise in order to save memory.
In examining the class memory footprint requirements, the discovery was made that class metadata supporting loaded classes took up excessive space. For example, in many cases, a methodblock describing a class method takes up much more space than the bytecodes for the method. To rectify this, aggressive changes were made to the VM data structures representing class components. The changes were done in concert with changes in JavaCodeCompact, allowing for the reduction of both the ROM footprint of preloaded classes and the RAM footprint of dynamically loaded classes.
The first comparison is of PersonalJava 1.1 with all optional packages included (rmi, sql, math, zip, and code signing), with a similarly configured JDK 1.1.6, adjusted to exclude components unsupported by PersonalJava (e.g. java.security.acl).
| | PersonalJava 1.1 | JDK 1.1.6 |
|---|---|---|
| No. of classes | 913 | 911 |
| Total size of class files | 1.997M | 2.022M |
| Total class memory footprint | 1.593M | 2.222M |
| % of RAM in total footprint | 0.2% | 6.5% |
JDK 1.1.6 comes out to be around 39.5% larger. It also has about 32.5 times the RAM requirement of PersonalJava 1.1 when measured relative to total class memory footprint.
The second comparison is of PersonalJava 1.1 with no optional packages included, with PersonalJava 1.0.
| | PersonalJava 1.0 | PersonalJava 1.1 |
|---|---|---|
| No. of classes | 641 | 689 |
| Total size of class files | 1.284M | 1.506M |
| Total class memory footprint | 1.402M | 1.227M |
| % of RAM in total footprint | 7.6% | 0.3% |
PersonalJava 1.0 comes out to be 14% bigger, even though it has only 83% of the .class data of PersonalJava 1.1. Normalized, this translates to PersonalJava 1.0 class footprint being 38% bigger than PersonalJava 1.1. Similarly, the RAM requirement for the classes has gone down as much as 25-fold when measured relative to total class memory footprint. The normalized value is around 30-fold.
First, all self-recursions and call cycles in the native code which could result in uncontrolled stack overflow were identified. Most of the cycles could be eliminated by iterative rewrites. There were, however, some call cycles that could not be removed. An example is native-code-to-Java transitions and back: there can be several such transitions, with no real bound on their number. For each remaining cycle, a routine was chosen that performs a dynamic stack check at its beginning, testing the remaining stack space before executing the rest of the routine. These test points are called safe points: if the remaining stack space at a safe point is under a certain limit (the stack red zone), a StackOverflowError is thrown there, since this is the last place where it can still be thrown safely. If execution were allowed to continue instead, the result would be an uncontrolled stack overflow with possible memory corruption.
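The sketch below is only a conceptual illustration of such a safe point; in the VM the check is performed in native code by comparing the stack pointer against a per-thread limit, and both the names and the red zone value here are invented.

```java
// Conceptual illustration of a safe-point stack check (all names and the
// red zone size are hypothetical; the real check lives in native VM code).
final class SafePoint {
    private static final long STACK_RED_ZONE = 3_300;   // bytes, illustrative

    /** Called at each safe point with an estimate of the remaining stack. */
    static void check(long remainingStackBytes) {
        if (remainingStackBytes < STACK_RED_ZONE) {
            // There is still enough stack left to construct and throw safely.
            throw new StackOverflowError();
        }
    }
}
```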
After having dealt with recursion, all call paths requiring excessive stack usage were identified. The frames that "contributed" the most to these paths were chosen and analyzed on a case by case basis. It was possible to significantly reduce the stack usage of most of these by moving large local variable storage to global static storage for non-reentrant routines and by dynamically allocating storage for reentrant routines. In addition, the worst-case stack requirement was reduced on any path between two stack checks by introducing extra stack checks. This allowed the stack red zone to be made smaller.
These efforts resulted in well-defined, exact stack requirements. Static analysis of stack consumption performed on Solaris/SPARC revealed that the appropriate red zone in the VM implementation is 3.3k plus the maximum stack consumption of the underlying library functions. For native method implementations, including the AWT and socket features, another red zone called the native red zone was introduced. Unfortunately, since some Motif routines require large stacks, the native red zone had to be set to 80k for the reference implementation on Solaris/SPARC.
The VM's stack usage was checked by running actual applications such as Personal WebAccess (a Web browser). It was found that the worst stack usage at any stack check point in the VM code was less than 12k. At the check point for the native red zone, the maximum usage was less than 10k. Thus, a typical upper bound for the thread stack size of the SPARC implementation would be:
upperBound = max {12k + 3.3k + libfuncStackMax, 10k + nativeImplStackMax}
where libfuncStackMax is the maximum stack consumption of the library functions used by the VM like vfprintf, abort, free, etc., and nativeImplStackMax is the maximum stack consumption of the native method implementations including underlying libraries.
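For instance, under the purely illustrative assumption that libfuncStackMax is 4k and nativeImplStackMax is 64k, the bound works out to

upperBound = max {12k + 3.3k + 4k, 10k + 64k} = max {19.3k, 74k} = 74k

so with numbers of this magnitude the native red zone term dominates.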
Note that the SPARC architecture consumes more stack because of its register windows, and that a lower default stack size can be expected for CPUs intended for embedded use.