java performance
by Glen McCluskey
Glen McCluskey is a consultant with 15 years of experience and
has focused on programming languages since 1988. He specializes in Java
and C++ performance, testing, and technical documentation
areas.
Representation The first issue we look at is list representation. How is each list element stored, and how are element sequences stored? The Java language has arrays for primitive types such as int, double, and so on, along with arrays of object-reference types (somewhat like pointers in other languages). So an obvious representation of a list is to use one of these arrays; that is, an ArrayList class uses an underlying primitive array to represent list elements and supports methods for adding and deleting elements, expanding the internal array if the list grows, and so on. But there are a couple of problems with using primitive arrays. One is that the Java language has no templates (parameterized types), so a list class using an array would need to be duplicated for each possible element type. For example, you'd need ArrayList_int, ArrayList_double, ArrayList_string, and so on. Such an approach is unwieldy. Instead of this approach, the collection classes use wrappers. A wrapper is a class object that represents a value of some underlying primitive type. For example, I can say: Object obj = new Integer(37); to represent the value 37 using an Integer class wrapper; Integer is a standard Java class. The Object class is the superclass of all classes, and thus a reference of any class type can be assigned to an Object reference. So a collection class like ArrayList uses a primitive array of Object references. For example: Object objvec[] = new Object[100]; This scheme has the advantage of uniformity: all list elements are objects, instead of primitive types like int or float. Element uniformity achieved via wrappers also has the advantage that elements of disparate types can be stored in one list, such as a list consisting of a mixture of numeric and string types. There's a Java instanceof operator that you use to query the type of an object reference. The cost of this approach is twofold: wrappers take more space than primitive types, and there's some overhead in creating the wrapper and in later extracting wrapper values. Linked Lists Another problem with use of primitive arrays, even if wrappers are used everywhere, is simply that it's expensive to insert elements into the middle of an array, because existing elements must be pushed back. This is also true in other languages such as C and C++. So the collection classes include both an ArrayList class, which uses an underlying Object array to store elements, and a LinkedList class, which employs an internal linked structure to store elements. Each internal link contains an Object reference for the actual element, along with a reference to the next element in the list. As an illustration of the difference between these two representations, here are two programs that exercise ArrayList and LinkedList: import java.util.*;
public class testlist1 {
import java.util.*;
public class testlist2 {
The first takes about 50 times as long to run as the second. Both add 50,000 elements to the front of a list. If an internal array representation is used, each time an element is inserted all the existing elements must be pushed back, whereas if a linked representation is used, it's simply a matter of creating a link and adding it to the head of the list. The linked-list approach is much faster, at the expense of some extra space for the links. However, the linked approach is slower in another case, where you're randomly accessing list elements. A conventional array handles such lookups very quickly, but for a linked list you have to traverse the list to find the element. Interface Programming The two examples above illustrate an important feature of collection class programming, the idea of programming using interfaces. An interface is a description of specific functionality (such as a set of methods) that a class must implement, and a class that implements the interface will have definitions of all the interface's methods. In the present example, List is an interface that describes methods such as add() for adding elements to a list. Both ArrayList and LinkedList implement the List interface, so both are guaranteed to have an add() method. Given this, it's possible to program in terms of interfaces, that is, use methods described in the interface, and use interface types by saying: List list = new ArrayList(); If you do this, then it's possible to change the actual underlying collection class from ArrayList to LinkedList to obtain different performance characteristics. Growing a List Suppose that you are using an ArrayList object to represent a list of elements, and you keep adding elements. What happens to the internal array used to store the elements? At some point the array will become full, and its elements must then be copied to a larger array. How much larger? If a large growth factor is chosen, there will be less copying, but more space at the end of the array will be wasted. The version of ArrayList in the implementation from Sun uses a growth factor of 1.5, that is, when a 100-element list becomes full, a new list of size 150 is allocated, and existing elements are copied to it. It's possible to avoid list copying, for example by calling ensureCapacity() on a list to force it to expand so that there are slots allocated for all elements: import java.util.*;
public class cap {
This approach works against the dynamic nature of lists, where element slots are allocated only as needed, and thus will waste a lot of space unless you're sure you know in advance how many elements the list will ultimately contain. Making Collection Classes Thread-Safe If you've used Java libraries much, you may have noticed that the old Vector class uses synchronized methods, that is, methods that allow execution by only one program thread at a time. A newer collection class such as ArrayList does not use synchronized methods, and therefore methods such as add() called on a list are not thread-safe. If two threads call a method simultaneously, the list may be left in an inconsistent state as a result of interleaved execution. Synchronized methods have some expense associated with them, because the Java Virtual Machine must obtain a lock on the method's instance. Newer classes such as ArrayList push the burden of ensuring thread safety onto the user. One way you can add synchronization to ArrayList looks like this: List list = Collections.synchronizedList(new ArrayList()); That is, add a wrapper on top of the actual collection class. Such a wrapper works by forcing a list method such as add() to go through an extra layer, one that adds synchronized methods. Conclusion
We've spent some time discussing performance tradeoffs, as made in a
couple of standard Java collection classes. One final point to note is
this: you don't have to accept these tradeoffs you can always
design your own classes, with a different set of tradeoffs. One of the
virtues of Java interface programming is that you can design a class
that implements a collection interface (like List), and use it instead
of a standard class, without having to change the rest of your code
much.
|
|
Last changed: 13 Dec. 1999 mc |
|