Type Erasure Is Not Evil

Sun has been taking a lot of flak for implementing Java’s generics with “type erasure”. They deserve it, of course, but the concept of type erasure has taken a beating as well, and I don’t think it deserved that.

Effects Of Compile-time Type Erasure

Java has two compilation steps. In the first step, the Java compiler turns Java source code into JVM bytecode. During this phase, information about type parameters is available to the compiler. The second phase is where the JVM interprets or compiles the bytecode. This second phase doesn’t make use of type information for type parameters (though it seems like the information is stored in the “.class” file metadata area).
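As a rough check on the “.class” metadata remark, reflection can read the type arguments recorded on a declaration, even though the running object itself carries none. This is only a sketch: the class SignatureDemo and its field m are made up for the illustration.

```java
import java.lang.reflect.Field;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.Map;

class SignatureDemo {
    // Hypothetical field; it exists only so there is a declaration to inspect.
    static Map<String, String> m;

    // Returns the type arguments recorded for the declaration of field "m".
    static Type[] declaredTypeArguments() {
        try {
            Field f = SignatureDemo.class.getDeclaredField("m");
            ParameterizedType t = (ParameterizedType) f.getGenericType();
            return t.getActualTypeArguments();
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (Type t : declaredTypeArguments()) {
            System.out.println(t);
        }
    }
}
```

Note that this reads the declared type of the field, not the type of whatever object happens to be stored in it.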

Because the JVM isn’t aware of the type parameters associated with a type, it isn’t able to infer that certain operations are safe.

Map<String,String> m = new HashMap<String,String>();
m.put("A", "B");
String s = m.get("A");

The new Java language allows the above code, but it gets compiled to:

Map m = new HashMap();
m.put("A", "B");
String s = (String) m.get("A");

The VM needs the cast because it can’t tell (in general) that the given Map will only contain Strings. The result is an unnecessary performance hit and loss of type safety.
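One way to see the erasure from inside a running program is to compare the runtime classes of two differently parameterized lists. A minimal sketch (ErasureDemo is a name I made up):

```java
import java.util.ArrayList;
import java.util.List;

class ErasureDemo {
    // True when both lists report the same runtime class,
    // i.e. the type arguments left no trace on the objects.
    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<String>();
        List<Integer> ints = new ArrayList<Integer>();
        return strings.getClass() == ints.getClass();
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass());
    }
}
```

Both objects are plain ArrayList instances as far as the VM is concerned, which is why it cannot skip the cast on get(…).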

Traditionally, type erasure is when you don’t use type information at runtime. The Java guys are actually eliminating the information before compilation is complete. That is a bad idea.

Effects Of Runtime Type Erasure

The lack of runtime type information causes some things not to work as expected, but if you tilt your head and squint just the right amount, I think you’ll agree that type erasure is not to blame.

Generic Arrays

You cannot instantiate arrays of concrete parameterized types.

List<String>[] l = new List<String>[10];

The compiler will reject that because it could lead to a type error somewhere down the line.
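To sketch the kind of type error the compiler is guarding against, the example below gets hold of a List<String>[] through an unchecked cast of a raw array. That cast is my stand-in for the rejected expression, and GenericArrayDemo is a made-up name.

```java
import java.util.ArrayList;
import java.util.List;

class GenericArrayDemo {
    @SuppressWarnings("unchecked")
    static String pollute() {
        // Stand-in for the rejected "new List<String>[1]".
        List<String>[] strings = (List<String>[]) new List[1];
        Object[] objects = strings;        // allowed, because arrays are covariant

        List<Integer> ints = new ArrayList<Integer>();
        ints.add(42);
        objects[0] = ints;                 // the store check only sees "a List", so it passes

        try {
            String s = strings[0].get(0);  // compiler-inserted cast to String
            return "no error: " + s;
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    public static void main(String[] args) {
        System.out.println(pollute());
    }
}
```

The failure shows up on the read, far from the store that caused it, and in a line with no visible cast.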

The problem stems from the fact that every time you store a value into an array, there’s a runtime check involved. Really. While that check might be optimized away in certain situations, the language spec doesn’t identify those situations; as far as the semantics are concerned, arrays are not really type safe.

The main source of all the array-related issues is that Java arrays are covariant even though this is not typesafe. That’s why storing to an array requires runtime checks. If array variance were handled properly, there’d be no problem.
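The runtime store check is easy to trigger without any generics at all. A minimal sketch (ArrayStoreDemo is a made-up name):

```java
class ArrayStoreDemo {
    // Returns true if storing an Integer through an Object[] view
    // of a String[] is rejected at runtime.
    static boolean storeFails() {
        Object[] objects = new String[1];     // compiles: String[] is treated as an Object[]
        try {
            objects[0] = Integer.valueOf(1);  // checked by the VM on every store
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(storeFails());
    }
}
```

The compiler accepts every line here; only the VM's per-store check catches the mistake.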

Casts

Runtime type casting relies on runtime type information. When you don’t have that information, casts don’t work correctly.

1 void f(List<String> ls) {
2 	Object o = ls;
3 	List<Integer> l = (List<Integer>) o;
4 }

The compiler will issue a warning on line 3 saying that the cast is unchecked. It’s not fully unchecked, because the JVM will check to see if it’s actually a “List”, but it can’t check whether it’s a list of Integers or Strings because that information has been erased.

Since it’s only a warning, you can execute that code and this particular fragment will run just fine. The problem only surfaces when you try to access one of the list elements with, for example, get(…). The Java compiler automatically casts the return value of get(…) to Integer, so if the list you got actually contained String objects, you’ll end up with a ClassCastException.
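Here is a runnable sketch of that delayed failure. The class name UncheckedCastDemo and the sample string are assumptions made for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

class UncheckedCastDemo {
    @SuppressWarnings("unchecked")
    static String run() {
        List<String> ls = new ArrayList<String>();
        ls.add("not a number");

        Object o = ls;
        List<Integer> l = (List<Integer>) o;  // unchecked: the VM only verifies "is a List"
        int size = l.size();                  // fine, no element type involved

        try {
            Integer i = l.get(0);             // compiler-inserted cast to Integer
            return "no error: " + i + " of " + size;
        } catch (ClassCastException e) {
            return "cast succeeded, get failed";
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```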

The root problem here is that casts aren’t really type safe. Many functional programming languages use type-safe tagged unions to achieve similar functionality.

Can’t Call Constructor

class Pair<T> {
    T x1, x2;
    Pair() {}
    void Init() {
        x1 = new T();  // <-- error
        x2 = new T();  // <-- error
    }
}

This limitation isn’t solely due to type erasure. This is disallowed because there’s no guarantee that the class “T” has a constructor that takes zero parameters. The C# people decided to allow for this by adding a special case for the constructor. In Java, you can get around this by using a factory:

interface Factory<T> {
    T Make();
}
class Pair<T> {
    Factory<T> f;
    T x1, x2;
    Pair(Factory<T> f) { this.f = f; }
    void Init() {
        x1 = f.Make();
        x2 = f.Make();
    }
}

To allow “normal” use of the constructor, you need to have runtime type information. However, without constructor constraints (which C# has), runtime type information alone wouldn’t allow you to use this feature in a type-safe manner.
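To illustrate why runtime type information alone isn’t enough, here is a sketch that passes the type in by hand as a Class<T> token and constructs the value reflectively. ClassTokenDemo, make and canMake are names I made up; the point is only that nothing at compile time promises a zero-argument constructor.

```java
class ClassTokenDemo {
    // Builds a T through its zero-argument constructor, if it has one.
    static <T> T make(Class<T> type) throws ReflectiveOperationException {
        return type.getDeclaredConstructor().newInstance();
    }

    // Reports whether make(...) works for the given class.
    static boolean canMake(Class<?> type) {
        try {
            make(type);
            return true;
        } catch (ReflectiveOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(canMake(StringBuilder.class));
        System.out.println(canMake(Integer.class));
    }
}
```

Both calls compile, but Integer has no zero-argument constructor, so that one can only fail at runtime.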

But in the end, this isn’t even something that was possible before generics, so you’re not really losing anything. Besides, generics make the factory pattern less painful to use. (Support for tuples would further reduce the pain.)
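For what it’s worth, on later Java versions (8 and up) the factory can shrink to a constructor reference. The sketch below assumes java.util.function.Supplier stands in for a hand-written factory interface; SupplierPair is a made-up name.

```java
import java.util.function.Supplier;

class SupplierPair<T> {
    final Supplier<T> factory;
    T x1, x2;

    SupplierPair(Supplier<T> factory) {
        this.factory = factory;
    }

    // Fills both slots with fresh instances from the factory.
    void init() {
        x1 = factory.get();
        x2 = factory.get();
    }

    public static void main(String[] args) {
        SupplierPair<StringBuilder> p = new SupplierPair<>(StringBuilder::new);
        p.init();
        System.out.println(p.x1 != p.x2);
    }
}
```

The compiler checks that StringBuilder::new really matches a zero-argument constructor, so the safety comes from the factory's type rather than from runtime information.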

Conclusion

In a cleaner programming language, a type erasure-based generics implementation wouldn’t be that bad. It’s actually the many other unsafe features of the Java language that are to blame for the current mess. Their shortcomings were hidden when Java had a weaker type system, but generics have brought those problems to the surface (and, as a result, type erasure has become the scapegoat). Type erasure itself is not necessarily a bad idea.

The decision to use type erasure for Java, however, was a bad idea. Java programmers are used to unsafe casts and unsafe array variance, and Sun should have designed a system that would accommodate them.

data/type_erasure_is_not_evil.txt Last modified: 01.10.2008 00:37