Recent changes
Table of Contents

compareTo(…) Is Bad

Traditionally, generic operations on ordered objects are provided a comparison function to use. In Java, you define a class’ natural ordering by writing a virtual method that performs the comparison. I think that was a mistake made by people who tried to force OO-ness where it didn’t belong.

The Problem

Let’s take a 2-dimensional Point class as an example.

class Point implements Comparable
{
  int x, y;

  int compareTo(Object otherObject)
  {
    Point p = (Point) otherObject;
    // Do something with: x, y, p.x, p.y
  }
}

Type Safety

The comparison function isn’t type safe. The first line could bomb out with a ClassCastException at runtime. Though Java’s type system isn’t the greatest, comparing values is fundamental to programming. We shouldn’t have to hack up a workaround for something so simple.

What could help is if the parameter was a Point instead of an Object. Doing that would not allow it to be a virtual method, though, since it wouldn’t be compatible with Comparable’s method signature. You can make the return type more specific or a parameter type more general, but not vice versa.

But wait a minute…why is it a virtual method?

Overriding

The reason you’d want to make a method virtual is to let you override the behavior in a subtype. However, overriding compareTo(…) is fundamentally problematic.

To be useful, a comparison function must behave sanely. One of the requirements for sane behavior is that the order of arguments passed in shouldn’t matter. This means that “a.compareTo(b)” should return the opposite of “b.compareTo(a)” (symmetry).

If object “b” is a subtype of “a”, then overriding compareTo(…) in a subclass is almost guaranteed to break the symmetry requirement because you’re changing the behavior of one of the comparisons without changing the behavior of the other. Not sane.

Let’s say we defined a new subtype:

class Point3D extends Point
{
  int z;

  int compareTo(Object otherObject)
  {
    Point3D p = (Point3D) otherObject;
    // Do something with: x, y, z, p.x, p.y, p.z
  }
}

Now the comparison is not symmetric and will behave unpredictably:

Point Max(Point p1, Point p2)
{
  if (p1.compareTo(p2) > 0)
    return p1;
  else
    return p2;
}

If both the parameters are Point objects, everything will be fine. The same goes for when they are both Point3D objects.

If you pass in a Point3D for “p1” and a plain Point for “p2”, you’ll hit the ClassCastException in Point3D’s comparison function. In the inverse case, Point’s comparison function might think that the two objects are equal without even looking at the Point3D’s “z” component.

This is the kind of garbage that can be detected by the compiler if the API is designed to properly take advantage of the type system.

But Why Don’t I Run Into This Problem Often?

Because you almost always compare objects that are of the exact same type, which means you don’t need compareTo(…) to be virtual method.

The API makes it seem like compareTo(…) is this great method that lets you compare any object to any other object, but…

Put another way, the compareTo() technique destroys type safety to let you do something you don’t want to do anyway. Though there are ways to use compareTo(…) that avoid some of the issues (such as explicitly checking for subtype relations and deferring to the appropriate class), I can’t come up with one that does The Right Thing in every case.

The Solution

A proper solution is to pass around a separate comparison function. This is definitely not a new idea; it’s what everyone was doing before compareTo(…) came along and screwed everything up.

Since Java still doesn’t let you pass around functions, you’d have to use something like java.util.Comparator. (For simplicity, I’m going to pretend it works like a function, though.)

void QuickSort<T>(T[] array, Comparator<T> cmp)
{
  ...
}

class TreeMap<T>
{
  Comparator<T> cmp;
  ...

  TreeMap(Comparator<T> cmp)
  { 
    this.cmp = cmp;
    ...
  }
}

Default Comparison Function

The only problem with a separate function is that it’s kinda inconvenient to use. It’s an extra value to pass around when most classes only have a single commonly-used ordering function. With some modest language extensions, there might be a way around this. You could define some standard some location to look for the “default” comparison function.

void QuickSort<T>(T[] array, Comparator<T> cmp = T.cmp)
{
  // sort it
}

class Point
{
  static final Comparator<Point> cmp = new Comparator<Point>()
  {
    compare(Point p1, Point p2) { ... }
  }
}

The “cmp” parameter is given a default value to use if none is passed in. This lets you call QuickSort with only an array of Points and the compiler will automatically fill in the default value for “cmp”. But you still have the option of passing in your own comparison function.

In C++, the appropriate comparison function is matched at compile time and the compiler will let you know if one is not found. This works because C++ templates are also expanded at compile time.

Java and C# both use explicit constraint specifications on generic type parameters. Since QuickSort doesn’t place any constraints on what “T” must look like, there’s no way to guarantee that “T” will always have “T.cmp”. Using C#-style type constraints, QuickSort could possibly be written:

void QuickSort<T>(T[] array, Comparator<T> cmp = T.cmp)
  where T.cmp : Comparator<T>
{
  // sort it
}

But there’s still a problem. This method now requires that “T.cmp” exist, even if the caller passes in its own comparison function. If we remove the default parameter feature, we can still use the painful (but simple) technique of creating a pair of overloaded functions:

void QuickSort<T>(T[] array)
  where T.cmp : Comparator<T>
{
  QuickSort(array, T.cmp);
}

void QuickSort<T>(T[] array, Comparator<T> cmp)
{
  // sort it
}

Oh well. Not an ideal situation, but what are you going to do? If this idiom becomes more common, it might be a good idea to have the language automatically handle the details for you. If the default parameter value is part of the function’s interface specification, then I think it still doesn’t violate the idea of specifying generic constraints explicitly.

If you know another way to accommodate the concept of a “default comparision function”, let me know.

What About equals(…)?

One would think that Java’s technique for checking equality also suffers from similar problems, but it looks like the relative simplicity equals(…) lets it squeeze by. For one, the ClassCastException isn’t really an issue because the equals(…) function can just return false if the types aren’t compatible (an option that compareTo(…) doesn’t have).

To avoid the problem of asymmetry (“a.equals(b)” vs. “b.equals(a)”), we just have to match the type exactly instead of simply checking for compatibility using “instanceof”:

class Point
{
  int x, y;

  boolean equals(Object o)
  {
    // Check for an EXACT type match.
    if (this.getClass() != o.getClass()) return false;
    Point p = (Point) o;
    // Do something with: x, y, p.x, p.y
  }
}

All equals(…) functions must follow the rule of checking for an exact type match. Even though everything works out, it still feels like a bad idea. Any time you need to use runtime type information for something so simple, you’re probably doing it the wrong way.

Though I think a separate comparison function would be a better idea, the arguments against compareTo(…) don’t really apply here; the ulimate behavior of the equals(…) function is just fine. So far, all I can come up with is:

Of course, that’s not really a solid argument against it, so if you can come up with a better one, let me know.

Keywords

This is just to get hits from web searches:

Incidentally, when I was first looking for information on this topic, I searched for “compareTo considered harmful”. I woulda used that title for this page if it didn’t feel so blasphemous.

data/compareto_is_bad.txt Last modified: 01.10.2008 00:37
Driven by DokuWiki