Recent changes
Table of Contents

Generic Tuple Computation

I don’t really understand this stuff very well. My goal in writing this down was to get some feedback. Since you’re probably the only person who will ever visit this site, your feedback better be really good.

I want to perform computations on tuples generically.

zip in Haskell

There’s a function in the Haskell Prelude called zip. It takes a pair of lists and joins their elements together, returning a list of pairs:

-- The type of 'zip':
zip :: [a] -> [b] -> [(a,b)]

-- Sample invocation of 'zip'
> zip  [1, 2, 3, 4, 5]  ["one", "two", "three"]
[(1, "one"), (2, "two"), (3, "three")]

Note that zip doesn’t take a tuple of two arrays as it’s argument. It uses currying to process two arguments. For our purposes, that wont do. We need a regular tuple. So let’s make a “tuple zip” that looks slightly different:

-- Type signature:
tzip :: ([a], [b]) -> [(a,b)]

-- Implementation
tzip ((x:xs), (y:ys)) = (x, y) : tzip (xs, ys)
tzip _                = []

-- Sample invocation
> tzip ([1, 2, 3, 4, 5], ["one", "two", "three"])
[(1, "one"), (2, "two"), (3, "three")]

The problem with the above function is that it only works with two arguments. The Haskell Prelude had a curried 3-argument function called zip3, so it’s probably a useful thing to have variants of tzip to handle more lists. We could define tzip3, tzip4, tzip5 and so on until we get to the point where nobody would need an additional version, but that’s not a real solution. What we should do is handle all the different sizes generically. To make things clearer, lets switch over to a “normal” language.

zip in a "normal" language

Haskell’s pattern maching syntax and native support for lists hide a lot of the messy work. Too much happens automagically. With a Java-like language, we can see exactly what is going on.

The plain old 2-argument zip:

<X,Y>
List<(X,Y)> zip(List<X> x, List<Y> y)
{
   Iterator<X> xi = x.iterator()
   Iterator<Y> yi = y.iterator()
   List<(X,Y)> result = new List<(X,Y)>()

   while (xi.hasNext() && yi.hasNext())
      result.add((xi.next(), yi.next())

   return result
}

What a pain. OK, on to the generic version. But first, we need to pretend we have some extra language features.

Language support

Type expansion

To write a generic zip function, we need to be able to take a tuple of type (int, bool) and derive the type (Iterator<int>, Iterator<bool>). Here’s the plan:

Here are some examples:

1*(int,bool)        :  (int, bool)
List<1>*(int,bool)  :  (List<int>, List<bool>)
(1,1)*(int,bool)    :  ((int,int), (bool,bool))

// Assume the following type definitions
X = (a, b, c)
Y = (p, q)

List<1*Y>        :  List<(p, q)>
List<1>*X        :  (List<a>, List<b>, List<c>)

(1,2)*X*Y        :  ((a,2),(b,2),(c,2)) * Y
                 :  (((a,p),(b,p),(c,p)), ((a,q),(b,q),(c,q)))

(2,1)*Y*X        :  ((2,p),(2,q)) * Y
                 :  (((a,p),(a,q)), ((b,p),(b,q)), ((c,p),(c,q)))

(1,2)*X.. * Y    :  ((a,2),(b,2),(c,2)).. * Y
                 :  (((a,p),(b,p),(c,p)).., ((a,q),(b,q),(c,q))..)
                 :  ((a,p),(b,p),(c,p),(a,q),(b,q),(c,q))

This looks a lot like function invocation, but I’m pretty sure it isn’t powerful enough to spin out into halting problem land. Then again, I probably would have said that about C++ templates when I first learned about them (but I think it’s the fact that C++ allows templates to perform integer computations that makes them Turing complete).

Tuple value computation

We need a way to perform a computation on all the elements of a tuple value. The most obvious way of doing this is to copy the two fundamental functions of list computations: “map” and “reduce”.

“map” invokes the same function on multiple elements in parallel. This is essentially the same functionality provided by the type expansion syntax and so we could always use a variation on that. However, that syntax is really tedious and so we could use a simpler syntax that will cover most of our needs: the “@” suffix. If a tuple expression is suffixed by a “@”, then the surrounding expression will be applied to all of the tuple elements individually and the results will be combined back into a tuple.

a = (12, "Goose", false)
a@.GetClassName()      : ("int", "String", "bool")
a@ instanceof String   : (false, true, false)

b = (1, 2, 5)
b@ + 10    :  (12, 13, 15)
b@ == 2    :  (false, true, false)

Similarly, we’ll define a limited subset of “reduce”: when the “&&” and “||” operators are prefixed to a tuple expression, they perform an AND or an OR on all the values in the tuple.

&& (true, false)     :  false
|| (true, false)     :  true

&& ((1,2,3)@ == 2)   :  false
|| ((1,2,3)@ == 2)   :  true

Now that we have the necessary tools, we can define a generic zip function.

Generic zip

<X>
List<X> zip(List<1>*X x)
{
   Iterator<1>*X xi = x@.iterator()
   List<X> result = new List<X>()

   while (&& xi@.hasNext())
      result.add(xi@.next())

   return result
}

You may have noticed that we could have problems with empty tuples. Since “&&” on an empty tuple will always return “true”, the “while” loop will go on forever. Luckily, empty tuples don’t really exist. See the tuples page for an attempt at an explanation.

unzip

unzip undoes the work that zip does. Here’s the two-argument version

<X,Y>
(List<X>,List<Y>) unzip(List<(X,Y)> l)
{
   Iterator<(X,Y)> i = l.iterator();
   List<X> resultX = new List<X>();
   List<Y> resultY = new List<Y>();

   while (i.hasNext()) {
      X valX;
      Y valY;
      (valX, valY) = i.next();
      resultX.add(valX);
      resultY.add(valY);     
   }
 
   return (resultX, resultY);
}

Generic unzip

Um…I don’t know how to do this. I don’t think the syntax we have so far is enough. We need to be able to do the following:

X  :  (int, String, bool)
Y  :  (List<int>, List<String>, List<bool>)

????? : ((int, List<int>), (String, List<String>), (bool, List<bool>))

Any ideas?

<X>
List<1>*X unzip(List<X> x)
{
   Iterator<X> xs = x.iterator()
   List<1>*X result = new List<X@>()

   while (xs.hasNext())
      // do some voodoo here

   return result
}

Miscellaneous details

When the "@" isn’t powerful enough

b = (1, 2, 3)
c = (10, 20, 30)

b@ + c@      :  ((12, 13, 13), (22, 23, 23), (32, 33, 33))
c@ + b@      :  ((12, 22, 32), (13, 23, 33), (13, 23, 33))

b@ - c@      :  ((-9,-19,-29), (-8,-18,-28),  (-7,-17,-27))
???????      :  (( 9, 19, 29), ( 8, 18, 28),  ( 7, 17, 27))
c@ - b@      :  (( 9,  8,  7), (19, 18, 17),  (29, 28, 27))

The “???????” means that I don’t know an easy way to generate the result on the right. That’s because we have to control the order of expansion of the two tuples, but we can’t. With addition, we can just reverse the arguments because addition commutes. Can’t do the same thing with subtraction.

The long way of getting around this is to create a function that will reverse the arguments:

int RevSub(int x, int y) { return (x - y) }
RevSub(@c, @b)  :  (( 9, 19, 29), ( 8, 18, 28),  ( 7, 17, 27))

Do we need special syntax to handle this more concisely?

Type constraints

This is the syntax that lets you specify what the tuple elements should look like. Though Haskell-style (or even C++-style) implicit constraints may come in handy, the predictability of explicit constraints is sometimes nice to have.

Here are example constraints for a remote function call proxy:

class RemoteProxy<(Param)>
   where Param@ implements Serializable
{
   void Invoke(Param.. p)
}

This constraint can be used to ensure that all the parameter types in the Param tuple implement the Serializable interface (so that all the parameters can be shipped to a remote target).

Interestingly, in this special case we might be able to avoid the

class RemoveProxy<Params>
   where Params implements Serializable
{
   void Invoke(Params p..)
}

The second example only checks that the entire tuple implements Serializable. Since the runtime system creates tuples on the fly, it could treat Serializable in a special way by automatically making a tuple Serializable if all its constituents are serializable.

But we can’t bake everything into the language. We need allow interface writers to provide implementations for tuples:

interface Hashable
{
   int hash();
}

tuple (T..) implements Hashable
  where T@ implements Hashable
{
   int hash()
   {
      // TODO: try to figure out how to do this without degenerating to
      // list-style programming.
   }
}

Implementation

Often, when a new language feature is proposed, the proposer provides a reference implementation. That’s hard and so I’m not going to do that. If you know a language that has similar features, let me know and I’ll link to it here.

data/generic_tuple_computation.txt Last modified: 01.10.2008 00:37
Driven by DokuWiki