Wednesday, March 17, 2010

Polyvariadic PrimeFib problem

begin Numbers.lhs

Some time ago, I claimed in a reddit comment, that it's possible to write a function that accepts only prime number of arguments, that are all strings, but occurrences of integers at the indices that are Fibonacci numbers. Sounds cool and a little bit retarded, doesn't it? kalven asked about it, so now I have to prove, that I wasn't bluffing.

But why would I even say such a thing? Well, it started with this post about polyvariadic functions in Haskell. The author claimed, that most modern languages have support for functions taking variable number of arguments. Let's take a look at Java: this "support" consists of hidden array creation and passing that array as a single argument. Other languages work in a similar way, with the exception of C, where everything is allocated on the stack, and it works thanks to grit, spit and a whole lotta duct tape. This approach isn't as glamorous as it sounds, and there are disadvantages. All arguments have to be of the same type, even in dynamically typed languages - they only have one universal type. To use arguments that have different types, they have to be upcasted at the call site (e.g. to Object in Java) and later downcasted in such vararg function. It is unsafe and forces all arguments to have the same size or use autoboxing. In Haskell, thanks to better syntax and boiler-plat free list creation, it's easy to pass many arguments to a function, just pass a list. Compare:

Apologies for all those underlined words, it seems that my blog software doesn't like certain bad words.

class Test {

public static void main(String[] args) {
int x = foo(1, 2, 3, 4);
System.out.println(x);
}

static int foo(Object... args) {
return args.length;
}
}

With this:

main = do {
x <- foo[1, 2, 3, 4];
print x;
}

foo args = do {
return (length args);
}

The kind of parens is the only difference in syntax or semantics (of vararg function call). Of course, being of the same type means something different in Java (sub-typing), than in Haskell, but that's not the point of this post. The point is, that Haskell is the only usable language*, that truly allows polyvariadic functions, because it's possible to write functions that take variable number of variably typed arguments, which is not possible to fake with a hidden list wrapping. As usual, now is the time to write as usual, Oleg already did it. Some polyvariadic functions are easy to implement (e.g. one that takes even number of arguments, with alternating integer and string arguments, do try to implement it), some could actually be useful.

* usable language is one, that makes it possible to download pics from the internet.

Others are just a useless, contrived example, but can prove that anything is possible in Haskell. The problem with the function of prime number of arguments, where Fibonacci indices are integers, while the rest are strings has been solved by cdsmith. Nevertheless, I want to present another two solutions - one is easier to implement, but full of type hacks, and the other needed some thinking specific to this problem, but works much better - no boiler-plate required. Solutions are independent, so don't get discouraged by the first one, you can still be able to enjoy the second.

This post, like the previous one, needs ghc-6.12.2, 6.10 or 6.12.1 with hacked HList library. It's splitted across 4 files for modular reasons, the code is available at the blog repo.

Both solutions rely on prime and Fibonacci numbers, so we need type predicates for deciding if the number is prime/fib. I'm lazy and I didn't feel like translating regular algorithms to the type level, so I've assumed, that there are no prime/fib numbers bigger then 20, that way, it's possible to enumerate the positives, and choose all the other numbers as negatives.

Both predicates are similar, type-level function, that results in HTrue/HFalse, method has default bottom definition, because it will be only used at the type-level. Implementation uses classic trick, but as usual, I prefer equality constraints to TypeCasts wherever possible.

> {-# LANGUAGE MultiParamTypeClasses
> , FunctionalDependencies
> , FlexibleInstances
> , TypeFamilies
> , UndecidableInstances
> , TypeSynonymInstances
> , OverlappingInstances
> #-}

> module Numbers where
> import Data.HList
> type S n = HSucc n
> type Z = HZero
> class IsPrime n result | n -> result where
> isPrime :: n -> result
> isPrime = undefined
> instance IsPrime(S(S Z)) HTrue
> instance IsPrime(S(S(S Z))) HTrue
> instance IsPrime(S(S(S(S(S Z))))) HTrue
> instance IsPrime(S(S(S(S(S(S(S Z))))))) HTrue
> instance IsPrime(S(S(S(S(S(S(S(S(S(S(S Z))))))))))) HTrue
> instance IsPrime(S(S(S(S(S(S(S(S(S(S(S(S(S Z))))))))))))) HTrue
> instance IsPrime(S(S(S(S(S(S(S(S(S(S(S(S(S(S(S(S(S Z))))))))))))))))) HTrue
> instance IsPrime(S(S(S(S(S(S(S(S(S(S(S(S(S(S(S(S(S(S(S Z))))))))))))))))))) HTrue
> instance result ~ HFalse => IsPrime a result
> class IsFib n result | n -> result where
> isFib :: n -> result
> isFib = undefined
> instance IsFib(Z) HTrue
> instance IsFib(S(Z)) HTrue
> instance IsFib(S(S Z)) HTrue
> instance IsFib(S(S(S Z))) HTrue
> instance IsFib(S(S(S(S(S Z))))) HTrue
> instance IsFib(S(S(S(S(S(S(S(S Z)))))))) HTrue
> instance IsFib(S(S(S(S(S(S(S(S(S(S(S(S(S Z))))))))))))) HTrue
> instance result ~ HFalse => IsFib n result

begin HListExtras.lhs

Before we get to the first solution, we need two list functions, that are missing from HList library. hPartition and hAll are type-level equivalents of partition and all functions from Data.List module. The implementation follows closely the style of the other, higher-order functions from HList, so there aren't any explanations, you can always read the excellent HList paper. If you want more explanations, please ask in comments/email/irc, I could explain it some more if there's a need.

> {-# LANGUAGE MultiParamTypeClasses
> , FunctionalDependencies
> , FlexibleInstances
> , UndecidableInstances
> #-}

> module HListExtras where
> import Data.HList
> class HPartition p l r1 r2 | p l -> r1 r2 where
> hPartition :: p -> l -> (r1,r2)
> class HPartition' flag p l r1 r2 | flag p l -> r1 r2 where
> hPartition' :: flag -> p -> l -> (r1,r2)
> instance HPartition p HNil HNil HNil where
> hPartition _ _ = (HNil, HNil)
> instance ( Apply p x flag
> , HPartition' flag p (HCons x xs) r1 r2
> ) => HPartition p (HCons x xs) r1 r2 where
> hPartition p l@(HCons x _) = hPartition' (apply p x) p l
> instance HPartition p xs r1 r2 => HPartition' HTrue p (HCons x xs) (HCons x r1) r2 where
> hPartition' _ p (HCons x xs) = let (r1, r2) = hPartition p xs
> in (HCons x r1, r2)
> instance HPartition p xs r1 r2 => HPartition' HFalse p (HCons x xs) r1 (HCons x r2) where
> hPartition' _ p (HCons x xs) = let (r1, r2) = hPartition p xs
> in (r1, HCons x r2)
> class HAll p l r | p l -> r where
> hAll :: p -> l -> r
> instance HAll p HNil HTrue where
> hAll _ HNil = hTrue
> instance (Apply p x b1, HAll p xs b2, HAnd b1 b2 b) => HAll p (HCons x xs) b where
> hAll p (HCons x xs) = apply p x `hAnd` hAll p xs

begin Solution1.lhs

Before we can write a function that meets the specification, we have to decide what this function should actually do:) It's funny, if a type-blind person would read the problem, the only conclusion would be, that it's possible to write a function in Haskell - truly amazing. The easiest thing to do (beside ignoring arguments and returning unit value), is to return a pair of lists, one will collect integers, and the other strings.

So what is the first solution about? It's based on a simple idea, we start with a function that accepts any number of integer and string arguments, in any configuration, and then we write a constrained wrapper.

> {-# LANGUAGE MultiParamTypeClasses
> , FunctionalDependencies
> , FlexibleInstances
> , UndecidableInstances
> , TypeFamilies
> , TypeSynonymInstances
> , NoMonomorphismRestriction
> , OverlappingInstances
> #-}

> module Solution1 where
> import Data.HList hiding (TypeEq, typeEq)
> import Data.HList.TypeEqGeneric2
> import HListExtras
> import Numbers
> type X = ([Int], [String])

Function foo is defined inductively, very similar to c function from introduction of Oleg's polyvariadic page, nothing fancy here:

> class Foo result where
> foo :: X -> result
> instance Foo X where
> foo (ints,strings) = (reverse ints, reverse strings)
> instance Foo r => Foo (Int -> r) where
> foo (ints,strings) = \k -> foo (k:ints,strings)
> instance Foo r => Foo (String -> r) where
> foo (ints,strings) = \s -> foo (ints,s:strings)
> testFoo = foo ([],[]) (1::Int) "2" "3" (4::Int)

We have to specify Int types of every argument, because otherwise it's just a type variable. Strings are already monomorphic. When we decide that foo has enough, we force it to be of final X type.

*Solution1> :t testFoo
testFoo :: (Foo t) => t
*Solution1> testFoo :: X
([1,4],["2","3"])

Now, if we could get a hold of the final type of particular usage of foo function, extract from that the list of argument types, it would be very easy to verify correct usage.

Turning type of function into a heterogeneous list of types (values are bottoms) of arguments is performed by the following, type-level function. It's very similar to ResultType from my previous post, but it returns every type but the last one, which is the type of the result. It relies on the same trick to overcome problems with functional dependencies and overlapping instances.

> class FunctionArguments f r | f -> r where
> functionArguments :: f -> r

Base instance, if the type is not a function, return empty list, we don't care about result type:

> instance result ~ HNil => FunctionArguments x result where
> functionArguments _ = HNil

Recursive case - if argument is of arrow type, put the argument into list head and recurse down the rest of the type. Value of this type is of course bottom, it's not possible to come up with anything meaningful, but it's not needed fortunately, we only care about the types. Similarly, the rest of the type is a result of applying dummy bottom value to our input function.

> instance ( FunctionArguments rest others
> , result ~ (HCons x others)
> ) => FunctionArguments (x -> rest) result where
> functionArguments f = HCons undefined $ functionArguments $ f undefined

In Haskell it's easy to get a hold of value of the result type, and use it inside its definition, thanks to the following trick:

primeFib = let r = foo ([],[])
_ = ....
in r

In that dotted block, we can operate on the final value. The problem with let-expressions is, that they are polymorphic, and this together with other very polymorphic, type-level code can result in ambiguities, that can be resolved only with explicit type signatures and usage of ScopedTypeVariables. But, I don't want to write down that type, it's very complicated (over 10 lines!), because it records the whole algorithm. Fortunately, we can choose the case-expression, which has monomorphic bindings, and let ghc figure out the type:

> primeFib = case foo ([],[]) of
> r -> let _ = checkList $ functionArguments r
> in r

It's easy to see, that this block doesn't do anything, but type-checker still takes it under consideration, so it's possible for it to constrain the resulting type. The only thing left, is to define checkList function, that will do all those calculations.

There's one more type-level (defined in terms of classes) piece of code needed: equivalent of zip [1..] value-level function, that will pair every element of the list with its index, which will allow to check its primality and fibonality.

NumberList takes a counter, that represents the first free index number, list, and returns list of numbered pairs:

> class NumberList n l result | n l -> result where
> numberList' :: n -> l -> result

Base case is trivial, there's nothing to number, return empty list:

> instance NumberList n HNil HNil where
> numberList' _ _ = HNil

Inductive case, tag the head value of the input list with the current counter, and recurse down the list with incremented counter:

> instance (HNat n, NumberList (S n) xs ys) => NumberList n (HCons x xs) (HCons (n,x) ys) where
> numberList' n (HCons x xs) = HCons (n,x) $ numberList' (hSucc n) xs

Numbering starts with one:

> numberList = numberList' (hSucc hZero)

checkList function, like any code operating on a list in a functional language, will use higher-order functions, if you remember explanations from my previous post, this requires defunctionalization. There are 4 HOFs used, and their arguments have to be adjusted accordingly:

  1. map snd
> data Snd = Snd
> instance Apply Snd (a, b) b
  1. partition fibIndex partitions list of pairs, according to predicate that checks fibonality of the index.
> data FibIndex = FibIndex
> instance IsFib n r => Apply FibIndex (n,x) r
  1. (and 4.) there's all isInt and all isString, this could be implemented as two different functions, but it's possible to defunctionalize partial application of type-level equivalent of ==, TypeEq. This creates closures, that have to be reifed as data types in normal circumstances, but at the type level we use type parameter, and since we don't really care about value level in these predicates, we don't need any constructors, but I'm not a fan of EmptyDataDecls. This makes Is a phantom type.
> data Is x = Is
> instance TypeEq x y r => Apply (Is x) y r

I think, that checkList's definition doesn't need any explanations, we already have all the hard type-level parts defined, the rest is just functional programming:

> checkList l = intsOk `hAnd` stringsOk `hAnd` isPrime len :: HTrue
> where (fibIndices, nonFibIndices) = hPartition FibIndex $ numberList l
> removeIndices = hMap Snd
> isInt = Is :: Is Int
> isString = Is :: Is String
> intsOk = hAll isInt $ removeIndices fibIndices
> stringsOk = hAll isString $ removeIndices nonFibIndices
> len = hLength l
> testPrimeFib1 = primeFib (1::Int) (2::Int) (3::Int) "4" (5::Int) -- correct
> testPrimeFib2 = primeFib (1::Int) "2" -- string at fib index
> testPrimeFib3 = primeFib (1::Int) (2::Int) (3::Int) "4" -- not prime number of args
*Solution1> testPrimeFib1 :: X
([1,2,3,5],["4"])
*Solution1> testPrimeFib2 :: X
Top level:
Couldn't match expected type `HTrue' against inferred type `HFalse'
When using functional dependencies to combine
HAnd HFalse HTrue HFalse,
(...)
*Solution1> testPrimeFib3 :: X
Top level:
Couldn't match expected type `HTrue' against inferred type `HFalse'

*Solution1> :t testPrimeFib1
testPrimeFib1
:: (HMap Snd r2 ys2,
HMap Snd r11 ys1,
HAnd r r1 t'',
IsPrime (HSucc (HSucc (HSucc (HSucc (HSucc n))))) result,
HAnd t'' result HTrue,
HAnd HTrue b23 r1,
HAll (Is String) ys2 b23,
HAnd HTrue b2 r,
HAnd HTrue b21 b2,
HAnd HTrue b22 b21,
HAll (Is Int) ys1 b211,
HAnd HTrue b211 b22,
HPartition FibIndex ys r11 r2,
NumberList
(S (HSucc (HSucc (HSucc (HSucc (HSucc HZero)))))) others ys,
HLength others n,
FunctionArguments t others,
Foo t) =>
t

It works. Pros of this approach:

  • easy implementation:
    • many things available in HList
    • others (like FunctionArguments) are very general and are usable with other, non variadic problems
    • problem boils down to list manipulation
  • it scales to other contrived examples - only checkList has to be modified

Cons:

  • a lot of boiler-plate at call-site
  • huge types of intermediate expressions
  • unreadable type errors

begin Solution2.lhs

This time, we'll analyze this specific problem, and come up with much better and easier solution. What's the first problem with primeFib from my Solution1, and test from cdsmith's solution? Asking ghci about the type of both those functions results in x - simple type variable. Sure, there's also entire algorithm hidden in the type context, that could hide in the wardrobe and scare little children at night. But, we could tell more about that type, because we understand the problem. The function has to take only prime number of arguments, and the smallest prime number is 2. 1 and 2 are Fibonacci numbers, so clearly the type of bare primeFib function should be Int -> Int -> r, that much we can tell without choosing what to do later.

The second solution will be more aggressive, instead of checking the type at the end, when the only possible outcome is a type error, this approach will try to generate the correct type. It won't use any type-level tricks, in fact there will be three classes without functional dependencies - I almost forgot how to use those.

Here's how the algorithm will work. Let's assume that we are in the middle of computation and our functions has the following type:

primeFib :: a1 -> ... -> an -> r

and we want to generate the r part. The first thing to check, is to decide if n is a prime number. If so, we have two choices (let's call this situation overlap), we can either stop and return the accumulator (did I mention, that we are carrying an accumulator?), because we already have accepted prime number of arguments, or we can generate a bigger solution (this boils down to the other primality branch). If n isn't prime, it's clear that we must accept more arguments (this situation will be called generate), so r will be a function type an+1 -> r'. Now we have to decide the type of an+1: if n+1 is a Fibonacci number, an+1 will be Int, otherwise it is String. What about r' ? Well, we repeat the process with the successor of n.

> {-# LANGUAGE MultiParamTypeClasses
> , FunctionalDependencies
> , FlexibleInstances
> , FlexibleContexts
> , UndecidableInstances
> , TypeSynonymInstances
> , NoMonomorphismRestriction
> , OverlappingInstances
> #-}

> import Data.HList
> import Numbers
> type X = ([Int], [String])

Foo will be the entry point to the algorithm. n is our counter - how many arguments have been already eaten. foo method has to carry our accumulator. Notice that there are no functional dependencies - we want it to have many instances (as many as there are prime numbers, which is a finite number, when there are no primes bigger than 20 :)

> class Foo n result where
> foo :: n -> X -> result

Foo has only one instance, which means that it's a class synonym, and it's possible to write it as a function. I prefer a class here, because it will be used in two places, function approach would mean writing down twice this synonym, class approach can use type simplifier to do this rewriting.

This instance calls the next, auxiliary class, with an additional knowledge of the primality of n. It will be calculated as soon as n is known, because IsPrime is a type-level function.

> instance (IsPrime n flag, Foo' flag n result) => Foo n result where
> foo n acc = foo' (isPrime n) n acc

Foo' class can now dispatch on the primality of the counter (flag argument):

> class Foo' flag n result where
> foo' :: flag -> n -> X -> result

So, n isn't prime, we forward to the Generate class, but we also provide Generate class the result of IsFib on the successor of n, so it can dispatch on that fact.

> instance (HNat n, IsFib (HSucc n) flag, Generate flag n result) => Foo' HFalse n result where
> foo' _ n acc = generate (isFib $ hSucc n) n acc

Generate class has a functional dependency, if we know that n+1 is a Fibonacci number, we know which instance to choose - thus we know part or the resulting type.

> class Generate flag n result | flag -> result where
> generate :: flag -> n -> X -> result

n+1 isn't a Fibonacci number, generate a function from String and continue the recursive process by calling Foo with a bigger counter. At the value level, return a function that will insert eaten argument to the accumulator.

> instance (HNat n, Foo (S n) result) => Generate HFalse n (String -> result) where
> generate _ n (ints,strings) = \s -> foo (hSucc n) (ints, s:strings)

The other case, similar to the previous one, but for integers:

> instance (HNat n, Foo (S n) result) => Generate HTrue n (Int -> result) where
> generate _ n (ints,strings) = \k -> foo (hSucc n) (k:ints, strings)

Now the other branch of our first if, if counter was a prime number, go to the Overlap situation:

> instance Overlap n result => Foo' HTrue n result where
> foo' _ n acc = overlap n acc

This class has two overlapping instances - this is what stops the process, because it cannot decide which one to choose without further knowledge.

> class Overlap n result where
> overlap :: n -> X -> result

Final case - our result type has been forced to X, return reversed accumulators:

> instance Overlap n X where
> overlap _ (ints, strings) = (reverse ints, reverse strings)

Otherwise, go to the Overlap case. There's no need to decide if n+1 is a Fibonacci number, we can reuse Foo' HFalse instance, which does just that:

> instance (Foo' HFalse n result) => Overlap n result where
> overlap n acc = foo' hFalse n acc

Obviously, we start with empty accumulators and counter value of zero - we haven't accepted any arguments yet.

> primeFib = foo hZero ([],[])

This solution always generates the best approximation of the final type:

*Main> :t primeFib
primeFib :: (Overlap (HSucc (HSucc HZero)) result) => Int -> Int -> result
*Main> :t \a b c d -> primeFib a b c d
\a b c d -> primeFib a b c d
:: (Overlap (HSucc (HSucc (HSucc (HSucc (HSucc HZero))))) result) =>
Int -> Int -> Int -> String -> Int -> result

This results in less boiler-plate required at the call-site:

> testPrimeFib1 = primeFib 1 2 3 "4" 5
*Main> :t testPrimeFib1
testPrimeFib1 :: (Overlap (HSucc (HSucc (HSucc (HSucc (HSucc HZero))))) result) => result
*Main> testPrimeFib1 :: X
([1,2,3,5],["4"])

And better error messages:

*Main> primeFib "1"
:1:9:
Couldn't match expected type `Int' against inferred type `[Char]'
In the first argument of `primeFib', namely `"1"'

*Main> primeFib 1 2 3 "4" :: X
Top level:
Couldn't match expected type `([Int], [String])'
against inferred type `Int -> result'

The only disadvantage of this approach is the fact, that it required thinking specific to this problem, it could be much harder (impossible?) in different, contrived problems.

That is all, thanks for reading, comments are welcome.

Friday, March 5, 2010

Generalized zipWithN

BEGIN Part1.lhs

Some technical info about this post: it looks like literate Haskell file, but it isn't one. I'm not sure if it is possible to write this in a single Haskell file (I'll explain later why), it had to be split into two files. Since the extra part should be read in the middle, this post ended up being split into three Haskell modules, that can be type-checked and compiled, then concatenated into single file, that no longer type-checks, but it parses just fine, and that's enough to use BlogLiterately on such files. It cannot be copy-pasted into a Haskell file, but it's useful anyway - there's syntax highlighting for free, and you should be familiar with literate Haskell structure - only lines prefixed with "> " are Haskell code, other code (even highlighted one) isn't taken into consideration by the Haskell compiler. Contents of this post (in regular Haskell files) are available at my blog repository

The code depends on HList library. It works with ghc 6.10. It works with ghc 6.12.1, but HList 0.2 won't compile with 6.12.1 because of ghc bug, so you can either hack HList to work around this bug, or wait for 6.12.2. It won't work with ghc 6.8, it uses equality constraints from TypeFamilies extension, introduced in 6.10.

This post is about generalized function zipWithN. Why?

  • many people think it's impossible

  • there were a lot of attempts, but they all required boilerplate - extra type annotations, idiom brackets, some value or type-level counters in one way or another

  • I've seen only one version of such function, that didn't require any boilerplate from the user, but that version couldn't deal with some polymorphic functions, without staying boilerplate-free (I know how to fix it now)

  • but, all of these versions, simple HM ones and those full of type-hackery, are based on the same idea, which is a step in the right direction, but somehow no one noticed what this idea is really about.

I'll present a solution, that works for all polymorphic functions, without any boiler-plate, and has a very beautiful implementation, that's very easy to explain. What can be considered beautiful? Short code, that is built from more general pieces. What to do, when you have ugly, verbose code? Either you can spot a known structure and re-use its implementation, which makes code shorter, or you discover a new, previously unknown abstraction, then you can publish it, gain fame and fortune and perhaps even some up votes on reddit.

First some background. There's a 12 year old paper on this matter: "An n-ary zipWith in Haskell" by Daniel Frindler and Mia Indrika, you can read about it there, but don't try to understand the code - I [hope, that I] can explain it much better. So, there's a family of functions in functional languages, that allows to "zip" n lists of arguments with a a n-argument function, and get back a list of results. Just like in that paper, I'll refer to them as zipWithk, where k stands for the number of function arguments, or equivalently number of list arguments. In Haskell's Prelude we have zipWith3 - zipWith7, zipWith2 is called just zipWith, zipWith1 is called map, and zipWith0 is called repeat. Other Haskell libraries have smaller number of zipWith functions - vector has up to 6, stream-fusion has up to 7, but only 0-4 are fusible. OCaml only has zipWith1 and zipWith2 in the List module, they're called map and map2 respectively. It's not a good situation, user has to match zipping function arity to the name of zipWith function. There once was a similar problem with matching name of function to the type of its argument - people were tired of using printInt and printString. The solution was easy, just use (and invent first) type classes. In case of OCaml it was even easier - just switch to Haskell.

The paper presented the following solution (module header is obviously needed for other code, they didn't know back then how to abuse Haskell like that):

> {-# LANGUAGE MultiParamTypeClasses
> , FunctionalDependencies
> , FlexibleInstances
> , FlexibleContexts
> , UndecidableInstances
> , OverlappingInstances
> , TypeFamilies
> , NoMonomorphismRestriction
> #-}

> module Part1 where
> import Data.HList
> inzip :: [a] -> [a -> b] -> [b]
> inzip (a:as) (f:fs) = f a : inzip as fs
> inzip _ _ = []
> (~~~) :: (a -> b) -> (b -> c) -> a -> c
> (~~~) = flip (.)
> (~~) :: [b] -> ([b1] -> c) -> [b -> b1] -> c
> as ~~ rest = inzip as ~~~ rest
> infixr ~~
> zipWithFI f t = t (repeat f)

zipWith function from the paper got a different name (FI stands for names of the authors)

> test1 = zipWithFI (,,) ([1..] ~~ "hi" ~~ "world" ~~ id)
*Part1> :t test1
test1 :: (Num a, Enum a) => [(a, Char, Char)]
*Part1> test1
[(1,'h','w'),(2,'i','o')]

It was certainly usable, but there were two problems - a lot of boilerplate and it was extremely complicated - there were some continuations involved. I didn't understand it - sure, I could use it, but I wouldn't be able to explain it to someone else. Now I know why, and I'll tell you later, so forget about it for the moment.

There was also another solution mentioned, by Magnus Carlsson:

> zap = flip inzip
> test2 = repeat (,,) `zap` [1..] `zap` "hi" `zap` "world"
*Part1> :t test2
test2 :: (Num a, Enum a) => [(a, Char, Char)]
*Part1> test2
[(1,'h','w'),(2,'i','o')]

If you're wondering why I didn't just test if test1 == test2, it's because such expressions could help disambiguate some type variables in test2. That's not the case here, but it's a good habit when wri^H^H^Habusing Haskell code.

After noticing, that zap == Prelude.zipWith ($) (it's also <*> for ZipLists), this solution starts to look very good - it has pretty semantics: we start with infinite list of initial functions, and apply to them list of arguments, getting lists of partial applications, until we end up with the final result.

Have you ever read anything on sigfpe's blog? He always has pretty drawings of smart things (e.g. here). Let's give it a shot, here's a picture that shows left to right evaluation of test2:

evaluation of test

Oh well... yes, I am art-impaired. I did this with kolourpaint and this is the best I can do.

Now, that you've managed to stop laughing, let's get back to zipWiths. What's the problem with this solution? Turns out, it can be expressed as a very general recursion pattern, but instead, it does it manually. This is the idea behind Carlsson's version of zipWithN:

zipWithN f a1 a2 ... an == repeat f `zap` a1 `zap` a2 `zap` ... `zap` an

After adding parens around repeat f, and using prefix form of zap, which is left-associative, we get this:

zipWithN f a1 a2 ... an == zap (... (zap (zap (repeat f) a1) a2) ...) an

Any programmer, that accepted functional programming into his heart, should recognize that it's just a left fold over the list of arguments. Can it be rewritten like the following?

zipWithN f a1 a2 ... an = foldl zap (repeat f) [a1, a2, ..., an]

Well, not really, after all, these lists don't need to be of the same type, and you cannot put them into one list. But the solution will look just like that fold.

Now, when we know that zipWithN is just a left fold, let's go back to the original solution of Frindler and Indrika. After inlining definition of zipWithFI, we get this scheme:

zipWithN f a1 a2 ... an = (a1 ~~ a2 ~~ ... ~~ an ~~ id) (repeat f)

Understanding (~~) should suffice, to get a hold of this idea. Let's unfold all sub-expressions into (~~) definition. inzip will be replaced with flip zap:

as ~~ rest = inzip as ~~~ rest
   { prefix forms }
(~~) as rest = (~~~) (inzip as) rest
   { def (~~~) }
(~~) as rest = (flip (.)) (inzip as) rest
   { def flip }
(~~) as rest = (.) rest (inzip as)
   { eta expansion }
(~~) as rest a = (.) rest (inzip as) a
   { def (.) }
(~~) as rest a = rest (inzip as a)
   { inzip == flip zap }
(~~) as rest a = rest (flip zap as a)
   { def flip }
(~~) as rest a = rest (zap a as)

We use some alpha conversions, to not get confused about arguments and their names:

(~~) x y z = y (zap z x)

We can treat zap in the prev equation as a free variable. You were probably expecting, that now I'll say "but this is just ...". No. My only thought about it was "it's weird" (actually it was something else, but let's pretend that I'm more civilized). Then I realised, that I've already seen similar "weird" code, and it was also followed by id function. Where? The paper "Cheap deforestation for non-strict functional languages" by Andy Gill. There's a theorem/equation that states, that any left fold can be rewritten as a right fold:

foldl f z xs = foldr (\b g a -> g (f a b)) id xs z

After transforming the "folded" idea of Magnus Carlsson with this equality, we get the following:

zipWithN f a1 a2 ... an = foldl zap (repeat f) [a1, a2, ..., an] = foldr (~~) id [a1, a2, ..., an] (repeat f)

The authors were claiming, that Carlsson's idea was sidestepping some dependent-type problems, but as it turns out, their preferred version was identical, only extremely obfuscated and hard to understand.

I've promised you a beautiful solution to the zipWithN problem. According to the previous definition of beautiful implementations, it has to be short and made from bigger pieces. It'll take two lines of code (unfortunately, because of Haskell type system, one of those has to be transformed, with a simple transformation that could be automatic, to 4 lines of code, so in the end there are 5 lines of code) and it will use 5 or 6 other, very general functions, that have nothing to do with zipWiths and can be used to implement other things.

We'll start with a simpler problem - uncurried version of zipWithN. uncurriedZipWithN will take a zipping function and a tuple of list arguments. There's no point in using normal Haskell tuples, 2-tuple (,) and 3-tuple (,,) have as much in common, as Ints and Strings - there's no recursive structure. Better option is to use nested 2-tuples, that lean to the right (e.g. (a,(b,(c,d)))). While it's possible to use nested tuples directly, it's easier to use nested tuples, that have an explicit terminator at the end, which makes it isomorphic to lists, that can contain values of different types. There's a whole library for use with these heterogeneous lists - HList, created by the one and only - Oleg Kiselyov.

As any list library, HList contains a fold function, unfortunately only right one, we have to write HFoldl ourselves. This definition follows the style of other list functions from HList, so you can read more about it in the HList paper (HFoldr is explained in one of the appendices).

> class HFoldl f z l r | f z l -> r where
> hFoldl :: f -> z -> l -> r
> instance HFoldl f z HNil z where
> hFoldl _ z _ = z

There's one problem with hFoldl (compared to other functions like hTail) - it's a higher-order function and it's not possible (in general) to use such functions at the type level in Haskell. What do we do, when we want to write a higher-order function in first-order language (in this case - Haskell's type system)? We apply Reynolds defunctionalization. There is a open type function Apply, modelled as a class with a single method apply. hFoldl's argument takes 2 arguments, but it's easier to model it with a pair of arguments, that way single class Apply is sufficient for all functions. Here's the defunctionalized version of inductive case of HFoldl:

> instance ( HFoldl f y xs r
> , Apply f (z,x) y
> ) => HFoldl f z (HCons x xs) r where
> hFoldl f z (HCons x xs) = hFoldl f (apply f (z,x)) xs

Now we have to adjust our zap to this defunctionalized hFoldl. We need "avatar" for zap function. It doesn't need any values inside (those would probably be modelled as type variables), because there are no closures involved.

Naive solution would look like this:

> data ApplyZapNaive = ApplyZapNaive
> instance Apply ApplyZapNaive ([x->y], [x]) [y] where
> apply ApplyZapNaive = uncurry zap

This seems to be working:

> testNaiveApply1 = apply ApplyZapNaive ([not], [True, False])
*Part1> :t testNaiveApply1
testNaiveApply1 :: [Bool]
*Part1> testNaiveApply1
[False]

But, there are problems:

> testNaiveApply2 = apply ApplyZapNaive ([negate], [1])
*Part1> :t testNaiveApply2
testNaiveApply2 :: (Num a, Num t, Apply ApplyZapNaive ([a -> a], [t]) r) => r

([negate], [1]) has polymorphic type, over two different type variables.
Haskell type-checker tries to satisfy the Apply ApplyZapNaive ([a -> a], [t]) r constraint, but there is no instance that matches this pattern - the only instance that matches ApplyZapNaive, has the same types in both lists, which is too specific.

The usual solution is to use local functional dependencies, that allow to use more general patterns in the instance head (which is the only thing taken under consideration when choosing instances), and after the instance is chosen, force the required equalities, which "improves" the types, or if they're different, there's a type error. I'll use equality constraints for this purpose, they are very similar to TypeCasts, but they look better (almost like "="), and they don't require any functions at the value level.

> data ApplyZap = ApplyZap
> instance (a ~ [x->y], b ~ [x]) => Apply ApplyZap (a,b) [y] where
> apply _ = uncurry zap

This works as expected:

*Part1> :t apply ApplyZap ([negate], [1])
apply ApplyZap ([negate], [1]) :: (Num a) => [a]
*Part1> apply ApplyZap ([negate], [1])
[-1]

Now, we're ready to define the first of the two lines, that form the solution to the main problem (notice that there's no need for any type signatures):

> uncurriedZipWithN f = hFoldl ApplyZap (repeat f)
> test3 = uncurriedZipWithN (,,) ([1..] `HCons` ("hi" `HCons` ("world" `HCons` HNil)))
*Part1> :t test3
test3 :: (Enum a, Num a) => [(a, Char, Char)]
*Part1> test3
[(1,'h','w'),(2,'i','o')]

It works, but it looks similar (or even worse, thanks to the left-associative HCons) to the original, obfuscated version. But we're not done yet!

Now, let's develop some of those "bigger pieces" for creating pretty code. Let's start with a function that will count the number of arguments of another function. It's clear, that it has to be a class method. Don't mind the HNat constraints in contexts, I use Peano numbers from HList, and they use this constraint. It's not needed for anything, it's just a "comment", that helps writing code in a dynamically typed language - Haskell's type system.

> class HNat result => Arity x result | x -> result where
> arity :: x -> result

Base case for non-functions:

> instance (result ~ HZero, HNat result) => Arity x result where
> arity _ = hZero

Recursive case for function arguments. We create dummy, undefined value, force it to match the type of this functions arguments to get a dummy value of the resulting type, and use it (actually, only it's type) to call arity recursively, and finally increment it.

> instance (Arity y n, result ~ (HSucc n), HNat result) => Arity (x -> y) result where
> arity f = let x = undefined
> y = f x
> in hSucc $ arity y

If you're wondering why the type of the result is bound with equality constraint, instead of being used directly in the instance head, it's because it wouldn't compile. There would be an error, because functional dependencies don't play along with overlapping instances. The solution again relies on TypeCasts, and once again I choose equality constraints.

If you're familiar with Oleg's papers/code (e.g. the previous link), you know that Oleg doesn't like overlapping instances, and tries to limit their use to writing overlapping type predicates (like IsFunction), that are later combined with the two class trick, one of which is a wrapper class (and could be substituted with a function), and the other dispatches on the type of "flag" argument. I decided against this style, because overlapping instances extension is still needed. Moreover, the wrapper class, with a single instance, gets inlined by the type simplifier at every call place, which leads to longer (additional call to that type predicate) type signatures, that have an extra type variable (flag). When using 2 or 3 complicated functions, inferred types get filled with unneeded info pretty quickly.

> arityTest1 = arity map
> arityTest2 = arity (,,,,)
> arityTest3 = arity (+)
*Part1> :t arityTest1
arityTest1 :: HSucc (HSucc HZero)
*Part1> :t arityTest2
arityTest2 :: HSucc (HSucc (HSucc (HSucc (HSucc HZero))))
*Part1> :t arityTest3
arityTest3 :: (Arity a n, Num a) => HSucc (HSucc n)

It works correctly in the first two cases, but fails to count the arity of (+). The difference between functions like map and (+) is the type of the result. The recursive case is correct, it did what it could when checking (+) arity, we know that it is greater or equal to two. But, at the end, when the type-checker has to decide the type of the result, which is just a type variable, it doesn't know which instance to choose, because they both match. So the type-checker decides to be lazy and wait until it knows something more about this type variable. There is no problem with map or tuple constructor, because the result type is known to not unify with the arrow type, so the more general, base case can be chosen safely.

The next type function is ResultType, it returns the type of the result. It uses the same pattern of recursion as Arity, and the same issues apply. This must only be used at the type-level, because its value has to be bottom. You wouldn't want to live in a world, where you could implement this as a total function (unless you're a lawyer).

> class ResultType x result | x -> result where
> resultType :: x -> result
> instance result ~ x => ResultType x result where
> resultType x = x
> instance ResultType y result => ResultType (x -> y) result where
> resultType f = let x = undefined
> y = f x
> in resultType y

END Part1.lhs

BEGIN Part2.lhs

Previously on Lost, oh wait... anyway, those issues with Arity function have to be fixed. How? More abuse and more type system extensions. IncoherentInstances extension is an obscure extension of OverlappingInstances. Whenever there's a situation, that there are two instances, matching some unknown type variable, knocking on the door, instead of cowardly hiding in the kitchen, like OverlappingInstances, it flings the door open and says "IncoherentInstances here, what do you got?". Then, it eagerly chooses the more general one.

IncoherentArity is the exact copy of Arity, but it is defined in a module with IncoherentInstances extension enabled. In theory all the code could be defined in such a module, but that would break many things, and it would be very confusing.

Instead of equality constraints, it uses explicit TypeCasts, because the former don't work so good with IncoherentInstances extension. While it is possible, it requires many explicit type annotations, or else type simplifier simplifies things too early.

> {-# LANGUAGE MultiParamTypeClasses
> , FunctionalDependencies
> , FlexibleInstances
> , UndecidableInstances
> , IncoherentInstances
> , NoMonomorphismRestriction
> #-}

> module Part2 where
> import Data.HList
> import Data.HList.TypeCastGeneric2
> class HNat result => IncoherentArity x result | x -> result where
> incoherentArity :: x -> result
> instance (TypeCast HZero result, HNat result) => IncoherentArity x result where
> incoherentArity _ = typeCast hZero
> instance ( IncoherentArity y n
> , TypeCast (HSucc n) result
> , HNat result) => IncoherentArity (x -> y) result where
> incoherentArity f = let x = undefined
> y = f x
> in typeCast $ hSucc $ incoherentArity y

How does it work?

> arityTest1 = incoherentArity map
> arityTest2 = incoherentArity (+)
> arityTest3 = incoherentArity undefined
*Part2> :t arityTest1
arityTest1 :: HSucc (HSucc HZero)
*Part2> :t arityTest2
arityTest2 :: HSucc (HSucc HZero)
*Part2> :t arityTest3
arityTest3 :: HZero

There is a downside to this extension:

*Part2> :t incoherentArity
incoherentArity :: x -> HZero

Since incoherentArity == (\x -> incoherentArity x), and there are two matching instances for this case, the more general, base case was chosen. It can be very confusing. It is possible to write an expression in a Haskell file that type-checks, but asking about its type in ghci results in a type error.

There's another problem with using it in a sane way. Trying to write a simple alias for it:

> incoherentArityAlias = incoherentArity

Results in a statically chosen instance (but the dreaded MonomorphismRestriction is disabled), that really doesn't care about its argument type:

*Part2> :t incoherentArityAlias (+)
incoherentArityAlias (+) :: HZero

To define correct alias (or any other function that should use it in a normal way), we have to add explicit type signature, which delays the whole process:

> incoherentArityAlias2 :: IncoherentArity x result => x -> result
> incoherentArityAlias2 = incoherentArity
*Part2> :t incoherentArityAlias2 (+)
incoherentArityAlias2 (+) :: HSucc (HSucc HZero)

END Part2.lhs

BEGIN Part3.lhs

> {-# LANGUAGE MultiParamTypeClasses
> , FunctionalDependencies
> , FlexibleInstances
> , UndecidableInstances
> , FlexibleContexts
> , NoMonomorphismRestriction
> #-}

> module Part3 where
> import Data.HList
> import Part1
> import Part2

OK, back to zipWithN problem. We have uncurriedZipWithN available, so it would seem that we're one curry away from the final solution. But is it even possible to write such a magic curry function? The answer is: Yes! Haskell can do that.

We want to write a function curryN, that takes an uncurried function from a tuple/heterogeneous list, and returns a curried version. So how does curry really work?

curry :: ((a, b) -> c) -> a -> b -> c
curry f x y = f (x,y)

It may seem trivial, but most of that triviality comes from the fact, that it is fixed for functions working on a pair. Let's try to disassemble it - take a look at the following version:

curry :: ((a, b) -> c) -> a -> b -> c
curry f = \x y -> let z = (x,y)
in f z

It makes things much more clear. curry does 3 things:

  1. "eats" function arguments
  2. constructs a tuple from them
  3. calls the original function with this tuple as an argument

This can be generalized. First step is to write a function eat, that will fulfil steps 1 and 2. Eat is a type function (with mirroring value level method), that takes a type numeral, and returns a function, that will "eat" that many arguments and construct heterogeneous list from them. As many similar functions from the value level, it will be defined in a more general way, with an accumulator that will carry list of already eaten arguments. Again, just like in Arity case, ignore HNat constraints, compiler can tell you where they are expected.

> class HNat n => Eat n acc result | n acc -> result where
> eat' :: n -> acc -> result

Base case - the list is full and couldn't eat another bite. Return reversed accumulator.

> instance HReverse acc result => Eat HZero acc result where
> eat' _ acc = hReverse acc

Recursive case: how to eat n+1 arguments? Return a function, that eats the first one, and then eats n more, remembering the one just eaten by adding it the accumulator.

> instance Eat n (HCons x acc) result => Eat (HSucc n) acc (x -> result) where
> eat' n acc = \x -> eat' (hPred n) (HCons x acc)
> eat n = eat' n HNil
> eatTest = eat four
> where four = hSucc $ hSucc $ hSucc $ hSucc hZero
*Part3> :t eatTest
eatTest :: x -> x1 -> x2 -> x3 -> HCons x (HCons x1 (HCons x2 (HCons x3 HNil)))

All that's remaining is step 3. curry from Prelude can easily call the original function on the tuple, because it has a value of that tuple, whereas generalized curryN only has a function that will in the end return such tuple. Situation may look pointless, but fear not! All we have to do, is to compose such tuple-producing function with the original, tuple-consuming function. It has to be so called "furthest" composition, and as usual, Oleg already did it. I didn't use it, because that code looks really old (pre-OverlappingInstances era), and it is only defined for some basic, monomorphic types. I also didn't understand all those 5 class variables. I ended up writing version with only 3 class arguments, while it worked correctly, it wasn't too helpful for the type-checker, and couldn't deal when composing very polymorphic functions. Otherwise there were these uncommon problems, that ghc doesn't like the type it inferred (I'll explain later why it was a problem). The final version uses 4 arguments, and thanks to additional functional dependency it helps a lot when disambiguating types. Interesting thing is, that all versions (mine 3 and 4 parameter, and Oleg's 5) had different types, but method definitions were always the same.

MComp class has mcomp' method, that takes two functions of the following types:

f :: a1 -> ... -> cp
g :: cp -> d

where cp is not a function, and returns composed function of type:

f `mcomp` g :: a1 -> ... -> d

Yes, it takes arguments in different order than (.), more like (>>>). It's not a bad thing, after all, we are used to reading (code) from left to right.

> class MComp f cp d result | f cp d -> result where
> mcomp' :: f -> (cp -> d) -> result

Base case is even more base (sub-base?) than Oleg's - when first function isn't even a function, just apply it to the second function.

f :: cp
g :: cp -> d
f `mcomp`g :: d
> instance MComp cp cp result result where
> mcomp' f g = g f

Recursive case. Value level method has the obvious implementation, instance signature is the only one, that would type-check with such method.

> instance MComp rest cp d result => MComp (a -> rest) cp d (a -> result) where
> mcomp' f g = \a -> mcomp' (f a) g

I mentioned additional functional dependency earlier, it couldn't be added to the class, because of that problem that overlapping instances don't play along with functional dependencies. So it was added to extra, wrapper class with a single instance. And classes with a single instance can be rewritten as regular functions. Constraint ResultType f cp helps to disambiguate (part of) the type of the second function.

> mcomp :: (ResultType f cp, MComp f cp d result) => f -> (cp -> d) -> result
> mcomp = mcomp'

MComp has the same problems as the coherent version of Arity - it can't deal with functions (as the first argument) that have type variable as a result type. This works fine:

*Part3> :t (,,,) `mcomp` show
(,,,) `mcomp` show :: (Show a, Show b, Show c, Show d) => a -> b -> c -> d -> [Char]

because (,,,) has return type that doesn't unify with an arrow type (it's a tuple). (+) isn't so lucky:

*Part3> :t (+) `mcomp` show
(+) `mcomp` show :: (Num a, Show cp, ResultType a cp, MComp a cp [Char] result) => a -> a -> result

While it could be solved with IncoherentInstances, there's no need for this - we use if with eat function, that results in a tuple/heterogeneous list, so there's no ambiguity. Here's the definition of curryN' function, it still needs some type number:

> curryN' n f = eat n `mcomp` f

Here's the definition of curryN, that computes the number from the function itself, by generating dummy value of the function's argument type, and counting its length (it's a type level operation, so there are no problems with bottoms). That "case undefined of x ->" trick is needed to make x monomorphic, thanks to copumpkin from #haskell for this.

> curryN f = case undefined of
> x -> let _ = f x
> in curryN' (hLength x) f

curryN cannot be used for the final zipWithN definition, because it's based on computing length of tuple function argument, which is not possible for hFoldl - it works with all lists. But we can reuse curryN' and compute needed number another way. Here's the final solution:

> zipWithN f = curryN' (incoherentArity f) (uncurriedZipWithN f)

If you remember the problems with incoherent functions, we have to delay choosing instance by providing the type signature:

> zipWithN :: ( MComp r1 cp r2 r3
> , ResultType r1 cp
> , Eat r HNil r1
> , IncoherentArity a r
> , HFoldl ApplyZap [a] cp r2)
> => a -> r3

How did I come up with that? I didn't. I substituted incoherentArity for regular, coherent arity in zipWithN's definition, asked ghc to infer the type, copy-pasted it here, and replaced arity/Arity with incoherentArity/IncoherentArity.

The other "incoherent" problem remains unfortunately - asking ghci about zipWithN's type is confusing:

*Part3> :t zipWithN
zipWithN :: a -> [a]

But it stops being confusing when it's applied to the argument. Some tests demonstrating, that it can be used without any boiler-plate, even for functions polymorphic in their result type:

> ones     = zipWithN 1
> succList = zipWithN (+1)
> hiWorld = zipWithN (,,) [1..] "hi" "world"
> fibs = 0 : 1 : zipWithN (+) fibs (tail fibs)
*Part3> :t ones
ones :: (Num t) => [t]
*Part3> ones !! 10
1
*Part3> :t succList
succList :: (Num a) => [a] -> [a]
*Part3> succList [1..5]
[2,3,4,5,6]
*Part3> :t hiWorld
hiWorld :: (Enum a, Num a) => [(a, Char, Char)]
*Part3> hiWorld
[(1,'h','w'),(2,'i','o')]
*Part3> :t fibs
fibs :: (Num y) => [y]
*Part3> fibs !! 10
55

Well, that is all. What Frindler and Indrika thought to be impossible, implemented in two lines of code, that got defunctionalized to 5 lines. Everything based on a simple idea, boiler-plate free, and also there's some general purpose (if you like to abuse Haskell that is) utility functions.

Thanks for reading, comments are welcome.

END Part3.lhs