The power of types compels you

Sunday, November 6, 2011

Virtual Haskell Environment

I am pleased to announce the first public release of the virthualenv tool.

virthualenv is a tool (inspired by Python's virtualenv) to create isolated Haskell environments, similar in purpose to tools like cabal-dev. You can read more about it on hackage, or in README, available on github.

Why did I bother with writing it, when there's already cabal-dev available? Back in July, I decided to start writing in Haskell again, so I picked up one of my projects and tried to build it. But as usual, too many installed packages and it failed. I knew, I needed something to keep separate environments for every project, something like Python's virtualenv. There were two such tools for Haskell: capri and cabal-dev. capri was just a toy. cabal-dev was unusable on two (all of them) of my projects, for two different reasons (I have to mention, that cabal-dev >= 0.8 works fine). But even if cabal-dev would work back then, I hated the way it worked/works. Before I understood why it wouldn't work with my project (no cabal configure flag switches), I had to consult the readme file, seek help on #haskell. Python's virtualenv was simpler, I had to learn two commands and the environment was ready and activated, then I could work as I worked with Python code before using virtualenv. No new commands, no new switches, no manual reading — it just worked. I wanted something like this for Haskell. After reading capri sources I realized, that it was really easy (and cabal-dev was overcomplicated) and I could create a bash script to do it in a couple of hours. And I did. Then I rewrote it in Haskell, made it user-friendly and added new features. The only thing missing was a README file. Unfortunately (for the project, for me it was very fortunate), some things happened and I didn't have the time and/or motivation to finish the project. But I was tired of having so many dead projects and decided to finally release it.

Is it still needed when cabal-dev-0.8 is in a usable state? I think so. Here's why:

* It's simpler than cabal-dev so there's less code.
* It's simpler to use, regular user can create an environment with a single word command, activate it using another, provided command and then can work just like he's used to. "How do I install a package?" — "Use 'cabal install'.". "How do I check what packages are installed?" — "Use 'ghc-pkg list'.". "How do I ...?" — "Like you always did".
* It will be familiar to people coming from Python.
* Because of its nature (manipulating environment vars), you continue to use the same binaries (or wrappers). This means things like bash completion continue to work without any extra setup.
* There's a simple emacs mode included (virthualenv.el). It was trivial to write, even for such an elisp-noob like me. It was simple because of how simple setting two env vars is, there was no need to modify haskell-mode at all. Emacs integration is very important for me, maybe other emacs people are ok with using cabal-dev only from the terminal, but I prefer typechecking my code using simple keyboard shortcut from emacs, with the regular things like jumping to error locations and other goodies. ghci embedded in emacs also works just fine. I don't know what vimers do with Haskell code, but it's probably equally trivial to write a vim plugin for virthualenv (just fork a virtualenv plugin and you'll be 80% done).
* There's a nice feature, not available in cabal-dev — using external GHC from compiled tarball. It's the easiest way (or so I believe) to do things like testing your code with a different version of GHC (even nightly builds!).

Are there any disadvantages to using this over cabal-dev? Unfortunately yes. First, it's a new project, it hasn't been tested by anyone beside me, so there can be bugs (please report them on github). The bigger disadvantage is portability. I've tested it only on i386 Linux and FreeBSD (a little). It should work on every Linux, if it doesn't, I can probably fix it. Other unices like *BSD, Solaris and whatever else is there should also work, but I haven't tested it, due to no available machines with those OSes. It could work on MacOS X using system's GHC (already installed one) (at least I think so, since it's derived from BSD). MacOS X and GHC from tarball will probably not work, since it doesn't use regular tarballs, and I have no idea how those .dmg files work. As usual, Windows works completely different, so it would need some research and probably a lot of new code to make it work there.

What will I do with this project? Currently, it satisfies all my needs, so I can focus on other projects. But, if other people (it means you!) start to use it, I will try to fix all bugs, develop some new features (I have a few ideas on github, but you can tell me yours as well) and work on porting it to MacOS X and Windows (I'd need access to such machines though).

Monday, July 12, 2010

Two-Dimensional Analog Literals in Haskell

> module Intro where

Hello! If you were wondering what happened to me and why I've stopped posting, well, I was found guilty on type abuse charges and was sentenced to three months in maximum type security prison of Coq. Since I'm such a degenerate, I've loved every minute of it. And there were plenty of things to abuse on the inside. Now, I've been released, but I can't believe how fast things move on the outside. I saw an automobile once when I was a kid, but now they're everywhere. And SPJ thinks that OverlappingInstances are unsound. It's been a while, since I made a new post, so I decided, that because of recent events I could make a new post.

Over a year ago, this post was all the rage on reddit. The original implementation used C++ and redditors contributed code in such godless languages like Python and Ruby. There was no Haskell solution and we can't have that, can we? I'm going to present two solutions, to prove that it's possible in ML-like languages, and by using Haskell extensions, it's possible to have some advantages over the original implementation. I've limited myself only to rectangles, because lines are boring and cuboids are too hard to even "draw". Here's an example for the impatient:

rectangle = begin
              o ᜭ ᜭ ᜭ ᜭ o
              ǀ         ǀ
              ǀ         ǀ
              o ᜭ ᜭ ᜭ ᜭ o
            end

If it looks broken, it's because it uses two unusual unicode characters (like "LATIN LETTER DENTAL CLICK"), if your browser or font (or my mad web skills) can't handle it, here's a picture.

What's so hard about Haskell implementation of those literals? In languages with ML-like syntax, there can't be two operators next to each other and there are no (usable) postfix operators, so the original idea of overloading decrement and subtraction operators fails already on the syntax level. Unless we want to alternate glyphs between operators and identifiers, which seems a bit against the spirit of this hack, it's clear that the solution has to rely on a sequence of single-char combinators, delimited with spaces (otherwise it would be just one token). It helps if those combinators use letters, that look like operators (it's possible in GHC).

> module PostCombinator(post) where

Before we start, let's talk about this programming style, that I like to call applicative pointfree/pointless. Here's couple functions to calculate reversed list of odd squares:

> square x       = x*x
> odds           = filter odd
> squares        = map square
> oddSquares1 xs = reverse (odds (squares xs))
> oddSquares2 xs = reverse $ odds $ squares xs
> oddSquares3    = reverse . odds . squares

It's composed of three sub-programs in a good functional style. Category-theory guys would tell you, that the third one is pointfree, but you don't have to be a rocket-scientist to notice, that it's the only solution actually containing points. But don't tell that to CT people, they will answer "does not commute!" and their head will explode. The common thing about those functions is this: sub-expressions are delimited or explicitly composed (be it with '.', '$' or opening paren). It would be nice, to be able to write only function names, without any boiler-plate in between, and make it magically combine itself.

I've discovered a way to achieve this goal. It's really beautiful, it's based on very smart people results, there are continuations involved and it has interesting connections with concatenative languages like Factor - all the ingredients of a great paper. Unfortunately (for me), Chris Okasaki already took care of it - Techniques for Embedding Postfix Languages in Haskell. I encourage you to read it yourself, but here's a quick roundup of this technique.

We want a function odds', that will be able to use directly the immediately following sub-program, without any composition means. It must take two arguments: x - the usual input and k - the continuation (Okasaki calls it a partial continuation) that is, the next expression. The following order of those arguments is the easier one:

odds' x k = ...

On the right hand side, the obvious thing to do with x is to filter odd numbers from x:

odds' x k = ...
    where y = filter odd x

Now, we have an output of this sub-computation, the only sensible thing to do is to apply it to the next computation:

> odds' :: [Int] -> ([Int] -> t) -> t
> odds' x k = k y
>     where y = filter odd x

> squares', doubles' :: [Int] -> ([Int] -> t) -> t
> squares' x k = k y
>     where y = map square x

> doubles' x k = k $ map (*2) x

Now we can compose these expressions:

*PostCombinator> :t odds' [1..10]
odds' [1..10] :: ([Int] -> t) -> t
*PostCombinator> :t odds' [1..10] squares'
odds' [1..10] squares' :: ([Int] -> t) -> t
*PostCombinator> :t odds' [1..10] squares' doubles'
odds' [1..10] squares' doubles' :: ([Int] -> t) -> t

In order to not make special case of the first part, by applying to it the initial state, that gets transformed by the whole pipeline, we can add special combinator, that will apply that initial state to the first case:

> begin x k = k x

We finish the pipeline computation by adding the final continuation, e.g. id, but any other function would be fine too.

*PostCombinator> (begin [1..10]) odds' squares' doubles' id
[2,18,50,98,162]

I call this style applicative pointfree, because composition is based on application and there are no points, or any other delimiters.

We can abstract the pattern from odds', squares' and doubles' functions. Okasaki calls this combinator post:

> post :: (a -> b) -> a -> (b -> t) -> t
> post f = \x k -> k $ f x

Explicit usage of post, while possible, doesn't make much sense, because you still have to use parens, it's better suited for creating wrappers around simple functions.

*PostCombinator> begin [1..10] (post$ filter odd) (post$ map square) (post$ map (*2)) reverse
[162,98,50,18,2]

If you want to learn more about this style, you should read Techniques for Embedding Postfix Languages in Haskell, Flattening combinators: surviving without parentheses and the classic Functional Unparsing wouldn't hurt either, because it lays the foundations. But be careful, you don't want to start coding in Factor.

> module DynamicLiterals where

The first solution is very simple, it should work in any ML-like language (I've tested it in Haskell and OCaml), because it only requires Hindley-Milner type system. If the implementation doesn't support unicode identifiers, rectangles have to be drawn with other characters though. I call it dynamic solution, because though it does work for correct rectangles, it produces wrong results (at runtime) for wrong combinator sequences, that don't represent rectangles.

> import PostCombinator

There are three different parts of a rectangle and each gets a combinator. These combinators will transform the rectangle state, consisting of width, current width and height (shortened to w,cw and h respectively):

> type RectState = (Int, Int, Int)

> corner, dash, bar :: RectState -> RectState

> dash   (w, cw, h) = (w, cw + 1, h)
> bar    (w, cw, h) = (w, cw, h + 1)
> corner (w, cw, h) = (w `max` cw, 0, h)

dash and bar are obvious, corner takes maximum of width and current width to account for transition from sequence of bars (that didn't touch current width) to corner.

We wrap these functions with our post combinator:

> c, d, b :: RectState -> (RectState -> t) -> t
> c = post corner
> d = post dash
> b = post bar

All that's left are definitions of the first combinator, that starts the process, and the final continuation:

> beginRect :: (RectState -> t) -> t
> beginRect k = k (0, 0, 0)

> endRect :: RectState -> (Int, Int)
> endRect (w, cw, h) = (w, h `div` 2)

Now we can draw pretty rectangles:

> rect = beginRect
>          c d d d d c
>          b         b
>          b         b
>          b         b
>          c d d d d c
>        endRect

*DynamicLiterals> rect
(4,3)

Unfortunately, since there are no requirements on these combinators, they will be perfectly happy to transform any sequence of rectangle states:

> lambda = beginRect
>              d
>            b  d
>                b
>               d d
>              d   d b
>             d     c
>          endRect

*DynamicLiterals> lambda
(7,1)

It's worth mentioning that this solution (and the next one) doesn't scale - the types are huge, in the smallest proper rectangle (1x1), the first corner combinator c has a type, that takes over 250 lines when printed by ghci. This makes type checking extremely slow for bigger examples. Memory usage varies by implementation, ghc for 4x4 rectangle needs 500 mb of memory (I don't have a machine capable of checking 5x5), while OCaml still takes a long time to compute the type, but it doesn't use any significant amount of memory.

Now it's time for the second solution. This is Haskell after all, find a problem and make it statically impossible to violate. There will be also a few advantages over original C++ implementation.

While we can't forbid drawing rectangles like the lambda letter, because there's no access to the lexer at the type level (but can you imagine the possibilities?), we can forbid usage of incorrect sequences (e.g. dash following bar). This of course needs a bit of type level programming, but there's nothing really hacky and abusive.

> {-# LANGUAGE NoMonomorphismRestriction
>   , OverlappingInstances
>   , FlexibleInstances
>   , FlexibleContexts
>   , UndecidableInstances
>   , ScopedTypeVariables
>   , MultiParamTypeClasses
>   , FunctionalDependencies
>   , EmptyDataDecls
>   #-}

Base implementation only uses MPTCs+FunDeps (and UndecidableInstances, but I'm pretty sure this could be solved without them), so it's probably doable with all those new hip TypeFamilies. OverlappingInstances and UndecidableInstances are needed for nice error messages though, but don't worry, any instance choice, that requires UndecidableInstances will result in a type error - but a pretty one!

> module StaticLiterals where

> import Prelude hiding (Either(..))
> import Peano

Let's take a look at the type of the dash combinator from the previous solution:

d :: RectState -> (RectState -> t) -> t

One of the situations, we wish to forbid, is having bottom side of different length than the top side. This clearly requires tracking rectangle's width at the type level. Since we have to carry the rectangle state through types, there's no need to carry it at the value level, so the RectState argument isn't needed anymore. So, we're left with (t) -> t, and we have to attach type-level equivalent of RectState.

The first t is the continuation. The problem is, that it takes any continuation, so dash followed by bar type checks just fine. Since all three combinator have the same type, we have to do something to be able to distinguish between them. We're going to wrap every continuation type in a newtype wrapper with a phantom type tag, telling what kind of combinator this is.

> data Corner
> data Dash
> data Bar
> data End

Besides width, current width and height we have to carry a position in the rectangle, because e.g. dash can follow the first corner, but it cannot follow the second. Our type level state will be a 4-tuple of 3 Peano numbers and a position.

> data Top
> data Bottom
> data Left
> data Right
> data TopLeft
> data TopRight
> data BottomLeft
> data BottomRight

Our continuation wrapper:

> newtype TagCont tag rectState k = TagCont { unTagCont :: k }

Now it's time for some type level computations, this is the only class (unfortunately, /dev/meaningfull_names run out of entropy). This class contains the combinator, that will work like all three, because interesting things will happen only at the type level. This class is parameterized by the tag of the combinator, the tag of the following combinator, the type level rectangle state input, and those arguments determine the resulting state.

> class Rectangly tag nextTag inpState outState | tag nextTag inpState -> outState where

The r combinator is still of that t -> t type, but that function is tagged with the combinator tag (e.g. Dash), and the inner (left) t is tagged with the tag of the next combinator. State of one combinator determines (together with tags) the state of the following (next) one.

>     r :: TagCont tag inpState (TagCont nextTag outState t -> t)

At the value level, all combinators work the same, just tag untagging function, so if we disregard newtypes with phantom arguments, this is just identity function, where everything interesting is happening at the type level.

>     r = TagCont unTagCont

All the instances follow the same pattern: dispatching on the rectangle part, tag of the current combinator, tag of the next combinator, accepting any width, current width and height, and computing from that resulting values and next rectangle part.

This one says, that after Corner we can use Bar, but only in the TopRight part (it's certainly not true at any other corner). Current width (calculated by sequence of dashes) becomes our new width, we reset current width and set the rectangle part to Left.

> instance Rectangly Corner Bar (w, cw, h, TopRight) (cw, Z, h, Left)

The rest is very similar:

> instance Rectangly Corner Dash (w, cw, h, TopLeft) (w, cw, h, Top)
> instance Rectangly Corner Dash (w, cw, h, BottomLeft) (w, w, h, Bottom)
> instance Rectangly Corner End (w, cw, h, BottomRight) (w, cw, h, Bottom)
> instance Rectangly Dash Dash (w, cw, h, Top) (w, (S cw), h, Top)
> instance Rectangly Dash Corner(w, cw, h, Top) (w, (S cw), h, TopRight)
> instance Rectangly Dash Dash (w, (S cw), h, Bottom) (w, cw, h, Bottom)
> instance Rectangly Dash Corner(w, (S Z), h, Bottom) (w, cw, h, BottomRight)
> instance Rectangly Bar Bar (w, cw, h, Left) (w, cw, h, Right)
> instance Rectangly Bar Bar (w, cw, h, Right) (w, cw, (S h), Left)
> instance Rectangly Bar Corner(w, cw, h, Right) (w, cw, (S h), BottomLeft)

Of course, it's not possible to build rectangles with a single combinator (without explicit type sigs everywhere), because the tag of the combinator is polymorphic so there would be an overlap. Binding these three combinators to the r requires specializing the type to the specific tag:

> o :: Rectangly Corner nextTag inpState outState => TagCont Corner inpState (TagCont nextTag outState t -> t)
> o = r

> ᜭ :: Rectangly Dash nextTag inpState outState => TagCont Dash inpState (TagCont nextTag outState t -> t)
> ᜭ = r

> ǀ :: Rectangly Bar nextTag inpState outState => TagCont Bar inpState (TagCont nextTag outState t -> t)
> ǀ = r

All that's left is begin for starting the process and end - the final continuation.

> begin :: TagCont Corner (Z, Z, Z, TopLeft) t -> t
> begin = unTagCont

> data Rectangle w h = Rectangle Int Int deriving Show
> area (Rectangle w h) = w * h

end calculates the rectangle with phantom types set to its Peano dimensions, and corresponding integers at the value level.

> end :: forall w cw h s. (NatToInt w, NatToInt h) => TagCont End (w, cw, h, s) (Rectangle w h)
> end = TagCont $ Rectangle (natToInt (undefined :: w)) (natToInt (undefined :: h))

This is enough to define the following rectangle:

> rectangle = begin
>               o ᜭ ᜭ ᜭ ᜭ o
>               ǀ         ǀ
>               ǀ         ǀ
>               o ᜭ ᜭ ᜭ ᜭ o
>             end

The first advantage over the original implementation is the correctness, it's probably a bug, but C++ version doesn't detect the odd number of bars, and what's worse, it calculates them incorrectly in such a case:

  unsigned int r1 = ( o-----o
                      |     !
                      !  !  !
                      !     !
                      o-----o ).area;

  unsigned int r2 = ( o-----o
                      |     !
                      !     !
                      !     !
                      o-----o ).area;

  assert (r1 == r2);

Another advantage is what the author calls storing these literals directly in a variable, there's no need to explicitly provide a type, it is inferred automatically:

> r2 = begin
>        o ᜭ ᜭ ᜭ o
>        ǀ       ǀ
>        o ᜭ ᜭ ᜭ o
>      end

*StaticLiterals> :t r2
r2 :: Rectangle (S (S (S Z))) (S Z)

> foo = let r = begin
>                 o ᜭ ᜭ ᜭ o
>                 ǀ       ǀ
>                 o ᜭ ᜭ ᜭ o
>               end
>       in print r

*StaticLiterals> foo
Rectangle 3 1

What about errors? They're still impossible, types inferred have contexts that are impossible to fulfill, because there are no such instances. But we can do better! By using Oleg's trick with Fail class we can force ghc to return type errors of our choice.

There's this new paper Errors for the Common Man about debugging type errors, but I don't like it:

they aren't aware of any solutions to this problem, though they mention HList paper (Fail class trick comes from that paper)
they talk about generalized state monad, again without mentioning Monadish
doing research in the area of Haskell type hacks and not being familiar with Oleg's work is a sin
using state monad to simulate exceptions is bad.
their idea imposes different type level coding style
nice error messages require using different api
no location provided for the errors

Fail class trick solves all of these problems. We provide additional instances to Rectangly class, with a fake Fail dependency with a specially crafted argument. Since Fail doesn't have any instances, it results in a type error, but this error mentions our argument and correct location.

> class Fail x

We need some types for pretty error messages:

> data Expected x y z
> data Either x or z
> data Or
> data ButGot
> data BottomLineToo x
> data Short
> data Long

These instances complement the previous set, trying to make Rectangly a total function, so there will be no errors like "no instance for Rectangly ....", because in these situations, it will match one of these faily instances and reduce to missing Fail instance with a nice error message.

> instance Fail (Expected Dash ButGot tag) =>
>     Rectangly Corner tag (w, cw, h, TopLeft) (w', cw', h', s')
> instance Fail (Expected Bar ButGot tag) =>
>     Rectangly Corner tag (w, cw, h, TopRight) (w', cw', h', s')
> instance Fail (Expected Dash ButGot tag) =>
>     Rectangly Corner tag (w, cw, h, BottomLeft) (w', cw', h', s')
> instance Fail (Expected End ButGot tag) =>
>     Rectangly Corner tag (w, cw, h, BottomRight) (w', cw', h', s')
> instance Fail (Expected (Either Dash Or Corner) ButGot tag) =>
>     Rectangly Dash tag (w, cw, h, Top) (w', cw', h', s')
> instance Fail (BottomLineToo Short) =>
>     Rectangly Dash Corner(w, (S (S n)), h, Bottom) (w', cw', h', s')
> instance Fail (BottomLineToo Long) =>
>     Rectangly Dash Corner(w, Z, h, Bottom) (w', cw', h', s')
> instance Fail (Expected (Either Dash Or Corner) ButGot tag) =>
>     Rectangly Dash tag (w, cw, h, Bottom) (w', cw', h', s')
> instance Fail (Expected Bar ButGot tag) =>
>     Rectangly Bar tag (w, cw, h, Left) (w', cw', h', s')
> instance Fail (Expected (Either Bar Or Corner) ButGot tag) =>
>     Rectangly Bar tag (w, cw, h, Right) (w', cw', h', s')

> module StaticLiteralsErrors where

> import StaticLiterals

Let's compare quality of error messages between C++ and Haskell version:

  unsigned int r1 = ( o-----o
                      |     -
                      o-----o ).area;

paczesiowa@laptop /tmp $ g++ tutorial.cpp -o tutorial && ./tutorial
tutorial.cpp: In function ‘int main()’:
tutorial.cpp:34: error: no match for ‘operator-’ in ‘-analog_literals::operator--((analog_literals::
line_end)0u, 0).analog_literals::dashes::operator-- [with T = analog_literals::line_end, unsig
ned int n = 1u](0)’
analogliterals.hpp:72: note: candidates are: analog_literals::line<0u> analog_literals::operator-(an
alog_literals::line_end, analog_literals::line_end)

> r1 = area $ begin
>               o ᜭ ᜭ o
>               ǀ     ᜭ
>               o ᜭ ᜭ o
>             end

No instance for (Fail (Expected Bar ButGot Dash))
  arising from a use of `ǀ'
               at /home/paczesiowa/blog/04-literals/StaticLiteralsErrors.lhs:23:16
Possible fix:
  add an instance declaration for (Fail (Expected Bar ButGot Dash))
In the fifth argument of `begin', namely `ǀ'
In the second argument of `($)', namely
    `begin o ᜭ ᜭ o ǀ ᜭ o ᜭ ᜭ o end'
In the expression: area $ begin o ᜭ ᜭ o ǀ ᜭ o ᜭ ᜭ o end

Error message with the reason can be easily spotted, there is a location (with a column number!) provided, and even number of the wrong combinator ("fifth argument of begin"), shifted one to the left, though. That fix hint doesn't seem very helpful, but I'm sure that throwing an ascii-art of Clippy there would help a lot.

Another example:

  unsigned int r1 = ( o-----o
                      |     !
                      o---o ).area;

tutorial.cpp: In function ‘int main()’:
tutorial.cpp:34: error: no match for ‘operator|’ in ‘analog_literals::operator-
[with unsigned int x
= 2u]((analog_literals::operator--((analog_literals::line_end)0u, 0).
analog_literals::dashes<T, n>:
:operator-- [with T = analog_literals::line_end, unsigned int n = 1u](0),
analog_literals::dashes<an
alog_literals::line_end, 2u>()), (analog_literals::line_end)0u) |
analog_literals::operator- [with u
nsigned int excl_marks = 1u, unsigned int x = 1u]((((analog_literals::
excls<analog_literals::dashes<
analog_literals::line_end, 1u>, 0u>*)(& analog_literals::operator--((
analog_literals::line_end)0u, 0
)))->analog_literals::excls<T, n>::operator! [with T =
analog_literals::dashes<analog_literals::line
_end, 1u>, unsigned int n = 0u](), analog_literals::excls<analog_literals::
dashes<analog_literals::l
ine_end, 1u>, 1u>()), (analog_literals::line_end)0u)’

> r2 = area $ begin
>               o ᜭ ᜭ o
>               ǀ     ǀ
>               o ᜭ o
>             end

No instance for (Fail (BottomLineToo Short))
  arising from a use of `ᜭ'
               at /home/paczesiowa/blog/04-literals/StaticLiteralsErrors.lhs:61:18
Possible fix:
  add an instance declaration for (Fail (BottomLineToo Short))
In the 8th argument of `begin', namely `ᜭ'

That's it, the code is available here. Thanks for reading, comments are welcome.

Wednesday, March 17, 2010

Polyvariadic PrimeFib problem

begin Numbers.lhs

Some time ago, I claimed in a reddit comment, that it's possible to write a function that accepts only prime number of arguments, that are all strings, but occurrences of integers at the indices that are Fibonacci numbers. Sounds cool and a little bit retarded, doesn't it? kalven asked about it, so now I have to prove, that I wasn't bluffing.

But why would I even say such a thing? Well, it started with this post about polyvariadic functions in Haskell. The author claimed, that most modern languages have support for functions taking variable number of arguments. Let's take a look at Java: this "support" consists of hidden array creation and passing that array as a single argument. Other languages work in a similar way, with the exception of C, where everything is allocated on the stack, and it works thanks to grit, spit and a whole lotta duct tape. This approach isn't as glamorous as it sounds, and there are disadvantages. All arguments have to be of the same type, even in dynamically typed languages - they only have one universal type. To use arguments that have different types, they have to be upcasted at the call site (e.g. to Object in Java) and later downcasted in such vararg function. It is unsafe and forces all arguments to have the same size or use autoboxing. In Haskell, thanks to better syntax and boiler-plat free list creation, it's easy to pass many arguments to a function, just pass a list. Compare:

Apologies for all those underlined words, it seems that my blog software doesn't like certain bad words.

class Test {

 public static void main(String[] args) {
  int x = foo(1, 2, 3, 4);
  System.out.println(x);
 }

 static int foo(Object... args) {
  return args.length;
 }
}

With this:

main = do {
     x <- foo[1, 2, 3, 4];
     print x;
}

foo args = do {
    return (length args);
}

The kind of parens is the only difference in syntax or semantics (of vararg function call). Of course, being of the same type means something different in Java (sub-typing), than in Haskell, but that's not the point of this post. The point is, that Haskell is the only usable language^*, that truly allows polyvariadic functions, because it's possible to write functions that take variable number of variably typed arguments, which is not possible to fake with a hidden list wrapping. As usual, now is the time to write as usual, Oleg already did it. Some polyvariadic functions are easy to implement (e.g. one that takes even number of arguments, with alternating integer and string arguments, do try to implement it), some could actually be useful.

* usable language is one, that makes it possible to download pics from the internet.

Others are just a useless, contrived example, but can prove that anything is possible in Haskell. The problem with the function of prime number of arguments, where Fibonacci indices are integers, while the rest are strings has been solved by cdsmith. Nevertheless, I want to present another two solutions - one is easier to implement, but full of type hacks, and the other needed some thinking specific to this problem, but works much better - no boiler-plate required. Solutions are independent, so don't get discouraged by the first one, you can still be able to enjoy the second.

This post, like the previous one, needs ghc-6.12.2, 6.10 or 6.12.1 with hacked HList library. It's splitted across 4 files for modular reasons, the code is available at the blog repo.

Both solutions rely on prime and Fibonacci numbers, so we need type predicates for deciding if the number is prime/fib. I'm lazy and I didn't feel like translating regular algorithms to the type level, so I've assumed, that there are no prime/fib numbers bigger then 20, that way, it's possible to enumerate the positives, and choose all the other numbers as negatives.

Both predicates are similar, type-level function, that results in HTrue/HFalse, method has default bottom definition, because it will be only used at the type-level. Implementation uses classic trick, but as usual, I prefer equality constraints to TypeCasts wherever possible.

> {-# LANGUAGE MultiParamTypeClasses
>   , FunctionalDependencies
>   , FlexibleInstances
>   , TypeFamilies
>   , UndecidableInstances
>   , TypeSynonymInstances
>   , OverlappingInstances
>   #-}

> module Numbers where

> import Data.HList

> type S n = HSucc n
> type Z = HZero

> class IsPrime n result | n -> result where
>     isPrime :: n -> result
>     isPrime = undefined

> instance IsPrime(S(S Z)) HTrue
> instance IsPrime(S(S(S Z))) HTrue
> instance IsPrime(S(S(S(S(S Z))))) HTrue
> instance IsPrime(S(S(S(S(S(S(S Z))))))) HTrue
> instance IsPrime(S(S(S(S(S(S(S(S(S(S(S Z))))))))))) HTrue
> instance IsPrime(S(S(S(S(S(S(S(S(S(S(S(S(S Z))))))))))))) HTrue
> instance IsPrime(S(S(S(S(S(S(S(S(S(S(S(S(S(S(S(S(S Z))))))))))))))))) HTrue
> instance IsPrime(S(S(S(S(S(S(S(S(S(S(S(S(S(S(S(S(S(S(S Z))))))))))))))))))) HTrue
> instance result ~ HFalse => IsPrime a result

> class IsFib n result | n -> result where
>     isFib :: n -> result
>     isFib = undefined

> instance IsFib(Z) HTrue
> instance IsFib(S(Z)) HTrue
> instance IsFib(S(S Z)) HTrue
> instance IsFib(S(S(S Z))) HTrue
> instance IsFib(S(S(S(S(S Z))))) HTrue
> instance IsFib(S(S(S(S(S(S(S(S Z)))))))) HTrue
> instance IsFib(S(S(S(S(S(S(S(S(S(S(S(S(S Z))))))))))))) HTrue
> instance result ~ HFalse => IsFib n result

begin HListExtras.lhs

Before we get to the first solution, we need two list functions, that are missing from HList library. hPartition and hAll are type-level equivalents of partition and all functions from Data.List module. The implementation follows closely the style of the other, higher-order functions from HList, so there aren't any explanations, you can always read the excellent HList paper. If you want more explanations, please ask in comments/email/irc, I could explain it some more if there's a need.

> {-# LANGUAGE MultiParamTypeClasses
>   , FunctionalDependencies
>   , FlexibleInstances
>   , UndecidableInstances
>   #-}

> module HListExtras where

> import Data.HList

> class HPartition p l r1 r2 | p l -> r1 r2 where
>     hPartition :: p -> l -> (r1,r2)

> class HPartition' flag p l r1 r2 | flag p l -> r1 r2 where
>     hPartition' :: flag -> p -> l -> (r1,r2)

> instance HPartition p HNil HNil HNil where
>     hPartition _ _ = (HNil, HNil)

> instance ( Apply p x flag
>          , HPartition' flag p (HCons x xs) r1 r2
>          ) => HPartition p (HCons x xs) r1 r2 where
>     hPartition p l@(HCons x _) = hPartition' (apply p x) p l

> instance HPartition p xs r1 r2 => HPartition' HTrue p (HCons x xs) (HCons x r1) r2 where
>     hPartition' _ p (HCons x xs) = let (r1, r2) = hPartition p xs
>                                    in (HCons x r1, r2)

> instance HPartition p xs r1 r2 => HPartition' HFalse p (HCons x xs) r1 (HCons x r2) where
>     hPartition' _ p (HCons x xs) = let (r1, r2) = hPartition p xs
>                                    in (r1, HCons x r2)

> class HAll p l r | p l -> r where
>     hAll :: p -> l -> r

> instance HAll p HNil HTrue where
>     hAll _ HNil = hTrue

> instance (Apply p x b1, HAll p xs b2, HAnd b1 b2 b) => HAll p (HCons x xs) b where
>     hAll p (HCons x xs) = apply p x `hAnd` hAll p xs

begin Solution1.lhs

Before we can write a function that meets the specification, we have to decide what this function should actually do:) It's funny, if a type-blind person would read the problem, the only conclusion would be, that it's possible to write a function in Haskell - truly amazing. The easiest thing to do (beside ignoring arguments and returning unit value), is to return a pair of lists, one will collect integers, and the other strings.

So what is the first solution about? It's based on a simple idea, we start with a function that accepts any number of integer and string arguments, in any configuration, and then we write a constrained wrapper.

> {-# LANGUAGE MultiParamTypeClasses
>   , FunctionalDependencies
>   , FlexibleInstances
>   , UndecidableInstances
>   , TypeFamilies
>   , TypeSynonymInstances
>   , NoMonomorphismRestriction
>   , OverlappingInstances
>   #-}

> module Solution1 where

> import Data.HList hiding (TypeEq, typeEq)
> import Data.HList.TypeEqGeneric2
> import HListExtras
> import Numbers

> type X = ([Int], [String])

Function foo is defined inductively, very similar to c function from introduction of Oleg's polyvariadic page, nothing fancy here:

> class Foo result where
>     foo :: X -> result

> instance Foo X where
>     foo (ints,strings) = (reverse ints, reverse strings)

> instance Foo r => Foo (Int -> r) where
>     foo (ints,strings) = \k -> foo (k:ints,strings)

> instance Foo r => Foo (String -> r) where
>     foo (ints,strings) = \s -> foo (ints,s:strings)

> testFoo = foo ([],[]) (1::Int) "2" "3" (4::Int)

We have to specify Int types of every argument, because otherwise it's just a type variable. Strings are already monomorphic. When we decide that foo has enough, we force it to be of final X type.

*Solution1> :t testFoo
testFoo :: (Foo t) => t
*Solution1> testFoo :: X
([1,4],["2","3"])

Now, if we could get a hold of the final type of particular usage of foo function, extract from that the list of argument types, it would be very easy to verify correct usage.

Turning type of function into a heterogeneous list of types (values are bottoms) of arguments is performed by the following, type-level function. It's very similar to ResultType from my previous post, but it returns every type but the last one, which is the type of the result. It relies on the same trick to overcome problems with functional dependencies and overlapping instances.

> class FunctionArguments f r | f -> r where
>     functionArguments :: f -> r

Base instance, if the type is not a function, return empty list, we don't care about result type:

> instance result ~ HNil => FunctionArguments x result where
>     functionArguments _ = HNil

Recursive case - if argument is of arrow type, put the argument into list head and recurse down the rest of the type. Value of this type is of course bottom, it's not possible to come up with anything meaningful, but it's not needed fortunately, we only care about the types. Similarly, the rest of the type is a result of applying dummy bottom value to our input function.

> instance ( FunctionArguments rest others
>          , result ~ (HCons x others)
>          ) => FunctionArguments (x -> rest) result where
>     functionArguments f = HCons undefined $ functionArguments $ f undefined

In Haskell it's easy to get a hold of value of the result type, and use it inside its definition, thanks to the following trick:

primeFib = let r = foo ([],[])
               _ = ....
           in r

In that dotted block, we can operate on the final value. The problem with let-expressions is, that they are polymorphic, and this together with other very polymorphic, type-level code can result in ambiguities, that can be resolved only with explicit type signatures and usage of ScopedTypeVariables. But, I don't want to write down that type, it's very complicated (over 10 lines!), because it records the whole algorithm. Fortunately, we can choose the case-expression, which has monomorphic bindings, and let ghc figure out the type:

> primeFib = case foo ([],[]) of
>                r -> let _ = checkList $ functionArguments r
>                     in r

It's easy to see, that this block doesn't do anything, but type-checker still takes it under consideration, so it's possible for it to constrain the resulting type. The only thing left, is to define checkList function, that will do all those calculations.

There's one more type-level (defined in terms of classes) piece of code needed: equivalent of zip [1..] value-level function, that will pair every element of the list with its index, which will allow to check its primality and fibonality.

NumberList takes a counter, that represents the first free index number, list, and returns list of numbered pairs:

> class NumberList n l result | n l -> result where
>     numberList' :: n -> l -> result

Base case is trivial, there's nothing to number, return empty list:

> instance NumberList n HNil HNil where
>     numberList' _ _ = HNil

Inductive case, tag the head value of the input list with the current counter, and recurse down the list with incremented counter:

> instance (HNat n, NumberList (S n) xs ys) => NumberList n (HCons x xs) (HCons (n,x) ys) where
>     numberList' n (HCons x xs) = HCons (n,x) $ numberList' (hSucc n) xs

Numbering starts with one:

> numberList = numberList' (hSucc hZero)

checkList function, like any code operating on a list in a functional language, will use higher-order functions, if you remember explanations from my previous post, this requires defunctionalization. There are 4 HOFs used, and their arguments have to be adjusted accordingly:

map snd

> data Snd = Snd
> instance Apply Snd (a, b) b

partition fibIndex partitions list of pairs, according to predicate that checks fibonality of the index.

> data FibIndex = FibIndex
> instance IsFib n r => Apply FibIndex (n,x) r

(and 4.) there's all isInt and all isString, this could be implemented as two different functions, but it's possible to defunctionalize partial application of type-level equivalent of ==, TypeEq. This creates closures, that have to be reifed as data types in normal circumstances, but at the type level we use type parameter, and since we don't really care about value level in these predicates, we don't need any constructors, but I'm not a fan of EmptyDataDecls. This makes Is a phantom type.

> data Is x = Is
> instance TypeEq x y r => Apply (Is x) y r

I think, that checkList's definition doesn't need any explanations, we already have all the hard type-level parts defined, the rest is just functional programming:

> checkList l = intsOk `hAnd` stringsOk `hAnd` isPrime len :: HTrue
>     where (fibIndices, nonFibIndices) = hPartition FibIndex $ numberList l
>           removeIndices = hMap Snd
>           isInt     = Is :: Is Int
>           isString  = Is :: Is String
>           intsOk    = hAll isInt $ removeIndices fibIndices
>           stringsOk = hAll isString $ removeIndices nonFibIndices
>           len = hLength l

> testPrimeFib1 = primeFib (1::Int) (2::Int) (3::Int) "4" (5::Int) -- correct
> testPrimeFib2 = primeFib (1::Int) "2"                            -- string at fib index
> testPrimeFib3 = primeFib (1::Int) (2::Int) (3::Int) "4"          -- not prime number of args

*Solution1> testPrimeFib1 :: X
([1,2,3,5],["4"])
*Solution1> testPrimeFib2 :: X
Top level:
    Couldn't match expected type `HTrue' against inferred type `HFalse'
    When using functional dependencies to combine
      HAnd HFalse HTrue HFalse,
  (...)
*Solution1> testPrimeFib3 :: X
Top level:
    Couldn't match expected type `HTrue' against inferred type `HFalse'

*Solution1> :t testPrimeFib1
testPrimeFib1
  :: (HMap Snd r2 ys2,
      HMap Snd r11 ys1,
      HAnd r r1 t'',
      IsPrime (HSucc (HSucc (HSucc (HSucc (HSucc n))))) result,
      HAnd t'' result HTrue,
      HAnd HTrue b23 r1,
      HAll (Is String) ys2 b23,
      HAnd HTrue b2 r,
      HAnd HTrue b21 b2,
      HAnd HTrue b22 b21,
      HAll (Is Int) ys1 b211,
      HAnd HTrue b211 b22,
      HPartition FibIndex ys r11 r2,
      NumberList
        (S (HSucc (HSucc (HSucc (HSucc (HSucc HZero)))))) others ys,
      HLength others n,
      FunctionArguments t others,
      Foo t) =>
     t

It works. Pros of this approach:

easy implementation:
- many things available in HList
- others (like FunctionArguments) are very general and are usable with other, non variadic problems
- problem boils down to list manipulation
it scales to other contrived examples - only checkList has to be modified

Cons:

a lot of boiler-plate at call-site
huge types of intermediate expressions
unreadable type errors

begin Solution2.lhs

This time, we'll analyze this specific problem, and come up with much better and easier solution. What's the first problem with primeFib from my Solution1, and test from cdsmith's solution? Asking ghci about the type of both those functions results in x - simple type variable. Sure, there's also entire algorithm hidden in the type context, that could hide in the wardrobe and scare little children at night. But, we could tell more about that type, because we understand the problem. The function has to take only prime number of arguments, and the smallest prime number is 2. 1 and 2 are Fibonacci numbers, so clearly the type of bare primeFib function should be Int -> Int -> r, that much we can tell without choosing what to do later.

The second solution will be more aggressive, instead of checking the type at the end, when the only possible outcome is a type error, this approach will try to generate the correct type. It won't use any type-level tricks, in fact there will be three classes without functional dependencies - I almost forgot how to use those.

Here's how the algorithm will work. Let's assume that we are in the middle of computation and our functions has the following type:

primeFib :: a₁ -> ... -> a_n -> r

and we want to generate the r part. The first thing to check, is to decide if n is a prime number. If so, we have two choices (let's call this situation overlap), we can either stop and return the accumulator (did I mention, that we are carrying an accumulator?), because we already have accepted prime number of arguments, or we can generate a bigger solution (this boils down to the other primality branch). If n isn't prime, it's clear that we must accept more arguments (this situation will be called generate), so r will be a function type a_n+1 -> r'. Now we have to decide the type of a_n+1: if n+1 is a Fibonacci number, a_n+1 will be Int, otherwise it is String. What about r' ? Well, we repeat the process with the successor of n.

> {-# LANGUAGE MultiParamTypeClasses
>    , FunctionalDependencies
>    , FlexibleInstances
>    , FlexibleContexts
>    , UndecidableInstances
>    , TypeSynonymInstances
>    , NoMonomorphismRestriction
>    , OverlappingInstances
>    #-}

> import Data.HList
> import Numbers

> type X = ([Int], [String])

Foo will be the entry point to the algorithm. n is our counter - how many arguments have been already eaten. foo method has to carry our accumulator. Notice that there are no functional dependencies - we want it to have many instances (as many as there are prime numbers, which is a finite number, when there are no primes bigger than 20 :)

> class Foo n result where
>     foo :: n -> X -> result

Foo has only one instance, which means that it's a class synonym, and it's possible to write it as a function. I prefer a class here, because it will be used in two places, function approach would mean writing down twice this synonym, class approach can use type simplifier to do this rewriting.

This instance calls the next, auxiliary class, with an additional knowledge of the primality of n. It will be calculated as soon as n is known, because IsPrime is a type-level function.

> instance (IsPrime n flag, Foo' flag n result) => Foo n result where
>     foo n acc = foo' (isPrime n) n acc

Foo' class can now dispatch on the primality of the counter (flag argument):

> class Foo' flag n result where
>     foo' :: flag -> n -> X -> result

So, n isn't prime, we forward to the Generate class, but we also provide Generate class the result of IsFib on the successor of n, so it can dispatch on that fact.

> instance (HNat n, IsFib (HSucc n) flag, Generate flag n result) => Foo' HFalse n result where
>     foo' _ n acc = generate (isFib $ hSucc n) n acc

Generate class has a functional dependency, if we know that n+1 is a Fibonacci number, we know which instance to choose - thus we know part or the resulting type.

> class Generate flag n result | flag -> result where
>     generate :: flag -> n -> X -> result

n+1 isn't a Fibonacci number, generate a function from String and continue the recursive process by calling Foo with a bigger counter. At the value level, return a function that will insert eaten argument to the accumulator.

> instance (HNat n, Foo (S n) result) => Generate HFalse n (String -> result) where
>     generate _ n (ints,strings) = \s -> foo (hSucc n) (ints, s:strings)

The other case, similar to the previous one, but for integers:

> instance (HNat n, Foo (S n) result) => Generate HTrue n (Int -> result) where
>     generate _ n (ints,strings) = \k -> foo (hSucc n) (k:ints, strings)

Now the other branch of our first if, if counter was a prime number, go to the Overlap situation:

> instance Overlap n result => Foo' HTrue n result where
>     foo' _ n acc = overlap n acc

This class has two overlapping instances - this is what stops the process, because it cannot decide which one to choose without further knowledge.

> class Overlap n result where
>     overlap :: n -> X -> result

Final case - our result type has been forced to X, return reversed accumulators:

> instance Overlap n X where
>     overlap _ (ints, strings) = (reverse ints, reverse strings)

Otherwise, go to the Overlap case. There's no need to decide if n+1 is a Fibonacci number, we can reuse Foo' HFalse instance, which does just that:

> instance (Foo' HFalse n result) => Overlap n result where
>     overlap n acc = foo' hFalse n acc

Obviously, we start with empty accumulators and counter value of zero - we haven't accepted any arguments yet.

> primeFib = foo hZero ([],[])

This solution always generates the best approximation of the final type:

*Main> :t primeFib
primeFib :: (Overlap (HSucc (HSucc HZero)) result) => Int -> Int -> result
*Main> :t \a b c d -> primeFib a b c d
\a b c d -> primeFib a b c d
  :: (Overlap (HSucc (HSucc (HSucc (HSucc (HSucc HZero))))) result) =>
     Int -> Int -> Int -> String -> Int -> result

This results in less boiler-plate required at the call-site:

> testPrimeFib1 = primeFib 1 2 3 "4" 5

*Main> :t testPrimeFib1
testPrimeFib1 :: (Overlap (HSucc (HSucc (HSucc (HSucc (HSucc HZero))))) result) => result
*Main> testPrimeFib1 :: X
([1,2,3,5],["4"])

And better error messages:

*Main> primeFib "1"
:1:9:
    Couldn't match expected type `Int' against inferred type `[Char]'
    In the first argument of `primeFib', namely `"1"'

*Main> primeFib 1 2 3 "4" :: X
Top level:
    Couldn't match expected type `([Int], [String])'
           against inferred type `Int -> result'

The only disadvantage of this approach is the fact, that it required thinking specific to this problem, it could be much harder (impossible?) in different, contrived problems.

That is all, thanks for reading, comments are welcome.