In Haskell, how to parse an untyped AST to a typed one based on a GADT?

Question

I'm writing a domain-specific language in Haskell, and have settled on a design with two ASTs: an initial untyped one that represents syntax and a final typed one that represents everything. I'm writing the final one as a GADT to get better typechecking.

I think it's almost working, but I'm having trouble writing a function that converts initial -> final (checks the types, plus some other things not shown, like all references correspond to a variable).

Here's a simplified example:

{-# LANGUAGE GADTs, StandaloneDeriving #-}

module Main where

-- untyped initial AST

data Untyped
  = UNum Int
  | UStr String
  | UAdd Untyped Untyped
  deriving (Show, Eq)

-- typed final AST

data Typed a where
  TNum :: Int    -> Typed Int
  TStr :: String -> Typed String
  TAdd :: Typed Int -> Typed Int -> Typed Int

deriving instance Eq   (Typed a)
deriving instance Show (Typed a)

-- wrapper that allows working with a `Typed a` for any `a`
data TypedExpr where
  TypedExpr :: Typed a -> TypedExpr

And this is my attempt at the check function. The base cases are simple:

check :: Untyped -> Either String TypedExpr
check (UNum n) = Right $ TypedExpr $ TNum n
check (UStr s) = Right $ TypedExpr $ TStr s
-- check (Uadd e1 e2) = ???

But how do I add Add? It can recursively evaluate the sub-expressions to values of type Either String (TypedExpr (Typed a)), but I haven't managed to unwrap those, check that the types line up (both as should be Ints), and re-wrap afterward. I planned to do it all with big pattern matches, but GHC doesn't approve:

My brain just exploded
  I can't handle pattern bindings for existential or GADT data constructors.
  Instead, use a case-expression, or do-notation, to unpack the constructor.

That's explained here, but I didn't understand the explanation. It seems I don't want pattern matching.

Update: I must have messed something else up without noticing. Pattern matching works, as shown by Nikita.

So I fiddled around trying to force things into the right shape, but haven't gotten anything substantial yet. If these were just Either String SomeValue I would want to use applicatives, right? Is it possible to add another level of unwrapping + type checking to them? I suspect this answer is close to what I want since the question is very similar, but again I don't understand it. This may also be related.

Update: That first answer is just what I wanted after all. But I couldn't see how until chi wrote the intermediate version below without the extra Type type. Here's a working solution. The trick was to tag TypedExprs with a new type representing just the return type (a) of a Typed a:

data Returns a where
  RNum :: Returns Int
  RStr :: Returns String

-- extend TypedExpr to include the return type
data TypedExpr2 where
  TypedExpr2 :: Returns a -> Typed a -> TypedExpr2

That way check doesn't have to know whether each subexpression is a raw TNum or a function (like Add) that returns a TNum:

check :: Untyped -> Either String TypedExpr2
check (UNum n) = Right $ TypedExpr2 RNum (TNum n)
check (UStr s) = Right $ TypedExpr2 RStr (TStr s)
check (UAdd u1 u2) = do
  -- typecheck subexpressions, then unwrap by pattern matching
  TypedExpr2 r1 t1 <- check u1
  TypedExpr2 r2 t2 <- check u2
  -- check the tags to find out their return types
  case (r1, r2) of
    -- if correct, create an overall expression tagged with its return type
    (RNum, RNum) -> return $ TypedExpr2 RNum $ TAdd t1 t2
    _ -> Left "type error"

GHC is smart enough to know that the two as in any TypedExpr2 must match, so it catches you if you attempt to use the wrong overall return type at the end. Wonderful!

How about something like [this](http://stackoverflow.com/a/27838550/477476)? See https://gergo.erdi.hu/blog/2015-02-05-typed_embedding_of_stlc_into_haskell/ for an extended explanation. — Cactus, May 03 '16 at 02:34

score 2 · Answer 1 · edited May 23 '17 at 12:02

My recommendation would be to use a "plain" representation of your universe of types (with an interpretation type function), and then store a Singleton reification of your existential type in TypedExpr, i.e. something like

{-# LANGUAGE DataKinds, KindSignatures, TypeFamilies #-}

data Type = TInt | TString

type family InterpT (a :: Type) where
  InterpT TInt = Int
  InterpT TString = String

-- plus the usual singletons stuff
-- ...

-- and finally
data Typed (a :: Type) where ...

data TypedExpr where
  TypedExpr :: Sing a -> Typed a -> TypedExpr

This way, you can do something like

check (UAdd e1 e2) = do
  TypedExpr t1 e1' <- check e1
  TypedExpr t2 e2' <- check e2
  case testEquality t1 t2 of
    Just Refl -> ... use e1' and e2' here, you know they have the same Type
    Nothing -> Left ...

Find a fully fleshed-out example here.

score 0 · Answer 2 · answered May 03 '16 at 02:37

0

Your exact question is easily answerable with the following solution:

check (UAdd (UNum a) (UNum b)) = Right $ TypedExpr $ TAdd (TNum a) (TNum b)

However there's quite a few indications of a design maze there.

Do you realize that once you put something into TypedExpr you lose all the information about the a type? This renders your check function quite pointless.

I understand that you do so because it's the only way you can unify the types of your GADT, and otherwise you can't implement the check function. But actually it just proves that you're modeling things wrong and that GADT might be inappropriate for your use case.

I don't understand why the UAdd constructor is over Untyped values, instead of Int, and I don't understand what made you go with this multi-stage AST strategy.

I could go on, but I'll break here and just recommend you to redesign your model.

answered May 03 '16 at 02:37

Nikita Volkov

42,792
11
94
169

Thanks, you're right! I thought that was the first thing I tried (causing the "my brain exploded" error), but must have accidentally changed something else at the same time. Guess the whole question was a little pointless. What do you mean by lose all information though? You can get the values back, and types aren't needed anymore because the GADT gaurantees they line up. EDIT: Oh, the reason it's not over `Int`s is the language also involves set operations and I'm using `+` for unions. – jefdaj May 03 '16 at 03:33
Oh nevermind I think I get it. Value-level type tags are needed to know how you can combine the GADTs later when you have more than one. The answer [here](https://stackoverflow.com/questions/30564967/converting-an-untyped-ast-for-a-simple-typed-language-into-a-gadt) seems like it would work, but you're right it's a little convoluted. – jefdaj May 03 '16 at 07:28

score -1 · Accepted Answer · edited May 03 '16 at 08:27

-1

Technically it is doable, but it's quite inconvenient: you have to "dig" until the GADT constructor is found.

check :: Untyped -> Either String TypedExpr
check (UNum n) = return $ TypedExpr $ TNum n
check (UStr s) = return $ TypedExpr $ TStr s
check (UAdd t1 t2) = do
    t1 <- check t1
    t2 <- check t2
    case (t1, t2) of
      (TypedExpr (TNum x)     , TypedExpr (TNum y))
         -> return $ TypedExpr $ TAdd (TNum x    ) (TNum y)
      (TypedExpr (TAdd x1 x2) , TypedExpr (TNum y))
         -> return $ TypedExpr $ TAdd (TAdd x1 x2) (TNum y)
      (TypedExpr (TNum x)     , TypedExpr (TAdd y1 y2))
         -> return $ TypedExpr $ TAdd (TNum x    ) (TAdd y1 y2)
      (TypedExpr (TAdd x1 x2) , TypedExpr (TAdd y1 y2))
         -> return $ TypedExpr $ TAdd (TAdd x1 x2) (TAdd y1 y2)
      _ -> Left "type error"

I would look for alternatives. The above approach suffers from combinatorial explosion when the number of constructors is large.

edited May 03 '16 at 08:27

Cactus

27,075
9
69
149

answered May 03 '16 at 08:14

chi

111,837
3
133
218

What do you think of [this](http://stackoverflow.com/a/30565412/429898)? It adds a separate `Type` tag to the `TypedExpr`s (which it calls `ETerm`s), then checks the tag rather than the constructor used. It seems you only have to define a `Type` for each possible *return* type in the DSL that way. For example `Add` and `TNum` could both be tagged with `Type ReturnsNum` (should have picked different names). It still explodes with the number of basic types, but I could probably keep that to just numbers, strings, booleans, and sets of those. – jefdaj May 03 '16 at 16:05
@Jeff I like that approach. – chi May 03 '16 at 16:37
Thanks, this helped a lot! – jefdaj May 03 '16 at 18:53

In Haskell, how to parse an untyped AST to a typed one based on a GADT?

3 Answers3