I'm writing a domain-specific language in Haskell, and have settled on a design with two ASTs: an initial untyped one that represents syntax and a final typed one that represents everything. I'm writing the final one as a GADT to get better typechecking.
I think it's almost working, but I'm having trouble writing a function that converts initial -> final (checks the types, plus some other things not shown, like all references correspond to a variable).
Here's a simplified example:
{-# LANGUAGE GADTs, StandaloneDeriving #-}
module Main where
-- untyped initial AST
data Untyped
= UNum Int
| UStr String
| UAdd Untyped Untyped
deriving (Show, Eq)
-- typed final AST
data Typed a where
TNum :: Int -> Typed Int
TStr :: String -> Typed String
TAdd :: Typed Int -> Typed Int -> Typed Int
deriving instance Eq (Typed a)
deriving instance Show (Typed a)
-- wrapper that allows working with a `Typed a` for any `a`
data TypedExpr where
TypedExpr :: Typed a -> TypedExpr
And this is my attempt at the check function. The base cases are simple:
check :: Untyped -> Either String TypedExpr
check (UNum n) = Right $ TypedExpr $ TNum n
check (UStr s) = Right $ TypedExpr $ TStr s
-- check (Uadd e1 e2) = ???
But how do I add Add? It can recursively evaluate the sub-expressions to
values of type Either String (TypedExpr (Typed a)), but I haven't managed to
unwrap those, check that the types line up (both as should be Ints), and
re-wrap afterward. I planned to do it all with big pattern matches, but GHC
doesn't approve:
My brain just exploded
I can't handle pattern bindings for existential or GADT data constructors.
Instead, use a case-expression, or do-notation, to unpack the constructor.
That's explained here, but I didn't understand the explanation. It seems I don't want pattern matching.
Update: I must have messed something else up without noticing. Pattern matching works, as shown by Nikita.
So I fiddled around trying to force things into
the right shape, but haven't gotten anything substantial yet. If these were
just Either String SomeValue I would want to use applicatives, right? Is it
possible to add another level of unwrapping + type checking to them? I suspect this answer is close to what I want since the question is very similar, but again I don't understand it. This may also be related.
Update: That first answer is just what I wanted after all. But I couldn't see how until chi wrote the intermediate version below without the extra Type type. Here's a working solution. The trick was to tag TypedExprs with a new type representing just the return type (a) of a Typed a:
data Returns a where
RNum :: Returns Int
RStr :: Returns String
-- extend TypedExpr to include the return type
data TypedExpr2 where
TypedExpr2 :: Returns a -> Typed a -> TypedExpr2
That way check doesn't have to know whether each subexpression is a raw
TNum or a function (like Add) that returns a TNum:
check :: Untyped -> Either String TypedExpr2
check (UNum n) = Right $ TypedExpr2 RNum (TNum n)
check (UStr s) = Right $ TypedExpr2 RStr (TStr s)
check (UAdd u1 u2) = do
-- typecheck subexpressions, then unwrap by pattern matching
TypedExpr2 r1 t1 <- check u1
TypedExpr2 r2 t2 <- check u2
-- check the tags to find out their return types
case (r1, r2) of
-- if correct, create an overall expression tagged with its return type
(RNum, RNum) -> return $ TypedExpr2 RNum $ TAdd t1 t2
_ -> Left "type error"
GHC is smart enough to know that the two as in any TypedExpr2 must match,
so it catches you if you attempt to use the wrong overall return type at the
end. Wonderful!