I'm writing a very simple two-pass assembler in Haskell and I've come across a scenario that I don't yet have the experience to solve. I think the solution is likely to involve monad transformers, which I don't really understand.
The assembler parses the assembly code into a list of Statements, which are either instructions or labels. Some Statements may refer to labels. The assembler needs to convert the Statements into Instructions, which involves eliminating the labels and substituting the label references with an appropriate value.
I have written the first pass of the assembler, which produces a [(String, Int)] representing a map from labels to addresses. I have also written the following function for translating a Statement into an Instruction:
stmtToInstruction :: Int -> [(String, Int)] -> Statement -> Either String [I.Instruction]
stmtToInstruction addr labels stmt = case stmt of
ADD d s1 s2 -> Right [I.ADD d s1 s2]
BEQL s1 s2 l -> case do label <- find (\e -> fst e == l) labels
let labelAddr = snd label
let relativeAddr = I.ImmS $ fromIntegral (labelAddr - addr)
return (I.BEQ s1 s2 relativeAddr) of
Just i -> Right [i]
Nothing -> Left $ "Label " ++ l ++ " not defined"
LABEL _ -> Right []
I've omitted several cases for brevity, but you can see all the possible results here:
ADDalways succeeds and produces an instructionBEQLcan either succeed or fail, depending on whether a label is foundLABELalways succeeds, even though it produces no actual instructions
This works as expected. The problem I now have is writing this function:
replaceLabels :: [Statement] -> Either String [I.Instruction]
replaceLabels takes a list of statements, and runs stmtToInstruction on each one. The addr argument to stmtToInstruction must be the length of the [Instruction] accumulated so far. The output may either be a Left String, if one of the label references was invalid, or a Right [I.Instruction], if there were no errors.
mapM :: Monad m => (a -> m b) -> [a] -> m [b] gets us some of the way there, but provides no way to inject the current address into the (a -> m b) function. How do I make this work?