The short answer is that you supply attributes of the word coffee (like w[-1]=drank to indicate the previous word) and its label (NOUN), and CRFsuite generates the actual indicator functions that compose the CRF model (including a feature that indicates that the label of the previous word is VERB). It knows to do this because it uses a "1st-order Markov CRF with dyad features," as described on the manual page you linked to.
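To make the "you supply attributes and a label" part concrete, here is a minimal sketch of writing one sentence in CRFsuite's tab-separated training format: one line per word, with the label first and the attributes after it, and a blank line ending the sequence. The helper name `to_crfsuite_lines`, the toy sentence, and the `PRON` tag are assumptions made up for illustration; only `w[-1]=drank`, `NOUN`, and `VERB` come from your example.

```python
def to_crfsuite_lines(sentence):
    """Format one labeled sentence as CRFsuite training text.

    `sentence` is a list of (label, [attribute, ...]) pairs, one per word.
    (Hypothetical helper, not part of CRFsuite.)
    """
    # One line per word: the label, then its attributes, separated by tabs.
    lines = ["\t".join([label, *attrs]) for label, attrs in sentence]
    lines.append("")  # a blank line marks the end of the sequence
    return "\n".join(lines)


# Toy sentence "I drank coffee" (the PRON tag is an assumed tagset choice).
sentence = [
    ("PRON", ["w[0]=I", "w[1]=drank"]),
    ("VERB", ["w[-1]=I", "w[0]=drank", "w[1]=coffee"]),
    ("NOUN", ["w[-1]=drank", "w[0]=coffee"]),
]

training_text = to_crfsuite_lines(sentence)
print(training_text)
```

Note that nothing in this input mentions the previous word's label: you only write down attributes and the label of each word.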
One distinction worth making (and one the documentation could be more precise about) is between "attributes" and "features": attributes are the observations you supply for each word, while features are the links in the model that represent either (attribute, label) or (label, label) pairs.
So in your example, w[-1]=drank is an attribute that you supply. The pair (w[-1]=drank, NOUN) is a state feature, and the transition between labels VERB --> NOUN is a transition feature. CRFsuite generates both of these for you.
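The attribute/feature distinction can be sketched in plain Python. This is not CRFsuite's actual code (which is written in C); it is a simplified illustration that assumes features are created only for the (attribute, label) and (label, label) pairs actually observed in the training data. The function name `generate_features` and the toy data are hypothetical.

```python
def generate_features(sequences):
    """Collect state and transition features from labeled sequences.

    Each sequence is a list of (label, [attribute, ...]) pairs.
    Illustrative only: mimics what a first-order CRF derives from the input.
    """
    state_features = set()       # (attribute, label) pairs
    transition_features = set()  # (previous label, label) pairs
    for sequence in sequences:
        previous_label = None
        for label, attrs in sequence:
            for attr in attrs:
                state_features.add((attr, label))
            if previous_label is not None:
                transition_features.add((previous_label, label))
            previous_label = label
    return state_features, transition_features


# Toy data: "drank coffee", labeled VERB NOUN.
sequences = [[
    ("VERB", ["w[0]=drank"]),
    ("NOUN", ["w[-1]=drank", "w[0]=coffee"]),
]]

state, transitions = generate_features(sequences)
print(sorted(state))
print(sorted(transitions))
```

The input only ever lists attributes and labels, yet both `("w[-1]=drank", "NOUN")` and `("VERB", "NOUN")` come out the other end. In the trained model, each such pair gets its own weight.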
I recommend the tutorial, which discusses this in more detail.