I am trying to understand how to use mdptoolbox and have a few questions.
What does 20 mean in the following statement?
import mdptoolbox.example
P, R = mdptoolbox.example.forest(10, 20, is_sparse=False)
I understand that 10 here denotes the number of possible states. Does 20 represent the total number of actions per state? I want to restrict the MDP to exactly 2 actions per state. How can I do this?
The shape of P returned above is (2, 10, 10). What does the leading 2 represent here? No matter what values I pass for what I took to be the number of states and the number of actions, the first dimension of P is always 2.
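
To make the shape question concrete, here is a small NumPy-only sketch of how I am currently reading the array. It assumes, and this is my assumption rather than something I have confirmed, that P is laid out as (actions, states, states), with one row-stochastic matrix per action. The helper name describe_transitions and the stand-in array P_demo are made up for illustration and are not part of mdptoolbox.

```python
import numpy as np


def describe_transitions(P):
    """Return (n_actions, n_states), ASSUMING P has layout (A, S, S).

    Hypothetical helper for illustration only; not an mdptoolbox function.
    """
    P = np.asarray(P, dtype=float)
    if P.ndim != 3 or P.shape[1] != P.shape[2]:
        raise ValueError("expected an array of shape (A, S, S)")
    # Under the assumed layout, P[a, s, :] is a probability distribution
    # over next states, so every row must sum to 1.
    if not np.allclose(P.sum(axis=2), 1.0):
        raise ValueError("rows do not sum to 1 along the last axis")
    return P.shape[0], P.shape[1]


# Hand-built stand-in with the same shape as the P in my question.
# The dynamics are arbitrary: action 0 stays put, action 1 moves to the
# next state (wrapping around). They are NOT the forest example's dynamics.
S = 10
P_demo = np.stack([np.eye(S), np.roll(np.eye(S), 1, axis=1)])

n_actions, n_states = describe_transitions(P_demo)
print(n_actions, n_states)
```

If that reading is right, the leading 2 would be the number of actions and the second argument (20) would be something else entirely, but I would appreciate confirmation.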