I am learning about MDP's and value iteration in self-study and I hope someone can improve my understanding.
Consider the problem of a 3 sided dice having numbers 1, 2, 3.  If you roll a 1 or a 2 you get that value in $ but if you roll a 3 you loose all your money and the game ends (finite horizon problem)
Conceptually I understand how this done with the following forumla:
So let's break that down:
Since this is a finite horizon problem we can ignore gamma. 
If I observe 1, I can either go or stop.  The utility/value of that is:
V(1) = max(Q(1, g), Q(1, s))
Q(1, g) = r + SUM( P( 2 | 1,g) * V(2) + P( 3 | 1,g) * V(3))
Q(1, s) = r + SUM( P( 2 | 1,s) * V(2) + P( 3 | 1,s) * V(3))
where r = 1
I observe 2, I can either go or stop:
V(2) = max(Q(2, g), Q(2, s))
Q(2, g) = r + SUM( P( 1 | 2,g) * V(1) + P( 3 | 1,g) * V(3))
Q(2, s) = r + SUM( P( 1 | 2,s) * V(1) + P( 3 | 1,s) * V(3))
where r = 2
I observe 3, the game ends.
Intuitively V(3) is 0 because the game is over, so we can remove that half from the equation of Q(1, g).  We defined V(2) above also so we can substitute that as:
Q(1, g) = r + SUM( P( 2 | 1,g) *     
    MAX ((P( 1 | 2,g) * V(1)) , (P( 1 | 2,s) * V(1))))
This where things take a bad turn.  I am not sure how to solve Q(1, g) if it has its own definition in its solution. This likely due to poor math background.
What I do understand is that the utilities or the values of the states will change based on the reward and therefore the decision will change.
Specifically if rolling three gave you $3 while rolling one ended the game, that will affect your decision because the utility has changed.
But I am not sure how to write code to calculate that.
Can someone explain how Dynamic Programming works in this? How do I solve Q(1,g) or Q(1,s) when it is in its own definition?
