In its simplified form (a deterministic transition with a state-dependent reward), the Bellman equation for the value function is:
\[ V^*(s) = R(s) + \gamma \cdot V(s') \]
Where:
- \( V^*(s) \) is the optimal value of state \( s \)
- \( R(s) \) is the immediate reward received in state \( s \)
- \( \gamma \) is the discount factor, with \( 0 \le \gamma \le 1 \)
- \( V(s') \) is the value of the successor state \( s' \)
The Bellman equation is a recursive equation that is central to dynamic programming and reinforcement learning. It expresses the value of a decision problem at a certain point in time in terms of the payoff from some initial choices and the value of the remaining decision problem that results from those initial choices. This equation is fundamental in finding the optimal policy in a Markov Decision Process (MDP).
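As a sketch of how this recursion is used in practice, the snippet below runs value iteration on a tiny hypothetical MDP. The states, rewards, and transitions are invented purely for illustration; the update rule is the deterministic form of the Bellman equation from above, applied repeatedly until the values stop changing.

```python
# Minimal value-iteration sketch for a toy deterministic MDP.
# The states, rewards, and transitions here are hypothetical examples.
GAMMA = 0.9  # discount factor

rewards = {"s0": 10, "s1": 5, "s2": 0}  # R(s) for each state
# transitions[s] maps each available action to its (single) successor state.
transitions = {
    "s0": {"a": "s1", "b": "s2"},
    "s1": {"a": "s2"},
    "s2": {"a": "s2"},  # absorbing state loops back to itself
}

def value_iteration(tol=1e-6):
    """Iterate V(s) = R(s) + gamma * max_a V(next(s, a)) to a fixed point."""
    V = {s: 0.0 for s in rewards}
    while True:
        delta = 0.0
        for s in rewards:
            # Bellman backup: reward now plus discounted value of the
            # best reachable successor state.
            new_v = rewards[s] + GAMMA * max(
                V[transitions[s][a]] for a in transitions[s]
            )
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < tol:
            return V

V = value_iteration()
print(V)
```

In this toy MDP the absorbing state `s2` converges to 0, `s1` to 5 + 0.9 × 0 = 5, and `s0` to 10 + 0.9 × 5 = 14.5, illustrating how each state's value is defined recursively through its successors.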
Let's assume the following values:
- \( R(s) = 10 \)
- \( \gamma = 0.9 \)
- \( V(s') = 50 \)
Substituting these into the formula:
\[ V^*(s) = 10 + (0.9 \times 50) = 10 + 45 = 55 \]
The value function \( V^*(s) \) is therefore 55.
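The same one-step backup can be checked in a few lines of Python, using the values assumed above:

```python
# One-step Bellman backup with the example's values.
R_s = 10       # immediate reward R(s)
gamma = 0.9    # discount factor
V_next = 50    # value of the successor state V(s')

V_s = R_s + gamma * V_next
print(V_s)  # 55.0
```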