In its simplified form (a deterministic transition with a state-dependent reward), the Bellman equation for the value function is:
\[ V^*(s) = R(s) + \gamma \cdot V(s') \]
Where:
- \( V^*(s) \) is the optimal value of state \( s \)
- \( R(s) \) is the immediate reward received in state \( s \)
- \( \gamma \) is the discount factor, with \( 0 \le \gamma \le 1 \)
- \( V(s') \) is the value of the successor state \( s' \)
The Bellman equation is a recursive equation that is central to dynamic programming and reinforcement learning. It expresses the value of a decision problem at a certain point in time in terms of the payoff from some initial choices and the value of the remaining decision problem that results from those initial choices. This equation is fundamental in finding the optimal policy in a Markov Decision Process (MDP).
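As a sketch of how this recursion is used in practice, the snippet below runs value iteration on a tiny hypothetical MDP. The states, rewards, and transitions are invented purely for illustration; the update rule is the deterministic form of the Bellman equation from above, applied repeatedly until the values stop changing.

```python
# Minimal value-iteration sketch for a toy deterministic MDP.
# The states, rewards, and transitions here are hypothetical examples.
GAMMA = 0.9  # discount factor

rewards = {"s0": 10, "s1": 5, "s2": 0}  # R(s) for each state
# transitions[s] maps each available action to its (single) successor state.
transitions = {
    "s0": {"a": "s1", "b": "s2"},
    "s1": {"a": "s2"},
    "s2": {"a": "s2"},  # absorbing state loops back to itself
}

def value_iteration(tol=1e-6):
    """Iterate V(s) = R(s) + gamma * max_a V(next(s, a)) to a fixed point."""
    V = {s: 0.0 for s in rewards}
    while True:
        delta = 0.0
        for s in rewards:
            # Bellman backup: reward now plus discounted value of the
            # best reachable successor state.
            new_v = rewards[s] + GAMMA * max(
                V[transitions[s][a]] for a in transitions[s]
            )
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < tol:
            return V

V = value_iteration()
print(V)
```

In this toy MDP the absorbing state `s2` converges to 0, `s1` to 5 + 0.9 × 0 = 5, and `s0` to 10 + 0.9 × 5 = 14.5, illustrating how each state's value is defined recursively through its successors.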
Let's assume the following values:
- \( R(s) = 10 \)
- \( \gamma = 0.9 \)
- \( V(s') = 50 \)
Substituting these into the formula:
\[ V^*(s) = 10 + (0.9 \times 50) = 10 + 45 = 55 \]
The value function \( V^*(s) \) is therefore 55.
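The same one-step backup can be checked in a few lines of Python, using the values assumed above:

```python
# One-step Bellman backup with the example's values.
R_s = 10       # immediate reward R(s)
gamma = 0.9    # discount factor
V_next = 50    # value of the successor state V(s')

V_s = R_s + gamma * V_next
print(V_s)  # 55.0
```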