- MDP has states actions transition functions and reward functions
- Partially observable stuff is where you can’t see every part of state.
- Chess is fully observable, battleship is not
- Update beliefs based on results of actions still, but be more cautious cause you don’t have the whole picture
- Train with Bellman equation. Curse of dimensionality