Optimal decision-making with POMDPs

:star: :star:


  • MDP has states actions transition functions and reward functions
  • Partially observable stuff is where you can’t see every part of state.
    • Chess is fully observable, battleship is not
  • Update beliefs based on results of actions still, but be more cautious cause you don’t have the whole picture
  • Train with Bellman equation. Curse of dimensionality