Markov Decision Processes


:star: :star: :star:

Notes

  • The next state depends only on the current state and the action taken in it, not on the earlier history (the Markov property)
  • Defined by a four-tuple (S, A, P, R): the set of states, the set of actions, the transition function P(s' | s, a), and the reward function R(s, a)
  • An optimal policy selects, in every state, the action that maximizes the expected cumulative reward from that state onward
  • Very hard to solve or learn at large scale because of the curse of dimensionality: the number of states grows exponentially with the number of state variables
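A minimal sketch of the four-tuple and of computing an optimal policy for a tiny finite MDP. Value iteration is used here as one standard method, not the only one. The class and function names, the two-state example, its rewards, and the discount factor gamma are all hypothetical. In particular, gamma is not part of the four-tuple above; it is a common addition that keeps the cumulative reward finite.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

State = str
Action = str


@dataclass
class MDP:
    states: List[State]                      # S
    actions: List[Action]                    # A
    # P: transitions[(s, a)] -> [(next_state, probability), ...]
    # Depends only on the current (s, a), which is the Markov property.
    transitions: Dict[Tuple[State, Action], List[Tuple[State, float]]]
    # R: rewards[(s, a)] -> immediate reward for taking a in s
    rewards: Dict[Tuple[State, Action], float]


def value_iteration(mdp: MDP, gamma: float = 0.9, tol: float = 1e-8):
    """Return (state values, greedy optimal policy). gamma is an assumed discount factor."""
    values = {s: 0.0 for s in mdp.states}

    def q_value(s: State, a: Action) -> float:
        # Immediate reward plus discounted expected value of the next state
        return mdp.rewards[(s, a)] + gamma * sum(
            p * values[s2] for s2, p in mdp.transitions[(s, a)]
        )

    while True:
        delta = 0.0
        for s in mdp.states:
            best = max(q_value(s, a) for a in mdp.actions)
            delta = max(delta, abs(best - values[s]))
            values[s] = best                 # in-place (Gauss-Seidel style) update
        if delta < tol:
            break

    # The optimal policy picks the action with the highest Q-value in each state
    policy = {s: max(mdp.actions, key=lambda a: q_value(s, a)) for s in mdp.states}
    return values, policy


# Hypothetical example: moving from A to B costs 1 and succeeds 80% of the time;
# staying in B earns 2 per step.
example = MDP(
    states=["A", "B"],
    actions=["stay", "move"],
    transitions={
        ("A", "stay"): [("A", 1.0)],
        ("A", "move"): [("B", 0.8), ("A", 0.2)],
        ("B", "stay"): [("B", 1.0)],
        ("B", "move"): [("A", 1.0)],
    },
    rewards={("A", "stay"): 0.0, ("A", "move"): -1.0,
             ("B", "stay"): 2.0, ("B", "move"): 0.0},
)

values, policy = value_iteration(example)
print(policy)  # → {'A': 'move', 'B': 'stay'}
```

The loop stores one value per state and sweeps every (state, action) pair on each pass. This is fine for a toy problem, but it is exactly the cost that explodes when the state space is large.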
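To make the last point concrete, here is a rough count. It assumes, hypothetically, that a state is described by n variables, each taking k values, and that transitions are stored as a dense table with one entry per (s, a, s') triple. The function name is illustrative only.

```python
from typing import Tuple


def tabular_sizes(num_variables: int, values_per_variable: int,
                  num_actions: int) -> Tuple[int, int]:
    """Count states and dense transition-table entries (hypothetical helper)."""
    # The state space is the Cartesian product of all state variables: k ** n
    num_states = values_per_variable ** num_variables
    # A dense model stores P(s' | s, a) for every (s, a, s') triple: |S| * |A| * |S|
    transition_entries = num_states * num_actions * num_states
    return num_states, transition_entries


for n in (2, 10, 20):
    states, entries = tabular_sizes(n, 10, 4)
    print(f"{n:>2} variables: {states:.0e} states, {entries:.0e} transition entries")
```

Each extra variable multiplies the state count by k and the transition table by k squared. This is why exact tabular methods break down long before problems look "large".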