Reinforcement learning
Reinforcement learning (RL)
– a way of programming agents by reward and punishment without needing to specify how the task is to be achieved.
– The basic RL problem includes states (s), actions (a), and rewards (r). The typical interaction loop is as follows:
1. Observe state
2. Decide on an action
3. Perform action
4. Observe new state
5. Observe reward
6. Learn from experience
7. Repeat
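The seven-step loop above can be sketched in code. This is a minimal illustration, not from the source: the 1-D corridor environment (`GridEnv`), the tabular Q-learning update used as the "learn" step, and all parameter values are assumptions made for the sketch.

```python
import random

random.seed(0)

class GridEnv:
    """Hypothetical toy task: a 1-D corridor of 5 cells.
    The agent starts at the left end; reaching the right end gives reward +1."""
    def __init__(self, size=5):
        self.size = size
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):                 # action: 0 = left, 1 = right
        move = 1 if action == 1 else -1
        self.state = max(0, min(self.size - 1, self.state + move))
        reward = 1.0 if self.state == self.size - 1 else 0.0
        return self.state, reward, reward > 0   # (new state, reward, done)

q = {}                                      # tabular action-value estimates
alpha, gamma, epsilon = 0.5, 0.9, 0.1       # assumed learning parameters

def choose(s):
    """Epsilon-greedy action choice with random tie-breaking."""
    if random.random() < epsilon:
        return random.choice([0, 1])
    q0, q1 = q.get((s, 0), 0.0), q.get((s, 1), 0.0)
    if q0 == q1:
        return random.choice([0, 1])
    return 0 if q0 > q1 else 1

env = GridEnv()
for episode in range(200):
    s = env.reset()                         # 1. observe state
    for t in range(100):                    # cap episode length
        a = choose(s)                       # 2. decide on an action
        s2, r, done = env.step(a)           # 3-5. act, observe new state and reward
        old = q.get((s, a), 0.0)
        target = r if done else r + gamma * max(q.get((s2, x), 0.0) for x in (0, 1))
        q[(s, a)] = old + alpha * (target - old)  # 6. learn from experience
        s = s2
        if done:
            break                           # 7. repeat with a new episode
```

After training, the learned values prefer "right" everywhere in the corridor, so the greedy policy walks straight to the rewarding state.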
A quick background review
https://github.com/vmayoral/basic_reinforcement_learning/blob/master/BACKGROUND.md
– an area of machine learning inspired by behaviorist psychology
– concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward.
In machine learning, the environment is typically formulated as a Markov decision process (MDP)
The difference between the classical techniques and RL
– RL methods do not need an explicit model of the MDP, and they target large MDPs where exact methods become infeasible.
– In RL, correct input/output pairs are never presented, nor are sub-optimal actions explicitly corrected.
– Further, there is a focus on online performance, which requires balancing exploration (of uncharted territory) with exploitation (of current knowledge).
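The exploration/exploitation trade-off is easiest to see in a bandit setting. The sketch below is illustrative only: the 3-armed bandit, its reward probabilities, and the epsilon value are all made-up assumptions, and epsilon-greedy is just one common way to strike the balance.

```python
import random

random.seed(0)

true_p = [0.2, 0.5, 0.8]      # assumed arm reward probabilities (unknown to the agent)
counts = [0, 0, 0]            # pulls per arm
values = [0.0, 0.0, 0.0]      # running average reward per arm
epsilon = 0.1                 # fraction of time spent exploring

for t in range(5000):
    if random.random() < epsilon:
        a = random.randrange(3)            # explore: try a random arm
    else:
        a = values.index(max(values))      # exploit: pick the best-looking arm
    r = 1.0 if random.random() < true_p[a] else 0.0
    counts[a] += 1
    values[a] += (r - values[a]) / counts[a]   # incremental mean update

print(values)   # estimates approach true_p for the arms pulled often enough
```

With epsilon = 0 the agent can lock onto a mediocre arm forever; with epsilon = 1 it never uses what it has learned. Intermediate values trade off the two.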
Markov decision process (MDP)
– provides a mathematical framework for modeling decision making in situations where outcomes are partly random and partly under the control of a decision maker.
– useful for studying a wide range of optimization problems solved via dynamic programming and reinforcement learning.
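To make the dynamic-programming connection concrete, here is value iteration on a tiny MDP. The two-state transition model, rewards, and discount factor are invented for illustration; `P[s][a]` lists `(probability, next_state, reward)` triples.

```python
# Hypothetical two-state MDP: s0 can try to reach s1 (succeeds with prob 0.8);
# s1 pays reward 2 per step for staying, or the agent can return to s0.
P = {
    0: {0: [(1.0, 0, 0.0)],                   # stay in s0, no reward
        1: [(0.8, 1, 1.0), (0.2, 0, 0.0)]},   # attempt to move to s1
    1: {0: [(1.0, 1, 2.0)],                   # stay in s1, reward 2
        1: [(1.0, 0, 0.0)]},                  # go back to s0
}
gamma = 0.9                                   # assumed discount factor
V = {0: 0.0, 1: 0.0}

# Repeated Bellman optimality backups: V(s) = max_a sum_{s'} p (r + gamma V(s'))
for _ in range(200):
    V = {s: max(sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                for a in P[s])
         for s in P}

print(V)
```

Solving the Bellman equation by hand gives V(s1) = 2 / (1 - 0.9) = 20 and V(s0) = (0.8 · (1 + 0.9 · 20)) / (1 - 0.2 · 0.9) ≈ 18.54, which the iteration converges to.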
Markov Decision Process (MDP) Tutorial