Reinforcement learning

Reinforcement learning (RL)

– a way of programming agents by reward and punishment without needing to specify how the task is to be achieved.
– The basic RL problem includes states (s), actions (a) and rewards (r). The typical formulation is as follows (a minimal code sketch follows the list):
1. Observe state
2. Decide on an action
3. Perform action
4. Observe new state
5. Observe reward
6. Learn from experience
7. Repeat
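
A minimal Python sketch of this observe / act / learn loop. The `Environment` (a toy 1-D corridor) and the placeholder `RandomAgent` are invented for illustration and are not taken from the linked tutorial:

```python
import random


class Environment:
    """Toy 1-D corridor: states 0..4, reward +1 on reaching state 4."""

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):            # action: -1 (left) or +1 (right)
        self.state = max(0, min(4, self.state + action))
        reward = 1.0 if self.state == 4 else 0.0
        done = self.state == 4
        return self.state, reward, done


class RandomAgent:
    """Placeholder agent: picks actions at random and ignores experience."""

    def act(self, state):
        return random.choice([-1, +1])

    def learn(self, state, action, reward, next_state):
        pass                            # a real agent would update its policy here


env, agent = Environment(), RandomAgent()
for episode in range(10):
    state = env.reset()                                   # 1. observe state
    done = False
    while not done:
        action = agent.act(state)                         # 2. decide on an action
        next_state, reward, done = env.step(action)       # 3-5. act, observe new state and reward
        agent.learn(state, action, reward, next_state)    # 6. learn from experience
        state = next_state                                 # 7. repeat
```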

A quick background review
https://github.com/vmayoral/basic_reinforcement_learning/blob/master/BACKGROUND.md

– an area of machine learning inspired by behaviorist psychology
– concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward.

In machine learning, the environment is typically formulated as a Markov decision process (MDP)

The difference between the classical techniques and RL
– RL algorithms do not need knowledge about the MDP, and they target large MDPs where exact methods become infeasible.
– In RL, correct input/output pairs are never presented, nor sub-optimal actions explicitly corrected.
– Further, there is a focus on on-line performance, which involves finding a balance between exploration (of uncharted territory) and exploitation (of current knowledge).
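
A minimal sketch of that exploration/exploitation balance, using an epsilon-greedy rule over a tabular Q-function. The Q-values and epsilon below are made-up illustrative numbers:

```python
import random

Q = {                     # estimated action values for one state (illustrative)
    "left": 0.2,
    "right": 0.7,
}
epsilon = 0.1             # probability of exploring


def epsilon_greedy(q_values, eps):
    if random.random() < eps:
        return random.choice(list(q_values))      # explore: uncharted territory
    return max(q_values, key=q_values.get)        # exploit: current knowledge


action = epsilon_greedy(Q, epsilon)
```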

Wikipedia

Markov decision process (MDP)

– provides a mathematical framework for modeling decision making in situations where outcomes are partly random and partly under the control of a decision maker.
– useful for studying a wide range of optimization problems solved via dynamic programming and reinforcement learning.
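
A minimal sketch of an MDP written out as data (states, actions, transition probabilities, rewards) and solved by value iteration, the dynamic programming case mentioned above. The two-state MDP and discount factor here are an invented example:

```python
# P[state][action] = list of (probability, next_state, reward)
P = {
    "s0": {"stay": [(1.0, "s0", 0.0)],
           "go":   [(0.8, "s1", 1.0), (0.2, "s0", 0.0)]},
    "s1": {"stay": [(1.0, "s1", 0.0)],
           "go":   [(1.0, "s0", 0.0)]},
}
gamma = 0.9               # discount factor

V = {s: 0.0 for s in P}   # state values, initialised to zero
for _ in range(100):      # value iteration sweeps
    V = {
        s: max(
            sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
            for a in P[s]
        )
        for s in P
    }

print(V)                  # approximate optimal state values
```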

Markov Decision Process (MDP) Tutorial

Reference

http://www.cse.unsw.edu.au/%7Ecs9417ml/RL1/index.html
