Reinforcement Learning (RL)
– a way of programming agents by reward and punishment, without needing to specify how the task is to be achieved.
– The basic RL problem involves states (s), actions (a) and rewards (r). The typical formulation is as follows (a minimal code sketch appears after the list):
1. Observe state
2. Decide on an action
3. Perform action
4. Observe new state
5. Observe reward
6. Learn from experience
7. Repeat
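A minimal sketch of this loop in Python, assuming a Gym-style environment; the RandomAgent class is a hypothetical placeholder for your own algorithm:

import gym

class RandomAgent:
    """Hypothetical placeholder agent: acts at random and learns nothing."""
    def __init__(self, action_space):
        self.action_space = action_space
    def act(self, observation):                       # 2. decide on an action
        return self.action_space.sample()
    def learn(self, obs, action, reward, next_obs):   # 6. learn from experience
        pass                                          # a real agent would update its policy here

env = gym.make('CartPole-v0')
agent = RandomAgent(env.action_space)
observation = env.reset()                             # 1. observe state
for t in range(100):
    action = agent.act(observation)
    next_observation, reward, done, info = env.step(action)  # 3-5. act, observe new state and reward
    agent.learn(observation, action, reward, next_observation)
    observation = next_observation                    # 7. repeat
    if done:
        break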
– an area of machine learning inspired by behaviorist psychology
– concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward.
In machine learning, the environment is typically formulated as a Markov decision process (MDP)
The difference between classical techniques and RL
– RL does not require exact knowledge of the MDP, and it targets large MDPs where exact methods become infeasible.
– In RL, correct input/output pairs are never presented, nor sub-optimal actions explicitly corrected.
– Further, there is a focus on on-line performance, which involves finding a balance between exploration (of uncharted territory) and exploitation (of current knowledge).
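A common way to trade off exploration against exploitation is an epsilon-greedy rule. A minimal sketch; the value table Q, the epsilon value, and the toy states/actions are invented for illustration:

import random

epsilon = 0.1  # assumed exploration rate; purely illustrative

def epsilon_greedy(Q, state, actions):
    """With probability epsilon, explore a random action;
    otherwise exploit the best-valued action known so far."""
    if random.random() < epsilon:
        return random.choice(actions)                 # exploration
    return max(actions, key=lambda a: Q[(state, a)])  # exploitation

# Illustrative usage with an invented value table:
Q = {('s0', 'left'): 0.2, ('s0', 'right'): 0.5}
print(epsilon_greedy(Q, 's0', ['left', 'right']))     # usually 'right'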
Markov decision processes (MDPs)
– provide a mathematical framework for modeling decision making in situations where outcomes are partly random and partly under the control of a decision maker.
– useful for studying a wide range of optimization problems solved via dynamic programming and reinforcement learning.
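To make this concrete, here is a tiny hand-made MDP solved by value iteration (dynamic programming); the two-state example and all its numbers are invented for illustration. Note that this solve requires full knowledge of the MDP, which is exactly what RL methods avoid assuming:

# transitions[(state, action)] = list of (probability, next_state, reward)
transitions = {
    ('s0', 'stay'): [(1.0, 's0', 0.0)],
    ('s0', 'go'):   [(0.8, 's1', 1.0), (0.2, 's0', 0.0)],  # outcome is partly random
    ('s1', 'stay'): [(1.0, 's1', 1.0)],
    ('s1', 'go'):   [(1.0, 's0', 0.0)],
}
states, actions, gamma = ['s0', 's1'], ['stay', 'go'], 0.9

# Value iteration: repeatedly back up the best expected return from each state.
V = {s: 0.0 for s in states}
for _ in range(100):
    V = {s: max(sum(p * (r + gamma * V[s2]) for p, s2, r in transitions[(s, a)])
                for a in actions)
         for s in states}
print(V)  # converges to the optimal state values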
The environment’s step() function returns four values:
observation (object):
– an environment-specific object representing your observation of the environment.
e.g.
– pixel data from a camera
– joint angles and joint velocities of a robot
– the board state in a board game.
reward (float):
– amount of reward achieved by the previous action.
– The scale varies between environments, but the goal is always to increase your total reward.
done (boolean):
– whether it’s time to reset the environment again.
– Most (but not all) tasks are divided up into well-defined episodes, and done being True indicates the episode has terminated.
e.g.
– perhaps the pole tipped too far, or you lost your last life.
info (dict):
– diagnostic information useful for debugging.
– It can sometimes be useful for learning
e.g. it might contain the raw probabilities behind the environment’s last state change.
– However, official evaluations of your agent are not allowed to use this for learning.
=> agent-environment loop
Each timestep,
– the agent chooses an action
– the environment returns an observation and a reward.
import gym
env = gym.make('CartPole-v0')
for i_episode in range(20):
    observation = env.reset()
    for t in range(100):
        env.render()
        print(observation)
        action = env.action_space.sample()  # take a random action
        observation, reward, done, info = env.step(action)
        if done:
            print("Episode finished after {} timesteps".format(t + 1))
            break
Sample output (the printed observations):
[ 0.01936616 -0.02667223  0.04207677 -0.03210077]
[ 0.01883272  0.16782186  0.04143476 -0.31121675]
[ 0.02218916 -0.02786517  0.03521042 -0.00575982]
[ 0.02163185 -0.22347393  0.03509523  0.29782119]
[ 0.01716237 -0.41907813  0.04105165  0.60136269]
[ 0.00878081 -0.61474957  0.0530789   0.90668836]
[-0.00351418 -0.42038489  0.07121267  0.63114982]
[-0.01192188 -0.22632502  0.08383567  0.36171663]
[-0.01644838 -0.03248853  0.09107     0.0966019 ]
[-0.01709815 -0.22878973  0.09300204  0.41657106]
[-0.02167394 -0.03510039  0.10133346  0.15459582]
[-0.02237595 -0.23151625  0.10442538  0.47744932]
[-0.02700628 -0.03801171  0.11397436  0.21941893]
[-0.02776651 -0.23456274  0.11836274  0.5457686 ]
[-0.03245777 -0.4311315   0.12927811  0.87327618]
[-0.0410804  -0.62775187  0.14674364  1.2036476 ]
[-0.05363543 -0.82443391  0.17081659  1.53848784]
[-0.07012411 -0.63172924  0.20158634  1.3036139 ]
Episode finished after 18 timesteps
Spaces
import gym
env = gym.make('CartPole-v0')
print(env.action_space)
#> Discrete(2)
print(env.observation_space)
#> Box(4,)
The Discrete space
– allows a fixed range of non-negative numbers,
– in this case, valid actions are either 0 or 1.
The Box space
– represents an n-dimensional box,
– valid observations will be an array of 4 numbers.
print(env.observation_space.high)
#> array([ 2.4 , inf, 0.20943951, inf])
print(env.observation_space.low)
#> array([-2.4 , -inf, -0.20943951, -inf])
Box and Discrete are the most common Spaces.
from gym import spaces
space = spaces.Discrete(8)  # Set with 8 elements {0, 1, 2, ..., 7}
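As a quick usage note, spaces support sampling and membership checks; sample(), contains(), and the n attribute are part of Gym's Space/Discrete interface:

x = space.sample()        # draws a random element of the space, e.g. 3
assert space.contains(x)  # sampled elements are always valid members
assert space.n == 8       # a Discrete space exposes its size as .n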
EnvSpec(environment ID)
– defines parameters for a particular task, including the number of trials to run and the maximum number of steps.
e.g.
EnvSpec(Hopper-v1)
– defines an environment where the goal is to get a 2D simulated robot to hop;
EnvSpec(Go9x9-v0)
– defines a Go game on a 9×9 board.
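A small sketch of looking up a spec through Gym's registry; CartPole-v0 is used here so the snippet runs without extra dependencies, and the exact set of attributes on EnvSpec has varied across Gym versions:

import gym

env = gym.make('CartPole-v0')
print(env.spec.id)              # 'CartPole-v0'
print(gym.spec('CartPole-v0'))  # fetches the registered EnvSpec directly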
Recording and uploading results
Use Monitor() to export your algorithm’s performance as a JSON file:
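A minimal sketch with the 2017-era API, where the Monitor wrapper records results and gym.upload() sends them to the Gym website; the output directory and API key below are placeholders:

import gym
from gym import wrappers

env = gym.make('CartPole-v0')
# Wrap the environment; statistics are written to the given directory as JSON (plus videos).
env = wrappers.Monitor(env, '/tmp/cartpole-experiment-1')
for i_episode in range(20):
    observation = env.reset()
    done = False
    while not done:
        action = env.action_space.sample()
        observation, reward, done, info = env.step(action)
env.close()
# Then upload the recorded results:
# gym.upload('/tmp/cartpole-experiment-1', api_key='YOUR_API_KEY')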
2 basic concepts:
1) the environment: the outside world;
2) the agent: the algorithm you are writing.
– The agent sends actions to the environment, and the environment replies with observations and rewards (a score).
OpenAI
– a non-profit artificial intelligence (AI) research company
– one of its founders is Elon Musk.
– aims to carefully promote and develop friendly AI in such a way as to benefit humanity as a whole.
– The founders are motivated in part by concerns about existential risk from artificial general intelligence.
OpenAI Gym
– a toolkit for developing and comparing reinforcement learning (RL) algorithms.
– consists of a growing suite of environments (from simulated robots to Atari games), and a site for comparing and reproducing results.
– compatible with algorithms written in any framework, such as TensorFlow and Theano.
– As of May 2017, “OpenAI Gym” can only be used through Python, but more languages are coming soon.
OpenAI Universe
– a software platform for measuring and training an AI’s general intelligence across the world’s supply of games, websites and other applications.
– allows an AI agent to use a computer like a human does: by looking at screen pixels and operating a virtual keyboard and mouse.
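As a usage sketch, adapted from the Universe README; it assumes the universe package and Docker are installed, and flashgames.DuskDrive-v0 is one of its environments:

import gym
import universe  # importing universe registers its environments with gym

env = gym.make('flashgames.DuskDrive-v0')
env.configure(remotes=1)     # creates a local Docker container running the game
observation_n = env.reset()  # Universe envs are vectorized: one observation per remote
while True:
    # Press the up-arrow key in every remote, exactly as a human keyboard event
    action_n = [[('KeyEvent', 'ArrowUp', True)] for ob in observation_n]
    observation_n, reward_n, done_n, info = env.step(action_n)
    env.render()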