OpenAI Gym Tutorial

From a tutorial:
https://gym.openai.com/docs

Observations

The environment’s step function returns four values:

observation (object):
– an environment-specific object representing your observation of the environment.
e.g.
– pixel data from a camera
– joint angles and joint velocities of a robot
– the board state in a board game.

reward (float):
– amount of reward achieved by the previous action.
– The scale varies between environments, but the goal is always to increase your total reward.

done (boolean):
– whether it’s time to reset the environment again.
– Most (but not all) tasks are divided up into well-defined episodes, and done being True indicates the episode has terminated.
e.g.
– perhaps the pole tipped too far, or you lost your last life.

info (dict):
– diagnostic information useful for debugging.
– It can sometimes be useful for learning.
e.g. it might contain the raw probabilities behind the environment’s last state change.
– However, official evaluations of your agent are not allowed to use this for learning.

=> agent-environment loop
Each timestep,
– the agent chooses an action
– the environment returns an observation and a reward.

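A minimal version of this loop, close to the docs' random-agent example (CartPole-v0, old four-value step API):

    import gym

    env = gym.make('CartPole-v0')
    for i_episode in range(20):
        observation = env.reset()              # start a new episode
        for t in range(100):
            env.render()
            action = env.action_space.sample() # agent: sample a random action
            # environment: the four values described under Observations
            observation, reward, done, info = env.step(action)
            if done:
                print("Episode finished after {} timesteps".format(t + 1))
                break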

Spaces

The Discrete space
– allows a fixed range of non-negative numbers,
– e.g. in CartPole, valid actions are either 0 or 1.

The Box space
– represents an n-dimensional box,
– e.g. in CartPole, valid observations are arrays of 4 real numbers.

Box and Discrete: the most common Spaces.
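A quick sketch for inspecting these spaces (assuming CartPole-v0, whose action space is Discrete(2) and whose observation space is a 4-dimensional Box):

    import gym
    from gym import spaces

    env = gym.make('CartPole-v0')
    print(env.action_space)       # Discrete(2): valid actions are 0 or 1
    print(env.observation_space)  # Box(4,): arrays of 4 floats

    # Spaces can be sampled and membership-checked:
    space = spaces.Discrete(8)    # the set {0, 1, ..., 7}
    x = space.sample()
    assert space.contains(x)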

Environments

EnvSpec(environment ID)
– defines parameters for a particular task, including the number of trials to run and the maximum number of steps.

e.g.
EnvSpec(Hopper-v1)
– defines an environment where the goal is to get a 2D simulated robot to hop;
EnvSpec(Go9x9-v0)
– defines a Go game on a 9×9 board.
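To list every registered environment together with its EnvSpec, the docs use the registry:

    from gym import envs
    print(envs.registry.all())    # EnvSpec objects for all registered environments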

Recording and uploading results

Use Monitor() to export your algorithm’s performance as a JSON file:
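A sketch of the Monitor wrapper (the output directory /tmp/cartpole-experiment-1 is just an example path):

    import gym
    from gym import wrappers

    env = gym.make('CartPole-v0')
    # Write JSON statistics (and videos, where supported) to the given directory
    env = wrappers.Monitor(env, '/tmp/cartpole-experiment-1')
    for i_episode in range(20):
        observation = env.reset()
        done = False
        while not done:
            action = env.action_space.sample()
            observation, reward, done, info = env.step(action)
    env.close()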

To upload your results to OpenAI Gym:
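With the tutorial-era API this was a single call (the upload service has since been retired; 'YOUR_API_KEY' is a placeholder for the key from your OpenAI Gym account):

    import gym
    gym.upload('/tmp/cartpole-experiment-1', api_key='YOUR_API_KEY')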
