technical-knockout.com

Machine Learning A-Z: Part 2 – Regression (Polynomial Regression)

Polynomial Regression

Python
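Here is a minimal sketch of polynomial regression. It assumes ordinary least squares on the expanded features 1, x, x^2, ..., x^d, solved through the normal equations, and uses only the standard library. The helper names (`polynomial_features`, `fit_polynomial`, `predict`) are illustrative and do not come from the original post. In practice, scikit-learn's `PolynomialFeatures` combined with `LinearRegression` covers the same ground.

```python
# Minimal polynomial regression sketch (standard library only).
# Assumption: ordinary least squares on [1, x, x^2, ..., x^degree],
# solved via the normal equations (X^T X) b = X^T y.

def polynomial_features(xs, degree):
    """Expand each scalar x into [1, x, x^2, ..., x^degree]."""
    return [[x ** p for p in range(degree + 1)] for x in xs]


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Augmented copy so the caller's lists are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_polynomial(xs, ys, degree):
    """Return coefficients [b0, b1, ..., b_degree] minimizing squared error."""
    X = polynomial_features(xs, degree)
    k = degree + 1
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(X, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coeffs, x):
    """Evaluate b0 + b1*x + ... + b_d*x^d."""
    return sum(b * x ** p for p, b in enumerate(coeffs))


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5, 6]
    ys = [2 + 0.5 * x + 3 * x ** 2 for x in xs]  # noise-free quadratic (illustrative)
    coeffs = fit_polynomial(xs, ys, degree=2)
    print([round(b, 6) for b in coeffs])
    print(round(predict(coeffs, 6.5), 6))
```

Solving the normal equations directly works for small, well-conditioned problems like this one. For high degrees or large feature values, a QR- or SVD-based solver is numerically safer.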

Reset console.

IPython console | Restart kernel

Show summary.

Templates

Python

Machine Learning A-Z: Part 2 – Regression (Multiple Linear Regression)

Dummy Variable Trap

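The dummy variables created from one categorical variable are perfectly collinear. With two categories:
D₂ = 1 – D₁

Therefore, never include all of a categorical variable's dummy variables in the model at the same time: always omit one. The omitted category becomes the baseline captured by the constant term.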
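To illustrate the rule, here is a hypothetical helper that one-hot encodes a categorical column. By default it drops the first category, so the remaining dummies are no longer perfectly collinear with the constant term. The function name and the sample state names are illustrative assumptions.

```python
# Sketch: one-hot encode a categorical column while avoiding the dummy variable trap.
# Assumption: the alphabetically first category is the omitted baseline.

def encode_dummies(values, drop_first=True):
    """Return (column_names, rows) of 0/1 dummy variables for a categorical list."""
    categories = sorted(set(values))
    # With all k dummies, D_1 + ... + D_k = 1 for every row, which duplicates the
    # intercept column. Dropping one category removes that exact collinearity.
    kept = categories[1:] if drop_first else categories
    rows = [[1 if v == c else 0 for c in kept] for v in values]
    return kept, rows


states = ["New York", "California", "New York", "Florida"]  # illustrative data
names, rows = encode_dummies(states)
print(names)  # "California" (alphabetically first) is the omitted baseline
for r in rows:
    print(r)
```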

Building a model


1. All-in
=> 2. Backward Elimination
3. Forward Selection
4. Bidirectional Elimination
5. Score Comparison
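Backward elimination can be sketched as the loop below, assuming a significance level (SL) of 0.05. `fit_pvalues` is a hypothetical callable that refits the model on the remaining features and returns each feature's p-value. With statsmodels, for instance, the `pvalues` attribute of an `OLS(...).fit()` result could back it. The toy p-values in the usage lines are made up, and they stay fixed between refits, which a real model fit would not guarantee.

```python
# Sketch of backward elimination driven by p-values.
# Assumption: `fit_pvalues(features)` (hypothetical) fits the model on `features`
# and returns a dict {feature: p_value}.

def backward_elimination(features, fit_pvalues, significance_level=0.05):
    """Drop the least significant feature until every p-value is <= SL."""
    remaining = list(features)
    while remaining:
        pvalues = fit_pvalues(remaining)                  # refit on the current set
        worst = max(remaining, key=lambda f: pvalues[f])  # highest p-value
        if pvalues[worst] <= significance_level:
            break                                         # all remaining are significant
        remaining.remove(worst)                           # remove it, then refit
    return remaining


# Toy stand-in for a real fit: fixed, made-up p-values.
toy_pvalues = {"x1": 0.001, "x2": 0.06, "x3": 0.6, "x4": 0.9}
print(backward_elimination(["x1", "x2", "x3", "x4"], lambda feats: toy_pvalues))
```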

Akaike information criterion (AIC)

– a measure of the relative quality of statistical models for a given set of data.
– Given a collection of models for the data, AIC estimates the quality of each model relative to each of the other models.
– Hence, it provides a means for model selection: the model with the lowest AIC is preferred.
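AIC is defined as 2k - 2 ln(L̂), where k is the number of estimated parameters and L̂ is the maximized likelihood. The sketch below assumes a least-squares model with i.i.d. Gaussian errors. Under that assumption, the maximized log-likelihood depends only on the residual sum of squares (RSS). The function names and the RSS values in the usage lines are hypothetical.

```python
import math


def aic(log_likelihood, k):
    """AIC = 2k - 2 ln(L_hat), with k the number of estimated parameters."""
    return 2 * k - 2 * log_likelihood


def gaussian_ols_aic(rss, n, k):
    """AIC of a least-squares fit, assuming i.i.d. Gaussian errors.

    Uses the maximized Gaussian log-likelihood
        ln L_hat = -n/2 * (ln(2*pi) + ln(RSS/n) + 1).
    Whether k also counts the error variance is a convention; keep it
    consistent across the models being compared.
    """
    log_lik = -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)
    return aic(log_lik, k)


# Two hypothetical fits on the same 50 observations:
# model B lowers RSS slightly but spends two extra parameters.
aic_a = gaussian_ols_aic(rss=120.0, n=50, k=3)
aic_b = gaussian_ols_aic(rss=115.0, n=50, k=5)
print("prefer A" if aic_a < aic_b else "prefer B")
```

Only differences in AIC between models fitted to the same data are meaningful. The absolute value depends on constants such as the ln(2π) term.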

Multiple Linear Regression

Python

Clear environment of RStudio.

RStudio Keyboard Shortcuts

Machine Learning A-Z: Part 2 – Regression (Simple Linear Regression)

Show current directory.

Simple Linear Regression

Python
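For a single predictor, the least-squares line has a closed form: slope b1 = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)², intercept b0 = ȳ - b1·x̄. Here is a dependency-free sketch. The function name and the sample data are illustrative.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (intercept b0, slope b1) minimizing the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-style and variance-style sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


print(fit_simple_linear_regression([1, 2, 3, 4, 5], [3, 5, 7, 9, 11]))  # (1.0, 2.0)
```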

Templates

Python

Basic Reinforcement Learning Tutorial 1

Background

Value functions (here, action-value functions defined over state-action pairs) estimate:
– how good a particular action will be in a given state
– what return (accumulated future reward) that action is expected to yield.
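The return is the discounted sum of future rewards, and a value function estimates its expected value. Below is a minimal sketch that computes the return for a single reward sequence, using the discount factor gamma described below. The function name is illustrative.

```python
def discounted_return(rewards, gamma):
    """G = r_0 + gamma*r_1 + gamma^2*r_2 + ... for one sequence of rewards."""
    g = 0.0
    # Accumulate from the end: G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


print(discounted_return([0, 0, 1], gamma=0.9))  # a reward two steps away is discounted twice
```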

Q-Learning
– is an off-policy algorithm for temporal-difference (TD) learning, a method for estimating value functions from sampled experience. Off-policy means it can update the estimated value functions using hypothetical actions, ones that have not actually been tried.
– can be proven, given sufficient training (every state-action pair visited infinitely often and a suitably decaying learning rate), to converge with probability 1 to the optimal action-value function.
– learns the optimal policy even when actions are selected according to a more exploratory or even random policy.
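– can be implemented with the following update rule, applied after each step:

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$$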

where:
s: is the previous state (the state in which action a was taken)
a: is the previous action
r: is the reward observed after taking action a in state s
Q(s, a): is the estimated action-value function (the Q-table), i.e. the expected return for taking action a in state s
s’: is the current state (the state reached after taking a)
alpha: is the learning rate, generally set between 0 and 1. Setting it to 0 means that the Q-values are never updated, so nothing is learned. Setting alpha to a high value such as 0.9 means that learning can occur quickly.
gamma: is the discount factor, also set between 0 and 1. This models the fact that future rewards are worth less than immediate rewards.
max_{a'} Q(s’, a’): is the maximum estimated Q-value attainable in the state following the current one, i.e. the estimated return for taking the optimal action thereafter.
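Putting these definitions together, one Q-learning update, Q(s, a) ← Q(s, a) + alpha · (r + gamma · max_{a'} Q(s', a') - Q(s, a)), can be sketched with a dictionary-backed Q-table. The function name, the `terminal` flag (which treats the value of a terminal next state as 0), and the default alpha/gamma values are assumptions for illustration.

```python
from collections import defaultdict


def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9, terminal=False):
    """Apply one Q-learning update to Q[(s, a)] in place and return the new value."""
    # max over next-state actions; a terminal next state has no future value.
    best_next = 0.0 if terminal else max(Q[(s_next, a2)] for a2 in actions)
    td_target = r + gamma * best_next
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])  # move toward the TD target
    return Q[(s, a)]


Q = defaultdict(float)       # unseen (state, action) pairs start at 0.0
Q[("s1", "right")] = 10.0    # hypothetical existing estimate
print(q_update(Q, "s0", "right", 1.0, "s1", ["left", "right"], alpha=0.5, gamma=0.9))
```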

The algorithm can be interpreted as:

1. Initialize the Q-values table, Q(s, a).
2. Observe the current state, s.
3. Choose an action, a, for that state based on the selection policy.
4. Take the action, and observe the reward, r, as well as the new state, s’.
5. Update the Q-value for the state using the observed reward and the maximum Q-value attainable in the next state.
6. Set the state to the new state, and repeat from step 3 until a terminal state is reached.
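These steps can be sketched end to end on a hypothetical toy problem: a five-state corridor in which moving right from the last non-terminal state ends the episode with reward 1. The environment, the epsilon-greedy selection policy, and all hyperparameters are assumptions; the text itself leaves the selection policy open.

```python
import random
from collections import defaultdict

# Hypothetical toy environment: a corridor with states 0..4, starting at 0.
# "right" moves +1, "left" moves -1 (clamped at 0). Reaching state 4 ends
# the episode with reward 1; every other step gives reward 0.
N_STATES = 5
ACTIONS = ["left", "right"]


def step(state, action):
    nxt = min(state + 1, N_STATES - 1) if action == "right" else max(state - 1, 0)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def choose_action(Q, state, epsilon, rng):
    # Epsilon-greedy selection policy (an assumption).
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(Q[(state, a)] for a in ACTIONS)
    # Break ties randomly so untrained states do not always pick the same action.
    return rng.choice([a for a in ACTIONS if Q[(state, a)] == best])


def q_learning(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)                          # initialize Q(s, a) to 0
    for _ in range(episodes):
        s, done = 0, False                          # observe the starting state
        while not done:
            a = choose_action(Q, s, epsilon, rng)   # choose an action
            s_next, r, done = step(s, a)            # act; observe r and s'
            best_next = 0.0 if done else max(Q[(s_next, b)] for b in ACTIONS)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])  # update
            s = s_next                              # continue until terminal
    return Q


Q = q_learning()
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)}
print(policy)
```

The corridor is deterministic and every episode must pass through each state, so the greedy policy read from the table should settle on moving right everywhere. Q(3, right) approaches 1, Q(2, right) approaches 0.9, and so on, with each earlier state discounted by one more factor of gamma.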