Visual Studio

Shortcuts

close files: CTRL + F4

comment: CTRL + K, CTRL + C
uncomment: CTRL + K, CTRL + U

https://blogs.msdn.microsoft.com/zainnab/2010/04/13/comment-and-uncomment-code/

https://jeremybytes.blogspot.jp/2015/12/visual-studio-shortcuts-comment.html

Extensions

ReSharper
https://www.jetbrains.com/resharper/

Intermediate Language of C#

Names of the IL (Intermediate Language)
– CIL (Common Intermediate Language), the standardized name
– MSIL (Microsoft Intermediate Language), the older Microsoft name for the same language

View the IL of a Unity project in MonoDevelop
1. Open the Edit References dialog
Solution Window | References | Settings
2. Open the dll file with the Assembly Browser
Edit References | .Net Assembly | Browse
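
As a quick cross-check (my own sketch, not part of the MonoDevelop steps above; the Calculator/Increment names are made up for illustration), the raw IL bytes of a method can also be dumped from C# itself via reflection:

using System;
using System.Reflection;

public static class Calculator
{
    // The method whose IL we want to inspect (hypothetical example).
    public static int Increment(int i)
    {
        return i + 1;
    }
}

public static class IlDump
{
    public static void Main()
    {
        MethodInfo method = typeof(Calculator).GetMethod("Increment");
        byte[] il = method.GetMethodBody().GetILAsByteArray();

        // Raw IL opcodes as hex; a disassembler (ildasm, ILSpy, or the
        // Assembly Browser above) turns these bytes into mnemonics such as
        // ldarg.0 / ldc.i4.1 / add / ret.
        Console.WriteLine(BitConverter.ToString(il));
    }
}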

2 .NET Compilers
1) Language compiler (e.g. the C# compiler) => CIL (Common Intermediate Language)
2) JIT (just-in-time) compiler => machine language (optimized for the target CPU)

Demonstration

int i = 456;
i = i + 1;

ldc.i4.1
ldloc.0
add
stloc.0

ldc.i4.1
Load the 4-byte signed integer constant 1 on the evaluation stack.
[Local variable locations] 456
[Evaluation stack] 1

ldloc.0
Load the variable in location 0 on the evaluation stack. The other value (1) on the stack is pushed down.
[Local variable locations] 456
[Evaluation stack] 456, 1

add
Add the top two numbers on the evaluation stack together, and replace them with the result.
[Local variable locations] 456
[Evaluation stack] 457

stloc.0
Remove the value at the top of the evaluation stack, and store it in location 0.
[Local variable locations] 457
[Evaluation stack]

Reference

box: box the top value on the stack
bne: branch if top 2 values on stack are not equal
call: call a static member of a class
callvirt: call an instance member of a class
ldelem/stelem: load and store array elements
newarr: create a new 1-dimensional array
newobj: create a new object on the heap
ret: return from method
throw: throw an exception
unbox: unbox the top reference on the stack

pop: Pop value from the stack.
dup: Duplicate the value on the top of the stack.
stsfld: Pop the value from the top of the stack and store it in the given static field.
ldsfld: Push the value of the given static field onto the stack.
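
The following short C# fragment (my own illustrative example, not from the list above; the exact opcodes emitted can vary with compiler and optimization settings) exercises several of these instructions:

using System.Text;

public static class OpcodeDemo
{
    public static string Run()
    {
        object boxed = 42;                            // box
        int unboxed = (int)boxed;                     // unbox.any
        int[] values = new int[3];                    // newarr
        values[0] = unboxed;                          // stelem.i4
        int first = values[0];                        // ldelem.i4
        StringBuilder builder = new StringBuilder();  // newobj
        builder.Append(first);                        // callvirt (instance member call)
        return builder.ToString();                    // callvirt, then ret
    }
}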

List of CIL instructions
https://en.wikipedia.org/wiki/List_of_CIL_instructions

AWS: Initial Setup

Access from Windows with PuTTY

Host Name
Amazon Linux: ec2-user@[Public DNS]
Ubuntu: ubuntu@[Public DNS]

http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/putty.html

Environment Variables for Jupyter Notebook on Ubuntu

https://medium.com/@josemarcialportilla/getting-spark-python-and-jupyter-notebook-running-on-amazon-ec2-dec599e1c297

Connecting to Your Linux Instance Using SSH

How can I manage the directory structure of an Amazon EC2 Ubuntu Linux instance from the command shell?

Recover Lost Key Pair of AWS EC2 Linux Instance

AWS: How to create a default VPC

How to create a default VPC:
> Create a VPC with a size /16 IPv4 CIDR block.
> Create a default subnet in each Availability Zone.
> Create an Internet gateway and connect it to your default VPC.
> Create a main route table for your default VPC with a rule that sends all IPv4 traffic destined for the Internet to the Internet gateway.
> Create a default security group and associate it with your default VPC.
> Create a default network access control list (ACL) and associate it with your default VPC.
> Associate the default DHCP options set for your AWS account with your default VPC.

Default VPC – http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/default-vpc.html
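
The same CreateDefaultVpc API action can also be called programmatically. Below is a minimal sketch with the AWS SDK for .NET; the client and request types (AmazonEC2Client, CreateDefaultVpcRequest) are assumed from the generated EC2 client surface, and credentials/region come from the default configuration chain:

using System;
using System.Threading.Tasks;
using Amazon.EC2;
using Amazon.EC2.Model;

public static class DefaultVpcSetup
{
    public static async Task Main()
    {
        using (var ec2 = new AmazonEC2Client())
        {
            // CreateDefaultVpc performs the steps listed above (subnets,
            // internet gateway, route table, security group, network ACL,
            // DHCP options) in a single call.
            CreateDefaultVpcResponse response =
                await ec2.CreateDefaultVpcAsync(new CreateDefaultVpcRequest());
            Console.WriteLine("Created default VPC: " + response.Vpc.VpcId);
        }
    }
}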

Basic Reinforcement Learning Tutorial 1

Background

Value Functions (state-action pair functions) estimate:
– how good a particular action will be in a given state
– what the return for that action is expected to be.

Q-Learning
– an off-policy algorithm (it can update the estimated value functions using hypothetical actions, i.e. actions that have not actually been tried) for temporal difference learning (a method to estimate value functions).
– can be proven, given sufficient training, to converge with probability 1 to a close approximation of the action-value function for an arbitrary target policy.
– learns the optimal policy even when actions are selected according to a more exploratory or even random policy.
– can be implemented as follows:
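Q(s, a) ← Q(s, a) + alpha * [r + gamma * max_a' Q(s', a') - Q(s, a)]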

where:
s: is the previous state
a: is the previous action
Q(s, a): is the estimated action value (the Q-value table being learned)
s': is the new (current) state reached after taking action a
r: is the reward observed for taking action a in state s
alpha: is the learning rate, generally set between 0 and 1. Setting it to 0 means that the Q-values are never updated, so nothing is learned; a high value such as 0.9 means that learning can occur quickly.
gamma: is the discount factor, also set between 0 and 1. It models the fact that future rewards are worth less than immediate rewards.
max: is the maximum reward that is attainable in the state following the current one (the reward for taking the optimal action thereafter), i.e. the max_a' Q(s', a') term.
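
As a purely illustrative numeric example (values chosen arbitrarily): with alpha = 0.5, gamma = 0.9, a current estimate Q(s, a) = 0, an observed reward r = 1, and max_a' Q(s', a') = 2, the update gives Q(s, a) ← 0 + 0.5 * (1 + 0.9 * 2 - 0) = 1.4.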

The algorithm can be interpreted as:

1. Initialize the Q-values table, Q(s, a).
2. Observe the current state, s.
3. Choose an action, a, for that state based on the selection policy.
4. Take the action, and observe the reward, r, as well as the new state, s'.
5. Update the Q-value for the state using the observed reward and the maximum reward possible for the next state.
6. Set the state to the new state, and repeat the process until a terminal state is reached.
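
A minimal sketch of this loop in C# (my own illustration, not code from the tutorial; the state/action counts and the epsilon-greedy selection policy are assumptions):

using System;

public class QLearningAgent
{
    private readonly double[,] q;      // Q-value table, q[state, action], initialized to 0
    private readonly double alpha;     // learning rate
    private readonly double gamma;     // discount factor
    private readonly double epsilon;   // exploration rate for the epsilon-greedy selection policy
    private readonly Random random = new Random();

    public QLearningAgent(int stateCount, int actionCount,
                          double alpha = 0.5, double gamma = 0.9, double epsilon = 0.1)
    {
        q = new double[stateCount, actionCount];
        this.alpha = alpha;
        this.gamma = gamma;
        this.epsilon = epsilon;
    }

    // Selection policy: mostly greedy, occasionally random (epsilon-greedy).
    public int ChooseAction(int state)
    {
        if (random.NextDouble() < epsilon)
            return random.Next(q.GetLength(1));
        return ArgMax(state);
    }

    // Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    public void Update(int state, int action, double reward, int newState)
    {
        double best = q[newState, ArgMax(newState)];
        q[state, action] += alpha * (reward + gamma * best - q[state, action]);
    }

    private int ArgMax(int state)
    {
        int bestAction = 0;
        for (int a = 1; a < q.GetLength(1); a++)
            if (q[state, a] > q[state, bestAction]) bestAction = a;
        return bestAction;
    }
}

The caller loops exactly as in the steps above: observe s, call ChooseAction, take the action in the environment, observe r and s', call Update, then set s = s' until a terminal state is reached.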
