Reinforcement Learning is an important Machine Learning paradigm that differs from the more common Supervised and Unsupervised Learning. In Supervised Learning, you have labels for training; in Unsupervised Learning, there is no labeled data. Reinforcement Learning falls in between the two: it has no labels, but it learns from the feedback it receives, which plays a role similar to a label.

In Reinforcement Learning (RL), there are some key concepts. Let us introduce them through the “Hello World” example of RL. Imagine you have a robot that does not know how to navigate through a maze. We call this robot an agent. The maze is divided into grid cells.

![](https://pxt.azureedge.net/blob/0b9ace70b7cc48419a39c23253970318d7f191f0/static/lessons/maze-ai.png)
Source: <https://minecraft.makecode.com/lessons/maze-ai-part1>

At any point in time, your robot occupies one of the grid cells. We call these cells “states”. The robot takes an “action”, which moves it from one cell (state) to another. This movement is called a “transition”. Through each transition, the robot receives a signal of how close it is getting to the goal. This is what we call the “reward”. The objective of an RL agent is to accomplish the task so that its total reward is maximized.
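To make these terms concrete, here is a minimal sketch of the maze as a grid world in Python. The 4x4 layout, the action set, and the reward values are illustrative assumptions, not details taken from the image above.

```python
GRID_SIZE = 4                          # a hypothetical 4x4 maze
GOAL = (3, 3)                          # the exit of the maze

# Actions and the (row, col) change each one causes.
ACTIONS = {
    "up":    (-1, 0),
    "down":  (1, 0),
    "left":  (0, -1),
    "right": (0, 1),
}

def step(state, action):
    """Apply one action: return the next state (the transition) and a reward."""
    row, col = state
    d_row, d_col = ACTIONS[action]
    next_state = (row + d_row, col + d_col)

    # Bumping into the maze boundary keeps the robot where it was.
    if not (0 <= next_state[0] < GRID_SIZE and 0 <= next_state[1] < GRID_SIZE):
        next_state = state

    # Reward: +10 for reaching the goal, -1 per move to encourage short paths.
    reward = 10 if next_state == GOAL else -1
    return next_state, reward

state = (0, 0)                         # the agent starts in the top-left cell
state, reward = step(state, "right")   # one action -> one transition
print(state, reward)                   # (0, 1) -1
```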

Since your robot does not know how to navigate through the maze and reach the goal (getting out of the maze), it needs to learn the best “policy”: a rule that tells it which action to take in every state.
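Here is a small continuation of the sketch above (it reuses the hypothetical `step`, `GRID_SIZE`, and `GOAL`) showing that a policy is just a mapping from state to action. The hand-written policy is an illustrative assumption; an RL algorithm would learn it rather than have us write it by hand.

```python
# A policy maps every state to an action. This hand-written policy
# ("go right until the last column, then go down") is an illustrative
# assumption; an RL algorithm would learn it instead.
policy = {
    (row, col): "right" if col < GRID_SIZE - 1 else "down"
    for row in range(GRID_SIZE)
    for col in range(GRID_SIZE)
}

# Follow the policy from the start cell and add up the rewards.
state, total_reward = (0, 0), 0
while state != GOAL:
    state, reward = step(state, policy[state])
    total_reward += reward
print(total_reward)                    # 5: five -1 moves, then +10 at the goal
```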

That’s all for now. We covered the key terms in RL:

  1. Agent
  2. State
  3. Action
  4. Transition
  5. Reward
  6. Policy

A problem with this structure can be modeled as a Markov Decision Process (MDP). There are awesome RL algorithms that help us find the best policy, i.e., the best action to take at every state, to help us plan! So, next, check out the follow-up post in this Reinforcement Learning series, where you get introduced to the famous Bellman Equation, Policy Iteration, Value Iteration, and the model-free Q-Learning algorithm!
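To see how the maze fits the MDP mold, here is one final sketch, again reusing the hypothetical grid world above. An MDP is the tuple (S, A, P, R, gamma); the discount factor gamma = 0.9 is an illustrative assumption.

```python
# An MDP is the tuple (S, A, P, R, gamma). The discount factor
# gamma = 0.9 below is an illustrative assumption.
S = [(r, c) for r in range(GRID_SIZE) for c in range(GRID_SIZE)]  # states
A = list(ACTIONS)                                                 # actions
gamma = 0.9                                                       # discount factor

# This maze is deterministic, so P and R collapse into one table mapping
# each (state, action) pair to its single next state and reward.
P = {(s, a): step(s, a) for s in S for a in A}

next_state, reward = P[(0, 0), "right"]
print(next_state, reward)              # (0, 1) -1
```

Algorithms like Value Iteration take exactly these ingredients as input to compute the optimal policy, which is where the follow-up post picks up.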