Iyfjky

Question

I am not sure of what the Agent in Reinforcement Learning exactly is. I think it is not the Neural Net behind?

So what is the Agent?
What does the Agent in Reinforcement Learning exactly do?

Douglas Daseeco 3,046331 · Accepted Answer · 2018-10-17 11:37:36Z

To understand why reinforcement learning needs an agent, it is best to break the concept of learning down into its components.

Input

Output

Criteria for evaluation of the mapping from input to output

These criteria are used to determine two things.

What is deemed improvement

What is considered sufficient at any given moment of evaluation

Let's consider some typical kinds of learning using the above simple framework.

Supervised MLP (multilayer perceptron) learning - Criteria: A loss function that compares the labels (presumed to be ideal) with the output and determines the correction signal that back-propagates, together with the reliability and accuracy expected as a result of learning (convergence)

A child learning to sound out words - Criteria: The feedback of the listener in terms of facial affect, body language, tone of voice, and words spoken, as determined within the language framework of the child and the teacher or parent

Unsupervised feature extraction - Criteria: The narrowness (proportional reduction of the number of bits required to represent the input) and the reliability and accuracy with which the input can be reconstructed from its features.

Notice that these three examples have criteria that are essentially multivariate boundary conditions:

$\mathbb{R}^3$: $(f_{\text{loss}}(\ldots), 1 - \delta, 1 - \epsilon)$ - Loss function and the typical expressions of the reliability and accuracy resulting from training

$\mathbb{R}^3$: Whether the word sounds right to the listener, whether the listener detects improvement they wish to acknowledge to reinforce what is perceived as a systemic improvement in visual typesetting to phonic translation¹, or whether the child is showing aversion toward the task

$\mathbb{R}^3$: $\left(\dfrac{b_f}{b_{in}} \le \eta, 1 - \delta, 1 - \epsilon\right)$ - The fraction is the ratio of bits representing the features in relation to the input that generated them and is bounded by $\eta$

Now let's consider a fourth example.

Q-learning (a reinforcement strategy originating from C. Watkins's 1989 PhD thesis) - Criteria: A function called the Q function that evaluates behavioral wellness on the basis of a sequence of states between which actions are expected to cause the transition from the prior state to the subsequent state.

Here is the multivariate boundary condition for Q-learning.

$\mathbb{R}^3$: $(Q(\ldots), 1 - \delta, 1 - \epsilon)$ - Q function and the typical expressions of the reliability and accuracy resulting from training
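For reference, the update rule usually associated with tabular Q-learning can be written as below. The symbols $\alpha$ (learning rate), $\gamma$ (discount factor), $s_t$, $a_t$ and $r_{t+1}$ are common textbook notation and do not appear in this answer, so treat this as a sketch of the usual form rather than the exact formulation intended here.

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]$$

The bracketed term is the difference between the newly observed estimate of the return and the current estimate; learning is usually considered sufficient once this difference stays small.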

Strictly speaking, the Q function evaluation can be encapsulated, in which case that system component is technically an agent.

Real-time learning networks may also receive a signal that indicates functional desirability (a reward signal, to use Pavlovian behavioral psychology terminology) at any given point in time. The evaluation is external to the network from a design perspective. Because the criteria are decoupled from the network, the learning system is now configurable and extensible in ways that an integrated system is not.

In this case, the external component, a black box that simply supplies a wellness signal, is an agent of reinforcement (or dissuasion if it has a pain-like signal too).
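A rough Python sketch may make this decoupling concrete. Everything in it is hypothetical and chosen only for illustration: the `Learner` class, the `train` helper, and the two evaluator functions are not taken from any library. The point is that the same learner ends up with different behavior when only the external wellness signal is swapped.

```python
class Learner:
    """Hypothetical toy learner: keeps one preference value per action."""

    def __init__(self, actions, step_size=0.5):
        self.preferences = {a: 0.0 for a in actions}
        self.step_size = step_size

    def act(self):
        # Choose the action with the highest learned preference.
        return max(self.preferences, key=self.preferences.get)

    def reinforce(self, action, signal):
        # Move the preference for this action toward the external signal.
        error = signal - self.preferences[action]
        self.preferences[action] += self.step_size * error


def train(learner, evaluate, rounds=20):
    """Try every action repeatedly; `evaluate` is the external black box."""
    for _ in range(rounds):
        for action in learner.preferences:
            learner.reinforce(action, evaluate(action))
    return learner.act()


# Two interchangeable evaluators (pleasure = +1.0, pain = -1.0).
def rewards_right(action):
    return 1.0 if action == "right" else -1.0


def rewards_left(action):
    return 1.0 if action == "left" else -1.0


chosen_a = train(Learner(["left", "right"]), rewards_right)
chosen_b = train(Learner(["left", "right"]), rewards_left)
```

The learner code never changes between the two runs; only the evaluator passed to `train` does, which is the configurability the paragraph above refers to.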

As AI architectures mature, the oversight function is more often decoupled from the network being overseen. There are several linguistic formulations of this causal relationship.

a. Controller ⇒ agent

b. Agent ⇒ network

c. Master ⇒ slave

Notice that when the word Controller is used, the agent IS the network and the criteria are evaluated by the controller. However, in machine learning the term Agent is generally used to indicate the evaluation component, and the network is the slave to the Agent's evaluation.

It is in this latter sense that one sees the term used in game theory (based on the work of Euler, Kirchhoff, Poincaré, Markov, Morgenstern, and von Neumann), where the Agent is the automaton that chooses moves based on a strategy to achieve maximum reward.

In robotics, the evaluation may include both pain and pleasure, as in the nervous systems of arthropods, vertebrates, and other bilaterally symmetric animals. In this case, if the (a) causality language above is used, the agent may be the robot itself, controlled by the AI controller. If the (b) causality language is used, the agent would be as in the game theory paragraph above.

Terms in AI can have more than one meaning, since AI is largely interdisciplinary, and there's no standards body to publish a canonical glossary of AI terminology. Over time, what was once jargon will become the academic standard. For instance, no one in electrical engineering quibbles over the meaning of voltage, but centuries ago they did.

The sad truth is that academic authors started using (b) without thinking of the normal use of the word Agent in English, to which (a) more closely conforms.

score 2 · Answer 2 · 2018-10-17 11:18:53Z

The agent in RL is the component that makes the decision of what action to take.

In order to make that decision, the agent is allowed to use any observation from the environment, and any internal rules that it has. Those internal rules can be anything, but typically in RL, it expects the current state to be provided by the environment, for that state to have the Markov property, and then it processes that state using a policy function $\pi(a|s)$ that decides what action to take.

In addition, in RL we usually care about handling a reward signal (received from the environment) and optimising the agent towards maximising the expected reward in future. To do this, the agent will maintain some data which is influenced by the rewards it received in the past, and use that to construct a better policy.
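As a minimal sketch of the description above, here is a tabular Q-learning agent interacting with a toy environment. The `CorridorEnv` class, the `QAgent` class, the `run` helper and all parameter values (`alpha`, `gamma`, `epsilon`, the corridor length) are assumptions made up for this illustration and come from no particular library. The agent's "data influenced by past rewards" is the table `self.q`, and its policy is an epsilon-greedy choice over that table.

```python
import random
from collections import defaultdict


class CorridorEnv:
    """Hypothetical environment: positions 0..4, reward 1.0 at position 4."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 (left) or +1 (right); the walls clamp the position.
        self.state = min(max(self.state + action, 0), self.length - 1)
        done = self.state == self.length - 1
        return self.state, (1.0 if done else 0.0), done


class QAgent:
    """The decision-making component: observes a state, returns an action."""

    def __init__(self, actions, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(float)  # data shaped by past rewards
        self.rng = random.Random(seed)

    def greedy(self, state):
        # Best-valued action, breaking ties at random.
        best = max(self.q[(state, a)] for a in self.actions)
        return self.rng.choice(
            [a for a in self.actions if self.q[(state, a)] == best]
        )

    def act(self, state):
        # Epsilon-greedy policy: mostly exploit, sometimes explore.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return self.greedy(state)

    def learn(self, state, action, reward, next_state, done):
        # Q-learning update toward reward plus discounted best next value.
        best_next = 0.0 if done else max(
            self.q[(next_state, a)] for a in self.actions
        )
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


def run(episodes=200):
    env, agent = CorridorEnv(), QAgent(actions=(-1, +1))
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            action = agent.act(state)
            next_state, reward, done = env.step(action)
            agent.learn(state, action, reward, next_state, done)
            state = next_state
    return agent
```

Note that the environment only supplies states and rewards, while every decision and every update happens inside `QAgent`. A neural network could stand in for the `self.q` table without changing that division of roles.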

One interesting thing about the definition of an agent is that the agent/environment boundary is usually considered to be very close to the abstract decision-making unit. For instance, for a robot, the agent is typically not the whole robot, but the specific program running on the robot's CPU that makes the decision on the action. All the relays/motors and other parts of the physical body of the robot are parts of the environment in RL terms. Loose language is often used here, though, as the distinction might not matter in most descriptions - we would say that "the robot moves its arm to achieve the goal" when in stricter RL terms we should say that "the agent running on the robot CPU instructs the arm motors to move to achieve the goal".

I think it is not the Neural Net behind?

That is correct, the agent is more than the neural network. One or more neural networks might be part of an agent, and take the role of estimating the value of a state, or state/action pair, or even directly driving the policy function.

Manuel Rodriguez 1,092119 · Answer 3 · 2018-10-17 09:26:14Z

The agent replaces the human operator. It controls the game in place of the human. The colloquial term for such software is Aimbot. In the context of Reinforcement Learning, the inner workings of an aimbot are not implemented with scripting languages like AutoIt, but with machine learning algorithms such as Q-learning and neural networks.
