What does the Agent in Reinforcement Learning exactly do?

I am not sure of what the Agent in Reinforcement Learning exactly is. I think it is not the Neural Net behind?
So what is the Agent?
What does the Agent in Reinforcement Learning exactly do?
reinforcement-learning intelligent-agent
asked 6 hours ago
TVSuchty
3 Answers
1 vote, accepted
To understand why reinforcement learning needs an agent, it is best to break learning down into its basic concepts.
- Input
- Output
- Criteria for evaluation of the mapping from input to output
These criteria are used to determine two things.
- What is deemed improvement
- What is considered sufficient at any given moment of evaluation
Let's consider some typical kinds of learning using the above simple framework.
- Supervised MLP (multilayer perceptron) learning. Criteria: a loss function that compares the labels (presumed to be ideal) with the output and so determines the correction signal that is back-propagated, together with the reliability and accuracy expected as a result of learning (convergence).
- A child learning to sound out words. Criteria: the feedback of the listener in terms of facial affect, body language, tone of voice, and words spoken, as determined within the language framework shared by the child and the teacher or parent.
- Unsupervised feature extraction. Criteria: the narrowness (proportional reduction of the number of bits required to represent the input) and the reliability and accuracy with which the input can be reconstructed from its features.
Notice that these three examples have criteria that are essentially multivariate boundary conditions:
$\mathbb{R}^3$: $(f_{\text{loss}}(\ldots),\ 1 - \delta,\ 1 - \epsilon)$, i.e. the loss function and the typical expressions of the reliability and accuracy resulting from training
$\mathbb{R}^3$: Whether the word sounds right to the listener, whether the listener detects an improvement they wish to acknowledge, so as to reinforce what they perceive as a systematic improvement in translating printed text into sounds, and whether the child is showing aversion toward the task
$\mathbb{R}^3$: $\left(\dfrac{b_f}{b_{in}} \le \eta,\ 1 - \delta,\ 1 - \epsilon\right)$, where the fraction is the ratio of the bits representing the features to the bits of the input that generated them, and is bounded by $\eta$
Now let's consider a fourth example.
- Q-learning (a reinforcement learning strategy originating from C. Watkins' 1989 PhD thesis). Criteria: a function called the Q function, which evaluates behavioral wellness on the basis of a sequence of states, between which actions are expected to cause the transition from the prior state to the subsequent state.
Here is the multivariate boundary condition for Q-learning.
$\mathbb{R}^3$: $(Q(\ldots),\ 1 - \delta,\ 1 - \epsilon)$, i.e. the Q function and the typical expressions of the reliability and accuracy resulting from training
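To make the role of the Q function concrete, here is a minimal sketch of tabular Q-learning in Python. It is only an illustration under stated assumptions: the corridor environment, the `corridor_step` function, and the hyperparameter values are hypothetical and not part of the answer above.

```python
import random
from collections import defaultdict

def q_learning(step, start_state, actions, episodes=500,
               alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning. `step(state, action)` returns (next_state, reward, done)."""
    rng = random.Random(seed)
    q = defaultdict(float)  # maps (state, action) to the estimated return
    for _ in range(episodes):
        state, done = start_state, False
        while not done:
            if rng.random() < epsilon:
                # Explore: pick a random action.
                action = rng.choice(actions)
            else:
                # Exploit: pick the best-valued action, breaking ties randomly.
                action = max(actions, key=lambda a: (q[(state, a)], rng.random()))
            next_state, reward, done = step(state, action)
            # Bootstrap from the best action in the next state (zero if terminal).
            best_next = 0.0 if done else max(q[(next_state, a)] for a in actions)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q

# Hypothetical toy environment: a corridor of cells 0..4, reward 1.0 on reaching cell 4.
def corridor_step(state, action):
    next_state = min(4, max(0, state + action))
    done = next_state == 4
    return next_state, (1.0 if done else 0.0), done

q_table = q_learning(corridor_step, start_state=0, actions=[-1, 1])
```

After training, `q_table[(3, 1)]` (stepping right from the cell next to the goal) should be larger than `q_table[(3, -1)]`, which is the sense in which the Q function scores actions in states.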
Strictly speaking, the Q function evaluation can be encapsulated, in which case that system component is technically an agent.
Real-time learning networks may also receive a signal that indicates functional desirability (a reward signal, to use the terminology of Pavlovian behavioral psychology) at any given point in time. The evaluation is external to the network from a design perspective. Because the criteria are decoupled from the network, the learning system is now configurable and extensible in ways that an integrated system is not.
In this case, the external component, a black box that simply supplies a wellness signal, is an agent of reinforcement (or dissuasion if it has a pain-like signal too).
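The decoupling described here can be sketched roughly as follows. This is an illustrative sketch only: the `Learner` class and the two reward sources (`likes_left`, `likes_right`) are hypothetical names, and the update rule is a simple running average chosen for brevity.

```python
from typing import Callable, Dict, Sequence

class Learner:
    """Hypothetical learner that knows nothing about how its feedback is computed."""

    def __init__(self, actions: Sequence[str],
                 reward_source: Callable[[str], float], lr: float = 0.5):
        self.reward_source = reward_source  # external black box supplying the signal
        self.lr = lr
        self.preference: Dict[str, float] = {a: 0.0 for a in actions}

    def try_action(self, action: str) -> None:
        # Move the stored preference toward the externally supplied signal.
        signal = self.reward_source(action)
        self.preference[action] += self.lr * (signal - self.preference[action])

    def best_action(self) -> str:
        return max(self.preference, key=self.preference.get)

# Two interchangeable evaluation components (reward or pain-like signal).
def likes_left(action: str) -> float:
    return 1.0 if action == "left" else -1.0

def likes_right(action: str) -> float:
    return 1.0 if action == "right" else -1.0

def train(reward_source: Callable[[str], float]) -> str:
    learner = Learner(["left", "right"], reward_source)
    for action in ["left", "right"] * 5:
        learner.try_action(action)
    return learner.best_action()
```

Swapping `likes_left` for `likes_right` changes what is learned without touching `Learner`, which is the configurability the paragraph above refers to.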
As AI architectures mature, the oversight function is more often decoupled from the network being overseen. There are several linguistic formulations of this causal relationship.
a. Controller → agent
b. Agent → network
c. Master → slave
Notice that when the word Controller is used, the agent IS the network and the criteria are evaluated by the controller. However, in machine learning the term Agent is generally used to indicate the evaluation component, and the network is the slave to the Agent's evaluation.
It is in this latter sense that one sees the term used in game theory (based on the work of Euler, Kirchhoff, Poincaré, Markov, Morgenstern, and von Neumann), where the Agent is the automaton that chooses moves based on a strategy to achieve maximum reward.
In robotics, the evaluation may include both pain and pleasure, as in the nervous systems of arthropods, vertebrates, and other bilaterally symmetric animals. In this case the agent may be the robot itself, controlled by the AI controller, if the (a) causality language above is used. If the (b) causality language is used, the agent would be as in the game theory paragraph above.
Terms in AI can have more than one meaning, since AI is largely interdisciplinary, and there's no standards body to publish a canonical glossary of AI terminology. Over time, what was once jargon will become the academic standard. For instance, no one in electrical engineering quibbles over the meaning of voltage, but centuries ago they did.
The sad truth is that academic authors started using (b) without thinking of the normal use of the word Agent in English, to which (a) more closely conforms.
2 votes
The agent in RL is the component that makes the decision of what action to take.
In order to make that decision, the agent is allowed to use any observation from the environment, and any internal rules that it has. Those internal rules can be anything, but typically in RL, it expects the current state to be provided by the environment, for that state to have the Markov property, and then it processes that state using a policy function $\pi(a|s)$ that decides what action to take.
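As a rough illustration of a policy function $\pi(a|s)$, the sketch below turns per-state action preferences into probabilities with a softmax and samples an action from them. The state name `"s0"`, the preference numbers, and the helper names are assumptions made up for this example; a softmax is only one of many ways to define a stochastic policy.

```python
import math
import random

def softmax_policy(preferences):
    """Return pi(a|s) as a dict of probabilities for one state's action preferences."""
    top = max(preferences.values())  # subtract the max for numerical stability
    exps = {a: math.exp(p - top) for a, p in preferences.items()}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}

def sample_action(pi, rng):
    """Draw one action according to the probabilities in pi."""
    actions = list(pi)
    return rng.choices(actions, weights=[pi[a] for a in actions], k=1)[0]

# Hypothetical preferences for a single state "s0".
preferences_by_state = {"s0": {"left": 0.0, "right": 1.0}}

pi_s0 = softmax_policy(preferences_by_state["s0"])
chosen = sample_action(pi_s0, random.Random(0))
```

Here `pi_s0` plays the role of $\pi(\cdot|s_0)$: the probabilities sum to one, and the action with the higher preference is the more likely choice.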
In addition, in RL we usually care about handling a reward signal (received from the environment) and optimising the agent towards maximising the expected reward in future. To do this, the agent will maintain some data which is influenced by the rewards it received in the past, and use that to construct a better policy.
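The loop described above (act, receive a reward, update internal data, act better next time) can be sketched with a two-armed bandit. This is a minimal sketch under assumptions: `BanditEnvironment`, `BanditAgent`, the payout probabilities, and the epsilon-greedy rule are all illustrative choices, not something prescribed by RL in general.

```python
import random

class BanditEnvironment:
    """Hypothetical environment: each action pays 1.0 with a fixed probability."""

    def __init__(self, payout_probs, seed=0):
        self.payout_probs = payout_probs
        self.rng = random.Random(seed)

    def step(self, action):
        return 1.0 if self.rng.random() < self.payout_probs[action] else 0.0

class BanditAgent:
    """Keeps per-action reward averages and derives an epsilon-greedy policy from them."""

    def __init__(self, n_actions, epsilon=0.1, seed=1):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = [0] * n_actions
        self.values = [0.0] * n_actions  # data shaped by past rewards

    def act(self):
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.values))  # explore
        return max(range(len(self.values)), key=lambda a: self.values[a])  # exploit

    def learn(self, action, reward):
        # Incremental sample average of the rewards seen for this action.
        self.counts[action] += 1
        self.values[action] += (reward - self.values[action]) / self.counts[action]

def run(steps=2000):
    env = BanditEnvironment([0.2, 0.8])
    agent = BanditAgent(n_actions=2)
    for _ in range(steps):
        action = agent.act()          # the agent decides
        reward = env.step(action)     # the environment responds
        agent.learn(action, reward)   # the agent updates its internal data
    return agent

trained = run()
```

The only thing the agent stores is `values` and `counts`; the improved policy falls out of those numbers, which is the "data influenced by past rewards" mentioned above.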
One interesting thing about the definition of an agent is that the agent/environment boundary is usually considered to be very close to the abstract decision-making unit. For instance, for a robot, the agent is typically not the whole robot, but the specific program running on the robot's CPU that makes the decision on the action. All the relays/motors and other parts of the physical body of the robot are parts of the environment in RL terms. Loose language is often used here, though, as the distinction might not matter in most descriptions: we would say that "the robot moves its arm to achieve the goal" when in stricter RL terms we should say that "the agent running on the robot CPU instructs the arm motors to move to achieve the goal".
I think it is not the Neural Net behind?
That is correct, the agent is more than the neural network. One or more neural networks might be part of an agent, and take the role of estimating the value of a state, or state/action pair, or even directly driving the policy function.
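One way to picture "the network is a part of the agent" is the sketch below, where the agent owns a value estimator plus the decision and learning rules around it. To keep it dependency-free, a small linear model stands in for the neural network; `LinearQEstimator`, `Agent`, and their method names are hypothetical and chosen only for this illustration.

```python
import random
from typing import List, Sequence

class LinearQEstimator:
    """Stand-in for a neural network: one weight vector per action."""

    def __init__(self, n_features: int, n_actions: int):
        self.weights = [[0.0] * n_features for _ in range(n_actions)]

    def predict(self, features: Sequence[float]) -> List[float]:
        return [sum(w * x for w, x in zip(row, features)) for row in self.weights]

    def update(self, features: Sequence[float], action: int,
               target: float, lr: float = 0.1) -> None:
        # One gradient step on the squared error for the chosen action.
        error = target - self.predict(features)[action]
        self.weights[action] = [w + lr * error * x
                                for w, x in zip(self.weights[action], features)]

class Agent:
    """The agent is the estimator plus the rules for acting and learning."""

    def __init__(self, estimator: LinearQEstimator, n_actions: int,
                 epsilon: float = 0.1, seed: int = 0):
        self.estimator = estimator
        self.n_actions = n_actions
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def act(self, features: Sequence[float]) -> int:
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        values = self.estimator.predict(features)
        return max(range(self.n_actions), key=lambda a: values[a])

    def learn(self, features, action, reward, next_features, done, gamma=0.9):
        # Q-learning style target; the estimator only fits it.
        best_next = 0.0 if done else max(self.estimator.predict(next_features))
        self.estimator.update(features, action, reward + gamma * best_next)

demo_agent = Agent(LinearQEstimator(n_features=1, n_actions=2), n_actions=2, epsilon=0.0)
for _ in range(50):
    demo_agent.learn([1.0], action=1, reward=1.0, next_features=[1.0], done=True)
```

Replacing `LinearQEstimator` with an actual neural network would leave `Agent.act` and `Agent.learn` unchanged, which is why the network alone is not the agent.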
1 vote
The agent replaces the human operator: it controls the game in place of the human. The colloquial term for such software is an aimbot. In the context of reinforcement learning, the inner workings of an aimbot are not implemented with scripting languages like AutoIt, but with machine learning algorithms such as Q-learning and neural networks.
Accepted answer: answered 2 hours ago by Douglas Daseeco
up vote
2
down vote
The agent in RL is the component that makes the decision of what action to take.
In order to make that decision, the agent is allowed to use any observation from the environment, and any internal rules that it has. Those internal rules can be anything, but typically in RL, it expects the current state to be provided by the environment, for that state to have the Markov property, and then it processes that state using a policy function $pi(a|s)$ that decides what action to take.
In addition, in RL we usually care about handling a reward signal (received from the environment) and optimising the agent towards maximising the expected reward in future. To do this, the agent will maintain some data which is influenced by the rewards it received in the past, and use that to construct a better policy.
One interesting thing about the definition of an agent, is that the agent/environment boundary is usually considered to be very close to the abstract decision making unit. For instance, for a robot, the agent is typically not the whole robot, but the specific program running on the robot's CPU that makes the decision on the action. All the relays/motors and other parts of the physical body of the robot are parts of the environment in RL terms. Although often loose language is used here, as the distinction might not matter in most descriptions - we would say that "the robot moves its arm to achieve the goal" when in stricter RL terms we should say that "the agent running on the robot CPU instructs the arm motors to move to achieve the goal".
I think it is not the Neural Net behind?
That is correct, the agent is more than the neural network. One or more neural networks might be part of an agent, and take the role of estimating the value of a state, or state/action pair, or even directly driving the policy function.
edited 2 hours ago
answered 5 hours ago
Neil Slater
The agent replaces the human operator: it controls the game in the operator's place. A colloquial term for such software is a bot (an aimbot, for instance). In the context of reinforcement learning, the inner workings of such a bot are not implemented with scripting languages like AutoIt, but with machine learning techniques such as Q-learning and neural networks.
answered 4 hours ago
Manuel Rodriguez
