What does the Agent in Reinforcement Learning exactly do?
























I am not sure of what the Agent in Reinforcement Learning exactly is. I think it is not the Neural Net behind?



So what is the Agent?
What does the Agent in Reinforcement Learning exactly do?





























































      reinforcement-learning intelligent-agent















      asked 6 hours ago









      TVSuchty





















          3 Answers




















          accepted










          To understand why reinforcement learning needs an agent, it is best to break learning down into its basic concepts.



          • Input

          • Output

          • Criteria for evaluation of the mapping from input to output

          These criteria are used to determine two things.



          • What is deemed improvement

          • What is considered sufficient at any given moment of evaluation

          Let's consider some typical kinds of learning using the above simple framework.



          1. Supervised MLP (multilayer perceptron) learning. Criteria: A loss function that compares the labels (presumed to be ideal) with the output determines the correction signal that is back-propagated, together with the reliability and accuracy expected as a result of learning (convergence).

          2. A child learning to sound out words. Criteria: The feedback of the listener in terms of facial affect, body language, tone of voice, and words spoken, as determined within the language framework of the child and the teacher or parent.

          3. Unsupervised feature extraction. Criteria: The narrowness (proportional reduction of the number of bits required to represent the input) and the reliability and accuracy with which the input can be reconstructed from its features.

          Notice that these three examples have criteria that are essentially multivariate boundary conditions:




          1. $\mathbb{R}^3$: $(f_{\text{loss}}(\ldots), 1 - \delta, 1 - \epsilon)$, that is, the loss function and the typical expressions of the reliability and accuracy resulting from training


          2. $\mathbb{R}^3$: Whether the word sounds right to the listener, whether the listener detects improvement they wish to acknowledge to reinforce what is perceived as a systemic improvement in visual typesetting to phonic translation, or whether the child is showing aversion toward the task


          3. $\mathbb{R}^3$: $(\dfrac{b_f}{b_{in}} \le \eta, 1 - \delta, 1 - \epsilon)$, where the fraction is the ratio of bits representing the features to bits representing the input that generated them, and is bounded by $\eta$

          Now let's consider a fourth example.



          4. Q-learning (a reinforcement strategy originating in Chris Watkins's 1989 PhD thesis). Criteria: A function called the Q function that evaluates behavioral wellness on the basis of a sequence of states, between which actions are expected to cause the transition from the prior state to the subsequent state.

          Here is the multivariate boundary condition for Q-learning.




          4. $\mathbb{R}^3$: $(Q(\ldots), 1 - \delta, 1 - \epsilon)$, that is, the Q function and the typical expressions of the reliability and accuracy resulting from training

          Strictly speaking, the Q function evaluation can be encapsulated, in which case that system component is technically an agent.
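
          For reference, the update rule usually associated with Q-learning is not written out in this answer. A common textbook form is given below. The symbols are introduced here for illustration and are not the answer's own notation: a learning rate $\alpha$, a discount factor $\gamma$, and a reward $r_{t+1}$ observed after taking action $a_t$ in state $s_t$.

          $$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]$$

          The bracketed term is the difference between a one-step estimate of the return and the current estimate, so the Q function is nudged toward values that are consistent with the rewards actually received.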



          Real-time learning networks may also receive a signal that indicates functional desirability (a reward signal, to use the terminology of Pavlovian behavioral psychology) at any given point in time. The evaluation is external to the network from a design perspective. Because the criteria are decoupled from the network, the learning system is now configurable and extensible in ways that an integrated system cannot be.



          In this case, the external component, a black box that simply supplies a wellness signal, is an agent of reinforcement (or dissuasion if it has a pain-like signal too).
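
The decoupling described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions. The names `PreferenceLearner`, `Evaluator`, `prefers_last`, and `prefers_first` are hypothetical, and the learner is a simple running-average scheme rather than a network. The point is that the learner never sees the criteria, only a scalar signal, so the evaluator can be swapped without touching the learner.

```python
import random
from typing import Callable, List

# Assumption for this sketch: the external evaluator is any function that
# maps an output to a scalar (positive = reward, negative = pain-like signal).
Evaluator = Callable[[int], float]


class PreferenceLearner:
    """Learns which output is preferred from an external scalar signal only."""

    def __init__(self, n_outputs: int, seed: int = 0) -> None:
        self.values: List[float] = [0.0] * n_outputs  # mean signal per output
        self.counts: List[int] = [0] * n_outputs
        self.rng = random.Random(seed)

    def best(self) -> int:
        # Index of the output with the highest mean signal so far.
        return max(range(len(self.values)), key=lambda i: self.values[i])

    def choose(self, epsilon: float = 0.1) -> int:
        # Occasionally try a random output, otherwise use the current best.
        if self.rng.random() < epsilon:
            return self.rng.randrange(len(self.values))
        return self.best()

    def reinforce(self, output: int, signal: float) -> None:
        # Incremental mean of the signals received for this output.
        self.counts[output] += 1
        self.values[output] += (signal - self.values[output]) / self.counts[output]


def run(learner: PreferenceLearner, evaluator: Evaluator, steps: int = 200) -> int:
    for _ in range(steps):
        output = learner.choose()
        learner.reinforce(output, evaluator(output))
    return learner.best()


def prefers_last(output: int) -> float:
    return 1.0 if output == 2 else -1.0


def prefers_first(output: int) -> float:
    return 1.0 if output == 0 else -1.0


# Same learner code, two different external evaluators.
choice_a = run(PreferenceLearner(3), prefers_last)
choice_b = run(PreferenceLearner(3), prefers_first)
```

Swapping `prefers_last` for `prefers_first` changes what the learner ends up preferring, although the learner's own code is unchanged.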



          As AI architectures mature, the oversight function is more often decoupled from the network being overseen. There are several linguistic formulations of this causal relationship.




          a. Controller ⇒ agent



          b. Agent ⇒ network



          c. Master ⇒ slave




          Notice that when the word Controller is used, the agent IS the network and the criteria are evaluated by the controller. However, in machine learning the term Agent is generally used to indicate the evaluation component, and the network is the slave to the Agent's evaluation.



          It is in this latter sense that one sees the term used in game theory (based on the work of Euler, Kirchhoff, Poincaré, Markov, Morgenstern, and von Neumann), where the Agent is the automaton that chooses moves based on a strategy to achieve maximum reward.



          In robotics, the evaluation may include both pain and pleasure, as in the nervous systems of arthropods, vertebrates, and other bilaterally symmetric animals. In this case, if the causality language of (a) above is used, the agent may be the robot, controlled by the AI controller. If the causality language of (b) is used, the agent would be as in the game theory paragraph above.



          Terms in AI can have more than one meaning, since AI is largely interdisciplinary, and there's no standards body to publish a canonical glossary of AI terminology. Over time, what was once jargon will become the academic standard. For instance, no one in electrical engineering quibbles over the meaning of voltage, but centuries ago they did.



          The sad truth is that academic authors started using (b) without thinking of the normal use of the word Agent in English, to which (a) more closely conforms.














































            The agent in RL is the component that makes the decision of what action to take.



            In order to make that decision, the agent is allowed to use any observation from the environment, and any internal rules that it has. Those internal rules can be anything, but typically in RL, it expects the current state to be provided by the environment, for that state to have the Markov property, and then it processes that state using a policy function $\pi(a|s)$ that decides what action to take.



            In addition, in RL we usually care about handling a reward signal (received from the environment) and optimising the agent towards maximising the expected reward in future. To do this, the agent will maintain some data which is influenced by the rewards it received in the past, and use that to construct a better policy.
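
The loop described above can be sketched in Python. This is a minimal sketch, not a reference implementation. The corridor environment, the class names `CorridorEnv` and `TabularQAgent`, and all hyperparameters are assumptions made up for this example. The agent here is only the part that picks actions and updates its own data from rewards; the state transitions and rewards come from the environment.

```python
import random
from collections import defaultdict


class CorridorEnv:
    """Hypothetical toy environment: positions 0..4, reward 1.0 on reaching 4."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Action 0 moves left, action 1 moves right (clipped at the walls).
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


class TabularQAgent:
    """The decision-making component: a policy plus data learned from rewards."""

    def __init__(self, n_actions=2, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
        self.q = defaultdict(float)  # (state, action) -> estimated return
        self.n_actions = n_actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def greedy_action(self, state):
        values = [self.q[(state, a)] for a in range(self.n_actions)]
        best = max(values)
        # Break ties randomly so an untrained agent does not favour action 0.
        return self.rng.choice([a for a, v in enumerate(values) if v == best])

    def act(self, state):
        # Epsilon-greedy policy: mostly exploit, sometimes explore.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        return self.greedy_action(state)

    def learn(self, state, action, reward, next_state, done):
        # One-step Q-learning update from the reward the environment returned.
        future = 0.0 if done else max(
            self.q[(next_state, a)] for a in range(self.n_actions))
        target = reward + self.gamma * future
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


def train(agent, env, episodes=200, max_steps=100):
    for _ in range(episodes):
        state = env.reset()
        for _ in range(max_steps):
            action = agent.act(state)                            # agent decides
            next_state, reward, done = env.step(action)          # environment responds
            agent.learn(state, action, reward, next_state, done) # agent improves
            state = next_state
            if done:
                break


agent = TabularQAgent()
train(agent, CorridorEnv())
learned_policy = [agent.greedy_action(s) for s in range(4)]
```

After training, `learned_policy` holds the greedy action for each non-terminal position, which in this corridor should be "move right" everywhere.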



            One interesting thing about the definition of an agent is that the agent/environment boundary is usually drawn very close to the abstract decision-making unit. For instance, for a robot, the agent is typically not the whole robot, but the specific program running on the robot's CPU that decides on the action. All the relays, motors, and other parts of the robot's physical body are part of the environment in RL terms. Looser language is often used, since the distinction does not matter in most descriptions: we would say "the robot moves its arm to achieve the goal" when in stricter RL terms we should say "the agent running on the robot's CPU instructs the arm motors to move to achieve the goal".




            I think it is not the Neural Net behind?




            That is correct, the agent is more than the neural network. One or more neural networks might be part of an agent, and take the role of estimating the value of a state, or state/action pair, or even directly driving the policy function.
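
To make the "more than the neural network" point concrete, here is a rough sketch in Python. It rests on several assumptions: a tiny linear model (`LinearQNetwork`, a hypothetical name) stands in for a real neural network, and the `Agent` class and its parameters are invented for illustration. The value estimator is one component; the exploration rule, the discounting, and the learning target live in the agent around it.

```python
import random


class LinearQNetwork:
    """Stand-in for a neural network: maps state features to one value per action."""

    def __init__(self, n_features, n_actions):
        self.weights = [[0.0] * n_features for _ in range(n_actions)]

    def predict(self, features):
        return [sum(w * x for w, x in zip(row, features)) for row in self.weights]

    def update(self, features, action, target, lr):
        # Gradient step on the squared error for the chosen action only.
        error = target - self.predict(features)[action]
        self.weights[action] = [
            w + lr * error * x for w, x in zip(self.weights[action], features)
        ]


class Agent:
    """Wraps the estimator with a policy and a learning rule."""

    def __init__(self, network, n_actions, epsilon=0.1, gamma=0.9, lr=0.1, seed=0):
        self.network = network
        self.n_actions = n_actions
        self.epsilon, self.gamma, self.lr = epsilon, gamma, lr
        self.rng = random.Random(seed)

    def act(self, features):
        # Policy: epsilon-greedy over the estimator's action values.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        values = self.network.predict(features)
        return max(range(self.n_actions), key=lambda a: values[a])

    def learn(self, features, action, reward, next_features, done):
        # The agent, not the network, decides what the training target is.
        future = 0.0 if done else max(self.network.predict(next_features))
        self.network.update(features, action, reward + self.gamma * future, self.lr)


network = LinearQNetwork(n_features=2, n_actions=2)
agent = Agent(network, n_actions=2, epsilon=0.0)
agent.learn([1.0, 0.0], action=1, reward=1.0, next_features=[0.0, 1.0], done=True)
chosen = agent.act([1.0, 0.0])
```

In this sketch the network could be replaced by any other estimator with the same `predict` and `update` methods without changing the agent's policy or learning rule.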
















































              The agent replaces the human operator and controls the game in the operator's place. The colloquial term for such software is aimbot. In the context of reinforcement learning, the inner workings of an aimbot are not implemented with scripting languages like AutoIt, but with machine learning algorithms such as Q-learning and neural networks.






              share|improve this answer




















                Your Answer




                StackExchange.ifUsing("editor", function ()
                return StackExchange.using("mathjaxEditing", function ()
                StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
                StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
                );
                );
                , "mathjax-editing");

                StackExchange.ready(function()
                var channelOptions =
                tags: "".split(" "),
                id: "658"
                ;
                initTagRenderer("".split(" "), "".split(" "), channelOptions);

                StackExchange.using("externalEditor", function()
                // Have to fire editor after snippets, if snippets enabled
                if (StackExchange.settings.snippets.snippetsEnabled)
                StackExchange.using("snippets", function()
                createEditor();
                );

                else
                createEditor();

                );

                function createEditor()
                StackExchange.prepareEditor(
                heartbeatType: 'answer',
                convertImagesToLinks: false,
                noModals: false,
                showLowRepImageUploadWarning: true,
                reputationToPostImages: null,
                bindNavPrevention: true,
                postfix: "",
                noCode: true, onDemand: true,
                discardSelector: ".discard-answer"
                ,immediatelyShowMarkdownHelp:true
                );



                );






                TVSuchty is a new contributor. Be nice, and check out our Code of Conduct.









                 

                draft saved


                draft discarded


















                StackExchange.ready(
                function ()
                StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fai.stackexchange.com%2fquestions%2f8476%2fwhat-does-the-agent-in-reinforcement-learning-exactly-do%23new-answer', 'question_page');

                );

                Post as a guest






























                3 Answers
                3






                active

                oldest

                votes








                3 Answers
                3






                active

                oldest

                votes









                active

                oldest

                votes






                active

                oldest

                votes








                up vote
                1
                down vote



                accepted










                To understand why reinforcement learning needs and agent it is best to break down the concepts of learning.



                • Input

                • Output

                • Criteria for evaluation of the mapping from input to output

                This criteria is use to determine two things.



                • What is deemed improvement

                • What is considered sufficient at any given moment of evaluation

                Let's consider some typical kinds of learning using the above simple framework.



                1. Supervised MLP (multilayer perceptron) learning — Criteria: A loss function that compares the labels (presumed to be ideal) with the output determines the correction signal that back-propagates and reliability and accuracy expected as a result of learning (convergence)

                2. A child learning to sound out words — Criteria: The feedback of listener in terms of facial affect, body language, tone of voice, and words spoken as determined within the language framework of the child and the teacher or parent

                3. Unsupervised feature extraction — Criteria: The narrowness (proportional reduction of the number of bits required to represent the input) and the reliability and accuracy in which the input can be reconstructed from its features.

                Notice that these three examples have criteria that are essentially multivariate boundary conditions:




                1. $mathbbR^3$: $(f_loss(...), 1 - delta, 1 - epsilon)$ — Loss function and the typical expressions of the reliability and accuracy resulting from training


                2. $mathbbR^3$: Whether the word sounds right to the listener, whether the listener detects improvement they wish to acknowledge to reinforce what is perceived as a systemic improvement in visual typesetting to phonic translation1, or whether the child is showing aversion toward the task


                3. $mathbbR^3$: $(dfrac b_f b_in le eta, 1 - delta, 1 - epsilon)$ — The fraction is the ratio of bits representing the features in relation to the input that generated them and is bounded by $eta$

                Now let's consider a fourth example.



                1. Q-learning (a reinforcement strategy originating from C. Watson 1989 PhD thesis) — Criteria: A function called the Q function that evaluates behavioral wellness on the basis of a sequence of states between which actions are expected to cause the transition from the prior state to the subsequent state.

                Here the multivariate boundary condition for Q-learning.




                1. $mathbbR^2$: $mathbbR^3$: $(Q(...), 1 - delta, 1 - epsilon)$ — Q function and the typical expressions of the reliability and accuracy resulting from training

                Strictly speaking, the Q function evaluation can be encapsulated, in which case that system component is technically an agent.



                Real time learning networks may also receive a signal that indicates functional desirability (a reward signal to use Pavlovian behavioral psychology terminology) at any given point in time. The evaluation is external to the network from a design perspective. Because the criteria is decoupled from the network, the learning system is now configurable and extensible in ways that an integrated system cannot.



                In this case, the external component, a black box that simply supplies a wellness signal, is an agent of reinforcement (or dissuasion if it has a pain-like signal too).



                As AI architectures mature, the oversight function is more often decoupled from the network being overseen. There are several linguistic formulations of this causal relationship.




                a. Controller ⇒ agent



                b. Agent ⇒ network



                c. Master ⇒ slave




                Notice that when the word Controller is used, the agent IS the network and the criteria is evaluated by the controller. However, in machine learning the term Agent is generally used to indicate the evaluation component and the network is the slave to the Agent's evaluation.



                It is this later case that one sees the term used in game theory (based on the work of Euler, Kirchhoff, Poincaré, Markov, Morgenstern, and von Neumann), where the Agent is the automaton that chooses moves based on a strategy to achieve maximum reward.



                In robotics, the evaluation may include both pain and pleasure as in the nervous systems arthropods, vertebrates, and other bilaterally symmetric animals. In this case the agent may be the robotics if the (a) causality language is used above, controlled by the AI controller. If the (b) causality language is used, the agent would be as in the game theory paragraph above.



                Terms in AI can have more than one meaning, since AI is largely interdisciplinary, and there's no standards body to publish a canonical glossary of AI terminology. Over time, what was once jargon will become the academic standard. For instance, no one in electrical engineering quibbles over the meaning of voltage, but centuries ago they did.



                The sad truth is that academic authors started using (b) without thinking of the normal use of the word Agent in English, to which (a) more closely conforms.






                share|improve this answer
























                  up vote
                  1
                  down vote



                  accepted










                  To understand why reinforcement learning needs and agent it is best to break down the concepts of learning.



                  • Input

                  • Output

                  • Criteria for evaluation of the mapping from input to output

                  This criteria is use to determine two things.



                  • What is deemed improvement

                  • What is considered sufficient at any given moment of evaluation

                  Let's consider some typical kinds of learning using the above simple framework.



                  1. Supervised MLP (multilayer perceptron) learning — Criteria: A loss function that compares the labels (presumed to be ideal) with the output determines the correction signal that back-propagates and reliability and accuracy expected as a result of learning (convergence)

                  2. A child learning to sound out words — Criteria: The feedback of listener in terms of facial affect, body language, tone of voice, and words spoken as determined within the language framework of the child and the teacher or parent

                  3. Unsupervised feature extraction — Criteria: The narrowness (proportional reduction of the number of bits required to represent the input) and the reliability and accuracy in which the input can be reconstructed from its features.

                  Notice that these three examples have criteria that are essentially multivariate boundary conditions:




                  1. $mathbbR^3$: $(f_loss(...), 1 - delta, 1 - epsilon)$ — Loss function and the typical expressions of the reliability and accuracy resulting from training


                  2. $mathbbR^3$: Whether the word sounds right to the listener, whether the listener detects improvement they wish to acknowledge to reinforce what is perceived as a systemic improvement in visual typesetting to phonic translation1, or whether the child is showing aversion toward the task


                  3. $mathbbR^3$: $(dfrac b_f b_in le eta, 1 - delta, 1 - epsilon)$ — The fraction is the ratio of bits representing the features in relation to the input that generated them and is bounded by $eta$

                  Now let's consider a fourth example.



                  1. Q-learning (a reinforcement strategy originating from C. Watson 1989 PhD thesis) — Criteria: A function called the Q function that evaluates behavioral wellness on the basis of a sequence of states between which actions are expected to cause the transition from the prior state to the subsequent state.

                  Here the multivariate boundary condition for Q-learning.




                  1. $mathbbR^2$: $mathbbR^3$: $(Q(...), 1 - delta, 1 - epsilon)$ — Q function and the typical expressions of the reliability and accuracy resulting from training

                  Strictly speaking, the Q function evaluation can be encapsulated, in which case that system component is technically an agent.



                  Real time learning networks may also receive a signal that indicates functional desirability (a reward signal to use Pavlovian behavioral psychology terminology) at any given point in time. The evaluation is external to the network from a design perspective. Because the criteria is decoupled from the network, the learning system is now configurable and extensible in ways that an integrated system cannot.



                  In this case, the external component, a black box that simply supplies a wellness signal, is an agent of reinforcement (or dissuasion if it has a pain-like signal too).



                  As AI architectures mature, the oversight function is more often decoupled from the network being overseen. There are several linguistic formulations of this causal relationship.




                  a. Controller ⇒ agent



                  b. Agent ⇒ network



                  c. Master ⇒ slave




                  Notice that when the word Controller is used, the agent IS the network and the criteria is evaluated by the controller. However, in machine learning the term Agent is generally used to indicate the evaluation component and the network is the slave to the Agent's evaluation.



                  It is this later case that one sees the term used in game theory (based on the work of Euler, Kirchhoff, Poincaré, Markov, Morgenstern, and von Neumann), where the Agent is the automaton that chooses moves based on a strategy to achieve maximum reward.



                  In robotics, the evaluation may include both pain and pleasure as in the nervous systems arthropods, vertebrates, and other bilaterally symmetric animals. In this case the agent may be the robotics if the (a) causality language is used above, controlled by the AI controller. If the (b) causality language is used, the agent would be as in the game theory paragraph above.



                  Terms in AI can have more than one meaning, since AI is largely interdisciplinary, and there's no standards body to publish a canonical glossary of AI terminology. Over time, what was once jargon will become the academic standard. For instance, no one in electrical engineering quibbles over the meaning of voltage, but centuries ago they did.



                  The sad truth is that academic authors started using (b) without thinking of the normal use of the word Agent in English, to which (a) more closely conforms.






                  share|improve this answer






















                    up vote
                    1
                    down vote



                    accepted







                    up vote
                    1
                    down vote



                    accepted






                    To understand why reinforcement learning needs and agent it is best to break down the concepts of learning.



                    • Input

                    • Output

                    • Criteria for evaluation of the mapping from input to output

                    This criteria is use to determine two things.



                    • What is deemed improvement

                    • What is considered sufficient at any given moment of evaluation

                    Let's consider some typical kinds of learning using the above simple framework.



                    1. Supervised MLP (multilayer perceptron) learning — Criteria: A loss function that compares the labels (presumed to be ideal) with the output determines the correction signal that back-propagates and reliability and accuracy expected as a result of learning (convergence)

                    2. A child learning to sound out words — Criteria: The feedback of listener in terms of facial affect, body language, tone of voice, and words spoken as determined within the language framework of the child and the teacher or parent

                    3. Unsupervised feature extraction — Criteria: The narrowness (proportional reduction of the number of bits required to represent the input) and the reliability and accuracy in which the input can be reconstructed from its features.

                    Notice that these three examples have criteria that are essentially multivariate boundary conditions:




                    1. $mathbbR^3$: $(f_loss(...), 1 - delta, 1 - epsilon)$ — Loss function and the typical expressions of the reliability and accuracy resulting from training


                    2. $mathbbR^3$: Whether the word sounds right to the listener, whether the listener detects improvement they wish to acknowledge to reinforce what is perceived as a systemic improvement in visual typesetting to phonic translation1, or whether the child is showing aversion toward the task


                    3. $mathbbR^3$: $(dfrac b_f b_in le eta, 1 - delta, 1 - epsilon)$ — The fraction is the ratio of bits representing the features in relation to the input that generated them and is bounded by $eta$

                    Now let's consider a fourth example.



                    1. Q-learning (a reinforcement strategy originating from C. Watson 1989 PhD thesis) — Criteria: A function called the Q function that evaluates behavioral wellness on the basis of a sequence of states between which actions are expected to cause the transition from the prior state to the subsequent state.

                    Here the multivariate boundary condition for Q-learning.




                    1. $mathbbR^2$: $mathbbR^3$: $(Q(...), 1 - delta, 1 - epsilon)$ — Q function and the typical expressions of the reliability and accuracy resulting from training

                    Strictly speaking, the Q function evaluation can be encapsulated, in which case that system component is technically an agent.



                    Real time learning networks may also receive a signal that indicates functional desirability (a reward signal to use Pavlovian behavioral psychology terminology) at any given point in time. The evaluation is external to the network from a design perspective. Because the criteria is decoupled from the network, the learning system is now configurable and extensible in ways that an integrated system cannot.



                    In this case, the external component, a black box that simply supplies a wellness signal, is an agent of reinforcement (or dissuasion if it has a pain-like signal too).



                    As AI architectures mature, the oversight function is more often decoupled from the network being overseen. There are several linguistic formulations of this causal relationship.




                    a. Controller ⇒ agent



                    b. Agent ⇒ network



                    c. Master ⇒ slave




                    Notice that when the word Controller is used, the agent IS the network and the criteria is evaluated by the controller. However, in machine learning the term Agent is generally used to indicate the evaluation component and the network is the slave to the Agent's evaluation.



                    It is this later case that one sees the term used in game theory (based on the work of Euler, Kirchhoff, Poincaré, Markov, Morgenstern, and von Neumann), where the Agent is the automaton that chooses moves based on a strategy to achieve maximum reward.



                    In robotics, the evaluation may include both pain and pleasure as in the nervous systems arthropods, vertebrates, and other bilaterally symmetric animals. In this case the agent may be the robotics if the (a) causality language is used above, controlled by the AI controller. If the (b) causality language is used, the agent would be as in the game theory paragraph above.



                    Terms in AI can have more than one meaning, since AI is largely interdisciplinary, and there's no standards body to publish a canonical glossary of AI terminology. Over time, what was once jargon will become the academic standard. For instance, no one in electrical engineering quibbles over the meaning of voltage, but centuries ago they did.



                    The sad truth is that academic authors started using (b) without thinking of the normal use of the word Agent in English, to which (a) more closely conforms.






                    share|improve this answer












                    To understand why reinforcement learning needs and agent it is best to break down the concepts of learning.



                    • Input

                    • Output

                    • Criteria for evaluation of the mapping from input to output

                    This criteria is use to determine two things.



                    • What is deemed improvement

                    • What is considered sufficient at any given moment of evaluation

                    Let's consider some typical kinds of learning using the above simple framework.



1. Supervised MLP (multilayer perceptron) learning. Criteria: a loss function that compares the output with the labels (presumed to be ideal) and determines the correction signal that is back-propagated, together with the reliability and accuracy expected as a result of learning (convergence).

2. A child learning to sound out words. Criteria: the listener's feedback in the form of facial affect, body language, tone of voice, and words spoken, as interpreted within the language framework shared by the child and the teacher or parent.

3. Unsupervised feature extraction. Criteria: the narrowness (proportional reduction in the number of bits required to represent the input) and the reliability and accuracy with which the input can be reconstructed from its features.

                    Notice that these three examples have criteria that are essentially multivariate boundary conditions:




1. $\mathbb{R}^3$: $(f_{\text{loss}}(\ldots), 1 - \delta, 1 - \epsilon)$, that is, the loss function and the typical expressions of the reliability and accuracy resulting from training

2. $\mathbb{R}^3$: whether the word sounds right to the listener; whether the listener detects an improvement worth acknowledging, in order to reinforce what is perceived as a systemic improvement in the translation from visual typesetting to phonics; and whether the child is showing aversion toward the task

3. $\mathbb{R}^3$: $(\dfrac{b_f}{b_{\text{in}}} \le \eta, 1 - \delta, 1 - \epsilon)$, where the fraction is the ratio of the bits representing the features to the bits of the input that generated them, bounded by $\eta$

                    Now let's consider a fourth example.



4. Q-learning (a reinforcement learning strategy originating in Chris Watkins's 1989 PhD thesis). Criteria: a function called the Q function that evaluates behavioral wellness on the basis of a sequence of states, where each action is expected to cause the transition from the prior state to the subsequent state.

Here is the multivariate boundary condition for Q-learning.

4. $\mathbb{R}^3$: $(Q(\ldots), 1 - \delta, 1 - \epsilon)$, that is, the Q function and the typical expressions of the reliability and accuracy resulting from training

                    Strictly speaking, the Q function evaluation can be encapsulated, in which case that system component is technically an agent.
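To make the idea of an encapsulated Q function concrete, here is a minimal sketch of tabular Q-learning in Python. It is only an illustration under stated assumptions: the five-cell corridor in `step`, the class name `QFunction`, and the values of `alpha`, `gamma` and `epsilon` are all made up for this example and do not come from any particular library.

```python
import random
from collections import defaultdict


class QFunction:
    """Encapsulates the Q table and its update rule (tabular Q-learning)."""

    def __init__(self, actions, alpha=0.5, gamma=0.9):
        self.actions = list(actions)
        self.alpha = alpha    # learning rate (assumed value)
        self.gamma = gamma    # discount factor (assumed value)
        self.table = defaultdict(float)  # (state, action) -> estimated return

    def value(self, state, action):
        return self.table[(state, action)]

    def best_action(self, state, rng=None):
        # Greedy action; ties are broken randomly when an rng is supplied.
        best = max(self.value(state, a) for a in self.actions)
        candidates = [a for a in self.actions if self.value(state, a) == best]
        return rng.choice(candidates) if rng else candidates[0]

    def update(self, state, action, reward, next_state, done):
        # Move Q(s, a) toward reward + gamma * max_a' Q(s', a').
        best_next = 0.0 if done else max(
            self.value(next_state, a) for a in self.actions)
        target = reward + self.gamma * best_next
        self.table[(state, action)] += self.alpha * (
            target - self.table[(state, action)])


def step(state, action):
    """Hypothetical corridor with cells 0..4; reaching cell 4 pays 1."""
    next_state = min(4, max(0, state + action))
    done = next_state == 4
    return next_state, (1.0 if done else 0.0), done


def train(episodes=200, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = QFunction(actions=[-1, 1])
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(q.actions)       # explore
            else:
                action = q.best_action(state, rng)   # exploit
            next_state, reward, done = step(state, action)
            q.update(state, action, reward, next_state, done)
            state = next_state
    return q
```

After `q = train()`, calling `q.best_action(s)` for each non-terminal cell should return `1` (move right). Everything that judges behavior lives inside `QFunction`, which is the encapsulation described above.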



Real-time learning networks may also receive a signal that indicates functional desirability (a reward signal, to use the terminology of Pavlovian behavioral psychology) at any given point in time. The evaluation is external to the network from a design perspective. Because the criteria are decoupled from the network, the learning system is now configurable and extensible in ways that an integrated system is not.



                    In this case, the external component, a black box that simply supplies a wellness signal, is an agent of reinforcement (or dissuasion if it has a pain-like signal too).
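One way to picture this decoupling is a learner that receives its wellness signal from a pluggable black box. The sketch below is hypothetical throughout: `Learner`, `prefers_even`, `prefers_large` and the step size are names and values invented for illustration, and the "learning" is just a running average.

```python
class Learner:
    """Keeps one preference per option; knows nothing about the criteria."""

    def __init__(self, n_options, reward_fn, step_size=0.1):
        self.preferences = [0.0] * n_options
        self.reward_fn = reward_fn    # external black box supplying the signal
        self.step_size = step_size

    def feedback(self, option):
        # Ask the external component how desirable this option was,
        # then nudge the stored preference toward that signal.
        signal = self.reward_fn(option)
        self.preferences[option] += self.step_size * (
            signal - self.preferences[option])
        return signal


def prefers_even(option):
    """Hypothetical criterion: pleasure for even options, pain for odd ones."""
    return 1.0 if option % 2 == 0 else -1.0


def prefers_large(option):
    """Hypothetical criterion: larger option indices are better."""
    return float(option)


def train(reward_fn, n_options=4, rounds=50):
    learner = Learner(n_options, reward_fn)
    for _ in range(rounds):
        for option in range(n_options):
            learner.feedback(option)
    return learner.preferences
```

Swapping `prefers_even` for `prefers_large` changes what is learned without touching `Learner`, which is the configurability that an integrated design lacks.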



                    As AI architectures mature, the oversight function is more often decoupled from the network being overseen. There are several linguistic formulations of this causal relationship.




                    a. Controller ⇒ agent



                    b. Agent ⇒ network



                    c. Master ⇒ slave




Notice that when the word Controller is used, the agent IS the network and the criteria are evaluated by the controller. However, in machine learning the term Agent is generally used to indicate the evaluation component, and the network is the slave to the Agent's evaluation.

It is in this latter sense that the term is used in game theory (based on the work of Euler, Kirchhoff, Poincaré, Markov, Morgenstern, and von Neumann), where the Agent is the automaton that chooses moves based on a strategy to achieve maximum reward.

In robotics, the evaluation may include both pain and pleasure, as in the nervous systems of arthropods, vertebrates, and other bilaterally symmetric animals. In this case the agent may be the robot, controlled by the AI controller, if the (a) causality language above is used. If the (b) causality language is used, the agent would be as in the game theory paragraph above.



                    Terms in AI can have more than one meaning, since AI is largely interdisciplinary, and there's no standards body to publish a canonical glossary of AI terminology. Over time, what was once jargon will become the academic standard. For instance, no one in electrical engineering quibbles over the meaning of voltage, but centuries ago they did.



                    The sad truth is that academic authors started using (b) without thinking of the normal use of the word Agent in English, to which (a) more closely conforms.

















                    answered 2 hours ago









                    Douglas Daseeco




































                        The agent in RL is the component that makes the decision of what action to take.



In order to make that decision, the agent is allowed to use any observation from the environment and any internal rules that it has. Those internal rules can be anything, but typically in RL the agent expects the environment to provide the current state, expects that state to have the Markov property, and then processes the state using a policy function $\pi(a|s)$ that decides what action to take.



                        In addition, in RL we usually care about handling a reward signal (received from the environment) and optimising the agent towards maximising the expected reward in future. To do this, the agent will maintain some data which is influenced by the rewards it received in the past, and use that to construct a better policy.
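As a rough illustration of those pieces (a policy $\pi(a|s)$, a reward signal, and stored data that improves the policy), here is a small Python sketch. The environment, the class name `Agent`, the method names and the epsilon-greedy rule are all assumptions chosen for the example, not a standard API.

```python
import random


class Agent:
    """Chooses actions and keeps reward statistics to improve its policy."""

    def __init__(self, actions, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.totals = {}  # (state, action) -> sum of rewards received
        self.counts = {}  # (state, action) -> number of times tried

    def estimate(self, state, action):
        n = self.counts.get((state, action), 0)
        return self.totals.get((state, action), 0.0) / n if n else 0.0

    def policy(self, state):
        """Return pi(a|s) as a dict of probabilities (epsilon-greedy)."""
        best = max(self.actions, key=lambda a: self.estimate(state, a))
        explore = self.epsilon / len(self.actions)
        return {a: explore + (1.0 - self.epsilon if a == best else 0.0)
                for a in self.actions}

    def act(self, state):
        probs = self.policy(state)
        weights = [probs[a] for a in self.actions]
        return self.rng.choices(self.actions, weights=weights)[0]

    def learn(self, state, action, reward):
        # The data influenced by past rewards: running totals and counts.
        key = (state, action)
        self.totals[key] = self.totals.get(key, 0.0) + reward
        self.counts[key] = self.counts.get(key, 0) + 1


def environment_reward(state, action):
    """Hypothetical environment: reward 1 when the action matches the state."""
    return 1.0 if action == state else 0.0


def run(steps=2000, seed=0):
    env_rng = random.Random(seed + 1)
    agent = Agent(actions=[0, 1, 2], seed=seed)
    for _ in range(steps):
        state = env_rng.choice([0, 1, 2])  # the environment provides the state
        action = agent.act(state)          # the agent decides
        agent.learn(state, action, environment_reward(state, action))
    return agent
```

After `agent = run()`, `agent.policy(s)` should put most of its probability on the action equal to `s`. The agent here is the whole `Agent` object: the policy, the exploration rule, and the statistics it keeps.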



One interesting thing about the definition of an agent is that the agent/environment boundary is usually drawn very close to the abstract decision-making unit. For instance, for a robot, the agent is typically not the whole robot but the specific program running on the robot's CPU that decides on the action. All the relays, motors and other parts of the robot's physical body are part of the environment in RL terms. Loose language is often used here, because the distinction does not matter in most descriptions: we would say "the robot moves its arm to achieve the goal" when in stricter RL terms we should say "the agent running on the robot CPU instructs the arm motors to move to achieve the goal".




                        I think it is not the Neural Net behind?




                        That is correct, the agent is more than the neural network. One or more neural networks might be part of an agent, and take the role of estimating the value of a state, or state/action pair, or even directly driving the policy function.
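A small sketch may help show the network as just one part of the agent. Everything here is hypothetical: `TinyNetwork` is a hand-written forward pass with fixed weights (no training), and `NetworkBackedAgent` is an invented name for the wrapper that turns the network's outputs into a decision.

```python
import random


class TinyNetwork:
    """One hidden ReLU layer mapping a state vector to one value per action."""

    def __init__(self, w_hidden, w_out):
        self.w_hidden = w_hidden  # rows: hidden units, columns: state features
        self.w_out = w_out        # rows: actions, columns: hidden units

    def forward(self, state):
        hidden = [max(0.0, sum(w * x for w, x in zip(row, state)))
                  for row in self.w_hidden]
        return [sum(w * h for w, h in zip(row, hidden)) for row in self.w_out]


class NetworkBackedAgent:
    """The agent: a network for value estimates plus the action-selection rule."""

    def __init__(self, network, epsilon=0.0, seed=0):
        self.network = network
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def act(self, state):
        values = self.network.forward(state)  # the network only estimates
        if self.rng.random() < self.epsilon:  # the agent may still explore
            return self.rng.randrange(len(values))
        return max(range(len(values)), key=values.__getitem__)


# Assumed weights for illustration: with these, the outputs equal the
# (non-negative) inputs, so the agent picks the index of the largest feature.
identity = [[1.0, 0.0], [0.0, 1.0]]
agent = NetworkBackedAgent(TinyNetwork(identity, identity))
```

The network alone never chooses anything; `act` is where the decision is made, and the same wrapper could hold a lookup table instead of a network.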














                            edited 2 hours ago

























                            answered 5 hours ago









                            Neil Slater


































The agent replaces the human operator: it controls the game in place of the human. A colloquial term for such software is "aimbot". In the context of reinforcement learning, the inner workings of such a bot are implemented not with scripting languages like AutoIt but with machine learning techniques such as Q-learning and neural networks.
















                                    answered 4 hours ago









                                    Manuel Rodriguez




















