Automated Labelling

Suppose I have been given 1000 documents and 6 labels. My job is to assign each of the 1000 documents exactly one of the 6 labels, which are words, not numbers. How can I automate or semi-automate this process using data science?
Could I manually label some of the documents, train a predictor on them, and use it to label the rest? I suspect the accuracy won't be very high.
Are there any other approaches besides this one?

Tags: machine-learning, clustering, text-mining

asked 1 hour ago by Rishabh Baid

2 Answers

You have two options. The first is supervised learning: label a portion of the data manually, use those data points to train a model, and then predict the labels of the remaining documents.

The second is unsupervised learning, which covers techniques that do not need labels. You can use k-means to cluster your data into $k=6$ clusters, then associate each cluster with one of your labels based on your own inspection of its documents. There is no guarantee that the clusters will line up with your six labels, so check a sample from each cluster before trusting the mapping.

answered 53 mins ago by JahKnows
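
To make the supervised option above concrete, here is a minimal sketch (not from the answer itself). It assumes a toy corpus with two labels instead of six, and it uses a small hand-written multinomial Naive Bayes classifier, which is my choice for illustration; the answer does not name a model. The function names `train_naive_bayes` and `predict` are hypothetical.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    return text.lower().split()

def train_naive_bayes(labelled_docs):
    """labelled_docs: list of (text, label) pairs that were labelled by hand."""
    word_counts = defaultdict(Counter)  # label -> word frequencies
    doc_counts = Counter()              # label -> number of documents
    vocab = set()
    for text, label in labelled_docs:
        doc_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return word_counts, doc_counts, vocab

def predict(model, text):
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        # log prior plus log likelihoods with add-one (Laplace) smoothing
        score = math.log(doc_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy data (assumed): a few hand-labelled documents and some unlabelled ones.
labelled = [
    ("invoice payment overdue", "finance"),
    ("quarterly budget payment report", "finance"),
    ("football match final score", "sports"),
    ("tennis match tournament", "sports"),
]
unlabelled = ["budget invoice for march", "score of the tennis final"]

model = train_naive_bayes(labelled)
predictions = [predict(model, doc) for doc in unlabelled]
print(predictions)
```

On the accuracy concern raised in the question: one way to check it is to hold back part of the hand-labelled documents, predict them, and compare against the labels you assigned, before trusting the model on the rest.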

• How would I use k-means here? The centroids are initialised randomly, so won't the clusters fail to match my labels? Would it be right not to initialise the centroids randomly in this case?
  – Rishabh Baid, 48 mins ago

• It's best to initialize them randomly to avoid introducing bias. Let the centroids converge, then assign one of your labels to each cluster.
  – JahKnows, 45 mins ago
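
The procedure described in the answer and comments above (random initialisation, convergence, then naming the clusters) might look roughly like this. It is only a sketch under assumptions: a toy corpus with $k=2$ instead of $k=6$, plain word counts as features, and a hypothetical helper `name_clusters` that names each cluster by majority vote over a few documents you have inspected by hand.

```python
import random
from collections import Counter

def vectorize(docs):
    """Turn each document into a vector of raw word counts."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    return [[d.lower().split().count(w) for w in vocab] for d in docs]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Random initialisation: pick k distinct documents as starting centroids.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    assignments = None
    for _ in range(max_iter):
        new_assignments = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors
        ]
        if new_assignments == assignments:  # converged: nothing moved
            break
        assignments = new_assignments
        for c in range(k):
            members = [v for v, a in zip(vectors, assignments) if a == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments

def name_clusters(assignments, hand_labels):
    """hand_labels: {doc_index: label} for a few documents checked by hand."""
    votes = {}
    for idx, label in hand_labels.items():
        votes.setdefault(assignments[idx], Counter())[label] += 1
    return {c: counter.most_common(1)[0][0] for c, counter in votes.items()}

docs = [
    "match goal team", "match goal score", "team score match",
    "bank loan rate", "bank loan credit", "rate credit bank",
]
assignments = kmeans(vectorize(docs), k=2)
cluster_names = name_clusters(assignments, {0: "sports", 3: "finance"})
labels = [cluster_names.get(a, "unassigned") for a in assignments]
print(labels)
```

The cluster ids themselves are arbitrary and depend on the random start, which is why the names are attached only after convergence. On real documents the clusters may not correspond to the six labels at all, so the hand-checked sample matters.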

Use semi-supervised learning. You label a small fraction manually (say 1%), let the algorithm learn from it, have it label the unlabelled data, then retrain on those predictions and label again.

answered 20 mins ago by keiv.fly
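
The loop described in this answer is often called self-training. A minimal sketch, assuming a toy corpus, a deliberately simple word-overlap classifier, and a hypothetical confidence rule (`min_margin`) for deciding which predictions to accept; none of these details come from the answer:

```python
from collections import Counter

def train(labelled):
    """Build one word-frequency profile per label."""
    profiles = {}
    for text, label in labelled:
        profiles.setdefault(label, Counter()).update(text.lower().split())
    return profiles

def predict_with_margin(profiles, text):
    """Return (best_label, margin over the runner-up) from word-overlap scores."""
    words = text.lower().split()
    scores = sorted(
        ((sum(profile[w] for w in words), label) for label, profile in profiles.items()),
        reverse=True,
    )
    best_score, best_label = scores[0]
    runner_up = scores[1][0] if len(scores) > 1 else 0
    return best_label, best_score - runner_up

def self_train(labelled, unlabelled, min_margin=1, max_rounds=10):
    labelled, unlabelled = list(labelled), list(unlabelled)
    for _ in range(max_rounds):
        profiles = train(labelled)  # retrain on everything labelled so far
        confident, remaining = [], []
        for text in unlabelled:
            label, margin = predict_with_margin(profiles, text)
            (confident if margin >= min_margin else remaining).append((text, label))
        if not confident:  # nothing new was learned, so stop
            break
        labelled += confident  # accept confident predictions as labels
        unlabelled = [text for text, _ in remaining]
    return labelled, unlabelled

seed_labels = [("bank loan", "finance"), ("match goal", "sports")]
unlabelled_docs = ["loan credit", "credit mortgage", "goal team", "team striker"]

final_labelled, leftover = self_train(seed_labels, unlabelled_docs)
print(dict(final_labelled))
print(leftover)
```

Here "credit mortgage" shares no words with the seed documents and only becomes classifiable in the second round, after "loan credit" has been accepted as finance. The same mechanism can also reinforce early mistakes, so it is worth spot-checking what gets accepted. scikit-learn ships a `SelfTrainingClassifier` in `sklearn.semi_supervised` that wraps a real classifier in a similar loop.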
