Automated Labelling

Suppose I have been given 1000 documents and 6 labels. My job is to assign each document exactly one of the 6 labels, which are words, not numbers. How can I automate or semi-automate this process using data science?
Could I label some documents manually and then train a predictor on them? I suspect the accuracy won't be very high.
Are there any other approaches?

asked 1 hour ago

Rishabh Baid

New contributor

machine-learning clustering text-mining

2 Answers
You have two options. One is supervised learning: label a subset of the documents manually, use those labelled examples to train a model, and let the model predict the labels of the remaining documents.
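As a rough illustration of the supervised route, the sketch below trains a small multinomial Naive Bayes classifier on a few hand-labelled documents and predicts labels for the rest. The choice of Naive Bayes, the whitespace tokenizer, the two example labels, and the toy documents are all assumptions made for illustration; the answer does not name a specific model, and in practice a library implementation would usually replace this hand-written one.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def train_naive_bayes(docs, labels):
    """Collect per-label word counts from the hand-labelled documents."""
    word_counts = defaultdict(Counter)  # label -> word frequencies
    label_counts = Counter(labels)      # label -> number of documents
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior for `text`."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # ignore words never seen during training
            # Laplace (add-one) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][token] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: a few manually labelled documents ...
labelled_docs = [
    "the team won the football match",
    "the striker scored a late goal",
    "the central bank raised interest rates",
    "stock markets fell after the earnings report",
]
labels = ["sports", "sports", "finance", "finance"]
# ... and the remaining documents that still need a label.
unlabelled_docs = ["a goal in the final match", "interest rates and stock markets"]

model = train_naive_bayes(labelled_docs, labels)
predictions = [predict(model, doc) for doc in unlabelled_docs]
print(predictions)
```

With these toy documents, `predictions` comes out as `["sports", "finance"]`. Nothing in the code depends on the number of labels, so the same approach carries over to 6 labels, provided every label has at least a few hand-labelled examples.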

Alternatively, you can use unsupervised learning, which covers techniques that do not need labels. You can use k-means to cluster your data into $k=6$ clusters, then associate each cluster with one of your labels based on your own judgement of its contents.
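The clustering route could be sketched roughly as follows. This is a bare-bones k-means (Lloyd's algorithm) over bag-of-words count vectors, written with the standard library only. The vectorisation scheme, the fixed iteration count, the seed, and the toy documents are assumptions for illustration, and the toy example uses `k = 2` rather than the 6 clusters suggested above.

```python
import random
from collections import Counter, defaultdict


def vectorize(docs):
    """Bag-of-words count vectors over a shared, sorted vocabulary."""
    vocab = sorted({word for doc in docs for word in doc.lower().split()})
    vectors = []
    for doc in docs:
        counts = Counter(doc.lower().split())
        vectors.append([float(counts[word]) for word in vocab])
    return vectors


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, n_iter=20, seed=0):
    """Plain Lloyd's algorithm; returns one cluster id per vector."""
    rng = random.Random(seed)
    # Random initialisation: pick k of the documents as starting centroids.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    assignments = []
    for _ in range(n_iter):
        # Assignment step: each document joins its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments


# Hypothetical toy documents; the real input would be the 1000 documents.
docs = [
    "football match goal",
    "football team goal",
    "striker goal match",
    "bank interest rates",
    "bank stock rates",
    "stock market rates",
]
k = 2  # would be 6 for the question's setting
assignments = kmeans(vectorize(docs), k)

# Group documents by cluster so a human can inspect each group
# and decide which label it corresponds to.
clusters = defaultdict(list)
for doc, cluster_id in zip(docs, assignments):
    clusters[cluster_id].append(doc)
for cluster_id, members in sorted(clusters.items()):
    print(cluster_id, members)
```

The cluster ids themselves carry no meaning and can change with the seed, which is why the inspection step at the end matters: the label is attached to a cluster only after looking at what ended up in it.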

answered 53 mins ago

JahKnows


How should I use k-means here? The centroids are initialised randomly, so won't the clusters fail to match my labels? Would it be right to not initialise the centroids randomly?
– Rishabh Baid
48 mins ago

It's best to initialise them randomly to avoid introducing bias. Let the centroids converge, then assign one of your labels to each cluster.
– JahKnows
45 mins ago
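One possible way to do the final step, assigning a label to each cluster, is a majority vote over a handful of documents that were checked by hand. This is an assumption about how the attribution could be done, not something the comment specifies; the function name, the cluster ids, and the labels below are hypothetical.

```python
from collections import Counter, defaultdict


def map_clusters_to_labels(cluster_ids, known_labels):
    """Give each cluster the most common label among its hand-checked documents.

    cluster_ids:  cluster id for every document, e.g. the output of k-means
    known_labels: {document index: label} for the few documents labelled by hand
    """
    votes = defaultdict(Counter)
    for index, label in known_labels.items():
        votes[cluster_ids[index]][label] += 1
    return {
        cluster: counts.most_common(1)[0][0] for cluster, counts in votes.items()
    }


# Hypothetical clustering result for 8 documents and 3 clusters.
cluster_ids = [0, 0, 1, 1, 0, 1, 2, 2]
# Hypothetical manual labels for 5 of those documents.
known_labels = {0: "sports", 1: "sports", 2: "finance", 6: "politics", 7: "politics"}

mapping = map_clusters_to_labels(cluster_ids, known_labels)
# Documents in a cluster with no hand-checked member get None.
predicted = [mapping.get(cluster) for cluster in cluster_ids]
print(mapping)
print(predicted)
```

Here `mapping` is `{0: "sports", 1: "finance", 2: "politics"}`. Note that k-means gives no guarantee that clusters line up one-to-one with the intended labels: two clusters may receive the same label, or a label may end up with no cluster at all, so the result is worth checking before it is trusted.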

Semi-supervised learning. You label about 1% of the documents manually and let the algorithm learn from them. It then labels the unlabelled data, learns from its own predictions, and labels again.
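The loop described here is often called self-training. A minimal sketch, assuming a small Naive Bayes model as the learner and a confidence threshold for accepting its own predictions, might look like this. The model choice, the `threshold` of 0.75, the function names, and the toy documents are all assumptions for illustration.

```python
import math
from collections import Counter, defaultdict


def fit(pairs):
    """Fit multinomial Naive Bayes counts on (document, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter(label for _, label in pairs)
    vocab = set()
    for text, label in pairs:
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return word_counts, label_counts, vocab


def predict_proba(model, text):
    """Return {label: posterior probability} for one document."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    log_scores = {}
    for label, n_docs in label_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)
        for token in text.lower().split():
            if token in vocab:  # unseen words are ignored
                score += math.log(
                    (word_counts[label][token] + 1) / (total_words + len(vocab))
                )
        log_scores[label] = score
    # Normalise in a numerically stable way.
    top = max(log_scores.values())
    exp_scores = {label: math.exp(s - top) for label, s in log_scores.items()}
    norm = sum(exp_scores.values())
    return {label: e / norm for label, e in exp_scores.items()}


def self_train(seed_pairs, unlabelled, threshold=0.75, max_rounds=10):
    """Repeatedly train, then adopt predictions the model is confident about."""
    labelled, pool = list(seed_pairs), list(unlabelled)
    for _ in range(max_rounds):
        if not pool:
            break
        model = fit(labelled)
        confident, remaining = [], []
        for doc in pool:
            probs = predict_proba(model, doc)
            label = max(probs, key=probs.get)
            if probs[label] >= threshold:
                confident.append((doc, label))
            else:
                remaining.append(doc)
        if not confident:
            break  # nothing new was learned; stop early
        labelled.extend(confident)
        pool = remaining
    return labelled, pool


# Hypothetical toy data: two hand-labelled seeds and four unlabelled documents.
seed_pairs = [("football match goal", "sports"), ("bank interest rates", "finance")]
unlabelled = [
    "football team goal",
    "bank stock rates",
    "team goal striker",
    "stock rates market",
]
labelled_out, still_unlabelled = self_train(seed_pairs, unlabelled)
print(labelled_out)
print(still_unlabelled)
```

In this toy run, the first round only accepts "football team goal" and "bank stock rates". The other two documents become confident predictions in the second round, once the words "team" and "stock" have been learned from the first round's pseudo-labels. The same mechanism can also reinforce early mistakes, so spot-checking a sample of the automatically assigned labels is a sensible precaution.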

answered 20 mins ago

keiv.fly

