Automated Labelling
Let's say I have been given 1000 documents and 6 labels by someone. My job is to assign each of these 1000 documents to exactly one of the 6 labels, which are words, not numbers. How can I automate or semi-automate this process using data science?
Can I manually label some of the documents, train a model on them, and use it to predict the rest? I suspect the accuracy won't be very high.
Are there any other solutions besides this one?
machine-learning clustering text-mining
asked 1 hour ago


Rishabh Baid
2 Answers
You have two options. The first is supervised learning, where you label part of the data manually, use those data points to train a model, and then predict the remaining instances.
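As a rough illustration of the supervised route, here is a minimal sketch that uses only the Python standard library. The bag-of-words representation and the nearest-centroid rule are my assumptions, not something the approach requires, and the toy documents and the label names `sport` and `finance` are made up. In practice you would probably swap in a library classifier.

```python
from collections import Counter, defaultdict
import math


def bag_of_words(text):
    """Lower-case the text and count whitespace-separated tokens."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def train(labelled_docs):
    """Sum the word counts of all hand-labelled documents, per label."""
    centroids = defaultdict(Counter)
    for text, label in labelled_docs:
        centroids[label].update(bag_of_words(text))
    return dict(centroids)


def predict(centroids, text):
    """Return the label whose summed vector is most similar to the document."""
    vec = bag_of_words(text)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Hypothetical hand-labelled subset (two labels only, to keep the toy small).
labelled = [
    ("the team won the football match", "sport"),
    ("the striker scored a late goal", "sport"),
    ("the central bank raised interest rates", "finance"),
    ("stock markets fell after the bank report", "finance"),
]

model = train(labelled)
prediction = predict(model, "the bank cut interest rates again")
```

With the real data you would hold back some of the hand-labelled documents to measure accuracy before trusting the predictions on the rest.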
Alternatively, you can use unsupervised learning, which covers techniques that do not need labels. For example, you can use k-means to group your data into $k=6$ clusters and then associate each cluster with one of your labels based on your experience.
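To make the clustering route concrete, here is a small standard-library sketch of plain k-means (Lloyd's algorithm) over word-count vectors. The vectorisation, the `seed` and `max_iter` parameters, and the toy documents are assumptions for illustration; the toy run uses `k=2` only because it has four documents.

```python
import random
from collections import Counter


def vectorise(docs):
    """Turn each document into a dense word-count vector over a shared vocabulary."""
    vocab = sorted({tok for d in docs for tok in d.lower().split()})
    vectors = []
    for d in docs:
        counts = Counter(d.lower().split())
        vectors.append([float(counts[t]) for t in vocab])
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two dense vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, seed=0, max_iter=100):
    """Cluster the vectors into k groups, starting from randomly chosen documents."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    assignment = None
    for _ in range(max_iter):
        # Assign every document to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors
        ]
        if new_assignment == assignment:
            break  # assignments stopped changing, so the centroids have converged
        assignment = new_assignment
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


# Hypothetical toy documents.
docs = [
    "football match goal",
    "football team match",
    "bank interest rates",
    "bank stock markets",
]
vectors = vectorise(docs)
clusters = kmeans(vectors, k=2)
```

For the 1000 documents you would call `kmeans(vectors, k=6)`. The returned cluster ids are arbitrary numbers, so they still have to be matched to the six labels by looking at what each cluster contains.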
answered 53 mins ago


JahKnows
How would I use k-means here? The centroids are initialised randomly, so won't the clusters fail to line up with my labels? Would it be right to not initialise the centroids randomly in this case?
– Rishabh Baid
48 mins ago
It's best to initialize them randomly to avoid introducing bias. Let the centroids converge, then assign one of your labels to each cluster.
– JahKnows
45 mins ago
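One possible way to do that last step, sketched under my own assumptions: inspect a few documents by hand and give each cluster the label that appears most often among its inspected members. The function name `name_clusters`, the cluster ids, and the hand labels below are all hypothetical.

```python
from collections import Counter, defaultdict


def name_clusters(cluster_ids, hand_labels):
    """Map each cluster id to the most common hand-assigned label inside it.

    cluster_ids: list with one cluster id per document.
    hand_labels: dict {document index: label} for the documents checked by hand.
    """
    votes = defaultdict(Counter)
    for doc_index, label in hand_labels.items():
        votes[cluster_ids[doc_index]][label] += 1
    return {cluster: counts.most_common(1)[0][0] for cluster, counts in votes.items()}


# Hypothetical output of a clustering run over eight documents.
cluster_ids = [0, 0, 1, 1, 0, 1, 0, 1]
# Hypothetical labels from reading three of those documents.
hand_labels = {0: "sport", 2: "finance", 4: "sport"}

cluster_names = name_clusters(cluster_ids, hand_labels)
# Clusters with no inspected document get None and need a manual look.
doc_labels = [cluster_names.get(c) for c in cluster_ids]
```

If two clusters end up with the same label, or one cluster mixes several labels, that is a sign the clusters do not match your label set well.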
Use semi-supervised learning. You label a small fraction (say 1%) manually and let the algorithm learn from it; it then labels the unlabelled data, learns from its own predictions, and labels again.
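The loop described here is often called self-training. Below is a minimal standard-library sketch of it, assuming a nearest-centroid classifier over word counts and a confidence `threshold` of 0.3; those choices, the function names, and the toy documents are mine and only meant to show the shape of the loop.

```python
from collections import Counter, defaultdict
import math


def bag_of_words(text):
    """Lower-case the text and count whitespace-separated tokens."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def fit(labelled):
    """Build one summed word-count vector per label."""
    centroids = defaultdict(Counter)
    for text, label in labelled:
        centroids[label].update(bag_of_words(text))
    return centroids


def self_train(labelled, unlabelled, threshold=0.3, max_rounds=10):
    """Repeatedly pseudo-label confident documents and retrain on them."""
    labelled = list(labelled)
    remaining = list(unlabelled)
    for _ in range(max_rounds):
        centroids = fit(labelled)
        still_unsure = []
        for text in remaining:
            vec = bag_of_words(text)
            scores = {label: cosine(vec, c) for label, c in centroids.items()}
            best = max(scores, key=scores.get)
            if scores[best] >= threshold:
                labelled.append((text, best))  # accept the pseudo-label
            else:
                still_unsure.append(text)  # try again in the next round
        if len(still_unsure) == len(remaining):
            break  # nothing new was labelled, so stop
        remaining = still_unsure
    return labelled, remaining


# Hypothetical seed set labelled by hand, plus unlabelled documents.
seed = [("football match goal", "sport"), ("bank interest rates", "finance")]
unlabelled = [
    "the football team won the match",
    "the team won again",
    "interest rates and stock markets",
    "stock markets fell",
]

final_labelled, leftover = self_train(seed, unlabelled)
```

In this toy run two documents share no words with the seed set and only get labelled in the second round, after the model has learned from its first batch of pseudo-labels. Anything left in `leftover` is a candidate for manual labelling, and a wrong pseudo-label can reinforce itself, so spot-check the results.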
answered 20 mins ago
keiv.fly