Which type of data normalizing should be used with KNN?

Clash Royale CLAN TAG#URR8PPP

.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty margin-bottom:0;

up vote
2
down vote

favorite

I know that there is more than two type of normalizing.

For example,

1- Transforming data using a z-score or t-score. This is usually called standardization.

2- Rescaling data to have values between 0 and 1.

The question now if I need normalizing

Which type of data normalizing should be used with KNN? and Why?

edited Aug 25 at 10:16

asked Aug 25 at 8:59

jeza

17119

add a commentÂ |Â

up vote
2
down vote

favorite

I know that there is more than two type of normalizing.

For example,

1- Transforming data using a z-score or t-score. This is usually called standardization.

2- Rescaling data to have values between 0 and 1.

The question now if I need normalizing

Which type of data normalizing should be used with KNN? and Why?

edited Aug 25 at 10:16

asked Aug 25 at 8:59

jeza

17119

add a commentÂ |Â

up vote
2
down vote

favorite

I know that there is more than two type of normalizing.

For example,

1- Transforming data using a z-score or t-score. This is usually called standardization.

2- Rescaling data to have values between 0 and 1.

The question now if I need normalizing

Which type of data normalizing should be used with KNN? and Why?

edited Aug 25 at 10:16

asked Aug 25 at 8:59

jeza

17119

I know that there is more than two type of normalizing.

For example,

1- Transforming data using a z-score or t-score. This is usually called standardization.

2- Rescaling data to have values between 0 and 1.

The question now if I need normalizing

Which type of data normalizing should be used with KNN? and Why?

edited Aug 25 at 10:16

asked Aug 25 at 8:59

jeza

17119

edited Aug 25 at 10:16

asked Aug 25 at 8:59

jeza

17119

asked Aug 25 at 8:59

jeza

17119

asked Aug 25 at 8:59

jeza

17119

add a commentÂ |Â

1 Answer
1

active

oldest

votes

up vote
4
down vote

accepted

For k-NN, I'd suggest normalizing the data between $0$ and $1$.

k-NN uses the Euclidean distance, as its means of comparing examples. To calculate the distance between two points $x_1 = (f_1^1, f_1^2, ..., f_1^M)$ and $x_2 = (f_2^1, f_2^2, ..., f_2^M)$, where $f_1^i$ is the value of the $i$-th feature of $x_1$:

$$
d(x_1, x_2) = sqrt(f_1^1 - f_2^1)^2 + (f_1^2 - f_2^2)^2 + ... + (f_1^M - f_2^M)^2
$$

In order for all of the features to be of equal importance when calculating the distance, the features must have the same range of values. This is only achievable through normalization.

If they were not normalized and for instance feature $f^1$ had a range of values in $[0, 1$), while $f^2$ had a range of values in $[1, 10)$. When calculating the distance, the second term would be $10$ times important than the first, leading k-NN to rely more on the second feature than the first. Normalization ensures that all features are mapped to the same range of values.

Standardization, on the other hand, does have many useful properties, but can't ensure that the features are mapped to the same range. While standardization may be best suited for other classifiers, this is not the case for k-NN or any other distance-based classifier.

answered Aug 25 at 11:40

Djib2011

1,607616

2

Is your answer will be the same if I used different distance instead of Euclidean distance (for example Manhattan distance or other distance even fractional distance)? Also If the range of the variables is approximately close to each other.
â€“Â jeza
Aug 25 at 12:33

2

Yes I just showed Euclidean distance as an example, but all distance metrics suffer from the same thing. If the ranges are close to one another then it wouldn't affect the calculation of the metric that much, but it still would. For example if $f^1 in [0, 1)$ and $f^2 in [0, 1.2)$, $f^2$ would still be $20%$ more important than $f^1$. One thing I forgot to mention was that standardizing, obviously, is much better than not performing any feature scaling; it is simply worse than normalization.
â€“Â Djib2011
Aug 25 at 13:07

Ah I see. "it is simply worse than normalization"!?
â€“Â jeza
Aug 25 at 13:24

add a commentÂ |Â

Your Answer

StackExchange.ifUsing("editor", function ()
return StackExchange.using("mathjaxEditing", function ()
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\$","\$"]]);
);
);
, "mathjax-editing");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "65"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
convertImagesToLinks: false,
noModals: false,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstats.stackexchange.com%2fquestions%2f363889%2fwhich-type-of-data-normalizing-should-be-used-with-knn%23new-answer', 'question_page');

);

Post as a guest

Name

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

up vote
4
down vote

accepted

For k-NN, I'd suggest normalizing the data between $0$ and $1$.

$$
d(x_1, x_2) = sqrt(f_1^1 - f_2^1)^2 + (f_1^2 - f_2^2)^2 + ... + (f_1^M - f_2^M)^2
$$

In order for all of the features to be of equal importance when calculating the distance, the features must have the same range of values. This is only achievable through normalization.

answered Aug 25 at 11:40

Djib2011

1,607616

2

Is your answer will be the same if I used different distance instead of Euclidean distance (for example Manhattan distance or other distance even fractional distance)? Also If the range of the variables is approximately close to each other.
â€“Â jeza
Aug 25 at 12:33

2

Yes I just showed Euclidean distance as an example, but all distance metrics suffer from the same thing. If the ranges are close to one another then it wouldn't affect the calculation of the metric that much, but it still would. For example if $f^1 in [0, 1)$ and $f^2 in [0, 1.2)$, $f^2$ would still be $20%$ more important than $f^1$. One thing I forgot to mention was that standardizing, obviously, is much better than not performing any feature scaling; it is simply worse than normalization.
â€“Â Djib2011
Aug 25 at 13:07

Ah I see. "it is simply worse than normalization"!?
â€“Â jeza
Aug 25 at 13:24

add a commentÂ |Â

up vote
4
down vote

accepted

For k-NN, I'd suggest normalizing the data between $0$ and $1$.

$$
d(x_1, x_2) = sqrt(f_1^1 - f_2^1)^2 + (f_1^2 - f_2^2)^2 + ... + (f_1^M - f_2^M)^2
$$

In order for all of the features to be of equal importance when calculating the distance, the features must have the same range of values. This is only achievable through normalization.

answered Aug 25 at 11:40

Djib2011

1,607616

2

Is your answer will be the same if I used different distance instead of Euclidean distance (for example Manhattan distance or other distance even fractional distance)? Also If the range of the variables is approximately close to each other.
â€“Â jeza
Aug 25 at 12:33

2

Yes I just showed Euclidean distance as an example, but all distance metrics suffer from the same thing. If the ranges are close to one another then it wouldn't affect the calculation of the metric that much, but it still would. For example if $f^1 in [0, 1)$ and $f^2 in [0, 1.2)$, $f^2$ would still be $20%$ more important than $f^1$. One thing I forgot to mention was that standardizing, obviously, is much better than not performing any feature scaling; it is simply worse than normalization.
â€“Â Djib2011
Aug 25 at 13:07

Ah I see. "it is simply worse than normalization"!?
â€“Â jeza
Aug 25 at 13:24

add a commentÂ |Â

up vote
4
down vote

accepted

For k-NN, I'd suggest normalizing the data between $0$ and $1$.

$$
d(x_1, x_2) = sqrt(f_1^1 - f_2^1)^2 + (f_1^2 - f_2^2)^2 + ... + (f_1^M - f_2^M)^2
$$

In order for all of the features to be of equal importance when calculating the distance, the features must have the same range of values. This is only achievable through normalization.

answered Aug 25 at 11:40

Djib2011

1,607616

For k-NN, I'd suggest normalizing the data between $0$ and $1$.

$$
d(x_1, x_2) = sqrt(f_1^1 - f_2^1)^2 + (f_1^2 - f_2^2)^2 + ... + (f_1^M - f_2^M)^2
$$

In order for all of the features to be of equal importance when calculating the distance, the features must have the same range of values. This is only achievable through normalization.

answered Aug 25 at 11:40

Djib2011

1,607616

answered Aug 25 at 11:40

Djib2011

1,607616

answered Aug 25 at 11:40

Djib2011

1,607616

answered Aug 25 at 11:40

Djib2011

1,607616

2

Is your answer will be the same if I used different distance instead of Euclidean distance (for example Manhattan distance or other distance even fractional distance)? Also If the range of the variables is approximately close to each other.
â€“Â jeza
Aug 25 at 12:33

2

Yes I just showed Euclidean distance as an example, but all distance metrics suffer from the same thing. If the ranges are close to one another then it wouldn't affect the calculation of the metric that much, but it still would. For example if $f^1 in [0, 1)$ and $f^2 in [0, 1.2)$, $f^2$ would still be $20%$ more important than $f^1$. One thing I forgot to mention was that standardizing, obviously, is much better than not performing any feature scaling; it is simply worse than normalization.
â€“Â Djib2011
Aug 25 at 13:07

Ah I see. "it is simply worse than normalization"!?
â€“Â jeza
Aug 25 at 13:24

add a commentÂ |Â

2

Is your answer will be the same if I used different distance instead of Euclidean distance (for example Manhattan distance or other distance even fractional distance)? Also If the range of the variables is approximately close to each other.
â€“Â jeza
Aug 25 at 12:33

2

Yes I just showed Euclidean distance as an example, but all distance metrics suffer from the same thing. If the ranges are close to one another then it wouldn't affect the calculation of the metric that much, but it still would. For example if $f^1 in [0, 1)$ and $f^2 in [0, 1.2)$, $f^2$ would still be $20%$ more important than $f^1$. One thing I forgot to mention was that standardizing, obviously, is much better than not performing any feature scaling; it is simply worse than normalization.
â€“Â Djib2011
Aug 25 at 13:07

Ah I see. "it is simply worse than normalization"!?
â€“Â jeza
Aug 25 at 13:24

Is your answer will be the same if I used different distance instead of Euclidean distance (for example Manhattan distance or other distance even fractional distance)? Also If the range of the variables is approximately close to each other.
â€“Â jeza
Aug 25 at 12:33

Yes I just showed Euclidean distance as an example, but all distance metrics suffer from the same thing. If the ranges are close to one another then it wouldn't affect the calculation of the metric that much, but it still would. For example if $f^1 in [0, 1)$ and $f^2 in [0, 1.2)$, $f^2$ would still be $20%$ more important than $f^1$. One thing I forgot to mention was that standardizing, obviously, is much better than not performing any feature scaling; it is simply worse than normalization.
â€“Â Djib2011
Aug 25 at 13:07

Ah I see. "it is simply worse than normalization"!?
â€“Â jeza
Aug 25 at 13:24

add a commentÂ |Â

draft saved

draft discarded

draft saved

draft discarded

Post as a guest

Name

Search This Blog

Iyfjky