What is the reason behind taking log for few continuous variables?

I have been working on a classification problem, and I have read many people's code and tutorials. One thing I've noticed is that many people apply np.log to continuous variables such as loan_amount or applicant_income.

I just want to understand the reason behind it. Does it help improve model prediction accuracy? Is it mandatory, or is there some logic behind it?

Please give me some explanation. Thank you.
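For context, the transformation being asked about typically looks like this (a minimal sketch; the column names come from the question, the values are made up):

```python
import numpy as np
import pandas as pd

# Toy data with the column names mentioned in the question (values invented).
df = pd.DataFrame({
    "loan_amount": [5000, 12000, 250000, 100],
    "applicant_income": [30000, 45000, 1200000, 18000],
})

# log1p computes log(1 + x), which is often preferred over np.log
# because it handles zero values without producing -inf.
df["log_loan_amount"] = np.log1p(df["loan_amount"])
df["log_applicant_income"] = np.log1p(df["applicant_income"])

print(df[["loan_amount", "log_loan_amount"]])
```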










Tags: machine-learning, python, classification, scikit-learn






Asked 49 mins ago by Sai Kumar, a new contributor; edited 44 mins ago.
2 Answers
Mostly because of skewed distributions. The logarithm naturally reduces the dynamic range of a variable, so differences are preserved while the scale is no longer so dramatically skewed. Imagine some people got a 100,000,000 loan, some got 10,000, and some got 0. Any feature scaling will likely put 0 and 10,000 very close to each other, because the biggest number pushes out the boundary. Taking the logarithm solves the issue.
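A quick numeric sketch of this point, using log10 of 1 + x so that a zero loan is handled:

```python
import numpy as np

loans = np.array([0, 10_000, 100_000_000], dtype=float)

# Min-max scaling maps the raw values to [0, 1]; the huge loan dominates,
# so 0 and 10,000 end up almost indistinguishable (0 vs 0.0001).
scaled = (loans - loans.min()) / (loans.max() - loans.min())
print(scaled)   # ≈ [0, 1e-4, 1]

# log10(1 + x) compresses the dynamic range while keeping the ordering
# and the relative differences visible.
logged = np.log10(1 + loans)
print(logged)   # ≈ [0, 4, 8]
```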






answered 37 mins ago by Kasra Manshaei
• Manshaei, so I can use MinMaxScaler or StandardScaler, right? Or is it necessary to take the log?
  – Sai Kumar
  31 mins ago











          • Necessary. If you use scalers they compress small values dramatically. That's what I meant to say.
            – Kasra Manshaei
            30 mins ago










          • I didn't get you here. Can you explain?
            – Sai Kumar
            29 mins ago






• Yes. Take the values 1,000,000,000, 10,000 and 0 into account: in many cases the first one is too big to let the others be seen properly by your model. But if you take the base-10 logarithm you get roughly 9, 4 and 0 respectively. As you see, the dynamic range is reduced while the differences are almost preserved. It comes from any exponential nature in your feature. In those cases you need the logarithm, as the other answer depicted. Hope it helped :)
  – Kasra Manshaei
  26 mins ago






• Well, scaling! Imagine two variables with normal distributions (so there is no need for a logarithm), but one on a scale of tens and the other on a scale of millions. Again, feeding them to the model makes the small one invisible. In this case you use scalers to make their scales comparable.
  – Kasra Manshaei
  20 mins ago
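The two ideas discussed in these comments are complementary and often combined: take the log first to fix the shape, then scale to fix the range. A sketch using scikit-learn (which the question is tagged with), on simulated log-normal incomes:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Simulated log-normal incomes: heavily right-skewed, like real income data.
income = rng.lognormal(mean=10, sigma=1.5, size=1000).reshape(-1, 1)

# Scaling alone preserves the skew: most standardized values sit below 0,
# with a long tail of large positive outliers, so the median is negative.
scaled_raw = StandardScaler().fit_transform(income)

# Log first, then scale: the result is roughly symmetric around 0.
scaled_log = StandardScaler().fit_transform(np.log1p(income))

print(np.median(scaled_raw))   # noticeably below 0
print(np.median(scaled_log))   # close to 0
```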

















This is done when the variables span several orders of magnitude. Income is a typical example: its distribution follows a power law, meaning that the vast majority of incomes are small and very few are big.

This type of "fat-tailed" distribution is studied on a logarithmic scale because of the mathematical properties of the logarithm:

$$\log(x^n) = n \log(x)$$

which implies

$$\log(10^4) = 4 \log(10)$$

and

$$\log(10^3) = 3 \log(10)$$
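In other words, the logarithm turns equal ratios into equal distances: an income of 10,000 and one of 100,000 end up exactly as far apart as 100,000 and 1,000,000. A quick check:

```python
import math

# On a log scale, equal multiplicative factors become equal distances.
a = math.log10(100_000) - math.log10(10_000)      # a 10x ratio
b = math.log10(1_000_000) - math.log10(100_000)   # also a 10x ratio
print(a, b)  # both 1.0
```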






answered 34 mins ago by Duccio Piovani, a new contributor; edited 28 mins ago.
• Nice answer, especially the discussion of exponential distributions.
  – Kasra Manshaei
  16 mins ago









