Why does Random Forest variable importance not sum to 100%?

The randomForest package in R has the importance() function, which reports two measures of variable importance: the mean decrease in node impurity and the mean permutation importance. Why do the mean permutation importances not sum to 100%?



Here's a simple reproducible example:



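library(randomForest)
data(iris)
# the argument is ntree (singular); "ntrees" is not a randomForest() argument
iris.rf <- randomForest(Species ~ ., data = iris, ntree = 1000, importance = TRUE)
imp <- importance(iris.rf, type = 1)  # type = 1: permutation importance (mean decrease in accuracy)
sum_imp <- sum(imp)
sum_imp  # != 100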


Thanks







  • Why do you assume it should sum to 1? I see no reason for that belief.
    – Firebug
    Sep 8 at 18:31











  • See Measures of variable importance in random forests
    – Firebug
    Sep 8 at 21:22
















asked Sep 8 at 14:02









Micha

1 Answer
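As far as I can tell, variable importance measures either: a) how much the out-of-bag prediction error increases when the values of the variable are randomly permuted (the mean decrease in accuracy for classification), or b) the total decrease in node impurity (the Gini index for classification) from splits on that variable. Both are averaged over all trees in the forest. Neither of these is a probability or a share of some fixed total, so there's no reason they should add up to 100%. Note also that by default importance() divides the permutation measure by its standard error (scale = TRUE), and that permutation values can be negative when shuffling a variable happens to improve predictions.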
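To make measure a) concrete, here is a minimal hand-rolled sketch of permutation importance. It is a simplified holdout-set variant, not randomForest's own computation (which permutes each variable within every tree's out-of-bag sample and averages the per-tree differences). The 100/50 train/test split, the seed, and the names `accuracy` and `perm_imp` are illustrative assumptions. Each value is a difference in accuracy, so the values have no reason to sum to 1 or 100%.

```r
library(randomForest)

set.seed(1)  # illustrative seed
train_idx <- sample(nrow(iris), 100)  # hypothetical 100/50 train/test split
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

rf <- randomForest(Species ~ ., data = train, ntree = 500)

# Hypothetical helper: fraction of correctly classified rows
accuracy <- function(model, data) mean(predict(model, newdata = data) == data$Species)

base_acc <- accuracy(rf, test)

# For each predictor, shuffle its column in the test set and record the drop in accuracy
predictors <- setdiff(names(iris), "Species")
perm_imp <- sapply(predictors, function(v) {
  shuffled <- test
  shuffled[[v]] <- sample(shuffled[[v]])
  base_acc - accuracy(rf, shuffled)
})

perm_imp       # one accuracy difference per predictor (can be zero or negative)
sum(perm_imp)  # no reason for this to equal 1 (or 100%)
```

Shuffling, rather than dropping the column, keeps the fitted model unchanged while breaking the link between that variable and the response.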



You can, of course, divide each importance by the sum of all importances to get a percentage, but I think that would create confusion: you now have a percentage of what, exactly?
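If you do want shares anyway, here is a minimal sketch, assuming the same iris example. It rescales the Gini-based measure (type = 2), which is never negative, so each share can at least be read as "fraction of the summed impurity decrease". The permutation measure is shown raw (scale = FALSE) and with the default scaling for comparison. The seed and the variable names are illustrative.

```r
library(randomForest)

set.seed(42)  # illustrative seed
iris.rf <- randomForest(Species ~ ., data = iris, ntree = 1000, importance = TRUE)

# Permutation importance: raw mean decrease in accuracy vs. the default,
# which divides it by its standard error (scale = TRUE)
mda_raw    <- importance(iris.rf, type = 1, scale = FALSE)[, "MeanDecreaseAccuracy"]
mda_scaled <- importance(iris.rf, type = 1)[, "MeanDecreaseAccuracy"]

# Gini importance: mean decrease in node impurity, never negative
gini <- importance(iris.rf, type = 2)[, "MeanDecreaseGini"]

# Shares of the summed Gini importance; these sum to 100 by construction
gini_pct <- 100 * gini / sum(gini)

round(cbind(mda_raw, mda_scaled, gini, gini_pct), 3)
```

With the permutation measure, negative values would make such shares misleading, which is one more reason the resulting percentage is hard to interpret.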



(Welcome to the site!)






  • Thanks for the welcome. I expect I'll be back :-)
    – Micha
    Sep 9 at 6:50










answered Sep 8 at 14:43









Wayne
