Need advice on evaluating forecast accuracy in R

I'm trying to evaluate some software for forecast accuracy. It works by summing up the orders from a number of locations for each month, then determines the best model out of a set of candidates as the one that generates the minimum MSE. It then uses that model to forecast the demand for each location. For example, for Jan-Jun, Location A has demand (1,0,2,0,0,3) and Location B has demand (2,1,0,0,3,1). The aggregate would be A+B = (3,1,2,0,3,4). The software would then build models using SES, Holt, a moving average, Croston's method and a weighted average. The one that produces the smallest in-sample MSE would be chosen to build the forecast for July. It would then do the same thing again for August, once the actual demand for July is known. It continues this way and may change the forecasting method each month based on the minimum MSE, so it might generate the forecasts for Jul-Dec using, for example, the sequence (SES, SES, MA, Croston's, SES, Holt).



I currently have data from Jan 2016 to Dec 2017 (24 months), and I'm looking for advice on how to assess how well the tool forecasts. I thought about using tsCV, but that assumes the same model is applied at every step of the rolling forecast, which isn't the case here.
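For concreteness, here is a rough sketch in R (using the forecast package) of the month-by-month selection scheme described above. The 24-month toy series and the use of meanf() as a stand-in for the tool's proprietary weighted average are assumptions for illustration, not the actual software:

    library(forecast)

    # Aggregate monthly demand across locations; the first six values follow the
    # A+B example above, the rest are invented purely for illustration.
    agg <- ts(c(3, 1, 2, 0, 3, 4,  2, 0, 1, 3, 0, 2,
                1, 4, 0, 2, 3, 1,  0, 2, 1, 3, 2, 4),
              start = c(2016, 1), frequency = 12)

    # Candidate methods, re-estimated at every forecast origin.
    candidates <- list(
      ses     = function(y) ses(y, h = 1),
      holt    = function(y) holt(y, h = 1),
      mean    = function(y) meanf(y, h = 1),    # stand-in for the weighted average
      croston = function(y) croston(y, h = 1)
    )

    results <- data.frame()
    for (i in 6:(length(agg) - 1)) {            # origins: end of month 6, 7, ...
      train <- subset(agg, end = i)
      fits  <- lapply(candidates, function(f) f(train))
      mses  <- sapply(fits, function(fc) mean(residuals(fc)^2, na.rm = TRUE))
      best  <- names(which.min(mses))           # minimum in-sample MSE, as the tool does
      results <- rbind(results, data.frame(
        origin   = i,
        method   = best,
        forecast = as.numeric(fits[[best]]$mean[1]),
        actual   = as.numeric(agg[i + 1])))
    }

    results                                             # which method was picked at each origin
    sqrt(mean((results$actual - results$forecast)^2))   # out-of-sample RMSE of the whole procedure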










time-series forecasting cross-validation






asked 2 hours ago (edited 2 hours ago) by Angus







  • @SecretAgentMan: MAD/Mean is not a good idea, especially for intermittent demands. If you try to minimize this error measure, you may end up with "optimal" flat zero forecasts. See here for details and a few pointers to literature, and here for why this effect occurs.
    – Stephan Kolassa
    52 mins ago










  • @SecretAgentMan: the Smart-Willemain method is nice, but it cannot deal with dynamics in the time series, like trend, seasonality or causal factors. In addition, it is patented, which may be an IP problem for some practitioners.
    – Stephan Kolassa
    50 mins ago










  • @StephanKolassa, Thank you for the clarification. Since you're probably the SK from the forecasting book I have, I'll remove my comment until I find evidence to present with it.
    – SecretAgentMan
    44 mins ago










  • @StephanKolassa at the Foresight Practitioner conference, there were talks specifically encouraging using this metric for ID (intermittent demand). Further, Willemain gives out R code for the Markov bootstrap (without the jittering), but your point on the IP is well-taken. Thanks for the links.
    – SecretAgentMan
    41 mins ago
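A tiny numerical illustration of the point made in the first comment above (the toy series is invented, not from the thread): on an intermittent series where most months are zero, a flat zero forecast scores better on MAD/Mean than an unbiased forecast of the historical mean, even though it is useless for planning.

    # Toy intermittent-demand series: 9 of 12 months are zero.
    y <- c(0, 0, 3, 0, 0, 0, 2, 0, 0, 1, 0, 0)

    mad_over_mean <- function(actual, forecast) {
      mean(abs(actual - forecast)) / mean(actual)
    }

    mad_over_mean(y, 0)        # flat zero forecast:  1.0
    mad_over_mean(y, mean(y))  # forecast the mean:   1.5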
















1 Answer
First off, don't use the in-sample accuracy to choose a model. This will invariably lead to overfitting. In-sample accuracy is not a good guide to out-of-sample prediction. Instead, use a holdout sample.



Regarding your main question: again, use a holdout sample to see how well your algorithm performs on truly new data.



Thus, if you are interested in $h$-month-ahead forecasts:



  1. Fit your models to the data except for the last $2h$ months.

  2. Forecast all of them out to a horizon of $h$ months. Note the forecast error of each model, using RMSE or a similar measure.

  3. Pick the model that performed best. Re-fit this model to the data except the last $h$ months. Forecast $h$ months ahead. Note the forecast error.

Do this for all your time series. Check how well this algorithm worked, and compare it to the performance of a few very simple benchmark methods, like always forecasting the historical mean or the last observation. Also consider taking the average of all your candidate models' forecasts: averages of forecasts often outperform choosing the "best" method by some criterion.
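A minimal R sketch of this two-stage holdout scheme, assuming the forecast package, $h = 6$, and an illustrative candidate set (the tool's actual weighted-average method is not reproduced here):

    library(forecast)

    evaluate_selection <- function(y, h) {
      n <- length(y)
      candidates <- list(ses = ses, holt = holt, mean = meanf, croston = croston)

      # Steps 1-2: fit on everything except the last 2h months, then score
      # each candidate on the following h months.
      train1 <- subset(y, end = n - 2 * h)
      valid  <- subset(y, start = n - 2 * h + 1, end = n - h)
      rmse1  <- sapply(candidates, function(f)
        sqrt(mean((valid - f(train1, h = h)$mean)^2)))

      # Step 3: refit the winner on everything except the last h months and
      # measure its error on the final, untouched h months.
      best   <- names(which.min(rmse1))
      train2 <- subset(y, end = n - h)
      test   <- subset(y, start = n - h + 1)
      fc     <- candidates[[best]](train2, h = h)$mean
      list(method = best, rmse = sqrt(mean((test - fc)^2)))
    }

    # Example with a made-up 24-month aggregate series like the one in the question.
    agg <- ts(c(3, 1, 2, 0, 3, 4,  2, 0, 1, 3, 0, 2,
                1, 4, 0, 2, 3, 1,  0, 2, 1, 3, 2, 4),
              start = c(2016, 1), frequency = 12)
    evaluate_selection(agg, h = 6)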






answered 2 hours ago by Stephan Kolassa
  • Stephan Kolassa is perfectly right. I'd just like to add a purely business criterion: which forecast will have the least negative impact on the business in case of error?
    – AlainD
    1 hour ago










  • +1, " don't use the in-sample accuracy to choose a model"
    – SecretAgentMan
    58 mins ago










  • Stephan, if I only have 12 months of data to forecast month 13, could I still use the forecast package with tsCV?
    – Angus
    9 mins ago









