Need advice on evaluating forecast accuracy in R
I'm trying to evaluate some software for forecast accuracy. It works by summing up the orders from a number of locations for each month, then choosing the best model out of a set of candidates as the one that generates the minimum MSE. It then uses that model to forecast the demand for each location. For example, for Jan-Jun, Location A has demand (1,0,2,0,0,3) and Location B has demand (2,1,0,0,3,1). The aggregate would be A+B = (3,1,2,0,3,4). The software would then build models using ses, holt, MA, Croston's and a weighted average. The one that produces the smallest MSE (in-sample) would be chosen to build the forecast for July. It would then do the same thing again for August, once the actual demand for July is available. It continues this way and may change the forecasting method each month based on the minimum MSE. Therefore, it may generate forecasts for July-Dec using a sequence of methods such as (ses, ses, MA, Croston's, ses, holt).
I currently have data from Jan 2016 to Dec 2017 (24 months) and I'm looking for advice on how to assess how well the tool forecasts. I thought about using tsCV(), but that assumes the same model will be applied at each month in a rolling forecast, which isn't the case here.
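To make this concrete, here is a rough sketch of the per-month re-selection as I understand it, using the forecast package; the candidate set and the one-step horizon are my simplification, not necessarily what the software actually does:

```r
## Rough sketch only: simplified candidate set, one-step-ahead forecast.
library(forecast)

# Aggregated monthly demand, e.g. Location A + Location B from the example above
agg <- ts(c(1, 0, 2, 0, 0, 3) + c(2, 1, 0, 0, 3, 1),
          start = c(2016, 1), frequency = 12)

one_step <- function(train) {
  fits <- list(ses     = ses(train, h = 1),
               holt    = holt(train, h = 1),
               croston = croston(train, h = 1))
  # In-sample MSE of each candidate (this is what the tool minimises)
  mse  <- sapply(fits, function(f) mean((train - fitted(f))^2, na.rm = TRUE))
  best <- names(which.min(mse))
  list(method = best, forecast = as.numeric(fits[[best]]$mean[1]))
}

one_step(agg)   # method chosen for July; repeat each month as new actuals arrive
```

Because the chosen method can change from month to month, a single-model tsCV() call does not reproduce this behaviour.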
time-series forecasting cross-validation
@SecretAgentMan: MAD/Mean is not a good idea, especially for intermittent demands. If you try to minimize this error measure, you may end up with "optimal" flat zero forecasts. See here for details and a few pointers to literature, and here for why this effect occurs.
– Stephan Kolassa
52 mins ago
@SecretAgentMan: the Smart-Willemain method is nice, but it cannot deal with dynamics in the time series, like trend, seasonality or causal factors. In addition, it is patented, which may be an IP problem for some practitioners.
– Stephan Kolassa
50 mins ago
@StephanKolassa, Thank you for the clarification. Since you're probably the SK from the forecasting book I have, I'll remove my comment until I find evidence to present with it.
– SecretAgentMan
44 mins ago
@StephanKolassa At the Foresight Practitioner conference, there were talks specifically encouraging the use of this metric for intermittent demand. Further, Willemain gives out R code for the Markov bootstrap (without the jittering), but your point on the IP is well-taken. Thanks for the links.
– SecretAgentMan
41 mins ago
1 Answer
First off, don't use the in-sample accuracy to choose a model. This will invariably lead to overfitting. In-sample accuracy is not a good guide to out-of-sample prediction. Instead, use a holdout sample.
Regarding your main question: again, use a holdout sample to see how well your algorithm performs on truly new data.
Thus, if you are interested in $h$-month-ahead forecasts:
- Fit your models to the data except for the last $2h$ months.
- Forecast all of them out to a horizon of $h$ months. Note the forecast error of each model, using RMSE or a similar error measure.
- Pick the model that performed best. Re-fit this model to the data except the last $h$ months. Forecast $h$ months ahead. Note the forecast error.
Do this for all your time series. Check how well this algorithm worked, and compare it to the performance of a few very simple benchmarks, like always forecasting the historical mean or the last observation, or taking the average of all your candidate models' forecasts; averages of forecasts often outperform choosing the "best" method by some criterion.
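As a rough sketch of the three steps above for a single series, assuming the forecast package and an illustrative candidate set (not necessarily the software's exact methods):

```r
## Holdout evaluation sketch; "y" is one monthly demand series as a ts object,
## e.g. y <- ts(demand, start = c(2016, 1), frequency = 12).
library(forecast)

h <- 6                                          # horizon of interest
n <- length(y)

train1 <- window(y, end = time(y)[n - 2 * h])   # all but the last 2h months
valid  <- window(y, start = time(y)[n - 2 * h + 1],
                    end   = time(y)[n - h])     # months n-2h+1 .. n-h

## Steps 1-2: fit the candidates and score their h-step forecasts on the validation window
candidates <- list(ses     = ses(train1, h = h),
                   holt    = holt(train1, h = h),
                   croston = croston(train1, h = h),
                   mean    = meanf(train1, h = h))   # simple benchmark
rmse <- sapply(candidates, function(f) accuracy(f, valid)["Test set", "RMSE"])

## Step 3: refit the winner on all but the last h months, score it on the final h months
best   <- names(which.min(rmse))
train2 <- window(y, end = time(y)[n - h])
test   <- window(y, start = time(y)[n - h + 1])
refit  <- switch(best,
                 ses     = ses(train2, h = h),
                 holt    = holt(train2, h = h),
                 croston = croston(train2, h = h),
                 mean    = meanf(train2, h = h))
accuracy(refit, test)["Test set", "RMSE"]       # out-of-sample error on unseen data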
Stephan Kolassa is perfectly right. I'd just like to add a purely business criterion: which forecast will have the least negative impact on the business in case of error?
– AlainD
1 hour ago
+1, " don't use the in-sample accuracy to choose a model"
– SecretAgentMan
58 mins ago
Stephan, if I only have 12 months of data to forecast month 13, could I still use the forecast package with tsCV?
– Angus
9 mins ago