What is a good general regression technique for a problem with 50 independent variables?
I am new to data science and statistics. I came across a problem that has 50 independent variables and one dependent variable, and I am trying to identify a good regression technique to start with. This is the workflow I followed:
Data Exploration -> Correlational matrix -> dimension reduction -> PCA (Dimension reduction) -> Basic Linear Regression technique.
Can someone tell me whether there is a better technique or procedure?
regression statistics data-science-model
asked 6 hours ago
user86752
1 Answer
This is by no means an exhaustive answer, but it should give you a starting point in Python:
Data Exploration
Start with Pandas Profiling. It generates an HTML report of your variables. If the quality of the data is good, the report gives you insight into the fill rate and, depending on the variable type, some summary statistics for each variable.
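If Pandas Profiling is not available, a rough first pass can be done with plain pandas. This is only a minimal sketch: the data frame, its column names (`x0` to `x49`) and the injected missing values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: 200 rows and 50 predictors named x0..x49.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 50)), columns=[f"x{i}" for i in range(50)])
X.iloc[::10, 3] = np.nan  # inject missing values into x3 (every 10th row)

# Fill rate: fraction of non-missing values per column.
fill_rate = X.notna().mean()

# Summary statistics (count, mean, std, min, quartiles, max), one row per column.
summary = X.describe().T

print(fill_rate.sort_values().head())
print(summary[["mean", "std", "min", "max"]].head())
```

Columns with a low fill rate or a suspicious range are the ones worth inspecting before any modelling.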
Correlational matrix
The Pandas Profiling report includes the correlation matrix. If you want to compute it yourself, call corr() on your DataFrame (for example df.corr()). Its method parameter selects the correlation measure: 'pearson', 'kendall' or 'spearman'.
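As a small illustration of computing the matrix directly, the sketch below builds a made-up data frame with one nearly duplicated column and lists the pairs whose absolute Pearson correlation exceeds a threshold. The column names and the 0.8 cut-off are assumptions chosen for the example, not recommendations.

```python
import numpy as np
import pandas as pd

# Hypothetical data: five independent columns plus "f", a noisy copy of "a".
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(200, 5)), columns=list("abcde"))
df["f"] = 0.95 * df["a"] + rng.normal(scale=0.1, size=200)

# method="kendall" is accepted as well.
pearson = df.corr(method="pearson")
spearman = df.corr(method="spearman")

# Keep only the upper triangle so each pair appears once, then filter.
threshold = 0.8  # arbitrary cut-off for this example
upper = pearson.where(np.triu(np.ones(pearson.shape, dtype=bool), k=1))
high_pairs = upper.stack()
high_pairs = high_pairs[high_pairs.abs() > threshold]

print(high_pairs)
```

With 50 variables the full matrix is hard to read, so a filtered list of pairs like this is usually easier to act on.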
Dimension Reduction -> PCA (Dimension reduction)
There are many ways to do this. Keep in mind that if you only care about accuracy and not about how X influences y, (1) is an optional step (the same applies to (2)).
1. Analyse the correlation matrix and use VIF to drop variables with high correlation.
2. Use Factor Analysis / PCA for dimensionality reduction.
3. Use LASSO to fit a model and check the coefficients; the ones that are 0 or close to 0 can be treated as weak indicators and eliminated.
4. Keep all 50 and use Ridge Regression, varying the alpha parameter to fine-tune accuracy (or whatever metric you are trying to optimize).
5. If the model still doesn't seem to be stable, try building non-linear features with sklearn's PolynomialFeatures, regularize and repeat.
6. Probably the most important in the real world: ask a domain expert which variables they think are important.
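The VIF, LASSO and Ridge options above can be sketched as follows. This is an illustrative example on synthetic data, not a prescription: the data-generating process, the VIF threshold of 10 and the alpha grid are all assumptions. VIF is computed here as the diagonal of the inverse correlation matrix of the predictors, which avoids an extra dependency.

```python
import numpy as np
from sklearn.linear_model import LassoCV, RidgeCV
from sklearn.preprocessing import StandardScaler

# Hypothetical data: 300 samples, 50 predictors, only the first 5 drive y.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 50))
X[:, 49] = X[:, 0] + rng.normal(scale=0.05, size=300)  # nearly collinear with column 0
true_coef = np.zeros(50)
true_coef[:5] = [3.0, -2.0, 1.5, 1.0, -1.0]
y = X @ true_coef + rng.normal(scale=0.5, size=300)

# Regularized models are sensitive to scale, so standardize first.
X_std = StandardScaler().fit_transform(X)

# VIF per predictor: diagonal of the inverse correlation matrix.
vif = np.diag(np.linalg.inv(np.corrcoef(X_std, rowvar=False)))
high_vif = np.where(vif > 10)[0]  # 10 is a common rule of thumb, not a hard rule

# LASSO with a cross-validated alpha; predictors with a zero coefficient are dropped.
lasso = LassoCV(cv=5, random_state=0).fit(X_std, y)
kept = np.flatnonzero(lasso.coef_)

# Ridge keeps all 50 predictors and picks alpha from a grid by cross-validation.
ridge = RidgeCV(alphas=np.logspace(-3, 3, 13)).fit(X_std, y)

print("High-VIF columns:", high_vif)
print("Columns kept by LASSO:", kept)
print("Ridge alpha chosen:", ridge.alpha_)
```

On real data the collinear groups are not known in advance, so the high-VIF columns are candidates to review rather than columns to drop automatically.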
Basic Linear Regression technique
- Tuning hyperparameters to get a good cross-validation/test score is the key here for a basic Linear Regression model.
- Try as many techniques as you can from here and here
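One way to tie the workflow from the question together (scaling, PCA, then linear regression) is a scikit-learn pipeline scored by cross-validation. The sketch below is a minimal example on synthetic data; the grid of component counts, the split sizes and the R^2 metric are assumptions for illustration. The number of PCA components is the only hyperparameter tuned here.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical data: 400 samples, 50 predictors, only the first 5 drive y.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 50))
y = X[:, :5] @ np.array([3.0, -2.0, 1.5, 1.0, -1.0]) + rng.normal(scale=0.5, size=400)

# Hold out a test set that is never seen during tuning.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Scaling and PCA live inside the pipeline so they are refit on each CV fold.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA()),
    ("ols", LinearRegression()),
])

# Search over the number of principal components with 5-fold cross-validation.
param_grid = {"pca__n_components": [5, 10, 20, 30, 40, 50]}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="r2")
search.fit(X_train, y_train)

print("Best n_components:", search.best_params_["pca__n_components"])
print("CV R^2:", round(search.best_score_, 3))
print("Test R^2:", round(search.score(X_test, y_test), 3))
```

Keeping the preprocessing inside the pipeline matters: fitting the scaler or PCA on the full data before cross-validation would leak information from the validation folds into the score.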
answered 4 hours ago


Vivek Kalyanarangan