Gradient descent optimization
I am trying to understand gradient descent optimization in ML algorithms. I understand that there is a cost function, where the aim is to minimize the error $\hat{y} - y$. In a scenario where two weights $w_1$ and $w_2$ are being optimized to give the minimum error, when the optimization occurs through partial derivatives, does each iteration change both $w_1$ and $w_2$? Or is it a combination, e.g. for a few iterations only $w_1$ is changed, and when $w_1$ no longer reduces the error, the derivative moves on to $w_2$, in order to reach the local minimum? The application could be a linear regression model, a logistic regression model, or a boosting algorithm.
optimization gradient-descent
asked 4 hours ago by Pb89 · edited 17 mins ago by Berkan
gradient descent is a decent optimisation algorithm.
– Berkan
16 mins ago
4 Answers
When the optimization does occur through partial derivatives, in each turn does it change both w1 and w2 or is it a combination like in few iterations only w1 is changed and when w1 isn't reducing the error more, the derivative starts with w2 - to reach the local minima?

In each iteration, the algorithm changes all weights at the same time, based on the gradient vector. The gradient is a vector; its length equals the number of weights in the model.
On the other hand, changing one parameter at a time does exist: it is called the coordinate descent algorithm, which is a type of gradient-free optimization algorithm. In practice, it may not work as well as gradient-based algorithms.
Here is an interesting answer on gradient-free algorithms: Is it possible to train a neural network without backpropagation?
answered 2 hours ago by hxd1011, edited 18 mins ago
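The contrast between the two update schemes can be sketched in a few lines of numpy. This is a toy least-squares problem with an illustrative step size, not code from any particular library: gradient descent moves every weight on every iteration, while coordinate descent cycles through the weights one at a time.

```python
import numpy as np

# Least-squares loss f(w) = ||Xw - y||^2 on toy data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
true_w = np.array([2.0, -3.0])
y = X @ true_w

def grad(w):
    return 2 * X.T @ (X @ w - y)  # gradient vector: one entry per weight

eta = 0.005                       # small, hand-picked step size

# Gradient descent: every weight moves on every iteration.
w = np.zeros(2)
for _ in range(200):
    w -= eta * grad(w)            # updates w1 AND w2 simultaneously

# Coordinate descent: one weight at a time, cycling through coordinates.
v = np.zeros(2)
for t in range(200):
    j = t % 2                     # pick a single coordinate
    v[j] -= eta * grad(v)[j]      # only that weight changes this iteration

print(w, v)  # both approach [2, -3]
```

On this well-conditioned toy problem both recover the true weights; the point is only that the full-gradient update never "waits" for one weight to stop helping before touching the other.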
Gradient descent updates all parameters at each step. You can see this in the update rule:
$$
w^{(t+1)} = w^{(t)} - \eta \nabla f\left(w^{(t)}\right).
$$
Since the gradient of the loss function $\nabla f(w)$ is vector-valued with dimension matching that of $w$, all parameters are updated at each iteration.
The learning rate $\eta$ is a positive number that re-scales the gradient. Taking too large a step can endlessly bounce you across the loss surface with no improvement in your loss function; too small a step can mean tediously slow progress towards the optimum.
answered 4 hours ago by Sycorax, edited 3 hours ago
So the algorithm may try different combinations, like increase $w_1$, decrease $w_2$, based on the direction from the partial derivative, to reach the local minimum? And just to confirm: the algorithm will not necessarily always give the global minimum?
– Pb89
4 hours ago
And does the partial derivative also help to explain how much of an increase or decrease has to be made to $w_1$ and $w_2$? Or is that done by the learning rate/shrinkage, while the partial derivative only provides the direction of descent?
– Pb89
4 hours ago
The gradient is a vector, so it gives a direction and a magnitude. A vector can be arbitrarily rescaled by a positive scalar and it will have the same direction, but the rescaling will change its magnitude.
– Sycorax
3 hours ago
If magnitude is also given by the gradient then what is the role of shrinkage or learning rate?
– Pb89
3 hours ago
The learning rate rescales the gradient. Suppose $\nabla f(x)$ has a large norm (length). Taking a large step will move you to a distant part of the loss surface (jumping from one mountain to another). The core justification of gradient descent is that it's a linear approximation in the vicinity of $w^{(t)}$. That approximation is always inexact, but it's probably worse the farther away you move -- hence you want to take small steps, so you use some small $\eta$, where 'small' is entirely problem-specific.
– Sycorax
3 hours ago
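The effect of $\eta$ described in this exchange is easy to see on a one-dimensional toy loss. The sketch below uses $f(w) = w^2$, whose derivative is $2w$; the step sizes are made up for illustration:

```python
# Effect of the learning rate eta on f(w) = w**2 (gradient 2w), a toy 1-D loss.
def descend(eta, steps=25, w0=1.0):
    w = w0
    for _ in range(steps):
        w -= eta * 2 * w      # gradient step: w <- w - eta * f'(w)
    return w

print(descend(0.01))   # too small: still far from the minimum after 25 steps
print(descend(0.4))    # well chosen: essentially at the minimum, w = 0
print(descend(1.1))    # too large: |w| grows each step -- the iterates diverge
```

Each step multiplies $w$ by $(1 - 2\eta)$, so the iterates shrink only when $|1 - 2\eta| < 1$; with $\eta = 1.1$ that factor is $-1.2$ and the "bouncing across the loss surface" becomes outright divergence.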
The aim of gradient descent is to minimize the cost function. This minimization is achieved by adjusting the weights, in your case w1 and w2. In general there could be n such weights.
Gradient descent is done in the following way:
- Initialize the weights randomly.
- Compute the cost function and the gradient with the initialized weights.
- Update the weights. It might happen that the gradient is 0 for some weights; those weights then show no change after updating. For example, if the gradient is [1, 0], then w2 will remain unchanged.
- Check the cost function with the updated weights; if the decrease is acceptable enough, continue the iterations, else terminate.
While updating the weights, which weight (w1 or w2) gets changed is entirely decided by the gradient: all the weights get updated, though some may not change, depending on the gradient.
answered 2 hours ago by A Santosh Kumar
"if the decrement is acceptable enough continue the iterations else terminate", is there a default value which is applied in packages of python (sklearn
) or R packages such ascaret
? It can be user specified only in a manually created gradient descent function?
– Pb89
23 mins ago
Gradient descent is applied to both w1 and w2 in each iteration. During each iteration, the parameters are updated according to the gradients. The two weights would likely have different partial derivatives.
answered 4 hours ago by SmallChess, edited 1 hour ago by Sven Hohenstein
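A concrete illustration of that last point, on a made-up loss $f(w_1, w_2) = 3w_1^2 + w_2^2$: both weights are updated in the same iteration, but each uses its own partial derivative.

```python
# Each weight has its own partial derivative. For f(w1, w2) = 3*w1**2 + w2**2:
def partials(w1, w2):
    return (6 * w1, 2 * w2)   # (df/dw1, df/dw2)

print(partials(1.0, 1.0))  # (6.0, 2.0): same point, different partials,
                           # so w1 and w2 take different-sized steps
```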