Question about the latent variable in EM algorithm


In mixture models, the Expectation-Maximization (EM) algorithm is a commonly used method to estimate the model parameters. Suppose that I have a bivariate mixture model with two mixture components, with mixture weights $\pi_1$ and $\pi_2$, respectively. EM introduces a variable $z$ that takes the value 1 if the point comes from the first mixture component, and 0 otherwise. These variables are assumed to be i.i.d. and distributed as a multinomial with parameters ($\pi_1$, $\pi_2$). Is that correct? If yes, how can they take only the values 0 and 1 and at the same time have a multinomial distribution?



Here is the paragraph from the source: here is the link



3.2. EM algorithm



In this section, we describe the EM algorithm (Dempster et al., 1977) to obtain the estimates for the parameters θ in a mixture of M-component D-vine densities, given the data set and the number of components M. The determination of M will be discussed later in Section 3.3.
Assume that $N$ observations, say $x_k = (x_{k,1}, \ldots, x_{k,N})$ where $k = 1, \ldots, d$, are drawn randomly from an $M$-component mixture.

Let us denote latent variables $z_n = (z_{n1}, \ldots, z_{nm}, \ldots, z_{nM})$, where $z_{nm} = 1$ if $x_n$ comes from the $m$-th component and
$z_{nm} = 0$ otherwise. Assume that the $z_n$ are independent and identically distributed from a multinomial distribution, that is,
$z_n \sim \text{Mult}(M, \pi = (\pi_1, \ldots, \pi_M))$.
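To make my confusion concrete, here is a small simulation sketch of how I currently understand the $z_n$ (my own toy illustration with $M = 2$ components and assumed weights $\pi = (0.3, 0.7)$, not values from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy illustration: M = 2 components with assumed weights pi = (0.3, 0.7).
pi = np.array([0.3, 0.7])
N = 5

# A multinomial draw with a single trial per observation is a one-hot vector:
# exactly one entry equals 1 and the others equal 0.
z = rng.multinomial(1, pi, size=N)
print(z)               # each row is a one-hot vector such as [0 1] or [1 0]
print(z.sum(axis=1))   # every row sums to 1
```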



Any help, please?










maximum-likelihood expectation-maximization mixture latent-variable finite-mixture-model






asked 1 hour ago, edited 27 mins ago
– Maryam (7,810)











  • @Xi'an Thank you so much for your answer; I have added the link.
    – Maryam
    28 mins ago










  • @Xi'an Ok. Sorry for the typo.
    – Maryam
    26 mins ago


























2 Answers

















Accepted answer (score 2), answered 41 mins ago, edited 24 mins ago
– Xi'an (50.4k)










There is a lot of confusion in the question, confusion that could be reduced by looking at a textbook on the topic, or even at the original 1977 paper by Dempster, Laird and Rubin.



Here is an excerpt from our book, Introducing Monte Carlo Methods with R, followed by my answer:




Assume that we observe $X_1, \ldots, X_n$, jointly distributed from $g(\mathbf{x}|\theta)$ that satisfies
$$
g(\mathbf{x}|\theta)=\int_{\mathcal{Z}} f(\mathbf{x}, \mathbf{z}|\theta)\,\text{d}\mathbf{z},
$$

and that we want to compute $\hat{\theta} = \arg\max L(\theta|\mathbf{x})= \arg\max g(\mathbf{x}|\theta)$.
Since the augmented data is $\mathbf{z}$, where $(\mathbf{X}, \mathbf{Z}) \sim f(\mathbf{x},\mathbf{z}|\theta)$,
the conditional distribution of the missing data $\mathbf{Z}$ given the observed data $\mathbf{x}$ is
$$
k(\mathbf{z}|\theta, \mathbf{x}) = f(\mathbf{x}, \mathbf{z}|\theta)\big/g(\mathbf{x}|\theta)\,.
$$

Taking the logarithm of this expression
leads to the following relationship between the complete-data likelihood $L^c(\theta|\mathbf{x}, \mathbf{z})$
and the observed-data likelihood $L(\theta|\mathbf{x})$. For any value $\theta_0$,
$$
\log L(\theta|\mathbf{x})= \mathbb{E}_{\theta_0}[\log L^c(\theta|\mathbf{x},\mathbf{Z})]
-\mathbb{E}_{\theta_0}[\log k(\mathbf{Z}|\theta, \mathbf{x})],\qquad(1)
$$
where the expectation is with respect to $k(\mathbf{z}|\theta_0, \mathbf{x})$. In the EM algorithm,
while we aim at maximizing $\log L(\theta|\mathbf{x})$, only the first term on the right side of
(1) will be considered.



Denoting
$$
Q(\theta|\theta_0, \mathbf{x}) = \mathbb{E}_{\theta_0}[\log L^c(\theta|\mathbf{x},\mathbf{Z})],
$$

the EM algorithm indeed proceeds iteratively by maximizing
$Q(\theta|\theta_0, \mathbf{x})$ at each iteration and, if $\hat{\theta}_{(1)}$
is the value of $\theta$ maximizing $Q(\theta|\theta_0, \mathbf{x})$,
by replacing $\theta_0$ by the updated value $\hat{\theta}_{(1)}$. In this manner, a sequence of estimators
$\{\hat{\theta}_{(j)}\}_j$ is obtained, where $\hat{\theta}_{(j)}$ is defined as the value of
$\theta$ maximizing $Q(\theta|\hat{\theta}_{(j-1)}, \mathbf{x})$; that is,
$$
Q(\hat{\theta}_{(j)}|\hat{\theta}_{(j-1)}, \mathbf{x})
= \max_\theta\, Q(\theta|\hat{\theta}_{(j-1)}, \mathbf{x}).
$$
This iterative scheme thus contains both an expectation step
and a maximization step, giving the algorithm its name.



EM Algorithm
Pick a starting value $\hat{\theta}_{(0)}$



Repeat



  1. Compute the E-step
    $$
    Q(\theta|\hat{\theta}_{(m)}, \mathbf{x})
    =\mathbb{E}_{\hat{\theta}_{(m)}} [\log L^c(\theta|\mathbf{x}, \mathbf{Z})]\,,
    $$

    where the expectation is with respect to $k(\mathbf{z}|\hat{\theta}_{(m)},\mathbf{x})$ and set $m=0$.


  2. Maximize $Q(\theta|\hat{\theta}_{(m)}, \mathbf{x})$ in
    $\theta$ and take the M-step
    $$
    \hat{\theta}_{(m+1)}=\arg\max_\theta \; Q(\theta|\hat{\theta}_{(m)}, \mathbf{x})
    $$

    and set $m=m+1$


until a fixed point is reached; i.e., $\hat{\theta}_{(m+1)}=\hat{\theta}_{(m)}$.



For the normal mixture, using the missing data structure exhibited previously leads to an objective function
equal to
$$
Q(\theta^\prime|\theta,\mathbf{x}) = -\frac{1}{2}\,\sum_{i=1}^n
\mathbb{E}_\theta\left[\left. Z_i (x_i-\mu_1)^2 + (1-Z_i) (x_i-\mu_2)^2 \right| \mathbf{x} \right].
$$

Solving the M-step then provides the closed-form expressions
$$
\mu_1^\prime = \mathbb{E}_\theta\left[ \sum_{i=1}^n Z_i x_i \,\Big|\, \mathbf{x} \right]
\bigg/ \mathbb{E}_\theta\left[ \sum_{i=1}^n Z_i \,\Big|\, \mathbf{x} \right]
$$

and
$$
\mu_2^\prime = \mathbb{E}_\theta\left[ \sum_{i=1}^n (1-Z_i) x_i \,\Big|\, \mathbf{x} \right]
\bigg/ \mathbb{E}_\theta\left[ \sum_{i=1}^n (1-Z_i) \,\Big|\, \mathbf{x} \right].
$$

Since
$$
\mathbb{E}_\theta\left[Z_i|\mathbf{x} \right]=\frac{\varphi(x_i-\mu_1)}{\varphi(x_i-\mu_1)+3\varphi(x_i-\mu_2)}\,,
$$

the EM algorithm can easily be implemented in this setting.
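For concreteness, here is a minimal Python sketch of this EM recursion (not taken from the book; it assumes the weights are fixed at $1/4$ and $3/4$, which is where the factor $3$ in the last display comes from, and unit variances, so that only the two means are updated):

```python
import numpy as np
from scipy.stats import norm

def em_two_normals(x, mu1, mu2, tol=1e-8, max_iter=500):
    """EM for the means of the mixture 0.25*N(mu1, 1) + 0.75*N(mu2, 1)."""
    for _ in range(max_iter):
        # E-step: E_theta[Z_i | x] = phi(x_i - mu1) / (phi(x_i - mu1) + 3*phi(x_i - mu2))
        w = norm.pdf(x - mu1) / (norm.pdf(x - mu1) + 3.0 * norm.pdf(x - mu2))
        # M-step: closed-form updates for mu1' and mu2'
        mu1_new = np.sum(w * x) / np.sum(w)
        mu2_new = np.sum((1.0 - w) * x) / np.sum(1.0 - w)
        if max(abs(mu1_new - mu1), abs(mu2_new - mu2)) < tol:   # fixed point reached
            return mu1_new, mu2_new
        mu1, mu2 = mu1_new, mu2_new
    return mu1, mu2

# Usage on simulated data from the assumed mixture (true means 0 and 3):
rng = np.random.default_rng(1)
n = 1000
z = rng.random(n) < 0.25                       # latent indicators of component 1
x = np.where(z, rng.normal(0.0, 1.0, n), rng.normal(3.0, 1.0, n))
print(em_two_normals(x, mu1=-1.0, mu2=1.0))    # estimates should be close to (0, 3)
```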




Whatever the mixture involved, the latent variables $Z_i$ are Multinomial $\mathcal{M}_M(1;\pi_1,\ldots,\pi_M)$, which means that only one component of the vector $Z_i$ is equal to one and all of the $M-1$ others are zero. (Note the difference with the question in the notations: the original notation $\mathcal{M}(M;\pi_1,\ldots,\pi_M)$ fails to indicate how many draws are taken, that is, what the sum of the components of $Z_i$ is.) When $M=2$, as in the above excerpt, $Z_i$ is an integer in $\{0,1\}$. There may be a confusion between a Multinomial distribution and the property of a distribution (like some mixtures) of being multimodal. The $Z_i$ do not have a multimodal distribution, taking only two values, even conditional on the $X_i$'s, while the $X_i$'s may, at least unconditionally.
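As a quick numerical check of this last point, here is a small simulation sketch (with assumed weights $1/4$ and $3/4$ and assumed means $0$ and $5$, chosen only so that the marginal of the $X_i$ is clearly bimodal):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# Assumed illustration: the mixture 0.25*N(0,1) + 0.75*N(5,1).
Z = rng.multinomial(1, [0.25, 0.75], size=n)   # one-hot latent indicators
X = np.where(Z[:, 0] == 1, rng.normal(0.0, 1.0, n), rng.normal(5.0, 1.0, n))

# The latent indicators are Multinomial: their entries only take the values 0 and 1.
print(np.unique(Z))                            # [0 1]

# The observations, in contrast, have a bimodal marginal distribution:
# a histogram of X shows one bump near 0 and a larger one near 5.
counts, edges = np.histogram(X, bins=60)
```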






  • Yes, now I understand my problem. Thank you so much. I really learned something new.
    – Maryam
    19 mins ago

















Answer (score 0), answered 42 mins ago
– Dimitris Rizopoulos (1,890)













If I correctly read between the lines, your question is about the difference between the distribution of $[z]$ (i.e., the prior distribution of the latent variable) and the distribution of $[z \mid y]$ (i.e., the posterior distribution of the latent variable given the data $y$).



Indeed, the prior is i.i.d. Bernoulli with probability $\pi$. However, the posterior is not i.i.d., because each subject will have his/her own probability of belonging to the first component (i.e., the component for which $z_i = 1$), depending on their data $y_i$. Hence, if you plot the posterior probabilities, the resulting distribution can be multimodal.
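Here is a minimal sketch of that difference, assuming (purely for illustration) a two-component normal mixture with prior weight $\pi = 0.25$ for the first component, means $0$ and $3$, and unit variances:

```python
import numpy as np
from scipy.stats import norm

# Assumed example parameters (for illustration only).
pi, mu1, mu2 = 0.25, 0.0, 3.0

y = np.array([-1.0, 0.5, 1.5, 2.5, 4.0])

# Prior: P(z_i = 1) = pi is the same for every subject.
prior = np.full_like(y, pi)

# Posterior: P(z_i = 1 | y_i) depends on the observed y_i.
num = pi * norm.pdf(y, mu1, 1.0)
post = num / (num + (1 - pi) * norm.pdf(y, mu2, 1.0))

for yi, pr, po in zip(y, prior, post):
    print(f"y = {yi:5.1f}   prior = {pr:.2f}   posterior = {po:.2f}")
# The prior is identical across subjects, but the posterior probabilities
# differ from one y_i to the next, which is why the z_i are no longer i.i.d.
# conditionally on the data.
```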






  • Thank you so much for your help. I have updated my question to make it clearer.
    – Maryam
    32 mins ago


  • Sorry, I have a typo in my question. Could you please have a look? I meant multinomial, not multimodal.
    – Maryam
    23 mins ago









