Iyfjky

Question

Recently I asked a question here about how to construct a transition probability matrix given the following list:

x = "A", "A", "A", "E", "D", "D", "D", "C", "B", "E", "E", "E", "D", 
 "B", "A", "D", "B", "E", "C", "A", "D", "A", "A", "A", "A", "C", 
 "C", "C", "D", "D", "E"

For which one can get the following matrix: (see the detail from the previous question)

$$beginarraycccccc
& textA & textB & textC & textD & textE \
textA & frac59 & 0 & frac19 & frac29 & frac19 \
textB & frac13 & 0 & 0 & 0 & frac23 \
textC & frac15 & frac15 & frac25 & frac15 & 0 \
textD & frac18 & frac14 & frac18 & frac38 & frac18 \
textE & 0 & 0 & frac15 & frac25 & frac25 \
endarray$$

above is equivalent of partitioning list $x$ into sublists with size 2 and offset of 1, then counting each element and divide it by the sum of the row. The command to find the right partition is Partition[x, 2, 1] (again I refer you to the previous question). Now what if we want to find the higher order transition matrix? For example the second order would be Partition[x, 3, 1] and the expected matrix shall look like:

$$beginarraycccccc
& A & B & C & D &E \
AA & P_AA,A & P_AA,B & P_AA,C & P_AA,D &P_AA,E\
AB & P_AB,A & P_AB,B & P_AB,C & P_AB,D &P_AB,E\
AC & P_AC,A & P_AC,B & P_AC,C & P_AC,D &P_AC,E\
AD & P_AD,A & P_AD,B & P_AD,C & P_AD,D &P_AD,E\
AE & P_AE,A & P_AE,B & P_AE,C & P_AE,D &P_AE,E\
vdots & vdots & vdots & vdots &vdots & vdots\
EC & P_EC,A & P_EC,B & P_EC,C & P_EC,D &P_EC,E\
ED &P_ED,A & P_ED,B & P_ED,C & P_ED,D &P_ED,E\
EE & P_EE,A & P_EE,B & P_EE,C & P_EE,D &P_EE,E\
endarray$$

In general the dimension of the matrix follows $S$, where n is the order of the Markov chain.

Thanks for the edit @rhermans it was really bugging me.
â€“Â William
Aug 8 at 9:59 — Aug 8 at 9:59

score 5 · Accepted Answer · 2018-08-08 12:05:00Z

The following code is just brute-force. But at least yields the expected results. Also, it can be used for any order.

The first parameter is the data. The second parameter is the order.

probM[data_, ord_] := 
 Module[uniques = Union[data], acc = 0, len, trans, trPre, tData, 
 toCount, toGather, toNormalize,
 trans = Dispatch@Thread[uniques -> Range[len = Length[uniques]]];
 trPre = Dispatch@Flatten[Array[## -> ++acc &, ConstantArray[len, ord]]];
 tData = Replace[data, trans, 1];
 toCount = Partition[tData, ord + 1, 1];
 toGather = Map[Replace[#[[1, ;; -2]], trPre], #[[1, -1]] -> #[[2]] &, 
 Tally[toCount]];
 toNormalize = GatherBy[toGather, #[[1, 1]] &];
 SparseArray[
 Flatten@Map[
 With[tot = 1/Plus @@ #[[All, 2]], 
 Map[#[[1]] -> #[[2]] tot &, #]] &, toNormalize]]];

Let us check the dimensions of the first three orders.

Table[probM[x, i] // Dimensions, i, 3]
(*5, 5, 25, 5, 125, 5*)

As for the efficiency of probM, I tried replacing some of the Map with ParallelMap but it did not yield any improvement. You might want to combine with niceties from the other answer. For example, use ArrayComponents instead of dispatch tables.

In any case, check the second order table:

$$
beginarraycccccc
text & textA & textB & textC & textD & textE \
textAA & frac35 & 0 & frac15 & 0 & frac15 \
textAB & 0 & 0 & 0 & 0 & 0 \
textAC & 0 & 0 & 1 & 0 & 0 \
textAD & frac12 & frac12 & 0 & 0 & 0 \
textAE & 0 & 0 & 0 & 1 & 0 \
textBA & 0 & 0 & 0 & 1 & 0 \
textBB & 0 & 0 & 0 & 0 & 0 \
textBC & 0 & 0 & 0 & 0 & 0 \
textBD & 0 & 0 & 0 & 0 & 0 \
textBE & 0 & 0 & frac12 & 0 & frac12 \
textCA & 0 & 0 & 0 & 1 & 0 \
textCB & 0 & 0 & 0 & 0 & 1 \
textCC & 0 & 0 & frac12 & frac12 & 0 \
textCD & 0 & 0 & 0 & 1 & 0 \
textCE & 0 & 0 & 0 & 0 & 0 \
textDA & 1 & 0 & 0 & 0 & 0 \
textDB & frac12 & 0 & 0 & 0 & frac12 \
textDC & 0 & 1 & 0 & 0 & 0 \
textDD & 0 & 0 & frac13 & frac13 & frac13 \
textDE & 0 & 0 & 0 & 0 & 0 \
textEA & 0 & 0 & 0 & 0 & 0 \
textEB & 0 & 0 & 0 & 0 & 0 \
textEC & 1 & 0 & 0 & 0 & 0 \
textED & 0 & frac12 & 0 & frac12 & 0 \
textEE & 0 & 0 & 0 & frac12 & frac12 \
endarray
$$

Thanks a lot, what command did you use to get the final table? when I run your code I get 5, 5, 22, 5, 109, 5 — Aug 8 at 11:53
@William Try probM[x, 2] // MatrixForm. As for your result, it seems that you run Table[probM[x, i] // Dimensions, i, 3]. But that should have returned 5, 5, 25, 5, 125, 5. What version of MMA are you running? Mine is 9.0. — Aug 8 at 11:57
Perfect, and how can I change the orders (this is second order, if I want to check the code with 1, 3 and other orders)? and the offset (possibly)? — Aug 8 at 12:00

score 5 · Answer 2 · 2018-08-08 13:34:29Z

Update: Using EmpiricalDistribution and MarginalDistribution to compute the conditional probabilities:

ClearAll[transitionProb]
transitionProb[step_: 1][x_] := Module[states = DeleteDuplicates@x, 
 ed = EmpiricalDistribution[Partition[ArrayComponents @ x, step + 1, 1]], 
 ordering, tuples, md, condpdF,
 ordering = Ordering[states]; tuples = Tuples[ordering, step];
 md = MarginalDistribution[ed, Range[step]];
 condpdF[u__, w_] := If[PDF[md, u] === 0, 0, PDF[ed, u, w]/PDF[md, u]];
 Prepend[Row @ states[[##]], 
 ## & @@ Table[## & @@ condpdF[##, i], i, ordering] & @@@ tuples, 
 Prepend[states[[ordering]], ""]]]

Examples:

transitionProb[2][x] // Grid[#, Dividers -> All] & // TeXForm

$beginarray
hline
text & textA & textB & textC & textD & textE \
hline
textAtextA & frac35 & 0 & frac15 & 0 & frac15 \
hline
textAtextB & 0 & 0 & 0 & 0 & 0 \
hline
textAtextC & 0 & 0 & 1 & 0 & 0 \
hline
textAtextD & frac12 & frac12 & 0 & 0 & 0 \
hline
textAtextE & 0 & 0 & 0 & 1 & 0 \
hline
textBtextA & 0 & 0 & 0 & 1 & 0 \
hline
textBtextB & 0 & 0 & 0 & 0 & 0 \
hline
textBtextC & 0 & 0 & 0 & 0 & 0 \
hline
textBtextD & 0 & 0 & 0 & 0 & 0 \
hline
textBtextE & 0 & 0 & frac12 & 0 & frac12 \
hline
textCtextA & 0 & 0 & 0 & 1 & 0 \
hline
textCtextB & 0 & 0 & 0 & 0 & 1 \
hline
textCtextC & 0 & 0 & frac12 & frac12 & 0 \
hline
textCtextD & 0 & 0 & 0 & 1 & 0 \
hline
textCtextE & 0 & 0 & 0 & 0 & 0 \
hline
textDtextA & 1 & 0 & 0 & 0 & 0 \
hline
textDtextB & frac12 & 0 & 0 & 0 & frac12 \
hline
textDtextC & 0 & 1 & 0 & 0 & 0 \
hline
textDtextD & 0 & 0 & frac13 & frac13 & frac13 \
hline
textDtextE & 0 & 0 & 0 & 0 & 0 \
hline
textEtextA & 0 & 0 & 0 & 0 & 0 \
hline
textEtextB & 0 & 0 & 0 & 0 & 0 \
hline
textEtextC & 1 & 0 & 0 & 0 & 0 \
hline
textEtextD & 0 & frac12 & 0 & frac12 & 0 \
hline
textEtextE & 0 & 0 & 0 & frac12 & frac12 \
hline
endarray$

transitionProb[1][x] // Grid[#, Dividers -> All] & // TeXForm

$beginarray
hline
text & textA & textB & textC & textD & textE \
hline
textA & frac59 & 0 & frac19 & frac29 & frac19 \
hline
textB & frac13 & 0 & 0 & 0 & frac23 \
hline
textC & frac15 & frac15 & frac25 & frac15 & 0 \
hline
textD & frac18 & frac14 & frac18 & frac38 & frac18 \
hline
textE & 0 & 0 & frac15 & frac25 & frac25 \
hline
endarray$

Original answer:

states = DeleteDuplicates[x];
ordering = Ordering[states]; 
data = ArrayComponents@x ;
estproc = EstimatedProcess[data, DiscreteMarkovProcess[Length@states]];
tuples = Tuples[Range[5][[ordering]], 2];
table = Row@states[[##]], ## & @@ 
 Table[Probability[p[3] == s [Conditioned] p[1] == # && p[2] == #2, 
 p [Distributed] estproc], s, Range[Length @ states]] & @@@ tuples ;

TeXForm @ Grid[Prepend[table, Prepend[states[[ordering]], ""]], Dividers -> All]

$beginarray
hline
text & textA & textB & textC & textD & textE \
hline
textAA & frac59 & frac19 & frac29 & frac19 & 0 \
hline
textAB & 0 & 0 & 0 & 0 & 0 \
hline
textAC & frac15 & 0 & frac15 & frac25 & frac15 \
hline
textAD & frac18 & frac18 & frac38 & frac18 & frac14 \
hline
textAE & 0 & frac25 & frac25 & frac15 & 0 \
hline
textBA & 0 & 0 & 0 & 0 & 0 \
hline
textBB & 0 & 0 & 0 & 0 & 0 \
hline
textBC & 0 & 0 & 0 & 0 & 0 \
hline
textBD & 0 & 0 & 0 & 0 & 0 \
hline
textBE & 0 & 0 & 0 & 0 & 0 \
hline
textCA & frac59 & frac19 & frac29 & frac19 & 0 \
hline
textCB & frac13 & frac23 & 0 & 0 & 0 \
hline
textCC & frac15 & 0 & frac15 & frac25 & frac15 \
hline
textCD & frac18 & frac18 & frac38 & frac18 & frac14 \
hline
textCE & 0 & 0 & 0 & 0 & 0 \
hline
textDA & frac59 & frac19 & frac29 & frac19 & 0 \
hline
textDB & frac13 & frac23 & 0 & 0 & 0 \
hline
textDC & frac15 & 0 & frac15 & frac25 & frac15 \
hline
textDD & frac18 & frac18 & frac38 & frac18 & frac14 \
hline
textDE & 0 & frac25 & frac25 & frac15 & 0 \
hline
textEA & 0 & 0 & 0 & 0 & 0 \
hline
textEB & 0 & 0 & 0 & 0 & 0 \
hline
textEC & frac15 & 0 & frac15 & frac25 & frac15 \
hline
textED & frac18 & frac18 & frac38 & frac18 & frac14 \
hline
textEE & 0 & frac25 & frac25 & frac15 & 0 \
hline
endarray$

Thanks a lot for this, however as a check I tried 'Partition[x, 3, 1] // Counts' which shows that I have 3 AAA instead of 5, this happens all over the table my worry is that the state ordering and data which you defined are not matched, for example in ordering E has value of 5, while in data it is 2. — Aug 8 at 10:28
@William, re ordering of states, that's why we sort them using ordering so that 2 corresponds to E. Re the discrepancy between Partition[x, 3, 1] // Counts and the Prob[A|AA] in the table above, i think it is because table is based on the TransitionMatrix of estproc and estproc is based on one-step transitions (Partition[x,2,1]). — Aug 8 at 10:39
So basically DiscreteMarkovProcess doesn't have a memory, because in second order the transition is remembering the two steps behind that's why it is AA,A and not for example A,A,A, interesting. — Aug 8 at 10:51
thank you @kglr, it indeed works fine and is shorter yet if you change the variable in Partition[data, 3, 1] for example to Partition[data, 2, 1] or else, the final table will not come out appropriate. I think hector's answer is good because you can change the order — Aug 8 at 12:37

score 2 · Answer 3 · 2018-08-08 21:13:08Z

As a variant of my answer to the linked question, the following should work correctly and efficiently.

Some random data to work with:

x = RandomChoice[Alphabet["English", "IndexCharacters"], 1000000];

Creating the probability tensor P:

n = 2;
data = Flatten[ToCharacterCode[x]] - (ToCharacterCode["A"][[1]] - 1); // AbsoluteTiming // First
A = With[spopt = SystemOptions["SparseArrayOptions"], 
 Internal`WithLocalSettings[
 (*switch to additive assembly*)
 SetSystemOptions["SparseArrayOptions" -> "TreatRepeatedEntries" -> Total],

 (*assemble matrix*)
 SparseArray[Partition[data, n + 1, 1] -> 1, ConstantArray[Max[data], n + 1] ],

 (*reset "SparseArrayOptions" to previous value*)
 SetSystemOptions[spopt]]]; // AbsoluteTiming // First
P = #/N[Total[Abs[#], n + 1] /. 0 -> 1] &@Flatten[A, n - 1];

0.717521

0.184357

The row labels of P should be

Tuples[Sort[DeleteDuplicates[x]], n]

score 1 · Answer 4 · 2018-08-08 21:03:26Z

You can use CrossTensorate from the package CrossTabulate.m, which I used and referenced in my answer of the previous question.

The making of contingency tensors with that function is discussed in this blog post: "Contingency tables creation examples".

In general, though, I would say it is better to use Tries with Frequencies or nested associations.

tmat3 = CrossTensorate[Count == 1 + 2 + 3, Partition[x, 3, 1]];

tmat4 = CrossTensorate[Count == 1 + 2 + 3 + 4, Partition[x, 4, 1]];

tmat3["XTABTensor"] = #/(Total[#, Length[Dimensions[#]]] /. 0 -> 1) &@tmat3["XTABTensor"];
tmat4["XTABTensor"] = #/(Total[#, Length[Dimensions[#]]] /. 0 -> 1) &@tmat4["XTABTensor"];

Grid["tmat3", "tmat4", MatrixForm[tmat3], MatrixForm[tmat4]]

enter image description here

ArrayRules[tmat3["XTABTensor"]]

(* 1, 1, 1 -> 3/5, 1, 1, 5 -> 1/5, 1, 5, 4 -> 
 1, 1, 4, 2 -> 1/2, 1, 4, 1 -> 1/2, 1, 1, 3 -> 1/
 5, 1, 3, 3 -> 1, 2, 5, 5 -> 1/2, 2, 1, 4 -> 1, 2, 5, 3 -> 1/
 2, 3, 2, 5 -> 1, 3, 1, 4 -> 1, 3, 3, 3 -> 1/2, 3, 3, 4 -> 1/
 2, 3, 4, 4 -> 1, 4, 4, 4 -> 1/3, 4, 4, 3 -> 1/3, 4, 3, 2 -> 
 1, 4, 2, 1 -> 1/2, 4, 2, 5 -> 1/2, 4, 1, 1 -> 1, 4, 4, 5 -> 
 1/3, 5, 4, 4 -> 1/2, 5, 5, 5 -> 1/2, 5, 5, 4 -> 1/
 2, 5, 4, 2 -> 1/2, 5, 3, 1 -> 1, _, _, _ -> 0 *)

score 5 · Accepted Answer · 2018-08-08 12:05:00Z

The following code is just brute-force. But at least yields the expected results. Also, it can be used for any order.

The first parameter is the data. The second parameter is the order.

probM[data_, ord_] := 
 Module[uniques = Union[data], acc = 0, len, trans, trPre, tData, 
 toCount, toGather, toNormalize,
 trans = Dispatch@Thread[uniques -> Range[len = Length[uniques]]];
 trPre = Dispatch@Flatten[Array[## -> ++acc &, ConstantArray[len, ord]]];
 tData = Replace[data, trans, 1];
 toCount = Partition[tData, ord + 1, 1];
 toGather = Map[Replace[#[[1, ;; -2]], trPre], #[[1, -1]] -> #[[2]] &, 
 Tally[toCount]];
 toNormalize = GatherBy[toGather, #[[1, 1]] &];
 SparseArray[
 Flatten@Map[
 With[tot = 1/Plus @@ #[[All, 2]], 
 Map[#[[1]] -> #[[2]] tot &, #]] &, toNormalize]]];

Let us check the dimensions of the first three orders.

Table[probM[x, i] // Dimensions, i, 3]
(*5, 5, 25, 5, 125, 5*)

As for the efficiency of probM, I tried replacing some of the Map with ParallelMap but it did not yield any improvement. You might want to combine with niceties from the other answer. For example, use ArrayComponents instead of dispatch tables.

In any case, check the second order table:

$$
beginarraycccccc
text & textA & textB & textC & textD & textE \
textAA & frac35 & 0 & frac15 & 0 & frac15 \
textAB & 0 & 0 & 0 & 0 & 0 \
textAC & 0 & 0 & 1 & 0 & 0 \
textAD & frac12 & frac12 & 0 & 0 & 0 \
textAE & 0 & 0 & 0 & 1 & 0 \
textBA & 0 & 0 & 0 & 1 & 0 \
textBB & 0 & 0 & 0 & 0 & 0 \
textBC & 0 & 0 & 0 & 0 & 0 \
textBD & 0 & 0 & 0 & 0 & 0 \
textBE & 0 & 0 & frac12 & 0 & frac12 \
textCA & 0 & 0 & 0 & 1 & 0 \
textCB & 0 & 0 & 0 & 0 & 1 \
textCC & 0 & 0 & frac12 & frac12 & 0 \
textCD & 0 & 0 & 0 & 1 & 0 \
textCE & 0 & 0 & 0 & 0 & 0 \
textDA & 1 & 0 & 0 & 0 & 0 \
textDB & frac12 & 0 & 0 & 0 & frac12 \
textDC & 0 & 1 & 0 & 0 & 0 \
textDD & 0 & 0 & frac13 & frac13 & frac13 \
textDE & 0 & 0 & 0 & 0 & 0 \
textEA & 0 & 0 & 0 & 0 & 0 \
textEB & 0 & 0 & 0 & 0 & 0 \
textEC & 1 & 0 & 0 & 0 & 0 \
textED & 0 & frac12 & 0 & frac12 & 0 \
textEE & 0 & 0 & 0 & frac12 & frac12 \
endarray
$$

Thanks a lot, what command did you use to get the final table? when I run your code I get 5, 5, 22, 5, 109, 5 — Aug 8 at 11:53
@William Try probM[x, 2] // MatrixForm. As for your result, it seems that you run Table[probM[x, i] // Dimensions, i, 3]. But that should have returned 5, 5, 25, 5, 125, 5. What version of MMA are you running? Mine is 9.0. — Aug 8 at 11:57
Perfect, and how can I change the orders (this is second order, if I want to check the code with 1, 3 and other orders)? and the offset (possibly)? — Aug 8 at 12:00

score 5 · Answer 6 · 2018-08-08 13:34:29Z

Update: Using EmpiricalDistribution and MarginalDistribution to compute the conditional probabilities:

ClearAll[transitionProb]
transitionProb[step_: 1][x_] := Module[states = DeleteDuplicates@x, 
 ed = EmpiricalDistribution[Partition[ArrayComponents @ x, step + 1, 1]], 
 ordering, tuples, md, condpdF,
 ordering = Ordering[states]; tuples = Tuples[ordering, step];
 md = MarginalDistribution[ed, Range[step]];
 condpdF[u__, w_] := If[PDF[md, u] === 0, 0, PDF[ed, u, w]/PDF[md, u]];
 Prepend[Row @ states[[##]], 
 ## & @@ Table[## & @@ condpdF[##, i], i, ordering] & @@@ tuples, 
 Prepend[states[[ordering]], ""]]]

Examples:

transitionProb[2][x] // Grid[#, Dividers -> All] & // TeXForm

$beginarray
hline
text & textA & textB & textC & textD & textE \
hline
textAtextA & frac35 & 0 & frac15 & 0 & frac15 \
hline
textAtextB & 0 & 0 & 0 & 0 & 0 \
hline
textAtextC & 0 & 0 & 1 & 0 & 0 \
hline
textAtextD & frac12 & frac12 & 0 & 0 & 0 \
hline
textAtextE & 0 & 0 & 0 & 1 & 0 \
hline
textBtextA & 0 & 0 & 0 & 1 & 0 \
hline
textBtextB & 0 & 0 & 0 & 0 & 0 \
hline
textBtextC & 0 & 0 & 0 & 0 & 0 \
hline
textBtextD & 0 & 0 & 0 & 0 & 0 \
hline
textBtextE & 0 & 0 & frac12 & 0 & frac12 \
hline
textCtextA & 0 & 0 & 0 & 1 & 0 \
hline
textCtextB & 0 & 0 & 0 & 0 & 1 \
hline
textCtextC & 0 & 0 & frac12 & frac12 & 0 \
hline
textCtextD & 0 & 0 & 0 & 1 & 0 \
hline
textCtextE & 0 & 0 & 0 & 0 & 0 \
hline
textDtextA & 1 & 0 & 0 & 0 & 0 \
hline
textDtextB & frac12 & 0 & 0 & 0 & frac12 \
hline
textDtextC & 0 & 1 & 0 & 0 & 0 \
hline
textDtextD & 0 & 0 & frac13 & frac13 & frac13 \
hline
textDtextE & 0 & 0 & 0 & 0 & 0 \
hline
textEtextA & 0 & 0 & 0 & 0 & 0 \
hline
textEtextB & 0 & 0 & 0 & 0 & 0 \
hline
textEtextC & 1 & 0 & 0 & 0 & 0 \
hline
textEtextD & 0 & frac12 & 0 & frac12 & 0 \
hline
textEtextE & 0 & 0 & 0 & frac12 & frac12 \
hline
endarray$

transitionProb[1][x] // Grid[#, Dividers -> All] & // TeXForm

$beginarray
hline
text & textA & textB & textC & textD & textE \
hline
textA & frac59 & 0 & frac19 & frac29 & frac19 \
hline
textB & frac13 & 0 & 0 & 0 & frac23 \
hline
textC & frac15 & frac15 & frac25 & frac15 & 0 \
hline
textD & frac18 & frac14 & frac18 & frac38 & frac18 \
hline
textE & 0 & 0 & frac15 & frac25 & frac25 \
hline
endarray$

Original answer:

states = DeleteDuplicates[x];
ordering = Ordering[states]; 
data = ArrayComponents@x ;
estproc = EstimatedProcess[data, DiscreteMarkovProcess[Length@states]];
tuples = Tuples[Range[5][[ordering]], 2];
table = Row@states[[##]], ## & @@ 
 Table[Probability[p[3] == s [Conditioned] p[1] == # && p[2] == #2, 
 p [Distributed] estproc], s, Range[Length @ states]] & @@@ tuples ;

TeXForm @ Grid[Prepend[table, Prepend[states[[ordering]], ""]], Dividers -> All]

$beginarray
hline
text & textA & textB & textC & textD & textE \
hline
textAA & frac59 & frac19 & frac29 & frac19 & 0 \
hline
textAB & 0 & 0 & 0 & 0 & 0 \
hline
textAC & frac15 & 0 & frac15 & frac25 & frac15 \
hline
textAD & frac18 & frac18 & frac38 & frac18 & frac14 \
hline
textAE & 0 & frac25 & frac25 & frac15 & 0 \
hline
textBA & 0 & 0 & 0 & 0 & 0 \
hline
textBB & 0 & 0 & 0 & 0 & 0 \
hline
textBC & 0 & 0 & 0 & 0 & 0 \
hline
textBD & 0 & 0 & 0 & 0 & 0 \
hline
textBE & 0 & 0 & 0 & 0 & 0 \
hline
textCA & frac59 & frac19 & frac29 & frac19 & 0 \
hline
textCB & frac13 & frac23 & 0 & 0 & 0 \
hline
textCC & frac15 & 0 & frac15 & frac25 & frac15 \
hline
textCD & frac18 & frac18 & frac38 & frac18 & frac14 \
hline
textCE & 0 & 0 & 0 & 0 & 0 \
hline
textDA & frac59 & frac19 & frac29 & frac19 & 0 \
hline
textDB & frac13 & frac23 & 0 & 0 & 0 \
hline
textDC & frac15 & 0 & frac15 & frac25 & frac15 \
hline
textDD & frac18 & frac18 & frac38 & frac18 & frac14 \
hline
textDE & 0 & frac25 & frac25 & frac15 & 0 \
hline
textEA & 0 & 0 & 0 & 0 & 0 \
hline
textEB & 0 & 0 & 0 & 0 & 0 \
hline
textEC & frac15 & 0 & frac15 & frac25 & frac15 \
hline
textED & frac18 & frac18 & frac38 & frac18 & frac14 \
hline
textEE & 0 & frac25 & frac25 & frac15 & 0 \
hline
endarray$

Thanks a lot for this, however as a check I tried 'Partition[x, 3, 1] // Counts' which shows that I have 3 AAA instead of 5, this happens all over the table my worry is that the state ordering and data which you defined are not matched, for example in ordering E has value of 5, while in data it is 2. — Aug 8 at 10:28
@William, re ordering of states, that's why we sort them using ordering so that 2 corresponds to E. Re the discrepancy between Partition[x, 3, 1] // Counts and the Prob[A|AA] in the table above, i think it is because table is based on the TransitionMatrix of estproc and estproc is based on one-step transitions (Partition[x,2,1]). — Aug 8 at 10:39
So basically DiscreteMarkovProcess doesn't have a memory, because in second order the transition is remembering the two steps behind that's why it is AA,A and not for example A,A,A, interesting. — Aug 8 at 10:51
thank you @kglr, it indeed works fine and is shorter yet if you change the variable in Partition[data, 3, 1] for example to Partition[data, 2, 1] or else, the final table will not come out appropriate. I think hector's answer is good because you can change the order — Aug 8 at 12:37

score 2 · Answer 7 · 2018-08-08 21:13:08Z

As a variant of my answer to the linked question, the following should work correctly and efficiently.

Some random data to work with:

x = RandomChoice[Alphabet["English", "IndexCharacters"], 1000000];

Creating the probability tensor P:

n = 2;
data = Flatten[ToCharacterCode[x]] - (ToCharacterCode["A"][[1]] - 1); // AbsoluteTiming // First
A = With[spopt = SystemOptions["SparseArrayOptions"], 
 Internal`WithLocalSettings[
 (*switch to additive assembly*)
 SetSystemOptions["SparseArrayOptions" -> "TreatRepeatedEntries" -> Total],

 (*assemble matrix*)
 SparseArray[Partition[data, n + 1, 1] -> 1, ConstantArray[Max[data], n + 1] ],

 (*reset "SparseArrayOptions" to previous value*)
 SetSystemOptions[spopt]]]; // AbsoluteTiming // First
P = #/N[Total[Abs[#], n + 1] /. 0 -> 1] &@Flatten[A, n - 1];

0.717521

0.184357

The row labels of P should be

Tuples[Sort[DeleteDuplicates[x]], n]

score 1 · Answer 8 · 2018-08-08 21:03:26Z

You can use CrossTensorate from the package CrossTabulate.m, which I used and referenced in my answer of the previous question.

The making of contingency tensors with that function is discussed in this blog post: "Contingency tables creation examples".

In general, though, I would say it is better to use Tries with Frequencies or nested associations.

tmat3 = CrossTensorate[Count == 1 + 2 + 3, Partition[x, 3, 1]];

tmat4 = CrossTensorate[Count == 1 + 2 + 3 + 4, Partition[x, 4, 1]];

tmat3["XTABTensor"] = #/(Total[#, Length[Dimensions[#]]] /. 0 -> 1) &@tmat3["XTABTensor"];
tmat4["XTABTensor"] = #/(Total[#, Length[Dimensions[#]]] /. 0 -> 1) &@tmat4["XTABTensor"];

Grid["tmat3", "tmat4", MatrixForm[tmat3], MatrixForm[tmat4]]

enter image description here

ArrayRules[tmat3["XTABTensor"]]

(* 1, 1, 1 -> 3/5, 1, 1, 5 -> 1/5, 1, 5, 4 -> 
 1, 1, 4, 2 -> 1/2, 1, 4, 1 -> 1/2, 1, 1, 3 -> 1/
 5, 1, 3, 3 -> 1, 2, 5, 5 -> 1/2, 2, 1, 4 -> 1, 2, 5, 3 -> 1/
 2, 3, 2, 5 -> 1, 3, 1, 4 -> 1, 3, 3, 3 -> 1/2, 3, 3, 4 -> 1/
 2, 3, 4, 4 -> 1, 4, 4, 4 -> 1/3, 4, 4, 3 -> 1/3, 4, 3, 2 -> 
 1, 4, 2, 1 -> 1/2, 4, 2, 5 -> 1/2, 4, 1, 1 -> 1, 4, 4, 5 -> 
 1/3, 5, 4, 4 -> 1/2, 5, 5, 5 -> 1/2, 5, 5, 4 -> 1/
 2, 5, 4, 2 -> 1/2, 5, 3, 1 -> 1, _, _, _ -> 0 *)

Search This Blog

Iyfjky

Constructing higher order transition probability matrix

4 Answers
4

Your Answer

Post as a guest

4 Answers
4

4 Answers
4

Post as a guest

Comments

Post a Comment

Popular posts from this blog

List of Gilmore Girls characters

What does second last employer means? [closed]

One-line joke

Category

Random preview

Constructing higher order transition probability matrix

4 Answers 4

Your Answer

Sign up or log in

Post as a guest

Post as a guest

4 Answers 4

4 Answers 4

Sign up or log in

Post as a guest

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Comments

Post a Comment

Popular posts from this blog

List of Gilmore Girls characters

What does second last employer means? [closed]

One-line joke

4 Answers
4

4 Answers
4

4 Answers
4