Count the Number of Consecutive TRUEs in DataFrame (Python)
I have a dataset made of True and False.
Sample Table:
A B C
0 False True False
1 False False False
2 True True False
3 True True True
4 False True False
5 True True True
6 True False False
7 True False True
8 False True True
9 True False False
I want to count the number of consecutive True values in every column, and if there is more than one run of consecutive True values, I want the length of the longest run.
For the table above, I would get:
length = [3, 4, 2]
I found similar threads, but none resolved my problem.
Since I have (and will have) many more columns (products), I need to do this regardless of the column names, for the whole table, and get an array as the result.
And if possible, I'd also like to know the index of the first True of the longest run, i.e. where the longest True series starts. For this table, that result would be:
index = [5, 2, 7]
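For reproducibility, the sample table above can be rebuilt as a boolean DataFrame like this (a minimal sketch of the setup):

```python
import pandas as pd

# Boolean sample table from the question: 10 rows, columns A, B, C.
df = pd.DataFrame({
    "A": [False, False, True, True, False, True, True, True, False, True],
    "B": [True, False, True, True, True, True, False, False, True, False],
    "C": [False, False, False, True, False, True, False, True, True, False],
})
print(df)
```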
Tags: python, pandas, numpy, dataframe, count
asked by crinix
2 Answers
Accepted answer (by Divakar):
We basically leverage two ideas: catching the shifts on the compared array, and offsetting each column's results so that we can vectorize across all columns at once.
With that intent, here's one way to achieve the desired result -
import numpy as np

def maxisland_start_len_mask(a, fillna_index=-1, fillna_len=0):
    # a is a 2D boolean array (rows x columns)
    pad = np.zeros(a.shape[1], dtype=bool)
    mask = np.vstack((pad, a, pad))
    mask_step = mask[1:] != mask[:-1]
    idx = np.flatnonzero(mask_step.T)
    island_starts = idx[::2]
    island_lens = idx[1::2] - idx[::2]
    n_islands_percol = mask_step.sum(0) // 2
    bins = np.repeat(np.arange(a.shape[1]), n_islands_percol)
    scale = island_lens.max() + 1
    scaled_idx = np.argsort(scale * bins + island_lens)
    grp_shift_idx = np.r_[0, n_islands_percol.cumsum()]
    max_island_starts = island_starts[scaled_idx[grp_shift_idx[1:] - 1]]
    max_island_percol_start = max_island_starts % (a.shape[0] + 1)
    valid = n_islands_percol != 0
    cut_idx = grp_shift_idx[:-1][valid]
    max_island_percol_len = np.maximum.reduceat(island_lens, cut_idx)
    out_len = np.full(a.shape[1], fillna_len, dtype=int)
    out_len[valid] = max_island_percol_len
    out_index = np.where(valid, max_island_percol_start, fillna_index)
    return out_index, out_len
Sample run -
# Generic case to handle all 0s columns
In [112]: a
Out[112]:
array([[False, False, False],
       [False, False, False],
       [ True, False, False],
       [ True, False,  True],
       [False, False, False],
       [ True, False,  True],
       [ True, False, False],
       [ True, False,  True],
       [False, False,  True],
       [ True, False, False]])
In [117]: starts,lens = maxisland_start_len_mask(a, fillna_index=-1, fillna_len=0)
In [118]: starts
Out[118]: array([ 5, -1, 7])
In [119]: lens
Out[119]: array([3, 0, 2])
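As a cross-check, here is a plain loop-based reference implementation (my own sketch, not part of the answer) that computes the same per-column start index and run length on the question's sample data:

```python
import pandas as pd

def max_true_run(df):
    """Per column: (start index, length) of the longest run of True.

    Simple O(rows * cols) loop; returns start -1 and length 0 for
    columns with no True at all.
    """
    starts, lens = [], []
    for col in df.columns:
        best_len, best_start = 0, -1
        cur_len, cur_start = 0, -1
        for i, v in enumerate(df[col].to_numpy()):
            if v:
                if cur_len == 0:
                    cur_start = i
                cur_len += 1
                if cur_len > best_len:  # strict '>' keeps the first maximal run
                    best_len, best_start = cur_len, cur_start
            else:
                cur_len = 0
        starts.append(best_start)
        lens.append(best_len)
    return starts, lens

df = pd.DataFrame({
    "A": [False, False, True, True, False, True, True, True, False, True],
    "B": [True, False, True, True, True, True, False, False, True, False],
    "C": [False, False, False, True, False, True, False, True, True, False],
})
starts, lens = max_true_run(df)
print(starts, lens)  # [5, 2, 7] [3, 4, 2]
```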
Comment: After changing maxisland_start_len_mask to maxisland_start_len in line 117, it works now. Thank you!
– crinix
Answer (by jezrael): The solution can be simplified if there is always at least one True per column:
b = df.cumsum()
c = b.sub(b.mask(df).ffill().fillna(0)).astype(int)
print (c)
A B C
0 0 1 0
1 0 0 0
2 1 1 0
3 2 2 1
4 0 3 0
5 1 4 1
6 2 0 0
7 3 0 1
8 0 1 2
9 1 0 0
# get the maximal value of each column
length = c.max().tolist()
print (length)
[3, 4, 2]
# get the index of each maximal value, subtract the length and add 1
index = c.idxmax().sub(length).add(1).tolist()
print (index)
[5, 2, 7]
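Putting those steps together as a self-contained snippet (rebuilding the question's sample frame, which is my reconstruction of the input):

```python
import pandas as pd

df = pd.DataFrame({
    "A": [False, False, True, True, False, True, True, True, False, True],
    "B": [True, False, True, True, True, True, False, False, True, False],
    "C": [False, False, False, True, False, True, False, True, True, False],
})

# Running count of True values; b.mask(df) keeps the count only at False
# positions, so subtracting its forward-fill resets the counter at each False.
b = df.cumsum()
c = b.sub(b.mask(df).ffill().fillna(0)).astype(int)

length = c.max().tolist()
index = c.idxmax().sub(length).add(1).tolist()
print(length, index)  # [3, 4, 2] [5, 2, 7]
```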
Detail:
print (pd.concat([b,
b.mask(df),
b.mask(df).ffill(),
b.mask(df).ffill().fillna(0),
b.sub(b.mask(df).ffill().fillna(0)).astype(int)
], axis=1,
keys=('cumsum', 'mask', 'ffill', 'fillna','sub')))
cumsum mask ffill fillna sub
A B C A B C A B C A B C A B C
0 0 1 0 0.0 NaN 0.0 0.0 NaN 0.0 0.0 0.0 0.0 0 1 0
1 0 1 0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0 0 0
2 1 2 0 NaN NaN 0.0 0.0 1.0 0.0 0.0 1.0 0.0 1 1 0
3 2 3 1 NaN NaN NaN 0.0 1.0 0.0 0.0 1.0 0.0 2 2 1
4 2 4 1 2.0 NaN 1.0 2.0 1.0 1.0 2.0 1.0 1.0 0 3 0
5 3 5 2 NaN NaN NaN 2.0 1.0 1.0 2.0 1.0 1.0 1 4 1
6 4 5 2 NaN 5.0 2.0 2.0 5.0 2.0 2.0 5.0 2.0 2 0 0
7 5 5 3 NaN 5.0 NaN 2.0 5.0 2.0 2.0 5.0 2.0 3 0 1
8 5 6 4 5.0 NaN NaN 5.0 5.0 2.0 5.0 5.0 2.0 0 1 2
9 6 6 4 NaN 6.0 4.0 5.0 6.0 4.0 5.0 6.0 4.0 1 0 0
EDIT:
A general solution that also works with all-False columns - add numpy.where with a boolean mask created by DataFrame.any:
print (df)
A B C
0 False True False
1 False False False
2 True True False
3 True True False
4 False True False
5 True True False
6 True False False
7 True False False
8 False True False
9 True False False
b = df.cumsum()
c = b.sub(b.mask(df).ffill().fillna(0)).astype(int)
mask = df.any()
length = np.where(mask, c.max(), -1).tolist()
print (length)
[3, 4, -1]
index = np.where(mask, c.idxmax().sub(c.max()).add(1), 0).tolist()
print (index)
[5, 2, 0]
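To check the general variant, here is a runnable version using the all-False column C from the edited example (the frame construction is my reconstruction of the printed data):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [False, False, True, True, False, True, True, True, False, True],
    "B": [True, False, True, True, True, True, False, False, True, False],
    "C": [False] * 10,  # column with no True at all
})

b = df.cumsum()
c = b.sub(b.mask(df).ffill().fillna(0)).astype(int)

mask = df.any()  # per column: is there at least one True?
length = np.where(mask, c.max(), -1).tolist()
index = np.where(mask, c.idxmax().sub(c.max()).add(1), 0).tolist()
print(length, index)  # [3, 4, -1] [5, 2, 0]
```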