Count the Number of Consecutive TRUEs in DataFrame (Python)
I have a dataset made of True and False.
Sample Table:
A B C
0 False True False
1 False False False
2 True True False
3 True True True
4 False True False
5 True True True
6 True False False
7 True False True
8 False True True
9 True False False
I want to count the number of consecutive True values in every column, and if there is more than one run of consecutive True values, I want the length of the longest run.
For the table above, I would get:
length = [3, 4, 2]
I found similar threads, but none resolved my problem.
Since I have (and will have) many more columns (products), I need to do this regardless of the column names, for the whole table, and get an array as the result.
And if possible, I'd also like to know the index of the first True of the longest run, i.e. where the longest True series starts. For this table, that result would be:
index = [5, 2, 7]
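For reproducibility, the sample table above can be rebuilt as a boolean DataFrame like this (a minimal sketch of the setup):

```python
import pandas as pd

# Boolean sample table from the question: 10 rows, columns A, B, C.
df = pd.DataFrame({
    "A": [False, False, True, True, False, True, True, True, False, True],
    "B": [True, False, True, True, True, True, False, False, True, False],
    "C": [False, False, False, True, False, True, False, True, True, False],
})
print(df)
```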
Tags: python, pandas, numpy, dataframe, count
asked by crinix
2 Answers
Accepted answer (by Divakar):
We basically leverage two ideas: catching the shifts on the compared array, and offsetting each column's results so that we can vectorize across all columns at once.
With that intent, here's one way to achieve the desired result -
import numpy as np

def maxisland_start_len_mask(a, fillna_index=-1, fillna_len=0):
    # a is a 2D boolean array (rows x columns)
    pad = np.zeros(a.shape[1], dtype=bool)
    mask = np.vstack((pad, a, pad))
    mask_step = mask[1:] != mask[:-1]
    idx = np.flatnonzero(mask_step.T)
    island_starts = idx[::2]
    island_lens = idx[1::2] - idx[::2]
    n_islands_percol = mask_step.sum(0) // 2
    bins = np.repeat(np.arange(a.shape[1]), n_islands_percol)
    scale = island_lens.max() + 1
    scaled_idx = np.argsort(scale * bins + island_lens)
    grp_shift_idx = np.r_[0, n_islands_percol.cumsum()]
    max_island_starts = island_starts[scaled_idx[grp_shift_idx[1:] - 1]]
    max_island_percol_start = max_island_starts % (a.shape[0] + 1)
    valid = n_islands_percol != 0
    cut_idx = grp_shift_idx[:-1][valid]
    max_island_percol_len = np.maximum.reduceat(island_lens, cut_idx)
    out_len = np.full(a.shape[1], fillna_len, dtype=int)
    out_len[valid] = max_island_percol_len
    out_index = np.where(valid, max_island_percol_start, fillna_index)
    return out_index, out_len
Sample run -
# Generic case to handle all 0s columns
In [112]: a
Out[112]:
array([[False, False, False],
       [False, False, False],
       [ True, False, False],
       [ True, False,  True],
       [False, False, False],
       [ True, False,  True],
       [ True, False, False],
       [ True, False,  True],
       [False, False,  True],
       [ True, False, False]])
In [117]: starts,lens = maxisland_start_len_mask(a, fillna_index=-1, fillna_len=0)
In [118]: starts
Out[118]: array([ 5, -1, 7])
In [119]: lens
Out[119]: array([3, 0, 2])
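As a cross-check, here is a plain loop-based reference implementation (my own sketch, not part of the answer) that computes the same per-column start index and run length on the question's sample data:

```python
import pandas as pd

def max_true_run(df):
    """Per column: (start index, length) of the longest run of True.

    Simple O(rows * cols) loop; returns start -1 and length 0 for
    columns with no True at all.
    """
    starts, lens = [], []
    for col in df.columns:
        best_len, best_start = 0, -1
        cur_len, cur_start = 0, -1
        for i, v in enumerate(df[col].to_numpy()):
            if v:
                if cur_len == 0:
                    cur_start = i
                cur_len += 1
                if cur_len > best_len:  # strict '>' keeps the first maximal run
                    best_len, best_start = cur_len, cur_start
            else:
                cur_len = 0
        starts.append(best_start)
        lens.append(best_len)
    return starts, lens

df = pd.DataFrame({
    "A": [False, False, True, True, False, True, True, True, False, True],
    "B": [True, False, True, True, True, True, False, False, True, False],
    "C": [False, False, False, True, False, True, False, True, True, False],
})
starts, lens = max_true_run(df)
print(starts, lens)  # [5, 2, 7] [3, 4, 2]
```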
Comment: After changing maxisland_start_len_mask to maxisland_start_len in line 117, it works now. Thank you!
– crinix
Answer (by jezrael): The solution can be simplified if there is always at least one True per column:
b = df.cumsum()
c = b.sub(b.mask(df).ffill().fillna(0)).astype(int)
print (c)
A B C
0 0 1 0
1 0 0 0
2 1 1 0
3 2 2 1
4 0 3 0
5 1 4 1
6 2 0 0
7 3 0 1
8 0 1 2
9 1 0 0
# get the maximal value of each column
length = c.max().tolist()
print (length)
[3, 4, 2]
# get the index of each maximal value, subtract the length and add 1
index = c.idxmax().sub(length).add(1).tolist()
print (index)
[5, 2, 7]
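Putting those steps together as a self-contained snippet (rebuilding the question's sample frame, which is my reconstruction of the input):

```python
import pandas as pd

df = pd.DataFrame({
    "A": [False, False, True, True, False, True, True, True, False, True],
    "B": [True, False, True, True, True, True, False, False, True, False],
    "C": [False, False, False, True, False, True, False, True, True, False],
})

# Running count of True values; b.mask(df) keeps the count only at False
# positions, so subtracting its forward-fill resets the counter at each False.
b = df.cumsum()
c = b.sub(b.mask(df).ffill().fillna(0)).astype(int)

length = c.max().tolist()
index = c.idxmax().sub(length).add(1).tolist()
print(length, index)  # [3, 4, 2] [5, 2, 7]
```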
Detail:
print (pd.concat([b,
b.mask(df),
b.mask(df).ffill(),
b.mask(df).ffill().fillna(0),
b.sub(b.mask(df).ffill().fillna(0)).astype(int)
], axis=1,
keys=('cumsum', 'mask', 'ffill', 'fillna','sub')))
cumsum mask ffill fillna sub
A B C A B C A B C A B C A B C
0 0 1 0 0.0 NaN 0.0 0.0 NaN 0.0 0.0 0.0 0.0 0 1 0
1 0 1 0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0 0 0
2 1 2 0 NaN NaN 0.0 0.0 1.0 0.0 0.0 1.0 0.0 1 1 0
3 2 3 1 NaN NaN NaN 0.0 1.0 0.0 0.0 1.0 0.0 2 2 1
4 2 4 1 2.0 NaN 1.0 2.0 1.0 1.0 2.0 1.0 1.0 0 3 0
5 3 5 2 NaN NaN NaN 2.0 1.0 1.0 2.0 1.0 1.0 1 4 1
6 4 5 2 NaN 5.0 2.0 2.0 5.0 2.0 2.0 5.0 2.0 2 0 0
7 5 5 3 NaN 5.0 NaN 2.0 5.0 2.0 2.0 5.0 2.0 3 0 1
8 5 6 4 5.0 NaN NaN 5.0 5.0 2.0 5.0 5.0 2.0 0 1 2
9 6 6 4 NaN 6.0 4.0 5.0 6.0 4.0 5.0 6.0 4.0 1 0 0
EDIT:
A general solution that also works with all-False columns - add numpy.where with a boolean mask created by DataFrame.any:
print (df)
A B C
0 False True False
1 False False False
2 True True False
3 True True False
4 False True False
5 True True False
6 True False False
7 True False False
8 False True False
9 True False False
b = df.cumsum()
c = b.sub(b.mask(df).ffill().fillna(0)).astype(int)
mask = df.any()
length = np.where(mask, c.max(), -1).tolist()
print (length)
[3, 4, -1]
index = np.where(mask, c.idxmax().sub(c.max()).add(1), 0).tolist()
print (index)
[5, 2, 0]
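To check the general variant, here is a runnable version using the all-False column C from the edited example (the frame construction is my reconstruction of the printed data):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [False, False, True, True, False, True, True, True, False, True],
    "B": [True, False, True, True, True, True, False, False, True, False],
    "C": [False] * 10,  # column with no True at all
})

b = df.cumsum()
c = b.sub(b.mask(df).ffill().fillna(0)).astype(int)

mask = df.any()  # per column: is there at least one True?
length = np.where(mask, c.max(), -1).tolist()
index = np.where(mask, c.idxmax().sub(c.max()).add(1), 0).tolist()
print(length, index)  # [3, 4, -1] [5, 2, 0]
```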