Count the Number of Consecutive TRUEs in DataFrame (Python)

I have a dataset made up of True and False values.



Sample Table:

       A      B      C
0  False   True  False
1  False  False  False
2   True   True  False
3   True   True   True
4  False   True  False
5   True   True   True
6   True  False  False
7   True  False   True
8  False   True   True
9   True  False  False


I want to count the number of consecutive True values in every column, and if a column has more than one run of consecutive True values, I want the length of the longest one.



For the table above, I would get:



length = [3, 4, 2]


I found similar threads but none resolved my problem.



Since I have (and will have) many more columns (products), I need to do this for the whole table regardless of the column names, and get an array as the result.



If possible, I'd also like to know the index of the first True of the longest sequence, i.e. where the longest True series starts. For this table, that would be:



index = [5, 2, 7]
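
For anyone who wants to reproduce the expected outputs, here is a minimal sketch that rebuilds the sample table above as a pandas DataFrame (column names taken from the table; the variable name df is an assumption used throughout the answers below):

import pandas as pd

# Boolean data copied from the sample table in the question
df = pd.DataFrame({
    'A': [False, False, True, True, False, True, True, True, False, True],
    'B': [True, False, True, True, True, True, False, False, True, False],
    'C': [False, False, False, True, False, True, False, True, True, False],
})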









Tags: python, pandas, numpy, dataframe, count






Asked 2 hours ago by crinix




2 Answers























Accepted answer (score 5), by Divakar:










We basically leverage two ideas here: catching shifts in the compared (padded) array, and offsetting each column's results so that the whole computation can be vectorized.



          So, with that intention set, here's one way to achieve the desired results -



import numpy as np

def maxisland_start_len_mask(a, fillna_index=-1, fillna_len=0):
    # a is a boolean array
    pad = np.zeros(a.shape[1], dtype=bool)
    mask = np.vstack((pad, a, pad))

    mask_step = mask[1:] != mask[:-1]
    idx = np.flatnonzero(mask_step.T)
    island_starts = idx[::2]
    island_lens = idx[1::2] - idx[::2]
    n_islands_percol = mask_step.sum(0) // 2

    bins = np.repeat(np.arange(a.shape[1]), n_islands_percol)
    scale = island_lens.max() + 1

    scaled_idx = np.argsort(scale * bins + island_lens)
    grp_shift_idx = np.r_[0, n_islands_percol.cumsum()]
    max_island_starts = island_starts[scaled_idx[grp_shift_idx[1:] - 1]]

    max_island_percol_start = max_island_starts % (a.shape[0] + 1)

    valid = n_islands_percol != 0
    cut_idx = grp_shift_idx[:-1][valid]
    max_island_percol_len = np.maximum.reduceat(island_lens, cut_idx)

    out_len = np.full(a.shape[1], fillna_len, dtype=int)
    out_len[valid] = max_island_percol_len
    out_index = np.where(valid, max_island_percol_start, fillna_index)
    return out_index, out_len


          Sample run -



# Generic case to handle all 0s columns
In [112]: a
Out[112]:
array([[False, False, False],
       [False, False, False],
       [ True, False, False],
       [ True, False,  True],
       [False, False, False],
       [ True, False,  True],
       [ True, False, False],
       [ True, False,  True],
       [False, False,  True],
       [ True, False, False]])

In [117]: starts, lens = maxisland_start_len_mask(a, fillna_index=-1, fillna_len=0)

In [118]: starts
Out[118]: array([ 5, -1,  7])

In [119]: lens
Out[119]: array([3, 0, 2])
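
A minimal usage sketch against the question's own data (assuming df is the boolean DataFrame built earlier); the function expects a 2D boolean NumPy array, so we pass df.values:

# Run the accepted solution on the question's DataFrame
starts, lens = maxisland_start_len_mask(df.values, fillna_index=-1, fillna_len=0)
print(lens)    # expected: [3 4 2]
print(starts)  # expected: [5 2 7]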





Comment by crinix (1 hour ago, score 1): After changing maxisland_start_len_mask to maxisland_start_len in line 117, it works now. THANK YOU!

















Answer (score 5), by jezrael:













The solution can be simplified if there is always at least one True per column:



b = df.cumsum()
c = b.sub(b.mask(df).ffill().fillna(0)).astype(int)

print (c)
   A  B  C
0  0  1  0
1  0  0  0
2  1  1  0
3  2  2  1
4  0  3  0
5  1  4  1
6  2  0  0
7  3  0  1
8  0  1  2
9  1  0  0

# get the maximal value of each column
length = c.max().tolist()
print (length)
[3, 4, 2]

# get the index of the maximum per column, subtract the length and add 1
index = c.idxmax().sub(length).add(1).tolist()
print (index)
[5, 2, 7]


          Detail:



print (pd.concat([b,
                  b.mask(df),
                  b.mask(df).ffill(),
                  b.mask(df).ffill().fillna(0),
                  b.sub(b.mask(df).ffill().fillna(0)).astype(int)
                  ], axis=1,
                 keys=('cumsum', 'mask', 'ffill', 'fillna', 'sub')))

   cumsum    mask            ffill           fillna          sub
   A  B  C     A    B    C     A    B    C     A    B    C   A  B  C
0  0  1  0   0.0  NaN  0.0   0.0  NaN  0.0   0.0  0.0  0.0   0  1  0
1  0  1  0   0.0  1.0  0.0   0.0  1.0  0.0   0.0  1.0  0.0   0  0  0
2  1  2  0   NaN  NaN  0.0   0.0  1.0  0.0   0.0  1.0  0.0   1  1  0
3  2  3  1   NaN  NaN  NaN   0.0  1.0  0.0   0.0  1.0  0.0   2  2  1
4  2  4  1   2.0  NaN  1.0   2.0  1.0  1.0   2.0  1.0  1.0   0  3  0
5  3  5  2   NaN  NaN  NaN   2.0  1.0  1.0   2.0  1.0  1.0   1  4  1
6  4  5  2   NaN  5.0  2.0   2.0  5.0  2.0   2.0  5.0  2.0   2  0  0
7  5  5  3   NaN  5.0  NaN   2.0  5.0  2.0   2.0  5.0  2.0   3  0  1
8  5  6  4   5.0  NaN  NaN   5.0  5.0  2.0   5.0  5.0  2.0   0  1  2
9  6  6  4   NaN  6.0  4.0   5.0  6.0  4.0   5.0  6.0  4.0   1  0  0


          EDIT:



A general solution that also works when a column contains only False values: add numpy.where with a boolean mask created by DataFrame.any:



print (df)
       A      B      C
0  False   True  False
1  False  False  False
2   True   True  False
3   True   True  False
4  False   True  False
5   True   True  False
6   True  False  False
7   True  False  False
8  False   True  False
9   True  False  False

          b = df.cumsum()
          c = b.sub(b.mask(df).ffill().fillna(0)).astype(int)

          mask = df.any()
          length = np.where(mask, c.max(), -1).tolist()
          print (length)
          [3, 4, -1]

          index = np.where(mask, c.idxmax().sub(c.max()).add(1), 0).tolist()
          print (index)
          [5, 2, 0]
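
If this needs to run on many tables, the steps above can be wrapped in a small helper. A minimal sketch (the function name max_true_run and the fill-value parameters are my own, not from the answer), assuming df is an all-boolean DataFrame:

import numpy as np
import pandas as pd

def max_true_run(df, fillna_len=-1, fillna_index=0):
    # Longest run of True per column, using the cumsum trick shown above
    b = df.cumsum()
    c = b.sub(b.mask(df).ffill().fillna(0)).astype(int)
    # Columns without any True get the fill values instead
    mask = df.any()
    length = np.where(mask, c.max(), fillna_len).tolist()
    index = np.where(mask, c.idxmax().sub(c.max()).add(1), fillna_index).tolist()
    return length, index

# e.g. max_true_run(df) -> ([3, 4, -1], [5, 2, 0]) for the modified df printed above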




