propagate conditional column value in pandas

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP











up vote
6
down vote

favorite












I want to create an indicator variable that will propagate to all rows with the same customer-period value pair as the indicator. Specifically, if baz is yes, I want all rows of that same customer and period email to show my indicator.



df
Customer Period Question Score
A 1 foo 2
A 1 bar 3
A 1 baz yes
A 1 biz 1
B 1 bar 2
B 1 baz no
B 1 qux 3
A 2 foo 5
A 2 baz yes
B 2 baz yes
B 2 biz 2


I've tried



df['Indicator'] = np.where(
(df.Question.str.contains('baz') & (df.Score == 'yes')),
1, 0)


which returns



 Customer Period Question Score Indicator
A 1 foo 2 0
A 1 bar 3 0
A 1 baz yes 1
A 1 biz 1 0
B 1 bar 2 0
B 1 baz no 0
B 1 qux 3 0
A 2 foo 5 0
A 2 baz yes 1
B 2 baz yes 1
B 2 biz 2 0


But this is the desired output:



 Customer Period Question Score Indicator
A 1 foo 2 1
A 1 bar 3 1
A 1 baz yes 1
A 1 biz 1 1
B 1 bar 2 0
B 1 baz no 0
B 1 qux 3 0
A 2 foo 5 1
A 2 baz yes 1
B 2 baz yes 1
B 2 biz 2 1


I'm not sure how to go about getting what I want. Maybe groupby with ffill and another with bfill?







share|improve this question
























    up vote
    6
    down vote

    favorite












    I want to create an indicator variable that will propagate to all rows with the same customer-period value pair as the indicator. Specifically, if baz is yes, I want all rows of that same customer and period email to show my indicator.



    df
    Customer Period Question Score
    A 1 foo 2
    A 1 bar 3
    A 1 baz yes
    A 1 biz 1
    B 1 bar 2
    B 1 baz no
    B 1 qux 3
    A 2 foo 5
    A 2 baz yes
    B 2 baz yes
    B 2 biz 2


    I've tried



    df['Indicator'] = np.where(
    (df.Question.str.contains('baz') & (df.Score == 'yes')),
    1, 0)


    which returns



     Customer Period Question Score Indicator
    A 1 foo 2 0
    A 1 bar 3 0
    A 1 baz yes 1
    A 1 biz 1 0
    B 1 bar 2 0
    B 1 baz no 0
    B 1 qux 3 0
    A 2 foo 5 0
    A 2 baz yes 1
    B 2 baz yes 1
    B 2 biz 2 0


    But this is the desired output:



     Customer Period Question Score Indicator
    A 1 foo 2 1
    A 1 bar 3 1
    A 1 baz yes 1
    A 1 biz 1 1
    B 1 bar 2 0
    B 1 baz no 0
    B 1 qux 3 0
    A 2 foo 5 1
    A 2 baz yes 1
    B 2 baz yes 1
    B 2 biz 2 1


    I'm not sure how to go about getting what I want. Maybe groupby with ffill and another with bfill?







    share|improve this question






















      up vote
      6
      down vote

      favorite









      up vote
      6
      down vote

      favorite











      I want to create an indicator variable that will propagate to all rows with the same customer-period value pair as the indicator. Specifically, if baz is yes, I want all rows of that same customer and period email to show my indicator.



      df
      Customer Period Question Score
      A 1 foo 2
      A 1 bar 3
      A 1 baz yes
      A 1 biz 1
      B 1 bar 2
      B 1 baz no
      B 1 qux 3
      A 2 foo 5
      A 2 baz yes
      B 2 baz yes
      B 2 biz 2


      I've tried



      df['Indicator'] = np.where(
      (df.Question.str.contains('baz') & (df.Score == 'yes')),
      1, 0)


      which returns



       Customer Period Question Score Indicator
      A 1 foo 2 0
      A 1 bar 3 0
      A 1 baz yes 1
      A 1 biz 1 0
      B 1 bar 2 0
      B 1 baz no 0
      B 1 qux 3 0
      A 2 foo 5 0
      A 2 baz yes 1
      B 2 baz yes 1
      B 2 biz 2 0


      But this is the desired output:



       Customer Period Question Score Indicator
      A 1 foo 2 1
      A 1 bar 3 1
      A 1 baz yes 1
      A 1 biz 1 1
      B 1 bar 2 0
      B 1 baz no 0
      B 1 qux 3 0
      A 2 foo 5 1
      A 2 baz yes 1
      B 2 baz yes 1
      B 2 biz 2 1


      I'm not sure how to go about getting what I want. Maybe groupby with ffill and another with bfill?







      share|improve this question












      I want to create an indicator variable that will propagate to all rows with the same customer-period value pair as the indicator. Specifically, if baz is yes, I want all rows of that same customer and period email to show my indicator.



      df
      Customer Period Question Score
      A 1 foo 2
      A 1 bar 3
      A 1 baz yes
      A 1 biz 1
      B 1 bar 2
      B 1 baz no
      B 1 qux 3
      A 2 foo 5
      A 2 baz yes
      B 2 baz yes
      B 2 biz 2


      I've tried



      df['Indicator'] = np.where(
      (df.Question.str.contains('baz') & (df.Score == 'yes')),
      1, 0)


      which returns



       Customer Period Question Score Indicator
      A 1 foo 2 0
      A 1 bar 3 0
      A 1 baz yes 1
      A 1 biz 1 0
      B 1 bar 2 0
      B 1 baz no 0
      B 1 qux 3 0
      A 2 foo 5 0
      A 2 baz yes 1
      B 2 baz yes 1
      B 2 biz 2 0


      But this is the desired output:



       Customer Period Question Score Indicator
      A 1 foo 2 1
      A 1 bar 3 1
      A 1 baz yes 1
      A 1 biz 1 1
      B 1 bar 2 0
      B 1 baz no 0
      B 1 qux 3 0
      A 2 foo 5 1
      A 2 baz yes 1
      B 2 baz yes 1
      B 2 biz 2 1


      I'm not sure how to go about getting what I want. Maybe groupby with ffill and another with bfill?









      share|improve this question











      share|improve this question




      share|improve this question










      asked Aug 21 at 14:28









      Andrew

      466216




      466216






















          3 Answers
          3






          active

          oldest

          votes

















          up vote
          5
          down vote



          accepted










          You can use



          In [954]: df['Indicator'] = (df.assign(eq=df.Question.eq('baz') & df.Score.eq('yes'))
          .groupby(['Customer', 'Period'])['eq']
          .transform('any').astype(int))

          In [955]: df
          Out[955]:
          Customer Period Question Score Indicator
          0 A 1 foo 2 1
          1 A 1 bar 3 1
          2 A 1 baz yes 1
          3 A 1 biz 1 1
          4 B 1 bar 2 0
          5 B 1 baz no 0
          6 B 1 qux 3 0
          7 A 2 foo 5 1
          8 A 2 baz yes 1
          9 B 2 baz yes 1
          10 B 2 biz 2 1



          Details



          In [956]: df.Question.eq('baz') & df.Score.eq('yes')
          Out[956]:
          0 False
          1 False
          2 True
          3 False
          4 False
          5 False
          6 False
          7 False
          8 True
          9 True
          10 False
          dtype: bool

          In [957]: df.assign(eq=df.Question.eq('baz') & df.Score.eq('yes'))
          Out[957]:
          Customer Period Question Score Indicator eq
          0 A 1 foo 2 1 False
          1 A 1 bar 3 1 False
          2 A 1 baz yes 1 True
          3 A 1 biz 1 1 False
          4 B 1 bar 2 0 False
          5 B 1 baz no 0 False
          6 B 1 qux 3 0 False
          7 A 2 foo 5 1 False
          8 A 2 baz yes 1 True
          9 B 2 baz yes 1 True
          10 B 2 biz 2 1 False





          share|improve this answer




















          • perfect! It took me a second to make it work with my real set, but this is a great solution.
            – Andrew
            Aug 21 at 14:47

















          up vote
          4
          down vote













          Here's one way. The idea is to use a Boolean mask with MultiIndex. Then use pd.Series.isin to compare against your filtered indices.



          mask = (df['Question'] == 'baz') & (df['Score'] == 'yes')
          idx_cols = ['Customer', 'Period']
          idx = df.set_index(idx_cols).loc[mask.values].index

          df['Indicator'] = pd.Series(df.set_index(idx_cols).index.values).isin(idx).astype(int)

          print(df)

          Customer Period Question Score Indicator
          0 A 1 foo 2 1
          1 A 1 bar 3 1
          2 A 1 baz yes 1
          3 A 1 biz 1 1
          4 B 1 bar 2 0
          5 B 1 baz no 0
          6 B 1 qux 3 0
          7 A 2 foo 5 1
          8 A 2 baz yes 1
          9 B 2 baz yes 1
          10 B 2 biz 2 1





          share|improve this answer



























            up vote
            4
            down vote













            You can factorize the tuples of Customer and Period. Then use np.logical_or.at to get group-wise any



            i, r = pd.factorize([*zip(df.Customer, df.Period)])
            a = np.zeros(len(r), dtype=np.bool8)
            np.logical_or.at(a, i, df.eval('Question == "baz" and Score == "yes"'))
            df.assign(Indicator=a[i].astype(np.int64))

            Customer Period Question Score Indicator
            0 A 1 foo 2 1
            1 A 1 bar 3 1
            2 A 1 baz yes 1
            3 A 1 biz 1 1
            4 B 1 bar 2 0
            5 B 1 baz no 0
            6 B 1 qux 3 0
            7 A 2 foo 5 1
            8 A 2 baz yes 1
            9 B 2 baz yes 1
            10 B 2 biz 2 1



            Explanation



            i, r = pd.factorize([*zip(df.Customer, df.Period)])


            produces unique (Customer, Period) pairs in r where i is an array keeping track of which element of r went where in order to produce the original list of tuples




            1. Original list of tuples



              [*zip(df.Customer, df.Period)]

              [('A', 1),
              ('A', 1),
              ('A', 1),
              ('A', 1),
              ('B', 1),
              ('B', 1),
              ('B', 1),
              ('A', 2),
              ('A', 2),
              ('B', 2),
              ('B', 2)]



            2. After factorizing, unique tuples r



              r

              array([('A', 1), ('B', 1), ('A', 2), ('B', 2)], dtype=object)



            3. And the positions i



              i

              array([0, 0, 0, 0, 1, 1, 1, 2, 2, 3, 3])


            I can now use i as indices for evaluating grouped any in Numpy using Numpy's at method on ufuncs. Basically, this allows me to create an array upfront whose values may change based on my at operation. Then specify an array of indices (that's what i will be) and an array matching the size of i that is the second part of my operation at that index.



            I end up using as my matching array



            df.eval('Question == "baz" and Score == "yes"')

            0 False
            1 False
            2 True
            3 False
            4 False
            5 False
            6 False
            7 False
            8 True
            9 True
            10 False
            dtype: bool


            Let me show this in painstaking detail



             Flag GroupIndex Group State of a
            0 False 0 (A, 1) [0, 0, 0, 0] # Flag is False, So do Nothing
            1 False 0 (A, 1) [0, 0, 0, 0] # Flag is False, So do Nothing
            2 True 0 (A, 1) [1, 0, 0, 0] # Flag is True, or_eq for Index 0
            3 False 0 (A, 1) [1, 0, 0, 0] # Flag is False, So do Nothing
            4 False 1 (B, 1) [1, 0, 0, 0] # Flag is False, So do Nothing
            5 False 1 (B, 1) [1, 0, 0, 0] # Flag is False, So do Nothing
            6 False 1 (B, 1) [1, 0, 0, 0] # Flag is False, So do Nothing
            7 False 2 (A, 2) [1, 0, 0, 0] # Flag is False, So do Nothing
            8 True 2 (A, 2) [1, 0, 1, 0] # Flag is True, or_eq for Index 2
            9 True 3 (B, 2) [1, 0, 1, 1] # Flag is True, or_eq for Index 3
            10 False 3 (B, 2) [1, 0, 1, 1] # Flag is False, So do Nothing


            The final State is [1, 0, 1, 1] or in boolean terms [True, False, True, True]. And that represents the or accumulation within each unique group that is housed in a



            a

            array([ True, False, True, True])


            If I slice this with the index positions in i and cast as integers, I get



            a[i].astype(np.int64)

            array([1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1])


            Which is precisely what we were looking for.



            Finally, I use assign to produce a copy of the dataframe with its new column.



            df.assign(Indicator=a[i].astype(np.int64))

            Customer Period Question Score Indicator
            0 A 1 foo 2 1
            1 A 1 bar 3 1
            2 A 1 baz yes 1
            3 A 1 biz 1 1
            4 B 1 bar 2 0
            5 B 1 baz no 0
            6 B 1 qux 3 0
            7 A 2 foo 5 1
            8 A 2 baz yes 1
            9 B 2 baz yes 1
            10 B 2 biz 2 1



            Why Do it This Way?!



            Numpy is often faster.

            Below is a slightly more optimized approach. (basically the same)



            i, r = pd.factorize([*zip(df.Customer, df.Period)])
            a = np.zeros(len(r), dtype=np.bool8)
            q = df.Question.values == 'baz'
            s = df.Score.values == 'yes'
            m = q & s
            np.logical_or.at(a, i, m)
            df.assign(Indicator=a[i].astype(np.int64))





            share|improve this answer






















              Your Answer





              StackExchange.ifUsing("editor", function ()
              StackExchange.using("externalEditor", function ()
              StackExchange.using("snippets", function ()
              StackExchange.snippets.init();
              );
              );
              , "code-snippets");

              StackExchange.ready(function()
              var channelOptions =
              tags: "".split(" "),
              id: "1"
              ;
              initTagRenderer("".split(" "), "".split(" "), channelOptions);

              StackExchange.using("externalEditor", function()
              // Have to fire editor after snippets, if snippets enabled
              if (StackExchange.settings.snippets.snippetsEnabled)
              StackExchange.using("snippets", function()
              createEditor();
              );

              else
              createEditor();

              );

              function createEditor()
              StackExchange.prepareEditor(
              heartbeatType: 'answer',
              convertImagesToLinks: true,
              noModals: false,
              showLowRepImageUploadWarning: true,
              reputationToPostImages: 10,
              bindNavPrevention: true,
              postfix: "",
              onDemand: true,
              discardSelector: ".discard-answer"
              ,immediatelyShowMarkdownHelp:true
              );



              );













               

              draft saved


              draft discarded


















              StackExchange.ready(
              function ()
              StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f51950944%2fpropagate-conditional-column-value-in-pandas%23new-answer', 'question_page');

              );

              Post as a guest






























              3 Answers
              3






              active

              oldest

              votes








              3 Answers
              3






              active

              oldest

              votes









              active

              oldest

              votes






              active

              oldest

              votes








              up vote
              5
              down vote



              accepted










              You can use



              In [954]: df['Indicator'] = (df.assign(eq=df.Question.eq('baz') & df.Score.eq('yes'))
              .groupby(['Customer', 'Period'])['eq']
              .transform('any').astype(int))

              In [955]: df
              Out[955]:
              Customer Period Question Score Indicator
              0 A 1 foo 2 1
              1 A 1 bar 3 1
              2 A 1 baz yes 1
              3 A 1 biz 1 1
              4 B 1 bar 2 0
              5 B 1 baz no 0
              6 B 1 qux 3 0
              7 A 2 foo 5 1
              8 A 2 baz yes 1
              9 B 2 baz yes 1
              10 B 2 biz 2 1



              Details



              In [956]: df.Question.eq('baz') & df.Score.eq('yes')
              Out[956]:
              0 False
              1 False
              2 True
              3 False
              4 False
              5 False
              6 False
              7 False
              8 True
              9 True
              10 False
              dtype: bool

              In [957]: df.assign(eq=df.Question.eq('baz') & df.Score.eq('yes'))
              Out[957]:
              Customer Period Question Score Indicator eq
              0 A 1 foo 2 1 False
              1 A 1 bar 3 1 False
              2 A 1 baz yes 1 True
              3 A 1 biz 1 1 False
              4 B 1 bar 2 0 False
              5 B 1 baz no 0 False
              6 B 1 qux 3 0 False
              7 A 2 foo 5 1 False
              8 A 2 baz yes 1 True
              9 B 2 baz yes 1 True
              10 B 2 biz 2 1 False





              share|improve this answer




















              • perfect! It took me a second to make it work with my real set, but this is a great solution.
                – Andrew
                Aug 21 at 14:47














              up vote
              5
              down vote



              accepted










              You can use



              In [954]: df['Indicator'] = (df.assign(eq=df.Question.eq('baz') & df.Score.eq('yes'))
              .groupby(['Customer', 'Period'])['eq']
              .transform('any').astype(int))

              In [955]: df
              Out[955]:
              Customer Period Question Score Indicator
              0 A 1 foo 2 1
              1 A 1 bar 3 1
              2 A 1 baz yes 1
              3 A 1 biz 1 1
              4 B 1 bar 2 0
              5 B 1 baz no 0
              6 B 1 qux 3 0
              7 A 2 foo 5 1
              8 A 2 baz yes 1
              9 B 2 baz yes 1
              10 B 2 biz 2 1



              Details



              In [956]: df.Question.eq('baz') & df.Score.eq('yes')
              Out[956]:
              0 False
              1 False
              2 True
              3 False
              4 False
              5 False
              6 False
              7 False
              8 True
              9 True
              10 False
              dtype: bool

              In [957]: df.assign(eq=df.Question.eq('baz') & df.Score.eq('yes'))
              Out[957]:
              Customer Period Question Score Indicator eq
              0 A 1 foo 2 1 False
              1 A 1 bar 3 1 False
              2 A 1 baz yes 1 True
              3 A 1 biz 1 1 False
              4 B 1 bar 2 0 False
              5 B 1 baz no 0 False
              6 B 1 qux 3 0 False
              7 A 2 foo 5 1 False
              8 A 2 baz yes 1 True
              9 B 2 baz yes 1 True
              10 B 2 biz 2 1 False





              share|improve this answer




















              • perfect! It took me a second to make it work with my real set, but this is a great solution.
                – Andrew
                Aug 21 at 14:47












              up vote
              5
              down vote



              accepted







              up vote
              5
              down vote



              accepted






              You can use



              In [954]: df['Indicator'] = (df.assign(eq=df.Question.eq('baz') & df.Score.eq('yes'))
              .groupby(['Customer', 'Period'])['eq']
              .transform('any').astype(int))

              In [955]: df
              Out[955]:
              Customer Period Question Score Indicator
              0 A 1 foo 2 1
              1 A 1 bar 3 1
              2 A 1 baz yes 1
              3 A 1 biz 1 1
              4 B 1 bar 2 0
              5 B 1 baz no 0
              6 B 1 qux 3 0
              7 A 2 foo 5 1
              8 A 2 baz yes 1
              9 B 2 baz yes 1
              10 B 2 biz 2 1



              Details



              In [956]: df.Question.eq('baz') & df.Score.eq('yes')
              Out[956]:
              0 False
              1 False
              2 True
              3 False
              4 False
              5 False
              6 False
              7 False
              8 True
              9 True
              10 False
              dtype: bool

              In [957]: df.assign(eq=df.Question.eq('baz') & df.Score.eq('yes'))
              Out[957]:
              Customer Period Question Score Indicator eq
              0 A 1 foo 2 1 False
              1 A 1 bar 3 1 False
              2 A 1 baz yes 1 True
              3 A 1 biz 1 1 False
              4 B 1 bar 2 0 False
              5 B 1 baz no 0 False
              6 B 1 qux 3 0 False
              7 A 2 foo 5 1 False
              8 A 2 baz yes 1 True
              9 B 2 baz yes 1 True
              10 B 2 biz 2 1 False





              share|improve this answer












              You can use



              In [954]: df['Indicator'] = (df.assign(eq=df.Question.eq('baz') & df.Score.eq('yes'))
              .groupby(['Customer', 'Period'])['eq']
              .transform('any').astype(int))

              In [955]: df
              Out[955]:
              Customer Period Question Score Indicator
              0 A 1 foo 2 1
              1 A 1 bar 3 1
              2 A 1 baz yes 1
              3 A 1 biz 1 1
              4 B 1 bar 2 0
              5 B 1 baz no 0
              6 B 1 qux 3 0
              7 A 2 foo 5 1
              8 A 2 baz yes 1
              9 B 2 baz yes 1
              10 B 2 biz 2 1



              Details



              In [956]: df.Question.eq('baz') & df.Score.eq('yes')
              Out[956]:
              0 False
              1 False
              2 True
              3 False
              4 False
              5 False
              6 False
              7 False
              8 True
              9 True
              10 False
              dtype: bool

              In [957]: df.assign(eq=df.Question.eq('baz') & df.Score.eq('yes'))
              Out[957]:
              Customer Period Question Score Indicator eq
              0 A 1 foo 2 1 False
              1 A 1 bar 3 1 False
              2 A 1 baz yes 1 True
              3 A 1 biz 1 1 False
              4 B 1 bar 2 0 False
              5 B 1 baz no 0 False
              6 B 1 qux 3 0 False
              7 A 2 foo 5 1 False
              8 A 2 baz yes 1 True
              9 B 2 baz yes 1 True
              10 B 2 biz 2 1 False






              share|improve this answer












              share|improve this answer



              share|improve this answer










              answered Aug 21 at 14:32









              Zero

              35.6k75584




              35.6k75584











              • perfect! It took me a second to make it work with my real set, but this is a great solution.
                – Andrew
                Aug 21 at 14:47
















              • perfect! It took me a second to make it work with my real set, but this is a great solution.
                – Andrew
                Aug 21 at 14:47















              perfect! It took me a second to make it work with my real set, but this is a great solution.
              – Andrew
              Aug 21 at 14:47




              perfect! It took me a second to make it work with my real set, but this is a great solution.
              – Andrew
              Aug 21 at 14:47












              up vote
              4
              down vote













              Here's one way. The idea is to use a Boolean mask with MultiIndex. Then use pd.Series.isin to compare against your filtered indices.



              mask = (df['Question'] == 'baz') & (df['Score'] == 'yes')
              idx_cols = ['Customer', 'Period']
              idx = df.set_index(idx_cols).loc[mask.values].index

              df['Indicator'] = pd.Series(df.set_index(idx_cols).index.values).isin(idx).astype(int)

              print(df)

              Customer Period Question Score Indicator
              0 A 1 foo 2 1
              1 A 1 bar 3 1
              2 A 1 baz yes 1
              3 A 1 biz 1 1
              4 B 1 bar 2 0
              5 B 1 baz no 0
              6 B 1 qux 3 0
              7 A 2 foo 5 1
              8 A 2 baz yes 1
              9 B 2 baz yes 1
              10 B 2 biz 2 1





              share|improve this answer
























                up vote
                4
                down vote













                Here's one way. The idea is to use a Boolean mask with MultiIndex. Then use pd.Series.isin to compare against your filtered indices.



                mask = (df['Question'] == 'baz') & (df['Score'] == 'yes')
                idx_cols = ['Customer', 'Period']
                idx = df.set_index(idx_cols).loc[mask.values].index

                df['Indicator'] = pd.Series(df.set_index(idx_cols).index.values).isin(idx).astype(int)

                print(df)

                Customer Period Question Score Indicator
                0 A 1 foo 2 1
                1 A 1 bar 3 1
                2 A 1 baz yes 1
                3 A 1 biz 1 1
                4 B 1 bar 2 0
                5 B 1 baz no 0
                6 B 1 qux 3 0
                7 A 2 foo 5 1
                8 A 2 baz yes 1
                9 B 2 baz yes 1
                10 B 2 biz 2 1





                share|improve this answer






















                  up vote
                  4
                  down vote










                  up vote
                  4
                  down vote









                  Here's one way. The idea is to use a Boolean mask with MultiIndex. Then use pd.Series.isin to compare against your filtered indices.



                  mask = (df['Question'] == 'baz') & (df['Score'] == 'yes')
                  idx_cols = ['Customer', 'Period']
                  idx = df.set_index(idx_cols).loc[mask.values].index

                  df['Indicator'] = pd.Series(df.set_index(idx_cols).index.values).isin(idx).astype(int)

                  print(df)

                  Customer Period Question Score Indicator
                  0 A 1 foo 2 1
                  1 A 1 bar 3 1
                  2 A 1 baz yes 1
                  3 A 1 biz 1 1
                  4 B 1 bar 2 0
                  5 B 1 baz no 0
                  6 B 1 qux 3 0
                  7 A 2 foo 5 1
                  8 A 2 baz yes 1
                  9 B 2 baz yes 1
                  10 B 2 biz 2 1





                  share|improve this answer












                  Here's one way. The idea is to use a Boolean mask with MultiIndex. Then use pd.Series.isin to compare against your filtered indices.



                  mask = (df['Question'] == 'baz') & (df['Score'] == 'yes')
                  idx_cols = ['Customer', 'Period']
                  idx = df.set_index(idx_cols).loc[mask.values].index

                  df['Indicator'] = pd.Series(df.set_index(idx_cols).index.values).isin(idx).astype(int)

                  print(df)

                  Customer Period Question Score Indicator
                  0 A 1 foo 2 1
                  1 A 1 bar 3 1
                  2 A 1 baz yes 1
                  3 A 1 biz 1 1
                  4 B 1 bar 2 0
                  5 B 1 baz no 0
                  6 B 1 qux 3 0
                  7 A 2 foo 5 1
                  8 A 2 baz yes 1
                  9 B 2 baz yes 1
                  10 B 2 biz 2 1






                  share|improve this answer












                  share|improve this answer



                  share|improve this answer










                  answered Aug 21 at 14:37









                  jpp

                  63.3k173680




                  63.3k173680




















                      up vote
                      4
                      down vote













                      You can factorize the tuples of Customer and Period. Then use np.logical_or.at to get group-wise any



                      i, r = pd.factorize([*zip(df.Customer, df.Period)])
                      a = np.zeros(len(r), dtype=np.bool8)
                      np.logical_or.at(a, i, df.eval('Question == "baz" and Score == "yes"'))
                      df.assign(Indicator=a[i].astype(np.int64))

                      Customer Period Question Score Indicator
                      0 A 1 foo 2 1
                      1 A 1 bar 3 1
                      2 A 1 baz yes 1
                      3 A 1 biz 1 1
                      4 B 1 bar 2 0
                      5 B 1 baz no 0
                      6 B 1 qux 3 0
                      7 A 2 foo 5 1
                      8 A 2 baz yes 1
                      9 B 2 baz yes 1
                      10 B 2 biz 2 1



                      Explanation



                      i, r = pd.factorize([*zip(df.Customer, df.Period)])


                      produces unique (Customer, Period) pairs in r where i is an array keeping track of which element of r went where in order to produce the original list of tuples




                      1. Original list of tuples



                        [*zip(df.Customer, df.Period)]

                        [('A', 1),
                        ('A', 1),
                        ('A', 1),
                        ('A', 1),
                        ('B', 1),
                        ('B', 1),
                        ('B', 1),
                        ('A', 2),
                        ('A', 2),
                        ('B', 2),
                        ('B', 2)]



                      2. After factorizing, unique tuples r



                        r

                        array([('A', 1), ('B', 1), ('A', 2), ('B', 2)], dtype=object)



                      3. And the positions i



                        i

                        array([0, 0, 0, 0, 1, 1, 1, 2, 2, 3, 3])


                      I can now use i as indices for evaluating grouped any in Numpy using Numpy's at method on ufuncs. Basically, this allows me to create an array upfront whose values may change based on my at operation. Then specify an array of indices (that's what i will be) and an array matching the size of i that is the second part of my operation at that index.



                      I end up using as my matching array



                      df.eval('Question == "baz" and Score == "yes"')

                      0 False
                      1 False
                      2 True
                      3 False
                      4 False
                      5 False
                      6 False
                      7 False
                      8 True
                      9 True
                      10 False
                      dtype: bool


                      Let me show this in painstaking detail



                       Flag GroupIndex Group State of a
                      0 False 0 (A, 1) [0, 0, 0, 0] # Flag is False, So do Nothing
                      1 False 0 (A, 1) [0, 0, 0, 0] # Flag is False, So do Nothing
                      2 True 0 (A, 1) [1, 0, 0, 0] # Flag is True, or_eq for Index 0
                      3 False 0 (A, 1) [1, 0, 0, 0] # Flag is False, So do Nothing
                      4 False 1 (B, 1) [1, 0, 0, 0] # Flag is False, So do Nothing
                      5 False 1 (B, 1) [1, 0, 0, 0] # Flag is False, So do Nothing
                      6 False 1 (B, 1) [1, 0, 0, 0] # Flag is False, So do Nothing
                      7 False 2 (A, 2) [1, 0, 0, 0] # Flag is False, So do Nothing
                      8 True 2 (A, 2) [1, 0, 1, 0] # Flag is True, or_eq for Index 2
                      9 True 3 (B, 2) [1, 0, 1, 1] # Flag is True, or_eq for Index 3
                      10 False 3 (B, 2) [1, 0, 1, 1] # Flag is False, So do Nothing


                      The final State is [1, 0, 1, 1] or in boolean terms [True, False, True, True]. And that represents the or accumulation within each unique group that is housed in a



                      a

                      array([ True, False, True, True])


                      If I slice this with the index positions in i and cast as integers, I get



                      a[i].astype(np.int64)

                      array([1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1])


                      Which is precisely what we were looking for.



                      Finally, I use assign to produce a copy of the dataframe with its new column.



                      df.assign(Indicator=a[i].astype(np.int64))

                      Customer Period Question Score Indicator
                      0 A 1 foo 2 1
                      1 A 1 bar 3 1
                      2 A 1 baz yes 1
                      3 A 1 biz 1 1
                      4 B 1 bar 2 0
                      5 B 1 baz no 0
                      6 B 1 qux 3 0
                      7 A 2 foo 5 1
                      8 A 2 baz yes 1
                      9 B 2 baz yes 1
                      10 B 2 biz 2 1



                      Why Do it This Way?!



                      Numpy is often faster.

                      Below is a slightly more optimized approach. (basically the same)



                      i, r = pd.factorize([*zip(df.Customer, df.Period)])
                      a = np.zeros(len(r), dtype=np.bool8)
                      q = df.Question.values == 'baz'
                      s = df.Score.values == 'yes'
                      m = q & s
                      np.logical_or.at(a, i, m)
                      df.assign(Indicator=a[i].astype(np.int64))





                      share|improve this answer


























                        up vote
                        4
                        down vote













                        You can factorize the tuples of Customer and Period. Then use np.logical_or.at to get group-wise any



                        i, r = pd.factorize([*zip(df.Customer, df.Period)])
                        a = np.zeros(len(r), dtype=np.bool8)
                        np.logical_or.at(a, i, df.eval('Question == "baz" and Score == "yes"'))
                        df.assign(Indicator=a[i].astype(np.int64))

                        Customer Period Question Score Indicator
                        0 A 1 foo 2 1
                        1 A 1 bar 3 1
                        2 A 1 baz yes 1
                        3 A 1 biz 1 1
                        4 B 1 bar 2 0
                        5 B 1 baz no 0
                        6 B 1 qux 3 0
                        7 A 2 foo 5 1
                        8 A 2 baz yes 1
                        9 B 2 baz yes 1
                        10 B 2 biz 2 1



                        Explanation



                        i, r = pd.factorize([*zip(df.Customer, df.Period)])


                        produces unique (Customer, Period) pairs in r where i is an array keeping track of which element of r went where in order to produce the original list of tuples




                        1. Original list of tuples



                          [*zip(df.Customer, df.Period)]

                          [('A', 1),
                          ('A', 1),
                          ('A', 1),
                          ('A', 1),
                          ('B', 1),
                          ('B', 1),
                          ('B', 1),
                          ('A', 2),
                          ('A', 2),
                          ('B', 2),
                          ('B', 2)]



                        2. After factorizing, unique tuples r



                          r

                          array([('A', 1), ('B', 1), ('A', 2), ('B', 2)], dtype=object)



                        3. And the positions i



                          i

                          array([0, 0, 0, 0, 1, 1, 1, 2, 2, 3, 3])


                        I can now use i as indices for evaluating grouped any in Numpy using Numpy's at method on ufuncs. Basically, this allows me to create an array upfront whose values may change based on my at operation. Then specify an array of indices (that's what i will be) and an array matching the size of i that is the second part of my operation at that index.



                        I end up using as my matching array



                        df.eval('Question == "baz" and Score == "yes"')

                        0 False
                        1 False
                        2 True
                        3 False
                        4 False
                        5 False
                        6 False
                        7 False
                        8 True
                        9 True
                        10 False
                        dtype: bool


                        Let me show this in painstaking detail



                         Flag GroupIndex Group State of a
                        0 False 0 (A, 1) [0, 0, 0, 0] # Flag is False, So do Nothing
                        1 False 0 (A, 1) [0, 0, 0, 0] # Flag is False, So do Nothing
                        2 True 0 (A, 1) [1, 0, 0, 0] # Flag is True, or_eq for Index 0
                        3 False 0 (A, 1) [1, 0, 0, 0] # Flag is False, So do Nothing
                        4 False 1 (B, 1) [1, 0, 0, 0] # Flag is False, So do Nothing
                        5 False 1 (B, 1) [1, 0, 0, 0] # Flag is False, So do Nothing
                        6 False 1 (B, 1) [1, 0, 0, 0] # Flag is False, So do Nothing
                        7 False 2 (A, 2) [1, 0, 0, 0] # Flag is False, So do Nothing
                        8 True 2 (A, 2) [1, 0, 1, 0] # Flag is True, or_eq for Index 2
                        9 True 3 (B, 2) [1, 0, 1, 1] # Flag is True, or_eq for Index 3
                        10 False 3 (B, 2) [1, 0, 1, 1] # Flag is False, So do Nothing


                        The final State is [1, 0, 1, 1] or in boolean terms [True, False, True, True]. And that represents the or accumulation within each unique group that is housed in a



                        a

                        array([ True, False, True, True])


                        If I slice this with the index positions in i and cast as integers, I get



                        a[i].astype(np.int64)

                        array([1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1])


                        Which is precisely what we were looking for.



                        Finally, I use assign to produce a copy of the dataframe with its new column.



                        df.assign(Indicator=a[i].astype(np.int64))

                        Customer Period Question Score Indicator
                        0 A 1 foo 2 1
                        1 A 1 bar 3 1
                        2 A 1 baz yes 1
                        3 A 1 biz 1 1
                        4 B 1 bar 2 0
                        5 B 1 baz no 0
                        6 B 1 qux 3 0
                        7 A 2 foo 5 1
                        8 A 2 baz yes 1
                        9 B 2 baz yes 1
                        10 B 2 biz 2 1



                        Why Do it This Way?!



                        Numpy is often faster.

                        Below is a slightly more optimized approach. (basically the same)



                        i, r = pd.factorize([*zip(df.Customer, df.Period)])
                        a = np.zeros(len(r), dtype=np.bool8)
                        q = df.Question.values == 'baz'
                        s = df.Score.values == 'yes'
                        m = q & s
                        np.logical_or.at(a, i, m)
                        df.assign(Indicator=a[i].astype(np.int64))





                        share|improve this answer
























                          up vote
                          4
                          down vote










                          up vote
                          4
                          down vote









                          You can factorize the tuples of Customer and Period. Then use np.logical_or.at to get group-wise any



                          i, r = pd.factorize([*zip(df.Customer, df.Period)])
                          a = np.zeros(len(r), dtype=np.bool8)
                          np.logical_or.at(a, i, df.eval('Question == "baz" and Score == "yes"'))
                          df.assign(Indicator=a[i].astype(np.int64))

                          Customer Period Question Score Indicator
                          0 A 1 foo 2 1
                          1 A 1 bar 3 1
                          2 A 1 baz yes 1
                          3 A 1 biz 1 1
                          4 B 1 bar 2 0
                          5 B 1 baz no 0
                          6 B 1 qux 3 0
                          7 A 2 foo 5 1
                          8 A 2 baz yes 1
                          9 B 2 baz yes 1
                          10 B 2 biz 2 1



                          Explanation



                          i, r = pd.factorize([*zip(df.Customer, df.Period)])


                          produces unique (Customer, Period) pairs in r where i is an array keeping track of which element of r went where in order to produce the original list of tuples




                          1. Original list of tuples



                            [*zip(df.Customer, df.Period)]

                            [('A', 1),
                            ('A', 1),
                            ('A', 1),
                            ('A', 1),
                            ('B', 1),
                            ('B', 1),
                            ('B', 1),
                            ('A', 2),
                            ('A', 2),
                            ('B', 2),
                            ('B', 2)]



                          2. After factorizing, unique tuples r



                            r

                            array([('A', 1), ('B', 1), ('A', 2), ('B', 2)], dtype=object)



                          3. And the positions i



                            i

                            array([0, 0, 0, 0, 1, 1, 1, 2, 2, 3, 3])


                          I can now use i as indices for evaluating grouped any in Numpy using Numpy's at method on ufuncs. Basically, this allows me to create an array upfront whose values may change based on my at operation. Then specify an array of indices (that's what i will be) and an array matching the size of i that is the second part of my operation at that index.



                          I end up using as my matching array



                          df.eval('Question == "baz" and Score == "yes"')

                          0 False
                          1 False
                          2 True
                          3 False
                          4 False
                          5 False
                          6 False
                          7 False
                          8 True
                          9 True
                          10 False
                          dtype: bool


                          Let me show this in painstaking detail



                           Flag GroupIndex Group State of a
                          0 False 0 (A, 1) [0, 0, 0, 0] # Flag is False, So do Nothing
                          1 False 0 (A, 1) [0, 0, 0, 0] # Flag is False, So do Nothing
                          2 True 0 (A, 1) [1, 0, 0, 0] # Flag is True, or_eq for Index 0
                          3 False 0 (A, 1) [1, 0, 0, 0] # Flag is False, So do Nothing
                          4 False 1 (B, 1) [1, 0, 0, 0] # Flag is False, So do Nothing
                          5 False 1 (B, 1) [1, 0, 0, 0] # Flag is False, So do Nothing
                          6 False 1 (B, 1) [1, 0, 0, 0] # Flag is False, So do Nothing
                          7 False 2 (A, 2) [1, 0, 0, 0] # Flag is False, So do Nothing
                          8 True 2 (A, 2) [1, 0, 1, 0] # Flag is True, or_eq for Index 2
                          9 True 3 (B, 2) [1, 0, 1, 1] # Flag is True, or_eq for Index 3
                          10 False 3 (B, 2) [1, 0, 1, 1] # Flag is False, So do Nothing


                          The final State is [1, 0, 1, 1] or in boolean terms [True, False, True, True]. And that represents the or accumulation within each unique group that is housed in a



                          a

                          array([ True, False, True, True])


                          If I slice this with the index positions in i and cast as integers, I get



                          a[i].astype(np.int64)

                          array([1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1])


                          Which is precisely what we were looking for.



                          Finally, I use assign to produce a copy of the dataframe with its new column.



                          df.assign(Indicator=a[i].astype(np.int64))

                          Customer Period Question Score Indicator
                          0 A 1 foo 2 1
                          1 A 1 bar 3 1
                          2 A 1 baz yes 1
                          3 A 1 biz 1 1
                          4 B 1 bar 2 0
                          5 B 1 baz no 0
                          6 B 1 qux 3 0
                          7 A 2 foo 5 1
                          8 A 2 baz yes 1
                          9 B 2 baz yes 1
                          10 B 2 biz 2 1



                          Why Do it This Way?!



                          Numpy is often faster.

                          Below is a slightly more optimized approach. (basically the same)



                          i, r = pd.factorize([*zip(df.Customer, df.Period)])
                          a = np.zeros(len(r), dtype=np.bool8)
                          q = df.Question.values == 'baz'
                          s = df.Score.values == 'yes'
                          m = q & s
                          np.logical_or.at(a, i, m)
                          df.assign(Indicator=a[i].astype(np.int64))





                          share|improve this answer














                          You can factorize the tuples of Customer and Period. Then use np.logical_or.at to get group-wise any



                          i, r = pd.factorize([*zip(df.Customer, df.Period)])
                          a = np.zeros(len(r), dtype=np.bool8)
                          np.logical_or.at(a, i, df.eval('Question == "baz" and Score == "yes"'))
                          df.assign(Indicator=a[i].astype(np.int64))

                          Customer Period Question Score Indicator
                          0 A 1 foo 2 1
                          1 A 1 bar 3 1
                          2 A 1 baz yes 1
                          3 A 1 biz 1 1
                          4 B 1 bar 2 0
                          5 B 1 baz no 0
                          6 B 1 qux 3 0
                          7 A 2 foo 5 1
                          8 A 2 baz yes 1
                          9 B 2 baz yes 1
                          10 B 2 biz 2 1



                          Explanation



                          i, r = pd.factorize([*zip(df.Customer, df.Period)])


                          produces unique (Customer, Period) pairs in r where i is an array keeping track of which element of r went where in order to produce the original list of tuples




                          1. Original list of tuples



                            [*zip(df.Customer, df.Period)]

                            [('A', 1),
                            ('A', 1),
                            ('A', 1),
                            ('A', 1),
                            ('B', 1),
                            ('B', 1),
                            ('B', 1),
                            ('A', 2),
                            ('A', 2),
                            ('B', 2),
                            ('B', 2)]



                          2. After factorizing, unique tuples r



                            r

                            array([('A', 1), ('B', 1), ('A', 2), ('B', 2)], dtype=object)



                          3. And the positions i



                            i

                            array([0, 0, 0, 0, 1, 1, 1, 2, 2, 3, 3])


                          I can now use i as indices for evaluating grouped any in Numpy using Numpy's at method on ufuncs. Basically, this allows me to create an array upfront whose values may change based on my at operation. Then specify an array of indices (that's what i will be) and an array matching the size of i that is the second part of my operation at that index.



                          I end up using as my matching array



                          df.eval('Question == "baz" and Score == "yes"')

                          0 False
                          1 False
                          2 True
                          3 False
                          4 False
                          5 False
                          6 False
                          7 False
                          8 True
                          9 True
                          10 False
                          dtype: bool


                          Let me show this in painstaking detail



                           Flag GroupIndex Group State of a
                          0 False 0 (A, 1) [0, 0, 0, 0] # Flag is False, So do Nothing
                          1 False 0 (A, 1) [0, 0, 0, 0] # Flag is False, So do Nothing
                          2 True 0 (A, 1) [1, 0, 0, 0] # Flag is True, or_eq for Index 0
                          3 False 0 (A, 1) [1, 0, 0, 0] # Flag is False, So do Nothing
                          4 False 1 (B, 1) [1, 0, 0, 0] # Flag is False, So do Nothing
                          5 False 1 (B, 1) [1, 0, 0, 0] # Flag is False, So do Nothing
                          6 False 1 (B, 1) [1, 0, 0, 0] # Flag is False, So do Nothing
                          7 False 2 (A, 2) [1, 0, 0, 0] # Flag is False, So do Nothing
                          8 True 2 (A, 2) [1, 0, 1, 0] # Flag is True, or_eq for Index 2
                          9 True 3 (B, 2) [1, 0, 1, 1] # Flag is True, or_eq for Index 3
                          10 False 3 (B, 2) [1, 0, 1, 1] # Flag is False, So do Nothing


                          The final State is [1, 0, 1, 1] or in boolean terms [True, False, True, True]. And that represents the or accumulation within each unique group that is housed in a



                          a

                          array([ True, False, True, True])


                          If I slice this with the index positions in i and cast as integers, I get



                          a[i].astype(np.int64)

                          array([1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1])


                          Which is precisely what we were looking for.



                          Finally, I use assign to produce a copy of the dataframe with its new column.



                          df.assign(Indicator=a[i].astype(np.int64))

                          Customer Period Question Score Indicator
                          0 A 1 foo 2 1
                          1 A 1 bar 3 1
                          2 A 1 baz yes 1
                          3 A 1 biz 1 1
                          4 B 1 bar 2 0
                          5 B 1 baz no 0
                          6 B 1 qux 3 0
                          7 A 2 foo 5 1
                          8 A 2 baz yes 1
                          9 B 2 baz yes 1
                          10 B 2 biz 2 1



                          Why Do it This Way?!



                          Numpy is often faster.

                          Below is a slightly more optimized approach. (basically the same)



                          i, r = pd.factorize([*zip(df.Customer, df.Period)])
                          a = np.zeros(len(r), dtype=np.bool8)
                          q = df.Question.values == 'baz'
                          s = df.Score.values == 'yes'
                          m = q & s
                          np.logical_or.at(a, i, m)
                          df.assign(Indicator=a[i].astype(np.int64))






                          share|improve this answer














                          share|improve this answer



                          share|improve this answer








                          edited Aug 21 at 15:23

























                          answered Aug 21 at 14:43









                          piRSquared

                          138k19117244




                          138k19117244



























                               

                              draft saved


                              draft discarded















































                               


                              draft saved


                              draft discarded














                              StackExchange.ready(
                              function ()
                              StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f51950944%2fpropagate-conditional-column-value-in-pandas%23new-answer', 'question_page');

                              );

                              Post as a guest













































































                              Comments

                              Popular posts from this blog

                              What does second last employer means? [closed]

                              List of Gilmore Girls characters

                              Confectionery