Pandas DataFrame check if column value exists in a group of columns

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP











up vote
7
down vote

favorite
1












I have a DataFrame like this (simplified example)



id v0 v1 v2 v3 v4
1 10 5 10 22 50
2 22 23 55 60 50
3 8 2 40 80 110
4 15 15 25 100 101


And would like to create an additional column that is either 1 or 0. 1 if v0 value is in the values of v1 to v4, and 0 if it's not. So, in this example for id 1 then the value should be 1 (since v2 = 10) and for id 2 value should be 0 since 22 is not in v1 thru v4.



In reality the table is way bigger (around 100,000 rows and variables go from v1 to v99).










share|improve this question

























    up vote
    7
    down vote

    favorite
    1












    I have a DataFrame like this (simplified example)



    id v0 v1 v2 v3 v4
    1 10 5 10 22 50
    2 22 23 55 60 50
    3 8 2 40 80 110
    4 15 15 25 100 101


    And would like to create an additional column that is either 1 or 0. 1 if v0 value is in the values of v1 to v4, and 0 if it's not. So, in this example for id 1 then the value should be 1 (since v2 = 10) and for id 2 value should be 0 since 22 is not in v1 thru v4.



    In reality the table is way bigger (around 100,000 rows and variables go from v1 to v99).










    share|improve this question























      up vote
      7
      down vote

      favorite
      1









      up vote
      7
      down vote

      favorite
      1






      1





      I have a DataFrame like this (simplified example)



      id v0 v1 v2 v3 v4
      1 10 5 10 22 50
      2 22 23 55 60 50
      3 8 2 40 80 110
      4 15 15 25 100 101


      And would like to create an additional column that is either 1 or 0. 1 if v0 value is in the values of v1 to v4, and 0 if it's not. So, in this example for id 1 then the value should be 1 (since v2 = 10) and for id 2 value should be 0 since 22 is not in v1 thru v4.



      In reality the table is way bigger (around 100,000 rows and variables go from v1 to v99).










      share|improve this question













      I have a DataFrame like this (simplified example)



      id v0 v1 v2 v3 v4
      1 10 5 10 22 50
      2 22 23 55 60 50
      3 8 2 40 80 110
      4 15 15 25 100 101


      And would like to create an additional column that is either 1 or 0. 1 if v0 value is in the values of v1 to v4, and 0 if it's not. So, in this example for id 1 then the value should be 1 (since v2 = 10) and for id 2 value should be 0 since 22 is not in v1 thru v4.



      In reality the table is way bigger (around 100,000 rows and variables go from v1 to v99).







      python pandas numpy dataframe






      share|improve this question













      share|improve this question











      share|improve this question




      share|improve this question










      asked 1 hour ago









      EGM8686

      532




      532






















          6 Answers
          6






          active

          oldest

          votes

















          up vote
          3
          down vote













          You can use the underlying numpy arrays for performance:



          Setup



          a = df.v0.values
          b = df.iloc[:, 2:].values



          df.assign(out=(a[:, None]==b).any(1).astype(int))




           id v0 v1 v2 v3 v4 out
          0 1 10 5 10 22 50 1
          1 2 22 23 55 60 50 0
          2 3 8 2 40 80 110 0
          3 4 15 15 25 100 101 1



          This solution leverages broadcasting to allow for pairwise comparison:



          First, we broadcast a:



          >>> a[:, None]
          array([[10],
          [22],
          [ 8],
          [15]], dtype=int64)


          Which allows for pairwise comparison with b:



          >>> a[:, None] == b
          array([[False, True, False, False],
          [False, False, False, False],
          [False, False, False, False],
          [ True, False, False, False]])


          We then simply check for any True results along the first axis, and convert to integer.




          Performance




          Functions



          def user_chris(df):
          a = df.v0.values
          b = df.iloc[:, 2:].values
          return (a[:, None]==b).any(1).astype(int)

          def rahlf23(df):
          df = df.set_index('id')
          return df.drop('v0', 1).isin(df['v0']).any(1).astype(int)

          def chris_a(df):
          return df.loc[:, "v1":].eq(df['v0'], 0).any(1).astype(int)

          def chris(df):
          return df.apply(lambda x: int(x['v0'] in x.values[2:]), axis=1)

          def anton_vbr(df):
          df.set_index('id', inplace=True)
          return df.isin(df.pop('v0')).any(1).astype(int)


          Setup



          import pandas as pd
          import numpy as np
          import matplotlib.pyplot as plt
          from timeit import timeit

          res = pd.DataFrame(
          index=['user_chris', 'rahlf23', 'chris_a', 'chris', 'anton_vbr'],
          columns=[10, 50, 100, 500, 1000, 5000],
          dtype=float
          )

          for f in res.index:
          for c in res.columns:
          vals = np.random.randint(1, 100, (c, c))
          vals = np.column_stack((np.arange(vals.shape[0]), vals))
          df = pd.DataFrame(vals, columns=['id'] + [f'vi' for i in range(0, vals.shape[0])])
          stmt = '(df)'.format(f)
          setp = 'from __main__ import df, '.format(f)
          res.at[f, c] = timeit(stmt, setp, number=50)

          ax = res.div(res.min()).T.plot(loglog=True)
          ax.set_xlabel("N");
          ax.set_ylabel("time (relative)");

          plt.show()


          Output



          enter image description here






          share|improve this answer


















          • 1




            Incredible effort, +1 for sure. Sure are a lot of Chris’ in here :)
            – Chris A
            6 mins ago










          • Thank you, not sure why your answer was being downvoted, it's the fastest (and clearest) pandas solution, +1
            – user3483203
            4 mins ago


















          up vote
          2
          down vote













          How about:



          df['new_col'] = df.loc[:, "v1":].eq(df['v0'],0).any(1).astype(int)


          [out]



           id v0 v1 v2 v3 v4 new_col
          0 1 10 5 10 22 50 1
          1 2 22 23 55 60 50 0
          2 3 8 2 40 80 110 0
          3 4 15 15 25 100 101 1





          share|improve this answer


















          • 2




            What's wrong with this answer? Maybe I'm missing something, but this is what I was planning to do.
            – DSM
            1 hour ago

















          up vote
          1
          down vote













          You can also use a lambda function:



          df['newCol'] = df.apply(lambda x: int(x['v0'] in x.values[2:]), axis=1)

          id v0 v1 v2 v3 v4 newCol
          0 1 10 5 10 22 50 1
          1 2 22 23 55 60 50 0
          2 3 8 2 40 80 110 0
          3 4 15 15 25 100 101 1





          share|improve this answer



























            up vote
            1
            down vote













            I'm assuming here that id is set to be your dataframe index here:



            df = df.set_index('id')


            Then the following should work (similar answer here):



            df['New'] = df.drop('v0', 1).isin(df['v0']).any(1).astype(int)


            Gives:



             v0 v1 v2 v3 v4 New
            id
            1 10 5 10 22 50 1
            2 22 23 55 60 50 0
            3 8 2 40 80 110 0
            4 15 15 25 100 101 1





            share|improve this answer






















            • @Wen: really? For me pd.DataFrame("A": [1,2,3]).isin([3,2,1]) gives Trues, but pd.DataFrame("A": [1,2,3]).isin(pd.Series([3,2,1])) gives False, True, False, which makes it seem like it is respecting the index.
              – DSM
              52 mins ago











            • @DSM yes you are right :-) my leak of knowledge :-(
              – Wen
              50 mins ago






            • 1




              @rahlf23: because you're only dropping v0 in principle you could get yourself into trouble if an id value collides with something in the v columns.
              – DSM
              49 mins ago










            • I just noticed that as well, thank you for pointing that out. Let me account for that.
              – rahlf23
              49 mins ago

















            up vote
            1
            down vote













            Another take, most likely the smallest syntax:



            df['new'] = df.isin(df.pop('v0')).any(1).astype(int)


            Full proof:



            import pandas as pd

            data = '''
            id v0 v1 v2 v3 v4
            1 10 5 10 22 50
            2 22 23 55 60 50
            3 8 2 40 80 110
            4 15 15 25 100 101'''

            df = pd.read_csv(pd.compat.StringIO(data), sep='s+')
            df.set_index('id', inplace=True)
            df['new'] = df.isin(df.pop('v0')).any(1).astype(int)
            print(df)


            Returns:



             v1 v2 v3 v4 new
            id
            1 5 10 22 50 1
            2 23 55 60 50 0
            3 2 40 80 110 0
            4 15 25 100 101 1





            share|improve this answer



























              up vote
              0
              down vote













              df["exists"] = df.drop(["v0"], 1).isin(df["v0"]).any(1).astype(int)


              output



               id v0 v1 v2 v3 v4 exists
              0 1 10 5 10 22 50 1
              1 2 22 23 55 60 50 0
              2 3 8 2 40 80 110 0
              3 4 15 15 25 100 101 1





              share|improve this answer








              New contributor




              Sebabrata Ghosh is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
              Check out our Code of Conduct.













              • 2




                This seems identical to this answer
                – user3483203
                35 mins ago











              Your Answer





              StackExchange.ifUsing("editor", function ()
              StackExchange.using("externalEditor", function ()
              StackExchange.using("snippets", function ()
              StackExchange.snippets.init();
              );
              );
              , "code-snippets");

              StackExchange.ready(function()
              var channelOptions =
              tags: "".split(" "),
              id: "1"
              ;
              initTagRenderer("".split(" "), "".split(" "), channelOptions);

              StackExchange.using("externalEditor", function()
              // Have to fire editor after snippets, if snippets enabled
              if (StackExchange.settings.snippets.snippetsEnabled)
              StackExchange.using("snippets", function()
              createEditor();
              );

              else
              createEditor();

              );

              function createEditor()
              StackExchange.prepareEditor(
              heartbeatType: 'answer',
              convertImagesToLinks: true,
              noModals: false,
              showLowRepImageUploadWarning: true,
              reputationToPostImages: 10,
              bindNavPrevention: true,
              postfix: "",
              onDemand: true,
              discardSelector: ".discard-answer"
              ,immediatelyShowMarkdownHelp:true
              );



              );













               

              draft saved


              draft discarded


















              StackExchange.ready(
              function ()
              StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f52393659%2fpandas-dataframe-check-if-column-value-exists-in-a-group-of-columns%23new-answer', 'question_page');

              );

              Post as a guest






























              6 Answers
              6






              active

              oldest

              votes








              6 Answers
              6






              active

              oldest

              votes









              active

              oldest

              votes






              active

              oldest

              votes








              up vote
              3
              down vote













              You can use the underlying numpy arrays for performance:



              Setup



              a = df.v0.values
              b = df.iloc[:, 2:].values



              df.assign(out=(a[:, None]==b).any(1).astype(int))




               id v0 v1 v2 v3 v4 out
              0 1 10 5 10 22 50 1
              1 2 22 23 55 60 50 0
              2 3 8 2 40 80 110 0
              3 4 15 15 25 100 101 1



              This solution leverages broadcasting to allow for pairwise comparison:



              First, we broadcast a:



              >>> a[:, None]
              array([[10],
              [22],
              [ 8],
              [15]], dtype=int64)


              Which allows for pairwise comparison with b:



              >>> a[:, None] == b
              array([[False, True, False, False],
              [False, False, False, False],
              [False, False, False, False],
              [ True, False, False, False]])


              We then simply check for any True results along the first axis, and convert to integer.




              Performance




              Functions



              def user_chris(df):
              a = df.v0.values
              b = df.iloc[:, 2:].values
              return (a[:, None]==b).any(1).astype(int)

              def rahlf23(df):
              df = df.set_index('id')
              return df.drop('v0', 1).isin(df['v0']).any(1).astype(int)

              def chris_a(df):
              return df.loc[:, "v1":].eq(df['v0'], 0).any(1).astype(int)

              def chris(df):
              return df.apply(lambda x: int(x['v0'] in x.values[2:]), axis=1)

              def anton_vbr(df):
              df.set_index('id', inplace=True)
              return df.isin(df.pop('v0')).any(1).astype(int)


              Setup



              import pandas as pd
              import numpy as np
              import matplotlib.pyplot as plt
              from timeit import timeit

              res = pd.DataFrame(
              index=['user_chris', 'rahlf23', 'chris_a', 'chris', 'anton_vbr'],
              columns=[10, 50, 100, 500, 1000, 5000],
              dtype=float
              )

              for f in res.index:
              for c in res.columns:
              vals = np.random.randint(1, 100, (c, c))
              vals = np.column_stack((np.arange(vals.shape[0]), vals))
              df = pd.DataFrame(vals, columns=['id'] + [f'vi' for i in range(0, vals.shape[0])])
              stmt = '(df)'.format(f)
              setp = 'from __main__ import df, '.format(f)
              res.at[f, c] = timeit(stmt, setp, number=50)

              ax = res.div(res.min()).T.plot(loglog=True)
              ax.set_xlabel("N");
              ax.set_ylabel("time (relative)");

              plt.show()


              Output



              enter image description here






              share|improve this answer


















              • 1




                Incredible effort, +1 for sure. Sure are a lot of Chris’ in here :)
                – Chris A
                6 mins ago










              • Thank you, not sure why your answer was being downvoted, it's the fastest (and clearest) pandas solution, +1
                – user3483203
                4 mins ago















              up vote
              3
              down vote













              You can use the underlying numpy arrays for performance:



              Setup



              a = df.v0.values
              b = df.iloc[:, 2:].values



              df.assign(out=(a[:, None]==b).any(1).astype(int))




               id v0 v1 v2 v3 v4 out
              0 1 10 5 10 22 50 1
              1 2 22 23 55 60 50 0
              2 3 8 2 40 80 110 0
              3 4 15 15 25 100 101 1



              This solution leverages broadcasting to allow for pairwise comparison:



              First, we broadcast a:



              >>> a[:, None]
              array([[10],
              [22],
              [ 8],
              [15]], dtype=int64)


              Which allows for pairwise comparison with b:



              >>> a[:, None] == b
              array([[False, True, False, False],
              [False, False, False, False],
              [False, False, False, False],
              [ True, False, False, False]])


              We then simply check for any True results along the first axis, and convert to integer.




              Performance




              Functions



              def user_chris(df):
              a = df.v0.values
              b = df.iloc[:, 2:].values
              return (a[:, None]==b).any(1).astype(int)

              def rahlf23(df):
              df = df.set_index('id')
              return df.drop('v0', 1).isin(df['v0']).any(1).astype(int)

              def chris_a(df):
              return df.loc[:, "v1":].eq(df['v0'], 0).any(1).astype(int)

              def chris(df):
              return df.apply(lambda x: int(x['v0'] in x.values[2:]), axis=1)

              def anton_vbr(df):
              df.set_index('id', inplace=True)
              return df.isin(df.pop('v0')).any(1).astype(int)


              Setup



              import pandas as pd
              import numpy as np
              import matplotlib.pyplot as plt
              from timeit import timeit

              res = pd.DataFrame(
              index=['user_chris', 'rahlf23', 'chris_a', 'chris', 'anton_vbr'],
              columns=[10, 50, 100, 500, 1000, 5000],
              dtype=float
              )

              for f in res.index:
              for c in res.columns:
              vals = np.random.randint(1, 100, (c, c))
              vals = np.column_stack((np.arange(vals.shape[0]), vals))
              df = pd.DataFrame(vals, columns=['id'] + [f'vi' for i in range(0, vals.shape[0])])
              stmt = '(df)'.format(f)
              setp = 'from __main__ import df, '.format(f)
              res.at[f, c] = timeit(stmt, setp, number=50)

              ax = res.div(res.min()).T.plot(loglog=True)
              ax.set_xlabel("N");
              ax.set_ylabel("time (relative)");

              plt.show()


              Output



              enter image description here






              share|improve this answer


















              • 1




                Incredible effort, +1 for sure. Sure are a lot of Chris’ in here :)
                – Chris A
                6 mins ago










              • Thank you, not sure why your answer was being downvoted, it's the fastest (and clearest) pandas solution, +1
                – user3483203
                4 mins ago













              up vote
              3
              down vote










              up vote
              3
              down vote









              You can use the underlying numpy arrays for performance:



              Setup



              a = df.v0.values
              b = df.iloc[:, 2:].values



              df.assign(out=(a[:, None]==b).any(1).astype(int))




               id v0 v1 v2 v3 v4 out
              0 1 10 5 10 22 50 1
              1 2 22 23 55 60 50 0
              2 3 8 2 40 80 110 0
              3 4 15 15 25 100 101 1



              This solution leverages broadcasting to allow for pairwise comparison:



              First, we broadcast a:



              >>> a[:, None]
              array([[10],
              [22],
              [ 8],
              [15]], dtype=int64)


              Which allows for pairwise comparison with b:



              >>> a[:, None] == b
              array([[False, True, False, False],
              [False, False, False, False],
              [False, False, False, False],
              [ True, False, False, False]])


              We then simply check for any True results along the first axis, and convert to integer.




              Performance




              Functions



              def user_chris(df):
              a = df.v0.values
              b = df.iloc[:, 2:].values
              return (a[:, None]==b).any(1).astype(int)

              def rahlf23(df):
              df = df.set_index('id')
              return df.drop('v0', 1).isin(df['v0']).any(1).astype(int)

              def chris_a(df):
              return df.loc[:, "v1":].eq(df['v0'], 0).any(1).astype(int)

              def chris(df):
              return df.apply(lambda x: int(x['v0'] in x.values[2:]), axis=1)

              def anton_vbr(df):
              df.set_index('id', inplace=True)
              return df.isin(df.pop('v0')).any(1).astype(int)


              Setup



              import pandas as pd
              import numpy as np
              import matplotlib.pyplot as plt
              from timeit import timeit

              res = pd.DataFrame(
              index=['user_chris', 'rahlf23', 'chris_a', 'chris', 'anton_vbr'],
              columns=[10, 50, 100, 500, 1000, 5000],
              dtype=float
              )

              for f in res.index:
              for c in res.columns:
              vals = np.random.randint(1, 100, (c, c))
              vals = np.column_stack((np.arange(vals.shape[0]), vals))
              df = pd.DataFrame(vals, columns=['id'] + [f'vi' for i in range(0, vals.shape[0])])
              stmt = '(df)'.format(f)
              setp = 'from __main__ import df, '.format(f)
              res.at[f, c] = timeit(stmt, setp, number=50)

              ax = res.div(res.min()).T.plot(loglog=True)
              ax.set_xlabel("N");
              ax.set_ylabel("time (relative)");

              plt.show()


              Output



              enter image description here






              share|improve this answer














              You can use the underlying numpy arrays for performance:



              Setup



              a = df.v0.values
              b = df.iloc[:, 2:].values



              df.assign(out=(a[:, None]==b).any(1).astype(int))




               id v0 v1 v2 v3 v4 out
              0 1 10 5 10 22 50 1
              1 2 22 23 55 60 50 0
              2 3 8 2 40 80 110 0
              3 4 15 15 25 100 101 1



              This solution leverages broadcasting to allow for pairwise comparison:



              First, we broadcast a:



              >>> a[:, None]
              array([[10],
              [22],
              [ 8],
              [15]], dtype=int64)


              Which allows for pairwise comparison with b:



              >>> a[:, None] == b
              array([[False, True, False, False],
              [False, False, False, False],
              [False, False, False, False],
              [ True, False, False, False]])


              We then simply check for any True results along the first axis, and convert to integer.




              Performance




              Functions



              def user_chris(df):
              a = df.v0.values
              b = df.iloc[:, 2:].values
              return (a[:, None]==b).any(1).astype(int)

              def rahlf23(df):
              df = df.set_index('id')
              return df.drop('v0', 1).isin(df['v0']).any(1).astype(int)

              def chris_a(df):
              return df.loc[:, "v1":].eq(df['v0'], 0).any(1).astype(int)

              def chris(df):
              return df.apply(lambda x: int(x['v0'] in x.values[2:]), axis=1)

              def anton_vbr(df):
              df.set_index('id', inplace=True)
              return df.isin(df.pop('v0')).any(1).astype(int)


              Setup



              import pandas as pd
              import numpy as np
              import matplotlib.pyplot as plt
              from timeit import timeit

              res = pd.DataFrame(
              index=['user_chris', 'rahlf23', 'chris_a', 'chris', 'anton_vbr'],
              columns=[10, 50, 100, 500, 1000, 5000],
              dtype=float
              )

              for f in res.index:
              for c in res.columns:
              vals = np.random.randint(1, 100, (c, c))
              vals = np.column_stack((np.arange(vals.shape[0]), vals))
              df = pd.DataFrame(vals, columns=['id'] + [f'vi' for i in range(0, vals.shape[0])])
              stmt = '(df)'.format(f)
              setp = 'from __main__ import df, '.format(f)
              res.at[f, c] = timeit(stmt, setp, number=50)

              ax = res.div(res.min()).T.plot(loglog=True)
              ax.set_xlabel("N");
              ax.set_ylabel("time (relative)");

              plt.show()


              Output



              enter image description here







              share|improve this answer














              share|improve this answer



              share|improve this answer








              edited 26 mins ago

























              answered 1 hour ago









              user3483203

              24.2k62147




              24.2k62147







              • 1




                Incredible effort, +1 for sure. Sure are a lot of Chris’ in here :)
                – Chris A
                6 mins ago










              • Thank you, not sure why your answer was being downvoted, it's the fastest (and clearest) pandas solution, +1
                – user3483203
                4 mins ago













              • 1




                Incredible effort, +1 for sure. Sure are a lot of Chris’ in here :)
                – Chris A
                6 mins ago










              • Thank you, not sure why your answer was being downvoted, it's the fastest (and clearest) pandas solution, +1
                – user3483203
                4 mins ago








              1




              1




              Incredible effort, +1 for sure. Sure are a lot of Chris’ in here :)
              – Chris A
              6 mins ago




              Incredible effort, +1 for sure. Sure are a lot of Chris’ in here :)
              – Chris A
              6 mins ago












              Thank you, not sure why your answer was being downvoted, it's the fastest (and clearest) pandas solution, +1
              – user3483203
              4 mins ago





              Thank you, not sure why your answer was being downvoted, it's the fastest (and clearest) pandas solution, +1
              – user3483203
              4 mins ago













              up vote
              2
              down vote













              How about:



              df['new_col'] = df.loc[:, "v1":].eq(df['v0'],0).any(1).astype(int)


              [out]



               id v0 v1 v2 v3 v4 new_col
              0 1 10 5 10 22 50 1
              1 2 22 23 55 60 50 0
              2 3 8 2 40 80 110 0
              3 4 15 15 25 100 101 1





              share|improve this answer


















              • 2




                What's wrong with this answer? Maybe I'm missing something, but this is what I was planning to do.
                – DSM
                1 hour ago














              up vote
              2
              down vote













              How about:



              df['new_col'] = df.loc[:, "v1":].eq(df['v0'],0).any(1).astype(int)


              [out]



               id v0 v1 v2 v3 v4 new_col
              0 1 10 5 10 22 50 1
              1 2 22 23 55 60 50 0
              2 3 8 2 40 80 110 0
              3 4 15 15 25 100 101 1





              share|improve this answer


















              • 2




                What's wrong with this answer? Maybe I'm missing something, but this is what I was planning to do.
                – DSM
                1 hour ago












              up vote
              2
              down vote










              up vote
              2
              down vote









              How about:



              df['new_col'] = df.loc[:, "v1":].eq(df['v0'],0).any(1).astype(int)


              [out]



               id v0 v1 v2 v3 v4 new_col
              0 1 10 5 10 22 50 1
              1 2 22 23 55 60 50 0
              2 3 8 2 40 80 110 0
              3 4 15 15 25 100 101 1





              share|improve this answer














              How about:



              df['new_col'] = df.loc[:, "v1":].eq(df['v0'],0).any(1).astype(int)


              [out]



               id v0 v1 v2 v3 v4 new_col
              0 1 10 5 10 22 50 1
              1 2 22 23 55 60 50 0
              2 3 8 2 40 80 110 0
              3 4 15 15 25 100 101 1






              share|improve this answer














              share|improve this answer



              share|improve this answer








              edited 1 hour ago

























              answered 1 hour ago









              Chris A

              1,07619




              1,07619







              • 2




                What's wrong with this answer? Maybe I'm missing something, but this is what I was planning to do.
                – DSM
                1 hour ago












              • 2




                What's wrong with this answer? Maybe I'm missing something, but this is what I was planning to do.
                – DSM
                1 hour ago







              2




              2




              What's wrong with this answer? Maybe I'm missing something, but this is what I was planning to do.
              – DSM
              1 hour ago




              What's wrong with this answer? Maybe I'm missing something, but this is what I was planning to do.
              – DSM
              1 hour ago










              up vote
              1
              down vote













              You can also use a lambda function:



              df['newCol'] = df.apply(lambda x: int(x['v0'] in x.values[2:]), axis=1)

              id v0 v1 v2 v3 v4 newCol
              0 1 10 5 10 22 50 1
              1 2 22 23 55 60 50 0
              2 3 8 2 40 80 110 0
              3 4 15 15 25 100 101 1





              share|improve this answer
























                up vote
                1
                down vote













                You can also use a lambda function:



                df['newCol'] = df.apply(lambda x: int(x['v0'] in x.values[2:]), axis=1)

                id v0 v1 v2 v3 v4 newCol
                0 1 10 5 10 22 50 1
                1 2 22 23 55 60 50 0
                2 3 8 2 40 80 110 0
                3 4 15 15 25 100 101 1





                share|improve this answer






















                  up vote
                  1
                  down vote










                  up vote
                  1
                  down vote









                  You can also use a lambda function:



                  df['newCol'] = df.apply(lambda x: int(x['v0'] in x.values[2:]), axis=1)

                  id v0 v1 v2 v3 v4 newCol
                  0 1 10 5 10 22 50 1
                  1 2 22 23 55 60 50 0
                  2 3 8 2 40 80 110 0
                  3 4 15 15 25 100 101 1





                  share|improve this answer












                  You can also use a lambda function:



                  df['newCol'] = df.apply(lambda x: int(x['v0'] in x.values[2:]), axis=1)

                  id v0 v1 v2 v3 v4 newCol
                  0 1 10 5 10 22 50 1
                  1 2 22 23 55 60 50 0
                  2 3 8 2 40 80 110 0
                  3 4 15 15 25 100 101 1






                  share|improve this answer












                  share|improve this answer



                  share|improve this answer










                  answered 1 hour ago









                  Chris

                  58318




                  58318




















                      up vote
                      1
                      down vote













                      I'm assuming here that id is set to be your dataframe index here:



                      df = df.set_index('id')


                      Then the following should work (similar answer here):



                      df['New'] = df.drop('v0', 1).isin(df['v0']).any(1).astype(int)


                      Gives:



                       v0 v1 v2 v3 v4 New
                      id
                      1 10 5 10 22 50 1
                      2 22 23 55 60 50 0
                      3 8 2 40 80 110 0
                      4 15 15 25 100 101 1





                      share|improve this answer






















                      • @Wen: really? For me pd.DataFrame("A": [1,2,3]).isin([3,2,1]) gives Trues, but pd.DataFrame("A": [1,2,3]).isin(pd.Series([3,2,1])) gives False, True, False, which makes it seem like it is respecting the index.
                        – DSM
                        52 mins ago











                      • @DSM yes you are right :-) my leak of knowledge :-(
                        – Wen
                        50 mins ago






                      • 1




                        @rahlf23: because you're only dropping v0 in principle you could get yourself into trouble if an id value collides with something in the v columns.
                        – DSM
                        49 mins ago










                      • I just noticed that as well, thank you for pointing that out. Let me account for that.
                        – rahlf23
                        49 mins ago














                      up vote
                      1
                      down vote













                      I'm assuming here that id is set to be your dataframe index here:



                      df = df.set_index('id')


                      Then the following should work (similar answer here):



                      df['New'] = df.drop('v0', 1).isin(df['v0']).any(1).astype(int)


                      Gives:



                       v0 v1 v2 v3 v4 New
                      id
                      1 10 5 10 22 50 1
                      2 22 23 55 60 50 0
                      3 8 2 40 80 110 0
                      4 15 15 25 100 101 1





                      share|improve this answer






















                      • @Wen: really? For me pd.DataFrame("A": [1,2,3]).isin([3,2,1]) gives Trues, but pd.DataFrame("A": [1,2,3]).isin(pd.Series([3,2,1])) gives False, True, False, which makes it seem like it is respecting the index.
                        – DSM
                        52 mins ago











                      • @DSM yes you are right :-) my leak of knowledge :-(
                        – Wen
                        50 mins ago






                      • 1




                        @rahlf23: because you're only dropping v0 in principle you could get yourself into trouble if an id value collides with something in the v columns.
                        – DSM
                        49 mins ago










                      • I just noticed that as well, thank you for pointing that out. Let me account for that.
                        – rahlf23
                        49 mins ago












                      up vote
                      1
                      down vote










                      up vote
                      1
                      down vote









                      I'm assuming here that id is set to be your dataframe index here:



                      df = df.set_index('id')


                      Then the following should work (similar answer here):



                      df['New'] = df.drop('v0', 1).isin(df['v0']).any(1).astype(int)


                      Gives:



                       v0 v1 v2 v3 v4 New
                      id
                      1 10 5 10 22 50 1
                      2 22 23 55 60 50 0
                      3 8 2 40 80 110 0
                      4 15 15 25 100 101 1





                      share|improve this answer














                      I'm assuming here that id is set to be your dataframe index here:



                      df = df.set_index('id')


                      Then the following should work (similar answer here):



                      df['New'] = df.drop('v0', 1).isin(df['v0']).any(1).astype(int)


                      Gives:



                       v0 v1 v2 v3 v4 New
                      id
                      1 10 5 10 22 50 1
                      2 22 23 55 60 50 0
                      3 8 2 40 80 110 0
                      4 15 15 25 100 101 1






                      share|improve this answer














                      share|improve this answer



                      share|improve this answer








                      edited 45 mins ago

























                      answered 1 hour ago









                      rahlf23

                      2,0701220




                      2,0701220











                      • @Wen: really? For me pd.DataFrame("A": [1,2,3]).isin([3,2,1]) gives Trues, but pd.DataFrame("A": [1,2,3]).isin(pd.Series([3,2,1])) gives False, True, False, which makes it seem like it is respecting the index.
                        – DSM
                        52 mins ago











                      • @DSM yes you are right :-) my leak of knowledge :-(
                        – Wen
                        50 mins ago






                      • 1




                        @rahlf23: because you're only dropping v0 in principle you could get yourself into trouble if an id value collides with something in the v columns.
                        – DSM
                        49 mins ago










                      • I just noticed that as well, thank you for pointing that out. Let me account for that.
                        – rahlf23
                        49 mins ago
















                      • @Wen: really? For me pd.DataFrame("A": [1,2,3]).isin([3,2,1]) gives Trues, but pd.DataFrame("A": [1,2,3]).isin(pd.Series([3,2,1])) gives False, True, False, which makes it seem like it is respecting the index.
                        – DSM
                        52 mins ago











                      • @DSM yes you are right :-) my leak of knowledge :-(
                        – Wen
                        50 mins ago






                      • 1




                        @rahlf23: because you're only dropping v0 in principle you could get yourself into trouble if an id value collides with something in the v columns.
                        – DSM
                        49 mins ago










                      • I just noticed that as well, thank you for pointing that out. Let me account for that.
                        – rahlf23
                        49 mins ago















                      @Wen: really? For me pd.DataFrame("A": [1,2,3]).isin([3,2,1]) gives Trues, but pd.DataFrame("A": [1,2,3]).isin(pd.Series([3,2,1])) gives False, True, False, which makes it seem like it is respecting the index.
                      – DSM
                      52 mins ago





                      @Wen: really? For me pd.DataFrame("A": [1,2,3]).isin([3,2,1]) gives Trues, but pd.DataFrame("A": [1,2,3]).isin(pd.Series([3,2,1])) gives False, True, False, which makes it seem like it is respecting the index.
                      – DSM
                      52 mins ago













                      @DSM yes you are right :-) my leak of knowledge :-(
                      – Wen
                      50 mins ago




                      @DSM yes you are right :-) my leak of knowledge :-(
                      – Wen
                      50 mins ago




                      1




                      1




                      @rahlf23: because you're only dropping v0 in principle you could get yourself into trouble if an id value collides with something in the v columns.
                      – DSM
                      49 mins ago




                      @rahlf23: because you're only dropping v0 in principle you could get yourself into trouble if an id value collides with something in the v columns.
                      – DSM
                      49 mins ago












                      I just noticed that as well, thank you for pointing that out. Let me account for that.
                      – rahlf23
                      49 mins ago




                      I just noticed that as well, thank you for pointing that out. Let me account for that.
                      – rahlf23
                      49 mins ago










                      up vote
                      1
                      down vote













                      Another take, most likely the smallest syntax:



                      df['new'] = df.isin(df.pop('v0')).any(1).astype(int)


                      Full proof:



                      import pandas as pd

                      data = '''
                      id v0 v1 v2 v3 v4
                      1 10 5 10 22 50
                      2 22 23 55 60 50
                      3 8 2 40 80 110
                      4 15 15 25 100 101'''

                      df = pd.read_csv(pd.compat.StringIO(data), sep='s+')
                      df.set_index('id', inplace=True)
                      df['new'] = df.isin(df.pop('v0')).any(1).astype(int)
                      print(df)


                      Returns:



                       v1 v2 v3 v4 new
                      id
                      1 5 10 22 50 1
                      2 23 55 60 50 0
                      3 2 40 80 110 0
                      4 15 25 100 101 1





                      share|improve this answer
























                        up vote
                        1
                        down vote













                        Another take, most likely the smallest syntax:



                        df['new'] = df.isin(df.pop('v0')).any(1).astype(int)


                        Full proof:



                        import pandas as pd

                        data = '''
                        id v0 v1 v2 v3 v4
                        1 10 5 10 22 50
                        2 22 23 55 60 50
                        3 8 2 40 80 110
                        4 15 15 25 100 101'''

                        df = pd.read_csv(pd.compat.StringIO(data), sep='s+')
                        df.set_index('id', inplace=True)
                        df['new'] = df.isin(df.pop('v0')).any(1).astype(int)
                        print(df)


                        Returns:



                         v1 v2 v3 v4 new
                        id
                        1 5 10 22 50 1
                        2 23 55 60 50 0
                        3 2 40 80 110 0
                        4 15 25 100 101 1





                        share|improve this answer






















                          up vote
                          1
                          down vote










                          up vote
                          1
                          down vote









                          Another take, most likely the smallest syntax:



                          df['new'] = df.isin(df.pop('v0')).any(1).astype(int)


                          Full proof:



                          import pandas as pd

                          data = '''
                          id v0 v1 v2 v3 v4
                          1 10 5 10 22 50
                          2 22 23 55 60 50
                          3 8 2 40 80 110
                          4 15 15 25 100 101'''

                          df = pd.read_csv(pd.compat.StringIO(data), sep='s+')
                          df.set_index('id', inplace=True)
                          df['new'] = df.isin(df.pop('v0')).any(1).astype(int)
                          print(df)


                          Returns:



                           v1 v2 v3 v4 new
                          id
                          1 5 10 22 50 1
                          2 23 55 60 50 0
                          3 2 40 80 110 0
                          4 15 25 100 101 1





                          share|improve this answer












                          Another take, most likely the smallest syntax:



                          df['new'] = df.isin(df.pop('v0')).any(1).astype(int)


                          Full proof:



                          import pandas as pd

                          data = '''
                          id v0 v1 v2 v3 v4
                          1 10 5 10 22 50
                          2 22 23 55 60 50
                          3 8 2 40 80 110
                          4 15 15 25 100 101'''

                          df = pd.read_csv(pd.compat.StringIO(data), sep='s+')
                          df.set_index('id', inplace=True)
                          df['new'] = df.isin(df.pop('v0')).any(1).astype(int)
                          print(df)


                          Returns:



                           v1 v2 v3 v4 new
                          id
                          1 5 10 22 50 1
                          2 23 55 60 50 0
                          3 2 40 80 110 0
                          4 15 25 100 101 1






                          share|improve this answer












                          share|improve this answer



                          share|improve this answer










                          answered 42 mins ago









                          Anton vBR

                          10.3k2922




                          10.3k2922




















                              up vote
                              0
                              down vote













                              df["exists"] = df.drop(["v0"], 1).isin(df["v0"]).any(1).astype(int)


                              output



                               id v0 v1 v2 v3 v4 exists
                              0 1 10 5 10 22 50 1
                              1 2 22 23 55 60 50 0
                              2 3 8 2 40 80 110 0
                              3 4 15 15 25 100 101 1





                              share|improve this answer








                              New contributor




                              Sebabrata Ghosh is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                              Check out our Code of Conduct.













                              • 2




                                This seems identical to this answer
                                – user3483203
                                35 mins ago















                              up vote
                              0
                              down vote













                              df["exists"] = df.drop(["v0"], 1).isin(df["v0"]).any(1).astype(int)


                              output



                               id v0 v1 v2 v3 v4 exists
                              0 1 10 5 10 22 50 1
                              1 2 22 23 55 60 50 0
                              2 3 8 2 40 80 110 0
                              3 4 15 15 25 100 101 1





                              share|improve this answer








                              New contributor




                              Sebabrata Ghosh is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                              Check out our Code of Conduct.













                              • 2




                                This seems identical to this answer
                                – user3483203
                                35 mins ago













                              up vote
                              0
                              down vote










                              up vote
                              0
                              down vote









                              df["exists"] = df.drop(["v0"], 1).isin(df["v0"]).any(1).astype(int)


                              output



                               id v0 v1 v2 v3 v4 exists
                              0 1 10 5 10 22 50 1
                              1 2 22 23 55 60 50 0
                              2 3 8 2 40 80 110 0
                              3 4 15 15 25 100 101 1





                              share|improve this answer








                              New contributor




                              Sebabrata Ghosh is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                              Check out our Code of Conduct.









                              df["exists"] = df.drop(["v0"], 1).isin(df["v0"]).any(1).astype(int)


                              output



                               id v0 v1 v2 v3 v4 exists
                              0 1 10 5 10 22 50 1
                              1 2 22 23 55 60 50 0
                              2 3 8 2 40 80 110 0
                              3 4 15 15 25 100 101 1






                              share|improve this answer








                              New contributor




                              Sebabrata Ghosh is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                              Check out our Code of Conduct.









                              share|improve this answer



                              share|improve this answer






                              New contributor




                              Sebabrata Ghosh is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                              Check out our Code of Conduct.









                              answered 36 mins ago









                              Sebabrata Ghosh

                              1




                              1




                              New contributor




                              Sebabrata Ghosh is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                              Check out our Code of Conduct.





                              New contributor





                              Sebabrata Ghosh is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                              Check out our Code of Conduct.






                              Sebabrata Ghosh is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                              Check out our Code of Conduct.







                              • 2




                                This seems identical to this answer
                                – user3483203
                                35 mins ago













                              • 2




                                This seems identical to this answer
                                – user3483203
                                35 mins ago








                              2




                              2




                              This seems identical to this answer
                              – user3483203
                              35 mins ago





                              This seems identical to this answer
                              – user3483203
                              35 mins ago


















                               

                              draft saved


                              draft discarded















































                               


                              draft saved


                              draft discarded














                              StackExchange.ready(
                              function ()
                              StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f52393659%2fpandas-dataframe-check-if-column-value-exists-in-a-group-of-columns%23new-answer', 'question_page');

                              );

                              Post as a guest













































































                              Comments

                              Popular posts from this blog

                              What does second last employer means? [closed]

                              Installing NextGIS Connect into QGIS 3?

                              One-line joke