Pandas DataFrame check if column value exists in a group of columns

up vote
7
down vote

favorite

I have a DataFrame like this (simplified example)

id v0 v1 v2 v3 v4
1 10 5 10 22 50
2 22 23 55 60 50
3 8 2 40 80 110
4 15 15 25 100 101

And would like to create an additional column that is either 1 or 0. 1 if v0 value is in the values of v1 to v4, and 0 if it's not. So, in this example for id 1 then the value should be 1 (since v2 = 10) and for id 2 value should be 0 since 22 is not in v1 thru v4.

In reality the table is way bigger (around 100,000 rows and variables go from v1 to v99).

asked 1 hour ago

EGM8686

532

add a commentÂ |Â

up vote
7
down vote

favorite

I have a DataFrame like this (simplified example)

id v0 v1 v2 v3 v4
1 10 5 10 22 50
2 22 23 55 60 50
3 8 2 40 80 110
4 15 15 25 100 101

In reality the table is way bigger (around 100,000 rows and variables go from v1 to v99).

asked 1 hour ago

EGM8686

532

add a commentÂ |Â

up vote
7
down vote

favorite

I have a DataFrame like this (simplified example)

id v0 v1 v2 v3 v4
1 10 5 10 22 50
2 22 23 55 60 50
3 8 2 40 80 110
4 15 15 25 100 101

In reality the table is way bigger (around 100,000 rows and variables go from v1 to v99).

asked 1 hour ago

EGM8686

532

I have a DataFrame like this (simplified example)

id v0 v1 v2 v3 v4
1 10 5 10 22 50
2 22 23 55 60 50
3 8 2 40 80 110
4 15 15 25 100 101

In reality the table is way bigger (around 100,000 rows and variables go from v1 to v99).

python pandas numpy dataframe

asked 1 hour ago

EGM8686

532

asked 1 hour ago

EGM8686

532

asked 1 hour ago

EGM8686

532

asked 1 hour ago

EGM8686

532

asked 1 hour ago

EGM8686

532

add a commentÂ |Â

6 Answers
6

active

oldest

votes

up vote
3
down vote

You can use the underlying numpy arrays for performance:

Setup

a = df.v0.values
b = df.iloc[:, 2:].values

df.assign(out=(a[:, None]==b).any(1).astype(int))

 id v0 v1 v2 v3 v4 out
0 1 10 5 10 22 50 1
1 2 22 23 55 60 50 0
2 3 8 2 40 80 110 0
3 4 15 15 25 100 101 1

This solution leverages broadcasting to allow for pairwise comparison:

First, we broadcast a:

>>> a[:, None]
array([[10],
 [22],
 [ 8],
 [15]], dtype=int64)

Which allows for pairwise comparison with b:

>>> a[:, None] == b
array([[False, True, False, False],
 [False, False, False, False],
 [False, False, False, False],
 [ True, False, False, False]])

We then simply check for any True results along the first axis, and convert to integer.

Performance

Functions

def user_chris(df):
 a = df.v0.values
 b = df.iloc[:, 2:].values
 return (a[:, None]==b).any(1).astype(int)

def rahlf23(df):
 df = df.set_index('id')
 return df.drop('v0', 1).isin(df['v0']).any(1).astype(int)

def chris_a(df):
 return df.loc[:, "v1":].eq(df['v0'], 0).any(1).astype(int)

def chris(df):
 return df.apply(lambda x: int(x['v0'] in x.values[2:]), axis=1)

def anton_vbr(df):
 df.set_index('id', inplace=True)
 return df.isin(df.pop('v0')).any(1).astype(int)

Setup

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from timeit import timeit

res = pd.DataFrame(
 index=['user_chris', 'rahlf23', 'chris_a', 'chris', 'anton_vbr'],
 columns=[10, 50, 100, 500, 1000, 5000],
 dtype=float
)

for f in res.index:
 for c in res.columns:
 vals = np.random.randint(1, 100, (c, c))
 vals = np.column_stack((np.arange(vals.shape[0]), vals))
 df = pd.DataFrame(vals, columns=['id'] + [f'vi' for i in range(0, vals.shape[0])])
 stmt = '(df)'.format(f)
 setp = 'from __main__ import df, '.format(f)
 res.at[f, c] = timeit(stmt, setp, number=50)

ax = res.div(res.min()).T.plot(loglog=True)
ax.set_xlabel("N");
ax.set_ylabel("time (relative)");

plt.show()

Output

enter image description here

edited 26 mins ago

answered 1 hour ago

user3483203

24.2k62147

1

Incredible effort, +1 for sure. Sure are a lot of ChrisÃ¢Â€Â™ in here :)
â€“Â Chris A
6 mins ago

Thank you, not sure why your answer was being downvoted, it's the fastest (and clearest) pandas solution, +1
â€“Â user3483203
4 mins ago

add a commentÂ |Â

up vote
2
down vote

How about:

df['new_col'] = df.loc[:, "v1":].eq(df['v0'],0).any(1).astype(int)

[out]

 id v0 v1 v2 v3 v4 new_col
0 1 10 5 10 22 50 1
1 2 22 23 55 60 50 0
2 3 8 2 40 80 110 0
3 4 15 15 25 100 101 1

edited 1 hour ago

answered 1 hour ago

Chris A

1,07619

2

What's wrong with this answer? Maybe I'm missing something, but this is what I was planning to do.
â€“Â DSM
1 hour ago

add a commentÂ |Â

up vote
1
down vote

You can also use a lambda function:

df['newCol'] = df.apply(lambda x: int(x['v0'] in x.values[2:]), axis=1)

 id v0 v1 v2 v3 v4 newCol
0 1 10 5 10 22 50 1
1 2 22 23 55 60 50 0
2 3 8 2 40 80 110 0
3 4 15 15 25 100 101 1

answered 1 hour ago

Chris

58318

add a commentÂ |Â

up vote
1
down vote

I'm assuming here that id is set to be your dataframe index here:

df = df.set_index('id')

Then the following should work (similar answer here):

df['New'] = df.drop('v0', 1).isin(df['v0']).any(1).astype(int)

Gives:

 v0 v1 v2 v3 v4 New
id 
1 10 5 10 22 50 1
2 22 23 55 60 50 0
3 8 2 40 80 110 0
4 15 15 25 100 101 1

edited 45 mins ago

answered 1 hour ago

rahlf23

2,0701220

@Wen: really? For me pd.DataFrame("A": [1,2,3]).isin([3,2,1]) gives Trues, but pd.DataFrame("A": [1,2,3]).isin(pd.Series([3,2,1])) gives False, True, False, which makes it seem like it is respecting the index.
â€“Â DSM
52 mins ago

@DSM yes you are right :-) my leak of knowledge :-(
â€“Â Wen
50 mins ago

1

@rahlf23: because you're only dropping v0 in principle you could get yourself into trouble if an id value collides with something in the v columns.
â€“Â DSM
49 mins ago

I just noticed that as well, thank you for pointing that out. Let me account for that.
â€“Â rahlf23
49 mins ago

add a commentÂ |Â

up vote
1
down vote

Another take, most likely the smallest syntax:

df['new'] = df.isin(df.pop('v0')).any(1).astype(int)

Full proof:

import pandas as pd

data = '''
id v0 v1 v2 v3 v4
1 10 5 10 22 50
2 22 23 55 60 50
3 8 2 40 80 110
4 15 15 25 100 101'''

df = pd.read_csv(pd.compat.StringIO(data), sep='s+')
df.set_index('id', inplace=True)
df['new'] = df.isin(df.pop('v0')).any(1).astype(int)
print(df)

Returns:

 v1 v2 v3 v4 new
id 
1 5 10 22 50 1
2 23 55 60 50 0
3 2 40 80 110 0
4 15 25 100 101 1

answered 42 mins ago

Anton vBR

10.3k2922

add a commentÂ |Â

up vote
0
down vote

df["exists"] = df.drop(["v0"], 1).isin(df["v0"]).any(1).astype(int)

output

 id v0 v1 v2 v3 v4 exists
0 1 10 5 10 22 50 1
1 2 22 23 55 60 50 0
2 3 8 2 40 80 110 0
3 4 15 15 25 100 101 1

answered 36 mins ago

Sebabrata Ghosh

New contributor

2

This seems identical to this answer
â€“Â user3483203
35 mins ago

add a commentÂ |Â

Your Answer

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
convertImagesToLinks: true,
noModals: false,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f52393659%2fpandas-dataframe-check-if-column-value-exists-in-a-group-of-columns%23new-answer', 'question_page');

);

Post as a guest

Name

6 Answers
6

active

oldest

votes

6 Answers
6

active

oldest

votes

up vote
3
down vote

You can use the underlying numpy arrays for performance:

Setup

a = df.v0.values
b = df.iloc[:, 2:].values

df.assign(out=(a[:, None]==b).any(1).astype(int))

 id v0 v1 v2 v3 v4 out
0 1 10 5 10 22 50 1
1 2 22 23 55 60 50 0
2 3 8 2 40 80 110 0
3 4 15 15 25 100 101 1

This solution leverages broadcasting to allow for pairwise comparison:

First, we broadcast a:

>>> a[:, None]
array([[10],
 [22],
 [ 8],
 [15]], dtype=int64)

Which allows for pairwise comparison with b:

>>> a[:, None] == b
array([[False, True, False, False],
 [False, False, False, False],
 [False, False, False, False],
 [ True, False, False, False]])

We then simply check for any True results along the first axis, and convert to integer.

Performance

Functions

def user_chris(df):
 a = df.v0.values
 b = df.iloc[:, 2:].values
 return (a[:, None]==b).any(1).astype(int)

def rahlf23(df):
 df = df.set_index('id')
 return df.drop('v0', 1).isin(df['v0']).any(1).astype(int)

def chris_a(df):
 return df.loc[:, "v1":].eq(df['v0'], 0).any(1).astype(int)

def chris(df):
 return df.apply(lambda x: int(x['v0'] in x.values[2:]), axis=1)

def anton_vbr(df):
 df.set_index('id', inplace=True)
 return df.isin(df.pop('v0')).any(1).astype(int)

Setup

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from timeit import timeit

res = pd.DataFrame(
 index=['user_chris', 'rahlf23', 'chris_a', 'chris', 'anton_vbr'],
 columns=[10, 50, 100, 500, 1000, 5000],
 dtype=float
)

for f in res.index:
 for c in res.columns:
 vals = np.random.randint(1, 100, (c, c))
 vals = np.column_stack((np.arange(vals.shape[0]), vals))
 df = pd.DataFrame(vals, columns=['id'] + [f'vi' for i in range(0, vals.shape[0])])
 stmt = '(df)'.format(f)
 setp = 'from __main__ import df, '.format(f)
 res.at[f, c] = timeit(stmt, setp, number=50)

ax = res.div(res.min()).T.plot(loglog=True)
ax.set_xlabel("N");
ax.set_ylabel("time (relative)");

plt.show()

Output

enter image description here

edited 26 mins ago

answered 1 hour ago

user3483203

24.2k62147

1

Incredible effort, +1 for sure. Sure are a lot of ChrisÃ¢Â€Â™ in here :)
â€“Â Chris A
6 mins ago

Thank you, not sure why your answer was being downvoted, it's the fastest (and clearest) pandas solution, +1
â€“Â user3483203
4 mins ago

add a commentÂ |Â

up vote
3
down vote

You can use the underlying numpy arrays for performance:

Setup

a = df.v0.values
b = df.iloc[:, 2:].values

df.assign(out=(a[:, None]==b).any(1).astype(int))

 id v0 v1 v2 v3 v4 out
0 1 10 5 10 22 50 1
1 2 22 23 55 60 50 0
2 3 8 2 40 80 110 0
3 4 15 15 25 100 101 1

This solution leverages broadcasting to allow for pairwise comparison:

First, we broadcast a:

>>> a[:, None]
array([[10],
 [22],
 [ 8],
 [15]], dtype=int64)

Which allows for pairwise comparison with b:

>>> a[:, None] == b
array([[False, True, False, False],
 [False, False, False, False],
 [False, False, False, False],
 [ True, False, False, False]])

We then simply check for any True results along the first axis, and convert to integer.

Performance

Functions

def user_chris(df):
 a = df.v0.values
 b = df.iloc[:, 2:].values
 return (a[:, None]==b).any(1).astype(int)

def rahlf23(df):
 df = df.set_index('id')
 return df.drop('v0', 1).isin(df['v0']).any(1).astype(int)

def chris_a(df):
 return df.loc[:, "v1":].eq(df['v0'], 0).any(1).astype(int)

def chris(df):
 return df.apply(lambda x: int(x['v0'] in x.values[2:]), axis=1)

def anton_vbr(df):
 df.set_index('id', inplace=True)
 return df.isin(df.pop('v0')).any(1).astype(int)

Setup

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from timeit import timeit

res = pd.DataFrame(
 index=['user_chris', 'rahlf23', 'chris_a', 'chris', 'anton_vbr'],
 columns=[10, 50, 100, 500, 1000, 5000],
 dtype=float
)

for f in res.index:
 for c in res.columns:
 vals = np.random.randint(1, 100, (c, c))
 vals = np.column_stack((np.arange(vals.shape[0]), vals))
 df = pd.DataFrame(vals, columns=['id'] + [f'vi' for i in range(0, vals.shape[0])])
 stmt = '(df)'.format(f)
 setp = 'from __main__ import df, '.format(f)
 res.at[f, c] = timeit(stmt, setp, number=50)

ax = res.div(res.min()).T.plot(loglog=True)
ax.set_xlabel("N");
ax.set_ylabel("time (relative)");

plt.show()

Output

enter image description here

edited 26 mins ago

answered 1 hour ago

user3483203

24.2k62147

1

Incredible effort, +1 for sure. Sure are a lot of ChrisÃ¢Â€Â™ in here :)
â€“Â Chris A
6 mins ago

Thank you, not sure why your answer was being downvoted, it's the fastest (and clearest) pandas solution, +1
â€“Â user3483203
4 mins ago

add a commentÂ |Â

up vote
3
down vote

You can use the underlying numpy arrays for performance:

Setup

a = df.v0.values
b = df.iloc[:, 2:].values

df.assign(out=(a[:, None]==b).any(1).astype(int))

 id v0 v1 v2 v3 v4 out
0 1 10 5 10 22 50 1
1 2 22 23 55 60 50 0
2 3 8 2 40 80 110 0
3 4 15 15 25 100 101 1

This solution leverages broadcasting to allow for pairwise comparison:

First, we broadcast a:

>>> a[:, None]
array([[10],
 [22],
 [ 8],
 [15]], dtype=int64)

Which allows for pairwise comparison with b:

>>> a[:, None] == b
array([[False, True, False, False],
 [False, False, False, False],
 [False, False, False, False],
 [ True, False, False, False]])

We then simply check for any True results along the first axis, and convert to integer.

Performance

Functions

def user_chris(df):
 a = df.v0.values
 b = df.iloc[:, 2:].values
 return (a[:, None]==b).any(1).astype(int)

def rahlf23(df):
 df = df.set_index('id')
 return df.drop('v0', 1).isin(df['v0']).any(1).astype(int)

def chris_a(df):
 return df.loc[:, "v1":].eq(df['v0'], 0).any(1).astype(int)

def chris(df):
 return df.apply(lambda x: int(x['v0'] in x.values[2:]), axis=1)

def anton_vbr(df):
 df.set_index('id', inplace=True)
 return df.isin(df.pop('v0')).any(1).astype(int)

Setup

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from timeit import timeit

res = pd.DataFrame(
 index=['user_chris', 'rahlf23', 'chris_a', 'chris', 'anton_vbr'],
 columns=[10, 50, 100, 500, 1000, 5000],
 dtype=float
)

for f in res.index:
 for c in res.columns:
 vals = np.random.randint(1, 100, (c, c))
 vals = np.column_stack((np.arange(vals.shape[0]), vals))
 df = pd.DataFrame(vals, columns=['id'] + [f'vi' for i in range(0, vals.shape[0])])
 stmt = '(df)'.format(f)
 setp = 'from __main__ import df, '.format(f)
 res.at[f, c] = timeit(stmt, setp, number=50)

ax = res.div(res.min()).T.plot(loglog=True)
ax.set_xlabel("N");
ax.set_ylabel("time (relative)");

plt.show()

Output

enter image description here

edited 26 mins ago

answered 1 hour ago

user3483203

24.2k62147

You can use the underlying numpy arrays for performance:

Setup

a = df.v0.values
b = df.iloc[:, 2:].values

df.assign(out=(a[:, None]==b).any(1).astype(int))

 id v0 v1 v2 v3 v4 out
0 1 10 5 10 22 50 1
1 2 22 23 55 60 50 0
2 3 8 2 40 80 110 0
3 4 15 15 25 100 101 1

This solution leverages broadcasting to allow for pairwise comparison:

First, we broadcast a:

>>> a[:, None]
array([[10],
 [22],
 [ 8],
 [15]], dtype=int64)

Which allows for pairwise comparison with b:

>>> a[:, None] == b
array([[False, True, False, False],
 [False, False, False, False],
 [False, False, False, False],
 [ True, False, False, False]])

We then simply check for any True results along the first axis, and convert to integer.

Performance

Functions

def user_chris(df):
 a = df.v0.values
 b = df.iloc[:, 2:].values
 return (a[:, None]==b).any(1).astype(int)

def rahlf23(df):
 df = df.set_index('id')
 return df.drop('v0', 1).isin(df['v0']).any(1).astype(int)

def chris_a(df):
 return df.loc[:, "v1":].eq(df['v0'], 0).any(1).astype(int)

def chris(df):
 return df.apply(lambda x: int(x['v0'] in x.values[2:]), axis=1)

def anton_vbr(df):
 df.set_index('id', inplace=True)
 return df.isin(df.pop('v0')).any(1).astype(int)

Setup

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from timeit import timeit

res = pd.DataFrame(
 index=['user_chris', 'rahlf23', 'chris_a', 'chris', 'anton_vbr'],
 columns=[10, 50, 100, 500, 1000, 5000],
 dtype=float
)

for f in res.index:
 for c in res.columns:
 vals = np.random.randint(1, 100, (c, c))
 vals = np.column_stack((np.arange(vals.shape[0]), vals))
 df = pd.DataFrame(vals, columns=['id'] + [f'vi' for i in range(0, vals.shape[0])])
 stmt = '(df)'.format(f)
 setp = 'from __main__ import df, '.format(f)
 res.at[f, c] = timeit(stmt, setp, number=50)

ax = res.div(res.min()).T.plot(loglog=True)
ax.set_xlabel("N");
ax.set_ylabel("time (relative)");

plt.show()

Output

enter image description here

edited 26 mins ago

answered 1 hour ago

user3483203

24.2k62147

edited 26 mins ago

answered 1 hour ago

user3483203

24.2k62147

answered 1 hour ago

user3483203

24.2k62147

answered 1 hour ago

user3483203

24.2k62147

1

Incredible effort, +1 for sure. Sure are a lot of ChrisÃ¢Â€Â™ in here :)
â€“Â Chris A
6 mins ago

Thank you, not sure why your answer was being downvoted, it's the fastest (and clearest) pandas solution, +1
â€“Â user3483203
4 mins ago

add a commentÂ |Â

1

Incredible effort, +1 for sure. Sure are a lot of ChrisÃ¢Â€Â™ in here :)
â€“Â Chris A
6 mins ago

Thank you, not sure why your answer was being downvoted, it's the fastest (and clearest) pandas solution, +1
â€“Â user3483203
4 mins ago

Incredible effort, +1 for sure. Sure are a lot of ChrisÃ¢Â€Â™ in here :)
â€“Â Chris A
6 mins ago

Thank you, not sure why your answer was being downvoted, it's the fastest (and clearest) pandas solution, +1
â€“Â user3483203
4 mins ago

add a commentÂ |Â

up vote
2
down vote

How about:

df['new_col'] = df.loc[:, "v1":].eq(df['v0'],0).any(1).astype(int)

[out]

 id v0 v1 v2 v3 v4 new_col
0 1 10 5 10 22 50 1
1 2 22 23 55 60 50 0
2 3 8 2 40 80 110 0
3 4 15 15 25 100 101 1

edited 1 hour ago

answered 1 hour ago

Chris A

1,07619

2

What's wrong with this answer? Maybe I'm missing something, but this is what I was planning to do.
â€“Â DSM
1 hour ago

add a commentÂ |Â

up vote
2
down vote

How about:

df['new_col'] = df.loc[:, "v1":].eq(df['v0'],0).any(1).astype(int)

[out]

 id v0 v1 v2 v3 v4 new_col
0 1 10 5 10 22 50 1
1 2 22 23 55 60 50 0
2 3 8 2 40 80 110 0
3 4 15 15 25 100 101 1

edited 1 hour ago

answered 1 hour ago

Chris A

1,07619

2

What's wrong with this answer? Maybe I'm missing something, but this is what I was planning to do.
â€“Â DSM
1 hour ago

add a commentÂ |Â

up vote
2
down vote

How about:

df['new_col'] = df.loc[:, "v1":].eq(df['v0'],0).any(1).astype(int)

[out]

 id v0 v1 v2 v3 v4 new_col
0 1 10 5 10 22 50 1
1 2 22 23 55 60 50 0
2 3 8 2 40 80 110 0
3 4 15 15 25 100 101 1

edited 1 hour ago

answered 1 hour ago

Chris A

1,07619

How about:

df['new_col'] = df.loc[:, "v1":].eq(df['v0'],0).any(1).astype(int)

[out]

 id v0 v1 v2 v3 v4 new_col
0 1 10 5 10 22 50 1
1 2 22 23 55 60 50 0
2 3 8 2 40 80 110 0
3 4 15 15 25 100 101 1

edited 1 hour ago

answered 1 hour ago

Chris A

1,07619

edited 1 hour ago

answered 1 hour ago

Chris A

1,07619

answered 1 hour ago

Chris A

1,07619

answered 1 hour ago

Chris A

1,07619

2

What's wrong with this answer? Maybe I'm missing something, but this is what I was planning to do.
â€“Â DSM
1 hour ago

add a commentÂ |Â

2

What's wrong with this answer? Maybe I'm missing something, but this is what I was planning to do.
â€“Â DSM
1 hour ago

What's wrong with this answer? Maybe I'm missing something, but this is what I was planning to do.
â€“Â DSM
1 hour ago

add a commentÂ |Â

up vote
1
down vote

You can also use a lambda function:

df['newCol'] = df.apply(lambda x: int(x['v0'] in x.values[2:]), axis=1)

 id v0 v1 v2 v3 v4 newCol
0 1 10 5 10 22 50 1
1 2 22 23 55 60 50 0
2 3 8 2 40 80 110 0
3 4 15 15 25 100 101 1

answered 1 hour ago

Chris

58318

add a commentÂ |Â

up vote
1
down vote

You can also use a lambda function:

df['newCol'] = df.apply(lambda x: int(x['v0'] in x.values[2:]), axis=1)

 id v0 v1 v2 v3 v4 newCol
0 1 10 5 10 22 50 1
1 2 22 23 55 60 50 0
2 3 8 2 40 80 110 0
3 4 15 15 25 100 101 1

answered 1 hour ago

Chris

58318

add a commentÂ |Â

up vote
1
down vote

You can also use a lambda function:

df['newCol'] = df.apply(lambda x: int(x['v0'] in x.values[2:]), axis=1)

 id v0 v1 v2 v3 v4 newCol
0 1 10 5 10 22 50 1
1 2 22 23 55 60 50 0
2 3 8 2 40 80 110 0
3 4 15 15 25 100 101 1

answered 1 hour ago

Chris

58318

You can also use a lambda function:

df['newCol'] = df.apply(lambda x: int(x['v0'] in x.values[2:]), axis=1)

 id v0 v1 v2 v3 v4 newCol
0 1 10 5 10 22 50 1
1 2 22 23 55 60 50 0
2 3 8 2 40 80 110 0
3 4 15 15 25 100 101 1

answered 1 hour ago

Chris

58318

answered 1 hour ago

Chris

58318

answered 1 hour ago

Chris

58318

answered 1 hour ago

Chris

58318

add a commentÂ |Â

up vote
1
down vote

I'm assuming here that id is set to be your dataframe index here:

df = df.set_index('id')

Then the following should work (similar answer here):

df['New'] = df.drop('v0', 1).isin(df['v0']).any(1).astype(int)

Gives:

 v0 v1 v2 v3 v4 New
id 
1 10 5 10 22 50 1
2 22 23 55 60 50 0
3 8 2 40 80 110 0
4 15 15 25 100 101 1

edited 45 mins ago

answered 1 hour ago

rahlf23

2,0701220

@Wen: really? For me pd.DataFrame("A": [1,2,3]).isin([3,2,1]) gives Trues, but pd.DataFrame("A": [1,2,3]).isin(pd.Series([3,2,1])) gives False, True, False, which makes it seem like it is respecting the index.
â€“Â DSM
52 mins ago

@DSM yes you are right :-) my leak of knowledge :-(
â€“Â Wen
50 mins ago

1

@rahlf23: because you're only dropping v0 in principle you could get yourself into trouble if an id value collides with something in the v columns.
â€“Â DSM
49 mins ago

I just noticed that as well, thank you for pointing that out. Let me account for that.
â€“Â rahlf23
49 mins ago

add a commentÂ |Â

up vote
1
down vote

I'm assuming here that id is set to be your dataframe index here:

df = df.set_index('id')

Then the following should work (similar answer here):

df['New'] = df.drop('v0', 1).isin(df['v0']).any(1).astype(int)

Gives:

 v0 v1 v2 v3 v4 New
id 
1 10 5 10 22 50 1
2 22 23 55 60 50 0
3 8 2 40 80 110 0
4 15 15 25 100 101 1

edited 45 mins ago

answered 1 hour ago

rahlf23

2,0701220

@Wen: really? For me pd.DataFrame("A": [1,2,3]).isin([3,2,1]) gives Trues, but pd.DataFrame("A": [1,2,3]).isin(pd.Series([3,2,1])) gives False, True, False, which makes it seem like it is respecting the index.
â€“Â DSM
52 mins ago

@DSM yes you are right :-) my leak of knowledge :-(
â€“Â Wen
50 mins ago

1

@rahlf23: because you're only dropping v0 in principle you could get yourself into trouble if an id value collides with something in the v columns.
â€“Â DSM
49 mins ago

I just noticed that as well, thank you for pointing that out. Let me account for that.
â€“Â rahlf23
49 mins ago

add a commentÂ |Â

up vote
1
down vote

I'm assuming here that id is set to be your dataframe index here:

df = df.set_index('id')

Then the following should work (similar answer here):

df['New'] = df.drop('v0', 1).isin(df['v0']).any(1).astype(int)

Gives:

 v0 v1 v2 v3 v4 New
id 
1 10 5 10 22 50 1
2 22 23 55 60 50 0
3 8 2 40 80 110 0
4 15 15 25 100 101 1

edited 45 mins ago

answered 1 hour ago

rahlf23

2,0701220

I'm assuming here that id is set to be your dataframe index here:

df = df.set_index('id')

Then the following should work (similar answer here):

df['New'] = df.drop('v0', 1).isin(df['v0']).any(1).astype(int)

Gives:

 v0 v1 v2 v3 v4 New
id 
1 10 5 10 22 50 1
2 22 23 55 60 50 0
3 8 2 40 80 110 0
4 15 15 25 100 101 1

edited 45 mins ago

answered 1 hour ago

rahlf23

2,0701220

edited 45 mins ago

answered 1 hour ago

rahlf23

2,0701220

answered 1 hour ago

rahlf23

2,0701220

answered 1 hour ago

rahlf23

2,0701220

@Wen: really? For me pd.DataFrame("A": [1,2,3]).isin([3,2,1]) gives Trues, but pd.DataFrame("A": [1,2,3]).isin(pd.Series([3,2,1])) gives False, True, False, which makes it seem like it is respecting the index.
â€“Â DSM
52 mins ago

@DSM yes you are right :-) my leak of knowledge :-(
â€“Â Wen
50 mins ago

1

@rahlf23: because you're only dropping v0 in principle you could get yourself into trouble if an id value collides with something in the v columns.
â€“Â DSM
49 mins ago

I just noticed that as well, thank you for pointing that out. Let me account for that.
â€“Â rahlf23
49 mins ago

add a commentÂ |Â

@Wen: really? For me pd.DataFrame("A": [1,2,3]).isin([3,2,1]) gives Trues, but pd.DataFrame("A": [1,2,3]).isin(pd.Series([3,2,1])) gives False, True, False, which makes it seem like it is respecting the index.
â€“Â DSM
52 mins ago

@DSM yes you are right :-) my leak of knowledge :-(
â€“Â Wen
50 mins ago

1

@rahlf23: because you're only dropping v0 in principle you could get yourself into trouble if an id value collides with something in the v columns.
â€“Â DSM
49 mins ago

I just noticed that as well, thank you for pointing that out. Let me account for that.
â€“Â rahlf23
49 mins ago

@Wen: really? For me pd.DataFrame("A": [1,2,3]).isin([3,2,1]) gives Trues, but pd.DataFrame("A": [1,2,3]).isin(pd.Series([3,2,1])) gives False, True, False, which makes it seem like it is respecting the index.
â€“Â DSM
52 mins ago

@DSM yes you are right :-) my leak of knowledge :-(
â€“Â Wen
50 mins ago

@rahlf23: because you're only dropping v0 in principle you could get yourself into trouble if an id value collides with something in the v columns.
â€“Â DSM
49 mins ago

I just noticed that as well, thank you for pointing that out. Let me account for that.
â€“Â rahlf23
49 mins ago

add a commentÂ |Â

up vote
1
down vote

Another take, most likely the smallest syntax:

df['new'] = df.isin(df.pop('v0')).any(1).astype(int)

Full proof:

import pandas as pd

data = '''
id v0 v1 v2 v3 v4
1 10 5 10 22 50
2 22 23 55 60 50
3 8 2 40 80 110
4 15 15 25 100 101'''

df = pd.read_csv(pd.compat.StringIO(data), sep='s+')
df.set_index('id', inplace=True)
df['new'] = df.isin(df.pop('v0')).any(1).astype(int)
print(df)

Returns:

 v1 v2 v3 v4 new
id 
1 5 10 22 50 1
2 23 55 60 50 0
3 2 40 80 110 0
4 15 25 100 101 1

answered 42 mins ago

Anton vBR

10.3k2922

add a commentÂ |Â

up vote
1
down vote

Another take, most likely the smallest syntax:

df['new'] = df.isin(df.pop('v0')).any(1).astype(int)

Full proof:

import pandas as pd

data = '''
id v0 v1 v2 v3 v4
1 10 5 10 22 50
2 22 23 55 60 50
3 8 2 40 80 110
4 15 15 25 100 101'''

df = pd.read_csv(pd.compat.StringIO(data), sep='s+')
df.set_index('id', inplace=True)
df['new'] = df.isin(df.pop('v0')).any(1).astype(int)
print(df)

Returns:

 v1 v2 v3 v4 new
id 
1 5 10 22 50 1
2 23 55 60 50 0
3 2 40 80 110 0
4 15 25 100 101 1

answered 42 mins ago

Anton vBR

10.3k2922

add a commentÂ |Â

up vote
1
down vote

Another take, most likely the smallest syntax:

df['new'] = df.isin(df.pop('v0')).any(1).astype(int)

Full proof:

import pandas as pd

data = '''
id v0 v1 v2 v3 v4
1 10 5 10 22 50
2 22 23 55 60 50
3 8 2 40 80 110
4 15 15 25 100 101'''

df = pd.read_csv(pd.compat.StringIO(data), sep='s+')
df.set_index('id', inplace=True)
df['new'] = df.isin(df.pop('v0')).any(1).astype(int)
print(df)

Returns:

 v1 v2 v3 v4 new
id 
1 5 10 22 50 1
2 23 55 60 50 0
3 2 40 80 110 0
4 15 25 100 101 1

answered 42 mins ago

Anton vBR

10.3k2922

Another take, most likely the smallest syntax:

df['new'] = df.isin(df.pop('v0')).any(1).astype(int)

Full proof:

import pandas as pd

data = '''
id v0 v1 v2 v3 v4
1 10 5 10 22 50
2 22 23 55 60 50
3 8 2 40 80 110
4 15 15 25 100 101'''

df = pd.read_csv(pd.compat.StringIO(data), sep='s+')
df.set_index('id', inplace=True)
df['new'] = df.isin(df.pop('v0')).any(1).astype(int)
print(df)

Returns:

 v1 v2 v3 v4 new
id 
1 5 10 22 50 1
2 23 55 60 50 0
3 2 40 80 110 0
4 15 25 100 101 1

answered 42 mins ago

Anton vBR

10.3k2922

answered 42 mins ago

Anton vBR

10.3k2922

answered 42 mins ago

Anton vBR

10.3k2922

answered 42 mins ago

Anton vBR

10.3k2922

add a commentÂ |Â

up vote
0
down vote

df["exists"] = df.drop(["v0"], 1).isin(df["v0"]).any(1).astype(int)

output

 id v0 v1 v2 v3 v4 exists
0 1 10 5 10 22 50 1
1 2 22 23 55 60 50 0
2 3 8 2 40 80 110 0
3 4 15 15 25 100 101 1

answered 36 mins ago

Sebabrata Ghosh

New contributor

2

This seems identical to this answer
â€“Â user3483203
35 mins ago

add a commentÂ |Â

up vote
0
down vote

df["exists"] = df.drop(["v0"], 1).isin(df["v0"]).any(1).astype(int)

output

 id v0 v1 v2 v3 v4 exists
0 1 10 5 10 22 50 1
1 2 22 23 55 60 50 0
2 3 8 2 40 80 110 0
3 4 15 15 25 100 101 1

answered 36 mins ago

Sebabrata Ghosh

New contributor

2

This seems identical to this answer
â€“Â user3483203
35 mins ago

add a commentÂ |Â

up vote
0
down vote

df["exists"] = df.drop(["v0"], 1).isin(df["v0"]).any(1).astype(int)

output

 id v0 v1 v2 v3 v4 exists
0 1 10 5 10 22 50 1
1 2 22 23 55 60 50 0
2 3 8 2 40 80 110 0
3 4 15 15 25 100 101 1

answered 36 mins ago

Sebabrata Ghosh

New contributor

df["exists"] = df.drop(["v0"], 1).isin(df["v0"]).any(1).astype(int)

output

 id v0 v1 v2 v3 v4 exists
0 1 10 5 10 22 50 1
1 2 22 23 55 60 50 0
2 3 8 2 40 80 110 0
3 4 15 15 25 100 101 1

answered 36 mins ago

Sebabrata Ghosh

New contributor

answered 36 mins ago

Sebabrata Ghosh

New contributor

answered 36 mins ago

Sebabrata Ghosh

answered 36 mins ago

Sebabrata Ghosh

New contributor

Sebabrata Ghosh is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.

2

This seems identical to this answer
â€“Â user3483203
35 mins ago

add a commentÂ |Â

2

This seems identical to this answer
â€“Â user3483203
35 mins ago

This seems identical to this answer
â€“Â user3483203
35 mins ago

add a commentÂ |Â

draft saved

draft discarded

draft saved

draft discarded

Post as a guest

Name

Search This Blog

Iyfjky