Pandas DataFrame check if column value exists in a group of columns
Clash Royale CLAN TAG#URR8PPP
up vote
7
down vote
favorite
I have a DataFrame like this (simplified example)
id v0 v1 v2 v3 v4
1 10 5 10 22 50
2 22 23 55 60 50
3 8 2 40 80 110
4 15 15 25 100 101
And would like to create an additional column that is either 1 or 0. 1 if v0 value is in the values of v1 to v4, and 0 if it's not. So, in this example for id 1 then the value should be 1 (since v2 = 10) and for id 2 value should be 0 since 22 is not in v1 thru v4.
In reality the table is way bigger (around 100,000 rows and variables go from v1 to v99).
python pandas numpy dataframe
add a comment |Â
up vote
7
down vote
favorite
I have a DataFrame like this (simplified example)
id v0 v1 v2 v3 v4
1 10 5 10 22 50
2 22 23 55 60 50
3 8 2 40 80 110
4 15 15 25 100 101
And would like to create an additional column that is either 1 or 0. 1 if v0 value is in the values of v1 to v4, and 0 if it's not. So, in this example for id 1 then the value should be 1 (since v2 = 10) and for id 2 value should be 0 since 22 is not in v1 thru v4.
In reality the table is way bigger (around 100,000 rows and variables go from v1 to v99).
python pandas numpy dataframe
add a comment |Â
up vote
7
down vote
favorite
up vote
7
down vote
favorite
I have a DataFrame like this (simplified example)
id v0 v1 v2 v3 v4
1 10 5 10 22 50
2 22 23 55 60 50
3 8 2 40 80 110
4 15 15 25 100 101
And would like to create an additional column that is either 1 or 0. 1 if v0 value is in the values of v1 to v4, and 0 if it's not. So, in this example for id 1 then the value should be 1 (since v2 = 10) and for id 2 value should be 0 since 22 is not in v1 thru v4.
In reality the table is way bigger (around 100,000 rows and variables go from v1 to v99).
python pandas numpy dataframe
I have a DataFrame like this (simplified example)
id v0 v1 v2 v3 v4
1 10 5 10 22 50
2 22 23 55 60 50
3 8 2 40 80 110
4 15 15 25 100 101
And would like to create an additional column that is either 1 or 0. 1 if v0 value is in the values of v1 to v4, and 0 if it's not. So, in this example for id 1 then the value should be 1 (since v2 = 10) and for id 2 value should be 0 since 22 is not in v1 thru v4.
In reality the table is way bigger (around 100,000 rows and variables go from v1 to v99).
python pandas numpy dataframe
python pandas numpy dataframe
asked 1 hour ago
EGM8686
532
532
add a comment |Â
add a comment |Â
6 Answers
6
active
oldest
votes
up vote
3
down vote
You can use the underlying numpy
arrays for performance:
Setup
a = df.v0.values
b = df.iloc[:, 2:].values
df.assign(out=(a[:, None]==b).any(1).astype(int))
id v0 v1 v2 v3 v4 out
0 1 10 5 10 22 50 1
1 2 22 23 55 60 50 0
2 3 8 2 40 80 110 0
3 4 15 15 25 100 101 1
This solution leverages broadcasting to allow for pairwise comparison:
First, we broadcast a
:
>>> a[:, None]
array([[10],
[22],
[ 8],
[15]], dtype=int64)
Which allows for pairwise comparison with b
:
>>> a[:, None] == b
array([[False, True, False, False],
[False, False, False, False],
[False, False, False, False],
[ True, False, False, False]])
We then simply check for any True
results along the first axis, and convert to integer.
Performance
Functions
def user_chris(df):
a = df.v0.values
b = df.iloc[:, 2:].values
return (a[:, None]==b).any(1).astype(int)
def rahlf23(df):
df = df.set_index('id')
return df.drop('v0', 1).isin(df['v0']).any(1).astype(int)
def chris_a(df):
return df.loc[:, "v1":].eq(df['v0'], 0).any(1).astype(int)
def chris(df):
return df.apply(lambda x: int(x['v0'] in x.values[2:]), axis=1)
def anton_vbr(df):
df.set_index('id', inplace=True)
return df.isin(df.pop('v0')).any(1).astype(int)
Setup
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from timeit import timeit
res = pd.DataFrame(
index=['user_chris', 'rahlf23', 'chris_a', 'chris', 'anton_vbr'],
columns=[10, 50, 100, 500, 1000, 5000],
dtype=float
)
for f in res.index:
for c in res.columns:
vals = np.random.randint(1, 100, (c, c))
vals = np.column_stack((np.arange(vals.shape[0]), vals))
df = pd.DataFrame(vals, columns=['id'] + [f'vi' for i in range(0, vals.shape[0])])
stmt = '(df)'.format(f)
setp = 'from __main__ import df, '.format(f)
res.at[f, c] = timeit(stmt, setp, number=50)
ax = res.div(res.min()).T.plot(loglog=True)
ax.set_xlabel("N");
ax.set_ylabel("time (relative)");
plt.show()
Output
1
Incredible effort, +1 for sure. Sure are a lot of Chrisâ in here :)
â Chris A
6 mins ago
Thank you, not sure why your answer was being downvoted, it's the fastest (and clearest) pandas solution, +1
â user3483203
4 mins ago
add a comment |Â
up vote
2
down vote
How about:
df['new_col'] = df.loc[:, "v1":].eq(df['v0'],0).any(1).astype(int)
[out]
id v0 v1 v2 v3 v4 new_col
0 1 10 5 10 22 50 1
1 2 22 23 55 60 50 0
2 3 8 2 40 80 110 0
3 4 15 15 25 100 101 1
2
What's wrong with this answer? Maybe I'm missing something, but this is what I was planning to do.
â DSM
1 hour ago
add a comment |Â
up vote
1
down vote
You can also use a lambda function:
df['newCol'] = df.apply(lambda x: int(x['v0'] in x.values[2:]), axis=1)
id v0 v1 v2 v3 v4 newCol
0 1 10 5 10 22 50 1
1 2 22 23 55 60 50 0
2 3 8 2 40 80 110 0
3 4 15 15 25 100 101 1
add a comment |Â
up vote
1
down vote
I'm assuming here that id
is set to be your dataframe index here:
df = df.set_index('id')
Then the following should work (similar answer here):
df['New'] = df.drop('v0', 1).isin(df['v0']).any(1).astype(int)
Gives:
v0 v1 v2 v3 v4 New
id
1 10 5 10 22 50 1
2 22 23 55 60 50 0
3 8 2 40 80 110 0
4 15 15 25 100 101 1
@Wen: really? For mepd.DataFrame("A": [1,2,3]).isin([3,2,1])
gives Trues, butpd.DataFrame("A": [1,2,3]).isin(pd.Series([3,2,1]))
gives False, True, False, which makes it seem like it is respecting the index.
â DSM
52 mins ago
@DSM yes you are right :-) my leak of knowledge :-(
â Wen
50 mins ago
1
@rahlf23: because you're only droppingv0
in principle you could get yourself into trouble if an id value collides with something in the v columns.
â DSM
49 mins ago
I just noticed that as well, thank you for pointing that out. Let me account for that.
â rahlf23
49 mins ago
add a comment |Â
up vote
1
down vote
Another take, most likely the smallest syntax:
df['new'] = df.isin(df.pop('v0')).any(1).astype(int)
Full proof:
import pandas as pd
data = '''
id v0 v1 v2 v3 v4
1 10 5 10 22 50
2 22 23 55 60 50
3 8 2 40 80 110
4 15 15 25 100 101'''
df = pd.read_csv(pd.compat.StringIO(data), sep='s+')
df.set_index('id', inplace=True)
df['new'] = df.isin(df.pop('v0')).any(1).astype(int)
print(df)
Returns:
v1 v2 v3 v4 new
id
1 5 10 22 50 1
2 23 55 60 50 0
3 2 40 80 110 0
4 15 25 100 101 1
add a comment |Â
up vote
0
down vote
df["exists"] = df.drop(["v0"], 1).isin(df["v0"]).any(1).astype(int)
output
id v0 v1 v2 v3 v4 exists
0 1 10 5 10 22 50 1
1 2 22 23 55 60 50 0
2 3 8 2 40 80 110 0
3 4 15 15 25 100 101 1
New contributor
2
This seems identical to this answer
â user3483203
35 mins ago
add a comment |Â
6 Answers
6
active
oldest
votes
6 Answers
6
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
3
down vote
You can use the underlying numpy
arrays for performance:
Setup
a = df.v0.values
b = df.iloc[:, 2:].values
df.assign(out=(a[:, None]==b).any(1).astype(int))
id v0 v1 v2 v3 v4 out
0 1 10 5 10 22 50 1
1 2 22 23 55 60 50 0
2 3 8 2 40 80 110 0
3 4 15 15 25 100 101 1
This solution leverages broadcasting to allow for pairwise comparison:
First, we broadcast a
:
>>> a[:, None]
array([[10],
[22],
[ 8],
[15]], dtype=int64)
Which allows for pairwise comparison with b
:
>>> a[:, None] == b
array([[False, True, False, False],
[False, False, False, False],
[False, False, False, False],
[ True, False, False, False]])
We then simply check for any True
results along the first axis, and convert to integer.
Performance
Functions
def user_chris(df):
a = df.v0.values
b = df.iloc[:, 2:].values
return (a[:, None]==b).any(1).astype(int)
def rahlf23(df):
df = df.set_index('id')
return df.drop('v0', 1).isin(df['v0']).any(1).astype(int)
def chris_a(df):
return df.loc[:, "v1":].eq(df['v0'], 0).any(1).astype(int)
def chris(df):
return df.apply(lambda x: int(x['v0'] in x.values[2:]), axis=1)
def anton_vbr(df):
df.set_index('id', inplace=True)
return df.isin(df.pop('v0')).any(1).astype(int)
Setup
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from timeit import timeit
res = pd.DataFrame(
index=['user_chris', 'rahlf23', 'chris_a', 'chris', 'anton_vbr'],
columns=[10, 50, 100, 500, 1000, 5000],
dtype=float
)
for f in res.index:
for c in res.columns:
vals = np.random.randint(1, 100, (c, c))
vals = np.column_stack((np.arange(vals.shape[0]), vals))
df = pd.DataFrame(vals, columns=['id'] + [f'vi' for i in range(0, vals.shape[0])])
stmt = '(df)'.format(f)
setp = 'from __main__ import df, '.format(f)
res.at[f, c] = timeit(stmt, setp, number=50)
ax = res.div(res.min()).T.plot(loglog=True)
ax.set_xlabel("N");
ax.set_ylabel("time (relative)");
plt.show()
Output
1
Incredible effort, +1 for sure. Sure are a lot of Chrisâ in here :)
â Chris A
6 mins ago
Thank you, not sure why your answer was being downvoted, it's the fastest (and clearest) pandas solution, +1
â user3483203
4 mins ago
add a comment |Â
up vote
3
down vote
You can use the underlying numpy
arrays for performance:
Setup
a = df.v0.values
b = df.iloc[:, 2:].values
df.assign(out=(a[:, None]==b).any(1).astype(int))
id v0 v1 v2 v3 v4 out
0 1 10 5 10 22 50 1
1 2 22 23 55 60 50 0
2 3 8 2 40 80 110 0
3 4 15 15 25 100 101 1
This solution leverages broadcasting to allow for pairwise comparison:
First, we broadcast a
:
>>> a[:, None]
array([[10],
[22],
[ 8],
[15]], dtype=int64)
Which allows for pairwise comparison with b
:
>>> a[:, None] == b
array([[False, True, False, False],
[False, False, False, False],
[False, False, False, False],
[ True, False, False, False]])
We then simply check for any True
results along the first axis, and convert to integer.
Performance
Functions
def user_chris(df):
a = df.v0.values
b = df.iloc[:, 2:].values
return (a[:, None]==b).any(1).astype(int)
def rahlf23(df):
df = df.set_index('id')
return df.drop('v0', 1).isin(df['v0']).any(1).astype(int)
def chris_a(df):
return df.loc[:, "v1":].eq(df['v0'], 0).any(1).astype(int)
def chris(df):
return df.apply(lambda x: int(x['v0'] in x.values[2:]), axis=1)
def anton_vbr(df):
df.set_index('id', inplace=True)
return df.isin(df.pop('v0')).any(1).astype(int)
Setup
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from timeit import timeit
res = pd.DataFrame(
index=['user_chris', 'rahlf23', 'chris_a', 'chris', 'anton_vbr'],
columns=[10, 50, 100, 500, 1000, 5000],
dtype=float
)
for f in res.index:
for c in res.columns:
vals = np.random.randint(1, 100, (c, c))
vals = np.column_stack((np.arange(vals.shape[0]), vals))
df = pd.DataFrame(vals, columns=['id'] + [f'vi' for i in range(0, vals.shape[0])])
stmt = '(df)'.format(f)
setp = 'from __main__ import df, '.format(f)
res.at[f, c] = timeit(stmt, setp, number=50)
ax = res.div(res.min()).T.plot(loglog=True)
ax.set_xlabel("N");
ax.set_ylabel("time (relative)");
plt.show()
Output
1
Incredible effort, +1 for sure. Sure are a lot of Chrisâ in here :)
â Chris A
6 mins ago
Thank you, not sure why your answer was being downvoted, it's the fastest (and clearest) pandas solution, +1
â user3483203
4 mins ago
add a comment |Â
up vote
3
down vote
up vote
3
down vote
You can use the underlying numpy
arrays for performance:
Setup
a = df.v0.values
b = df.iloc[:, 2:].values
df.assign(out=(a[:, None]==b).any(1).astype(int))
id v0 v1 v2 v3 v4 out
0 1 10 5 10 22 50 1
1 2 22 23 55 60 50 0
2 3 8 2 40 80 110 0
3 4 15 15 25 100 101 1
This solution leverages broadcasting to allow for pairwise comparison:
First, we broadcast a
:
>>> a[:, None]
array([[10],
[22],
[ 8],
[15]], dtype=int64)
Which allows for pairwise comparison with b
:
>>> a[:, None] == b
array([[False, True, False, False],
[False, False, False, False],
[False, False, False, False],
[ True, False, False, False]])
We then simply check for any True
results along the first axis, and convert to integer.
Performance
Functions
def user_chris(df):
a = df.v0.values
b = df.iloc[:, 2:].values
return (a[:, None]==b).any(1).astype(int)
def rahlf23(df):
df = df.set_index('id')
return df.drop('v0', 1).isin(df['v0']).any(1).astype(int)
def chris_a(df):
return df.loc[:, "v1":].eq(df['v0'], 0).any(1).astype(int)
def chris(df):
return df.apply(lambda x: int(x['v0'] in x.values[2:]), axis=1)
def anton_vbr(df):
df.set_index('id', inplace=True)
return df.isin(df.pop('v0')).any(1).astype(int)
Setup
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from timeit import timeit
res = pd.DataFrame(
index=['user_chris', 'rahlf23', 'chris_a', 'chris', 'anton_vbr'],
columns=[10, 50, 100, 500, 1000, 5000],
dtype=float
)
for f in res.index:
for c in res.columns:
vals = np.random.randint(1, 100, (c, c))
vals = np.column_stack((np.arange(vals.shape[0]), vals))
df = pd.DataFrame(vals, columns=['id'] + [f'vi' for i in range(0, vals.shape[0])])
stmt = '(df)'.format(f)
setp = 'from __main__ import df, '.format(f)
res.at[f, c] = timeit(stmt, setp, number=50)
ax = res.div(res.min()).T.plot(loglog=True)
ax.set_xlabel("N");
ax.set_ylabel("time (relative)");
plt.show()
Output
You can use the underlying numpy
arrays for performance:
Setup
a = df.v0.values
b = df.iloc[:, 2:].values
df.assign(out=(a[:, None]==b).any(1).astype(int))
id v0 v1 v2 v3 v4 out
0 1 10 5 10 22 50 1
1 2 22 23 55 60 50 0
2 3 8 2 40 80 110 0
3 4 15 15 25 100 101 1
This solution leverages broadcasting to allow for pairwise comparison:
First, we broadcast a
:
>>> a[:, None]
array([[10],
[22],
[ 8],
[15]], dtype=int64)
Which allows for pairwise comparison with b
:
>>> a[:, None] == b
array([[False, True, False, False],
[False, False, False, False],
[False, False, False, False],
[ True, False, False, False]])
We then simply check for any True
results along the first axis, and convert to integer.
Performance
Functions
def user_chris(df):
a = df.v0.values
b = df.iloc[:, 2:].values
return (a[:, None]==b).any(1).astype(int)
def rahlf23(df):
df = df.set_index('id')
return df.drop('v0', 1).isin(df['v0']).any(1).astype(int)
def chris_a(df):
return df.loc[:, "v1":].eq(df['v0'], 0).any(1).astype(int)
def chris(df):
return df.apply(lambda x: int(x['v0'] in x.values[2:]), axis=1)
def anton_vbr(df):
df.set_index('id', inplace=True)
return df.isin(df.pop('v0')).any(1).astype(int)
Setup
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from timeit import timeit
res = pd.DataFrame(
index=['user_chris', 'rahlf23', 'chris_a', 'chris', 'anton_vbr'],
columns=[10, 50, 100, 500, 1000, 5000],
dtype=float
)
for f in res.index:
for c in res.columns:
vals = np.random.randint(1, 100, (c, c))
vals = np.column_stack((np.arange(vals.shape[0]), vals))
df = pd.DataFrame(vals, columns=['id'] + [f'vi' for i in range(0, vals.shape[0])])
stmt = '(df)'.format(f)
setp = 'from __main__ import df, '.format(f)
res.at[f, c] = timeit(stmt, setp, number=50)
ax = res.div(res.min()).T.plot(loglog=True)
ax.set_xlabel("N");
ax.set_ylabel("time (relative)");
plt.show()
Output
edited 26 mins ago
answered 1 hour ago
user3483203
24.2k62147
24.2k62147
1
Incredible effort, +1 for sure. Sure are a lot of Chrisâ in here :)
â Chris A
6 mins ago
Thank you, not sure why your answer was being downvoted, it's the fastest (and clearest) pandas solution, +1
â user3483203
4 mins ago
add a comment |Â
1
Incredible effort, +1 for sure. Sure are a lot of Chrisâ in here :)
â Chris A
6 mins ago
Thank you, not sure why your answer was being downvoted, it's the fastest (and clearest) pandas solution, +1
â user3483203
4 mins ago
1
1
Incredible effort, +1 for sure. Sure are a lot of Chrisâ in here :)
â Chris A
6 mins ago
Incredible effort, +1 for sure. Sure are a lot of Chrisâ in here :)
â Chris A
6 mins ago
Thank you, not sure why your answer was being downvoted, it's the fastest (and clearest) pandas solution, +1
â user3483203
4 mins ago
Thank you, not sure why your answer was being downvoted, it's the fastest (and clearest) pandas solution, +1
â user3483203
4 mins ago
add a comment |Â
up vote
2
down vote
How about:
df['new_col'] = df.loc[:, "v1":].eq(df['v0'],0).any(1).astype(int)
[out]
id v0 v1 v2 v3 v4 new_col
0 1 10 5 10 22 50 1
1 2 22 23 55 60 50 0
2 3 8 2 40 80 110 0
3 4 15 15 25 100 101 1
2
What's wrong with this answer? Maybe I'm missing something, but this is what I was planning to do.
â DSM
1 hour ago
add a comment |Â
up vote
2
down vote
How about:
df['new_col'] = df.loc[:, "v1":].eq(df['v0'],0).any(1).astype(int)
[out]
id v0 v1 v2 v3 v4 new_col
0 1 10 5 10 22 50 1
1 2 22 23 55 60 50 0
2 3 8 2 40 80 110 0
3 4 15 15 25 100 101 1
2
What's wrong with this answer? Maybe I'm missing something, but this is what I was planning to do.
â DSM
1 hour ago
add a comment |Â
up vote
2
down vote
up vote
2
down vote
How about:
df['new_col'] = df.loc[:, "v1":].eq(df['v0'],0).any(1).astype(int)
[out]
id v0 v1 v2 v3 v4 new_col
0 1 10 5 10 22 50 1
1 2 22 23 55 60 50 0
2 3 8 2 40 80 110 0
3 4 15 15 25 100 101 1
How about:
df['new_col'] = df.loc[:, "v1":].eq(df['v0'],0).any(1).astype(int)
[out]
id v0 v1 v2 v3 v4 new_col
0 1 10 5 10 22 50 1
1 2 22 23 55 60 50 0
2 3 8 2 40 80 110 0
3 4 15 15 25 100 101 1
edited 1 hour ago
answered 1 hour ago
Chris A
1,07619
1,07619
2
What's wrong with this answer? Maybe I'm missing something, but this is what I was planning to do.
â DSM
1 hour ago
add a comment |Â
2
What's wrong with this answer? Maybe I'm missing something, but this is what I was planning to do.
â DSM
1 hour ago
2
2
What's wrong with this answer? Maybe I'm missing something, but this is what I was planning to do.
â DSM
1 hour ago
What's wrong with this answer? Maybe I'm missing something, but this is what I was planning to do.
â DSM
1 hour ago
add a comment |Â
up vote
1
down vote
You can also use a lambda function:
df['newCol'] = df.apply(lambda x: int(x['v0'] in x.values[2:]), axis=1)
id v0 v1 v2 v3 v4 newCol
0 1 10 5 10 22 50 1
1 2 22 23 55 60 50 0
2 3 8 2 40 80 110 0
3 4 15 15 25 100 101 1
add a comment |Â
up vote
1
down vote
You can also use a lambda function:
df['newCol'] = df.apply(lambda x: int(x['v0'] in x.values[2:]), axis=1)
id v0 v1 v2 v3 v4 newCol
0 1 10 5 10 22 50 1
1 2 22 23 55 60 50 0
2 3 8 2 40 80 110 0
3 4 15 15 25 100 101 1
add a comment |Â
up vote
1
down vote
up vote
1
down vote
You can also use a lambda function:
df['newCol'] = df.apply(lambda x: int(x['v0'] in x.values[2:]), axis=1)
id v0 v1 v2 v3 v4 newCol
0 1 10 5 10 22 50 1
1 2 22 23 55 60 50 0
2 3 8 2 40 80 110 0
3 4 15 15 25 100 101 1
You can also use a lambda function:
df['newCol'] = df.apply(lambda x: int(x['v0'] in x.values[2:]), axis=1)
id v0 v1 v2 v3 v4 newCol
0 1 10 5 10 22 50 1
1 2 22 23 55 60 50 0
2 3 8 2 40 80 110 0
3 4 15 15 25 100 101 1
answered 1 hour ago
Chris
58318
58318
add a comment |Â
add a comment |Â
up vote
1
down vote
I'm assuming here that id
is set to be your dataframe index here:
df = df.set_index('id')
Then the following should work (similar answer here):
df['New'] = df.drop('v0', 1).isin(df['v0']).any(1).astype(int)
Gives:
v0 v1 v2 v3 v4 New
id
1 10 5 10 22 50 1
2 22 23 55 60 50 0
3 8 2 40 80 110 0
4 15 15 25 100 101 1
@Wen: really? For mepd.DataFrame("A": [1,2,3]).isin([3,2,1])
gives Trues, butpd.DataFrame("A": [1,2,3]).isin(pd.Series([3,2,1]))
gives False, True, False, which makes it seem like it is respecting the index.
â DSM
52 mins ago
@DSM yes you are right :-) my leak of knowledge :-(
â Wen
50 mins ago
1
@rahlf23: because you're only droppingv0
in principle you could get yourself into trouble if an id value collides with something in the v columns.
â DSM
49 mins ago
I just noticed that as well, thank you for pointing that out. Let me account for that.
â rahlf23
49 mins ago
add a comment |Â
up vote
1
down vote
I'm assuming here that id
is set to be your dataframe index here:
df = df.set_index('id')
Then the following should work (similar answer here):
df['New'] = df.drop('v0', 1).isin(df['v0']).any(1).astype(int)
Gives:
v0 v1 v2 v3 v4 New
id
1 10 5 10 22 50 1
2 22 23 55 60 50 0
3 8 2 40 80 110 0
4 15 15 25 100 101 1
@Wen: really? For mepd.DataFrame("A": [1,2,3]).isin([3,2,1])
gives Trues, butpd.DataFrame("A": [1,2,3]).isin(pd.Series([3,2,1]))
gives False, True, False, which makes it seem like it is respecting the index.
â DSM
52 mins ago
@DSM yes you are right :-) my leak of knowledge :-(
â Wen
50 mins ago
1
@rahlf23: because you're only droppingv0
in principle you could get yourself into trouble if an id value collides with something in the v columns.
â DSM
49 mins ago
I just noticed that as well, thank you for pointing that out. Let me account for that.
â rahlf23
49 mins ago
add a comment |Â
up vote
1
down vote
up vote
1
down vote
I'm assuming here that id
is set to be your dataframe index here:
df = df.set_index('id')
Then the following should work (similar answer here):
df['New'] = df.drop('v0', 1).isin(df['v0']).any(1).astype(int)
Gives:
v0 v1 v2 v3 v4 New
id
1 10 5 10 22 50 1
2 22 23 55 60 50 0
3 8 2 40 80 110 0
4 15 15 25 100 101 1
I'm assuming here that id
is set to be your dataframe index here:
df = df.set_index('id')
Then the following should work (similar answer here):
df['New'] = df.drop('v0', 1).isin(df['v0']).any(1).astype(int)
Gives:
v0 v1 v2 v3 v4 New
id
1 10 5 10 22 50 1
2 22 23 55 60 50 0
3 8 2 40 80 110 0
4 15 15 25 100 101 1
edited 45 mins ago
answered 1 hour ago
rahlf23
2,0701220
2,0701220
@Wen: really? For mepd.DataFrame("A": [1,2,3]).isin([3,2,1])
gives Trues, butpd.DataFrame("A": [1,2,3]).isin(pd.Series([3,2,1]))
gives False, True, False, which makes it seem like it is respecting the index.
â DSM
52 mins ago
@DSM yes you are right :-) my leak of knowledge :-(
â Wen
50 mins ago
1
@rahlf23: because you're only droppingv0
in principle you could get yourself into trouble if an id value collides with something in the v columns.
â DSM
49 mins ago
I just noticed that as well, thank you for pointing that out. Let me account for that.
â rahlf23
49 mins ago
add a comment |Â
@Wen: really? For mepd.DataFrame("A": [1,2,3]).isin([3,2,1])
gives Trues, butpd.DataFrame("A": [1,2,3]).isin(pd.Series([3,2,1]))
gives False, True, False, which makes it seem like it is respecting the index.
â DSM
52 mins ago
@DSM yes you are right :-) my leak of knowledge :-(
â Wen
50 mins ago
1
@rahlf23: because you're only droppingv0
in principle you could get yourself into trouble if an id value collides with something in the v columns.
â DSM
49 mins ago
I just noticed that as well, thank you for pointing that out. Let me account for that.
â rahlf23
49 mins ago
@Wen: really? For me
pd.DataFrame("A": [1,2,3]).isin([3,2,1])
gives Trues, but pd.DataFrame("A": [1,2,3]).isin(pd.Series([3,2,1]))
gives False, True, False, which makes it seem like it is respecting the index.â DSM
52 mins ago
@Wen: really? For me
pd.DataFrame("A": [1,2,3]).isin([3,2,1])
gives Trues, but pd.DataFrame("A": [1,2,3]).isin(pd.Series([3,2,1]))
gives False, True, False, which makes it seem like it is respecting the index.â DSM
52 mins ago
@DSM yes you are right :-) my leak of knowledge :-(
â Wen
50 mins ago
@DSM yes you are right :-) my leak of knowledge :-(
â Wen
50 mins ago
1
1
@rahlf23: because you're only dropping
v0
in principle you could get yourself into trouble if an id value collides with something in the v columns.â DSM
49 mins ago
@rahlf23: because you're only dropping
v0
in principle you could get yourself into trouble if an id value collides with something in the v columns.â DSM
49 mins ago
I just noticed that as well, thank you for pointing that out. Let me account for that.
â rahlf23
49 mins ago
I just noticed that as well, thank you for pointing that out. Let me account for that.
â rahlf23
49 mins ago
add a comment |Â
up vote
1
down vote
Another take, most likely the smallest syntax:
df['new'] = df.isin(df.pop('v0')).any(1).astype(int)
Full proof:
import pandas as pd
data = '''
id v0 v1 v2 v3 v4
1 10 5 10 22 50
2 22 23 55 60 50
3 8 2 40 80 110
4 15 15 25 100 101'''
df = pd.read_csv(pd.compat.StringIO(data), sep='s+')
df.set_index('id', inplace=True)
df['new'] = df.isin(df.pop('v0')).any(1).astype(int)
print(df)
Returns:
v1 v2 v3 v4 new
id
1 5 10 22 50 1
2 23 55 60 50 0
3 2 40 80 110 0
4 15 25 100 101 1
add a comment |Â
up vote
1
down vote
Another take, most likely the smallest syntax:
df['new'] = df.isin(df.pop('v0')).any(1).astype(int)
Full proof:
import pandas as pd
data = '''
id v0 v1 v2 v3 v4
1 10 5 10 22 50
2 22 23 55 60 50
3 8 2 40 80 110
4 15 15 25 100 101'''
df = pd.read_csv(pd.compat.StringIO(data), sep='s+')
df.set_index('id', inplace=True)
df['new'] = df.isin(df.pop('v0')).any(1).astype(int)
print(df)
Returns:
v1 v2 v3 v4 new
id
1 5 10 22 50 1
2 23 55 60 50 0
3 2 40 80 110 0
4 15 25 100 101 1
add a comment |Â
up vote
1
down vote
up vote
1
down vote
Another take, most likely the smallest syntax:
df['new'] = df.isin(df.pop('v0')).any(1).astype(int)
Full proof:
import pandas as pd
data = '''
id v0 v1 v2 v3 v4
1 10 5 10 22 50
2 22 23 55 60 50
3 8 2 40 80 110
4 15 15 25 100 101'''
df = pd.read_csv(pd.compat.StringIO(data), sep='s+')
df.set_index('id', inplace=True)
df['new'] = df.isin(df.pop('v0')).any(1).astype(int)
print(df)
Returns:
v1 v2 v3 v4 new
id
1 5 10 22 50 1
2 23 55 60 50 0
3 2 40 80 110 0
4 15 25 100 101 1
Another take, most likely the smallest syntax:
df['new'] = df.isin(df.pop('v0')).any(1).astype(int)
Full proof:
import pandas as pd
data = '''
id v0 v1 v2 v3 v4
1 10 5 10 22 50
2 22 23 55 60 50
3 8 2 40 80 110
4 15 15 25 100 101'''
df = pd.read_csv(pd.compat.StringIO(data), sep='s+')
df.set_index('id', inplace=True)
df['new'] = df.isin(df.pop('v0')).any(1).astype(int)
print(df)
Returns:
v1 v2 v3 v4 new
id
1 5 10 22 50 1
2 23 55 60 50 0
3 2 40 80 110 0
4 15 25 100 101 1
answered 42 mins ago
Anton vBR
10.3k2922
10.3k2922
add a comment |Â
add a comment |Â
up vote
0
down vote
df["exists"] = df.drop(["v0"], 1).isin(df["v0"]).any(1).astype(int)
output
id v0 v1 v2 v3 v4 exists
0 1 10 5 10 22 50 1
1 2 22 23 55 60 50 0
2 3 8 2 40 80 110 0
3 4 15 15 25 100 101 1
New contributor
2
This seems identical to this answer
â user3483203
35 mins ago
add a comment |Â
up vote
0
down vote
df["exists"] = df.drop(["v0"], 1).isin(df["v0"]).any(1).astype(int)
output
id v0 v1 v2 v3 v4 exists
0 1 10 5 10 22 50 1
1 2 22 23 55 60 50 0
2 3 8 2 40 80 110 0
3 4 15 15 25 100 101 1
New contributor
2
This seems identical to this answer
â user3483203
35 mins ago
add a comment |Â
up vote
0
down vote
up vote
0
down vote
df["exists"] = df.drop(["v0"], 1).isin(df["v0"]).any(1).astype(int)
output
id v0 v1 v2 v3 v4 exists
0 1 10 5 10 22 50 1
1 2 22 23 55 60 50 0
2 3 8 2 40 80 110 0
3 4 15 15 25 100 101 1
New contributor
df["exists"] = df.drop(["v0"], 1).isin(df["v0"]).any(1).astype(int)
output
id v0 v1 v2 v3 v4 exists
0 1 10 5 10 22 50 1
1 2 22 23 55 60 50 0
2 3 8 2 40 80 110 0
3 4 15 15 25 100 101 1
New contributor
New contributor
answered 36 mins ago
Sebabrata Ghosh
1
1
New contributor
New contributor
2
This seems identical to this answer
â user3483203
35 mins ago
add a comment |Â
2
This seems identical to this answer
â user3483203
35 mins ago
2
2
This seems identical to this answer
â user3483203
35 mins ago
This seems identical to this answer
â user3483203
35 mins ago
add a comment |Â
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f52393659%2fpandas-dataframe-check-if-column-value-exists-in-a-group-of-columns%23new-answer', 'question_page');
);
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password