Clash Royale CLAN TAG#URR8PPP

up vote
6
down vote

favorite

I want to create an indicator variable that will propagate to all rows with the same customer-period value pair as the indicator. Specifically, if baz is yes, I want all rows of that same customer and period email to show my indicator.

df
 Customer Period Question Score
 A 1 foo 2
 A 1 bar 3
 A 1 baz yes
 A 1 biz 1
 B 1 bar 2
 B 1 baz no
 B 1 qux 3
 A 2 foo 5
 A 2 baz yes
 B 2 baz yes 
 B 2 biz 2

I've tried

df['Indicator'] = np.where(
 (df.Question.str.contains('baz') & (df.Score == 'yes')), 
 1, 0)

which returns

 Customer Period Question Score Indicator
 A 1 foo 2 0
 A 1 bar 3 0
 A 1 baz yes 1
 A 1 biz 1 0
 B 1 bar 2 0
 B 1 baz no 0
 B 1 qux 3 0
 A 2 foo 5 0
 A 2 baz yes 1
 B 2 baz yes 1
 B 2 biz 2 0

But this is the desired output:

 Customer Period Question Score Indicator
 A 1 foo 2 1
 A 1 bar 3 1
 A 1 baz yes 1
 A 1 biz 1 1
 B 1 bar 2 0
 B 1 baz no 0
 B 1 qux 3 0
 A 2 foo 5 1
 A 2 baz yes 1
 B 2 baz yes 1
 B 2 biz 2 1

I'm not sure how to go about getting what I want. Maybe groupby with ffill and another with bfill?

asked Aug 21 at 14:28

Andrew

466216

add a commentÂ |Â

up vote
6
down vote

favorite

df
 Customer Period Question Score
 A 1 foo 2
 A 1 bar 3
 A 1 baz yes
 A 1 biz 1
 B 1 bar 2
 B 1 baz no
 B 1 qux 3
 A 2 foo 5
 A 2 baz yes
 B 2 baz yes 
 B 2 biz 2

I've tried

df['Indicator'] = np.where(
 (df.Question.str.contains('baz') & (df.Score == 'yes')), 
 1, 0)

which returns

 Customer Period Question Score Indicator
 A 1 foo 2 0
 A 1 bar 3 0
 A 1 baz yes 1
 A 1 biz 1 0
 B 1 bar 2 0
 B 1 baz no 0
 B 1 qux 3 0
 A 2 foo 5 0
 A 2 baz yes 1
 B 2 baz yes 1
 B 2 biz 2 0

But this is the desired output:

 Customer Period Question Score Indicator
 A 1 foo 2 1
 A 1 bar 3 1
 A 1 baz yes 1
 A 1 biz 1 1
 B 1 bar 2 0
 B 1 baz no 0
 B 1 qux 3 0
 A 2 foo 5 1
 A 2 baz yes 1
 B 2 baz yes 1
 B 2 biz 2 1

I'm not sure how to go about getting what I want. Maybe groupby with ffill and another with bfill?

asked Aug 21 at 14:28

Andrew

466216

add a commentÂ |Â

up vote
6
down vote

favorite

df
 Customer Period Question Score
 A 1 foo 2
 A 1 bar 3
 A 1 baz yes
 A 1 biz 1
 B 1 bar 2
 B 1 baz no
 B 1 qux 3
 A 2 foo 5
 A 2 baz yes
 B 2 baz yes 
 B 2 biz 2

I've tried

df['Indicator'] = np.where(
 (df.Question.str.contains('baz') & (df.Score == 'yes')), 
 1, 0)

which returns

 Customer Period Question Score Indicator
 A 1 foo 2 0
 A 1 bar 3 0
 A 1 baz yes 1
 A 1 biz 1 0
 B 1 bar 2 0
 B 1 baz no 0
 B 1 qux 3 0
 A 2 foo 5 0
 A 2 baz yes 1
 B 2 baz yes 1
 B 2 biz 2 0

But this is the desired output:

 Customer Period Question Score Indicator
 A 1 foo 2 1
 A 1 bar 3 1
 A 1 baz yes 1
 A 1 biz 1 1
 B 1 bar 2 0
 B 1 baz no 0
 B 1 qux 3 0
 A 2 foo 5 1
 A 2 baz yes 1
 B 2 baz yes 1
 B 2 biz 2 1

I'm not sure how to go about getting what I want. Maybe groupby with ffill and another with bfill?

asked Aug 21 at 14:28

Andrew

466216

df
 Customer Period Question Score
 A 1 foo 2
 A 1 bar 3
 A 1 baz yes
 A 1 biz 1
 B 1 bar 2
 B 1 baz no
 B 1 qux 3
 A 2 foo 5
 A 2 baz yes
 B 2 baz yes 
 B 2 biz 2

I've tried

df['Indicator'] = np.where(
 (df.Question.str.contains('baz') & (df.Score == 'yes')), 
 1, 0)

which returns

 Customer Period Question Score Indicator
 A 1 foo 2 0
 A 1 bar 3 0
 A 1 baz yes 1
 A 1 biz 1 0
 B 1 bar 2 0
 B 1 baz no 0
 B 1 qux 3 0
 A 2 foo 5 0
 A 2 baz yes 1
 B 2 baz yes 1
 B 2 biz 2 0

But this is the desired output:

 Customer Period Question Score Indicator
 A 1 foo 2 1
 A 1 bar 3 1
 A 1 baz yes 1
 A 1 biz 1 1
 B 1 bar 2 0
 B 1 baz no 0
 B 1 qux 3 0
 A 2 foo 5 1
 A 2 baz yes 1
 B 2 baz yes 1
 B 2 biz 2 1

I'm not sure how to go about getting what I want. Maybe groupby with ffill and another with bfill?

asked Aug 21 at 14:28

Andrew

466216

asked Aug 21 at 14:28

Andrew

466216

asked Aug 21 at 14:28

Andrew

466216

asked Aug 21 at 14:28

Andrew

466216

add a commentÂ |Â

3 Answers
3

active

oldest

votes

up vote
5
down vote

accepted

You can use

In [954]: df['Indicator'] = (df.assign(eq=df.Question.eq('baz') & df.Score.eq('yes'))
 .groupby(['Customer', 'Period'])['eq']
 .transform('any').astype(int))

In [955]: df
Out[955]:
 Customer Period Question Score Indicator
0 A 1 foo 2 1
1 A 1 bar 3 1
2 A 1 baz yes 1
3 A 1 biz 1 1
4 B 1 bar 2 0
5 B 1 baz no 0
6 B 1 qux 3 0
7 A 2 foo 5 1
8 A 2 baz yes 1
9 B 2 baz yes 1
10 B 2 biz 2 1

Details

In [956]: df.Question.eq('baz') & df.Score.eq('yes')
Out[956]:
0 False
1 False
2 True
3 False
4 False
5 False
6 False
7 False
8 True
9 True
10 False
dtype: bool

In [957]: df.assign(eq=df.Question.eq('baz') & df.Score.eq('yes'))
Out[957]:
 Customer Period Question Score Indicator eq
0 A 1 foo 2 1 False
1 A 1 bar 3 1 False
2 A 1 baz yes 1 True
3 A 1 biz 1 1 False
4 B 1 bar 2 0 False
5 B 1 baz no 0 False
6 B 1 qux 3 0 False
7 A 2 foo 5 1 False
8 A 2 baz yes 1 True
9 B 2 baz yes 1 True
10 B 2 biz 2 1 False

answered Aug 21 at 14:32

Zero

35.6k75584

perfect! It took me a second to make it work with my real set, but this is a great solution.
â€“Â Andrew
Aug 21 at 14:47

add a commentÂ |Â

up vote
4
down vote

Here's one way. The idea is to use a Boolean mask with MultiIndex. Then use pd.Series.isin to compare against your filtered indices.

mask = (df['Question'] == 'baz') & (df['Score'] == 'yes')
idx_cols = ['Customer', 'Period']
idx = df.set_index(idx_cols).loc[mask.values].index

df['Indicator'] = pd.Series(df.set_index(idx_cols).index.values).isin(idx).astype(int)

print(df)

 Customer Period Question Score Indicator
0 A 1 foo 2 1
1 A 1 bar 3 1
2 A 1 baz yes 1
3 A 1 biz 1 1
4 B 1 bar 2 0
5 B 1 baz no 0
6 B 1 qux 3 0
7 A 2 foo 5 1
8 A 2 baz yes 1
9 B 2 baz yes 1
10 B 2 biz 2 1

answered Aug 21 at 14:37

jpp

63.3k173680

add a commentÂ |Â

up vote
4
down vote

You can factorize the tuples of Customer and Period. Then use np.logical_or.at to get group-wise any

i, r = pd.factorize([*zip(df.Customer, df.Period)])
a = np.zeros(len(r), dtype=np.bool8)
np.logical_or.at(a, i, df.eval('Question == "baz" and Score == "yes"'))
df.assign(Indicator=a[i].astype(np.int64))

 Customer Period Question Score Indicator
0 A 1 foo 2 1
1 A 1 bar 3 1
2 A 1 baz yes 1
3 A 1 biz 1 1
4 B 1 bar 2 0
5 B 1 baz no 0
6 B 1 qux 3 0
7 A 2 foo 5 1
8 A 2 baz yes 1
9 B 2 baz yes 1
10 B 2 biz 2 1

Explanation

i, r = pd.factorize([*zip(df.Customer, df.Period)])

produces unique (Customer, Period) pairs in r where i is an array keeping track of which element of r went where in order to produce the original list of tuples

Original list of tuples

[*zip(df.Customer, df.Period)]

[('A', 1),
 ('A', 1),
 ('A', 1),
 ('A', 1),
 ('B', 1),
 ('B', 1),
 ('B', 1),
 ('A', 2),
 ('A', 2),
 ('B', 2),
 ('B', 2)]

After factorizing, unique tuples r

r

array([('A', 1), ('B', 1), ('A', 2), ('B', 2)], dtype=object)

And the positions i

i

array([0, 0, 0, 0, 1, 1, 1, 2, 2, 3, 3])

I can now use i as indices for evaluating grouped any in Numpy using Numpy's at method on ufuncs. Basically, this allows me to create an array upfront whose values may change based on my at operation. Then specify an array of indices (that's what i will be) and an array matching the size of i that is the second part of my operation at that index.

I end up using as my matching array

df.eval('Question == "baz" and Score == "yes"')

0 False
1 False
2 True
3 False
4 False
5 False
6 False
7 False
8 True
9 True
10 False
dtype: bool

Let me show this in painstaking detail

 Flag GroupIndex Group State of a
0 False 0 (A, 1) [0, 0, 0, 0] # Flag is False, So do Nothing
1 False 0 (A, 1) [0, 0, 0, 0] # Flag is False, So do Nothing
2 True 0 (A, 1) [1, 0, 0, 0] # Flag is True, or_eq for Index 0
3 False 0 (A, 1) [1, 0, 0, 0] # Flag is False, So do Nothing
4 False 1 (B, 1) [1, 0, 0, 0] # Flag is False, So do Nothing
5 False 1 (B, 1) [1, 0, 0, 0] # Flag is False, So do Nothing
6 False 1 (B, 1) [1, 0, 0, 0] # Flag is False, So do Nothing
7 False 2 (A, 2) [1, 0, 0, 0] # Flag is False, So do Nothing
8 True 2 (A, 2) [1, 0, 1, 0] # Flag is True, or_eq for Index 2
9 True 3 (B, 2) [1, 0, 1, 1] # Flag is True, or_eq for Index 3
10 False 3 (B, 2) [1, 0, 1, 1] # Flag is False, So do Nothing

The final State is [1, 0, 1, 1] or in boolean terms [True, False, True, True]. And that represents the or accumulation within each unique group that is housed in a

a

array([ True, False, True, True])

If I slice this with the index positions in i and cast as integers, I get

a[i].astype(np.int64)

array([1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1])

Which is precisely what we were looking for.

Finally, I use assign to produce a copy of the dataframe with its new column.

df.assign(Indicator=a[i].astype(np.int64))

 Customer Period Question Score Indicator
0 A 1 foo 2 1
1 A 1 bar 3 1
2 A 1 baz yes 1
3 A 1 biz 1 1
4 B 1 bar 2 0
5 B 1 baz no 0
6 B 1 qux 3 0
7 A 2 foo 5 1
8 A 2 baz yes 1
9 B 2 baz yes 1
10 B 2 biz 2 1

Why Do it This Way?!

Numpy is often faster.

Below is a slightly more optimized approach. (basically the same)

i, r = pd.factorize([*zip(df.Customer, df.Period)])
a = np.zeros(len(r), dtype=np.bool8)
q = df.Question.values == 'baz'
s = df.Score.values == 'yes'
m = q & s
np.logical_or.at(a, i, m)
df.assign(Indicator=a[i].astype(np.int64))

edited Aug 21 at 15:23

answered Aug 21 at 14:43

piRSquared

138k19117244

add a commentÂ |Â

Your Answer

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
convertImagesToLinks: true,
noModals: false,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f51950944%2fpropagate-conditional-column-value-in-pandas%23new-answer', 'question_page');

);

Post as a guest

Name

3 Answers
3

active

oldest

votes

3 Answers
3

active

oldest

votes

up vote
5
down vote

accepted

You can use

In [954]: df['Indicator'] = (df.assign(eq=df.Question.eq('baz') & df.Score.eq('yes'))
 .groupby(['Customer', 'Period'])['eq']
 .transform('any').astype(int))

In [955]: df
Out[955]:
 Customer Period Question Score Indicator
0 A 1 foo 2 1
1 A 1 bar 3 1
2 A 1 baz yes 1
3 A 1 biz 1 1
4 B 1 bar 2 0
5 B 1 baz no 0
6 B 1 qux 3 0
7 A 2 foo 5 1
8 A 2 baz yes 1
9 B 2 baz yes 1
10 B 2 biz 2 1

Details

In [956]: df.Question.eq('baz') & df.Score.eq('yes')
Out[956]:
0 False
1 False
2 True
3 False
4 False
5 False
6 False
7 False
8 True
9 True
10 False
dtype: bool

In [957]: df.assign(eq=df.Question.eq('baz') & df.Score.eq('yes'))
Out[957]:
 Customer Period Question Score Indicator eq
0 A 1 foo 2 1 False
1 A 1 bar 3 1 False
2 A 1 baz yes 1 True
3 A 1 biz 1 1 False
4 B 1 bar 2 0 False
5 B 1 baz no 0 False
6 B 1 qux 3 0 False
7 A 2 foo 5 1 False
8 A 2 baz yes 1 True
9 B 2 baz yes 1 True
10 B 2 biz 2 1 False

answered Aug 21 at 14:32

Zero

35.6k75584

perfect! It took me a second to make it work with my real set, but this is a great solution.
â€“Â Andrew
Aug 21 at 14:47

add a commentÂ |Â

up vote
5
down vote

accepted

You can use

In [954]: df['Indicator'] = (df.assign(eq=df.Question.eq('baz') & df.Score.eq('yes'))
 .groupby(['Customer', 'Period'])['eq']
 .transform('any').astype(int))

In [955]: df
Out[955]:
 Customer Period Question Score Indicator
0 A 1 foo 2 1
1 A 1 bar 3 1
2 A 1 baz yes 1
3 A 1 biz 1 1
4 B 1 bar 2 0
5 B 1 baz no 0
6 B 1 qux 3 0
7 A 2 foo 5 1
8 A 2 baz yes 1
9 B 2 baz yes 1
10 B 2 biz 2 1

Details

In [956]: df.Question.eq('baz') & df.Score.eq('yes')
Out[956]:
0 False
1 False
2 True
3 False
4 False
5 False
6 False
7 False
8 True
9 True
10 False
dtype: bool

In [957]: df.assign(eq=df.Question.eq('baz') & df.Score.eq('yes'))
Out[957]:
 Customer Period Question Score Indicator eq
0 A 1 foo 2 1 False
1 A 1 bar 3 1 False
2 A 1 baz yes 1 True
3 A 1 biz 1 1 False
4 B 1 bar 2 0 False
5 B 1 baz no 0 False
6 B 1 qux 3 0 False
7 A 2 foo 5 1 False
8 A 2 baz yes 1 True
9 B 2 baz yes 1 True
10 B 2 biz 2 1 False

answered Aug 21 at 14:32

Zero

35.6k75584

perfect! It took me a second to make it work with my real set, but this is a great solution.
â€“Â Andrew
Aug 21 at 14:47

add a commentÂ |Â

up vote
5
down vote

accepted

You can use

In [954]: df['Indicator'] = (df.assign(eq=df.Question.eq('baz') & df.Score.eq('yes'))
 .groupby(['Customer', 'Period'])['eq']
 .transform('any').astype(int))

In [955]: df
Out[955]:
 Customer Period Question Score Indicator
0 A 1 foo 2 1
1 A 1 bar 3 1
2 A 1 baz yes 1
3 A 1 biz 1 1
4 B 1 bar 2 0
5 B 1 baz no 0
6 B 1 qux 3 0
7 A 2 foo 5 1
8 A 2 baz yes 1
9 B 2 baz yes 1
10 B 2 biz 2 1

Details

In [956]: df.Question.eq('baz') & df.Score.eq('yes')
Out[956]:
0 False
1 False
2 True
3 False
4 False
5 False
6 False
7 False
8 True
9 True
10 False
dtype: bool

In [957]: df.assign(eq=df.Question.eq('baz') & df.Score.eq('yes'))
Out[957]:
 Customer Period Question Score Indicator eq
0 A 1 foo 2 1 False
1 A 1 bar 3 1 False
2 A 1 baz yes 1 True
3 A 1 biz 1 1 False
4 B 1 bar 2 0 False
5 B 1 baz no 0 False
6 B 1 qux 3 0 False
7 A 2 foo 5 1 False
8 A 2 baz yes 1 True
9 B 2 baz yes 1 True
10 B 2 biz 2 1 False

answered Aug 21 at 14:32

Zero

35.6k75584

You can use

In [954]: df['Indicator'] = (df.assign(eq=df.Question.eq('baz') & df.Score.eq('yes'))
 .groupby(['Customer', 'Period'])['eq']
 .transform('any').astype(int))

In [955]: df
Out[955]:
 Customer Period Question Score Indicator
0 A 1 foo 2 1
1 A 1 bar 3 1
2 A 1 baz yes 1
3 A 1 biz 1 1
4 B 1 bar 2 0
5 B 1 baz no 0
6 B 1 qux 3 0
7 A 2 foo 5 1
8 A 2 baz yes 1
9 B 2 baz yes 1
10 B 2 biz 2 1

Details

In [956]: df.Question.eq('baz') & df.Score.eq('yes')
Out[956]:
0 False
1 False
2 True
3 False
4 False
5 False
6 False
7 False
8 True
9 True
10 False
dtype: bool

In [957]: df.assign(eq=df.Question.eq('baz') & df.Score.eq('yes'))
Out[957]:
 Customer Period Question Score Indicator eq
0 A 1 foo 2 1 False
1 A 1 bar 3 1 False
2 A 1 baz yes 1 True
3 A 1 biz 1 1 False
4 B 1 bar 2 0 False
5 B 1 baz no 0 False
6 B 1 qux 3 0 False
7 A 2 foo 5 1 False
8 A 2 baz yes 1 True
9 B 2 baz yes 1 True
10 B 2 biz 2 1 False

answered Aug 21 at 14:32

Zero

35.6k75584

answered Aug 21 at 14:32

Zero

35.6k75584

answered Aug 21 at 14:32

Zero

35.6k75584

answered Aug 21 at 14:32

Zero

35.6k75584

perfect! It took me a second to make it work with my real set, but this is a great solution.
â€“Â Andrew
Aug 21 at 14:47

add a commentÂ |Â

perfect! It took me a second to make it work with my real set, but this is a great solution.
â€“Â Andrew
Aug 21 at 14:47

perfect! It took me a second to make it work with my real set, but this is a great solution.
â€“Â Andrew
Aug 21 at 14:47

add a commentÂ |Â

up vote
4
down vote

Here's one way. The idea is to use a Boolean mask with MultiIndex. Then use pd.Series.isin to compare against your filtered indices.

mask = (df['Question'] == 'baz') & (df['Score'] == 'yes')
idx_cols = ['Customer', 'Period']
idx = df.set_index(idx_cols).loc[mask.values].index

df['Indicator'] = pd.Series(df.set_index(idx_cols).index.values).isin(idx).astype(int)

print(df)

 Customer Period Question Score Indicator
0 A 1 foo 2 1
1 A 1 bar 3 1
2 A 1 baz yes 1
3 A 1 biz 1 1
4 B 1 bar 2 0
5 B 1 baz no 0
6 B 1 qux 3 0
7 A 2 foo 5 1
8 A 2 baz yes 1
9 B 2 baz yes 1
10 B 2 biz 2 1

answered Aug 21 at 14:37

jpp

63.3k173680

add a commentÂ |Â

up vote
4
down vote

Here's one way. The idea is to use a Boolean mask with MultiIndex. Then use pd.Series.isin to compare against your filtered indices.

mask = (df['Question'] == 'baz') & (df['Score'] == 'yes')
idx_cols = ['Customer', 'Period']
idx = df.set_index(idx_cols).loc[mask.values].index

df['Indicator'] = pd.Series(df.set_index(idx_cols).index.values).isin(idx).astype(int)

print(df)

 Customer Period Question Score Indicator
0 A 1 foo 2 1
1 A 1 bar 3 1
2 A 1 baz yes 1
3 A 1 biz 1 1
4 B 1 bar 2 0
5 B 1 baz no 0
6 B 1 qux 3 0
7 A 2 foo 5 1
8 A 2 baz yes 1
9 B 2 baz yes 1
10 B 2 biz 2 1

answered Aug 21 at 14:37

jpp

63.3k173680

add a commentÂ |Â

up vote
4
down vote

Here's one way. The idea is to use a Boolean mask with MultiIndex. Then use pd.Series.isin to compare against your filtered indices.

mask = (df['Question'] == 'baz') & (df['Score'] == 'yes')
idx_cols = ['Customer', 'Period']
idx = df.set_index(idx_cols).loc[mask.values].index

df['Indicator'] = pd.Series(df.set_index(idx_cols).index.values).isin(idx).astype(int)

print(df)

 Customer Period Question Score Indicator
0 A 1 foo 2 1
1 A 1 bar 3 1
2 A 1 baz yes 1
3 A 1 biz 1 1
4 B 1 bar 2 0
5 B 1 baz no 0
6 B 1 qux 3 0
7 A 2 foo 5 1
8 A 2 baz yes 1
9 B 2 baz yes 1
10 B 2 biz 2 1

answered Aug 21 at 14:37

jpp

63.3k173680

Here's one way. The idea is to use a Boolean mask with MultiIndex. Then use pd.Series.isin to compare against your filtered indices.

mask = (df['Question'] == 'baz') & (df['Score'] == 'yes')
idx_cols = ['Customer', 'Period']
idx = df.set_index(idx_cols).loc[mask.values].index

df['Indicator'] = pd.Series(df.set_index(idx_cols).index.values).isin(idx).astype(int)

print(df)

 Customer Period Question Score Indicator
0 A 1 foo 2 1
1 A 1 bar 3 1
2 A 1 baz yes 1
3 A 1 biz 1 1
4 B 1 bar 2 0
5 B 1 baz no 0
6 B 1 qux 3 0
7 A 2 foo 5 1
8 A 2 baz yes 1
9 B 2 baz yes 1
10 B 2 biz 2 1

answered Aug 21 at 14:37

jpp

63.3k173680

answered Aug 21 at 14:37

jpp

63.3k173680

answered Aug 21 at 14:37

jpp

63.3k173680

answered Aug 21 at 14:37

jpp

63.3k173680

add a commentÂ |Â

up vote
4
down vote

You can factorize the tuples of Customer and Period. Then use np.logical_or.at to get group-wise any

i, r = pd.factorize([*zip(df.Customer, df.Period)])
a = np.zeros(len(r), dtype=np.bool8)
np.logical_or.at(a, i, df.eval('Question == "baz" and Score == "yes"'))
df.assign(Indicator=a[i].astype(np.int64))

 Customer Period Question Score Indicator
0 A 1 foo 2 1
1 A 1 bar 3 1
2 A 1 baz yes 1
3 A 1 biz 1 1
4 B 1 bar 2 0
5 B 1 baz no 0
6 B 1 qux 3 0
7 A 2 foo 5 1
8 A 2 baz yes 1
9 B 2 baz yes 1
10 B 2 biz 2 1

Explanation

i, r = pd.factorize([*zip(df.Customer, df.Period)])

produces unique (Customer, Period) pairs in r where i is an array keeping track of which element of r went where in order to produce the original list of tuples

Original list of tuples

[*zip(df.Customer, df.Period)]

[('A', 1),
 ('A', 1),
 ('A', 1),
 ('A', 1),
 ('B', 1),
 ('B', 1),
 ('B', 1),
 ('A', 2),
 ('A', 2),
 ('B', 2),
 ('B', 2)]

After factorizing, unique tuples r

r

array([('A', 1), ('B', 1), ('A', 2), ('B', 2)], dtype=object)

And the positions i

i

array([0, 0, 0, 0, 1, 1, 1, 2, 2, 3, 3])

I end up using as my matching array

df.eval('Question == "baz" and Score == "yes"')

0 False
1 False
2 True
3 False
4 False
5 False
6 False
7 False
8 True
9 True
10 False
dtype: bool

Let me show this in painstaking detail

 Flag GroupIndex Group State of a
0 False 0 (A, 1) [0, 0, 0, 0] # Flag is False, So do Nothing
1 False 0 (A, 1) [0, 0, 0, 0] # Flag is False, So do Nothing
2 True 0 (A, 1) [1, 0, 0, 0] # Flag is True, or_eq for Index 0
3 False 0 (A, 1) [1, 0, 0, 0] # Flag is False, So do Nothing
4 False 1 (B, 1) [1, 0, 0, 0] # Flag is False, So do Nothing
5 False 1 (B, 1) [1, 0, 0, 0] # Flag is False, So do Nothing
6 False 1 (B, 1) [1, 0, 0, 0] # Flag is False, So do Nothing
7 False 2 (A, 2) [1, 0, 0, 0] # Flag is False, So do Nothing
8 True 2 (A, 2) [1, 0, 1, 0] # Flag is True, or_eq for Index 2
9 True 3 (B, 2) [1, 0, 1, 1] # Flag is True, or_eq for Index 3
10 False 3 (B, 2) [1, 0, 1, 1] # Flag is False, So do Nothing

The final State is [1, 0, 1, 1] or in boolean terms [True, False, True, True]. And that represents the or accumulation within each unique group that is housed in a

a

array([ True, False, True, True])

If I slice this with the index positions in i and cast as integers, I get

a[i].astype(np.int64)

array([1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1])

Which is precisely what we were looking for.

Finally, I use assign to produce a copy of the dataframe with its new column.

df.assign(Indicator=a[i].astype(np.int64))

 Customer Period Question Score Indicator
0 A 1 foo 2 1
1 A 1 bar 3 1
2 A 1 baz yes 1
3 A 1 biz 1 1
4 B 1 bar 2 0
5 B 1 baz no 0
6 B 1 qux 3 0
7 A 2 foo 5 1
8 A 2 baz yes 1
9 B 2 baz yes 1
10 B 2 biz 2 1

Why Do it This Way?!

Numpy is often faster.

Below is a slightly more optimized approach. (basically the same)

i, r = pd.factorize([*zip(df.Customer, df.Period)])
a = np.zeros(len(r), dtype=np.bool8)
q = df.Question.values == 'baz'
s = df.Score.values == 'yes'
m = q & s
np.logical_or.at(a, i, m)
df.assign(Indicator=a[i].astype(np.int64))

edited Aug 21 at 15:23

answered Aug 21 at 14:43

piRSquared

138k19117244

add a commentÂ |Â

up vote
4
down vote

You can factorize the tuples of Customer and Period. Then use np.logical_or.at to get group-wise any

i, r = pd.factorize([*zip(df.Customer, df.Period)])
a = np.zeros(len(r), dtype=np.bool8)
np.logical_or.at(a, i, df.eval('Question == "baz" and Score == "yes"'))
df.assign(Indicator=a[i].astype(np.int64))

 Customer Period Question Score Indicator
0 A 1 foo 2 1
1 A 1 bar 3 1
2 A 1 baz yes 1
3 A 1 biz 1 1
4 B 1 bar 2 0
5 B 1 baz no 0
6 B 1 qux 3 0
7 A 2 foo 5 1
8 A 2 baz yes 1
9 B 2 baz yes 1
10 B 2 biz 2 1

Explanation

i, r = pd.factorize([*zip(df.Customer, df.Period)])

produces unique (Customer, Period) pairs in r where i is an array keeping track of which element of r went where in order to produce the original list of tuples

Original list of tuples

[*zip(df.Customer, df.Period)]

[('A', 1),
 ('A', 1),
 ('A', 1),
 ('A', 1),
 ('B', 1),
 ('B', 1),
 ('B', 1),
 ('A', 2),
 ('A', 2),
 ('B', 2),
 ('B', 2)]

After factorizing, unique tuples r

r

array([('A', 1), ('B', 1), ('A', 2), ('B', 2)], dtype=object)

And the positions i

i

array([0, 0, 0, 0, 1, 1, 1, 2, 2, 3, 3])

I end up using as my matching array

df.eval('Question == "baz" and Score == "yes"')

0 False
1 False
2 True
3 False
4 False
5 False
6 False
7 False
8 True
9 True
10 False
dtype: bool

Let me show this in painstaking detail

 Flag GroupIndex Group State of a
0 False 0 (A, 1) [0, 0, 0, 0] # Flag is False, So do Nothing
1 False 0 (A, 1) [0, 0, 0, 0] # Flag is False, So do Nothing
2 True 0 (A, 1) [1, 0, 0, 0] # Flag is True, or_eq for Index 0
3 False 0 (A, 1) [1, 0, 0, 0] # Flag is False, So do Nothing
4 False 1 (B, 1) [1, 0, 0, 0] # Flag is False, So do Nothing
5 False 1 (B, 1) [1, 0, 0, 0] # Flag is False, So do Nothing
6 False 1 (B, 1) [1, 0, 0, 0] # Flag is False, So do Nothing
7 False 2 (A, 2) [1, 0, 0, 0] # Flag is False, So do Nothing
8 True 2 (A, 2) [1, 0, 1, 0] # Flag is True, or_eq for Index 2
9 True 3 (B, 2) [1, 0, 1, 1] # Flag is True, or_eq for Index 3
10 False 3 (B, 2) [1, 0, 1, 1] # Flag is False, So do Nothing

The final State is [1, 0, 1, 1] or in boolean terms [True, False, True, True]. And that represents the or accumulation within each unique group that is housed in a

a

array([ True, False, True, True])

If I slice this with the index positions in i and cast as integers, I get

a[i].astype(np.int64)

array([1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1])

Which is precisely what we were looking for.

Finally, I use assign to produce a copy of the dataframe with its new column.

df.assign(Indicator=a[i].astype(np.int64))

 Customer Period Question Score Indicator
0 A 1 foo 2 1
1 A 1 bar 3 1
2 A 1 baz yes 1
3 A 1 biz 1 1
4 B 1 bar 2 0
5 B 1 baz no 0
6 B 1 qux 3 0
7 A 2 foo 5 1
8 A 2 baz yes 1
9 B 2 baz yes 1
10 B 2 biz 2 1

Why Do it This Way?!

Numpy is often faster.

Below is a slightly more optimized approach. (basically the same)

i, r = pd.factorize([*zip(df.Customer, df.Period)])
a = np.zeros(len(r), dtype=np.bool8)
q = df.Question.values == 'baz'
s = df.Score.values == 'yes'
m = q & s
np.logical_or.at(a, i, m)
df.assign(Indicator=a[i].astype(np.int64))

edited Aug 21 at 15:23

answered Aug 21 at 14:43

piRSquared

138k19117244

add a commentÂ |Â

up vote
4
down vote

You can factorize the tuples of Customer and Period. Then use np.logical_or.at to get group-wise any

i, r = pd.factorize([*zip(df.Customer, df.Period)])
a = np.zeros(len(r), dtype=np.bool8)
np.logical_or.at(a, i, df.eval('Question == "baz" and Score == "yes"'))
df.assign(Indicator=a[i].astype(np.int64))

 Customer Period Question Score Indicator
0 A 1 foo 2 1
1 A 1 bar 3 1
2 A 1 baz yes 1
3 A 1 biz 1 1
4 B 1 bar 2 0
5 B 1 baz no 0
6 B 1 qux 3 0
7 A 2 foo 5 1
8 A 2 baz yes 1
9 B 2 baz yes 1
10 B 2 biz 2 1

Explanation

i, r = pd.factorize([*zip(df.Customer, df.Period)])

produces unique (Customer, Period) pairs in r where i is an array keeping track of which element of r went where in order to produce the original list of tuples

Original list of tuples

[*zip(df.Customer, df.Period)]

[('A', 1),
 ('A', 1),
 ('A', 1),
 ('A', 1),
 ('B', 1),
 ('B', 1),
 ('B', 1),
 ('A', 2),
 ('A', 2),
 ('B', 2),
 ('B', 2)]

After factorizing, unique tuples r

r

array([('A', 1), ('B', 1), ('A', 2), ('B', 2)], dtype=object)

And the positions i

i

array([0, 0, 0, 0, 1, 1, 1, 2, 2, 3, 3])

I end up using as my matching array

df.eval('Question == "baz" and Score == "yes"')

0 False
1 False
2 True
3 False
4 False
5 False
6 False
7 False
8 True
9 True
10 False
dtype: bool

Let me show this in painstaking detail

 Flag GroupIndex Group State of a
0 False 0 (A, 1) [0, 0, 0, 0] # Flag is False, So do Nothing
1 False 0 (A, 1) [0, 0, 0, 0] # Flag is False, So do Nothing
2 True 0 (A, 1) [1, 0, 0, 0] # Flag is True, or_eq for Index 0
3 False 0 (A, 1) [1, 0, 0, 0] # Flag is False, So do Nothing
4 False 1 (B, 1) [1, 0, 0, 0] # Flag is False, So do Nothing
5 False 1 (B, 1) [1, 0, 0, 0] # Flag is False, So do Nothing
6 False 1 (B, 1) [1, 0, 0, 0] # Flag is False, So do Nothing
7 False 2 (A, 2) [1, 0, 0, 0] # Flag is False, So do Nothing
8 True 2 (A, 2) [1, 0, 1, 0] # Flag is True, or_eq for Index 2
9 True 3 (B, 2) [1, 0, 1, 1] # Flag is True, or_eq for Index 3
10 False 3 (B, 2) [1, 0, 1, 1] # Flag is False, So do Nothing

The final State is [1, 0, 1, 1] or in boolean terms [True, False, True, True]. And that represents the or accumulation within each unique group that is housed in a

a

array([ True, False, True, True])

If I slice this with the index positions in i and cast as integers, I get

a[i].astype(np.int64)

array([1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1])

Which is precisely what we were looking for.

Finally, I use assign to produce a copy of the dataframe with its new column.

df.assign(Indicator=a[i].astype(np.int64))

 Customer Period Question Score Indicator
0 A 1 foo 2 1
1 A 1 bar 3 1
2 A 1 baz yes 1
3 A 1 biz 1 1
4 B 1 bar 2 0
5 B 1 baz no 0
6 B 1 qux 3 0
7 A 2 foo 5 1
8 A 2 baz yes 1
9 B 2 baz yes 1
10 B 2 biz 2 1

Why Do it This Way?!

Numpy is often faster.

Below is a slightly more optimized approach. (basically the same)

i, r = pd.factorize([*zip(df.Customer, df.Period)])
a = np.zeros(len(r), dtype=np.bool8)
q = df.Question.values == 'baz'
s = df.Score.values == 'yes'
m = q & s
np.logical_or.at(a, i, m)
df.assign(Indicator=a[i].astype(np.int64))

edited Aug 21 at 15:23

answered Aug 21 at 14:43

piRSquared

138k19117244

You can factorize the tuples of Customer and Period. Then use np.logical_or.at to get group-wise any

i, r = pd.factorize([*zip(df.Customer, df.Period)])
a = np.zeros(len(r), dtype=np.bool8)
np.logical_or.at(a, i, df.eval('Question == "baz" and Score == "yes"'))
df.assign(Indicator=a[i].astype(np.int64))

 Customer Period Question Score Indicator
0 A 1 foo 2 1
1 A 1 bar 3 1
2 A 1 baz yes 1
3 A 1 biz 1 1
4 B 1 bar 2 0
5 B 1 baz no 0
6 B 1 qux 3 0
7 A 2 foo 5 1
8 A 2 baz yes 1
9 B 2 baz yes 1
10 B 2 biz 2 1

Explanation

i, r = pd.factorize([*zip(df.Customer, df.Period)])

produces unique (Customer, Period) pairs in r where i is an array keeping track of which element of r went where in order to produce the original list of tuples

Original list of tuples

[*zip(df.Customer, df.Period)]

[('A', 1),
 ('A', 1),
 ('A', 1),
 ('A', 1),
 ('B', 1),
 ('B', 1),
 ('B', 1),
 ('A', 2),
 ('A', 2),
 ('B', 2),
 ('B', 2)]

After factorizing, unique tuples r

r

array([('A', 1), ('B', 1), ('A', 2), ('B', 2)], dtype=object)

And the positions i

i

array([0, 0, 0, 0, 1, 1, 1, 2, 2, 3, 3])

I end up using as my matching array

df.eval('Question == "baz" and Score == "yes"')

0 False
1 False
2 True
3 False
4 False
5 False
6 False
7 False
8 True
9 True
10 False
dtype: bool

Let me show this in painstaking detail

 Flag GroupIndex Group State of a
0 False 0 (A, 1) [0, 0, 0, 0] # Flag is False, So do Nothing
1 False 0 (A, 1) [0, 0, 0, 0] # Flag is False, So do Nothing
2 True 0 (A, 1) [1, 0, 0, 0] # Flag is True, or_eq for Index 0
3 False 0 (A, 1) [1, 0, 0, 0] # Flag is False, So do Nothing
4 False 1 (B, 1) [1, 0, 0, 0] # Flag is False, So do Nothing
5 False 1 (B, 1) [1, 0, 0, 0] # Flag is False, So do Nothing
6 False 1 (B, 1) [1, 0, 0, 0] # Flag is False, So do Nothing
7 False 2 (A, 2) [1, 0, 0, 0] # Flag is False, So do Nothing
8 True 2 (A, 2) [1, 0, 1, 0] # Flag is True, or_eq for Index 2
9 True 3 (B, 2) [1, 0, 1, 1] # Flag is True, or_eq for Index 3
10 False 3 (B, 2) [1, 0, 1, 1] # Flag is False, So do Nothing

The final State is [1, 0, 1, 1] or in boolean terms [True, False, True, True]. And that represents the or accumulation within each unique group that is housed in a

a

array([ True, False, True, True])

If I slice this with the index positions in i and cast as integers, I get

a[i].astype(np.int64)

array([1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1])

Which is precisely what we were looking for.

Finally, I use assign to produce a copy of the dataframe with its new column.

df.assign(Indicator=a[i].astype(np.int64))

 Customer Period Question Score Indicator
0 A 1 foo 2 1
1 A 1 bar 3 1
2 A 1 baz yes 1
3 A 1 biz 1 1
4 B 1 bar 2 0
5 B 1 baz no 0
6 B 1 qux 3 0
7 A 2 foo 5 1
8 A 2 baz yes 1
9 B 2 baz yes 1
10 B 2 biz 2 1

Why Do it This Way?!

Numpy is often faster.

Below is a slightly more optimized approach. (basically the same)

i, r = pd.factorize([*zip(df.Customer, df.Period)])
a = np.zeros(len(r), dtype=np.bool8)
q = df.Question.values == 'baz'
s = df.Score.values == 'yes'
m = q & s
np.logical_or.at(a, i, m)
df.assign(Indicator=a[i].astype(np.int64))

edited Aug 21 at 15:23

answered Aug 21 at 14:43

piRSquared

138k19117244

edited Aug 21 at 15:23

answered Aug 21 at 14:43

piRSquared

138k19117244

answered Aug 21 at 14:43

piRSquared

138k19117244

answered Aug 21 at 14:43

piRSquared

138k19117244

add a commentÂ |Â

draft saved

draft discarded

draft saved

draft discarded

Post as a guest

Name

Search This Blog

Iyfjky

propagate conditional column value in pandas

3 Answers
3

Explanation

Why Do it This Way?!

Your Answer

Post as a guest

3 Answers
3

3 Answers
3

Explanation

Why Do it This Way?!

Explanation

Why Do it This Way?!

Explanation

Why Do it This Way?!

Explanation

Why Do it This Way?!

Post as a guest

Comments

Post a Comment

Popular posts from this blog

What does second last employer means? [closed]

List of Gilmore Girls characters

Confectionery

Category

Random preview

propagate conditional column value in pandas

3 Answers 3

Explanation

Why Do it This Way?!

Your Answer

Sign up or log in

Post as a guest

Post as a guest

3 Answers 3

3 Answers 3

Explanation

Why Do it This Way?!

Explanation

Why Do it This Way?!

Explanation

Why Do it This Way?!

Explanation

Why Do it This Way?!

Sign up or log in

Post as a guest

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Comments

Post a Comment

Popular posts from this blog

What does second last employer means? [closed]

List of Gilmore Girls characters

Confectionery

3 Answers
3

3 Answers
3

3 Answers
3