Sum a column by ID, but skip the first instance?

up vote
7
down vote

favorite

I have a dataframe like the following.

A = ['ID':1, 'Period':1, 'Variable':21, 'ID':1,'Period':2, 'Variable':12, 
 'ID':2, 'Period':2, 'Variable':14, 'ID':2, 'Period':3, 'Variable':18]

df = pd.DataFrame(A)

I would essentially like to do an operation like df.groupby('ID').sum() to get the sum of the Variable column, but I need to skip the first period observed for a particular ID. So, for ID=1, I am dropping the observation at period 1, but for ID=2, I am dropping the observation at period 2.

How can I do this?

edited Aug 8 at 12:31

Vadim Kotov

4,08353047

asked Aug 8 at 10:47

Pburg

1484

6

so you want something like df.groupby('ID')['Variable'].apply(lambda x: x.iloc[1:].sum())?
â€“Â EdChum
Aug 8 at 10:50

That does it. Thank you!
â€“Â Pburg
Aug 8 at 10:53

1

You should accept one of the answers if it resolves your question, this is so the question doesn't remain unanswered
â€“Â EdChum
Aug 9 at 8:40

1

Did an answer below help? If so, please consider accepting (green tick on the left).
â€“Â jpp
Aug 26 at 11:50

add a commentÂ |Â

up vote
7
down vote

favorite

I have a dataframe like the following.

A = ['ID':1, 'Period':1, 'Variable':21, 'ID':1,'Period':2, 'Variable':12, 
 'ID':2, 'Period':2, 'Variable':14, 'ID':2, 'Period':3, 'Variable':18]

df = pd.DataFrame(A)

How can I do this?

edited Aug 8 at 12:31

Vadim Kotov

4,08353047

asked Aug 8 at 10:47

Pburg

1484

6

so you want something like df.groupby('ID')['Variable'].apply(lambda x: x.iloc[1:].sum())?
â€“Â EdChum
Aug 8 at 10:50

That does it. Thank you!
â€“Â Pburg
Aug 8 at 10:53

1

You should accept one of the answers if it resolves your question, this is so the question doesn't remain unanswered
â€“Â EdChum
Aug 9 at 8:40

1

Did an answer below help? If so, please consider accepting (green tick on the left).
â€“Â jpp
Aug 26 at 11:50

add a commentÂ |Â

up vote
7
down vote

favorite

I have a dataframe like the following.

A = ['ID':1, 'Period':1, 'Variable':21, 'ID':1,'Period':2, 'Variable':12, 
 'ID':2, 'Period':2, 'Variable':14, 'ID':2, 'Period':3, 'Variable':18]

df = pd.DataFrame(A)

How can I do this?

edited Aug 8 at 12:31

Vadim Kotov

4,08353047

asked Aug 8 at 10:47

Pburg

1484

I have a dataframe like the following.

A = ['ID':1, 'Period':1, 'Variable':21, 'ID':1,'Period':2, 'Variable':12, 
 'ID':2, 'Period':2, 'Variable':14, 'ID':2, 'Period':3, 'Variable':18]

df = pd.DataFrame(A)

How can I do this?

edited Aug 8 at 12:31

Vadim Kotov

4,08353047

asked Aug 8 at 10:47

Pburg

1484

edited Aug 8 at 12:31

Vadim Kotov

4,08353047

edited Aug 8 at 12:31

Vadim Kotov

4,08353047

edited Aug 8 at 12:31

Vadim Kotov

4,08353047

asked Aug 8 at 10:47

Pburg

1484

asked Aug 8 at 10:47

Pburg

1484

asked Aug 8 at 10:47

Pburg

1484

6

so you want something like df.groupby('ID')['Variable'].apply(lambda x: x.iloc[1:].sum())?
â€“Â EdChum
Aug 8 at 10:50

That does it. Thank you!
â€“Â Pburg
Aug 8 at 10:53

1

You should accept one of the answers if it resolves your question, this is so the question doesn't remain unanswered
â€“Â EdChum
Aug 9 at 8:40

1

Did an answer below help? If so, please consider accepting (green tick on the left).
â€“Â jpp
Aug 26 at 11:50

add a commentÂ |Â

6

so you want something like df.groupby('ID')['Variable'].apply(lambda x: x.iloc[1:].sum())?
â€“Â EdChum
Aug 8 at 10:50

That does it. Thank you!
â€“Â Pburg
Aug 8 at 10:53

1

You should accept one of the answers if it resolves your question, this is so the question doesn't remain unanswered
â€“Â EdChum
Aug 9 at 8:40

1

Did an answer below help? If so, please consider accepting (green tick on the left).
â€“Â jpp
Aug 26 at 11:50

so you want something like df.groupby('ID')['Variable'].apply(lambda x: x.iloc[1:].sum())?
â€“Â EdChum
Aug 8 at 10:50

That does it. Thank you!
â€“Â Pburg
Aug 8 at 10:53

You should accept one of the answers if it resolves your question, this is so the question doesn't remain unanswered
â€“Â EdChum
Aug 9 at 8:40

Did an answer below help? If so, please consider accepting (green tick on the left).
â€“Â jpp
Aug 26 at 11:50

add a commentÂ |Â

3 Answers
3

active

oldest

votes

up vote
11
down vote

accepted

You can slice within each group to ignore the first row and call sum:

In[46]:
df.groupby('ID')['Variable'].apply(lambda x: x.iloc[1:].sum())

Out[46]: 
ID
1 12
2 18
Name: Variable, dtype: int64

If you want all the columns:

In[47]:
df.groupby('ID').apply(lambda x: x.iloc[1:].sum())

Out[47]: 
 ID Period Variable
ID 
1 1 2 12
2 2 3 18

answered Aug 8 at 10:54

EdChum

156k28320280

add a commentÂ |Â

up vote
2
down vote

You can use pd.Series.duplicated to ignore the first occurrence:

res = df[df['ID'].duplicated()].groupby('ID').sum()

print(res)

 Period Variable
ID 
1 2 12
2 3 18

answered Aug 8 at 11:01

jpp

62.3k173578

2

Was thinking along the same lines, performance-wise it's probably just as taxing as apply on a groupby+1
â€“Â EdChum
Aug 8 at 11:02

@EdChum, Possibly, yes, I just think the lambda adds extra function calls for a large number of groups. While Boolean indexing is applied as a one-off.
â€“Â jpp
Aug 8 at 11:03

3

Probably can also create the groupby once and then group.sum() - group.first() as well...
â€“Â Jon Clementsâ™¦
Aug 8 at 11:04

2

@JonClements Good idea you should post that as an answer. Although saying that groupby object is just metadata, you'd still be calling calculating sum and then first although I'd expect first to be fast
â€“Â EdChum
Aug 8 at 11:08

@EdChum meh... it's not really ignoring the first row just countering the effect so I imagine it might be prone to fail under some circumstances... Just throwing it out there in case someone wanted to play with it - but I'm not particularly happy it's a decent answer.
â€“Â Jon Clementsâ™¦
Aug 8 at 11:10

Â |Â
show 2 more comments

up vote
1
down vote

Select necessary observations and then apply aggregation:

df[df.ID != df.Period].groupby('ID').sum()

Output:

 Period Variable
ID 
1 2 12
2 3 18

answered Aug 8 at 10:53

Lev Zakharov

1,956320

4

Wouldn't this fail with 'ID':2, 'Period':1, 'Variable':14? I think it might be exploiting a loophole in the OP's example?
â€“Â roganjosh
Aug 8 at 10:57

@roganjosh Yes, my solution based on the rule that first period equals ID. Maybe you're right.
â€“Â Lev Zakharov
Aug 8 at 11:01

add a commentÂ |Â

Your Answer

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
convertImagesToLinks: true,
noModals: false,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f51744769%2fsum-a-column-by-id-but-skip-the-first-instance%23new-answer', 'question_page');

);

Post as a guest

Name

3 Answers
3

active

oldest

votes

3 Answers
3

active

oldest

votes

up vote
11
down vote

accepted

You can slice within each group to ignore the first row and call sum:

In[46]:
df.groupby('ID')['Variable'].apply(lambda x: x.iloc[1:].sum())

Out[46]: 
ID
1 12
2 18
Name: Variable, dtype: int64

If you want all the columns:

In[47]:
df.groupby('ID').apply(lambda x: x.iloc[1:].sum())

Out[47]: 
 ID Period Variable
ID 
1 1 2 12
2 2 3 18

answered Aug 8 at 10:54

EdChum

156k28320280

add a commentÂ |Â

up vote
11
down vote

accepted

You can slice within each group to ignore the first row and call sum:

In[46]:
df.groupby('ID')['Variable'].apply(lambda x: x.iloc[1:].sum())

Out[46]: 
ID
1 12
2 18
Name: Variable, dtype: int64

If you want all the columns:

In[47]:
df.groupby('ID').apply(lambda x: x.iloc[1:].sum())

Out[47]: 
 ID Period Variable
ID 
1 1 2 12
2 2 3 18

answered Aug 8 at 10:54

EdChum

156k28320280

add a commentÂ |Â

up vote
11
down vote

accepted

You can slice within each group to ignore the first row and call sum:

In[46]:
df.groupby('ID')['Variable'].apply(lambda x: x.iloc[1:].sum())

Out[46]: 
ID
1 12
2 18
Name: Variable, dtype: int64

If you want all the columns:

In[47]:
df.groupby('ID').apply(lambda x: x.iloc[1:].sum())

Out[47]: 
 ID Period Variable
ID 
1 1 2 12
2 2 3 18

answered Aug 8 at 10:54

EdChum

156k28320280

You can slice within each group to ignore the first row and call sum:

In[46]:
df.groupby('ID')['Variable'].apply(lambda x: x.iloc[1:].sum())

Out[46]: 
ID
1 12
2 18
Name: Variable, dtype: int64

If you want all the columns:

In[47]:
df.groupby('ID').apply(lambda x: x.iloc[1:].sum())

Out[47]: 
 ID Period Variable
ID 
1 1 2 12
2 2 3 18

answered Aug 8 at 10:54

EdChum

156k28320280

answered Aug 8 at 10:54

EdChum

156k28320280

answered Aug 8 at 10:54

EdChum

156k28320280

answered Aug 8 at 10:54

EdChum

156k28320280

add a commentÂ |Â

up vote
2
down vote

You can use pd.Series.duplicated to ignore the first occurrence:

res = df[df['ID'].duplicated()].groupby('ID').sum()

print(res)

 Period Variable
ID 
1 2 12
2 3 18

answered Aug 8 at 11:01

jpp

62.3k173578

2

Was thinking along the same lines, performance-wise it's probably just as taxing as apply on a groupby+1
â€“Â EdChum
Aug 8 at 11:02

@EdChum, Possibly, yes, I just think the lambda adds extra function calls for a large number of groups. While Boolean indexing is applied as a one-off.
â€“Â jpp
Aug 8 at 11:03

3

Probably can also create the groupby once and then group.sum() - group.first() as well...
â€“Â Jon Clementsâ™¦
Aug 8 at 11:04

2

@JonClements Good idea you should post that as an answer. Although saying that groupby object is just metadata, you'd still be calling calculating sum and then first although I'd expect first to be fast
â€“Â EdChum
Aug 8 at 11:08

@EdChum meh... it's not really ignoring the first row just countering the effect so I imagine it might be prone to fail under some circumstances... Just throwing it out there in case someone wanted to play with it - but I'm not particularly happy it's a decent answer.
â€“Â Jon Clementsâ™¦
Aug 8 at 11:10

Â |Â
show 2 more comments

up vote
2
down vote

You can use pd.Series.duplicated to ignore the first occurrence:

res = df[df['ID'].duplicated()].groupby('ID').sum()

print(res)

 Period Variable
ID 
1 2 12
2 3 18

answered Aug 8 at 11:01

jpp

62.3k173578

2

Was thinking along the same lines, performance-wise it's probably just as taxing as apply on a groupby+1
â€“Â EdChum
Aug 8 at 11:02

@EdChum, Possibly, yes, I just think the lambda adds extra function calls for a large number of groups. While Boolean indexing is applied as a one-off.
â€“Â jpp
Aug 8 at 11:03

3

Probably can also create the groupby once and then group.sum() - group.first() as well...
â€“Â Jon Clementsâ™¦
Aug 8 at 11:04

2

@JonClements Good idea you should post that as an answer. Although saying that groupby object is just metadata, you'd still be calling calculating sum and then first although I'd expect first to be fast
â€“Â EdChum
Aug 8 at 11:08

@EdChum meh... it's not really ignoring the first row just countering the effect so I imagine it might be prone to fail under some circumstances... Just throwing it out there in case someone wanted to play with it - but I'm not particularly happy it's a decent answer.
â€“Â Jon Clementsâ™¦
Aug 8 at 11:10

Â |Â
show 2 more comments

up vote
2
down vote

You can use pd.Series.duplicated to ignore the first occurrence:

res = df[df['ID'].duplicated()].groupby('ID').sum()

print(res)

 Period Variable
ID 
1 2 12
2 3 18

answered Aug 8 at 11:01

jpp

62.3k173578

You can use pd.Series.duplicated to ignore the first occurrence:

res = df[df['ID'].duplicated()].groupby('ID').sum()

print(res)

 Period Variable
ID 
1 2 12
2 3 18

answered Aug 8 at 11:01

jpp

62.3k173578

answered Aug 8 at 11:01

jpp

62.3k173578

answered Aug 8 at 11:01

jpp

62.3k173578

answered Aug 8 at 11:01

jpp

62.3k173578

2

Was thinking along the same lines, performance-wise it's probably just as taxing as apply on a groupby+1
â€“Â EdChum
Aug 8 at 11:02

@EdChum, Possibly, yes, I just think the lambda adds extra function calls for a large number of groups. While Boolean indexing is applied as a one-off.
â€“Â jpp
Aug 8 at 11:03

3

Probably can also create the groupby once and then group.sum() - group.first() as well...
â€“Â Jon Clementsâ™¦
Aug 8 at 11:04

2

@JonClements Good idea you should post that as an answer. Although saying that groupby object is just metadata, you'd still be calling calculating sum and then first although I'd expect first to be fast
â€“Â EdChum
Aug 8 at 11:08

@EdChum meh... it's not really ignoring the first row just countering the effect so I imagine it might be prone to fail under some circumstances... Just throwing it out there in case someone wanted to play with it - but I'm not particularly happy it's a decent answer.
â€“Â Jon Clementsâ™¦
Aug 8 at 11:10

Â |Â
show 2 more comments

2

Was thinking along the same lines, performance-wise it's probably just as taxing as apply on a groupby+1
â€“Â EdChum
Aug 8 at 11:02

@EdChum, Possibly, yes, I just think the lambda adds extra function calls for a large number of groups. While Boolean indexing is applied as a one-off.
â€“Â jpp
Aug 8 at 11:03

3

Probably can also create the groupby once and then group.sum() - group.first() as well...
â€“Â Jon Clementsâ™¦
Aug 8 at 11:04

2

@JonClements Good idea you should post that as an answer. Although saying that groupby object is just metadata, you'd still be calling calculating sum and then first although I'd expect first to be fast
â€“Â EdChum
Aug 8 at 11:08

@EdChum meh... it's not really ignoring the first row just countering the effect so I imagine it might be prone to fail under some circumstances... Just throwing it out there in case someone wanted to play with it - but I'm not particularly happy it's a decent answer.
â€“Â Jon Clementsâ™¦
Aug 8 at 11:10

Was thinking along the same lines, performance-wise it's probably just as taxing as apply on a groupby+1
â€“Â EdChum
Aug 8 at 11:02

@EdChum, Possibly, yes, I just think the lambda adds extra function calls for a large number of groups. While Boolean indexing is applied as a one-off.
â€“Â jpp
Aug 8 at 11:03

Probably can also create the groupby once and then group.sum() - group.first() as well...
â€“Â Jon Clementsâ™¦
Aug 8 at 11:04

@JonClements Good idea you should post that as an answer. Although saying that groupby object is just metadata, you'd still be calling calculating sum and then first although I'd expect first to be fast
â€“Â EdChum
Aug 8 at 11:08

@EdChum meh... it's not really ignoring the first row just countering the effect so I imagine it might be prone to fail under some circumstances... Just throwing it out there in case someone wanted to play with it - but I'm not particularly happy it's a decent answer.
â€“Â Jon Clementsâ™¦
Aug 8 at 11:10

Â |Â
show 2 more comments

up vote
1
down vote

Select necessary observations and then apply aggregation:

df[df.ID != df.Period].groupby('ID').sum()

Output:

 Period Variable
ID 
1 2 12
2 3 18

answered Aug 8 at 10:53

Lev Zakharov

1,956320

4

Wouldn't this fail with 'ID':2, 'Period':1, 'Variable':14? I think it might be exploiting a loophole in the OP's example?
â€“Â roganjosh
Aug 8 at 10:57

@roganjosh Yes, my solution based on the rule that first period equals ID. Maybe you're right.
â€“Â Lev Zakharov
Aug 8 at 11:01

add a commentÂ |Â

up vote
1
down vote

Select necessary observations and then apply aggregation:

df[df.ID != df.Period].groupby('ID').sum()

Output:

 Period Variable
ID 
1 2 12
2 3 18

answered Aug 8 at 10:53

Lev Zakharov

1,956320

4

Wouldn't this fail with 'ID':2, 'Period':1, 'Variable':14? I think it might be exploiting a loophole in the OP's example?
â€“Â roganjosh
Aug 8 at 10:57

@roganjosh Yes, my solution based on the rule that first period equals ID. Maybe you're right.
â€“Â Lev Zakharov
Aug 8 at 11:01

add a commentÂ |Â

up vote
1
down vote

Select necessary observations and then apply aggregation:

df[df.ID != df.Period].groupby('ID').sum()

Output:

 Period Variable
ID 
1 2 12
2 3 18

answered Aug 8 at 10:53

Lev Zakharov

1,956320

Select necessary observations and then apply aggregation:

df[df.ID != df.Period].groupby('ID').sum()

Output:

 Period Variable
ID 
1 2 12
2 3 18

answered Aug 8 at 10:53

Lev Zakharov

1,956320

answered Aug 8 at 10:53

Lev Zakharov

1,956320

answered Aug 8 at 10:53

Lev Zakharov

1,956320

answered Aug 8 at 10:53

Lev Zakharov

1,956320

4

Wouldn't this fail with 'ID':2, 'Period':1, 'Variable':14? I think it might be exploiting a loophole in the OP's example?
â€“Â roganjosh
Aug 8 at 10:57

@roganjosh Yes, my solution based on the rule that first period equals ID. Maybe you're right.
â€“Â Lev Zakharov
Aug 8 at 11:01

add a commentÂ |Â

4

Wouldn't this fail with 'ID':2, 'Period':1, 'Variable':14? I think it might be exploiting a loophole in the OP's example?
â€“Â roganjosh
Aug 8 at 10:57

@roganjosh Yes, my solution based on the rule that first period equals ID. Maybe you're right.
â€“Â Lev Zakharov
Aug 8 at 11:01

Wouldn't this fail with 'ID':2, 'Period':1, 'Variable':14? I think it might be exploiting a loophole in the OP's example?
â€“Â roganjosh
Aug 8 at 10:57

@roganjosh Yes, my solution based on the rule that first period equals ID. Maybe you're right.
â€“Â Lev Zakharov
Aug 8 at 11:01

add a commentÂ |Â

draft saved

draft discarded

draft saved

draft discarded

Post as a guest

Name

Search This Blog

Iyfjky