Sum a column by ID, but skip the first instance?
Clash Royale CLAN TAG#URR8PPP
up vote
7
down vote
favorite
I have a dataframe like the following.
A = ['ID':1, 'Period':1, 'Variable':21, 'ID':1,'Period':2, 'Variable':12,
'ID':2, 'Period':2, 'Variable':14, 'ID':2, 'Period':3, 'Variable':18]
df = pd.DataFrame(A)
I would essentially like to do an operation like df.groupby('ID').sum()
to get the sum of the Variable
column, but I need to skip the first period observed for a particular ID. So, for ID=1, I am dropping the observation at period 1, but for ID=2, I am dropping the observation at period 2.
How can I do this?
python pandas dataframe pandas-groupby
add a comment |Â
up vote
7
down vote
favorite
I have a dataframe like the following.
A = ['ID':1, 'Period':1, 'Variable':21, 'ID':1,'Period':2, 'Variable':12,
'ID':2, 'Period':2, 'Variable':14, 'ID':2, 'Period':3, 'Variable':18]
df = pd.DataFrame(A)
I would essentially like to do an operation like df.groupby('ID').sum()
to get the sum of the Variable
column, but I need to skip the first period observed for a particular ID. So, for ID=1, I am dropping the observation at period 1, but for ID=2, I am dropping the observation at period 2.
How can I do this?
python pandas dataframe pandas-groupby
6
so you want something likedf.groupby('ID')['Variable'].apply(lambda x: x.iloc[1:].sum())
?
– EdChum
Aug 8 at 10:50
That does it. Thank you!
– Pburg
Aug 8 at 10:53
1
You should accept one of the answers if it resolves your question, this is so the question doesn't remain unanswered
– EdChum
Aug 9 at 8:40
1
Did an answer below help? If so, please consider accepting (green tick on the left).
– jpp
Aug 26 at 11:50
add a comment |Â
up vote
7
down vote
favorite
up vote
7
down vote
favorite
I have a dataframe like the following.
A = ['ID':1, 'Period':1, 'Variable':21, 'ID':1,'Period':2, 'Variable':12,
'ID':2, 'Period':2, 'Variable':14, 'ID':2, 'Period':3, 'Variable':18]
df = pd.DataFrame(A)
I would essentially like to do an operation like df.groupby('ID').sum()
to get the sum of the Variable
column, but I need to skip the first period observed for a particular ID. So, for ID=1, I am dropping the observation at period 1, but for ID=2, I am dropping the observation at period 2.
How can I do this?
python pandas dataframe pandas-groupby
I have a dataframe like the following.
A = ['ID':1, 'Period':1, 'Variable':21, 'ID':1,'Period':2, 'Variable':12,
'ID':2, 'Period':2, 'Variable':14, 'ID':2, 'Period':3, 'Variable':18]
df = pd.DataFrame(A)
I would essentially like to do an operation like df.groupby('ID').sum()
to get the sum of the Variable
column, but I need to skip the first period observed for a particular ID. So, for ID=1, I am dropping the observation at period 1, but for ID=2, I am dropping the observation at period 2.
How can I do this?
python pandas dataframe pandas-groupby
edited Aug 8 at 12:31
Vadim Kotov
4,08353047
4,08353047
asked Aug 8 at 10:47
Pburg
1484
1484
6
so you want something likedf.groupby('ID')['Variable'].apply(lambda x: x.iloc[1:].sum())
?
– EdChum
Aug 8 at 10:50
That does it. Thank you!
– Pburg
Aug 8 at 10:53
1
You should accept one of the answers if it resolves your question, this is so the question doesn't remain unanswered
– EdChum
Aug 9 at 8:40
1
Did an answer below help? If so, please consider accepting (green tick on the left).
– jpp
Aug 26 at 11:50
add a comment |Â
6
so you want something likedf.groupby('ID')['Variable'].apply(lambda x: x.iloc[1:].sum())
?
– EdChum
Aug 8 at 10:50
That does it. Thank you!
– Pburg
Aug 8 at 10:53
1
You should accept one of the answers if it resolves your question, this is so the question doesn't remain unanswered
– EdChum
Aug 9 at 8:40
1
Did an answer below help? If so, please consider accepting (green tick on the left).
– jpp
Aug 26 at 11:50
6
6
so you want something like
df.groupby('ID')['Variable'].apply(lambda x: x.iloc[1:].sum())
?– EdChum
Aug 8 at 10:50
so you want something like
df.groupby('ID')['Variable'].apply(lambda x: x.iloc[1:].sum())
?– EdChum
Aug 8 at 10:50
That does it. Thank you!
– Pburg
Aug 8 at 10:53
That does it. Thank you!
– Pburg
Aug 8 at 10:53
1
1
You should accept one of the answers if it resolves your question, this is so the question doesn't remain unanswered
– EdChum
Aug 9 at 8:40
You should accept one of the answers if it resolves your question, this is so the question doesn't remain unanswered
– EdChum
Aug 9 at 8:40
1
1
Did an answer below help? If so, please consider accepting (green tick on the left).
– jpp
Aug 26 at 11:50
Did an answer below help? If so, please consider accepting (green tick on the left).
– jpp
Aug 26 at 11:50
add a comment |Â
3 Answers
3
active
oldest
votes
up vote
11
down vote
accepted
You can slice within each group to ignore the first row and call sum
:
In[46]:
df.groupby('ID')['Variable'].apply(lambda x: x.iloc[1:].sum())
Out[46]:
ID
1 12
2 18
Name: Variable, dtype: int64
If you want all the columns:
In[47]:
df.groupby('ID').apply(lambda x: x.iloc[1:].sum())
Out[47]:
ID Period Variable
ID
1 1 2 12
2 2 3 18
add a comment |Â
up vote
2
down vote
You can use pd.Series.duplicated
to ignore the first occurrence:
res = df[df['ID'].duplicated()].groupby('ID').sum()
print(res)
Period Variable
ID
1 2 12
2 3 18
2
Was thinking along the same lines, performance-wise it's probably just as taxing asapply
on agroupby
+1
– EdChum
Aug 8 at 11:02
@EdChum, Possibly, yes, I just think thelambda
adds extra function calls for a large number of groups. While Boolean indexing is applied as a one-off.
– jpp
Aug 8 at 11:03
3
Probably can also create the groupby once and thengroup.sum() - group.first()
as well...
– Jon Clements♦
Aug 8 at 11:04
2
@JonClements Good idea you should post that as an answer. Although saying thatgroupby
object is just metadata, you'd still be calling calculatingsum
and thenfirst
although I'd expectfirst
to be fast
– EdChum
Aug 8 at 11:08
@EdChum meh... it's not really ignoring the first row just countering the effect so I imagine it might be prone to fail under some circumstances... Just throwing it out there in case someone wanted to play with it - but I'm not particularly happy it's a decent answer.
– Jon Clements♦
Aug 8 at 11:10
 |Â
show 2 more comments
up vote
1
down vote
Select necessary observations and then apply aggregation:
df[df.ID != df.Period].groupby('ID').sum()
Output:
Period Variable
ID
1 2 12
2 3 18
4
Wouldn't this fail with'ID':2, 'Period':1, 'Variable':14
? I think it might be exploiting a loophole in the OP's example?
– roganjosh
Aug 8 at 10:57
@roganjosh Yes, my solution based on the rule that first period equals ID. Maybe you're right.
– Lev Zakharov
Aug 8 at 11:01
add a comment |Â
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
11
down vote
accepted
You can slice within each group to ignore the first row and call sum
:
In[46]:
df.groupby('ID')['Variable'].apply(lambda x: x.iloc[1:].sum())
Out[46]:
ID
1 12
2 18
Name: Variable, dtype: int64
If you want all the columns:
In[47]:
df.groupby('ID').apply(lambda x: x.iloc[1:].sum())
Out[47]:
ID Period Variable
ID
1 1 2 12
2 2 3 18
add a comment |Â
up vote
11
down vote
accepted
You can slice within each group to ignore the first row and call sum
:
In[46]:
df.groupby('ID')['Variable'].apply(lambda x: x.iloc[1:].sum())
Out[46]:
ID
1 12
2 18
Name: Variable, dtype: int64
If you want all the columns:
In[47]:
df.groupby('ID').apply(lambda x: x.iloc[1:].sum())
Out[47]:
ID Period Variable
ID
1 1 2 12
2 2 3 18
add a comment |Â
up vote
11
down vote
accepted
up vote
11
down vote
accepted
You can slice within each group to ignore the first row and call sum
:
In[46]:
df.groupby('ID')['Variable'].apply(lambda x: x.iloc[1:].sum())
Out[46]:
ID
1 12
2 18
Name: Variable, dtype: int64
If you want all the columns:
In[47]:
df.groupby('ID').apply(lambda x: x.iloc[1:].sum())
Out[47]:
ID Period Variable
ID
1 1 2 12
2 2 3 18
You can slice within each group to ignore the first row and call sum
:
In[46]:
df.groupby('ID')['Variable'].apply(lambda x: x.iloc[1:].sum())
Out[46]:
ID
1 12
2 18
Name: Variable, dtype: int64
If you want all the columns:
In[47]:
df.groupby('ID').apply(lambda x: x.iloc[1:].sum())
Out[47]:
ID Period Variable
ID
1 1 2 12
2 2 3 18
answered Aug 8 at 10:54
EdChum
156k28320280
156k28320280
add a comment |Â
add a comment |Â
up vote
2
down vote
You can use pd.Series.duplicated
to ignore the first occurrence:
res = df[df['ID'].duplicated()].groupby('ID').sum()
print(res)
Period Variable
ID
1 2 12
2 3 18
2
Was thinking along the same lines, performance-wise it's probably just as taxing asapply
on agroupby
+1
– EdChum
Aug 8 at 11:02
@EdChum, Possibly, yes, I just think thelambda
adds extra function calls for a large number of groups. While Boolean indexing is applied as a one-off.
– jpp
Aug 8 at 11:03
3
Probably can also create the groupby once and thengroup.sum() - group.first()
as well...
– Jon Clements♦
Aug 8 at 11:04
2
@JonClements Good idea you should post that as an answer. Although saying thatgroupby
object is just metadata, you'd still be calling calculatingsum
and thenfirst
although I'd expectfirst
to be fast
– EdChum
Aug 8 at 11:08
@EdChum meh... it's not really ignoring the first row just countering the effect so I imagine it might be prone to fail under some circumstances... Just throwing it out there in case someone wanted to play with it - but I'm not particularly happy it's a decent answer.
– Jon Clements♦
Aug 8 at 11:10
 |Â
show 2 more comments
up vote
2
down vote
You can use pd.Series.duplicated
to ignore the first occurrence:
res = df[df['ID'].duplicated()].groupby('ID').sum()
print(res)
Period Variable
ID
1 2 12
2 3 18
2
Was thinking along the same lines, performance-wise it's probably just as taxing asapply
on agroupby
+1
– EdChum
Aug 8 at 11:02
@EdChum, Possibly, yes, I just think thelambda
adds extra function calls for a large number of groups. While Boolean indexing is applied as a one-off.
– jpp
Aug 8 at 11:03
3
Probably can also create the groupby once and thengroup.sum() - group.first()
as well...
– Jon Clements♦
Aug 8 at 11:04
2
@JonClements Good idea you should post that as an answer. Although saying thatgroupby
object is just metadata, you'd still be calling calculatingsum
and thenfirst
although I'd expectfirst
to be fast
– EdChum
Aug 8 at 11:08
@EdChum meh... it's not really ignoring the first row just countering the effect so I imagine it might be prone to fail under some circumstances... Just throwing it out there in case someone wanted to play with it - but I'm not particularly happy it's a decent answer.
– Jon Clements♦
Aug 8 at 11:10
 |Â
show 2 more comments
up vote
2
down vote
up vote
2
down vote
You can use pd.Series.duplicated
to ignore the first occurrence:
res = df[df['ID'].duplicated()].groupby('ID').sum()
print(res)
Period Variable
ID
1 2 12
2 3 18
You can use pd.Series.duplicated
to ignore the first occurrence:
res = df[df['ID'].duplicated()].groupby('ID').sum()
print(res)
Period Variable
ID
1 2 12
2 3 18
answered Aug 8 at 11:01


jpp
62.3k173578
62.3k173578
2
Was thinking along the same lines, performance-wise it's probably just as taxing asapply
on agroupby
+1
– EdChum
Aug 8 at 11:02
@EdChum, Possibly, yes, I just think thelambda
adds extra function calls for a large number of groups. While Boolean indexing is applied as a one-off.
– jpp
Aug 8 at 11:03
3
Probably can also create the groupby once and thengroup.sum() - group.first()
as well...
– Jon Clements♦
Aug 8 at 11:04
2
@JonClements Good idea you should post that as an answer. Although saying thatgroupby
object is just metadata, you'd still be calling calculatingsum
and thenfirst
although I'd expectfirst
to be fast
– EdChum
Aug 8 at 11:08
@EdChum meh... it's not really ignoring the first row just countering the effect so I imagine it might be prone to fail under some circumstances... Just throwing it out there in case someone wanted to play with it - but I'm not particularly happy it's a decent answer.
– Jon Clements♦
Aug 8 at 11:10
 |Â
show 2 more comments
2
Was thinking along the same lines, performance-wise it's probably just as taxing asapply
on agroupby
+1
– EdChum
Aug 8 at 11:02
@EdChum, Possibly, yes, I just think thelambda
adds extra function calls for a large number of groups. While Boolean indexing is applied as a one-off.
– jpp
Aug 8 at 11:03
3
Probably can also create the groupby once and thengroup.sum() - group.first()
as well...
– Jon Clements♦
Aug 8 at 11:04
2
@JonClements Good idea you should post that as an answer. Although saying thatgroupby
object is just metadata, you'd still be calling calculatingsum
and thenfirst
although I'd expectfirst
to be fast
– EdChum
Aug 8 at 11:08
@EdChum meh... it's not really ignoring the first row just countering the effect so I imagine it might be prone to fail under some circumstances... Just throwing it out there in case someone wanted to play with it - but I'm not particularly happy it's a decent answer.
– Jon Clements♦
Aug 8 at 11:10
2
2
Was thinking along the same lines, performance-wise it's probably just as taxing as
apply
on a groupby
+1– EdChum
Aug 8 at 11:02
Was thinking along the same lines, performance-wise it's probably just as taxing as
apply
on a groupby
+1– EdChum
Aug 8 at 11:02
@EdChum, Possibly, yes, I just think the
lambda
adds extra function calls for a large number of groups. While Boolean indexing is applied as a one-off.– jpp
Aug 8 at 11:03
@EdChum, Possibly, yes, I just think the
lambda
adds extra function calls for a large number of groups. While Boolean indexing is applied as a one-off.– jpp
Aug 8 at 11:03
3
3
Probably can also create the groupby once and then
group.sum() - group.first()
as well...– Jon Clements♦
Aug 8 at 11:04
Probably can also create the groupby once and then
group.sum() - group.first()
as well...– Jon Clements♦
Aug 8 at 11:04
2
2
@JonClements Good idea you should post that as an answer. Although saying that
groupby
object is just metadata, you'd still be calling calculating sum
and then first
although I'd expect first
to be fast– EdChum
Aug 8 at 11:08
@JonClements Good idea you should post that as an answer. Although saying that
groupby
object is just metadata, you'd still be calling calculating sum
and then first
although I'd expect first
to be fast– EdChum
Aug 8 at 11:08
@EdChum meh... it's not really ignoring the first row just countering the effect so I imagine it might be prone to fail under some circumstances... Just throwing it out there in case someone wanted to play with it - but I'm not particularly happy it's a decent answer.
– Jon Clements♦
Aug 8 at 11:10
@EdChum meh... it's not really ignoring the first row just countering the effect so I imagine it might be prone to fail under some circumstances... Just throwing it out there in case someone wanted to play with it - but I'm not particularly happy it's a decent answer.
– Jon Clements♦
Aug 8 at 11:10
 |Â
show 2 more comments
up vote
1
down vote
Select necessary observations and then apply aggregation:
df[df.ID != df.Period].groupby('ID').sum()
Output:
Period Variable
ID
1 2 12
2 3 18
4
Wouldn't this fail with'ID':2, 'Period':1, 'Variable':14
? I think it might be exploiting a loophole in the OP's example?
– roganjosh
Aug 8 at 10:57
@roganjosh Yes, my solution based on the rule that first period equals ID. Maybe you're right.
– Lev Zakharov
Aug 8 at 11:01
add a comment |Â
up vote
1
down vote
Select necessary observations and then apply aggregation:
df[df.ID != df.Period].groupby('ID').sum()
Output:
Period Variable
ID
1 2 12
2 3 18
4
Wouldn't this fail with'ID':2, 'Period':1, 'Variable':14
? I think it might be exploiting a loophole in the OP's example?
– roganjosh
Aug 8 at 10:57
@roganjosh Yes, my solution based on the rule that first period equals ID. Maybe you're right.
– Lev Zakharov
Aug 8 at 11:01
add a comment |Â
up vote
1
down vote
up vote
1
down vote
Select necessary observations and then apply aggregation:
df[df.ID != df.Period].groupby('ID').sum()
Output:
Period Variable
ID
1 2 12
2 3 18
Select necessary observations and then apply aggregation:
df[df.ID != df.Period].groupby('ID').sum()
Output:
Period Variable
ID
1 2 12
2 3 18
answered Aug 8 at 10:53


Lev Zakharov
1,956320
1,956320
4
Wouldn't this fail with'ID':2, 'Period':1, 'Variable':14
? I think it might be exploiting a loophole in the OP's example?
– roganjosh
Aug 8 at 10:57
@roganjosh Yes, my solution based on the rule that first period equals ID. Maybe you're right.
– Lev Zakharov
Aug 8 at 11:01
add a comment |Â
4
Wouldn't this fail with'ID':2, 'Period':1, 'Variable':14
? I think it might be exploiting a loophole in the OP's example?
– roganjosh
Aug 8 at 10:57
@roganjosh Yes, my solution based on the rule that first period equals ID. Maybe you're right.
– Lev Zakharov
Aug 8 at 11:01
4
4
Wouldn't this fail with
'ID':2, 'Period':1, 'Variable':14
? I think it might be exploiting a loophole in the OP's example?– roganjosh
Aug 8 at 10:57
Wouldn't this fail with
'ID':2, 'Period':1, 'Variable':14
? I think it might be exploiting a loophole in the OP's example?– roganjosh
Aug 8 at 10:57
@roganjosh Yes, my solution based on the rule that first period equals ID. Maybe you're right.
– Lev Zakharov
Aug 8 at 11:01
@roganjosh Yes, my solution based on the rule that first period equals ID. Maybe you're right.
– Lev Zakharov
Aug 8 at 11:01
add a comment |Â
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f51744769%2fsum-a-column-by-id-but-skip-the-first-instance%23new-answer', 'question_page');
);
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
6
so you want something like
df.groupby('ID')['Variable'].apply(lambda x: x.iloc[1:].sum())
?– EdChum
Aug 8 at 10:50
That does it. Thank you!
– Pburg
Aug 8 at 10:53
1
You should accept one of the answers if it resolves your question, this is so the question doesn't remain unanswered
– EdChum
Aug 9 at 8:40
1
Did an answer below help? If so, please consider accepting (green tick on the left).
– jpp
Aug 26 at 11:50