Sum a column by ID, but skip the first instance?

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP











up vote
7
down vote

favorite












I have a dataframe like the following.



A = ['ID':1, 'Period':1, 'Variable':21, 'ID':1,'Period':2, 'Variable':12, 
'ID':2, 'Period':2, 'Variable':14, 'ID':2, 'Period':3, 'Variable':18]

df = pd.DataFrame(A)


I would essentially like to do an operation like df.groupby('ID').sum() to get the sum of the Variable column, but I need to skip the first period observed for a particular ID. So, for ID=1, I am dropping the observation at period 1, but for ID=2, I am dropping the observation at period 2.



How can I do this?







share|improve this question


















  • 6




    so you want something like df.groupby('ID')['Variable'].apply(lambda x: x.iloc[1:].sum())?
    – EdChum
    Aug 8 at 10:50










  • That does it. Thank you!
    – Pburg
    Aug 8 at 10:53






  • 1




    You should accept one of the answers if it resolves your question, this is so the question doesn't remain unanswered
    – EdChum
    Aug 9 at 8:40






  • 1




    Did an answer below help? If so, please consider accepting (green tick on the left).
    – jpp
    Aug 26 at 11:50














up vote
7
down vote

favorite












I have a dataframe like the following.



A = ['ID':1, 'Period':1, 'Variable':21, 'ID':1,'Period':2, 'Variable':12, 
'ID':2, 'Period':2, 'Variable':14, 'ID':2, 'Period':3, 'Variable':18]

df = pd.DataFrame(A)


I would essentially like to do an operation like df.groupby('ID').sum() to get the sum of the Variable column, but I need to skip the first period observed for a particular ID. So, for ID=1, I am dropping the observation at period 1, but for ID=2, I am dropping the observation at period 2.



How can I do this?







share|improve this question


















  • 6




    so you want something like df.groupby('ID')['Variable'].apply(lambda x: x.iloc[1:].sum())?
    – EdChum
    Aug 8 at 10:50










  • That does it. Thank you!
    – Pburg
    Aug 8 at 10:53






  • 1




    You should accept one of the answers if it resolves your question, this is so the question doesn't remain unanswered
    – EdChum
    Aug 9 at 8:40






  • 1




    Did an answer below help? If so, please consider accepting (green tick on the left).
    – jpp
    Aug 26 at 11:50












up vote
7
down vote

favorite









up vote
7
down vote

favorite











I have a dataframe like the following.



A = ['ID':1, 'Period':1, 'Variable':21, 'ID':1,'Period':2, 'Variable':12, 
'ID':2, 'Period':2, 'Variable':14, 'ID':2, 'Period':3, 'Variable':18]

df = pd.DataFrame(A)


I would essentially like to do an operation like df.groupby('ID').sum() to get the sum of the Variable column, but I need to skip the first period observed for a particular ID. So, for ID=1, I am dropping the observation at period 1, but for ID=2, I am dropping the observation at period 2.



How can I do this?







share|improve this question














I have a dataframe like the following.



A = ['ID':1, 'Period':1, 'Variable':21, 'ID':1,'Period':2, 'Variable':12, 
'ID':2, 'Period':2, 'Variable':14, 'ID':2, 'Period':3, 'Variable':18]

df = pd.DataFrame(A)


I would essentially like to do an operation like df.groupby('ID').sum() to get the sum of the Variable column, but I need to skip the first period observed for a particular ID. So, for ID=1, I am dropping the observation at period 1, but for ID=2, I am dropping the observation at period 2.



How can I do this?









share|improve this question













share|improve this question




share|improve this question








edited Aug 8 at 12:31









Vadim Kotov

4,08353047




4,08353047










asked Aug 8 at 10:47









Pburg

1484




1484







  • 6




    so you want something like df.groupby('ID')['Variable'].apply(lambda x: x.iloc[1:].sum())?
    – EdChum
    Aug 8 at 10:50










  • That does it. Thank you!
    – Pburg
    Aug 8 at 10:53






  • 1




    You should accept one of the answers if it resolves your question, this is so the question doesn't remain unanswered
    – EdChum
    Aug 9 at 8:40






  • 1




    Did an answer below help? If so, please consider accepting (green tick on the left).
    – jpp
    Aug 26 at 11:50












  • 6




    so you want something like df.groupby('ID')['Variable'].apply(lambda x: x.iloc[1:].sum())?
    – EdChum
    Aug 8 at 10:50










  • That does it. Thank you!
    – Pburg
    Aug 8 at 10:53






  • 1




    You should accept one of the answers if it resolves your question, this is so the question doesn't remain unanswered
    – EdChum
    Aug 9 at 8:40






  • 1




    Did an answer below help? If so, please consider accepting (green tick on the left).
    – jpp
    Aug 26 at 11:50







6




6




so you want something like df.groupby('ID')['Variable'].apply(lambda x: x.iloc[1:].sum())?
– EdChum
Aug 8 at 10:50




so you want something like df.groupby('ID')['Variable'].apply(lambda x: x.iloc[1:].sum())?
– EdChum
Aug 8 at 10:50












That does it. Thank you!
– Pburg
Aug 8 at 10:53




That does it. Thank you!
– Pburg
Aug 8 at 10:53




1




1




You should accept one of the answers if it resolves your question, this is so the question doesn't remain unanswered
– EdChum
Aug 9 at 8:40




You should accept one of the answers if it resolves your question, this is so the question doesn't remain unanswered
– EdChum
Aug 9 at 8:40




1




1




Did an answer below help? If so, please consider accepting (green tick on the left).
– jpp
Aug 26 at 11:50




Did an answer below help? If so, please consider accepting (green tick on the left).
– jpp
Aug 26 at 11:50












3 Answers
3






active

oldest

votes

















up vote
11
down vote



accepted










You can slice within each group to ignore the first row and call sum:



In[46]:
df.groupby('ID')['Variable'].apply(lambda x: x.iloc[1:].sum())

Out[46]:
ID
1 12
2 18
Name: Variable, dtype: int64


If you want all the columns:



In[47]:
df.groupby('ID').apply(lambda x: x.iloc[1:].sum())

Out[47]:
ID Period Variable
ID
1 1 2 12
2 2 3 18





share|improve this answer



























    up vote
    2
    down vote













    You can use pd.Series.duplicated to ignore the first occurrence:



    res = df[df['ID'].duplicated()].groupby('ID').sum()

    print(res)

    Period Variable
    ID
    1 2 12
    2 3 18





    share|improve this answer
















    • 2




      Was thinking along the same lines, performance-wise it's probably just as taxing as apply on a groupby+1
      – EdChum
      Aug 8 at 11:02











    • @EdChum, Possibly, yes, I just think the lambda adds extra function calls for a large number of groups. While Boolean indexing is applied as a one-off.
      – jpp
      Aug 8 at 11:03






    • 3




      Probably can also create the groupby once and then group.sum() - group.first() as well...
      – Jon Clements♦
      Aug 8 at 11:04







    • 2




      @JonClements Good idea you should post that as an answer. Although saying that groupby object is just metadata, you'd still be calling calculating sum and then first although I'd expect first to be fast
      – EdChum
      Aug 8 at 11:08











    • @EdChum meh... it's not really ignoring the first row just countering the effect so I imagine it might be prone to fail under some circumstances... Just throwing it out there in case someone wanted to play with it - but I'm not particularly happy it's a decent answer.
      – Jon Clements♦
      Aug 8 at 11:10


















    up vote
    1
    down vote













    Select necessary observations and then apply aggregation:



    df[df.ID != df.Period].groupby('ID').sum()


    Output:



     Period Variable
    ID
    1 2 12
    2 3 18





    share|improve this answer
















    • 4




      Wouldn't this fail with 'ID':2, 'Period':1, 'Variable':14? I think it might be exploiting a loophole in the OP's example?
      – roganjosh
      Aug 8 at 10:57










    • @roganjosh Yes, my solution based on the rule that first period equals ID. Maybe you're right.
      – Lev Zakharov
      Aug 8 at 11:01










    Your Answer





    StackExchange.ifUsing("editor", function ()
    StackExchange.using("externalEditor", function ()
    StackExchange.using("snippets", function ()
    StackExchange.snippets.init();
    );
    );
    , "code-snippets");

    StackExchange.ready(function()
    var channelOptions =
    tags: "".split(" "),
    id: "1"
    ;
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function()
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled)
    StackExchange.using("snippets", function()
    createEditor();
    );

    else
    createEditor();

    );

    function createEditor()
    StackExchange.prepareEditor(
    heartbeatType: 'answer',
    convertImagesToLinks: true,
    noModals: false,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    );



    );













     

    draft saved


    draft discarded


















    StackExchange.ready(
    function ()
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f51744769%2fsum-a-column-by-id-but-skip-the-first-instance%23new-answer', 'question_page');

    );

    Post as a guest






























    3 Answers
    3






    active

    oldest

    votes








    3 Answers
    3






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes








    up vote
    11
    down vote



    accepted










    You can slice within each group to ignore the first row and call sum:



    In[46]:
    df.groupby('ID')['Variable'].apply(lambda x: x.iloc[1:].sum())

    Out[46]:
    ID
    1 12
    2 18
    Name: Variable, dtype: int64


    If you want all the columns:



    In[47]:
    df.groupby('ID').apply(lambda x: x.iloc[1:].sum())

    Out[47]:
    ID Period Variable
    ID
    1 1 2 12
    2 2 3 18





    share|improve this answer
























      up vote
      11
      down vote



      accepted










      You can slice within each group to ignore the first row and call sum:



      In[46]:
      df.groupby('ID')['Variable'].apply(lambda x: x.iloc[1:].sum())

      Out[46]:
      ID
      1 12
      2 18
      Name: Variable, dtype: int64


      If you want all the columns:



      In[47]:
      df.groupby('ID').apply(lambda x: x.iloc[1:].sum())

      Out[47]:
      ID Period Variable
      ID
      1 1 2 12
      2 2 3 18





      share|improve this answer






















        up vote
        11
        down vote



        accepted







        up vote
        11
        down vote



        accepted






        You can slice within each group to ignore the first row and call sum:



        In[46]:
        df.groupby('ID')['Variable'].apply(lambda x: x.iloc[1:].sum())

        Out[46]:
        ID
        1 12
        2 18
        Name: Variable, dtype: int64


        If you want all the columns:



        In[47]:
        df.groupby('ID').apply(lambda x: x.iloc[1:].sum())

        Out[47]:
        ID Period Variable
        ID
        1 1 2 12
        2 2 3 18





        share|improve this answer












        You can slice within each group to ignore the first row and call sum:



        In[46]:
        df.groupby('ID')['Variable'].apply(lambda x: x.iloc[1:].sum())

        Out[46]:
        ID
        1 12
        2 18
        Name: Variable, dtype: int64


        If you want all the columns:



        In[47]:
        df.groupby('ID').apply(lambda x: x.iloc[1:].sum())

        Out[47]:
        ID Period Variable
        ID
        1 1 2 12
        2 2 3 18






        share|improve this answer












        share|improve this answer



        share|improve this answer










        answered Aug 8 at 10:54









        EdChum

        156k28320280




        156k28320280






















            up vote
            2
            down vote













            You can use pd.Series.duplicated to ignore the first occurrence:



            res = df[df['ID'].duplicated()].groupby('ID').sum()

            print(res)

            Period Variable
            ID
            1 2 12
            2 3 18





            share|improve this answer
















            • 2




              Was thinking along the same lines, performance-wise it's probably just as taxing as apply on a groupby+1
              – EdChum
              Aug 8 at 11:02











            • @EdChum, Possibly, yes, I just think the lambda adds extra function calls for a large number of groups. While Boolean indexing is applied as a one-off.
              – jpp
              Aug 8 at 11:03






            • 3




              Probably can also create the groupby once and then group.sum() - group.first() as well...
              – Jon Clements♦
              Aug 8 at 11:04







            • 2




              @JonClements Good idea you should post that as an answer. Although saying that groupby object is just metadata, you'd still be calling calculating sum and then first although I'd expect first to be fast
              – EdChum
              Aug 8 at 11:08











            • @EdChum meh... it's not really ignoring the first row just countering the effect so I imagine it might be prone to fail under some circumstances... Just throwing it out there in case someone wanted to play with it - but I'm not particularly happy it's a decent answer.
              – Jon Clements♦
              Aug 8 at 11:10















            up vote
            2
            down vote













            You can use pd.Series.duplicated to ignore the first occurrence:



            res = df[df['ID'].duplicated()].groupby('ID').sum()

            print(res)

            Period Variable
            ID
            1 2 12
            2 3 18





            share|improve this answer
















            • 2




              Was thinking along the same lines, performance-wise it's probably just as taxing as apply on a groupby+1
              – EdChum
              Aug 8 at 11:02











            • @EdChum, Possibly, yes, I just think the lambda adds extra function calls for a large number of groups. While Boolean indexing is applied as a one-off.
              – jpp
              Aug 8 at 11:03






            • 3




              Probably can also create the groupby once and then group.sum() - group.first() as well...
              – Jon Clements♦
              Aug 8 at 11:04







            • 2




              @JonClements Good idea you should post that as an answer. Although saying that groupby object is just metadata, you'd still be calling calculating sum and then first although I'd expect first to be fast
              – EdChum
              Aug 8 at 11:08











            • @EdChum meh... it's not really ignoring the first row just countering the effect so I imagine it might be prone to fail under some circumstances... Just throwing it out there in case someone wanted to play with it - but I'm not particularly happy it's a decent answer.
              – Jon Clements♦
              Aug 8 at 11:10













            up vote
            2
            down vote










            up vote
            2
            down vote









            You can use pd.Series.duplicated to ignore the first occurrence:



            res = df[df['ID'].duplicated()].groupby('ID').sum()

            print(res)

            Period Variable
            ID
            1 2 12
            2 3 18





            share|improve this answer












            You can use pd.Series.duplicated to ignore the first occurrence:



            res = df[df['ID'].duplicated()].groupby('ID').sum()

            print(res)

            Period Variable
            ID
            1 2 12
            2 3 18






            share|improve this answer












            share|improve this answer



            share|improve this answer










            answered Aug 8 at 11:01









            jpp

            62.3k173578




            62.3k173578







            • 2




              Was thinking along the same lines, performance-wise it's probably just as taxing as apply on a groupby+1
              – EdChum
              Aug 8 at 11:02











            • @EdChum, Possibly, yes, I just think the lambda adds extra function calls for a large number of groups. While Boolean indexing is applied as a one-off.
              – jpp
              Aug 8 at 11:03






            • 3




              Probably can also create the groupby once and then group.sum() - group.first() as well...
              – Jon Clements♦
              Aug 8 at 11:04







            • 2




              @JonClements Good idea you should post that as an answer. Although saying that groupby object is just metadata, you'd still be calling calculating sum and then first although I'd expect first to be fast
              – EdChum
              Aug 8 at 11:08











            • @EdChum meh... it's not really ignoring the first row just countering the effect so I imagine it might be prone to fail under some circumstances... Just throwing it out there in case someone wanted to play with it - but I'm not particularly happy it's a decent answer.
              – Jon Clements♦
              Aug 8 at 11:10













            • 2




              Was thinking along the same lines, performance-wise it's probably just as taxing as apply on a groupby+1
              – EdChum
              Aug 8 at 11:02











            • @EdChum, Possibly, yes, I just think the lambda adds extra function calls for a large number of groups. While Boolean indexing is applied as a one-off.
              – jpp
              Aug 8 at 11:03






            • 3




              Probably can also create the groupby once and then group.sum() - group.first() as well...
              – Jon Clements♦
              Aug 8 at 11:04







            • 2




              @JonClements Good idea you should post that as an answer. Although saying that groupby object is just metadata, you'd still be calling calculating sum and then first although I'd expect first to be fast
              – EdChum
              Aug 8 at 11:08











            • @EdChum meh... it's not really ignoring the first row just countering the effect so I imagine it might be prone to fail under some circumstances... Just throwing it out there in case someone wanted to play with it - but I'm not particularly happy it's a decent answer.
              – Jon Clements♦
              Aug 8 at 11:10








            2




            2




            Was thinking along the same lines, performance-wise it's probably just as taxing as apply on a groupby+1
            – EdChum
            Aug 8 at 11:02





            Was thinking along the same lines, performance-wise it's probably just as taxing as apply on a groupby+1
            – EdChum
            Aug 8 at 11:02













            @EdChum, Possibly, yes, I just think the lambda adds extra function calls for a large number of groups. While Boolean indexing is applied as a one-off.
            – jpp
            Aug 8 at 11:03




            @EdChum, Possibly, yes, I just think the lambda adds extra function calls for a large number of groups. While Boolean indexing is applied as a one-off.
            – jpp
            Aug 8 at 11:03




            3




            3




            Probably can also create the groupby once and then group.sum() - group.first() as well...
            – Jon Clements♦
            Aug 8 at 11:04





            Probably can also create the groupby once and then group.sum() - group.first() as well...
            – Jon Clements♦
            Aug 8 at 11:04





            2




            2




            @JonClements Good idea you should post that as an answer. Although saying that groupby object is just metadata, you'd still be calling calculating sum and then first although I'd expect first to be fast
            – EdChum
            Aug 8 at 11:08





            @JonClements Good idea you should post that as an answer. Although saying that groupby object is just metadata, you'd still be calling calculating sum and then first although I'd expect first to be fast
            – EdChum
            Aug 8 at 11:08













            @EdChum meh... it's not really ignoring the first row just countering the effect so I imagine it might be prone to fail under some circumstances... Just throwing it out there in case someone wanted to play with it - but I'm not particularly happy it's a decent answer.
            – Jon Clements♦
            Aug 8 at 11:10





            @EdChum meh... it's not really ignoring the first row just countering the effect so I imagine it might be prone to fail under some circumstances... Just throwing it out there in case someone wanted to play with it - but I'm not particularly happy it's a decent answer.
            – Jon Clements♦
            Aug 8 at 11:10











            up vote
            1
            down vote













            Select necessary observations and then apply aggregation:



            df[df.ID != df.Period].groupby('ID').sum()


            Output:



             Period Variable
            ID
            1 2 12
            2 3 18





            share|improve this answer
















            • 4




              Wouldn't this fail with 'ID':2, 'Period':1, 'Variable':14? I think it might be exploiting a loophole in the OP's example?
              – roganjosh
              Aug 8 at 10:57










            • @roganjosh Yes, my solution based on the rule that first period equals ID. Maybe you're right.
              – Lev Zakharov
              Aug 8 at 11:01














            up vote
            1
            down vote













            Select necessary observations and then apply aggregation:



            df[df.ID != df.Period].groupby('ID').sum()


            Output:



             Period Variable
            ID
            1 2 12
            2 3 18





            share|improve this answer
















            • 4




              Wouldn't this fail with 'ID':2, 'Period':1, 'Variable':14? I think it might be exploiting a loophole in the OP's example?
              – roganjosh
              Aug 8 at 10:57










            • @roganjosh Yes, my solution based on the rule that first period equals ID. Maybe you're right.
              – Lev Zakharov
              Aug 8 at 11:01












            up vote
            1
            down vote










            up vote
            1
            down vote









            Select necessary observations and then apply aggregation:



            df[df.ID != df.Period].groupby('ID').sum()


            Output:



             Period Variable
            ID
            1 2 12
            2 3 18





            share|improve this answer












            Select necessary observations and then apply aggregation:



            df[df.ID != df.Period].groupby('ID').sum()


            Output:



             Period Variable
            ID
            1 2 12
            2 3 18






            share|improve this answer












            share|improve this answer



            share|improve this answer










            answered Aug 8 at 10:53









            Lev Zakharov

            1,956320




            1,956320







            • 4




              Wouldn't this fail with 'ID':2, 'Period':1, 'Variable':14? I think it might be exploiting a loophole in the OP's example?
              – roganjosh
              Aug 8 at 10:57










            • @roganjosh Yes, my solution based on the rule that first period equals ID. Maybe you're right.
              – Lev Zakharov
              Aug 8 at 11:01












            • 4




              Wouldn't this fail with 'ID':2, 'Period':1, 'Variable':14? I think it might be exploiting a loophole in the OP's example?
              – roganjosh
              Aug 8 at 10:57










            • @roganjosh Yes, my solution based on the rule that first period equals ID. Maybe you're right.
              – Lev Zakharov
              Aug 8 at 11:01







            4




            4




            Wouldn't this fail with 'ID':2, 'Period':1, 'Variable':14? I think it might be exploiting a loophole in the OP's example?
            – roganjosh
            Aug 8 at 10:57




            Wouldn't this fail with 'ID':2, 'Period':1, 'Variable':14? I think it might be exploiting a loophole in the OP's example?
            – roganjosh
            Aug 8 at 10:57












            @roganjosh Yes, my solution based on the rule that first period equals ID. Maybe you're right.
            – Lev Zakharov
            Aug 8 at 11:01




            @roganjosh Yes, my solution based on the rule that first period equals ID. Maybe you're right.
            – Lev Zakharov
            Aug 8 at 11:01

















             

            draft saved


            draft discarded















































             


            draft saved


            draft discarded














            StackExchange.ready(
            function ()
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f51744769%2fsum-a-column-by-id-but-skip-the-first-instance%23new-answer', 'question_page');

            );

            Post as a guest













































































            Comments

            Popular posts from this blog

            What does second last employer means? [closed]

            List of Gilmore Girls characters

            Confectionery