Pandas Dataframe Multiindex Merge

I have a question about merging DataFrames that have a MultiIndex in pandas. Here is a hypothetical scenario:



import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index1 = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
index2 = pd.MultiIndex.from_tuples(tuples, names=['third', 'fourth'])

s1 = pd.DataFrame(np.random.randn(8), index=index1, columns=['s1'])
s2 = pd.DataFrame(np.random.randn(8), index=index2, columns=['s2'])


Then either



s1.merge(s2, how='left', left_index=True, right_index=True)


or



s1.merge(s2, how='left', left_on=['first', 'second'], right_on=['third', 'fourth'])


will result in an error.



Do I have to call reset_index() on s1 or s2 to make this work?



Thanks
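For reference, the reset_index() route mentioned in the question can be sketched roughly as follows. This is only an illustration, not a recommendation: it uses fixed numbers instead of np.random.randn so the result is easy to check, and the name merged is made up for the example.

```python
import numpy as np
import pandas as pd

# Same index values as in the question; fixed numbers replace the random
# data (an assumption made purely so the result can be verified by eye).
tuples = [(a, b) for a in ['bar', 'baz', 'foo', 'qux'] for b in ['one', 'two']]
index1 = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
index2 = pd.MultiIndex.from_tuples(tuples, names=['third', 'fourth'])
s1 = pd.DataFrame({'s1': np.arange(8.0)}, index=index1)
s2 = pd.DataFrame({'s2': np.arange(8.0) * 10}, index=index2)

# Turn the index levels into ordinary columns, then merge on those columns.
merged = s1.reset_index().merge(
    s2.reset_index(),
    how='left',
    left_on=['first', 'second'],
    right_on=['third', 'fourth'],
)

# Drop the duplicated key columns and restore the original MultiIndex.
merged = merged.drop(columns=['third', 'fourth']).set_index(['first', 'second'])
print(merged)
```

This works, but it needs two extra steps to get back to the original index, which is what the answers below avoid.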







python pandas merge multi-index






edited 1 hour ago by RafaelC

asked 1 hour ago by learningToCode

This is one of the things that frustrates many new pandas users/coders: there are so many different ways to do the same thing. I like that, because depending on the dataset, or on why you are doing it in the first place, you can take the route that is easy to code and understand, or the route that is optimized for faster run times.
– Scott Boston
1 hour ago




4 Answers
It seems you need to combine the two approaches: use the index on one side and the index level names on the other.



s1.merge(s2, left_index=True, right_on=['third', 'fourth'])
# s1.merge(s2, right_index=True, left_on=['first', 'second'])


Output:



               s1        s2
bar one  0.765385 -0.365508
    two  1.462860  0.751862
baz one  0.304163  0.761663
    two -0.816658 -1.810634
foo one  1.891434  1.450081
    two  0.571294  1.116862
qux one  1.056516 -0.052927
    two -0.574916 -1.197596
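
A self-contained sketch of this mixed merge, assuming the frames from the question but with fixed numbers in place of random data so the row pairing can be checked. The variable name merged is only illustrative, and the names carried by the resulting index may depend on the pandas version, so inspect merged.index.names rather than relying on them.

```python
import numpy as np
import pandas as pd

# Frames shaped like those in the question; the fixed values are an
# assumption made so that correct row pairing is easy to verify.
tuples = [(a, b) for a in ['bar', 'baz', 'foo', 'qux'] for b in ['one', 'two']]
index1 = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
index2 = pd.MultiIndex.from_tuples(tuples, names=['third', 'fourth'])
s1 = pd.DataFrame({'s1': np.arange(8.0)}, index=index1)
s2 = pd.DataFrame({'s2': np.arange(8.0) * 10}, index=index2)

# Use the whole index on the left and the index level names on the right.
merged = s1.merge(s2, left_index=True, right_on=['third', 'fourth'])
print(merged)

# Every s1 value should sit next to its matching s2 value (s2 = 10 * s1).
print((merged['s2'] == merged['s1'] * 10).all())
```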






answered 1 hour ago by ALollz (accepted answer)

Besides using the index names as pointed out by @ALollz, you can simply use loc, which aligns on the index automatically:



s1.loc[:, 's2'] = s2  # Or explicitly, s2['s2']

                    s1        s2
first second
bar   one    -0.111384 -2.341803
      two    -1.226569  1.308240
baz   one     1.880835  0.697946
      two    -0.008979 -0.247896
foo   one     0.103864 -1.039990
      two     0.836931  0.000811
qux   one    -0.859005 -1.199615
      two    -0.321341 -1.098691


A more general form would be:



s1.loc[:, s2.columns] = s2
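
To illustrate the index alignment this answer relies on, here is a minimal sketch. It swaps in plain bracket assignment of the explicit Series s2['s2'] rather than the loc form shown above, works on a copy named out (a made-up name), and uses fixed numbers instead of random data. s2 is reversed first to show that rows are matched by index value, not by position.

```python
import numpy as np
import pandas as pd

# Frames shaped like those in the question, with fixed values (assumed here
# so the alignment can be checked).
tuples = [(a, b) for a in ['bar', 'baz', 'foo', 'qux'] for b in ['one', 'two']]
index1 = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
index2 = pd.MultiIndex.from_tuples(tuples, names=['third', 'fourth'])
s1 = pd.DataFrame({'s1': np.arange(8.0)}, index=index1)
s2 = pd.DataFrame({'s2': np.arange(8.0) * 10}, index=index2)

# Work on a copy so s1 itself is left unchanged.
out = s1.copy()

# Reverse the rows of s2: assignment still lines values up by index labels,
# even though the level names of the two indexes differ.
out['s2'] = s2['s2'].iloc[::-1]
print(out)
```

Note that this modifies the target frame in place (here the copy), whereas merge and join return a new frame.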






edited 1 hour ago
answered 1 hour ago by RafaelC

rename_axis



You can rename the index levels of one frame and let join do its thing:



s1.join(s2.rename_axis(s1.index.names))

                    s1        s2
first second
bar   one    -0.696420 -1.040463
      two     0.640891  1.483262
baz   one     1.598837  0.097424
      two     0.003994 -0.948419
foo   one    -0.717401  1.190019
      two    -1.201237 -0.000738
qux   one     0.559684 -0.505640
      two     1.979700  0.186013



concat



pd.concat([s1, s2], axis=1)

                    s1        s2
first second
bar   one    -0.696420 -1.040463
      two     0.640891  1.483262
baz   one     1.598837  0.097424
      two     0.003994 -0.948419
foo   one    -0.717401  1.190019
      two    -1.201237 -0.000738
qux   one     0.559684 -0.505640
      two     1.979700  0.186013
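
Both variants can be tried with the sketch below, which assumes the frames from the question but with fixed numbers instead of random data. The names joined and side_by_side are only illustrative.

```python
import numpy as np
import pandas as pd

# Frames shaped like those in the question, with fixed values (an assumption
# made so the result is easy to verify).
tuples = [(a, b) for a in ['bar', 'baz', 'foo', 'qux'] for b in ['one', 'two']]
index1 = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
index2 = pd.MultiIndex.from_tuples(tuples, names=['third', 'fourth'])
s1 = pd.DataFrame({'s1': np.arange(8.0)}, index=index1)
s2 = pd.DataFrame({'s2': np.arange(8.0) * 10}, index=index2)

# Give s2 the same level names as s1, then join on the (now matching) index.
joined = s1.join(s2.rename_axis(s1.index.names))
print(joined)

# Alternatively, place the two frames side by side; rows are matched by
# index value.
side_by_side = pd.concat([s1, s2], axis=1)
print(side_by_side)
```

rename_axis returns a new object, so s2 keeps its original level names after the join.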






answered 1 hour ago by piRSquared

Dang, I can't +1 for both of those answers. Oh well, +1/2 and +1/2.
– Scott Boston
1 hour ago




You can also combine them with combine_first:



s1.combine_first(s2)
Out[19]:
                    s1        s2
first second
bar   one     0.039203  0.795963
      two     0.454782 -0.222806
baz   one     3.101120 -0.645474
      two    -1.174929 -0.875561
foo   one    -0.887226  1.078218
      two     1.507546 -1.078564
qux   one     0.028048  0.042462
      two     0.826544 -0.375351

# s2.combine_first(s1)
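
A minimal sketch of the combine_first approach, assuming the frames from the question with fixed numbers in place of random data (the name combined is illustrative). combine_first is meant for filling missing values in one frame from another; here the two frames share no columns, so the result is simply the union of their columns.

```python
import numpy as np
import pandas as pd

# Frames shaped like those in the question, with fixed values (an assumption
# made so the result can be checked).
tuples = [(a, b) for a in ['bar', 'baz', 'foo', 'qux'] for b in ['one', 'two']]
index1 = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
index2 = pd.MultiIndex.from_tuples(tuples, names=['third', 'fourth'])
s1 = pd.DataFrame({'s1': np.arange(8.0)}, index=index1)
s2 = pd.DataFrame({'s2': np.arange(8.0) * 10}, index=index2)

# The frames have disjoint columns and identical index values, so nothing
# gets overwritten: s1 contributes column 's1' and s2 contributes 's2'.
combined = s1.combine_first(s2)
print(combined)
```

If the two frames did share a column, values from s1 would take priority and s2 would only fill in the gaps, so this is not a general replacement for merge.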






answered 1 hour ago by Wen
