Pandas Dataframe Multiindex Merge

up vote
7
down vote

favorite

I wanted to ask a questions regarding merging multiindex dataframe in pandas, here is a hypothetical scenario:

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
 ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index1 = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
index2 = pd.MultiIndex.from_tuples(tuples, names=['third', 'fourth'])

s1 = pd.DataFrame(np.random.randn(8), index=index1, columns=['s1'])
s2 = pd.DataFrame(np.random.randn(8), index=index2, columns=['s2'])

Then either

s1.merge(s2, how='left', left_index=True, right_index=True)

s1.merge(s2, how='left', left_on=['first', 'second'], right_on=['third', 'fourth'])

will result in error.

Do I have to do reset_index() on either s1/s2 to make this work?

Thanks

edited 1 hour ago

RafaelC

23.5k72447

asked 1 hour ago

learningToCode

383

New contributor

1

This is one of the things that frustrates many new pandas users/coders, there are so many different ways to do the same thing. I like that, because depending on the dataset or why are you doing it in the first place, you can go the easy to code and understand route or you can optimize for quicker run times route.
â€“Â Scott Boston
1 hour ago

add a commentÂ |Â

up vote
7
down vote

favorite

I wanted to ask a questions regarding merging multiindex dataframe in pandas, here is a hypothetical scenario:

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
 ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index1 = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
index2 = pd.MultiIndex.from_tuples(tuples, names=['third', 'fourth'])

s1 = pd.DataFrame(np.random.randn(8), index=index1, columns=['s1'])
s2 = pd.DataFrame(np.random.randn(8), index=index2, columns=['s2'])

Then either

s1.merge(s2, how='left', left_index=True, right_index=True)

s1.merge(s2, how='left', left_on=['first', 'second'], right_on=['third', 'fourth'])

will result in error.

Do I have to do reset_index() on either s1/s2 to make this work?

Thanks

edited 1 hour ago

RafaelC

23.5k72447

asked 1 hour ago

learningToCode

383

New contributor

1

This is one of the things that frustrates many new pandas users/coders, there are so many different ways to do the same thing. I like that, because depending on the dataset or why are you doing it in the first place, you can go the easy to code and understand route or you can optimize for quicker run times route.
â€“Â Scott Boston
1 hour ago

add a commentÂ |Â

up vote
7
down vote

favorite

I wanted to ask a questions regarding merging multiindex dataframe in pandas, here is a hypothetical scenario:

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
 ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index1 = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
index2 = pd.MultiIndex.from_tuples(tuples, names=['third', 'fourth'])

s1 = pd.DataFrame(np.random.randn(8), index=index1, columns=['s1'])
s2 = pd.DataFrame(np.random.randn(8), index=index2, columns=['s2'])

Then either

s1.merge(s2, how='left', left_index=True, right_index=True)

s1.merge(s2, how='left', left_on=['first', 'second'], right_on=['third', 'fourth'])

will result in error.

Do I have to do reset_index() on either s1/s2 to make this work?

Thanks

edited 1 hour ago

RafaelC

23.5k72447

asked 1 hour ago

learningToCode

383

New contributor

I wanted to ask a questions regarding merging multiindex dataframe in pandas, here is a hypothetical scenario:

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
 ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index1 = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
index2 = pd.MultiIndex.from_tuples(tuples, names=['third', 'fourth'])

s1 = pd.DataFrame(np.random.randn(8), index=index1, columns=['s1'])
s2 = pd.DataFrame(np.random.randn(8), index=index2, columns=['s2'])

Then either

s1.merge(s2, how='left', left_index=True, right_index=True)

s1.merge(s2, how='left', left_on=['first', 'second'], right_on=['third', 'fourth'])

will result in error.

Do I have to do reset_index() on either s1/s2 to make this work?

Thanks

python pandas merge multi-index

edited 1 hour ago

RafaelC

23.5k72447

asked 1 hour ago

learningToCode

383

New contributor

edited 1 hour ago

RafaelC

23.5k72447

asked 1 hour ago

learningToCode

383

New contributor

edited 1 hour ago

RafaelC

23.5k72447

edited 1 hour ago

RafaelC

23.5k72447

edited 1 hour ago

RafaelC

23.5k72447

asked 1 hour ago

learningToCode

383

New contributor

asked 1 hour ago

learningToCode

383

asked 1 hour ago

learningToCode

383

New contributor

learningToCode is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.

1

This is one of the things that frustrates many new pandas users/coders, there are so many different ways to do the same thing. I like that, because depending on the dataset or why are you doing it in the first place, you can go the easy to code and understand route or you can optimize for quicker run times route.
â€“Â Scott Boston
1 hour ago

add a commentÂ |Â

1

This is one of the things that frustrates many new pandas users/coders, there are so many different ways to do the same thing. I like that, because depending on the dataset or why are you doing it in the first place, you can go the easy to code and understand route or you can optimize for quicker run times route.
â€“Â Scott Boston
1 hour ago

This is one of the things that frustrates many new pandas users/coders, there are so many different ways to do the same thing. I like that, because depending on the dataset or why are you doing it in the first place, you can go the easy to code and understand route or you can optimize for quicker run times route.
â€“Â Scott Boston
1 hour ago

add a commentÂ |Â

4 Answers
4

active

oldest

votes

up vote
7
down vote

accepted

Seems like you need to use a combination of them.

s1.merge(s2, left_index=True, right_on=['third', 'fourth'])
#s1.merge(s2, right_index=True, left_on=['first', 'second'])

Output:

 s1 s2
bar one 0.765385 -0.365508
 two 1.462860 0.751862
baz one 0.304163 0.761663
 two -0.816658 -1.810634
foo one 1.891434 1.450081
 two 0.571294 1.116862
qux one 1.056516 -0.052927
 two -0.574916 -1.197596

answered 1 hour ago

ALollz

8,18031131

add a commentÂ |Â

up vote
6
down vote

Other than using the indexes names as pointed by @ALollz, you can simply use loc, which will match indexes automatically

s1.loc[:, 's2'] = s2 # Or explicitly, s2['s2']

 s1 s2
first second 
bar one -0.111384 -2.341803
 two -1.226569 1.308240
baz one 1.880835 0.697946
 two -0.008979 -0.247896
foo one 0.103864 -1.039990
 two 0.836931 0.000811
qux one -0.859005 -1.199615
 two -0.321341 -1.098691

A general formula would be

s1.loc[:, s2.columns] = s2

edited 1 hour ago

answered 1 hour ago

RafaelC

23.5k72447

add a commentÂ |Â

up vote
4
down vote

`rename_axis`

You can rename the index levels of one and let join do its thing

s1.join(s2.rename_axis(s1.index.names))

 s1 s2
first second 
bar one -0.696420 -1.040463
 two 0.640891 1.483262
baz one 1.598837 0.097424
 two 0.003994 -0.948419
foo one -0.717401 1.190019
 two -1.201237 -0.000738
qux one 0.559684 -0.505640
 two 1.979700 0.186013

`concat`

pd.concat([s1, s2], axis=1)

 s1 s2
first second 
bar one -0.696420 -1.040463
 two 0.640891 1.483262
baz one 1.598837 0.097424
 two 0.003994 -0.948419
foo one -0.717401 1.190019
 two -1.201237 -0.000738
qux one 0.559684 -0.505640
 two 1.979700 0.186013

answered 1 hour ago

piRSquared

144k19123255

1

Dang, I can't +1 for both of those answers. Oh well, +1/2 and +1/2.
â€“Â Scott Boston
1 hour ago

add a commentÂ |Â

up vote
4
down vote

Assign it by combine_first

s1.combine_first(s2)
Out[19]: 
 s1 s2
first second 
bar one 0.039203 0.795963
 two 0.454782 -0.222806
baz one 3.101120 -0.645474
 two -1.174929 -0.875561
foo one -0.887226 1.078218
 two 1.507546 -1.078564
qux one 0.028048 0.042462
 two 0.826544 -0.375351

# s2.combine_first(s1)

answered 1 hour ago

Wen

85.4k72452

add a commentÂ |Â

Your Answer

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
convertImagesToLinks: true,
noModals: false,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

learningToCode is a new contributor. Be nice, and check out our Code of Conduct.

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f52785579%2fpandas-dataframe-multiindex-merge%23new-answer', 'question_page');

);

Post as a guest

Name

4 Answers
4

active

oldest

votes

4 Answers
4

active

oldest

votes

up vote
7
down vote

accepted

Seems like you need to use a combination of them.

s1.merge(s2, left_index=True, right_on=['third', 'fourth'])
#s1.merge(s2, right_index=True, left_on=['first', 'second'])

Output:

 s1 s2
bar one 0.765385 -0.365508
 two 1.462860 0.751862
baz one 0.304163 0.761663
 two -0.816658 -1.810634
foo one 1.891434 1.450081
 two 0.571294 1.116862
qux one 1.056516 -0.052927
 two -0.574916 -1.197596

answered 1 hour ago

ALollz

8,18031131

add a commentÂ |Â

up vote
7
down vote

accepted

Seems like you need to use a combination of them.

s1.merge(s2, left_index=True, right_on=['third', 'fourth'])
#s1.merge(s2, right_index=True, left_on=['first', 'second'])

Output:

 s1 s2
bar one 0.765385 -0.365508
 two 1.462860 0.751862
baz one 0.304163 0.761663
 two -0.816658 -1.810634
foo one 1.891434 1.450081
 two 0.571294 1.116862
qux one 1.056516 -0.052927
 two -0.574916 -1.197596

answered 1 hour ago

ALollz

8,18031131

add a commentÂ |Â

up vote
7
down vote

accepted

Seems like you need to use a combination of them.

s1.merge(s2, left_index=True, right_on=['third', 'fourth'])
#s1.merge(s2, right_index=True, left_on=['first', 'second'])

Output:

 s1 s2
bar one 0.765385 -0.365508
 two 1.462860 0.751862
baz one 0.304163 0.761663
 two -0.816658 -1.810634
foo one 1.891434 1.450081
 two 0.571294 1.116862
qux one 1.056516 -0.052927
 two -0.574916 -1.197596

answered 1 hour ago

ALollz

8,18031131

Seems like you need to use a combination of them.

s1.merge(s2, left_index=True, right_on=['third', 'fourth'])
#s1.merge(s2, right_index=True, left_on=['first', 'second'])

Output:

 s1 s2
bar one 0.765385 -0.365508
 two 1.462860 0.751862
baz one 0.304163 0.761663
 two -0.816658 -1.810634
foo one 1.891434 1.450081
 two 0.571294 1.116862
qux one 1.056516 -0.052927
 two -0.574916 -1.197596

answered 1 hour ago

ALollz

8,18031131

answered 1 hour ago

ALollz

8,18031131

answered 1 hour ago

ALollz

8,18031131

answered 1 hour ago

ALollz

8,18031131

add a commentÂ |Â

up vote
6
down vote

Other than using the indexes names as pointed by @ALollz, you can simply use loc, which will match indexes automatically

s1.loc[:, 's2'] = s2 # Or explicitly, s2['s2']

 s1 s2
first second 
bar one -0.111384 -2.341803
 two -1.226569 1.308240
baz one 1.880835 0.697946
 two -0.008979 -0.247896
foo one 0.103864 -1.039990
 two 0.836931 0.000811
qux one -0.859005 -1.199615
 two -0.321341 -1.098691

A general formula would be

s1.loc[:, s2.columns] = s2

edited 1 hour ago

answered 1 hour ago

RafaelC

23.5k72447

add a commentÂ |Â

up vote
6
down vote

Other than using the indexes names as pointed by @ALollz, you can simply use loc, which will match indexes automatically

s1.loc[:, 's2'] = s2 # Or explicitly, s2['s2']

 s1 s2
first second 
bar one -0.111384 -2.341803
 two -1.226569 1.308240
baz one 1.880835 0.697946
 two -0.008979 -0.247896
foo one 0.103864 -1.039990
 two 0.836931 0.000811
qux one -0.859005 -1.199615
 two -0.321341 -1.098691

A general formula would be

s1.loc[:, s2.columns] = s2

edited 1 hour ago

answered 1 hour ago

RafaelC

23.5k72447

add a commentÂ |Â

up vote
6
down vote

Other than using the indexes names as pointed by @ALollz, you can simply use loc, which will match indexes automatically

s1.loc[:, 's2'] = s2 # Or explicitly, s2['s2']

 s1 s2
first second 
bar one -0.111384 -2.341803
 two -1.226569 1.308240
baz one 1.880835 0.697946
 two -0.008979 -0.247896
foo one 0.103864 -1.039990
 two 0.836931 0.000811
qux one -0.859005 -1.199615
 two -0.321341 -1.098691

A general formula would be

s1.loc[:, s2.columns] = s2

edited 1 hour ago

answered 1 hour ago

RafaelC

23.5k72447

Other than using the indexes names as pointed by @ALollz, you can simply use loc, which will match indexes automatically

s1.loc[:, 's2'] = s2 # Or explicitly, s2['s2']

 s1 s2
first second 
bar one -0.111384 -2.341803
 two -1.226569 1.308240
baz one 1.880835 0.697946
 two -0.008979 -0.247896
foo one 0.103864 -1.039990
 two 0.836931 0.000811
qux one -0.859005 -1.199615
 two -0.321341 -1.098691

A general formula would be

s1.loc[:, s2.columns] = s2

edited 1 hour ago

answered 1 hour ago

RafaelC

23.5k72447

edited 1 hour ago

answered 1 hour ago

RafaelC

23.5k72447

answered 1 hour ago

RafaelC

23.5k72447

answered 1 hour ago

RafaelC

23.5k72447

add a commentÂ |Â

up vote
4
down vote

`rename_axis`

You can rename the index levels of one and let join do its thing

s1.join(s2.rename_axis(s1.index.names))

 s1 s2
first second 
bar one -0.696420 -1.040463
 two 0.640891 1.483262
baz one 1.598837 0.097424
 two 0.003994 -0.948419
foo one -0.717401 1.190019
 two -1.201237 -0.000738
qux one 0.559684 -0.505640
 two 1.979700 0.186013

`concat`

pd.concat([s1, s2], axis=1)

 s1 s2
first second 
bar one -0.696420 -1.040463
 two 0.640891 1.483262
baz one 1.598837 0.097424
 two 0.003994 -0.948419
foo one -0.717401 1.190019
 two -1.201237 -0.000738
qux one 0.559684 -0.505640
 two 1.979700 0.186013

answered 1 hour ago

piRSquared

144k19123255

1

Dang, I can't +1 for both of those answers. Oh well, +1/2 and +1/2.
â€“Â Scott Boston
1 hour ago

add a commentÂ |Â

up vote
4
down vote

`rename_axis`

You can rename the index levels of one and let join do its thing

s1.join(s2.rename_axis(s1.index.names))

 s1 s2
first second 
bar one -0.696420 -1.040463
 two 0.640891 1.483262
baz one 1.598837 0.097424
 two 0.003994 -0.948419
foo one -0.717401 1.190019
 two -1.201237 -0.000738
qux one 0.559684 -0.505640
 two 1.979700 0.186013

`concat`

pd.concat([s1, s2], axis=1)

 s1 s2
first second 
bar one -0.696420 -1.040463
 two 0.640891 1.483262
baz one 1.598837 0.097424
 two 0.003994 -0.948419
foo one -0.717401 1.190019
 two -1.201237 -0.000738
qux one 0.559684 -0.505640
 two 1.979700 0.186013

answered 1 hour ago

piRSquared

144k19123255

1

Dang, I can't +1 for both of those answers. Oh well, +1/2 and +1/2.
â€“Â Scott Boston
1 hour ago

add a commentÂ |Â

up vote
4
down vote

`rename_axis`

You can rename the index levels of one and let join do its thing

s1.join(s2.rename_axis(s1.index.names))

 s1 s2
first second 
bar one -0.696420 -1.040463
 two 0.640891 1.483262
baz one 1.598837 0.097424
 two 0.003994 -0.948419
foo one -0.717401 1.190019
 two -1.201237 -0.000738
qux one 0.559684 -0.505640
 two 1.979700 0.186013

`concat`

pd.concat([s1, s2], axis=1)

 s1 s2
first second 
bar one -0.696420 -1.040463
 two 0.640891 1.483262
baz one 1.598837 0.097424
 two 0.003994 -0.948419
foo one -0.717401 1.190019
 two -1.201237 -0.000738
qux one 0.559684 -0.505640
 two 1.979700 0.186013

answered 1 hour ago

piRSquared

144k19123255

`rename_axis`

You can rename the index levels of one and let join do its thing

s1.join(s2.rename_axis(s1.index.names))

 s1 s2
first second 
bar one -0.696420 -1.040463
 two 0.640891 1.483262
baz one 1.598837 0.097424
 two 0.003994 -0.948419
foo one -0.717401 1.190019
 two -1.201237 -0.000738
qux one 0.559684 -0.505640
 two 1.979700 0.186013

`concat`

pd.concat([s1, s2], axis=1)

 s1 s2
first second 
bar one -0.696420 -1.040463
 two 0.640891 1.483262
baz one 1.598837 0.097424
 two 0.003994 -0.948419
foo one -0.717401 1.190019
 two -1.201237 -0.000738
qux one 0.559684 -0.505640
 two 1.979700 0.186013

answered 1 hour ago

piRSquared

144k19123255

answered 1 hour ago

piRSquared

144k19123255

answered 1 hour ago

piRSquared

144k19123255

answered 1 hour ago

piRSquared

144k19123255

1

Dang, I can't +1 for both of those answers. Oh well, +1/2 and +1/2.
â€“Â Scott Boston
1 hour ago

add a commentÂ |Â

1

Dang, I can't +1 for both of those answers. Oh well, +1/2 and +1/2.
â€“Â Scott Boston
1 hour ago

Dang, I can't +1 for both of those answers. Oh well, +1/2 and +1/2.
â€“Â Scott Boston
1 hour ago

add a commentÂ |Â

up vote
4
down vote

Assign it by combine_first

s1.combine_first(s2)
Out[19]: 
 s1 s2
first second 
bar one 0.039203 0.795963
 two 0.454782 -0.222806
baz one 3.101120 -0.645474
 two -1.174929 -0.875561
foo one -0.887226 1.078218
 two 1.507546 -1.078564
qux one 0.028048 0.042462
 two 0.826544 -0.375351

# s2.combine_first(s1)

answered 1 hour ago

Wen

85.4k72452

add a commentÂ |Â

up vote
4
down vote

Assign it by combine_first

s1.combine_first(s2)
Out[19]: 
 s1 s2
first second 
bar one 0.039203 0.795963
 two 0.454782 -0.222806
baz one 3.101120 -0.645474
 two -1.174929 -0.875561
foo one -0.887226 1.078218
 two 1.507546 -1.078564
qux one 0.028048 0.042462
 two 0.826544 -0.375351

# s2.combine_first(s1)

answered 1 hour ago

Wen

85.4k72452

add a commentÂ |Â

up vote
4
down vote

Assign it by combine_first

s1.combine_first(s2)
Out[19]: 
 s1 s2
first second 
bar one 0.039203 0.795963
 two 0.454782 -0.222806
baz one 3.101120 -0.645474
 two -1.174929 -0.875561
foo one -0.887226 1.078218
 two 1.507546 -1.078564
qux one 0.028048 0.042462
 two 0.826544 -0.375351

# s2.combine_first(s1)

answered 1 hour ago

Wen

85.4k72452

Assign it by combine_first

s1.combine_first(s2)
Out[19]: 
 s1 s2
first second 
bar one 0.039203 0.795963
 two 0.454782 -0.222806
baz one 3.101120 -0.645474
 two -1.174929 -0.875561
foo one -0.887226 1.078218
 two 1.507546 -1.078564
qux one 0.028048 0.042462
 two 0.826544 -0.375351

# s2.combine_first(s1)

answered 1 hour ago

Wen

85.4k72452

answered 1 hour ago

Wen

85.4k72452

answered 1 hour ago

Wen

85.4k72452

answered 1 hour ago

Wen

85.4k72452

add a commentÂ |Â

learningToCode is a new contributor. Be nice, and check out our Code of Conduct.

draft saved

draft discarded

learningToCode is a new contributor. Be nice, and check out our Code of Conduct.

draft saved

draft discarded

Post as a guest

Name

Search This Blog

Iyfjky