Pandas Dataframe Multiindex Merge
Score: 7
I have a question about merging MultiIndex DataFrames in pandas. Here is a hypothetical scenario:
import numpy as np
import pandas as pd
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
# Same tuples in both indexes, but different level names
index1 = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
index2 = pd.MultiIndex.from_tuples(tuples, names=['third', 'fourth'])
s1 = pd.DataFrame(np.random.randn(8), index=index1, columns=['s1'])
s2 = pd.DataFrame(np.random.randn(8), index=index2, columns=['s2'])
Then either
s1.merge(s2, how='left', left_index=True, right_index=True)
or
s1.merge(s2, how='left', left_on=['first', 'second'], right_on=['third', 'fourth'])
will result in an error.
Do I have to call reset_index() on s1 or s2 to make this work?
Thanks
python pandas merge multi-index
asked 1 hour ago by learningToCode
edited 1 hour ago by RafaelC
Comment (score 1): This is one of the things that frustrates many new pandas users: there are so many different ways to do the same thing. I like that, because depending on the dataset, or on why you are doing it in the first place, you can take the route that is easy to code and understand, or the route that optimizes for faster run times.
- Scott Boston, 1 hour ago
4 Answers
Score: 7 (accepted)
It seems you need to combine the two: join on the index of one frame and on the named index levels of the other.
s1.merge(s2, left_index=True, right_on=['third', 'fourth'])
# Alternatively:
# s1.merge(s2, right_index=True, left_on=['first', 'second'])
Output:
                s1        s2
bar one   0.765385 -0.365508
    two   1.462860  0.751862
baz one   0.304163  0.761663
    two  -0.816658 -1.810634
foo one   1.891434  1.450081
    two   0.571294  1.116862
qux one   1.056516 -0.052927
    two  -0.574916 -1.197596
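For comparison with the one-liner above, the reset_index() route that the question asks about can be sketched as follows. This is only an illustration, not part of the original answer: the frames are shortened to four rows and use fixed values (an assumption made so the result is easy to check), and the name merged is made up for the example.

```python
import pandas as pd

# Small, deterministic stand-ins for the question's s1 and s2 (assumed values).
tuples = [('bar', 'one'), ('bar', 'two'), ('baz', 'one'), ('baz', 'two')]
index1 = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
index2 = pd.MultiIndex.from_tuples(tuples, names=['third', 'fourth'])
s1 = pd.DataFrame({'s1': [1.0, 2.0, 3.0, 4.0]}, index=index1)
s2 = pd.DataFrame({'s2': [10.0, 20.0, 30.0, 40.0]}, index=index2)

# Turn the index levels into ordinary columns, then merge on those columns.
merged = (
    s1.reset_index()
      .merge(s2.reset_index(), how='left',
             left_on=['first', 'second'], right_on=['third', 'fourth'])
      .drop(columns=['third', 'fourth'])  # redundant copies of the join keys
      .set_index(['first', 'second'])     # restore the MultiIndex
)
print(merged)
```

So reset_index() is one way to do it, but it takes several extra steps to get back to a MultiIndex, which is why the shorter forms in the answers are usually preferable.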
answered 1 hour ago by ALollz
Score: 6
Besides using the index level names, as pointed out by @ALollz, you can simply assign with loc, which aligns on the index automatically. Note that this modifies s1 in place.
s1.loc[:, 's2'] = s2  # or, explicitly, s2['s2']
                    s1        s2
first second
bar   one    -0.111384 -2.341803
      two    -1.226569  1.308240
baz   one     1.880835  0.697946
      two    -0.008979 -0.247896
foo   one     0.103864 -1.039990
      two     0.836931  0.000811
qux   one    -0.859005 -1.199615
      two    -0.321341 -1.098691
A general formula would be:
s1.loc[:, s2.columns] = s2
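To make the "aligns on the index" point concrete, here is a minimal sketch. It uses plain column assignment on a copy rather than the loc form above, and the four-row frames, the fixed values and the names s2_reversed and out are assumptions introduced for the illustration.

```python
import pandas as pd

# Small, deterministic stand-ins for the question's s1 and s2 (assumed values).
tuples = [('bar', 'one'), ('bar', 'two'), ('baz', 'one'), ('baz', 'two')]
index1 = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
index2 = pd.MultiIndex.from_tuples(tuples, names=['third', 'fourth'])
s1 = pd.DataFrame({'s1': [1.0, 2.0, 3.0, 4.0]}, index=index1)
s2 = pd.DataFrame({'s2': [10.0, 20.0, 30.0, 40.0]}, index=index2)

# Reverse s2 so that its row order no longer matches s1.
s2_reversed = s2.iloc[::-1]

# Work on a copy so that s1 itself is left untouched.
out = s1.copy()
# Assigning a Series aligns it to out's index by label, not by position,
# and the differing level names do not get in the way.
out['s2'] = s2_reversed['s2']
print(out)
```

Even though s2_reversed is in the opposite row order, each value lands next to the matching (first, second) label.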
answered 1 hour ago by RafaelC (edited 1 hour ago)
Score: 4
rename_axis
You can rename the index levels of one frame and let join do its thing:
s1.join(s2.rename_axis(s1.index.names))
                    s1        s2
first second
bar   one    -0.696420 -1.040463
      two     0.640891  1.483262
baz   one     1.598837  0.097424
      two     0.003994 -0.948419
foo   one    -0.717401  1.190019
      two    -1.201237 -0.000738
qux   one     0.559684 -0.505640
      two     1.979700  0.186013
concat
Alternatively, concatenate along the columns, which aligns the rows on the index:
pd.concat([s1, s2], axis=1)
                    s1        s2
first second
bar   one    -0.696420 -1.040463
      two     0.640891  1.483262
baz   one     1.598837  0.097424
      two     0.003994 -0.948419
foo   one    -0.717401  1.190019
      two    -1.201237 -0.000738
qux   one     0.559684 -0.505640
      two     1.979700  0.186013
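A small self-contained sketch of the rename_axis route, assuming shortened frames with fixed values (not the random data from the question) and a made-up result name, joined:

```python
import pandas as pd

# Small, deterministic stand-ins for the question's s1 and s2 (assumed values).
tuples = [('bar', 'one'), ('bar', 'two'), ('baz', 'one'), ('baz', 'two')]
index1 = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
index2 = pd.MultiIndex.from_tuples(tuples, names=['third', 'fourth'])
s1 = pd.DataFrame({'s1': [1.0, 2.0, 3.0, 4.0]}, index=index1)
s2 = pd.DataFrame({'s2': [10.0, 20.0, 30.0, 40.0]}, index=index2)

# rename_axis returns a new frame whose index levels carry s1's names;
# s2 itself keeps its original level names.
s2_renamed = s2.rename_axis(s1.index.names)

# With matching level names, join can align the two MultiIndexes.
joined = s1.join(s2_renamed)
print(joined)
```

Because rename_axis does not modify s2, the renaming can be done inline, as in the one-liner above, without side effects on the original frame.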
answered 1 hour ago by piRSquared
Comment (score 1): Dang, I can't +1 for both of those answers. Oh well, +1/2 and +1/2.
- Scott Boston, 1 hour ago
Score: 4
You can also combine them with combine_first:
s1.combine_first(s2)
Output:
                    s1        s2
first second
bar   one     0.039203  0.795963
      two     0.454782 -0.222806
baz   one     3.101120 -0.645474
      two    -1.174929 -0.875561
foo   one    -0.887226  1.078218
      two     1.507546 -1.078564
qux   one     0.028048  0.042462
      two     0.826544 -0.375351
# Or: s2.combine_first(s1)
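One thing worth knowing about combine_first is that it is a fill operation rather than a join: the calling frame's non-null values win, and gaps are filled from the other frame. The sketch below illustrates this under assumptions that are not in the original answer: the frames have three rows with fixed values, s1 contains one NaN, and a hypothetical second frame named other also carries an 's1' column.

```python
import numpy as np
import pandas as pd

tuples = [('bar', 'one'), ('bar', 'two'), ('baz', 'one')]
index1 = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
index2 = pd.MultiIndex.from_tuples(tuples, names=['third', 'fourth'])

# s1 has a missing value in its second row (assumed for the illustration).
s1 = pd.DataFrame({'s1': [1.0, np.nan, 3.0]}, index=index1)
# Hypothetical second frame that overlaps s1 on the 's1' column.
other = pd.DataFrame({'s1': [100.0, 200.0, 300.0],
                      's2': [10.0, 20.0, 30.0]}, index=index2)

# Non-null values from s1 are kept; only the NaN is filled from other,
# and the column that s1 lacks ('s2') is taken from other.
combined = s1.combine_first(other)
print(combined)
```

In the question's setup the two frames share no columns, so nothing is overwritten and the result looks like a plain side-by-side merge.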
answered 1 hour ago by Wen