Only copy one key-column into merged DataFrame

Consider the following DataFrames:

import pandas as pd

df1 = pd.DataFrame({'a': [0, 1, 2, 3], 'b': list('abcd')})
df2 = pd.DataFrame({'c': list('abcd'), 'd': 'Alex'})

In this instance, df1['b'] and df2['c'] are the key columns. So when merging:

df1.merge(df2, left_on='b', right_on='c')
   a  b  c     d
0  0  a  a  Alex
1  1  b  b  Alex
2  2  c  c  Alex
3  3  d  d  Alex

I end up with both key columns in the resultant DataFrame when I only need one. I've been using:

df1.merge(df2, left_on='b', right_on='c').drop('c', axis='columns')

Is there a way to only keep one key column?

asked 55 mins ago

Alex

434318

python pandas merge

5 Answers

One way is to set b and c as the index of your frames respectively, and use join followed by reset_index:

df1.set_index('b').join(df2.set_index('c')).reset_index()

   b  a     d
0  a  0  Alex
1  b  1  Alex
2  c  2  Alex
3  d  3  Alex

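This will be faster than the merge/drop method on large DataFrames, mostly because drop is slow. @Bill's method is faster than my suggestion, and the methods from @W-B and @piRSquared are far faster than the rest. Note that the benchmark below repeats every key 1000 times, so the merge- and join-based methods build a many-to-many result of 4,000,000 rows, whereas the assign and map methods return only 4,000 rows; the timings are therefore not a like-for-like comparison: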

import timeit

import pandas as pd

# Repeat both frames 1000 times, so every key occurs 1000 times in each
df1 = pd.concat(df1 for _ in range(1000))
df2 = pd.concat(df2 for _ in range(1000))

def index_method(df1=df1, df2=df2):
    return df1.set_index('b').join(df2.set_index('c')).reset_index()

def merge_method(df1=df1, df2=df2):
    return df1.merge(df2, left_on='b', right_on='c').drop('c', axis='columns')

def rename_method(df1=df1, df2=df2):
    return df1.rename({'b': 'c'}, axis=1).merge(df2)

def index_method2(df1=df1, df2=df2):
    return df1.join(df2.set_index('c'), on='b')

def assign_method(df1=df1, df2=df2):
    return df1.set_index('b').assign(c=df2.set_index('c').d).reset_index()

def map_method(df1=df1, df2=df2):
    return df1.assign(d=df1.b.map(dict(df2.values)))

>>> timeit.timeit(index_method, number=10) / 10
0.7853091600998596
>>> timeit.timeit(merge_method, number=10) / 10
1.1696729859002517
>>> timeit.timeit(rename_method, number=10) / 10
0.4291436871004407
>>> timeit.timeit(index_method2, number=10) / 10
0.5037374985004135
>>> timeit.timeit(assign_method, number=10) / 10
0.0038641377999738325
>>> timeit.timeit(map_method, number=10) / 10
0.006620216699957382
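
Before reading much into timings like these, it may help to confirm that the approaches agree on the original four-row frames. The following is only a minimal sketch, assuming the df1 and df2 from the question; the helper name as_rows is hypothetical and not part of any answer.

```python
import pandas as pd

df1 = pd.DataFrame({'a': [0, 1, 2, 3], 'b': list('abcd')})
df2 = pd.DataFrame({'c': list('abcd'), 'd': 'Alex'})

def as_rows(frame):
    """Hypothetical helper: rows of (a, b, d) as sorted plain tuples."""
    return sorted(frame[['a', 'b', 'd']].itertuples(index=False, name=None))

# Three of the approaches discussed here, each keeping a single key column 'b'.
candidates = {
    'merge_drop': df1.merge(df2, left_on='b', right_on='c').drop(columns='c'),
    'join_on': df1.join(df2.set_index('c'), on='b'),
    'map': df1.assign(d=df1['b'].map(dict(df2.values))),
}

reference = as_rows(candidates['merge_drop'])
agree = {name: as_rows(frame) == reference for name, frame in candidates.items()}
print(agree)
```

Comparing plain tuples rather than whole frames sidesteps differences in index labels and column dtypes, which are not the point of the check.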

edited 29 mins ago

answered 53 mins ago

sacul

25.8k41638

df1.join(df2.set_index('c'), on='b')
- piRSquared, 38 mins ago

Would you like to test the speed of my method?
- W-B, 33 mins ago

@W-B, I just did, it's far faster!
- sacul, 31 mins ago

@sacul thank you :-)
- W-B, 29 mins ago

Another way is to give b and c the same name, at least for the merge operation:

df1.rename({'b': 'c'}, axis=1).merge(df2)
   a  c     d
0  0  a  Alex
1  1  b  Alex
2  2  c  Alex
3  3  d  Alex
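
If the left-hand name b should survive instead of c, the same idea can be applied to the right frame. This is a sketch under the assumption that renaming df2's key for the duration of the merge is acceptable; it is a variation on the answer above, not something the answer itself proposes.

```python
import pandas as pd

df1 = pd.DataFrame({'a': [0, 1, 2, 3], 'b': list('abcd')})
df2 = pd.DataFrame({'c': list('abcd'), 'd': 'Alex'})

# Rename the right-hand key to match the left one, then merge on the shared name.
# rename() returns a new frame, so df2 itself is left unchanged.
merged = df1.merge(df2.rename(columns={'c': 'b'}), on='b')
print(merged.columns.tolist())  # ['a', 'b', 'd']
```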

answered 47 mins ago

Bill

1,97411827

Or call set_index once and merge using the left_index=True and right_on parameters:

df1.set_index('b').merge(df2, left_index=True, right_on='c')

Output:

   a  c     d
0  0  a  Alex
1  1  b  Alex
2  2  c  Alex
3  3  d  Alex

answered 41 mins ago

Scott Boston

48.2k52451

After set_index you can assign the values directly:

df1.set_index('b').assign(c=df2.set_index('c').d).reset_index()

   b  a     c
0  a  0  Alex
1  b  1  Alex
2  c  2  Alex
3  d  3  Alex
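
Because assign aligns the incoming Series on the index, keys that are missing on the right simply come through as NaN, much like a left join. The sketch below illustrates this with a hypothetical frame named partial that lacks the key 'd'; the new column is called d here rather than c purely for readability.

```python
import pandas as pd

df1 = pd.DataFrame({'a': [0, 1, 2, 3], 'b': list('abcd')})
# Hypothetical lookup frame (not from the question) that lacks the key 'd'.
partial = pd.DataFrame({'c': list('abc'), 'd': 'Alex'})

# The Series is reindexed to df1's keys; the unmatched key gets a missing value.
result = df1.set_index('b').assign(d=partial.set_index('c')['d']).reset_index()
print(result['d'].isna().tolist())  # [False, False, False, True]
```

Note that this alignment relies on index labels, so it is worth checking that the keys are unique before using it on real data.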

answered 34 mins ago

W-B

90.5k72754

`map`

An obnoxious (not recommended) method that I felt compelled to post because I had accidentally posted a duplicate of someone else's answer.

df1.assign(d=df1.b.map(dict(df2.values)))

   a  b     d
0  0  a  Alex
1  1  b  Alex
2  2  c  Alex
3  3  d  Alex

edited 33 mins ago

answered 40 mins ago

piRSquared

147k21130264

Wait, why not use map in this case of bringing only one column?
- ALollz, 18 mins ago

Because it isn't generalized. It's very specific to this toy problem. If we truly were bringing over one column, then I'd agree.
- piRSquared, 17 mins ago
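
The generality point in these comments can be made concrete. In the sketch below, wide is a hypothetical frame with an extra column e that does not appear in the question: dict(wide.values) no longer works once there are more than two columns, a Series lookup still handles a single column, and a join on the key covers several columns at once.

```python
import pandas as pd

df1 = pd.DataFrame({'a': [0, 1, 2, 3], 'b': list('abcd')})
# Hypothetical wider frame: the extra column 'e' is not in the original question.
wide = pd.DataFrame({'c': list('abcd'), 'd': 'Alex', 'e': [10, 20, 30, 40]})

# dict(wide.values) would raise here because each row has three items, not two.
lookup = wide.set_index('c')['d']           # Series mapping key -> value of 'd'
one_col = df1.assign(d=df1['b'].map(lookup))

# For several columns, joining on the key brings them all over with one key column.
many_cols = df1.join(wide.set_index('c'), on='b')
print(many_cols.columns.tolist())  # ['a', 'b', 'd', 'e']
```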
