Pandas column reformatting

Is there a quick way to achieve the output below?

Input:

Code Items
123 eq-hk
456 ca-eu; tp-lbe
789 ca-us
321 go-ch
654 ca-au; go-au
987 go-jp
147 co-ml; go-ml
258 ca-us
369 ca-us; ca-my
741 ca-us
852 ca-eu
963 ca-ml; co-ml; go-ml

Output:

Code  eq  ca     go  co  tp
123   hk
456       eu             lbe
789       us
321              ch
654       au     au
987              jp
147              ml  ml
258       us
369       us,my
741       us
852       eu
963       ml     ml  ml

I keep ending up with loops and very ugly code to make this work. Is there an elegant way to achieve this?

Thank you!

edited 2 hours ago by Praveen
asked 2 hours ago by spiff

Tags: python, pandas

3 Answers

Accepted answer (score 2)

import pandas as pd
df = pd.DataFrame([
    ('123', 'eq-hk'),
    ('456', 'ca-eu; tp-lbe'),
    ('789', 'ca-us'),
    ('321', 'go-ch'),
    ('654', 'ca-au; go-au'),
    ('987', 'go-jp'),
    ('147', 'co-ml; go-ml'),
    ('258', 'ca-us'),
    ('369', 'ca-us; ca-my'),
    ('741', 'ca-us'),
    ('852', 'ca-eu'),
    ('963', 'ca-ml; co-ml; go-ml')],
    columns=['Code', 'Items'])


# Get item type list from each row, sum (concatenate) the lists and convert
# to a set to remove duplicates
item_types = set(df['Items'].str.findall(r'(\w+)-').sum())
print(item_types)
# {'ca', 'co', 'eq', 'go', 'tp'} (set order may vary)

# Generate a column for each item type
df1 = pd.DataFrame(df['Code'])
for t in item_types:
    df1[t] = df['Items'].str.findall(r'%s-(\w+)' % t).apply(lambda x: ''.join(x))
print(df1)

# Column order follows the set's iteration order and may differ.
#     Code  ca     tp   eq  co  go
# 0   123               hk
# 1   456   eu     lbe
# 2   789   us
# 3   321                       ch
# 4   654   au                  au
# 5   987                       jp
# 6   147                   ml  ml
# 7   258   us
# 8   369   usmy
# 9   741   us
# 10  852   eu
# 11  963   ml              ml  ml

answered 2 hours ago by Yosi Hammer

Thanks very much. I just changed the last bit to ','.join(x) so that I get the comma in the us,my example.
– spiff
39 mins ago

I have accepted this one only because it best fits my depth of understanding; I am comfortable using this type of regex to achieve it.
– spiff
36 mins ago
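A minimal sketch of the change spiff describes in the comment above, assuming the same Code/Items layout as the accepted answer. The raw-string regexes, the `\b` word boundary, the sorted type list, and the names `out` and `item_types` are choices made for this illustration rather than part of the original answer. Only three rows of the question's data are used to keep it short.

```python
import pandas as pd

# Three rows taken from the question's input, enough to show a repeated type.
df = pd.DataFrame(
    [('456', 'ca-eu; tp-lbe'),
     ('369', 'ca-us; ca-my'),
     ('963', 'ca-ml; co-ml; go-ml')],
    columns=['Code', 'Items'])

# Collect every item type (the part before the hyphen) across all rows.
# Sorting gives a stable column order instead of relying on set ordering.
item_types = sorted({t
                     for found in df['Items'].str.findall(r'(\w+)-')
                     for t in found})

out = pd.DataFrame(df['Code'])
for t in item_types:
    # findall returns a list of values per row; str.join glues them with commas.
    out[t] = df['Items'].str.findall(r'\b%s-(\w+)' % t).str.join(',')

print(out)
```

With the comma separator, the `ca` cell for Code 369 holds both values separated by a comma, and types that do not occur in a row come out as empty strings.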

Answer (score 4)

This is a little bit complicated:

(df.set_index('Code')
   .Items.str.split('; ', expand=True)
   .stack()
   .str.split('-', expand=True)
   .set_index(0, append=True)[1]
   .unstack()
   .fillna('')
   .groupby(level=0)   # .sum(level=0) was removed in pandas 2.0
   .sum())

0     ca    co  eq  go  tp
Code
123             hk
147         ml      ml
258   us
321                 ch
369   usmy
456   eu                lbe
654   au            au
741   us
789   us
852   eu
963   ml    ml      ml
987                 jp

How it works: str.split on '; ' unnests the Items column, stack turns the pieces into one row per item, and a second str.split on '-' separates the type from the value. The type is then moved into the index, unstack turns the types into columns, and summing per Code concatenates the strings.
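For comparison, the same split-and-reshape idea can be sketched with explode followed by a grouped join. This is an alternative under stated assumptions, not the answerer's method: it assumes pandas 1.1 or later (for `ignore_index` in `explode`), and the names `long_df`, `parts`, and `wide` are illustrative.

```python
import pandas as pd

# Three rows from the question's input, including a repeated type (ca) in one row.
df = pd.DataFrame(
    [('123', 'eq-hk'),
     ('456', 'ca-eu; tp-lbe'),
     ('369', 'ca-us; ca-my')],
    columns=['Code', 'Items'])

# One row per item: split the Items string into a list, then explode the list.
long_df = (df.assign(Items=df['Items'].str.split('; '))
             .explode('Items', ignore_index=True))

# Separate each item into its type and its value.
parts = long_df['Items'].str.split('-', n=1, expand=True)
long_df = long_df.assign(type=parts[0], value=parts[1])

# Join the values per (Code, type), then spread the types into columns.
wide = (long_df.groupby(['Code', 'type'])['value']
               .agg(','.join)
               .unstack(fill_value=''))

print(wide)
```

Joining inside the groupby is what puts the comma between repeated values, so Code 369 ends up with 'us,my' under `ca` rather than the concatenated 'usmy'.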

edited 1 hour ago by coldspeed
answered 2 hours ago by W-B

Hey Wen, can you please format your answer better? Answers which require horizontal scrolling are difficult to read.
– coldspeed
1 hour ago

@coldspeed Ah, that was caused by the comment :-) Fixed.
– W-B
1 hour ago

I meant something like my edit. If a line exceeds 80 characters, try to put each method call on its own line.
– coldspeed
1 hour ago

Answer (score 3)

List comprehensions work better (read: much faster) for string problems like this which require multiple levels of splitting.

df2 = pd.DataFrame([
 dict(y.split('-') for y in x.split('; ')) 
 for x in df.Items]).fillna('')
df2.insert(0, 'Code', df.Code)

print(df2)
    Code  ca  co  eq  go  tp
0   123           hk
1   456   eu              lbe
2   789   us
3   321               ch
4   654   au          au
5   987               jp
6   147       ml      ml
7   258   us
8   369   my                    # Should be "us,my"... see below.
9   741   us
10  852   eu
11  963   ml  ml      ml

This does not handle the situation where multiple items with the same key can be present in a row. For that, a slightly more involved solution is needed.

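from itertools import chain

v = [x.split('; ') for x in df.Items]
X = pd.Series(df.Code.values.repeat([len(x) for x in v]))
Y = pd.DataFrame([x.split('-') for x in chain.from_iterable(v)])

# Columns after concat: 0 = Code, 1 = item type, 2 = value
df2 = pd.concat([X, Y], axis=1, ignore_index=True)
# The snippet as posted indexes on a column 3 that is never created.
# Assumption: it is a running counter within each (Code, type) pair,
# which keeps the index unique when a type repeats within one row.
df2[3] = df2.groupby([0, 1]).cumcount()

(df2.set_index([0, 1, 3])[2]
    .unstack(1)
    .fillna('')
    .groupby(level=0)
    .agg(lambda x: ','.join(x).strip(',')))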

1     ca    co  eq  go  tp
0
123             hk
147         ml      ml
258   us
321                 ch
369   us,my
456   eu                lbe
654   au            au
741   us
789   us
852   eu
963   ml    ml      ml
987                 jp
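If the goal is to stay with plain Python loops and comprehensions while still handling repeated keys, one possible sketch collects the values per key in a defaultdict before building the frame. The helper name `parse_items` is hypothetical and not part of the answer above, and only three rows of the question's data are used.

```python
from collections import defaultdict

import pandas as pd


def parse_items(items):
    """Turn 'ca-us; ca-my' into {'ca': 'us,my'}, keeping repeated keys."""
    grouped = defaultdict(list)
    for pair in items.split('; '):
        key, value = pair.split('-', 1)
        grouped[key].append(value)
    # Join the values collected for each key with commas.
    return {key: ','.join(values) for key, values in grouped.items()}


# Three rows from the question's input, including a repeated type (ca).
df = pd.DataFrame(
    [('123', 'eq-hk'),
     ('456', 'ca-eu; tp-lbe'),
     ('369', 'ca-us; ca-my')],
    columns=['Code', 'Items'])

# One dict per row; keys missing from a row become NaN and are then blanked.
df2 = pd.DataFrame([parse_items(x) for x in df['Items']]).fillna('')
df2.insert(0, 'Code', df['Code'])

print(df2)
```

Unlike a plain dict built from the pairs, the list per key means a later 'ca-my' is appended to the earlier 'ca-us' instead of overwriting it.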

edited 1 hour ago; answered 2 hours ago by coldspeed

Thanks very much for your post!
– spiff
38 mins ago
