Pandas column reformatting

Is there a quick way to achieve the output below?



Input:



Code Items
123 eq-hk
456 ca-eu; tp-lbe
789 ca-us
321 go-ch
654 ca-au; go-au
987 go-jp
147 co-ml; go-ml
258 ca-us
369 ca-us; ca-my
741 ca-us
852 ca-eu
963 ca-ml; co-ml; go-ml


Output:



Code  eq  ca     go  co  tp
123   hk
456       eu             lbe
789       us
321              ch
654       au     au
987              jp
147              ml  ml
258       us
369       us,my
741       us
852       eu
963       ml     ml  ml


I keep ending up with loops and very ugly code to make this work. Is there an elegant way to achieve this?



Thank you!










Tags: python, pandas






asked 2 hours ago by spiff, edited 2 hours ago by Praveen






















          3 Answers



          accepted










import pandas as pd

df = pd.DataFrame([
    ('123', 'eq-hk'),
    ('456', 'ca-eu; tp-lbe'),
    ('789', 'ca-us'),
    ('321', 'go-ch'),
    ('654', 'ca-au; go-au'),
    ('987', 'go-jp'),
    ('147', 'co-ml; go-ml'),
    ('258', 'ca-us'),
    ('369', 'ca-us; ca-my'),
    ('741', 'ca-us'),
    ('852', 'ca-eu'),
    ('963', 'ca-ml; co-ml; go-ml')],
    columns=['Code', 'Items'])

# Get the item type list from each row, sum (concatenate) the lists, and
# convert to a set to remove duplicates
item_types = set(df['Items'].str.findall(r'(\w+)-').sum())
print(item_types)
# {'ca', 'co', 'eq', 'go', 'tp'}  (set order may vary)

# Generate a column for each item type
df1 = pd.DataFrame(df['Code'])
for t in item_types:
    df1[t] = df['Items'].str.findall(r'%s-(\w+)' % t).apply(lambda x: ''.join(x))
print(df1)

#     Code  ca    tp   eq  co  go
# 0   123              hk
# 1   456   eu    lbe
# 2   789   us
# 3   321                      ch
# 4   654   au                 au
# 5   987                      jp
# 6   147                  ml  ml
# 7   258   us
# 8   369   usmy
# 9   741   us
# 10  852   eu
# 11  963   ml             ml  ml





• Thanks very much. I just changed the last bit to ','.join(x) so that I get the comma in the us,my example.
  – spiff
  39 mins ago

• I have accepted this one only because it best fits my depth of understanding; I am comfortable using this type of regex to achieve it.
  – spiff
  36 mins ago
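
The change described in the comment above can be sketched as follows. This is a minimal illustration on a three-row subset of the question's data, not the answerer's exact code: the `sorted` call, the `re.escape` call, and the leading `\b` word boundary are assumptions added here to get a stable column order and to keep a key from matching inside a longer key.

```python
import re

import pandas as pd

df = pd.DataFrame(
    [("456", "ca-eu; tp-lbe"), ("369", "ca-us; ca-my"), ("963", "ca-ml; co-ml; go-ml")],
    columns=["Code", "Items"],
)

# Collect the distinct keys (the text before each hyphen), sorted so that the
# resulting column order is deterministic.
item_types = sorted({key for keys in df["Items"].str.findall(r"(\w+)-") for key in keys})

out = df[["Code"]].copy()
for t in item_types:
    # re.escape and \b are assumptions: they guard against keys that contain
    # regex metacharacters or that are suffixes of other keys.
    pattern = r"\b%s-(\w+)" % re.escape(t)
    # ",".join instead of "".join puts a comma between repeated keys.
    out[t] = df["Items"].str.findall(pattern).apply(",".join)

print(out)
```

With `",".join`, the row for code 369 holds `us,my` in the `ca` column, and keys that do not occur in a row produce an empty string.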

















          This is a little bit complicated:



(df.set_index('Code')
   .Items.str.split('; ', expand=True)   # split on '; ' so keys carry no leading space
   .stack()
   .dropna()                             # drop padding from rows with fewer items
   .str.split('-', expand=True)
   .set_index(0, append=True)[1]
   .unstack()
   .fillna('')
   .groupby(level=0)                     # sum(level=0) was removed in pandas 2.0
   .agg(lambda x: ''.join(x)))

0     ca    co  eq  go  tp
Code
123             hk
147         ml      ml
258   us
321                 ch
369   usmy
456   eu                lbe
654   au            au
741   us
789   us
852   eu
963   ml    ml      ml
987                 jp


# Use str.split to unnest the column, then stack and str.split again,
# then move the first column (the key) into the index;
# after unstack and the per-Code aggregation we get the result
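
To make the chain above easier to follow, here is a sketch that names each intermediate result. It runs on a three-row subset of the question's data; the variable names (`items`, `pairs`, `wide`, `result`) are illustrative and not part of the answer.

```python
import pandas as pd

df = pd.DataFrame(
    [("123", "eq-hk"), ("456", "ca-eu; tp-lbe"), ("369", "ca-us; ca-my")],
    columns=["Code", "Items"],
)

# Step 1: one column per item, then stack into one row per (Code, position).
# dropna removes the padding that expand=True adds for rows with fewer items.
items = df.set_index("Code")["Items"].str.split("; ", expand=True).stack().dropna()

# Step 2: split each item into its key (column 0) and value (column 1).
pairs = items.str.split("-", expand=True)

# Step 3: move the key into the index and pivot it out into columns.
wide = pairs.set_index(0, append=True)[1].unstack().fillna("")

# Step 4: collapse the position level by concatenating the strings per Code.
result = wide.groupby(level=0).agg(lambda x: "".join(x))
print(result)
```

After step 3 the frame still has one row per (Code, position) pair, which is why code 369 needs the final aggregation to merge `us` and `my` into a single cell.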





• Hey Wen, can you please format your answer better? Answers which require horizontal scrolling are difficult to read.
  – coldspeed
  1 hour ago

• @coldspeed Ah, that was caused by the comment :-) Fixed.
  – W-B
  1 hour ago

• I meant something like my edit. If a line crosses 80 characters, try to format each method on its own line.
  – coldspeed
  1 hour ago

















          List comprehensions work better (read: much faster) for string problems like this which require multiple levels of splitting.



df2 = pd.DataFrame([
    dict(y.split('-') for y in x.split('; '))
    for x in df.Items]).fillna('')
df2.insert(0, 'Code', df.Code)

print(df2)
    Code  ca  co  eq  go  tp
0   123           hk
1   456   eu              lbe
2   789   us
3   321               ch
4   654   au          au
5   987               jp
6   147       ml      ml
7   258   us
8   369   my    # Should be "us,my"... see below.
9   741   us
10  852   eu
11  963   ml  ml      ml


          This does not handle the situation where multiple items with the same key can be present in a row. For that, a slightly more involved solution is needed.



from itertools import chain

v = [x.split('; ') for x in df.Items]
X = pd.Series(df.Code.values.repeat([len(x) for x in v]))
Y = pd.DataFrame([x.split('-') for x in chain.from_iterable(v)])

df2 = pd.concat([X, Y], axis=1, ignore_index=True)

# Third index level: a counter per (code, key) pair, so that repeated keys
# such as ca-us; ca-my stay unique and unstack does not raise
n = df2.groupby([0, 1]).cumcount()

(df2.set_index([0, 1, n])[2]
    .unstack(1)
    .fillna('')
    .groupby(level=0)
    .agg(lambda x: ','.join(x).strip(',')))

1     ca     co  eq  go  tp
0
123              hk
147          ml      ml
258   us
321                  ch
369   us,my
456   eu                 lbe
654   au             au
741   us
789   us
852   eu
963   ml     ml      ml
987                  jp
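
The overwriting seen in the first attempt comes from `dict` keeping only the last value for a repeated key. As an alternative sketch that stays in plain Python, the values can be grouped per key before the DataFrame is built. The helper name `parse_items` is hypothetical, and the data is a three-row subset of the question's input.

```python
from collections import defaultdict

import pandas as pd

df = pd.DataFrame(
    [("258", "ca-us"), ("369", "ca-us; ca-my"), ("963", "ca-ml; co-ml; go-ml")],
    columns=["Code", "Items"],
)

# dict() keeps only the last value seen for a repeated key.
assert dict(p.split("-") for p in "ca-us; ca-my".split("; ")) == {"ca": "my"}


def parse_items(items):
    """Hypothetical helper: group values by key, join repeats with commas."""
    grouped = defaultdict(list)
    for pair in items.split("; "):
        key, value = pair.split("-", 1)
        grouped[key].append(value)
    return {key: ",".join(values) for key, values in grouped.items()}


wide = pd.DataFrame([parse_items(x) for x in df["Items"]]).fillna("")
wide.insert(0, "Code", df["Code"])
print(wide)
```

This keeps the list-comprehension style of the answer while preserving every value for a repeated key.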





          • Thanks very much for your post!
            – spiff
            38 mins ago










Accepted answer (regex with str.findall): answered 2 hours ago by Yosi Hammer











Second answer (split, stack, unstack): answered 2 hours ago by W-B, edited 1 hour ago by coldspeed







Third answer (list comprehensions): answered 2 hours ago by coldspeed, edited 1 hour ago










