Pythonic way of collapsing/grouping a list to aggregating max/min

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP











up vote
6
down vote

favorite












Lets say I have the following list in python. It is ordered first by Equip, then by Date:



my_list = [
'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-01',
'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-02',
'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-03',
'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-04',
'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-05',
'Equip': 'A-2', 'Job': 'Job 1', 'Date': '2018-01-03',
'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-04',
'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-05'
]


What I want to do is collapse the list by each set where a given piece of Equipment's job does not change, and grab the first and last date the equipment was there. E.g., this simple example should change to:



list_by_job = [
'Equip': 'A-1', 'Job': 'Job 1', 'First': '2018-01-01', 'Last': '2018-01-03',
'Equip': 'A-1', 'Job': 'Job 2', 'First': '2018-01-04', 'Last': '2018-01-05',
'Equip': 'A-2', 'Job': 'Job 1', 'First': '2018-01-03', 'Last': '2018-01-03',
'Equip': 'A-2', 'Job': 'Job 3', 'First': '2018-01-04', 'Last': '2018-01-05'
]


A couple of things to note:




  1. A-2 on Job 1 is only there for a single day, thus its First and Last Date should be the same.

  2. A piece of equipment could be on a job, leave that job, and come back. In this case, I'd need to see an entry for each time it was on the job, not just one single summary.

  3. As stated before, the list is already sorted first by Equip, then by Date, so that ordering can be assumed. (If there is a better way to sort to accomplish this, I am all ears)

For point 3, the list



my_list = [
'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-01',
'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-02',
'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-03'
]


should yield



 list_by_job = [
'Equip': 'A-1', 'Job': 'Job 1', 'First': '2018-01-01', 'Last': '2018-01-01',
'Equip': 'A-2', 'Job': 'Job 2', 'First': '2018-01-02', 'Last': '2018-01-02',
'Equip': 'A-1', 'Job': 'Job 1', 'First': '2018-01-03', 'Last': '2018-01-03'
]


Currently I am doing so in a simple loop/non-pythonic way:



list_by_job = 

last_entry = None
for entry in my_list:
if last_entry is None or last_entry['Equip'] != entry['Equip'] or last_entry['Job'] != entry['Job']:
list_by_job.append('Equip': entry['Equip'], 'Job': entry['Job'], 'First': entry['Date'], 'Last': entry['Date'])
else:
list_by_job[-1]['Last'] = entry['Date']
last_entry = entry


Is there a more pythonic way to do this using Python's list comprehension, etc?










share|improve this question



















  • 1




    Looks more like a job for pandas.
    – Willem Van Onsem
    2 hours ago














up vote
6
down vote

favorite












Lets say I have the following list in python. It is ordered first by Equip, then by Date:



my_list = [
'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-01',
'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-02',
'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-03',
'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-04',
'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-05',
'Equip': 'A-2', 'Job': 'Job 1', 'Date': '2018-01-03',
'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-04',
'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-05'
]


What I want to do is collapse the list by each set where a given piece of Equipment's job does not change, and grab the first and last date the equipment was there. E.g., this simple example should change to:



list_by_job = [
'Equip': 'A-1', 'Job': 'Job 1', 'First': '2018-01-01', 'Last': '2018-01-03',
'Equip': 'A-1', 'Job': 'Job 2', 'First': '2018-01-04', 'Last': '2018-01-05',
'Equip': 'A-2', 'Job': 'Job 1', 'First': '2018-01-03', 'Last': '2018-01-03',
'Equip': 'A-2', 'Job': 'Job 3', 'First': '2018-01-04', 'Last': '2018-01-05'
]


A couple of things to note:




  1. A-2 on Job 1 is only there for a single day, thus its First and Last Date should be the same.

  2. A piece of equipment could be on a job, leave that job, and come back. In this case, I'd need to see an entry for each time it was on the job, not just one single summary.

  3. As stated before, the list is already sorted first by Equip, then by Date, so that ordering can be assumed. (If there is a better way to sort to accomplish this, I am all ears)

For point 3, the list



my_list = [
'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-01',
'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-02',
'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-03'
]


should yield



 list_by_job = [
'Equip': 'A-1', 'Job': 'Job 1', 'First': '2018-01-01', 'Last': '2018-01-01',
'Equip': 'A-2', 'Job': 'Job 2', 'First': '2018-01-02', 'Last': '2018-01-02',
'Equip': 'A-1', 'Job': 'Job 1', 'First': '2018-01-03', 'Last': '2018-01-03'
]


Currently I am doing so in a simple loop/non-pythonic way:



list_by_job = 

last_entry = None
for entry in my_list:
if last_entry is None or last_entry['Equip'] != entry['Equip'] or last_entry['Job'] != entry['Job']:
list_by_job.append('Equip': entry['Equip'], 'Job': entry['Job'], 'First': entry['Date'], 'Last': entry['Date'])
else:
list_by_job[-1]['Last'] = entry['Date']
last_entry = entry


Is there a more pythonic way to do this using Python's list comprehension, etc?










share|improve this question



















  • 1




    Looks more like a job for pandas.
    – Willem Van Onsem
    2 hours ago












up vote
6
down vote

favorite









up vote
6
down vote

favorite











Lets say I have the following list in python. It is ordered first by Equip, then by Date:



my_list = [
'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-01',
'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-02',
'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-03',
'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-04',
'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-05',
'Equip': 'A-2', 'Job': 'Job 1', 'Date': '2018-01-03',
'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-04',
'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-05'
]


What I want to do is collapse the list by each set where a given piece of Equipment's job does not change, and grab the first and last date the equipment was there. E.g., this simple example should change to:



list_by_job = [
'Equip': 'A-1', 'Job': 'Job 1', 'First': '2018-01-01', 'Last': '2018-01-03',
'Equip': 'A-1', 'Job': 'Job 2', 'First': '2018-01-04', 'Last': '2018-01-05',
'Equip': 'A-2', 'Job': 'Job 1', 'First': '2018-01-03', 'Last': '2018-01-03',
'Equip': 'A-2', 'Job': 'Job 3', 'First': '2018-01-04', 'Last': '2018-01-05'
]


A couple of things to note:




  1. A-2 on Job 1 is only there for a single day, thus its First and Last Date should be the same.

  2. A piece of equipment could be on a job, leave that job, and come back. In this case, I'd need to see an entry for each time it was on the job, not just one single summary.

  3. As stated before, the list is already sorted first by Equip, then by Date, so that ordering can be assumed. (If there is a better way to sort to accomplish this, I am all ears)

For point 3, the list



my_list = [
'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-01',
'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-02',
'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-03'
]


should yield



 list_by_job = [
'Equip': 'A-1', 'Job': 'Job 1', 'First': '2018-01-01', 'Last': '2018-01-01',
'Equip': 'A-2', 'Job': 'Job 2', 'First': '2018-01-02', 'Last': '2018-01-02',
'Equip': 'A-1', 'Job': 'Job 1', 'First': '2018-01-03', 'Last': '2018-01-03'
]


Currently I am doing so in a simple loop/non-pythonic way:



list_by_job = 

last_entry = None
for entry in my_list:
if last_entry is None or last_entry['Equip'] != entry['Equip'] or last_entry['Job'] != entry['Job']:
list_by_job.append('Equip': entry['Equip'], 'Job': entry['Job'], 'First': entry['Date'], 'Last': entry['Date'])
else:
list_by_job[-1]['Last'] = entry['Date']
last_entry = entry


Is there a more pythonic way to do this using Python's list comprehension, etc?










share|improve this question















Lets say I have the following list in python. It is ordered first by Equip, then by Date:



my_list = [
'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-01',
'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-02',
'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-03',
'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-04',
'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-05',
'Equip': 'A-2', 'Job': 'Job 1', 'Date': '2018-01-03',
'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-04',
'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-05'
]


What I want to do is collapse the list by each set where a given piece of Equipment's job does not change, and grab the first and last date the equipment was there. E.g., this simple example should change to:



list_by_job = [
'Equip': 'A-1', 'Job': 'Job 1', 'First': '2018-01-01', 'Last': '2018-01-03',
'Equip': 'A-1', 'Job': 'Job 2', 'First': '2018-01-04', 'Last': '2018-01-05',
'Equip': 'A-2', 'Job': 'Job 1', 'First': '2018-01-03', 'Last': '2018-01-03',
'Equip': 'A-2', 'Job': 'Job 3', 'First': '2018-01-04', 'Last': '2018-01-05'
]


A couple of things to note:




  1. A-2 on Job 1 is only there for a single day, thus its First and Last Date should be the same.

  2. A piece of equipment could be on a job, leave that job, and come back. In this case, I'd need to see an entry for each time it was on the job, not just one single summary.

  3. As stated before, the list is already sorted first by Equip, then by Date, so that ordering can be assumed. (If there is a better way to sort to accomplish this, I am all ears)

For point 3, the list



my_list = [
'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-01',
'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-02',
'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-03'
]


should yield



 list_by_job = [
'Equip': 'A-1', 'Job': 'Job 1', 'First': '2018-01-01', 'Last': '2018-01-01',
'Equip': 'A-2', 'Job': 'Job 2', 'First': '2018-01-02', 'Last': '2018-01-02',
'Equip': 'A-1', 'Job': 'Job 1', 'First': '2018-01-03', 'Last': '2018-01-03'
]


Currently I am doing so in a simple loop/non-pythonic way:



list_by_job = 

last_entry = None
for entry in my_list:
if last_entry is None or last_entry['Equip'] != entry['Equip'] or last_entry['Job'] != entry['Job']:
list_by_job.append('Equip': entry['Equip'], 'Job': entry['Job'], 'First': entry['Date'], 'Last': entry['Date'])
else:
list_by_job[-1]['Last'] = entry['Date']
last_entry = entry


Is there a more pythonic way to do this using Python's list comprehension, etc?







python list aggregate list-comprehension






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited 1 hour ago

























asked 2 hours ago









MarkD

2,47611833




2,47611833







  • 1




    Looks more like a job for pandas.
    – Willem Van Onsem
    2 hours ago












  • 1




    Looks more like a job for pandas.
    – Willem Van Onsem
    2 hours ago







1




1




Looks more like a job for pandas.
– Willem Van Onsem
2 hours ago




Looks more like a job for pandas.
– Willem Van Onsem
2 hours ago












3 Answers
3






active

oldest

votes

















up vote
6
down vote



accepted










You can use itertools.groupby:



import itertools
def _key(d):
return (d['Equip'], d['Job'])

my_list = ['Date': '2018-01-01', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-02', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-03', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-04', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-05', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-03', 'Equip': 'A-2', 'Job': 'Job 1', 'Date': '2018-01-04', 'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-05', 'Equip': 'A-2', 'Job': 'Job 3']
new_data = [[a, list(b)] for a, b in itertools.groupby(my_list, key=_key)]
final_result = ["Equip":c, 'Job':d, 'First':b[0]['Date'], 'Last':b[-1]['Date'] for [c, d], b in new_data]


Output:



['Equip': 'A-1', 'Job': 'Job 1', 'Last': '2018-01-03', 'First': '2018-01-01', 
'Equip': 'A-1', 'Job': 'Job 2', 'Last': '2018-01-05', 'First': '2018-01-04',
'Equip': 'A-2', 'Job': 'Job 1', 'Last': '2018-01-03', 'First': '2018-01-03',
'Equip': 'A-2', 'Job': 'Job 3', 'Last': '2018-01-05', 'First': '2018-01-04']


Edit:



Using data as suggested in your comment:



my_list = ['Date': '2018-01-01', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-02', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-03', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-04', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-05', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-03', 'Equip': 'A-2', 'Job': 'Job 1', 'Date': '2018-01-04', 'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-05', 'Equip': 'A-2', 'Job': 'Job 3']


Output:



['Equip': 'A-1', 'Job': 'Job 1', 'Last': '2018-01-01', 'First': '2018-01-01', 
'Equip': 'A-1', 'Job': 'Job 2', 'Last': '2018-01-02', 'First': '2018-01-02',
'Equip': 'A-1', 'Job': 'Job 1', 'Last': '2018-01-03', 'First': '2018-01-03',
'Equip': 'A-1', 'Job': 'Job 2', 'Last': '2018-01-05', 'First': '2018-01-04',
'Equip': 'A-2', 'Job': 'Job 1', 'Last': '2018-01-03', 'First': '2018-01-03',
'Equip': 'A-2', 'Job': 'Job 3', 'Last': '2018-01-05', 'First': '2018-01-04']





share|improve this answer






















  • I think new_data should be sorted by date too third (and not just Equip and Job). itertools.groupby(sorted(my_list, key=lambda x: (x['Equip'], x['Job'], x['Date'])), key=_key)
    – slider
    2 hours ago











  • _key = itemgetter('Equip', 'Job') would be the built-in way to create that key function.
    – schwobaseggl
    2 hours ago










  • Thanks! It is almost there, but doesn't quite work for bullet-point 3 in my "couple of things to note". For example, if you switch the second A-1 Job to Job 2 (e.g.- the equipment went to job 1, then job 2, then back to job 1) there should be 2 entries for A-1 on Job 1 and 2 entries for Job 2, instead there is only one for each (that shows the absolute max/min on those jobs, instead of max/min per visit to job 1 and job 2. I'll update my example above to show what I mean.
    – MarkD
    2 hours ago










  • @MarkD Please see my recent edit.
    – Ajax1234
    1 hour ago

















up vote
3
down vote













I suggest using pandas for this.



itertools.groupby is cool but IMO a bit harder to comprehend.



>>> import pandas as pd
>>>
>>> my_list = [
...: 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-01',
...: 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-02',
...: 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-03',
...: 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-04',
...: 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-05',
...: 'Equip': 'A-2', 'Job': 'Job 1', 'Date': '2018-01-03',
...: 'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-04',
...: 'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-05'
...:]
>>>
>>> df = pd.DataFrame(my_list)
>>> df['Date'] = pd.to_datetime(df['Date'])
>>> groups = df.groupby(['Equip', 'Job']).agg('Date': [min, max]).reset_index()
>>> groups.columns = ['Equip', 'Job', 'First', 'Last']
>>> groups
>>>
Equip Job First Last
0 A-1 Job 1 2018-01-01 2018-01-03
1 A-1 Job 2 2018-01-04 2018-01-05
2 A-2 Job 1 2018-01-03 2018-01-03
3 A-2 Job 3 2018-01-04 2018-01-05
>>>
>>> groups.to_dict(orient='records')
>>>
['Equip': 'A-1',
'First': Timestamp('2018-01-01 00:00:00'),
'Job': 'Job 1',
'Last': Timestamp('2018-01-03 00:00:00'),
'Equip': 'A-1',
'First': Timestamp('2018-01-04 00:00:00'),
'Job': 'Job 2',
'Last': Timestamp('2018-01-05 00:00:00'),
'Equip': 'A-2',
'First': Timestamp('2018-01-03 00:00:00'),
'Job': 'Job 1',
'Last': Timestamp('2018-01-03 00:00:00'),
'Equip': 'A-2',
'First': Timestamp('2018-01-04 00:00:00'),
'Job': 'Job 3',
'Last': Timestamp('2018-01-05 00:00:00')]


I suggest keeping the dates as time stamps.






share|improve this answer



























    up vote
    1
    down vote













    You can use pandas here, which is some sort of "database interface" for data:



    import pandas as pd

    df = pd.DataFrame(my_list)
    df2 = df.groupby(['Equip', 'Job']).agg(['min', 'max']).rename(columns='min': 'First', 'max': 'Last')
    df2.columns = df2.columns.droplevel()
    df2 = df2.reset_index()
    result = df2.to_dict('records')


    for the given sample input, this gives:



    >>> df2.to_dict('records')
    ['Equip': 'A-1', 'Job': 'Job 1', 'First': '2018-01-01', 'Last': '2018-01-03',
    'Equip': 'A-1', 'Job': 'Job 2', 'First': '2018-01-04', 'Last': '2018-01-05',
    'Equip': 'A-2', 'Job': 'Job 1', 'First': '2018-01-03', 'Last': '2018-01-03',
    'Equip': 'A-2', 'Job': 'Job 3', 'First': '2018-01-04', 'Last': '2018-01-05']


    In case the date format is not '%Y-%m-%d', then one first needs to convert it with pd.to_datetime(..) like:



    import pandas as pd

    df = pd.DataFrame(my_list)
    df['Date'] = pd.to_datetime(df['Date'])
    df2 = df.groupby(['Equip', 'Job']).agg(['min', 'max']).rename(columns='min': 'First', 'max': 'Last')
    df2.columns = df2.columns.droplevel()
    df2 = df2.reset_index()
    result = df2.to_dict('records')





    share|improve this answer






















    • Thanks for the Pandas suggestion, and I agree, pandas is well suited to this. However, this is being done in a web app/API and I've found in the past that loading pandas in django causes some issues with response time, etc.. and would prefer to keep things vanilla python.
      – MarkD
      1 hour ago










    • @MarkD: well loading pandas the first time indeed takes some time, but after that, it can frequently take less time, since a DataFrame is not implemented in Python, but in C.
      – Willem Van Onsem
      1 hour ago










    • Are you just getting lucky by producing the correct output using strings or am I missing where you convert them to dates? I'm a bit confused. :)
      – timgeb
      1 hour ago










    • @timgeb: if the strings are formatted '%Y-%m-%d', then the lexicographical order is the same as the "date order". Since you can see a date as a three "digit" number (here digit is a bit a wrong word), where the digits have different radices. Only at the end of the "day" range, it increments the month range, and performs a wrap-around. So it works like a positional number system.
      – Willem Van Onsem
      1 hour ago











    • Ah, true. Probably still useful for future readers to mention that this works only for the exact format OP is using. I can't decide whether this is clever or dirty. Probably both. :)
      – timgeb
      1 hour ago











    Your Answer






    StackExchange.ifUsing("editor", function ()
    StackExchange.using("externalEditor", function ()
    StackExchange.using("snippets", function ()
    StackExchange.snippets.init();
    );
    );
    , "code-snippets");

    StackExchange.ready(function()
    var channelOptions =
    tags: "".split(" "),
    id: "1"
    ;
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function()
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled)
    StackExchange.using("snippets", function()
    createEditor();
    );

    else
    createEditor();

    );

    function createEditor()
    StackExchange.prepareEditor(
    heartbeatType: 'answer',
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader:
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    ,
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    );



    );













     

    draft saved


    draft discarded


















    StackExchange.ready(
    function ()
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53144837%2fpythonic-way-of-collapsing-grouping-a-list-to-aggregating-max-min%23new-answer', 'question_page');

    );

    Post as a guest






























    3 Answers
    3






    active

    oldest

    votes








    3 Answers
    3






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes








    up vote
    6
    down vote



    accepted










    You can use itertools.groupby:



    import itertools
    def _key(d):
    return (d['Equip'], d['Job'])

    my_list = ['Date': '2018-01-01', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-02', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-03', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-04', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-05', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-03', 'Equip': 'A-2', 'Job': 'Job 1', 'Date': '2018-01-04', 'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-05', 'Equip': 'A-2', 'Job': 'Job 3']
    new_data = [[a, list(b)] for a, b in itertools.groupby(my_list, key=_key)]
    final_result = ["Equip":c, 'Job':d, 'First':b[0]['Date'], 'Last':b[-1]['Date'] for [c, d], b in new_data]


    Output:



    ['Equip': 'A-1', 'Job': 'Job 1', 'Last': '2018-01-03', 'First': '2018-01-01', 
    'Equip': 'A-1', 'Job': 'Job 2', 'Last': '2018-01-05', 'First': '2018-01-04',
    'Equip': 'A-2', 'Job': 'Job 1', 'Last': '2018-01-03', 'First': '2018-01-03',
    'Equip': 'A-2', 'Job': 'Job 3', 'Last': '2018-01-05', 'First': '2018-01-04']


    Edit:



    Using data as suggested in your comment:



    my_list = ['Date': '2018-01-01', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-02', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-03', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-04', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-05', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-03', 'Equip': 'A-2', 'Job': 'Job 1', 'Date': '2018-01-04', 'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-05', 'Equip': 'A-2', 'Job': 'Job 3']


    Output:



    ['Equip': 'A-1', 'Job': 'Job 1', 'Last': '2018-01-01', 'First': '2018-01-01', 
    'Equip': 'A-1', 'Job': 'Job 2', 'Last': '2018-01-02', 'First': '2018-01-02',
    'Equip': 'A-1', 'Job': 'Job 1', 'Last': '2018-01-03', 'First': '2018-01-03',
    'Equip': 'A-1', 'Job': 'Job 2', 'Last': '2018-01-05', 'First': '2018-01-04',
    'Equip': 'A-2', 'Job': 'Job 1', 'Last': '2018-01-03', 'First': '2018-01-03',
    'Equip': 'A-2', 'Job': 'Job 3', 'Last': '2018-01-05', 'First': '2018-01-04']





    share|improve this answer






















    • I think new_data should be sorted by date too third (and not just Equip and Job). itertools.groupby(sorted(my_list, key=lambda x: (x['Equip'], x['Job'], x['Date'])), key=_key)
      – slider
      2 hours ago











    • _key = itemgetter('Equip', 'Job') would be the built-in way to create that key function.
      – schwobaseggl
      2 hours ago










    • Thanks! It is almost there, but doesn't quite work for bullet-point 3 in my "couple of things to note". For example, if you switch the second A-1 Job to Job 2 (e.g.- the equipment went to job 1, then job 2, then back to job 1) there should be 2 entries for A-1 on Job 1 and 2 entries for Job 2, instead there is only one for each (that shows the absolute max/min on those jobs, instead of max/min per visit to job 1 and job 2. I'll update my example above to show what I mean.
      – MarkD
      2 hours ago










    • @MarkD Please see my recent edit.
      – Ajax1234
      1 hour ago














    up vote
    6
    down vote



    accepted










    You can use itertools.groupby:



    import itertools
    def _key(d):
    return (d['Equip'], d['Job'])

    my_list = ['Date': '2018-01-01', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-02', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-03', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-04', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-05', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-03', 'Equip': 'A-2', 'Job': 'Job 1', 'Date': '2018-01-04', 'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-05', 'Equip': 'A-2', 'Job': 'Job 3']
    new_data = [[a, list(b)] for a, b in itertools.groupby(my_list, key=_key)]
    final_result = ["Equip":c, 'Job':d, 'First':b[0]['Date'], 'Last':b[-1]['Date'] for [c, d], b in new_data]


    Output:



    ['Equip': 'A-1', 'Job': 'Job 1', 'Last': '2018-01-03', 'First': '2018-01-01', 
    'Equip': 'A-1', 'Job': 'Job 2', 'Last': '2018-01-05', 'First': '2018-01-04',
    'Equip': 'A-2', 'Job': 'Job 1', 'Last': '2018-01-03', 'First': '2018-01-03',
    'Equip': 'A-2', 'Job': 'Job 3', 'Last': '2018-01-05', 'First': '2018-01-04']


    Edit:



    Using data as suggested in your comment:



    my_list = ['Date': '2018-01-01', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-02', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-03', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-04', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-05', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-03', 'Equip': 'A-2', 'Job': 'Job 1', 'Date': '2018-01-04', 'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-05', 'Equip': 'A-2', 'Job': 'Job 3']


    Output:



    ['Equip': 'A-1', 'Job': 'Job 1', 'Last': '2018-01-01', 'First': '2018-01-01', 
    'Equip': 'A-1', 'Job': 'Job 2', 'Last': '2018-01-02', 'First': '2018-01-02',
    'Equip': 'A-1', 'Job': 'Job 1', 'Last': '2018-01-03', 'First': '2018-01-03',
    'Equip': 'A-1', 'Job': 'Job 2', 'Last': '2018-01-05', 'First': '2018-01-04',
    'Equip': 'A-2', 'Job': 'Job 1', 'Last': '2018-01-03', 'First': '2018-01-03',
    'Equip': 'A-2', 'Job': 'Job 3', 'Last': '2018-01-05', 'First': '2018-01-04']





    share|improve this answer






















    • I think new_data should be sorted by date too third (and not just Equip and Job). itertools.groupby(sorted(my_list, key=lambda x: (x['Equip'], x['Job'], x['Date'])), key=_key)
      – slider
      2 hours ago











    • _key = itemgetter('Equip', 'Job') would be the built-in way to create that key function.
      – schwobaseggl
      2 hours ago










    • Thanks! It is almost there, but doesn't quite work for bullet-point 3 in my "couple of things to note". For example, if you switch the second A-1 Job to Job 2 (e.g.- the equipment went to job 1, then job 2, then back to job 1) there should be 2 entries for A-1 on Job 1 and 2 entries for Job 2, instead there is only one for each (that shows the absolute max/min on those jobs, instead of max/min per visit to job 1 and job 2. I'll update my example above to show what I mean.
      – MarkD
      2 hours ago










    • @MarkD Please see my recent edit.
      – Ajax1234
      1 hour ago












    up vote
    6
    down vote



    accepted







    up vote
    6
    down vote



    accepted






    You can use itertools.groupby:



    import itertools
    def _key(d):
    return (d['Equip'], d['Job'])

    my_list = ['Date': '2018-01-01', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-02', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-03', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-04', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-05', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-03', 'Equip': 'A-2', 'Job': 'Job 1', 'Date': '2018-01-04', 'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-05', 'Equip': 'A-2', 'Job': 'Job 3']
    new_data = [[a, list(b)] for a, b in itertools.groupby(my_list, key=_key)]
    final_result = ["Equip":c, 'Job':d, 'First':b[0]['Date'], 'Last':b[-1]['Date'] for [c, d], b in new_data]


    Output:



    ['Equip': 'A-1', 'Job': 'Job 1', 'Last': '2018-01-03', 'First': '2018-01-01', 
    'Equip': 'A-1', 'Job': 'Job 2', 'Last': '2018-01-05', 'First': '2018-01-04',
    'Equip': 'A-2', 'Job': 'Job 1', 'Last': '2018-01-03', 'First': '2018-01-03',
    'Equip': 'A-2', 'Job': 'Job 3', 'Last': '2018-01-05', 'First': '2018-01-04']


    Edit:



    Using data as suggested in your comment:



    my_list = ['Date': '2018-01-01', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-02', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-03', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-04', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-05', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-03', 'Equip': 'A-2', 'Job': 'Job 1', 'Date': '2018-01-04', 'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-05', 'Equip': 'A-2', 'Job': 'Job 3']


    Output:



    ['Equip': 'A-1', 'Job': 'Job 1', 'Last': '2018-01-01', 'First': '2018-01-01', 
    'Equip': 'A-1', 'Job': 'Job 2', 'Last': '2018-01-02', 'First': '2018-01-02',
    'Equip': 'A-1', 'Job': 'Job 1', 'Last': '2018-01-03', 'First': '2018-01-03',
    'Equip': 'A-1', 'Job': 'Job 2', 'Last': '2018-01-05', 'First': '2018-01-04',
    'Equip': 'A-2', 'Job': 'Job 1', 'Last': '2018-01-03', 'First': '2018-01-03',
    'Equip': 'A-2', 'Job': 'Job 3', 'Last': '2018-01-05', 'First': '2018-01-04']





    share|improve this answer














    You can use itertools.groupby:



    import itertools
    def _key(d):
    return (d['Equip'], d['Job'])

    my_list = ['Date': '2018-01-01', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-02', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-03', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-04', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-05', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-03', 'Equip': 'A-2', 'Job': 'Job 1', 'Date': '2018-01-04', 'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-05', 'Equip': 'A-2', 'Job': 'Job 3']
    new_data = [[a, list(b)] for a, b in itertools.groupby(my_list, key=_key)]
    final_result = ["Equip":c, 'Job':d, 'First':b[0]['Date'], 'Last':b[-1]['Date'] for [c, d], b in new_data]


    Output:



    ['Equip': 'A-1', 'Job': 'Job 1', 'Last': '2018-01-03', 'First': '2018-01-01', 
    'Equip': 'A-1', 'Job': 'Job 2', 'Last': '2018-01-05', 'First': '2018-01-04',
    'Equip': 'A-2', 'Job': 'Job 1', 'Last': '2018-01-03', 'First': '2018-01-03',
    'Equip': 'A-2', 'Job': 'Job 3', 'Last': '2018-01-05', 'First': '2018-01-04']


    Edit:



    Using data as suggested in your comment:



    my_list = ['Date': '2018-01-01', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-02', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-03', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-04', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-05', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-03', 'Equip': 'A-2', 'Job': 'Job 1', 'Date': '2018-01-04', 'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-05', 'Equip': 'A-2', 'Job': 'Job 3']


    Output:



    ['Equip': 'A-1', 'Job': 'Job 1', 'Last': '2018-01-01', 'First': '2018-01-01', 
    'Equip': 'A-1', 'Job': 'Job 2', 'Last': '2018-01-02', 'First': '2018-01-02',
    'Equip': 'A-1', 'Job': 'Job 1', 'Last': '2018-01-03', 'First': '2018-01-03',
    'Equip': 'A-1', 'Job': 'Job 2', 'Last': '2018-01-05', 'First': '2018-01-04',
    'Equip': 'A-2', 'Job': 'Job 1', 'Last': '2018-01-03', 'First': '2018-01-03',
    'Equip': 'A-2', 'Job': 'Job 3', 'Last': '2018-01-05', 'First': '2018-01-04']






    share|improve this answer














    share|improve this answer



    share|improve this answer








    edited 1 hour ago

























    answered 2 hours ago









    Ajax1234

    37.8k42249




    37.8k42249











    • I think new_data should be sorted by date too third (and not just Equip and Job). itertools.groupby(sorted(my_list, key=lambda x: (x['Equip'], x['Job'], x['Date'])), key=_key)
      – slider
      2 hours ago











    • _key = itemgetter('Equip', 'Job') would be the built-in way to create that key function.
      – schwobaseggl
      2 hours ago










    • Thanks! It is almost there, but doesn't quite work for bullet-point 3 in my "couple of things to note". For example, if you switch the second A-1 Job to Job 2 (e.g.- the equipment went to job 1, then job 2, then back to job 1) there should be 2 entries for A-1 on Job 1 and 2 entries for Job 2, instead there is only one for each (that shows the absolute max/min on those jobs, instead of max/min per visit to job 1 and job 2. I'll update my example above to show what I mean.
      – MarkD
      2 hours ago










    • @MarkD Please see my recent edit.
      – Ajax1234
      1 hour ago
















    • I think new_data should be sorted by date too third (and not just Equip and Job). itertools.groupby(sorted(my_list, key=lambda x: (x['Equip'], x['Job'], x['Date'])), key=_key)
      – slider
      2 hours ago











    • _key = itemgetter('Equip', 'Job') would be the built-in way to create that key function.
      – schwobaseggl
      2 hours ago










    • Thanks! It is almost there, but doesn't quite work for bullet-point 3 in my "couple of things to note". For example, if you switch the second A-1 Job to Job 2 (e.g.- the equipment went to job 1, then job 2, then back to job 1) there should be 2 entries for A-1 on Job 1 and 2 entries for Job 2, instead there is only one for each (that shows the absolute max/min on those jobs, instead of max/min per visit to job 1 and job 2. I'll update my example above to show what I mean.
      – MarkD
      2 hours ago










    • @MarkD Please see my recent edit.
      – Ajax1234
      1 hour ago















    I think new_data should be sorted by date too third (and not just Equip and Job). itertools.groupby(sorted(my_list, key=lambda x: (x['Equip'], x['Job'], x['Date'])), key=_key)
    – slider
    2 hours ago





    I think new_data should be sorted by date too third (and not just Equip and Job). itertools.groupby(sorted(my_list, key=lambda x: (x['Equip'], x['Job'], x['Date'])), key=_key)
    – slider
    2 hours ago













    _key = itemgetter('Equip', 'Job') would be the built-in way to create that key function.
    – schwobaseggl
    2 hours ago




    _key = itemgetter('Equip', 'Job') would be the built-in way to create that key function.
    – schwobaseggl
    2 hours ago












    Thanks! It is almost there, but doesn't quite work for bullet-point 3 in my "couple of things to note". For example, if you switch the second A-1 Job to Job 2 (e.g.- the equipment went to job 1, then job 2, then back to job 1) there should be 2 entries for A-1 on Job 1 and 2 entries for Job 2, instead there is only one for each (that shows the absolute max/min on those jobs, instead of max/min per visit to job 1 and job 2. I'll update my example above to show what I mean.
    – MarkD
    2 hours ago




    Thanks! It is almost there, but doesn't quite work for bullet-point 3 in my "couple of things to note". For example, if you switch the second A-1 Job to Job 2 (e.g.- the equipment went to job 1, then job 2, then back to job 1) there should be 2 entries for A-1 on Job 1 and 2 entries for Job 2, instead there is only one for each (that shows the absolute max/min on those jobs, instead of max/min per visit to job 1 and job 2. I'll update my example above to show what I mean.
    – MarkD
    2 hours ago












    @MarkD Please see my recent edit.
    – Ajax1234
    1 hour ago




    @MarkD Please see my recent edit.
    – Ajax1234
    1 hour ago












    up vote
    3
    down vote













    I suggest using pandas for this.



    itertools.groupby is cool but IMO a bit harder to comprehend.



    >>> import pandas as pd
    >>>
    >>> my_list = [
    ...: 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-01',
    ...: 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-02',
    ...: 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-03',
    ...: 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-04',
    ...: 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-05',
    ...: 'Equip': 'A-2', 'Job': 'Job 1', 'Date': '2018-01-03',
    ...: 'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-04',
    ...: 'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-05'
    ...:]
    >>>
    >>> df = pd.DataFrame(my_list)
    >>> df['Date'] = pd.to_datetime(df['Date'])
    >>> groups = df.groupby(['Equip', 'Job']).agg('Date': [min, max]).reset_index()
    >>> groups.columns = ['Equip', 'Job', 'First', 'Last']
    >>> groups
    >>>
    Equip Job First Last
    0 A-1 Job 1 2018-01-01 2018-01-03
    1 A-1 Job 2 2018-01-04 2018-01-05
    2 A-2 Job 1 2018-01-03 2018-01-03
    3 A-2 Job 3 2018-01-04 2018-01-05
    >>>
    >>> groups.to_dict(orient='records')
    >>>
    ['Equip': 'A-1',
    'First': Timestamp('2018-01-01 00:00:00'),
    'Job': 'Job 1',
    'Last': Timestamp('2018-01-03 00:00:00'),
    'Equip': 'A-1',
    'First': Timestamp('2018-01-04 00:00:00'),
    'Job': 'Job 2',
    'Last': Timestamp('2018-01-05 00:00:00'),
    'Equip': 'A-2',
    'First': Timestamp('2018-01-03 00:00:00'),
    'Job': 'Job 1',
    'Last': Timestamp('2018-01-03 00:00:00'),
    'Equip': 'A-2',
    'First': Timestamp('2018-01-04 00:00:00'),
    'Job': 'Job 3',
    'Last': Timestamp('2018-01-05 00:00:00')]


    I suggest keeping the dates as time stamps.






    share|improve this answer
























      up vote
      3
      down vote













      I suggest using pandas for this.



      itertools.groupby is cool but IMO a bit harder to comprehend.



      >>> import pandas as pd
      >>>
      >>> my_list = [
      ...: 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-01',
      ...: 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-02',
      ...: 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-03',
      ...: 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-04',
      ...: 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-05',
      ...: 'Equip': 'A-2', 'Job': 'Job 1', 'Date': '2018-01-03',
      ...: 'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-04',
      ...: 'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-05'
      ...:]
      >>>
      >>> df = pd.DataFrame(my_list)
      >>> df['Date'] = pd.to_datetime(df['Date'])
      >>> groups = df.groupby(['Equip', 'Job']).agg('Date': [min, max]).reset_index()
      >>> groups.columns = ['Equip', 'Job', 'First', 'Last']
      >>> groups
      >>>
      Equip Job First Last
      0 A-1 Job 1 2018-01-01 2018-01-03
      1 A-1 Job 2 2018-01-04 2018-01-05
      2 A-2 Job 1 2018-01-03 2018-01-03
      3 A-2 Job 3 2018-01-04 2018-01-05
      >>>
      >>> groups.to_dict(orient='records')
      >>>
      ['Equip': 'A-1',
      'First': Timestamp('2018-01-01 00:00:00'),
      'Job': 'Job 1',
      'Last': Timestamp('2018-01-03 00:00:00'),
      'Equip': 'A-1',
      'First': Timestamp('2018-01-04 00:00:00'),
      'Job': 'Job 2',
      'Last': Timestamp('2018-01-05 00:00:00'),
      'Equip': 'A-2',
      'First': Timestamp('2018-01-03 00:00:00'),
      'Job': 'Job 1',
      'Last': Timestamp('2018-01-03 00:00:00'),
      'Equip': 'A-2',
      'First': Timestamp('2018-01-04 00:00:00'),
      'Job': 'Job 3',
      'Last': Timestamp('2018-01-05 00:00:00')]


      I suggest keeping the dates as time stamps.






      share|improve this answer






















        up vote
        3
        down vote










        up vote
        3
        down vote









        I suggest using pandas for this.



        itertools.groupby is cool but IMO a bit harder to comprehend.



        >>> import pandas as pd
        >>>
        >>> my_list = [
        ...: 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-01',
        ...: 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-02',
        ...: 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-03',
        ...: 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-04',
        ...: 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-05',
        ...: 'Equip': 'A-2', 'Job': 'Job 1', 'Date': '2018-01-03',
        ...: 'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-04',
        ...: 'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-05'
        ...:]
        >>>
        >>> df = pd.DataFrame(my_list)
        >>> df['Date'] = pd.to_datetime(df['Date'])
        >>> groups = df.groupby(['Equip', 'Job']).agg('Date': [min, max]).reset_index()
        >>> groups.columns = ['Equip', 'Job', 'First', 'Last']
        >>> groups
        >>>
        Equip Job First Last
        0 A-1 Job 1 2018-01-01 2018-01-03
        1 A-1 Job 2 2018-01-04 2018-01-05
        2 A-2 Job 1 2018-01-03 2018-01-03
        3 A-2 Job 3 2018-01-04 2018-01-05
        >>>
        >>> groups.to_dict(orient='records')
        >>>
        ['Equip': 'A-1',
        'First': Timestamp('2018-01-01 00:00:00'),
        'Job': 'Job 1',
        'Last': Timestamp('2018-01-03 00:00:00'),
        'Equip': 'A-1',
        'First': Timestamp('2018-01-04 00:00:00'),
        'Job': 'Job 2',
        'Last': Timestamp('2018-01-05 00:00:00'),
        'Equip': 'A-2',
        'First': Timestamp('2018-01-03 00:00:00'),
        'Job': 'Job 1',
        'Last': Timestamp('2018-01-03 00:00:00'),
        'Equip': 'A-2',
        'First': Timestamp('2018-01-04 00:00:00'),
        'Job': 'Job 3',
        'Last': Timestamp('2018-01-05 00:00:00')]


        I suggest keeping the dates as time stamps.






        share|improve this answer












        I suggest using pandas for this.



        itertools.groupby is cool but IMO a bit harder to comprehend.



        >>> import pandas as pd
        >>>
        >>> my_list = [
        ...: 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-01',
        ...: 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-02',
        ...: 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-03',
        ...: 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-04',
        ...: 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-05',
        ...: 'Equip': 'A-2', 'Job': 'Job 1', 'Date': '2018-01-03',
        ...: 'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-04',
        ...: 'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-05'
        ...:]
        >>>
        >>> df = pd.DataFrame(my_list)
        >>> df['Date'] = pd.to_datetime(df['Date'])
        >>> groups = df.groupby(['Equip', 'Job']).agg('Date': [min, max]).reset_index()
        >>> groups.columns = ['Equip', 'Job', 'First', 'Last']
        >>> groups
        >>>
        Equip Job First Last
        0 A-1 Job 1 2018-01-01 2018-01-03
        1 A-1 Job 2 2018-01-04 2018-01-05
        2 A-2 Job 1 2018-01-03 2018-01-03
        3 A-2 Job 3 2018-01-04 2018-01-05
        >>>
        >>> groups.to_dict(orient='records')
        >>>
        ['Equip': 'A-1',
        'First': Timestamp('2018-01-01 00:00:00'),
        'Job': 'Job 1',
        'Last': Timestamp('2018-01-03 00:00:00'),
        'Equip': 'A-1',
        'First': Timestamp('2018-01-04 00:00:00'),
        'Job': 'Job 2',
        'Last': Timestamp('2018-01-05 00:00:00'),
        'Equip': 'A-2',
        'First': Timestamp('2018-01-03 00:00:00'),
        'Job': 'Job 1',
        'Last': Timestamp('2018-01-03 00:00:00'),
        'Equip': 'A-2',
        'First': Timestamp('2018-01-04 00:00:00'),
        'Job': 'Job 3',
        'Last': Timestamp('2018-01-05 00:00:00')]


        I suggest keeping the dates as time stamps.







        share|improve this answer












        share|improve this answer



        share|improve this answer










        answered 2 hours ago









        timgeb

        41.6k105581




        41.6k105581




















            up vote
            1
            down vote













            You can use pandas here, which is some sort of "database interface" for data:



            import pandas as pd

            df = pd.DataFrame(my_list)
            df2 = df.groupby(['Equip', 'Job']).agg(['min', 'max']).rename(columns='min': 'First', 'max': 'Last')
            df2.columns = df2.columns.droplevel()
            df2 = df2.reset_index()
            result = df2.to_dict('records')


            for the given sample input, this gives:



            >>> df2.to_dict('records')
            ['Equip': 'A-1', 'Job': 'Job 1', 'First': '2018-01-01', 'Last': '2018-01-03',
            'Equip': 'A-1', 'Job': 'Job 2', 'First': '2018-01-04', 'Last': '2018-01-05',
            'Equip': 'A-2', 'Job': 'Job 1', 'First': '2018-01-03', 'Last': '2018-01-03',
            'Equip': 'A-2', 'Job': 'Job 3', 'First': '2018-01-04', 'Last': '2018-01-05']


            In case the date format is not '%Y-%m-%d', then one first needs to convert it with pd.to_datetime(..) like:



            import pandas as pd

            df = pd.DataFrame(my_list)
            df['Date'] = pd.to_datetime(df['Date'])
            df2 = df.groupby(['Equip', 'Job']).agg(['min', 'max']).rename(columns='min': 'First', 'max': 'Last')
            df2.columns = df2.columns.droplevel()
            df2 = df2.reset_index()
            result = df2.to_dict('records')





            share|improve this answer






















            • Thanks for the Pandas suggestion, and I agree, pandas is well suited to this. However, this is being done in a web app/API and I've found in the past that loading pandas in django causes some issues with response time, etc.. and would prefer to keep things vanilla python.
              – MarkD
              1 hour ago










            • @MarkD: well loading pandas the first time indeed takes some time, but after that, it can frequently take less time, since a DataFrame is not implemented in Python, but in C.
              – Willem Van Onsem
              1 hour ago










            • Are you just getting lucky by producing the correct output using strings or am I missing where you convert them to dates? I'm a bit confused. :)
              – timgeb
              1 hour ago










            • @timgeb: if the strings are formatted '%Y-%m-%d', then the lexicographical order is the same as the "date order". Since you can see a date as a three "digit" number (here digit is a bit a wrong word), where the digits have different radices. Only at the end of the "day" range, it increments the month range, and performs a wrap-around. So it works like a positional number system.
              – Willem Van Onsem
              1 hour ago











            • Ah, true. Probably still useful for future readers to mention that this works only for the exact format OP is using. I can't decide whether this is clever or dirty. Probably both. :)
              – timgeb
              1 hour ago















            up vote
            1
            down vote













            You can use pandas here, which is some sort of "database interface" for data:



            import pandas as pd

            df = pd.DataFrame(my_list)
            df2 = df.groupby(['Equip', 'Job']).agg(['min', 'max']).rename(columns='min': 'First', 'max': 'Last')
            df2.columns = df2.columns.droplevel()
            df2 = df2.reset_index()
            result = df2.to_dict('records')


            for the given sample input, this gives:



            >>> df2.to_dict('records')
            ['Equip': 'A-1', 'Job': 'Job 1', 'First': '2018-01-01', 'Last': '2018-01-03',
            'Equip': 'A-1', 'Job': 'Job 2', 'First': '2018-01-04', 'Last': '2018-01-05',
            'Equip': 'A-2', 'Job': 'Job 1', 'First': '2018-01-03', 'Last': '2018-01-03',
            'Equip': 'A-2', 'Job': 'Job 3', 'First': '2018-01-04', 'Last': '2018-01-05']


            In case the date format is not '%Y-%m-%d', then one first needs to convert it with pd.to_datetime(..) like:



            import pandas as pd

            df = pd.DataFrame(my_list)
            df['Date'] = pd.to_datetime(df['Date'])
            df2 = df.groupby(['Equip', 'Job']).agg(['min', 'max']).rename(columns='min': 'First', 'max': 'Last')
            df2.columns = df2.columns.droplevel()
            df2 = df2.reset_index()
            result = df2.to_dict('records')





            share|improve this answer






















            • Thanks for the Pandas suggestion, and I agree, pandas is well suited to this. However, this is being done in a web app/API and I've found in the past that loading pandas in django causes some issues with response time, etc.. and would prefer to keep things vanilla python.
              – MarkD
              1 hour ago










            • @MarkD: well loading pandas the first time indeed takes some time, but after that, it can frequently take less time, since a DataFrame is not implemented in Python, but in C.
              – Willem Van Onsem
              1 hour ago










            • Are you just getting lucky by producing the correct output using strings or am I missing where you convert them to dates? I'm a bit confused. :)
              – timgeb
              1 hour ago










            • @timgeb: if the strings are formatted '%Y-%m-%d', then the lexicographical order is the same as the "date order". Since you can see a date as a three "digit" number (here digit is a bit a wrong word), where the digits have different radices. Only at the end of the "day" range, it increments the month range, and performs a wrap-around. So it works like a positional number system.
              – Willem Van Onsem
              1 hour ago











            • Ah, true. Probably still useful for future readers to mention that this works only for the exact format OP is using. I can't decide whether this is clever or dirty. Probably both. :)
              – timgeb
              1 hour ago













            up vote
            1
            down vote










            up vote
            1
            down vote









            You can use pandas here, which is some sort of "database interface" for data:



            import pandas as pd

            df = pd.DataFrame(my_list)
            df2 = df.groupby(['Equip', 'Job']).agg(['min', 'max']).rename(columns='min': 'First', 'max': 'Last')
            df2.columns = df2.columns.droplevel()
            df2 = df2.reset_index()
            result = df2.to_dict('records')


            for the given sample input, this gives:



            >>> df2.to_dict('records')
            ['Equip': 'A-1', 'Job': 'Job 1', 'First': '2018-01-01', 'Last': '2018-01-03',
            'Equip': 'A-1', 'Job': 'Job 2', 'First': '2018-01-04', 'Last': '2018-01-05',
            'Equip': 'A-2', 'Job': 'Job 1', 'First': '2018-01-03', 'Last': '2018-01-03',
            'Equip': 'A-2', 'Job': 'Job 3', 'First': '2018-01-04', 'Last': '2018-01-05']


            In case the date format is not '%Y-%m-%d', then one first needs to convert it with pd.to_datetime(..) like:



            import pandas as pd

            df = pd.DataFrame(my_list)
            df['Date'] = pd.to_datetime(df['Date'])
            df2 = df.groupby(['Equip', 'Job']).agg(['min', 'max']).rename(columns='min': 'First', 'max': 'Last')
            df2.columns = df2.columns.droplevel()
            df2 = df2.reset_index()
            result = df2.to_dict('records')





            share|improve this answer














            You can use pandas here, which is some sort of "database interface" for data:



            import pandas as pd

            df = pd.DataFrame(my_list)
            df2 = df.groupby(['Equip', 'Job']).agg(['min', 'max']).rename(columns='min': 'First', 'max': 'Last')
            df2.columns = df2.columns.droplevel()
            df2 = df2.reset_index()
            result = df2.to_dict('records')


            for the given sample input, this gives:



            >>> df2.to_dict('records')
            ['Equip': 'A-1', 'Job': 'Job 1', 'First': '2018-01-01', 'Last': '2018-01-03',
            'Equip': 'A-1', 'Job': 'Job 2', 'First': '2018-01-04', 'Last': '2018-01-05',
            'Equip': 'A-2', 'Job': 'Job 1', 'First': '2018-01-03', 'Last': '2018-01-03',
            'Equip': 'A-2', 'Job': 'Job 3', 'First': '2018-01-04', 'Last': '2018-01-05']


            In case the date format is not '%Y-%m-%d', then one first needs to convert it with pd.to_datetime(..) like:



            import pandas as pd

            df = pd.DataFrame(my_list)
            df['Date'] = pd.to_datetime(df['Date'])
            df2 = df.groupby(['Equip', 'Job']).agg(['min', 'max']).rename(columns='min': 'First', 'max': 'Last')
            df2.columns = df2.columns.droplevel()
            df2 = df2.reset_index()
            result = df2.to_dict('records')






            share|improve this answer














            share|improve this answer



            share|improve this answer








            edited 1 hour ago

























            answered 1 hour ago









            Willem Van Onsem

            136k16128218




            136k16128218











            • Thanks for the Pandas suggestion, and I agree, pandas is well suited to this. However, this is being done in a web app/API and I've found in the past that loading pandas in django causes some issues with response time, etc.. and would prefer to keep things vanilla python.
              – MarkD
              1 hour ago










            • @MarkD: well loading pandas the first time indeed takes some time, but after that, it can frequently take less time, since a DataFrame is not implemented in Python, but in C.
              – Willem Van Onsem
              1 hour ago










            • Are you just getting lucky by producing the correct output using strings or am I missing where you convert them to dates? I'm a bit confused. :)
              – timgeb
              1 hour ago










            • @timgeb: if the strings are formatted '%Y-%m-%d', then the lexicographical order is the same as the "date order". Since you can see a date as a three "digit" number (here digit is a bit a wrong word), where the digits have different radices. Only at the end of the "day" range, it increments the month range, and performs a wrap-around. So it works like a positional number system.
              – Willem Van Onsem
              1 hour ago











            • Ah, true. Probably still useful for future readers to mention that this works only for the exact format OP is using. I can't decide whether this is clever or dirty. Probably both. :)
              – timgeb
              1 hour ago

















            • Thanks for the Pandas suggestion, and I agree, pandas is well suited to this. However, this is being done in a web app/API and I've found in the past that loading pandas in django causes some issues with response time, etc.. and would prefer to keep things vanilla python.
              – MarkD
              1 hour ago










            • @MarkD: well loading pandas the first time indeed takes some time, but after that, it can frequently take less time, since a DataFrame is not implemented in Python, but in C.
              – Willem Van Onsem
              1 hour ago










            • Are you just getting lucky by producing the correct output using strings or am I missing where you convert them to dates? I'm a bit confused. :)
              – timgeb
              1 hour ago










            • @timgeb: if the strings are formatted '%Y-%m-%d', then the lexicographical order is the same as the "date order". Since you can see a date as a three "digit" number (here digit is a bit a wrong word), where the digits have different radices. Only at the end of the "day" range, it increments the month range, and performs a wrap-around. So it works like a positional number system.
              – Willem Van Onsem
              1 hour ago











            • Ah, true. Probably still useful for future readers to mention that this works only for the exact format OP is using. I can't decide whether this is clever or dirty. Probably both. :)
              – timgeb
              1 hour ago
















            Thanks for the Pandas suggestion, and I agree, pandas is well suited to this. However, this is being done in a web app/API and I've found in the past that loading pandas in django causes some issues with response time, etc.. and would prefer to keep things vanilla python.
            – MarkD
            1 hour ago




            Thanks for the Pandas suggestion, and I agree, pandas is well suited to this. However, this is being done in a web app/API and I've found in the past that loading pandas in django causes some issues with response time, etc.. and would prefer to keep things vanilla python.
            – MarkD
            1 hour ago












            @MarkD: well loading pandas the first time indeed takes some time, but after that, it can frequently take less time, since a DataFrame is not implemented in Python, but in C.
            – Willem Van Onsem
            1 hour ago




            @MarkD: well loading pandas the first time indeed takes some time, but after that, it can frequently take less time, since a DataFrame is not implemented in Python, but in C.
            – Willem Van Onsem
            1 hour ago












            Are you just getting lucky by producing the correct output using strings or am I missing where you convert them to dates? I'm a bit confused. :)
            – timgeb
            1 hour ago




            Are you just getting lucky by producing the correct output using strings or am I missing where you convert them to dates? I'm a bit confused. :)
            – timgeb
            1 hour ago












            @timgeb: if the strings are formatted '%Y-%m-%d', then the lexicographical order is the same as the "date order". Since you can see a date as a three "digit" number (here digit is a bit a wrong word), where the digits have different radices. Only at the end of the "day" range, it increments the month range, and performs a wrap-around. So it works like a positional number system.
            – Willem Van Onsem
            1 hour ago





            @timgeb: if the strings are formatted '%Y-%m-%d', then the lexicographical order is the same as the "date order". Since you can see a date as a three "digit" number (here digit is a bit a wrong word), where the digits have different radices. Only at the end of the "day" range, it increments the month range, and performs a wrap-around. So it works like a positional number system.
            – Willem Van Onsem
            1 hour ago













            Ah, true. Probably still useful for future readers to mention that this works only for the exact format OP is using. I can't decide whether this is clever or dirty. Probably both. :)
            – timgeb
            1 hour ago





            Ah, true. Probably still useful for future readers to mention that this works only for the exact format OP is using. I can't decide whether this is clever or dirty. Probably both. :)
            – timgeb
            1 hour ago


















             

            draft saved


            draft discarded















































             


            draft saved


            draft discarded














            StackExchange.ready(
            function ()
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53144837%2fpythonic-way-of-collapsing-grouping-a-list-to-aggregating-max-min%23new-answer', 'question_page');

            );

            Post as a guest













































































            Comments

            Popular posts from this blog

            Long meetings (6-7 hours a day): Being “babysat” by supervisor

            Is the Concept of Multiple Fantasy Races Scientifically Flawed? [closed]

            Confectionery