Pythonic way of collapsing/grouping a list to aggregating max/min

up vote
6
down vote

favorite

Lets say I have the following list in python. It is ordered first by Equip, then by Date:

my_list = [
 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-01',
 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-02',
 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-03',
 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-04',
 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-05',
 'Equip': 'A-2', 'Job': 'Job 1', 'Date': '2018-01-03',
 'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-04',
 'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-05'
]

What I want to do is collapse the list by each set where a given piece of Equipment's job does not change, and grab the first and last date the equipment was there. E.g., this simple example should change to:

list_by_job = [
 'Equip': 'A-1', 'Job': 'Job 1', 'First': '2018-01-01', 'Last': '2018-01-03',
 'Equip': 'A-1', 'Job': 'Job 2', 'First': '2018-01-04', 'Last': '2018-01-05',
 'Equip': 'A-2', 'Job': 'Job 1', 'First': '2018-01-03', 'Last': '2018-01-03',
 'Equip': 'A-2', 'Job': 'Job 3', 'First': '2018-01-04', 'Last': '2018-01-05'
]

A couple of things to note:

A-2 on Job 1 is only there for a single day, thus its First and Last Date should be the same.

A piece of equipment could be on a job, leave that job, and come back. In this case, I'd need to see an entry for each time it was on the job, not just one single summary.

As stated before, the list is already sorted first by Equip, then by Date, so that ordering can be assumed. (If there is a better way to sort to accomplish this, I am all ears)

For point 3, the list

my_list = [
 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-01',
 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-02',
 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-03'
]

should yield

 list_by_job = [
 'Equip': 'A-1', 'Job': 'Job 1', 'First': '2018-01-01', 'Last': '2018-01-01',
 'Equip': 'A-2', 'Job': 'Job 2', 'First': '2018-01-02', 'Last': '2018-01-02',
 'Equip': 'A-1', 'Job': 'Job 1', 'First': '2018-01-03', 'Last': '2018-01-03'
 ]

Currently I am doing so in a simple loop/non-pythonic way:

list_by_job = 

last_entry = None
for entry in my_list:
 if last_entry is None or last_entry['Equip'] != entry['Equip'] or last_entry['Job'] != entry['Job']:
 list_by_job.append('Equip': entry['Equip'], 'Job': entry['Job'], 'First': entry['Date'], 'Last': entry['Date'])
 else:
 list_by_job[-1]['Last'] = entry['Date']
 last_entry = entry

Is there a more pythonic way to do this using Python's list comprehension, etc?

edited 1 hour ago

asked 2 hours ago

MarkD

2,47611833

1

Looks more like a job for pandas.
â€“Â Willem Van Onsem
2 hours ago

add a commentÂ |Â

up vote
6
down vote

favorite

Lets say I have the following list in python. It is ordered first by Equip, then by Date:

my_list = [
 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-01',
 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-02',
 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-03',
 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-04',
 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-05',
 'Equip': 'A-2', 'Job': 'Job 1', 'Date': '2018-01-03',
 'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-04',
 'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-05'
]

list_by_job = [
 'Equip': 'A-1', 'Job': 'Job 1', 'First': '2018-01-01', 'Last': '2018-01-03',
 'Equip': 'A-1', 'Job': 'Job 2', 'First': '2018-01-04', 'Last': '2018-01-05',
 'Equip': 'A-2', 'Job': 'Job 1', 'First': '2018-01-03', 'Last': '2018-01-03',
 'Equip': 'A-2', 'Job': 'Job 3', 'First': '2018-01-04', 'Last': '2018-01-05'
]

A couple of things to note:

A-2 on Job 1 is only there for a single day, thus its First and Last Date should be the same.

A piece of equipment could be on a job, leave that job, and come back. In this case, I'd need to see an entry for each time it was on the job, not just one single summary.

As stated before, the list is already sorted first by Equip, then by Date, so that ordering can be assumed. (If there is a better way to sort to accomplish this, I am all ears)

For point 3, the list

my_list = [
 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-01',
 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-02',
 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-03'
]

should yield

 list_by_job = [
 'Equip': 'A-1', 'Job': 'Job 1', 'First': '2018-01-01', 'Last': '2018-01-01',
 'Equip': 'A-2', 'Job': 'Job 2', 'First': '2018-01-02', 'Last': '2018-01-02',
 'Equip': 'A-1', 'Job': 'Job 1', 'First': '2018-01-03', 'Last': '2018-01-03'
 ]

Currently I am doing so in a simple loop/non-pythonic way:

list_by_job = 

last_entry = None
for entry in my_list:
 if last_entry is None or last_entry['Equip'] != entry['Equip'] or last_entry['Job'] != entry['Job']:
 list_by_job.append('Equip': entry['Equip'], 'Job': entry['Job'], 'First': entry['Date'], 'Last': entry['Date'])
 else:
 list_by_job[-1]['Last'] = entry['Date']
 last_entry = entry

Is there a more pythonic way to do this using Python's list comprehension, etc?

edited 1 hour ago

asked 2 hours ago

MarkD

2,47611833

1

Looks more like a job for pandas.
â€“Â Willem Van Onsem
2 hours ago

add a commentÂ |Â

up vote
6
down vote

favorite

Lets say I have the following list in python. It is ordered first by Equip, then by Date:

my_list = [
 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-01',
 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-02',
 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-03',
 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-04',
 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-05',
 'Equip': 'A-2', 'Job': 'Job 1', 'Date': '2018-01-03',
 'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-04',
 'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-05'
]

list_by_job = [
 'Equip': 'A-1', 'Job': 'Job 1', 'First': '2018-01-01', 'Last': '2018-01-03',
 'Equip': 'A-1', 'Job': 'Job 2', 'First': '2018-01-04', 'Last': '2018-01-05',
 'Equip': 'A-2', 'Job': 'Job 1', 'First': '2018-01-03', 'Last': '2018-01-03',
 'Equip': 'A-2', 'Job': 'Job 3', 'First': '2018-01-04', 'Last': '2018-01-05'
]

A couple of things to note:

A-2 on Job 1 is only there for a single day, thus its First and Last Date should be the same.

A piece of equipment could be on a job, leave that job, and come back. In this case, I'd need to see an entry for each time it was on the job, not just one single summary.

As stated before, the list is already sorted first by Equip, then by Date, so that ordering can be assumed. (If there is a better way to sort to accomplish this, I am all ears)

For point 3, the list

my_list = [
 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-01',
 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-02',
 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-03'
]

should yield

 list_by_job = [
 'Equip': 'A-1', 'Job': 'Job 1', 'First': '2018-01-01', 'Last': '2018-01-01',
 'Equip': 'A-2', 'Job': 'Job 2', 'First': '2018-01-02', 'Last': '2018-01-02',
 'Equip': 'A-1', 'Job': 'Job 1', 'First': '2018-01-03', 'Last': '2018-01-03'
 ]

Currently I am doing so in a simple loop/non-pythonic way:

list_by_job = 

last_entry = None
for entry in my_list:
 if last_entry is None or last_entry['Equip'] != entry['Equip'] or last_entry['Job'] != entry['Job']:
 list_by_job.append('Equip': entry['Equip'], 'Job': entry['Job'], 'First': entry['Date'], 'Last': entry['Date'])
 else:
 list_by_job[-1]['Last'] = entry['Date']
 last_entry = entry

Is there a more pythonic way to do this using Python's list comprehension, etc?

edited 1 hour ago

asked 2 hours ago

MarkD

2,47611833

Lets say I have the following list in python. It is ordered first by Equip, then by Date:

my_list = [
 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-01',
 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-02',
 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-03',
 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-04',
 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-05',
 'Equip': 'A-2', 'Job': 'Job 1', 'Date': '2018-01-03',
 'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-04',
 'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-05'
]

list_by_job = [
 'Equip': 'A-1', 'Job': 'Job 1', 'First': '2018-01-01', 'Last': '2018-01-03',
 'Equip': 'A-1', 'Job': 'Job 2', 'First': '2018-01-04', 'Last': '2018-01-05',
 'Equip': 'A-2', 'Job': 'Job 1', 'First': '2018-01-03', 'Last': '2018-01-03',
 'Equip': 'A-2', 'Job': 'Job 3', 'First': '2018-01-04', 'Last': '2018-01-05'
]

A couple of things to note:

A-2 on Job 1 is only there for a single day, thus its First and Last Date should be the same.

A piece of equipment could be on a job, leave that job, and come back. In this case, I'd need to see an entry for each time it was on the job, not just one single summary.

As stated before, the list is already sorted first by Equip, then by Date, so that ordering can be assumed. (If there is a better way to sort to accomplish this, I am all ears)

For point 3, the list

my_list = [
 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-01',
 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-02',
 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-03'
]

should yield

 list_by_job = [
 'Equip': 'A-1', 'Job': 'Job 1', 'First': '2018-01-01', 'Last': '2018-01-01',
 'Equip': 'A-2', 'Job': 'Job 2', 'First': '2018-01-02', 'Last': '2018-01-02',
 'Equip': 'A-1', 'Job': 'Job 1', 'First': '2018-01-03', 'Last': '2018-01-03'
 ]

Currently I am doing so in a simple loop/non-pythonic way:

list_by_job = 

last_entry = None
for entry in my_list:
 if last_entry is None or last_entry['Equip'] != entry['Equip'] or last_entry['Job'] != entry['Job']:
 list_by_job.append('Equip': entry['Equip'], 'Job': entry['Job'], 'First': entry['Date'], 'Last': entry['Date'])
 else:
 list_by_job[-1]['Last'] = entry['Date']
 last_entry = entry

Is there a more pythonic way to do this using Python's list comprehension, etc?

python list aggregate list-comprehension

edited 1 hour ago

asked 2 hours ago

MarkD

2,47611833

edited 1 hour ago

asked 2 hours ago

MarkD

2,47611833

edited 1 hour ago

asked 2 hours ago

MarkD

2,47611833

asked 2 hours ago

MarkD

2,47611833

asked 2 hours ago

MarkD

2,47611833

1

Looks more like a job for pandas.
â€“Â Willem Van Onsem
2 hours ago

add a commentÂ |Â

1

Looks more like a job for pandas.
â€“Â Willem Van Onsem
2 hours ago

Looks more like a job for pandas.
â€“Â Willem Van Onsem
2 hours ago

add a commentÂ |Â

3 Answers
3

active

oldest

votes

up vote
6
down vote

accepted

You can use itertools.groupby:

import itertools
def _key(d):
 return (d['Equip'], d['Job'])

my_list = ['Date': '2018-01-01', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-02', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-03', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-04', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-05', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-03', 'Equip': 'A-2', 'Job': 'Job 1', 'Date': '2018-01-04', 'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-05', 'Equip': 'A-2', 'Job': 'Job 3']
new_data = [[a, list(b)] for a, b in itertools.groupby(my_list, key=_key)]
final_result = ["Equip":c, 'Job':d, 'First':b[0]['Date'], 'Last':b[-1]['Date'] for [c, d], b in new_data]

Output:

['Equip': 'A-1', 'Job': 'Job 1', 'Last': '2018-01-03', 'First': '2018-01-01', 
 'Equip': 'A-1', 'Job': 'Job 2', 'Last': '2018-01-05', 'First': '2018-01-04', 
 'Equip': 'A-2', 'Job': 'Job 1', 'Last': '2018-01-03', 'First': '2018-01-03', 
 'Equip': 'A-2', 'Job': 'Job 3', 'Last': '2018-01-05', 'First': '2018-01-04']

Edit:

Using data as suggested in your comment:

my_list = ['Date': '2018-01-01', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-02', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-03', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-04', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-05', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-03', 'Equip': 'A-2', 'Job': 'Job 1', 'Date': '2018-01-04', 'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-05', 'Equip': 'A-2', 'Job': 'Job 3']

Output:

['Equip': 'A-1', 'Job': 'Job 1', 'Last': '2018-01-01', 'First': '2018-01-01', 
 'Equip': 'A-1', 'Job': 'Job 2', 'Last': '2018-01-02', 'First': '2018-01-02', 
 'Equip': 'A-1', 'Job': 'Job 1', 'Last': '2018-01-03', 'First': '2018-01-03', 
 'Equip': 'A-1', 'Job': 'Job 2', 'Last': '2018-01-05', 'First': '2018-01-04', 
 'Equip': 'A-2', 'Job': 'Job 1', 'Last': '2018-01-03', 'First': '2018-01-03', 
 'Equip': 'A-2', 'Job': 'Job 3', 'Last': '2018-01-05', 'First': '2018-01-04']

edited 1 hour ago

answered 2 hours ago

Ajax1234

37.8k42249

I think new_data should be sorted by date too third (and not just Equip and Job). itertools.groupby(sorted(my_list, key=lambda x: (x['Equip'], x['Job'], x['Date'])), key=_key)
â€“Â slider
2 hours ago

_key = itemgetter('Equip', 'Job') would be the built-in way to create that key function.
â€“Â schwobaseggl
2 hours ago

Thanks! It is almost there, but doesn't quite work for bullet-point 3 in my "couple of things to note". For example, if you switch the second A-1 Job to Job 2 (e.g.- the equipment went to job 1, then job 2, then back to job 1) there should be 2 entries for A-1 on Job 1 and 2 entries for Job 2, instead there is only one for each (that shows the absolute max/min on those jobs, instead of max/min per visit to job 1 and job 2. I'll update my example above to show what I mean.
â€“Â MarkD
2 hours ago

@MarkD Please see my recent edit.
â€“Â Ajax1234
1 hour ago

add a commentÂ |Â

up vote
3
down vote

I suggest using pandas for this.

itertools.groupby is cool but IMO a bit harder to comprehend.

>>> import pandas as pd
>>>
>>> my_list = [
...: 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-01',
...: 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-02',
...: 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-03',
...: 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-04',
...: 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-05',
...: 'Equip': 'A-2', 'Job': 'Job 1', 'Date': '2018-01-03',
...: 'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-04',
...: 'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-05'
...:]
>>>
>>> df = pd.DataFrame(my_list)
>>> df['Date'] = pd.to_datetime(df['Date'])
>>> groups = df.groupby(['Equip', 'Job']).agg('Date': [min, max]).reset_index() 
>>> groups.columns = ['Equip', 'Job', 'First', 'Last']
>>> groups
>>> 
 Equip Job First Last
0 A-1 Job 1 2018-01-01 2018-01-03
1 A-1 Job 2 2018-01-04 2018-01-05
2 A-2 Job 1 2018-01-03 2018-01-03
3 A-2 Job 3 2018-01-04 2018-01-05
>>>
>>> groups.to_dict(orient='records')
>>> 
['Equip': 'A-1',
 'First': Timestamp('2018-01-01 00:00:00'),
 'Job': 'Job 1',
 'Last': Timestamp('2018-01-03 00:00:00'),
 'Equip': 'A-1',
 'First': Timestamp('2018-01-04 00:00:00'),
 'Job': 'Job 2',
 'Last': Timestamp('2018-01-05 00:00:00'),
 'Equip': 'A-2',
 'First': Timestamp('2018-01-03 00:00:00'),
 'Job': 'Job 1',
 'Last': Timestamp('2018-01-03 00:00:00'),
 'Equip': 'A-2',
 'First': Timestamp('2018-01-04 00:00:00'),
 'Job': 'Job 3',
 'Last': Timestamp('2018-01-05 00:00:00')]

I suggest keeping the dates as time stamps.

answered 2 hours ago

timgeb

41.6k105581

add a commentÂ |Â

up vote
1
down vote

You can use pandas here, which is some sort of "database interface" for data:

import pandas as pd

df = pd.DataFrame(my_list)
df2 = df.groupby(['Equip', 'Job']).agg(['min', 'max']).rename(columns='min': 'First', 'max': 'Last')
df2.columns = df2.columns.droplevel()
df2 = df2.reset_index()
result = df2.to_dict('records')

for the given sample input, this gives:

>>> df2.to_dict('records')
['Equip': 'A-1', 'Job': 'Job 1', 'First': '2018-01-01', 'Last': '2018-01-03',
 'Equip': 'A-1', 'Job': 'Job 2', 'First': '2018-01-04', 'Last': '2018-01-05',
 'Equip': 'A-2', 'Job': 'Job 1', 'First': '2018-01-03', 'Last': '2018-01-03',
 'Equip': 'A-2', 'Job': 'Job 3', 'First': '2018-01-04', 'Last': '2018-01-05']

In case the date format is not '%Y-%m-%d', then one first needs to convert it with pd.to_datetime(..) like:

import pandas as pd

df = pd.DataFrame(my_list)
df['Date'] = pd.to_datetime(df['Date'])
df2 = df.groupby(['Equip', 'Job']).agg(['min', 'max']).rename(columns='min': 'First', 'max': 'Last')
df2.columns = df2.columns.droplevel()
df2 = df2.reset_index()
result = df2.to_dict('records')

edited 1 hour ago

answered 1 hour ago

Willem Van Onsem

136k16128218

Thanks for the Pandas suggestion, and I agree, pandas is well suited to this. However, this is being done in a web app/API and I've found in the past that loading pandas in django causes some issues with response time, etc.. and would prefer to keep things vanilla python.
â€“Â MarkD
1 hour ago

@MarkD: well loading pandas the first time indeed takes some time, but after that, it can frequently take less time, since a DataFrame is not implemented in Python, but in C.
â€“Â Willem Van Onsem
1 hour ago

Are you just getting lucky by producing the correct output using strings or am I missing where you convert them to dates? I'm a bit confused. :)
â€“Â timgeb
1 hour ago

@timgeb: if the strings are formatted '%Y-%m-%d', then the lexicographical order is the same as the "date order". Since you can see a date as a three "digit" number (here digit is a bit a wrong word), where the digits have different radices. Only at the end of the "day" range, it increments the month range, and performs a wrap-around. So it works like a positional number system.
â€“Â Willem Van Onsem
1 hour ago

Ah, true. Probably still useful for future readers to mention that this works only for the exact format OP is using. I can't decide whether this is clever or dirty. Probably both. :)
â€“Â timgeb
1 hour ago

Â |Â
show 2 more comments

Your Answer

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53144837%2fpythonic-way-of-collapsing-grouping-a-list-to-aggregating-max-min%23new-answer', 'question_page');

);

Post as a guest

Name

3 Answers
3

active

oldest

votes

3 Answers
3

active

oldest

votes

up vote
6
down vote

accepted

You can use itertools.groupby:

import itertools
def _key(d):
 return (d['Equip'], d['Job'])

my_list = ['Date': '2018-01-01', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-02', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-03', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-04', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-05', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-03', 'Equip': 'A-2', 'Job': 'Job 1', 'Date': '2018-01-04', 'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-05', 'Equip': 'A-2', 'Job': 'Job 3']
new_data = [[a, list(b)] for a, b in itertools.groupby(my_list, key=_key)]
final_result = ["Equip":c, 'Job':d, 'First':b[0]['Date'], 'Last':b[-1]['Date'] for [c, d], b in new_data]

Output:

['Equip': 'A-1', 'Job': 'Job 1', 'Last': '2018-01-03', 'First': '2018-01-01', 
 'Equip': 'A-1', 'Job': 'Job 2', 'Last': '2018-01-05', 'First': '2018-01-04', 
 'Equip': 'A-2', 'Job': 'Job 1', 'Last': '2018-01-03', 'First': '2018-01-03', 
 'Equip': 'A-2', 'Job': 'Job 3', 'Last': '2018-01-05', 'First': '2018-01-04']

Edit:

Using data as suggested in your comment:

my_list = ['Date': '2018-01-01', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-02', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-03', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-04', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-05', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-03', 'Equip': 'A-2', 'Job': 'Job 1', 'Date': '2018-01-04', 'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-05', 'Equip': 'A-2', 'Job': 'Job 3']

Output:

['Equip': 'A-1', 'Job': 'Job 1', 'Last': '2018-01-01', 'First': '2018-01-01', 
 'Equip': 'A-1', 'Job': 'Job 2', 'Last': '2018-01-02', 'First': '2018-01-02', 
 'Equip': 'A-1', 'Job': 'Job 1', 'Last': '2018-01-03', 'First': '2018-01-03', 
 'Equip': 'A-1', 'Job': 'Job 2', 'Last': '2018-01-05', 'First': '2018-01-04', 
 'Equip': 'A-2', 'Job': 'Job 1', 'Last': '2018-01-03', 'First': '2018-01-03', 
 'Equip': 'A-2', 'Job': 'Job 3', 'Last': '2018-01-05', 'First': '2018-01-04']

edited 1 hour ago

answered 2 hours ago

Ajax1234

37.8k42249

I think new_data should be sorted by date too third (and not just Equip and Job). itertools.groupby(sorted(my_list, key=lambda x: (x['Equip'], x['Job'], x['Date'])), key=_key)
â€“Â slider
2 hours ago

_key = itemgetter('Equip', 'Job') would be the built-in way to create that key function.
â€“Â schwobaseggl
2 hours ago

Thanks! It is almost there, but doesn't quite work for bullet-point 3 in my "couple of things to note". For example, if you switch the second A-1 Job to Job 2 (e.g.- the equipment went to job 1, then job 2, then back to job 1) there should be 2 entries for A-1 on Job 1 and 2 entries for Job 2, instead there is only one for each (that shows the absolute max/min on those jobs, instead of max/min per visit to job 1 and job 2. I'll update my example above to show what I mean.
â€“Â MarkD
2 hours ago

@MarkD Please see my recent edit.
â€“Â Ajax1234
1 hour ago

add a commentÂ |Â

up vote
6
down vote

accepted

You can use itertools.groupby:

import itertools
def _key(d):
 return (d['Equip'], d['Job'])

my_list = ['Date': '2018-01-01', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-02', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-03', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-04', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-05', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-03', 'Equip': 'A-2', 'Job': 'Job 1', 'Date': '2018-01-04', 'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-05', 'Equip': 'A-2', 'Job': 'Job 3']
new_data = [[a, list(b)] for a, b in itertools.groupby(my_list, key=_key)]
final_result = ["Equip":c, 'Job':d, 'First':b[0]['Date'], 'Last':b[-1]['Date'] for [c, d], b in new_data]

Output:

['Equip': 'A-1', 'Job': 'Job 1', 'Last': '2018-01-03', 'First': '2018-01-01', 
 'Equip': 'A-1', 'Job': 'Job 2', 'Last': '2018-01-05', 'First': '2018-01-04', 
 'Equip': 'A-2', 'Job': 'Job 1', 'Last': '2018-01-03', 'First': '2018-01-03', 
 'Equip': 'A-2', 'Job': 'Job 3', 'Last': '2018-01-05', 'First': '2018-01-04']

Edit:

Using data as suggested in your comment:

my_list = ['Date': '2018-01-01', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-02', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-03', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-04', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-05', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-03', 'Equip': 'A-2', 'Job': 'Job 1', 'Date': '2018-01-04', 'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-05', 'Equip': 'A-2', 'Job': 'Job 3']

Output:

['Equip': 'A-1', 'Job': 'Job 1', 'Last': '2018-01-01', 'First': '2018-01-01', 
 'Equip': 'A-1', 'Job': 'Job 2', 'Last': '2018-01-02', 'First': '2018-01-02', 
 'Equip': 'A-1', 'Job': 'Job 1', 'Last': '2018-01-03', 'First': '2018-01-03', 
 'Equip': 'A-1', 'Job': 'Job 2', 'Last': '2018-01-05', 'First': '2018-01-04', 
 'Equip': 'A-2', 'Job': 'Job 1', 'Last': '2018-01-03', 'First': '2018-01-03', 
 'Equip': 'A-2', 'Job': 'Job 3', 'Last': '2018-01-05', 'First': '2018-01-04']

edited 1 hour ago

answered 2 hours ago

Ajax1234

37.8k42249

I think new_data should be sorted by date too third (and not just Equip and Job). itertools.groupby(sorted(my_list, key=lambda x: (x['Equip'], x['Job'], x['Date'])), key=_key)
â€“Â slider
2 hours ago

_key = itemgetter('Equip', 'Job') would be the built-in way to create that key function.
â€“Â schwobaseggl
2 hours ago

Thanks! It is almost there, but doesn't quite work for bullet-point 3 in my "couple of things to note". For example, if you switch the second A-1 Job to Job 2 (e.g.- the equipment went to job 1, then job 2, then back to job 1) there should be 2 entries for A-1 on Job 1 and 2 entries for Job 2, instead there is only one for each (that shows the absolute max/min on those jobs, instead of max/min per visit to job 1 and job 2. I'll update my example above to show what I mean.
â€“Â MarkD
2 hours ago

@MarkD Please see my recent edit.
â€“Â Ajax1234
1 hour ago

add a commentÂ |Â

up vote
6
down vote

accepted

You can use itertools.groupby:

import itertools
def _key(d):
 return (d['Equip'], d['Job'])

my_list = ['Date': '2018-01-01', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-02', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-03', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-04', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-05', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-03', 'Equip': 'A-2', 'Job': 'Job 1', 'Date': '2018-01-04', 'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-05', 'Equip': 'A-2', 'Job': 'Job 3']
new_data = [[a, list(b)] for a, b in itertools.groupby(my_list, key=_key)]
final_result = ["Equip":c, 'Job':d, 'First':b[0]['Date'], 'Last':b[-1]['Date'] for [c, d], b in new_data]

Output:

['Equip': 'A-1', 'Job': 'Job 1', 'Last': '2018-01-03', 'First': '2018-01-01', 
 'Equip': 'A-1', 'Job': 'Job 2', 'Last': '2018-01-05', 'First': '2018-01-04', 
 'Equip': 'A-2', 'Job': 'Job 1', 'Last': '2018-01-03', 'First': '2018-01-03', 
 'Equip': 'A-2', 'Job': 'Job 3', 'Last': '2018-01-05', 'First': '2018-01-04']

Edit:

Using data as suggested in your comment:

my_list = ['Date': '2018-01-01', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-02', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-03', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-04', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-05', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-03', 'Equip': 'A-2', 'Job': 'Job 1', 'Date': '2018-01-04', 'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-05', 'Equip': 'A-2', 'Job': 'Job 3']

Output:

['Equip': 'A-1', 'Job': 'Job 1', 'Last': '2018-01-01', 'First': '2018-01-01', 
 'Equip': 'A-1', 'Job': 'Job 2', 'Last': '2018-01-02', 'First': '2018-01-02', 
 'Equip': 'A-1', 'Job': 'Job 1', 'Last': '2018-01-03', 'First': '2018-01-03', 
 'Equip': 'A-1', 'Job': 'Job 2', 'Last': '2018-01-05', 'First': '2018-01-04', 
 'Equip': 'A-2', 'Job': 'Job 1', 'Last': '2018-01-03', 'First': '2018-01-03', 
 'Equip': 'A-2', 'Job': 'Job 3', 'Last': '2018-01-05', 'First': '2018-01-04']

edited 1 hour ago

answered 2 hours ago

Ajax1234

37.8k42249

You can use itertools.groupby:

import itertools
def _key(d):
 return (d['Equip'], d['Job'])

my_list = ['Date': '2018-01-01', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-02', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-03', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-04', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-05', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-03', 'Equip': 'A-2', 'Job': 'Job 1', 'Date': '2018-01-04', 'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-05', 'Equip': 'A-2', 'Job': 'Job 3']
new_data = [[a, list(b)] for a, b in itertools.groupby(my_list, key=_key)]
final_result = ["Equip":c, 'Job':d, 'First':b[0]['Date'], 'Last':b[-1]['Date'] for [c, d], b in new_data]

Output:

['Equip': 'A-1', 'Job': 'Job 1', 'Last': '2018-01-03', 'First': '2018-01-01', 
 'Equip': 'A-1', 'Job': 'Job 2', 'Last': '2018-01-05', 'First': '2018-01-04', 
 'Equip': 'A-2', 'Job': 'Job 1', 'Last': '2018-01-03', 'First': '2018-01-03', 
 'Equip': 'A-2', 'Job': 'Job 3', 'Last': '2018-01-05', 'First': '2018-01-04']

Edit:

Using data as suggested in your comment:

my_list = ['Date': '2018-01-01', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-02', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-03', 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-04', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-05', 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-03', 'Equip': 'A-2', 'Job': 'Job 1', 'Date': '2018-01-04', 'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-05', 'Equip': 'A-2', 'Job': 'Job 3']

Output:

['Equip': 'A-1', 'Job': 'Job 1', 'Last': '2018-01-01', 'First': '2018-01-01', 
 'Equip': 'A-1', 'Job': 'Job 2', 'Last': '2018-01-02', 'First': '2018-01-02', 
 'Equip': 'A-1', 'Job': 'Job 1', 'Last': '2018-01-03', 'First': '2018-01-03', 
 'Equip': 'A-1', 'Job': 'Job 2', 'Last': '2018-01-05', 'First': '2018-01-04', 
 'Equip': 'A-2', 'Job': 'Job 1', 'Last': '2018-01-03', 'First': '2018-01-03', 
 'Equip': 'A-2', 'Job': 'Job 3', 'Last': '2018-01-05', 'First': '2018-01-04']

edited 1 hour ago

answered 2 hours ago

Ajax1234

37.8k42249

edited 1 hour ago

answered 2 hours ago

Ajax1234

37.8k42249

answered 2 hours ago

Ajax1234

37.8k42249

answered 2 hours ago

Ajax1234

37.8k42249

I think new_data should be sorted by date too third (and not just Equip and Job). itertools.groupby(sorted(my_list, key=lambda x: (x['Equip'], x['Job'], x['Date'])), key=_key)
â€“Â slider
2 hours ago

_key = itemgetter('Equip', 'Job') would be the built-in way to create that key function.
â€“Â schwobaseggl
2 hours ago

Thanks! It is almost there, but doesn't quite work for bullet-point 3 in my "couple of things to note". For example, if you switch the second A-1 Job to Job 2 (e.g.- the equipment went to job 1, then job 2, then back to job 1) there should be 2 entries for A-1 on Job 1 and 2 entries for Job 2, instead there is only one for each (that shows the absolute max/min on those jobs, instead of max/min per visit to job 1 and job 2. I'll update my example above to show what I mean.
â€“Â MarkD
2 hours ago

@MarkD Please see my recent edit.
â€“Â Ajax1234
1 hour ago

add a commentÂ |Â

I think new_data should be sorted by date too third (and not just Equip and Job). itertools.groupby(sorted(my_list, key=lambda x: (x['Equip'], x['Job'], x['Date'])), key=_key)
â€“Â slider
2 hours ago

_key = itemgetter('Equip', 'Job') would be the built-in way to create that key function.
â€“Â schwobaseggl
2 hours ago

Thanks! It is almost there, but doesn't quite work for bullet-point 3 in my "couple of things to note". For example, if you switch the second A-1 Job to Job 2 (e.g.- the equipment went to job 1, then job 2, then back to job 1) there should be 2 entries for A-1 on Job 1 and 2 entries for Job 2, instead there is only one for each (that shows the absolute max/min on those jobs, instead of max/min per visit to job 1 and job 2. I'll update my example above to show what I mean.
â€“Â MarkD
2 hours ago

@MarkD Please see my recent edit.
â€“Â Ajax1234
1 hour ago

I think new_data should be sorted by date too third (and not just Equip and Job). itertools.groupby(sorted(my_list, key=lambda x: (x['Equip'], x['Job'], x['Date'])), key=_key)
â€“Â slider
2 hours ago

_key = itemgetter('Equip', 'Job') would be the built-in way to create that key function.
â€“Â schwobaseggl
2 hours ago

Thanks! It is almost there, but doesn't quite work for bullet-point 3 in my "couple of things to note". For example, if you switch the second A-1 Job to Job 2 (e.g.- the equipment went to job 1, then job 2, then back to job 1) there should be 2 entries for A-1 on Job 1 and 2 entries for Job 2, instead there is only one for each (that shows the absolute max/min on those jobs, instead of max/min per visit to job 1 and job 2. I'll update my example above to show what I mean.
â€“Â MarkD
2 hours ago

@MarkD Please see my recent edit.
â€“Â Ajax1234
1 hour ago

add a commentÂ |Â

up vote
3
down vote

I suggest using pandas for this.

itertools.groupby is cool but IMO a bit harder to comprehend.

>>> import pandas as pd
>>>
>>> my_list = [
...: 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-01',
...: 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-02',
...: 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-03',
...: 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-04',
...: 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-05',
...: 'Equip': 'A-2', 'Job': 'Job 1', 'Date': '2018-01-03',
...: 'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-04',
...: 'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-05'
...:]
>>>
>>> df = pd.DataFrame(my_list)
>>> df['Date'] = pd.to_datetime(df['Date'])
>>> groups = df.groupby(['Equip', 'Job']).agg('Date': [min, max]).reset_index() 
>>> groups.columns = ['Equip', 'Job', 'First', 'Last']
>>> groups
>>> 
 Equip Job First Last
0 A-1 Job 1 2018-01-01 2018-01-03
1 A-1 Job 2 2018-01-04 2018-01-05
2 A-2 Job 1 2018-01-03 2018-01-03
3 A-2 Job 3 2018-01-04 2018-01-05
>>>
>>> groups.to_dict(orient='records')
>>> 
['Equip': 'A-1',
 'First': Timestamp('2018-01-01 00:00:00'),
 'Job': 'Job 1',
 'Last': Timestamp('2018-01-03 00:00:00'),
 'Equip': 'A-1',
 'First': Timestamp('2018-01-04 00:00:00'),
 'Job': 'Job 2',
 'Last': Timestamp('2018-01-05 00:00:00'),
 'Equip': 'A-2',
 'First': Timestamp('2018-01-03 00:00:00'),
 'Job': 'Job 1',
 'Last': Timestamp('2018-01-03 00:00:00'),
 'Equip': 'A-2',
 'First': Timestamp('2018-01-04 00:00:00'),
 'Job': 'Job 3',
 'Last': Timestamp('2018-01-05 00:00:00')]

I suggest keeping the dates as time stamps.

answered 2 hours ago

timgeb

41.6k105581

add a commentÂ |Â

up vote
3
down vote

I suggest using pandas for this.

itertools.groupby is cool but IMO a bit harder to comprehend.

>>> import pandas as pd
>>>
>>> my_list = [
...: 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-01',
...: 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-02',
...: 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-03',
...: 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-04',
...: 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-05',
...: 'Equip': 'A-2', 'Job': 'Job 1', 'Date': '2018-01-03',
...: 'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-04',
...: 'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-05'
...:]
>>>
>>> df = pd.DataFrame(my_list)
>>> df['Date'] = pd.to_datetime(df['Date'])
>>> groups = df.groupby(['Equip', 'Job']).agg('Date': [min, max]).reset_index() 
>>> groups.columns = ['Equip', 'Job', 'First', 'Last']
>>> groups
>>> 
 Equip Job First Last
0 A-1 Job 1 2018-01-01 2018-01-03
1 A-1 Job 2 2018-01-04 2018-01-05
2 A-2 Job 1 2018-01-03 2018-01-03
3 A-2 Job 3 2018-01-04 2018-01-05
>>>
>>> groups.to_dict(orient='records')
>>> 
['Equip': 'A-1',
 'First': Timestamp('2018-01-01 00:00:00'),
 'Job': 'Job 1',
 'Last': Timestamp('2018-01-03 00:00:00'),
 'Equip': 'A-1',
 'First': Timestamp('2018-01-04 00:00:00'),
 'Job': 'Job 2',
 'Last': Timestamp('2018-01-05 00:00:00'),
 'Equip': 'A-2',
 'First': Timestamp('2018-01-03 00:00:00'),
 'Job': 'Job 1',
 'Last': Timestamp('2018-01-03 00:00:00'),
 'Equip': 'A-2',
 'First': Timestamp('2018-01-04 00:00:00'),
 'Job': 'Job 3',
 'Last': Timestamp('2018-01-05 00:00:00')]

I suggest keeping the dates as time stamps.

answered 2 hours ago

timgeb

41.6k105581

add a commentÂ |Â

up vote
3
down vote

I suggest using pandas for this.

itertools.groupby is cool but IMO a bit harder to comprehend.

>>> import pandas as pd
>>>
>>> my_list = [
...: 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-01',
...: 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-02',
...: 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-03',
...: 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-04',
...: 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-05',
...: 'Equip': 'A-2', 'Job': 'Job 1', 'Date': '2018-01-03',
...: 'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-04',
...: 'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-05'
...:]
>>>
>>> df = pd.DataFrame(my_list)
>>> df['Date'] = pd.to_datetime(df['Date'])
>>> groups = df.groupby(['Equip', 'Job']).agg('Date': [min, max]).reset_index() 
>>> groups.columns = ['Equip', 'Job', 'First', 'Last']
>>> groups
>>> 
 Equip Job First Last
0 A-1 Job 1 2018-01-01 2018-01-03
1 A-1 Job 2 2018-01-04 2018-01-05
2 A-2 Job 1 2018-01-03 2018-01-03
3 A-2 Job 3 2018-01-04 2018-01-05
>>>
>>> groups.to_dict(orient='records')
>>> 
['Equip': 'A-1',
 'First': Timestamp('2018-01-01 00:00:00'),
 'Job': 'Job 1',
 'Last': Timestamp('2018-01-03 00:00:00'),
 'Equip': 'A-1',
 'First': Timestamp('2018-01-04 00:00:00'),
 'Job': 'Job 2',
 'Last': Timestamp('2018-01-05 00:00:00'),
 'Equip': 'A-2',
 'First': Timestamp('2018-01-03 00:00:00'),
 'Job': 'Job 1',
 'Last': Timestamp('2018-01-03 00:00:00'),
 'Equip': 'A-2',
 'First': Timestamp('2018-01-04 00:00:00'),
 'Job': 'Job 3',
 'Last': Timestamp('2018-01-05 00:00:00')]

I suggest keeping the dates as time stamps.

answered 2 hours ago

timgeb

41.6k105581

I suggest using pandas for this.

itertools.groupby is cool but IMO a bit harder to comprehend.

>>> import pandas as pd
>>>
>>> my_list = [
...: 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-01',
...: 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-02',
...: 'Equip': 'A-1', 'Job': 'Job 1', 'Date': '2018-01-03',
...: 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-04',
...: 'Equip': 'A-1', 'Job': 'Job 2', 'Date': '2018-01-05',
...: 'Equip': 'A-2', 'Job': 'Job 1', 'Date': '2018-01-03',
...: 'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-04',
...: 'Equip': 'A-2', 'Job': 'Job 3', 'Date': '2018-01-05'
...:]
>>>
>>> df = pd.DataFrame(my_list)
>>> df['Date'] = pd.to_datetime(df['Date'])
>>> groups = df.groupby(['Equip', 'Job']).agg('Date': [min, max]).reset_index() 
>>> groups.columns = ['Equip', 'Job', 'First', 'Last']
>>> groups
>>> 
 Equip Job First Last
0 A-1 Job 1 2018-01-01 2018-01-03
1 A-1 Job 2 2018-01-04 2018-01-05
2 A-2 Job 1 2018-01-03 2018-01-03
3 A-2 Job 3 2018-01-04 2018-01-05
>>>
>>> groups.to_dict(orient='records')
>>> 
['Equip': 'A-1',
 'First': Timestamp('2018-01-01 00:00:00'),
 'Job': 'Job 1',
 'Last': Timestamp('2018-01-03 00:00:00'),
 'Equip': 'A-1',
 'First': Timestamp('2018-01-04 00:00:00'),
 'Job': 'Job 2',
 'Last': Timestamp('2018-01-05 00:00:00'),
 'Equip': 'A-2',
 'First': Timestamp('2018-01-03 00:00:00'),
 'Job': 'Job 1',
 'Last': Timestamp('2018-01-03 00:00:00'),
 'Equip': 'A-2',
 'First': Timestamp('2018-01-04 00:00:00'),
 'Job': 'Job 3',
 'Last': Timestamp('2018-01-05 00:00:00')]

I suggest keeping the dates as time stamps.

answered 2 hours ago

timgeb

41.6k105581

answered 2 hours ago

timgeb

41.6k105581

answered 2 hours ago

timgeb

41.6k105581

answered 2 hours ago

timgeb

41.6k105581

add a commentÂ |Â

up vote
1
down vote

You can use pandas here, which is some sort of "database interface" for data:

import pandas as pd

df = pd.DataFrame(my_list)
df2 = df.groupby(['Equip', 'Job']).agg(['min', 'max']).rename(columns='min': 'First', 'max': 'Last')
df2.columns = df2.columns.droplevel()
df2 = df2.reset_index()
result = df2.to_dict('records')

for the given sample input, this gives:

>>> df2.to_dict('records')
['Equip': 'A-1', 'Job': 'Job 1', 'First': '2018-01-01', 'Last': '2018-01-03',
 'Equip': 'A-1', 'Job': 'Job 2', 'First': '2018-01-04', 'Last': '2018-01-05',
 'Equip': 'A-2', 'Job': 'Job 1', 'First': '2018-01-03', 'Last': '2018-01-03',
 'Equip': 'A-2', 'Job': 'Job 3', 'First': '2018-01-04', 'Last': '2018-01-05']

In case the date format is not '%Y-%m-%d', then one first needs to convert it with pd.to_datetime(..) like:

import pandas as pd

df = pd.DataFrame(my_list)
df['Date'] = pd.to_datetime(df['Date'])
df2 = df.groupby(['Equip', 'Job']).agg(['min', 'max']).rename(columns='min': 'First', 'max': 'Last')
df2.columns = df2.columns.droplevel()
df2 = df2.reset_index()
result = df2.to_dict('records')

edited 1 hour ago

answered 1 hour ago

Willem Van Onsem

136k16128218

Thanks for the Pandas suggestion, and I agree, pandas is well suited to this. However, this is being done in a web app/API and I've found in the past that loading pandas in django causes some issues with response time, etc.. and would prefer to keep things vanilla python.
â€“Â MarkD
1 hour ago

@MarkD: well loading pandas the first time indeed takes some time, but after that, it can frequently take less time, since a DataFrame is not implemented in Python, but in C.
â€“Â Willem Van Onsem
1 hour ago

Are you just getting lucky by producing the correct output using strings or am I missing where you convert them to dates? I'm a bit confused. :)
â€“Â timgeb
1 hour ago

@timgeb: if the strings are formatted '%Y-%m-%d', then the lexicographical order is the same as the "date order". Since you can see a date as a three "digit" number (here digit is a bit a wrong word), where the digits have different radices. Only at the end of the "day" range, it increments the month range, and performs a wrap-around. So it works like a positional number system.
â€“Â Willem Van Onsem
1 hour ago

Ah, true. Probably still useful for future readers to mention that this works only for the exact format OP is using. I can't decide whether this is clever or dirty. Probably both. :)
â€“Â timgeb
1 hour ago

Â |Â
show 2 more comments

up vote
1
down vote

You can use pandas here, which is some sort of "database interface" for data:

import pandas as pd

df = pd.DataFrame(my_list)
df2 = df.groupby(['Equip', 'Job']).agg(['min', 'max']).rename(columns='min': 'First', 'max': 'Last')
df2.columns = df2.columns.droplevel()
df2 = df2.reset_index()
result = df2.to_dict('records')

for the given sample input, this gives:

>>> df2.to_dict('records')
['Equip': 'A-1', 'Job': 'Job 1', 'First': '2018-01-01', 'Last': '2018-01-03',
 'Equip': 'A-1', 'Job': 'Job 2', 'First': '2018-01-04', 'Last': '2018-01-05',
 'Equip': 'A-2', 'Job': 'Job 1', 'First': '2018-01-03', 'Last': '2018-01-03',
 'Equip': 'A-2', 'Job': 'Job 3', 'First': '2018-01-04', 'Last': '2018-01-05']

In case the date format is not '%Y-%m-%d', then one first needs to convert it with pd.to_datetime(..) like:

import pandas as pd

df = pd.DataFrame(my_list)
df['Date'] = pd.to_datetime(df['Date'])
df2 = df.groupby(['Equip', 'Job']).agg(['min', 'max']).rename(columns='min': 'First', 'max': 'Last')
df2.columns = df2.columns.droplevel()
df2 = df2.reset_index()
result = df2.to_dict('records')

edited 1 hour ago

answered 1 hour ago

Willem Van Onsem

136k16128218

Thanks for the Pandas suggestion, and I agree, pandas is well suited to this. However, this is being done in a web app/API and I've found in the past that loading pandas in django causes some issues with response time, etc.. and would prefer to keep things vanilla python.
â€“Â MarkD
1 hour ago

@MarkD: well loading pandas the first time indeed takes some time, but after that, it can frequently take less time, since a DataFrame is not implemented in Python, but in C.
â€“Â Willem Van Onsem
1 hour ago

Are you just getting lucky by producing the correct output using strings or am I missing where you convert them to dates? I'm a bit confused. :)
â€“Â timgeb
1 hour ago

@timgeb: if the strings are formatted '%Y-%m-%d', then the lexicographical order is the same as the "date order". Since you can see a date as a three "digit" number (here digit is a bit a wrong word), where the digits have different radices. Only at the end of the "day" range, it increments the month range, and performs a wrap-around. So it works like a positional number system.
â€“Â Willem Van Onsem
1 hour ago

Ah, true. Probably still useful for future readers to mention that this works only for the exact format OP is using. I can't decide whether this is clever or dirty. Probably both. :)
â€“Â timgeb
1 hour ago

Â |Â
show 2 more comments

up vote
1
down vote

You can use pandas here, which is some sort of "database interface" for data:

import pandas as pd

df = pd.DataFrame(my_list)
df2 = df.groupby(['Equip', 'Job']).agg(['min', 'max']).rename(columns='min': 'First', 'max': 'Last')
df2.columns = df2.columns.droplevel()
df2 = df2.reset_index()
result = df2.to_dict('records')

for the given sample input, this gives:

>>> df2.to_dict('records')
['Equip': 'A-1', 'Job': 'Job 1', 'First': '2018-01-01', 'Last': '2018-01-03',
 'Equip': 'A-1', 'Job': 'Job 2', 'First': '2018-01-04', 'Last': '2018-01-05',
 'Equip': 'A-2', 'Job': 'Job 1', 'First': '2018-01-03', 'Last': '2018-01-03',
 'Equip': 'A-2', 'Job': 'Job 3', 'First': '2018-01-04', 'Last': '2018-01-05']

In case the date format is not '%Y-%m-%d', then one first needs to convert it with pd.to_datetime(..) like:

import pandas as pd

df = pd.DataFrame(my_list)
df['Date'] = pd.to_datetime(df['Date'])
df2 = df.groupby(['Equip', 'Job']).agg(['min', 'max']).rename(columns='min': 'First', 'max': 'Last')
df2.columns = df2.columns.droplevel()
df2 = df2.reset_index()
result = df2.to_dict('records')

edited 1 hour ago

answered 1 hour ago

Willem Van Onsem

136k16128218

You can use pandas here, which is some sort of "database interface" for data:

import pandas as pd

df = pd.DataFrame(my_list)
df2 = df.groupby(['Equip', 'Job']).agg(['min', 'max']).rename(columns='min': 'First', 'max': 'Last')
df2.columns = df2.columns.droplevel()
df2 = df2.reset_index()
result = df2.to_dict('records')

for the given sample input, this gives:

>>> df2.to_dict('records')
['Equip': 'A-1', 'Job': 'Job 1', 'First': '2018-01-01', 'Last': '2018-01-03',
 'Equip': 'A-1', 'Job': 'Job 2', 'First': '2018-01-04', 'Last': '2018-01-05',
 'Equip': 'A-2', 'Job': 'Job 1', 'First': '2018-01-03', 'Last': '2018-01-03',
 'Equip': 'A-2', 'Job': 'Job 3', 'First': '2018-01-04', 'Last': '2018-01-05']

In case the date format is not '%Y-%m-%d', then one first needs to convert it with pd.to_datetime(..) like:

import pandas as pd

df = pd.DataFrame(my_list)
df['Date'] = pd.to_datetime(df['Date'])
df2 = df.groupby(['Equip', 'Job']).agg(['min', 'max']).rename(columns='min': 'First', 'max': 'Last')
df2.columns = df2.columns.droplevel()
df2 = df2.reset_index()
result = df2.to_dict('records')

edited 1 hour ago

answered 1 hour ago

Willem Van Onsem

136k16128218

edited 1 hour ago

answered 1 hour ago

Willem Van Onsem

136k16128218

answered 1 hour ago

Willem Van Onsem

136k16128218

answered 1 hour ago

Willem Van Onsem

136k16128218

Thanks for the Pandas suggestion, and I agree, pandas is well suited to this. However, this is being done in a web app/API and I've found in the past that loading pandas in django causes some issues with response time, etc.. and would prefer to keep things vanilla python.
â€“Â MarkD
1 hour ago

@MarkD: well loading pandas the first time indeed takes some time, but after that, it can frequently take less time, since a DataFrame is not implemented in Python, but in C.
â€“Â Willem Van Onsem
1 hour ago

Are you just getting lucky by producing the correct output using strings or am I missing where you convert them to dates? I'm a bit confused. :)
â€“Â timgeb
1 hour ago

@timgeb: if the strings are formatted '%Y-%m-%d', then the lexicographical order is the same as the "date order". Since you can see a date as a three "digit" number (here digit is a bit a wrong word), where the digits have different radices. Only at the end of the "day" range, it increments the month range, and performs a wrap-around. So it works like a positional number system.
â€“Â Willem Van Onsem
1 hour ago

Ah, true. Probably still useful for future readers to mention that this works only for the exact format OP is using. I can't decide whether this is clever or dirty. Probably both. :)
â€“Â timgeb
1 hour ago

Â |Â
show 2 more comments

Thanks for the Pandas suggestion, and I agree, pandas is well suited to this. However, this is being done in a web app/API and I've found in the past that loading pandas in django causes some issues with response time, etc.. and would prefer to keep things vanilla python.
â€“Â MarkD
1 hour ago

@MarkD: well loading pandas the first time indeed takes some time, but after that, it can frequently take less time, since a DataFrame is not implemented in Python, but in C.
â€“Â Willem Van Onsem
1 hour ago

Are you just getting lucky by producing the correct output using strings or am I missing where you convert them to dates? I'm a bit confused. :)
â€“Â timgeb
1 hour ago

@timgeb: if the strings are formatted '%Y-%m-%d', then the lexicographical order is the same as the "date order". Since you can see a date as a three "digit" number (here digit is a bit a wrong word), where the digits have different radices. Only at the end of the "day" range, it increments the month range, and performs a wrap-around. So it works like a positional number system.
â€“Â Willem Van Onsem
1 hour ago

Ah, true. Probably still useful for future readers to mention that this works only for the exact format OP is using. I can't decide whether this is clever or dirty. Probably both. :)
â€“Â timgeb
1 hour ago

Thanks for the Pandas suggestion, and I agree, pandas is well suited to this. However, this is being done in a web app/API and I've found in the past that loading pandas in django causes some issues with response time, etc.. and would prefer to keep things vanilla python.
â€“Â MarkD
1 hour ago

@MarkD: well loading pandas the first time indeed takes some time, but after that, it can frequently take less time, since a DataFrame is not implemented in Python, but in C.
â€“Â Willem Van Onsem
1 hour ago

Are you just getting lucky by producing the correct output using strings or am I missing where you convert them to dates? I'm a bit confused. :)
â€“Â timgeb
1 hour ago

@timgeb: if the strings are formatted '%Y-%m-%d', then the lexicographical order is the same as the "date order". Since you can see a date as a three "digit" number (here digit is a bit a wrong word), where the digits have different radices. Only at the end of the "day" range, it increments the month range, and performs a wrap-around. So it works like a positional number system.
â€“Â Willem Van Onsem
1 hour ago

Ah, true. Probably still useful for future readers to mention that this works only for the exact format OP is using. I can't decide whether this is clever or dirty. Probably both. :)
â€“Â timgeb
1 hour ago

Â |Â
show 2 more comments

draft saved

draft discarded

draft saved

draft discarded

Post as a guest

Name

Search This Blog

Iyfjky