pandas - how to get last n groups of a groupby object and combine them as a dataframe
How do I get the last n groups after df.groupby() and combine them into a single DataFrame?

data = pd.read_sql_query(sql=sqlstr, con=sql_conn, index_col='SampleTime')
grouped = data.groupby(data.index.date, sort=False)

grouped.ngroups reports 277 groups in total. I want to combine the last 12 groups into one DataFrame.

Tags: python, pandas, pandas-groupby
asked by stockade; edited by jpp
5 Answers
Accepted answer (score 3), by jpp
Pandas GroupBy objects are iterable. To extract the last n elements of an iterable, there is generally no need to build a full list and slice off its last n elements; that is memory-expensive. Instead, you can use collections.deque with maxlen specified, then use pd.concat to concatenate the resulting sequence of DataFrames.
from collections import deque
from operator import itemgetter

grouped = data.groupby(data.index.date, sort=False)
# itemgetter(1) drops the group keys; maxlen=12 retains only the last 12 frames
res = pd.concat(deque(map(itemgetter(1), grouped), maxlen=12))
As described in the collections docs:

    Once a bounded length deque is full, when new items are added, a corresponding number of items are discarded from the opposite end. ... They are also useful for tracking transactions and other pools of data where only the most recent activity is of interest.
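A plain-Python sketch of the bounded-deque behaviour the docs describe, using integers 0-276 to stand in for the 277 group labels (the numbers are illustrative, not from the question's data):

```python
from collections import deque

# Feed 277 items through a deque bounded to 12: only the last 12 survive,
# and earlier items are discarded from the left as new ones arrive.
d = deque(range(277), maxlen=12)
print(list(d))  # [265, 266, ..., 276]
```

The same mechanism is what makes the accepted answer memory-friendly: at no point do more than 12 group frames live in the deque.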
in your example where can it be seen that you recover the last 5? – Yuca
Great catch using deques, but you are still iterating over all groups. So the advantage is to save memory in this case, am I right? Good catch anyway – RafaelC
@RafaelC, yes, the advantage is principally lower memory usage. You can't avoid iterating over all groups. – jpp
@Yuca, maxlen=12 here. – jpp
Answer (score 2), by RafaelC
Assuming you know the iteration order of grouped:

grouped = zip(*data.groupby(data.index.date, sort=False))
pd.concat(list(grouped)[1][-12:])
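The zip(*...) trick works because iterating a GroupBy yields (key, sub-frame) pairs; transposing them produces one tuple of keys and one tuple of frames, and the [-12:] slice then takes the last frames. A minimal sketch with plain tuples standing in for the pairs (the dates and frame names are made up):

```python
# Iterating a GroupBy yields (key, frame) pairs; zip(*pairs) transposes them
# into a tuple of keys and a tuple of frames.
pairs = [('2018-10-01', 'frame_a'), ('2018-10-02', 'frame_b'), ('2018-10-03', 'frame_c')]
keys, frames = zip(*pairs)
print(keys)         # ('2018-10-01', '2018-10-02', '2018-10-03')
print(frames[-2:])  # the last two "frames": ('frame_b', 'frame_c')
```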
Answer (score 0), by rahlf23
You could pass a list comprehension to pd.concat():
import pandas as pd
df = pd.DataFrame([
['A',1,2],
['A',7,6],
['B',1,3],
['B',9,9],
['C',1,8],
['A',4,3],
['C',7,6],
['D',4,2]],
columns=['Var','Val1','Val2'])
last_n = 2
grouped = df.groupby('Var')
pd.concat([grouped.get_group(group) for i, group in enumerate(grouped.groups)
           if i >= len(grouped) - last_n])
Yields:
Var Val1 Val2
4 C 1 8
6 C 7 6
7 D 4 2
Answer (score 0), by Yuca
Use pd.concat on a list comprehension with groupby.get_group:
pd.concat([grouped.get_group(x) for x in list(grouped.groups.keys())[-12:]])
I think tail will return the last 12 entries of each group. Unless I misunderstood OP, I don't think that's what is desired... – sacul
sounds right. Re-thinking answer then – Yuca
I just tried this on a dataframe I was working on, and this seemed to be what OP asked for? ed: the concat didn't work, but .tail(12) returned the final 12 groups – Mathew Savage
new version should be aligned to what OP wants :) (although it doesn't provide much vs rahlf23's version) – Yuca
Answer (score 0), by ALollz
You can use ngroup to subset the original DataFrame and find the last 12 groups:
import numpy as np
dates = df.index.date
df[df.groupby(dates, sort=False).ngroup() >= len(np.unique(dates)) - 12]
Sample Data
import pandas as pd
df = pd.DataFrame({'dates': pd.date_range('2013-01-01', '2014-01-01', freq='12H'),
                   'vals': np.random.randint(1, 12, 731)})
df = df.set_index('dates')
Output:
vals
dates
2013-12-21 00:00:00 5
2013-12-21 12:00:00 8
2013-12-22 00:00:00 3
2013-12-22 12:00:00 8
2013-12-23 00:00:00 2
...
2013-12-31 12:00:00 2
2014-01-01 00:00:00 5
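ngroup labels each row with the ordinal of its group (0 for the first group encountered, 1 for the next, and so on), so the boolean mask keeps rows whose group ordinal falls among the last 12. The same labelling logic, sketched in plain Python with made-up keys and a window of 2 instead of 12:

```python
# Assign each key the ordinal of its group in order of first appearance
# (mimicking GroupBy.ngroup with sort=False), then keep rows whose group
# ordinal is among the last `n` groups.
keys = ['a', 'a', 'b', 'c', 'c', 'd']
n = 2

order = {}
for k in keys:
    order.setdefault(k, len(order))   # first-appearance ordinal per key

labels = [order[k] for k in keys]     # [0, 0, 1, 2, 2, 3]
kept = [k for k, g in zip(keys, labels) if g >= len(order) - n]
print(kept)  # ['c', 'c', 'd']
```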