Select only first n elements in operator form?
Score: 10
Is there any way to make Select take a second argument (to select only the first n elements) while in operator form?
Performing the whole selection and then taking the first n is too slow. For example:
dataset = Dataset[Table[<|"a" -> RandomChoice[{1, 2}], "b" -> RandomReal[]|>, 1000000]];
Length@dataset[Select[#a == 1 & (*only want the first 1k found*)]]
AbsoluteTiming[dataset[Select[#a == 1 &]][;; 1000];]
AbsoluteTiming[Select[dataset, #a == 1 &, 1000];]
I don't want to put the dataset inside the Select as above; I'd like to use Dataset's single-bracket query notation.
Tags: dataset
asked Aug 15 at 21:05
M.R.
5 Answers
Score: 8 (accepted)
Alas, no, at least not as long as we wish to use Select as a descending Dataset/Query operator. The reason is that the query compiler will only recognize Select as a descending query operator when it has exactly one argument:
Needs["Dataset`"]
DescendingQ[Select[#==1&]]
(* True *)
Any attempt to curry other arguments will cause the expression to be interpreted as an ascending operator instead, as is the case for all functions that are not on the blessed list of descending operators:
op1 = Select[#, #==1&, 1000]&;
op2[c_, n_][l_] := Select[l, c, n]
op3[c_, n_] := Select[#, c, n] &
op4 = Curry[Select, {2, 3}];
op5 = Curry[{2, 3}][Select];
AscendingQ /@
{ op1
, op2[#==1&, 1000]
, op3[#==1&, 1000]
, op4[#==1&, 1000]
, op5[#==1&, 1000]
}
(* {True, True, True, True, True} *)
We can see the difference using a small dataset:
ds = Range[10] // Dataset;
Contrast the action of the descending version of Select...
ds[Select[# < 4&], # + 100&]
... with that of the ascending version:
ds[Select[#, # < 4 &, 3] &, # + 100 &]
In the descending version, the elements are first filtered and then added to 100. In the ascending version, the elements are first added to 100 and then filtered.
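To make the two evaluation orders concrete, here is a minimal sketch on an ordinary list, with no Dataset involved. The names list, descendingResult and ascendingResult are made up for illustration; the two expressions simply spell out by hand the order of operations described above.

```mathematica
list = Range[10];

(* descending order: filter the data first, then add 100 to what survived *)
descendingResult = Map[# + 100 &, Select[list, # < 4 &]]

(* ascending order: add 100 to every element first, then filter *)
ascendingResult = Select[Map[# + 100 &, list], # < 4 &, 3]
```

The first expression keeps 1, 2 and 3 and then shifts them. In the second, every element is already above 100 when the test runs, so nothing is selected.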
We can often work around this situation by issuing consecutive queries:
ds[Select[#, # < 4 &, 3]&][All, # + 100&]
or by using subqueries:
ds[Select[#, # < 4 &, 3] & /* Query[All, # + 100 &]]
(A pedantic note: the All operators in these last queries are not strictly necessary given the listability of Plus, but they illustrate the general principle.)
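The subquery workaround can also be packaged into a small helper. This is only a sketch: firstNThen is a hypothetical name, not a built-in, and it is applied here directly to a plain list rather than through a Dataset.

```mathematica
(* hypothetical helper: take the first n matches, then hand them to a follow-up query *)
firstNThen[crit_, n_Integer, q_Query] :=
  Function[data, Select[data, crit, n]] /* q

(* select the first 3 elements below 4, then add 100 to each of them *)
shifted = firstNThen[# < 4 &, 3, Query[All, # + 100 &]][Range[10]]
```

Since the helper is an ordinary function and not one of the recognized descending operators, the follow-up step has to travel inside it as a subquery; that keeps the filter-then-transform order explicit.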
It is a shame that the query compiler does not have some special treatment for the Curry operator that was introduced in version 11.3. It could be used to supplement and/or re-order the arguments to the various specially-recognized descending operators (especially Select, SelectFirst, and GroupBy). Perhaps in some future version...
edited Aug 16 at 14:03
answered Aug 16 at 5:32


WReach
Score: 5
If you want to keep the operator form, you could define a helper function to do this:
SelectSubset[crit_, n_][expr_] := Select[expr, crit, n]
Then:
r1 = dataset[SelectSubset[#a == 1&, 1000]]; //RepeatedTiming
r2 = Select[dataset, #a == 1&, 1000]; //RepeatedTiming
Normal@r1 === Normal@r2
{0.0029, Null}
{0.0030, Null}
True
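As a rough check that the three-argument Select behind such a helper stops early, one can count how often the criterion is called. The sketch below uses a tiny hand-made list of associations; rows, calls and countedCrit are names invented for this example, and selectSubset mirrors the helper above.

```mathematica
selectSubset[crit_, n_][expr_] := Select[expr, crit, n]

rows = {<|"a" -> 1|>, <|"a" -> 2|>, <|"a" -> 1|>, <|"a" -> 1|>, <|"a" -> 2|>};

calls = 0;
(* same test as #a == 1 &, but it increments a counter on every call *)
countedCrit = (calls++; #a == 1) &;

firstTwo = selectSubset[countedCrit, 2][rows]
calls  (* compare with Length[rows] *)
```

If Select stops as soon as it has found n matches, as the timings above suggest, calls should end up smaller than Length[rows].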
answered Aug 15 at 21:45


Carl Woll
Score: 4
Of course:
dataset[Select[#a == 1 &]][;; 1000]; // RepeatedTiming // First
Select[dataset, #a == 1 &, 1000]; // RepeatedTiming // First
dataset[data \[Function] Select[data, #a == 1 &, 1000]]; // RepeatedTiming // First
0.53
0.0030
0.0027
(Well, I don't know of a built-in method, but since it can be done like this, why should there be an extra built-in method? In the end, the operator forms are mere syntactic sugar, and that sugar has to be paid for with a lot of documentation. By the way, I usually do not hesitate to write longer code for better performance.)
answered Aug 15 at 21:22


Henrik Schumacher
Score: 1
Unadvised solution:
Unprotect[Select];
Select[crit_, n_Integer] = Select[#, crit, n] &;
Protect[Select];
so that
Select[{1, 2, 3, 4}, GreaterThan[2], 1]
Select[GreaterThan[2], 1]@{1, 2, 3, 4}
% == %% (* True *)
answered Aug 16 at 2:06


AccidentalFourierTransform
Score: 0
I think what you're looking for is simply Select[#, #a==1&, 1000]& @ dataset. For example,
select1s[n_] := Select[#, #a==1&, n]&
select1s[1000] @ dataset
does what you want.
edited Aug 16 at 19:15
answered Aug 16 at 18:39
iron photon