Find the intervals which have a non-empty intersection with a given interval
Clash Royale CLAN TAG#URR8PPP
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty margin-bottom:0;
up vote
6
down vote
favorite
The problem is the following: I have a list intervals
which consists of tuples of the form (start, end
) [with start <= end
]. Each tuple represents an interval (of the real line). We assume that the intervals in intervals
are not overlapping each other. Given a new interval (s,e)
, I would like to write a Python function which checks if (s, e)
overlaps any of the intervals in intervals
. If (s, e)
has a non-empty intersection with at least one of the intervals in intervals
, the function should return the indices of these intervals in the list intervals
.
Say that the function is called find_intersections
. Then, given that intervals = [(1, 3.5), (5.5, 8.7), (10.2, 22.6), (22.7, 23.1)]
, expected outputs would be:
find_intersection(intervals, (3.2, 5.))
returnsarray([0])
find_intersection(intervals, (6.1, 7.3))
returnsarray([1])
find_intersection(intervals, (9.1, 10.2))
returnsNo intersection.
find_intersection(intervals, (5.8, 22.9))
returnsarray([1, 2, 3])
.
The code for find_intersection
I have written is:
import itertools
def find_intersection(intervals, new_interval):
_times = sorted(list(itertools.chain.from_iterable(intervals)))
ind = np.searchsorted(_times, np.asarray(new_interval))
parity = np.mod(ind, 2)
if (not np.any(parity)) and ind[1] == ind[0]:
print('No intersection.')
elif parity[0] == 1:
ub = ind[1] if parity[1] == 1 else ind[1] - 1
return np.arange((ind[0] - 1) / 2, (ub - 1) / 2 + 1)
elif parity[1] == 1:
lb = ind[0] if parity[0] == 1 else ind[0] + 1
return np.arange((lb - 1) / 2, (ind[1] - 1) / 2 + 1)
else:
lb = ind[0] if parity[0] == 1 else ind[0] + 1
ub = ind[1] if parity[1] == 1 else ind[1] - 1
return np.arange((lb - 1) / 2, (ub - 1) / 2 + 1)
I believe that the code does the job.
Is there an easier/smarter way to address this problem?
Edit: this question was cross-posted on stackoverflow.com.
python sorting numpy interval
New contributor
add a comment |Â
up vote
6
down vote
favorite
The problem is the following: I have a list intervals
which consists of tuples of the form (start, end
) [with start <= end
]. Each tuple represents an interval (of the real line). We assume that the intervals in intervals
are not overlapping each other. Given a new interval (s,e)
, I would like to write a Python function which checks if (s, e)
overlaps any of the intervals in intervals
. If (s, e)
has a non-empty intersection with at least one of the intervals in intervals
, the function should return the indices of these intervals in the list intervals
.
Say that the function is called find_intersections
. Then, given that intervals = [(1, 3.5), (5.5, 8.7), (10.2, 22.6), (22.7, 23.1)]
, expected outputs would be:
find_intersection(intervals, (3.2, 5.))
returnsarray([0])
find_intersection(intervals, (6.1, 7.3))
returnsarray([1])
find_intersection(intervals, (9.1, 10.2))
returnsNo intersection.
find_intersection(intervals, (5.8, 22.9))
returnsarray([1, 2, 3])
.
The code for find_intersection
I have written is:
import itertools
def find_intersection(intervals, new_interval):
_times = sorted(list(itertools.chain.from_iterable(intervals)))
ind = np.searchsorted(_times, np.asarray(new_interval))
parity = np.mod(ind, 2)
if (not np.any(parity)) and ind[1] == ind[0]:
print('No intersection.')
elif parity[0] == 1:
ub = ind[1] if parity[1] == 1 else ind[1] - 1
return np.arange((ind[0] - 1) / 2, (ub - 1) / 2 + 1)
elif parity[1] == 1:
lb = ind[0] if parity[0] == 1 else ind[0] + 1
return np.arange((lb - 1) / 2, (ind[1] - 1) / 2 + 1)
else:
lb = ind[0] if parity[0] == 1 else ind[0] + 1
ub = ind[1] if parity[1] == 1 else ind[1] - 1
return np.arange((lb - 1) / 2, (ub - 1) / 2 + 1)
I believe that the code does the job.
Is there an easier/smarter way to address this problem?
Edit: this question was cross-posted on stackoverflow.com.
python sorting numpy interval
New contributor
Hello and Welcome to Code Review Stack Exchange. You have a good question here and it's sad to see that it is deleted. I hope that you will bring it back, I know someone who wants to answer your question.
â Simon Forsbergâ¦
2 days ago
@SimonForsberg : Thank you for your interest in my question. I deleted the question as I thought it was more relevant on stackoverflow.com. Now that it was also asked on stackoverflow.com, not deleting the question would result in cross-posting.
â jibounet
2 days ago
2
@jibounet, This question appears to be On-topic for Code Review, I would leave it be and wait for the answer here. feel free to post a link to the Stack Overflow question here in the comments with a brief note that it has been cross posted.
â Malachiâ¦
2 days ago
@jibounet I see that you have received answers on both Stack Overflow and Code Review, as you might see the answer styles is a bit different. Just out of curiosity, which do you like more? (I promise, I won't be offended if you answer Stack Overflow)
â Simon Forsbergâ¦
2 days ago
add a comment |Â
up vote
6
down vote
favorite
up vote
6
down vote
favorite
The problem is the following: I have a list intervals
which consists of tuples of the form (start, end
) [with start <= end
]. Each tuple represents an interval (of the real line). We assume that the intervals in intervals
are not overlapping each other. Given a new interval (s,e)
, I would like to write a Python function which checks if (s, e)
overlaps any of the intervals in intervals
. If (s, e)
has a non-empty intersection with at least one of the intervals in intervals
, the function should return the indices of these intervals in the list intervals
.
Say that the function is called find_intersections
. Then, given that intervals = [(1, 3.5), (5.5, 8.7), (10.2, 22.6), (22.7, 23.1)]
, expected outputs would be:
find_intersection(intervals, (3.2, 5.))
returnsarray([0])
find_intersection(intervals, (6.1, 7.3))
returnsarray([1])
find_intersection(intervals, (9.1, 10.2))
returnsNo intersection.
find_intersection(intervals, (5.8, 22.9))
returnsarray([1, 2, 3])
.
The code for find_intersection
I have written is:
import itertools
def find_intersection(intervals, new_interval):
_times = sorted(list(itertools.chain.from_iterable(intervals)))
ind = np.searchsorted(_times, np.asarray(new_interval))
parity = np.mod(ind, 2)
if (not np.any(parity)) and ind[1] == ind[0]:
print('No intersection.')
elif parity[0] == 1:
ub = ind[1] if parity[1] == 1 else ind[1] - 1
return np.arange((ind[0] - 1) / 2, (ub - 1) / 2 + 1)
elif parity[1] == 1:
lb = ind[0] if parity[0] == 1 else ind[0] + 1
return np.arange((lb - 1) / 2, (ind[1] - 1) / 2 + 1)
else:
lb = ind[0] if parity[0] == 1 else ind[0] + 1
ub = ind[1] if parity[1] == 1 else ind[1] - 1
return np.arange((lb - 1) / 2, (ub - 1) / 2 + 1)
I believe that the code does the job.
Is there an easier/smarter way to address this problem?
Edit: this question was cross-posted on stackoverflow.com.
python sorting numpy interval
New contributor
The problem is the following: I have a list intervals
which consists of tuples of the form (start, end
) [with start <= end
]. Each tuple represents an interval (of the real line). We assume that the intervals in intervals
are not overlapping each other. Given a new interval (s,e)
, I would like to write a Python function which checks if (s, e)
overlaps any of the intervals in intervals
. If (s, e)
has a non-empty intersection with at least one of the intervals in intervals
, the function should return the indices of these intervals in the list intervals
.
Say that the function is called find_intersections
. Then, given that intervals = [(1, 3.5), (5.5, 8.7), (10.2, 22.6), (22.7, 23.1)]
, expected outputs would be:
find_intersection(intervals, (3.2, 5.))
returnsarray([0])
find_intersection(intervals, (6.1, 7.3))
returnsarray([1])
find_intersection(intervals, (9.1, 10.2))
returnsNo intersection.
find_intersection(intervals, (5.8, 22.9))
returnsarray([1, 2, 3])
.
The code for find_intersection
I have written is:
import itertools
def find_intersection(intervals, new_interval):
_times = sorted(list(itertools.chain.from_iterable(intervals)))
ind = np.searchsorted(_times, np.asarray(new_interval))
parity = np.mod(ind, 2)
if (not np.any(parity)) and ind[1] == ind[0]:
print('No intersection.')
elif parity[0] == 1:
ub = ind[1] if parity[1] == 1 else ind[1] - 1
return np.arange((ind[0] - 1) / 2, (ub - 1) / 2 + 1)
elif parity[1] == 1:
lb = ind[0] if parity[0] == 1 else ind[0] + 1
return np.arange((lb - 1) / 2, (ind[1] - 1) / 2 + 1)
else:
lb = ind[0] if parity[0] == 1 else ind[0] + 1
ub = ind[1] if parity[1] == 1 else ind[1] - 1
return np.arange((lb - 1) / 2, (ub - 1) / 2 + 1)
I believe that the code does the job.
Is there an easier/smarter way to address this problem?
Edit: this question was cross-posted on stackoverflow.com.
python sorting numpy interval
python sorting numpy interval
New contributor
New contributor
edited 2 days ago
New contributor
asked 2 days ago
jibounet
1334
1334
New contributor
New contributor
Hello and Welcome to Code Review Stack Exchange. You have a good question here and it's sad to see that it is deleted. I hope that you will bring it back, I know someone who wants to answer your question.
â Simon Forsbergâ¦
2 days ago
@SimonForsberg : Thank you for your interest in my question. I deleted the question as I thought it was more relevant on stackoverflow.com. Now that it was also asked on stackoverflow.com, not deleting the question would result in cross-posting.
â jibounet
2 days ago
2
@jibounet, This question appears to be On-topic for Code Review, I would leave it be and wait for the answer here. feel free to post a link to the Stack Overflow question here in the comments with a brief note that it has been cross posted.
â Malachiâ¦
2 days ago
@jibounet I see that you have received answers on both Stack Overflow and Code Review, as you might see the answer styles is a bit different. Just out of curiosity, which do you like more? (I promise, I won't be offended if you answer Stack Overflow)
â Simon Forsbergâ¦
2 days ago
add a comment |Â
Hello and Welcome to Code Review Stack Exchange. You have a good question here and it's sad to see that it is deleted. I hope that you will bring it back, I know someone who wants to answer your question.
â Simon Forsbergâ¦
2 days ago
@SimonForsberg : Thank you for your interest in my question. I deleted the question as I thought it was more relevant on stackoverflow.com. Now that it was also asked on stackoverflow.com, not deleting the question would result in cross-posting.
â jibounet
2 days ago
2
@jibounet, This question appears to be On-topic for Code Review, I would leave it be and wait for the answer here. feel free to post a link to the Stack Overflow question here in the comments with a brief note that it has been cross posted.
â Malachiâ¦
2 days ago
@jibounet I see that you have received answers on both Stack Overflow and Code Review, as you might see the answer styles is a bit different. Just out of curiosity, which do you like more? (I promise, I won't be offended if you answer Stack Overflow)
â Simon Forsbergâ¦
2 days ago
Hello and Welcome to Code Review Stack Exchange. You have a good question here and it's sad to see that it is deleted. I hope that you will bring it back, I know someone who wants to answer your question.
â Simon Forsbergâ¦
2 days ago
Hello and Welcome to Code Review Stack Exchange. You have a good question here and it's sad to see that it is deleted. I hope that you will bring it back, I know someone who wants to answer your question.
â Simon Forsbergâ¦
2 days ago
@SimonForsberg : Thank you for your interest in my question. I deleted the question as I thought it was more relevant on stackoverflow.com. Now that it was also asked on stackoverflow.com, not deleting the question would result in cross-posting.
â jibounet
2 days ago
@SimonForsberg : Thank you for your interest in my question. I deleted the question as I thought it was more relevant on stackoverflow.com. Now that it was also asked on stackoverflow.com, not deleting the question would result in cross-posting.
â jibounet
2 days ago
2
2
@jibounet, This question appears to be On-topic for Code Review, I would leave it be and wait for the answer here. feel free to post a link to the Stack Overflow question here in the comments with a brief note that it has been cross posted.
â Malachiâ¦
2 days ago
@jibounet, This question appears to be On-topic for Code Review, I would leave it be and wait for the answer here. feel free to post a link to the Stack Overflow question here in the comments with a brief note that it has been cross posted.
â Malachiâ¦
2 days ago
@jibounet I see that you have received answers on both Stack Overflow and Code Review, as you might see the answer styles is a bit different. Just out of curiosity, which do you like more? (I promise, I won't be offended if you answer Stack Overflow)
â Simon Forsbergâ¦
2 days ago
@jibounet I see that you have received answers on both Stack Overflow and Code Review, as you might see the answer styles is a bit different. Just out of curiosity, which do you like more? (I promise, I won't be offended if you answer Stack Overflow)
â Simon Forsbergâ¦
2 days ago
add a comment |Â
2 Answers
2
active
oldest
votes
up vote
6
down vote
accepted
1. Review
There is no docstring. What does the function
find_intersections
do? The text in the post is pretty clear, so you could use this as the basis of a docstring. I would make it explicit that the intervals are open (if that's what you really intend), since in Python intervals are normally half-open.There seems to be some uncertainty about whether the input
intervals
is supposed to be already sorted. On the one hand, if it's already sorted then there's no point in callingsorted
. But on the other, if it's not sorted then returning a range of indexes is not very useful, as these are indexes into_times
, not intointervals
.I recommend picking one possibility or the other: if
intervals
is supposed to be sorted, then don't sort it (you might assert that it is already sorted); but if not, then translate the returned indexes so that they refer tointervals
.If there is no intersection,
find_intersections
prints a message. But if the caller is already printing some output, it would be inconvenient (at best) to have this kind of message appear in the middle of the output. Better to leave the printing of messages (or not) up to the caller.If there is an intersection,
find_intersections
returns a range of indexes. But if there are no intersection,find_intersections
returnsNone
. It is not a good idea to return an exceptional value like this, because it would be easy for the caller to forget to check for the exceptional value, and to write something like this:result = find_intersection(intervals, new_interval)
print(f"Number of intersections = len(result).")which would fail with:
TypeError: object of type 'NoneType' has no len()
What they should have written is:
result = find_intersection(intervals, new_interval)
if result is None:
print("Number of intersections = 0.")
else:
print(f"Number of intersections = len(result).")but this is very long-winded. In this case a simple improvement would be to return an empty range of indexes.
If you're going to use NumPy, then you might as well use it wherever possible. So instead of
itertools.chain.from_iterable
, usenumpy.reshape
; instead ofsorted
, usenumpy.sort
; and so on.
2. Revised code if intervals are unsorted
If intervals
is unsorted (as suggested by the call to sorted
), then there's no point in sorting it. Sorting costs $O(nlog n)$ whereas comparing the query interval with every element of intervals
only costs $O(n)$.
So in this case we can take advantage of the fact that the open interval $(a, b)$ overlaps with the open interval $(c, d)$ if and only if $a < d$ and $c < b$, and write:
def find_intersection(intervals, query):
"""Find intersections between intervals.
Intervals are open and are represented as pairs (lower bound,
upper bound).
Arguments:
intervals: array_like, shape=(N, 2) -- Array of intervals.
query: array_like, shape=(2,) -- Interval to query.
Returns:
Array of indexes of intervals that overlap with query.
"""
intervals = np.asarray(intervals)
lower, upper = query
return np.argwhere((lower < intervals[:, 1]) & (intervals[:, 0] < upper))
3. Revised code if intervals are sorted
If intervals
is already sorted, as suggested by returning indexes into the sorted array, then take advantage of the side
argument to numpy.searchsorted
and the floor division operator, //
:
def find_intersection(intervals, query):
"""Find intersections between intervals.
Intervals are open and are represented as pairs (lower bound,
upper bound).
Arguments:
intervals: array_like, shape=(N, 2) -- Array of intervals.
query: array_like, shape=(2,) -- Interval to query.
Returns:
Array of indexes of intervals that overlap with query.
"""
endpoints = np.reshape(intervals, -1)
lower, upper = query
i = np.searchsorted(endpoints, lower, side='right')
j = np.searchsorted(endpoints, upper, side='left')
return np.arange(i // 2, (j + 1) // 2)
add a comment |Â
up vote
5
down vote
return value
In case there is no overlap, it would be cleaner to either return a sentinel value (None
for example), or raise an exception, and don't trigger any side effects, like the print
variable names
ind
, lb
and ub
are not clear variable names. you can use index
, upper_bound
and lower_bound
. In this age of IDE's with code completion, there is no need to abbreviate the variable names like that
algorithm
In your algorithm you flatten intervals
. There is no need for this, and you can do it a lot simpler if you let it be 2-dimensional
naive
If you use numpy anyway, then your algorithm can be made a lot simpler.
intervals = np.array(intervals)
Then you can just use comparison to check whether there is overlap
smaller = np.array(intervals[:,0]) <= new_interval[1]
larger = np.array(intervals[:,1]) > new_interval[0]
This doesn't use the fact that intervals
is, or can be, sorted, but comparison in numpy
is generally rather quick, and probably faster than sorting.
You have an intersection between intervals a
and b
when a[0] < b[1]
and a[1] > b[0]
To get the indices of the overlapping intervals you can use:
intersection = smaller & larger
if not intersection.any():
return None # or: raise(Exception('No intersection')
return np.argwhere(intersection)
return np.argwhere(smaller & larger)
sorted
If you want to use the fact the intervals
is sorted, you can use np.searchsorted
, instead of the direct comparison
intervals = np.sort(intervals, axis=0)
lower_index = np.searchsorted(intervals[:,1], new_interval[0])
upper_index = np.searchsorted(intervals[:,0], new_interval[1])
If upper_index > lower_index
, there is an overlap
if upper_index <= lower_index:
return None
return intervals[lower_index: upper_index]
if you need the indices, you can do return lower_index, upper_index
or return np.arange(lower_index, upper_index)
. NB, this is on the sorted intervals
add a comment |Â
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
6
down vote
accepted
1. Review
There is no docstring. What does the function
find_intersections
do? The text in the post is pretty clear, so you could use this as the basis of a docstring. I would make it explicit that the intervals are open (if that's what you really intend), since in Python intervals are normally half-open.There seems to be some uncertainty about whether the input
intervals
is supposed to be already sorted. On the one hand, if it's already sorted then there's no point in callingsorted
. But on the other, if it's not sorted then returning a range of indexes is not very useful, as these are indexes into_times
, not intointervals
.I recommend picking one possibility or the other: if
intervals
is supposed to be sorted, then don't sort it (you might assert that it is already sorted); but if not, then translate the returned indexes so that they refer tointervals
.If there is no intersection,
find_intersections
prints a message. But if the caller is already printing some output, it would be inconvenient (at best) to have this kind of message appear in the middle of the output. Better to leave the printing of messages (or not) up to the caller.If there is an intersection,
find_intersections
returns a range of indexes. But if there are no intersection,find_intersections
returnsNone
. It is not a good idea to return an exceptional value like this, because it would be easy for the caller to forget to check for the exceptional value, and to write something like this:result = find_intersection(intervals, new_interval)
print(f"Number of intersections = len(result).")which would fail with:
TypeError: object of type 'NoneType' has no len()
What they should have written is:
result = find_intersection(intervals, new_interval)
if result is None:
print("Number of intersections = 0.")
else:
print(f"Number of intersections = len(result).")but this is very long-winded. In this case a simple improvement would be to return an empty range of indexes.
If you're going to use NumPy, then you might as well use it wherever possible. So instead of
itertools.chain.from_iterable
, usenumpy.reshape
; instead ofsorted
, usenumpy.sort
; and so on.
2. Revised code if intervals are unsorted
If intervals
is unsorted (as suggested by the call to sorted
), then there's no point in sorting it. Sorting costs $O(nlog n)$ whereas comparing the query interval with every element of intervals
only costs $O(n)$.
So in this case we can take advantage of the fact that the open interval $(a, b)$ overlaps with the open interval $(c, d)$ if and only if $a < d$ and $c < b$, and write:
def find_intersection(intervals, query):
"""Find intersections between intervals.
Intervals are open and are represented as pairs (lower bound,
upper bound).
Arguments:
intervals: array_like, shape=(N, 2) -- Array of intervals.
query: array_like, shape=(2,) -- Interval to query.
Returns:
Array of indexes of intervals that overlap with query.
"""
intervals = np.asarray(intervals)
lower, upper = query
return np.argwhere((lower < intervals[:, 1]) & (intervals[:, 0] < upper))
3. Revised code if intervals are sorted
If intervals
is already sorted, as suggested by returning indexes into the sorted array, then take advantage of the side
argument to numpy.searchsorted
and the floor division operator, //
:
def find_intersection(intervals, query):
"""Find intersections between intervals.
Intervals are open and are represented as pairs (lower bound,
upper bound).
Arguments:
intervals: array_like, shape=(N, 2) -- Array of intervals.
query: array_like, shape=(2,) -- Interval to query.
Returns:
Array of indexes of intervals that overlap with query.
"""
endpoints = np.reshape(intervals, -1)
lower, upper = query
i = np.searchsorted(endpoints, lower, side='right')
j = np.searchsorted(endpoints, upper, side='left')
return np.arange(i // 2, (j + 1) // 2)
add a comment |Â
up vote
6
down vote
accepted
1. Review
There is no docstring. What does the function
find_intersections
do? The text in the post is pretty clear, so you could use this as the basis of a docstring. I would make it explicit that the intervals are open (if that's what you really intend), since in Python intervals are normally half-open.There seems to be some uncertainty about whether the input
intervals
is supposed to be already sorted. On the one hand, if it's already sorted then there's no point in callingsorted
. But on the other, if it's not sorted then returning a range of indexes is not very useful, as these are indexes into_times
, not intointervals
.I recommend picking one possibility or the other: if
intervals
is supposed to be sorted, then don't sort it (you might assert that it is already sorted); but if not, then translate the returned indexes so that they refer tointervals
.If there is no intersection,
find_intersections
prints a message. But if the caller is already printing some output, it would be inconvenient (at best) to have this kind of message appear in the middle of the output. Better to leave the printing of messages (or not) up to the caller.If there is an intersection,
find_intersections
returns a range of indexes. But if there are no intersection,find_intersections
returnsNone
. It is not a good idea to return an exceptional value like this, because it would be easy for the caller to forget to check for the exceptional value, and to write something like this:result = find_intersection(intervals, new_interval)
print(f"Number of intersections = len(result).")which would fail with:
TypeError: object of type 'NoneType' has no len()
What they should have written is:
result = find_intersection(intervals, new_interval)
if result is None:
print("Number of intersections = 0.")
else:
print(f"Number of intersections = len(result).")but this is very long-winded. In this case a simple improvement would be to return an empty range of indexes.
If you're going to use NumPy, then you might as well use it wherever possible. So instead of
itertools.chain.from_iterable
, usenumpy.reshape
; instead ofsorted
, usenumpy.sort
; and so on.
2. Revised code if intervals are unsorted
If intervals
is unsorted (as suggested by the call to sorted
), then there's no point in sorting it. Sorting costs $O(nlog n)$ whereas comparing the query interval with every element of intervals
only costs $O(n)$.
So in this case we can take advantage of the fact that the open interval $(a, b)$ overlaps with the open interval $(c, d)$ if and only if $a < d$ and $c < b$, and write:
def find_intersection(intervals, query):
"""Find intersections between intervals.
Intervals are open and are represented as pairs (lower bound,
upper bound).
Arguments:
intervals: array_like, shape=(N, 2) -- Array of intervals.
query: array_like, shape=(2,) -- Interval to query.
Returns:
Array of indexes of intervals that overlap with query.
"""
intervals = np.asarray(intervals)
lower, upper = query
return np.argwhere((lower < intervals[:, 1]) & (intervals[:, 0] < upper))
3. Revised code if intervals are sorted
If intervals
is already sorted, as suggested by returning indexes into the sorted array, then take advantage of the side
argument to numpy.searchsorted
and the floor division operator, //
:
def find_intersection(intervals, query):
"""Find intersections between intervals.
Intervals are open and are represented as pairs (lower bound,
upper bound).
Arguments:
intervals: array_like, shape=(N, 2) -- Array of intervals.
query: array_like, shape=(2,) -- Interval to query.
Returns:
Array of indexes of intervals that overlap with query.
"""
endpoints = np.reshape(intervals, -1)
lower, upper = query
i = np.searchsorted(endpoints, lower, side='right')
j = np.searchsorted(endpoints, upper, side='left')
return np.arange(i // 2, (j + 1) // 2)
add a comment |Â
up vote
6
down vote
accepted
up vote
6
down vote
accepted
1. Review
There is no docstring. What does the function
find_intersections
do? The text in the post is pretty clear, so you could use this as the basis of a docstring. I would make it explicit that the intervals are open (if that's what you really intend), since in Python intervals are normally half-open.There seems to be some uncertainty about whether the input
intervals
is supposed to be already sorted. On the one hand, if it's already sorted then there's no point in callingsorted
. But on the other, if it's not sorted then returning a range of indexes is not very useful, as these are indexes into_times
, not intointervals
.I recommend picking one possibility or the other: if
intervals
is supposed to be sorted, then don't sort it (you might assert that it is already sorted); but if not, then translate the returned indexes so that they refer tointervals
.If there is no intersection,
find_intersections
prints a message. But if the caller is already printing some output, it would be inconvenient (at best) to have this kind of message appear in the middle of the output. Better to leave the printing of messages (or not) up to the caller.If there is an intersection,
find_intersections
returns a range of indexes. But if there are no intersection,find_intersections
returnsNone
. It is not a good idea to return an exceptional value like this, because it would be easy for the caller to forget to check for the exceptional value, and to write something like this:result = find_intersection(intervals, new_interval)
print(f"Number of intersections = len(result).")which would fail with:
TypeError: object of type 'NoneType' has no len()
What they should have written is:
result = find_intersection(intervals, new_interval)
if result is None:
print("Number of intersections = 0.")
else:
print(f"Number of intersections = len(result).")but this is very long-winded. In this case a simple improvement would be to return an empty range of indexes.
If you're going to use NumPy, then you might as well use it wherever possible. So instead of
itertools.chain.from_iterable
, usenumpy.reshape
; instead ofsorted
, usenumpy.sort
; and so on.
2. Revised code if intervals are unsorted
If intervals
is unsorted (as suggested by the call to sorted
), then there's no point in sorting it. Sorting costs $O(nlog n)$ whereas comparing the query interval with every element of intervals
only costs $O(n)$.
So in this case we can take advantage of the fact that the open interval $(a, b)$ overlaps with the open interval $(c, d)$ if and only if $a < d$ and $c < b$, and write:
def find_intersection(intervals, query):
"""Find intersections between intervals.
Intervals are open and are represented as pairs (lower bound,
upper bound).
Arguments:
intervals: array_like, shape=(N, 2) -- Array of intervals.
query: array_like, shape=(2,) -- Interval to query.
Returns:
Array of indexes of intervals that overlap with query.
"""
intervals = np.asarray(intervals)
lower, upper = query
return np.argwhere((lower < intervals[:, 1]) & (intervals[:, 0] < upper))
3. Revised code if intervals are sorted
If intervals
is already sorted, as suggested by returning indexes into the sorted array, then take advantage of the side
argument to numpy.searchsorted
and the floor division operator, //
:
def find_intersection(intervals, query):
"""Find intersections between intervals.
Intervals are open and are represented as pairs (lower bound,
upper bound).
Arguments:
intervals: array_like, shape=(N, 2) -- Array of intervals.
query: array_like, shape=(2,) -- Interval to query.
Returns:
Array of indexes of intervals that overlap with query.
"""
endpoints = np.reshape(intervals, -1)
lower, upper = query
i = np.searchsorted(endpoints, lower, side='right')
j = np.searchsorted(endpoints, upper, side='left')
return np.arange(i // 2, (j + 1) // 2)
1. Review
There is no docstring. What does the function
find_intersections
do? The text in the post is pretty clear, so you could use this as the basis of a docstring. I would make it explicit that the intervals are open (if that's what you really intend), since in Python intervals are normally half-open.There seems to be some uncertainty about whether the input
intervals
is supposed to be already sorted. On the one hand, if it's already sorted then there's no point in callingsorted
. But on the other, if it's not sorted then returning a range of indexes is not very useful, as these are indexes into_times
, not intointervals
.I recommend picking one possibility or the other: if
intervals
is supposed to be sorted, then don't sort it (you might assert that it is already sorted); but if not, then translate the returned indexes so that they refer tointervals
.If there is no intersection,
find_intersections
prints a message. But if the caller is already printing some output, it would be inconvenient (at best) to have this kind of message appear in the middle of the output. Better to leave the printing of messages (or not) up to the caller.If there is an intersection,
find_intersections
returns a range of indexes. But if there are no intersection,find_intersections
returnsNone
. It is not a good idea to return an exceptional value like this, because it would be easy for the caller to forget to check for the exceptional value, and to write something like this:result = find_intersection(intervals, new_interval)
print(f"Number of intersections = len(result).")which would fail with:
TypeError: object of type 'NoneType' has no len()
What they should have written is:
result = find_intersection(intervals, new_interval)
if result is None:
print("Number of intersections = 0.")
else:
print(f"Number of intersections = len(result).")but this is very long-winded. In this case a simple improvement would be to return an empty range of indexes.
If you're going to use NumPy, then you might as well use it wherever possible. So instead of
itertools.chain.from_iterable
, usenumpy.reshape
; instead ofsorted
, usenumpy.sort
; and so on.
2. Revised code if intervals are unsorted
If intervals
is unsorted (as suggested by the call to sorted
), then there's no point in sorting it. Sorting costs $O(nlog n)$ whereas comparing the query interval with every element of intervals
only costs $O(n)$.
So in this case we can take advantage of the fact that the open interval $(a, b)$ overlaps with the open interval $(c, d)$ if and only if $a < d$ and $c < b$, and write:
def find_intersection(intervals, query):
"""Find intersections between intervals.
Intervals are open and are represented as pairs (lower bound,
upper bound).
Arguments:
intervals: array_like, shape=(N, 2) -- Array of intervals.
query: array_like, shape=(2,) -- Interval to query.
Returns:
Array of indexes of intervals that overlap with query.
"""
intervals = np.asarray(intervals)
lower, upper = query
return np.argwhere((lower < intervals[:, 1]) & (intervals[:, 0] < upper))
3. Revised code if intervals are sorted
If intervals
is already sorted, as suggested by returning indexes into the sorted array, then take advantage of the side
argument to numpy.searchsorted
and the floor division operator, //
:
def find_intersection(intervals, query):
"""Find intersections between intervals.
Intervals are open and are represented as pairs (lower bound,
upper bound).
Arguments:
intervals: array_like, shape=(N, 2) -- Array of intervals.
query: array_like, shape=(2,) -- Interval to query.
Returns:
Array of indexes of intervals that overlap with query.
"""
endpoints = np.reshape(intervals, -1)
lower, upper = query
i = np.searchsorted(endpoints, lower, side='right')
j = np.searchsorted(endpoints, upper, side='left')
return np.arange(i // 2, (j + 1) // 2)
answered 2 days ago
Gareth Rees
41.8k394168
41.8k394168
add a comment |Â
add a comment |Â
up vote
5
down vote
return value
In case there is no overlap, it would be cleaner to either return a sentinel value (None
for example), or raise an exception, and don't trigger any side effects, like the print
variable names
ind
, lb
and ub
are not clear variable names. you can use index
, upper_bound
and lower_bound
. In this age of IDE's with code completion, there is no need to abbreviate the variable names like that
algorithm
In your algorithm you flatten intervals
. There is no need for this, and you can do it a lot simpler if you let it be 2-dimensional
naive
If you use numpy anyway, then your algorithm can be made a lot simpler.
intervals = np.array(intervals)
Then you can just use comparison to check whether there is overlap
smaller = np.array(intervals[:,0]) <= new_interval[1]
larger = np.array(intervals[:,1]) > new_interval[0]
This doesn't use the fact that intervals
is, or can be, sorted, but comparison in numpy
is generally rather quick, and probably faster than sorting.
You have an intersection between intervals a
and b
when a[0] < b[1]
and a[1] > b[0]
To get the indices of the overlapping intervals you can use:
intersection = smaller & larger
if not intersection.any():
return None # or: raise(Exception('No intersection')
return np.argwhere(intersection)
return np.argwhere(smaller & larger)
sorted
If you want to use the fact the intervals
is sorted, you can use np.searchsorted
, instead of the direct comparison
intervals = np.sort(intervals, axis=0)
lower_index = np.searchsorted(intervals[:,1], new_interval[0])
upper_index = np.searchsorted(intervals[:,0], new_interval[1])
If upper_index > lower_index
, there is an overlap
if upper_index <= lower_index:
return None
return intervals[lower_index: upper_index]
if you need the indices, you can do return lower_index, upper_index
or return np.arange(lower_index, upper_index)
. NB, this is on the sorted intervals
add a comment |Â
up vote
5
down vote
return value
In case there is no overlap, it would be cleaner to either return a sentinel value (None
for example), or raise an exception, and don't trigger any side effects, like the print
variable names
ind
, lb
and ub
are not clear variable names. you can use index
, upper_bound
and lower_bound
. In this age of IDE's with code completion, there is no need to abbreviate the variable names like that
algorithm
In your algorithm you flatten intervals
. There is no need for this, and you can do it a lot simpler if you let it be 2-dimensional
naive
If you use numpy anyway, then your algorithm can be made a lot simpler.
intervals = np.array(intervals)
Then you can just use comparison to check whether there is overlap
smaller = np.array(intervals[:,0]) <= new_interval[1]
larger = np.array(intervals[:,1]) > new_interval[0]
This doesn't use the fact that intervals
is, or can be, sorted, but comparison in numpy
is generally rather quick, and probably faster than sorting.
You have an intersection between intervals a
and b
when a[0] < b[1]
and a[1] > b[0]
To get the indices of the overlapping intervals you can use:
intersection = smaller & larger
if not intersection.any():
return None # or: raise(Exception('No intersection')
return np.argwhere(intersection)
return np.argwhere(smaller & larger)
sorted
If you want to use the fact the intervals
is sorted, you can use np.searchsorted
, instead of the direct comparison
intervals = np.sort(intervals, axis=0)
lower_index = np.searchsorted(intervals[:,1], new_interval[0])
upper_index = np.searchsorted(intervals[:,0], new_interval[1])
If upper_index > lower_index
, there is an overlap
if upper_index <= lower_index:
return None
return intervals[lower_index: upper_index]
if you need the indices, you can do return lower_index, upper_index
or return np.arange(lower_index, upper_index)
. NB, this is on the sorted intervals
add a comment |Â
up vote
5
down vote
up vote
5
down vote
return value
In case there is no overlap, it would be cleaner to either return a sentinel value (None
for example), or raise an exception, and don't trigger any side effects, like the print
variable names
ind
, lb
and ub
are not clear variable names. you can use index
, upper_bound
and lower_bound
. In this age of IDE's with code completion, there is no need to abbreviate the variable names like that
algorithm
In your algorithm you flatten intervals
. There is no need for this, and you can do it a lot simpler if you let it be 2-dimensional
naive
If you use numpy anyway, then your algorithm can be made a lot simpler.
intervals = np.array(intervals)
Then you can just use comparison to check whether there is overlap
smaller = np.array(intervals[:,0]) <= new_interval[1]
larger = np.array(intervals[:,1]) > new_interval[0]
This doesn't use the fact that intervals
is, or can be, sorted, but comparison in numpy
is generally rather quick, and probably faster than sorting.
You have an intersection between intervals a
and b
when a[0] < b[1]
and a[1] > b[0]
To get the indices of the overlapping intervals you can use:
intersection = smaller & larger
if not intersection.any():
return None # or: raise(Exception('No intersection')
return np.argwhere(intersection)
return np.argwhere(smaller & larger)
sorted
If you want to use the fact the intervals
is sorted, you can use np.searchsorted
, instead of the direct comparison
intervals = np.sort(intervals, axis=0)
lower_index = np.searchsorted(intervals[:,1], new_interval[0])
upper_index = np.searchsorted(intervals[:,0], new_interval[1])
If upper_index > lower_index
, there is an overlap
if upper_index <= lower_index:
return None
return intervals[lower_index: upper_index]
if you need the indices, you can do return lower_index, upper_index
or return np.arange(lower_index, upper_index)
. NB, this is on the sorted intervals
return value
In case there is no overlap, it would be cleaner to either return a sentinel value (None
for example), or raise an exception, and don't trigger any side effects, like the print
variable names
ind
, lb
and ub
are not clear variable names. you can use index
, upper_bound
and lower_bound
. In this age of IDE's with code completion, there is no need to abbreviate the variable names like that
algorithm
In your algorithm you flatten intervals
. There is no need for this, and you can do it a lot simpler if you let it be 2-dimensional
naive
If you use numpy anyway, then your algorithm can be made a lot simpler.
intervals = np.array(intervals)
Then you can just use comparison to check whether there is overlap
smaller = np.array(intervals[:,0]) <= new_interval[1]
larger = np.array(intervals[:,1]) > new_interval[0]
This doesn't use the fact that intervals
is, or can be, sorted, but comparison in numpy
is generally rather quick, and probably faster than sorting.
You have an intersection between intervals a
and b
when a[0] < b[1]
and a[1] > b[0]
To get the indices of the overlapping intervals you can use:
intersection = smaller & larger
if not intersection.any():
return None # or: raise(Exception('No intersection')
return np.argwhere(intersection)
return np.argwhere(smaller & larger)
sorted
If you want to use the fact the intervals
is sorted, you can use np.searchsorted
, instead of the direct comparison
intervals = np.sort(intervals, axis=0)
lower_index = np.searchsorted(intervals[:,1], new_interval[0])
upper_index = np.searchsorted(intervals[:,0], new_interval[1])
If upper_index > lower_index
, there is an overlap
if upper_index <= lower_index:
return None
return intervals[lower_index: upper_index]
if you need the indices, you can do return lower_index, upper_index
or return np.arange(lower_index, upper_index)
. NB, this is on the sorted intervals
answered 2 days ago
Maarten Fabré
3,504214
3,504214
add a comment |Â
add a comment |Â
jibounet is a new contributor. Be nice, and check out our Code of Conduct.
jibounet is a new contributor. Be nice, and check out our Code of Conduct.
jibounet is a new contributor. Be nice, and check out our Code of Conduct.
jibounet is a new contributor. Be nice, and check out our Code of Conduct.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f203468%2ffind-the-intervals-which-have-a-non-empty-intersection-with-a-given-interval%23new-answer', 'question_page');
);
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Hello and Welcome to Code Review Stack Exchange. You have a good question here and it's sad to see that it is deleted. I hope that you will bring it back, I know someone who wants to answer your question.
â Simon Forsbergâ¦
2 days ago
@SimonForsberg : Thank you for your interest in my question. I deleted the question as I thought it was more relevant on stackoverflow.com. Now that it was also asked on stackoverflow.com, not deleting the question would result in cross-posting.
â jibounet
2 days ago
2
@jibounet, This question appears to be On-topic for Code Review, I would leave it be and wait for the answer here. feel free to post a link to the Stack Overflow question here in the comments with a brief note that it has been cross posted.
â Malachiâ¦
2 days ago
@jibounet I see that you have received answers on both Stack Overflow and Code Review, as you might see the answer styles is a bit different. Just out of curiosity, which do you like more? (I promise, I won't be offended if you answer Stack Overflow)
â Simon Forsbergâ¦
2 days ago