Find the intervals which have a non-empty intersection with a given interval

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP





.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty margin-bottom:0;







up vote
6
down vote

favorite












The problem is the following: I have a list intervals which consists of tuples of the form (start, end) [with start <= end]. Each tuple represents an interval (of the real line). We assume that the intervals in intervals are not overlapping each other. Given a new interval (s,e), I would like to write a Python function which checks if (s, e) overlaps any of the intervals in intervals. If (s, e) has a non-empty intersection with at least one of the intervals in intervals, the function should return the indices of these intervals in the list intervals.



Say that the function is called find_intersections. Then, given that intervals = [(1, 3.5), (5.5, 8.7), (10.2, 22.6), (22.7, 23.1)], expected outputs would be:




  • find_intersection(intervals, (3.2, 5.)) returns array([0])


  • find_intersection(intervals, (6.1, 7.3)) returns array([1])


  • find_intersection(intervals, (9.1, 10.2)) returns No intersection.


  • find_intersection(intervals, (5.8, 22.9)) returns array([1, 2, 3]).

The code for find_intersection I have written is:



import itertools

def find_intersection(intervals, new_interval):
_times = sorted(list(itertools.chain.from_iterable(intervals)))
ind = np.searchsorted(_times, np.asarray(new_interval))
parity = np.mod(ind, 2)
if (not np.any(parity)) and ind[1] == ind[0]:
print('No intersection.')
elif parity[0] == 1:
ub = ind[1] if parity[1] == 1 else ind[1] - 1
return np.arange((ind[0] - 1) / 2, (ub - 1) / 2 + 1)
elif parity[1] == 1:
lb = ind[0] if parity[0] == 1 else ind[0] + 1
return np.arange((lb - 1) / 2, (ind[1] - 1) / 2 + 1)
else:
lb = ind[0] if parity[0] == 1 else ind[0] + 1
ub = ind[1] if parity[1] == 1 else ind[1] - 1
return np.arange((lb - 1) / 2, (ub - 1) / 2 + 1)


I believe that the code does the job.



Is there an easier/smarter way to address this problem?



Edit: this question was cross-posted on stackoverflow.com.










share|improve this question









New contributor




jibounet is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.



















  • Hello and Welcome to Code Review Stack Exchange. You have a good question here and it's sad to see that it is deleted. I hope that you will bring it back, I know someone who wants to answer your question.
    – Simon Forsberg♦
    2 days ago










  • @SimonForsberg : Thank you for your interest in my question. I deleted the question as I thought it was more relevant on stackoverflow.com. Now that it was also asked on stackoverflow.com, not deleting the question would result in cross-posting.
    – jibounet
    2 days ago






  • 2




    @jibounet, This question appears to be On-topic for Code Review, I would leave it be and wait for the answer here. feel free to post a link to the Stack Overflow question here in the comments with a brief note that it has been cross posted.
    – Malachi♦
    2 days ago










  • @jibounet I see that you have received answers on both Stack Overflow and Code Review, as you might see the answer styles is a bit different. Just out of curiosity, which do you like more? (I promise, I won't be offended if you answer Stack Overflow)
    – Simon Forsberg♦
    2 days ago
















up vote
6
down vote

favorite












The problem is the following: I have a list intervals which consists of tuples of the form (start, end) [with start <= end]. Each tuple represents an interval (of the real line). We assume that the intervals in intervals are not overlapping each other. Given a new interval (s,e), I would like to write a Python function which checks if (s, e) overlaps any of the intervals in intervals. If (s, e) has a non-empty intersection with at least one of the intervals in intervals, the function should return the indices of these intervals in the list intervals.



Say that the function is called find_intersections. Then, given that intervals = [(1, 3.5), (5.5, 8.7), (10.2, 22.6), (22.7, 23.1)], expected outputs would be:




  • find_intersection(intervals, (3.2, 5.)) returns array([0])


  • find_intersection(intervals, (6.1, 7.3)) returns array([1])


  • find_intersection(intervals, (9.1, 10.2)) returns No intersection.


  • find_intersection(intervals, (5.8, 22.9)) returns array([1, 2, 3]).

The code for find_intersection I have written is:



import itertools

def find_intersection(intervals, new_interval):
_times = sorted(list(itertools.chain.from_iterable(intervals)))
ind = np.searchsorted(_times, np.asarray(new_interval))
parity = np.mod(ind, 2)
if (not np.any(parity)) and ind[1] == ind[0]:
print('No intersection.')
elif parity[0] == 1:
ub = ind[1] if parity[1] == 1 else ind[1] - 1
return np.arange((ind[0] - 1) / 2, (ub - 1) / 2 + 1)
elif parity[1] == 1:
lb = ind[0] if parity[0] == 1 else ind[0] + 1
return np.arange((lb - 1) / 2, (ind[1] - 1) / 2 + 1)
else:
lb = ind[0] if parity[0] == 1 else ind[0] + 1
ub = ind[1] if parity[1] == 1 else ind[1] - 1
return np.arange((lb - 1) / 2, (ub - 1) / 2 + 1)


I believe that the code does the job.



Is there an easier/smarter way to address this problem?



Edit: this question was cross-posted on stackoverflow.com.










share|improve this question









New contributor




jibounet is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.



















  • Hello and Welcome to Code Review Stack Exchange. You have a good question here and it's sad to see that it is deleted. I hope that you will bring it back, I know someone who wants to answer your question.
    – Simon Forsberg♦
    2 days ago










  • @SimonForsberg : Thank you for your interest in my question. I deleted the question as I thought it was more relevant on stackoverflow.com. Now that it was also asked on stackoverflow.com, not deleting the question would result in cross-posting.
    – jibounet
    2 days ago






  • 2




    @jibounet, This question appears to be On-topic for Code Review, I would leave it be and wait for the answer here. feel free to post a link to the Stack Overflow question here in the comments with a brief note that it has been cross posted.
    – Malachi♦
    2 days ago










  • @jibounet I see that you have received answers on both Stack Overflow and Code Review, as you might see the answer styles is a bit different. Just out of curiosity, which do you like more? (I promise, I won't be offended if you answer Stack Overflow)
    – Simon Forsberg♦
    2 days ago












up vote
6
down vote

favorite









up vote
6
down vote

favorite











The problem is the following: I have a list intervals which consists of tuples of the form (start, end) [with start <= end]. Each tuple represents an interval (of the real line). We assume that the intervals in intervals are not overlapping each other. Given a new interval (s,e), I would like to write a Python function which checks if (s, e) overlaps any of the intervals in intervals. If (s, e) has a non-empty intersection with at least one of the intervals in intervals, the function should return the indices of these intervals in the list intervals.



Say that the function is called find_intersections. Then, given that intervals = [(1, 3.5), (5.5, 8.7), (10.2, 22.6), (22.7, 23.1)], expected outputs would be:




  • find_intersection(intervals, (3.2, 5.)) returns array([0])


  • find_intersection(intervals, (6.1, 7.3)) returns array([1])


  • find_intersection(intervals, (9.1, 10.2)) returns No intersection.


  • find_intersection(intervals, (5.8, 22.9)) returns array([1, 2, 3]).

The code for find_intersection I have written is:



import itertools

def find_intersection(intervals, new_interval):
_times = sorted(list(itertools.chain.from_iterable(intervals)))
ind = np.searchsorted(_times, np.asarray(new_interval))
parity = np.mod(ind, 2)
if (not np.any(parity)) and ind[1] == ind[0]:
print('No intersection.')
elif parity[0] == 1:
ub = ind[1] if parity[1] == 1 else ind[1] - 1
return np.arange((ind[0] - 1) / 2, (ub - 1) / 2 + 1)
elif parity[1] == 1:
lb = ind[0] if parity[0] == 1 else ind[0] + 1
return np.arange((lb - 1) / 2, (ind[1] - 1) / 2 + 1)
else:
lb = ind[0] if parity[0] == 1 else ind[0] + 1
ub = ind[1] if parity[1] == 1 else ind[1] - 1
return np.arange((lb - 1) / 2, (ub - 1) / 2 + 1)


I believe that the code does the job.



Is there an easier/smarter way to address this problem?



Edit: this question was cross-posted on stackoverflow.com.










share|improve this question









New contributor




jibounet is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.











The problem is the following: I have a list intervals which consists of tuples of the form (start, end) [with start <= end]. Each tuple represents an interval (of the real line). We assume that the intervals in intervals are not overlapping each other. Given a new interval (s,e), I would like to write a Python function which checks if (s, e) overlaps any of the intervals in intervals. If (s, e) has a non-empty intersection with at least one of the intervals in intervals, the function should return the indices of these intervals in the list intervals.



Say that the function is called find_intersections. Then, given that intervals = [(1, 3.5), (5.5, 8.7), (10.2, 22.6), (22.7, 23.1)], expected outputs would be:




  • find_intersection(intervals, (3.2, 5.)) returns array([0])


  • find_intersection(intervals, (6.1, 7.3)) returns array([1])


  • find_intersection(intervals, (9.1, 10.2)) returns No intersection.


  • find_intersection(intervals, (5.8, 22.9)) returns array([1, 2, 3]).

The code for find_intersection I have written is:



import itertools

def find_intersection(intervals, new_interval):
_times = sorted(list(itertools.chain.from_iterable(intervals)))
ind = np.searchsorted(_times, np.asarray(new_interval))
parity = np.mod(ind, 2)
if (not np.any(parity)) and ind[1] == ind[0]:
print('No intersection.')
elif parity[0] == 1:
ub = ind[1] if parity[1] == 1 else ind[1] - 1
return np.arange((ind[0] - 1) / 2, (ub - 1) / 2 + 1)
elif parity[1] == 1:
lb = ind[0] if parity[0] == 1 else ind[0] + 1
return np.arange((lb - 1) / 2, (ind[1] - 1) / 2 + 1)
else:
lb = ind[0] if parity[0] == 1 else ind[0] + 1
ub = ind[1] if parity[1] == 1 else ind[1] - 1
return np.arange((lb - 1) / 2, (ub - 1) / 2 + 1)


I believe that the code does the job.



Is there an easier/smarter way to address this problem?



Edit: this question was cross-posted on stackoverflow.com.







python sorting numpy interval






share|improve this question









New contributor




jibounet is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.











share|improve this question









New contributor




jibounet is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.









share|improve this question




share|improve this question








edited 2 days ago





















New contributor




jibounet is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.









asked 2 days ago









jibounet

1334




1334




New contributor




jibounet is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.





New contributor





jibounet is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.






jibounet is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.











  • Hello and Welcome to Code Review Stack Exchange. You have a good question here and it's sad to see that it is deleted. I hope that you will bring it back, I know someone who wants to answer your question.
    – Simon Forsberg♦
    2 days ago










  • @SimonForsberg : Thank you for your interest in my question. I deleted the question as I thought it was more relevant on stackoverflow.com. Now that it was also asked on stackoverflow.com, not deleting the question would result in cross-posting.
    – jibounet
    2 days ago






  • 2




    @jibounet, This question appears to be On-topic for Code Review, I would leave it be and wait for the answer here. feel free to post a link to the Stack Overflow question here in the comments with a brief note that it has been cross posted.
    – Malachi♦
    2 days ago










  • @jibounet I see that you have received answers on both Stack Overflow and Code Review, as you might see the answer styles is a bit different. Just out of curiosity, which do you like more? (I promise, I won't be offended if you answer Stack Overflow)
    – Simon Forsberg♦
    2 days ago
















  • Hello and Welcome to Code Review Stack Exchange. You have a good question here and it's sad to see that it is deleted. I hope that you will bring it back, I know someone who wants to answer your question.
    – Simon Forsberg♦
    2 days ago










  • @SimonForsberg : Thank you for your interest in my question. I deleted the question as I thought it was more relevant on stackoverflow.com. Now that it was also asked on stackoverflow.com, not deleting the question would result in cross-posting.
    – jibounet
    2 days ago






  • 2




    @jibounet, This question appears to be On-topic for Code Review, I would leave it be and wait for the answer here. feel free to post a link to the Stack Overflow question here in the comments with a brief note that it has been cross posted.
    – Malachi♦
    2 days ago










  • @jibounet I see that you have received answers on both Stack Overflow and Code Review, as you might see the answer styles is a bit different. Just out of curiosity, which do you like more? (I promise, I won't be offended if you answer Stack Overflow)
    – Simon Forsberg♦
    2 days ago















Hello and Welcome to Code Review Stack Exchange. You have a good question here and it's sad to see that it is deleted. I hope that you will bring it back, I know someone who wants to answer your question.
– Simon Forsberg♦
2 days ago




Hello and Welcome to Code Review Stack Exchange. You have a good question here and it's sad to see that it is deleted. I hope that you will bring it back, I know someone who wants to answer your question.
– Simon Forsberg♦
2 days ago












@SimonForsberg : Thank you for your interest in my question. I deleted the question as I thought it was more relevant on stackoverflow.com. Now that it was also asked on stackoverflow.com, not deleting the question would result in cross-posting.
– jibounet
2 days ago




@SimonForsberg : Thank you for your interest in my question. I deleted the question as I thought it was more relevant on stackoverflow.com. Now that it was also asked on stackoverflow.com, not deleting the question would result in cross-posting.
– jibounet
2 days ago




2




2




@jibounet, This question appears to be On-topic for Code Review, I would leave it be and wait for the answer here. feel free to post a link to the Stack Overflow question here in the comments with a brief note that it has been cross posted.
– Malachi♦
2 days ago




@jibounet, This question appears to be On-topic for Code Review, I would leave it be and wait for the answer here. feel free to post a link to the Stack Overflow question here in the comments with a brief note that it has been cross posted.
– Malachi♦
2 days ago












@jibounet I see that you have received answers on both Stack Overflow and Code Review, as you might see the answer styles is a bit different. Just out of curiosity, which do you like more? (I promise, I won't be offended if you answer Stack Overflow)
– Simon Forsberg♦
2 days ago




@jibounet I see that you have received answers on both Stack Overflow and Code Review, as you might see the answer styles is a bit different. Just out of curiosity, which do you like more? (I promise, I won't be offended if you answer Stack Overflow)
– Simon Forsberg♦
2 days ago










2 Answers
2






active

oldest

votes

















up vote
6
down vote



accepted










1. Review



  1. There is no docstring. What does the function find_intersections do? The text in the post is pretty clear, so you could use this as the basis of a docstring. I would make it explicit that the intervals are open (if that's what you really intend), since in Python intervals are normally half-open.



  2. There seems to be some uncertainty about whether the input intervals is supposed to be already sorted. On the one hand, if it's already sorted then there's no point in calling sorted. But on the other, if it's not sorted then returning a range of indexes is not very useful, as these are indexes into _times, not into intervals.



    I recommend picking one possibility or the other: if intervals is supposed to be sorted, then don't sort it (you might assert that it is already sorted); but if not, then translate the returned indexes so that they refer to intervals.



  3. If there is no intersection, find_intersections prints a message. But if the caller is already printing some output, it would be inconvenient (at best) to have this kind of message appear in the middle of the output. Better to leave the printing of messages (or not) up to the caller.



  4. If there is an intersection, find_intersections returns a range of indexes. But if there are no intersection, find_intersections returns None. It is not a good idea to return an exceptional value like this, because it would be easy for the caller to forget to check for the exceptional value, and to write something like this:



    result = find_intersection(intervals, new_interval)
    print(f"Number of intersections = len(result).")


    which would fail with:



    TypeError: object of type 'NoneType' has no len()


    What they should have written is:



    result = find_intersection(intervals, new_interval)
    if result is None:
    print("Number of intersections = 0.")
    else:
    print(f"Number of intersections = len(result).")


    but this is very long-winded. In this case a simple improvement would be to return an empty range of indexes.



  5. If you're going to use NumPy, then you might as well use it wherever possible. So instead of itertools.chain.from_iterable, use numpy.reshape; instead of sorted, use numpy.sort; and so on.


2. Revised code if intervals are unsorted



If intervals is unsorted (as suggested by the call to sorted), then there's no point in sorting it. Sorting costs $O(nlog n)$ whereas comparing the query interval with every element of intervals only costs $O(n)$.



So in this case we can take advantage of the fact that the open interval $(a, b)$ overlaps with the open interval $(c, d)$ if and only if $a < d$ and $c < b$, and write:



def find_intersection(intervals, query):
"""Find intersections between intervals.
Intervals are open and are represented as pairs (lower bound,
upper bound).

Arguments:
intervals: array_like, shape=(N, 2) -- Array of intervals.
query: array_like, shape=(2,) -- Interval to query.

Returns:
Array of indexes of intervals that overlap with query.

"""
intervals = np.asarray(intervals)
lower, upper = query
return np.argwhere((lower < intervals[:, 1]) & (intervals[:, 0] < upper))


3. Revised code if intervals are sorted



If intervals is already sorted, as suggested by returning indexes into the sorted array, then take advantage of the side argument to numpy.searchsorted and the floor division operator, //:



def find_intersection(intervals, query):
"""Find intersections between intervals.
Intervals are open and are represented as pairs (lower bound,
upper bound).

Arguments:
intervals: array_like, shape=(N, 2) -- Array of intervals.
query: array_like, shape=(2,) -- Interval to query.

Returns:
Array of indexes of intervals that overlap with query.

"""
endpoints = np.reshape(intervals, -1)
lower, upper = query
i = np.searchsorted(endpoints, lower, side='right')
j = np.searchsorted(endpoints, upper, side='left')
return np.arange(i // 2, (j + 1) // 2)





share|improve this answer



























    up vote
    5
    down vote













    return value



    In case there is no overlap, it would be cleaner to either return a sentinel value (None for example), or raise an exception, and don't trigger any side effects, like the print



    variable names



    ind, lb and ub are not clear variable names. you can use index, upper_bound and lower_bound. In this age of IDE's with code completion, there is no need to abbreviate the variable names like that



    algorithm



    In your algorithm you flatten intervals. There is no need for this, and you can do it a lot simpler if you let it be 2-dimensional



    naive



    If you use numpy anyway, then your algorithm can be made a lot simpler.



    intervals = np.array(intervals)


    Then you can just use comparison to check whether there is overlap



    smaller = np.array(intervals[:,0]) <= new_interval[1]
    larger = np.array(intervals[:,1]) > new_interval[0]


    This doesn't use the fact that intervals is, or can be, sorted, but comparison in numpy is generally rather quick, and probably faster than sorting.



    You have an intersection between intervals a and b when a[0] < b[1] and a[1] > b[0]



    To get the indices of the overlapping intervals you can use:



    intersection = smaller & larger

    if not intersection.any():
    return None # or: raise(Exception('No intersection')
    return np.argwhere(intersection)


    return np.argwhere(smaller & larger)


    sorted



    If you want to use the fact the intervals is sorted, you can use np.searchsorted, instead of the direct comparison



    intervals = np.sort(intervals, axis=0)
    lower_index = np.searchsorted(intervals[:,1], new_interval[0])
    upper_index = np.searchsorted(intervals[:,0], new_interval[1])


    If upper_index > lower_index, there is an overlap



    if upper_index <= lower_index:
    return None
    return intervals[lower_index: upper_index]


    if you need the indices, you can do return lower_index, upper_index or return np.arange(lower_index, upper_index). NB, this is on the sorted intervals






    share|improve this answer




















      Your Answer




      StackExchange.ifUsing("editor", function ()
      return StackExchange.using("mathjaxEditing", function ()
      StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
      StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
      );
      );
      , "mathjax-editing");

      StackExchange.ifUsing("editor", function ()
      StackExchange.using("externalEditor", function ()
      StackExchange.using("snippets", function ()
      StackExchange.snippets.init();
      );
      );
      , "code-snippets");

      StackExchange.ready(function()
      var channelOptions =
      tags: "".split(" "),
      id: "196"
      ;
      initTagRenderer("".split(" "), "".split(" "), channelOptions);

      StackExchange.using("externalEditor", function()
      // Have to fire editor after snippets, if snippets enabled
      if (StackExchange.settings.snippets.snippetsEnabled)
      StackExchange.using("snippets", function()
      createEditor();
      );

      else
      createEditor();

      );

      function createEditor()
      StackExchange.prepareEditor(
      heartbeatType: 'answer',
      convertImagesToLinks: false,
      noModals: false,
      showLowRepImageUploadWarning: true,
      reputationToPostImages: null,
      bindNavPrevention: true,
      postfix: "",
      onDemand: true,
      discardSelector: ".discard-answer"
      ,immediatelyShowMarkdownHelp:true
      );



      );






      jibounet is a new contributor. Be nice, and check out our Code of Conduct.









       

      draft saved


      draft discarded


















      StackExchange.ready(
      function ()
      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f203468%2ffind-the-intervals-which-have-a-non-empty-intersection-with-a-given-interval%23new-answer', 'question_page');

      );

      Post as a guest






























      2 Answers
      2






      active

      oldest

      votes








      2 Answers
      2






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes








      up vote
      6
      down vote



      accepted










      1. Review



      1. There is no docstring. What does the function find_intersections do? The text in the post is pretty clear, so you could use this as the basis of a docstring. I would make it explicit that the intervals are open (if that's what you really intend), since in Python intervals are normally half-open.



      2. There seems to be some uncertainty about whether the input intervals is supposed to be already sorted. On the one hand, if it's already sorted then there's no point in calling sorted. But on the other, if it's not sorted then returning a range of indexes is not very useful, as these are indexes into _times, not into intervals.



        I recommend picking one possibility or the other: if intervals is supposed to be sorted, then don't sort it (you might assert that it is already sorted); but if not, then translate the returned indexes so that they refer to intervals.



      3. If there is no intersection, find_intersections prints a message. But if the caller is already printing some output, it would be inconvenient (at best) to have this kind of message appear in the middle of the output. Better to leave the printing of messages (or not) up to the caller.



      4. If there is an intersection, find_intersections returns a range of indexes. But if there are no intersection, find_intersections returns None. It is not a good idea to return an exceptional value like this, because it would be easy for the caller to forget to check for the exceptional value, and to write something like this:



        result = find_intersection(intervals, new_interval)
        print(f"Number of intersections = len(result).")


        which would fail with:



        TypeError: object of type 'NoneType' has no len()


        What they should have written is:



        result = find_intersection(intervals, new_interval)
        if result is None:
        print("Number of intersections = 0.")
        else:
        print(f"Number of intersections = len(result).")


        but this is very long-winded. In this case a simple improvement would be to return an empty range of indexes.



      5. If you're going to use NumPy, then you might as well use it wherever possible. So instead of itertools.chain.from_iterable, use numpy.reshape; instead of sorted, use numpy.sort; and so on.


      2. Revised code if intervals are unsorted



      If intervals is unsorted (as suggested by the call to sorted), then there's no point in sorting it. Sorting costs $O(nlog n)$ whereas comparing the query interval with every element of intervals only costs $O(n)$.



      So in this case we can take advantage of the fact that the open interval $(a, b)$ overlaps with the open interval $(c, d)$ if and only if $a < d$ and $c < b$, and write:



      def find_intersection(intervals, query):
      """Find intersections between intervals.
      Intervals are open and are represented as pairs (lower bound,
      upper bound).

      Arguments:
      intervals: array_like, shape=(N, 2) -- Array of intervals.
      query: array_like, shape=(2,) -- Interval to query.

      Returns:
      Array of indexes of intervals that overlap with query.

      """
      intervals = np.asarray(intervals)
      lower, upper = query
      return np.argwhere((lower < intervals[:, 1]) & (intervals[:, 0] < upper))


      3. Revised code if intervals are sorted



      If intervals is already sorted, as suggested by returning indexes into the sorted array, then take advantage of the side argument to numpy.searchsorted and the floor division operator, //:



      def find_intersection(intervals, query):
      """Find intersections between intervals.
      Intervals are open and are represented as pairs (lower bound,
      upper bound).

      Arguments:
      intervals: array_like, shape=(N, 2) -- Array of intervals.
      query: array_like, shape=(2,) -- Interval to query.

      Returns:
      Array of indexes of intervals that overlap with query.

      """
      endpoints = np.reshape(intervals, -1)
      lower, upper = query
      i = np.searchsorted(endpoints, lower, side='right')
      j = np.searchsorted(endpoints, upper, side='left')
      return np.arange(i // 2, (j + 1) // 2)





      share|improve this answer
























        up vote
        6
        down vote



        accepted










        1. Review



        1. There is no docstring. What does the function find_intersections do? The text in the post is pretty clear, so you could use this as the basis of a docstring. I would make it explicit that the intervals are open (if that's what you really intend), since in Python intervals are normally half-open.



        2. There seems to be some uncertainty about whether the input intervals is supposed to be already sorted. On the one hand, if it's already sorted then there's no point in calling sorted. But on the other, if it's not sorted then returning a range of indexes is not very useful, as these are indexes into _times, not into intervals.



          I recommend picking one possibility or the other: if intervals is supposed to be sorted, then don't sort it (you might assert that it is already sorted); but if not, then translate the returned indexes so that they refer to intervals.



        3. If there is no intersection, find_intersections prints a message. But if the caller is already printing some output, it would be inconvenient (at best) to have this kind of message appear in the middle of the output. Better to leave the printing of messages (or not) up to the caller.



        4. If there is an intersection, find_intersections returns a range of indexes. But if there are no intersection, find_intersections returns None. It is not a good idea to return an exceptional value like this, because it would be easy for the caller to forget to check for the exceptional value, and to write something like this:



          result = find_intersection(intervals, new_interval)
          print(f"Number of intersections = len(result).")


          which would fail with:



          TypeError: object of type 'NoneType' has no len()


          What they should have written is:



          result = find_intersection(intervals, new_interval)
          if result is None:
          print("Number of intersections = 0.")
          else:
          print(f"Number of intersections = len(result).")


          but this is very long-winded. In this case a simple improvement would be to return an empty range of indexes.



        5. If you're going to use NumPy, then you might as well use it wherever possible. So instead of itertools.chain.from_iterable, use numpy.reshape; instead of sorted, use numpy.sort; and so on.


        2. Revised code if intervals are unsorted



        If intervals is unsorted (as suggested by the call to sorted), then there's no point in sorting it. Sorting costs $O(nlog n)$ whereas comparing the query interval with every element of intervals only costs $O(n)$.



        So in this case we can take advantage of the fact that the open interval $(a, b)$ overlaps with the open interval $(c, d)$ if and only if $a < d$ and $c < b$, and write:



        def find_intersection(intervals, query):
        """Find intersections between intervals.
        Intervals are open and are represented as pairs (lower bound,
        upper bound).

        Arguments:
        intervals: array_like, shape=(N, 2) -- Array of intervals.
        query: array_like, shape=(2,) -- Interval to query.

        Returns:
        Array of indexes of intervals that overlap with query.

        """
        intervals = np.asarray(intervals)
        lower, upper = query
        return np.argwhere((lower < intervals[:, 1]) & (intervals[:, 0] < upper))


        3. Revised code if intervals are sorted



        If intervals is already sorted, as suggested by returning indexes into the sorted array, then take advantage of the side argument to numpy.searchsorted and the floor division operator, //:



        def find_intersection(intervals, query):
        """Find intersections between intervals.
        Intervals are open and are represented as pairs (lower bound,
        upper bound).

        Arguments:
        intervals: array_like, shape=(N, 2) -- Array of intervals.
        query: array_like, shape=(2,) -- Interval to query.

        Returns:
        Array of indexes of intervals that overlap with query.

        """
        endpoints = np.reshape(intervals, -1)
        lower, upper = query
        i = np.searchsorted(endpoints, lower, side='right')
        j = np.searchsorted(endpoints, upper, side='left')
        return np.arange(i // 2, (j + 1) // 2)





        share|improve this answer






















          up vote
          6
          down vote



          accepted







          up vote
          6
          down vote



          accepted






          1. Review



          1. There is no docstring. What does the function find_intersections do? The text in the post is pretty clear, so you could use this as the basis of a docstring. I would make it explicit that the intervals are open (if that's what you really intend), since in Python intervals are normally half-open.



          2. There seems to be some uncertainty about whether the input intervals is supposed to be already sorted. On the one hand, if it's already sorted then there's no point in calling sorted. But on the other, if it's not sorted then returning a range of indexes is not very useful, as these are indexes into _times, not into intervals.



            I recommend picking one possibility or the other: if intervals is supposed to be sorted, then don't sort it (you might assert that it is already sorted); but if not, then translate the returned indexes so that they refer to intervals.



          3. If there is no intersection, find_intersections prints a message. But if the caller is already printing some output, it would be inconvenient (at best) to have this kind of message appear in the middle of the output. Better to leave the printing of messages (or not) up to the caller.



          4. If there is an intersection, find_intersections returns a range of indexes. But if there are no intersection, find_intersections returns None. It is not a good idea to return an exceptional value like this, because it would be easy for the caller to forget to check for the exceptional value, and to write something like this:



            result = find_intersection(intervals, new_interval)
            print(f"Number of intersections = len(result).")


            which would fail with:



            TypeError: object of type 'NoneType' has no len()


            What they should have written is:



            result = find_intersection(intervals, new_interval)
            if result is None:
            print("Number of intersections = 0.")
            else:
            print(f"Number of intersections = len(result).")


            but this is very long-winded. In this case a simple improvement would be to return an empty range of indexes.



          5. If you're going to use NumPy, then you might as well use it wherever possible. So instead of itertools.chain.from_iterable, use numpy.reshape; instead of sorted, use numpy.sort; and so on.


          2. Revised code if intervals are unsorted



          If intervals is unsorted (as suggested by the call to sorted), then there's no point in sorting it. Sorting costs $O(nlog n)$ whereas comparing the query interval with every element of intervals only costs $O(n)$.



          So in this case we can take advantage of the fact that the open interval $(a, b)$ overlaps with the open interval $(c, d)$ if and only if $a < d$ and $c < b$, and write:



          def find_intersection(intervals, query):
          """Find intersections between intervals.
          Intervals are open and are represented as pairs (lower bound,
          upper bound).

          Arguments:
          intervals: array_like, shape=(N, 2) -- Array of intervals.
          query: array_like, shape=(2,) -- Interval to query.

          Returns:
          Array of indexes of intervals that overlap with query.

          """
          intervals = np.asarray(intervals)
          lower, upper = query
          return np.argwhere((lower < intervals[:, 1]) & (intervals[:, 0] < upper))


          3. Revised code if intervals are sorted



          If intervals is already sorted, as suggested by returning indexes into the sorted array, then take advantage of the side argument to numpy.searchsorted and the floor division operator, //:



          def find_intersection(intervals, query):
          """Find intersections between intervals.
          Intervals are open and are represented as pairs (lower bound,
          upper bound).

          Arguments:
          intervals: array_like, shape=(N, 2) -- Array of intervals.
          query: array_like, shape=(2,) -- Interval to query.

          Returns:
          Array of indexes of intervals that overlap with query.

          """
          endpoints = np.reshape(intervals, -1)
          lower, upper = query
          i = np.searchsorted(endpoints, lower, side='right')
          j = np.searchsorted(endpoints, upper, side='left')
          return np.arange(i // 2, (j + 1) // 2)





          share|improve this answer












          1. Review



          1. There is no docstring. What does the function find_intersections do? The text in the post is pretty clear, so you could use this as the basis of a docstring. I would make it explicit that the intervals are open (if that's what you really intend), since in Python intervals are normally half-open.



          2. There seems to be some uncertainty about whether the input intervals is supposed to be already sorted. On the one hand, if it's already sorted then there's no point in calling sorted. But on the other, if it's not sorted then returning a range of indexes is not very useful, as these are indexes into _times, not into intervals.



            I recommend picking one possibility or the other: if intervals is supposed to be sorted, then don't sort it (you might assert that it is already sorted); but if not, then translate the returned indexes so that they refer to intervals.



          3. If there is no intersection, find_intersections prints a message. But if the caller is already printing some output, it would be inconvenient (at best) to have this kind of message appear in the middle of the output. Better to leave the printing of messages (or not) up to the caller.



          4. If there is an intersection, find_intersections returns a range of indexes. But if there are no intersection, find_intersections returns None. It is not a good idea to return an exceptional value like this, because it would be easy for the caller to forget to check for the exceptional value, and to write something like this:



            result = find_intersection(intervals, new_interval)
            print(f"Number of intersections = len(result).")


            which would fail with:



            TypeError: object of type 'NoneType' has no len()


            What they should have written is:



            result = find_intersection(intervals, new_interval)
            if result is None:
            print("Number of intersections = 0.")
            else:
            print(f"Number of intersections = len(result).")


            but this is very long-winded. In this case a simple improvement would be to return an empty range of indexes.



          5. If you're going to use NumPy, then you might as well use it wherever possible. So instead of itertools.chain.from_iterable, use numpy.reshape; instead of sorted, use numpy.sort; and so on.


          2. Revised code if intervals are unsorted



          If intervals is unsorted (as suggested by the call to sorted), then there's no point in sorting it. Sorting costs $O(nlog n)$ whereas comparing the query interval with every element of intervals only costs $O(n)$.



          So in this case we can take advantage of the fact that the open interval $(a, b)$ overlaps with the open interval $(c, d)$ if and only if $a < d$ and $c < b$, and write:



          def find_intersection(intervals, query):
          """Find intersections between intervals.
          Intervals are open and are represented as pairs (lower bound,
          upper bound).

          Arguments:
          intervals: array_like, shape=(N, 2) -- Array of intervals.
          query: array_like, shape=(2,) -- Interval to query.

          Returns:
          Array of indexes of intervals that overlap with query.

          """
          intervals = np.asarray(intervals)
          lower, upper = query
          return np.argwhere((lower < intervals[:, 1]) & (intervals[:, 0] < upper))


          3. Revised code if intervals are sorted



          If intervals is already sorted, as suggested by returning indexes into the sorted array, then take advantage of the side argument to numpy.searchsorted and the floor division operator, //:



          def find_intersection(intervals, query):
          """Find intersections between intervals.
          Intervals are open and are represented as pairs (lower bound,
          upper bound).

          Arguments:
          intervals: array_like, shape=(N, 2) -- Array of intervals.
          query: array_like, shape=(2,) -- Interval to query.

          Returns:
          Array of indexes of intervals that overlap with query.

          """
          endpoints = np.reshape(intervals, -1)
          lower, upper = query
          i = np.searchsorted(endpoints, lower, side='right')
          j = np.searchsorted(endpoints, upper, side='left')
          return np.arange(i // 2, (j + 1) // 2)






          share|improve this answer












          share|improve this answer



          share|improve this answer










          answered 2 days ago









          Gareth Rees

          41.8k394168




          41.8k394168






















              up vote
              5
              down vote













              return value



              In case there is no overlap, it would be cleaner to either return a sentinel value (None for example), or raise an exception, and don't trigger any side effects, like the print



              variable names



              ind, lb and ub are not clear variable names. you can use index, upper_bound and lower_bound. In this age of IDE's with code completion, there is no need to abbreviate the variable names like that



              algorithm



              In your algorithm you flatten intervals. There is no need for this, and you can do it a lot simpler if you let it be 2-dimensional



              naive



              If you use numpy anyway, then your algorithm can be made a lot simpler.



              intervals = np.array(intervals)


              Then you can just use comparison to check whether there is overlap



              smaller = np.array(intervals[:,0]) <= new_interval[1]
              larger = np.array(intervals[:,1]) > new_interval[0]


              This doesn't use the fact that intervals is, or can be, sorted, but comparison in numpy is generally rather quick, and probably faster than sorting.



              You have an intersection between intervals a and b when a[0] < b[1] and a[1] > b[0]



              To get the indices of the overlapping intervals you can use:



              intersection = smaller & larger

              if not intersection.any():
              return None # or: raise(Exception('No intersection')
              return np.argwhere(intersection)


              return np.argwhere(smaller & larger)


              sorted



              If you want to use the fact the intervals is sorted, you can use np.searchsorted, instead of the direct comparison



              intervals = np.sort(intervals, axis=0)
              lower_index = np.searchsorted(intervals[:,1], new_interval[0])
              upper_index = np.searchsorted(intervals[:,0], new_interval[1])


              If upper_index > lower_index, there is an overlap



              if upper_index <= lower_index:
              return None
              return intervals[lower_index: upper_index]


              if you need the indices, you can do return lower_index, upper_index or return np.arange(lower_index, upper_index). NB, this is on the sorted intervals






              share|improve this answer
























                up vote
                5
                down vote













                return value



                In case there is no overlap, it would be cleaner to either return a sentinel value (None for example), or raise an exception, and don't trigger any side effects, like the print



                variable names



                ind, lb and ub are not clear variable names. you can use index, upper_bound and lower_bound. In this age of IDE's with code completion, there is no need to abbreviate the variable names like that



                algorithm



                In your algorithm you flatten intervals. There is no need for this, and you can do it a lot simpler if you let it be 2-dimensional



                naive



                If you use numpy anyway, then your algorithm can be made a lot simpler.



                intervals = np.array(intervals)


                Then you can just use comparison to check whether there is overlap



                smaller = np.array(intervals[:,0]) <= new_interval[1]
                larger = np.array(intervals[:,1]) > new_interval[0]


                This doesn't use the fact that intervals is, or can be, sorted, but comparison in numpy is generally rather quick, and probably faster than sorting.



                You have an intersection between intervals a and b when a[0] < b[1] and a[1] > b[0]



                To get the indices of the overlapping intervals you can use:



                intersection = smaller & larger

                if not intersection.any():
                return None # or: raise(Exception('No intersection')
                return np.argwhere(intersection)


                return np.argwhere(smaller & larger)


                sorted



                If you want to use the fact the intervals is sorted, you can use np.searchsorted, instead of the direct comparison



                intervals = np.sort(intervals, axis=0)
                lower_index = np.searchsorted(intervals[:,1], new_interval[0])
                upper_index = np.searchsorted(intervals[:,0], new_interval[1])


                If upper_index > lower_index, there is an overlap



                if upper_index <= lower_index:
                return None
                return intervals[lower_index: upper_index]


                if you need the indices, you can do return lower_index, upper_index or return np.arange(lower_index, upper_index). NB, this is on the sorted intervals






                share|improve this answer






















                  up vote
                  5
                  down vote










                  up vote
                  5
                  down vote









                  return value



                  In case there is no overlap, it would be cleaner to either return a sentinel value (None for example), or raise an exception, and don't trigger any side effects, like the print



                  variable names



                  ind, lb and ub are not clear variable names. you can use index, upper_bound and lower_bound. In this age of IDE's with code completion, there is no need to abbreviate the variable names like that



                  algorithm



                  In your algorithm you flatten intervals. There is no need for this, and you can do it a lot simpler if you let it be 2-dimensional



                  naive



                  If you use numpy anyway, then your algorithm can be made a lot simpler.



                  intervals = np.array(intervals)


                  Then you can just use comparison to check whether there is overlap



                  smaller = np.array(intervals[:,0]) <= new_interval[1]
                  larger = np.array(intervals[:,1]) > new_interval[0]


                  This doesn't use the fact that intervals is, or can be, sorted, but comparison in numpy is generally rather quick, and probably faster than sorting.



                  You have an intersection between intervals a and b when a[0] < b[1] and a[1] > b[0]



                  To get the indices of the overlapping intervals you can use:



                  intersection = smaller & larger

                  if not intersection.any():
                  return None # or: raise(Exception('No intersection')
                  return np.argwhere(intersection)


                  return np.argwhere(smaller & larger)


                  sorted



                  If you want to use the fact the intervals is sorted, you can use np.searchsorted, instead of the direct comparison



                  intervals = np.sort(intervals, axis=0)
                  lower_index = np.searchsorted(intervals[:,1], new_interval[0])
                  upper_index = np.searchsorted(intervals[:,0], new_interval[1])


                  If upper_index > lower_index, there is an overlap



                  if upper_index <= lower_index:
                  return None
                  return intervals[lower_index: upper_index]


                  if you need the indices, you can do return lower_index, upper_index or return np.arange(lower_index, upper_index). NB, this is on the sorted intervals






                  share|improve this answer












                  return value



                  In case there is no overlap, it would be cleaner to either return a sentinel value (None for example), or raise an exception, and don't trigger any side effects, like the print



                  variable names



                  ind, lb and ub are not clear variable names. you can use index, upper_bound and lower_bound. In this age of IDE's with code completion, there is no need to abbreviate the variable names like that



                  algorithm



                  In your algorithm you flatten intervals. There is no need for this, and you can do it a lot simpler if you let it be 2-dimensional



                  naive



                  If you use numpy anyway, then your algorithm can be made a lot simpler.



                  intervals = np.array(intervals)


                  Then you can just use comparison to check whether there is overlap



                  smaller = np.array(intervals[:,0]) <= new_interval[1]
                  larger = np.array(intervals[:,1]) > new_interval[0]


                  This doesn't use the fact that intervals is, or can be, sorted, but comparison in numpy is generally rather quick, and probably faster than sorting.



                  You have an intersection between intervals a and b when a[0] < b[1] and a[1] > b[0]



                  To get the indices of the overlapping intervals you can use:



                  intersection = smaller & larger

                  if not intersection.any():
                  return None # or: raise(Exception('No intersection')
                  return np.argwhere(intersection)


                  return np.argwhere(smaller & larger)


                  sorted



                  If you want to use the fact the intervals is sorted, you can use np.searchsorted, instead of the direct comparison



                  intervals = np.sort(intervals, axis=0)
                  lower_index = np.searchsorted(intervals[:,1], new_interval[0])
                  upper_index = np.searchsorted(intervals[:,0], new_interval[1])


                  If upper_index > lower_index, there is an overlap



                  if upper_index <= lower_index:
                  return None
                  return intervals[lower_index: upper_index]


                  if you need the indices, you can do return lower_index, upper_index or return np.arange(lower_index, upper_index). NB, this is on the sorted intervals







                  share|improve this answer












                  share|improve this answer



                  share|improve this answer










                  answered 2 days ago









                  Maarten Fabré

                  3,504214




                  3,504214




















                      jibounet is a new contributor. Be nice, and check out our Code of Conduct.









                       

                      draft saved


                      draft discarded


















                      jibounet is a new contributor. Be nice, and check out our Code of Conduct.












                      jibounet is a new contributor. Be nice, and check out our Code of Conduct.











                      jibounet is a new contributor. Be nice, and check out our Code of Conduct.













                       


                      draft saved


                      draft discarded














                      StackExchange.ready(
                      function ()
                      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f203468%2ffind-the-intervals-which-have-a-non-empty-intersection-with-a-given-interval%23new-answer', 'question_page');

                      );

                      Post as a guest













































































                      Comments

                      Popular posts from this blog

                      What does second last employer means? [closed]

                      Installing NextGIS Connect into QGIS 3?

                      One-line joke