Generic way to remove all duplicates from a not-sorted container
Here's an interesting article called "How to Remove Elements from a Sequence Container in C++". At some point, the author also explains how to remove duplicates from a container, but only under the very restrictive condition that those duplicates are adjacent. I've given some thought to how to make a generic algorithm that would work on any duplicates in the container, and it's a bit more complicated than one might think at first.
The naively obvious solution is:
template <typename Iterator>
Iterator remove_duplicates(Iterator first, Iterator last)
{
    auto it = std::next(first);
    while (first != last) {
        it = std::find(it, last, *first);
        if (it == last) it = std::next(++first);
        else std::rotate(it, std::next(it), last--);
    }
    return last;
}
and it looks a lot like an implementation of an STL algorithm, but it is also a polynomial-time algorithm, which is only acceptable if there is no alternative. You generally keep track of already-encountered values in a separate container in order to attain linear or almost-linear time. Sets are good candidates, but they come in two flavors, and std::unordered_set, being more efficient than the simple std::set, should be used when possible.
I've used concepts (gcc > 6.3 && -fconcepts) to dispatch to the most efficient overload:
template <typename T>
concept bool Hashable = requires {
    std::hash<T>();
};

template <typename T>
concept bool Comparable = requires (T a, T b) {
    { a == b } -> bool;
};

template <typename T>
concept bool Orderable = requires (T a, T b) {
    { a < b } -> bool;
};
unordered_sets require the key to be hashable and comparable, and sets require that it be orderable. After factoring out the common code, this gives:
template <typename Iterator, typename Set>
Iterator remove_duplicates_impl(Iterator first, Iterator last, Set& known_values)
{
    while (first != last) {
        if (known_values.find(*first) != known_values.end()) std::rotate(first, std::next(first), last--);
        else known_values.insert(*first++);
    }
    return last;
}
template <typename Iterator>
Iterator remove_duplicates(Iterator first, Iterator last)
    requires Orderable<Value_type<Iterator>> && !Hashable<Value_type<Iterator>>
{
    std::set<Value_type<Iterator>> known_values;
    return remove_duplicates_impl(first, last, known_values);
}

template <typename Iterator>
Iterator remove_duplicates(Iterator first, Iterator last)
    requires Hashable<Value_type<Iterator>> && Comparable<Value_type<Iterator>>
{
    std::unordered_set<Value_type<Iterator>> known_values;
    return remove_duplicates_impl(first, last, known_values);
}
So what's missing here, be it optimizations or corner-case handling?
A minimal working example:
#include <iostream>
#include <algorithm>
#include <vector>
#include <unordered_set>
#include <set>
#include <iterator>
#include <type_traits>

template <typename Iterator>
using Value_type = typename std::iterator_traits<Iterator>::value_type;

template <typename T>
concept bool Hashable = requires {
    std::hash<T>();
};

template <typename T>
concept bool Comparable = requires (T a, T b) {
    { a == b } -> bool;
};

template <typename T>
concept bool Orderable = requires (T a, T b) {
    { a < b } -> bool;
};

template <typename Iterator>
Iterator remove_duplicates(Iterator first, Iterator last)
{
    auto it = std::next(first);
    while (first != last) {
        it = std::find(it, last, *first);
        if (it == last) it = std::next(++first);
        else std::rotate(it, std::next(it), last--);
    }
    return last;
}

template <typename Iterator, typename Set>
Iterator remove_duplicates_impl(Iterator first, Iterator last, Set& known_values)
{
    while (first != last) {
        if (known_values.find(*first) != known_values.end()) std::rotate(first, std::next(first), last--);
        else known_values.insert(*first++);
    }
    return last;
}

template <typename Iterator>
Iterator remove_duplicates(Iterator first, Iterator last)
    requires Orderable<Value_type<Iterator>> && !Hashable<Value_type<Iterator>>
{
    std::set<Value_type<Iterator>> known_values;
    return remove_duplicates_impl(first, last, known_values);
}

template <typename Iterator>
Iterator remove_duplicates(Iterator first, Iterator last)
    requires Hashable<Value_type<Iterator>>
{
    std::unordered_set<Value_type<Iterator>> known_values;
    return remove_duplicates_impl(first, last, known_values);
}

struct Foo {};
bool operator==(Foo, Foo) { return true; }

int main()
{
    std::vector<int> data {1,2,3,4,5,6,7,8,9,8,7,6,5,2,4,3,1,8};
    std::vector<int> unik(data.begin(), remove_duplicates(data.begin(), data.end()));
    for (auto i : unik) std::cout << i << ' ';
    std::cout << std::endl;
    std::vector<std::pair<int, int>> test {{1,2}, {3,4}, {5,6}, {7,8}, {7,8}};
    [[maybe_unused]] auto it = remove_duplicates(test.begin(), test.end());
    std::vector<Foo> test2 {Foo(), Foo(), Foo()};
    [[maybe_unused]] auto it2 = remove_duplicates(test2.begin(), test2.end());
}
c++ c++20
asked 3 hours ago by papagaga
You don't say what kind of performance measurement you've done - is there a clear improvement over the naive algorithm that requires no extra storage? I suspect that if duplicates abound, then std::rotate is likely to dominate the work. Two obvious test cases suggest themselves - one with all elements different (e.g. using std::iota() and std::shuffle()) and one with all elements identical. Obviously, the timings will depend on the element type, and the cost of its move assignment, so that immediately doubles the test cases to try. - Toby Speight, 2 hours ago
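(A minimal sketch of the two inputs that comment suggests - the size and the surrounding benchmark harness are made up for illustration; it needs <numeric>, <algorithm>, and <random>:)

std::vector<int> all_different(100000);
std::iota(all_different.begin(), all_different.end(), 0);   // 0, 1, 2, ...
std::shuffle(all_different.begin(), all_different.end(),
             std::mt19937{std::random_device{}()});         // random order, no duplicates

std::vector<int> all_identical(100000, 42);                 // every element past the first is a duplicate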
2 Answers
Your naive solution is quadratic time, O(n^2), with O(1) additional memory. Note that "polynomial time" doesn't say much about the actual time complexity, other than that it's not exponential or factorial.
Your improved version is O(n log n) time for the implementation with set and O(n) for the one with unordered_set, and both come with an additional cost of O(n) memory.
However, the hash version is not necessarily faster than the set version; it depends on the cost of the hash function and on the value of n. You should allow the user the option of choosing a specific implementation if they want.
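(A minimal sketch of such a customization point, reusing the question's remove_duplicates_impl; the name remove_duplicates_with is made up for illustration:)

template <typename Set, typename Iterator>
Iterator remove_duplicates_with(Iterator first, Iterator last)
{
    Set known_values;   // the caller picks the tracking container explicitly
    return remove_duplicates_impl(first, last, known_values);
}

// e.g. force the tree-based set even for a hashable value type:
// auto end = remove_duplicates_with<std::set<int>>(v.begin(), v.end());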
I would probably call the functions something like remove_duplicates_stable, to imply the property you are designing for, namely preserving the original order and keeping the first observed value. That property, by the way, differs between the naive implementation and the others: the naive one keeps the last of the dupes.
If keeping the order of the elements is not important, I would probably sort the source range and use std::unique or something similar. Sorting can make use of move construction, avoiding the possibly expensive copy construction of all non-duplicate elements that your implementation requires; it needs no additional space; and it is O(n log n). But again, it depends on whether copy construction and the hash function are expensive.
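(A sketch of that alternative for a vector; remove_duplicates_unstable is a name invented here, and it assumes the value type is movable and orderable:)

#include <algorithm>
#include <vector>

template <typename T>
void remove_duplicates_unstable(std::vector<T>& v)
{
    std::sort(v.begin(), v.end());                        // O(n log n), moves elements
    v.erase(std::unique(v.begin(), v.end()), v.end());    // drop the now-adjacent duplicates
}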
I would also put the implementation in a separate namespace, like detail, but this is just personal style.
- Emily L., answered 3 hours ago
Regardless of which kind of set you use to keep track, you'll be spending a lot of time in std::rotate when given an input like this:
std::array<int, 10000>()
Yes, all the elements compare equal, and we dutifully rotate each one individually, resulting in (I think) ½n² move operations. This is a problem with the "naive method", too, but not with std::unique(). If we're going to use extra storage, it might be more productive to use at least some of it to selectively copy input values to a new collection, and then swap() that new collection back into place.
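(A sketch of that copy-and-swap idea for a vector of hashable values; the name and the choice of unordered_set for tracking are illustrative, not the answer's prescription:)

template <typename T>
void remove_duplicates_copying(std::vector<T>& v)
{
    std::unordered_set<T> seen;
    std::vector<T> result;
    result.reserve(v.size());
    for (auto const& x : v)
        if (seen.insert(x).second)   // insert() reports whether x was new
            result.push_back(x);     // each survivor is copied exactly once
    v.swap(result);                  // constant-time buffer exchange
}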
I know it's not really the code for review, but I can't resist pointing out that you could have written the naive version more clearly with std::remove(), like this:
template <typename Iterator>
Iterator remove_duplicates(Iterator first, Iterator last)
{
    while (first != last) {
        auto const& value = *first;
        last = std::remove(first = std::next(first), last, value);
    }
    return last;
}
- Toby Speight, answered 2 hours ago, edited 1 hour ago