up vote
1
down vote

favorite

I have a task to write a concurrent version of quicksort algorithm, to speed it up. This is what I wrote:

template <class T>
void quicksort::sort(T *data, int len) 
 thread_count = 0;
 int p = 0;
 int r = len - 1;
 _sort(data, p, r);


template <class T>
void quicksort::_swap(T *data, int first, int second) 
 auto temp = data[first];
 data[first] = data[second];
 data[second] = temp;


template <class T>
void quicksort::_sort(T *data, int p, int r) 
 thread_count++;
 if (p < r) 
 auto q = _partition(data, p, r);
 if (thread_count >= thread_limit) 
 _sort(data, p, q - 1);
 _sort(data, q + 1, r);
 else 
 std::thread lower(&quicksort::_sort<T>, this, data, p, q - 1);
 std::thread upper(&quicksort::_sort<T>, this, data, q + 1, r);
 lower.join();
 upper.join();
 
 
 thread_count--;


template <class T>
int quicksort::_partition(T *data, int p, int r) 
 auto first = data[p];
 auto last = data[r];
 T pivot = 0;
 // Median of three
 auto middle = data[(p + 2) / 2];
 if (first > middle) 
 if (middle > last) 
 pivot = middle;
 else 
 pivot = last;
 
 else 
 if (middle > last) 
 pivot = std::max(first, last);
 else 
 pivot = middle;
 
 
 auto i = p - 1;
 for (auto j = p; j < r; ++j) 
 if (data[j] <= pivot) 
 _swap(data, ++i, j);
 
 
 _swap(data, i + 1, r);
 return i + 1;

Those are methods from class quicksort, which also has std::atomic<int> thread_limit and std::atomic<int> thread_count.

So f.e I cap the thread limit to f.e maximum 1000 threads.

The part I am particularly not sure of is the part in _sort() function. Basically, I don't know if my way of starting those threads and then joining them is the right way to do it.

I have a benchmark method, and I benchmarked my quicksort implementation like this: I took the average of 100 benchmarks for every thread_limit from 200 to 2000, stepping by 100 (so 200, 300, 400) and for single-threaded quicksort of mine. And this was for 1.000.000-sized arrays (every time I was creating a new one - this overhead was not measured). The result was shocking for me. It occurred, that of course, even with 2000 threads it was faster than with only 1 thread, but still, the fastest one was with thread_limit = 200. I thought that the optimum would be somewhere nearer the middle.

Sometimes f.e 700 threads were faster than 600 threads, but still, the general rule was: the more threads, the slower.

Do you guys see any flaws in my code or in my understanding of the benchmark I made?

asked 5 hours ago

Frynio

1083

New contributor

Having more threads than are present in your hardware (likely 4 or 8 for typical laptops, possibly 12 for desktop) is not beneficial.
â€“Â 1201ProgramAlarm
5 hours ago

Okay. I have two cores and 4 threads in my laptop. Does it mean, that I actually should cap on 4 threads? Now I actually measured for thread_limit ranging from 20 to 800, and I still see that the more threads, the slower. 20 threads are faster than 40 etc.
â€“Â Frynio
4 hours ago

@Frynio, more threads than hardware supports means more context switching. Context switches mean slower performance, and perhaps something else I'm unaware of (haven't designed or studied in depth any OS yet).
â€“Â Incomputable
4 hours ago

Every switch adds overhead. So if the overhead takes longer than the parallellisation improves, it's not helping.
â€“Â Mast
2 hours ago

You benchmarked your code, good. Did you verify it's output as well? As in, you know how long it takes, but does it work correctly?
â€“Â Mast
2 hours ago

Â |Â
show 1 more comment

up vote
1
down vote

favorite

I have a task to write a concurrent version of quicksort algorithm, to speed it up. This is what I wrote:

template <class T>
void quicksort::sort(T *data, int len) 
 thread_count = 0;
 int p = 0;
 int r = len - 1;
 _sort(data, p, r);


template <class T>
void quicksort::_swap(T *data, int first, int second) 
 auto temp = data[first];
 data[first] = data[second];
 data[second] = temp;


template <class T>
void quicksort::_sort(T *data, int p, int r) 
 thread_count++;
 if (p < r) 
 auto q = _partition(data, p, r);
 if (thread_count >= thread_limit) 
 _sort(data, p, q - 1);
 _sort(data, q + 1, r);
 else 
 std::thread lower(&quicksort::_sort<T>, this, data, p, q - 1);
 std::thread upper(&quicksort::_sort<T>, this, data, q + 1, r);
 lower.join();
 upper.join();
 
 
 thread_count--;


template <class T>
int quicksort::_partition(T *data, int p, int r) 
 auto first = data[p];
 auto last = data[r];
 T pivot = 0;
 // Median of three
 auto middle = data[(p + 2) / 2];
 if (first > middle) 
 if (middle > last) 
 pivot = middle;
 else 
 pivot = last;
 
 else 
 if (middle > last) 
 pivot = std::max(first, last);
 else 
 pivot = middle;
 
 
 auto i = p - 1;
 for (auto j = p; j < r; ++j) 
 if (data[j] <= pivot) 
 _swap(data, ++i, j);
 
 
 _swap(data, i + 1, r);
 return i + 1;

Those are methods from class quicksort, which also has std::atomic<int> thread_limit and std::atomic<int> thread_count.

So f.e I cap the thread limit to f.e maximum 1000 threads.

The part I am particularly not sure of is the part in _sort() function. Basically, I don't know if my way of starting those threads and then joining them is the right way to do it.

Sometimes f.e 700 threads were faster than 600 threads, but still, the general rule was: the more threads, the slower.

Do you guys see any flaws in my code or in my understanding of the benchmark I made?

asked 5 hours ago

Frynio

1083

New contributor

Having more threads than are present in your hardware (likely 4 or 8 for typical laptops, possibly 12 for desktop) is not beneficial.
â€“Â 1201ProgramAlarm
5 hours ago

Okay. I have two cores and 4 threads in my laptop. Does it mean, that I actually should cap on 4 threads? Now I actually measured for thread_limit ranging from 20 to 800, and I still see that the more threads, the slower. 20 threads are faster than 40 etc.
â€“Â Frynio
4 hours ago

@Frynio, more threads than hardware supports means more context switching. Context switches mean slower performance, and perhaps something else I'm unaware of (haven't designed or studied in depth any OS yet).
â€“Â Incomputable
4 hours ago

Every switch adds overhead. So if the overhead takes longer than the parallellisation improves, it's not helping.
â€“Â Mast
2 hours ago

You benchmarked your code, good. Did you verify it's output as well? As in, you know how long it takes, but does it work correctly?
â€“Â Mast
2 hours ago

Â |Â
show 1 more comment

up vote
1
down vote

favorite

I have a task to write a concurrent version of quicksort algorithm, to speed it up. This is what I wrote:

template <class T>
void quicksort::sort(T *data, int len) 
 thread_count = 0;
 int p = 0;
 int r = len - 1;
 _sort(data, p, r);


template <class T>
void quicksort::_swap(T *data, int first, int second) 
 auto temp = data[first];
 data[first] = data[second];
 data[second] = temp;


template <class T>
void quicksort::_sort(T *data, int p, int r) 
 thread_count++;
 if (p < r) 
 auto q = _partition(data, p, r);
 if (thread_count >= thread_limit) 
 _sort(data, p, q - 1);
 _sort(data, q + 1, r);
 else 
 std::thread lower(&quicksort::_sort<T>, this, data, p, q - 1);
 std::thread upper(&quicksort::_sort<T>, this, data, q + 1, r);
 lower.join();
 upper.join();
 
 
 thread_count--;


template <class T>
int quicksort::_partition(T *data, int p, int r) 
 auto first = data[p];
 auto last = data[r];
 T pivot = 0;
 // Median of three
 auto middle = data[(p + 2) / 2];
 if (first > middle) 
 if (middle > last) 
 pivot = middle;
 else 
 pivot = last;
 
 else 
 if (middle > last) 
 pivot = std::max(first, last);
 else 
 pivot = middle;
 
 
 auto i = p - 1;
 for (auto j = p; j < r; ++j) 
 if (data[j] <= pivot) 
 _swap(data, ++i, j);
 
 
 _swap(data, i + 1, r);
 return i + 1;

Those are methods from class quicksort, which also has std::atomic<int> thread_limit and std::atomic<int> thread_count.

So f.e I cap the thread limit to f.e maximum 1000 threads.

The part I am particularly not sure of is the part in _sort() function. Basically, I don't know if my way of starting those threads and then joining them is the right way to do it.

Sometimes f.e 700 threads were faster than 600 threads, but still, the general rule was: the more threads, the slower.

Do you guys see any flaws in my code or in my understanding of the benchmark I made?

asked 5 hours ago

Frynio

1083

New contributor

I have a task to write a concurrent version of quicksort algorithm, to speed it up. This is what I wrote:

template <class T>
void quicksort::sort(T *data, int len) 
 thread_count = 0;
 int p = 0;
 int r = len - 1;
 _sort(data, p, r);


template <class T>
void quicksort::_swap(T *data, int first, int second) 
 auto temp = data[first];
 data[first] = data[second];
 data[second] = temp;


template <class T>
void quicksort::_sort(T *data, int p, int r) 
 thread_count++;
 if (p < r) 
 auto q = _partition(data, p, r);
 if (thread_count >= thread_limit) 
 _sort(data, p, q - 1);
 _sort(data, q + 1, r);
 else 
 std::thread lower(&quicksort::_sort<T>, this, data, p, q - 1);
 std::thread upper(&quicksort::_sort<T>, this, data, q + 1, r);
 lower.join();
 upper.join();
 
 
 thread_count--;


template <class T>
int quicksort::_partition(T *data, int p, int r) 
 auto first = data[p];
 auto last = data[r];
 T pivot = 0;
 // Median of three
 auto middle = data[(p + 2) / 2];
 if (first > middle) 
 if (middle > last) 
 pivot = middle;
 else 
 pivot = last;
 
 else 
 if (middle > last) 
 pivot = std::max(first, last);
 else 
 pivot = middle;
 
 
 auto i = p - 1;
 for (auto j = p; j < r; ++j) 
 if (data[j] <= pivot) 
 _swap(data, ++i, j);
 
 
 _swap(data, i + 1, r);
 return i + 1;

Those are methods from class quicksort, which also has std::atomic<int> thread_limit and std::atomic<int> thread_count.

So f.e I cap the thread limit to f.e maximum 1000 threads.

The part I am particularly not sure of is the part in _sort() function. Basically, I don't know if my way of starting those threads and then joining them is the right way to do it.

Sometimes f.e 700 threads were faster than 600 threads, but still, the general rule was: the more threads, the slower.

Do you guys see any flaws in my code or in my understanding of the benchmark I made?

c++ multithreading quick-sort

asked 5 hours ago

Frynio

1083

New contributor

asked 5 hours ago

Frynio

1083

New contributor

asked 5 hours ago

Frynio

1083

New contributor

asked 5 hours ago

Frynio

1083

asked 5 hours ago

Frynio

1083

New contributor

Frynio is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.

Having more threads than are present in your hardware (likely 4 or 8 for typical laptops, possibly 12 for desktop) is not beneficial.
â€“Â 1201ProgramAlarm
5 hours ago

Okay. I have two cores and 4 threads in my laptop. Does it mean, that I actually should cap on 4 threads? Now I actually measured for thread_limit ranging from 20 to 800, and I still see that the more threads, the slower. 20 threads are faster than 40 etc.
â€“Â Frynio
4 hours ago

@Frynio, more threads than hardware supports means more context switching. Context switches mean slower performance, and perhaps something else I'm unaware of (haven't designed or studied in depth any OS yet).
â€“Â Incomputable
4 hours ago

Every switch adds overhead. So if the overhead takes longer than the parallellisation improves, it's not helping.
â€“Â Mast
2 hours ago

You benchmarked your code, good. Did you verify it's output as well? As in, you know how long it takes, but does it work correctly?
â€“Â Mast
2 hours ago

Â |Â
show 1 more comment

Having more threads than are present in your hardware (likely 4 or 8 for typical laptops, possibly 12 for desktop) is not beneficial.
â€“Â 1201ProgramAlarm
5 hours ago

Okay. I have two cores and 4 threads in my laptop. Does it mean, that I actually should cap on 4 threads? Now I actually measured for thread_limit ranging from 20 to 800, and I still see that the more threads, the slower. 20 threads are faster than 40 etc.
â€“Â Frynio
4 hours ago

@Frynio, more threads than hardware supports means more context switching. Context switches mean slower performance, and perhaps something else I'm unaware of (haven't designed or studied in depth any OS yet).
â€“Â Incomputable
4 hours ago

Every switch adds overhead. So if the overhead takes longer than the parallellisation improves, it's not helping.
â€“Â Mast
2 hours ago

You benchmarked your code, good. Did you verify it's output as well? As in, you know how long it takes, but does it work correctly?
â€“Â Mast
2 hours ago

Having more threads than are present in your hardware (likely 4 or 8 for typical laptops, possibly 12 for desktop) is not beneficial.
â€“Â 1201ProgramAlarm
5 hours ago

Okay. I have two cores and 4 threads in my laptop. Does it mean, that I actually should cap on 4 threads? Now I actually measured for thread_limit ranging from 20 to 800, and I still see that the more threads, the slower. 20 threads are faster than 40 etc.
â€“Â Frynio
4 hours ago

@Frynio, more threads than hardware supports means more context switching. Context switches mean slower performance, and perhaps something else I'm unaware of (haven't designed or studied in depth any OS yet).
â€“Â Incomputable
4 hours ago

Every switch adds overhead. So if the overhead takes longer than the parallellisation improves, it's not helping.
â€“Â Mast
2 hours ago

You benchmarked your code, good. Did you verify it's output as well? As in, you know how long it takes, but does it work correctly?
â€“Â Mast
2 hours ago

Â |Â
show 1 more comment

2 Answers
2

active

oldest

votes

up vote
1
down vote

accepted

Partition - Pivot selection

There are 2 bugs in the pivot selection code, in combination allowing for $mathcalO(n^2)$ worst case runtime.

Bug #1: Wrong average calculation

I guess this is just a typo:

auto middle = data[(p + 2) / 2];

Most likely, (p + r) / 2 was intended instead, and that would fix this problem (disregarding overflow, but that could only ever happen if $p = r = textINT_MAX$. Maybe add a check for p == r and return early?).

Bug #2: Missing a case

if (first > middle) 
 if (middle > last) 
 pivot = middle;
 else 
 pivot = last;

This snippet doesn't handle the case correctly if $middle < first < last$. Even though first should be assigned as pivot, last will be.

The fix is simple: Change pivot = last; to pivot = std::min(first, last);.

Consequences

These two bugs allow for runtime complexity to degrade to $mathcalO(n^2)$, because they introduce a huge bias towards last as pivot value:

For any call with p >= 4, middle will have the value of an element to the left of data[p], thus being guaranteed to be smaller than first or last due to previous runs of _partition. So `last will be chosen as pivot, due to bug #2.

For any call with p == 2 or p == 3, middle will be assigned the value of data[1], which is equal to first. In that case, pivot will always be assigned the value of middle/first.

For any call with p == 0 or p == 1, the median calculation will work with a slight bias towards last (due to bug #2).

Partition - Moving values

There seems to be an assumption that data[r] == pivot. This might have been a good assumption before due to bug #2, but that isn't necessarily the case once that bug is fixed. There a multiple ways to fix that.

Also, technically it isn't necessary to move values around that are equal to the pivot value: It doesn't actually matter whether they are left or right of the final pivot position.

Multithreading

The current multithreading solution isn't actually that efficient at using threads: Nearly half the threads at a time are waiting for the others to finish. (The first thread is waiting on threads #2 and #3 it spawned, they then are waiting on threads #4-7, ...)

As mentioned in the comments, this could be fixed by letting the original thread keep working on one of the subsections before waiting, i.e. changing this:

 std::thread lower(&quicksort::_sort<T>, this, data, p, q - 1);
 std::thread upper(&quicksort::_sort<T>, this, data, q + 1, r);
 lower.join();
 upper.join();

to:

 std::thread lower(&quicksort::_sort<T>, this, data, p, q - 1);
 _sort(data, q + 1, r);
 lower.join();

Another point to make is that it rarely benefits performance to have a number of threads much larger than the number of available (logical) cores. The final number is likely between #cores and 2 * #cores, but to find the best amount you'll likely need to measure.

Bug #3: No protection for concurrent calls

Since we're already in the multithreaded environment, what happens if a second thread calls quicksort::sort after the first thread, but before that call finished?

It resets thread_count, thus allowing for the creation of more threads than intended. Once they are finished, thread_count will be negative! This might cause problems if that case isn't expected.

Solution: Either don't reset thread_count, or wait until it reaches zero before starting to work on the next dataset.

answered 59 mins ago

hoffmale

5,342835

If you're after the best quicksort performance, you might want to look into sentinel-based partition (e.g. this talk, around minute 28) and/or parallel partition algorithms.
â€“Â hoffmale
52 mins ago

Jeez, thanks for the input. And about Bug #3 - I call sort() in a loop, yes, but if I join every thread, then I don't think there will be situation, where I call another sort() if the previous did not finish
â€“Â Frynio
14 mins ago

1

@Frynio: A loop on the same thread doesn't matter (one call will have finished before the next starts). However, if there ever would be two threads calling sort concurrently, then you would get issues.
â€“Â hoffmale
11 mins ago

Yes, that's true. Fortunately that's not the case
â€“Â Frynio
9 mins ago

add a commentÂ |Â

up vote
1
down vote

Thread creation is an expensive operation, and you do it way more than needed. Even though the number of threads running at any given time is capped, the threads are constantly joined and recreated over and over again (I run your code with cap of 10, over a 1000000-strong array, and counted 50 thread creations). Use a thread pool instead.

There is no need to start two threads. The calling thread may very well continue sorting one of the halves.

As said in comments, using more threads than hardware supports will cause slowdown. The software threads are beneficial only when they have natural blocking points (due to IO or synchronization). Sorting is not the case here.

answered 1 hour ago

vnp

37.7k13096

Great points! So after fixing the first two problems you mentioned, should I actually limit the threads only to match my hardware threads?
â€“Â Frynio
1 hour ago

add a commentÂ |Â

Your Answer

StackExchange.ifUsing("editor", function ()
return StackExchange.using("mathjaxEditing", function ()
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
);
);
, "mathjax-editing");

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "196"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
convertImagesToLinks: false,
noModals: false,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

Frynio is a new contributor. Be nice, and check out our Code of Conduct.

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f206438%2fconcurrent-quicksort-in-c%23new-answer', 'question_page');

);

Post as a guest

Name

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

up vote
1
down vote

accepted

Partition - Pivot selection

There are 2 bugs in the pivot selection code, in combination allowing for $mathcalO(n^2)$ worst case runtime.

Bug #1: Wrong average calculation

I guess this is just a typo:

auto middle = data[(p + 2) / 2];

Bug #2: Missing a case

if (first > middle) 
 if (middle > last) 
 pivot = middle;
 else 
 pivot = last;

This snippet doesn't handle the case correctly if $middle < first < last$. Even though first should be assigned as pivot, last will be.

The fix is simple: Change pivot = last; to pivot = std::min(first, last);.

Consequences

These two bugs allow for runtime complexity to degrade to $mathcalO(n^2)$, because they introduce a huge bias towards last as pivot value:

For any call with p == 2 or p == 3, middle will be assigned the value of data[1], which is equal to first. In that case, pivot will always be assigned the value of middle/first.

For any call with p == 0 or p == 1, the median calculation will work with a slight bias towards last (due to bug #2).

Partition - Moving values

Also, technically it isn't necessary to move values around that are equal to the pivot value: It doesn't actually matter whether they are left or right of the final pivot position.

Multithreading

As mentioned in the comments, this could be fixed by letting the original thread keep working on one of the subsections before waiting, i.e. changing this:

 std::thread lower(&quicksort::_sort<T>, this, data, p, q - 1);
 std::thread upper(&quicksort::_sort<T>, this, data, q + 1, r);
 lower.join();
 upper.join();

to:

 std::thread lower(&quicksort::_sort<T>, this, data, p, q - 1);
 _sort(data, q + 1, r);
 lower.join();

Bug #3: No protection for concurrent calls

Since we're already in the multithreaded environment, what happens if a second thread calls quicksort::sort after the first thread, but before that call finished?

It resets thread_count, thus allowing for the creation of more threads than intended. Once they are finished, thread_count will be negative! This might cause problems if that case isn't expected.

Solution: Either don't reset thread_count, or wait until it reaches zero before starting to work on the next dataset.

answered 59 mins ago

hoffmale

5,342835

If you're after the best quicksort performance, you might want to look into sentinel-based partition (e.g. this talk, around minute 28) and/or parallel partition algorithms.
â€“Â hoffmale
52 mins ago

Jeez, thanks for the input. And about Bug #3 - I call sort() in a loop, yes, but if I join every thread, then I don't think there will be situation, where I call another sort() if the previous did not finish
â€“Â Frynio
14 mins ago

1

@Frynio: A loop on the same thread doesn't matter (one call will have finished before the next starts). However, if there ever would be two threads calling sort concurrently, then you would get issues.
â€“Â hoffmale
11 mins ago

Yes, that's true. Fortunately that's not the case
â€“Â Frynio
9 mins ago

add a commentÂ |Â

up vote
1
down vote

accepted

Partition - Pivot selection

There are 2 bugs in the pivot selection code, in combination allowing for $mathcalO(n^2)$ worst case runtime.

Bug #1: Wrong average calculation

I guess this is just a typo:

auto middle = data[(p + 2) / 2];

Bug #2: Missing a case

if (first > middle) 
 if (middle > last) 
 pivot = middle;
 else 
 pivot = last;

This snippet doesn't handle the case correctly if $middle < first < last$. Even though first should be assigned as pivot, last will be.

The fix is simple: Change pivot = last; to pivot = std::min(first, last);.

Consequences

These two bugs allow for runtime complexity to degrade to $mathcalO(n^2)$, because they introduce a huge bias towards last as pivot value:

For any call with p == 2 or p == 3, middle will be assigned the value of data[1], which is equal to first. In that case, pivot will always be assigned the value of middle/first.

For any call with p == 0 or p == 1, the median calculation will work with a slight bias towards last (due to bug #2).

Partition - Moving values

Also, technically it isn't necessary to move values around that are equal to the pivot value: It doesn't actually matter whether they are left or right of the final pivot position.

Multithreading

As mentioned in the comments, this could be fixed by letting the original thread keep working on one of the subsections before waiting, i.e. changing this:

 std::thread lower(&quicksort::_sort<T>, this, data, p, q - 1);
 std::thread upper(&quicksort::_sort<T>, this, data, q + 1, r);
 lower.join();
 upper.join();

to:

 std::thread lower(&quicksort::_sort<T>, this, data, p, q - 1);
 _sort(data, q + 1, r);
 lower.join();

Bug #3: No protection for concurrent calls

Since we're already in the multithreaded environment, what happens if a second thread calls quicksort::sort after the first thread, but before that call finished?

It resets thread_count, thus allowing for the creation of more threads than intended. Once they are finished, thread_count will be negative! This might cause problems if that case isn't expected.

Solution: Either don't reset thread_count, or wait until it reaches zero before starting to work on the next dataset.

answered 59 mins ago

hoffmale

5,342835

If you're after the best quicksort performance, you might want to look into sentinel-based partition (e.g. this talk, around minute 28) and/or parallel partition algorithms.
â€“Â hoffmale
52 mins ago

Jeez, thanks for the input. And about Bug #3 - I call sort() in a loop, yes, but if I join every thread, then I don't think there will be situation, where I call another sort() if the previous did not finish
â€“Â Frynio
14 mins ago

1

@Frynio: A loop on the same thread doesn't matter (one call will have finished before the next starts). However, if there ever would be two threads calling sort concurrently, then you would get issues.
â€“Â hoffmale
11 mins ago

Yes, that's true. Fortunately that's not the case
â€“Â Frynio
9 mins ago

add a commentÂ |Â

up vote
1
down vote

accepted

Partition - Pivot selection

There are 2 bugs in the pivot selection code, in combination allowing for $mathcalO(n^2)$ worst case runtime.

Bug #1: Wrong average calculation

I guess this is just a typo:

auto middle = data[(p + 2) / 2];

Bug #2: Missing a case

if (first > middle) 
 if (middle > last) 
 pivot = middle;
 else 
 pivot = last;

This snippet doesn't handle the case correctly if $middle < first < last$. Even though first should be assigned as pivot, last will be.

The fix is simple: Change pivot = last; to pivot = std::min(first, last);.

Consequences

These two bugs allow for runtime complexity to degrade to $mathcalO(n^2)$, because they introduce a huge bias towards last as pivot value:

For any call with p == 2 or p == 3, middle will be assigned the value of data[1], which is equal to first. In that case, pivot will always be assigned the value of middle/first.

For any call with p == 0 or p == 1, the median calculation will work with a slight bias towards last (due to bug #2).

Partition - Moving values

Also, technically it isn't necessary to move values around that are equal to the pivot value: It doesn't actually matter whether they are left or right of the final pivot position.

Multithreading

As mentioned in the comments, this could be fixed by letting the original thread keep working on one of the subsections before waiting, i.e. changing this:

 std::thread lower(&quicksort::_sort<T>, this, data, p, q - 1);
 std::thread upper(&quicksort::_sort<T>, this, data, q + 1, r);
 lower.join();
 upper.join();

to:

 std::thread lower(&quicksort::_sort<T>, this, data, p, q - 1);
 _sort(data, q + 1, r);
 lower.join();

Bug #3: No protection for concurrent calls

Since we're already in the multithreaded environment, what happens if a second thread calls quicksort::sort after the first thread, but before that call finished?

It resets thread_count, thus allowing for the creation of more threads than intended. Once they are finished, thread_count will be negative! This might cause problems if that case isn't expected.

Solution: Either don't reset thread_count, or wait until it reaches zero before starting to work on the next dataset.

answered 59 mins ago

hoffmale

5,342835

Partition - Pivot selection

There are 2 bugs in the pivot selection code, in combination allowing for $mathcalO(n^2)$ worst case runtime.

Bug #1: Wrong average calculation

I guess this is just a typo:

auto middle = data[(p + 2) / 2];

Bug #2: Missing a case

if (first > middle) 
 if (middle > last) 
 pivot = middle;
 else 
 pivot = last;

This snippet doesn't handle the case correctly if $middle < first < last$. Even though first should be assigned as pivot, last will be.

The fix is simple: Change pivot = last; to pivot = std::min(first, last);.

Consequences

These two bugs allow for runtime complexity to degrade to $mathcalO(n^2)$, because they introduce a huge bias towards last as pivot value:

For any call with p == 2 or p == 3, middle will be assigned the value of data[1], which is equal to first. In that case, pivot will always be assigned the value of middle/first.

For any call with p == 0 or p == 1, the median calculation will work with a slight bias towards last (due to bug #2).

Partition - Moving values

Also, technically it isn't necessary to move values around that are equal to the pivot value: It doesn't actually matter whether they are left or right of the final pivot position.

Multithreading

As mentioned in the comments, this could be fixed by letting the original thread keep working on one of the subsections before waiting, i.e. changing this:

 std::thread lower(&quicksort::_sort<T>, this, data, p, q - 1);
 std::thread upper(&quicksort::_sort<T>, this, data, q + 1, r);
 lower.join();
 upper.join();

to:

 std::thread lower(&quicksort::_sort<T>, this, data, p, q - 1);
 _sort(data, q + 1, r);
 lower.join();

Bug #3: No protection for concurrent calls

Since we're already in the multithreaded environment, what happens if a second thread calls quicksort::sort after the first thread, but before that call finished?

It resets thread_count, thus allowing for the creation of more threads than intended. Once they are finished, thread_count will be negative! This might cause problems if that case isn't expected.

Solution: Either don't reset thread_count, or wait until it reaches zero before starting to work on the next dataset.

answered 59 mins ago

hoffmale

5,342835

answered 59 mins ago

hoffmale

5,342835

answered 59 mins ago

hoffmale

5,342835

answered 59 mins ago

hoffmale

5,342835

If you're after the best quicksort performance, you might want to look into sentinel-based partition (e.g. this talk, around minute 28) and/or parallel partition algorithms.
â€“Â hoffmale
52 mins ago

Jeez, thanks for the input. And about Bug #3 - I call sort() in a loop, yes, but if I join every thread, then I don't think there will be situation, where I call another sort() if the previous did not finish
â€“Â Frynio
14 mins ago

1

@Frynio: A loop on the same thread doesn't matter (one call will have finished before the next starts). However, if there ever would be two threads calling sort concurrently, then you would get issues.
â€“Â hoffmale
11 mins ago

Yes, that's true. Fortunately that's not the case
â€“Â Frynio
9 mins ago

add a commentÂ |Â

If you're after the best quicksort performance, you might want to look into sentinel-based partition (e.g. this talk, around minute 28) and/or parallel partition algorithms.
â€“Â hoffmale
52 mins ago

Jeez, thanks for the input. And about Bug #3 - I call sort() in a loop, yes, but if I join every thread, then I don't think there will be situation, where I call another sort() if the previous did not finish
â€“Â Frynio
14 mins ago

1

@Frynio: A loop on the same thread doesn't matter (one call will have finished before the next starts). However, if there ever would be two threads calling sort concurrently, then you would get issues.
â€“Â hoffmale
11 mins ago

Yes, that's true. Fortunately that's not the case
â€“Â Frynio
9 mins ago

If you're after the best quicksort performance, you might want to look into sentinel-based partition (e.g. this talk, around minute 28) and/or parallel partition algorithms.
â€“Â hoffmale
52 mins ago

Jeez, thanks for the input. And about Bug #3 - I call sort() in a loop, yes, but if I join every thread, then I don't think there will be situation, where I call another sort() if the previous did not finish
â€“Â Frynio
14 mins ago

@Frynio: A loop on the same thread doesn't matter (one call will have finished before the next starts). However, if there ever would be two threads calling sort concurrently, then you would get issues.
â€“Â hoffmale
11 mins ago

Yes, that's true. Fortunately that's not the case
â€“Â Frynio
9 mins ago

add a commentÂ |Â

up vote
1
down vote

Thread creation is an expensive operation, and you do it way more than needed. Even though the number of threads running at any given time is capped, the threads are constantly joined and recreated over and over again (I run your code with cap of 10, over a 1000000-strong array, and counted 50 thread creations). Use a thread pool instead.

There is no need to start two threads. The calling thread may very well continue sorting one of the halves.

As said in comments, using more threads than hardware supports will cause slowdown. The software threads are beneficial only when they have natural blocking points (due to IO or synchronization). Sorting is not the case here.

answered 1 hour ago

vnp

37.7k13096

Great points! So after fixing the first two problems you mentioned, should I actually limit the threads only to match my hardware threads?
â€“Â Frynio
1 hour ago

add a commentÂ |Â

up vote
1
down vote

Thread creation is an expensive operation, and you do it way more than needed. Even though the number of threads running at any given time is capped, the threads are constantly joined and recreated over and over again (I run your code with cap of 10, over a 1000000-strong array, and counted 50 thread creations). Use a thread pool instead.

There is no need to start two threads. The calling thread may very well continue sorting one of the halves.

As said in comments, using more threads than hardware supports will cause slowdown. The software threads are beneficial only when they have natural blocking points (due to IO or synchronization). Sorting is not the case here.

answered 1 hour ago

vnp

37.7k13096

Great points! So after fixing the first two problems you mentioned, should I actually limit the threads only to match my hardware threads?
â€“Â Frynio
1 hour ago

add a commentÂ |Â

up vote
1
down vote

Thread creation is an expensive operation, and you do it way more than needed. Even though the number of threads running at any given time is capped, the threads are constantly joined and recreated over and over again (I run your code with cap of 10, over a 1000000-strong array, and counted 50 thread creations). Use a thread pool instead.

There is no need to start two threads. The calling thread may very well continue sorting one of the halves.

As said in comments, using more threads than hardware supports will cause slowdown. The software threads are beneficial only when they have natural blocking points (due to IO or synchronization). Sorting is not the case here.

answered 1 hour ago

vnp

37.7k13096

Thread creation is an expensive operation, and you do it way more than needed. Even though the number of threads running at any given time is capped, the threads are constantly joined and recreated over and over again (I run your code with cap of 10, over a 1000000-strong array, and counted 50 thread creations). Use a thread pool instead.

There is no need to start two threads. The calling thread may very well continue sorting one of the halves.

As said in comments, using more threads than hardware supports will cause slowdown. The software threads are beneficial only when they have natural blocking points (due to IO or synchronization). Sorting is not the case here.

answered 1 hour ago

vnp

37.7k13096

answered 1 hour ago

vnp

37.7k13096

answered 1 hour ago

vnp

37.7k13096

answered 1 hour ago

vnp

37.7k13096

Great points! So after fixing the first two problems you mentioned, should I actually limit the threads only to match my hardware threads?
â€“Â Frynio
1 hour ago

add a commentÂ |Â

Great points! So after fixing the first two problems you mentioned, should I actually limit the threads only to match my hardware threads?
â€“Â Frynio
1 hour ago

Great points! So after fixing the first two problems you mentioned, should I actually limit the threads only to match my hardware threads?
â€“Â Frynio
1 hour ago

add a commentÂ |Â

Frynio is a new contributor. Be nice, and check out our Code of Conduct.

draft saved

draft discarded

Frynio is a new contributor. Be nice, and check out our Code of Conduct.

draft saved

draft discarded

Post as a guest

Name

Category

Random preview

Concurrent quicksort in C++

2 Answers 2

Partition - Pivot selection

Bug #1: Wrong average calculation

Bug #2: Missing a case

Consequences

Partition - Moving values

Multithreading

Bug #3: No protection for concurrent calls

Your Answer

Sign up or log in

Post as a guest

Post as a guest

2 Answers 2

2 Answers 2

Partition - Pivot selection

Bug #1: Wrong average calculation

Bug #2: Missing a case

Consequences

Partition - Moving values

Multithreading

Bug #3: No protection for concurrent calls

Partition - Pivot selection

Bug #1: Wrong average calculation

Bug #2: Missing a case

Consequences

Partition - Moving values

Multithreading

Bug #3: No protection for concurrent calls

Partition - Pivot selection

Bug #1: Wrong average calculation

Bug #2: Missing a case

Consequences

Partition - Moving values

Multithreading

Bug #3: No protection for concurrent calls

Partition - Pivot selection

Bug #1: Wrong average calculation

Bug #2: Missing a case

Consequences

Partition - Moving values

Multithreading

Bug #3: No protection for concurrent calls

Sign up or log in

Post as a guest

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Comments

Post a Comment

Popular posts from this blog

What does second last employer means? [closed]

List of Gilmore Girls characters

Confectionery

2 Answers
2

2 Answers
2

2 Answers
2