Is the new random library really better than std::rand()?

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP











up vote
6
down vote

favorite
1












So I saw a talk called rand() Considered Harmful and it advocated for using the engine-distribution paradigm of random number generation over the simple std::rand() plus modulus paradigm.



However, I wanted to see the failings of std::rand() firsthand so I did a quick experiment:



  1. Basically, I wrote 2 functions getRandNum_Old() and getRandNum_New() that generated a random number between 0 and 5 inclusive using std::rand() and std::mt19937+std::uniform_int_distribution respectively.

  2. Then I generated 960,000 (divisible by 6) random numbers using the "old" way and recorded the frequencies of the numbers 0-5. Then I calculated the standard deviation of these frequencies. What I'm looking for is a standard deviation as low as possible since that is what would happen if the distribution were truly uniform.

  3. I ran that simulation 1000 times and recorded the standard deviation for each simulation. I also recorded the time it took in milliseconds.

  4. Afterwards, I did the exact same again but this time generating random numbers the "new" way.

  5. Finally, I calculated the mean and standard deviation of the list of standard deviations for both the old and new way and the mean and standard deviation for the list of times taken for both the old and new way.

Here were the results:



[OLD WAY]
Spread
mean: 346.554406
std dev: 110.318361
Time Taken (ms)
mean: 6.662910
std dev: 0.366301

[NEW WAY]
Spread
mean: 350.346792
std dev: 110.449190
Time Taken (ms)
mean: 28.053907
std dev: 0.654964


Surprisingly, the aggregate spread of rolls was the same for both methods. I.e., std::mt19937+std::uniform_int_distribution was not "more uniform" than simple std::rand()+%. Another observation I made was that the new was about 4x slower than the old way. Overall, it seemed like I was paying a huge cost in speed for almost no gain in quality.



Is my experiment flawed in some way? Or is std::rand() really not that bad, and maybe even better?



For reference, here is the code I used in its entirety:



#include <cstdio>
#include <random>
#include <algorithm>
#include <chrono>

int getRandNum_Old()
static bool init = false;
if (!init)
std::srand(time(nullptr)); // Seed std::rand
init = true;


return std::rand() % 6;


int getRandNum_New()
static bool init = false;
static std::random_device rd;
static std::mt19937 eng;
static std::uniform_int_distribution<int> dist(0,5);
if (!init)
eng.seed(rd()); // Seed random engine
init = true;


return dist(eng);


template <typename T>
double mean(T* data, int n)
double m = 0;
std::for_each(data, data+n, [&](T x) m += x; );
m /= n;
return m;


template <typename T>
double stdDev(T* data, int n)
double m = mean(data, n);
double sd = 0.0;
std::for_each(data, data+n, [&](T x) sd += ((x-m) * (x-m)); );
sd /= n;
sd = sqrt(sd);
return sd;


int main()
const int N = 960000; // Number of trials
const int M = 1000; // Number of simulations
const int D = 6; // Num sides on die

/* Do the things the "old" way (blech) */

int freqList_Old[D];
double stdDevList_Old[M];
double timeTakenList_Old[M];

for (int j = 0; j < M; j++)
auto start = std::chrono::high_resolution_clock::now();
std::fill_n(freqList_Old, D, 0);
for (int i = 0; i < N; i++)
int roll = getRandNum_Old();
freqList_Old[roll] += 1;

stdDevList_Old[j] = stdDev(freqList_Old, D);
auto end = std::chrono::high_resolution_clock::now();
auto dur = std::chrono::duration_cast<std::chrono::microseconds>(end-start);
double timeTaken = dur.count() / 1000.0;
timeTakenList_Old[j] = timeTaken;


/* Do the things the cool new way! */

int freqList_New[D];
double stdDevList_New[M];
double timeTakenList_New[M];

for (int j = 0; j < M; j++)
auto start = std::chrono::high_resolution_clock::now();
std::fill_n(freqList_New, D, 0);
for (int i = 0; i < N; i++)
int roll = getRandNum_New();
freqList_New[roll] += 1;

stdDevList_New[j] = stdDev(freqList_New, D);
auto end = std::chrono::high_resolution_clock::now();
auto dur = std::chrono::duration_cast<std::chrono::microseconds>(end-start);
double timeTaken = dur.count() / 1000.0;
timeTakenList_New[j] = timeTaken;


/* Display Results */

printf("[OLD WAY]n");
printf("Spreadn");
printf(" mean: %.6fn", mean(stdDevList_Old, M));
printf(" std dev: %.6fn", stdDev(stdDevList_Old, M));
printf("Time Taken (ms)n");
printf(" mean: %.6fn", mean(timeTakenList_Old, M));
printf(" std dev: %.6fn", stdDev(timeTakenList_Old, M));
printf("n");
printf("[NEW WAY]n");
printf("Spreadn");
printf(" mean: %.6fn", mean(stdDevList_New, M));
printf(" std dev: %.6fn", stdDev(stdDevList_New, M));
printf("Time Taken (ms)n");
printf(" mean: %.6fn", mean(timeTakenList_New, M));
printf(" std dev: %.6fn", stdDev(timeTakenList_New, M));










share|improve this question























  • Try to use getRandNum_Old in a multi-threaded code. By the way, your both function are inefficient, since they test a condition in each call, which you really don't want to if you care about performance.
    – Daniel Langr
    2 hours ago











  • I could simulate you results with some very non-random numbers. You need to look at avalanche properties as well.
    – doron
    2 hours ago










  • @doron what does simulating with non-random numbers mean? like a deterministic seed?
    – rcplusplus
    2 hours ago






  • 2




    That is pretty much why this advice exists. If you don't know how to test the RNG for sufficient entropy or whether or not it matters for your program then you should assume that std::rand() isn't good enough. en.wikipedia.org/wiki/Entropy_(computing)
    – Hans Passant
    2 hours ago






  • 1




    @rcplusplus like repeating a sequence of a handful of numbers.
    – doron
    1 hour ago














up vote
6
down vote

favorite
1












So I saw a talk called rand() Considered Harmful and it advocated for using the engine-distribution paradigm of random number generation over the simple std::rand() plus modulus paradigm.



However, I wanted to see the failings of std::rand() firsthand so I did a quick experiment:



  1. Basically, I wrote 2 functions getRandNum_Old() and getRandNum_New() that generated a random number between 0 and 5 inclusive using std::rand() and std::mt19937+std::uniform_int_distribution respectively.

  2. Then I generated 960,000 (divisible by 6) random numbers using the "old" way and recorded the frequencies of the numbers 0-5. Then I calculated the standard deviation of these frequencies. What I'm looking for is a standard deviation as low as possible since that is what would happen if the distribution were truly uniform.

  3. I ran that simulation 1000 times and recorded the standard deviation for each simulation. I also recorded the time it took in milliseconds.

  4. Afterwards, I did the exact same again but this time generating random numbers the "new" way.

  5. Finally, I calculated the mean and standard deviation of the list of standard deviations for both the old and new way and the mean and standard deviation for the list of times taken for both the old and new way.

Here were the results:



[OLD WAY]
Spread
mean: 346.554406
std dev: 110.318361
Time Taken (ms)
mean: 6.662910
std dev: 0.366301

[NEW WAY]
Spread
mean: 350.346792
std dev: 110.449190
Time Taken (ms)
mean: 28.053907
std dev: 0.654964


Surprisingly, the aggregate spread of rolls was the same for both methods. I.e., std::mt19937+std::uniform_int_distribution was not "more uniform" than simple std::rand()+%. Another observation I made was that the new was about 4x slower than the old way. Overall, it seemed like I was paying a huge cost in speed for almost no gain in quality.



Is my experiment flawed in some way? Or is std::rand() really not that bad, and maybe even better?



For reference, here is the code I used in its entirety:



#include <cstdio>
#include <random>
#include <algorithm>
#include <chrono>

int getRandNum_Old()
static bool init = false;
if (!init)
std::srand(time(nullptr)); // Seed std::rand
init = true;


return std::rand() % 6;


int getRandNum_New()
static bool init = false;
static std::random_device rd;
static std::mt19937 eng;
static std::uniform_int_distribution<int> dist(0,5);
if (!init)
eng.seed(rd()); // Seed random engine
init = true;


return dist(eng);


template <typename T>
double mean(T* data, int n)
double m = 0;
std::for_each(data, data+n, [&](T x) m += x; );
m /= n;
return m;


template <typename T>
double stdDev(T* data, int n)
double m = mean(data, n);
double sd = 0.0;
std::for_each(data, data+n, [&](T x) sd += ((x-m) * (x-m)); );
sd /= n;
sd = sqrt(sd);
return sd;


int main()
const int N = 960000; // Number of trials
const int M = 1000; // Number of simulations
const int D = 6; // Num sides on die

/* Do the things the "old" way (blech) */

int freqList_Old[D];
double stdDevList_Old[M];
double timeTakenList_Old[M];

for (int j = 0; j < M; j++)
auto start = std::chrono::high_resolution_clock::now();
std::fill_n(freqList_Old, D, 0);
for (int i = 0; i < N; i++)
int roll = getRandNum_Old();
freqList_Old[roll] += 1;

stdDevList_Old[j] = stdDev(freqList_Old, D);
auto end = std::chrono::high_resolution_clock::now();
auto dur = std::chrono::duration_cast<std::chrono::microseconds>(end-start);
double timeTaken = dur.count() / 1000.0;
timeTakenList_Old[j] = timeTaken;


/* Do the things the cool new way! */

int freqList_New[D];
double stdDevList_New[M];
double timeTakenList_New[M];

for (int j = 0; j < M; j++)
auto start = std::chrono::high_resolution_clock::now();
std::fill_n(freqList_New, D, 0);
for (int i = 0; i < N; i++)
int roll = getRandNum_New();
freqList_New[roll] += 1;

stdDevList_New[j] = stdDev(freqList_New, D);
auto end = std::chrono::high_resolution_clock::now();
auto dur = std::chrono::duration_cast<std::chrono::microseconds>(end-start);
double timeTaken = dur.count() / 1000.0;
timeTakenList_New[j] = timeTaken;


/* Display Results */

printf("[OLD WAY]n");
printf("Spreadn");
printf(" mean: %.6fn", mean(stdDevList_Old, M));
printf(" std dev: %.6fn", stdDev(stdDevList_Old, M));
printf("Time Taken (ms)n");
printf(" mean: %.6fn", mean(timeTakenList_Old, M));
printf(" std dev: %.6fn", stdDev(timeTakenList_Old, M));
printf("n");
printf("[NEW WAY]n");
printf("Spreadn");
printf(" mean: %.6fn", mean(stdDevList_New, M));
printf(" std dev: %.6fn", stdDev(stdDevList_New, M));
printf("Time Taken (ms)n");
printf(" mean: %.6fn", mean(timeTakenList_New, M));
printf(" std dev: %.6fn", stdDev(timeTakenList_New, M));










share|improve this question























  • Try to use getRandNum_Old in a multi-threaded code. By the way, your both function are inefficient, since they test a condition in each call, which you really don't want to if you care about performance.
    – Daniel Langr
    2 hours ago











  • I could simulate you results with some very non-random numbers. You need to look at avalanche properties as well.
    – doron
    2 hours ago










  • @doron what does simulating with non-random numbers mean? like a deterministic seed?
    – rcplusplus
    2 hours ago






  • 2




    That is pretty much why this advice exists. If you don't know how to test the RNG for sufficient entropy or whether or not it matters for your program then you should assume that std::rand() isn't good enough. en.wikipedia.org/wiki/Entropy_(computing)
    – Hans Passant
    2 hours ago






  • 1




    @rcplusplus like repeating a sequence of a handful of numbers.
    – doron
    1 hour ago












up vote
6
down vote

favorite
1









up vote
6
down vote

favorite
1






1





So I saw a talk called rand() Considered Harmful and it advocated for using the engine-distribution paradigm of random number generation over the simple std::rand() plus modulus paradigm.



However, I wanted to see the failings of std::rand() firsthand so I did a quick experiment:



  1. Basically, I wrote 2 functions getRandNum_Old() and getRandNum_New() that generated a random number between 0 and 5 inclusive using std::rand() and std::mt19937+std::uniform_int_distribution respectively.

  2. Then I generated 960,000 (divisible by 6) random numbers using the "old" way and recorded the frequencies of the numbers 0-5. Then I calculated the standard deviation of these frequencies. What I'm looking for is a standard deviation as low as possible since that is what would happen if the distribution were truly uniform.

  3. I ran that simulation 1000 times and recorded the standard deviation for each simulation. I also recorded the time it took in milliseconds.

  4. Afterwards, I did the exact same again but this time generating random numbers the "new" way.

  5. Finally, I calculated the mean and standard deviation of the list of standard deviations for both the old and new way and the mean and standard deviation for the list of times taken for both the old and new way.

Here were the results:



[OLD WAY]
Spread
mean: 346.554406
std dev: 110.318361
Time Taken (ms)
mean: 6.662910
std dev: 0.366301

[NEW WAY]
Spread
mean: 350.346792
std dev: 110.449190
Time Taken (ms)
mean: 28.053907
std dev: 0.654964


Surprisingly, the aggregate spread of rolls was the same for both methods. I.e., std::mt19937+std::uniform_int_distribution was not "more uniform" than simple std::rand()+%. Another observation I made was that the new was about 4x slower than the old way. Overall, it seemed like I was paying a huge cost in speed for almost no gain in quality.



Is my experiment flawed in some way? Or is std::rand() really not that bad, and maybe even better?



For reference, here is the code I used in its entirety:



#include <cstdio>
#include <random>
#include <algorithm>
#include <chrono>

int getRandNum_Old()
static bool init = false;
if (!init)
std::srand(time(nullptr)); // Seed std::rand
init = true;


return std::rand() % 6;


int getRandNum_New()
static bool init = false;
static std::random_device rd;
static std::mt19937 eng;
static std::uniform_int_distribution<int> dist(0,5);
if (!init)
eng.seed(rd()); // Seed random engine
init = true;


return dist(eng);


template <typename T>
double mean(T* data, int n)
double m = 0;
std::for_each(data, data+n, [&](T x) m += x; );
m /= n;
return m;


template <typename T>
double stdDev(T* data, int n)
double m = mean(data, n);
double sd = 0.0;
std::for_each(data, data+n, [&](T x) sd += ((x-m) * (x-m)); );
sd /= n;
sd = sqrt(sd);
return sd;


int main()
const int N = 960000; // Number of trials
const int M = 1000; // Number of simulations
const int D = 6; // Num sides on die

/* Do the things the "old" way (blech) */

int freqList_Old[D];
double stdDevList_Old[M];
double timeTakenList_Old[M];

for (int j = 0; j < M; j++)
auto start = std::chrono::high_resolution_clock::now();
std::fill_n(freqList_Old, D, 0);
for (int i = 0; i < N; i++)
int roll = getRandNum_Old();
freqList_Old[roll] += 1;

stdDevList_Old[j] = stdDev(freqList_Old, D);
auto end = std::chrono::high_resolution_clock::now();
auto dur = std::chrono::duration_cast<std::chrono::microseconds>(end-start);
double timeTaken = dur.count() / 1000.0;
timeTakenList_Old[j] = timeTaken;


/* Do the things the cool new way! */

int freqList_New[D];
double stdDevList_New[M];
double timeTakenList_New[M];

for (int j = 0; j < M; j++)
auto start = std::chrono::high_resolution_clock::now();
std::fill_n(freqList_New, D, 0);
for (int i = 0; i < N; i++)
int roll = getRandNum_New();
freqList_New[roll] += 1;

stdDevList_New[j] = stdDev(freqList_New, D);
auto end = std::chrono::high_resolution_clock::now();
auto dur = std::chrono::duration_cast<std::chrono::microseconds>(end-start);
double timeTaken = dur.count() / 1000.0;
timeTakenList_New[j] = timeTaken;


/* Display Results */

printf("[OLD WAY]n");
printf("Spreadn");
printf(" mean: %.6fn", mean(stdDevList_Old, M));
printf(" std dev: %.6fn", stdDev(stdDevList_Old, M));
printf("Time Taken (ms)n");
printf(" mean: %.6fn", mean(timeTakenList_Old, M));
printf(" std dev: %.6fn", stdDev(timeTakenList_Old, M));
printf("n");
printf("[NEW WAY]n");
printf("Spreadn");
printf(" mean: %.6fn", mean(stdDevList_New, M));
printf(" std dev: %.6fn", stdDev(stdDevList_New, M));
printf("Time Taken (ms)n");
printf(" mean: %.6fn", mean(timeTakenList_New, M));
printf(" std dev: %.6fn", stdDev(timeTakenList_New, M));










share|improve this question















So I saw a talk called rand() Considered Harmful and it advocated for using the engine-distribution paradigm of random number generation over the simple std::rand() plus modulus paradigm.



However, I wanted to see the failings of std::rand() firsthand so I did a quick experiment:



  1. Basically, I wrote 2 functions getRandNum_Old() and getRandNum_New() that generated a random number between 0 and 5 inclusive using std::rand() and std::mt19937+std::uniform_int_distribution respectively.

  2. Then I generated 960,000 (divisible by 6) random numbers using the "old" way and recorded the frequencies of the numbers 0-5. Then I calculated the standard deviation of these frequencies. What I'm looking for is a standard deviation as low as possible since that is what would happen if the distribution were truly uniform.

  3. I ran that simulation 1000 times and recorded the standard deviation for each simulation. I also recorded the time it took in milliseconds.

  4. Afterwards, I did the exact same again but this time generating random numbers the "new" way.

  5. Finally, I calculated the mean and standard deviation of the list of standard deviations for both the old and new way and the mean and standard deviation for the list of times taken for both the old and new way.

Here were the results:



[OLD WAY]
Spread
mean: 346.554406
std dev: 110.318361
Time Taken (ms)
mean: 6.662910
std dev: 0.366301

[NEW WAY]
Spread
mean: 350.346792
std dev: 110.449190
Time Taken (ms)
mean: 28.053907
std dev: 0.654964


Surprisingly, the aggregate spread of rolls was the same for both methods. I.e., std::mt19937+std::uniform_int_distribution was not "more uniform" than simple std::rand()+%. Another observation I made was that the new was about 4x slower than the old way. Overall, it seemed like I was paying a huge cost in speed for almost no gain in quality.



Is my experiment flawed in some way? Or is std::rand() really not that bad, and maybe even better?



For reference, here is the code I used in its entirety:



#include <cstdio>
#include <random>
#include <algorithm>
#include <chrono>

int getRandNum_Old()
static bool init = false;
if (!init)
std::srand(time(nullptr)); // Seed std::rand
init = true;


return std::rand() % 6;


int getRandNum_New()
static bool init = false;
static std::random_device rd;
static std::mt19937 eng;
static std::uniform_int_distribution<int> dist(0,5);
if (!init)
eng.seed(rd()); // Seed random engine
init = true;


return dist(eng);


template <typename T>
double mean(T* data, int n)
double m = 0;
std::for_each(data, data+n, [&](T x) m += x; );
m /= n;
return m;


template <typename T>
double stdDev(T* data, int n)
double m = mean(data, n);
double sd = 0.0;
std::for_each(data, data+n, [&](T x) sd += ((x-m) * (x-m)); );
sd /= n;
sd = sqrt(sd);
return sd;


int main()
const int N = 960000; // Number of trials
const int M = 1000; // Number of simulations
const int D = 6; // Num sides on die

/* Do the things the "old" way (blech) */

int freqList_Old[D];
double stdDevList_Old[M];
double timeTakenList_Old[M];

for (int j = 0; j < M; j++)
auto start = std::chrono::high_resolution_clock::now();
std::fill_n(freqList_Old, D, 0);
for (int i = 0; i < N; i++)
int roll = getRandNum_Old();
freqList_Old[roll] += 1;

stdDevList_Old[j] = stdDev(freqList_Old, D);
auto end = std::chrono::high_resolution_clock::now();
auto dur = std::chrono::duration_cast<std::chrono::microseconds>(end-start);
double timeTaken = dur.count() / 1000.0;
timeTakenList_Old[j] = timeTaken;


/* Do the things the cool new way! */

int freqList_New[D];
double stdDevList_New[M];
double timeTakenList_New[M];

for (int j = 0; j < M; j++)
auto start = std::chrono::high_resolution_clock::now();
std::fill_n(freqList_New, D, 0);
for (int i = 0; i < N; i++)
int roll = getRandNum_New();
freqList_New[roll] += 1;

stdDevList_New[j] = stdDev(freqList_New, D);
auto end = std::chrono::high_resolution_clock::now();
auto dur = std::chrono::duration_cast<std::chrono::microseconds>(end-start);
double timeTaken = dur.count() / 1000.0;
timeTakenList_New[j] = timeTaken;


/* Display Results */

printf("[OLD WAY]n");
printf("Spreadn");
printf(" mean: %.6fn", mean(stdDevList_Old, M));
printf(" std dev: %.6fn", stdDev(stdDevList_Old, M));
printf("Time Taken (ms)n");
printf(" mean: %.6fn", mean(timeTakenList_Old, M));
printf(" std dev: %.6fn", stdDev(timeTakenList_Old, M));
printf("n");
printf("[NEW WAY]n");
printf("Spreadn");
printf(" mean: %.6fn", mean(stdDevList_New, M));
printf(" std dev: %.6fn", stdDev(stdDevList_New, M));
printf("Time Taken (ms)n");
printf(" mean: %.6fn", mean(timeTakenList_New, M));
printf(" std dev: %.6fn", stdDev(timeTakenList_New, M));







c++ c++11 random






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited 2 hours ago

























asked 2 hours ago









rcplusplus

72611123




72611123











  • Try to use getRandNum_Old in a multi-threaded code. By the way, your both function are inefficient, since they test a condition in each call, which you really don't want to if you care about performance.
    – Daniel Langr
    2 hours ago











  • I could simulate you results with some very non-random numbers. You need to look at avalanche properties as well.
    – doron
    2 hours ago










  • @doron what does simulating with non-random numbers mean? like a deterministic seed?
    – rcplusplus
    2 hours ago






  • 2




    That is pretty much why this advice exists. If you don't know how to test the RNG for sufficient entropy or whether or not it matters for your program then you should assume that std::rand() isn't good enough. en.wikipedia.org/wiki/Entropy_(computing)
    – Hans Passant
    2 hours ago






  • 1




    @rcplusplus like repeating a sequence of a handful of numbers.
    – doron
    1 hour ago
















  • Try to use getRandNum_Old in a multi-threaded code. By the way, your both function are inefficient, since they test a condition in each call, which you really don't want to if you care about performance.
    – Daniel Langr
    2 hours ago











  • I could simulate you results with some very non-random numbers. You need to look at avalanche properties as well.
    – doron
    2 hours ago










  • @doron what does simulating with non-random numbers mean? like a deterministic seed?
    – rcplusplus
    2 hours ago






  • 2




    That is pretty much why this advice exists. If you don't know how to test the RNG for sufficient entropy or whether or not it matters for your program then you should assume that std::rand() isn't good enough. en.wikipedia.org/wiki/Entropy_(computing)
    – Hans Passant
    2 hours ago






  • 1




    @rcplusplus like repeating a sequence of a handful of numbers.
    – doron
    1 hour ago















Try to use getRandNum_Old in a multi-threaded code. By the way, your both function are inefficient, since they test a condition in each call, which you really don't want to if you care about performance.
– Daniel Langr
2 hours ago





Try to use getRandNum_Old in a multi-threaded code. By the way, your both function are inefficient, since they test a condition in each call, which you really don't want to if you care about performance.
– Daniel Langr
2 hours ago













I could simulate you results with some very non-random numbers. You need to look at avalanche properties as well.
– doron
2 hours ago




I could simulate you results with some very non-random numbers. You need to look at avalanche properties as well.
– doron
2 hours ago












@doron what does simulating with non-random numbers mean? like a deterministic seed?
– rcplusplus
2 hours ago




@doron what does simulating with non-random numbers mean? like a deterministic seed?
– rcplusplus
2 hours ago




2




2




That is pretty much why this advice exists. If you don't know how to test the RNG for sufficient entropy or whether or not it matters for your program then you should assume that std::rand() isn't good enough. en.wikipedia.org/wiki/Entropy_(computing)
– Hans Passant
2 hours ago




That is pretty much why this advice exists. If you don't know how to test the RNG for sufficient entropy or whether or not it matters for your program then you should assume that std::rand() isn't good enough. en.wikipedia.org/wiki/Entropy_(computing)
– Hans Passant
2 hours ago




1




1




@rcplusplus like repeating a sequence of a handful of numbers.
– doron
1 hour ago




@rcplusplus like repeating a sequence of a handful of numbers.
– doron
1 hour ago












2 Answers
2






active

oldest

votes

















up vote
8
down vote













Pretty much any implementation of "old" rand() use an LCG; while they are generally not the best generators around, usually you are not going to see them fail on such a basic test - mean and standard deviation is generally got right even by the worst PRNGs.



Common failings of "bad" - but common enough - rand() implementations are:



  • low randomness of low-order bits;

  • short period;

  • low RAND_MAX;

  • some correlation between successive extractions (in general, LCGs produce numbers that are on a limited number of hyperplanes, although this can be somehow mitigated).

Still, none of these are specific to the API of rand(). A particular implementation could place a xorshift-family generator behind srand/rand and, algoritmically speaking, obtain a state of the art PRNG with no changes of interface, so no test like the one you did would show any weakness in the output.




Indeed, the actual problem with rand() is not much of implementation in principle but:



  • backwards compatibility; many current implementations use suboptimal generators, typically with badly choosen parameters; a notorious example is Visual C++, which sports a RAND_MAX of just 32767. However, this cannot be changed easily, as it would break compatibility with the past - people using srand with a fixed seed for reproducible simulations wouldn't be too happy (indeed, IIRC the aforementioned implementation goes back to Microsoft C early versions - or even Lattice C - from the mid-eighties);


  • bad interface; rand() provides a single generator with global state for the whole program. While this is perfectly fine (and actually quite handy) for many simple use cases, it poses problems:



    • with multithreaded code: to fix it you either need a global mutex - which would slow down everything for no reason and kill any chance of repeatability, as the sequence of calls becomes random itself -, or thread local state; this last one has been adopted by several implementations (notably Visual C++);

    • if you want a "private", reproducible sequence into a specific module of your program that doesn't impact the global state.


Finally, the rand state of affairs:



  • doesn't specify an actual implementation (the C standard provides just a sample implementation), so any program that is intended to produce reproducible output (or expect a PRNG of some known quality) across different compilers must roll its own generator;

  • doesn't provide any cross-platform method to obtain a decent seed (time(NULL) is not, as it isn't granular enough, and often - think embedded devices with no RTC - not even random enough).

Hence the new <random> header, which tries to fix this mess providing algorithms that are:



  • fully specified (so you can have cross-compiler reproducible output and guaranteed characteristics - say, range of the generator);

  • generally of state-of-the-art quality;

  • encapsulated in classes (so no global state is forced upon you, which avoids completely threading and nonlocality problems);

... and a default random_device as well to seed them.



Now, if you ask me I would have liked also a simple API built on top of this for the "easy", "guess a number" cases (similar to how Python does provide the "complicated" API, but also the trivial random.randint & co. using a global, pre-seeded PRNG for us uncomplicated people who'd like not to drown in random devices/engines/adapters/whatever every time we want to extract a number for the bingo cards), but it's true that you can easily build it by yourself over the current facilities (while building the "full" API over a simplistic one wouldn't be possible).






share|improve this answer


















  • 1




    It's exactly the combination of srand and an unspecified algorithm that gets std::rand in trouble. See also my answer to another question.
    – Peter O.
    17 mins ago

















up vote
2
down vote













If you repeat your experiment with a range larger than 5 then you will probably see different results. When your range is significantly smaller than RAND_MAX there isn't an issue for most applications.



For example if we have a RAND_MAX of 25 then rand() % 5 will produce numbers with the following frequencies:



0: 6
1: 5
2: 5
3: 5
4: 5


As RAND_MAX is guaranteed to be more than 32767 and the difference in frequencies between the least likely and the most likely is only 1, for small numbers the distribution is near enough random for most use cases.






share|improve this answer
















  • 2




    This is explained in STL's second slide
    – Alan Birtles
    1 hour ago






  • 1




    Ok, but... who is STL ? And what slides ? (serious question)
    – kebs
    1 hour ago










  • @kebs, Stephan Lavavej, see the Youtube reference in the question.
    – Evg
    1 hour ago










Your Answer





StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
convertImagesToLinks: true,
noModals: false,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);



);













 

draft saved


draft discarded


















StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53040940%2fis-the-new-random-library-really-better-than-stdrand%23new-answer', 'question_page');

);

Post as a guest






























2 Answers
2






active

oldest

votes








2 Answers
2






active

oldest

votes









active

oldest

votes






active

oldest

votes








up vote
8
down vote













Pretty much any implementation of "old" rand() use an LCG; while they are generally not the best generators around, usually you are not going to see them fail on such a basic test - mean and standard deviation is generally got right even by the worst PRNGs.



Common failings of "bad" - but common enough - rand() implementations are:



  • low randomness of low-order bits;

  • short period;

  • low RAND_MAX;

  • some correlation between successive extractions (in general, LCGs produce numbers that are on a limited number of hyperplanes, although this can be somehow mitigated).

Still, none of these are specific to the API of rand(). A particular implementation could place a xorshift-family generator behind srand/rand and, algoritmically speaking, obtain a state of the art PRNG with no changes of interface, so no test like the one you did would show any weakness in the output.




Indeed, the actual problem with rand() is not much of implementation in principle but:



  • backwards compatibility; many current implementations use suboptimal generators, typically with badly choosen parameters; a notorious example is Visual C++, which sports a RAND_MAX of just 32767. However, this cannot be changed easily, as it would break compatibility with the past - people using srand with a fixed seed for reproducible simulations wouldn't be too happy (indeed, IIRC the aforementioned implementation goes back to Microsoft C early versions - or even Lattice C - from the mid-eighties);


  • bad interface; rand() provides a single generator with global state for the whole program. While this is perfectly fine (and actually quite handy) for many simple use cases, it poses problems:



    • with multithreaded code: to fix it you either need a global mutex - which would slow down everything for no reason and kill any chance of repeatability, as the sequence of calls becomes random itself -, or thread local state; this last one has been adopted by several implementations (notably Visual C++);

    • if you want a "private", reproducible sequence into a specific module of your program that doesn't impact the global state.


Finally, the rand state of affairs:



  • doesn't specify an actual implementation (the C standard provides just a sample implementation), so any program that is intended to produce reproducible output (or expect a PRNG of some known quality) across different compilers must roll its own generator;

  • doesn't provide any cross-platform method to obtain a decent seed (time(NULL) is not, as it isn't granular enough, and often - think embedded devices with no RTC - not even random enough).

Hence the new <random> header, which tries to fix this mess providing algorithms that are:



  • fully specified (so you can have cross-compiler reproducible output and guaranteed characteristics - say, range of the generator);

  • generally of state-of-the-art quality;

  • encapsulated in classes (so no global state is forced upon you, which avoids completely threading and nonlocality problems);

... and a default random_device as well to seed them.



Now, if you ask me I would have liked also a simple API built on top of this for the "easy", "guess a number" cases (similar to how Python does provide the "complicated" API, but also the trivial random.randint & co. using a global, pre-seeded PRNG for us uncomplicated people who'd like not to drown in random devices/engines/adapters/whatever every time we want to extract a number for the bingo cards), but it's true that you can easily build it by yourself over the current facilities (while building the "full" API over a simplistic one wouldn't be possible).






share|improve this answer


















  • 1




    It's exactly the combination of srand and an unspecified algorithm that gets std::rand in trouble. See also my answer to another question.
    – Peter O.
    17 mins ago














up vote
8
down vote













Pretty much any implementation of "old" rand() use an LCG; while they are generally not the best generators around, usually you are not going to see them fail on such a basic test - mean and standard deviation is generally got right even by the worst PRNGs.



Common failings of "bad" - but common enough - rand() implementations are:



  • low randomness of low-order bits;

  • short period;

  • low RAND_MAX;

  • some correlation between successive extractions (in general, LCGs produce numbers that are on a limited number of hyperplanes, although this can be somehow mitigated).

Still, none of these are specific to the API of rand(). A particular implementation could place a xorshift-family generator behind srand/rand and, algoritmically speaking, obtain a state of the art PRNG with no changes of interface, so no test like the one you did would show any weakness in the output.




Indeed, the actual problem with rand() is not much of implementation in principle but:



  • backwards compatibility; many current implementations use suboptimal generators, typically with badly choosen parameters; a notorious example is Visual C++, which sports a RAND_MAX of just 32767. However, this cannot be changed easily, as it would break compatibility with the past - people using srand with a fixed seed for reproducible simulations wouldn't be too happy (indeed, IIRC the aforementioned implementation goes back to Microsoft C early versions - or even Lattice C - from the mid-eighties);


  • bad interface; rand() provides a single generator with global state for the whole program. While this is perfectly fine (and actually quite handy) for many simple use cases, it poses problems:



    • with multithreaded code: to fix it you either need a global mutex - which would slow down everything for no reason and kill any chance of repeatability, as the sequence of calls becomes random itself -, or thread local state; this last one has been adopted by several implementations (notably Visual C++);

    • if you want a "private", reproducible sequence into a specific module of your program that doesn't impact the global state.


Finally, the rand state of affairs:



  • doesn't specify an actual implementation (the C standard provides just a sample implementation), so any program that is intended to produce reproducible output (or expect a PRNG of some known quality) across different compilers must roll its own generator;

  • doesn't provide any cross-platform method to obtain a decent seed (time(NULL) is not, as it isn't granular enough, and often - think embedded devices with no RTC - not even random enough).

Hence the new <random> header, which tries to fix this mess providing algorithms that are:



  • fully specified (so you can have cross-compiler reproducible output and guaranteed characteristics - say, range of the generator);

  • generally of state-of-the-art quality;

  • encapsulated in classes (so no global state is forced upon you, which avoids completely threading and nonlocality problems);

... and a default random_device as well to seed them.



Now, if you ask me I would have liked also a simple API built on top of this for the "easy", "guess a number" cases (similar to how Python does provide the "complicated" API, but also the trivial random.randint & co. using a global, pre-seeded PRNG for us uncomplicated people who'd like not to drown in random devices/engines/adapters/whatever every time we want to extract a number for the bingo cards), but it's true that you can easily build it by yourself over the current facilities (while building the "full" API over a simplistic one wouldn't be possible).






share|improve this answer


















  • 1




    It's exactly the combination of srand and an unspecified algorithm that gets std::rand in trouble. See also my answer to another question.
    – Peter O.
    17 mins ago












up vote
8
down vote










up vote
8
down vote









Pretty much any implementation of "old" rand() use an LCG; while they are generally not the best generators around, usually you are not going to see them fail on such a basic test - mean and standard deviation is generally got right even by the worst PRNGs.



Common failings of "bad" - but common enough - rand() implementations are:



  • low randomness of low-order bits;

  • short period;

  • low RAND_MAX;

  • some correlation between successive extractions (in general, LCGs produce numbers that are on a limited number of hyperplanes, although this can be somehow mitigated).

Still, none of these are specific to the API of rand(). A particular implementation could place a xorshift-family generator behind srand/rand and, algoritmically speaking, obtain a state of the art PRNG with no changes of interface, so no test like the one you did would show any weakness in the output.




Indeed, the actual problem with rand() is not much of implementation in principle but:



  • backwards compatibility; many current implementations use suboptimal generators, typically with badly choosen parameters; a notorious example is Visual C++, which sports a RAND_MAX of just 32767. However, this cannot be changed easily, as it would break compatibility with the past - people using srand with a fixed seed for reproducible simulations wouldn't be too happy (indeed, IIRC the aforementioned implementation goes back to Microsoft C early versions - or even Lattice C - from the mid-eighties);


  • bad interface; rand() provides a single generator with global state for the whole program. While this is perfectly fine (and actually quite handy) for many simple use cases, it poses problems:



    • with multithreaded code: to fix it you either need a global mutex - which would slow down everything for no reason and kill any chance of repeatability, as the sequence of calls becomes random itself -, or thread local state; this last one has been adopted by several implementations (notably Visual C++);

    • if you want a "private", reproducible sequence into a specific module of your program that doesn't impact the global state.


Finally, the rand state of affairs:



  • doesn't specify an actual implementation (the C standard provides just a sample implementation), so any program that is intended to produce reproducible output (or expect a PRNG of some known quality) across different compilers must roll its own generator;

  • doesn't provide any cross-platform method to obtain a decent seed (time(NULL) is not, as it isn't granular enough, and often - think embedded devices with no RTC - not even random enough).

Hence the new <random> header, which tries to fix this mess providing algorithms that are:



  • fully specified (so you can have cross-compiler reproducible output and guaranteed characteristics - say, range of the generator);

  • generally of state-of-the-art quality;

  • encapsulated in classes (so no global state is forced upon you, which avoids completely threading and nonlocality problems);

... and a default random_device as well to seed them.



Now, if you ask me I would have liked also a simple API built on top of this for the "easy", "guess a number" cases (similar to how Python does provide the "complicated" API, but also the trivial random.randint & co. using a global, pre-seeded PRNG for us uncomplicated people who'd like not to drown in random devices/engines/adapters/whatever every time we want to extract a number for the bingo cards), but it's true that you can easily build it by yourself over the current facilities (while building the "full" API over a simplistic one wouldn't be possible).






share|improve this answer














Pretty much any implementation of "old" rand() use an LCG; while they are generally not the best generators around, usually you are not going to see them fail on such a basic test - mean and standard deviation is generally got right even by the worst PRNGs.



Common failings of "bad" - but common enough - rand() implementations are:



  • low randomness of low-order bits;

  • short period;

  • low RAND_MAX;

  • some correlation between successive extractions (in general, LCGs produce numbers that are on a limited number of hyperplanes, although this can be somehow mitigated).

Still, none of these are specific to the API of rand(). A particular implementation could place a xorshift-family generator behind srand/rand and, algoritmically speaking, obtain a state of the art PRNG with no changes of interface, so no test like the one you did would show any weakness in the output.




Indeed, the actual problem with rand() is not much of implementation in principle but:



  • backwards compatibility; many current implementations use suboptimal generators, typically with badly choosen parameters; a notorious example is Visual C++, which sports a RAND_MAX of just 32767. However, this cannot be changed easily, as it would break compatibility with the past - people using srand with a fixed seed for reproducible simulations wouldn't be too happy (indeed, IIRC the aforementioned implementation goes back to Microsoft C early versions - or even Lattice C - from the mid-eighties);


  • bad interface; rand() provides a single generator with global state for the whole program. While this is perfectly fine (and actually quite handy) for many simple use cases, it poses problems:



    • with multithreaded code: to fix it you either need a global mutex - which would slow down everything for no reason and kill any chance of repeatability, as the sequence of calls becomes random itself -, or thread local state; this last one has been adopted by several implementations (notably Visual C++);

    • if you want a "private", reproducible sequence into a specific module of your program that doesn't impact the global state.


Finally, the rand state of affairs:



  • doesn't specify an actual implementation (the C standard provides just a sample implementation), so any program that is intended to produce reproducible output (or expect a PRNG of some known quality) across different compilers must roll its own generator;

  • doesn't provide any cross-platform method to obtain a decent seed (time(NULL) is not, as it isn't granular enough, and often - think embedded devices with no RTC - not even random enough).

Hence the new <random> header, which tries to fix this mess providing algorithms that are:



  • fully specified (so you can have cross-compiler reproducible output and guaranteed characteristics - say, range of the generator);

  • generally of state-of-the-art quality;

  • encapsulated in classes (so no global state is forced upon you, which avoids completely threading and nonlocality problems);

... and a default random_device as well to seed them.



Now, if you ask me I would have liked also a simple API built on top of this for the "easy", "guess a number" cases (similar to how Python does provide the "complicated" API, but also the trivial random.randint & co. using a global, pre-seeded PRNG for us uncomplicated people who'd like not to drown in random devices/engines/adapters/whatever every time we want to extract a number for the bingo cards), but it's true that you can easily build it by yourself over the current facilities (while building the "full" API over a simplistic one wouldn't be possible).







share|improve this answer














share|improve this answer



share|improve this answer








edited 6 mins ago

























answered 1 hour ago









Matteo Italia

95.8k12132234




95.8k12132234







  • 1




    It's exactly the combination of srand and an unspecified algorithm that gets std::rand in trouble. See also my answer to another question.
    – Peter O.
    17 mins ago












  • 1




    It's exactly the combination of srand and an unspecified algorithm that gets std::rand in trouble. See also my answer to another question.
    – Peter O.
    17 mins ago







1




1




It's exactly the combination of srand and an unspecified algorithm that gets std::rand in trouble. See also my answer to another question.
– Peter O.
17 mins ago




It's exactly the combination of srand and an unspecified algorithm that gets std::rand in trouble. See also my answer to another question.
– Peter O.
17 mins ago












up vote
2
down vote













If you repeat your experiment with a range larger than 5 then you will probably see different results. When your range is significantly smaller than RAND_MAX there isn't an issue for most applications.



For example if we have a RAND_MAX of 25 then rand() % 5 will produce numbers with the following frequencies:



0: 6
1: 5
2: 5
3: 5
4: 5


As RAND_MAX is guaranteed to be more than 32767 and the difference in frequencies between the least likely and the most likely is only 1, for small numbers the distribution is near enough random for most use cases.






share|improve this answer
















  • 2




    This is explained in STL's second slide
    – Alan Birtles
    1 hour ago






  • 1




    Ok, but... who is STL ? And what slides ? (serious question)
    – kebs
    1 hour ago










  • @kebs, Stephan Lavavej, see the Youtube reference in the question.
    – Evg
    1 hour ago














up vote
2
down vote













If you repeat your experiment with a range larger than 5 then you will probably see different results. When your range is significantly smaller than RAND_MAX there isn't an issue for most applications.



For example if we have a RAND_MAX of 25 then rand() % 5 will produce numbers with the following frequencies:



0: 6
1: 5
2: 5
3: 5
4: 5


As RAND_MAX is guaranteed to be more than 32767 and the difference in frequencies between the least likely and the most likely is only 1, for small numbers the distribution is near enough random for most use cases.






share|improve this answer
















  • 2




    This is explained in STL's second slide
    – Alan Birtles
    1 hour ago






  • 1




    Ok, but... who is STL ? And what slides ? (serious question)
    – kebs
    1 hour ago










  • @kebs, Stephan Lavavej, see the Youtube reference in the question.
    – Evg
    1 hour ago












up vote
2
down vote










up vote
2
down vote









If you repeat your experiment with a range larger than 5 then you will probably see different results. When your range is significantly smaller than RAND_MAX there isn't an issue for most applications.



For example if we have a RAND_MAX of 25 then rand() % 5 will produce numbers with the following frequencies:



0: 6
1: 5
2: 5
3: 5
4: 5


As RAND_MAX is guaranteed to be more than 32767 and the difference in frequencies between the least likely and the most likely is only 1, for small numbers the distribution is near enough random for most use cases.






share|improve this answer












If you repeat your experiment with a range larger than 5 then you will probably see different results. When your range is significantly smaller than RAND_MAX there isn't an issue for most applications.



For example if we have a RAND_MAX of 25 then rand() % 5 will produce numbers with the following frequencies:



0: 6
1: 5
2: 5
3: 5
4: 5


As RAND_MAX is guaranteed to be more than 32767 and the difference in frequencies between the least likely and the most likely is only 1, for small numbers the distribution is near enough random for most use cases.







share|improve this answer












share|improve this answer



share|improve this answer










answered 1 hour ago









Alan Birtles

6,548632




6,548632







  • 2




    This is explained in STL's second slide
    – Alan Birtles
    1 hour ago






  • 1




    Ok, but... who is STL ? And what slides ? (serious question)
    – kebs
    1 hour ago










  • @kebs, Stephan Lavavej, see the Youtube reference in the question.
    – Evg
    1 hour ago












  • 2




    This is explained in STL's second slide
    – Alan Birtles
    1 hour ago






  • 1




    Ok, but... who is STL ? And what slides ? (serious question)
    – kebs
    1 hour ago










  • @kebs, Stephan Lavavej, see the Youtube reference in the question.
    – Evg
    1 hour ago







2




2




This is explained in STL's second slide
– Alan Birtles
1 hour ago




This is explained in STL's second slide
– Alan Birtles
1 hour ago




1




1




Ok, but... who is STL ? And what slides ? (serious question)
– kebs
1 hour ago




Ok, but... who is STL ? And what slides ? (serious question)
– kebs
1 hour ago












@kebs, Stephan Lavavej, see the Youtube reference in the question.
– Evg
1 hour ago




@kebs, Stephan Lavavej, see the Youtube reference in the question.
– Evg
1 hour ago

















 

draft saved


draft discarded















































 


draft saved


draft discarded














StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53040940%2fis-the-new-random-library-really-better-than-stdrand%23new-answer', 'question_page');

);

Post as a guest













































































Comments

Popular posts from this blog

What does second last employer means? [closed]

List of Gilmore Girls characters

Confectionery