Why is the new random library better than std::rand()?

So I saw a talk called rand() Considered Harmful and it advocated for using the engine-distribution paradigm of random number generation over the simple std::rand() plus modulus paradigm.



However, I wanted to see the failings of std::rand() firsthand so I did a quick experiment:



  1. Basically, I wrote 2 functions getRandNum_Old() and getRandNum_New() that generated a random number between 0 and 5 inclusive using std::rand() and std::mt19937+std::uniform_int_distribution respectively.

  2. Then I generated 960,000 (divisible by 6) random numbers using the "old" way and recorded the frequencies of the numbers 0-5. Then I calculated the standard deviation of these frequencies. What I'm looking for is a standard deviation as low as possible, since that is what I assumed would happen if the distribution were truly uniform (see the sanity-check sketch after this list for the spread a truly uniform source would actually produce).

  3. I ran that simulation 1000 times and recorded the standard deviation for each simulation. I also recorded the time it took in milliseconds.

  4. Afterwards, I did the exact same again but this time generating random numbers the "new" way.

  5. Finally, I calculated the mean and standard deviation of the list of standard deviations for both the old and new way and the mean and standard deviation for the list of times taken for both the old and new way.
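
For scale, the spread a truly uniform source would produce is not zero: each face count over N rolls is approximately Binomial(N, 1/6), whose standard deviation is sqrt(N * (1/6) * (5/6)). A minimal sanity-check sketch of that calculation (this anticipates a point raised in the comments below):

#include <cmath>
#include <cstdio>

// Sanity check: under a perfectly uniform source, each face count over
// N rolls is ~ Binomial(N, 1/6), so a spread of roughly
// sqrt(N * (1/6) * (5/6)) is expected -- not zero.
int main() {
    const double N = 960000.0;
    const double p = 1.0 / 6.0;
    std::printf("expected spread ~ %.1f\n", std::sqrt(N * p * (1.0 - p)));
    // Prints ~365.1 -- the same order as the measured means below.
}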

Here were the results:



[OLD WAY]
Spread
mean: 346.554406
std dev: 110.318361
Time Taken (ms)
mean: 6.662910
std dev: 0.366301

[NEW WAY]
Spread
mean: 350.346792
std dev: 110.449190
Time Taken (ms)
mean: 28.053907
std dev: 0.654964


Surprisingly, the aggregate spread of rolls was virtually the same for both methods. I.e., std::mt19937 + std::uniform_int_distribution was not "more uniform" than simple std::rand() + %. Another observation I made was that the new way was about 4x slower than the old way. Overall, it seemed like I was paying a huge cost in speed for almost no gain in quality.



Is my experiment flawed in some way? Or is std::rand() really not that bad, and maybe even better?



For reference, here is the code I used in its entirety:



#include <cstdio>
#include <cstdlib>   // std::rand, std::srand
#include <ctime>     // std::time
#include <cmath>     // std::sqrt
#include <random>
#include <algorithm>
#include <chrono>

int getRandNum_Old() {
    static bool init = false;
    if (!init) {
        std::srand(std::time(nullptr)); // Seed std::rand
        init = true;
    }
    return std::rand() % 6;
}

int getRandNum_New() {
    static bool init = false;
    static std::random_device rd;
    static std::mt19937 eng;
    static std::uniform_int_distribution<int> dist(0, 5);
    if (!init) {
        eng.seed(rd()); // Seed random engine
        init = true;
    }
    return dist(eng);
}

template <typename T>
double mean(T* data, int n) {
    double m = 0;
    std::for_each(data, data + n, [&](T x) { m += x; });
    m /= n;
    return m;
}

template <typename T>
double stdDev(T* data, int n) {
    double m = mean(data, n);
    double sd = 0.0;
    std::for_each(data, data + n, [&](T x) { sd += (x - m) * (x - m); });
    sd /= n;
    sd = std::sqrt(sd);
    return sd;
}

int main() {
    const int N = 960000; // Number of trials
    const int M = 1000;   // Number of simulations
    const int D = 6;      // Num sides on die

    /* Do the things the "old" way (blech) */

    int freqList_Old[D];
    double stdDevList_Old[M];
    double timeTakenList_Old[M];

    for (int j = 0; j < M; j++) {
        auto start = std::chrono::high_resolution_clock::now();
        std::fill_n(freqList_Old, D, 0);
        for (int i = 0; i < N; i++) {
            int roll = getRandNum_Old();
            freqList_Old[roll] += 1;
        }
        stdDevList_Old[j] = stdDev(freqList_Old, D);
        auto end = std::chrono::high_resolution_clock::now();
        auto dur = std::chrono::duration_cast<std::chrono::microseconds>(end - start);
        double timeTaken = dur.count() / 1000.0;
        timeTakenList_Old[j] = timeTaken;
    }

    /* Do the things the cool new way! */

    int freqList_New[D];
    double stdDevList_New[M];
    double timeTakenList_New[M];

    for (int j = 0; j < M; j++) {
        auto start = std::chrono::high_resolution_clock::now();
        std::fill_n(freqList_New, D, 0);
        for (int i = 0; i < N; i++) {
            int roll = getRandNum_New();
            freqList_New[roll] += 1;
        }
        stdDevList_New[j] = stdDev(freqList_New, D);
        auto end = std::chrono::high_resolution_clock::now();
        auto dur = std::chrono::duration_cast<std::chrono::microseconds>(end - start);
        double timeTaken = dur.count() / 1000.0;
        timeTakenList_New[j] = timeTaken;
    }

    /* Display Results */

    printf("[OLD WAY]\n");
    printf("Spread\n");
    printf(" mean: %.6f\n", mean(stdDevList_Old, M));
    printf(" std dev: %.6f\n", stdDev(stdDevList_Old, M));
    printf("Time Taken (ms)\n");
    printf(" mean: %.6f\n", mean(timeTakenList_Old, M));
    printf(" std dev: %.6f\n", stdDev(timeTakenList_Old, M));
    printf("\n");
    printf("[NEW WAY]\n");
    printf("Spread\n");
    printf(" mean: %.6f\n", mean(stdDevList_New, M));
    printf(" std dev: %.6f\n", stdDev(stdDevList_New, M));
    printf("Time Taken (ms)\n");
    printf(" mean: %.6f\n", mean(timeTakenList_New, M));
    printf(" std dev: %.6f\n", stdDev(timeTakenList_New, M));
}

Tags: c++ c++11 random (asked yesterday by rcplusplus; edited 11 mins ago by jpmc26)

  • (25) That is pretty much why this advice exists. If you don't know how to test the RNG for sufficient entropy, or whether or not it matters for your program, then you should assume that std::rand() isn't good enough. en.wikipedia.org/wiki/Entropy_(computing) – Hans Passant, yesterday

  • (4) The bottom line on whether rand() is good enough depends largely on what you are using the collection of random numbers for. If you need a particular type of random distribution, then of course the library implementation will be better. If you simply need random numbers and don't care about the "randomness" or what type of distribution is produced, then rand() is fine. Match the proper tool to the job at hand. – David C. Rankin, yesterday

  • (2) possible dupe: stackoverflow.com/questions/52869166/… I just don't want to hammer this one, so I refrain from actually voting. – bolov, yesterday

  • (15) for (i=0; i<k*n; i++) a[i]=i%n; produces exactly the same mean and standard deviation as the best RNG out there. If this is good enough for your application, just use this sequence. – n.m., yesterday

  • (2) "standard deviation as low as possible" - no. That's wrong. You expect the frequencies to be a little bit different - about sqrt(frequency) is what you expect the standard deviation to be. The "incrementing counter" that n.m. produced will have a much lower s.d. (and is a very bad rng). – Martin Bonner, yesterday

4 Answers


Answer 1 (accepted, score 89)

Pretty much any implementation of the "old" rand() uses an LCG; while LCGs are generally not the best generators around, you are usually not going to see them fail on such a basic test: even the worst PRNGs generally get the mean and standard deviation right.



Common failings of "bad" - but common enough - rand() implementations are:



  • low randomness of low-order bits (a small sketch of this failure follows the list);

  • short period;

  • low RAND_MAX;

  • some correlation between successive extractions (in general, LCGs produce numbers that are on a limited number of hyperplanes, although this can be somehow mitigated).
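
As an illustration of the first point, here is a minimal sketch (the LCG parameters are merely illustrative, similar to ones historically used by some C libraries): for any LCG with a power-of-two modulus and odd multiplier and increment, the lowest output bit flips on every step, so it strictly alternates.

#include <cstdint>
#include <cstdio>

// Minimal sketch of the low-order-bit failure (illustrative parameters).
// For an LCG modulo 2^32 with odd multiplier and increment, the parity of
// the state flips on every step, so the lowest bit goes 0,1,0,1,...
int main() {
    std::uint32_t state = 12345;
    for (int i = 0; i < 16; ++i) {
        state = state * 1103515245u + 12345u; // LCG step, modulus 2^32
        std::printf("%u", state & 1u);        // lowest bit only
    }
    std::printf("\n"); // prints 0101010101010101
}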

Still, none of these is specific to the API of rand(). A particular implementation could place a xorshift-family generator behind srand/rand and, algorithmically speaking, obtain a state-of-the-art PRNG with no change of interface, so no test like the one you did would show any weakness in the output.



Edit: @R. correctly notes that the rand/srand interface is limited by the fact that srand takes an unsigned int, so any generator an implementation may put behind them is intrinsically limited to UINT_MAX possible starting seeds (and thus generated sequences). This is true indeed, although the API could be trivially extended to make srand take an unsigned long long, or adding a separate srand(unsigned char *, size_t) overload.




Indeed, the actual problem with rand() is not so much the implementation in principle, but rather:



  • backwards compatibility; many current implementations use suboptimal generators, typically with badly chosen parameters; a notorious example is Visual C++, which sports a RAND_MAX of just 32767. However, this cannot be changed easily, as it would break compatibility with the past - people using srand with a fixed seed for reproducible simulations wouldn't be too happy (indeed, IIRC the aforementioned implementation goes back to Microsoft C early versions - or even Lattice C - from the mid-eighties);


  • simplistic interface; rand() provides a single generator with global state for the whole program. While this is perfectly fine (and actually quite handy) for many simple use cases, it poses problems:



    • with multithreaded code: to fix it you either need a global mutex - which would slow down everything for no reason and kill any chance of repeatability, as the sequence of calls becomes random itself - or thread-local state; this last one has been adopted by several implementations (notably Visual C++), and is sketched after this list;

    • if you want a "private", reproducible sequence into a specific module of your program that doesn't impact the global state.
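
A minimal sketch of the thread-local idea mentioned above, using the <random> facilities (the function name and seeding are merely illustrative):

#include <random>

// Sketch of the thread-local approach: every thread lazily gets its own
// independently seeded engine, so no mutex is needed and threads don't
// perturb each other's sequences.
int threadLocalDie() {
    thread_local std::mt19937 eng{std::random_device{}()};
    thread_local std::uniform_int_distribution<int> dist{0, 5};
    return dist(eng);
}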


Finally, the rand state of affairs:



  • doesn't specify an actual implementation (the C standard provides just a sample implementation), so any program that is intended to produce reproducible output (or expect a PRNG of some known quality) across different compilers must roll its own generator;

  • doesn't provide any cross-platform method to obtain a decent seed (time(NULL) is not, as it isn't granular enough, and often - think embedded devices with no RTC - not even random enough).

Hence the new <random> header, which tries to fix this mess by providing algorithms that are:



  • fully specified (so you can have cross-compiler reproducible output and guaranteed characteristics - say, range of the generator);

  • generally of state-of-the-art quality (from when the library was designed; see below);

  • encapsulated in classes (so no global state is forced upon you, which avoids completely threading and nonlocality problems);

... and a default random_device as well to seed them.



Now, if you ask me, I would also have liked a simple API built on top of this for the "easy", "guess a number" cases (similar to how Python provides the "complicated" API, but also the trivial random.randint & co. using a global, pre-seeded PRNG for us uncomplicated people who'd like not to drown in random devices/engines/adapters/whatever every time we want to extract a number for the bingo cards); but it's true that you can easily build it yourself over the current facilities (see the sketch below), while building the "full" API over a simplistic one wouldn't be possible.
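
One possible shape for such an "easy mode" wrapper, built purely on the existing facilities (entirely an assumption; nothing like this exists in the standard library):

#include <random>

// Hypothetical Python-style helper built on <random>: a global, lazily
// seeded engine behind a randint(lo, hi) call.  Not standard -- just one
// way to build the "simple API" described above.
int randint(int lo, int hi) {
    static std::mt19937 eng{std::random_device{}()};
    return std::uniform_int_distribution<int>{lo, hi}(eng);
}

// usage: int roll = randint(0, 5);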




Finally, to get back to your performance comparison: as others have noted, you are comparing a fast LCG with a slower (but generally considered better-quality) Mersenne Twister; if you are OK with the quality of an LCG, you can use std::minstd_rand instead of std::mt19937.



Indeed, after tweaking your function to use std::minstd_rand and avoid useless static variables for initialization



int getRandNum_New() {
    static std::minstd_rand eng{std::random_device{}()};
    static std::uniform_int_distribution<int> dist{0, 5};
    return dist(eng);
}



I get 9 ms (old) vs 21 ms (new); finally, if I get rid of dist (which, compared to the classic modulo operator, handles the distribution skew when the output range is not a multiple of the input range) and get back to what you are doing in getRandNum_Old()



int getRandNum_New() {
    static std::minstd_rand eng{std::random_device{}()};
    return eng() % 6;
}


I get it down to 6 ms (so, 30% faster), probably because, unlike the call to rand(), std::minstd_rand is easier to inline.
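
For completeness, a minimal sketch of the rejection idea that uniform_int_distribution implementations typically use to avoid that skew (the exact algorithm is implementation-defined; this just shows the general technique):

#include <cstdint>
#include <limits>
#include <random>

// General rejection-sampling technique: raw 32-bit outputs that fall into
// the final, incomplete bucket are discarded, so each face 0..5 ends up
// exactly equally likely.
int unbiasedDie(std::mt19937& eng) {
    const std::uint32_t range = 6;
    const std::uint32_t max   = std::numeric_limits<std::uint32_t>::max();
    const std::uint32_t limit = max - max % range; // largest multiple of 6 that fits
    std::uint32_t v;
    do {
        v = static_cast<std::uint32_t>(eng()); // uniform on [0, 2^32)
    } while (v >= limit);                      // reject the incomplete bucket
    return static_cast<int>(v % range);
}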




Incidentally, I did the same test using a hand-rolled (but pretty much conforming to the standard library interface) XorShift64*, and it's 2.3 times faster than rand() (3.68 ms vs 8.61 ms); given that, unlike the Mersenne Twister and the various provided LCGs, it passes the current randomness test suites with flying colors and it's blazingly fast, it makes you wonder why it isn't included in the standard library yet.






  • (3) It's exactly the combination of srand and an unspecified algorithm that gets std::rand in trouble. See also my answer to another question. – Peter O., yesterday

  • (2) rand is fundamentally limited at the API level in that the seed (and thus the number of possible sequences that can be produced) is bounded by UINT_MAX+1. – R.., yesterday

  • (2) just a note: minstd is a bad PRNG, and mt19937 is better but not by much: pcg-random.org/… (in that chart, minstd==LCG32/64). It's quite a shame that C++ doesn't provide any high-quality, fast PRNGs like PCG or xoroshiro128+. – user60561, 19 hours ago

  • (2) @MatteoItalia We're not in disagreement. This was also Bjarne's point. We really want <random> in the standard, but we would also like a "just give me a decent implementation that I can use now" option. For PRNGs as well as other things. – ravnsgaard, 14 hours ago

  • (2) A couple of notes: 1. Replacing std::uniform_int_distribution<int> dist{0, 5} and dist(eng) with eng() % 6 reintroduces the skew factor that the std::rand code suffers from (admittedly minor skew in this case, where the engine has 2**31 - 1 outputs and you're distributing them to 6 buckets). 2. On your note about "srand takes an unsigned int" limiting the possible outputs: as written, seeding your engine with just std::random_device{}() has the same problem; you need a seed_seq to properly initialize most PRNGs. – ShadowRanger, 9 hours ago

Answer 2 (score 5)

If you repeat your experiment with a range larger than 5 then you will probably see different results. When your range is significantly smaller than RAND_MAX there isn't an issue for most applications.



For example if we have a RAND_MAX of 25 then rand() % 5 will produce numbers with the following frequencies:



0: 6
1: 5
2: 5
3: 5
4: 5


As RAND_MAX is guaranteed to be at least 32767, and the difference in frequencies between the least likely and the most likely value is only 1, for small ranges the distribution is close enough to uniform for most use cases.
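
A quick sketch that reproduces the table above and lets you try other RAND_MAX/divisor combinations (variable names are just illustrative; a real RAND_MAX is at least 32767):

#include <cstdio>

// Count how often each residue occurs when the values 0..rand_max are
// reduced modulo n.  With rand_max = 25 and n = 5 this prints 6,5,5,5,5,
// matching the table above.
int main() {
    const int rand_max = 25;
    const int n = 5;
    int freq[n] = {0};
    for (int v = 0; v <= rand_max; ++v)
        freq[v % n]++;
    for (int r = 0; r < n; ++r)
        std::printf("%d: %d\n", r, freq[r]);
}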






  • (3) This is explained in STL's second slide. – Alan Birtles, yesterday

  • (3) OK, but... who is STL? And what slides? (serious question) – kebs, yesterday

  • @kebs, Stephan Lavavej; see the YouTube reference in the question. – Evg, yesterday

Answer 3 (score 2)

First, surprisingly, the answer changes depending on what you are using the random number for. If it is to drive, say, a random background color changer, using rand() is perfectly fine. If you are using a random number to create a random poker hand or a cryptographically secure key, then it is not fine.



Predictability: the sequence 012345012345012345012345... would provide an even distribution of each number in your sample, but obviously isn't random. For a sequence to be random, the value of element n+1 cannot be easily predicted from the value of element n (or even from the values of elements n, n-1, n-2, n-3, etc.). Clearly a repeating sequence of the same digits is a degenerate case, but a sequence generated with any linear congruential generator can be subjected to analysis; if you use the default out-of-the-box settings of a common LCG from a common library, a malicious person could "break the sequence" without much effort at all. In the past, several on-line casinos (and some brick-and-mortar ones) were hit for losses by machines using poor random number generators. Even people who should know better have been caught out; TPM chips from several manufacturers have been demonstrated to be easier to break than the bit-length of their keys would otherwise predict, because of poor choices made with key-generation parameters. (A toy illustration of this predictability follows.)
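
A toy illustration of that predictability (the LCG parameters are merely illustrative): once an attacker observes a single output of a generator that exposes its entire state, every later output is determined.

#include <cstdint>
#include <cstdio>

// Toy illustration: when an LCG's output is its whole state, one observed
// value lets an attacker predict all later ones.
std::uint32_t lcg_step(std::uint32_t s) { return s * 1103515245u + 12345u; }

int main() {
    std::uint32_t secret   = 987654321u;       // generator's hidden state
    std::uint32_t observed = lcg_step(secret); // attacker sees one output
    std::printf("predicted=%u actual=%u\n",    // the two always match
                lcg_step(observed),
                lcg_step(lcg_step(secret)));
}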



Distribution: As alluded to in the video, taking a result modulo 100 (or any value that does not divide the length of the sequence evenly) will guarantee that some outcomes become at least slightly more likely than others. With 32768 possible raw values (0 through 32767) reduced modulo 100, the numbers 0 through 67 appear 328 times each while 68 through 99 appear 327 times each, i.e. about 0.3% more often; a factor that may provide an attacker an advantage.






  • "Predictability: the sequence 012345012345012345012345... would pass your test for "randomness", in that there would be an even distribution of each number in your sample" actually, not really; what he is measuring is the stddev of the stddevs between runs, i.e. essentially how the various runs histogram are spread out. With a 012345012345012345... generator it would always be zero.
    – Matteo Italia
    yesterday










  • Good point; I read through OP's code a bit too quickly, I'm afraid. Edited my answer to reflect.
    – JackLThornton
    yesterday










  • Hehe I know because I though to make that test as well, and I noticed I obtained different results 😄
    – Matteo Italia
    yesterday

















Answer 4 (score 0)

The correct answer is: it depends on what you mean by "better."



The "new" <random> engines were introduced to C++ over 13 years ago, so they're not really new. The C library rand() was introduced decades ago and has been very useful in that time for any number of things.



The C++ standard library provides three classes of random number generator engines: the Linear Congruential (of which rand() is an example), the Lagged Fibonacci, and the Mersenne Twister. There are tradeoffs to each class, and each class is "best" in certain ways. For example, the LCGs have very small state and, if the right parameters are chosen, are fairly fast on modern desktop processors. The LFGs have larger state and use only memory fetches and addition operations, so they are very fast on embedded systems and microcontrollers that lack specialized math hardware. The Mersenne Twister has huge state and is slow, but can have a very large non-repeating sequence with excellent spectral characteristics. (One concrete engine from each family is sketched below.)
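
For concreteness, one instantiation of each family as shipped in <random> (calling ranlux24_base "lagged Fibonacci" is a rough characterization; it is a subtract-with-carry engine, a member of that family):

#include <random>

// One concrete engine from each family described above.
std::minstd_rand   lcg; // linear congruential: tiny state, fast
std::ranlux24_base lfg; // subtract-with-carry (lagged-Fibonacci style): fetches and adds only
std::mt19937       mt;  // Mersenne Twister: ~2.5 KB of state, period 2^19937 - 1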



If none of the supplied generators are good enough for your specific use, the C++ standard library also provides an interface for either a hardware generator or your own custom engine. None of the generators are intended to be used standalone: their intended use is through a distribution object that provides a random sequence with a particular probability distribution function.



Another advantage of <random> over rand() is that rand() uses global state, is not reentrant or thread-safe, and allows only a single instance per process. If you need fine-grained control or predictability (i.e., the ability to reproduce a bug given the RNG seed state), then rand() is useless. The <random> generators are locally instanced and have serializable (and restorable) state, as the sketch below illustrates.
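
A small sketch of that last point: engine state can be streamed out and restored, so a sequence can be checkpointed and replayed exactly.

#include <cassert>
#include <random>
#include <sstream>

// <random> engines support stream insertion/extraction of their full state,
// so an RNG can be checkpointed and a buggy run replayed exactly.
int main() {
    std::mt19937 eng(42);
    eng.discard(1000);            // advance to some point of interest

    std::stringstream snapshot;
    snapshot << eng;              // serialize the engine state

    auto expected = eng();        // next draw after the snapshot

    std::mt19937 replay;
    snapshot >> replay;           // restore the saved state
    assert(replay() == expected); // the replayed draw matches exactly
    (void)expected;
}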







    4 Answers
    4






    active

    oldest

    votes








    4 Answers
    4






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes








    up vote
    89
    down vote



    accepted










    Pretty much any implementation of "old" rand() use an LCG; while they are generally not the best generators around, usually you are not going to see them fail on such a basic test - mean and standard deviation is generally got right even by the worst PRNGs.



    Common failings of "bad" - but common enough - rand() implementations are:



    • low randomness of low-order bits;

    • short period;

    • low RAND_MAX;

    • some correlation between successive extractions (in general, LCGs produce numbers that are on a limited number of hyperplanes, although this can be somehow mitigated).

    Still, none of these are specific to the API of rand(). A particular implementation could place a xorshift-family generator behind srand/rand and, algoritmically speaking, obtain a state of the art PRNG with no changes of interface, so no test like the one you did would show any weakness in the output.



    Edit: @R. correctly notes that the rand/srand interface is limited by the fact that srand takes an unsigned int, so any generator an implementation may put behind them is intrinsically limited to UINT_MAX possible starting seeds (and thus generated sequences). This is true indeed, although the API could be trivially extended to make srand take an unsigned long long, or adding a separate srand(unsigned char *, size_t) overload.




    Indeed, the actual problem with rand() is not much of implementation in principle but:



    • backwards compatibility; many current implementations use suboptimal generators, typically with badly choosen parameters; a notorious example is Visual C++, which sports a RAND_MAX of just 32767. However, this cannot be changed easily, as it would break compatibility with the past - people using srand with a fixed seed for reproducible simulations wouldn't be too happy (indeed, IIRC the aforementioned implementation goes back to Microsoft C early versions - or even Lattice C - from the mid-eighties);


    • simplistic interface; rand() provides a single generator with global state for the whole program. While this is perfectly fine (and actually quite handy) for many simple use cases, it poses problems:



      • with multithreaded code: to fix it you either need a global mutex - which would slow down everything for no reason and kill any chance of repeatability, as the sequence of calls becomes random itself -, or thread local state; this last one has been adopted by several implementations (notably Visual C++);

      • if you want a "private", reproducible sequence into a specific module of your program that doesn't impact the global state.


    Finally, the rand state of affairs:



    • doesn't specify an actual implementation (the C standard provides just a sample implementation), so any program that is intended to produce reproducible output (or expect a PRNG of some known quality) across different compilers must roll its own generator;

    • doesn't provide any cross-platform method to obtain a decent seed (time(NULL) is not, as it isn't granular enough, and often - think embedded devices with no RTC - not even random enough).

    Hence the new <random> header, which tries to fix this mess providing algorithms that are:



    • fully specified (so you can have cross-compiler reproducible output and guaranteed characteristics - say, range of the generator);

    • generally of state-of-the-art quality (from when the library was designed; see below);

    • encapsulated in classes (so no global state is forced upon you, which avoids completely threading and nonlocality problems);

    ... and a default random_device as well to seed them.



    Now, if you ask me I would have liked also a simple API built on top of this for the "easy", "guess a number" cases (similar to how Python does provide the "complicated" API, but also the trivial random.randint & co. using a global, pre-seeded PRNG for us uncomplicated people who'd like not to drown in random devices/engines/adapters/whatever every time we want to extract a number for the bingo cards), but it's true that you can easily build it by yourself over the current facilities (while building the "full" API over a simplistic one wouldn't be possible).




    Finally, to get back to your performance comparison: as other have specified, you are comparing a fast LCG with a slower (but generally considered better quality) Mersenne Twister; if you are ok with the quality of an LCG, you can use std::minstd_rand instead of std::mt19937.



    Indeed, after tweaking your function to use std::minstd_rand and avoid useless static variables for initialization



    int getRandNum_New() 
    static std::minstd_rand engstd::random_device();
    static std::uniform_int_distribution<int> dist0, 5;
    return dist(eng);



    I get 9 ms (old) vs 21 ms (new); finally, if I get rid of dist (which, compared to the classic modulo operator, handles the distribution skew for output range not multiple of the input range) and get back to what you are doing in getRandNum_Old()



    int getRandNum_New() 
    static std::minstd_rand engstd::random_device();
    return eng() % 6;



    I get it down to 6 ms (so, 30% faster), probably because, unlike the call to rand(), std::minstd_rand is easier to inline.




    Incidentally, I did the same test using a hand-rolled (but pretty much conforming to the standard library interface) XorShift64*, and it's 2.3 times faster than rand() (3.68 ms vs 8.61 ms); given that, unlike the Mersenne Twister and the various provided LCGs, it passes the current randomness test suites with flying colors and it's blazingly fast, it makes you wonder why it isn't included in the standard library yet.






    share|improve this answer


















    • 3




      It's exactly the combination of srand and an unspecified algorithm that gets std::rand in trouble. See also my answer to another question.
      – Peter O.
      yesterday






    • 2




      rand is fundamentally limited at the API level in that the seed (and thus the number of possible sequences that can be produced) is bounded by UINT_MAX+1.
      – R..
      yesterday






    • 2




      just a note: minstd is a bad PRNG, mt19973 is better but not by much: pcg-random.org/… (in that chart, minstd==LCG32/64). it's quite a shame that C++ doesn't provide any high-quality, fast, PRNGs like PCG or xoroshiro128+.
      – user60561
      19 hours ago






    • 2




      @MatteoItalia We're not in disagreement. This was also Bjarne's point. We really want <random> in the standard, but we would also like a "just give me a decent implementation that I can use now" option. For PRNGs as well as other things.
      – ravnsgaard
      14 hours ago






    • 2




      A couple notes: 1. Replacing std::uniform_int_distribution<int> dist0, 5(eng); with eng() % 6 reintroduces the skew factor that the std::rand code suffers from (admittedly minor skew in this case, where the engine has 2**31 - 1 outputs, and you're distributing them to 6 buckets). 2. On your note about "srand takes an unsigned int" which limits the possible outputs, as written, seeding your engine with just std::random_device() has the same problem; you need a seed_seq to properly initialize most PRNGs.
      – ShadowRanger
      9 hours ago














    up vote
    89
    down vote



    accepted










    Pretty much any implementation of "old" rand() use an LCG; while they are generally not the best generators around, usually you are not going to see them fail on such a basic test - mean and standard deviation is generally got right even by the worst PRNGs.



    Common failings of "bad" - but common enough - rand() implementations are:



    • low randomness of low-order bits;

    • short period;

    • low RAND_MAX;

    • some correlation between successive extractions (in general, LCGs produce numbers that are on a limited number of hyperplanes, although this can be somehow mitigated).

    Still, none of these are specific to the API of rand(). A particular implementation could place a xorshift-family generator behind srand/rand and, algoritmically speaking, obtain a state of the art PRNG with no changes of interface, so no test like the one you did would show any weakness in the output.



    Edit: @R. correctly notes that the rand/srand interface is limited by the fact that srand takes an unsigned int, so any generator an implementation may put behind them is intrinsically limited to UINT_MAX possible starting seeds (and thus generated sequences). This is true indeed, although the API could be trivially extended to make srand take an unsigned long long, or adding a separate srand(unsigned char *, size_t) overload.




    Indeed, the actual problem with rand() is not much of implementation in principle but:



    • backwards compatibility; many current implementations use suboptimal generators, typically with badly choosen parameters; a notorious example is Visual C++, which sports a RAND_MAX of just 32767. However, this cannot be changed easily, as it would break compatibility with the past - people using srand with a fixed seed for reproducible simulations wouldn't be too happy (indeed, IIRC the aforementioned implementation goes back to Microsoft C early versions - or even Lattice C - from the mid-eighties);


    • simplistic interface; rand() provides a single generator with global state for the whole program. While this is perfectly fine (and actually quite handy) for many simple use cases, it poses problems:



      • with multithreaded code: to fix it you either need a global mutex - which would slow down everything for no reason and kill any chance of repeatability, as the sequence of calls becomes random itself -, or thread local state; this last one has been adopted by several implementations (notably Visual C++);

      • if you want a "private", reproducible sequence into a specific module of your program that doesn't impact the global state.


    Finally, the rand state of affairs:



    • doesn't specify an actual implementation (the C standard provides just a sample implementation), so any program that is intended to produce reproducible output (or expect a PRNG of some known quality) across different compilers must roll its own generator;

    • doesn't provide any cross-platform method to obtain a decent seed (time(NULL) is not, as it isn't granular enough, and often - think embedded devices with no RTC - not even random enough).


    up vote
    89
    down vote



accepted
Pretty much any implementation of "old" rand() uses an LCG; while LCGs are generally not the best generators around, you are usually not going to see them fail on such a basic test - even the worst PRNGs generally get the mean and standard deviation right.



    Common failings of "bad" - but common enough - rand() implementations are:



    • low randomness of low-order bits;

    • short period;

    • low RAND_MAX;

• some correlation between successive extractions (in general, LCGs produce numbers that lie on a limited number of hyperplanes, although this can be somewhat mitigated).

Still, none of these are specific to the API of rand(). A particular implementation could place a xorshift-family generator behind srand/rand and, algorithmically speaking, obtain a state-of-the-art PRNG with no change of interface, so no test like the one you did would show any weakness in the output.



Edit: @R. correctly notes that the rand/srand interface is limited by the fact that srand takes an unsigned int, so any generator an implementation may put behind them is intrinsically limited to UINT_MAX possible starting seeds (and thus generated sequences). This is indeed true, although the API could be trivially extended to make srand take an unsigned long long, or to add a separate srand(unsigned char *, size_t) overload.
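Just to illustrate the contrast, here is a minimal sketch (mine, not part of the original answer) of how the <random> side sidesteps that limit: std::seed_seq can feed an engine several words of entropy instead of the single unsigned int that srand is stuck with.

#include <random>

// Sketch only: seed a Mersenne Twister from several words of entropy
// via std::seed_seq, rather than the single unsigned int srand() allows.
std::mt19937 make_seeded_engine()
{
    std::random_device rd;
    std::seed_seq seq{rd(), rd(), rd(), rd(), rd(), rd(), rd(), rd()};
    return std::mt19937(seq);
}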




Indeed, the actual problem with rand() is not so much the implementation in principle, but:



• backwards compatibility; many current implementations use suboptimal generators, typically with badly chosen parameters; a notorious example is Visual C++, which sports a RAND_MAX of just 32767. However, this cannot be changed easily, as it would break compatibility with the past - people using srand with a fixed seed for reproducible simulations wouldn't be too happy (indeed, IIRC the aforementioned implementation goes back to early versions of Microsoft C - or even Lattice C - from the mid-eighties);


    • simplistic interface; rand() provides a single generator with global state for the whole program. While this is perfectly fine (and actually quite handy) for many simple use cases, it poses problems:



• with multithreaded code: to fix it you either need a global mutex - which would slow down everything for no reason and kill any chance of repeatability, as the sequence of calls becomes random itself - or thread-local state; the latter has been adopted by several implementations (notably Visual C++);

• if you want a "private", reproducible sequence in a specific module of your program that doesn't impact the global state.


    Finally, the rand state of affairs:



• doesn't specify an actual implementation (the C standard provides just a sample implementation), so any program that is intended to produce reproducible output (or expects a PRNG of some known quality) across different compilers must roll its own generator;

    • doesn't provide any cross-platform method to obtain a decent seed (time(NULL) is not, as it isn't granular enough, and often - think embedded devices with no RTC - not even random enough).

Hence the new <random> header, which tries to fix this mess by providing algorithms that are:



    • fully specified (so you can have cross-compiler reproducible output and guaranteed characteristics - say, range of the generator);

    • generally of state-of-the-art quality (from when the library was designed; see below);

• encapsulated in classes (so no global state is forced upon you, which completely avoids threading and nonlocality problems);

    ... and a default random_device as well to seed them.



Now, if you ask me, I would also have liked a simple API built on top of this for the "easy", "guess a number" cases (similar to how Python provides the "complicated" API, but also the trivial random.randint & co. using a global, pre-seeded PRNG, for us uncomplicated people who'd rather not drown in random devices/engines/adapters/whatever every time we want to extract a number for the bingo cards); but it's true that you can easily build it yourself over the current facilities (while building the "full" API over a simplistic one wouldn't be possible).
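For the record, such a helper is only a few lines over <random>; here is a sketch (the randint name and design are mine, purely illustrative):

#include <random>

// Sketch of a Python-style randint built on <random>: one global,
// lazily seeded engine hidden behind a trivial call. Not thread-safe
// as written; a thread_local engine would fix that.
int randint(int lo, int hi)
{
    static std::mt19937 eng{std::random_device{}()};
    return std::uniform_int_distribution<int>{lo, hi}(eng);
}

// usage: int roll = randint(0, 5);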




Finally, to get back to your performance comparison: as others have noted, you are comparing a fast LCG with a slower (but generally considered better quality) Mersenne Twister; if you are OK with the quality of an LCG, you can use std::minstd_rand instead of std::mt19937.



Indeed, after tweaking your function to use std::minstd_rand and avoiding the useless static variables used for initialization:



int getRandNum_New()
{
    static std::minstd_rand eng{std::random_device{}()};
    static std::uniform_int_distribution<int> dist{0, 5};
    return dist(eng);
}



I get 9 ms (old) vs 21 ms (new); finally, if I get rid of dist (which, compared to the classic modulo operator, handles the distribution skew for output ranges that are not a multiple of the input range) and go back to what you are doing in getRandNum_Old():



int getRandNum_New()
{
    static std::minstd_rand eng{std::random_device{}()};
    return eng() % 6;
}



    I get it down to 6 ms (so, 30% faster), probably because, unlike the call to rand(), std::minstd_rand is easier to inline.




    Incidentally, I did the same test using a hand-rolled (but pretty much conforming to the standard library interface) XorShift64*, and it's 2.3 times faster than rand() (3.68 ms vs 8.61 ms); given that, unlike the Mersenne Twister and the various provided LCGs, it passes the current randomness test suites with flying colors and it's blazingly fast, it makes you wonder why it isn't included in the standard library yet.
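For reference, here is a sketch of such an engine - my reconstruction from Vigna's published xorshift64* parameters, not necessarily the exact code timed above - exposing enough of the standard generator interface to drive the <random> distributions:

#include <cstdint>
#include <limits>

// Sketch of XorShift64*: shift/multiply constants from Vigna's paper.
// Satisfies the UniformRandomBitGenerator requirements, so it can be
// plugged into std::uniform_int_distribution & co.
struct xorshift64s
{
    using result_type = std::uint64_t;

    explicit xorshift64s(std::uint64_t seed = 0x9E3779B97F4A7C15ull)
        : state(seed ? seed : 1) {} // the state must be nonzero

    static constexpr result_type min() { return 0; }
    static constexpr result_type max() { return std::numeric_limits<result_type>::max(); }

    result_type operator()()
    {
        state ^= state >> 12;
        state ^= state << 25;
        state ^= state >> 27;
        return state * 0x2545F4914F6CDD1Dull;
    }

private:
    std::uint64_t state;
};

// usage: xorshift64s eng{std::random_device{}()};
//        int roll = std::uniform_int_distribution<int>{0, 5}(eng);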






answered yesterday (edited 16 hours ago)
– Matteo Italia

    • 3




      It's exactly the combination of srand and an unspecified algorithm that gets std::rand in trouble. See also my answer to another question.
      – Peter O.
      yesterday






    • 2




      rand is fundamentally limited at the API level in that the seed (and thus the number of possible sequences that can be produced) is bounded by UINT_MAX+1.
      – R..
      yesterday






    • 2




just a note: minstd is a bad PRNG, mt19937 is better but not by much: pcg-random.org/… (in that chart, minstd==LCG32/64). it's quite a shame that C++ doesn't provide any high-quality, fast PRNGs like PCG or xoroshiro128+.
      – user60561
      19 hours ago






    • 2




      @MatteoItalia We're not in disagreement. This was also Bjarne's point. We really want <random> in the standard, but we would also like a "just give me a decent implementation that I can use now" option. For PRNGs as well as other things.
      – ravnsgaard
      14 hours ago






    • 2




A couple notes: 1. Replacing std::uniform_int_distribution<int>{0, 5}(eng) with eng() % 6 reintroduces the skew factor that the std::rand code suffers from (admittedly minor skew in this case, where the engine has 2**31 - 1 outputs, and you're distributing them to 6 buckets). 2. On your note about "srand takes an unsigned int" which limits the possible outputs, as written, seeding your engine with just std::random_device() has the same problem; you need a seed_seq to properly initialize most PRNGs.
– ShadowRanger
9 hours ago












    up vote
    5
    down vote













    If you repeat your experiment with a range larger than 5 then you will probably see different results. When your range is significantly smaller than RAND_MAX there isn't an issue for most applications.



For example, if we had a RAND_MAX of 25, then rand() % 5 would produce numbers with the following frequencies:



    0: 6
    1: 5
    2: 5
    3: 5
    4: 5


As RAND_MAX is guaranteed to be at least 32767, and the difference in frequencies between the least likely and the most likely value is only 1, for small ranges the distribution is near enough to uniform for most use cases.
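If you want to check the skew for other combinations, a quick brute-force count over all generator outputs does it; here is a small sketch (mine, using the toy RAND_MAX of 25 from the example above):

#include <cstdio>

// Sketch: count how many values in [0, rand_max] land in each bucket
// of value % range, reproducing the frequency table above.
int main()
{
    const long rand_max = 25; // toy value; real RAND_MAX is >= 32767
    const int range = 5;
    long freq[range] = {};
    for (long v = 0; v <= rand_max; ++v)
        freq[v % range]++;
    for (int i = 0; i < range; ++i)
        std::printf("%d: %ld\n", i, freq[i]);
}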






answered yesterday
– Alan Birtles
















    • 3




      This is explained in STL's second slide
      – Alan Birtles
      yesterday






    • 3




Ok, but... who is STL? And what slides? (serious question)
      – kebs
      yesterday










• @kebs, Stephan Lavavej, see the YouTube reference in the question.
      – Evg
      yesterday



















    up vote
    2
    down vote













    First, surprisingly, the answer changes depending on what you are using the random number for. If it is to drive, say, a random background color changer, using rand() is perfectly fine. If you are using a random number to create a random poker hand or a cryptographically secure key, then it is not fine.



Predictability: the sequence 012345012345012345012345... would provide an even distribution of each number in your sample, but obviously isn't random. For a sequence to be random, the value at position n+1 cannot be easily predicted from the value at position n (or even from the values at n, n-1, n-2, n-3, etc.). Clearly a repeating sequence of the same digits is a degenerate case, but a sequence generated with any linear congruential generator can be subjected to analysis; if you use the default out-of-the-box settings of a common LCG from a common library, a malicious person could "break the sequence" without much effort at all. In the past, several on-line casinos (and some brick-and-mortar ones) were hit for losses by machines using poor random number generators. Even people who should know better have been caught out; TPM chips from several manufacturers have been demonstrated to be easier to break than the bit-length of the keys would otherwise predict, because of poor choices made with key-generation parameters.
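To make the predictability point concrete, here is a toy sketch (the parameters are the classic Numerical Recipes LCG, picked purely for illustration): once an attacker observes one raw output of a full-width LCG, every later output follows mechanically. Real rand() implementations that truncate the state make this harder, but not fundamentally so.

#include <cstdint>
#include <iostream>

int main()
{
    // Classic Numerical Recipes LCG parameters (illustrative choice).
    const std::uint32_t a = 1664525u, c = 1013904223u;

    std::uint32_t state = 12345u;           // the generator's "secret" state
    std::uint32_t observed = a * state + c; // one output leaks to the attacker...

    std::uint32_t predicted = observed;     // ...who simply replays the recurrence
    for (int i = 0; i < 3; ++i)
    {
        predicted = a * predicted + c;
        std::cout << "next output will be: " << predicted << '\n';
    }
}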



Distribution: As alluded to in the video, taking a modulo of 100 (or any value that does not divide evenly into the length of the sequence) will guarantee that some outcomes become at least slightly more likely than others. With 32768 possible starting values (0 through 32767) taken modulo 100, the numbers 0 through 67 appear 328 times while 68 through 99 appear only 327 times - roughly 0.3% more often; a factor that may provide an attacker an advantage.






answered yesterday (edited yesterday)
– JackLThornton






















    • "Predictability: the sequence 012345012345012345012345... would pass your test for "randomness", in that there would be an even distribution of each number in your sample" actually, not really; what he is measuring is the stddev of the stddevs between runs, i.e. essentially how the various runs histogram are spread out. With a 012345012345012345... generator it would always be zero.
      – Matteo Italia
      yesterday










    • Good point; I read through OP's code a bit too quickly, I'm afraid. Edited my answer to reflect.
      – JackLThornton
      yesterday










• Hehe, I know, because I thought to make that test as well, and I noticed I obtained different results 😄
      – Matteo Italia
      yesterday
























    up vote
    0
    down vote













    The correct answer is: it depends on what you mean by "better."



    The "new" <random> engines were introduced to C++ over 13 years ago, so they're not really new. The C library rand() was introduced decades ago and has been very useful in that time for any number of things.



The C++ standard library provides three classes of random number generator engines: the linear congruential (of which rand() is an example), the lagged Fibonacci, and the Mersenne Twister. Each class has tradeoffs, and each is "best" in certain ways. For example, the LCGs have very small state and, if the right parameters are chosen, are fairly fast on modern desktop processors. The LFGs have larger state and use only memory fetches and addition operations, so they are very fast on embedded systems and microcontrollers that lack specialized math hardware. The Mersenne Twister has huge state and is slow, but can have a very large non-repeating sequence with excellent spectral characteristics.
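For illustration, here is one engine from each family (a sketch; std::ranlux24_base is the standard subtract-with-carry engine underlying the lagged-Fibonacci-style ranlux generators):

#include <random>

std::minstd_rand   lcg; // linear congruential: tiny state, decent speed
std::ranlux24_base lfg; // subtract-with-carry (lagged Fibonacci family)
std::mt19937       mt;  // Mersenne Twister: huge state, very long period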



    If none of the supplied generators are good enough for your specific use, the C++ standard library also provides an interface for either a hardware generator or your own custom engine. None of the generators are intended to be used standalone: their intended use is through a distribution object that provides a random sequence with a particular probability distribution function.



Another advantage of <random> over rand() is that rand() uses global state, is not reentrant or thread-safe, and allows only a single instance per process. If you need fine-grained control or predictability (i.e. being able to reproduce a bug given the RNG seed state), then rand() is useless. The <random> generators are locally instanced and have serializable (and restorable) state.
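As a sketch of that last point, an engine's state can be round-tripped through a stream and the sequence reproduced exactly:

#include <cassert>
#include <random>
#include <sstream>

int main()
{
    std::mt19937 eng{42};

    std::stringstream snapshot;
    snapshot << eng;          // serialize the full engine state

    auto a = eng();           // advance the original engine

    std::mt19937 restored;
    snapshot >> restored;     // rebuild an identical engine from the snapshot
    assert(restored() == a);  // it produces the same next value
}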






answered 6 hours ago (edited 4 hours ago by cHao)
– Stephen M. Webb