Does a “Normal Distribution” need to have mean=median=mode?




















I've been in a debate with my graduate-level statistics professor about "normal distributions". I contend that to truly get a normal distribution one must have mean=median=mode, all the data must be contained under the bell curve, and perfectly symmetrical around the mean. Therefore, technically, there are virtually NO normal distributions in real studies, and we should call them something else, perhaps "near-normal".



She says I'm too picky and that if the skewness and kurtosis are both less than 1.0 in absolute value, the data are normally distributed, and she took off points on an exam. The dataset is the total number of falls per year in a random sample of 52 nursing homes drawn from a larger population. Any insight?



QUESTION: 3. Compute measures of skewness and kurtosis for this data. Include a histogram with a normal curve. Discuss your findings. Is the data normally distributed?



Statistics: Number of falls

N (valid)                52
N (missing)              0
Mean                     11.23
Median                   11.50
Mode                     4 (a)
Skewness                 .114
Std. Error of Skewness   .330
Kurtosis                 -.961
Std. Error of Kurtosis   .650

a. Multiple modes exist. The smallest value is shown.



MY ANSWER:
The data is platykurtic and has only slight positive skewing, and it is NOT a normal distribution because the mean and median and mode are not equal and the data is not evenly distributed around the mean. In reality virtually no data is ever a perfect normal distribution, although we can discuss “approximately normal distributions” such as height, weight, temperature, or length of adult ring finger in large population groups.
PROFESSOR'S ANSWER:
You are correct that there is no perfectly normal distribution. But, we are not looking for perfection. We need to look at data in addition to the histogram and the measures of central tendency. What do the skewness and kurtosis statistics tell you about the distribution? Because they are both between the critical values of -1 and +1, this data is considered to be normally distributed.

























  • 2




    I think you are being a little pedantic, it makes no sense to talk about median or mode of a continuous distribution.
    – user2974951
    5 hours ago






  • 2




    I would like to know your professor's exact wording. In principle a normal distribution has mean, median and mode identical (but so do many other distributions) and has skewness 0 and (so-called excess) kurtosis 0 (and so do some other distributions). At best a distribution with (e.g.) slight skewness or kurtosis is approximately normal. Note that almost all real data are at best approximations to named distributions in the theoretical menagerie.
    – Nick Cox
    4 hours ago






  • 7




    I don't agree with @user2974951 In company with every good text I know I am perfectly happy thinking that the normal distribution has a median and mode. And that applies widely to continuous distributions, although I don't doubt that pathological counter-examples can be identified.
    – Nick Cox
    4 hours ago






  • 1




    I didn't call you or your question pedantic, but the one answer to date is I think broadly on the right lines. The main issue here seems to be different questions, what is a normal distribution in principle and what is approximately normal in practice. (FWIW, I don't readily believe the data -- unless the people who fell over many, many times are now dead or in hospital. For other reasons, I know more than I want to about nursing homes. It is also crucial whether you are talking about falls per individual or some average rate of falls for a nursing home, e.g., falls per week per 100 patients).
    – Nick Cox
    4 hours ago







  • 1




    Thanks for the specific detail, which shows merit on both sides, but I am not grading either of you. However, I dissent strongly from the term critical values as used by your Professor, as the limits $\pm 1$ for skewness and kurtosis have no standing whatsoever beyond being rules of thumb that might be used. Depending on what you are doing with the data, a skewness $< 1$ might go along with wanting to transform the data and a skewness of $> 1$ might go along with leaving the data as they are, and similarly for kurtosis.
    – Nick Cox
    3 hours ago
























mathematical-statistics descriptive-statistics






edited 3 hours ago






























asked 5 hours ago









Possum-Pie



















5 Answers

















up vote
3
down vote













The teacher is clearly out of his/her element, and probably should not be teaching statistics. It seems worse to me to teach something wrong than to not teach it at all.



These issues could all be cleared up easily if the distinction between "data" and "process that produced the data" were made more clearly. Data target the process that produced the data. The normal distribution is a model for this process.



It makes no sense to talk about whether the data are normally distributed. For one reason, the data are always discrete. For another reason, the normal distribution describes an infinity of potentially observable quantities, not a finite set of specific observed quantities.



Further, the answer to the question "is the process that produced the data a normally distributed process" is also always "no," regardless of the data. Two simple reasons: (i) any measurements we take are necessarily discrete, being rounded off to some level. (ii) perfect symmetry, like a perfect circle, does not exist in observable nature. There are always imperfections.



At best, the answer to the question "what do these data tell you about normality of the data-generating process" could be given as follows: "these data are consistent with what we would expect to see, had the data truly come from a normally distributed process." That answer correctly does not conclude that the distribution is normal.
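One way to make "consistent with what we would expect to see" concrete is a small Monte Carlo check. The sketch below is only an illustration under stated assumptions: it draws many samples of size 52 from a standard normal distribution and counts how often the sample excess kurtosis is at least as low as the -0.961 reported in the question. The bias-adjusted estimator used here is assumed to match what the question's software reports, and the function names and the number of replicates are made up for the example.

```python
import random


def excess_kurtosis(xs):
    """Bias-adjusted sample excess kurtosis (often written G2)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    g2 = m4 / m2 ** 2 - 3.0  # plain moment estimator
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6.0)


def share_at_least_as_flat(observed, n, reps=2000, seed=0):
    """Fraction of simulated normal samples whose kurtosis is <= observed."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if excess_kurtosis(sample) <= observed:
            hits += 1
    return hits / reps


# Values taken from the question: kurtosis -0.961 with n = 52.
share = share_at_least_as_flat(observed=-0.961, n=52)
print(f"share of normal samples with kurtosis <= -0.961: {share:.3f}")
```

A share that is not tiny would mean the observed kurtosis is unremarkable for a normal process at this sample size, which still would not show that the process is normal.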



These issues are very easily understood by using simulation. Just simulate data from a normal distribution and compare those to the existing data. If the data are counts (0,1,2,3,...), then obviously the normal model is wrong because it does not produce numbers like 0,1,2,3,...; instead, it produces numbers with decimals that go on forever (or at least as far as the computer will allow.) Such simulation should be the first thing you do when learning about the normality question. Then you can more correctly interpret the graphs and summary statistics.
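The discreteness point can be seen in a few lines. This is a minimal sketch, assuming a normal model with the mean reported in the question (11.23) and a standard deviation of 6, which is a made-up value because the question does not report one.

```python
import random

rng = random.Random(1)

mean_falls = 11.23   # sample mean from the question
assumed_sd = 6.0     # hypothetical: the question does not report the SD

# Draw 52 values, one per nursing home, from the assumed normal model.
simulated = [rng.gauss(mean_falls, assumed_sd) for _ in range(52)]

# Real fall counts are whole numbers and cannot be negative.
n_whole = sum(1 for x in simulated if x == int(x))
n_negative = sum(1 for x in simulated if x < 0)

print("first five draws:", [round(x, 3) for x in simulated[:5]])
print("whole-number draws:", n_whole)
print("negative draws:", n_negative)
```

Setting the simulated values beside the observed counts makes it plain that the normal model can only ever be an approximation for data like these.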


























  • I didn't downgrade your answer, but consider that you're judging a grad professor from the words of a student. How likely is it that a student is right and a teacher is wrong? Isn't it more likely that the student is misrepresenting his professor and the context of the conversation?
    – Aksakal
    55 mins ago










  • Based on my experience, and on the student's words, I'd say it is more likely that the teacher is wrong. There are teachers with little formal training that teach courses, even grad courses, in universities everywhere. If the accrediting agencies only knew the truth!
    – Peter Westfall
    20 mins ago

















up vote
1
down vote













You're missing the point and probably are also being "difficult," which is not appreciated in the industry. She's showing you a toy example, to train you in assessing the normality of a data set. Looking at distribution moments is one way to check normality, e.g. the Jarque-Bera test is based on such an assessment.



Yes, the normal distribution is perfectly symmetrical. However, if you draw a sample from a true normal distribution, that sample will most likely not be perfectly symmetrical. This is the point you're completely missing. You can test this very easily yourself. Just generate a sample from a Gaussian distribution and check its moments. They'll never be perfectly "normal," despite the true distribution being such.



Here's a silly Python example. I'm generating 100 samples of 100 random numbers, then obtaining their means and medians. I print the first sample to show that the mean and median are different, then show the histogram of the difference between the means and medians. You can see that it's rather narrow, but the difference is basically never zero. Note that the numbers are coming from a normal distribution.



code:



import numpy as np
import matplotlib.pyplot as plt

np.random.seed(1)
# 100 samples of 100 draws each; every column is one sample
s = np.random.normal(0, 1, (100,100))
print('sample 0 mean:',np.mean(s[:,0]),'median:',np.median(s[:,0]))

# histogram of (mean - median) across the 100 samples
plt.hist(np.mean(s,0)-np.median(s,0))
plt.show()
print('avg mean-median:',np.mean(np.mean(s,0)-np.median(s,0)))


outputs:
[figure: histogram of the differences between the sample means and medians]
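The moment-based check mentioned at the start of this answer can also be sketched directly. The following is a minimal illustration of the Jarque-Bera statistic written with the standard library only, not the code of any particular package. It uses the plain moment estimators of skewness and excess kurtosis, and the p-value relies on the asymptotic chi-square approximation with 2 degrees of freedom, which can be rough for small samples. The function name is made up for the example.

```python
import math
import random


def jarque_bera(xs):
    """Return (JB statistic, asymptotic p-value) for a list of numbers."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0
    jb = n / 6.0 * (skew ** 2 + excess_kurt ** 2 / 4.0)
    # Upper tail of a chi-square with 2 degrees of freedom is exp(-x / 2).
    p_value = math.exp(-jb / 2.0)
    return jb, p_value


rng = random.Random(1)
sample = [rng.gauss(0.0, 1.0) for _ in range(100)]
jb, p = jarque_bera(sample)
print(f"JB = {jb:.3f}, approximate p-value = {p:.3f}")
```

Even for data that really come from a normal distribution, the statistic is almost never exactly zero, which is the same point the histogram makes for the mean and the median.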
























  • 1




    (+1) Exactly the point. Confusing the random variable and a sample of realisations from that random variable.
    – Xi'an
    41 mins ago

















up vote
0
down vote













I think you and your professor are talking in different contexts. Equality of mean, median, and mode is a characteristic of the theoretical distribution, and it is not the only characteristic. You cannot say that if this property holds for a distribution then the distribution is normal. The t-distribution is also symmetric, but it is not normal. So you are talking about theoretical properties that always hold for the normal distribution.



Your professor is talking about the distribution of sample data. He is right: you will never get data in real life where mean = median = mode. This is simply due to sampling error. Similarly, it is very unlikely that you will get a skewness coefficient of zero and zero excess kurtosis for sample data. Your professor is just giving you a simple rule to get an idea about the distribution from the sample statistics, a rule that is not true in general (without further information).
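As a rough illustration of how sample statistics and sampling error fit together, the sketch below recomputes the standard errors shown in the question's output from n = 52 alone, using the usual large-sample formulas for the bias-adjusted skewness and kurtosis estimators, and then divides each statistic by its standard error. Comparing those ratios with about ±2 is one common rule of thumb; it is not something the professor is reported to have used, and it is no more authoritative than the ±1 rule.

```python
import math

# Values copied from the output table in the question.
n = 52
skewness = 0.114
kurtosis = -0.961

# Standard errors that depend only on the sample size.
se_skew = math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
se_kurt = 2.0 * se_skew * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))

# Ratio of each statistic to its standard error.
z_skew = skewness / se_skew
z_kurt = kurtosis / se_kurt

print(f"SE skewness = {se_skew:.3f}, SE kurtosis = {se_kurt:.3f}")
print(f"skewness / SE = {z_skew:.2f}, kurtosis / SE = {z_kurt:.2f}")
```

The computed standard errors agree with the .330 and .650 in the question, and both ratios are smaller than 2 in absolute value, so on this rough check neither statistic is clearly distinguishable from zero at this sample size.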


























  • Professor is said to be female.
    – Nick Cox
    4 hours ago










  • Why you don't get mean=median=mode is mostly because many distributions really are skewed! (Strictly, mean=median=mode is possible with skewed distributions too, despite what many textbooks say.)
    – Nick Cox
    4 hours ago










  • I disagree that lack of equality of mean/median/mode = sampling error. Suppose you randomly sampled 52 nursing homes for fall rates. Homes 27, 34, and 52 are chronically short-staffed and always have an above-average number of falls. Those homes push the mean towards the tail, and that isn't due to sampling error.
    – Possum-Pie
    4 hours ago











  • @Possum Pie What the data are is secondary here but you're giving different signals in different places. Here you are talking about several nursing homes -- but in your question you state "in a nursing home". Being unclear about even incidental details does not help.
    – Nick Cox
    4 hours ago










  • @Nick Cox Sorry, I clarified it. Number of falls/year in a sample of 52 nursing homes
    – Possum-Pie
    4 hours ago


















up vote
0
down vote













I'm an engineer, so in my world, applied statistics is what I see most and get the most concrete value from. If you are going to work in applied statistics, then you need to be solidly grounded in practice over theory: whether or not it is elegant, the aircraft has to fly and not crash.



The way I approach this question, as many of my technical betters here have also done, is to think about "what does it look like in the real world with the presence of noise".



The second thing that I do is, often, to make a simulation that allows me to get my hands around the question.



Here is a very brief exploration:



# show how the mean and the median differ with respect to sample size

# libraries
library(reshape2)
library(ggplot2)

# sample sizes
ssizes <- 10^(seq(from = 1, to = 3, by = 0.25))
ssizes <- round(ssizes)

# loops per sample
n_loops <- 5000

# pre-declare, prep for loop
my_store <- matrix(0,
                   ncol = 3,
                   nrow = n_loops * length(ssizes))

count <- 1

for (i in seq_along(ssizes)) {

  # how many samples
  n_samp <- ssizes[i]

  for (j in 1:n_loops) {

    # draw samples
    y <- rnorm(n = n_samp, mean = 0, sd = 1)

    # store sample size, median, and mean
    my_store[count, 1] <- n_samp
    my_store[count, 2] <- median(y)
    my_store[count, 3] <- mean(y)

    # update
    count <- count + 1
  }
}

# make data into ggplot friendly form
df <- data.frame(my_store)
names(df) <- c("n_samp", "median", "mean")

df <- melt(df, id.vars = 1, measure.vars = c("median", "mean"))

# make ggplot
ggplot(df, aes(x = as.factor(n_samp),
               y = value,
               fill = variable)) + geom_boxplot() +
  labs(title = "Contrast Median and Mean estimate variation vs. Sample Size",
       x = "Number of Samples",
       y = "Estimated value")


It gives this as the output:
[Figure: box plots of the median and mean estimates for each sample size]



Note: be careful about the x-axis, because the sample sizes are log-spaced, not uniformly spaced.



In theory the mean and median are exactly the same; the code draws from a normal distribution, which guarantees it. The empirical realization, however, is very sensitive to sample size, and without truly infinite samples the estimates can never match the theory perfectly.



You can think about whether the uncertainty in the median envelops the estimated mean, or vice versa. If the best estimate of the mean is within the 95% CI of the estimate for the median, then the data can't tell the difference: they are consistent with the two being the same, as in theory. If you get more data, then see what it says.
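The comparison described in the last paragraph can be sketched in a few lines. This is only an illustration, written in Python rather than R. It assumes a percentile bootstrap for the median's 95% interval (the answer does not say how the CI should be computed) and uses a made-up sample of 52 standard normal draws; `bootstrap_median_ci` is a hypothetical helper name.

```python
import random
import statistics


def bootstrap_median_ci(data, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the median of `data`."""
    rng = random.Random(seed)
    n = len(data)
    # Resample with replacement and record the median of each resample.
    medians = sorted(
        statistics.median(rng.choices(data, k=n)) for _ in range(n_boot)
    )
    lo = medians[int(round((alpha / 2) * n_boot))]
    hi = medians[int(round((1 - alpha / 2) * n_boot)) - 1]
    return lo, hi


# Hypothetical data: 52 draws from a standard normal (same n as in the question).
rng = random.Random(1)
sample = [rng.gauss(0.0, 1.0) for _ in range(52)]

lo, hi = bootstrap_median_ci(sample)
mean = statistics.mean(sample)
print(f"mean = {mean:.3f}, median 95% CI = ({lo:.3f}, {hi:.3f})")
print("mean inside median CI:", lo <= mean <= hi)
```

With real data, `sample` would be replaced by the observed values; the seed is fixed only so that the run is repeatable.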


























  • Interesting graph. I would have thought the Mean would have been generally larger than the median at first considering the mean chases outliers...in other words the red bars would be mean and green be medians. What am I missing?
    – Possum-Pie
    2 hours ago































In medical statistics, we only ever comment on the shape and appearance of distributions. The fact that no discrete finite sample can ever be normal is irrelevant and pedantic. I would mark you wrong for that.



If a distribution looks "mostly" normal, we are comfortable with calling it normal. When I describe distributions for a non-statistical audience, I am very comfortable with calling something approximately normal even when I know the normal distribution is not the underlying probability model. I get the sense I would side with your teacher here, but we have no histogram or dataset to verify.



As a tip, I would go through the following inspections very closely:



  • Who are the outliers, how many are there, and what are their values?

  • Are the data bimodal?

  • Do the data seem to take a skewed shape so that some transformation (like a log) would better quantify the "distance" between observations?

  • Is there apparent truncation or heaping so that assays or labs are failing to reliably detect a certain range of values?








































    The teacher is clearly out of his/her element, and probably should not be teaching statistics. It seems worse to me to teach something wrong than to not teach it at all.



    These issues could all be cleared up easily if the distinction between "data" and "process that produced the data" were made more clearly. Data target the process that produced the data. The normal distribution is a model for this process.



    It makes no sense to talk about whether the data are normally distributed. For one reason, the data are always discrete. For another reason, the normal distribution describes an infinity of potentially observable quantities, not a finite set of specific observed quantities.



    Further, the answer to the question "is the process that produced the data a normally distributed process" is also always "no," regardless of the data. Two simple reasons: (i) any measurements we take are necessarily discrete, being rounded off to some level. (ii) perfect symmetry, like a perfect circle, does not exist in observable nature. There are always imperfections.



    At best, the answer to the question "what do these data tell you about normality of the data-generating process" could be given as follows: "these data are consistent with what we would expect to see, had the data truly come from a normally distributed process." That answer correctly does not conclude that the distribution is normal.



    These issues are very easily understood by using simulation. Just simulate data from a normal distribution and compare those to the existing data. If the data are counts (0,1,2,3,...), then obviously the normal model is wrong because it does not produce numbers like 0,1,2,3,...; instead, it produces numbers with decimals that go on forever (or at least as far as the computer will allow). Such simulation should be the first thing you do when learning about the normality question. Then you can more correctly interpret the graphs and summary statistics.
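A minimal sketch of the kind of simulation suggested here (not the answerer's own code) might look as follows. It draws many samples of size 52 from a normal distribution and checks how often their kurtosis is as low as the -0.961 reported in the question. Assumptions: the reported kurtosis is the bias-adjusted sample excess kurtosis, a standard normal is enough because kurtosis does not depend on location or scale, and the function names are hypothetical.

```python
import random


def sample_excess_kurtosis(x):
    """Bias-adjusted sample excess kurtosis (assumed to match the question's output)."""
    n = len(x)
    m = sum(x) / n
    m2 = sum((v - m) ** 2 for v in x) / n
    m4 = sum((v - m) ** 4 for v in x) / n
    g2 = m4 / m2 ** 2 - 3.0  # moment-based excess kurtosis
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6.0)


def simulate_kurtosis(n=52, n_sims=5000, seed=0):
    """Excess kurtosis of `n_sims` simulated normal samples of size `n`."""
    rng = random.Random(seed)
    return [
        sample_excess_kurtosis([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(n_sims)
    ]


observed = -0.961  # kurtosis reported in the question
sims = simulate_kurtosis()
frac_below = sum(k <= observed for k in sims) / len(sims)
print(f"share of normal samples with kurtosis <= {observed}: {frac_below:.3f}")
```

The printed share shows how unusual the observed value would be if the process really were normal. Note that the simulated values themselves are non-integers, unlike counts of falls.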
















    answered 2 hours ago









    Peter Westfall












    • I didn't downvote your answer, but consider that you're judging a grad professor from the words of a student. How likely is it that a student is right and a teacher is wrong? Isn't it more likely that the student is misrepresenting his professor and the context of the conversation?
      – Aksakal
      55 mins ago










    • Based on my experience, and on the student's words, I'd say it is more likely that the teacher is wrong. There are teachers with little formal training that teach courses, even grad courses, in universities everywhere. If the accrediting agencies only knew the truth!
      – Peter Westfall
      20 mins ago





























    You're missing the point and probably are also being "difficult," which is not appreciated in the industry. She's showing you a toy example, to train you in assessing the normality of a data set. Looking at distribution moments is one way to check normality; e.g., the Jarque-Bera test is based on such an assessment.



    Yes, the normal distribution is perfectly symmetrical. However, if you draw a sample from a true normal distribution, that sample will most likely not be perfectly symmetrical. This is the point you're completely missing. You can test this very easily yourself. Just generate a sample from a Gaussian distribution and check its moments. They'll never be perfectly "normal," despite the true distribution being such.



    Here's a silly Python example. I'm generating 100 samples of 100 random numbers, then obtaining their means and medians. I print the first sample to show that the mean and median are different, then show the histogram of the difference between the means and medians. You can see that it's rather narrow, but the difference is basically never zero. Note that the numbers are coming from a normal distribution.



    code:



    import numpy as np
    import matplotlib.pyplot as plt

    np.random.seed(1)
    s = np.random.normal(0, 1, (100,100))
    print('sample 0 mean:',np.mean(s[:,0]),'median:',np.median(s[:,0]))

    plt.hist(np.mean(s,0)-np.median(s,0))
    plt.show()
    print('avg mean-median:',np.mean(np.mean(s,0)-np.median(s,0)))


    outputs:
    [Figure: histogram of the differences between the sample means and medians]














    edited 38 mins ago

























    answered 58 mins ago









    Aksakal








    • (+1) Exactly the point. Confusing the random variable and a sample of realisations from that random variable.
      – Xi'an
      41 mins ago

























    I think you and your professor are talking in different contexts. Equality of mean = median = mode is a characteristic of the theoretical normal distribution, and it is not the only one. You cannot say that any distribution for which this property holds is normal: the t-distribution is also symmetric, but it is not normal. So you are talking about theoretical properties, which always hold for the normal distribution.



    Your professor is talking about the distribution of sample data. She is right: you will never get real-life data in which mean = median = mode exactly. This is simply due to sampling error. Similarly, it is very unlikely that sample data will give you a skewness coefficient of exactly zero and zero excess kurtosis. Your professor is just giving you a simple rule for getting an idea of the distribution from the sample statistics, a rule that is not true in general (without further information).
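One way to put the sample skewness and kurtosis in context, instead of a fixed cutoff of 1, is to compare each with its standard error. The sketch below uses the usual standard-error formulas for the bias-adjusted statistics, on the assumption that the software behind the question's output uses them; for n = 52 they reproduce the reported .330 and .650. The ratios are only a rough guide, and the common ±1.96 yardstick is itself a convention, not something the answers above prescribe.

```python
import math


def se_skewness(n):
    """Standard error of the bias-adjusted sample skewness under normality."""
    return math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def se_kurtosis(n):
    """Standard error of the bias-adjusted sample excess kurtosis under normality."""
    return 2.0 * se_skewness(n) * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))


n = 52
skew, kurt = 0.114, -0.961  # values reported in the question

ses = se_skewness(n)
sek = se_kurtosis(n)
z_skew = skew / ses
z_kurt = kurt / sek

print(f"SE skewness = {ses:.3f}, SE kurtosis = {sek:.3f}")
print(f"skewness / SE = {z_skew:.2f}, kurtosis / SE = {z_kurt:.2f}")
```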
















    answered 4 hours ago









    Neeraj












    • Professor is said to be female.
      – Nick Cox
      4 hours ago










    • Why you don't get mean=median=mode is mostly because many distributions really are skewed! (Strictly, mean=median=mode is possible with skewed distributions too, despite what many textbooks say.)
      – Nick Cox
      4 hours ago










    • I disagree that lack of equality of mean/median/mode = sampling error. Suppose you randomly sampled 52 nursing homes for fall rates. Homes 27, 34, and 52 are chronically short-staffed and always have an above-average number of falls. Those homes push the mean towards the tail, and that isn't due to sampling error.
      – Possum-Pie
      4 hours ago











    • @Possum Pie What the data are is secondary here but you're giving different signals in different places. Here you are talking about several nursing homes -- but in your question you state "in a nursing home". Being unclear about even incidental details does not help.
      – Nick Cox
      4 hours ago










    • @Nick Cox Sorry, I clarified it. Number of falls/year in a sample of 52 nursing homes
      – Possum-Pie
      4 hours ago






























    I'm an engineer, so in my world applied statistics is what I see most and get the most concrete value from. If you are going to work in applied statistics, then you need to be solidly grounded in practice over theory: whether or not it is elegant, the aircraft has to fly and not crash.



    The way I approach this question, as many of my technical betters here have also done, is to ask "what does it look like in the real world with the presence of noise?"



    The second thing I often do is build a simulation that lets me get my hands around the question.



    Here is a very brief exploration:



    # Show how the mean and the median differ with respect to sample size

    # libraries
    library(reshape2)
    library(ggplot2)

    # sample sizes: log-spaced from 10 to 1000
    ssizes <- 10^(seq(from = 1, to = 3, by = 0.25))
    ssizes <- round(ssizes)

    # loops per sample size
    n_loops <- 5000

    # pre-allocate storage: one row per (sample size, loop) pair
    my_store <- matrix(0,
                       ncol = 3,
                       nrow = n_loops * length(ssizes))

    count <- 1

    for (i in seq_along(ssizes)) {

      # how many samples to draw
      n_samp <- ssizes[i]

      for (j in 1:n_loops) {

        # draw samples from a standard normal
        y <- rnorm(n = n_samp, mean = 0, sd = 1)

        # store the sample size, median, and mean
        my_store[count, 1] <- n_samp
        my_store[count, 2] <- median(y)
        my_store[count, 3] <- mean(y)

        # update the row counter
        count <- count + 1
      }
    }

    # make data into ggplot-friendly form
    df <- data.frame(my_store)
    names(df) <- c("n_samp", "median", "mean")

    df <- melt(df, id.vars = 1, measure.vars = c("median", "mean"))

    # make ggplot
    ggplot(df, aes(x = as.factor(n_samp),
                   y = value,
                   fill = variable)) + geom_boxplot() +
      labs(title = "Contrast Median and Mean estimate variation vs. Sample Size",
           x = "Number of Samples",
           y = "Estimated value")


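    It gives this as the output:
    [Figure: box plots titled "Contrast Median and Mean estimate variation vs. Sample Size"; x-axis "Number of Samples", y-axis "Estimated value"]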



    Note: read the x-axis with care, because the sample sizes are logarithmically spaced (from 10 to 1000), not uniformly spaced.



    I know that the true mean and median are exactly the same here: the code draws from a normal distribution with mean 0, whose median is also 0. The empirical estimates are highly sensitive to sample size, and without a truly infinite sample they will never perfectly match the theory.



    You can ask whether the uncertainty in the median envelops the estimated mean, or vice versa. If the best estimate of the mean is within the 95% CI of the estimate for the median, then the data can't tell the difference, and they are consistent with the two being the same in theory. If you get more data, see what they say then.
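A minimal sketch of that check, under assumptions of my own: the data are simulated stand-ins (52 draws, to match the sample size in the question), the interval is a percentile bootstrap, and the seed and number of resamples are arbitrary. It is one of several reasonable ways to get a CI for a median, not the only one.

```r
# Sketch: does a bootstrap 95% CI for the median contain the sample mean?
set.seed(1)                              # arbitrary seed, for reproducibility only
y <- rnorm(n = 52, mean = 0, sd = 1)     # stand-in data, not the real falls data

n_boot <- 2000                           # assumed number of bootstrap resamples

# resample y with replacement and record the median of each resample
boot_medians <- replicate(n_boot, median(sample(y, replace = TRUE)))

# percentile bootstrap interval for the median
ci_median <- unname(quantile(boot_medians, probs = c(0.025, 0.975)))

# TRUE if the sample mean lies inside the interval for the median
mean_in_ci <- mean(y) >= ci_median[1] && mean(y) <= ci_median[2]

print(ci_median)
print(mean_in_ci)
```

If `mean_in_ci` is `TRUE`, the sample gives no grounds for saying the mean and median differ. To use this on real data, replace `y` with the observed values.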
















    answered 2 hours ago









    EngrStudent












    • Interesting graph. I would have thought the mean would generally be larger than the median at first, considering the mean chases outliers. In other words, the red bars would be the means and the green the medians. What am I missing?
      – Possum-Pie
      2 hours ago






























    In medical statistics, we only ever comment on the shape and appearance of distributions. The fact that no discrete finite sample can ever be normal is irrelevant and pedantic. I would mark you wrong for that.



    If a distribution looks "mostly" normal, we are comfortable with calling it normal. When I describe distributions for a non-statistical audience, I am very comfortable with calling something approximately normal even when I know the normal distribution is not the underlying probability model. I get the sense I would side with your teacher here, but we have no histogram or dataset to verify.



    As a tip, I would go through the following inspections very closely:



    • Who are the outliers, how many are there, and what are their values?

    • Are the data bimodal?

    • Do the data seem to take a skewed shape so that some transformation (like a log) would better quantify the "distance" between observations?

    • Is there apparent truncation or heaping so that assays or labs are failing to reliably detect a certain range of values?













        edited 1 hour ago

























        answered 1 hour ago









        AdamO




















