Stephen Few seems to be upset. In his yearly review, he touches on what he sees as a year without any progress: Since the advent of the computer (and before that the printing press, and before that writing, and before that language), data has always been BIG, and Data Science has existed at least since the […]

# Statistics

# Battling Bad Science

A few important statistical concepts are mentioned here, including observational versus randomized trials (randomization apparently is mentioned in Daniel 1:12); causality; publication bias; and a “funnel” plot.

# Type I Errors

Significance testing at the 0.05 level means that, when the null hypothesis is actually true, you will get a false positive (a Type I error) about 1 time in 20. xkcd exhibits:
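Alongside the comic, the 1-in-20 figure can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: it draws samples from a standard normal distribution (so the null hypothesis of zero mean is true by construction), runs a two-sided z-test with known standard deviation 1, and counts how often the test rejects anyway. The function name `false_positive_rate` and all parameter values are made up for this example.

```python
import random
from statistics import NormalDist, fmean


def false_positive_rate(trials=10_000, n=30, alpha=0.05, seed=0):
    """Fraction of z-tests that reject a null hypothesis that is actually true."""
    rng = random.Random(seed)
    # Two-sided critical value, roughly 1.96 for alpha = 0.05.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        # Assumption: data come from N(0, 1), so the true mean really is 0.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = fmean(sample) * n ** 0.5  # z statistic with known sigma = 1
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials


rate = false_positive_rate()
print(f"Observed false-positive rate: {rate:.3f}")

# Chance of at least one false positive across 20 independent tests at 0.05.
at_least_one = 1 - (1 - 0.05) ** 20
print(f"P(at least one false positive in 20 tests): {at_least_one:.3f}")
```

The observed rate should land close to 0.05, while the chance of at least one spurious "significant" result across 20 independent tests is much higher, which is why running many tests without correction is risky.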

# Anatomy of a Standard Deviation

While reorganizing some old files I ran across this graphic I put together for a statistics 101 tutoring lab. Click the image above to get the full image [PDF] explaining how to calculate the standard deviation for a sample.
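The graphic itself is not reproduced here, but the usual calculation for a sample standard deviation can be sketched in a few lines. This is a minimal illustration, not a transcription of the PDF: the data values are invented, and the steps follow the standard textbook recipe (mean, deviations, squared deviations, divide by n - 1, square root).

```python
import statistics

# Hypothetical sample, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
mean = sum(data) / n                        # step 1: sample mean
deviations = [x - mean for x in data]       # step 2: distance of each value from the mean
squared = [d ** 2 for d in deviations]      # step 3: square each deviation
variance = sum(squared) / (n - 1)           # step 4: divide by n - 1 (sample, not population)
sd = variance ** 0.5                        # step 5: take the square root

print(f"mean = {mean}, sample variance = {variance:.4f}, sample SD = {sd:.4f}")

# Cross-check against the standard library's sample standard deviation.
print(f"statistics.stdev gives {statistics.stdev(data):.4f}")
```

Dividing by n - 1 rather than n is what distinguishes the sample formula from the population one; `statistics.pstdev` would give the population version.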

# To standardize or not standardize?

Andrew Gelman compares standardized and unstandardized coefficients. Gelman has previously written a paper that addresses most of the shortcomings of unstandardized coefficients (i.e., raw coefficients, the ones you see by default) and covers the issues brought up in the post. Unfortunately, he didn’t address the problems I was most interested in seeing. He writes: […]
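For readers who want the distinction made concrete, here is a rough sketch of a raw versus a standardized slope in a one-predictor regression. It uses conventional z-score standardization, which is an assumption on my part and not necessarily the scaling Gelman recommends; the data and the helper names `slope` and `zscores` are hypothetical.

```python
from statistics import fmean, stdev


def slope(x, y):
    """Least-squares slope of y on x for a single predictor."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def zscores(values):
    """Center at the mean and divide by the sample standard deviation."""
    m, s = fmean(values), stdev(values)
    return [(v - m) / s for v in values]


# Invented data: years of experience and salary in dollars.
years = [1, 2, 3, 5, 8, 10]
salary = [31_000, 35_000, 36_500, 44_000, 52_000, 61_000]

raw = slope(years, salary)                          # dollars per year of experience
standardized = slope(zscores(years), zscores(salary))  # SDs of salary per SD of experience

print(f"raw slope: {raw:.1f} dollars per year")
print(f"standardized slope: {standardized:.3f}")
```

The raw slope is tied to the units of both variables, while the standardized slope is unit-free and, with a single predictor, equals the correlation between the two variables. Equivalently, it is the raw slope multiplied by `stdev(years) / stdev(salary)`.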