Introduction to Statistics and Data Science
2019-09-13
Chapter 1 Preface
Welcome to the exciting world of statistics and data analysis. During this semester we will learn tools for reasoning under uncertainty.
Our brains are not naturally good at reasoning under uncertainty. For example, one of the two sequences of digits shown below is purely random and the other is not. Can you tell which is which?
## [1] 4 0 7 0 8 4 4 0 6 3 9 1 5 5 3 5 9 0 2 0 3 6 8 5 9 2 5 9 8 7 3 2 4 4 0
## [36] 1 2 1 8 2 6 4 4 5 6 7 6 7 3 2
## [1] 5 6 7 1 7 0 7 1 6 0 2 9 0 4 0 3 1 4 2 0 4 0 1 0 2 4 3 1 9 3 1 5 4 1 5
## [36] 2 3 6 7 5 0 5 9 6 8 9 5 7 2 5
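One simple way to probe the question, which may not be the check the author has in mind, is to count how often a digit is immediately repeated. The sketch below is in Python rather than the R that produced the output above. The names `SEQ_A`, `SEQ_B`, and `adjacent_repeats` are illustrative, and the comparison assumes that a purely random sequence draws each digit independently and uniformly from 0 to 9.

```python
import random

# The two sequences, transcribed from the printed output above.
SEQ_A = [4, 0, 7, 0, 8, 4, 4, 0, 6, 3, 9, 1, 5, 5, 3, 5, 9, 0, 2, 0,
         3, 6, 8, 5, 9, 2, 5, 9, 8, 7, 3, 2, 4, 4, 0, 1, 2, 1, 8, 2,
         6, 4, 4, 5, 6, 7, 6, 7, 3, 2]
SEQ_B = [5, 6, 7, 1, 7, 0, 7, 1, 6, 0, 2, 9, 0, 4, 0, 3, 1, 4, 2, 0,
         4, 0, 1, 0, 2, 4, 3, 1, 9, 3, 1, 5, 4, 1, 5, 2, 3, 6, 7, 5,
         0, 5, 9, 6, 8, 9, 5, 7, 2, 5]


def adjacent_repeats(digits):
    """Count positions where a digit equals the one right before it."""
    return sum(1 for prev, cur in zip(digits, digits[1:]) if prev == cur)


# Assumption: independent uniform digits. Then each of the 49 adjacent
# pairs matches with probability 1/10.
expected_repeats = (len(SEQ_A) - 1) * 0.1

# A freshly simulated sequence for comparison (the seed is arbitrary).
rng = random.Random(0)
simulated = [rng.randrange(10) for _ in range(50)]

print("expected repeats under pure randomness:", round(expected_repeats, 1))
print("first sequence: ", adjacent_repeats(SEQ_A))
print("second sequence:", adjacent_repeats(SEQ_B))
print("simulated:      ", adjacent_repeats(simulated))
```

Comparing each count with the expected value gives one piece of evidence, not a proof: a single statistic can only make one sequence look more or less plausible as random.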
1.1 Librarian or Farmer?
“Steve is very shy and withdrawn, invariably helpful but with little interest in people or in the world of reality. A meek and tidy soul, he has a need for order and structure, and a passion for detail.” Is Steve more likely to be a librarian or a farmer?
1.2 Profits
Suppose you are a sales manager and are presented with the plot below, which shows profits for the last five years. What conclusions can you draw from it?
1.3 Accidental Deaths
Now suppose that you are investigating accidental deaths in the United States where the cause of death is a fire. We can download data for 2017 from the Centers for Disease Control and Prevention (CDC) to see which states have the highest and lowest rates. By rate we mean the number of deaths per hundred thousand residents of the state. Overall, the rate of accidental deaths of this type in the United States is quite low: 0.8396804 per hundred thousand Americans in 2017.
The five states with the highest rates are shown in the table below:
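As a small illustration of the rate definition, here is a Python sketch. The function name `rate_per_100k` and the death and population counts are hypothetical, chosen only to show the arithmetic. They are not CDC figures.

```python
def rate_per_100k(deaths, population):
    """Deaths per hundred thousand residents."""
    return deaths / population * 100_000


# Hypothetical example: 5 deaths in a state of 500,000 residents.
example_rate = rate_per_100k(deaths=5, population=500_000)
print(round(example_rate, 2))
```

Dividing by the population makes states of very different sizes comparable, which a raw count of deaths would not.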
State | Deaths per 100,000 residents | Share of US population (%) |
---|---|---|
South Dakota | 2.2 | 0.2669987 |
Arkansas | 2.1 | 0.9223525 |
South Carolina | 1.9 | 1.5425463 |
Mississippi | 1.9 | 0.9161573 |
West Virginia | 1.9 | 0.5574916 |
Notice that these states have much higher rates than the nationwide average of 0.8396804. This might make us want to avoid these states! On the other hand, the five states with the lowest rates are shown in the table below.
State | Deaths per 100,000 residents | Share of US population (%) |
---|---|---|
Delaware | 0 | 0.2953277 |
North Dakota | 0 | 0.2319154 |
District of Columbia | 0 | 0.2130584 |
Vermont | 0 | 0.1914708 |
Wyoming | 0 | 0.1778572 |
These states all had zero accidental fire deaths in 2017. However, do you notice anything strange about the populations of the states in our top-five and bottom-five lists? Perhaps the graph below will help you see what is really going on here.
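One possible explanation to explore is that rates computed from small populations are simply noisier. The Python sketch below simulates yearly rates for two hypothetical states that share the same underlying rate and differ only in size. Everything in it is an assumption made for illustration: the round population figures, the use of a Poisson model for death counts, and the helper names `poisson_draw` and `simulated_rates`. It is not the analysis behind the graph.

```python
import math
import random
import statistics

# Assumed true rate for both states, close to the national figure above.
TRUE_RATE = 0.84  # deaths per 100,000 residents


def poisson_draw(mean, rng):
    """Draw one Poisson count using Knuth's multiplication method."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulated_rates(population, n_years, rng):
    """Simulate observed yearly rates per 100,000 for one state."""
    expected_deaths = population * TRUE_RATE / 100_000
    return [poisson_draw(expected_deaths, rng) / population * 100_000
            for _ in range(n_years)]


rng = random.Random(2017)  # arbitrary seed, for reproducibility
small = simulated_rates(600_000, 500, rng)     # hypothetical small state
large = simulated_rates(40_000_000, 500, rng)  # hypothetical large state

for name, rates in [("small", small), ("large", large)]:
    print(name,
          "min:", round(min(rates), 2),
          "max:", round(max(rates), 2),
          "sd:", round(statistics.pstdev(rates), 3))
```

If the simulated small state swings between much higher and much lower rates than the large one, then a ranking of states by a single year's rate will tend to put small states at both ends, whatever their true risk.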
These notes are written in bookdown (Xie 2019).
References
Xie, Yihui. 2019. Bookdown: Authoring Books and Technical Documents with R Markdown. https://CRAN.R-project.org/package=bookdown.