If you invest in financial markets, you may want to predict the price of a stock six months from now on the basis of company performance measures and other economic factors. As a college student, you may be interested in knowing how the mean starting salary of college graduates depends on GPA. These are just two examples of how statistics is used in modern society. To answer questions like these, you need data to analyze.
The purpose of this course is to introduce you to statistics as the science of data. Data abound in this information age; extracting useful knowledge from complex data sets and gaining a sound understanding of them has become more of a challenge. In this course, we will focus on the fundamentals of statistics, which may be broadly described as the techniques used to collect, classify, summarize, organize, analyze, and interpret numerical information.
This course will begin with a brief overview of the discipline of statistics and will then quickly focus on descriptive statistics, introducing graphical methods of describing data. You will learn about combinatorial probability and probability distributions, the latter of which serve as the foundation for statistical inference. In the area of inference, we will focus on both estimation and hypothesis testing. We will also examine techniques for studying the relationship between two or more variables; this is known as regression.
By the end of this course, you should have a sound understanding of what statistics represent, how to use statistics to organize and display data, and how to draw valid inferences from data by using appropriate statistical tools.
Unit 1: Statistics and Data
In today's technologically advanced world, we have access to large volumes of data. The first step of data analysis is to accurately summarize these data, both graphically and numerically, so that we can understand what they reveal. Using and interpreting data correctly is essential to making informed decisions. For instance, when you see an opinion survey about a certain TV program, you may be interested in the proportion of respondents who like the program.
In this unit, you will learn about descriptive statistics, which are used to summarize and display data. After completing this unit, you will know how to present your findings once you have collected data. For example, suppose you want to buy a new mobile phone with a particular type of camera. You are not sure about the prices of the phones with this feature, so you visit a website that provides a sample data set of prices for phones with your desired features. Looking at every price in a sample can be confusing. A better way to compare the data might be to look at the median price and the variation in prices. The median and the variation are two of several ways to describe data. You can also graph the data so that it is easier to see what the price distribution looks like.
In this unit, you will study precisely this; namely, you will learn both numerical and graphical ways to describe and display your data. You will understand the essentials of calculating common descriptive statistics for measuring center, variability, and skewness in data. You will learn to calculate and interpret these measurements and graphs.
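As a minimal sketch of the numerical summaries described above, the following Python snippet computes measures of center and variability for a small set of phone prices. The prices are hypothetical values chosen only for illustration, and the snippet relies solely on Python's standard `statistics` module.

```python
import statistics

# Hypothetical sample of phone prices in dollars (illustrative values only).
prices = [299, 349, 379, 399, 429, 449, 499, 549, 899]

mean_price = statistics.mean(prices)      # arithmetic average
median_price = statistics.median(prices)  # middle value of the sorted prices
stdev_price = statistics.stdev(prices)    # sample standard deviation (n - 1 divisor)
price_range = max(prices) - min(prices)   # simplest measure of spread

print(f"mean:   {mean_price:.2f}")
print(f"median: {median_price}")
print(f"stdev:  {stdev_price:.2f}")
print(f"range:  {price_range}")
```

In this made-up sample, the single expensive phone pulls the mean above the median, which is one informal sign of right skew in the data.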
Descriptive statistics are, as their name suggests, descriptive: they illustrate what the data show and do not generalize beyond the data considered. Numerical descriptive measures computed from sample data are called statistics, while numerical descriptive measures of the population are called parameters. Inferential statistics can be used to generalize the findings from sample data to a broader population.
Completing this unit should take you approximately 22 hours.
- 1.1: The Science of Statistics and Its Importance
- 1.1.1: What is Statistics?
- 1.1.2: Descriptive and Inferential Statistics
- 1.1.3: Types of Data and Their Collection
- 1.2: Methods for Describing Data
- 1.2.1: Graphical Methods for Describing Quantitative Data
- 1.2.2: Numerical Measures of Central Tendency and Variability
- 1.2.3: Methods for Describing Relative Standing
- 1.2.4: Methods for Describing Bivariate Relationships
Unit 2: Elements of Probability and Random Variables
Probabilities affect our everyday lives. In this unit, you will learn about probability and its properties, how probability behaves, and how to calculate and use it. You will study the fundamentals of probability and will work through examples that cover different types of probability questions. These basic probability concepts will provide a foundation for understanding more advanced statistical concepts, such as interpreting polling results. Though you may have already encountered concepts of probability, after this unit you will be able to calculate, formally and precisely, the likelihood of an event occurring given certain constraints.
Probability theory is a discipline that was created to deal with chance phenomena. For instance, before undergoing surgery, a patient wants to know the chance that the surgery might fail; before taking medication, you want to know the chance that there will be side effects; before leaving your house, you want to know the chance that it will rain today. Probability is a measure of likelihood that takes on values between 0 and 1, inclusive, with 0 representing impossible events and 1 representing certainty. The probabilities of all other events fall between these two values.
The skill of calculating probability allows us to make better decisions. Whether you are evaluating how likely it is to get more than 50% of the questions correct on a quiz if you guess randomly; predicting the chance that the next storm will arrive by the end of the week; or exploring the relationship between the number of hours students spend at the gym and their performance on an exam, an understanding of the fundamentals of probability is crucial.
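The quiz-guessing question above can be sketched with the binomial distribution. The snippet assumes a hypothetical quiz of 10 multiple-choice questions with 4 options each, answered independently at random; neither number comes from the course description.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with success chance p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Assumed quiz setup (hypothetical): 10 questions, 4 answer choices each.
n_questions = 10
p_correct = 0.25

# "More than 50% correct" means 6 or more correct answers out of 10.
prob_more_than_half = sum(
    binomial_pmf(k, n_questions, p_correct) for k in range(6, n_questions + 1)
)

print(f"P(more than half correct by guessing) = {prob_more_than_half:.4f}")
```

Under these assumptions the probability is roughly 2%, which shows how unlikely it is to pass such a quiz by guessing alone.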
We will also talk about random variables. A random variable assigns a numerical value to each outcome of a random experiment, so its value can vary with each repetition of the experiment. Intuitively, a random variable summarizes a chance phenomenon and takes on values with certain probabilities. A statistical distribution describes how often each possible outcome occurs; in a sample, this is the number of times each outcome is observed. A random variable can be classified as either discrete or continuous, depending on the values it assumes. Suppose you count the number of people who go to a coffee shop between 4 p.m. and 5 p.m. and record the amount of time each of them spends waiting. In this case, the number of people is an example of a discrete random variable, and the waiting time is an example of a continuous random variable.
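To make the discrete/continuous distinction concrete, here is a small simulation of the coffee shop example. It is only a sketch: the arrival rate, the average wait, and the choice of exponential gaps and waits are assumptions made for illustration, not part of the course material.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

ARRIVALS_PER_HOUR = 30.0  # assumed average arrival rate (hypothetical)
MEAN_WAIT_MINUTES = 4.0   # assumed average wait per customer (hypothetical)

def simulate_hour():
    """Return (number of customers, their waiting times in minutes) for one hour."""
    elapsed_hours = 0.0
    waits = []
    while True:
        # Time until the next customer arrives, in hours.
        elapsed_hours += random.expovariate(ARRIVALS_PER_HOUR)
        if elapsed_hours > 1.0:
            break
        # Waiting time for this customer, in minutes.
        waits.append(random.expovariate(1.0 / MEAN_WAIT_MINUTES))
    return len(waits), waits

count, waits = simulate_hour()
print("customers this hour (discrete):", count)
print("first few waits in minutes (continuous):", [round(w, 2) for w in waits[:3]])
```

The customer count can only be a whole number, while each waiting time can take any non-negative real value; running `simulate_hour()` again gives different values of both.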
Completing this unit should take you approximately 25 hours.
- 2.1: Classical Probability Model
- 2.1.1: Events, Sample Spaces, and Probability
- 2.1.2: Counting Rules
- 2.2: Random Variables and Distributions
- 2.2.1: Common Discrete Random Variables
- 2.2.2: Normal Distribution
Unit 3: Sampling Distributions
The concept of a sampling distribution lies at the very foundation of statistical inference, and it is best introduced with an example. Suppose you want to estimate a parameter of a population, say the population mean. There are two natural estimators: 1. the sample mean, which is the average value of the data set; and 2. the sample median, which is the middle number when the measurements are arranged in ascending (or descending) order. In particular, for a sample of even size n, the median is the mean of the middle two numbers. But which one is better, and in what sense? Answering this involves repeated sampling: you want to choose the estimator that would do better on average. Different samples may give different sample means and medians, and some of them may be closer to the truth than others. Consequently, we cannot compare these two sample statistics or, in general, any two sample statistics on the basis of their performance with a single sample. Instead, you should recognize that sample statistics are themselves random variables; each one therefore has its own distribution, called its sampling distribution, which takes into account all possible samples. In this unit, you will study the sampling distributions of several sample statistics. This unit will also show you how the central limit theorem can help to approximate sampling distributions in general.
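The comparison between the sample mean and the sample median can be sketched by simulating repeated sampling. The snippet below assumes a normal population with hypothetical mean 50 and standard deviation 10, a sample size of 25, and 2,000 repetitions; all of these values are illustrative choices.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

POP_MEAN, POP_SD = 50.0, 10.0          # assumed normal population (hypothetical)
SAMPLE_SIZE, REPETITIONS = 25, 2000

sample_means, sample_medians = [], []
for _ in range(REPETITIONS):
    sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(SAMPLE_SIZE)]
    sample_means.append(statistics.mean(sample))
    sample_medians.append(statistics.median(sample))

# The spread of each list approximates the spread of that statistic's
# sampling distribution.
sd_of_means = statistics.stdev(sample_means)
sd_of_medians = statistics.stdev(sample_medians)

print(f"average of sample means:   {statistics.mean(sample_means):.2f}")
print(f"average of sample medians: {statistics.mean(sample_medians):.2f}")
print(f"spread of sample means:    {sd_of_means:.2f}")
print(f"spread of sample medians:  {sd_of_medians:.2f}")
```

For this assumed normal population, both estimators center near the true mean, but the sample mean varies less from sample to sample, which is one sense in which it does better on average. For other population shapes the comparison can come out differently.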
Completing this unit should take you approximately 15 hours.
- 3.1: The Concept of Sampling Distributions
- 3.1.1: Continuous Random Variables
- 3.1.2: Definition and Interpretation
- 3.1.3: Sampling Distributions Properties
- 3.2: Sampling Distributions for Common Statistics
- 3.2.1: The Sampling Distribution of Sample Mean
- 3.2.2: The Sampling Distribution of Pearson's r
- 3.2.3: The Sampling Distribution of the Sample Proportion
Unit 4: Estimation with Confidence Intervals
In this unit, you will learn how to use the central limit theorem and confidence intervals, the latter of which enable you to estimate unknown population parameters. The central limit theorem provides us with a way to make inferences from samples of non-normal populations. It states that for any population with a finite variance, as the sample size increases, the sampling distribution of the sample mean approaches a normal distribution. This powerful theorem allows us to treat the sampling distribution of the mean as approximately normal when the sample is large enough.
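The central limit theorem can be illustrated with a short simulation. This sketch assumes a strongly skewed population (an exponential distribution with mean 1 and standard deviation 1, chosen only for illustration) and compares the sampling distribution of the mean for sample sizes 2 and 30.

```python
import random
import statistics
from math import sqrt

random.seed(2)  # fixed seed so the run is repeatable
REPETITIONS = 3000

def simulate_sample_means(n):
    """Means of REPETITIONS samples of size n from an exponential population (mean 1)."""
    return [
        statistics.fmean([random.expovariate(1.0) for _ in range(n)])
        for _ in range(REPETITIONS)
    ]

def skewness(values):
    """Average cubed z-score: 0 for a symmetric distribution such as the normal."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

results = {n: simulate_sample_means(n) for n in (2, 30)}
for n, means in results.items():
    print(
        f"n={n:2d}  spread of means={statistics.stdev(means):.3f}  "
        f"(theory {1 / sqrt(n):.3f})  skewness={skewness(means):.2f}"
    )
```

As the sample size grows, the spread of the sample means shrinks toward the population standard deviation divided by the square root of n, and the skewness moves toward 0, consistent with the sampling distribution becoming approximately normal.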
You will also learn about confidence intervals, which provide you with a way to estimate a population parameter. Instead of giving just a one-number estimate of a parameter, a confidence interval gives a range of plausible values for it. This is useful because point estimates vary from sample to sample, so an interval with a stated confidence level conveys more than a single point estimate. After completing this unit, you will know how to construct such confidence intervals and how to interpret their level of confidence.
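A confidence interval for a mean can be sketched as follows. The function uses the large-sample normal (z) approximation, which is an assumption made to keep the example within Python's standard library; for small samples, a t-based interval is the usual choice. The data are simulated with hypothetical parameters purely to have something to summarize.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(data, confidence=0.95):
    """Large-sample (z-based) confidence interval for the population mean."""
    n = len(data)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    margin = z * stdev(data) / sqrt(n)              # z times the estimated standard error
    center = mean(data)
    return center - margin, center + margin

random.seed(3)  # fixed seed so the run is repeatable
# Hypothetical sample: 40 measurements from a population with mean 100 and sd 15.
sample = [random.gauss(100, 15) for _ in range(40)]

low, high = mean_confidence_interval(sample, confidence=0.95)
print(f"point estimate: {mean(sample):.2f}")
print(f"95% confidence interval: ({low:.2f}, {high:.2f})")
```

The 95% level refers to the procedure: if the sampling were repeated many times, about 95% of intervals built this way would contain the true population mean.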
Completing this unit should take you approximately 10 hours.
- 4.1: Point Estimators and Their Characteristics
- 4.1.1: Sample Statistics and Parameters
- 4.1.2: Bias and Sampling Variability
- 4.2: Confidence Intervals
- 4.2.1: Confidence Intervals for Mean
- 4.2.2: Confidence Intervals for Correlation and Proportion
Unit 5: Hypothesis Test
A hypothesis test involves collecting and evaluating data from a sample. The data are then used to decide whether or not they support a claim made about the population. This unit will teach you how to conduct hypothesis tests and how to identify and differentiate between the errors associated with them.
Many times, you need answers to questions in order to make informed decisions. For example, a restaurant owner might claim that his restaurant's food costs 30% less than food at other restaurants in the area, or a phone company might claim that its phones last at least one year longer than phones from other companies. To decide whether it would be more affordable to eat at the restaurant that "costs 30% less" or at another restaurant in the area, or to decide which phone company to choose based on the durability of the phone, you will have to collect data to evaluate these claims. Hypothesis testing is a formal process for making such decisions. In this unit, you will learn to state your assumptions through null and alternative hypotheses. The null hypothesis is the statement that is assumed to be true unless the data provide strong evidence against it; it is the hypothesis you hope to nullify. The alternative hypothesis is the research hypothesis that you aim to support. You therefore need to conduct the appropriate test to decide whether to reject or fail to reject the null hypothesis. You will learn how to compare sample characteristics to see whether there is enough evidence to reject the null hypothesis.
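The decision process described above can be sketched as a two-sided test of a single mean. The example assumes a large sample, so that a z-test based on the normal distribution is reasonable, and all numbers (a claimed area-wide average meal price of $12.00, and a sample of 64 meals averaging $11.20 with a standard deviation of $3.00) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_test_mean(sample_mean, sample_sd, n, mu0):
    """Two-sided large-sample z-test of H0: population mean equals mu0."""
    standard_error = sample_sd / sqrt(n)
    z = (sample_mean - mu0) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical summary statistics (illustrative values only).
z, p_value = z_test_mean(sample_mean=11.20, sample_sd=3.00, n=64, mu0=12.00)

ALPHA = 0.05  # significance level chosen before looking at the data
decision = "reject H0" if p_value < ALPHA else "fail to reject H0"
print(f"z = {z:.3f}, p-value = {p_value:.4f}, decision: {decision}")
```

With these made-up numbers the p-value falls below 0.05, so the null hypothesis is rejected at the 5% level. Had the p-value been larger, the conclusion would be "fail to reject," which is not the same as proving the null hypothesis true.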
Completing this unit should take you approximately 12 hours.
- 5.1: Elements of Hypothesis Testing
- 5.1.1: Setting up Hypotheses
- 5.1.2: Interpreting Hypotheses Testing Results
- 5.1.3: Steps in Hypothesis Testing and Its Relation to Confidence Intervals
- 5.2: Tests of Population Means
- 5.2.1: Testing Single Mean
- 5.2.2: Testing the Difference between Two Means
- 5.3: Chi-Square Distribution
- 5.4: Comparing the Proportions of Populations
Unit 6: Linear Regression
In this unit, we will discuss situations in which the mean of a population, treated as a variable, depends on the value of another variable. One of the main reasons for conducting such analyses is to understand how two variables are related to each other. The most common type of relationship is a linear relationship. For example, you may want to know what happens to one variable when you increase or decrease the other variable. You want to answer questions such as, "Does one variable increase as the other increases, or does it decrease?" For example, you may want to determine how the mean reaction time of rats depends on the amount of a drug in the bloodstream.
In this unit, you will also learn to measure the degree of a relationship between two or more variables. Both correlation and regression are tools for studying how variables relate. Correlation quantifies the strength of a linear relationship between two variables and is a measure computed from existing data. Regression, on the other hand, models the relationship between an independent and a dependent variable and can be used to predict the value of the dependent variable when the value of the independent variable is known.
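The difference between correlation and regression can be seen in a small worked example. The paired observations below are hypothetical, and the formulas are the standard ones for the Pearson correlation coefficient and the least squares line with a single independent variable.

```python
from math import sqrt

# Hypothetical paired observations (illustrative values only).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of squares and cross-products around the means.
s_xx = sum((xi - mean_x) ** 2 for xi in x)
s_yy = sum((yi - mean_y) ** 2 for yi in y)
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

r = s_xy / sqrt(s_xx * s_yy)         # correlation: strength of the linear relationship
slope = s_xy / s_xx                  # least squares slope
intercept = mean_y - slope * mean_x  # least squares intercept

def predict(new_x):
    """Predicted value of the dependent variable from the fitted line."""
    return intercept + slope * new_x

print(f"r = {r:.4f}")
print(f"fitted line: y = {intercept:.2f} + {slope:.2f} x")
print(f"prediction at x = 6: {predict(6):.2f}")
```

Here the correlation coefficient summarizes how tightly the points follow a line, while the fitted slope and intercept are what allow a prediction for a new value of x.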
Completing this unit should take you approximately 12 hours.
- 6.1: The Regression Model
- 6.1.1: Scatter Plot of Two Variables and Regression Line
- 6.1.2: Correlation Coefficient
- 6.1.3: Sums of Squares
- 6.2: Fitting the Model
- 6.2.1: Standard Errors of the Least Squares Estimates
- 6.2.2: Statistical Inference for the Slope and Correlation
- 6.2.3: Influential Observations
This optional subunit will teach you about analysis of variance (abbreviated ANOVA), which is used for hypothesis tests involving more than two means. ANOVA examines the amount of variability in the response (y) variable and tries to determine where that variability is coming from. You will study the simplest form of ANOVA, called single-factor or one-way ANOVA. Finally, you will briefly study the F distribution, which is used for ANOVA and for the test of two variances.
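The idea of splitting variability by source can be sketched by computing the one-way ANOVA F statistic directly. The three groups below are hypothetical values chosen so that the arithmetic is easy to follow; turning F into a p-value requires the F distribution, which is not in Python's standard library and is left out of this sketch.

```python
def one_way_anova_f(groups):
    """F statistic for one-way ANOVA: between-group over within-group mean square."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Variability of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variability of the observations around their own group mean.
    ss_within = sum(
        (v - m) ** 2 for g, m in zip(groups, group_means) for v in g
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical measurements for three groups (illustrative values only).
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_statistic = one_way_anova_f(groups)
print(f"F = {f_statistic:.2f}")
```

A large F means the group means differ by much more than the scatter inside the groups would suggest, which is evidence against the hypothesis that all group means are equal.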