# A student wishes to investigate whether the pass rate of first-year actuarial science students at universities is correlated with a mark obtained in a mathematical entrance exam.

Describe the key steps to follow in the data analysis process, for this investigation, and give an example, applicable in this scenario, for each of the steps.

- Develop a well-defined set of objectives that need to be met by the results of the data analysis
  - Here the objective is to determine whether mathematical entrance exam results are correlated with the performance of $1^{\text{st}}$ year Actuarial Science students.
- Identify the data items required for the analysis
  - The data items needed are the mathematical entrance exam marks of $1^{\text{st}}$ year Actuarial Science students at the different South African universities, and the $1^{\text{st}}$ year final results for those students over a period of time.
- Collection of the data from appropriate sources
  - The data can be obtained from the universities offering Actuarial Science degrees.
- Processing and formatting the data for analysis, e.g. inputting into a spreadsheet, database or other model.
  - The data will need to be extracted from the administrative systems of the universities and loaded into whichever statistical package is being used for the analysis.
- Cleaning data, e.g. addressing unusual, missing or inconsistent values
  - For example, a student may be recorded as registered at University X but have missing marks, or have unrealistic marks, e.g. negative numbers or marks above $100\%$ for a subject.
- Exploratory data analysis.
  - Here this takes the form of inferential analysis, as we are testing the hypothesis that the mathematical entrance mark is correlated with $1^{\text{st}}$ year Actuarial Science performance at universities.
- Modelling the data.
  - In this case we need to choose an appropriate statistical method for the analysis, e.g. a Chi-squared test.
- Communicating the results, which includes describing the data sources used, the analysis performed and the conclusion of the analysis.
- Monitoring the process. Updating the data and repeating the process if required.
  - This may mean choosing another statistical package to use, or adjusting the level of significance chosen.
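
The cleaning and modelling steps above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration only: the `records` list is made-up data, the helper names `clean`, `contingency` and `chi_squared` are hypothetical, and the cut-offs (an entrance mark of 60 and a pass mark of 50) are arbitrary choices, not values from the question. It drops missing or out-of-range marks, groups students into a 2×2 table (higher or lower entrance mark against pass or fail), and computes the Pearson Chi-squared statistic, which is compared with 3.841, the 5% critical value for one degree of freedom.

```python
# Hypothetical records: (entrance_mark, first_year_mark); None is a missing mark.
records = [
    (82, 71), (45, 38), (67, 55), (90, 84), (52, None),
    (38, 41), (74, 62), (58, 47), (61, 120), (-5, 50),
    (49, 52), (85, 66), (55, 44), (70, 49), (42, 35),
]


def clean(rows):
    """Keep only rows where both marks are present and lie in [0, 100]."""
    return [
        (e, f) for e, f in rows
        if e is not None and f is not None and 0 <= e <= 100 and 0 <= f <= 100
    ]


def contingency(rows, entrance_cut=60, pass_mark=50):
    """Build a 2x2 table.

    Rows: entrance mark >= entrance_cut, entrance mark < entrance_cut.
    Columns: passed first year, failed first year.
    Both cut-offs are assumed values for this sketch.
    """
    table = [[0, 0], [0, 0]]
    for e, f in rows:
        r = 0 if e >= entrance_cut else 1
        c = 0 if f >= pass_mark else 1
        table[r][c] += 1
    return table


def chi_squared(table):
    """Pearson Chi-squared statistic (no continuity correction).

    Assumes every row and column total is positive.
    """
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


CRITICAL_5PCT_1DF = 3.841  # 5% critical value of Chi-squared with 1 d.f.

cleaned = clean(records)
table = contingency(cleaned)
stat = chi_squared(table)
print("table:", table)
print("statistic:", round(stat, 3))
print("reject independence at 5%:", stat > CRITICAL_5PCT_1DF)
```

With so few made-up records the expected cell counts are small, so the Chi-squared approximation would be unreliable here; a real investigation would use the full data collected from the universities and a statistical package's own test routine.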
