How do you determine good accuracy of a machine learning algorithm?
in Data Science & Statistics by Diamond (88,334 points) | 93 views

1 Answer

Best answer

Overview

This post is divided into 4 parts; they are:

  1. Model Skill Is Relative
  2. Baseline Model Skill
  3. What Is the Best Score?
  4. Discover Limits of Model Skill

Model Skill Is Relative

Your predictive modeling problem is unique.

This includes the specific data you have, the tools you’re using, and the skill you will achieve.

Your predictive modeling problem has not been solved before. Therefore, we cannot know in advance what a good model looks like or what skill it might have.

You may have ideas of what a skillful model looks like based on knowledge of the domain, but you don’t know whether those skill scores are achievable.

The best that we can do is to compare the performance of machine learning models on your specific data to other models also trained on the same data.

Machine learning model performance is relative. The score a good model can achieve only makes sense, and can only be interpreted, in the context of the skill scores of other models trained on the same data.

Baseline Model Skill

Because machine learning model performance is relative, it is critical to develop a robust baseline.

A baseline is a simple and well understood procedure for making predictions on your predictive modeling problem. The skill of this model provides the bedrock for the lowest acceptable performance of a machine learning model on your specific dataset.

The results for the baseline model provide the point from which the skill of all other models trained on your data can be evaluated.

Three examples of baseline models include:

  • Predict the mean outcome value for a regression problem.
  • Predict the most common class (the mode of the outcome) for a classification problem.
  • Predict the previous observation as the next output (called persistence) for a univariate time series forecasting problem.

The baseline performance on your problem can then be used as the yardstick by which all other models can be compared and evaluated.

If a model achieves a performance below the baseline, something is wrong (e.g. there’s a bug) or the model is not appropriate for your problem.
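The three baselines listed above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a definitive implementation: the function names, the choice of mean absolute error and accuracy as measures, and the toy data are all assumptions made for the example.

```python
from collections import Counter
from statistics import mean

def mean_baseline(train_targets, n_predictions):
    """Regression baseline: always predict the mean of the training targets."""
    value = mean(train_targets)
    return [value] * n_predictions

def mode_baseline(train_labels, n_predictions):
    """Classification baseline: always predict the most common training label."""
    label = Counter(train_labels).most_common(1)[0][0]
    return [label] * n_predictions

def persistence_baseline(series):
    """Time series baseline: the prediction for series[t] is series[t - 1]."""
    return series[:-1]

def mean_absolute_error(actual, predicted):
    return mean(abs(a - p) for a, p in zip(actual, predicted))

def accuracy(actual, predicted):
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

# Toy data, invented purely for illustration.
reg_train, reg_test = [2.0, 4.0, 6.0], [3.0, 5.0]
cls_train, cls_test = ["a", "a", "b"], ["a", "b", "a", "a"]
series = [10, 12, 11, 13]

reg_mae = mean_absolute_error(reg_test, mean_baseline(reg_train, len(reg_test)))
cls_acc = accuracy(cls_test, mode_baseline(cls_train, len(cls_test)))
ts_mae = mean_absolute_error(series[1:], persistence_baseline(series))

print(f"regression baseline MAE:      {reg_mae:.3f}")
print(f"classification baseline acc:  {cls_acc:.3f}")
print(f"persistence baseline MAE:     {ts_mae:.3f}")
```

Whatever numbers these baselines produce on your own data become the floor: any candidate model has to beat them to be worth keeping.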

What Is the Best Score?

If you are working on a classification problem, the best score is 100% accuracy.

If you are working on a regression problem, the best score is 0.0 error.

These scores are bounds (an upper bound on accuracy, a lower bound on error) that are impossible to achieve in practice. All predictive modeling problems have prediction error. Expect it. The error comes from a range of sources such as:

  • Incompleteness of data sample.
  • Noise in the data.
  • Stochastic nature of the modeling algorithm.

You cannot achieve the best score, but it is good to know what the best possible performance is for your chosen measure. You know that true model performance will fall within a range between the baseline and the best possible score.

Because you cannot know in advance where in that range a good score lies, you must search the space of possible models on your dataset and discover what good and bad scores look like.
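One way to make the range between the baseline and the best possible score concrete is to express a model's score as the fraction of that gap it closes. The helper below is a hypothetical illustration, not something the post prescribes; its name and the example numbers are assumptions. It works for both accuracy (best is 1.0) and error (best is 0.0).

```python
def fraction_of_gap_closed(score, baseline, best):
    """Return 0.0 at the baseline, 1.0 at the best possible score,
    and a negative value when the model is worse than the baseline."""
    if best == baseline:
        raise ValueError("baseline already equals the best possible score")
    return (score - baseline) / (best - baseline)

# Illustrative numbers only (not real results).
acc_position = fraction_of_gap_closed(0.85, baseline=0.75, best=1.0)   # accuracy
err_position = fraction_of_gap_closed(0.4, baseline=1.0, best=0.0)     # error
below_baseline = fraction_of_gap_closed(1.2, baseline=1.0, best=0.0)   # worse than baseline

print(f"accuracy model closes {acc_position:.0%} of the gap")
print(f"error model closes {err_position:.0%} of the gap")
print(f"a negative value flags a problem: {below_baseline:.2f}")
```

A negative result corresponds to the case where a model scores below the baseline, which suggests a bug or an inappropriate model.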

Discover Limits of Model Skill

Once you have the baseline, you can explore the extent of model performance on your predictive modeling problem.

In fact, this is the hard work and the objective of the project: to find a model that you can demonstrate works reliably well in making predictions on your specific dataset.

There are many strategies for this problem; two that you may wish to consider are:

  • Start High. Select a machine learning method that is sophisticated and known to perform well on a range of predictive model problems, such as random forest or gradient boosting. Evaluate the model on your problem and use the result as an approximate top-end benchmark, then find the simplest model that achieves similar performance.
  • Exhaustive Search. Evaluate all of the machine learning methods that you can think of on the problem and select the method that achieves the best performance relative to the baseline.

The “Start High” approach is fast. It helps you define the bounds of model skill to expect on the problem and then find a simple model (in the spirit of Occam’s Razor) that achieves similar results. It also tells you quickly whether the problem is predictable at all, which matters because not all problems are.
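The last step of “Start High”, finding the simplest model that performs about as well as the top-end benchmark, might be sketched as follows. Everything here is an assumption made for illustration: the function name, the integer complexity ranking, the tolerance, and the accuracy figures, which are invented and are not results from any real experiment.

```python
def simplest_near_top(results, top_score, tolerance):
    """Return the name of the least complex model whose accuracy is
    within `tolerance` of the top-end benchmark, or None if none qualifies."""
    good_enough = [r for r in results if top_score - r["accuracy"] <= tolerance]
    if not good_enough:
        return None
    return min(good_enough, key=lambda r: r["complexity"])["name"]

# Hypothetical scores; complexity is a hand-assigned rank (lower is simpler).
results = [
    {"name": "logistic_regression", "complexity": 1, "accuracy": 0.88},
    {"name": "decision_tree", "complexity": 2, "accuracy": 0.84},
    {"name": "gradient_boosting", "complexity": 3, "accuracy": 0.90},
]

# The sophisticated model supplies the approximate top-end benchmark.
top = max(r["accuracy"] for r in results)
choice = simplest_near_top(results, top_score=top, tolerance=0.03)
print(choice)
```

How much accuracy you are willing to trade for simplicity (the tolerance) is a project decision, not something the data decides for you.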

The “Exhaustive Search” approach is slow and is really intended for long-running projects where model skill is more important than almost any other concern. I often perform variations of this approach, testing suites of similar methods in batches, and call it the spot-checking approach.

Both methods will give you a population of model performance scores that you can compare to the baseline.

You will know what a good score looks like and what a bad score looks like.
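A spot-checking loop of this kind can be sketched with nothing but the standard library. The three toy "models", the mean absolute error measure, and the data below are assumptions chosen to keep the example self-contained; in practice you would plug in real learning algorithms and a proper resampling scheme.

```python
from statistics import mean

def fit_mean(xs, ys):
    """Baseline: ignore the input and predict the training mean."""
    value = mean(ys)
    return lambda x: value

def fit_line(xs, ys):
    """Least-squares straight line through the training points."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return lambda x: slope * x + intercept

def fit_nearest(xs, ys):
    """1-nearest-neighbour: copy the target of the closest training input."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def mean_absolute_error(actual, predicted):
    return mean(abs(a - p) for a, p in zip(actual, predicted))

def spot_check(candidates, train, test):
    """Fit every candidate on the same training data and score it on the same test data."""
    (train_x, train_y), (test_x, test_y) = train, test
    scores = {}
    for name, fit in candidates.items():
        model = fit(train_x, train_y)
        scores[name] = mean_absolute_error(test_y, [model(x) for x in test_x])
    return scores

# Toy data invented for illustration: y = 2x + 1 exactly.
train = ([0.0, 1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 5.0, 7.0, 9.0])
test = ([1.25, 2.75], [3.5, 6.5])

candidates = {"baseline_mean": fit_mean, "line": fit_line, "nearest": fit_nearest}
scores = spot_check(candidates, train, test)
baseline = scores["baseline_mean"]
for name, mae in sorted(scores.items(), key=lambda item: item[1]):
    verdict = "beats baseline" if mae < baseline else "no better than baseline"
    print(f"{name}: MAE={mae:.3f} ({verdict})")
```

The resulting dictionary is the population of scores: each entry only means something when read against the baseline and against the other entries.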

by Diamond (88,334 points)
