This lesson is still being designed and assembled (Pre-Alpha version)

Closing Words: Where Do We Go from Here?

Overview

Teaching: 0 min
Exercises: 0 min
Questions
  • What advice is there for newcomers to machine learning?

  • What are possible pitfalls of machine learning?

Objectives
  • Present some thought-provoking ideas about machine learning

Advice to Learners

Dr. Andrew Ng recently interviewed Dr. Yoshua Bengio, a world expert on deep learning. At the end of this interview, Dr. Bengio shared the following advice for people who are about to begin their journey into deep learning (the advice is also very relevant to machine learning in general). Here are the key points, interspersed with my own comments.

You can watch Yoshua Bengio's full interview on the Coursera website (the advice comes at the very end, near minute 20):

https://www.coursera.org/lecture/deep-neural-network/yoshua-bengio-interview-bqUgf

If you are serious about learning, it could be helpful to take one of the online courses on machine learning, artificial intelligence, or deep learning and work through the assignments. That way you will really learn and gain a good understanding of the issues related to machine learning.

Some examples of blogs to start with:

For R users, also look at:

Using Machine Learning Properly

A final word of advice: machine learning is not a black box that can magically return reliable predictions, at least not without significant effort in tuning, verification, and validation. Even after that, the predictions must be constantly questioned and re-validated to make sure that what the machine learning algorithm produces is indeed trustworthy.
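The following sketch shows why validation matters. It is a minimal illustration built on a toy dataset and a deliberately naive "model" that simply memorizes its training examples. Both are hypothetical and were chosen only for this example. The model scores perfectly on the data it has seen, but on held-out data it does no better than a coin flip.

```python
from collections import Counter

def train_memorizer(examples):
    """A deliberately naive 'model' (hypothetical): memorize every training example."""
    table = {x: y for x, y in examples}
    # Fallback for unseen inputs: the most common training label
    # (on a tie, Counter.most_common returns the label encountered first).
    majority = Counter(y for _, y in examples).most_common(1)[0][0]
    return lambda x: table.get(x, majority)

def accuracy(model, examples):
    """Fraction of examples whose label the model predicts correctly."""
    return sum(model(x) == y for x, y in examples) / len(examples)

# Toy data: the label is the parity of x, a pattern the memorizer never learns.
data = [(x, x % 2) for x in range(100)]
train, test = data[:80], data[80:]  # hold out the last 20 examples

model = train_memorizer(train)
train_acc = accuracy(model, train)
test_acc = accuracy(model, test)
print(f"training accuracy: {train_acc:.2f}")  # training accuracy: 1.00
print(f"held-out accuracy: {test_acc:.2f}")   # held-out accuracy: 0.50
```

Judged only on its training data, this model would look flawless. The held-out set exposes the problem. In practice you would shuffle the data before splitting it, or use cross-validation, instead of taking the last rows as the test set.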

In this section we discuss several foundational points that will help make your machine learning journey a success. Some of these points have to be decided early in the process.

Importance of Data Quality

The quality of the data fed into a machine learning algorithm is key to the reliability of its predictions. A popular saying puts it this way: "Garbage in, garbage out." Whether a machine learning model yields value or garbage depends on the input data used to train it. Data preparation often takes a significant chunk of the time in the entire machine learning process. It is not unusual for two-thirds of the time (if not more) to be spent assessing the data, cleaning it, and removing bad data points. At times, problems with the data are discovered only after machine learning has been applied, which means one must go back to the data before rebuilding the model.
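As a concrete illustration of the cleaning step, the sketch below removes three common kinds of bad data points: missing values, verbatim duplicates, and physically implausible values. It is only a minimal example under stated assumptions. The records (an ID plus a temperature reading) and the valid range of -50 to 60 °C are hypothetical and exist just for this illustration.

```python
# Hypothetical raw records: (id, temperature in Celsius); None marks a missing value.
raw_records = [
    (1, 21.5),
    (2, None),   # missing measurement
    (3, 22.1),
    (3, 22.1),   # verbatim duplicate of record 3
    (4, 999.0),  # physically implausible, likely a sensor error
    (5, 20.8),
]

def clean_records(records, low=-50.0, high=60.0):
    """Return records with missing, out-of-range, and duplicate entries removed.

    The valid range [low, high] is an assumption chosen for this example.
    """
    seen = set()
    cleaned = []
    for rec_id, value in records:
        if value is None:               # drop missing values
            continue
        if not (low <= value <= high):  # drop implausible values
            continue
        if (rec_id, value) in seen:     # drop exact duplicates
            continue
        seen.add((rec_id, value))
        cleaned.append((rec_id, value))
    return cleaned

cleaned = clean_records(raw_records)
print(f"kept {len(cleaned)} of {len(raw_records)} records")  # kept 3 of 6 records
print(cleaned)  # [(1, 21.5), (3, 22.1), (5, 20.8)]
```

Half of this small dataset turns out to be unusable. This is why assessing and cleaning the data can take so much of the overall effort. Real datasets usually need domain knowledge to decide what counts as "implausible".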

Importance of Data Preparation and Model Selection

The discussion so far assumes that both the features of the data and the model have already been decided. When tackling a new problem, one has to decide which features to include for the machine learning. For example, to classify an email as spam, do we base the decision only on the email subject? Or do we also want to include the contents of the email? Whether it has an attachment? What about the size of the email, the sender's IP address, or the sender's email address? In the network event classification example, which features are important to consider?
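To make the feature question concrete, the sketch below turns one email into a dictionary of candidate features, following the options listed above. It is a hypothetical illustration rather than a real spam filter. The email's dictionary keys, the `extract_features` function, and the short list of "suspicious" words are all assumptions made for this example.

```python
def extract_features(email):
    """Turn one email (a dict with hypothetical keys) into a dict of features."""
    subject = email.get("subject", "")
    body = email.get("body", "")
    sender = email.get("sender", "")
    # Illustrative word list only; not a real spam lexicon.
    suspicious_words = {"free", "winner", "urgent"}
    words = (subject + " " + body).lower().split()
    return {
        "subject_length": len(subject),
        "body_length": len(body),
        "has_attachment": int(bool(email.get("attachments"))),
        "size_bytes": email.get("size_bytes", 0),
        "suspicious_word_count": sum(w.strip(".,!?") in suspicious_words for w in words),
        # Categorical feature: most models need it encoded numerically first.
        "sender_domain": sender.rsplit("@", 1)[-1] if "@" in sender else "",
    }

email = {
    "subject": "You are a WINNER!",
    "body": "Claim your free prize now.",
    "sender": "promo@example.com",
    "attachments": ["prize.exe"],
    "size_bytes": 48213,
}
features = extract_features(email)
print(features["suspicious_word_count"])  # 2
print(features["sender_domain"])          # example.com
```

Every entry in the returned dictionary is a design decision. Dropping the body or the sender domain, for instance, changes what the model can possibly learn. That is why feature choices need to be settled, and revisited, early in the process.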

Machine Learning Going Awry?

Here are some cases showing that one can apply machine learning impressively to large sets of data and still obtain results that are misleading.

Key Points

  • Machine learning is not a black box and therefore should be used with proper care.