Closing Words: Where Do We Go from Here?

Overview

Teaching: 0 min
Exercises: 0 min

Questions

What advice is there for newcomers to machine learning?

What are possible pitfalls of machine learning?

Objectives

Present some thought-provoking ideas about machine learning.

Advice to Learners

Dr. Andrew Ng interviewed Dr. Yoshua Bengio, one of the world’s leading experts on deep learning. At the end of this interview, Dr. Bengio shared the following advice for people who are about to begin their journey into deep learning (the advice is also very relevant to machine learning in general). Here are the key points, interspersed with my own comments.

Read a lot!
- Read books, journal articles
- There are many good blogs out there. Some examples are listed below.
Practice programming
- Write your own machine learning code: either a pipeline like the one described in this training, or a complete program in which you implement the learning algorithm yourself.
- Look at other people’s code and try to understand what they are doing.
Understand what you are doing
- Don’t treat ML as a black box that somehow produces the correct answer by magic.
- Always ask: Why? Why this parameter, why this step, etc.
(Advanced) Programming an entire machine learning algorithm yourself, even if your code is not optimal, helps you gain a deep understanding.
People who are entering research are encouraged to read the latest journal articles and conference proceedings on machine learning. Example conferences include ICLR (International Conference on Learning Representations) and ICML (International Conference on Machine Learning).

You can watch Dr. Bengio’s full interview on the Coursera website (the advice comes at the very end, near minute 20):

https://www.coursera.org/lecture/deep-neural-network/yoshua-bengio-interview-bqUgf

If you are serious about learning, it can help to take one of the online courses on machine learning, artificial intelligence, or deep learning and work through the assignments. You will really learn and gain a good understanding of the issues related to machine learning.

Some examples of blogs to start with:

For R users, also look at:

RStudio’s Advance Data Science resources page

Using Machine Learning Properly

A final word of advice: machine learning is not a black box that can magically return reliable predictions, at least not without significant effort in tuning, verification, and validation. Even after that, the model needs constant questioning and re-validation to make sure that the predictions made by the machine learning algorithm are indeed trustworthy.

In this section we will discuss several foundational points to make your machine learning journey a success. Some of the points mentioned here have to be decided early in the process.

Importance of Data Quality

The quality of the data fed into the machine learning algorithm is key to the reliability of its predictions. There is a popular saying: “Garbage in, garbage out”. Whether our machine learning model will yield value or garbage depends on the input data used to train it. Data preparation often takes a significant chunk of time in the entire machine learning process. It is not unusual for two thirds of the time (if not more) to be spent assessing the data, cleaning it, and removing bad data points. At times, problems with the data are discovered only after the machine learning model has been applied, which means one must go back to the data before rebuilding the model.
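To make “removing bad data points” concrete, here is a minimal Python sketch. The records, their fields (`duration_seconds`, `bytes_sent`, `label`), the set of valid labels, and the helper names `is_valid` and `clean` are all hypothetical, made up for illustration; a real data set needs checks tailored to its own fields and domain.

```python
# Hypothetical raw records: (duration_seconds, bytes_sent, label)
raw_records = [
    (1.2, 500, "normal"),
    (None, 300, "normal"),   # missing duration
    (0.8, -20, "attack"),    # a negative byte count is impossible
    (3.5, 1200, "attack"),
    (3.5, 1200, "attack"),   # exact duplicate of the previous record
    (2.0, 800, "unknwn"),    # misspelled label outside the known set
]

# Hypothetical set of labels we consider valid.
VALID_LABELS = {"normal", "attack"}


def is_valid(record):
    """Return True if a record passes some basic sanity checks."""
    duration, bytes_sent, label = record
    if duration is None or bytes_sent is None:
        return False
    if duration < 0 or bytes_sent < 0:
        return False
    return label in VALID_LABELS


def clean(records):
    """Drop invalid records and exact duplicates, preserving the original order."""
    seen = set()
    cleaned = []
    for record in records:
        if not is_valid(record) or record in seen:
            continue
        seen.add(record)
        cleaned.append(record)
    return cleaned


cleaned = clean(raw_records)
print(f"kept {len(cleaned)} of {len(raw_records)} records")  # kept 2 of 6 records
```

Even in this toy case, two thirds of the records are discarded, which is why each rule in `is_valid` deserves the same scrutiny as the model itself: an overly strict check silently throws away good data.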

Importance of Data Preparation and Model Selection

The discussion so far assumes that both the features of the data and the model have already been decided. When tackling a new problem, one has to decide which features to include in the machine learning. For example, to classify an email as spam, do we base the decision only on the email subject? Or do we also include the contents of the email? Whether it has an attachment? What about the size of the email, the sender’s IP address, or the sender’s email address? In the example of machine learning on network event classification, which features are important to consider?
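One way to see that feature choice is a design decision is to write the feature extraction out explicitly. The sketch below assumes a hypothetical `Email` record and a made-up list of suspicious words (a real spam filter would learn such cues from data rather than hard-code them). The hypothetical flag `include_body` builds two different feature sets from the same email, mirroring the question of whether to use only the subject or the contents as well.

```python
import re
from dataclasses import dataclass, field

# Hypothetical list of words often seen in spam; chosen only for illustration.
SUSPICIOUS_WORDS = {"free", "winner", "urgent"}


@dataclass
class Email:
    """Hypothetical minimal email record."""
    subject: str
    body: str
    size_bytes: int
    attachments: list = field(default_factory=list)


def tokenize(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def count_suspicious(text):
    """Count how many words in the text appear in SUSPICIOUS_WORDS."""
    return sum(word in SUSPICIOUS_WORDS for word in tokenize(text))


def extract_features(email, include_body=True):
    """Turn one email into a dict of numeric features."""
    features = {
        "subject_suspicious": count_suspicious(email.subject),
        "has_attachment": int(bool(email.attachments)),
        "size_kb": email.size_bytes / 1024,
    }
    if include_body:
        # Whether to look at the body at all is itself a design decision.
        features["body_suspicious"] = count_suspicious(email.body)
    return features


message = Email(
    subject="URGENT: claim your free prize",
    body="You are a winner. Click here.",
    size_bytes=2048,
    attachments=["prize.exe"],
)
print(extract_features(message))
# {'subject_suspicious': 2, 'has_attachment': 1, 'size_kb': 2.0, 'body_suspicious': 1}
print(extract_features(message, include_body=False))
# {'subject_suspicious': 2, 'has_attachment': 1, 'size_kb': 2.0}
```

The model only ever sees these numbers, so anything left out of `extract_features` (the sender’s address, for instance) is invisible to it, no matter how good the learning algorithm is.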

Machine Learning Going Awry?

Here is a case showing that one can apply machine learning impressively to large data sets, yet still obtain misleading results.

AAAS: Machine learning ‘causing science crisis’ (BBC News, Science and Environment, by Pallab Ghosh, February 16, 2019). Also see “Can we trust scientific discoveries made using machine learning?” (EurekAlert! from AAAS). Here is an edited excerpt from the BBC News article:

Dr Genevera Allen, [an associate professor of statistics] from Rice University in Houston said that the increased use of [machine learning techniques] was contributing to a “crisis in science”.

She warned scientists that if they didn’t improve their techniques they would be wasting both time and money. Her research was presented at the American Association for the Advancement of Science in Washington.

A growing amount of scientific research involves using machine learning software to analyse data that has already been collected. This happens across many subject areas ranging from biomedical research to astronomy. The data sets are very large and expensive.

But, according to Dr Allen, the answers they come up with are likely to be inaccurate or wrong because the software is identifying patterns that exist only in that data set and not the real world. (emphasis added)
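The following Python sketch illustrates the general danger; it is not a reproduction of Dr. Allen’s analysis. The labels are pure random noise, so there is no real pattern to find. A 1-nearest-neighbour classifier (chosen here only because it memorises its training data) still scores perfectly on the data it was trained on, and only evaluation on fresh, held-out data reveals that it has learned nothing. The data sizes, number of features, and random seed are arbitrary assumptions.

```python
import random


def make_noise_dataset(n_samples, n_features, rng):
    """Random features with random 0/1 labels: there is no real pattern to learn."""
    X = [[rng.random() for _ in range(n_features)] for _ in range(n_samples)]
    y = [rng.randint(0, 1) for _ in range(n_samples)]
    return X, y


def predict_1nn(train_X, train_y, x):
    """Predict the label of the closest training point (squared Euclidean distance)."""
    best_index = min(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    return train_y[best_index]


def accuracy(train_X, train_y, X, y):
    """Fraction of points in (X, y) that the 1-NN model labels correctly."""
    correct = sum(predict_1nn(train_X, train_y, xi) == yi for xi, yi in zip(X, y))
    return correct / len(y)


rng = random.Random(0)
train_X, train_y = make_noise_dataset(500, 5, rng)
test_X, test_y = make_noise_dataset(500, 5, rng)

# Each training point is its own nearest neighbour, so this is perfect.
train_acc = accuracy(train_X, train_y, train_X, train_y)
# On fresh data the "patterns" it memorised are useless: expect roughly 0.5.
test_acc = accuracy(train_X, train_y, test_X, test_y)
print(f"training accuracy: {train_acc:.2f}")
print(f"held-out accuracy: {test_acc:.2f}")
```

The gap between the two numbers is the warning sign: high performance on the data used to build a model says little about how it will behave in the real world, which is why validation on data the model has never seen is essential.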


Key Points

Machine learning is not a black box and therefore should be used with proper care.


DeapSECURE module 3: Machine Learning
