Machine Learning Notes (1): Basics

Introduction

Here is a not-so-short note (rolling update) on machine learning and artificial intelligence. Main concepts and some references are presented. This note is based on Wikipedia, Andrew Ng's CS229 lecture notes from Stanford, the Machine Learning course of SJTU IEEE-yangyang, and Google's Machine Learning Crash Course.

Elementary concepts

What is machine learning?

Machine Learning is a scientific discipline that explores the construction and study of algorithms that can learn from data; it enables the analysis of massive quantities of data. We all know "machine", i.e. computers or computer programs. But what is "learning"? Learning denotes "changes in the system that are adaptive in the sense that they enable the system to do the same task or tasks drawn from the same population more efficiently and more effectively the next time" [1]. As a pioneer in the field of artificial intelligence, Herbert A. Simon created with Allen Newell the Logic Theory Machine (1956) and the General Problem Solver (GPS) (1957), the latter regarded as the first method developed for separating problem-solving strategy from information about particular problems.

Arthur Samuel created the world's first successful self-learning program, the Samuel Checkers-playing Program. He coined the term "Machine Learning" in 1959, defining it as "the field of study that gives computers the ability to learn without being explicitly programmed".

Tom Mitchell (former Chair of the Machine Learning Department at CMU) gave a modern definition: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."

An example: playing checkers.

  • E = the experience of playing many games of checkers.
  • T = the task of playing checkers.
  • P = the probability that the program will win the next game.

What can machine learning do?

ML can be used to deal with the Big Data deluge and predictive analytics, e.g.

  • Document Classification
  • Spam Filtering
  • Weather Prediction
  • Stock Market Prediction
  • Collaborative Filtering
  • Clustering Images
  • Human Genetics
  • Decoding thoughts from brain scans
  • Medical data analysis
  • Finance
  • Robotics
  • Natural language processing, speech recognition
  • Computer vision
  • Web forensics
  • Computational biology
  • Sensor networks (new multi-modal sensing devices)
  • Social networks
  • Turbulence problem

Three components of a machine learning algorithm

Pedro Domingos, a CS professor at the University of Washington, decomposed machine learning into three components: Representation, Evaluation, and Optimization [2].

Table 1. The three components of learning algorithms

Representation
  • Instances
      • K-nearest neighbor
      • Support vector machines
  • Hyperplanes
      • Naive Bayes
      • Logistic regression
  • Decision trees
  • Sets of rules
      • Propositional rules
      • Logic programs
  • Neural networks
  • Graphical models
      • Bayesian networks
      • Conditional random fields

Evaluation
  • Accuracy/Error rate
  • Precision and recall
  • Squared error
  • Likelihood
  • Posterior probability
  • Information gain
  • K-L divergence
  • Cost/Utility
  • Margin

Optimization
  • Combinatorial optimization
      • Greedy search
      • Beam search
      • Branch-and-bound
  • Continuous optimization
      • Unconstrained (convex)
          • Gradient descent
          • Conjugate gradient
          • Quasi-Newton methods
      • Constrained
          • Linear programming
          • Quadratic programming

Representation

A classifier must be represented in some formal language that the computer can handle. A learner takes observations as inputs; the observation language is the language used to describe these observations. The hypotheses that a learner may produce are formulated in a language called the hypothesis language, and the hypothesis space is the set of hypotheses that can be described using this hypothesis language [3]. The problem is how to represent the input, i.e. what features to use. For example, 3-layer feedforward neural networks (or computational graphs) form one type of representation, while support vector machines with RBF kernels form another.
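
To make the contrast concrete, here is a minimal sketch (assuming Python with scikit-learn, which this note does not itself prescribe) that fits the same toy data under the two representations just named: a small feedforward neural network and an SVM with an RBF kernel. The dataset and hyperparameters are illustrative choices.

```python
# Two different representations of the same classification problem.
# scikit-learn and the toy dataset are assumptions for illustration.
from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)  # toy labeled data

# Representation 1: a feedforward neural network with one hidden layer.
nn = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0).fit(X, y)

# Representation 2: a support vector machine with an RBF kernel.
svm = SVC(kernel="rbf", gamma="scale").fit(X, y)

# Each representation yields a different hypothesis for the same data.
print(nn.score(X, y), svm.score(X, y))
```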

Evaluation

Evaluation is essentially how you judge or prefer candidate programs (hypotheses). An evaluation function (also called objective function, utility function, loss function, fitness function, or scoring function) is needed. Mean squared error (of a model's outputs versus the observed outputs) and likelihood (the probability of the observed data under the model) are examples of different evaluation functions.
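
As a small illustration (with made-up numbers), the sketch below computes the two evaluation functions just mentioned: mean squared error, and the log-likelihood of the observations under an assumed Gaussian model centered on the predictions.

```python
import numpy as np

# Hypothetical targets and model predictions, purely for illustration.
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.1, 1.9, 3.2])

# Evaluation function 1: mean squared error.
mse = np.mean((y_true - y_pred) ** 2)

# Evaluation function 2: log-likelihood of the data under a Gaussian
# model centered on the predictions with unit variance (an assumed model).
log_lik = np.sum(-0.5 * np.log(2 * np.pi) - 0.5 * (y_true - y_pred) ** 2)

print(mse, log_lik)
```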

Optimization

Finally, a method is needed to search among the candidate programs (in the space of represented models) for the highest-scoring one. Stochastic gradient descent and genetic algorithms are two different ways of optimizing a model class.
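
A minimal sketch of stochastic gradient descent for least-squares regression follows; the synthetic data, learning rate, and step count are illustrative assumptions rather than anything this note prescribes.

```python
import numpy as np

# Synthetic regression data (an illustrative assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
true_w = np.array([2.0, -1.0])
y = X @ true_w + 0.1 * rng.normal(size=100)

w = np.zeros(2)   # the candidate "program": a weight vector
lr = 0.1          # learning rate (illustrative)
for step in range(1000):
    i = rng.integers(0, len(X))           # pick one example at random
    grad = 2 * (X[i] @ w - y[i]) * X[i]   # gradient of its squared error
    w -= lr * grad                        # step against the gradient

print(w)  # should end up close to true_w
```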

Machine Learning Tasks

Machine learning tasks are typically classified into two broad categories, depending on whether a learning "signal" or "feedback" is available to the learning system: supervised learning and unsupervised learning. Other tasks include semi-supervised learning, active learning, reinforcement learning, etc.

Supervised learning

Supervised learning (or inductive learning) is the most mature and most studied type of learning, and the one used by most machine learning algorithms. Learning with supervision is much easier than learning without supervision. The training data include the desired outputs, and the goal is to learn a general rule that maps inputs to outputs (see the sketch after the list below). Typically, the inputs are transformed into a feature vector, which contains a number of features that are descriptive of the object.

  • Classification
  • Regression
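
Here is the minimal supervised-learning sketch referenced above (again assuming scikit-learn): both tasks are trained on data that include the desired outputs. The regression target (predicting petal width from the other iris measurements) is chosen purely for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)  # inputs and desired outputs (labels)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Classification: learn a rule mapping inputs to a discrete label.
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("classification accuracy:", clf.score(X_te, y_te))

# Regression: learn a rule mapping inputs to a continuous value
# (here, petal width from the other three measurements).
reg = LinearRegression().fit(X_tr[:, :3], X_tr[:, 3])
print("regression R^2:", reg.score(X_te[:, :3], X_te[:, 3]))
```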

Reinforcement learning is concerned with how to take actions in an environment so as to maximize some notion of long-term reward, with applications in areas such as game theory. It differs from standard supervised learning in that correct input/output pairs are never provided, nor are sub-optimal actions explicitly corrected.
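
A toy sketch of this idea, using a two-armed bandit (an illustrative choice, not part of the original note): the program receives only a reward signal, never the correct action.

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = [0.3, 0.7]   # hidden expected reward of each action
q = [0.0, 0.0]            # estimated value of each action
counts = [0, 0]
eps = 0.1                 # exploration rate (illustrative)

for t in range(1000):
    # Epsilon-greedy: usually exploit the best estimate, sometimes explore.
    a = rng.integers(2) if rng.random() < eps else int(np.argmax(q))
    r = rng.normal(true_means[a], 0.1)   # a reward, not a "correct answer"
    counts[a] += 1
    q[a] += (r - q[a]) / counts[a]       # incremental mean update

print(q)  # estimates should approach true_means, favoring action 1
```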

Unsupervised learning

The training data do not include the desired outputs, so it is hard to tell what counts as good learning and what does not. The program must find structure in its "unlabeled" inputs on its own (see the sketch after the list below).

  • Clustering
  • Dimensionality reduction
  • Anomaly detection
  • Neural Networks
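
Here is the unsupervised-learning sketch referenced above (scikit-learn assumed): k-means clustering finds groups in the data without ever seeing desired outputs.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Generate points and discard the labels: the learner sees only X.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)  # no y is given

print(km.cluster_centers_)   # discovered group centers
print(km.labels_[:10])       # cluster assignments of the first ten points
```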

Main purpose of studying ML

To make money.
