Soh Kam Yung reviewed Why Machines Learn by Anil Ananthaswamy
A mathematical look at how machines learn and make decisions.
4 stars
A fascinating book that looks at the history of Machine Learning (ML) to show how we arrived at the machine learning models that today drive applications like ChatGPT and others. Mathematics involving algebra, vectors, matrices, and so on features throughout the book. By working through the maths, the reader gains an appreciation of how ML systems go about the task of learning to distinguish between inputs and provide the (hopefully) correct output.
The book starts with the earliest type of ML, the perceptron, which can learn to separate data into categories and which started the initial hype over learning machines. The maths is provided to show how, by repeatedly adjusting the weights assigned to its inputs, the machine discovers weights that allow it to correctly categorize other inputs.
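To make the idea concrete, here is a minimal sketch of the perceptron update rule the book describes. The toy dataset, learning rate, and epoch count are my own illustrative assumptions, not taken from the book:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: points above the line y = x are class +1, below are -1.
# Points too close to the boundary are dropped so the classes are
# separable with a margin, which guarantees the perceptron converges.
X = rng.uniform(-1, 1, size=(200, 2))
X = X[np.abs(X[:, 1] - X[:, 0]) > 0.2][:50]
y = np.where(X[:, 1] > X[:, 0], 1, -1)

w = np.zeros(2)   # weights, one per input
b = 0.0           # bias
lr = 0.1          # learning rate

for _ in range(100):                      # repeated passes over the data
    errors = 0
    for xi, yi in zip(X, y):
        pred = np.sign(w @ xi + b) or 1   # treat a zero activation as +1
        if pred != yi:                    # misclassified: nudge the weights
            w += lr * yi * xi             # toward the correct side
            b += lr * yi
            errors += 1
    if errors == 0:                       # every point classified correctly
        break

print("learned weights:", w, "bias:", b)
```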
Other chapters then cover other ways to train a machine to categorize its input, based on Bayes' theorem and on nearest neighbours. Each has its advantages and disadvantages: choosing the right (or wrong) way to train a machine affects how well it can categorize its data.
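Nearest-neighbour classification is simple enough to sketch in a few lines: a new point gets the majority label among its k closest training points. The data and the helper name `knn_predict` below are made up for illustration:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    # Euclidean distance from the new point to every training point.
    dists = np.linalg.norm(X_train - x_new, axis=1)
    nearest = np.argsort(dists)[:k]       # indices of the k closest points
    votes = Counter(y_train[nearest])     # count their class labels
    return votes.most_common(1)[0][0]     # majority vote wins

X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array(["a", "a", "b", "b"])
print(knn_predict(X_train, y_train, np.array([0.8, 0.9])))  # -> "b"
```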
Matrix manipulation, eigenvalues, and eigenvectors are then introduced. When there are many input parameters, it can be hard to categorize data based on all the factors. By using eigenvalues and eigenvectors, it is possible to discover which factors cause the most variation among the data, and thus to categorize it using fewer dimensions. And, in an interesting reversal, it is also possible to project the input into more dimensions, which can reveal patterns that can then be used to categorize it.
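A short sketch of both ideas, on made-up data: the eigenvectors of the data's covariance matrix point along the directions of greatest variation, and mapping points into extra dimensions can make a hidden pattern separable:

```python
import numpy as np

rng = np.random.default_rng(1)
# Correlated 2-D data: most of the variation lies along one diagonal.
X = rng.normal(size=(200, 2)) @ np.array([[2.0, 0.0], [1.5, 0.3]])
X -= X.mean(axis=0)                       # centre the data

cov = np.cov(X, rowvar=False)             # 2x2 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)    # eigen-decomposition

# The eigenvector with the largest eigenvalue captures the most variance,
# so projecting onto it reduces two dimensions to one with little loss.
top = eigvecs[:, np.argmax(eigvals)]
projected = X @ top
print("principal direction:", top)
print("fraction of variance captured:", eigvals.max() / eigvals.sum())

# The reversal: 1-D points whose class depends on |x| are not separable
# by a threshold on x, but adding an x**2 dimension makes them so.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
features = np.column_stack([x, x**2])     # project into two dimensions
print(features)
```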
These ML models categorize input data using one layer of 'neurons'. The next step is to introduce a 'hidden layer' of neurons that can combine the incoming data in many ways, providing new ways to manipulate the data for categorization. This gives a boost to the ability of machines to recognize input data.
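The classic demonstration of why a hidden layer helps is XOR: no single layer can separate it, but two hidden units that recombine the inputs make it trivial. The weights below are hand-picked for illustration rather than learned:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def xor_net(x1, x2):
    h1 = relu(x1 + x2)          # hidden unit 1: fires for any active input
    h2 = relu(x1 + x2 - 1.0)    # hidden unit 2: fires only for (1, 1)
    return h1 - 2.0 * h2        # output layer recombines the hidden units

for a in (0, 1):
    for b in (0, 1):
        print(f"XOR({a}, {b}) = {xor_net(a, b):.0f}")
```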
Lastly, the book catches up to present-day ML models, which feature a huge increase in the number of hidden layers and weights used to manipulate input data. The book points out that this increase has caused the theory of how machines learn to fall behind: the machines now exhibit abilities that theory cannot account for. One such unexpected ability is that of picking out patterns in data through self-learning, rather than being fed known, labelled data.
These unaccounted-for features of current ML systems are a cause for concern. So too is the kind of data being fed to the systems: data that comes with various biases, which only leads the systems to make yet more biased decisions. Until we know better how these systems behave, it would be best to treat their outputs with caution.