Machine learning is now present in many aspects of modern society, and it is increasingly built into popular products such as cameras and smartphones. Its uses range from web search and recommendation systems to identifying objects in images, among many others. Most of these applications are based on what is called deep learning, a method that relies on neural networks: a set of algorithms designed to learn to solve problems that can be represented through examples.
Say you wanted to build a system that classifies images as either cats or cars. The first step is to collect a large dataset of cat and car images and label each image with the correct category. Then, during training, the system is shown an image and produces a vector of scores, one per category. The next step is to compute an objective function that measures the error between the output scores and the expected scores. The system learns from this error by adjusting its parameters to reduce it, and the process is repeated over many images.
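The training procedure described above can be sketched in a few lines of Python. This is only a minimal illustration under strong assumptions, not how a real image classifier is built: each "image" is reduced to two made-up feature values, the model is a single linear layer rather than a deep network, and the names `TOY_DATA`, `train`, `predict`, and `classify` are hypothetical. It does show the three ingredients named in the text: a vector of scores per category, an objective function (here cross-entropy), and repeated parameter adjustments that reduce the error.

```python
import math
import random

# Hypothetical toy data: each "image" is reduced to two invented features.
# Label 0 = cat, label 1 = car. Real systems learn from raw pixels instead.
TOY_DATA = [
    ((0.9, 0.1), 0), ((0.8, 0.2), 0), ((0.7, 0.1), 0),
    ((0.1, 0.9), 1), ((0.2, 0.8), 1), ((0.1, 0.7), 1),
]

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(weights, biases, features):
    """Produce one probability per category for a single example."""
    scores = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(scores)

def cross_entropy(probs, label):
    """Objective function: error between the output and the expected category."""
    return -math.log(probs[label] + 1e-12)

def train(data, n_classes=2, epochs=50, lr=0.5, seed=0):
    """Adjust the parameters example by example to reduce the error."""
    rng = random.Random(seed)
    n_features = len(data[0][0])
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_features)]
               for _ in range(n_classes)]
    biases = [0.0] * n_classes
    history = []  # average error per pass over the data
    for _ in range(epochs):
        total = 0.0
        for features, label in data:
            probs = predict(weights, biases, features)
            total += cross_entropy(probs, label)
            for k in range(n_classes):
                # Gradient of cross-entropy with respect to score k.
                grad = probs[k] - (1.0 if k == label else 0.0)
                for j in range(n_features):
                    weights[k][j] -= lr * grad * features[j]
                biases[k] -= lr * grad
        history.append(total / len(data))
    return weights, biases, history

def classify(weights, biases, features):
    """Return the index of the category with the highest probability."""
    probs = predict(weights, biases, features)
    return probs.index(max(probs))

weights, biases, history = train(TOY_DATA)
print("average error, first pass:", round(history[0], 3))
print("average error, last pass:", round(history[-1], 3))
```

The average error recorded in `history` falls from the first pass to the last, which is the "learn from it and adjust the parameters" loop in miniature. Everything the model ends up knowing comes from the six labeled examples it was given.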
Many applications of deep neural networks involve image recognition. The first neural machine, the Mark 1 Perceptron, was built as a vision machine. It is important to note that the goal of the Perceptron was not simply to recognize shapes but to learn to recognize them through statistical calculation. As Rosenblatt wrote in 1958, the “theory has been developed for a hypothetical nervous system, or machine called a perceptron. The perceptron is designed to illustrate some of the fundamental properties of intelligent systems in general.”
The Perceptron was the starting point of the field now commonly known as Computer Vision. In the summer of 1966, Marvin Minsky assigned the “Summer Vision Project” to an undergraduate student, with the aim of building a system able to analyze a scene and identify the objects in it. It soon became clear that making computers see would take more than a single summer and that the project was far bigger than anticipated. More than fifty years later, Computer Vision is a prominent field of Artificial Intelligence (AI) that aims to give computers a visual understanding of the world and to automate human visual tasks. Computer Vision’s ambition is not only to see but also to extract valuable information from what is observed.
Training is a crucial step for deep neural networks. Sets of labeled images are routinely used to develop modern computer vision systems. The selection of the training dataset is perhaps one of the most critical and vulnerable parts of the design of a neural network. Neural networks are trained to recognize patterns in image training datasets so that they can recognize the same patterns in future images. These systems are built on a foundation of contestable and unstable epistemological and metaphysical assumptions about the nature of images, labels, categorization, and representation. As Bowker and Star mention, “categorization is a powerful semantic and political intervention”. How categories are sorted, and what belongs in each of them, implies that someone has decided how to implement them. All these decisions are powerful statements about who gets to determine how things are supposed to be. Categories say a lot about how a problem is approached, and there is always a particular interest and perspective behind them.
Contrary to the good old-fashioned belief in the autonomy of AI, many steps in the design of neural networks still depend on human intervention. Even though we would like to assume that most of the systems already operating among us were trained on well-balanced datasets, this assumption is not necessarily correct. Indeed, there are many examples of systems in which racial, gender, or class bias was reproduced and amplified by the neural network. For example, the HP facial recognition system, trained on a database of white people’s faces, failed to recognize black people. This problem is related to what is known as “overfitting”: given many examples of the same kind, a neural network tends to learn too much about them and ends up focusing on a very specific pattern.
Related to overfitting is another phenomenon called “apophenia”, of which Google DeepDream is a great example. Apophenia shows, as Hito Steyerl mentions, that “pattern recognition also exists where there is no pattern but a form is detected nevertheless.” Apophenia is like perceiving patterns in a noisy background. Overfitting and apophenia expose one of the major pitfalls of training datasets: the limits of how categories are constructed in neural networks. Because a training dataset reflects only a small sample of the real world, we should question how that sample relates to the world it stands for. The goal of a neural network is to generalize to unseen data, yet this generalization depends on the “homogeneity between training and test dataset.” Working towards a balanced training set should therefore be one of the goals when designing an algorithmic system, so that the risks arising from overfitting and apophenia are reduced.
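One very narrow way to act on this is to count how often each label appears before training. The sketch below assumes that "balanced" means roughly equal label counts, which is only a crude proxy for the broader question of who and what a dataset represents. The function names and the `tolerance` threshold are hypothetical choices made for illustration.

```python
from collections import Counter

def label_shares(labels):
    """Return the fraction of the dataset that each label accounts for."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}

def underrepresented(labels, tolerance=0.5):
    """List labels whose share falls below `tolerance` times an even split.

    The threshold is an arbitrary assumption, not an established standard.
    """
    shares = label_shares(labels)
    if not shares:
        return []
    even_share = 1 / len(shares)
    return sorted(
        label for label, share in shares.items()
        if share < tolerance * even_share
    )

# Hypothetical dataset: 90 cat images and only 10 car images.
skewed_labels = ["cat"] * 90 + ["car"] * 10
print(label_shares(skewed_labels))
print(underrepresented(skewed_labels))
```

A check like this can only flag imbalance among the categories someone has already chosen. It says nothing about categories that were never defined, or about whether the labels themselves are appropriate.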
In daily life, more and more processes rely on algorithms that automate different tasks. Many of these systems are deployed in the belief that they will make decisions in a more neutral manner than a human would. Nevertheless, algorithms “are built and embedded into the lived world, at the level of institutional practice, individual behavior, and human experience”. We train and model algorithmic systems according to our own views of the world and with a clear outcome in mind, and that outcome is inevitably shaped by social, cultural, and economic interests and agendas. As Beer points out, “algorithms should not be understood as an object that exists outside of those social processes”.