Roundup: IBM's long-form explainer on artificial intelligence, machine learning, and cognitive computing

The development of artificial intelligence has gone through several ups and downs, and has recently been driven to an unprecedented new peak by the wave of deep learning technology. IBM recently published an overview article on its official website that briefly traces the development of artificial intelligence and uses illustrations to introduce the concepts and principles behind the perceptron, clustering algorithms, rule-based systems, machine learning, deep learning, and neural networks. This piece was compiled by Machine Heart; the link to the original is at the end of the text.

Humans have never stopped exploring how to create intelligent machines. Over the ups and downs of artificial intelligence's development there have been successes, failures, and hidden potential. Today, news reports about applications of machine learning algorithms are everywhere, from cancer detection to image understanding and natural language processing; artificial intelligence is empowering and changing the world.

The history of modern artificial intelligence has all the elements of a great drama. In the early 1950s, the field focused on thinking machines and on pioneering figures such as Alan Turing and John von Neumann, ushering in its first spring. After decades of boom and bust, and of unreasonably high expectations, artificial intelligence and its pioneers have once again reached a new frontier. Now artificial intelligence is showing its real potential: new technologies such as deep learning and cognitive computing keep emerging, and there is no shortage of areas where they can be applied.

This article explores several important aspects of artificial intelligence and its subfields. It begins with a timeline of the field's development and then analyzes each of its elements in turn.

Modern artificial intelligence timeline

In the early 1950s, artificial intelligence focused on so-called strong artificial intelligence, hoping that a machine could accomplish any intellectual task a human can. The development of strong artificial intelligence led to the emergence of weak artificial intelligence, that is, applying artificial intelligence techniques to narrower problems. Before the 1980s, artificial intelligence research was divided between these two paradigms, with the two camps standing opposite each other. Around 1980, however, machine learning began to become mainstream; its purpose is to give computers the ability to learn and to build models, so that they can make predictions and perform other behaviors in specific domains.

Figure 1: Timeline for the development of modern artificial intelligence

On the foundations of artificial intelligence and machine learning, deep learning came into being around 2000. Computer scientists used new topologies and learning methods in multi-layer neural networks. This evolution of neural networks ultimately solved thorny problems in many fields.

In the past decade, cognitive computing has also emerged, with the goal of creating systems that can learn and interact naturally with humans. By defeating world-class players at Jeopardy!, IBM Watson demonstrated the value of cognitive computing.

In this article, I will explore each of the areas above and explain some of their key algorithms.

Basic artificial intelligence

Research before 1950 proposed that the brain is made up of networks of electrical pulses, and that the interaction of those pulses produces human thought and consciousness. Alan Turing showed that any computation can be carried out digitally, which meant that building a machine capable of simulating the human brain was not out of reach.

As mentioned above, much of this early research aimed at strong artificial intelligence, but it also put forward basic concepts that machine learning and deep learning build on to this day.

Figure 2: Timeline of artificial intelligence methods between 1950 and 1980

Search in artificial intelligence

Many problems in artificial intelligence can be solved by brute-force search. However, considering the size of the search space for even moderately difficult problems, basic search quickly suffers. One of the earliest examples of artificial intelligence search is a checkers program. Arthur Samuel built the first checkers program on the IBM 701 Electronic Data Processing Machine, adding an optimization of the search tree called alpha-beta pruning. His program also recorded the reward for specific moves, allowing it to learn from every game it played (making it the first self-learning program). To increase the rate at which the program learned, Samuel programmed it to play against itself, improving its ability to play and to learn.

Although search can be successfully applied to many simple problems, the method quickly fails as the number of choices grows. Take the simple game of tic-tac-toe as an example: at the start of the game there are nine possible first moves, each with eight possible replies, and so on. The complete game tree contains 362,880 nodes. If you extend this idea to chess or Go, you quickly see the downside of search.
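To make this concrete, the sketch below (my own illustration in Python, not code from the IBM article) runs a game-tree search with alpha-beta pruning over tic-tac-toe; the board encoding and scoring convention are assumptions chosen for brevity. With perfect play by both sides, the search returns a draw.

```python
# A minimal sketch of game-tree search with alpha-beta pruning for tic-tac-toe.
# The board is a list of 9 cells holding 'X', 'O', or None.

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
             (0, 4, 8), (2, 4, 6)]              # diagonals

def winner(board):
    for a, b, c in WIN_LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None

def alphabeta(board, player, alpha=-2, beta=2):
    """Return the best achievable score for 'X' (+1 win, 0 draw, -1 loss)."""
    w = winner(board)
    if w is not None:
        return 1 if w == 'X' else -1
    if all(cell is not None for cell in board):
        return 0  # draw
    for i in range(9):
        if board[i] is None:
            board[i] = player
            score = alphabeta(board, 'O' if player == 'X' else 'X', alpha, beta)
            board[i] = None
            if player == 'X':
                alpha = max(alpha, score)
            else:
                beta = min(beta, score)
            if alpha >= beta:
                break  # prune: the opponent will never allow this branch
    return alpha if player == 'X' else beta

# From an empty board, perfect play by both sides ends in a draw (score 0).
print(alphabeta([None] * 9, 'X'))
```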

Perceptron

The perceptron is an early supervised learning algorithm for single-layer neural networks. Given an input feature vector, the perceptron can classify the input. Using a training set, the network's weights and bias can be updated to perform linear classification. The perceptron was first implemented on the IBM 704 and was later used for image recognition on custom hardware.

Figure 3: Perceptron and Linear Classification

As a linear classifier, the perceptron is capable of solving linearly separable problems. The classic example of the perceptron's limitations is its inability to learn the exclusive OR (XOR) function. Multi-layer perceptrons solved this problem and laid the foundation for more complex algorithms, network topologies, and deep learning.
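As an illustration, here is a minimal sketch of the perceptron learning rule in NumPy. The AND function, learning rate, and epoch count are my own assumptions; AND is chosen because, unlike XOR, it is linearly separable.

```python
# A minimal sketch of the perceptron learning rule: learning the linearly
# separable AND function with a step activation.
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])   # inputs
y = np.array([0, 0, 0, 1])                        # AND labels

w = np.zeros(2)        # weights
b = 0.0                # bias
lr = 0.1               # learning rate

for epoch in range(20):
    for xi, target in zip(X, y):
        prediction = 1 if np.dot(w, xi) + b > 0 else 0   # step activation
        error = target - prediction
        w += lr * error * xi       # perceptron update rule
        b += lr * error

print(w, b)   # a separating line, roughly w = [0.2, 0.1], b = -0.2
```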

Clustering Algorithm

The perceptron approach is supervised: the user provides data to train the network, then tests the network on new data. Clustering algorithms are unsupervised learning methods, in which the algorithm organizes a set of feature vectors into clusters based on one or more attributes of the data.

Figure 4: Clustering in a two-dimensional feature space

The simplest clustering algorithm, which you can implement in a small amount of code, is k-means, where k represents the number of clusters into which you assign samples. A cluster can be initialized with a randomly chosen feature vector, after which every other sample is added to its nearest cluster (assuming each sample is a feature vector and Euclidean distance is used to measure 'distance'). As samples are added to a cluster, its centroid, the center of the cluster, is recalculated. The algorithm then checks the samples again to ensure each is in its nearest cluster, and it stops when no sample needs to move to a different cluster.

Although k-means clustering is relatively effective, you must choose k in advance. Depending on the data, other methods may work better, such as hierarchical clustering or distribution-based clustering.
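The following minimal k-means sketch in NumPy follows the steps described above; the toy two-dimensional data and the choice of k = 2 are my own assumptions for illustration (the empty-cluster corner case is ignored).

```python
# A minimal sketch of k-means clustering with NumPy.
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]   # random init
    for _ in range(iters):
        # Assign each sample to the nearest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid as the mean of its assigned samples.
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break   # assignments have stabilized
        centroids = new_centroids
    return labels, centroids

# Two obvious blobs in a two-dimensional feature space.
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [5.0, 5.0], [5.1, 4.9], [4.8, 5.2]])
labels, centroids = kmeans(X, k=2)
print(labels)      # e.g. [0 0 0 1 1 1] (cluster ids may be swapped)
print(centroids)
```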

Decision tree

Decision trees are closely related to clustering. A decision tree is a predictive model about observations that leads to some conclusion. Conclusions are represented in the tree as leaves, while nodes are decision points at which an observation branches. Decision trees are built with decision-tree learning algorithms, in which the data set is split into subsets according to attribute value tests, a process known as recursive partitioning.

Consider the example in the figure below. In this data set, I can determine whether someone is productive based on three factors. Using a decision-tree learning algorithm, I can identify the most important attribute with a metric (one such metric is information gain). In this example, mood is the dominant factor in productivity, so I split the data set according to whether Good Mood is Yes or No. On the Yes side, however, I need to split the data again using the other two attributes. The colors in the table correspond to the leaves of the same color in the tree on the right.

Figure 5: A simple data set and its decision tree

One important property of decision trees is their inherent organization, which lets you easily (and graphically) explain how an item was classified. Popular decision-tree learning algorithms include C4.5 and CART (Classification and Regression Trees).
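To show what an information-gain metric computes, here is a small Python sketch on a toy data set of my own that loosely mirrors the mood-and-productivity example above (it is not the exact table from the figure).

```python
# A minimal sketch of the information-gain metric used by decision-tree
# learners such as C4.5.
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(rows, labels, attribute_index):
    """Entropy reduction from splitting the data on one attribute."""
    base = entropy(labels)
    remainder = 0.0
    for value in set(row[attribute_index] for row in rows):
        subset = [lab for row, lab in zip(rows, labels)
                  if row[attribute_index] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return base - remainder

# Toy data: (good_mood, deadline_near), label = productive?
rows = [("yes", "no"), ("yes", "yes"), ("no", "no"), ("no", "yes")]
labels = ["yes", "yes", "no", "no"]
print(information_gain(rows, labels, 0))   # splitting on mood: 1.0 bit
print(information_gain(rows, labels, 1))   # splitting on deadline: 0.0 bits
```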

Rule-based system

The earliest rule-based reasoning system was Dendral, developed in 1965, but so-called expert systems did not begin to flourish until the 1970s. A rule-based system stores both knowledge and rules and uses an inference system to reason over them and reach conclusions.

Rule-based systems usually consist of a rule set, a knowledge base, an inference engine (using forward or backward rule chaining), and a user interface. In the figure below, I use the piece of knowledge 'Socrates was a man', the rule 'if man, then mortal', and the query 'Who is mortal?'.

Figure 6: Rule-based system
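A minimal sketch of forward chaining over the Socrates example is given below; the encoding of facts and rules is my own and is not the syntax of any particular expert-system shell.

```python
# A minimal sketch of forward chaining over the Socrates example.
facts = {("man", "Socrates")}
# Each rule maps premise predicates to a conclusion predicate:
# "if X is a man, then X is mortal".
rules = [(("man",), "mortal")]

def forward_chain(facts, rules):
    """Repeatedly apply rules to known facts until nothing new is derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            for pred, subject in list(derived):
                if pred in premises and (conclusion, subject) not in derived:
                    derived.add((conclusion, subject))
                    changed = True
    return derived

all_facts = forward_chain(facts, rules)
# Answer the query "Who is mortal?"
print([s for p, s in all_facts if p == "mortal"])   # ['Socrates']
```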

Rule-based systems have been applied in areas such as speech recognition, planning and control, and disease identification. One system developed in the 1990s to monitor and diagnose dam stability, called Kaleidos, is still in operation today.

Machine learning

Machine learning is a subfield of artificial intelligence and computer science that also has roots in statistics and mathematical optimization. Machine learning covers supervised and unsupervised learning techniques that can be used for prediction, analytics, and data mining. Machine learning is not limited to deep learning, but in this section I introduce several of the algorithms that have made deep learning so effective.

Figure 7: Machine learning method timeline

Backpropagation

The power of neural networks stems from their multi-layer structure. Training a single-layer perceptron is straightforward, but the resulting network is not very powerful. The question becomes: how do we train networks with multiple layers? This is where backpropagation comes in.

Backpropagation is an algorithm for training multi-layer neural networks. It works in two phases. The first phase propagates the inputs through the entire network up to the final layer (a step called feedforward). In the second phase, the algorithm computes an error and then propagates that error backwards (adjusting the weights) from the last layer to the first.

Figure 8: Backpropagation diagram

During training, the intermediate layers of the network organize themselves to map the input space onto the output space. Through supervised learning, backpropagation identifies the error between the actual and desired output and then adjusts the weights accordingly (using a learning rate) to correct that error. Backpropagation remains an important part of neural network learning, and as computing resources become faster and cheaper, it continues to be applied to larger and denser networks.
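The toy example below (my own, not from the article) implements both phases in NumPy for a two-layer network learning XOR, the very function a single-layer perceptron cannot learn. Whether a given run converges depends on the random initialization.

```python
# A minimal sketch of backpropagation: a two-layer network trained on XOR.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)            # XOR labels

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)               # hidden layer
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)               # output layer
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr = 1.0

for step in range(5000):
    # Feedforward phase.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward phase: propagate the error and adjust the weights.
    d_out = (out - y) * out * (1 - out)            # output-layer error term
    d_h = (d_out @ W2.T) * h * (1 - h)             # hidden-layer error term
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0)

print(out.round(2).ravel())   # typically approaches [0, 1, 1, 0]
```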

Convolutional neural networks

A convolutional neural network (CNN) is a multi-layer neural network inspired by the animal visual cortex. The architecture is useful in many applications, including image processing. The first CNN was created by Yann LeCun and was primarily used for handwritten character recognition tasks such as reading postal codes.

The LeNet CNN consists of several layers that implement feature extraction followed by classification. The image is divided into receptive fields that feed into a convolutional layer, which extracts features from the input image. The next step is pooling, which reduces the dimensionality of the features extracted by the convolutional layer (by downsampling) while retaining the most important information (typically through max pooling). Another round of convolution and pooling is then performed, whose output feeds into a fully connected multi-layer perceptron. The final output of the network is a set of nodes that identify features of the image (in this case, one node per recognized digit). The network is trained with backpropagation.

Figure 9. LeNet convolutional neural network architecture

The combination of deep layers of processing, convolution, pooling, and a fully connected classification layer opened the door to many new applications of neural networks. Beyond image processing, convolutional neural networks have been successfully applied to video recognition, natural language processing, and other tasks. They have also been implemented efficiently on GPUs, which greatly improves their performance.
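For illustration, here is a minimal LeNet-style model sketched with PyTorch; the framework choice and the exact layer sizes are my own assumptions and differ in small details from the original LeNet-5.

```python
# A minimal LeNet-style CNN sketch in PyTorch (illustrative assumptions only).
import torch
import torch.nn as nn

class LeNetStyle(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),   # convolution: 1x32x32 -> 6x28x28
            nn.Tanh(),
            nn.MaxPool2d(2),                  # pooling: 6x28x28 -> 6x14x14
            nn.Conv2d(6, 16, kernel_size=5),  # convolution: 6x14x14 -> 16x10x10
            nn.Tanh(),
            nn.MaxPool2d(2),                  # pooling: 16x10x10 -> 16x5x5
        )
        self.classifier = nn.Sequential(      # fully connected MLP classifier
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120), nn.Tanh(),
            nn.Linear(120, 84), nn.Tanh(),
            nn.Linear(84, num_classes),       # one output node per digit
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# One grayscale 32x32 image -> 10 class scores.
model = LeNetStyle()
print(model(torch.randn(1, 1, 32, 32)).shape)   # torch.Size([1, 10])
```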

Long short-term memory (LSTM)

Recall the earlier discussion of backpropagation: the network being trained was feedforward. In that architecture, we send inputs into the network and propagate them forward through the hidden layers to the output layer. But many other topologies exist. Here I examine one that allows nodes to form directed cycles. These networks are called recurrent neural networks (RNNs), and they can feed information back to earlier layers or to subsequent nodes within the same layer. This property makes them well suited to time-series data.

In 1997, a special kind of recurrent network called long short-term memory (LSTM) was invented. The LSTM contains memory cells that can remember values for long or short periods of time.

Figure 10. Long short-term memory network and memory cell

A memory cell contains gates that control how information flows into or out of the cell. The input gate controls when new information can flow into the cell. The forget gate controls how long a piece of information is retained in the cell. Finally, the output gate controls when the information held in the cell is used in the output. The cell also contains weights that control each gate. The training algorithm, usually backpropagation through time (a variant of backpropagation), optimizes these weights based on the resulting error.
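The gate arithmetic can be made explicit with a small sketch of a single LSTM timestep in NumPy; the variable names, dimensions, and random parameters are my own illustrative choices.

```python
# A minimal sketch of one LSTM cell step, showing how the input, forget,
# and output gates update the cell state.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One timestep. W, U, b each hold parameters for the i, f, o, g blocks."""
    i = sigmoid(W["i"] @ x + U["i"] @ h_prev + b["i"])   # input gate
    f = sigmoid(W["f"] @ x + U["f"] @ h_prev + b["f"])   # forget gate
    o = sigmoid(W["o"] @ x + U["o"] @ h_prev + b["o"])   # output gate
    g = np.tanh(W["g"] @ x + U["g"] @ h_prev + b["g"])   # candidate values
    c = f * c_prev + i * g        # keep part of the old state, add new info
    h = o * np.tanh(c)            # expose part of the state as the output
    return h, c

# Random parameters for a cell with 3 inputs and 4 hidden units.
rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = {k: rng.normal(size=(n_hid, n_in)) for k in "ifog"}
U = {k: rng.normal(size=(n_hid, n_hid)) for k in "ifog"}
b = {k: np.zeros(n_hid) for k in "ifog"}

h, c = np.zeros(n_hid), np.zeros(n_hid)
for x in rng.normal(size=(5, n_in)):      # a toy sequence of 5 timesteps
    h, c = lstm_step(x, h, c, W, U, b)
print(h)
```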

LSTMs have been used in speech recognition, handwriting recognition, speech synthesis, image captioning, and other tasks. I will return to LSTMs below.

Deep learning

Deep learning is a relatively new set of methods that is fundamentally changing machine learning. Deep learning is not an algorithm per se, but rather a family of algorithms that implement deep networks, often with unsupervised learning. These networks are so deep that new methods of computation, such as GPUs and clusters of compute nodes, are needed to build them.

This article has already introduced two deep learning algorithms: convolutional neural networks and long short-term memory networks. These algorithms have been combined to accomplish some surprisingly intelligent tasks. As the figure below shows, CNNs and LSTMs have been used together to identify objects in images or videos and then describe them in natural language.

Figure 11. Image captioning with a convolutional neural network and long short-term memory

Deep learning algorithms have also been used for face recognition, can identify tuberculosis with 96 percent accuracy, and are applied to self-driving vehicles and other complex problems.

However, despite the many results deep learning has produced, problems remain to be solved. A recent application of deep learning to skin cancer detection found that the algorithm was more accurate than a board-certified dermatologist. But whereas the dermatologist can list the factors that led to a diagnosis, there is no way to know which factors a deep learning program used in its classification. This is known as deep learning's black-box problem.

Another application, known as Deep Patient, can successfully predict disease when given a patient's medical records. It has proven to be better than physicians at predicting disease, even schizophrenia, which is notoriously difficult to predict. So even though the models work well, no one can reach into these massive neural networks to find out why.

Cognitive computing

Artificial intelligence and machine learning are filled with examples of biological inspiration. And while early artificial intelligence focused on the ambitious goal of building machines that imitate the human brain, cognitive computing is now working toward that goal. Building on neural networks and deep learning, cognitive computing applies knowledge from cognitive science to construct systems that simulate human thought processes. Rather than focusing on a single technology, however, cognitive computing spans several disciplines, such as machine learning, natural language processing, vision, and human-computer interaction.

An example of cognitive computing is IBM Watson, which demonstrated state-of-the-art question-and-answer interactions on Jeopardy! and which IBM has since extended into a collection of web services. These services expose programming interfaces for building powerful virtual agents, with capabilities such as visual recognition, speech-to-text (speech recognition), text-to-speech (speech synthesis), language understanding and translation, and conversation engines.

Going further

This article covers only a small part of the history of artificial intelligence and of the latest neural network and deep learning methods. Although artificial intelligence and machine learning have experienced many ups and downs, new approaches such as deep learning and cognitive computing have significantly raised the bar in these disciplines. A fully conscious machine may still be out of reach, but systems that genuinely improve human life exist today. (Source: IBM; Compiled by: Machine Heart; Contributors: Wu Pan, Huang Xiaotian, Nurhachu Null; Editor: China E-Commerce Research Center)
