This content is produced by Laval University.
ChatGPT was launched at the end of November 2022, and quickly wowed the world with its amazing performance. The text-generating application was able to fool many readers, even among the most attentive ones, who were unable to distinguish texts produced by artificial intelligence (AI) from those written by humans. But how did what many thought impossible yesterday turn into reality so quickly?
“The explanation for this rapid rise of AI and ChatGPT can be seen as a triangle whose three vertices are equally important. First, the computing power of computers has increased dramatically. Second, the amount of high-quality data available to train neural networks has grown. Third, there have been many innovations in neural network engineering,” explains Nicolas Doyon.
Invited by the Faculty of Science and Engineering's continuing education office to give a public lecture on the topic, this professor from the Department of Mathematics and Statistics and researcher at the CERVO Research Centre discussed some landmarks in the history of artificial intelligence and explained, in accessible terms, some of the scientific and mathematical principles behind the famous application's success.
Champion chess machine
One of the most celebrated achievements of artificial intelligence dates back to 1997, when the computer Deep Blue beat world chess champion Garry Kasparov. Deep Blue was programmed to build a tree of possibilities, assign a value to the final positions on the tree's different branches, and then determine the best possible move.
This approach, which worked well for chess, was less suited to go, whose board is a 19 x 19 grid, offering far more possible moves than the 8 x 8 chessboard. Even for a computer, the tree of possibilities became too large. “For this reason, the researchers said to themselves: ‘This is not at all how we think. How can we draw inspiration from the workings of the human brain and its neurons to improve artificial intelligence?’” says Nicolas Doyon.
Imitation of neurons
By studying how human neurons work, researchers discovered that they do not react to every message they receive. A message must reach a minimum threshold for the neuron to emit so-called action potentials, which always have the same strength and the same shape, whatever the intensity of the initial message. These action potentials are transmitted to the next neuron through a synapse. It is an all-or-nothing law.
Synapses, however, do more than relay information from one neuron to the next; their plasticity plays a central role in learning. Researchers have observed that the strength of the connections between neurons changes over time. “Simply put, the more a synapse is used, that is, the more it propagates the action potential to the next neuron, the stronger it becomes. Under the microscope, we can clearly see that when a person learns, the dendritic spine, a region of the neuron, becomes larger. In short, as synapses grow larger and stronger, they gradually adjust the way we think.”
How can these biological facts be represented mathematically? Nicolas Doyon answers: “One way to translate the all-or-nothing law into mathematics is to use the Heaviside function.” Often in mathematics, functions go from 0 to 1 continuously. “A Heaviside function, on the other hand, is a function that has a value of 0 until the input to the function reaches a certain threshold. Then it suddenly goes to 1,” he explains.
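The all-or-nothing law described above can be written in a few lines. This is a minimal sketch of the Heaviside function; the threshold value of 0 is an arbitrary choice for illustration.

```python
def heaviside(x: float, threshold: float = 0.0) -> int:
    """All-or-nothing law: output jumps from 0 to 1 once the input reaches the threshold."""
    return 1 if x >= threshold else 0

# Below the threshold the output stays at 0; at or above it, it suddenly becomes 1.
print(heaviside(-2.5))  # 0
print(heaviside(0.0))   # 1
print(heaviside(3.7))   # 1
```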
“To represent the role of synapses, we assign weights to the neuron's different inputs,” he adds. On the graph, we can see that once the numerical values of the inputs are determined, we multiply each value by its synaptic weight, add the results of these multiplications to obtain a weighted sum and, finally, check whether this value reaches the threshold, which produces a 0 or a 1.
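The recipe in the quote, multiply each input by its weight, sum, then apply the threshold, is exactly an artificial neuron. Here is a minimal sketch; the input values, weights, and threshold are made up for illustration.

```python
def artificial_neuron(inputs, weights, threshold):
    """Weighted sum of the inputs, followed by the all-or-nothing threshold."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum >= threshold else 0

# Two inputs with hypothetical synaptic weights 0.8 and 0.3, threshold 1.0.
print(artificial_neuron([1.0, 0.5], [0.8, 0.3], threshold=1.0))  # 0 (sum 0.95 < 1.0)
print(artificial_neuron([1.0, 1.0], [0.8, 0.3], threshold=1.0))  # 1 (sum 1.10 >= 1.0)
```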
In recent years, artificial intelligence has made major breakthroughs thanks to the development of deep learning. “We now work with neural networks made up of several layers: the input layer, the intermediate layers and the output layer. Between a neuron in one layer and a neuron in the next there is a connection strength, also called a synaptic weight, and as the network learns, each of these weights is modified,” notes Nicolas Doyon.
How does the network learn? Through training, the researcher points out. Consider a neural network asked to determine whether an image shows a cat or a dog. We assign the value 0 to cat and 1 to dog. To train the network, we use thousands, even millions, of images of these animals and examine the percentage of images that are correctly classified. If the network does not give the right answer, it is because the synaptic weights are not yet well tuned. We therefore adjust these weights until we obtain a very high success rate.
But how are the weights adjusted? “Among other things, we use the method of gradient descent. To illustrate it, we can imagine a person trying to get down a mountainside as quickly as possible. This is easy to visualize when there are only two inputs. On the x-axis we plot the weight multiplying the first input, on the y-axis the weight multiplying the second input, and on the z-axis the error. We can then visualize the point where the error is lowest and adjust the weights to move in that direction,” explains Professor Doyon, who adds that the principle, though always the same, is much harder to visualize in practice when the number of parameters to adjust runs into the millions, or even billions.
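The mountainside picture can be turned into code. This sketch runs gradient descent on two weights; because the all-or-nothing threshold is not differentiable, it uses a smooth squared error on a linear neuron instead, and the input values, target, learning rate, and step count are all arbitrary choices.

```python
# One made-up training example: two inputs and the desired output.
x1, x2, target = 1.0, 2.0, 1.0

def error(w1, w2):
    """Squared error of the neuron's prediction: the 'altitude' on the error surface."""
    prediction = w1 * x1 + w2 * x2
    return (prediction - target) ** 2

def gradient(w1, w2):
    """Partial derivatives of the error with respect to each weight."""
    diff = 2 * (w1 * x1 + w2 * x2 - target)
    return diff * x1, diff * x2

w1, w2 = 0.0, 0.0
learning_rate = 0.05
for step in range(100):
    g1, g2 = gradient(w1, w2)
    w1 -= learning_rate * g1   # small step "downhill" along the first weight
    w2 -= learning_rate * g2   # and along the second

print(round(error(w1, w2), 6))  # essentially 0 after descending the error surface
```

Each iteration moves the pair of weights a little way down the steepest slope; with millions of weights the same update rule applies, it just can no longer be drawn.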
Math and reading are at the heart of ChatGPT
Of course, the exact numbers have not been revealed publicly, but we can estimate that ChatGPT has a network of 60 to 80 billion neurons, 96 layers and 175 billion modifiable weights. For comparison, there are approximately 85 billion neurons in the human brain. “The comparison is still a bit tenuous, because our neurons are not exactly the same as artificial neurons, but the orders of magnitude are roughly comparable,” concedes Nicolas Doyon.
When the application is asked to identify itself, it answers: “ChatGPT uses a deep neural network architecture.” It is important to note that ChatGPT has no deep understanding or self-awareness. Its answers are based solely on the statistical probabilities of words or phrases. Thus, to generate text, ChatGPT takes a sequence of words, calculates the odds that it will be followed by various other sequences, and then suggests the most likely one.
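The idea of "the most likely next words" can be shown with a drastically simplified stand-in: a bigram model that counts, in a tiny made-up corpus, which word most often follows the current one. Real systems like ChatGPT work on billions of words with far richer context, but the statistical principle is the same.

```python
from collections import Counter, defaultdict

# Tiny invented corpus standing in for a training set of billions of words.
corpus = "the cat sat on the mat the cat ate the fish".split()

# For each word, count which word follows it.
following = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word][next_word] += 1

def most_likely_next(word):
    """Suggest the statistically most frequent continuation."""
    return following[word].most_common(1)[0][0]

print(most_likely_next("the"))  # 'cat', because "cat" follows "the" most often here
```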
To achieve this, ChatGPT had to be trained on billions of data points. The exact contents of this reading material are, of course, a trade secret. However, it can be assumed that the network was trained on more than 300 billion words. “At 300 words per page and one page per minute, 24 hours a day, you would have to read for 1,900 years to absorb that amount of information,” the mathematician explains, to give an idea of the size of the library underlying ChatGPT's learning.
“If you read 300 words per page, one page per minute, 24 hours a day, you would have to read for 1,900 years to absorb that amount of information.”
— Nicolas Doyon, on the supposed 300 billion words that make up the ChatGPT training database
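The 1,900-year figure is easy to verify as back-of-the-envelope arithmetic, using the assumptions stated in the quote:

```python
words = 300_000_000_000    # assumed size of the training corpus
words_per_page = 300
pages_per_minute = 1

minutes = words / words_per_page / pages_per_minute   # 1 billion minutes of reading
years = minutes / (60 * 24 * 365)                     # non-stop, 24 hours a day
print(round(years))  # about 1,900 years
```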
Between wonder and fear
ChatGPT's incredible performance sometimes fires the imagination, conjuring a science-fiction future in which artificial intelligence takes over the world. That scenario, however, is not what worries the scientists who call for better regulation of AI development. Their goal, rather, is to prevent abuses in the ways humans might use these tools, and to take the time to better understand and analyze the technology's negative ramifications.
“What could go wrong? Obviously, students can use ChatGPT to cheat. People can lose their jobs. Recently, striking writers in Hollywood demanded to limit the use of artificial intelligence in screenwriting,” recalls Nicolas Doyon.
Other problems, the professor notes, are less obvious and more insidious. “For example, in facial recognition, AI recognizes white men more easily than women or people from visible minorities,” he says. This is somewhat surprising, since we imagine AI as neutral, incapable of being sexist or racist. But because the AI was likely trained on a database containing more male and white faces, it inherited our biases.
Another example the professor gives comes from DeepL, a translation application based on the same principles as ChatGPT. “If we ask DeepL to translate the phrase ‘she reads’ into Hungarian, it gives us ‘ő olvassa.’ If we then ask it to translate those same Hungarian words back into French, it answers ‘il lit’ (‘he reads’),” he says. Why? Because the Hungarian pronoun ‘ő’ is gender-neutral, and in the database a male subject is statistically more likely to appear in front of the verb “to read.”
Nor should the often-hidden environmental cost be taken lightly. “People think AI is virtual and has no impact on the environment. Yet, according to one article, ChatGPT ‘drinks’ 500 ml of water every time you talk to it. That image was used to remind us that cooling supercomputers requires an enormous amount of water. Beyond this resource, ChatGPT also consumes a great deal of energy. Some say AI will soon use as much electricity as an entire country.”
So what is the future of AI and ChatGPT? “I don't know,” Professor Doyon answers humbly. “Are there things that ChatGPT will never be able to do? I have no answer. Every month we hear that ChatGPT has done something new. It is impossible to know where it will all end,” the mathematician concludes.