The way to ChatGPT: from the first neural networks of the 1980s to the polished language models of today

MIT Technology Review has published a retrospective on the development of large language models.

1980s–’90s: Recurrent Neural Networks

The wildly popular ChatGPT grew out of GPT-3, a large language model also created by OpenAI. GPT-3 is a type of neural network that has been trained on vast amounts of text.

Because text is a sequence of letters and words, a language model needs a neural network that can handle sequences. The design of neural networks is loosely inspired by the way neurons in the brain transmit signals to one another.

Recurrent neural networks, invented in the 1980s, could handle sequences of words, but they were slow to train and tended to forget earlier words in a sequence.
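To make the idea concrete, here is a minimal sketch of a single-unit recurrent network in Python. The weights and the one-number-per-word encoding are made-up assumptions for illustration, not values from any real model. At each step the network mixes the new input with its previous hidden state, so the final state depends on word order, and with these weights the influence of early inputs shrinks step by step, which is one intuition for why simple RNNs "forget".

```python
import math

# Hypothetical toy weights, chosen only for illustration.
W_IN = 0.8   # weight applied to the current input
W_REC = 0.5  # weight applied to the previous hidden state

def rnn_run(inputs, w_in=W_IN, w_rec=W_REC):
    """Run a single-unit simple recurrent network over a sequence of numbers.

    Each step computes h_t = tanh(w_in * x_t + w_rec * h_{t-1}),
    so the final state depends on the order of the inputs.
    """
    h = 0.0
    for x in inputs:
        h = math.tanh(w_in * x + w_rec * h)
    return h

# Same "words" (encoded as numbers) in a different order give a different final state.
print(rnn_run([1.0, 0.0, -1.0]))
print(rnn_run([-1.0, 0.0, 1.0]))

# With these weights, the first input's influence fades as more steps follow it.
for gap in (0, 5, 20):
    print(gap, rnn_run([1.0] + [0.0] * gap))
```

Real RNNs use vectors and learned weight matrices rather than single numbers, but the recurrence has the same shape.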

In 1997, computer scientists Sepp Hochreiter and Jürgen Schmidhuber addressed this by inventing Long Short-Term Memory (LSTM) networks, which could handle much longer strings of text and retain earlier parts of a sequence for longer. However, their language skills were still limited.

2017: Transformers

A team of Google researchers made a major leap by inventing the transformer, a new and simpler network architecture based solely on attention mechanisms. Transformers could handle much longer strings of text and capture the meaning of words more precisely.
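The central operation in the original transformer design is scaled dot-product attention, softmax(QKᵀ / √d_k)V: each word's query is compared with every word's key, and the resulting weights blend the value vectors. The sketch below is a plain-Python illustration under simplifying assumptions: the three 2-dimensional "word" vectors are hypothetical, and the learned projection matrices that real transformers use to produce queries, keys and values are omitted (the word vectors are used directly).

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.

    queries, keys and values are lists of vectors (lists of floats);
    keys and values must have the same number of rows.
    Returns one output vector per query.
    """
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
    return outputs

# Toy self-attention: queries, keys and values all come from the same sequence.
words = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for row in attention(words, words, words):
    print([round(x, 3) for x in row])
```

Because every word attends to every other word in one step, nothing has to be carried through a long chain of recurrent states, which is part of why transformers cope better with long text.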

2018–2019: GPT and GPT-2

In these two years OpenAI debuted its first two large language models, GPT (Generative Pre-trained Transformer) and GPT-2. OpenAI sees large language models as a key step toward multi-skilled, general-purpose AI. GPT's main innovation was unsupervised learning: previously, models had mostly been trained with supervised learning on annotated data.

Because text no longer had to be labeled by hand, the unsupervised approach sped up training and made it possible to use much larger data sets.
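One way to see why no annotation is needed: GPT-style pre-training asks the model to predict the next word, so every stretch of ordinary text supplies its own "labels". The sketch below shows only that data-preparation idea. The function name `next_word_pairs`, the `context_size` parameter and the whitespace splitting are hypothetical simplifications; real models use subword tokenizers and far longer contexts.

```python
def next_word_pairs(text, context_size=3):
    """Turn raw, unlabeled text into (context, next word) training pairs.

    The target for each example is simply the word that follows the context,
    so no human annotation is required. Splitting on whitespace is a
    simplifying assumption for illustration.
    """
    words = text.split()
    pairs = []
    for i in range(1, len(words)):
        # Use up to `context_size` preceding words as the context.
        context = words[max(0, i - context_size):i]
        pairs.append((context, words[i]))
    return pairs

for context, target in next_word_pairs("the cat sat on the mat"):
    print(context, "->", target)
```

Any text scraped from books or the web can be fed through a step like this, which is what let training sets grow so large.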

2020: GPT-3

OpenAI’s new language generator GPT-3 turned out to be shockingly good. It raised the bar even higher by generating convincingly human-like text.

GPT-3 could answer questions, summarize documents, generate stories in different styles, translate between English, French, Spanish, and Japanese, and more. However, it also indiscriminately absorbed a great deal of the disinformation found on the internet. As OpenAI acknowledged:

“Internet-trained models have internet-scale biases.”

December 2020: Toxic text and other problems

As large language models grew more capable, ethical and other problems became more apparent. Yet most tech companies kept pushing to improve the models' capabilities rather than rein them in.

January 2022: InstructGPT

OpenAI released InstructGPT, an improved version of GPT-3 that generated text containing less misinformation and less offensive content.

May–July 2022: OPT, BLOOM

Tech giants dominate this research, largely because training large language models is expensive and smaller organizations can rarely afford it. In response, a handful of collaborative projects have developed large language models and released them for free to any researcher who wants to study and improve the technology. Meta unveiled OPT, and Hugging Face led a collaboration of researchers that released BLOOM.

December 2022: ChatGPT

Most recently came ChatGPT, OpenAI’s latest refinement of GPT-3. ChatGPT was trained using reinforcement learning on feedback from human testers, who scored its performance as a fluid, accurate, and inoffensive conversational partner.

And the rest is history.