
12 Nov, 2024
1 min read

OpenAI’s progress on its latest language model, codenamed Orion, has reportedly hit a roadblock due to a shortage of training data.

Testers have found that while Orion shows improvements over previous models, the leap isn’t as significant as the jump from GPT-3 to GPT-4.

According to The Information, cited by TechCrunch, Orion has completed only 20% of its training phase but is already showing performance close to GPT-4 in several areas. Its coding capabilities, however, still lag behind those of earlier models. OpenAI attributes the slower pace of development to the limited availability of high-quality data for further training.

To address this challenge, OpenAI has assembled a team to explore alternative approaches, including the use of synthetic data generated by other AI models. This method could help alleviate the shortage of high-quality text for training.

The Verge reports that OpenAI plans to release Orion in December 2024, though unlike GPT-4o and o1, it will not initially be available through a ChatGPT subscription. Instead, Orion will first be accessible to partner companies.

In August 2024, The Information revealed that Orion's training includes data generated by another model, o1 (also known as Strawberry), partially offsetting the data shortage and helping advance Orion’s development.