10:29 pm - May 9, 2025

One of OpenAI’s co-founders described data as AI’s “fossil fuel” and warned that the industry’s supply of training data is running out.

The artificial intelligence industry is facing a significant data shortage that could alter its trajectory, according to OpenAI co-founder Ilya Sutskever. Speaking at the recent Conference on Neural Information Processing Systems (NeurIPS) in Vancouver, Sutskever emphasised how critical data is to AI development, likening it to “fossil fuel”.

“We’ve achieved peak data and there will be no more,” he said, according to the Observer.

This forecast coincides with growing restrictions on data access, documented in a study by the Data Provenance Initiative. The study found that website owners are increasingly blocking AI companies from accessing high-quality data sources, with roughly a 25% decline in accessible data between 2023 and 2024. As a result, AI developers may soon find it harder to acquire the diverse datasets needed to train sophisticated AI models.

In response to these challenges, some industry leaders are pivoting towards alternative solutions. OpenAI’s CEO, Sam Altman, has suggested the utilisation of synthetic data — information generated by AI models themselves — as a potential way forward. Furthermore, OpenAI is seeking to enhance its reasoning capabilities through the development of its new o1 model, which aims to provide AI systems with improved cognitive functionalities.

These developments come at a time when critiques of current AI capabilities are becoming more prevalent. Marc Andreessen, co-founder of venture capital firm Andreessen Horowitz, has noted that several companies appear to have reached similar technological ceilings, leading to a perceived plateau in AI advancement.

Despite these challenges, Sutskever, who left OpenAI in 2024 to establish Safe Superintelligence with backing from investors including Andreessen Horowitz and Sequoia Capital, expressed optimism about the future of AI. He believes that upcoming AI systems will learn to interpret information reliably from limited data, although he gave no specifics on the timeline or methodology for this transformation.

The pressing issue of data scarcity has spurred companies such as OpenAI, Meta, Nvidia and Microsoft to engage in data-scraping practices. This approach, while providing a solution to the current data drought, raises ethical and legal questions. Microsoft, for instance, faced criticism for its use of user data from LinkedIn to train its AI models, prompting an update in its terms of service.

Similarly, Meta’s use of publicly available social media posts from European users to train its Llama large language models is under scrutiny, as privacy concerns have led to multiple legal challenges. Nvidia has also faced backlash for scraping content from platforms like YouTube and Netflix, specifically videos from well-known tech YouTuber Marques Brownlee. Despite these companies asserting compliance with copyright laws, the ethical ramifications of utilising data without explicit user consent are increasingly being questioned across the industry.

As AI development continues to evolve amid these challenges, industry professionals are watching closely to see how technological innovation and regulatory frameworks will reshape the way data is collected and used in artificial intelligence. The gap between the legal treatment of AI training data and its ethical implications remains a central point of debate as stakeholders navigate this complex environment.

Source: Noah Wire Services

© 2025 Tomorrow’s Publisher. All Rights Reserved.