TL;DR
Recent evidence indicates that AI systems can increasingly automate core engineering tasks in AI development, with several key benchmarks near saturation. Research activities, especially innovative and theoretical work, remain largely human-driven. For now, AI’s contribution to AI development is concentrated in engineering rather than research, with implications for the future of AI innovation.
Recent benchmark results and expert analysis confirm that AI systems are now capable of automating the majority of core engineering tasks in artificial intelligence research and development, while research activities involving novel ideas and theoretical work remain primarily human-driven. This marks a significant shift in AI capabilities and could reshape how AI innovation is conducted.
Six key benchmarks measuring AI proficiency in core R&D skills, such as research reproduction, Kaggle competition performance, and kernel design, show rapid progress, with many nearing or reaching saturation. For example, CORE-Bench, which tests AI’s ability to reproduce research, improved from 21.5% in September 2024 to 95.5% in December 2025, with the benchmark’s author declaring it ‘solved.’ Similarly, MLE-Bench, which assesses AI performance in Kaggle competitions, rose from 16.9% to 64.4% over sixteen months, approaching the level of mid-tier human practitioners.
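The two score trajectories quoted above can be put on a common footing with a little arithmetic. The sketch below uses only the four percentages given in the text; the dictionary layout and the `summarize` helper are illustrative names, not anything from Clark's newsletter or the benchmarks themselves.

```python
# Start and end scores (percent) as quoted in the article.
# The structure and names here are illustrative assumptions.
SCORES = {
    "CORE-Bench": (21.5, 95.5),
    "MLE-Bench": (16.9, 64.4),
}


def summarize(start: float, end: float) -> tuple[float, float, float]:
    """Return (percentage-point gain, fold change, headroom left to 100%)."""
    return end - start, end / start, 100.0 - end


for name, (start, end) in SCORES.items():
    gain, fold, headroom = summarize(start, end)
    print(f"{name}: +{gain:.1f} pts, {fold:.1f}x, {headroom:.1f} pts of headroom")
```

Fold change flatters benchmarks that start from a low baseline, so the remaining headroom is the more direct indicator of saturation: a benchmark at 95.5% has little left to measure, while one at 64.4% still does.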
Experts such as Thorsten Meyer interpret these trends as evidence that AI now handles most engineering tasks involved in AI development, including reproducing experiments and optimizing hardware kernels. However, the same benchmarks reveal that research involving creative, theoretical, or highly novel tasks remains less automated. Jack Clark’s analysis in Import AI #455 suggests that while engineering can now be largely automated, research may be inherently more complex and less amenable to automation, at least for now.
Engineering is automated.
Research is the residual.
Six skill benchmarks. Edison’s framing. The question Clark leaves open is whether research is just engineering at scale.
Jack Clark’s Import AI #455 catalogs six benchmarks measuring AI capability on AI R&D tasks and concludes “AI can today automate vast swatches, perhaps the entirety, of AI engineering.” The residual question is research. The structural read on the residual: it may not be a permanent moat.
Six skills. One trajectory.
Clark catalogs six benchmarks measuring AI capability on AI R&D-relevant tasks. Each individual benchmark could be noise. Six benchmarks moving together is a curve. The pattern is the cascade observed across the broader Clark series — visible here in the specific R&D-skill domain.
As an affiliate, we earn on qualifying purchases.
Three data points. Mixed signal.
Clark provides three data points on the creative-spark question. Yes-evidence: Erdős-1051, centaur math discovery, sporadic Move-37-style moments. No-evidence: low yield, framing dependence, absence of acceleration. The mixed signal is the honest read.
The data supports two readings. Pessimistic: rare moments suggest creative insight is qualitatively distinct from engineering work. Optimistic: rare moments are an artifact of low-volume exploration; more shots on goal yields more discoveries. Both readings are consistent with Clark’s “vast swatches, perhaps the entirety” claim. They differ on the residual.
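The difference between the two readings can be illustrated with a toy model. Assume, purely for illustration, that each research attempt succeeds independently with a small fixed probability; the probability value, the attempt counts, and the function names below are hypothetical and do not come from Clark's data. Under the optimistic reading that probability is small but positive, so volume pays off. Under the pessimistic reading it is effectively zero for some class of problems, so volume does not help.

```python
def expected_discoveries(attempts: int, p_success: float) -> float:
    """Expected number of successes across independent attempts."""
    return attempts * p_success


def p_at_least_one(attempts: int, p_success: float) -> float:
    """Probability that at least one independent attempt succeeds."""
    return 1.0 - (1.0 - p_success) ** attempts


# Hypothetical per-attempt success rates, chosen only to contrast the readings.
P_OPTIMISTIC = 0.001   # rare, but nonzero
P_PESSIMISTIC = 0.0    # insight not reachable by this kind of attempt

for attempts in (100, 1_000, 10_000):
    print(
        attempts,
        round(expected_discoveries(attempts, P_OPTIMISTIC), 2),
        round(p_at_least_one(attempts, P_OPTIMISTIC), 3),
        p_at_least_one(attempts, P_PESSIMISTIC),
    )
```

The same observation today, a handful of rare successes, fits both parameter choices, which is why the data alone cannot settle the question.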
Five dimensions Clark gestures at but leaves underdeveloped.
Clark’s section is rigorous on the empirical evidence. Five strategic dimensions matter for the institutional response, which the Clark series synthesis argues is structurally inadequate.
Two readings. Different equilibria.
The structural question Clark leaves open: is research a permanent moat that bounds automated AI R&D, or is it engineering at scale that dissolves with more shots on goal? Both readings are consistent with the current data. They differ by orders of magnitude in consequences.
[Figure: the two readings compared. Panel labels: “Productivity multiplier years” and “Recursive loop operational”.]
Five audiences. Asymmetric cost of being wrong.
The institutional response should not bet on inspiration being a permanent moat. If the distinction holds, capacity built is still useful. If it closes, capacity is necessary. Asymmetric cost-of-being-wrong points toward building now.
[Figure: five audience panels. In industry, in academia, policymakers, investors, everyone else.]
Engineering is automated. The residual is the question. The institutional response should not bet on inspiration being a permanent moat.
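The asymmetric-cost argument in this section can be sketched as a small worst-case comparison. The cost values below are hypothetical ordinal placeholders chosen only to encode the asymmetry the text describes (building capacity that turns out to be unneeded is mildly costly; being unprepared when the distinction closes is very costly). The action and state names are likewise illustrative.

```python
# Hypothetical ordinal costs: 0 = no loss, larger = worse.
# These are NOT estimates from the source; they only encode the asymmetry.
COSTS = {
    ("build", "moat_holds"): 1,    # capacity still useful, some overspend
    ("build", "moat_closes"): 0,   # capacity was necessary and is in place
    ("wait", "moat_holds"): 0,     # nothing needed, nothing spent
    ("wait", "moat_closes"): 10,   # unprepared when research is automated
}
ACTIONS = ("build", "wait")
STATES = ("moat_holds", "moat_closes")


def worst_case(action: str) -> int:
    """Largest cost this action can incur across the possible states."""
    return max(COSTS[(action, state)] for state in STATES)


def minimax_action() -> str:
    """Action whose worst-case cost is smallest."""
    return min(ACTIONS, key=worst_case)


print({action: worst_case(action) for action in ACTIONS})
print("minimax choice:", minimax_action())
```

The conclusion follows from the asymmetry, not from the specific numbers: if the cost of waiting when the moat closes were no larger than the cost of building unnecessarily, the worst-case comparison would no longer favor building now.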
Implications of AI Automating Engineering Tasks
The ability of AI to automate core engineering tasks in AI development could drastically accelerate the pace of technological progress, reduce costs, and shift the role of human researchers toward higher-level conceptual work. This development raises questions about the future of AI research, the potential for AI to self-improve, and the changing landscape of AI innovation, with some experts warning that the residual human role may shift from routine to creative and strategic activities.
Recent Advances in AI R&D Capabilities
Over the past two years, multiple independent benchmarks covering research reproduction, Kaggle competitions, and kernel optimization have shown rapid progress toward saturation. For instance, CORE-Bench, which measures the ability to reproduce research papers, improved about 4.4× (from 21.5% to 95.5%), with AI systems now handling nearly all the steps involved in reproducing experiments. Similarly, AI performance on Kaggle competitions, as measured by MLE-Bench, rose from 16.9% to 64.4%, approaching the level of mid-tier human practitioners. This pattern indicates that AI can increasingly automate the routine engineering tasks involved in AI development, suggesting a structural shift in the field.
Thorsten Meyer’s analysis emphasizes that these advancements are not isolated but part of a broader cascade, where multiple capabilities are approaching or surpassing measurement limits, implying that AI is transitioning from a tool to a potential independent agent in engineering roles.
“The pattern of rapid saturation across multiple benchmarks indicates that AI can now automate vast swaths of AI engineering, with the residual research component remaining distinctly human for now.”
— Thorsten Meyer
Uncertainties About AI’s Role in Future Research
While engineering tasks are increasingly automated, it remains unclear how much of AI research, especially theoretical, creative, and paradigm-shifting work, can be automated. Clark leaves open whether some aspects of research are inherently resistant to automation or whether research is simply engineering at scale. The pace at which research can be automated also depends on future breakthroughs in AI reasoning, creativity, and understanding.
Next Steps in Monitoring AI Development Progress
Researchers and industry leaders will closely monitor ongoing benchmark developments, particularly as new versions of models and improved measurement tools are introduced. Further work is expected to focus on assessing AI’s capacity for innovative research tasks, beyond engineering. Policy discussions around AI’s role in research and development may intensify as capabilities continue to evolve, with some experts advocating for increased oversight and strategic planning to manage this transition.
Key Questions
What are the main benchmarks showing AI automation in engineering?
Key benchmarks include CORE-Bench for research reproduction, MLE-Bench for Kaggle competition performance, and various kernel design tests. All indicate rapid progress toward saturation.
Does this mean AI can now do all research work?
No. Current evidence suggests AI excels at engineering and routine tasks but remains less capable of conducting creative, theoretical, or paradigm-shifting research, which still largely depends on human insight.
What are the risks or concerns associated with AI automating engineering?
Potential concerns include reduced human oversight, over-reliance on AI-generated solutions, and the challenge of ensuring AI’s alignment with ethical and safety standards as it takes on more autonomous roles.
How soon might AI automate most research activities?
This remains uncertain. While engineering is increasingly automated, full automation of research—especially creative aspects—may require several more years or breakthroughs in AI reasoning and understanding.
Source: ThorstenMeyerAI.com