As soon as I entered General Intuition’s R&D floor at its New York office, the company’s 31-year-old co-founder and CEO Pim de Witte directed my attention to a monitor perched on a standing desk.
Someone appeared to be playing something like Fortnite.
“Our agent has been playing for 100 hours straight,” Kent Rollins, the company’s chief product officer, said, beaming.
Before I could get absorbed in the spectacle of an AI navigating the game’s virtual environment, I heard the electronic footsteps of a large quadrupedal robot approaching.
“The same brain powering the agent playing the game is powering the robot,” de Witte told me.
Josh Duplantis, a data analyst carrying a laptop streaming a live feed from the robot’s single camera, piped up to explain that the bot’s default mode was “exploration.”
Relying on that camera, its singular eye, the giant buglike bot walked up to me, circled around me, and continued into the office.
It occasionally clipped the legs of chairs or bumped into an errant trash bin, much like a toddler who hasn’t yet learned how her body relates to the world around her.
Duplantis said it took just eight minutes of real-world robotics data to fine-tune an AI model for the quadruped.
What’s more, that data was collected on the street, not inside the office the bot was now navigating.
An agentic model that can generalize from gameplay to simulation to embodiment is General Intuition’s raison d’être.
And that model’s ability to figure out its place in the world has secured the backing of some heavy hitters.
On Thursday, General Intuition said it raised $320 million at a $2.3 billion valuation, confirming TechCrunch’s previous reporting.
The round brings General Intuition’s total disclosed funding to $454 million, after the $134 million round it raised at launch last October.
The startup was spun out of de Witte’s other company, Medal, which allows gamers to upload and share video game clips.
The hundreds of millions of hours of uploaded gameplay provided the initial dataset to train General Intuition’s model in spatial-temporal reasoning, or understanding how to move through space and time.
But the key ingredient wasn’t the gameplay footage; it was the action labels embedded in those clips: records of exactly what buttons a player pressed and when.
Most competitors, de Witte says, are trying to infer actions from video alone, which he argues is insufficient.
“We view this as just the next stage of future pre-training,” de Witte said.
“We have a single model that can respond to Fortnite information on the screen and take action, but also to real-world dynamics in a way that an LLM could never.”
At one point, de Witte set me up with a laptop running General Intuition’s world model, a simulated environment generated frame-by-frame rather than rendered by a traditional game engine.
As I often do when testing world models, I walked straight into a series of walls.
In other demos I’ve tried, the agents you control sometimes pass right through, but this one didn’t.
From the millions of hours of gameplay, it somehow learned that walls are walls, ladders are for scaling, and shadows lengthen as the sun moves.
For General Intuition, this world model isn’t the product; it’s the training environment (referred to as “the gym” internally).
The company ultimately wants to sell the agentic model itself, and de Witte argues that the action data embedded in gameplay helps the model discern the “self” from the “environment” in a way that gives it a richer understanding of causality.
Impressive though General Intuition’s technology appears in demos, the company isn’t the only one trying to crack this problem.
Moreover, getting such a model to hold up in the physical world, at scale, hasn’t yet been done.
Most approaches of this kind require enormous amounts of real-world data that is slow and expensive to gather.