All the world’s a robot-staging ground for tech entrepreneurs building ‘physical AI’

PROVIDENCE, R.I. (AP) - Computer scientist Louis Castricato was in his eighth year studying large language models, the technology behind chatbots like ChatGPT and Claude, when he started to feel like he was hitting a dead end.

“We basically have passed the point of doing real fundamental LLM research,” Castricato said. “Now it’s just applications.”

The researcher quit his studies at Brown University and started a new company, called Overworld. Its ambition is in its name: AI that can understand and navigate a world, not just words.

There’s still plenty of money to be made from AI chatbots; investors are counting on it as they commit to leading developers like Anthropic and OpenAI. But a growing number of AI entrepreneurs are dedicating themselves to what they see as the next frontier: “world models” that teach AI systems, and sometimes robots, how to react in a physical environment.

They include some of the field’s most prominent scientists, such as “Godmother of AI” Fei-Fei Li, who describes the concept of a world model as “one of the most important and most overloaded terms in AI today.”

Scientists are applying AI in new dimensions with ‘world models’

At the heart of world model research is the idea that AI can’t be truly intelligent if it can only read a book. It also needs to read the room.

“Where language models learn the statistical structure of text, world models learn the statistical structure of space and time: how light falls on a surface, how a garden looks from an angle no camera has captured, how objects respond to force and follow the laws of physics,” wrote Li, founder of the San Francisco startup World Labs, in an essay published this month.

Another proponent is AI pioneer Yann LeCun, who left his job as Meta’s chief AI scientist last year to start Paris-based Advanced Machine Intelligence Labs.

“World model is quickly becoming a buzzword,” LeCun said on a recent “Unsupervised Learning” podcast. He said he views it as something that enables an AI agent “to predict the consequences of its own actions.”

There are multiple ways of defining world models, often based on the technologies someone hopes to build with them, be it a robot or a more interactive video game.

Robots can’t learn much from AI models trained on books

Training on books, news articles and visual media, as AI language models have done, has led to AI assistants that are changing the nature of office-based work and some creative fields. But some proponents see limitations in generative AI models that work by repeatedly predicting the next word or pixel to produce new dialogue, images or lines of code.

Chatbots can’t pick up a coffee mug, notes Martial Hebert, dean of computer science at Carnegie Mellon University.

“There’s all the geometry of the world, the dynamic of how I move my hand, the physical interaction of the contact with the cup,” Hebert said. “This is much more complex than just predicting the next word in a sentence.”

For scientists like Hebert, who has spent more than four decades researching robotics, the most useful application for world models is as a faster and cheaper path to “physical AI,” another tech industry buzzword.

“Some people may have different definitions, but physical and embodied AI are kind of the evolution of what we used to call robotics,” Hebert said in an interview. Some of the AI advances that have made chatbots so useful can also be applied to building AI with a broad enough awareness of its environment to work like a robot’s brain, he said.

“In your body and spinal cord you have a very general model of how to balance, how to walk around, and you can adapt to your knee hurting in the morning, so you now walk a little differently,” he said. “You don’t need to think about that. You have a general model somewhere in your nervous system and brain that allows your body to adapt very quickly.”

Simulated worlds are drawing interest from investors

Smarter robots aren’t the only end game for world models. Castricato started Overworld last year and the tiny Rhode Island-based startup is now building video game worlds where a scene, say, of a spooky forest, can adapt as a virtual character moves through it and interacts with the objects in it.

“There’s no other world model where you can just walk through doors or where you can interact with a detailed environment like this,” he said in an interview. “We optimize for interaction above anything else.”

While the near-term applications aren’t as readily apparent as AI coding tools, world model makers are attracting interest from venture capitalists like Steve Jang, co-founder and managing partner at Kindred Ventures.

The firm is investing in Overworld and other world model-focused companies, including Causal Labs, which is building AI models for weather prediction, and Extropic, which is building specialized computer chips suited to world models.

“I think that the future is many different types of models with many different philosophies and architectures,” Jang said. “I don’t think that it’ll be one large, dense model to rule them all.”

In her recent essay, Li sought to create a “taxonomy of world models” to help sort out the confusion about the competing visions.

“A video model that produces gorgeous but physically impossible flames, a language model improvising a playable game, and a physics engine that faithfully simulates combustion all go by the same name,” she wrote.

She divided world models into three categories. The most commercially viable today are “renderers” that prioritize the visual fidelity of the virtual worlds they create but can’t be trusted to teach robots much.

Then, there are “simulators” that create virtual training grounds that faithfully represent the physical structure of a world; and “planners” that try to predict what an AI agent or robot should do in an unstructured world.

“A robot that can plan is a robot that can work, and the entire industry is racing to be the one that gets there first,” she wrote.
