Rabbit's Jesse Lyu on the nature of startups: 'Grow faster, or die faster,' just don't give up

Rabbit co-founder and CEO Jesse Lyu isn't afraid of death … the death of the company, at least. He told TechCrunch that the company is a startup whose fortunes may be swayed by the whims of multibillion-dollar rivals — but that's no reason to give up and go home.

Appearing onstage at StrictlyVC LA, Lyu explained his rather philosophical approach to the threat of Google, Microsoft, or Apple coming to crush Rabbit. (Quotes have been lightly edited for clarity.)

Rabbit's r1, the pocket AI assistant that attracted considerable hype after its debut at CES, is certainly an original proposal. Half the size of a phone, the device acts strictly as a voice-powered assistant, but it can remotely operate your apps and perform complex actions as well as answer questions and carry on a conversation like ChatGPT. Lyu described the two parts as "intent" and "action."

"I had this vision many years ago, actually 10 years ago, but the technology wasn't ready. This is the first time in history that a device like this is actually possible," said Lyu.

He explained that he had been intrigued by LLMs' ability to understand language and intent, and that, given the apparent versatility of transformer-based systems, it was natural to try to get them to perform actions as well.

"We immediately tried using super-prompts to get this language model to do things, and the result was very miserable," he recalled. "There's a demo from another company to use an LLM to go to MrBeast's latest YouTube video and leave a comment. Yes, in theory, language models can do that. But it would cause you to have to literally watch your screen doing that step by step. And it takes roughly around two to three minutes to finish one task like that. We just don't think that can convert into a good end user experience."

Their solution is the "large action model," which is trained on hours and hours of actual users interacting with popular apps: "Spotify, Uber, Expedia, DoorDash, you name it. We have the top 800 highest frequency apps. Then we set up this neural symbolic network and ask this AI, which now we call large action model, to review those clips, but frame by frame. The idea is that symbolically, the AI will be eventually smart enough to extract all the buttons, all the elements, and then we can basically build a logic to automate."

The rabbit r1 in use. Hand model: Chris Velazco of the Washington Post. Image Credits: Devin Coldewey / TechCrunch