“Speed is a feature,” Google cofounder Larry Page once told me. “Speed can drive usage as much as having bells and whistles on your product. People really underappreciate it.” I thought of Page’s remark when I tried out a chatbot from the startup Groq last week. (The name comes from grok, golden-era sci-fi writer Robert Heinlein’s term for deep understanding; it’s unrelated to Elon Musk’s chatbot Grok—more on which later.) Groq makes chips optimized to speed up the large language models that have captured our imaginations and stoked our fears in the past year.

You might never have thought of these LLMs as particularly slow. In fact, it’s pretty impressive that when you offer a prompt, even a relatively complicated one, a detailed answer comes within seconds, not minutes. The experience of using a chatbot that doesn’t need even a few seconds to generate a response is shocking.

I typed in a straightforward request, as you do with LLMs these days: Write a musical about AI and dentistry. I had hardly stopped typing before my screen was filled with a detailed blueprint for the two-act Mysteries of the Mouth. It included a dramaturgically complete book, descriptions of a full cast, and the order of the songs, each of which advanced the action and defined the characters. It was something a clever theater kid might have handed in at the end of a full-semester course in Outlining the Broadway Musical.

It’s no longer surprising to get stuff like that from a chatbot, and Groq uses modified versions of several open source LLMs, from places like Mistral and Meta. The revelation was how quickly The Mysteries appeared, fully developed, on my screen. It took all of a second. (OpenAI’s ChatGPT, which proposed a musical called The AIgnificent Smile, took around four seconds.) That speedy turnaround left me disoriented.
When there’s a pause between prompt and output, the feeling is that some artificial brain is cranking away at the answer, which comes as the result of gobs of computation—a process similar to human thought but faster. But when the answers just … show up, you get a different feeling. Was the musical there all along? Did Groq have a day pass to all possible versions of the multiverse?

When I described my impressions to Jonathan Ross, the CEO of Groq and primary inventor of its hardware, he was delighted. He previously worked at Google, where he was the key inventor of its Tensor Processing Unit AI chips, which have helped the company make leaps in AI research and products. On a Zoom call with him, I asked how Groq worked; he went straight to the source and put the question to the chatbot powered by his creation. This is usually an annoying ploy in an interview, but since Groq is super fast I tolerated it, listening as the model explained that graphics chips like Nvidia’s, which work in parallel, are ideal for delivering images. Think of a lot of people filling in a paint-by-numbers picture at the same time, the bot said. But they’re not as efficient at crunching through or generating language, which proceeds sequentially. Ross then cut Groq off, asking what would happen if you put an entire LLM into the memory onboard a chip. “Oh, you’re talking about something like a dedicated AI chip?” said Groq, reacting as nimbly as an alert human conversationalist. “That’s actually a thing! If you had a chip specifically designed for large language models, and you could program it, you could potentially make it faster.”