Part B: The Evolution of Large Language Models – From Next Word Prediction to AI Assistants

Introduction

The rise of Large Language Models (LLMs) like ChatGPT, Claude, and others has revolutionized how we interact with information. But how exactly do these sophisticated AI systems work, and what distinguishes one company’s approach from another? In this second part of our series with Perplexity AI CEO Aravind Srinivas, we dive into the mechanics behind LLMs, explore how they evolved, and examine the current competitive landscape. Understanding these concepts helps illuminate why AI has suddenly become so central to technological discourse and where the future opportunities might lie.

What Makes Large Language Models Work

Aravind explains that a Large Language Model is “essentially a giant neural network that’s trained on this one task of predicting the next word from the previous word, except it’s training on the whole internet.” These models consume terabytes of text from diverse sources including books, code, textbooks, web pages, and news articles.

The process starts with collecting and tokenizing the entire internet—converting text into numerical tokens that computers can process. The model, typically built on a transformer architecture, is then trained to predict what comes next in a sequence. “You feed like 4,000 words and ask it to, for each of those 4,000 words, predict the next word given the previous word,” Aravind explains.
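The training objective Aravind describes can be sketched at toy scale. The snippet below is a minimal illustration, not how production LLMs are built: it tokenizes a tiny corpus and "trains" a next-word predictor by counting which token follows which. A real transformer learns the same conditional distribution, P(next token | previous context), with gradient descent over trillions of tokens instead of a count table.

```python
from collections import Counter, defaultdict

# Toy corpus standing in for "the whole internet".
corpus = "the next word follows the next word and the next word"

# Step 1: tokenize. Here it is just whitespace splitting; real systems use
# subword tokenizers (e.g. byte-pair encoding) mapping text to integer IDs.
tokens = corpus.split()
vocab = {tok: i for i, tok in enumerate(dict.fromkeys(tokens))}
token_ids = [vocab[tok] for tok in tokens]

# Step 2: "train" on next-token prediction by counting successors.
counts = defaultdict(Counter)
for prev, nxt in zip(tokens, tokens[1:]):
    counts[prev][nxt] += 1

def predict_next(word):
    """Return the most frequent next token observed after `word`."""
    return counts[word].most_common(1)[0][0]

print(predict_next("the"))   # -> next
```

At internet scale the count table is replaced by a neural network, but the task is identical: for every position in a long window of tokens, predict the token that comes next.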

This seemingly simple task requires massive computational resources. The model is distributed across “thousands of GPUs” and trained for “3 or 4 months” on “trillions of tokens.” The scale is unprecedented in computer science history.

However, the raw pre-trained model isn’t immediately useful for practical applications. It requires additional fine-tuning through a process called Reinforcement Learning from Human Feedback (RLHF), where the model is trained “to be a good chatbot” that produces helpful responses to human queries. This post-training phase involves collecting data for tasks like software programming, email composition, document summarization, and conversational abilities.
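RLHF itself is a multi-stage pipeline, but one core ingredient is a reward model trained on human preference pairs: annotators mark which of two responses is more helpful, and the model learns to score the preferred one higher. The sketch below shows the standard pairwise (Bradley-Terry) preference loss with scalar scores standing in for a full neural reward model; it is an illustrative simplification, not Aravind's or any lab's actual implementation.

```python
import math

def pairwise_preference_loss(reward_chosen, reward_rejected):
    """Bradley-Terry loss used to train RLHF reward models:
    -log sigmoid(r_chosen - r_rejected). The loss approaches zero when
    the model scores the human-preferred response much higher."""
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Toy preference data: (score for preferred response, score for rejected one).
# In practice these scores come from a neural network over (prompt, response).
pairs = [(2.0, -1.0), (0.5, 0.3), (1.2, 1.5)]
losses = [pairwise_preference_loss(c, r) for c, r in pairs]

# A correctly ranked pair with a wide margin yields near-zero loss; a
# mis-ranked pair (the last one) yields a loss above log(2) ~ 0.693.
print([round(l, 3) for l in losses])
```

The trained reward model is then used to steer the base model, via reinforcement learning, toward responses humans rate as helpful, which is what turns a raw next-word predictor into "a good chatbot."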

The Quantum Leap: What Changed in AI

When asked what fundamentally changed to make AI so much more powerful and ubiquitous in recent years, Aravind identifies several key factors:

  1. Unprecedented Scale: “A lot of compute thrown at the problem, unprecedentedly at scale.”
  2. High-Quality Data: Not just raw volume but “high-quality data tokens” carefully curated to develop specific capabilities.
  3. Human Feedback: Training models based on what humans find useful and accurate through RLHF.
  4. Focus on Practical Tasks: Training on “tasks useful to human labor like coding and summarization.”

The Economics of AI Services

When asked about the economics of AI services like Perplexity Pro ($20/month), Aravind notes that costs are constantly evolving as new open-source models emerge and proprietary API prices adjust in response. The availability of high-quality open-source models like DeepSeek allows Perplexity to offer advanced features at lower prices than competitors.

He observes that different query types have different cost structures: “Deep research stuff is actually pretty expensive to serve…but the cost per query on regular pro searches or reasoning searches go down because there’s more progress on the model side.”

Looking ahead, Aravind predicts that more agentic tasks will increase costs, but he’s “actually okay with this uncertainty in what the real margins are on consumer subscriptions” because the priority is improving user experience rather than “hyper-optimizing for margins now.”
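The margin uncertainty Aravind describes can be made concrete with back-of-the-envelope arithmetic. Every figure below is a hypothetical placeholder, not Perplexity's actual pricing or cost data; the point is only to show how the mix of cheap searches versus expensive deep-research and agentic queries drives per-user margins.

```python
# Hypothetical unit economics for a $20/month AI subscription.
# None of these figures come from the interview; they only illustrate
# why query mix, not query count alone, determines margins.
SUBSCRIPTION_PRICE = 20.00  # $/month

# Assumed serving cost per query, by query type ($). Made-up numbers.
COST_PER_QUERY = {
    "regular_search": 0.005,   # cheap: small/optimized models
    "reasoning_search": 0.02,  # moderate: larger models, longer outputs
    "deep_research": 0.25,     # expensive: many model calls per query
}

def monthly_margin(query_counts):
    """Return (serving_cost, margin) for a user's monthly query mix."""
    cost = sum(COST_PER_QUERY[kind] * n for kind, n in query_counts.items())
    return cost, SUBSCRIPTION_PRICE - cost

light_user = {"regular_search": 300, "reasoning_search": 20, "deep_research": 2}
heavy_user = {"regular_search": 500, "reasoning_search": 100, "deep_research": 60}

for name, mix in [("light", light_user), ("heavy", heavy_user)]:
    cost, margin = monthly_margin(mix)
    print(f"{name}: cost=${cost:.2f}, margin=${margin:.2f}")
```

Under these assumed numbers a light user is highly profitable while a heavy deep-research user nearly erases the margin, which is why falling model costs on regular searches matter and why more agentic usage pushes costs back up.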

Conclusion

The evolution of Large Language Models represents one of the most significant technological shifts of our time. What began as an academic exercise in next-word prediction has transformed into systems that can perform knowledge work across thousands of domains. The current landscape shows remarkable technical convergence among major players, with differentiation increasingly coming from user experience and the ability to perform agentic tasks.

As we move forward, the distinction between AI companies will likely emerge not from their underlying models but from how they integrate those capabilities into useful products that solve real problems. Perplexity’s approach of combining multiple specialized models with efficient infrastructure offers one vision of how this might evolve.

In the final part of our series, we’ll explore the future of AI, examining opportunities for entrepreneurs, the potential impact on various industries, and how countries like India might participate in and benefit from the AI revolution.
