When Mark Zuckerberg isn't wake surfing at his Lake Tahoe
mansion, sunburned and waving the American flag, he's battling Google and
OpenAI for artificial intelligence supremacy. Yesterday, Meta released its
biggest and most powerful large language model ever, Llama 3.1, which also
happens to be free and arguably open source. This model took months to train on
16,000 Nvidia H100 GPUs, a process likely costing hundreds of millions of
dollars and using enough electricity to power a small country. The end result
is a massive 405 billion parameter model with a 128,000 token context length,
and according to benchmarks, it is mostly superior to OpenAI's GPT-4 and even
beats Claude 3.5 Sonnet on some key benchmarks. However, benchmarks can be
misleading, and the only way to truly assess a new model is to test it out in
real-world scenarios.
The Code Report: Testing Llama 3.1
Today, we'll try out the flagship Llama 3.1 405B model and see
if it actually delivers on its promises. It is July 24th, 2024, and AI hype has
died down significantly in recent months. Llama 3.1, however, is a model that
cannot be ignored. It comes in three sizes: 8B, 70B, and 405B, where
"B" refers to billions of parameters—the variables the model uses to
make predictions. Generally, more parameters can capture more complex patterns,
but more parameters don't always guarantee a better model. GPT-4 is rumored to
have over 1 trillion parameters, but the true numbers from companies like
OpenAI and Anthropic remain unknown.
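To get a feel for what these parameter counts mean in practice, here's a rough back-of-the-envelope calculation of how much memory the raw weights alone would occupy. It assumes 2 bytes per parameter (fp16/bf16, a common inference precision); actual usage is higher once you add the KV cache and activations, and lower with quantization.

```python
# Rough weight-memory estimate for each Llama 3.1 size,
# assuming 2 bytes per parameter (fp16/bf16 inference).
BYTES_PER_PARAM = 2

for name, params in [("8B", 8e9), ("70B", 70e9), ("405B", 405e9)]:
    gib = params * BYTES_PER_PARAM / 1024**3
    print(f"Llama 3.1 {name}: ~{gib:,.0f} GiB just for the weights")
```

At half precision, the 405B model's weights alone come to roughly 750 GiB, which is why nobody is running it unquantized on a desktop.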
The cool thing about Llama is that it is open
source—well, kind of. You can monetize it as long as your product has fewer than 700 million monthly active users; beyond that threshold, you need to
request a special license from Meta. What's not open source is the
training data, which might include your blog, GitHub repos, all your Facebook
posts from 2006, and maybe even your WhatsApp messages. However, we can take a
look at the actual code used to train this model, which is only 300 lines of
Python and PyTorch, along with a library called FairScale to distribute
training across multiple GPUs. It’s a relatively simple decoder-only
transformer, as opposed to the mixture-of-experts approach used in other
big models like Mixtral, from its biggest open-source rival, Mistral.
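The defining trick of a decoder-only transformer is causal self-attention: each token can only attend to itself and earlier tokens. The sketch below is a toy, dependency-free illustration of that idea, not Meta's actual training code; real models also learn query/key/value projection matrices, which are replaced with identity projections here for brevity.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    es = [math.exp(v - m) for v in xs]
    s = sum(es)
    return [e / s for e in es]

def causal_self_attention(tokens):
    """Toy single-head attention: position i attends only to
    positions 0..i (the causal mask)."""
    T, d = len(tokens), len(tokens[0])
    out = []
    for i in range(T):
        # Scaled dot-product scores against earlier positions only.
        scores = [sum(a * b for a, b in zip(tokens[i], tokens[j])) / math.sqrt(d)
                  for j in range(i + 1)]
        w = softmax(scores)
        # Weighted sum of the attended token vectors.
        out.append([sum(w[j] * tokens[j][k] for j in range(i + 1))
                    for k in range(d)])
    return out
```

Because the first token can only attend to itself, its output is just its own vector — stack this with feed-forward layers a hundred or so times and you have the skeleton of a model like Llama.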
Open Source and Accessibility
Most importantly, the model weights are open,
and that's a huge win for developers building AI-powered apps. Now, instead of
paying hefty fees to use the GPT-4 API, you can self-host your own model and
pay a cloud provider to rent some GPUs. However, self-hosting the big model
isn't cheap. I used Ollama to download it and run it locally, but the weights
weigh 230 GB, and even with an RTX 4090, I wasn't able to run the 405B model. The
good news is that you can try it for free on platforms like Meta AI or Nvidia's
Playground.
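The numbers make the failure unsurprising. Taking the 230 GB download size from above and dividing by the parameter count shows the weights are already quantized to roughly 4.5 bits per parameter, and an RTX 4090 only has 24 GB of VRAM:

```python
# Why the 405B model won't fit on a single consumer GPU.
# 230 GB is the quantized download size mentioned above;
# 24 GB is an RTX 4090's VRAM.
params = 405e9
download_gb = 230

bits_per_param = download_gb * 1e9 * 8 / params
print(f"~{bits_per_param:.1f} bits per parameter after quantization")

rtx_4090_vram_gb = 24
print(f"Need ~{download_gb} GB of memory, but the GPU has {rtx_4090_vram_gb} GB")
```

Even quantized to under 5 bits per weight, the model needs nearly ten 4090s' worth of memory, which is why renting datacenter GPUs or using a hosted playground is the realistic option.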
Initial Impressions and Comparisons
Initial feedback from the AI community is
mixed: while the smaller Llama models are quite impressive, the 405B model has
been somewhat disappointing. The real power of Llama is that it can be
fine-tuned with custom data, and in the near future, we may see some amazing
uncensored fine-tuned models like Dolphin.
In my tests, Llama 3.1 405B struggled with
certain tasks. For instance, it failed to build a Svelte 5 web application with
Runes, a new yet-to-be-released feature. The only model I've seen do this
correctly in a single shot is Claude 3.5 Sonnet. In terms of coding, Llama 3.1
is decent but still clearly behind Claude. However, in creative writing and
poetry, it performed well, though not the best I've seen.
Reflecting on the current state of AI, it's
fascinating that multiple companies have trained massive models with immense
computational resources, yet they're all plateauing at the same level of
capability. OpenAI made a significant leap from GPT-3 to GPT-4, but since then,
advancements have been incremental. Last year, Sam Altman of OpenAI practically
begged for government regulation to protect humanity from AI, yet we haven't
seen the apocalyptic Skynet scenario he warned about. AI hasn't even replaced
programmers yet. It's like the transition from propeller planes to jet engines,
with no leap to light-speed engines in sight.
Meta's Unique Position
Despite the skepticism, Meta seems to be the
only big tech company keeping it real in the AI space. While there might be an
ulterior motive hidden somewhere, Llama is a significant step forward for AI
development and accessibility. This has been the Code Report, thanks for reading, and see you in the next one.