Introduction
Hello! My name is Alex, and I am an engineer at Nightley, originally from Denmark. I am a new engineer, currently studying programming and generative AI in Japanese.
Nightley is currently developing "CITY INSIGHT Copilot", a service that combines people-flow data with generative AI, and I am in the middle of studying AI and LLMs myself.
As part of that, I did some research on "Llama 3", which Meta recently released.
So, let's get right into it! The rest of this article is in English.
Overview of Llama 3
On April 18, 2024, Meta launched its latest LLM (Large Language Model), Llama 3, which has attracted a lot of attention both for its open-source development and for its capabilities as an LLM. Here is what I found while looking into it.
Llama 3 offers a big leap in capability compared to its predecessor. With its current 8-billion and 70-billion parameter models (8B and 70B), it can process significantly more information and generate more complex and nuanced text than Llama 2.
As can be seen from the benchmarks below, the 8B Llama 3 model is nearly as capable as the largest Llama 2 model, which has 70B parameters.
https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md
This enhanced capability positions Llama 3, and Meta, as a competitive force against other leading LLMs on the current market. Furthermore, according to Mark Zuckerberg, Meta is currently training a 405-billion-parameter model.
Comparing Llama 3 to other models of similar scale also shows how competitive both the 8B and 70B models are. The 8B Llama 3 model beats both Gemma 7B and Mistral 7B on all of the benchmarks below. If you compare the 70B Llama 3 model to Claude 3 Sonnet, it again leads on all the benchmarks, and while it is roughly on par with Gemini Pro in terms of benchmark scores, Llama 3 is open source, as opposed to the closed-source nature of Gemini and Claude.
https://ai.meta.com/blog/meta-llama-3/
On the topic of benchmarks
While doing research on Llama 3, I also learned a great deal about how LLMs are graded, and what the benchmarks actually mean in terms of capabilities.
- MMLU (Massive Multitask Language Understanding)
This benchmark tests the general knowledge of a model and is usually run zero-shot, meaning the model is given each question with no worked examples. For Llama 3, however, the results are reported as a 5-shot evaluation, meaning the prompt contains five example question-answer pairs before the question being tested (see the sketch after this list).
- GPQA (Graduate-Level Google-Proof Q&A Benchmark)
“It’s a collection of 448 multiple-choice questions written by domain experts in biology, physics, and chemistry. They are designed to be high-quality and extremely difficult.”
- HumanEval (Code Generation)
This benchmark is specifically used to test the coding capabilities of an LLM.
- GSM-8K and MATH
These benchmarks both test the mathematical capabilities of an LLM in two different ways. GSM-8K tests basic math that requires multi-step reasoning, and MATH tests competition-level math questions.
The quotes and insights above are from the following article: https://medium.com/@ingridwickstevens/more-llm-acronyms-an-explainer-on-llama-3s-performance-benchmark-values-36722c6dcabb
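To make the zero-shot vs. few-shot distinction more concrete, here is a minimal sketch of how a 5-shot, MMLU-style multiple-choice prompt could be assembled. The example questions and the exact formatting are hypothetical; Meta's actual evaluation harness may format prompts differently.

```python
# Minimal sketch of a 5-shot, MMLU-style multiple-choice prompt.
# The example questions are made up for illustration only.

EXAMPLES = [
    ("What is the capital of France?",
     ["Berlin", "Madrid", "Paris", "Rome"], "C"),
    ("Which planet is known as the Red Planet?",
     ["Venus", "Mars", "Jupiter", "Saturn"], "B"),
    # ...three more worked examples would follow in a real 5-shot prompt
]

def format_question(question, choices, answer=None):
    letters = "ABCD"
    lines = [question]
    lines += [f"{letters[i]}. {choice}" for i, choice in enumerate(choices)]
    lines.append(f"Answer: {answer}" if answer else "Answer:")
    return "\n".join(lines)

def build_prompt(test_question, test_choices):
    # Few-shot: worked examples come first so the model sees the expected format.
    shots = [format_question(q, c, a) for q, c, a in EXAMPLES]
    # A zero-shot prompt would just be format_question(test_question, test_choices).
    return "\n\n".join(shots + [format_question(test_question, test_choices)])

print(build_prompt("Which gas do plants absorb from the atmosphere?",
                   ["Oxygen", "Carbon dioxide", "Nitrogen", "Helium"]))
```

The model is then scored on whether the answer it generates after the final "Answer:" matches the correct choice.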
Accessing and using Llama 3
Because of Llama 3's open-source nature, it is possible to run Llama 3 locally if you have a computer that can handle it. To do this, you first need to request access through Meta at https://llama.meta.com/llama-downloads/.
It is also possible to use Llama 3 through, for example, Amazon Bedrock or Hugging Face, where both the 8B and 70B models are available (a small usage sketch follows below).
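As a rough illustration, here is a minimal sketch of running the 8B Instruct model through the Hugging Face transformers library. It assumes a recent version of transformers, a GPU with enough memory, and that access to the gated meta-llama/Meta-Llama-3-8B-Instruct repository has already been granted; the parameters shown are illustrative, not a recommendation.

```python
# Minimal sketch: Llama 3 8B Instruct via Hugging Face transformers.
# Assumes `huggingface-cli login` has been run with an account that has
# been granted access to the gated model repository.
import torch
from transformers import pipeline

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

pipe = pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},  # lower memory use on supported GPUs
    device_map="auto",                             # place the model on available GPU(s)
)

messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Explain in one sentence what an open LLM is."},
]

# Recent transformers versions apply the model's chat template to a message list.
outputs = pipe(messages, max_new_tokens=128, do_sample=False)
print(outputs[0]["generated_text"][-1]["content"])
```

Amazon Bedrock instead exposes the same models through a managed API, so no local GPU is required.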
Usage in Meta AI
Perhaps the easiest way for general users to try Llama 3 is through Meta's own Meta AI assistant. The release of Llama 3 came with the announcement that Meta AI is now powered by Llama 3, and with Meta AI integrated into Meta's existing services such as Facebook, WhatsApp, Messenger and Instagram, it has a definite edge in terms of availability. Since it is free to use, it also positions Meta as a direct competitor to services like OpenAI's GPT-4-powered ChatGPT Plus, which has a monthly subscription fee. While Meta AI is still only available in a limited selection of countries, it does show the potential of free AI services.
Open Source and broader implications
What differentiates Llama 3 significantly from its competitors is Meta's commitment to open-source development. Whereas ChatGPT, Claude and Gemini, for example, are kept closed source, Meta chose to release Llama 3 as an open-source model, greatly increasing its availability to users.
Furthermore, with Snowflake announcing the release of their new open-source LLM "Arctic", it seems that a trend of companies offering open-source models is emerging.
Llama 3's release advances Meta's position in the AI landscape, and it also signals a potential shift in how AI technologies are developed and distributed.
References / 参照
- Introducing Meta Llama 3: The most capable openly available LLM to date
- Meta Llama 3 models are now available in Amazon Bedrock
- (GitHub) llama3/MODEL_CARD.md
- Meta Llama 3 - GitHub Main
- (Hugging Face) Welcome Llama 3 - Meta’s new open LLM
- Llama 3's Performance Benchmark Values Explained
- Snowflake Arctic: The Best LLM for Enterprise AI — Efficiently Intelligent, Truly Open
- Meta steps up AI battle with OpenAI and Google with release of Llama 3
- (YouTube) Meta claims Llama 3 is the most advanced open source AI yet | TechCrunch Minute
- (YouTube) Meta's Generative AI Head: How We Trained Llama 3
- TechCrunch Minute: Meta’s new Llama 3 models give open source AI a boost