Increasing a machine learning model's size is typically the first and most straightforward way to improve its performance. As a result, language models have been getting bigger and more cloud-dependent. However, bigger ML models do not always perform better, and training ever-larger models for better performance is costly and inefficient.

Larger ML models require powerful GPUs that only a few companies can afford: Big Tech. Yet it is expensive even for them. Amazon's Alexa division came close to losing $10 billion. Although GitHub Copilot is a paid service, users reportedly cost Microsoft up to $80 per month. It's not surprising that Microsoft has started exploring alternative approaches, and Amazon has announced plans to introduce a monthly fee for Alexa, as the cost of running inference in the cloud is substantial. Most enterprises cannot afford to spend billions of dollars. Besides, the environmental cost falls on all of us. This "bigger is better" approach is exclusive and expensive.

Training Compute-Optimal Large Language Models

Training Compute-Optimal Large Language Models by Hoffmann et al. of DeepMind investigates the optimal model size and number of training tokens for a transformer model under a given compute budget. The study argues that everyone, including OpenAI, DeepMind, and Microsoft, had been training large language models with a suboptimal use of computing resources: the models were too large for the amount of data they saw, so they were significantly undertrained. The researchers demonstrated this by training Chinchilla, a 70B-parameter model, with the same compute budget as the 280B-parameter Gopher, and showing that it outperforms both Gopher and the 175B-parameter GPT-3. Chinchilla was trained on roughly four times as much data as these larger models.
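As a rough back-of-the-envelope sketch (not code from the paper), the Chinchilla result is often summarized as a rule of thumb of about 20 training tokens per parameter, with training compute approximated as C ≈ 6·N·D FLOPs for N parameters and D tokens. Given a compute budget, the compute-optimal split then follows directly:

```python
import math


def chinchilla_optimal(compute_flops: float, tokens_per_param: float = 20.0):
    """Rough compute-optimal model/data split under the Chinchilla rule of thumb.

    Assumes C ~= 6 * N * D and D ~= tokens_per_param * N, so:
        C = 6 * tokens_per_param * N^2  =>  N = sqrt(C / (6 * tokens_per_param))
    """
    n_params = math.sqrt(compute_flops / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens


# Chinchilla's approximate budget: 6 * 70e9 params * 1.4e12 tokens ~= 5.88e23 FLOPs
n, d = chinchilla_optimal(5.88e23)
print(f"~{n / 1e9:.0f}B parameters, ~{d / 1e12:.1f}T tokens")
```

Plugging in Chinchilla's approximate budget recovers its actual configuration (about 70B parameters and 1.4T tokens), whereas Gopher spent a similar budget on 280B parameters and far fewer tokens.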

Meta’s LLaMA

In February 2023, Meta released LLaMA, one of the first openly available LLMs competitive with commercial alternatives. Meta researchers claimed that the 13B-parameter LLaMA model outperforms the 175B-parameter GPT-3 on many NLP benchmarks. Later, in July, Meta released LLaMA-2, trained on 40% more data than LLaMA and with double the context length. At the same parameter count, LLaMA-2 outperforms its predecessor; e.g., LLaMA-2 with 13B parameters beats LLaMA with 13B parameters. After Chinchilla, LLaMA became another example showing that model size alone is a poor predictor of a large language model's performance. Furthermore, solo developers took the LLaMA weights and the Alpaca* training scheme to run their own LLaMA models on PCs, phones, and even a Raspberry Pi.

*Stanford researchers fine-tuned LLaMA to create Alpaca, a small and cheap alternative to OpenAI's GPT-3.5 (text-davinci-003).

These results are not surprising. Picovoice has released efficient voice AI models that outperform the alternatives. For example, Leopard and Cheetah, with 20MB model sizes, are more accurate than Google Speech-to-Text's large cloud-dependent models. If you're interested in making large language models more efficient, talk to Picovoice Consulting and leverage their expertise!

Consult an Expert