Increasing a machine learning model's size is typically the first and most straightforward way to improve its performance. As a result, language models have been getting bigger and more cloud-dependent. However, bigger ML models do not always perform better, and training ever-larger models comes at a cost that makes it far from the most efficient approach.

Larger ML models require powerful GPUs that only a few companies, namely Big Tech, can afford. Yet they are costly even for them. Amazon’s Alexa division was reportedly on track to lose $10 billion, and Amazon has announced plans to introduce a monthly fee for Alexa because the cost of running inference in the cloud is substantial. Although GitHub Copilot is a paid service, some users reportedly cost Microsoft up to $80 per month, so it is not surprising that Microsoft has started exploring alternative approaches. Most enterprises cannot afford billions of dollars, and the environmental cost falls on all of us. This "bigger is better" approach is exclusive and expensive.

Training Compute-Optimal Large Language Models

Training Compute-Optimal Large Language Models by Hoffmann et al. of DeepMind investigates the optimal model size and number of training tokens for a transformer model under a given compute budget. The study argues that everyone, including OpenAI, DeepMind, and Microsoft, had been training large language models with a suboptimal use of compute: model size was scaled up much faster than training data, leaving the models significantly undertrained (the paper also finds that the learning rate schedule should decay over roughly the number of training tokens). The researchers demonstrated this by training Chinchilla, a 70B-parameter model, and showing that it outperforms Gopher (280B parameters) and GPT-3 (175B parameters) while using the same compute budget as Gopher. Chinchilla was trained on roughly four to five times as much data as these larger models (1.4 trillion tokens versus 300 billion).
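
As a rough illustration of the compute-optimal recipe, the sketch below uses two widely cited rules of thumb derived from this line of work: training compute is approximately C ≈ 6ND FLOPs (N parameters, D tokens), and a compute-optimal model sees roughly 20 tokens per parameter. These are back-of-the-envelope assumptions, not the paper's exact fitted scaling laws.

```python
# Back-of-the-envelope, Chinchilla-style model sizing.
# Assumptions (rules of thumb, not the paper's fitted constants):
#   training compute:  C ≈ 6 * N * D  FLOPs  (N = parameters, D = training tokens)
#   compute-optimal:   D ≈ 20 * N             (~20 tokens per parameter)

def compute_optimal(compute_flops: float, tokens_per_param: float = 20.0):
    """Return (parameters, tokens) that roughly spend `compute_flops` optimally."""
    # C = 6 * N * (tokens_per_param * N)  =>  N = sqrt(C / (6 * tokens_per_param))
    n_params = (compute_flops / (6.0 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens


if __name__ == "__main__":
    # Example: the approximate compute used to train a 70B-parameter model on 1.4T tokens.
    budget = 6.0 * 70e9 * 1.4e12
    n, d = compute_optimal(budget)
    print(f"compute budget: {budget:.2e} FLOPs")
    print(f"optimal size:   {n / 1e9:.0f}B parameters")
    print(f"optimal data:   {d / 1e12:.1f}T tokens")
```

Under these assumptions, a 175B-parameter model like GPT-3 would call for several trillion training tokens, far more than the roughly 300 billion it was actually trained on, which is the sense in which such models are undertrained.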

Meta’s LLaMA

In February 2023, Meta released LLaMA, one of the first openly available LLMs competitive with commercial alternatives. Meta researchers claimed that the 13B-parameter LLaMA model outperforms the 175B-parameter GPT-3 on many NLP benchmarks. In July 2023, Meta released LLaMA-2, trained on 40% more data than LLaMA and with double the context length. At the same parameter count, LLaMA-2 outperforms its predecessor; e.g., LLaMA-2 with 13B parameters beats LLaMA with 13B parameters. After Chinchilla, LLaMA is another example showing that model size alone is not a good proxy for the performance of a large language model. Furthermore, solo developers took the LLaMA weights and the Alpaca* training recipe to run their own LLaMA on PCs, phones, and even a Raspberry Pi.

*Stanford researchers fine-tuned LLaMA to create Alpaca, a small and cheap alternative to OpenAI’s GPT-3.5 (text-davinci-003).
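
To give a flavor of what running a LLaMA-family model locally looks like, here is a minimal sketch using the open-source llama-cpp-python bindings with a quantized model file. The model path, prompt, and generation parameters are placeholders, and the exact API may differ between versions.

```python
# Minimal local inference with llama-cpp-python (https://github.com/abetlen/llama-cpp-python).
# Assumes a quantized GGUF model has already been downloaded; the path below is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-2-13b.Q4_K_M.gguf",  # placeholder path to a quantized model
    n_ctx=2048,                                     # context window size
)

output = llm(
    "Q: Why can a small, well-trained model beat a much larger one? A:",
    max_tokens=64,
    stop=["Q:"],
)
print(output["choices"][0]["text"])
```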


These results are not surprising. Picovoice has released efficient voice AI models that outperform the alternatives. For example, Leopard and Cheetah, speech-to-text engines with roughly 20MB models, are more accurate than the large, cloud-dependent models behind Google Speech-to-Text. If you’re interested in making large language models more efficient, talk to Picovoice Consulting and leverage their expertise!
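
For context, on-device transcription with Leopard looks roughly like the sketch below, based on the pvleopard Python SDK; the AccessKey and audio path are placeholders, and the current API may differ in detail.

```python
# On-device speech-to-text sketch with Picovoice Leopard (pvleopard Python SDK).
# The AccessKey and audio file path below are placeholders.
import pvleopard

leopard = pvleopard.create(access_key="${ACCESS_KEY}")  # AccessKey from the Picovoice Console
try:
    transcript, words = leopard.process_file("path/to/audio.wav")
    print(transcript)
finally:
    leopard.delete()  # release native resources
```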

Consult an Expert