
Apple releases OpenELM: small, open source AI models designed to run on-device

As Google, Samsung and Microsoft continue pushing generative AI onto PCs and mobile devices, Apple is moving to join the party with OpenELM, a new family of open-source large language models (LLMs) that can run entirely on a single device rather than connecting to cloud servers.

Released a few hours ago on AI code community Hugging Face, OpenELM consists of small models designed to perform text generation tasks efficiently.

There are eight OpenELM models in total – four pre-trained and four instruction-tuned – ranging from 270 million to 3 billion parameters (parameters refer to the connections between artificial neurons in an LLM, and more parameters typically denote greater performance and more capabilities, though not always).
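For readers who want to try one of the smaller checkpoints themselves, a minimal sketch using the Hugging Face transformers library might look like the following. The exact model ID, the trust_remote_code flag and the separate tokenizer are assumptions based on common Hugging Face conventions rather than details confirmed here, so check Apple's model cards before running it.

```python
# Minimal sketch (assumptions noted inline): load a small OpenELM checkpoint
# from Hugging Face and generate text locally on a single machine.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "apple/OpenELM-270M-Instruct"    # assumed repo name; verify on Apple's Hugging Face page
TOKENIZER_ID = "meta-llama/Llama-2-7b-hf"   # assumption: the OpenELM repos reuse an external tokenizer

tokenizer = AutoTokenizer.from_pretrained(TOKENIZER_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True)

prompt = "Teach me how to bake bread."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The instruct checkpoints are the ones meant to answer a prompt like this directly; the pre-trained checkpoints will tend to simply continue the text, a distinction explained below.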

Pre-training is how an LLM learns to produce coherent and potentially helpful text, but it is mainly a predictive exercise. Instruction tuning is what gets a model to respond with relevant outputs to a user's specific requests; without it, a model may simply try to complete the prompt with additional text. For example, a pre-trained model might respond to the prompt “teach me how to bake bread” with “in a home oven” rather than actual step-by-step instructions, which an instruction-tuned model would be more likely to provide, as this helpful explainer from IBM notes.


Apple is offering the weights of its OpenELM models under what it deems a “sample code license,” along with various checkpoints from training, stats on how the models perform, and instructions for pre-training, evaluation, instruction tuning and parameter-efficient fine-tuning.

The sample code license does not prohibit commercial usage or modification, only mandating that “if you redistribute the Apple Software in its entirety and without modifications, you must retain this notice and the following text and disclaimers in all such redistributions of the Apple Software.”

The company further notes that the models “are made available without any safety guarantees. Consequently, there exists the possibility of these models producing outputs that are inaccurate, harmful, biased, or objectionable in response to user prompts.”

They are the latest in a surprising string of open-source AI model releases from Apple, a notoriously secretive and typically “closed” technology company which has yet to publicly announce or discuss its efforts in this domain beyond dropping the models and papers online. Back in October, the company made headlines with the quiet release of Ferret, an open-source language model with multimodal capabilities.

What do we know about OpenELM?

While OpenELM, which is short for Open-source Efficient Language Models, has just been released and has yet to be tested publicly, Apple’s listing on Hugging Face indicates that the models are aimed at on-device applications, much like those of rivals Google, Samsung and Microsoft; the latter just this week released its Phi-3 Mini model, which can run entirely on a smartphone.

In a paper describing the model family, published on the open-access preprint server arXiv.org, Apple states that OpenELM’s development was “led by Sachin Mehta, with additional lead contributions from Mohammad Rastegari and Peter Zatloukal” and that the family of models “aims to empower and strengthen the open research community, facilitating future research efforts.”

Apple’s OpenELM models come in four sizes – 270 million, 450 million, 1.1 billion and 3 billion parameters – each smaller than many high-performing open models (which typically come in at around 7 billion parameters), and each available in both a pre-trained and an instruction-tuned version.

The models were pre-trained on public datasets totaling approximately 1.8 trillion tokens, drawn from the likes of Reddit, Wikipedia, arXiv.org and more.


They are suited for running on commodity laptops or even some smartphones. Apple’s paper notes that the benchmarks were run on “a workstation with an Intel i9-13900KF CPU, equipped with 64 GB of DDR5-4000 DRAM, and an NVIDIA RTX 4090 GPU with 24 GB of VRAM, running Ubuntu 22.04,” as well as an “Apple MacBook Pro with an M2 Max system-on-chip and 64GiB of RAM, running macOS 14.4.1.”

Interestingly, all models in the new family use a layer-wise scaling strategy to allocate parameters within each layer of the transformer model.

This, according to Apple, enables the models to deliver better accuracy while remaining compute-efficient. The company pre-trained the models using its new CoreNet library.
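To make the idea concrete, here is a rough, illustrative sketch of layer-wise scaling rather than Apple's actual CoreNet implementation: instead of giving every transformer layer an identical shape, the attention-head count and feed-forward width ramp up with depth. All constants and the helper function below are assumptions chosen purely for illustration.

```python
# Illustrative sketch of layer-wise scaling (not Apple's CoreNet code).
# Assumption: heads and feed-forward width grow roughly linearly from the
# first transformer layer to the last, instead of being uniform per layer.

def layerwise_config(num_layers: int, d_model: int, head_dim: int = 64,
                     min_heads: int = 8, max_heads: int = 16,
                     min_ffn_mult: float = 2.0, max_ffn_mult: float = 4.0):
    """Return a per-layer (num_heads, ffn_dim) plan; every constant here is an assumption."""
    plan = []
    for i in range(num_layers):
        t = i / max(num_layers - 1, 1)              # 0.0 at the first layer, 1.0 at the last
        heads = round(min_heads + t * (max_heads - min_heads))
        ffn_mult = min_ffn_mult + t * (max_ffn_mult - min_ffn_mult)
        plan.append({"layer": i,
                     "num_heads": heads,
                     "attn_dim": heads * head_dim,
                     "ffn_dim": int(ffn_mult * d_model)})
    return plan

if __name__ == "__main__":
    for layer in layerwise_config(num_layers=4, d_model=1024):
        print(layer)
```

The net effect is that the parameter budget is spread unevenly across the stack, giving later layers more capacity, which is broadly the allocation strategy Apple credits for the accuracy-per-parameter gains.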

“Our pre-training dataset contains RefinedWeb, deduplicated PILE, a subset of RedPajama, and a subset of Dolma v1.6, totaling approximately 1.8 trillion tokens,” the company notes on Hugging Face.

Respectable, but not bleeding-edge, performance

In terms of performance, the OpenELM results shared by Apple show that the models perform fairly well, especially the 450 million-parameter instruct variant.


In addition, the 1.1 billion-parameter OpenELM variant “outperforms OLMo, which has 1.2 billion parameters, by 2.36% while requiring 2× fewer pre-training tokens.” OLMo is the Allen Institute for AI (AI2)’s recently released “truly open-source, state-of-the-art large language model.”

On the ARC-C benchmark, designed to test knowledge and reasoning skills, the pre-trained OpenELM-3B variant scored an accuracy of 42.24%. On MMLU and HellaSwag, it scored 26.76% and 73.28%, respectively.

One user who has started testing the model family pointed out that it appears to be a “solid model but very aligned,” meaning its responses are neither wildly creative nor likely to venture into NSFW territory.

Rival Microsoft’s recently introduced Phi-3 Mini, with 3.8 billion parameters and a 4k context length, currently leads in this space.

According to recently shared stats, it scored 84.9% on the 10-shot ARC-C benchmark, 68.8% on the 5-shot MMLU and 76.7% on the 5-shot HellaSwag.

In the long term, OpenELM is expected to improve. It will be interesting to see how the community, which is already excited by Apple’s open-source move, puts it to work across different applications.