LFM2-8B-A1B

LFM2-8B-A1B is Liquid AI's Mixture-of-Experts model, combining 8B total parameters with only 1.5B active parameters per forward pass. This delivers the quality of larger models at the speed and memory footprint of smaller ones, making it well suited for on-device deployment.
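The MoE idea behind the "8B total, 1.5B active" split can be sketched in a few lines: a router scores every expert for each token, but only the top-k experts actually execute. This is an illustrative toy, not LFM2's real architecture; the expert count, router scores, and k value below are invented for the example.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]

calls = {"n": 0}  # count how many experts actually execute

def make_expert(scale):
    # Stand-in for an expert MLP: here just a scalar multiply.
    def expert(x):
        calls["n"] += 1
        return scale * x
    return expert

def moe_forward(x, experts, router_scores, k=2):
    """Run only the top-k scored experts and mix their outputs."""
    topk = sorted(range(len(experts)), key=router_scores.__getitem__, reverse=True)[:k]
    weights = softmax([router_scores[i] for i in topk])
    return sum(w * experts[i](x) for w, i in zip(weights, topk))

experts = [make_expert(s) for s in range(1, 9)]     # 8 toy "experts"
scores = [0.1, 0.9, 0.2, 0.8, 0.0, 0.3, 0.4, 0.5]  # router output for one token
out = moe_forward(10.0, experts, scores, k=2)      # only 2 of the 8 experts run
```

Scaled up, this is why an 8B-parameter MoE can cost roughly as much compute per token as a much smaller dense model: most parameters sit idle on any given forward pass.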

Specifications

Property        Value
Parameters      8B total (1.5B active)
Context Length  32K tokens
Architecture    LFM2 (MoE)

MoE Efficiency: 8B-class quality at 1.5B inference cost.

On-Device: runs on phones and laptops.

Tool Calling: native function calling support.

Quick Start

Install:
pip install "transformers>=5.0.0" torch accelerate
Download & Run:
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LiquidAI/LFM2-8B-A1B"

# Load the weights; device_map="auto" places them on the available GPU/CPU.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    dtype="bfloat16",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Format the conversation with the model's chat template and tokenize it.
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": "What is machine learning?"}],
    add_generation_prompt=True,
    return_tensors="pt",
    tokenize=True,
).to(model.device)

output = model.generate(input_ids, max_new_tokens=512)

# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(output[0][len(input_ids[0]):], skip_special_tokens=True)
print(response)
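Since the model advertises native tool calling, here is a hedged sketch of how a tool definition is commonly passed to `apply_chat_template` in Transformers, which accepts JSON-schema function definitions via its `tools` argument. The `get_weather` tool below is a made-up example; check the model card for LFM2's exact tool-calling format and output conventions.

```python
# A JSON-schema tool definition in the common OpenAI-style format that
# Transformers chat templates accept. "get_weather" is a hypothetical tool.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

# Usage (with the tokenizer from the quick start above):
# input_ids = tokenizer.apply_chat_template(
#     [{"role": "user", "content": "What's the weather in Zurich?"}],
#     tools=[weather_tool],
#     add_generation_prompt=True,
#     return_tensors="pt",
#     tokenize=True,
# ).to(model.device)
```

The model then emits a structured tool call when it decides the tool is needed; your code executes the function and appends the result as a `tool`-role message before generating again.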