If you are familiar with cloud-based AI APIs (e.g. the OpenAI API), this document shows the similarities and differences between those cloud APIs and Leap. We will inspect the following Python OpenAI API chat completion request to figure out how to migrate it to LeapSDK. The example code is adapted from the OpenAI API documentation.
from openai import OpenAI
client = OpenAI()

stream = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {
            "role": "user",
            "content": "Say 'double bubble bath' ten times fast.",
        },
    ],
    stream=True,
)

for chunk in stream:
    if chunk.choices:
        delta_content = chunk.choices[0].delta.content
        if delta_content:
            print(delta_content, end="", flush=True)

print("")
print("Generation done!")

Loading the model

While models can be used directly through a cloud-based API once the API client is created, LeapSDK requires developers to explicitly load the model before requesting generation. This is necessary because the model runs locally. This step generally takes a few seconds depending on the model size and the device performance. With a cloud API, you only need to create an API client:
client = OpenAI()
In LeapSDK, you load the model to create a model runner:
let modelRunner = try await Leap.load(
    model: "LFM2.5-1.2B-Instruct",
    quantization: "Q4_K_M"
)
The SDK will automatically download the model if needed or use a cached copy. The return value is a “model runner”, which plays a role similar to the client object in the cloud API except that it holds the model weights. If the model runner is released, the app has to reload the model before requesting new generations.
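The load-on-demand pattern implied above can be sketched as follows. This is our own illustration, not a LeapSDK-prescribed pattern: only Leap.load comes from the snippet above, while the ModelManager class and its method names are hypothetical.

```swift
// Sketch only: holding the runner in an optional lets us release and reload it.
final class ModelManager {
    private var modelRunner: ModelRunner?

    func runner() async throws -> ModelRunner {
        // Reuse the loaded model, or reload it if it was released.
        if let modelRunner { return modelRunner }
        let loaded = try await Leap.load(
            model: "LFM2.5-1.2B-Instruct",
            quantization: "Q4_K_M"
        )
        modelRunner = loaded
        return loaded
    }

    func releaseModel() {
        // Dropping the last reference lets ARC free the model weights;
        // the next call to runner() will reload the model.
        modelRunner = nil
    }
}
```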

Requesting a generation

In the cloud API, client.chat.completions.create returns a stream object from which the caller fetches the generated content.
stream = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {
            "role": "user",
            "content": "Say 'double bubble bath' ten times fast.",
        },
    ],
    stream=True,
)
In LeapSDK iOS, we call generateResponse on the conversation object to obtain a Swift AsyncThrowingStream (the counterpart of the Python stream). Since the model runner object carries all information about the model, we do not need to specify the model name again in the call.
let conversation = modelRunner.createConversation()
let stream = conversation.generateResponse(
    message: ChatMessage(
        role: .user,
        content: [.text("Say 'double bubble bath' ten times fast.")]
    )
)

// This simplified call has exactly the same effect as the call above
let stream = conversation.generateResponse(userTextMessage: "Say 'double bubble bath' ten times fast.")

Processing generated content

In the cloud API Python code, a for-loop on the stream object retrieves the content.
for chunk in stream:
    if chunk.choices:
        delta_content = chunk.choices[0].delta.content
        if delta_content:
            print(delta_content, end="", flush=True)

print("")
print("Generation done!")
In LeapSDK, we use a for try await loop on the Swift AsyncThrowingStream to process the content. When generation finishes, a MessageResponse.complete case is received.
for try await response in stream {
    switch response {
    case .chunk(let text):
        print(text, terminator: "")
    case .reasoningChunk:
        // Handle reasoning content if needed
        break
    case .complete(let completion):
        print("")
        print("Generation done!")
        if let stats = completion.stats {
            print("Tokens: \(stats.totalTokens), Speed: \(stats.tokenPerSecond) tok/s")
        }
    default:
        break
    }
}
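Because the stream is throwing, a generation error surfaces as a thrown error inside the loop. A minimal sketch of wrapping the same loop in standard Swift error handling (nothing here beyond do/catch; the cases are the ones shown above):

```swift
do {
    for try await response in stream {
        switch response {
        case .chunk(let text):
            print(text, terminator: "")
        case .complete:
            print("\nGeneration done!")
        default:
            break
        }
    }
} catch {
    // A thrown error ends the stream; handle or surface it here.
    print("Generation failed: \(error)")
}
```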

Task and async/await

The LeapSDK API is similar to cloud-based APIs. It is worth noting that most LeapSDK iOS APIs are based on Swift async/await, so you need an async context to execute these functions. On iOS, we recommend using Task, or async functions attached to lifecycle-aware components within SwiftUI views.
@MainActor
final class ChatViewModel: ObservableObject {
    @Published var currentResponse = ""
    private var modelRunner: ModelRunner?
    private var conversation: Conversation?

    func loadModel() async {
        do {
            modelRunner = try await Leap.load(
                model: "LFM2.5-1.2B-Instruct",
                quantization: "Q4_K_M"
            )
            conversation = modelRunner?.createConversation()
        } catch {
            print("Failed to load model: \(error)")
        }
    }

    func sendMessage(_ text: String) {
        guard let conversation else { return }

        Task {
            do {
                for try await response in conversation.generateResponse(
                    message: ChatMessage(role: .user, content: [.text(text)])
                ) {
                    switch response {
                    case .chunk(let chunk):
                        currentResponse += chunk
                    case .complete(let completion):
                        print("Generation done!")
                        if let stats = completion.stats {
                            print("Tokens: \(stats.totalTokens)")
                        }
                        currentResponse = ""
                    default:
                        break
                    }
                }
            } catch {
                print("Generation error: \(error)")
            }
        }
    }
}
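To tie model loading to a view's lifecycle, a minimal SwiftUI sketch could look like the following. The view itself is our illustration (ChatView and its layout are hypothetical); ChatViewModel is the class defined above, and .task is the standard SwiftUI lifecycle-aware modifier.

```swift
import SwiftUI

struct ChatView: View {
    @StateObject private var viewModel = ChatViewModel()
    @State private var input = ""

    var body: some View {
        VStack {
            Text(viewModel.currentResponse)
            TextField("Message", text: $input)
                .onSubmit {
                    viewModel.sendMessage(input)
                    input = ""
                }
        }
        // .task starts when the view appears and is cancelled when it disappears,
        // giving loadModel() an async context tied to the view's lifecycle.
        .task {
            await viewModel.loadModel()
        }
    }
}
```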

Next steps

For more information, please refer to the quick start guide.