ChatMessage

Data class that is compatible with the message object in OpenAI chat completion API.
data class ChatMessage(
  val role: Role,
  val content: List<ChatMessageContent>,
  val reasoningContent: String? = null,
  val functionCalls: List<LeapFunctionCall>? = null,
) {
  fun toJSONObject(): JSONObject
}

ChatMessage.fromJSONObject(obj: JSONObject): ChatMessage

Fields

  • role: The role of this message (see ChatMessage.Role).
  • content: A list of message contents. Each element is an instance of ChatMessageContent.
  • reasoningContent: Reasoning content generated by reasoning models. Only messages produced by reasoning models carry this field; for other models or other roles, it is null.
  • functionCalls: Function call requests generated by the model. See Function Calling guide for more details.
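A minimal sketch of constructing a message from these fields, using only the constructor shown above (the message text is illustrative):
val userMessage = ChatMessage(
    role = ChatMessage.Role.USER,
    content = listOf(ChatMessageContent.Text("Hello! What can you do?"))
)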

toJSONObject

Returns a JSONObject that represents the chat message. The returned object is compatible with ChatCompletionRequestMessage from the OpenAI API. It contains two fields: role and content. See also: Serialization Support.
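A short usage sketch, reusing the userMessage built above:
val json = userMessage.toJSONObject()
// Produces an OpenAI-compatible object, e.g. {"role":"user","content":[...]}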

fromJSONObject

Constructs a ChatMessage instance from a JSONObject. Not all JSON object variants of ChatCompletionRequestMessage in the OpenAI API are accepted. As of now, role supports user, system, and assistant; content can be a string or an array.
LeapSerializationException will be thrown if the provided JSONObject cannot be recognized as a message.
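A sketch of parsing an OpenAI-style message, with the documented failure mode handled:
import org.json.JSONObject

val obj = JSONObject("""{"role": "user", "content": "Hi there"}""")
try {
    val parsed = ChatMessage.fromJSONObject(obj)
} catch (e: LeapSerializationException) {
    // Thrown when the JSON cannot be recognized as a message
}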
See also: Serialization Support.

ChatMessage.Role

Roles of the chat messages, following the OpenAI API definition. It is an enum with the following values:
enum class Role(val type: String) {
  SYSTEM("system"),
  USER("user"),
  ASSISTANT("assistant"),
}
  • SYSTEM: Indicates the associated content is part of the system prompt. It is generally the first message, to provide guidance on how the model should behave.
  • USER: Indicates the associated content is user input.
  • ASSISTANT: Indicates the associated content is model-generated output.
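A sketch of a typical conversation history using all three roles (the message texts are illustrative):
val history = listOf(
    ChatMessage(ChatMessage.Role.SYSTEM, listOf(ChatMessageContent.Text("You are a helpful assistant."))),
    ChatMessage(ChatMessage.Role.USER, listOf(ChatMessageContent.Text("What is WAV audio?"))),
    ChatMessage(ChatMessage.Role.ASSISTANT, listOf(ChatMessageContent.Text("WAV is an uncompressed audio container format.")))
)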

ChatMessageContent

Data class that is compatible with the content object in OpenAI chat completion API. It is a sealed interface.
sealed interface ChatMessageContent {
  fun clone(): ChatMessageContent
  fun toJSONObject(): JSONObject
}
fun ChatMessageContent.fromJSONObject(obj: JSONObject): ChatMessageContent

data class ChatMessageContent.Text(val text: String): ChatMessageContent
data class ChatMessageContent.Image(val jpegByteArray: ByteArray): ChatMessageContent
data class ChatMessageContent.Audio(val wavByteArray: ByteArray): ChatMessageContent
  • toJSONObject returns an OpenAI API compatible content object (with a type field and the actual content fields).
  • fromJSONObject receives an OpenAI API compatible content object to build a message content. Not all OpenAI content objects are accepted.
Currently, only the following content types are supported:
  • Text: pure text content.
  • Image: JPEG-encoded image content.
  • Audio: WAV-encoded audio content.
LeapSerializationException will be thrown if the provided JSONObject cannot be recognized as a message content.
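A sketch of round-tripping text content through JSON (the call site for fromJSONObject follows the extension-function signature shown above):
import org.json.JSONObject

val content = ChatMessageContent.Text("Hello")
val obj = content.toJSONObject() // e.g. {"type": "text", "text": "Hello"}
val restored = content.fromJSONObject(obj)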

ChatMessageContent.Text

data class ChatMessageContent.Text(val text: String): ChatMessageContent
Pure text content. The content is available in the text field.

ChatMessageContent.Image

data class ChatMessageContent.Image(val jpegByteArray: ByteArray): ChatMessageContent {
  companion object {
    suspend fun fromBitmap(
      bitmap: android.graphics.Bitmap,
      compressionQuality: Int = 85,
    ): ChatMessageContent.Image
  }
}
Image content. Currently, only JPEG-encoded data is supported. The fromBitmap helper function creates a ChatMessageContent.Image from an Android Bitmap object; note that the image will be compressed with the given compressionQuality.
Only the models with vision capabilities can process image content. Sending image content to other models may result in unexpected outputs or errors.
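A sketch of building image content from a file on disk (the file path is illustrative; fromBitmap is a suspend function, so it must be called from a coroutine):
import android.graphics.BitmapFactory

suspend fun loadImageContent(): ChatMessageContent.Image {
    val bitmap = BitmapFactory.decodeFile("/path/to/photo.jpg")
    return ChatMessageContent.Image.fromBitmap(bitmap, compressionQuality = 85)
}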

ChatMessageContent.Audio

data class Audio(val wavByteArray: ByteArray) : ChatMessageContent
Audio content for speech recognition and audio understanding. The inference engine requires WAV-encoded audio with specific format requirements.

Audio Format Requirements

The LEAP inference engine expects WAV files with the following specifications:
  • Format: WAV (RIFF). Only WAV format is supported.
  • Sample Rate: 16000 Hz (16 kHz) recommended. Other sample rates are automatically resampled to 16 kHz.
  • Encoding: PCM (various bit depths). Supports Float32, Int16, Int24, Int32.
  • Channels: Mono (1 channel), required. Stereo audio will be rejected.
  • Byte Order: Little-endian (standard WAV format).
Supported PCM Encodings:
  • Float32: 32-bit floating point, normalized to [-1.0, 1.0]
  • Int16: 16-bit signed integer, range [-32768, 32767] (recommended)
  • Int24: 24-bit signed integer, range [-8388608, 8388607]
  • Int32: 32-bit signed integer, range [-2147483648, 2147483647]
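As an illustration of the little-endian Int16 encoding above, a sketch converting raw Int16 PCM bytes to normalized Float32 samples:
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Convert little-endian Int16 PCM bytes to Float32 samples in [-1.0, 1.0].
fun int16PcmToFloats(pcm: ByteArray): FloatArray {
    val buf = ByteBuffer.wrap(pcm).order(ByteOrder.LITTLE_ENDIAN)
    return FloatArray(pcm.size / 2) { buf.short / 32768f }
}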
The inference engine only accepts WAV format. MP3, AAC, OGG, or other compressed formats are not supported and will cause errors. Audio must be converted to WAV before sending to the model.
Automatic Resampling: The inference engine automatically resamples audio to 16 kHz if provided at a different sample rate. However, for best performance and quality, provide audio at 16 kHz to avoid resampling overhead.
Mono Channel Required: The inference engine strictly requires single-channel (mono) audio. Multi-channel or stereo WAV files will be rejected with an error. Convert stereo audio to mono before sending.
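Since only mono input is accepted, one way to downmix is averaging the two channels of interleaved stereo samples before WAV encoding (a sketch over raw float samples, not WAV files):
// Downmix interleaved stereo samples (L, R, L, R, ...) to mono by averaging.
fun stereoToMono(interleaved: FloatArray): FloatArray =
    FloatArray(interleaved.size / 2) { i ->
        (interleaved[2 * i] + interleaved[2 * i + 1]) / 2f
    }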

Creating Audio Content from WAV Files

From a WAV file:
import java.io.File

val audioFile = File("/path/to/audio.wav")
val wavBytes = audioFile.readBytes()
val audioContent = ChatMessageContent.Audio(wavBytes)

val message = ChatMessage(
    role = ChatMessage.Role.USER,
    content = listOf(
        ChatMessageContent.Text("What is being said in this audio?"),
        audioContent
    )
)

Creating Audio Content from Raw PCM Samples

If you're recording audio or have raw PCM data, use the FloatAudioBuffer utility to create properly formatted WAV files:
import ai.liquid.leap.audio.FloatAudioBuffer

// Collect audio samples (32-bit float PCM, normalized to -1.0 to 1.0)
val audioBuffer = FloatAudioBuffer(sampleRate = 16000)

// Add audio chunks as they arrive
audioBuffer.add(floatArrayOf(0.1f, 0.2f, 0.15f, ...))
audioBuffer.add(floatArrayOf(0.3f, 0.25f, ...))

// Create WAV-encoded bytes
val wavBytes = audioBuffer.createWavBytes()
val audioContent = ChatMessageContent.Audio(wavBytes)
FloatAudioBuffer automatically creates a valid WAV header and encodes the samples as 32-bit float PCM in a WAV container, which is compatible with the inference engine.

Recording Audio on Android

When recording audio from the device microphone, configure AudioRecord or use a library like WaveRecorder with the correct settings:
import com.github.squti.androidwaverecorder.WaveRecorder

val waveRecorder = WaveRecorder(outputFilePath)
waveRecorder.configureWaveSettings {
    sampleRate = 16000                                      // 16 kHz
    channels = android.media.AudioFormat.CHANNEL_IN_MONO    // Mono
    audioEncoding = android.media.AudioFormat.ENCODING_PCM_16BIT  // 16-bit PCM
}

waveRecorder.startRecording()
// ... wait for user to finish speaking ...
waveRecorder.stopRecording()

// Read the WAV file
val audioFile = File(outputFilePath)
val wavBytes = audioFile.readBytes()
val audioContent = ChatMessageContent.Audio(wavBytes)
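Alternatively, a sketch using the platform AudioRecord API directly, feeding samples into FloatAudioBuffer (assumes the RECORD_AUDIO permission has been granted; the duration-based loop is illustrative):
import android.media.AudioFormat
import android.media.AudioRecord
import android.media.MediaRecorder
import ai.liquid.leap.audio.FloatAudioBuffer

fun recordWavBytes(durationMs: Long): ByteArray {
    val sampleRate = 16000
    val minBuf = AudioRecord.getMinBufferSize(
        sampleRate, AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_FLOAT
    )
    val recorder = AudioRecord(
        MediaRecorder.AudioSource.MIC, sampleRate,
        AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_FLOAT, minBuf
    )
    val audioBuffer = FloatAudioBuffer(sampleRate = sampleRate)
    val chunk = FloatArray(minBuf / 4) // minBuf is in bytes; 4 bytes per float
    recorder.startRecording()
    val endAt = System.currentTimeMillis() + durationMs
    while (System.currentTimeMillis() < endAt) {
        val read = recorder.read(chunk, 0, chunk.size, AudioRecord.READ_BLOCKING)
        if (read > 0) audioBuffer.add(chunk.copyOf(read))
    }
    recorder.stop()
    recorder.release()
    return audioBuffer.createWavBytes()
}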

Audio Duration Considerations

  • Minimum duration: At least 1 second of audio is recommended for reliable speech recognition
  • Maximum duration: Limited by the model's context window (typically several minutes)
  • Silence: Trim excessive silence from the beginning and end for better results (see the sketch below)
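A minimal sketch of amplitude-threshold trimming on raw float samples (the 0.01 threshold is an illustrative choice, not a tuned value):
import kotlin.math.abs

// Drop leading and trailing samples whose amplitude stays below the threshold.
fun trimSilence(samples: FloatArray, threshold: Float = 0.01f): FloatArray {
    val first = samples.indexOfFirst { abs(it) > threshold }
    if (first == -1) return FloatArray(0) // the clip is entirely silence
    val last = samples.indexOfLast { abs(it) > threshold }
    return samples.copyOfRange(first, last + 1)
}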

Audio Output from Models

When generating audio responses (e.g., with LFM2.5-Audio-1.5B), the model outputs audio at 24 kHz sample rate:
conversation.generateResponse(userMessage)
    .onEach { response ->
        when (response) {
            is MessageResponse.AudioSample -> {
                // samples: FloatArray (32-bit float PCM)
                // sampleRate: Int (typically 24000 Hz for audio generation models)
                val samples = response.samples
                val sampleRate = response.sampleRate

                // Accumulate or play audio samples
                audioBuffer.add(samples)
            }
        }
    }
    .collect()
Note: Audio input should be 16 kHz, but audio output from generation models is typically 24 kHz. Make sure your audio playback code supports the correct sample rate.
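As a playback sketch, an android.media.AudioTrack can be configured for the response's sample rate and float PCM encoding (this assumes mono output samples; the buffer sizing is a simple heuristic, not a tuned value):
import android.media.AudioAttributes
import android.media.AudioFormat
import android.media.AudioTrack

fun createPlaybackTrack(sampleRate: Int): AudioTrack {
    val minBuf = AudioTrack.getMinBufferSize(
        sampleRate, AudioFormat.CHANNEL_OUT_MONO, AudioFormat.ENCODING_PCM_FLOAT
    )
    return AudioTrack.Builder()
        .setAudioAttributes(
            AudioAttributes.Builder()
                .setUsage(AudioAttributes.USAGE_MEDIA)
                .setContentType(AudioAttributes.CONTENT_TYPE_SPEECH)
                .build()
        )
        .setAudioFormat(
            AudioFormat.Builder()
                .setEncoding(AudioFormat.ENCODING_PCM_FLOAT)
                .setSampleRate(sampleRate)
                .setChannelMask(AudioFormat.CHANNEL_OUT_MONO)
                .build()
        )
        .setBufferSizeInBytes(minBuf * 2)
        .setTransferMode(AudioTrack.MODE_STREAM)
        .build()
}

// Usage inside the collector above:
// val track = createPlaybackTrack(response.sampleRate).also { it.play() }
// track.write(response.samples, 0, response.samples.size, AudioTrack.WRITE_BLOCKING)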