Speech to text

Transcribe meetings — with who said what

Whisper-large with speaker separation on our own hardware in Sweden. Upload audio, get timestamped text split by speaker. OpenAI-compatible audio endpoint.

Speaker separation

WhisperX with diarization tells the speakers apart and splits the transcript per speaker with timestamps. Ideal for meetings and interviews.
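As an illustration of working with a speaker-separated transcript, here is a minimal Python sketch that merges consecutive segments from the same speaker into readable turns. The segment shape (a list of dicts with `speaker`, `start`, `end`, and `text` keys) and the `SPEAKER_00` style labels are assumptions modeled on typical diarization output, not the documented response format of this service. The function names are hypothetical.

```python
from itertools import groupby


def format_timestamp(seconds):
    """Render a time offset in seconds as HH:MM:SS."""
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def merge_speaker_turns(segments):
    """Merge consecutive segments spoken by the same speaker into one turn.

    Assumption: each segment is a dict with "speaker", "start", "end"
    and "text" keys, ordered by start time.
    """
    turns = []
    for speaker, group in groupby(segments, key=lambda s: s["speaker"]):
        group = list(group)
        turns.append({
            "speaker": speaker,
            "start": group[0]["start"],
            "end": group[-1]["end"],
            "text": " ".join(s["text"].strip() for s in group),
        })
    return turns


def render_transcript(turns):
    """Format turns as one '[HH:MM:SS] speaker: text' line each."""
    return "\n".join(
        f"[{format_timestamp(t['start'])}] {t['speaker']}: {t['text']}"
        for t in turns
    )
```

Calling `render_transcript(merge_speaker_turns(segments))` on a diarized response would give one line per speaker turn, which is usually easier to read as meeting minutes than the raw segment list.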

OpenAI-compatible audio

Call /v1/audio/transcriptions the same way you call OpenAI's Whisper API. Change base_url and keep the rest of your code.
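For readers who want to see what such a call looks like on the wire, here is a minimal Python sketch using only the standard library. It builds the multipart/form-data upload that an OpenAI-style /v1/audio/transcriptions endpoint expects. The host in `BASE_URL`, the model name `whisper-large`, and the bearer-token authentication are placeholders and assumptions; this page does not state them, so check the service's own documentation for the real values.

```python
import json
import os
import urllib.request
import uuid

# Assumption: placeholder host. The real API base URL is not given on this page.
BASE_URL = "https://api.example.invalid/v1"


def build_transcription_request(filename, audio_bytes, api_key,
                                model="whisper-large", base_url=BASE_URL):
    """Build (but do not send) a multipart POST to {base_url}/audio/transcriptions.

    Assumptions: the model name and Bearer authentication follow the
    OpenAI convention; both are hypothetical for this service.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/audio/transcriptions",
        data=head + audio_bytes + tail,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def transcribe(path, api_key, base_url=BASE_URL):
    """Upload the audio file at `path` and return the parsed JSON response."""
    with open(path, "rb") as audio:
        request = build_transcription_request(
            os.path.basename(path), audio.read(), api_key, base_url=base_url
        )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

If you already use the official OpenAI client library, the equivalent change should be limited to passing a different base_url when constructing the client; the sketch above only makes the underlying request explicit.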

Audio stays in Sweden

All transcription runs on our own GPUs in Sweden. No files are sent to third parties and nothing is stored after delivery.

FAQ

Which formats are supported?

Common audio formats such as mp3, wav, and m4a are accepted through the OpenAI-compatible transcription endpoint.

Can I see who said what?

Yes. With speaker separation (diarization), the transcript is split per speaker with timestamps.

Where is the audio processed?

On our own hardware in Sweden. The audio never leaves the country and is not stored after the transcript is delivered.

Transcribe your first meeting

Upload an audio file and get text per speaker.

Open voice.staik.se