Quick Start Guide

This guide will get you up and running with PyKokoro in just a few minutes.

First Steps

  1. Install PyKokoro

    pip install pykokoro
    
  2. Verify Installation

    import pykokoro
    print(pykokoro.__version__)
    

Basic Usage

Generate Your First Audio

import soundfile as sf
from pykokoro import GenerationConfig, KokoroPipeline, PipelineConfig

config = PipelineConfig(
    voice="af_bella",  # American Female voice
    generation=GenerationConfig(lang="en-us"),
)
pipe = KokoroPipeline(config)
result = pipe.run("Hello! Welcome to PyKokoro text-to-speech.")

# Save to a WAV file
sf.write("hello.wav", result.audio, result.sample_rate)

That’s it! You’ve generated your first audio file.

Choosing a Voice

PyKokoro comes with 54 voices (v1.0) or 103 voices (v1.1-zh). Here are some popular ones:

from pykokoro import GenerationConfig, KokoroPipeline, PipelineConfig

# American English
pipe = KokoroPipeline(PipelineConfig(voice="af_bella"))
audio1 = pipe.run("American female voice").audio
pipe = KokoroPipeline(PipelineConfig(voice="am_adam"))
audio2 = pipe.run("American male voice").audio

# British English
pipe = KokoroPipeline(PipelineConfig(voice="bf_emma"))
audio3 = pipe.run("British female voice").audio
pipe = KokoroPipeline(PipelineConfig(voice="bm_george"))
audio4 = pipe.run("British male voice").audio

# Other languages
pipe = KokoroPipeline(
    PipelineConfig(
        voice="af_nicole",
        generation=GenerationConfig(lang="es"),
    )
)
audio5 = pipe.run("Hola, mundo").audio
pipe = KokoroPipeline(
    PipelineConfig(
        voice="af_sarah",
        generation=GenerationConfig(lang="fr"),
    )
)
audio6 = pipe.run("Bonjour le monde").audio

To see all available voices, check the README or use the voice listing examples in examples/voices.py.

Adjusting Speech Speed

Control how fast or slow the speech is:

from pykokoro import GenerationConfig, KokoroPipeline, PipelineConfig

# Normal speed (default)
pipe = KokoroPipeline(
    PipelineConfig(voice="af_bella", generation=GenerationConfig(speed=1.0))
)
audio1 = pipe.run("Normal speed").audio

# Slower (0.5x)
pipe = KokoroPipeline(
    PipelineConfig(voice="af_bella", generation=GenerationConfig(speed=0.5))
)
audio2 = pipe.run("Slower speech").audio

# Faster (1.5x)
pipe = KokoroPipeline(
    PipelineConfig(voice="af_bella", generation=GenerationConfig(speed=1.5))
)
audio3 = pipe.run("Faster speech").audio

Adding Pauses

Add natural pauses in your speech using SSMD break syntax:

from pykokoro import KokoroPipeline, PipelineConfig

text = """
Welcome to the tutorial ...c
This is a short pause ...s
And this is a longer pause ...p
These pauses make speech sound more natural.
"""

pipe = KokoroPipeline(PipelineConfig(voice="af_bella"))
result = pipe.run(text)

import soundfile as sf
sf.write("with_pauses.wav", result.audio, result.sample_rate)

Pause syntax (SSMD breaks): * ...c - Short/comma pause (0.3 seconds, default) * ...s - Medium/sentence pause (0.6 seconds, default) * ...p - Long/paragraph pause (1.0 seconds, default) * ...500ms - Custom duration pause (e.g., 500 milliseconds)

Reusing the Pipeline

Reuse a pipeline instance for multiple runs:

from pykokoro import KokoroPipeline, PipelineConfig

pipe = KokoroPipeline(PipelineConfig(voice="af_bella"))
for sentence in ["Hello", "How are you?", "Goodbye!"]:
    result = pipe.run(sentence)
    print(result.audio.shape)

Choosing spaCy Model Size (Auto)

If you use spaCy-based tokenization/splitting, you can keep language-aware auto selection and choose only the model size:

from pykokoro import (
    GenerationConfig,
    KokoroPipeline,
    PipelineConfig,
    with_spacy_model_size,
)

base = PipelineConfig(
    voice="af_bella",
    generation=GenerationConfig(lang="fr-fr"),
)
cfg = with_spacy_model_size(base, size="md")
pipe = KokoroPipeline(cfg)
result = pipe.run("Bonjour")

Processing Long Text

For long text, PyKokoro automatically handles splitting at natural boundaries:

from pykokoro import GenerationConfig, KokoroPipeline, PipelineConfig

long_text = """
This is a long passage of text. It has multiple sentences.
The text will be processed intelligently.

This is a new paragraph. It will be processed efficiently.
"""

pipe = KokoroPipeline(PipelineConfig(voice="af_bella"))
result = pipe.run(long_text)

# Or let PyKokoro insert boundary pauses
auto_pipe = KokoroPipeline(
    PipelineConfig(
        voice="af_bella",
        generation=GenerationConfig(pause_mode="auto"),
    )
)
auto_result = auto_pipe.run(long_text)

import soundfile as sf
sf.write("long_text.wav", auto_result.audio, auto_result.sample_rate)

Complete Example

Here’s a complete example putting it all together:

from pykokoro import GenerationConfig, KokoroPipeline, PipelineConfig
import soundfile as sf

def text_to_speech(text, output_file, voice="af_bella", speed=1.0):
    """Convert text to speech and save to file."""
    config = PipelineConfig(
        voice=voice,
        generation=GenerationConfig(speed=speed),
    )
    pipe = KokoroPipeline(config)
    result = pipe.run(text)
    sf.write(output_file, result.audio, result.sample_rate)
    print(f"Saved audio to {output_file}")

# Example usage
text = """
Welcome to PyKokoro! ...s

This library makes text-to-speech generation simple ...c
You can control voice, speed, and add natural pauses ...s

Enjoy creating audio content!
"""

text_to_speech(text, "welcome.wav", voice="af_bella", speed=1.0)

Next Steps

Now that you know the basics, explore: