Basic Usage
This guide covers the fundamental usage patterns of PyKokoro.
Note
PyKokoro uses KokoroPipeline as the supported API. The pipeline wraps all
stages (document parsing, splitting, G2P, and synthesis) behind one call.
Initializing the Pipeline
The main entry point is the KokoroPipeline class:
from pykokoro import GenerationConfig, KokoroPipeline, PipelineConfig
# Initialize with defaults (HuggingFace v1.0)
pipe = KokoroPipeline(PipelineConfig(voice="af_bella"))
# Specify model source and variant
pipe = KokoroPipeline(
    PipelineConfig(
        voice="af_bella",
        model_source="huggingface",
        model_variant="v1.0",
    )
)
# GitHub source
pipe = KokoroPipeline(
    PipelineConfig(
        voice="af_bella",
        model_source="github",
        model_variant="v1.0",
    )
)
# Custom generation settings
generation = GenerationConfig(lang="en-us", speed=1.1)
pipe = KokoroPipeline(PipelineConfig(voice="af_bella", generation=generation))
Reusing the Pipeline
Create a pipeline once and reuse it across runs:
from pykokoro import KokoroPipeline, PipelineConfig
pipe = KokoroPipeline(PipelineConfig(voice="af_bella"))
result = pipe.run("Hello!")
print(result.sample_rate)
Using Local Model Files
If you already have the ONNX model and voices files locally, pass their paths
through PipelineConfig:
from pathlib import Path
from pykokoro import GenerationConfig, KokoroPipeline, PipelineConfig
config = PipelineConfig(
    voice="af_bella",
    generation=GenerationConfig(lang="en-us"),
    model_path=Path("/models/kokoro.onnx"),
    voices_path=Path("/models/voices.bin.npz"),
)
pipe = KokoroPipeline(config)
result = pipe.run("Using local model files.")
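A wrong path is easier to diagnose if it fails early with a clear message. The following is a minimal sketch using only pathlib; check_model_files is a hypothetical helper, not part of PyKokoro, and it only checks that the files exist, not that they are valid model or voice files.

```python
from pathlib import Path


def check_model_files(model_path: Path, voices_path: Path) -> None:
    """Raise FileNotFoundError if either local model file is missing.

    Hypothetical helper (not a PyKokoro API): it checks existence only,
    not whether the files are valid ONNX or voice archives.
    """
    missing = [str(p) for p in (model_path, voices_path) if not p.is_file()]
    if missing:
        raise FileNotFoundError(f"Missing model file(s): {', '.join(missing)}")


# Usage: call before building PipelineConfig with the same paths.
# check_model_files(Path("/models/kokoro.onnx"), Path("/models/voices.bin.npz"))
```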
Model Quality Options
Available quality options vary by model source and variant:
HuggingFace (Default Source):
Both v1.0 and v1.1-zh variants support:
- fp32: Full precision (highest quality, largest size)
- fp16: Half precision (good balance)
- q8: 8-bit quantized (default, good balance)
- q8f16: 8-bit with fp16
- q4: 4-bit quantized (smallest, faster)
- q4f16: 4-bit with fp16
- uint8: Unsigned 8-bit
- uint8f16: Unsigned 8-bit with fp16
GitHub v1.0:
- fp32: Full precision
- fp16: Half precision
- fp16-gpu: GPU-optimized fp16
- q8: 8-bit quantized
GitHub v1.1-zh:
- fp32: Full precision only
from pykokoro import KokoroPipeline, PipelineConfig
# HuggingFace v1.0 with fp16
pipe = KokoroPipeline(PipelineConfig(voice="af_bella", model_quality="fp16"))
# GitHub v1.0 with GPU optimization
pipe = KokoroPipeline(
    PipelineConfig(
        voice="af_bella",
        model_source="github",
        model_variant="v1.0",
        model_quality="fp16-gpu",
    )
)
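Because the valid values differ by source and variant, a lookup table can catch a mismatched model_quality before any download starts. A minimal sketch: the table is transcribed from the lists above and may fall out of date, and validate_quality is a hypothetical helper, not a PyKokoro API.

```python
# Quality options per (model_source, model_variant), copied from the lists above.
_HF_QUALITIES = frozenset(
    {"fp32", "fp16", "q8", "q8f16", "q4", "q4f16", "uint8", "uint8f16"}
)

SUPPORTED_QUALITIES: dict[tuple[str, str], frozenset[str]] = {
    ("huggingface", "v1.0"): _HF_QUALITIES,
    ("huggingface", "v1.1-zh"): _HF_QUALITIES,
    ("github", "v1.0"): frozenset({"fp32", "fp16", "fp16-gpu", "q8"}),
    ("github", "v1.1-zh"): frozenset({"fp32"}),
}


def validate_quality(source: str, variant: str, quality: str) -> None:
    """Raise ValueError if `quality` is not listed for this source/variant."""
    allowed = SUPPORTED_QUALITIES.get((source, variant))
    if allowed is None:
        raise ValueError(f"Unknown source/variant: {source}/{variant}")
    if quality not in allowed:
        raise ValueError(
            f"{quality!r} is not available for {source}/{variant}; "
            f"choose one of {sorted(allowed)}"
        )


validate_quality("github", "v1.0", "fp16-gpu")  # passes silently
```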
Generating Speech
Basic Text-to-Speech
from pykokoro import KokoroPipeline, PipelineConfig
pipe = KokoroPipeline(PipelineConfig(voice="af_bella"))
result = pipe.run("Hello, world!")
audio = result.audio
sample_rate = result.sample_rate
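The sample rate is in samples per second, so the length of a clip in seconds is the number of samples divided by the rate. A minimal sketch, assuming result.audio is a one-dimensional (mono) NumPy array; duration_seconds is a hypothetical helper, and a synthetic array at an arbitrary rate stands in for real output:

```python
import numpy as np


def duration_seconds(audio: np.ndarray, sample_rate: int) -> float:
    """Length of a mono clip in seconds: samples / samples-per-second."""
    return audio.shape[0] / sample_rate


# Stand-in for result.audio: 36,000 samples at an arbitrary 24 kHz rate.
fake_audio = np.zeros(36_000, dtype=np.float32)
print(duration_seconds(fake_audio, 24_000))  # 1.5

# With a real result: duration_seconds(result.audio, result.sample_rate)
```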
Saving Audio
Using soundfile (recommended):
import soundfile as sf
from pykokoro import KokoroPipeline, PipelineConfig
pipe = KokoroPipeline(PipelineConfig(voice="af_bella"))
result = pipe.run("Hello!")
sf.write("output.wav", result.audio, result.sample_rate)
Using scipy:
import numpy as np
from scipy.io import wavfile
from pykokoro import KokoroPipeline, PipelineConfig
pipe = KokoroPipeline(PipelineConfig(voice="af_bella"))
result = pipe.run("Hello!")
# Clip to [-1.0, 1.0] first so peaks do not wrap around when cast to int16
audio_int16 = (np.clip(result.audio, -1.0, 1.0) * 32767).astype("int16")
wavfile.write("output.wav", result.sample_rate, audio_int16)
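If neither soundfile nor scipy is installed, Python's standard-library wave module can write 16-bit PCM once the float samples are converted. A minimal sketch, assuming mono float audio nominally in [-1.0, 1.0]; write_wav_int16 is a hypothetical helper, and it clips before scaling so out-of-range samples saturate instead of wrapping around:

```python
import wave

import numpy as np


def write_wav_int16(path: str, audio: np.ndarray, sample_rate: int) -> None:
    """Write mono float audio in [-1.0, 1.0] as a 16-bit PCM WAV file."""
    # Clip, scale to the int16 range, and use little-endian order as WAV requires.
    pcm = (np.clip(audio, -1.0, 1.0) * 32767).astype("<i2")
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)  # mono
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm.tobytes())


# Usage with a pipeline result:
# write_wav_int16("output.wav", result.audio, result.sample_rate)
```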
Voice Selection
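Voice names follow the pattern {accent}{gender}_{name}, for example af_bella:
- Accent: a (American), b (British)
- Gender: f (female), m (male)
- Name: the specific voice identifier
The resulting prefixes are af (American Female), am (American Male), bf (British Female), and bm (British Male).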
Use the voice name in PipelineConfig:
from pykokoro import KokoroPipeline, PipelineConfig
pipe = KokoroPipeline(PipelineConfig(voice="bf_emma"))
result = pipe.run("Hello from the UK!")
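Voice names such as bf_emma combine a two-letter accent/gender prefix with a name, so they can be parsed programmatically, for example to group voices by accent. A minimal sketch covering only the English prefixes listed above; parse_voice_name and its lookup tables are hypothetical, not part of PyKokoro:

```python
from typing import NamedTuple

# Only the English prefixes documented above; anything else is rejected.
ACCENTS = {"a": "American", "b": "British"}
GENDERS = {"f": "female", "m": "male"}


class VoiceInfo(NamedTuple):
    accent: str
    gender: str
    name: str


def parse_voice_name(voice: str) -> VoiceInfo:
    """Split a voice such as 'bf_emma' into accent, gender, and name."""
    prefix, sep, name = voice.partition("_")
    if not sep or len(prefix) != 2 or not name:
        raise ValueError(f"Expected '<accent><gender>_<name>', got {voice!r}")
    accent_code, gender_code = prefix[0], prefix[1]
    if accent_code not in ACCENTS or gender_code not in GENDERS:
        raise ValueError(f"Unknown voice prefix {prefix!r} in {voice!r}")
    return VoiceInfo(ACCENTS[accent_code], GENDERS[gender_code], name)


print(parse_voice_name("bf_emma"))
# VoiceInfo(accent='British', gender='female', name='emma')
```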
Language Settings
By default, PyKokoro infers the language from the voice prefix, but you can override it with GenerationConfig.lang:
from pykokoro import GenerationConfig, KokoroPipeline, PipelineConfig
generation = GenerationConfig(lang="fr")
pipe = KokoroPipeline(PipelineConfig(voice="af_sarah", generation=generation))
result = pipe.run("Bonjour le monde")
Supported languages: en-us, en-gb, es, fr, de, it, pt, hi, ja, zh
Language-Aware spaCy Models
PyKokoro resolves spaCy model package names automatically from the language when
TokenizerConfig.spacy_model is "auto" (the default).
Use with_spacy_model_size to set the auto-resolved model tier explicitly:
from pykokoro import (
    GenerationConfig,
    KokoroPipeline,
    PipelineConfig,
    with_spacy_model_size,
)
base = PipelineConfig(
    voice="af_bella",
    generation=GenerationConfig(lang="de"),
)
cfg = with_spacy_model_size(base, size="md")
# For lang="de", G2P uses de_core_news_md
pipe = KokoroPipeline(cfg)
result = pipe.run("Guten Tag")
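spaCy's trained pipelines are named <lang>_core_<genre>_<size>, with genre web for English and Chinese and news for the other languages handled here. The sketch below shows how a language code and size could be turned into such a name. resolve_spacy_model and its table are hypothetical, not PyKokoro's resolver, which may differ: Hindi is left out because spaCy has no official core pipeline for it, and the trf size is skipped because some languages name it differently.

```python
# Genre per base language, following spaCy's <lang>_core_<genre>_<size> naming.
_GENRES = {
    "en": "web",
    "zh": "web",
    "de": "news",
    "es": "news",
    "fr": "news",
    "it": "news",
    "ja": "news",
    "pt": "news",
}
_SIZES = {"sm", "md", "lg"}


def resolve_spacy_model(lang: str, size: str = "sm") -> str:
    """Map a language code like 'en-us' or 'de' to a spaCy package name."""
    if size not in _SIZES:
        raise ValueError(f"size must be one of {sorted(_SIZES)}, got {size!r}")
    base = lang.lower().split("-")[0]  # 'en-us' -> 'en'
    genre = _GENRES.get(base)
    if genre is None:
        raise ValueError(f"No spaCy core pipeline mapped for {lang!r}")
    return f"{base}_core_{genre}_{size}"


print(resolve_spacy_model("de", "md"))  # de_core_news_md
print(resolve_spacy_model("en-gb"))  # en_core_web_sm
```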
You can still force a specific spaCy package if needed:
from pykokoro import KokoroPipeline, PipelineConfig
from pykokoro.tokenizer import TokenizerConfig
tokenizer_config = TokenizerConfig(spacy_model="fr_core_news_sm")
pipe = KokoroPipeline(
    PipelineConfig(voice="af_bella", tokenizer_config=tokenizer_config)
)
Speech Speed Control
Adjust the speaking rate with GenerationConfig.speed:
from pykokoro import GenerationConfig, KokoroPipeline, PipelineConfig
generation = GenerationConfig(speed=1.5)
pipe = KokoroPipeline(PipelineConfig(voice="af_bella", generation=generation))
result = pipe.run("Fast speech")
Values above 1.0 speed speech up and values below 1.0 slow it down. Recommended range: 0.5 to 2.0
Pause Control
Manual Pause Markers
Add explicit pauses using SSMD break markers:
- ...c for a short (comma) pause
- ...s for a medium (sentence) pause
- ...p for a long (paragraph) pause
- ...500ms for a pause of a custom duration (here 500 ms)
from pykokoro import GenerationConfig, KokoroPipeline, PipelineConfig
text = "Hello! ...c This is a short pause. ...s And now a longer pause."
generation = GenerationConfig(pause_mode="manual")
pipe = KokoroPipeline(PipelineConfig(voice="af_bella", generation=generation))
result = pipe.run(text)
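To see how the markers divide the text, the sketch below splits a string on the four marker forms with a regular expression and turns each marker into a duration in seconds. It is illustrative only: split_on_breaks is hypothetical, PyKokoro's SSMD parser handles much more syntax, and the durations assumed for ...c, ...s, and ...p are borrowed from the pause_clause, pause_sentence, and pause_paragraph values used in this guide, not confirmed defaults.

```python
import re

# Assumed durations in seconds; PyKokoro's real defaults may differ.
BREAK_SECONDS = {"c": 0.25, "s": 0.5, "p": 1.0}

# Matches "...c", "...s", "...p", or "...<digits>ms" followed by a word boundary,
# so text such as "...so" is not mistaken for a marker.
_BREAK = re.compile(r"\.\.\.(?:(\d+)ms|([csp]))\b")


def split_on_breaks(text: str) -> list[tuple[str, float]]:
    """Return (segment_text, pause_after_seconds) pairs."""
    parts: list[tuple[str, float]] = []
    pos = 0
    for match in _BREAK.finditer(text):
        ms, code = match.groups()
        pause = int(ms) / 1000 if ms else BREAK_SECONDS[code]
        parts.append((text[pos:match.start()].strip(), pause))
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        parts.append((tail, 0.0))  # no pause after the final segment
    return parts


text = "Hello! ...c This is a short pause. ...s And now a longer pause."
for segment, pause in split_on_breaks(text):
    print(f"{segment!r} then {pause}s")
```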
Automatic Natural Pauses
For a natural rhythm, set pause_mode="auto" and let the pipeline insert pauses at clause, sentence, and paragraph boundaries:
from pykokoro import GenerationConfig, KokoroPipeline, PipelineConfig
text = """
Artificial intelligence is transforming our world. Machine learning
models are becoming more sophisticated and accessible.
Deep learning uses neural networks with many layers.
"""
generation = GenerationConfig(
    pause_mode="auto",
    pause_clause=0.25,
    pause_sentence=0.5,
    pause_paragraph=1.0,
    pause_variance=0.05,
    random_seed=42,
)
pipe = KokoroPipeline(PipelineConfig(voice="af_sarah", generation=generation))
result = pipe.run(text)
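One way to picture what pause_variance and random_seed control: each boundary pause is drawn around its base duration with a small random offset, and a fixed seed makes those offsets repeatable between runs. The sketch below assumes that interpretation (Gaussian jitter with pause_variance as the standard deviation, in seconds), which PyKokoro does not confirm here; jittered_pause and join_with_pauses are hypothetical helpers that splice silence between mono NumPy audio segments, which can also be handy when stitching separately generated clips.

```python
import numpy as np


def jittered_pause(base: float, variance: float, rng: np.random.Generator) -> float:
    """Base pause plus Gaussian jitter (std = variance), never below zero."""
    return max(0.0, base + rng.normal(0.0, variance))


def join_with_pauses(segments, pauses, sample_rate, variance=0.05, seed=42):
    """Concatenate mono segments, inserting jittered silence after each one.

    pauses[i] is the base pause in seconds after segments[i]; 0 means no pause.
    """
    rng = np.random.default_rng(seed)  # fixed seed -> reproducible pauses
    pieces = []
    for audio, base in zip(segments, pauses):
        pieces.append(audio)
        if base > 0:
            seconds = jittered_pause(base, variance, rng)
            pieces.append(np.zeros(round(seconds * sample_rate), dtype=audio.dtype))
    return np.concatenate(pieces)
```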
Text Normalization (Say-As)
SSMD say-as syntax tells the pipeline how to read numbers, dates, and other formats aloud:
from pykokoro import KokoroPipeline, PipelineConfig
text = "I have [123](as: cardinal) apples and [12/31/2024](as: date, format: mdy)."
pipe = KokoroPipeline(PipelineConfig(voice="af_sarah"))
result = pipe.run(text)
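The annotation syntax is regular enough to inspect with a regular expression, for example to list which spans a document marks for normalization before sending it to the pipeline. A minimal sketch that handles only the [text](as: kind, key: value) shape shown above; find_say_as is hypothetical and is not PyKokoro's SSMD parser:

```python
import re

# [text](as: kind[, key: value ...]), only the shape used in this guide.
_SAY_AS = re.compile(r"\[([^\]]+)\]\(as:\s*([^),]+)((?:,\s*[^:),]+:\s*[^),]+)*)\)")


def find_say_as(text: str) -> list[dict[str, str]]:
    """Return one dict per annotation with its text, kind, and extra options."""
    found = []
    for match in _SAY_AS.finditer(text):
        entry = {"text": match.group(1), "as": match.group(2).strip()}
        # group(3) looks like ", format: mdy"; split it into key/value pairs.
        for option in filter(None, match.group(3).split(",")):
            key, _, value = option.partition(":")
            entry[key.strip()] = value.strip()
        found.append(entry)
    return found


text = "I have [123](as: cardinal) apples and [12/31/2024](as: date, format: mdy)."
for entry in find_say_as(text):
    print(entry)
```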
Error Handling
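Wrap pipeline creation and synthesis in try/except to report errors, such as an unknown voice name, instead of crashing:
from pykokoro import KokoroPipeline, PipelineConfig
try:
    pipe = KokoroPipeline(PipelineConfig(voice="invalid_voice"))
    pipe.run("Hello!")
except Exception as exc:
    print(f"Pipeline error: {exc}")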
Batch Processing
Process multiple texts with a single pipeline instance, reusing it for every run:
import soundfile as sf
from pykokoro import KokoroPipeline, PipelineConfig
texts = [
("Welcome", "welcome.wav"),
("Thank you", "thanks.wav"),
("Goodbye", "goodbye.wav"),
]
pipe = KokoroPipeline(PipelineConfig(voice="af_bella"))
for text, filename in texts:
    result = pipe.run(text)
    sf.write(filename, result.audio, result.sample_rate)
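To combine the batch into a single file instead of one file per text, the arrays can be concatenated with a short silence between them. A minimal sketch, assuming every result is mono NumPy audio at the same sample rate; concatenate_with_gap is a hypothetical helper and the 0.3 s default gap is arbitrary:

```python
import numpy as np


def concatenate_with_gap(clips, sample_rate, gap_seconds=0.3):
    """Join mono clips end to end with gap_seconds of silence between them."""
    if not clips:
        return np.zeros(0, dtype=np.float32)
    gap = np.zeros(round(gap_seconds * sample_rate), dtype=clips[0].dtype)
    pieces = []
    for i, clip in enumerate(clips):
        if i:
            pieces.append(gap)  # silence only between clips, not at the ends
        pieces.append(clip)
    return np.concatenate(pieces)


# Usage with the loop above (collect results instead of writing each file):
# results = [pipe.run(text) for text, _ in texts]
# combined = concatenate_with_gap([r.audio for r in results], results[0].sample_rate)
# sf.write("all.wav", combined, results[0].sample_rate)
```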
Next Steps
Advanced Features - Voice blending, phoneme control, and more
Examples - Real-world examples
API Reference - Complete API documentation