Pipeline Usage and Stages
KokoroPipeline is the configurable engine behind the high-level
Kokoro class. Use it when you want to swap parsing/segmentation stages,
run custom G2P logic, or control model loading at a lower level.
Pipeline overview
The default pipeline wiring is:
doc_parser -> g2p -> phoneme_processing -> audio_generation -> audio_postprocessing
Default stage classes:
- SsmdDocumentParser
- KokoroG2PAdapter
- OnnxPhonemeProcessorAdapter
- OnnxAudioGenerationAdapter
- OnnxAudioPostprocessingAdapter
If any of the audio stages are omitted, the pipeline builds a Kokoro ONNX
backend and wires the missing adapters automatically.
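The wiring above amounts to running five stages in a fixed order, with defaults filled in for any stage that is not supplied. A minimal sketch of that idea follows; `ToyPipeline`, `default_stage`, and the list payload are hypothetical stand-ins for illustration, not pykokoro's real classes or data types.

```python
from typing import Callable, List, Optional

# Stage order taken from the default wiring described above.
STAGE_ORDER = (
    "doc_parser",
    "g2p",
    "phoneme_processing",
    "audio_generation",
    "audio_postprocessing",
)

Stage = Callable[[List[str]], List[str]]


def default_stage(name: str) -> Stage:
    """Hypothetical default stage: append its own name to the payload."""
    return lambda payload: payload + [name]


class ToyPipeline:
    """Illustrative stand-in, not the real KokoroPipeline."""

    def __init__(self, **stages: Optional[Stage]) -> None:
        unknown = set(stages) - set(STAGE_ORDER)
        if unknown:
            raise TypeError(f"unknown stages: {sorted(unknown)}")
        # Any stage that was not supplied falls back to a default.
        self.stages = [stages.get(n) or default_stage(n) for n in STAGE_ORDER]

    def run(self, text: str) -> List[str]:
        payload = [text]
        for stage in self.stages:
            payload = stage(payload)
        return payload


# Swap only the G2P stage; the other four use defaults.
custom = ToyPipeline(g2p=lambda payload: payload + ["custom_g2p"])
trace = custom.run("Hello")
```

Here `trace` records which stage handled the payload at each step, so the replaced stage is visible in the second position.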
Quick start
from pykokoro import GenerationConfig, KokoroPipeline, PipelineConfig

config = PipelineConfig(
    voice="af_bella",
    generation=GenerationConfig(speed=1.0),
)
pipeline = KokoroPipeline(config)
result = pipeline.run("Hello from the pipeline.")
result.save_wav("output.wav")

# Inspect intermediates
segments = result.segments
phoneme_segments = result.phoneme_segments

# Enable trace details when needed
traced = pipeline.run("Hello", return_trace=True)
if traced.trace:
    print(traced.trace.warnings)
Configuration
PipelineConfig and GenerationConfig are frozen dataclasses. Use
dataclasses.replace when you want a modified copy.
from dataclasses import replace
from pykokoro import GenerationConfig, KokoroPipeline, PipelineConfig
cfg = PipelineConfig(voice="af_bella")
faster_cfg = replace(cfg, generation=replace(cfg.generation, speed=1.2))
pipeline = KokoroPipeline(faster_cfg)
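To see why `dataclasses.replace` is needed, it may help to look at how frozen dataclasses behave in isolation. The sketch below uses two hypothetical stand-in classes (`ToyGeneration`, `ToyPipelineConfig`) with only a few fields; the real config classes have more.

```python
from dataclasses import FrozenInstanceError, dataclass, field, replace


@dataclass(frozen=True)
class ToyGeneration:  # hypothetical stand-in for GenerationConfig
    speed: float = 1.0
    lang: str = "en-us"


@dataclass(frozen=True)
class ToyPipelineConfig:  # hypothetical stand-in for PipelineConfig
    voice: str = "af_bella"
    generation: ToyGeneration = field(default_factory=ToyGeneration)


cfg = ToyPipelineConfig()

try:
    cfg.voice = "af_sarah"  # direct assignment is rejected on frozen dataclasses
    mutated = True
except FrozenInstanceError:
    mutated = False

# replace() builds a new instance; a nested config needs a nested replace().
faster = replace(cfg, generation=replace(cfg.generation, speed=1.2))
```

The original `cfg` is left untouched, which makes it safe to share one base config across several pipelines.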
PipelineConfig fields
Core
- voice: Default voice name (str) or VoiceBlend, used unless SSMD metadata overrides the voice per segment.
- generation: GenerationConfig instance with speed, language, pause handling, and phoneme controls.
Model and provider
- model_quality: "fp32", "fp16", "fp16-gpu", "q8", "q8f16", "q4", "q4f16", "uint8", "uint8f16". None uses the backend default.
- model_source: "huggingface" or "github".
- model_variant: "v1.0" or "v1.1-zh".
- model_path: Path to a local ONNX model file. Overrides model download.
- voices_path: Path to a local voices file. Overrides voice download.
- provider: ONNX provider name ("auto", "cpu", "cuda", "openvino", "directml", "coreml").
- provider_options: Dict of provider/session options passed to ONNX Runtime.
- session_options: Pre-built onnxruntime.SessionOptions (advanced use).
Tokenizer and phoneme handling
- tokenizer_config: TokenizerConfig used by SSMD parsing and kokorog2p.
- tokenizer_config.spacy_model: spaCy package name or "auto".
- tokenizer_config.spacy_model_size: Package tier for auto mode ("sm", "md", "lg", "trf"). Default: "md".
- espeak_config: Deprecated espeak configuration. Prefer TokenizerConfig.
- short_sentence_config: ShortSentenceConfig for short-sentence handling.
- overlap_mode: "snap" clips overlapping SSMD spans to segment bounds; "strict" drops partial spans and emits trace warnings.
Helper for auto spaCy model tier:
from pykokoro import PipelineConfig, with_spacy_model_size
cfg = PipelineConfig(voice="af_bella")
cfg = with_spacy_model_size(cfg, size="md")
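The two overlap_mode settings listed above can be pictured as interval clipping. The following is a rough sketch under the assumption that spans and segments are half-open character ranges; `fit_span` and the warning text are hypothetical and not taken from the library.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # half-open character range [start, end)


def fit_span(
    span: Span, segment: Span, mode: str, trace_warnings: List[str]
) -> Optional[Span]:
    """Fit an annotation span into a segment; None means the span is dropped."""
    start, end = span
    seg_start, seg_end = segment
    if end <= seg_start or start >= seg_end:
        return None  # no overlap with this segment
    if seg_start <= start and end <= seg_end:
        return span  # fully contained, nothing to adjust
    if mode == "snap":
        # Clip the partial span to the segment bounds.
        return (max(start, seg_start), min(end, seg_end))
    if mode == "strict":
        # Drop the partial span and record a warning for the trace.
        trace_warnings.append(f"dropped partial span {span} for segment {segment}")
        return None
    raise ValueError(f"unknown overlap mode: {mode!r}")


trace_warnings: List[str] = []
snapped = fit_span((5, 15), (0, 10), "snap", trace_warnings)
dropped = fit_span((5, 15), (0, 10), "strict", trace_warnings)
```

With a span of (5, 15) and a segment of (0, 10), "snap" keeps the part inside the segment while "strict" discards the span and leaves one warning behind.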
Other
- return_trace: Include Trace in AudioResult with timings/warnings.
- enable_deprecation_warnings: Reserved for compatibility warnings.
- cache_dir: Directory for the G2P disk cache (JSON files). Set to None to disable caching.
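As an illustration of what a JSON disk cache controlled by cache_dir might look like, here is a minimal sketch. The key scheme (SHA-256 of language plus text), the file layout, and the `cached_g2p` helper are assumptions made for this example; the library's actual cache format may differ.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable, List, Optional


def cached_g2p(
    text: str,
    lang: str,
    g2p: Callable[[str, str], str],
    cache_dir: Optional[Path],
) -> str:
    """Return phonemes for text, reading or writing a JSON file when caching is on."""
    if cache_dir is None:
        return g2p(text, lang)  # caching disabled
    key = hashlib.sha256(f"{lang}\n{text}".encode("utf-8")).hexdigest()
    path = cache_dir / f"{key}.json"
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))["phonemes"]
    phonemes = g2p(text, lang)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps({"phonemes": phonemes}), encoding="utf-8")
    return phonemes


calls: List[str] = []


def fake_g2p(text: str, lang: str) -> str:
    """Stand-in for a real G2P call; records each invocation."""
    calls.append(text)
    return text.upper()


with tempfile.TemporaryDirectory() as tmp:
    first = cached_g2p("hello", "en-us", fake_g2p, Path(tmp))
    second = cached_g2p("hello", "en-us", fake_g2p, Path(tmp))  # served from disk
    uncached = cached_g2p("hello", "en-us", fake_g2p, None)  # bypasses the cache
```

The second call never reaches `fake_g2p`, while passing None always does.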
GenerationConfig fields
- speed: Speech rate multiplier (1.0 is normal).
- lang: Default language code for phonemization ("en-us" etc.).
- is_phonemes: Treat input text as phoneme strings instead of raw text.
- pause_mode: "tts" keeps natural model pauses, "manual" trims segment silence and preserves explicit pauses, "auto" inserts pauses at sentence/paragraph boundaries and trims segment silence.
- pause_clause: Default pause for SSMD ...c breaks (seconds).
- pause_sentence: Default pause for SSMD ...s breaks (seconds).
- pause_paragraph: Default pause for SSMD ...p breaks (seconds).
- pause_variance: Stored for compatibility with the Kokoro API. The pipeline stages do not currently apply variance.
- random_seed: Stored for compatibility with the Kokoro API. The pipeline stages do not currently use the seed.
- enable_short_sentence: Override short sentence handling for the run.
Runtime overrides
KokoroPipeline.run accepts overrides for any PipelineConfig field. The
lang keyword is special-cased to update generation.lang for convenience.
from dataclasses import replace
from pykokoro import GenerationConfig
# Override just the language
result = pipeline.run("Bonjour", lang="fr")
# Override generation settings per call
manual = replace(
    pipeline.config.generation,
    pause_mode="manual",
    pause_sentence=0.5,
)
result = pipeline.run("Hello...s world", generation=manual)
# Override model settings per call
result = pipeline.run("Quick test", model_quality="q8")
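The override behavior described above, including the special-cased lang keyword, could be modeled roughly as follows. `ToyGeneration`, `ToyConfig`, and `apply_overrides` are hypothetical names for this sketch; how KokoroPipeline.run merges overrides internally is not shown in this document.

```python
from dataclasses import dataclass, field, fields, replace
from typing import Any, Optional


@dataclass(frozen=True)
class ToyGeneration:  # hypothetical stand-in for GenerationConfig
    speed: float = 1.0
    lang: str = "en-us"


@dataclass(frozen=True)
class ToyConfig:  # hypothetical stand-in for PipelineConfig
    voice: str = "af_bella"
    model_quality: Optional[str] = None
    generation: ToyGeneration = field(default_factory=ToyGeneration)


def apply_overrides(config: ToyConfig, **overrides: Any) -> ToyConfig:
    """Return a per-call config; `lang` is routed to generation.lang."""
    lang = overrides.pop("lang", None)
    valid = {f.name for f in fields(config)}
    unknown = set(overrides) - valid
    if unknown:
        raise TypeError(f"unknown config fields: {sorted(unknown)}")
    updated = replace(config, **overrides)
    if lang is not None:
        updated = replace(updated, generation=replace(updated.generation, lang=lang))
    return updated


base = ToyConfig()
per_call = apply_overrides(base, lang="fr", model_quality="q8")
```

Because the configs are frozen, the base config keeps its original language and quality after the call.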
Stage behavior
SSMD document parser
SsmdDocumentParser uses parse_ssmd_to_segments to turn SSMD markup into
clean text plus metadata spans, pause boundaries, and sentence/paragraph
segments. It honors generation.pause_* values when converting break
strengths into durations.
Supported SSMD features include:
- Break markers: ...c, ...s, ...p, ...500ms
- Language overrides: [Bonjour](fr)
- Phoneme overrides: [tomato](ph: t eh m aa t ow)
- Prosody markup (rate/pitch/volume) and emphasis
- Voice markers: [Hello]{voice="af_sarah"}
The parser attaches SSMD metadata to annotation spans so later stages can select per-segment voices and prosody.
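To make the conversion from break strengths to durations concrete, here is a simplified sketch that handles only the four break forms listed above. The regular expression, the `extract_breaks` helper, and the pause values are illustrative assumptions; the real parser relies on parse_ssmd_to_segments and covers far more of SSMD.

```python
import re
from typing import Dict, List, Tuple

# Matches ...c, ...s, ...p, or an explicit millisecond break such as ...500ms.
# The lookahead avoids treating an ellipsis before a word ("wait...so") as a break.
BREAK_RE = re.compile(r"\.\.\.(?:(\d+)ms|([csp]))(?![A-Za-z0-9])")


def extract_breaks(
    text: str, pauses: Dict[str, float]
) -> Tuple[str, List[Tuple[int, float]]]:
    """Strip break markers; return clean text and (offset, seconds) pairs."""
    parts: List[str] = []
    breaks: List[Tuple[int, float]] = []
    pos = 0
    clean_len = 0
    for match in BREAK_RE.finditer(text):
        chunk = text[pos:match.start()]
        parts.append(chunk)
        clean_len += len(chunk)
        millis, strength = match.groups()
        seconds = int(millis) / 1000 if millis else pauses[strength]
        breaks.append((clean_len, seconds))
        pos = match.end()
    parts.append(text[pos:])
    return "".join(parts), breaks


# Illustrative values standing in for pause_clause / pause_sentence / pause_paragraph.
pauses = {"c": 0.25, "s": 0.6, "p": 1.0}
clean, breaks = extract_breaks("Hello...s world...500ms done", pauses)
```

Offsets refer to positions in the cleaned text, which is the form later stages would consume.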
Plain text sentence splitting
PlainTextDocumentParser uses the optional phrasplit package for
sentence splitting. When phrasplit is unavailable, it falls back to a
single segment. The language model is derived from generation.lang using
spaCy package naming rules (for example en_core_web_sm for English).
Split boundaries are forced at SSMD pause boundaries and at spans that contain
phoneme overrides so those overrides are kept intact. Set
PYKOKORO_DEBUG_SEGMENTS=1 to log segment offsets.
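spaCy's published pipelines follow a `{lang}_core_{genre}_{size}` naming pattern, so deriving a package name from generation.lang might look like the sketch below. The `SPACY_GENRES` table and `spacy_package` helper are hypothetical and cover only a handful of languages; the library's own lookup may differ, and not every language publishes every size.

```python
# Genre component used by spaCy's published packages for a few languages.
SPACY_GENRES = {
    "en": "web",
    "zh": "web",
    "de": "news",
    "fr": "news",
    "es": "news",
    "it": "news",
    "ja": "news",
}


def spacy_package(lang: str, size: str = "sm") -> str:
    """Map a language code such as 'en-us' to a spaCy package name."""
    base = lang.lower().replace("_", "-").split("-")[0]
    genre = SPACY_GENRES.get(base)
    if genre is None:
        raise ValueError(f"no spaCy package known for language {lang!r}")
    return f"{base}_core_{genre}_{size}"


english = spacy_package("en-us")
french_md = spacy_package("fr", size="md")
```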
Kokoro G2P adapter
KokoroG2PAdapter uses the kokorog2p package to produce phonemes and
token IDs.
- generation.lang selects the G2P language.
- generation.is_phonemes treats input as phonemes and skips text G2P.
- SSMD ph/phonemes spans override phonemes for that segment.
- tokenizer_config is forwarded to kokorog2p.get_g2p.
- spacy_model="auto" resolves per language (default size md).
- cache_dir enables on-disk caching of phonemes/tokens.
- Long phoneme token sequences are split into batches of MAX_PHONEME_LENGTH.
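The batching step in the last item can be sketched as fixed-size chunking. The value 510 below is an assumption for illustration only, and `split_batches` is a hypothetical helper; the real adapter may also prefer to split at word or sentence boundaries rather than at arbitrary positions.

```python
from typing import List, Sequence

# Assumed value for illustration; check the library's own constant.
MAX_PHONEME_LENGTH = 510


def split_batches(
    token_ids: Sequence[int], max_len: int = MAX_PHONEME_LENGTH
) -> List[List[int]]:
    """Split a token sequence into consecutive batches of at most max_len."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    return [
        list(token_ids[i:i + max_len]) for i in range(0, len(token_ids), max_len)
    ]


batches = split_batches(list(range(1200)))
batch_sizes = [len(batch) for batch in batches]
```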
ONNX phoneme processing
OnnxPhonemeProcessorAdapter calls the ONNX backend to normalize tokens,
skip empty segments, and apply short-sentence handling.
- short_sentence_config controls defaults for short sentence handling.
- generation.enable_short_sentence can override the config per run.
ONNX audio generation
OnnxAudioGenerationAdapter generates raw audio per phoneme segment.
- voice provides the default voice style.
- SSMD voice metadata (voice/voice_name) overrides the voice per segment.
- generation.speed controls synthesis speed.
ONNX audio postprocessing
OnnxAudioPostprocessingAdapter trims silence and concatenates segments.
- generation.pause_mode set to "manual" or "auto" enables silence trimming before inserting explicit pauses.
- SSMD prosody metadata (rate/pitch/volume) is applied to each segment.
- pause_before/pause_after values from G2P are inserted between segments.
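The trimming and concatenation described above can be sketched with plain Python lists. `trim_silence`, `join_segments`, the amplitude threshold, and the choice to insert explicit pauses in every mode are assumptions for this example; the real adapter works on model output and also applies prosody.

```python
from typing import List, Sequence


def trim_silence(samples: Sequence[float], threshold: float = 0.01) -> List[float]:
    """Drop leading and trailing samples whose magnitude is at or below threshold."""
    start, end = 0, len(samples)
    while start < end and abs(samples[start]) <= threshold:
        start += 1
    while end > start and abs(samples[end - 1]) <= threshold:
        end -= 1
    return list(samples[start:end])


def join_segments(
    segments: Sequence[Sequence[float]],
    pauses_after: Sequence[float],
    pause_mode: str,
    sample_rate: int,
) -> List[float]:
    """Concatenate segments, inserting pause_after seconds of silence after each."""
    out: List[float] = []
    for samples, pause in zip(segments, pauses_after):
        if pause_mode in ("manual", "auto"):
            samples = trim_silence(samples)  # trimming applies in these modes only
        out.extend(samples)
        out.extend([0.0] * int(round(pause * sample_rate)))
    return out


segments = [[0.0, 0.5, -0.5, 0.0], [0.0, 0.2, 0.0]]
pauses_after = [0.5, 0.0]
# A tiny sample rate keeps the example readable: 0.5 s becomes 2 samples.
manual = join_segments(segments, pauses_after, "manual", sample_rate=4)
natural = join_segments(segments, pauses_after, "tts", sample_rate=4)
```

In "manual" mode the silence at the segment edges is removed, so the only gap left is the explicit pause; in "tts" mode the model's own edge silence is kept.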
Customizing the pipeline
You can replace individual stages or use the provided no-op adapters. The showcase script demonstrates multiple wiring styles:
examples/pipeline_stage_showcase.py
Example with explicit stage wiring:
from pykokoro import GenerationConfig, KokoroPipeline, PipelineConfig
from pykokoro.onnx_backend import Kokoro
from pykokoro.stages.audio_generation.onnx import OnnxAudioGenerationAdapter
from pykokoro.stages.audio_postprocessing.onnx import OnnxAudioPostprocessingAdapter
from pykokoro.stages.doc_parsers.ssmd import SsmdDocumentParser
from pykokoro.stages.g2p.kokorog2p import KokoroG2PAdapter
from pykokoro.stages.phoneme_processing.onnx import OnnxPhonemeProcessorAdapter
cfg = PipelineConfig(
    voice="af_heart",
    generation=GenerationConfig(lang="en-us"),
)
kokoro = Kokoro(model_quality=cfg.model_quality)
pipeline = KokoroPipeline(
    cfg,
    doc_parser=SsmdDocumentParser(),
    g2p=KokoroG2PAdapter(),
    phoneme_processing=OnnxPhonemeProcessorAdapter(kokoro),
    audio_generation=OnnxAudioGenerationAdapter(kokoro),
    audio_postprocessing=OnnxAudioPostprocessingAdapter(kokoro),
)
Local model files and providers
To load local ONNX artifacts, set model_path and voices_path.
You can also select a specific execution provider.
from pathlib import Path
from pykokoro import KokoroPipeline, PipelineConfig
cfg = PipelineConfig(
    voice="af_bella",
    model_path=Path("/models/kokoro.onnx"),
    voices_path=Path("/models/voices.bin"),
    provider="cuda",
    provider_options={"device_id": 0},
)
pipeline = KokoroPipeline(cfg)
result = pipeline.run("Hello from local files.")