Changelog

Version 0.0.4 (TBD)

Bug Fixes:

  • Fixed deprecated local_dir_use_symlinks parameter in HuggingFace downloads (removes deprecation warnings)

  • Fixed Windows CI test failures related to temporary file permissions in encoding detection tests

  • Temporary files are now closed before deletion to support Windows file-locking behavior

Breaking Changes:

  • Removed the redundant v1.1-zh-hf model variant; use model_variant="v1.1-zh" with model_source="huggingface" instead

  • Changed cache directory structure to ~/.cache/pykokoro/{models|voices}/{source}/{variant}/

  • Removed deprecated functions: download_model_hf_v11zh(), download_voices_hf_v11zh(), download_all_models_hf_v11zh()

  • Function signatures updated: download_model(), download_voice(), download_all_voices(), and download_all_models() now require a variant parameter

  • Path helper functions get_model_dir() and get_voices_dir() now require source and variant parameters

  • API Simplification: Replaced the split_mode and trim_silence parameters with a single pause_mode parameter in Kokoro.create()
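
The new cache layout can be pictured with a small path-building sketch. The names sketch_cache_dir and CACHE_ROOT are hypothetical and are not PyKokoro's own get_model_dir() or get_voices_dir(); only the directory pattern is taken from the entry above.

```python
from pathlib import Path

# Root taken from the changelog entry; everything else here is illustrative.
CACHE_ROOT = Path.home() / ".cache" / "pykokoro"


def sketch_cache_dir(kind: str, source: str, variant: str) -> Path:
    """Build ~/.cache/pykokoro/{models|voices}/{source}/{variant}/ (hypothetical helper)."""
    if kind not in ("models", "voices"):
        raise ValueError(f"kind must be 'models' or 'voices', got {kind!r}")
    return CACHE_ROOT / kind / source / variant


print(sketch_cache_dir("models", "huggingface", "v1.1-zh"))
print(sketch_cache_dir("voices", "huggingface", "v1.1-zh"))
```

Because source and variant are both path components, files from different sources or variants land in separate directories, which is presumably why the path helpers now require both arguments.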

Migration Guide:

For users upgrading from v0.0.3:

# Old (v0.0.3) - NO LONGER WORKS
kokoro = Kokoro(model_variant="v1.1-zh-hf")
download_model_hf_v11zh(quality="fp16")

# New (v0.0.4+)
kokoro = Kokoro(
    model_source="huggingface",
    model_variant="v1.1-zh"
)
download_model(variant="v1.1-zh", quality="fp16")

# Old pause control (v0.0.3) - NO LONGER WORKS
audio, sr = kokoro.create(
    text,
    voice="af_bella",
    split_mode="clause",
    trim_silence=True,
    pause_clause=0.25
)

# New pause control (v0.0.4+)
# Default: TTS controls pauses naturally
audio, sr = kokoro.create(text, voice="af_bella")

# Manual pause control
audio, sr = kokoro.create(
    text,
    voice="af_bella",
    pause_mode="manual",  # PyKokoro controls pauses precisely
    pause_clause=0.25,
    pause_sentence=0.5,
    pause_paragraph=1.0
)

pause_mode Parameter:

The new pause_mode parameter simplifies pause control:

  • pause_mode="tts" (default): TTS generates pauses naturally. Best for most content.

  • pause_mode="manual": PyKokoro controls pauses precisely. Best for podcasts, voice switching, and precise timing.

  • pause_mode="auto": PyKokoro inserts pauses at sentence/paragraph boundaries and trims silence.
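
As a rough illustration of what "manual" pause control means, the sketch below joins already-synthesized segments with fixed-length silence. It is not PyKokoro's implementation: join_with_pauses, the boundary labels, and the 24000 Hz sample rate are assumptions made for illustration. Only the pause_clause, pause_sentence, and pause_paragraph values come from the migration example.

```python
SAMPLE_RATE = 24000  # assumed sample rate, for illustration only

# Durations in seconds, mirroring the pause_* values in the migration example.
PAUSES = {"clause": 0.25, "sentence": 0.5, "paragraph": 1.0}


def join_with_pauses(segments, sample_rate=SAMPLE_RATE, pauses=PAUSES):
    """Concatenate (samples, boundary) pairs, inserting silence after each one.

    `segments` is a list of (list_of_float_samples, boundary) tuples, where
    boundary is "clause", "sentence", "paragraph", or None for no pause.
    """
    out = []
    for samples, boundary in segments:
        out.extend(samples)
        if boundary is not None:
            # Silence is represented as zero-valued samples.
            out.extend([0.0] * int(round(pauses[boundary] * sample_rate)))
    return out


audio = join_with_pauses([
    ([0.1] * 100, "clause"),    # 100 samples, then 0.25 s of silence
    ([0.2] * 100, "sentence"),  # 100 samples, then 0.5 s of silence
    ([0.3] * 100, None),        # final segment, no trailing pause
])
print(len(audio))
```

In this picture, the caller rather than the model decides exactly how long each gap is, which matches the "precise timing" use case listed for pause_mode="manual".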

Improvements:

  • Unified path structure across all model sources and variants

  • Consolidated duplicate download functions for cleaner API

  • Shared configuration files between HuggingFace and GitHub sources

  • Improved code maintainability with consistent path handling

  • All quantization levels supported for v1.1-zh: fp32, fp16, q8, q8f16, q4, q4f16, uint8, uint8f16

  • Simplified API with clearer parameter semantics

New Features:

  • Added support for HuggingFace Kokoro v1.1-zh model (onnx-community/Kokoro-82M-v1.1-zh-ONNX)

  • Added 103 voices for v1.1-zh variant

  • Voice files automatically combined into efficient .npz format

  • Progress callbacks for voice downloads

  • Added pause_mode parameter for simplified pause control, including "auto" boundary pauses
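
Progress callbacks generally follow a simple pattern: the downloader calls a user-supplied function after each item completes. The sketch below shows that pattern only. fetch_all, its callback signature (name, done, total), and the item names are hypothetical and do not describe PyKokoro's actual callback interface.

```python
from typing import Callable, Iterable, Optional

ProgressCallback = Callable[[str, int, int], None]


def fetch_all(names: Iterable[str],
              fetch: Callable[[str], bytes],
              progress: Optional[ProgressCallback] = None) -> dict:
    """Fetch every item, reporting (name, done, total) after each one."""
    names = list(names)
    results = {}
    for done, name in enumerate(names, start=1):
        results[name] = fetch(name)
        if progress is not None:
            progress(name, done, len(names))
    return results


def fake_fetch(name: str) -> bytes:
    # Stand-in for a real network download.
    return name.encode("utf-8")


events = []
voices = fetch_all(["voice_a", "voice_b"], fake_fetch,
                   progress=lambda n, d, t: events.append((n, d, t)))
print(events)
```

Keeping the callback optional means existing callers that do not care about progress are unaffected.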

Documentation:

  • Updated advanced features guide with unified variant usage

  • Updated examples/hf_v11zh_demo.py demonstration script

  • Added migration guide for v0.0.4 breaking changes

  • Updated all documentation to use new pause_mode API

Version 0.0.3 (2026-01-07)

Major Refactoring:

  • Extracted internal manager classes for better code organization

  • Reduced codebase size by ~706 lines (about 12%)

  • Improved maintainability with better separation of concerns

  • 100% backward compatibility maintained - no breaking changes

New Internal Classes:

  • Added OnnxSessionManager class for ONNX Runtime session management

  • Added VoiceManager class for voice loading and blending operations

  • Added AudioGenerator class for audio generation pipeline

  • Added MixedLanguageHandler class for automatic language detection

  • Added PhonemeDictionary class for custom word-to-phoneme mappings

Code Quality:

  • Reduced onnx_backend.py by 436 lines

  • Reduced tokenizer.py by 270 lines

  • Added comprehensive test coverage for new manager classes

  • All pre-commit hooks passing (ruff, ruff-format)

  • 98.7% test pass rate (312/316 tests)

Architecture Improvements:

  • Delegate pattern implementation for backward compatibility

  • Better separation of session management, voice handling, and audio generation

  • Improved modularity for easier testing and maintenance

  • Enhanced error handling and validation
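
The delegate pattern mentioned above keeps a public class's methods intact while moving the work into a manager object. A minimal sketch follows, with hypothetical class and method names (Backend, VoiceStore, load_voice) that do not mirror PyKokoro's actual internals:

```python
class VoiceStore:
    """Extracted manager that now owns the voice-handling logic."""

    def __init__(self):
        # Placeholder data standing in for real voice embeddings.
        self._voices = {"af_bella": [0.1, 0.2, 0.3]}

    def load_voice(self, name):
        try:
            return self._voices[name]
        except KeyError:
            raise ValueError(f"unknown voice: {name!r}") from None


class Backend:
    """Public class: same method as before the refactor, now a thin delegate."""

    def __init__(self):
        self._voice_store = VoiceStore()

    def load_voice(self, name):
        # Existing callers keep calling Backend.load_voice(); the work is
        # forwarded to the extracted manager.
        return self._voice_store.load_voice(name)


backend = Backend()
print(backend.load_voice("af_bella"))
```

Since the public method signature does not change, code written against the pre-refactor class keeps working, while the manager can be tested on its own.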

Documentation:

  • Added API documentation for new internal manager classes

  • Added internal architecture section to advanced features guide

  • Updated changelog with refactoring details

Version 0.0.2 (2025-01-06)

Breaking Changes:

  • Removed PhonemeBook class - moved to separate ebook package

  • Removed PhonemeChapter class - moved to separate ebook package

  • Removed create_phoneme_book_from_chapters() function

  • Removed FORMAT_VERSION constant

  • Deleted examples/phoneme_export.py example

New Features:

  • Added split_and_phonemize_text() function for standalone text processing

  • Added enable_pauses parameter to create() method for pause marker support

  • Added pause markers: (.), (..), (...) for controlling speech pauses (DEPRECATED - use SSMD break syntax instead: ...c, ...s, ...p)

  • Added pause_short, pause_medium, pause_long parameters for custom pause durations

  • Added split_mode parameter to create() for intelligent text splitting (DEPRECATED in v0.0.4 - use pause_mode instead)

  • Added trim_silence parameter for removing silence between segments (DEPRECATED in v0.0.4 - use pause_mode="manual" instead)

  • Added pause_after field to PhonemeSegment class
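
To make the pause-marker idea concrete, the sketch below splits text on (.), (..), and (...) and attaches a pause duration to each segment. It is an illustration, not the library's parser: split_on_pause_markers and the default durations are assumptions, and mapping the three markers to pause_short, pause_medium, and pause_long in that order is also assumed.

```python
import re

# Matches (.), (..), or (...); the number of dots selects the pause length.
_MARKER = re.compile(r"\((\.{1,3})\)")


def split_on_pause_markers(text, pause_short=0.3, pause_medium=0.6, pause_long=1.0):
    """Return (segment_text, pause_after_seconds) pairs.

    A marker that follows no text (for example at the very start) is dropped
    in this sketch, and the final segment gets no trailing pause.
    """
    durations = {1: pause_short, 2: pause_medium, 3: pause_long}
    segments = []
    pos = 0
    for match in _MARKER.finditer(text):
        chunk = text[pos:match.start()].strip()
        if chunk:
            segments.append((chunk, durations[len(match.group(1))]))
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        segments.append((tail, 0.0))
    return segments


print(split_on_pause_markers("Hello (.) world (...) bye"))
```

Each pair plays the same role as a segment with a pause_after value: the text to speak, followed by the silence to insert after it.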

Improvements:

  • Refactored _process_with_split_mode() to use standalone function

  • Improved phoneme-based generation with automatic length checking

  • Enhanced documentation with comprehensive examples

  • Better error handling and validation

  • Optimized text splitting for long passages

Bug Fixes:

  • Fixed floating point precision in pause duration tests

  • Improved backward compatibility for PhonemeSegment serialization

  • Better handling of empty and whitespace-only text

Documentation:

  • Added complete Sphinx documentation

  • Added quick start guide

  • Added installation guide

  • Added basic usage guide

  • Added advanced features guide

  • Added comprehensive examples

  • Added API reference

Version 0.0.1 (Initial Release)

Initial Features:

  • Text-to-speech synthesis using Kokoro model

  • Support for 54 voices (v1.0) across multiple languages

  • Support for English (US/GB), Spanish, French, German, Italian, Portuguese, Hindi, Japanese, Korean, Chinese

  • Voice blending capabilities

  • Phoneme-based generation

  • GPU acceleration support (CUDA/ROCm)

  • Model quality options (fp16, q8, q6)

  • Speed control

  • Basic tokenizer functionality

  • Audio trimming utilities

  • Configuration management

  • Model and voice downloading

  • PhonemeBook and PhonemeChapter classes for document processing

  • spaCy integration for sentence splitting

  • Mixed language support