Changelog
Version 0.0.4 (TBD)
Bug Fixes:
Fixed deprecated local_dir_use_symlinks parameter in HuggingFace downloads (removes deprecation warnings)
Fixed Windows CI test failures related to temporary file permissions in encoding detection tests
Properly close temporary files before deletion to support Windows file locking behavior
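The temporary-file fix reflects a general Windows constraint: a file that still has an open handle usually cannot be deleted or reopened by path. A minimal sketch of the pattern, using only the standard library (the helper name write_temp_bytes is hypothetical and is not part of PyKokoro or its test suite):

```python
import os
import tempfile


def write_temp_bytes(data: bytes, suffix: str = ".txt") -> str:
    """Write data to a temporary file and return its path (hypothetical helper)."""
    # delete=False keeps the file on disk after close(), so it can be
    # reopened by path; cleanup becomes the caller's responsibility.
    tmp = tempfile.NamedTemporaryFile(delete=False, suffix=suffix)
    try:
        tmp.write(data)
    finally:
        # Close the handle before anything else opens or deletes the file.
        tmp.close()
    return tmp.name


path = write_temp_bytes("héllo".encode("utf-8"))
try:
    with open(path, "rb") as fh:
        content = fh.read()
finally:
    # No handle is open at this point, so deletion also works on Windows.
    os.unlink(path)
```

Closing explicitly and deleting in a finally block keeps the behavior the same on Windows, Linux, and macOS.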
Breaking Changes:
Removed redundant v1.1-zh-hf model variant; use model_variant="v1.1-zh" with model_source="huggingface" instead
Changed cache directory structure to ~/.cache/pykokoro/{models|voices}/{source}/{variant}/
Removed deprecated functions: download_model_hf_v11zh(), download_voices_hf_v11zh(), download_all_models_hf_v11zh()
Function signatures updated: download_model(), download_voice(), download_all_voices(), and download_all_models() now require variant parameter
Path helper functions get_model_dir() and get_voices_dir() now require source and variant parameters
API Simplification: Replaced split_mode and trim_silence parameters with single pause_mode parameter in Kokoro.create()
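To make the new cache layout concrete, the sketch below builds paths of the form ~/.cache/pykokoro/{models|voices}/{source}/{variant}/. The function cache_dir is a hypothetical illustration and is not the library's get_model_dir() or get_voices_dir(); only the directory pattern and the "huggingface" and "v1.1-zh" values come from this changelog.

```python
from pathlib import Path
from typing import Optional


def cache_dir(kind: str, source: str, variant: str,
              root: Optional[Path] = None) -> Path:
    """Build <root>/{models|voices}/{source}/{variant} (illustrative only)."""
    if kind not in ("models", "voices"):
        raise ValueError(f"kind must be 'models' or 'voices', got {kind!r}")
    # Assumed default root, taken from the directory pattern in the changelog.
    base = root if root is not None else Path.home() / ".cache" / "pykokoro"
    return base / kind / source / variant


model_path = cache_dir("models", "huggingface", "v1.1-zh")
voice_path = cache_dir("voices", "huggingface", "v1.1-zh")
```

Because source and variant are both path components, files cached under the earlier layout may not be found at the new location and could be downloaded again.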
Migration Guide:
For users upgrading from v0.0.3:
# Old (v0.0.3) - NO LONGER WORKS
kokoro = Kokoro(model_variant="v1.1-zh-hf")
download_model_hf_v11zh(quality="fp16")
# New (v0.0.4+)
kokoro = Kokoro(
    model_source="huggingface",
    model_variant="v1.1-zh",
)
download_model(variant="v1.1-zh", quality="fp16")
# Old pause control (v0.0.3) - NO LONGER WORKS
audio, sr = kokoro.create(
    text,
    voice="af_bella",
    split_mode="clause",
    trim_silence=True,
    pause_clause=0.25,
)
# New pause control (v0.0.4+)
# Default: TTS controls pauses naturally
audio, sr = kokoro.create(text, voice="af_bella")
# Manual pause control
audio, sr = kokoro.create(
    text,
    voice="af_bella",
    pause_mode="manual",  # PyKokoro controls pauses precisely
    pause_clause=0.25,
    pause_sentence=0.5,
    pause_paragraph=1.0,
)
pause_mode Parameter:
The new pause_mode parameter simplifies pause control:
pause_mode="tts" (default): TTS generates pauses naturally. Best for most content.
pause_mode="manual": PyKokoro controls pauses precisely. Best for podcasts, voice switching, and precise timing.
pause_mode="auto": PyKokoro inserts pauses at sentence/paragraph boundaries and trims silence.
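For projects with many existing create() call sites, a small shim can translate the removed keyword arguments before the call. The sketch below is hypothetical: migrate_create_kwargs is not a PyKokoro function, and the rule that any use of split_mode or trim_silence maps to pause_mode="manual" is an assumption based on the migration example above, so the right mode for a given call site may differ.

```python
from typing import Any, Dict

VALID_PAUSE_MODES = ("tts", "manual", "auto")  # values listed in this changelog


def migrate_create_kwargs(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of kwargs with removed parameters translated (hypothetical)."""
    new_kwargs = dict(kwargs)
    had_split = new_kwargs.pop("split_mode", None) is not None
    had_trim = bool(new_kwargs.pop("trim_silence", False))
    # Assumption: callers that split text or trimmed silence wanted explicit
    # pause control, so they map to "manual" unless pause_mode is already set.
    if (had_split or had_trim) and "pause_mode" not in new_kwargs:
        new_kwargs["pause_mode"] = "manual"
    mode = new_kwargs.get("pause_mode", "tts")
    if mode not in VALID_PAUSE_MODES:
        raise ValueError(f"unknown pause_mode: {mode!r}")
    return new_kwargs


old = {
    "voice": "af_bella",
    "split_mode": "clause",
    "trim_silence": True,
    "pause_clause": 0.25,
}
new = migrate_create_kwargs(old)
```

The result can then be passed along as kokoro.create(text, **new). The input dictionary is copied, so the original arguments stay available for logging or comparison.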
Improvements:
Unified path structure across all model sources and variants
Consolidated duplicate download functions for cleaner API
Shared configuration files between HuggingFace and GitHub sources
Improved code maintainability with consistent path handling
All quantization levels supported for v1.1-zh: fp32, fp16, q8, q8f16, q4, q4f16, uint8, uint8f16
Simplified API with clearer parameter semantics
New Features:
Added support for HuggingFace Kokoro v1.1-zh model (onnx-community/Kokoro-82M-v1.1-zh-ONNX)
Added 103 voices for v1.1-zh variant
Voice files automatically combined into efficient .npz format
Progress callbacks for voice downloads
Added pause_mode parameter for simplified pause control, including "auto" boundary pauses
Documentation:
Updated advanced features guide with unified variant usage
Updated examples/hf_v11zh_demo.py demonstration script
Added migration guide for v0.0.4 breaking changes
Updated all documentation to use new pause_mode API
Version 0.0.3 (2026-01-07)
Major Refactoring:
Extracted internal manager classes for better code organization
Reduced codebase complexity by ~706 lines (12% reduction)
Improved maintainability with better separation of concerns
100% backward compatibility maintained - no breaking changes
New Internal Classes:
Added OnnxSessionManager class for ONNX Runtime session management
Added VoiceManager class for voice loading and blending operations
Added AudioGenerator class for audio generation pipeline
Added MixedLanguageHandler class for automatic language detection
Added PhonemeDictionary class for custom word-to-phoneme mappings
Code Quality:
Reduced onnx_backend.py by 436 lines
Reduced tokenizer.py by 270 lines
Added comprehensive test coverage for new manager classes
All pre-commit hooks passing (ruff, ruff-format)
98.7% test pass rate (312/316 tests)
Architecture Improvements:
Delegate pattern implementation for backward compatibility
Better separation of session management, voice handling, and audio generation
Improved modularity for easier testing and maintenance
Enhanced error handling and validation
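The delegate pattern mentioned above can be sketched in a few lines: the public class keeps its existing method names and forwards each call to an internal manager object, so callers see no change. All names below (VoiceStore, Engine, add_voice, blend_voices) are hypothetical stand-ins and do not reproduce PyKokoro's VoiceManager or its public API.

```python
from typing import Dict, List


class VoiceStore:
    """Stand-in for an internal manager class (hypothetical)."""

    def __init__(self) -> None:
        self._voices: Dict[str, List[float]] = {}

    def add(self, name: str, style: List[float]) -> None:
        self._voices[name] = list(style)

    def blend(self, a: str, b: str, weight: float) -> List[float]:
        # Linear interpolation between two stored style vectors.
        va, vb = self._voices[a], self._voices[b]
        return [(1.0 - weight) * x + weight * y for x, y in zip(va, vb)]


class Engine:
    """Public facade: keeps its old method names and forwards to the manager."""

    def __init__(self) -> None:
        self._voice_store = VoiceStore()

    def add_voice(self, name: str, style: List[float]) -> None:
        self._voice_store.add(name, style)

    def blend_voices(self, a: str, b: str, weight: float = 0.5) -> List[float]:
        return self._voice_store.blend(a, b, weight)


engine = Engine()
engine.add_voice("a", [0.0, 2.0])
engine.add_voice("b", [1.0, 4.0])
blended = engine.blend_voices("a", "b", weight=0.5)
```

The manager can be unit-tested on its own, while the facade stays as thin forwarding code.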
Documentation:
Added API documentation for new internal manager classes
Added internal architecture section to advanced features guide
Updated changelog with refactoring details
Version 0.0.1 (2025-01-06)
Breaking Changes:
Removed PhonemeBook class - moved to separate ebook package
Removed PhonemeChapter class - moved to separate ebook package
Removed create_phoneme_book_from_chapters() function
Removed FORMAT_VERSION constant
Deleted examples/phoneme_export.py example
New Features:
Added split_and_phonemize_text() function for standalone text processing
Added enable_pauses parameter to create() method for pause marker support
Added pause markers: (.), (..), (...) for controlling speech pauses (DEPRECATED - use SSMD break syntax instead: ...c, ...s, ...p)
Added pause_short, pause_medium, pause_long parameters for custom pause durations
Added split_mode parameter to create() for intelligent text splitting (DEPRECATED in v0.0.4 - use pause_mode instead)
Added trim_silence parameter for removing silence between segments (DEPRECATED in v0.0.4 - use pause_mode="manual" instead)
Added pause_after field to PhonemeSegment class
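As a rough illustration of how the (now deprecated) pause markers relate to a per-segment pause_after value, the sketch below splits text on (.), (..), and (...) and attaches a pause to the preceding segment. This is not the library's parser: the function name, the tuple output, and the durations are hypothetical, and the real defaults for pause_short, pause_medium, and pause_long are not stated in this changelog.

```python
import re
from typing import List, Tuple

# Hypothetical durations in seconds, keyed by the number of dots in the marker.
PAUSES = {1: 0.3, 2: 0.6, 3: 1.2}
MARKER = re.compile(r"\((\.{1,3})\)")


def split_on_pause_markers(text: str) -> List[Tuple[str, float]]:
    """Return (segment_text, pause_after_seconds) pairs."""
    segments: List[Tuple[str, float]] = []
    pos = 0
    for match in MARKER.finditer(text):
        chunk = text[pos:match.start()].strip()
        pause = PAUSES[len(match.group(1))]
        if chunk:
            segments.append((chunk, pause))
        elif segments:
            # Marker with no new text before it: extend the previous pause.
            prev_text, prev_pause = segments[-1]
            segments[-1] = (prev_text, prev_pause + pause)
        # A marker before any text is ignored in this sketch.
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        segments.append((tail, 0.0))
    return segments


result = split_on_pause_markers("Hello (.) world (...) done")
```

With the durations assumed here, result holds three segments: "Hello" followed by a 0.3 s pause, "world" followed by a 1.2 s pause, and "done" with no trailing pause.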
Improvements:
Refactored _process_with_split_mode() to use standalone function
Improved phoneme-based generation with automatic length checking
Enhanced documentation with comprehensive examples
Better error handling and validation
Optimized text splitting for long passages
Bug Fixes:
Fixed floating point precision in pause duration tests
Improved backward compatibility for PhonemeSegment serialization
Better handling of empty and whitespace-only text
Documentation:
Added complete Sphinx documentation
Added quick start guide
Added installation guide
Added basic usage guide
Added advanced features guide
Added comprehensive examples
Added API reference
Version 0.0.1 (Initial Release)
Initial Features:
Text-to-speech synthesis using Kokoro model
Support for 54 voices (v1.0) across multiple languages
Support for English (US/GB), Spanish, French, German, Italian, Portuguese, Hindi, Japanese, Korean, Chinese
Voice blending capabilities
Phoneme-based generation
GPU acceleration support (CUDA/ROCm)
Model quality options (fp16, q8, q6)
Speed control
Basic tokenizer functionality
Audio trimming utilities
Configuration management
Model and voice downloading
PhonemeBook and PhonemeChapter classes for document processing
spaCy integration for sentence splitting
Mixed language support