Changelog
=========

Version 0.0.4 (TBD)
-------------------

**Bug Fixes:**

* Fixed deprecated ``local_dir_use_symlinks`` parameter in HuggingFace downloads (removes deprecation warnings)
* Fixed Windows CI test failures related to temporary file permissions in encoding detection tests
* Properly close temporary files before deletion to support Windows file locking behavior

**Breaking Changes:**

* Removed redundant ``v1.1-zh-hf`` model variant

  - Use ``model_variant="v1.1-zh"`` with ``model_source="huggingface"`` instead

* Changed cache directory structure to ``~/.cache/pykokoro/{models|voices}/{source}/{variant}/``
* Removed deprecated functions: ``download_model_hf_v11zh()``, ``download_voices_hf_v11zh()``, ``download_all_models_hf_v11zh()``
* Function signatures updated: ``download_model()``, ``download_voice()``, ``download_all_voices()``, and ``download_all_models()`` now require ``variant`` parameter
* Path helper functions ``get_model_dir()`` and ``get_voices_dir()`` now require ``source`` and ``variant`` parameters
* **API Simplification:** Replaced ``split_mode`` and ``trim_silence`` parameters with single ``pause_mode`` parameter in ``Kokoro.create()``

**Migration Guide:**

For users upgrading from v0.0.3:

.. code-block:: python

   # Old (v0.3.x) - NO LONGER WORKS
   kokoro = Kokoro(model_variant="v1.1-zh-hf")
   download_model_hf_v11zh(quality="fp16")

   # New (v0.4.0+)
   kokoro = Kokoro(
       model_source="huggingface",
       model_variant="v1.1-zh"
   )
   download_model(variant="v1.1-zh", quality="fp16")

   # Old pause control (v0.3.x) - NO LONGER WORKS
   audio, sr = kokoro.create(
       text,
       voice="af_bella",
       split_mode="clause",
       trim_silence=True,
       pause_clause=0.25
   )

   # New pause control (v0.4.0+)
   # Default: TTS controls pauses naturally
   audio, sr = kokoro.create(text, voice="af_bella")

   # Manual pause control
   audio, sr = kokoro.create(
       text,
       voice="af_bella",
       pause_mode="manual",  # PyKokoro controls pauses precisely
       pause_clause=0.25,
       pause_sentence=0.5,
       pause_paragraph=1.0
   )

**pause_mode Parameter:**

The new ``pause_mode`` parameter simplifies pause control:

* ``pause_mode="tts"`` (default): TTS generates pauses naturally. Best for most content.
* ``pause_mode="manual"``: PyKokoro controls pauses precisely. Best for podcasts, voice switching, and precise timing.
* ``pause_mode="auto"``: PyKokoro inserts pauses at sentence/paragraph boundaries and trims silence.

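For codebases with many existing ``create()`` call sites, the keyword change described above can be handled in one place. The following is a minimal sketch, not part of PyKokoro: ``migrate_create_kwargs`` is a hypothetical helper, and mapping any use of ``split_mode`` or ``trim_silence`` to ``pause_mode="manual"`` is an assumption based on the migration example, so adjust it if ``"auto"`` suits your content better.

```python
import warnings

# Hypothetical helper (not provided by PyKokoro): rewrites keyword arguments
# written for the old split_mode/trim_silence API into the pause_mode style.
_LEGACY_KEYS = ("split_mode", "trim_silence")


def migrate_create_kwargs(kwargs):
    """Return a copy of kwargs with legacy pause options replaced by pause_mode."""
    # Drop the removed parameters, keep everything else untouched.
    migrated = {k: v for k, v in kwargs.items() if k not in _LEGACY_KEYS}
    used = [k for k in _LEGACY_KEYS if k in kwargs]
    if used:
        warnings.warn(
            f"{', '.join(used)} removed; using pause_mode instead",
            DeprecationWarning,
            stacklevel=2,
        )
        # Assumption: callers who set a legacy option wanted explicit pause
        # control, so default to "manual" unless pause_mode is already given.
        migrated.setdefault("pause_mode", "manual")
    return migrated


old_call = {
    "voice": "af_bella",
    "split_mode": "clause",
    "trim_silence": True,
    "pause_clause": 0.25,
}
new_call = migrate_create_kwargs(old_call)
```

The result can then be passed on as ``kokoro.create(text, **new_call)``. Calls that never used the legacy options pass through unchanged, so they keep the default ``pause_mode="tts"`` behavior.
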
**Improvements:**

* Unified path structure across all model sources and variants
* Consolidated duplicate download functions for cleaner API
* Shared configuration files between HuggingFace and GitHub sources
* Improved code maintainability with consistent path handling
* All quantization levels supported for v1.1-zh: fp32, fp16, q8, q8f16, q4, q4f16, uint8, uint8f16
* Simplified API with clearer parameter semantics

**New Features:**

* Added support for HuggingFace Kokoro v1.1-zh model (``onnx-community/Kokoro-82M-v1.1-zh-ONNX``)
* Added 103 voices for v1.1-zh variant
* Voice files automatically combined into efficient .npz format
* Progress callbacks for voice downloads
* Added ``pause_mode`` parameter for simplified pause control, including ``"auto"`` boundary pauses

**Documentation:**

* Updated advanced features guide with unified variant usage
* Updated ``examples/hf_v11zh_demo.py`` demonstration script
* Added migration guide for v0.4.0 breaking changes
* Updated all documentation to use new ``pause_mode`` API

Version 0.0.3 (2026-01-07)
--------------------------

**Major Refactoring:**

* Extracted internal manager classes for better code organization
* Reduced codebase complexity by ~706 lines (12% reduction)
* Improved maintainability with better separation of concerns
* 100% backward compatibility maintained - no breaking changes

**New Internal Classes:**

* Added ``OnnxSessionManager`` class for ONNX Runtime session management
* Added ``VoiceManager`` class for voice loading and blending operations
* Added ``AudioGenerator`` class for audio generation pipeline
* Added ``MixedLanguageHandler`` class for automatic language detection
* Added ``PhonemeDictionary`` class for custom word-to-phoneme mappings

**Code Quality:**

* Reduced ``onnx_backend.py`` by 436 lines
* Reduced ``tokenizer.py`` by 270 lines
* Added comprehensive test coverage for new manager classes
* All pre-commit hooks passing (ruff, ruff-format)
* 98.7% test pass rate (312/316 tests)

**Architecture Improvements:**

* Delegate pattern implementation for backward compatibility
* Better separation of session management, voice handling, and audio generation
* Improved modularity for easier testing and maintenance
* Enhanced error handling and validation

**Documentation:**

* Added API documentation for new internal manager classes
* Added internal architecture section to advanced features guide
* Updated changelog with refactoring details

Version 0.0.1 (2025-01-06)
--------------------------

**Breaking Changes:**

* Removed ``PhonemeBook`` class - moved to separate ebook package
* Removed ``PhonemeChapter`` class - moved to separate ebook package
* Removed ``create_phoneme_book_from_chapters()`` function
* Removed ``FORMAT_VERSION`` constant
* Deleted ``examples/phoneme_export.py`` example

**New Features:**

* Added ``split_and_phonemize_text()`` function for standalone text processing
* Added ``enable_pauses`` parameter to ``create()`` method for pause marker support
* Added pause markers: ``(.)``, ``(..)``, ``(...)`` for controlling speech pauses (DEPRECATED - use SSMD break syntax instead: ``...c``, ``...s``, ``...p``)
* Added ``pause_short``, ``pause_medium``, ``pause_long`` parameters for custom pause durations
* Added ``split_mode`` parameter to ``create()`` for intelligent text splitting (DEPRECATED in v0.4.0 - use ``pause_mode`` instead)
* Added ``trim_silence`` parameter for removing silence between segments (DEPRECATED in v0.4.0 - use ``pause_mode="manual"`` instead)
* Added ``pause_after`` field to ``PhonemeSegment`` class

**Improvements:**

* Refactored ``_process_with_split_mode()`` to use standalone function
* Improved phoneme-based generation with automatic length checking
* Enhanced documentation with comprehensive examples
* Better error handling and validation
* Optimized text splitting for long passages

**Bug Fixes:**

* Fixed floating point precision in pause duration tests
* Improved backward compatibility for PhonemeSegment serialization
* Better handling of empty and whitespace-only text

**Documentation:**

* Added complete Sphinx documentation
* Added quick start guide
* Added installation guide
* Added basic usage guide
* Added advanced features guide
* Added comprehensive examples
* Added API reference

Version 0.0.1 (Initial Release)
-------------------------------

**Initial Features:**

* Text-to-speech synthesis using Kokoro model
* Support for 54 voices (v1.0) across multiple languages
* Support for English (US/GB), Spanish, French, German, Italian, Portuguese, Hindi, Japanese, Korean, Chinese
* Voice blending capabilities
* Phoneme-based generation
* GPU acceleration support (CUDA/ROCm)
* Model quality options (fp16, q8, q6)
* Speed control
* Basic tokenizer functionality
* Audio trimming utilities
* Configuration management
* Model and voice downloading
* PhonemeBook and PhonemeChapter classes for document processing
* spaCy integration for sentence splitting
* Mixed language support