Advanced Features
=================

This guide covers advanced features of PyKokoro for power users.

.. note::

   Use ``KokoroPipeline`` as the supported interface. Legacy ``Kokoro`` snippets
   can be updated by replacing ``kokoro.create`` with ``pipe.run``.

Voice Blending
--------------

Create custom voices by blending multiple voices together.

Basic Voice Blending
~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   from pykokoro import Kokoro, VoiceBlend

   with Kokoro() as kokoro:
       # Blend two voices equally
       blend = VoiceBlend.parse("af_bella + af_sarah")

       audio, sr = kokoro.create(
           "This is a blended voice",
           voice=blend
       )

Weighted Blending
~~~~~~~~~~~~~~~~~

Control the contribution of each voice:

.. code-block:: python

   from pykokoro import Kokoro, VoiceBlend

   with Kokoro() as kokoro:
       # 70% bella, 30% sarah
       blend = VoiceBlend.parse("af_bella*0.7 + af_sarah*0.3")

       audio, sr = kokoro.create(
           "Weighted blend",
           voice=blend
       )

       # Percentage notation (normalized automatically)
       blend2 = VoiceBlend.parse("af_bella*70% + af_sarah*30%")

Multiple Voice Blending
~~~~~~~~~~~~~~~~~~~~~~~~

Blend more than two voices:

.. code-block:: python

   from pykokoro import Kokoro, VoiceBlend

   with Kokoro() as kokoro:
       # Three-way blend
       blend = VoiceBlend.parse(
           "af_bella*0.5 + af_sarah*0.3 + af_nicole*0.2"
       )

       audio, sr = kokoro.create(
           "Complex blend",
           voice=blend
       )

Creating Blended Voice Programmatically
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   from pykokoro import Kokoro, VoiceBlend

   # Create blend object directly
   blend = VoiceBlend(
       voices=["af_bella", "af_sarah", "am_adam"],
       weights=[0.4, 0.4, 0.2]
   )

   with Kokoro() as kokoro:
       audio, sr = kokoro.create(
           "Custom blend",
           voice=blend
       )

Phoneme-Based Generation
-------------------------

For precise control, generate speech directly from phonemes.

Using create_from_phonemes()
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   from pykokoro import Kokoro

   with Kokoro() as kokoro:
       # Get phonemes for text
       phonemes = kokoro.tokenizer.phonemize("Hello, world!")

       # Generate from phonemes
       audio, sr = kokoro.create_from_phonemes(
           phonemes,
           voice="af_bella",
           speed=1.0
       )

Text to Phonemes
~~~~~~~~~~~~~~~~

Convert text to phonemes:

.. code-block:: python

   from pykokoro import Kokoro

   with Kokoro() as kokoro:
       # Get phonemes
       phonemes = kokoro.tokenizer.phonemize(
           "Hello, world!",
           lang="en-us"
       )
       print(f"Phonemes: {phonemes}")

       # Get detailed phoneme info
       result = kokoro.tokenizer.text_to_phonemes(
           "Hello",
           lang="en-us",
           with_words=True
       )
       print(result)

PhonemeSegment Processing
~~~~~~~~~~~~~~~~~~~~~~~~~~

Work with phoneme segments for batch processing:

.. code-block:: python

   from pykokoro import phonemize_text_list, create_tokenizer

   tokenizer = create_tokenizer()

   texts = ["Hello", "World", "How are you?"]
   segments = phonemize_text_list(texts, tokenizer, lang="en-us")

   for segment in segments:
       print(f"Text: {segment.text}")
       print(f"Phonemes: {segment.phonemes}")
       print(f"Tokens: {segment.tokens}")

Advanced Text Splitting
------------------------

Split and Phonemize in One Step (Legacy API)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For advanced text processing with automatic splitting and phoneme generation.

The ``split_and_phonemize_text`` function intelligently handles long text by:

1. Splitting text using your chosen mode (paragraph, sentence, or clause)
2. Phonemizing each segment
3. **Automatically cascading to finer split modes** if segments exceed the phoneme limit
4. Only truncating as last resort (when even individual words are too long)

**Cascade behavior:**

- ``paragraph`` mode → ``sentence`` → ``clause`` → ``word`` → truncate
- ``sentence`` mode → ``clause`` → ``word`` → truncate
- ``clause`` mode → ``word`` → truncate
- ``word`` mode → truncate (with warning)

.. code-block:: python

   from pykokoro import split_and_phonemize_text, create_tokenizer

   tokenizer = create_tokenizer()

   long_text = """
   This is the first sentence. This is the second.

   This is a new paragraph.
   """

   # The function automatically ensures all segments stay within limit
   segments = split_and_phonemize_text(
       long_text,
       tokenizer,
       lang="en-us",
       split_mode="sentence",  # Will cascade to clause/word if needed
       max_phoneme_length=510  # Kokoro's maximum
   )

   for segment in segments:
       print(f"Paragraph {segment.paragraph}, Sentence {segment.sentence}")
       print(f"Text: {segment.text}")
       print(f"Phonemes: {segment.phonemes[:50]}...")
       # segment.phonemes is guaranteed to be <= 510

Split Modes in Detail (Legacy API)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Paragraph Mode:**

.. code-block:: python

   segments = split_and_phonemize_text(
       text,
       tokenizer,
       split_mode="paragraph"  # Splits on double newlines
   )

**Sentence Mode:**

Requires spaCy for sentence boundary detection:

.. code-block:: python

   segments = split_and_phonemize_text(
       text,
       tokenizer,
       split_mode="sentence"  # Splits on sentence boundaries
   )

**Clause Mode:**

Splits on both sentences and commas for finer control:

.. code-block:: python

   segments = split_and_phonemize_text(
       text,
       tokenizer,
       split_mode="clause"  # Splits on sentences and commas
   )

Pause Mode
~~~~~~~~~~

The modern ``Kokoro.create()`` API uses ``pause_mode`` for controlling pause behavior:

.. code-block:: python

   from pykokoro import Kokoro

   with Kokoro() as kokoro:
       # Default: TTS controls pauses naturally
       audio, sr = kokoro.create(
           long_text,
           voice="af_sarah"
       )

       # Auto mode: PyKokoro inserts boundary pauses
       audio, sr = kokoro.create(
           long_text,
           voice="af_sarah",
           pause_mode="auto",
           pause_clause=0.25,           # Clause boundaries
           pause_sentence=0.5,          # Sentence boundaries
           pause_paragraph=1.0,         # Paragraph boundaries
           pause_variance=0.05,         # Natural variance (Gaussian)
           random_seed=42               # For reproducible results
       )

       # Manual mode: PyKokoro trims and preserves explicit pauses
       audio, sr = kokoro.create(
           long_text,
           voice="af_sarah",
           pause_mode="manual",         # Preserve explicit pauses
           pause_clause=0.25,           # Clause boundaries
           pause_sentence=0.5,          # Sentence boundaries
           pause_paragraph=1.0,         # Paragraph boundaries
           pause_variance=0.05,         # Natural variance (Gaussian)
           random_seed=42               # For reproducible results
       )

**Pause Variance Details:**

The ``pause_variance`` parameter adds Gaussian (normal distribution) variance to
pause durations, making speech sound more natural:

* **0.0** - No variance, exact pause durations
* **0.05** - Default, ±100ms at 95% confidence interval
* **0.1** - Higher variance, ±200ms at 95% confidence

The variance ensures that pauses are never exactly the same length, mimicking
natural human speech rhythm.

**Reproducibility:**

Use ``random_seed`` for consistent output across runs:

.. code-block:: python

   # Same output every time
   audio1, sr = kokoro.create(text, voice="af_sarah",
                              pause_mode="auto",
                              random_seed=42)

   audio2, sr = kokoro.create(text, voice="af_sarah",
                              pause_mode="auto",
                              random_seed=42)

   # audio1 and audio2 are identical

   # Different output each time
   audio3, sr = kokoro.create(text, voice="af_sarah",
                              pause_mode="auto",
                              random_seed=None)  # or omit parameter

Custom Warning Callbacks
~~~~~~~~~~~~~~~~~~~~~~~~~

Handle warnings during phoneme generation:

.. code-block:: python

   from pykokoro import split_and_phonemize_text, create_tokenizer

   def my_warning_handler(message):
       print(f"WARNING: {message}")

   tokenizer = create_tokenizer()
   segments = split_and_phonemize_text(
       very_long_text,
       tokenizer,
       warn_callback=my_warning_handler
   )

GPU Acceleration
----------------

Automatic GPU Detection
~~~~~~~~~~~~~~~~~~~~~~~

PyKokoro automatically uses GPU if available:

.. code-block:: python

   from pykokoro import Kokoro, get_device

   # Check available device
   device = get_device()
   print(f"Using device: {device}")

   # Kokoro will use GPU automatically
   with Kokoro() as kokoro:
       audio, sr = kokoro.create("Hello!", voice="af_bella")

Forcing Specific Device
~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   from pykokoro import Kokoro

   # Force CPU
   kokoro_cpu = Kokoro(device="cpu")

   # Force CUDA (NVIDIA)
   kokoro_gpu = Kokoro(device="cuda")

   # Force ROCm (AMD)
   kokoro_rocm = Kokoro(device="rocm")

GPU Information
~~~~~~~~~~~~~~~

.. code-block:: python

   from pykokoro import get_gpu_info

   info = get_gpu_info()
   print(f"Device: {info['device']}")
   print(f"Providers: {info['providers']}")

Custom Model Paths
------------------

Model Sources
~~~~~~~~~~~~~

PyKokoro supports multiple model sources:

**HuggingFace (Default):**

.. code-block:: python

   from pykokoro import Kokoro

   # Default: HuggingFace with 54 voices
   kokoro = Kokoro(model_source="huggingface", model_quality="fp32")

**HuggingFace v1.0 (Default - 54 voices, 8 quality options):**

.. code-block:: python

   # Default: HuggingFace v1.0
   kokoro = Kokoro(model_quality="q8")  # Recommended default

**HuggingFace v1.1-zh (103 voices, 8 quality options):**

.. code-block:: python

   # HuggingFace v1.1-zh with English + Chinese voices
   # Supports all quantization levels: fp32, fp16, q8, q8f16, q4, q4f16, uint8, uint8f16
   kokoro = Kokoro(
       model_variant="v1.1-zh",
       model_quality="q8"  # All qualities available
   )

   # Use English voices
   audio, sr = kokoro.create(
       "Hello world!",
       voice="af_maple",  # v1.1-zh English voice
       lang="en-us"
   )

**GitHub v1.0 (54 voices, 4 quality options):**

.. code-block:: python

   # GitHub v1.0 with GPU-optimized fp16
   kokoro = Kokoro(
       model_source="github",
       model_variant="v1.0",
       model_quality="fp16-gpu"  # Options: fp32, fp16, fp16-gpu, q8
   )

**GitHub v1.1-zh (103 voices, fp32 only):**

.. code-block:: python

   # GitHub v1.1-zh with English + Chinese voices
   kokoro = Kokoro(
       model_source="github",
       model_variant="v1.1-zh",
       model_quality="fp32"  # Only fp32 available
   )

   # Use English voices
   audio, sr = kokoro.create(
       "Hello world!",
       voice="af_maple",  # v1.1-zh English voice
       lang="en-us"
   )

**Note:** Chinese text generation is currently in development. Use English voices from
v1.1-zh with English text for now.

Use Custom Model Files
~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   from pykokoro import Kokoro

   kokoro = Kokoro(
       model_path="/path/to/custom/model.onnx",
       voices_path="/path/to/voices.bin"
   )

Download Models Manually
~~~~~~~~~~~~~~~~~~~~~~~~~

**HuggingFace Models:**

.. code-block:: python

   from pykokoro import download_model, download_voice, download_all_models

   # Download specific model quality (v1.0 by default)
   download_model(variant="v1.0", quality="q8")

   # Download specific voice
   download_voice(voice_name="af_bella", variant="v1.0")

   # Download all models
   download_all_models(variant="v1.0")

**GitHub Models:**

.. code-block:: python

   from pykokoro.onnx_backend import (
       download_model_github,
       download_voices_github,
       download_all_models_github
   )

   # Download GitHub v1.0 model
   download_model_github(variant="v1.0", quality="fp16-gpu")

   # Download GitHub v1.0 voices
   download_voices_github(variant="v1.0")

   # Download all GitHub v1.1-zh files
   download_all_models_github(
       variant="v1.1-zh",
       quality="fp32",
       progress_callback=lambda msg: print(msg)
   )

**HuggingFace v1.1-zh Models:**

.. code-block:: python

   from pykokoro import (
       download_model,
       download_all_voices,
       download_all_models,
       download_config
   )

   # Download HuggingFace v1.1-zh model (with quantization)
   download_model(variant="v1.1-zh", quality="q8")

   # Download all 103 voices for v1.1-zh
   def progress(voice_name, current, total):
       print(f"Downloading {current}/{total}: {voice_name}")

   download_all_voices(variant="v1.1-zh", progress_callback=progress)

   # Download configuration for v1.1-zh
   download_config(variant="v1.1-zh")

   # Download everything (model + config + all voices)
   download_all_models(
       variant="v1.1-zh",
       quality="q8",
       progress_callback=lambda msg: print(msg)
   )

**Available Quality Options by Source:**

* **HuggingFace v1.0**: fp32, fp16, q8, q8f16, q4, q4f16, uint8, uint8f16
* **HuggingFace v1.1-zh**: fp32, fp16, q8, q8f16, q4, q4f16, uint8, uint8f16
* **GitHub v1.0**: fp32, fp16, fp16-gpu, q8
* **GitHub v1.1-zh**: fp32 only

Get Model Paths
~~~~~~~~~~~~~~~

.. code-block:: python

   from pykokoro import get_model_path, get_voice_path

   # HuggingFace model paths
   model_path = get_model_path(quality="q8")
   voice_path = get_voice_path()

   print(f"Model: {model_path}")
   print(f"Voices: {voice_path}")

   # GitHub model paths are stored in variant-specific subdirectories:
   # ~/.cache/pykokoro/models/onnx/v1.0/kokoro-v1.0.onnx
   # ~/.cache/pykokoro/models/onnx/v1.1-zh/kokoro-v1.1-zh.onnx
   # ~/.cache/pykokoro/voices/v1.0/voices-v1.0.bin
   # ~/.cache/pykokoro/voices/v1.1-zh/voices-v1.1-zh.bin

Advanced Tokenizer Configuration
---------------------------------

Custom Tokenizer Settings
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   from pykokoro import create_tokenizer, TokenizerConfig

   # Custom tokenizer config
   tokenizer_config = TokenizerConfig(
       vocab_path="/path/to/vocab.txt",
       espeak_config=espeak_config
   )

   tokenizer = create_tokenizer(config=tokenizer_config)

Mixed Language Support
~~~~~~~~~~~~~~~~~~~~~~

For text with multiple languages:

.. code-block:: python

   from pykokoro import create_tokenizer, TokenizerConfig

   config = TokenizerConfig(
       enable_mixed_language=True,
       primary_language="en-us",
       allowed_languages=["en-us", "es", "fr"],
       language_confidence_threshold=0.7
   )

   tokenizer = create_tokenizer(config=config)

Audio Trimming
--------------

Trim Silence from Audio
~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   from pykokoro import trim

   # Generate audio with silence
   with Kokoro() as kokoro:
       audio, sr = kokoro.create("Hello!", voice="af_bella")

   # Trim silence
   trimmed_audio, trim_info = trim(audio)

   print(f"Original: {len(audio)} samples")
   print(f"Trimmed: {len(trimmed_audio)} samples")
   print(f"Trim info: {trim_info}")

Short Sentence Handling
-----------------------

PyKokoro improves short, single-word sentences by surrounding the word with a
pause marker. You can tune settins via ``ShortSentenceConfig``:

.. code-block:: python

   from pykokoro.short_sentence_handler import ShortSentenceConfig

   short_config = ShortSentenceConfig(
       phoneme_pretext="…",
   )

Configuration Management
------------------------

Save and Load Configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   from pykokoro import save_config, load_config

   # Save configuration
   config = {
       "default_voice": "af_bella",
       "default_speed": 1.0,
       "model_quality": "q8"
   }
   save_config(config, "my_config.json")

   # Load configuration
   loaded_config = load_config("my_config.json")

Get Cache Paths
~~~~~~~~~~~~~~~

.. code-block:: python

   from pykokoro import get_user_cache_path, get_user_config_path

   cache_path = get_user_cache_path()
   config_path = get_user_config_path()

   print(f"Cache: {cache_path}")
   print(f"Config: {config_path}")

Performance Tips
----------------

1. **Reuse Kokoro Instance**

   Don't create a new ``Kokoro()`` for each request - initialize once and reuse.

2. **Use GPU When Available**

   GPU acceleration provides 3-10x speedup.

3. **Batch Processing**

   Process multiple texts in one session to avoid initialization overhead.

4. **Choose Appropriate Model Quality**

   Use ``q6`` or ``q8`` for production; ``fp16`` only when quality is critical.

5. **Use pause_mode for Long Text**

   Using ``pause_mode="auto"`` with appropriate pause settings improves quality for long text.

Internal Architecture
---------------------

Understanding PyKokoro's Internal Structure
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

PyKokoro uses a modular architecture with specialized manager classes for different responsibilities:

**OnnxSessionManager** (``pykokoro/onnx_session.py``)

Manages ONNX Runtime session creation and configuration:

* Automatic provider selection (CUDA → ROCm → CPU)
* Session options and execution providers
* Fallback handling when GPU is unavailable

**VoiceManager** (``pykokoro/voice_manager.py``)

Handles voice loading and blending:

* Loads voice embeddings from binary files
* Implements voice blending with weighted combinations
* Validates voice availability across model variants

**AudioGenerator** (``pykokoro/audio_generator.py``)

Manages the audio generation pipeline:

* Converts phonemes to tokens
* Runs ONNX inference for audio generation
* Handles speed adjustment and audio post-processing

**MixedLanguageHandler** (``pykokoro/mixed_language_handler.py``)

Automatic language detection for multilingual text:

* Detects language boundaries in mixed-language text
* Routes text segments to appropriate language models
* Configurable confidence thresholds

**PhonemeDictionary** (``pykokoro/phoneme_dictionary.py``)

Custom word-to-phoneme mappings:

* Override default pronunciation for specific words
* Support for context-aware phoneme substitution
* JSON-based dictionary format

Using Manager Classes Directly
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

While most users interact with the high-level ``Kokoro`` API, advanced users can work with manager classes directly:

.. code-block:: python

   from pykokoro.onnx_session import OnnxSessionManager
   from pykokoro.voice_manager import VoiceManager
   from pykokoro.audio_generator import AudioGenerator

   # Create ONNX session with custom options
   session_manager = OnnxSessionManager(
       device="cuda",
       providers=["CUDAExecutionProvider"],
       user_session_options={"intra_op_num_threads": 4}
   )

   session = session_manager.create_session(
       model_path="/path/to/model.onnx"
   )

   # Load voices with custom blend
   voice_manager = VoiceManager(model_source="huggingface")
   voice_manager.load_voices("/path/to/voices.bin")
   voice_data = voice_manager.get_blended_voice("af_bella*0.7 + af_sarah*0.3")

   # Generate audio
   audio_gen = AudioGenerator(
       session=session,
       sample_rate=24000,
       lang="en-us"
   )

   audio = audio_gen.generate_audio_from_phonemes(
       phonemes="həˈloʊ wɝld",
       voice_data=voice_data,
       speed=1.0
   )

Custom Phoneme Dictionaries
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Create custom pronunciation mappings:

.. code-block:: python

   from pykokoro.phoneme_dictionary import PhonemeDictionary

   # Create dictionary
   dictionary = PhonemeDictionary()

   # Add custom pronunciations
   dictionary.add_word("PyKokoro", "paɪ kəˈkɔɹoʊ")
   dictionary.add_word("ONNX", "ɑnɪks")

   # Save to file
   dictionary.save("custom_pronunciations.json")

   # Load and use
   loaded_dict = PhonemeDictionary.load("custom_pronunciations.json")

   # Apply to tokenizer
   from pykokoro import create_tokenizer
   tokenizer = create_tokenizer()
   tokenizer.phoneme_dictionary = loaded_dict

   # Now "PyKokoro" will use custom pronunciation
   phonemes = tokenizer.phonemize("Welcome to PyKokoro!")

Next Steps
----------

* :doc:`examples` - Real-world usage examples
* :doc:`api_reference` - Complete API documentation