# Installation Guide
This guide walks you through installing and setting up NeuTTS Air on your system. The model runs on Linux, macOS, and Windows with Python 3.11 or higher.
Note: NeuTTS Air runs entirely on your local device. No cloud API keys or internet connection required after installation.
## Prerequisites
- Python 3.11 or higher installed on your system
- Git for cloning the repository
- Sufficient disk space for model weights (approximately 2-3 GB)
- Optional: CUDA-capable GPU for faster inference (CPU works fine too)
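To confirm the interpreter requirement before going further, a quick check along these lines can help (plain Python, no project code involved):

```python
import sys

def python_ok(version=sys.version_info[:2]):
    """Return True when the interpreter meets the Python 3.11 minimum."""
    return tuple(version) >= (3, 11)

if not python_ok():
    print(f"Python {sys.version.split()[0]} is too old; NeuTTS Air needs 3.11+")
```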
## Step 1: Clone the Repository
Open your terminal and run the following commands:
```bash
git clone https://github.com/neuphonic/neutts-air.git
cd neutts-air
```
## Step 2: Install eSpeak (Required Dependency)
eSpeak is a required dependency for phoneme conversion. Install it based on your operating system:
### macOS

```bash
brew install espeak
```
Mac users may need to set the library path. Add these lines at the top of `neutts.py`:

```python
from phonemizer.backend.espeak.wrapper import EspeakWrapper

_ESPEAK_LIBRARY = '/opt/homebrew/Cellar/espeak/1.48.04_1/lib/libespeak.1.1.48.dylib'
EspeakWrapper.set_library(_ESPEAK_LIBRARY)
```
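The Cellar path is specific to one eSpeak version, so it breaks whenever Homebrew upgrades the package. A small helper can locate the library dynamically instead; this is a sketch assuming a standard Homebrew layout (Apple Silicon or Intel prefix):

```python
import glob

def find_espeak_library():
    """Search common Homebrew locations for the eSpeak shared library.

    The exact Cellar path changes with the installed version, so glob for it
    instead of hard-coding one. Returns None when nothing is found.
    """
    patterns = [
        "/opt/homebrew/Cellar/espeak*/*/lib/libespeak*.dylib",  # Apple Silicon
        "/usr/local/Cellar/espeak*/*/lib/libespeak*.dylib",     # Intel Macs
    ]
    for pattern in patterns:
        matches = sorted(glob.glob(pattern))
        if matches:
            return matches[-1]  # newest version sorts last
    return None
```

If it returns a path, pass that to `EspeakWrapper.set_library` instead of the hard-coded string.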
### Ubuntu/Debian Linux

```bash
sudo apt install espeak
```
### Windows

Download and install eSpeak NG from the official repository, then set environment variables in PowerShell:

```powershell
$env:PHONEMIZER_ESPEAK_LIBRARY = "c:\Program Files\eSpeak NG\libespeak-ng.dll"
$env:PHONEMIZER_ESPEAK_PATH = "c:\Program Files\eSpeak NG"
setx PHONEMIZER_ESPEAK_LIBRARY "c:\Program Files\eSpeak NG\libespeak-ng.dll"
setx PHONEMIZER_ESPEAK_PATH "c:\Program Files\eSpeak NG"
```
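The `$env:` assignments apply only to the current PowerShell session, while `setx` persists the values for future sessions. As an alternative, the same variables can be set from Python before `phonemizer` is first imported; the paths below are the default eSpeak NG install locations used in the PowerShell commands:

```python
import os

# Set the phonemizer variables from Python, before phonemizer is imported.
# setdefault keeps any value already configured in the environment.
os.environ.setdefault(
    "PHONEMIZER_ESPEAK_LIBRARY",
    r"c:\Program Files\eSpeak NG\libespeak-ng.dll",
)
os.environ.setdefault(
    "PHONEMIZER_ESPEAK_PATH",
    r"c:\Program Files\eSpeak NG",
)
```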
## Step 3: Install Python Dependencies
Install the required Python packages:
```bash
pip install -r requirements.txt
```
The requirements file covers running the model with PyTorch. If you use the ONNX decoder or a GGUF model instead, some of these dependencies, such as PyTorch, may not be required.
## Step 4: Optional - Install Additional Components
### For GGUF Models (Recommended for Performance)
Install llama-cpp-python for optimized GGUF model support:
```bash
pip install llama-cpp-python
```
For GPU acceleration with CUDA or MPS support, refer to the llama-cpp-python documentation for platform-specific installation instructions.
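For example, a CUDA-enabled build can be requested through CMake flags at install time. This sketch uses the flag documented by recent llama-cpp-python releases; older releases used `-DLLAMA_CUBLAS=on`, so check the documentation for your installed version:

```bash
# Rebuild llama-cpp-python with CUDA kernels enabled
# (flag name per recent releases; older ones used -DLLAMA_CUBLAS=on)
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python --upgrade --no-cache-dir
```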
### For ONNX Decoder
If you want to use the ONNX decoder for additional performance:
```bash
pip install onnxruntime
```
## Step 5: Running Your First Synthesis
Test the installation with the basic example:
```bash
python -m examples.basic_example \
  --input_text "My name is Dave, and um, I'm from London" \
  --ref_audio samples/dave.wav \
  --ref_text samples/dave.txt
```
This command will synthesize speech using the provided reference audio and text. The output will be saved as an audio file in your working directory.
## Using NeuTTS Air in Your Code
Here's a simple example to get started with NeuTTS Air in your Python projects:
```python
from neuttsair.neutts import NeuTTSAir
import soundfile as sf

# Initialize the model
tts = NeuTTSAir(
    backbone_repo="neuphonic/neutts-air",  # or 'neutts-air-q4-gguf'
    backbone_device="cpu",
    codec_repo="neuphonic/neucodec",
    codec_device="cpu",
)

# Your input text
input_text = "My name is Dave, and um, I'm from London."

# Reference files
ref_text_path = "samples/dave.txt"
ref_audio_path = "samples/dave.wav"

# Load reference text
with open(ref_text_path, "r") as f:
    ref_text = f.read().strip()

# Encode reference audio
ref_codes = tts.encode_reference(ref_audio_path)

# Generate speech
wav = tts.infer(input_text, ref_codes, ref_text)

# Save output at 24 kHz
sf.write("output.wav", wav, 24000)
```
## Model Variants
NeuTTS Air is available in several formats to suit different performance needs:
| Model Variant | Description | Use Case |
|---|---|---|
| neuphonic/neutts-air | Standard PyTorch model | Full-featured, best quality |
| neutts-air-q8-gguf | 8-bit quantized GGUF | Balanced speed and quality |
| neutts-air-q4-gguf | 4-bit quantized GGUF | Maximum speed, lower memory |
## Optimizing Performance
For the best performance on your device:
- Use GGUF models: These quantized versions run faster with minimal quality loss
- Pre-encode references: Encode your reference audio once and reuse the codes for multiple generations
- Use ONNX decoder: The ONNX codec decoder can provide additional speed improvements
- Enable GPU acceleration: If you have a CUDA-compatible GPU or Apple Silicon, enable GPU support for faster inference
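The pre-encoding advice can be wrapped in a small cache so each reference file is encoded exactly once per process. This is a generic sketch, not part of the NeuTTS Air API; `encode` stands in for a function like `tts.encode_reference`:

```python
class ReferenceCache:
    """Memoize reference encodings so each file is encoded only once."""

    def __init__(self, encode):
        self._encode = encode  # e.g. tts.encode_reference
        self._codes = {}

    def get(self, audio_path):
        # Encode on first request, then serve the cached codes thereafter
        if audio_path not in self._codes:
            self._codes[audio_path] = self._encode(audio_path)
        return self._codes[audio_path]
```

Construct it once with `cache = ReferenceCache(tts.encode_reference)` and call `cache.get(path)` wherever the codes are needed across multiple generations.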
## Preparing Reference Audio
For optimal voice cloning results, your reference audio should meet these criteria:
- Mono channel - Single audio channel
- 16-44 kHz sample rate - Standard quality range
- 3-15 seconds duration - Optimal length for voice capture
- WAV format - Uncompressed audio file
- Clean recording - Minimal background noise
- Natural speech - Continuous speaking with few pauses
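The measurable criteria can be checked automatically before cloning. Here is a minimal validator using only the standard-library `wave` module, which handles the plain PCM WAV files the format criterion asks for:

```python
import wave

def check_reference(path):
    """Return a list of problems with a reference WAV (empty list = OK).

    Thresholds follow the criteria listed above.
    """
    with wave.open(path, "rb") as w:
        channels = w.getnchannels()
        rate = w.getframerate()
        duration = w.getnframes() / rate

    problems = []
    if channels != 1:
        problems.append(f"expected mono, got {channels} channels")
    if not 16_000 <= rate <= 44_100:
        problems.append(f"sample rate {rate} Hz outside 16-44 kHz")
    if not 3.0 <= duration <= 15.0:
        problems.append(f"duration {duration:.1f} s outside 3-15 s")
    return problems
```

Cleanliness and naturalness still need a human ear; this only catches the mechanical issues.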
## Troubleshooting
### eSpeak Library Not Found
If you get an error about eSpeak library not being found, ensure the library path is correctly set in your environment variables or at the top of the script as shown in Step 2.
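When the cause isn't obvious, a short diagnostic can show whether the override variable is set and whether the system loader can see an eSpeak library at all (library names and search rules vary by platform):

```python
import ctypes.util
import os

# Is the phonemizer override set, and can the loader find eSpeak?
print("PHONEMIZER_ESPEAK_LIBRARY =", os.environ.get("PHONEMIZER_ESPEAK_LIBRARY"))
for name in ("espeak-ng", "espeak"):
    print(f"find_library({name!r}) =", ctypes.util.find_library(name))
```

If both lookups print `None` and the variable is unset, point `PHONEMIZER_ESPEAK_LIBRARY` at the installed shared library as shown in Step 2.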
### CUDA/GPU Issues
If GPU acceleration is not working, verify that your CUDA installation matches your PyTorch version. You can always fall back to CPU inference by setting `backbone_device="cpu"` and `codec_device="cpu"`.
### Poor Voice Quality
Check your reference audio quality. Ensure it's clean, properly formatted, and within the recommended duration range. Better reference audio produces better results.
## Advanced Usage
For streaming output, batch processing, and other advanced features, check the examples folder in the repository. Additional Jupyter notebooks are available demonstrating various use cases.
## Developer Contributions
If you want to contribute to the NeuTTS Air project, install the pre-commit hooks:
```bash
pip install pre-commit
pre-commit install
```
## Installation Complete
You're now ready to use NeuTTS Air for text-to-speech synthesis with voice cloning. Experiment with different reference voices and text inputs to explore the model's capabilities.
Need Help? For additional examples and detailed documentation, visit the official repository or check the examples folder included with your installation.