Speech Synthesis and Recognition (TTS & STT)
Cleek supports Text-to-Speech (TTS) and Speech-to-Text (STT). TTS converts text into clear voice output, so users can interact with conversational agents as if they were talking to a real person. Users can choose from a variety of voices and pair a fitting voice with each assistant. TTS also suits users who prefer to learn by listening or who need to take in information while busy.
Cleek offers a curated set of high-quality voice options (OpenAI Audio, Microsoft Edge Speech) to serve users from different regions and cultural backgrounds. Users can pick a voice based on personal preference or the scenario at hand, which gives them a more personalized communication experience.
Cleek TTS
@cleek/tts is a high-quality TTS toolkit written in TypeScript that runs in both server and browser environments.
- Server: With just 15 lines of code, it can generate speech of a quality comparable to OpenAI's TTS service. It currently supports EdgeSpeechTTS, MicrosoftTTS, OpenAITTS, and OpenAISTT.
- Browser: It provides React Hooks and visual audio components that cover common functions such as loading, playing, pausing, and dragging the timeline, along with extensive styling options for the audio track.
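To make the browser-side functions above concrete, here is a minimal sketch of the playback state that such hooks typically have to manage: loading, playing, pausing, and dragging the timeline. This is not the @cleek/tts API. Every name below (`PlayerState`, `PlayerAction`, `reducePlayer`) is hypothetical and chosen only for illustration, and the sketch deliberately leaves out the actual audio element and network calls.

```typescript
// Hypothetical playback model; none of these names come from @cleek/tts.
type Status = "idle" | "loading" | "ready" | "playing" | "paused";

interface PlayerState {
  status: Status;
  duration: number; // seconds; 0 until audio has loaded
  currentTime: number; // seconds
}

type PlayerAction =
  | { type: "load" }
  | { type: "loaded"; duration: number }
  | { type: "play" }
  | { type: "pause" }
  | { type: "seek"; time: number };

const initialState: PlayerState = { status: "idle", duration: 0, currentTime: 0 };

function reducePlayer(state: PlayerState, action: PlayerAction): PlayerState {
  switch (action.type) {
    case "load":
      // Starting a new request resets any previous audio.
      return { status: "loading", duration: 0, currentTime: 0 };
    case "loaded":
      return { status: "ready", duration: Math.max(0, action.duration), currentTime: 0 };
    case "play":
      // Playing is only allowed once audio is available.
      return state.status === "ready" || state.status === "paused"
        ? { ...state, status: "playing" }
        : state;
    case "pause":
      return state.status === "playing" ? { ...state, status: "paused" } : state;
    case "seek":
      // Dragging the timeline: ignore until audio exists, then clamp to [0, duration].
      if (state.status === "idle" || state.status === "loading") return state;
      return {
        ...state,
        currentTime: Math.min(Math.max(action.time, 0), state.duration),
      };
  }
}

// Usage: replay a short interaction, including a drag past the end of the clip.
const actions: PlayerAction[] = [
  { type: "load" },
  { type: "loaded", duration: 12.5 },
  { type: "play" },
  { type: "seek", time: 30 },
  { type: "pause" },
];
const finalState = actions.reduce(reducePlayer, initialState);
console.log(finalState);
```

Keeping the transitions in a pure function like this makes them easy to test without a browser. In a React app, a function of this shape could be passed to `useReducer`, with the audio element's events dispatched as actions.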
While implementing the TTS feature in Cleek, we found no good frontend TTS library on the market, so we spent considerable effort on the implementation ourselves, including data conversion, audio progress management, and speech visualization. In keeping with our “Community First” principle, we have polished and open-sourced this implementation in the hope that it helps other developers who want to add TTS to their projects.