Model Vision Recognition - Cleek Documentation

Cleek now supports large language models with visual recognition capabilities such as OpenAI’s gpt-4-vision, Google Gemini Pro vision, and Zhipu GLM-4 Vision, enabling Cleek to have multimodal interaction capabilities. Users can easily upload or drag and drop images into the chat box, and the assistant will be able to recognize the content of the images and engage in intelligent conversations based on them, creating more intelligent and diverse chat scenarios.

This feature opens up new ways of interaction, allowing communication to extend beyond text and encompass rich visual elements. Whether it’s sharing images in daily use or interpreting images in specific industries, the assistant can provide an excellent conversational experience.

Quickstart Speech Synthesis and Recognition (TTS & STT)

Get Started