ZhipuAI's efficient vision-language model with a 128K-token context window. Supports image inputs and a thinking mode.