Amazon's multimodal model for processing images, video, and text. It can analyze multiple images within a 300K-token context window.