AI PlanetX
Posts
ChatGPT Gains Vision, Video Abilities

ChatGPT Gains Vision, Video Abilities

Gemini 2.0 Generates Text, Images, Audio

Altiam Kabir
December 13, 2024

In partnership with

Welcome to another edition of AI PlanetX.

ChatGPT integrates vision and video; Google launches Gemini 2.0; Claude debuts Haiku 3.5 upgrade.

Inside This Edition: 💎

Hottest AI News
Top AI & SaaS Tools
Top AI & Tech News
Interesting Uses of AI
Top AI Video Tutorial
AI Job List
Complimentary AI Course of the Day

Hottest AI News

OpenAI

ChatGPT Combines Vision With Real Time Video Support

OpenAI has launched ChatGPT's ability to see and understand real-time video, a feature first demonstrated seven months ago.

Details:

Users can point their phones at objects for instant analysis and feedback via the ChatGPT app, with screen sharing support for tasks like device settings and math problems
The feature is available for ChatGPT Plus, Team, and Pro users, with Enterprise and Edu users getting it in January. No timeline is set for the EU
While impressive in demos, like analyzing anatomical drawings on CNN’s "60 Minutes," the system still makes mistakes, particularly with geometry

As OpenAI introduces this feature alongside a festive "Santa Mode" voice option, they're racing against competitors like Google and Meta, who are developing similar capabilities.

Start learning AI in 2025

Everyone talks about AI, but no one has the time to learn it. So, we found the easiest way to learn AI in as little time as possible: The Rundown AI.

It's a free AI newsletter that keeps you up-to-date on the latest AI news, and teaches you how to apply it in just 5 minutes a day.

Plus, complete the quiz after signing up and they’ll recommend the best AI tools, guides, and courses – tailored to your needs.

DeepMind

Google Launches Gemini 2.0 With Text, Image, and Speech Generation

Google has launched Gemini 2.0 Flash, an evolution from its text-only predecessor, with text, image, and audio generation, plus enhanced integration, as part of its push to compete with OpenAI.

Details:

Gemini 2.0 Flash offers native image and audio generation, with 8 voice options and SynthID watermarks to combat deepfakes
The model is twice as fast as Gemini 1.5 Pro, with improved coding, math, and factual accuracy, plus seamless integration with Google Search and third-party APIs
An experimental release is available via the Gemini API, with full features for early access partners until January 2024, when a broader rollout begins

Try it:

Visit the AI Studio and sign in using your Google account
Once logged in, navigate to the Create Prompt section on the left. (Alternatively, you can go to Stream Realtime to interact in real-time via text, voice, video, or screen sharing)
On the right, open the Model dropdown menu and scroll down to select Gemini 2.0 Flash Experimental
At the bottom, enter your query in the text box and press Run to send it to the AI

Anthropic

Claude Gets Upgraded With Haiku 3.5 Model

Anthropic has garnered attention by introducing its Claude 3.5 Haiku model to all Claude users, bringing improvements while retaining some limitations.

Details:

Outperforms Claude 3 Opus, excelling in coding, data extraction, labeling, content moderation, with larger output capacity and newer knowledge cutoff
Lacks image analysis, making it less versatile than Claude 3 Haiku and 3.5 Sonnet
Sparked controversy at launch when Anthropic reversed its pricing strategy, raising costs due to improved "intelligence" features

Claude 3.5 Haiku shows Anthropic's AI progress, but users must weigh its performance, limitations, and cost.

Top AI & SaaS Tools

CodeMate (Life-time Deal): Write, debug, and refactor code faster with this AI pair programmer [84% off]
Accio: AI-powered B2B search engine that acts as a personal agent, expertly interpreting your sourcing needs [F-R-E-E]
iBrief: Get concise, accurate article summaries with AI and easy sharing [F-R-E-E]
Twelve Labs: Multimodal AI that understands your videos—search by description and find exact moments [F-R-E-E]
Cartesia: Convert text to lifelike speech, clone voices in 90ms, and control emotions like anger or curiosity [F-R-E-E]

Top AI & Tech News

ChatGPT is integrated into Apple experiences within iOS, iPadOS, and macOS, enabling users to utilize ChatGPT’s capabilities directly within the OS [Read More]
Microsoft has introduced Phi-4 that enhances capabilities over its predecessors, especially in math problem-solving due to improved training data quality [Read More]
Google is advancing its augmented reality glasses with multimodal AI through Project Astra, which will soon be tested by select users on the new Android XR system [Read More]
Meta FAIR unveiled Video Seal for watermarking, Motivo for Metaverse agents, and the Large Concept Model, plus advances in memory, vision-language encoding, and social intelligence [Read More]

Interesting Uses of AI

AI Art Spotlight

Model: Midjourney

Prompt:

pencil sketch of a crazy turtle wearing sunglasses with a menacing expression. The turtle has a graffiti-covered, colorful shell with a skull symbol and other menacing motifs. The background is white. There's a bold text at the top that reads "maybe not too fast but furious!"

AI Prompt of the Day

Article Outline

This prompt is about creating a structured outline for an article on a specified topic.

Prompt:

Create an outline for an article about [topic]. The brand voice is [description]. Start with [first section idea], then go into [second section idea], then finish with [last section idea].

12 AI Tools You Won't Believe Are FREE

AI Job List

Software Engineer @ Palantir | Seattle, WA
Infrastructure Software Engineer, Public Sector @ Scale AI | San Francisco, CA; St. Louis, MO; New York, NY; Washington, DC
Engineering Manager, MarTech @ Grammarly | San Francisco; Hybrid

Complimentary AI Course of the Day

Generative AI Overview for Project Managers

Gain a basic understanding of Generative AI (GenAI) in project management. Explore different tools and its applications for enhanced project outcomes.

ChatGPT Gains Vision, Video Abilities

Gemini 2.0 Generates Text, Images, Audio

Inside This Edition: 💎

Hottest AI News

OpenAI

ChatGPT Combines Vision With Real Time Video Support

Start learning AI in 2025

DeepMind

Google Launches Gemini 2.0 With Text, Image, and Speech Generation

Anthropic

Claude Gets Upgraded With Haiku 3.5 Model

Top AI & SaaS Tools

Top AI & Tech News

Interesting Uses of AI

AI Art Spotlight

AI Prompt of the Day

Top AI Video Tutorial

12 AI Tools You Won't Believe Are FREE

AI Job List

Complimentary AI Course of the Day

Generative AI Overview for Project Managers