ChatGPT Gains Vision, Video Abilities
Gemini 2.0 Generates Text, Images, Audio
Welcome to another edition of AI PlanetX.
ChatGPT gains real-time vision and video; Google launches Gemini 2.0; Anthropic rolls out Claude 3.5 Haiku.
Inside This Edition: 💎
Hottest AI News
Top AI & SaaS Tools
Top AI & Tech News
Interesting Uses of AI
Top AI Video Tutorial
AI Job List
Complimentary AI Course of the Day
Hottest AI News
OpenAI
ChatGPT Combines Vision With Real Time Video Support
OpenAI has launched ChatGPT's ability to see and understand real-time video, a feature first demonstrated seven months ago.
Details:
Users can point their phones at objects for instant analysis and feedback via the ChatGPT app, with screen sharing support for tasks like device settings and math problems
The feature is available for ChatGPT Plus, Team, and Pro users, with Enterprise and Edu users getting it in January. No timeline is set for the EU
While impressive in demos, like analyzing anatomical drawings on CBS's "60 Minutes," the system still makes mistakes, particularly with geometry
OpenAI is introducing this feature alongside a festive "Santa Mode" voice option while racing against competitors like Google and Meta, which are developing similar capabilities.
DeepMind
Google Launches Gemini 2.0 With Text, Image, and Speech Generation
Google has launched Gemini 2.0 Flash as part of its push to compete with OpenAI. Unlike its predecessor, which could generate only text, the new model natively generates text, images, and audio, and it integrates more closely with other tools.
Details:
Gemini 2.0 Flash offers native image and audio generation, with 8 voice options and SynthID watermarks to combat deepfakes
The model is twice as fast as Gemini 1.5 Pro, with improved coding, math, and factual accuracy, plus seamless integration with Google Search and third-party APIs
An experimental release is available via the Gemini API; full features are limited to early-access partners until January 2025, when a broader rollout begins
Try it:
Visit the AI Studio and sign in using your Google account
Once logged in, go to the Create Prompt section on the left. (Alternatively, open Stream Realtime to interact in real time via text, voice, video, or screen sharing)
On the right, open the Model dropdown menu and scroll down to select Gemini 2.0 Flash Experimental
At the bottom, enter your query in the text box and press Run to send it to the AI
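For readers who would rather use the Gemini API mentioned above than the AI Studio interface, here is a minimal Python sketch of the same "enter a query and press Run" step. It relies on assumptions this edition does not state: the REST endpoint shape, the model id `gemini-2.0-flash-exp`, and an API key created in AI Studio. Check Google's current API documentation before depending on any of them.

```python
import json
import urllib.request

# Assumptions (not stated in this newsletter): endpoint shape and model id.
API_BASE = "https://generativelanguage.googleapis.com/v1beta"
MODEL_ID = "gemini-2.0-flash-exp"  # assumed id for "Gemini 2.0 Flash Experimental"


def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a generateContent request for a text prompt."""
    url = f"{API_BASE}/models/{MODEL_ID}:generateContent"
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )


def ask(prompt: str, api_key: str) -> str:
    """Send the prompt and return the text of the first candidate reply."""
    with urllib.request.urlopen(build_request(prompt, api_key)) as resp:
        reply = json.load(resp)
    return reply["candidates"][0]["content"]["parts"][0]["text"]
```

Calling `ask("Summarize this week's AI news in one sentence.", api_key)` performs the network request. `build_request` is kept separate so the payload can be inspected without sending anything.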
Anthropic
Claude Gets Upgraded With Haiku 3.5 Model
Anthropic has rolled out its Claude 3.5 Haiku model to all Claude users, bringing improvements while retaining some limitations.
Details:
Matches or outperforms Claude 3 Opus on many benchmarks and excels at coding, data extraction, labeling, and content moderation, with a larger output capacity and a newer knowledge cutoff
Lacks image analysis, making it less versatile than Claude 3 Haiku and 3.5 Sonnet
Sparked controversy at launch when Anthropic reversed its pricing strategy and raised prices, citing the model's improved "intelligence"
Claude 3.5 Haiku shows Anthropic's AI progress, but users must weigh its performance, limitations, and cost.
Top AI & SaaS Tools
CodeMate (Lifetime Deal): Write, debug, and refactor code faster with this AI pair programmer [84% off]
Accio: AI-powered B2B search engine that acts as a personal agent, expertly interpreting your sourcing needs [Free]
iBrief: Get concise, accurate article summaries with AI and easy sharing [Free]
Twelve Labs: Multimodal AI that understands your videos; search by description and find exact moments [Free]
Cartesia: Convert text to lifelike speech, clone voices in 90ms, and control emotions like anger or curiosity [Free]
Top AI & Tech News
ChatGPT is now integrated into iOS, iPadOS, and macOS, letting users access its capabilities directly within the operating system
Microsoft has introduced Phi-4, which improves on its predecessors, especially in math problem-solving, thanks to higher-quality training data
Google is advancing its augmented reality glasses with multimodal AI through Project Astra, which select users will soon test on the new Android XR system
Meta FAIR unveiled Video Seal for watermarking, Motivo for Metaverse agents, and the Large Concept Model, plus advances in memory, vision-language encoding, and social intelligence
Interesting Uses of AI
AI Art Spotlight
Model: Midjourney
Prompt:
pencil sketch of a crazy turtle wearing sunglasses with a menacing expression. The turtle has a graffiti-covered, colorful shell with a skull symbol and other menacing motifs. The background is white. There's a bold text at the top that reads "maybe not too fast but furious!"
AI Prompt of the Day
Article Outline
This prompt is about creating a structured outline for an article on a specified topic.
Prompt:
Create an outline for an article about [topic]. The brand voice is [description]. Start with [first section idea], then go into [second section idea], then finish with [last section idea].
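If you reuse this prompt often, the bracketed placeholders can be filled in programmatically. The Python sketch below is one way to do it. The helper name `fill_prompt` and the sample values are hypothetical and chosen only for illustration.

```python
import re

# The prompt above, with its bracketed placeholders left intact.
TEMPLATE = (
    "Create an outline for an article about [topic]. "
    "The brand voice is [description]. "
    "Start with [first section idea], then go into [second section idea], "
    "then finish with [last section idea]."
)


def fill_prompt(template: str, values: dict[str, str]) -> str:
    """Replace each [placeholder] with its value; raises KeyError if one is missing."""
    return re.sub(r"\[([^\]]+)\]", lambda m: values[m.group(1)], template)


# Hypothetical sample values, for illustration only.
prompt = fill_prompt(
    TEMPLATE,
    {
        "topic": "home composting",
        "description": "friendly and practical",
        "first section idea": "why composting matters",
        "second section idea": "a step-by-step setup guide",
        "last section idea": "common mistakes to avoid",
    },
)
print(prompt)
```

Raising an error on a missing placeholder is a deliberate choice here: it avoids sending a half-filled prompt to the model.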
Top AI Video Tutorial
12 AI Tools You Won't Believe Are FREE
AI Job List
Software Engineer @ Palantir | Seattle, WA
Infrastructure Software Engineer, Public Sector @ Scale AI | San Francisco, CA; St. Louis, MO; New York, NY; Washington, DC
Engineering Manager, MarTech @ Grammarly | San Francisco; Hybrid
Complimentary AI Course of the Day
Generative AI Overview for Project Managers
Gain a basic understanding of Generative AI (GenAI) in project management. Explore different tools and their applications for enhanced project outcomes.