Weekly Round Up | Claude 3.5 Sonnet, DeepMind's V2A, and Stable Diffusion 3 Medium

Anthropic introduces Claude 3.5 Sonnet, DeepMind advances video-to-audio technology, and Stable Diffusion 3 Medium elevates text-to-image generation.

Adam here! Anthropic has introduced Claude 3.5 Sonnet, offering new AI capabilities with enhanced speed and intelligence for complex tasks. We also look at DeepMind's video-to-audio (V2A) technology, which generates synchronised soundtracks for videos using video pixels and text prompts. Additionally, Stable Diffusion 3 Medium provides advanced text-to-image capabilities suitable for both consumer PCs and enterprise GPUs.

We cover this week's top tweets, featuring Runway's Gen-3 Alpha, updates from Perplexity, and new Apple AI features.

Check out our "More AI in Action this Week" section for useful AI tips and updates.

🌟 Introducing Claude 3.5 Sonnet: Elevate Your AI Experience

Anthropic has launched Claude 3.5 Sonnet, the first model in the Claude 3.5 family, setting new benchmarks in AI performance. Claude 3.5 Sonnet combines impressive speed, intelligence, and cost-effectiveness, making it well suited to a wide range of complex tasks.

Key Features:

  • High Performance: Operates at twice the speed of Claude 3 Opus, Anthropic's previous top-tier model.

  • Advanced Reasoning: Excels in graduate-level reasoning and coding proficiency.

  • Visual Capabilities: Superior at interpreting charts, graphs, and transcribing text from images.

Access and Pricing: Claude 3.5 Sonnet is available for free on Claude.ai and the Claude iOS app. Pro and Team subscribers enjoy higher rate limits. The model is also accessible via the Anthropic API, Amazon Bedrock, and Google Cloud’s Vertex AI, at $3 per million input tokens and $15 per million output tokens.
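At those rates, estimating what a request costs is simple arithmetic. Here is a minimal sketch in Python; the token counts are hypothetical examples, not measured values:

```python
# Claude 3.5 Sonnet API pricing, per the announcement.
INPUT_COST_PER_MTOK = 3.00    # USD per million input tokens
OUTPUT_COST_PER_MTOK = 15.00  # USD per million output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a single API call."""
    return (input_tokens / 1_000_000) * INPUT_COST_PER_MTOK + \
           (output_tokens / 1_000_000) * OUTPUT_COST_PER_MTOK

# Hypothetical request: 2,000 input tokens, 500 output tokens.
print(f"${estimate_cost(2_000, 500):.4f}")  # → $0.0135
```

Output tokens dominate the bill at a 5:1 price ratio, so long generations cost noticeably more than long prompts.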

New Artifacts Feature: Artifacts allow users to interact with AI-generated content in a dynamic workspace, enhancing collaboration and productivity.

Commitment to Safety: Claude 3.5 Sonnet undergoes rigorous testing to ensure safe and responsible use. Privacy remains a top priority, with no user data used for training without explicit permission.

🎙️ Generating Audio for Video: DeepMind's Latest Breakthrough

DeepMind has unveiled its video-to-audio (V2A) technology, advancing synchronised audiovisual generation. V2A uses video pixels and text prompts to create rich soundtracks, enhancing the overall viewing experience.

Key Features:

  • Synchronised Soundscapes: Generates audio that matches on-screen action.

  • Versatile Use: Compatible with video generation models like Veo for creating scores, sound effects, and dialogue.

  • Enhanced Creative Control: Allows for unlimited soundtracks with positive and negative prompts for precise audio output.

How It Works: V2A starts by encoding video input, then refines audio from random noise using a diffusion model. The result is realistic, synchronised audio that aligns closely with visual input and prompts.
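DeepMind has not released code for V2A, but the core idea of diffusion-based refinement (start from pure noise, then iteratively nudge it toward a clean signal) can be shown with a deliberately simplified toy. The "target" waveform below stands in for what a real model would infer from video features and prompts; everything here is an illustrative assumption and does not reflect DeepMind's actual architecture:

```python
import math
import random

def toy_denoise(target, steps=50, rate=0.1, seed=0):
    """Toy illustration of diffusion-style refinement: start from
    random noise and blend a little toward a clean target signal at
    each step. A real diffusion model predicts each update with a
    neural network conditioned on video and text; here we cheat and
    use the target directly, purely to show the iterative structure."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in target]  # begin as pure noise
    for _ in range(steps):
        x = [(1 - rate) * xi + rate * ti for xi, ti in zip(x, target)]
    return x

# Stand-in "soundtrack": a coarsely sampled sine wave.
target = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(64)]
refined = toy_denoise(target)

# After refinement, the signal sits far closer to the target than the
# initial random noise did.
err = sum((r - t) ** 2 for r, t in zip(refined, target)) / len(target)
print(f"mean squared error vs target: {err:.6f}")
```

The takeaway is only the loop's shape: many small denoising steps, each conditioned on external information, gradually turn static into structured audio.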

Applications:

  • Archival Footage: Brings old silent films to life with appropriate soundtracks.

  • Creative Projects: Ideal for filmmakers and creators looking to add immersive audio to their videos.

Commitment to Safety: DeepMind integrates safety measures like the SynthID toolkit to watermark AI-generated content, ensuring responsible use. V2A technology undergoes rigorous testing before public release to prevent misuse.

🏞️ Stable Diffusion 3 Medium: Your Next-Level Image Generation Tool

Image: Stable Diffusion

Stable Diffusion 3 Medium is now available, offering advanced text-to-image capabilities for both consumer PCs and enterprise GPUs.

Key Features:

  • Photorealistic Quality: Detailed, lifelike images.

  • Complex Prompt Handling: Executes long, detailed prompts.

  • Efficient Typography: Clear and accurate text in images.

  • Resource-Efficient: Runs smoothly on standard consumer GPUs.

  • Customisable: Fine-tunes with small datasets.

Open Access and Licensing: Available under an open non-commercial license and a low-cost Creator License for commercial use. For large-scale commercial applications, contact Stability AI for licensing details.

🐥 Tweets of the Week

Runway Gen-3 Alpha: (@runwayml) demonstrates the fine-grained temporal control of Gen-3 Alpha, allowing users to create seamless transitions and effects within scenes using only text prompts.

Perplexity Enhancements: (@AravSrinivas), CEO of Perplexity, discusses the importance of high-density information units and the collaboration with @TakoViz to enhance knowledge cards and visualisations on Perplexity.

Apple Intelligence Features: (@heykahn) highlights new Apple AI features, including Genmoji for personalised emojis, Image Playground for AI-created images, Image Wand for converting sketches into high-quality images, and AI-powered Call Transcripts for transcribing and summarising calls.

🎬 More AI in Action this Week

Google Shortcuts you need to be using: Discover three essential Google shortcuts to save time, such as typing "@gemini" followed by a space in Chrome's address bar to prompt Gemini directly from the omnibox.

Runway's Gen-3 Release: Although Sora isn't available yet, Runway's Gen-3 just launched with incredible capabilities. Check out the examples shared to see its impressive features in action.

Discover these AI Tools and Apps

  1. Runway Gen-3: AI video platform Runway's newly announced Gen-3 model, rolling out “in the next few days”.

  2. Gemini: Get help with writing, planning, learning, and more from Google AI.

  3. Alva AI: Your trusted co-pilot system, adept at assisting you with various daily tasks such as organising your schedule and overseeing your goals.

  4. Galileo AI: Generate interface designs at lightning speed.

  5. Soloist AI: Launch your website from a few simple inputs.