SONICS.ai 🧠🎬📚🎞️ create Comics that *speak* – your Style!



This content originally appeared on DEV Community and was authored by SS

This is a submission for the Google AI Studio Multimodal Challenge

Models: gemini-2.5-flash, gemini-2.5-flash-image-preview, imagen-4.0, imagen-3.0

Inspiration

I always wanted to draw comics that capture my chaotic imagination, but the drawing, erasing, and starting again is such a drag!

Also, AI didn't help much: create, frustrate, regenerate, repeat, and I still couldn't get my vibe. Even more of a drag!

Well, that was until Gemini nano banana (gemini-2.5-flash-image-preview)!

I am so blown away by its editing capabilities, especially when working with multi-image, multimodal inputs, that I couldn't allow my lazy self to procrastinate anymore!

So, here's

What I built

SONICS.ai is a comprehensive, Google AI-powered creative suite 🧠🎬📚🎞 (demo) that transforms a user's simple idea into a fully realized, multi-sensory, character-consistent comic book experience with podcast playbacks, while allowing them to add their flavours/vibes at every aspect, from storyline to characters to scenes to dialogues to text styles, all in natural language.

The best part?
You don't need to be good at drawing! AI solves it for you.

And you can still bring your creativity, your imagination, your flavour/vibe, and your stories to life using SONICS.ai, without losing your patience with back-and-forth regeneration to get that perfect shot!

You can make full comics, your style, in MINUTES instead of months without lifting a pencil!!

For lazy fellas, you can hear your generated comics!

Demo

My project in action

0:00 Intro
0:10 🧠 Story Conception
0:20 🎬 Character/ Cast Design
0:53 🎞 Comic Panel Creation
1:24 📚 Comic preview
1:34 🎧 Audio preview
1:47 🎥 Play the Comic that speaks your Style

▶ Play on YouTube

Multimodal workflow architecture

Models: gemini-2.5-flash, gemini-2.5-flash-image-preview, imagen-4.0, imagen-3.0

[Workflow diagrams: phase 1, phase 2, phase 3, phase 4]

How I Used Google AI Studio

This app was built entirely in Google AI Studio, vibe-coded from scratch, as you could have guessed by now from my lazy vibes!

I started with a simple idea prompt and kept adding features by guiding the AI through the pain points I have faced when vibe-creating comics with my flavour.

The Multimodal capabilities I implemented …

Multimodal Capabilities

| Input | Output | Models | Feature |
|---|---|---|---|
| Text | Image | gemini-2.5-flash-image-preview, imagen | For quality Character and Scene Background generation; text editor based updates |
| Image + Text | Text | gemini-2.5-flash | Automatic character description updates for natural language based character edits |
| Image (mask) + Image + Text | Image | gemini-2.5-flash-image-preview | For precise edits in characters/scenes, dialogue corrections, text stylings, positional edits, detail improvement |
| Multiple Images + Text | A composite image with rendered text | gemini-2.5-flash-image-preview | For comic scene panel generation, ensuring character consistency across scenes, dialogue accuracy, scene quality |
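To make the modality-to-model mapping concrete, here is a minimal sketch of how it could be held as a lookup table. It is an illustration only, not the app's actual code: the `Capability` type, the `CAPABILITIES` list, the `models_for` helper and the lowercase modality labels are all hypothetical names. Only the model identifiers and pairings come from this post.

```python
from typing import NamedTuple


class Capability(NamedTuple):
    """One input/output pairing and the models that serve it (hypothetical type)."""
    inputs: frozenset
    output: str
    models: tuple
    feature: str


# Pairings taken from the capability list above; the labels are illustrative.
CAPABILITIES = [
    Capability(frozenset({"text"}), "image",
               ("gemini-2.5-flash-image-preview", "imagen"),
               "character and scene background generation"),
    Capability(frozenset({"image", "text"}), "text",
               ("gemini-2.5-flash",),
               "automatic character description updates"),
    Capability(frozenset({"mask", "image", "text"}), "image",
               ("gemini-2.5-flash-image-preview",),
               "precise edits to characters, scenes and dialogue"),
    Capability(frozenset({"images", "text"}), "composite image",
               ("gemini-2.5-flash-image-preview",),
               "comic scene panel generation"),
]


def models_for(inputs, output):
    """Return the models listed for an exact input set and output kind."""
    wanted = frozenset(inputs)
    for cap in CAPABILITIES:
        if cap.inputs == wanted and cap.output == output:
            return cap.models
    raise LookupError(f"no capability for {sorted(wanted)} -> {output}")
```

For example, `models_for({"image", "text"}, "text")` returns the single model listed for character description updates, and an unlisted pairing raises `LookupError` instead of silently picking a model.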

Multimodal Features

The specific Multimodal functionalities I built and why they enhance the user experience (UX) …

See the Multimodal Capabilities section for modality implementation details and the respective models before proceeding.

1. Composite scene panels

Models: imagen, gemini-2.5-flash-image-preview, gemini-2.5-flash

The comic panels are created through an intelligent composition logic that combines the multimodal capabilities of the models to create the final panel images from the inputs (scene background, character images, scripts), which were themselves generated using one of these models.

This ensures character consistency, dialogue accuracy, and scene quality across comic scenes.
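As a rough illustration of the "multiple images + text" idea, the sketch below packs a background, character reference images and a script into a single request body. It is not the app's actual composition logic. The `image_part` and `build_panel_request` helpers and the prompt wording are assumptions, and the body layout (`contents` → `parts` with `inline_data` and `text`) follows the general shape of a Gemini API `generateContent` REST request as I understand it. Nothing is sent over the network here.

```python
import base64
import json


def image_part(png_bytes):
    """Wrap raw PNG bytes as a base64-encoded inline image part."""
    return {
        "inline_data": {
            "mime_type": "image/png",
            "data": base64.b64encode(png_bytes).decode("ascii"),
        }
    }


def build_panel_request(background, characters, script):
    """Assemble one multi-image + text request for a single comic panel.

    background: PNG bytes of the scene background
    characters: dict mapping character name -> PNG bytes of a reference image
    script:     dialogue/script text to render in the panel
    """
    # Label each image with a short text part so the model can tell them apart.
    parts = [{"text": "Scene background:"}, image_part(background)]
    for name, png in characters.items():
        parts.append({"text": f"Character reference for {name}:"})
        parts.append(image_part(png))
    # Hypothetical prompt wording; the real prompts are not given in the post.
    parts.append({
        "text": (
            "Compose one comic panel from the background and character "
            "references above. Keep each character's look unchanged and "
            "render this dialogue exactly as written:\n" + script
        )
    })
    return {"contents": [{"parts": parts}]}


if __name__ == "__main__":
    request = build_panel_request(b"<background png>", {"Mia": b"<mia png>"}, "Mia: Hi!")
    print(json.dumps(request, indent=2)[:200])
```

Sending every character reference with each panel request is one plausible way to get the cross-scene consistency described above, since the model sees the same reference images every time.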

2. Flavour edits

Model: gemini-2.5-flash-image-preview

It is used for precise, surgical editing of scenes, characters, dialogues, and styles, leveraging masking.
Users can simply explain their edits in natural language for feature changes (with or without masking).

This helps users avoid regenerating images from scratch back and forth, which was really frustrating when only a small style or error correction was needed. Users can also add their vibes/flavours/styles to the scene.
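Here is a minimal sketch of how an edit "with or without masking" could be packaged, assuming the mask is passed as an extra image in which white marks the region to change. That convention, the `build_edit_request` helper and the prompt wording are all assumptions for illustration, not the app's real implementation. The body layout again follows the general `generateContent` REST shape.

```python
import base64


def _png_part(png_bytes):
    """Wrap raw PNG bytes as a base64-encoded inline image part."""
    return {
        "inline_data": {
            "mime_type": "image/png",
            "data": base64.b64encode(png_bytes).decode("ascii"),
        }
    }


def build_edit_request(image, instruction, mask=None):
    """Build an image-edit request; the mask image is optional."""
    parts = []
    if mask is not None:
        # Assumed convention: white pixels in the mask mark the editable region.
        parts.append({"text": "Mask (white marks the region to change):"})
        parts.append(_png_part(mask))
    parts.append({"text": "Image to edit:"})
    parts.append(_png_part(image))
    if mask is not None:
        scope = "Change only the masked region."
    else:
        scope = "Change only what the instruction names."
    # The user's natural-language edit, plus a guard against unrelated changes.
    parts.append({"text": f"{instruction} {scope} Keep everything else identical."})
    return {"contents": [{"parts": parts}]}
```

Without a mask the request carries three parts (label, image, instruction). With a mask it carries five, and the instruction is scoped to the masked region so the rest of the panel is left alone.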

Acknowledgement

Google AI Studio is phenomenal at vibe-coding. I was able to generate and finish a well-working prototype in less than 6 hours.
But as you could have guessed, Parkinson's law took most of the time!

gemini-2.5-flash-image-preview (Gemini nano banana) is the star of my whole idea. Thanks to nano banana, I was able to successfully create a character-consistent comic experience.
imagen helped me create beautiful backgrounds for the comic scenes, which were then fully realised using the composite logic.
gemini-2.5-flash was used for prompt engineering for inputs to the other models, and also for optimising the deliverables.

Thank you!
It was a fun and great experience!

Definitely not a drag!

