This content originally appeared on DEV Community and was authored by SS
__This is a submission for the Google AI Studio Multimodal Challenge
Β
Β Β Β gemini-2.5-flash
Β Β gemini-2.5-flash-image-preview
Β Β imagen-4.0
Β Β imagen-3.0
Β Β
Inspiration
Β Β
I always wanted to draw comics that can capture my chaotic imaginations – but the drawing, erasing, starting again is such a drag!
Β
Also, AI didn’t help much – create – frustrate – regenerate – repeat and yet couldn’t get my vibe… even more drag!
Β
Well that was until Gemini nano banana
gemini-2.5-flash-image-preview
!
I am so blown away by its editing capabilities specially working with multi-image, multi-modal inputs that I couldn’t allow my lazy self to procrastinate anymore !
So, here’s
What I built
Β
SONICS.ai is a comprehensive, a Google AI-powered creative suite (demo) that transforms a user’s simple idea into a fully-realized, multi-sensory, character-consistent comic book experience with podcast playbacks,
while allowing them to add their flavours/ vibes at every aspect, from storyline to characters to scenes to dialogues to text styles all in natural language.
Β
The best part?
You dont need to be good at drawing! AI solves it for you.
And you can still bring your creativity, your imagination, your flavour/ vibe, your stories to life using SONICS.ai
without losing your patience with back-and-forth regeneration to get that perfect shot!
You can make full-comics, your style in MINUTES instead of months without lifting a pencil !!
For lazy fellas, you can hear your generated comics !
Demo
My project in action
Multimodal workflow architecture
Β
My project in action
0:00 Intro
0:10Story Conception
0:20Character/ Cast Design
0:53Comic Panel Creation
1:24Comic preview
1:34Audio preview
1:47Play the Comic that speaks your Style
Β Β
Multimodal workflow architecture
Β
gemini-2.5-flash
gemini-2.5-flash-image-preview
imagen-4.0
imagen-3.0
Β Β
How I Used Google AI Studio
Β
This app was entirely built on Google AI studio vibe-coded from scratch
as you could have guessed by now for my lazy vibes !
Β
I started with a simple idea prompt and kept on adding features by guiding the AI through pain-points I have faced when vibe-creating comics with my flavour.
Β
The Multimodal capabilities I implemented …
Multimodal Capabilities
Β
Input
Output
Models
Feature
Text
Image
gemini-2.5-flash-image-preview
imagen
For quality Character, Scene Background generation
Text editor based updates
Image + Text
Text
gemini-2.5-flash
Automatic character description updates for natural language based character edits
Image (mask) + Image + Text
Image
gemini-2.5-flash-image-preview
For precise edits in characters/ scenes, dialogue corrections, text stylings, positional edits, detail improvement
Multiple Images + Text
A composite image with rendered text
gemini-2.5-flash-image-preview
For comics scene panel generations ensuring character consistencies across scenes, dailogue accuracy, scene quality
Multimodal Features
The specific Multimodal functionalities I built and why it enhances the user experience (UX)…
click this for modality implementation details & respective models before proceeding
Β
1. Composite scene panels
Β
Models : Βimagen
Βgemini-2.5-flash-image-preview
Βgemini-2.5-flash
Β
The comic panels are created through an intelligent composition logic combining the multimodal capabilities of the models to create final panel images from the inputs – scene background, character images, scripts that were themsleves generated by using either of these.
Β
This ensures character consistency, dialogue accuracy as well as scene quality across comic scenes.
2. Flavour edits
Β
Model : Βgemini-2.5-flash-image-preview
Β
It is used for precise surgical editing of scenes, characters, dialogues, styles leveraging masking.
Users can simply explain their edits in natural language for feature changes (with / without masking).
Β
This helps users avoid regenerating back-and-forth images from scratch which was really frustrating when we need to make a small style/ error correction. And users can add their vibes/ flavours/ styles to the scene.
Acknowledgement
Β
Google AI studio is phenomenal at vibe-coding. I was able to generate and finish a well-working prototype in less that 6 hrs.
But as you could have guessed – Parkinson’s law took most time !
Β
gemini-2.5-flash-image-preview
(Gemini nano-banana) is the star of my whole idea. Due to nano banana, I was able to successfully create a consistent character comic experience,
imagen
helped me create beautiful backgrounds for the comic scenes which were then fully realised using composite logic.
gemini-2.5-flash
has been used for prompt engineering for inputs to other models, and also for optimising the deliverables.
Thank you!
It was a fun and great experience!
What Definitely Not a drag!
This content originally appeared on DEV Community and was authored by SS