Espai / Regen

Technical Deep Dive

No Code Here / Architecture & Strategy

Aliu Mujib, Co-Creator

Project Technologies

Frontend: Flutter

Languages: Python, Go

Transport: gRPC, Protobuf

AI / ML Base: SDXL, GPT-4V

Infrastructure & Tooling

Replicate, Hugging Face Diffusers, ControlNet, xformers

Feature Overview

Room restyling was built on Stable Diffusion (SDXL), paired with a ChatGPT-style UX that generates specific interior setup instructions from user input.

Image Generation Pipeline

The implementation centers on a custom SDXLImageGenerator class responsible for:

Loading the ControlNet, autoencoder, and SDXL pipelines efficiently.
Optimizing memory use with xformers to keep generation stable.
Managing two modes: Text-to-Image and ControlNet Style Transfer (using Canny edge detection).
Streaming intermediate steps through a queue and clearing CUDA caches once generation completes.
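
The queue-based streaming in the last point can be sketched roughly as follows. This is a minimal, standard-library-only illustration rather than the actual SDXLImageGenerator code: `fake_denoise`, `stream_generation`, and the event tuples are hypothetical names, and the denoising loop is a stand-in for a real diffusion pipeline. In a GPU process, the per-step hook would typically be the pipeline's step callback, and the `cleanup` argument is where a call such as `torch.cuda.empty_cache()` would go.

```python
import queue
import threading
from typing import Callable, Iterator, Optional

_DONE = object()  # sentinel that marks the end of the event stream


def fake_denoise(num_steps: int, on_step: Callable[[int, str], None]) -> str:
    """Hypothetical stand-in for the diffusion loop; reports every step."""
    for step in range(num_steps):
        on_step(step, f"preview-{step}")
    return "final-image"


def stream_generation(
    num_steps: int, cleanup: Optional[Callable[[], None]] = None
) -> Iterator[tuple]:
    """Run generation in a worker thread and yield events as they arrive."""
    events: queue.Queue = queue.Queue()

    def worker() -> None:
        try:
            final = fake_denoise(
                num_steps, lambda step, preview: events.put(("step", step, preview))
            )
            events.put(("done", num_steps, final))
        except Exception as exc:  # surface failures to the consumer
            events.put(("error", -1, repr(exc)))
        finally:
            if cleanup is not None:
                cleanup()  # assumption: GPU cache clearing would happen here
            events.put(_DONE)

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = events.get()
        if item is _DONE:
            return
        yield item


if __name__ == "__main__":
    for event in stream_generation(3):
        print(event)
```

Because cleanup runs before the sentinel is queued, a consumer that has drained the stream can assume that resources were already released.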

Instruction Generation with GPT-4 Vision

After a user accepts a restyled design, the final image is passed to GPT-4 Vision. Using a dedicated prompt and a schema blueprint, the system enforces a structured JSON response containing:

"paint_colors": [...],
"material_types": [...],
"suggested_furniture": [...],
"style_notes": "...",
"search_terms": [...]

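A response with these fields can be checked before it reaches the client. The sketch below is an assumption about how such validation might look, not the project's actual code: `EXPECTED_FIELDS` and `parse_instructions` are hypothetical names, and the field types are inferred from the listing above (four arrays and one string).

```python
import json
from typing import Any, Dict

# Field names come from the listing above; the types are inferred from it.
EXPECTED_FIELDS = {
    "paint_colors": list,
    "material_types": list,
    "suggested_furniture": list,
    "style_notes": str,
    "search_terms": list,
}


def parse_instructions(raw: str) -> Dict[str, Any]:
    """Parse a model response and verify it matches the expected shape."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for key, expected_type in EXPECTED_FIELDS.items():
        if key not in data:
            raise ValueError(f"missing field: {key}")
        if not isinstance(data[key], expected_type):
            raise ValueError(f"field {key} should be {expected_type.__name__}")
    # Drop any extra keys the model may have added.
    return {key: data[key] for key in EXPECTED_FIELDS}
```

Rejecting malformed responses at this boundary keeps the downstream serialization and UI code free of defensive checks.
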
Redo Requests & Compute Cost

To manage the significant real-time compute costs incurred via Replicate API calls, a hard 3-try limit was implemented per job. Users got three rapid redos before a full process restart was required, which balanced user creativity against operational cost control during the beta phase.
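
A per-job limit like this can be enforced with a small counter. The following is a minimal sketch under assumptions: `RestyleJob`, `RedoLimitReached`, and `MAX_REDOS` are hypothetical names, and the limit is read here as three redos on top of the initial generation.

```python
from dataclasses import dataclass

MAX_REDOS = 3  # assumption: three redos per job, as described above


class RedoLimitReached(Exception):
    """Raised when a job has used all of its redo attempts."""


@dataclass
class RestyleJob:
    job_id: str
    redos_used: int = 0

    def request_redo(self) -> int:
        """Consume one redo and return how many remain."""
        if self.redos_used >= MAX_REDOS:
            raise RedoLimitReached(
                f"job {self.job_id} must be restarted to generate again"
            )
        self.redos_used += 1
        return MAX_REDOS - self.redos_used
```

Checking the counter before dispatching the paid API call means a rejected redo costs nothing.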

Backend Infrastructure

Streaming & Partial JSON Parsing

To deliver a real-time, ChatGPT-like conversational streaming experience inside the mobile app, a specialized PartialJSONParser was engineered in Go.

1. It intercepts incomplete JSON tokens arriving from the OpenAI stream and synthesizes closing braces and brackets on the fly to prevent parser crashes.
2. The successfully parsed, but partial, JSON fragments are immediately serialized into Protocol Buffers.
3. These chunks are streamed via gRPC down to the Flutter frontend client, enabling live rendering of design instructions as they are generated.
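
The first step can be illustrated with a short sketch. The production parser was written in Go and is not shown here; this Python version is a simplified approximation, and `closing_suffix` and `parse_partial` are hypothetical names. It closes any open string, array, or object, and if the result still does not parse (for example after a dangling comma or a half-written key), it drops trailing characters until it does.

```python
import json
from typing import Any, Optional


def closing_suffix(fragment: str) -> str:
    """Return the quote and brackets needed to close everything still open."""
    stack = []
    in_string = False
    escaped = False
    for ch in fragment:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            stack.append("}")
        elif ch == "[":
            stack.append("]")
        elif ch in "}]" and stack:
            stack.pop()
    return ('"' if in_string else "") + "".join(reversed(stack))


def parse_partial(fragment: str) -> Optional[Any]:
    """Best-effort parse of a JSON prefix; None if nothing is usable yet."""
    candidate = fragment
    while candidate:
        try:
            return json.loads(candidate + closing_suffix(candidate))
        except json.JSONDecodeError:
            # Trailing comma, colon, half-written key or literal: back off.
            candidate = candidate[:-1]
    return None


if __name__ == "__main__":
    buffer = ""
    for chunk in ['{"paint_colors": ["sage gr', 'een"], "style_no', 'tes": "warm"}']:
        buffer += chunk
        print(parse_partial(buffer))
```

The back-off loop is quadratic in the worst case, which is acceptable for short responses. A parser that tracks the last safe cut point while scanning would avoid the repeated re-parsing.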

Real-Time UX in Flutter

The Flutter frontend was fully gRPC-based. Streaming updates were displayed in a live-typing manner to simulate AI 'thinking'. This dramatically improved perceived responsiveness and contributed to Espai's brand as an intelligent interior assistant. I also added an option to save the final setup and view associated design instructions at any time.

Learnings & Takeaways

This feature was both a technical and UX challenge. Key takeaways:

1. Real-time image generation requires thoughtful cost management.
2. Structured inputs outperformed open-text for generating usable results.
3. Streaming incomplete JSON from GPT-4 required creative backend handling.
4. Mimicking ChatGPT's UI created user trust and strong engagement.
5. Fine-tuning prompts for JSON output structure made GPT-4 outputs easier to parse and present.

References

Hugging Face Diffusers: Image generation pipelines using pretrained diffusion models
Replicate Cogs: How to build and deploy AI models on Replicate
gRPC: High-performance communication framework used for real-time streaming