Espai / Regen

Technical Deep Dive

Architecture & Strategy
Aliu Mujib, Co-Creator

Project Technologies

Frontend: Flutter
Languages: Python, Go
Transport: gRPC, Protobuf
AI / ML: Base SDXL, GPT-4V
Infrastructure & Tooling: Replicate, Hugging Face Diffusers, ControlNet, xformers

Feature Overview

Room restyling is built on Stable Diffusion, paired with a ChatGPT-style UX that generates specific interior setup instructions from user input.

Image Generation Pipeline

The implementation centers on a custom SDXLImageGenerator class responsible for:

  • Loading the ControlNet, autoencoder, and SDXL pipelines efficiently.
  • Reducing memory use with xformers to keep generation stable.
  • Managing two modes: Text-to-Image and ControlNet Style Transfer (Canny edge detection).
  • Streaming intermediate steps via a queue and clearing CUDA caches after completion.

Instruction Generation with GPT-4 Vision

After a user accepts a restyled design, the final image is passed to GPT-4 Vision. Using a dedicated prompt and a schema blueprint, the system constrains the model to a structured JSON response containing:

{
  "paint_colors": [...],
  "material_types": [...],
  "suggested_furniture": [...],
  "style_notes": "...",
  "search_terms": [...]
}
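
The response shape above could be checked on the Go side roughly as follows. This is a minimal sketch, not the project's actual code: the DesignInstructions type, the ParseInstructions function, and the validation rules (reject unknown keys, require every list, require non-empty style_notes) are assumptions made for illustration. Only the five JSON keys come from the write-up, and the sample values in main are made up.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// DesignInstructions mirrors the five JSON keys listed above.
// The Go type and field names are hypothetical.
type DesignInstructions struct {
	PaintColors        []string `json:"paint_colors"`
	MaterialTypes      []string `json:"material_types"`
	SuggestedFurniture []string `json:"suggested_furniture"`
	StyleNotes         string   `json:"style_notes"`
	SearchTerms        []string `json:"search_terms"`
}

// ParseInstructions decodes a complete model response. It rejects keys
// outside the schema and responses that omit any of the expected fields.
// These validation rules are an assumption, not the documented behavior.
func ParseInstructions(raw string) (DesignInstructions, error) {
	var out DesignInstructions
	dec := json.NewDecoder(strings.NewReader(raw))
	dec.DisallowUnknownFields()
	if err := dec.Decode(&out); err != nil {
		return out, fmt.Errorf("decode instructions: %w", err)
	}
	// A missing (or null) array leaves the slice nil; an empty array does not.
	if out.PaintColors == nil || out.MaterialTypes == nil ||
		out.SuggestedFurniture == nil || out.SearchTerms == nil {
		return out, errors.New("one or more required list fields are missing")
	}
	if strings.TrimSpace(out.StyleNotes) == "" {
		return out, errors.New("style_notes is empty")
	}
	return out, nil
}

func main() {
	// Made-up sample response, for illustration only.
	raw := `{
		"paint_colors": ["sage green"],
		"material_types": ["oak"],
		"suggested_furniture": ["low sofa"],
		"style_notes": "Calm, natural palette.",
		"search_terms": ["oak coffee table"]
	}`
	ins, err := ParseInstructions(raw)
	if err != nil {
		fmt.Println("invalid response:", err)
		return
	}
	fmt.Println("paint colors:", ins.PaintColors)
	fmt.Println("style notes:", ins.StyleNotes)
}
```

Rejecting unknown keys is the strict choice; a more lenient service might drop that check and simply ignore extra fields.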

Redo Requests & Compute Cost

To manage the significant real-time compute costs incurred via Replicate API calls, a hard 3-try limit was implemented per job. Users got three rapid redos before having to restart the full process, which balanced user creativity with operational cost control during the beta phase.
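
A per-job cap like this can be sketched as a small in-memory counter. The RedoLimiter type, the TryRedo method, the error value, and the job ID format are hypothetical names chosen for illustration. The original service may well have stored the count elsewhere, for example alongside the job record.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// maxRedos is the hard per-job limit described above.
const maxRedos = 3

// ErrRedoLimitReached tells the caller the user must start a new job.
var ErrRedoLimitReached = errors.New("redo limit reached: start a new job")

// RedoLimiter counts redo requests per job ID. It is safe for concurrent use.
// Hypothetical type: the write-up does not say how the limit was stored.
type RedoLimiter struct {
	mu   sync.Mutex
	used map[string]int
}

func NewRedoLimiter() *RedoLimiter {
	return &RedoLimiter{used: make(map[string]int)}
}

// TryRedo records one redo for jobID and returns how many remain.
// Once the job has used maxRedos, it returns ErrRedoLimitReached
// without calling the (expensive) image generation backend.
func (l *RedoLimiter) TryRedo(jobID string) (remaining int, err error) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.used[jobID] >= maxRedos {
		return 0, ErrRedoLimitReached
	}
	l.used[jobID]++
	return maxRedos - l.used[jobID], nil
}

func main() {
	limiter := NewRedoLimiter()
	for attempt := 1; attempt <= 4; attempt++ {
		remaining, err := limiter.TryRedo("job-123")
		if err != nil {
			fmt.Printf("attempt %d rejected: %v\n", attempt, err)
			continue
		}
		fmt.Printf("attempt %d accepted, %d redo(s) left\n", attempt, remaining)
	}
}
```

Checking the counter before dispatching the generation request is what keeps a rejected redo from costing any compute.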

Espai Process
Backend Infrastructure

Streaming & Partial JSON Parsing

To deliver a real-time, ChatGPT-like conversational streaming experience inside the mobile app, a specialized PartialJSONParser was engineered in Go.

  1. It intercepts incomplete JSON tokens arriving from the OpenAI stream and synthesizes the missing closing braces and brackets on the fly, so the partial payload parses instead of failing.
  2. The successfully parsed, but still partial, JSON fragments are immediately serialized into Protocol Buffers.
  3. These chunks are streamed over gRPC to the Flutter frontend client, enabling live rendering of design instructions as they are generated.
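
The first step can be illustrated with a short Go sketch. This is not the project's PartialJSONParser, whose internals are not shown here. It is one simple way to get the same effect, under the assumption that the backend accumulates the streamed text and re-parses it on each chunk. The names closers and CompletePartialJSON are hypothetical. The sketch closes any open string, array, or object. If the result is still invalid (for example a dangling comma or a half-written key), it backs off one character at a time until the completed text is valid JSON.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
	"unicode/utf8"
)

// closers returns the characters needed to close whatever is still open
// in s: an unterminated string first, then brackets in reverse order.
func closers(s string) string {
	var stack []byte
	inString, escaped := false, false
	for i := 0; i < len(s); i++ {
		c := s[i]
		switch {
		case escaped: // character after a backslash is part of the string
			escaped = false
		case inString:
			if c == '\\' {
				escaped = true
			} else if c == '"' {
				inString = false
			}
		case c == '"':
			inString = true
		case c == '{':
			stack = append(stack, '}')
		case c == '[':
			stack = append(stack, ']')
		case c == '}' || c == ']':
			if len(stack) > 0 {
				stack = stack[:len(stack)-1]
			}
		}
	}
	var b strings.Builder
	if inString {
		b.WriteByte('"')
	}
	for i := len(stack) - 1; i >= 0; i-- {
		b.WriteByte(stack[i])
	}
	return b.String()
}

// CompletePartialJSON turns a truncated JSON prefix into valid JSON.
// It reports false only when no usable prefix exists.
func CompletePartialJSON(fragment string) (string, bool) {
	s := strings.TrimSpace(fragment)
	for len(s) > 0 {
		candidate := s + closers(s)
		if json.Valid([]byte(candidate)) {
			return candidate, true
		}
		// Drop the last rune (e.g. a dangling comma, colon, or half key) and retry.
		_, size := utf8.DecodeLastRuneInString(s)
		s = strings.TrimRight(s[:len(s)-size], " \t\r\n")
	}
	return "", false
}

func main() {
	// Simulated stream chunks; the content is made up for illustration.
	chunks := []string{`{"paint_colors": ["sa`, `ge", "wh`, `ite"], "style_no`, `tes": "Calm`}
	var buf strings.Builder
	for _, chunk := range chunks {
		buf.WriteString(chunk)
		if doc, ok := CompletePartialJSON(buf.String()); ok {
			fmt.Println(doc)
		}
	}
}
```

Re-scanning the whole buffer on every chunk is quadratic in the worst case, which is tolerable for short instruction payloads. A production parser would more likely keep its bracket stack between chunks. Note also that a number or string cut mid-value is emitted in its truncated form until the next chunk arrives, which is what produces the live-typing effect on the client.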

Real-Time UX in Flutter

The Flutter frontend was fully gRPC-based. Streaming updates were displayed in a live-typing manner to simulate AI 'thinking'. This dramatically improved perceived responsiveness and contributed to Espai's brand as an intelligent interior assistant. I also added an option to save the final setup and view associated design instructions at any time.

Learnings & Takeaways

This feature was both a technical and a UX challenge. Key takeaways:

  1. Real-time image generation requires thoughtful cost management.
  2. Structured inputs outperformed open text for generating usable results.
  3. Streaming incomplete JSON from GPT-4 required creative backend handling.
  4. Mimicking ChatGPT's UI created user trust and strong engagement.
  5. Tuning prompts for a fixed JSON output structure made GPT-4 outputs easier to parse and present.

References

  • Hugging Face Diffusers: Image generation pipelines using pretrained diffusion models
  • Replicate Cogs: How to build and deploy AI models on Replicate
  • gRPC: High-performance communication framework used for real-time streaming