Published by Engineering team
Every day, marketing teams waste 73% of their creative production time manually resizing assets across format variations. The conventional approach of hiring more designers or using rigid templates fails at scale. We reject this inefficiency.
Through a three-tier intelligence system combining algorithmic resizing, vector similarity matching, and LLM-based generation, Rocketium's AI Auto Adapt processes 10,000+ creatives daily with 95% accuracy in 3 seconds average processing time. This isn't incremental improvement. It's a fundamental reimagining of creative automation through mathematical rigor and distributed intelligence.
The creative resizing crisis
Modern marketing operates at unprecedented scale. A single campaign may need 50+ format variations across platforms - Instagram stories, Facebook feeds, Google display ads, LinkedIn carousels, and countless other specifications. Each format demands unique aspect ratios, element positioning, text sizing, and visual hierarchy adjustments.
Consider the combinatorial explosion. A brand with 10 core creative concepts, adapted across 50 formats, with 5 messaging variants, requires 2,500 unique creative assets per campaign. Manual production at 15 minutes per adaptation would consume 625 hours of designer time equivalent to 15.6 weeks of full-time work for a single campaign.
This scalability crisis demands algorithmic solutions. But naive automation of simple image scaling or template-based approaches fails catastrophically when faced with the nuanced requirements of modern creative adaptation.
The three-path intelligence architecture
Rocketium's AI Auto Adapt recognizes a fundamental truth - different creative challenges require different types of intelligence. Rather than forcing all adaptations through a single algorithmic approach, our system provides three distinct methods, each optimized for specific creative scenarios and user preferences.
This architecture embodies a critical design philosophy - user agency over algorithmic assumption. Different creative teams have different priorities - speed vs. precision, consistency vs. creativity, learning-based vs. rule-based approaches. Our system provides choice rather than imposing a one-size-fits-all solution.
Approach 1 - Algorithmic placement
The similarity trap in creative resizing
In creative resizing, “similarity” is often treated far too simplistically. Many auto-adapt systems still rely on naive rules like matching by aspect ratio or proportional scaling to resize a design from one format to another. The result? A movie poster squeezed into an Instagram Story looks nothing like the original, or a carefully balanced banner loses its hierarchy when stretched into a square. These are not minor glitches. They are the inevitable outcome of treating similarity as a one-dimensional property when, in reality, it is multi-faceted and context-dependent.
Naive similarity assumes that two canvases are “close enough” if their width–height ratios align, but this ignores composition, grouping, visual hierarchy, and edge attachments. A 16:9 poster and a 1080×1920 Story may appear mathematically “similar,” yet visually they are worlds apart. This is the similarity trap - the mistaken belief that a single metric can capture the complex nature of design adaptation.
Our approach to creative resizing escapes this trap by redefining similarity across multiple dimensions. Instead of relying on ratio alone, we layer in Euclidean distance for size matching, weighted aspect ratio alignment, quadrant-based positioning, edge-attachment preservation, background detection, and mathematically precise group handling. Together, these techniques ensure that adaptations aren’t just resized; they remain faithful to the intent, hierarchy, and usability of the original design.
Escaping the similarity trap
When adapting creatives across formats, “similarity” can’t be captured by one naïve metric. Instead, we need a layered approach: start with raw geometry, refine with proportions, preserve layout semantics, enforce design heuristics, and end with structural cohesion. Each stage fixes a limitation of the one before it.
Step 1 – Building on geometry with Euclidean distance
The first way to compare two sizes is by treating width and height as points in a plane and measuring how far apart they are. This gives us a geometric sense of closeness. We measure closeness of two sizes (w₁, h₁) and (w₂, h₂) as

d = √((w₁ − w₂)² + (h₁ − h₂)²)
Example
- Source = 1080×1920 
- Candidate A = 1200×2000 → dA≈144 
- Candidate B = 1920×1080 → dB≈1188 
Candidate A is much closer in terms of raw dimensions.
Euclidean distance is a good first filter to discard wildly mismatched canvases. But geometry alone is blind to proportions. A square can appear “close” in pixels yet look completely different.
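The distance check above fits in a few lines of Python (the post ships no code, so this sketch and its function name are ours):

```python
import math

def size_distance(w1, h1, w2, h2):
    """Euclidean distance between two canvas sizes, treating (w, h) as a point."""
    return math.hypot(w1 - w2, h1 - h2)

source = (1080, 1920)
print(round(size_distance(*source, 1200, 2000)))  # Candidate A -> 144
print(round(size_distance(*source, 1920, 1080)))  # Candidate B -> 1188
```

Note that Candidate B is the same canvas rotated 90°, yet it scores as the worse match in raw pixels, which is exactly why geometry alone is not enough.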
Step 2 – Balancing size and shape with weighted aspect ratio
Once we establish a geometric baseline, we add proportion awareness. When adapting creatives, shape similarity (aspect ratio) often matters more than absolute size. A 4:5 portrait feels much closer to a 9:16 story than a 1:1 square, even if the square is dimensionally closer. To capture this, we combine aspect ratio closeness with distance closeness.
score = α · (aspect ratio closeness) + β · (distance closeness)

where α and β are empirically tuned weights balancing proportional accuracy and spatial stability.
- Aspect ratio closeness: how near the two width-to-height ratios are; 1 means identical shape 
- Distance closeness: how near the two sizes are in raw pixels, derived from the Euclidean distance in Step 1 
Weighted scoring prevents us from picking technically close but visually mismatched sizes. Still, this only helps us pick the right canvas. It doesn’t preserve element placement.
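A minimal sketch of this weighted score, assuming a ratio-of-ratios closeness and a simple distance decay (the exact closeness formulas, α = 0.7, and β = 0.3 are our illustrative assumptions, not the production values):

```python
import math

def aspect_ratio_closeness(src, cand):
    """1.0 when the two shapes are identical, falling toward 0 as they diverge."""
    r1, r2 = src[0] / src[1], cand[0] / cand[1]
    return min(r1, r2) / max(r1, r2)

def distance_closeness(src, cand, scale=1000.0):
    """1.0 when the two sizes are identical; decays with Euclidean distance."""
    d = math.hypot(src[0] - cand[0], src[1] - cand[1])
    return 1.0 / (1.0 + d / scale)

def weighted_score(src, cand, alpha=0.7, beta=0.3):
    return alpha * aspect_ratio_closeness(src, cand) + beta * distance_closeness(src, cand)

story = (1080, 1920)                          # 9:16 source
portrait, square = (864, 1080), (1080, 1080)  # 4:5 vs 1:1 candidates
# The 4:5 portrait wins even though the square is nearer in raw pixels.
assert weighted_score(story, portrait) > weighted_score(story, square)
```

This reproduces the intuition from the text: a 4:5 portrait outranks a 1:1 square for a 9:16 story, despite the square being dimensionally closer.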
Step 3 – Preserving composition with quadrant-based positioning
Naïve scaling assumes everything moves proportionally, but creatives have structure. Elements belong to zones: top-left for logos, bottom-right for CTAs, center for products. Quadrant mapping ensures those relationships survive resizing.
Example
A CTA sits in the top-right of a 1920×1080 banner. Naïve scaling pushes it downward in a tall 1080×1920 Story. Quadrant mapping keeps it anchored in the top-right.
This step preserves composition and hierarchy. It respects visual intent, but there are still cases like edge logos or backgrounds that require special handling.
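One way to implement quadrant anchoring is to scale each element's margin from its nearest edge; this sketch (the element dict shape and the nearest-edge rule are our assumptions, not Rocketium's exact code) keeps the CTA from the example pinned top-right:

```python
def adapt_position(elem, src, tgt):
    """Keep an element anchored in its source quadrant of the target canvas.

    elem: dict with x, y, w, h (top-left origin); src/tgt: (width, height).
    Margins are measured from the nearest edge, so a top-right element stays
    top-right instead of drifting when the aspect ratio flips.
    """
    sw, sh = src
    tw, th = tgt
    cx, cy = elem["x"] + elem["w"] / 2, elem["y"] + elem["h"] / 2
    if cx < sw / 2:                                   # left half: scale left margin
        x = elem["x"] / sw * tw
    else:                                             # right half: scale right margin
        x = tw - (sw - elem["x"] - elem["w"]) / sw * tw - elem["w"]
    if cy < sh / 2:                                   # top half: scale top margin
        y = elem["y"] / sh * th
    else:                                             # bottom half: scale bottom margin
        y = th - (sh - elem["y"] - elem["h"]) / sh * th - elem["h"]
    return x, y

# CTA in the top-right of a 1920x1080 banner, adapted to a 1080x1920 Story
new_x, new_y = adapt_position({"x": 1600, "y": 80, "w": 240, "h": 80},
                              src=(1920, 1080), tgt=(1080, 1920))
```

After adaptation the CTA still sits in the right half and upper half of the Story canvas, which naïve proportional scaling would not guarantee.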
Step 4 – Keeping anchors stable with edge and background rules
Some elements don’t just live in a quadrant; they hug an edge or fill the canvas. These need rules beyond proportional math.
- Edge attachment 
 If an element touches the boundary in the source, it should stay pinned to the same boundary in the target.
- Background detection 
 When an element’s area dominates the canvas, it is treated as a background and stretched to fill the target rather than scaled proportionally.
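A sketch of how these two rules might be detected (the 2 px edge tolerance and 90% area threshold are our illustrative assumptions):

```python
def classify(elem, canvas, edge_tol=2, bg_area=0.9):
    """Label an element as background, edge-attached, or free-floating."""
    W, H = canvas
    touches_edge = (elem["x"] <= edge_tol or elem["y"] <= edge_tol
                    or elem["x"] + elem["w"] >= W - edge_tol
                    or elem["y"] + elem["h"] >= H - edge_tol)
    coverage = (elem["w"] * elem["h"]) / (W * H)
    if coverage >= bg_area:
        return "background"      # stretch to fill the new canvas
    if touches_edge:
        return "edge-attached"   # keep pinned to the same edge(s)
    return "free"                # handled by quadrant mapping

canvas = (1920, 1080)
bg = {"x": 0, "y": 0, "w": 1920, "h": 1080}      # full-bleed image
logo = {"x": 0, "y": 40, "w": 200, "h": 100}     # hugs the left edge
cta = {"x": 800, "y": 480, "w": 300, "h": 120}   # floats mid-canvas
```

Classification runs before positioning, so each element is routed to the rule that fits it rather than forced through one formula.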
Step 5 – Maintaining cohesion with group scaling
A final refinement ensures that groups of elements (like buttons, badges, or icons with labels) scale as a single unit rather than drifting apart.
Example
A button group is 400×200 in the source and must fit into 280×140. A single uniform factor of min(280/400, 140/200) = 0.7 is applied to every member, so the button and its label shrink together instead of drifting apart.
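The uniform-factor rule from the example is a one-liner in Python (the dict-based element shape is our assumption):

```python
def scale_group(elements, group_size, target_size):
    """Scale every member of a group by one uniform factor so it stays cohesive."""
    s = min(target_size[0] / group_size[0], target_size[1] / group_size[1])
    scaled = [{k: v * s for k, v in e.items()} for e in elements]
    return scaled, s

button_group = [{"x": 0, "y": 0, "w": 400, "h": 200},    # container
                {"x": 40, "y": 60, "w": 200, "h": 80}]   # label inside it
scaled, factor = scale_group(button_group, (400, 200), (280, 140))
```

Taking the minimum of the two axis ratios guarantees the scaled group never overflows the target box, at the cost of possibly leaving slack along one axis.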
From distance to design
- Geometry filters mismatches. 
- Aspect ratio adds proportional sensitivity. 
- Quadrants preserve layout intent. 
- Edges and backgrounds enforce design heuristics. 
- Group scaling maintains cohesion. 
By layering these steps, we move from flat heuristics to a multi-dimensional framework that balances math with visual meaning. This is how we escape the Similarity Trap.
Approach 2 - Vector similarity with semantic retrieval and mapping
While mathematical resizing relies on geometric relationships, the vector path builds semantic understanding of each creative. Instead of looking only at pixels or ratios, we translate a design into a structured textual description - a language the AI can reason with. This approach allows the system to find visually and functionally similar layouts across thousands of existing creatives and reuse their structure intelligently.
Step 1 - Canonicalize the creative
Before we can compare two designs, they must be represented in a consistent, size-independent structure. Raw design files differ by coordinate systems, nested groups, and local transforms, so the first step is to flatten the creative.
Each element is expressed in a normalized coordinate space, where positions and dimensions are scaled relative to the canvas. This ensures that designs of any resolution can be compared meaningfully. Values are standardized within a uniform range to maintain stability and prevent inconsistencies during embedding generation.
Each normalized element now captures the essential layout information, its role, relative position, and relative size, independent of resolution. This canonical structure forms the foundation for all subsequent steps: text serialization, embedding generation, and similarity matching.
Flattening also ensures that no nested coordinates distort layout understanding. At this stage, each layer’s role (e.g., hero image, headline, logo, CTA, background) is assigned via a hybrid classification system combining deterministic rules and lightweight LLM inference.
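The normalization step above can be sketched like this (the dict shapes are our assumptions; role classification is shown as an already-assigned field since the hybrid classifier is out of scope):

```python
def canonicalize(layers, canvas):
    """Flatten layers into a resolution-independent form with values in [0, 1]."""
    W, H = canvas
    return [{"role": layer["role"],            # assigned by the hybrid classifier
             "x": layer["x"] / W, "y": layer["y"] / H,
             "w": layer["w"] / W, "h": layer["h"] / H,
             "z": z}                            # stacking order
            for z, layer in enumerate(layers)]

flat = canonicalize([{"role": "logo", "x": 96, "y": 54, "w": 192, "h": 108}],
                    canvas=(1920, 1080))
```

After canonicalization, the same logo at 4K or at thumbnail size produces identical coordinates, which is what makes cross-resolution comparison meaningful.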
Step 2 – Generate a layout description
Once flattened and labeled, the creative is serialized into a concise textual description that captures its structural intent rather than its visual appearance.
This description encodes geometry, hierarchy, and function, everything an embedding model needs to understand the layout conceptually.
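A minimal serializer in this spirit (the exact string format is our assumption; any stable, compact encoding of role + geometry works for embedding):

```python
def describe(canonical):
    """Serialize a canonical layout into a compact, embedding-friendly string."""
    return "; ".join(
        f"{e['role']} at ({e['x']:.2f}, {e['y']:.2f}) size {e['w']:.2f}x{e['h']:.2f}"
        for e in sorted(canonical, key=lambda e: e["z"]))

text = describe([
    {"role": "background", "x": 0.0, "y": 0.0, "w": 1.0, "h": 1.0, "z": 0},
    {"role": "headline", "x": 0.1, "y": 0.55, "w": 0.8, "h": 0.15, "z": 1},
    {"role": "cta", "x": 0.3, "y": 0.8, "w": 0.4, "h": 0.1, "z": 2},
])
```

Sorting by layer order keeps the description deterministic, so two structurally identical creatives always serialize to the same text.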
Step 3 – Embed and match against the corpus
The layout description is converted into an embedding vector and scored against previously indexed layouts. A similarity score close to 1 means the layouts are structurally aligned. For instance, both might have a hero centered, CTA bottom-right, and logo top-left, even if their images or copy differ. Lower scores indicate layouts with different compositions or hierarchies.
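A score near 1 for aligned layouts is characteristic of cosine similarity between embedding vectors; assuming that metric (the post does not name it), the comparison looks like:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Parallel layout vectors score ~1.0; orthogonal ones score ~0.0.
assert abs(cosine_similarity([0.2, 0.5], [0.4, 1.0]) - 1.0) < 1e-9
```

In production the vectors would come from an embedding model and be hundreds of dimensions wide, but the scoring rule is the same.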
Because creative repositories are large and diverse, the system uses a multi-scope search strategy to balance precision and coverage.
Brand Search
At this level, the system searches within the creative corpus of the same brand or related business units. This includes assets from the same team, workspace, or sub-brands that follow a shared visual system. Such campaigns typically share design grammar, typography, color hierarchy, and layout proportions, resulting in highly precise structural matches.
Brand-level matches maintain strong visual and functional alignment with minimal need for manual adjustment. A similarity threshold of 90–95% ensures that only designs truly aligned with the brand’s identity are selected, making this the preferred and most frequently used scope for adaptation and versioning.
Global Search
If no sufficiently aligned layout is found within the brand ecosystem, the system queries the global creative corpus - a large-scale index of all past designs across industries. These designs are created and maintained by Rocketium based on publicly available data and industry best practices. This tier functions like an idea discovery engine - it finds structurally similar layouts that may introduce new arrangements or proportions while still respecting core design principles. The similarity threshold drops to 80%, allowing more exploration.
Across these scopes, the search system balances fidelity, diversity, and performance. It starts with the closest known brand context and progressively widens its lens until it finds the best structural match. This cascading search ensures that versioning rarely fails: if a perfect on-brand match isn’t found, it still locates a layout close enough in structure to be adapted with minimal adjustment.
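The cascade can be sketched with the thresholds from the post (90% brand, 80% global); the in-memory corpus, dict shapes, and brute-force scan stand in for a real vector index:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def cascading_search(query_vec, corpus, brand_id):
    """Try the brand scope at a strict threshold, then fall back to global."""
    tiers = [("brand", lambda d: d["brand"] == brand_id, 0.90),
             ("global", lambda d: True, 0.80)]
    for scope, in_scope, threshold in tiers:
        candidates = [d for d in corpus if in_scope(d)]
        if not candidates:
            continue
        best = max(candidates, key=lambda d: cosine(query_vec, d["vec"]))
        if cosine(query_vec, best["vec"]) >= threshold:
            return scope, best
    return None, None

corpus = [{"brand": "acme", "vec": [0.8, 0.6]},     # on-brand but only 0.80 similar
          {"brand": "other", "vec": [1.0, 0.05]}]   # off-brand, ~0.999 similar
scope, match = cascading_search([1.0, 0.0], corpus, brand_id="acme")
```

Here the brand-tier best misses the 0.90 bar, so the search widens and returns a global match instead of failing outright, which mirrors the fallback behavior described above.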
Step 4 – Transfer structure through role-aware mapping
Once a matching reference creative is found, its layout becomes the blueprint for the new size. Each element in the source design is matched with an element in the reference layout that plays a similar role: headline to headline, logo to logo, CTA to CTA.
The system then compares their normalized geometry (x, y, width, height) and finds the pairing that minimizes both role mismatch and positional difference. In simpler terms, it looks for the closest match in both purpose and placement.
Once the best matches are identified, their proportional positions are transferred to the target size and scaled to fit the new canvas, preserving the original hierarchy, balance, and intent of the design.
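A greedy version of this role-aware pairing (the cost weights and greedy strategy are our simplifying assumptions; an optimal assignment could use the Hungarian algorithm instead):

```python
import math

def match_elements(source, reference, w_role=0.7, w_pos=0.3):
    """Pair each source element with the unused reference element that
    minimizes role mismatch plus normalized positional difference."""
    used, pairs = set(), []
    for s in source:
        best_i, best_cost = None, float("inf")
        for i, r in enumerate(reference):
            if i in used:
                continue
            cost = (w_role * (0.0 if r["role"] == s["role"] else 1.0)
                    + w_pos * math.hypot(r["x"] - s["x"], r["y"] - s["y"]))
            if cost < best_cost:
                best_i, best_cost = i, cost
        if best_i is not None:
            used.add(best_i)
            pairs.append((s["role"], reference[best_i]["role"]))
    return pairs

pairs = match_elements(
    [{"role": "headline", "x": 0.1, "y": 0.1}, {"role": "cta", "x": 0.3, "y": 0.8}],
    [{"role": "cta", "x": 0.35, "y": 0.75}, {"role": "headline", "x": 0.1, "y": 0.15}])
```

Because role mismatch dominates the cost, a headline is paired with the reference headline even when a differently-roled element happens to sit closer.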
Approach 3 - LLM based versioning
When neither geometric rules nor retrieval from an existing layout fits the target format, the third path hands the problem to a large language model. The creative is expressed as structured text, and the model reasons about hierarchy, readability, and usability to generate a fresh layout for the target size, rather than reusing a stored template.
Step 1 — Prepare structured input
Before being sent to the model, the creative is flattened so that every visible element is represented individually. For each element, only the essential attributes are retained:
- Normalized position and size within a standardized coordinate domain 
- Functional role label (e.g., primary visual, headline or key text, call-to-action, branding element, background layer) 
- Layer order to preserve stacking and visual hierarchy 
This produces a compact yet expressive representation of the layout, rich enough for the model to reason about structure, hierarchy, and balance without depending on pixel-level details.
Example of serialized input
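The post's concrete example isn't reproduced here; an illustrative input in the shape described above (every role label and coordinate value is our invention) might look like:

```python
serialized_input = [
    {"role": "background",     "x": 0.00, "y": 0.00, "w": 1.00, "h": 1.00, "z": 0},
    {"role": "primary_visual", "x": 0.10, "y": 0.05, "w": 0.80, "h": 0.45, "z": 1},
    {"role": "headline",       "x": 0.10, "y": 0.55, "w": 0.80, "h": 0.15, "z": 2},
    {"role": "cta",            "x": 0.30, "y": 0.78, "w": 0.40, "h": 0.10, "z": 3},
    {"role": "logo",           "x": 0.02, "y": 0.02, "w": 0.15, "h": 0.08, "z": 4},
]
```

Each entry carries exactly the three attribute groups listed above: normalized geometry, a functional role, and layer order; nothing pixel-level survives into the prompt.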
Step 2 — Generate layout plan
The model processes this structured description and produces a layout blueprint, a specification of element positions and proportions adapted to the target aspect ratio.
The LLM reasons about design principles such as
- Hierarchy — maintaining relative prominence of visual elements 
- Readability — ensuring balanced spacing and text flow 
- Usability — positioning interactive or action-oriented components within natural visual paths 
Because it has been trained on multimodal and structured representations, the model can generalize across formats, producing compositionally coherent layouts without relying on fixed templates.
The resulting blueprint is a machine-readable output that can be rendered directly by the creative engine, enabling instant preview and refinement.
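As a hypothetical illustration of such a blueprint (the field names and symbolic values are ours, matching the placeholder convention noted below):

```python
layout_blueprint = {
    "target": {"width": "TARGET_W", "height": "TARGET_H"},   # e.g. a 9:16 Story
    "elements": [
        {"role": "primary_visual", "x": "X1", "y": "Y1", "w": "W1", "h": "H1"},
        {"role": "headline",       "x": "X2", "y": "Y2", "w": "W2", "h": "H2"},
        {"role": "cta",            "x": "X3", "y": "Y3", "w": "W3", "h": "H3"},
    ],
}
```

The creative engine substitutes the symbolic coordinates with concrete normalized values at render time, so the blueprint stays resolution-independent until the final pass.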
Here, the coordinate values are symbolic placeholders representing normalized spatial parameters within a standardized coordinate domain.
This blueprint serves as a ready-to-render layout specification, directly consumable by the creative engine for automated rendering or fine-tuning.
Bringing it together
The LLM path extends the system beyond retrieval and resizing: it enables reasoned creation. By understanding spatial roles, hierarchy, and brand intent through structured text, the model can generate entirely new layouts that remain visually and semantically consistent.
Together, the three approaches form a hierarchy of intelligence:
- Algorithmic resizing for deterministic precision 
- Vector similarity for structural reuse 
- LLM generation for creative reasoning 
This tri-layer architecture ensures that every format, from the most standard to the most experimental, can be generated automatically, accurately, and always in harmony with brand identity.