Google AI Introduces PaperBanana: An Agentic Framework that Automates Publication Ready Methodology Diagrams and Statistical Plots

Coinmama
Google AI Introduces PaperBanana: An Agentic Framework that Automates Publication Ready Methodology Diagrams and Statistical Plots
Ledger

Creating visually appealing illustrations for publications can be a time-consuming process in the research workflow. While AI scientists are proficient in literature reviews and coding, they often face challenges in effectively communicating complex discoveries visually. A collaborative research team from Google and Peking University has introduced a new framework called ‘PaperBanana’ that aims to address this issue by utilizing a multi-agent system to automate the creation of high-quality academic diagrams and plots.

https://dwzhu-pku.github.io/PaperBanana/

Tokenmetrics

5 Specialized Agents: The Architecture

The ‘PaperBanana’ framework does not rely on a single prompt. Instead, it employs a team of 5 agents to collaboratively transform raw text into professional visual representations.

https://dwzhu-pku.github.io/PaperBanana/

Phase 1: Linear Planning

Retriever Agent: This agent identifies the 10 most relevant reference examples from a database to guide the style and structure of the visuals.

Planner Agent: Translates technical methodology text into a detailed textual description of the target figure.

Stylist Agent: Functions as a design consultant to ensure that the output aligns with the “NeurIPS Look” by using specific color palettes and layouts.

Phase 2: Iterative Refinement

Visualizer Agent: This agent transforms the textual description into a visual output. For diagrams, it utilizes image models like Nano-Banana-Pro, while for statistical plots, it generates executable Python Matplotlib code.

Critic Agent: Inspects the generated image against the source text to identify factual errors or visual inconsistencies. It provides feedback for 3 rounds of refinement.

Outperforming the NeurIPS 2025 Benchmark

https://dwzhu-pku.github.io/PaperBanana/

The research team has introduced PaperBananaBench, a dataset comprising 292 test cases sourced from actual NeurIPS 2025 publications. By employing a VLM-as-a-Judge approach, they compared the performance of PaperBanana against leading baseline methods.

MetricImprovement over BaselineOverall Score+17.0% Conciseness+37.2% Readability+12.9% Aesthetics+6.6% Faithfulness+2.8%

The system excels in producing ‘Agent & Reasoning’ diagrams, achieving an overall score of 69.9%. Additionally, it offers an automated ‘Aesthetic Guideline’ that favors ‘Soft Tech Pastels’ over harsh primary colors.

Statistical Plots: Code vs. Image

Statistical plots require a level of numerical precision that standard image models may lack. PaperBanana addresses this by having the Visualizer Agent generate code instead of visual pixels.

Image Generation: While excelling in aesthetics, image generation may suffer from ‘numerical hallucinations’ or repeated elements.

Code-Based Generation: By utilizing the Matplotlib library to render plots, this approach ensures 100% data fidelity.

Domain-Specific Aesthetic Preferences in AI Research

As per the PaperBanana style guide, aesthetic choices may vary based on the research domain to align with the expectations of different academic communities.

Research DomainVisual ‘Vibe’Key Design ElementsAgent & ReasoningIllustrative, Narrative, “Friendly” 2D vector robots, human avatars, emojis, and “User Interface” aesthetics (chat bubbles, document icons)Computer Vision & 3DSpatial, Dense, Geometric Camera cones (frustums), ray lines, point clouds, and RGB color coding for axis correspondence Generative & LearningModular, Flow-oriented 3D cuboids for tensors, matrix grids, and “Zone” strategies using light pastel fills to group logic Theory & OptimizationMinimalist, Abstract, “Textbook” Graph nodes (circles), manifolds (planes), and a restrained grayscale palette with single highlight colors

Comparison of Visualization Paradigms

When it comes to statistical plots, there is a notable trade-off between utilizing an image generation model (IMG) versus generating executable code (Coding).

FeaturePlots via Image Generation (IMG)Plots via Coding (Matplotlib)AestheticsGenerally higher; visually appealing appearanceProfessional and standard academic look FidelityLower; may exhibit “numerical hallucinations” or repeated elements 100% accurate; faithfully represents the provided raw data ReadabilityHigh for sparse data but may struggle with complex datasets Consistently high; adept at handling dense or multi-series data without errors

Key Takeaways

Multi-Agent Collaborative Framework: ‘PaperBanana’ utilizes 5 specialized agents—Retriever, Planner, Stylist, Visualizer, and Critic—to transform technical text into high-quality methodology diagrams and plots.

Dual-Phase Generation Process: The workflow involves a Linear Planning Phase for reference retrieval and aesthetic guideline setting, followed by a 3-round Iterative Refinement Loop where errors are identified and corrected for enhanced accuracy.

Superior Performance on PaperBananaBench: Evaluation against 292 test cases from NeurIPS 2025 showcases that the framework outperforms conventional baseline methods in Overall Score (+17.0%), Conciseness (+37.2%), Readability (+12.9%), and Aesthetics (+6.6%).

Precision-Focused Statistical Plots: The system transitions from image generation to executable Python Matplotlib code for statistical data, ensuring numerical precision and eliminating common issues like “hallucinations” found in standard AI image generators.

For more information, you can refer to the original Paper and Repo. Feel free to follow us on Twitter and join our ML SubReddit. Additionally, subscribe to our Newsletter and join us on Telegram for more updates.







Previous articleHow to Build a Production-Grade Agentic AI System with Hybrid Retrieval, Provenance-First Citations, Repair Loops, and Episodic Memory