DataFactory: Multi-Agent Framework for TableQA

Revolutionizing Table Question Answering through coordinated collaboration between specialized agents. Transform structured data into actionable insights with our advanced multi-agent framework.

Wisdom in Data is the implementation and demonstration platform for the DataFactory framework described in our paper, "DataFactory: Collaborative Multi-Agent Framework for Advanced Table Question Answering."

  • 15.9% accuracy improvement (TabFact)
  • 28.6% performance gain (WikiTQ)
  • 3 specialized teams

Wisdom in Data Platform

Platform Interface Preview

A real screenshot of the Wisdom in Data platform, demonstrating the intuitive and powerful user interface for data analysis and management.

Key Features

Our multi-agent framework addresses critical limitations in existing TableQA methods.
This platform is designed and implemented based on the DataFactory collaborative multi-agent framework proposed in our paper, aiming to translate cutting-edge research into practical tools.

Automated Data Ingestion

LLM-driven table property analysis and automated data ingestion with intelligent DDL generation and data quality assessment.

Knowledge Graph Construction

Automated "data-to-knowledge graph" transformation algorithm for enhanced relational representation and semantic analysis.

Multi-Agent Collaboration

Coordinated Database and Knowledge Graph teams orchestrated by a Data Leader using the ReAct paradigm.

Hallucination Reduction

Comprehensive contextual information and few-shot prompting to minimize model hallucinations and improve accuracy.

Advanced Reasoning

Sophisticated multi-step reasoning combining structured and relational retrieval methods for complex queries.

Autonomous Pipeline

Fully autonomous pipeline enabling seamless user-table interaction with intelligent visualization generation.

Architecture

A tripartite collaborative architecture designed for efficient and accurate TableQA.

Data Factory Architecture

Multi-Agent Data Factory

Data Leader

The cognitive core, which uses the ReAct paradigm for dynamic reasoning, planning, and task decomposition.

Database Team

Expert team for structured data processing, numerical computation, and SQL-based queries.

Knowledge Graph Team

Specialized team for relational knowledge processing and Cypher-based graph queries.

1. Information Storage

Automated data ingestion and knowledge graph construction with LLM-driven analysis.

  • Table property analysis
  • DDL generation
  • Entity extraction

2. Knowledge Extraction

Context-enhanced SQL and Cypher query generation with historical QA integration.

  • Context enhancement
  • Few-shot prompting
  • Domain knowledge injection

3. Insight Generation

ReAct-based reasoning with multi-step analysis and comprehensive answer synthesis.

  • Multi-step reasoning
  • Team coordination
  • Answer synthesis

Platform Demonstration

Comprehensive demonstrations showcasing our multi-agent data factory capabilities

Data Import Process

Methodology Overview

This demonstration shows how the Database Information Processing Agent imports CSV and Excel files into the database. The agent intelligently analyzes the number and structure of title/header rows, automatically detects and processes merged headers, and extracts each column's data for cleaning. It then provides detailed analysis for each column, including recommended field names, types, primary key status, and the reasoning behind each suggestion. The user only needs to intervene at key steps; the rest of the process is fully automated, making data import seamless and efficient.

  • Intelligent Header Recognition: Automatic detection of title/header rows and handling of complex merged headers
  • Field Analysis: Extraction and cleaning of column data, with smart suggestions for field names, types, and primary keys
  • Reasoning Transparency: Detailed explanation for each field analysis and decision
  • Minimal User Intervention: Users only participate in critical steps, with the rest handled automatically
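
The header handling described above can be illustrated with a minimal Python sketch. It assumes the file has already been read into a list of rows, that a horizontally merged header cell appears as a label followed by empty strings (a common result of exporting from Excel), and that the number of header rows is already known. In the platform these decisions are made by the LLM-driven agent, so the function name `flatten_merged_headers` and the fill rule below are illustrative assumptions, not the framework's API.

```python
def flatten_merged_headers(rows, n_header_rows):
    """Combine multi-row headers into one snake_case name per column.

    Assumption: an empty header cell inherits the label to its left,
    but on lower header rows only while the parent label above is the
    same for both columns (so vertically merged cells stay empty).
    """
    header_rows = rows[:n_header_rows]
    width = max(len(row) for row in header_rows)
    filled = []
    for level, row in enumerate(header_rows):
        padded = [cell.strip() for cell in row] + [""] * (width - len(row))
        out = []
        for col, cell in enumerate(padded):
            if not cell and col > 0:
                same_parent = (
                    level == 0
                    or filled[level - 1][col] == filled[level - 1][col - 1]
                )
                if same_parent:
                    cell = out[col - 1]
            out.append(cell)
        filled.append(out)

    names = []
    for col in range(width):
        parts = []
        for level_cells in filled:
            part = level_cells[col]
            if part and part not in parts:
                parts.append(part)
        names.append("_".join(p.lower().replace(" ", "_") for p in parts))
    return names, rows[n_header_rows:]


# Hypothetical two-row header: "Sales" is merged across the Q1 and Q2 columns.
rows = [
    ["Region", "Sales", "", "Satisfaction"],
    ["", "Q1", "Q2", "Score"],
    ["North", "120", "135", "4.6"],
]
columns, data = flatten_merged_headers(rows, n_header_rows=2)
print(columns)
```

The resulting column names could then be shown to the user as suggested field names, which is the point where the demonstration lets the user confirm or override them.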

DDL Generation Process

Automated Schema Design

This video demonstrates how the Database Information Processing Agent generates DDL by analyzing table headers and sample values. For each column, the agent automatically infers its content, provides explanations, and incorporates user-specified sample values into the DDL. This enriched DDL information helps the Database Information Retrieval Agent better understand the table structure, enabling more accurate SQL generation for user queries.

  • Field Content Analysis: Automatic inference and explanation of each column's meaning based on headers and sample values
  • Sample Value Integration: User-specified examples are included in the DDL for richer context
  • Enhanced Retrieval: DDL is optimized to support downstream agents in query generation
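
As a rough illustration of DDL enriched with explanations and sample values, the sketch below builds a `CREATE TABLE` statement whose columns carry SQL comments. It assumes SQLite-style DDL, a simple type heuristic in place of the agent's LLM-based inference, and column descriptions supplied by the caller. The helper names `infer_sql_type` and `build_ddl`, and the `sales_reps` table, are hypothetical.

```python
import sqlite3


def infer_sql_type(values):
    """Guess INTEGER, REAL, or TEXT from sample strings (simple heuristic)."""
    def all_parse(cast):
        try:
            for value in values:
                cast(value)
        except ValueError:
            return False
        return True

    if values and all_parse(int):
        return "INTEGER"
    if values and all_parse(float):
        return "REAL"
    return "TEXT"


def build_ddl(table, columns, samples, descriptions, primary_key=None):
    """Return a CREATE TABLE statement with one explanatory comment per column.

    Note: identifiers are inserted as-is; a real implementation would
    validate or quote them.
    """
    lines = []
    for i, name in enumerate(columns):
        values = samples.get(name, [])
        decl = f"  {name} {infer_sql_type(values)}"
        if name == primary_key:
            decl += " PRIMARY KEY"
        if i < len(columns) - 1:
            decl += ","
        examples = ", ".join(values[:3])
        comment = f"{descriptions.get(name, '')} (e.g. {examples})"
        lines.append(f"{decl}  -- " + comment.replace("\n", " "))
    return f"CREATE TABLE {table} (\n" + "\n".join(lines) + "\n);"


ddl = build_ddl(
    "sales_reps",
    ["rep_id", "rep_name", "satisfaction_score"],
    samples={
        "rep_id": ["1", "2"],
        "rep_name": ["Ana", "Ben"],
        "satisfaction_score": ["4.6", "3.9"],
    },
    descriptions={
        "rep_id": "Unique identifier of the sales representative",
        "rep_name": "Display name of the representative",
        "satisfaction_score": "Average customer satisfaction rating",
    },
    primary_key="rep_id",
)
print(ddl)

# The commented DDL is still valid SQL, so it can be executed directly.
conn = sqlite3.connect(":memory:")
conn.execute(ddl)
```

Keeping the explanations inside the DDL text means a single string can serve both as the schema definition and as context for a downstream query-generation prompt.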

Data Question Answering

Context-Enhanced SQL Generation
"Which sales representative has the highest customer satisfaction scores?"

In this demonstration, the user asks, "Which sales representative has the highest customer satisfaction scores?" The Database Retrieval Agent leverages historical QA data, DDL information, and domain-specific knowledge as retrieval-augmented generation (RAG) context to generate the appropriate SQL query and retrieve the data. The Database Analysis Agent then combines the query results and relevant domain knowledge to provide a detailed analysis and answer to the user's question.

  • RAG-based SQL Generation: Utilizes historical QA, DDL, and domain knowledge for context
  • Automated Data Retrieval: Generates and executes SQL to answer user queries
  • Expert Analysis: Analysis agent synthesizes results and knowledge for comprehensive answers
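
The context assembly step can be sketched as follows. This is a simplified illustration, not the platform's retrieval code: it ranks historical QA pairs by Jaccard token overlap as a stand-in for whatever retriever the framework actually uses, and it stops at building the prompt, leaving the LLM call out. The function names, the prompt wording, and the example history are all assumptions.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def top_k_examples(question, history, k=2):
    """Rank historical (question, sql) pairs by Jaccard token overlap."""
    q_tokens = tokenize(question)

    def score(pair):
        h_tokens = tokenize(pair[0])
        union = q_tokens | h_tokens
        return len(q_tokens & h_tokens) / len(union) if union else 0.0

    return sorted(history, key=score, reverse=True)[:k]


def build_sql_prompt(question, ddl, domain_notes, history, k=2):
    """Assemble DDL, domain knowledge, and few-shot examples into one prompt."""
    shots = "\n\n".join(
        f"Q: {q}\nSQL: {sql}" for q, sql in top_k_examples(question, history, k)
    )
    return (
        "Write one SQL query that answers the question.\n\n"
        f"Schema:\n{ddl}\n\n"
        f"Domain knowledge:\n{domain_notes}\n\n"
        f"Similar past questions:\n{shots}\n\n"
        f"Q: {question}\nSQL:"
    )


# Hypothetical history of previously answered questions.
history = [
    ("Which sales representative closed the most deals?",
     "SELECT rep_name FROM sales_reps ORDER BY deals_closed DESC LIMIT 1;"),
    ("What is the average order value per region?",
     "SELECT region, AVG(order_value) FROM orders GROUP BY region;"),
    ("Which representative has the lowest satisfaction score?",
     "SELECT rep_name FROM sales_reps ORDER BY satisfaction_score ASC LIMIT 1;"),
]
question = "Which sales representative has the highest customer satisfaction scores?"
prompt = build_sql_prompt(
    question,
    ddl="CREATE TABLE sales_reps (rep_name TEXT, satisfaction_score REAL);",
    domain_notes="Satisfaction scores range from 1 (worst) to 5 (best).",
    history=history,
)
print(prompt)
```

Only the two most similar past questions are included as few-shot examples, so unrelated history (here, the order-value query) does not crowd the prompt.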

Data Visualization Generation

Automated Chart Generation
"Create a chart showing monthly revenue trends across all regions"

This video shows the response to the request "Create a chart showing monthly revenue trends across all regions." The Database Visualization Agent first determines whether a chart is needed and, if the user has not specified a chart type, recommends the most suitable one. It then generates the chart from the data supplied by the Retrieval Agent, while the Analysis Agent interprets the visualization. The process is fully automated, so users receive both visual and analytical insights with minimal effort.

  • Intent Recognition: Detects user needs for visualization and recommends chart types automatically
  • Data-Driven Plotting: Uses retrieved data to generate appropriate charts
  • Insightful Analysis: Analysis agent provides explanations and insights for each visualization
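
The two decisions described above, whether a chart is wanted and which type to use, can be approximated with a small rule-based sketch. The platform makes these decisions with an LLM agent; the keyword list, the column-kind labels, and the function names below are assumptions chosen only to make the logic concrete.

```python
import re

# Hypothetical vocabulary that signals a visualization request.
CHART_WORDS = {"chart", "plot", "graph", "visualize", "visualise", "trend", "trends"}
EXPLICIT_TYPES = ("line", "bar", "pie", "scatter")


def words_of(request):
    """Return the set of lowercase alphabetic words in the request."""
    return set(re.findall(r"[a-z]+", request.lower()))


def wants_chart(request):
    """Return True when the request mentions a visualization keyword."""
    return bool(words_of(request) & CHART_WORDS)


def recommend_chart(request, columns):
    """Pick a chart type.

    `columns` maps column name -> 'temporal' | 'categorical' | 'numeric'.
    A chart type named in the request always wins over the heuristic.
    """
    requested = words_of(request)
    for chart_type in EXPLICIT_TYPES:
        if chart_type in requested:
            return chart_type
    kinds = list(columns.values())
    if "temporal" in kinds and "numeric" in kinds:
        return "line"      # values over time
    if "categorical" in kinds and "numeric" in kinds:
        return "bar"       # values per category
    if kinds.count("numeric") >= 2:
        return "scatter"   # relationship between two measures
    return "table"         # nothing chartable; fall back to a table


request = "Create a chart showing monthly revenue trends across all regions"
columns = {"month": "temporal", "region": "categorical", "revenue": "numeric"}
if wants_chart(request):
    print(recommend_chart(request, columns))
```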

Knowledge Graph Construction

Data-to-Knowledge Graph Transformation

This demonstration shows how the Knowledge Graph Information Processing Agent transforms structured database tables into relational knowledge graphs. The agent reads the table DDL, header information, and sample data, then applies the algorithm's transformation rules to determine which columns become entities (including splitting multi-valued cells) and how those entities are related. Logical and semantic relationship patterns are used to associate entities, and the resulting graph is stored in a knowledge graph database. This process enables downstream relational analysis that goes beyond traditional structured queries.

  • Entity Construction: Identifies entity columns, handles multi-valued cells, and generates unique entity identifiers
  • Relationship Discovery: Defines and applies logical/semantic rules to associate entities
  • Graph Storage: Persists the resulting knowledge graph for advanced relational retrieval and analysis
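
A minimal sketch of the transformation, under stated assumptions: the entity columns and relationship rules are given explicitly (in the platform they are determined by the agent), multi-valued cells use `;` as a separator, and an entity identifier is the node label plus the lowercased value. The result is kept in memory rather than written to a graph database. The function name `table_to_graph` and the example schema are hypothetical.

```python
def table_to_graph(rows, entity_columns, relations, sep=";"):
    """Turn table rows into graph nodes and edges.

    rows:           list of dicts, one per table row
    entity_columns: column name -> node label
    relations:      list of (source_column, relation_type, target_column)
    Returns (nodes, edges) where nodes maps id -> properties and edges
    is a set of (source_id, relation_type, target_id) triples.
    """
    nodes, edges = {}, set()

    def entity_ids(row, column):
        ids = []
        # Split multi-valued cells so each value becomes its own entity.
        for value in str(row.get(column, "")).split(sep):
            value = value.strip()
            if not value:
                continue
            label = entity_columns[column]
            node_id = f"{label}:{value.lower()}"
            nodes.setdefault(node_id, {"label": label, "name": value})
            ids.append(node_id)
        return ids

    for row in rows:
        per_column = {col: entity_ids(row, col) for col in entity_columns}
        for src_col, rel, dst_col in relations:
            for src in per_column[src_col]:
                for dst in per_column[dst_col]:
                    edges.add((src, rel, dst))
    return nodes, edges


rows = [
    {"project": "Atlas", "technologies": "Python; Neo4j", "lead": "Ana"},
    {"project": "Borealis", "technologies": "Python", "lead": "Ben"},
]
entity_columns = {"project": "Project", "technologies": "Technology", "lead": "Person"}
relations = [("project", "USES", "technologies"), ("lead", "LEADS", "project")]

nodes, edges = table_to_graph(rows, entity_columns, relations)
for edge in sorted(edges):
    print(edge)
```

Because identifiers are derived from label and value, the two rows that mention Python map to the same `Technology:python` node, which is what lets relational queries connect projects through shared entities.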

Knowledge Graph Visualization

Interactive Graph Exploration

This video demonstrates the knowledge graph visualization interface, which allows users to intuitively explore the internal relationships within the transformed knowledge graph. The interface displays all node types, relationship types, node properties, and their associations. Users can drag, browse, and filter the graph to gain a clear understanding of the data's relational structure.

  • Comprehensive Visualization: Shows all node types, relationships, and properties
  • Interactive Exploration: Supports drag-and-drop navigation and filtering
  • Relationship Clarity: Helps users intuitively understand data associations

Knowledge Graph Question Answering

Complex Relational Reasoning
"Map the technology stack relationships in our active projects and identify skill dependencies."

In this demonstration, the user asks, "Map the technology stack relationships in our active projects and identify skill dependencies." The Knowledge Graph Retrieval Agent uses the graph schema, historical QA, and domain-specific knowledge as RAG context to generate Cypher queries and retrieve relevant subgraphs. The Knowledge Graph Analysis Agent then analyzes the subgraph and, together with domain knowledge, provides insights into the relationships of interest. The Knowledge Graph Visualization Agent displays the subgraph below the answer, supporting drag-and-drop interaction and showing node properties for enhanced interpretability and user experience.

  • RAG-based Cypher Generation: Utilizes schema, historical QA, and domain knowledge for context
  • Subgraph Retrieval: Generates and executes Cypher to answer relational queries
  • Visual Explanation: Presents subgraphs interactively with node property details
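
To make the idea of subgraph retrieval concrete, the sketch below extracts the neighborhood of a node from an in-memory edge set. It is a plain-Python stand-in, not the platform's Cypher execution: against a graph database, a comparable result might come from a variable-length pattern such as `MATCH p = (t:Technology {name: $name})-[*1..2]-(n) RETURN p`, where the label and property names are assumptions. The function name `k_hop_subgraph` and the example edges are hypothetical.

```python
from collections import deque


def k_hop_subgraph(edges, start, hops=1):
    """Return the edges reachable within `hops` steps of `start`.

    Edge direction is ignored during traversal, mirroring an
    undirected pattern match.
    """
    neighbors = {}
    for src, rel, dst in edges:
        neighbors.setdefault(src, []).append((src, rel, dst, dst))
        neighbors.setdefault(dst, []).append((src, rel, dst, src))

    seen = {start}
    frontier = deque([(start, 0)])
    result = set()
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue  # do not expand nodes at the hop limit
        for src, rel, dst, other in neighbors.get(node, []):
            result.add((src, rel, dst))
            if other not in seen:
                seen.add(other)
                frontier.append((other, depth + 1))
    return result


edges = {
    ("Project:atlas", "USES", "Technology:python"),
    ("Project:atlas", "USES", "Technology:neo4j"),
    ("Project:borealis", "USES", "Technology:python"),
    ("Person:ana", "HAS_SKILL", "Technology:python"),
    ("Person:ben", "HAS_SKILL", "Technology:neo4j"),
}

# Which projects and people are directly tied to Neo4j?
for edge in sorted(k_hop_subgraph(edges, "Technology:neo4j", hops=1)):
    print(edge)
```

The returned edge set is the kind of structure an analysis step can summarize and a visualization step can render as an interactive graph.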

Multi-Agent Collaboration

Coordinated Team Orchestration
"Analyze our project delivery capacity by correlating team skills, project complexity, and historical performance - what are our optimization opportunities?"

This final demonstration showcases the Data Leader orchestrating complete multi-agent collaboration using the ReAct paradigm. The user asks, "Analyze our project delivery capacity by correlating team skills, project complexity, and historical performance - what are our optimization opportunities?" The Data Leader Agent dynamically coordinates the Database and Knowledge Graph teams, integrating both structural and relational information through a three-stage iterative process. The interface displays the entire interaction between the leader and both teams, making the full workflow transparent and traceable, which enhances user trust and understanding.

  • Three-Stage Principle: Applies the "explore-verify-analyze" methodology to decompose complex queries
  • Dynamic Team Dispatch: Intelligent delegation between Database Team and Knowledge Graph Team based on task requirements
  • Iterative Reasoning: ReAct cycle implementation with thought-action-observation phases
  • Comprehensive Synthesis: Integration of multi-source information for final answer generation
  • Full Process Transparency: All agent interactions are visible and traceable for the user
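
The orchestration loop can be sketched in a few lines. This is an illustration under several assumptions: the two teams are stubs that return placeholder strings, the leader's reasoning is replaced by a scripted policy instead of an LLM, and the way the explore, verify, and analyze stages map onto individual steps is a guess made for the example. None of the names below come from the DataFactory codebase.

```python
def database_team(task):
    # Stub: a real team would generate and run SQL for the task.
    return f"[db] rows for: {task}"


def knowledge_graph_team(task):
    # Stub: a real team would generate and run Cypher for the task.
    return f"[kg] subgraph for: {task}"


TEAMS = {"database": database_team, "knowledge_graph": knowledge_graph_team}


def scripted_policy(question, trace):
    """Stand-in for the LLM: returns (stage, thought, action, argument)."""
    plan = [
        ("explore", "Need delivery history per project.",
         "database", "historical delivery performance by project"),
        ("explore", "Need to know who has which skills.",
         "knowledge_graph", "team skills linked to project technologies"),
        ("verify", "Check that complexity data covers the same projects.",
         "database", "project complexity for the projects found above"),
        ("analyze", "Enough evidence to answer.", "finish", None),
    ]
    return plan[len(trace)]


def run_leader(question, policy, teams, max_steps=8):
    """Run a thought-action-observation loop until the policy finishes."""
    trace = []
    for _ in range(max_steps):
        stage, thought, action, argument = policy(question, trace)
        if action == "finish":
            answer = f"Synthesis of {len(trace)} observations"
            return {"answer": answer, "trace": trace}
        observation = teams[action](argument)  # dispatch to the chosen team
        trace.append({
            "stage": stage,
            "thought": thought,
            "action": action,
            "argument": argument,
            "observation": observation,
        })
    return {"answer": None, "trace": trace}  # step budget exhausted


result = run_leader(
    "Analyze our project delivery capacity and optimization opportunities",
    scripted_policy,
    TEAMS,
)
for step in result["trace"]:
    print(step["stage"], "|", step["action"], "|", step["observation"])
print(result["answer"])
```

Keeping every thought, action, and observation in `trace` is what makes the full exchange between the leader and the teams displayable to the user.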

Research Results

Comprehensive evaluation across multiple benchmarks and model providers

Published in Information Processing & Management, 2026

DataFactory: Collaborative multi-agent framework for advanced table question answering

Tong Wang, Chi Jin, Yongkang Chen, Huan Deng, Xiaohui Kuang, Gang Zhao

Project Repository

Explore the official open-source implementation of DataFactory.

Includes framework logic, multi-agent orchestration, and reproducible workflows.

Performance Comparison

  • TabFact accuracy improvement: +15.9%
  • WikiTQ performance gain: +28.6%
  • FeTaQA ROUGE-2 F-score: 0.3885

Model Compatibility

Tested across 8 LLMs from 5 providers, including:

  • Claude 4.0 Sonnet
  • Gemini 2.5 Flash
  • GPT-4o Mini
  • Qwen3 Series
  • DeepSeek-V3

BibTeX

Citation-ready entry
@article{WANG2026104723,
title = {DataFactory: Collaborative multi-agent framework for advanced table question answering},
journal = {Information Processing \& Management},
volume = {63},
number = {6},
pages = {104723},
year = {2026},
issn = {0306-4573},
doi = {https://doi.org/10.1016/j.ipm.2026.104723},
url = {https://www.sciencedirect.com/science/article/pii/S0306457326001147},
author = {Tong Wang and Chi Jin and Yongkang Chen and Huan Deng and Xiaohui Kuang and Gang Zhao},
keywords = {Table question answering, Multi-agent systems, Large language models, Knowledge graph, Data factory, ReAct paradigm}
}