LLM-Based Iterative Whitepaper Generation System

Context

We need automated whitepaper generation that transforms technical documentation into business-focused content for decision-makers. The whitepaper must stay synchronized with the technical docs, but it is regenerated on demand rather than automatically on every change, as the technical docs are. Writing the whitepaper by hand is time-consuming and drifts into inconsistency over time.

Decision Drivers

Consistency: Terminology and narrative flow must be consistent across all chapters
Maintainability: Shell scripts don't scale; need readable, maintainable code with templates
Professional Output: LaTeX provides automated, professional typesetting without Word dependencies
Manual Control: Whitepaper updates are deliberate, not automatic with every code change
Source Synchronization: Source-to-chapter mappings must stay in sync as documentation grows
Discoverability: Should be immediately clear what each chapter is about
Cohesion: Related files (prompt, sources, output) should be grouped together

Decision

Iterative, Python-based LLM whitepaper generator with chapter-centric organization:

Key Components

Chapter-Centric Organization: Each chapter is a self-contained folder with a descriptive name (e.g., 03-data-sovereignty instead of just 03). All related files (prompt, sources, output) live together.
LLM-Based Source Discovery: generate-sources.py uses an LLM to automatically discover which documentation files are relevant to each chapter. This eliminates manual maintenance of the mappings as the documentation evolves.
Iterative Chapter Generation: Generate chapters sequentially, passing previous chapters as context to maintain narrative flow. Each prompt includes: glossary, previous chapters, source docs, chapter instructions.
Python + Jinja2: Use Python (not shell) with Jinja2 templates for clean separation of logic and content.
Centralized Glossary: Single config/glossary.md defines all business terms, included in every generation.
Manual CLI Triggering: Generate via make all or individual scripts - deliberate regeneration, not automatic.
LaTeX/PDF Output: Direct markdown → LaTeX → PDF pipeline for professional typesetting.
Separation of Concerns: Scripts in scripts/, config in config/, build output in build/.
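
The iterative generation step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the function names, the section labels in the prompt, and the chapter dictionary keys are assumptions, the LLM is passed in as a plain callable, and plain string joining stands in for the Jinja2 templates so the sketch has no third-party dependencies.

```python
from typing import Callable, Dict, List, Sequence


def build_prompt(glossary: str, previous: Sequence[str],
                 sources: Sequence[str], instructions: str) -> str:
    """Assemble one chapter prompt from the four parts named in the decision:
    glossary, previous chapters, source docs, chapter instructions."""
    parts = [
        "## Glossary\n" + glossary,
        "## Previous chapters\n" + ("\n\n".join(previous) or "(none)"),
        "## Source documents\n" + "\n\n".join(sources),
        "## Chapter instructions\n" + instructions,
    ]
    return "\n\n".join(parts)


def generate_whitepaper(glossary: str, chapters: List[Dict],
                        llm: Callable[[str], str]) -> List[str]:
    """Generate chapters strictly in order.

    Each chapter dict is assumed to carry 'sources' (list of document texts)
    and 'instructions' (the chapter prompt). `llm` is any callable that maps
    a prompt string to generated text (hypothetical; swap in a real client).
    """
    finished: List[str] = []
    for chapter in chapters:
        prompt = build_prompt(glossary, finished,
                              chapter["sources"], chapter["instructions"])
        # The output of this chapter becomes context for every later chapter.
        finished.append(llm(prompt))
    return finished
```

Because every prompt depends on the chapters generated before it, the loop cannot be parallelized without giving up that shared context.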

Consequences

Positive:

Ensures consistency in terminology and narrative flow across chapters
Maintainable Python/Jinja2 code with clear separation of concerns
Professional LaTeX output suitable for business presentations
Flexible - prompts and glossary editable without code changes
Cost-controlled through manual triggering
Source mappings stay synchronized with documentation automatically via LLM discovery
Chapter folders immediately show what each chapter covers
All chapter-related files in one place (easier to review, modify, delete)
Easy to add new chapters (create folder with 3 files)
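
As an illustration of the chapter-folder convention, the sketch below scaffolds a new chapter and lists the existing chapters in order. The three file names (prompt.md, sources.txt, output.md) and both function names are assumptions made for this example; the decision only states that a chapter folder holds a prompt, sources, and output.

```python
import re
from pathlib import Path
from typing import List

# Hypothetical file names; the ADR only names the three roles.
CHAPTER_FILES = ("prompt.md", "sources.txt", "output.md")
# Folder names look like "03-data-sovereignty": numeric prefix plus a slug.
CHAPTER_DIR = re.compile(r"^(\d+)-[a-z0-9-]+$")


def scaffold_chapter(root: str, name: str) -> Path:
    """Create a chapter folder containing the three (empty) chapter files."""
    if not CHAPTER_DIR.match(name):
        raise ValueError(f"expected a name like '03-data-sovereignty', got {name!r}")
    chapter = Path(root) / name
    chapter.mkdir(parents=True, exist_ok=False)
    for filename in CHAPTER_FILES:
        (chapter / filename).touch()
    return chapter


def list_chapters(root: str) -> List[Path]:
    """Return chapter folders sorted by their numeric prefix."""
    chapters = []
    for entry in Path(root).iterdir():
        match = CHAPTER_DIR.match(entry.name)
        if entry.is_dir() and match:
            chapters.append((int(match.group(1)), entry))
    return [path for _, path in sorted(chapters)]
```

Sorting by the parsed number rather than by the raw folder name keeps the order correct even if a prefix is not zero-padded.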

Negative:

Requires Python environment and LaTeX knowledge
Manual process - can become outdated if not regenerated
Sequential generation takes 5-10 minutes for full whitepaper
Requires LLM API access (no offline generation)
Initial prompt engineering effort needed
LLM source discovery is non-deterministic (may vary slightly between runs)

Key Trade-offs:

Manual vs. automatic: Chose manual for control over versioning
Python vs. shell: Chose Python for maintainability over simplicity
LaTeX vs. Word: Chose LaTeX for automation over familiarity
Sequential vs. parallel: Chose sequential for consistency over speed
Chapter folders vs. flat files: Chose folders for discoverability over simplicity
LLM vs. embedding-based source discovery: Chose simple LLM-only approach for lower complexity
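
To make the LLM-only source discovery concrete, here is a minimal sketch of what a script such as generate-sources.py might do. It is not the actual implementation: the function name, the prompt wording, the restriction to *.md files, and the choice to send only file paths (not file contents) to the model are all assumptions, and the LLM is again an injected callable.

```python
from pathlib import Path
from typing import Callable, List


def discover_sources(chapter_prompt: str, docs_root: str,
                     llm: Callable[[str], str]) -> List[str]:
    """Ask an LLM which documentation files are relevant to one chapter."""
    root = Path(docs_root)
    candidates = sorted(p.relative_to(root).as_posix() for p in root.rglob("*.md"))
    prompt = (
        "Chapter brief:\n" + chapter_prompt + "\n\n"
        "Documentation files:\n" + "\n".join(candidates) + "\n\n"
        "Reply with the relevant file paths, one per line."
    )
    reply = llm(prompt)

    # Keep only paths that really exist, so invented or misspelled
    # file names in the reply are dropped; also drop duplicates.
    known = set(candidates)
    selected: List[str] = []
    for line in reply.splitlines():
        path = line.strip().lstrip("-* ").strip()
        if path in known and path not in selected:
            selected.append(path)
    return selected
```

Filtering the reply against the real file list bounds the effect of non-deterministic output: a run can select a different subset, but never a file that does not exist.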

