Swiss AI Hub Whitepaper Generator
Automated LLM-based system for generating business-focused whitepaper chapters from technical documentation.
Directory Structure
whitepaper/
├── chapters/ # Self-contained chapter folders
│ ├── 00-executive-summary/
│ │ ├── prompt.md # Chapter writing instructions
│ │ ├── sources.md # Source doc mappings
│ │ └── output.md # Generated chapter
│ ├── 01-business-challenge/
│ │ └── ...
│ └── XX-chapter-name/
│ └── ...
├── config/ # Configuration files
│ ├── glossary.md # Terminology definitions
│ ├── general_prompt.md # General writing instructions
│ └── metadata.yaml # PDF metadata (title, author, date)
├── scripts/ # Generation scripts
│ ├── generate-sources.py # LLM-based source discovery
│ ├── generate-whitepaper.py # Chapter generation
│ └── llm_utils.py # Shared utilities
├── templates/ # Jinja2 prompt templates
│ └── full_prompt.j2
├── graphics/ # Images for the whitepaper
├── build/ # Build output
│ ├── whitepaper.tex # LaTeX template
│ └── whitepaper.pdf # Generated PDF
├── Makefile # Build automation
└── README.md
How It Works
The generator consists of two scripts:
- generate-sources.py - LLM-based source discovery (optional, run when docs change)
- generate-whitepaper.py - Chapter generation from sources
Source Discovery
The source discovery script scans all documentation files and uses an LLM to determine which docs are relevant for each chapter. This eliminates manual maintenance of sources.md files as documentation grows.
Chapter Generation
The generator builds a combined prompt containing:
- Terminology Glossary (config/glossary.md) - Consistent term definitions
- Previous Chapters (from earlier chapters/*/output.md) - For style consistency
- Source Documentation (from chapters/XX/sources.md) - Technical facts
- Chapter Instructions (chapters/XX/prompt.md) - What to write
- General Guidelines (config/general_prompt.md) - Writing style
This prompt is sent to an LLM, which generates a polished business chapter.
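As a rough illustration of the section order listed above, the following Python sketch joins already-loaded text in that order. It is not the actual implementation: generate-whitepaper.py renders templates/full_prompt.j2 with Jinja2, whereas this sketch uses plain string joining, and the function name build_prompt and the section titles are assumptions.

```python
def build_prompt(
    glossary: str,
    previous_chapters: list[str],
    source_docs: list[str],
    chapter_prompt: str,
    general_prompt: str,
) -> str:
    """Join the prompt sections in the documented order (hypothetical helper)."""
    separator = "\n\n---\n\n"
    sections = [
        ("TERMINOLOGY GLOSSARY", glossary),
        ("PREVIOUS CHAPTERS", separator.join(previous_chapters)),
        ("SOURCE DOCUMENTATION", separator.join(source_docs)),
        ("CHAPTER INSTRUCTIONS", chapter_prompt),
        ("GENERAL GUIDELINES", general_prompt),
    ]
    # Skip empty sections, e.g. the first chapter has no previous chapters.
    return "\n\n".join(
        f"## {title}\n\n{body.strip()}" for title, body in sections if body.strip()
    )
```

Because empty sections are skipped, the prompt for the first chapter simply contains no PREVIOUS CHAPTERS heading.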
Prerequisites
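# 1. Install LLM CLI tool and the plugins for Gemini and Claude (OpenAI models are built in)
pipx install llm
llm install llm-gemini llm-anthropic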
# 2. Install Python dependencies
pip install jinja2
# 3. Install mdformat for automatic formatting (optional but recommended)
pip install mdformat mdformat-gfm mdformat-frontmatter mdformat-myst mdformat-pyproject
# 4. Configure API key for your LLM provider
llm keys set gemini # For Gemini (default)
llm keys set anthropic # For Claude
llm keys set openai # For GPT-4
# 5. Install pandoc and LaTeX for PDF generation
# macOS:
brew install pandoc
brew install --cask mactex
# Ubuntu/Debian:
sudo apt install pandoc texlive-xetex texlive-fonts-recommended
Usage
Recommended Workflow (Makefile)
# Full pipeline: discover sources, generate chapters, build PDF
make all
# Or run individual steps:
make sources # Update source mappings
make chapters # Generate whitepaper chapters
make pdf # Build PDF from chapters
# Clean generated files
make clean
Manual Workflow
# 1. Update source mappings when documentation changes
python scripts/generate-sources.py
# 2. Generate whitepaper chapters
python scripts/generate-whitepaper.py
# 3. Build PDF
make pdf
Source Discovery (generate-sources.py)
Automatically discovers which documentation files are relevant for each chapter using LLM analysis. Run this when:
- Documentation structure changes (new files, renamed files, deleted files)
- Chapter prompts are updated with new topics
- You want to refresh source mappings
# Update sources for all chapters
python scripts/generate-sources.py
# Update specific chapters only
python scripts/generate-sources.py 03-data-sovereignty 05-administration-governance
# Preview without writing files
python scripts/generate-sources.py --dry-run
# List chapters and their source counts
python scripts/generate-sources.py --list
# Use a different model
python scripts/generate-sources.py --model claude-3-5-sonnet-20241022
The script:
- Scans all *.en.md files in docs/docs/
- Extracts title and summary from each file
- Sends chapter prompt + doc manifest to LLM
- LLM returns relevant file paths
- Writes validated paths to chapters/XX/sources.md
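The manifest-building and validation steps might look roughly like the Python sketch below, which leaves out the LLM call itself. It assumes that a page's title is its first "# " heading and its summary the first prose line after optional YAML front matter; the real extraction in generate-sources.py may differ, and all function names are hypothetical.

```python
from pathlib import Path


def strip_front_matter(text: str) -> str:
    """Drop a leading YAML front-matter block delimited by '---' lines, if present."""
    if text.startswith("---"):
        end = text.find("\n---", 3)
        if end != -1:
            return text[end + 4:]
    return text


def extract_title_and_summary(text: str) -> tuple[str, str]:
    """Use the first '# ' heading as title and the first prose line as summary."""
    title, summary = "", ""
    for line in strip_front_matter(text).splitlines():
        stripped = line.strip()
        if not title and stripped.startswith("# "):
            title = stripped[2:].strip()
        elif stripped and not stripped.startswith("#") and not summary:
            summary = stripped
        if title and summary:
            break
    return title, summary


def build_manifest(docs_dir: Path) -> dict[str, str]:
    """Map every *.en.md file (path relative to docs_dir) to 'title: summary'."""
    manifest: dict[str, str] = {}
    for path in sorted(docs_dir.rglob("*.en.md")):
        title, summary = extract_title_and_summary(path.read_text(encoding="utf-8"))
        key = path.relative_to(docs_dir).as_posix()
        manifest[key] = f"{title}: {summary}" if summary else title
    return manifest


def validate_paths(llm_paths: list[str], manifest: dict[str, str]) -> list[str]:
    """Keep only paths that exist in the manifest, dropping invented or duplicate ones."""
    valid: list[str] = []
    for raw in llm_paths:
        path = raw.strip()
        if path in manifest and path not in valid:
            valid.append(path)
    return valid
```

Checking the LLM's answer against the manifest keeps invented or misspelled paths out of sources.md.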
Note: Generated source files can be manually adjusted. The script adds a comment indicating they were auto-generated.
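Both scripts take chapters as positional arguments: generate-sources.py is called with full folder names (03-data-sovereignty), while generate-whitepaper.py also accepts numeric prefixes (00 03). How the scripts resolve these arguments is not documented here; the Python sketch below assumes a chapter matches when the argument equals its folder name or its numeric prefix, and select_chapters is a hypothetical name.

```python
def matches(chapter: str, requested: str) -> bool:
    """True if `requested` is the full folder name or its numeric prefix (assumed rule)."""
    return chapter == requested or chapter.split("-", 1)[0] == requested


def select_chapters(available: list[str], requested: list[str]) -> list[str]:
    """Return the requested chapters in chapter order; no arguments selects all."""
    ordered = sorted(available)
    if not requested:
        return ordered
    return [c for c in ordered if any(matches(c, r) for r in requested)]
```

Comparing the whole numeric prefix, rather than using a plain startswith check, keeps a short argument such as 0 from matching every chapter from 00 to 09.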
Chapter Generation (generate-whitepaper.py)
Generate All Chapters
python scripts/generate-whitepaper.py
Generate Specific Chapters
python scripts/generate-whitepaper.py 00 03
List Available Chapters
python scripts/generate-whitepaper.py --list
Use Different Model
python scripts/generate-whitepaper.py --model claude-3-5-sonnet-20241022
python scripts/generate-whitepaper.py --model gpt-4
Get Help
python scripts/generate-whitepaper.py --help
Key Features
1. Chapter-Centric Organization
Each chapter is self-contained in its own folder with all related files:
- prompt.md - Writing instructions for this chapter
- sources.md - Documentation files to use as source material
- output.md - Generated chapter content
This makes it easy to understand what each chapter is about and find all related files.
2. Terminology Glossary
The config/glossary.md file keeps terminology consistent across all chapters. Because the glossary is part of every prompt, the LLM is instructed to use the terms you define here consistently throughout the whitepaper.
3. Intelligent Regeneration
When regenerating an existing chapter, the script:
- Includes the current version in the prompt
- Instructs the LLM to improve it with new information
- Asks the LLM to preserve manual edits that don't conflict with the source docs
- Maintains consistent style
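One way to implement the first two points is to append the existing output.md, together with an improvement instruction, as an extra prompt section. The Python sketch below assumes that approach; the instruction wording, the CURRENT VERSION heading and the function name regeneration_section are hypothetical.

```python
from pathlib import Path

# Hypothetical instruction text; the wording used by generate-whitepaper.py may differ.
REGENERATION_NOTE = (
    "An existing version of this chapter follows. Improve it with any new information "
    "from the source documentation, keep manual edits that do not conflict with the "
    "sources, and maintain the current style."
)


def regeneration_section(chapter_dir: Path) -> str:
    """Return an extra prompt section when output.md already exists, else ''."""
    output = chapter_dir / "output.md"
    current = output.read_text(encoding="utf-8").strip() if output.is_file() else ""
    if not current:
        return ""  # First run (or empty file): generate from scratch.
    return f"## CURRENT VERSION\n\n{REGENERATION_NOTE}\n\n{current}"
```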
4. Chapter Consistency
The script automatically includes all previously generated chapters in the prompt to promote:
- Consistent writing style
- Coherent narrative flow
- Minimal repetition across chapters
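Selecting the previous chapters can rely on the zero-padded folder prefixes (00-, 01-, ...), because plain string sorting then matches chapter order. The Python sketch below assumes that only chapters sorting before the current one are included and that chapters without an output.md are skipped; previous_chapters is a hypothetical name.

```python
from pathlib import Path


def previous_chapters(chapters_dir: Path, current: str) -> list[str]:
    """Return the output.md text of every chapter that sorts before `current`."""
    texts: list[str] = []
    for chapter_dir in sorted(p for p in chapters_dir.iterdir() if p.is_dir()):
        if chapter_dir.name >= current:
            break  # Folders are sorted, so every remaining chapter comes later.
        output = chapter_dir / "output.md"
        if output.is_file():  # Skip chapters that have not been generated yet.
            texts.append(output.read_text(encoding="utf-8"))
    return texts
```

Under this assumption, generating sequentially (00, 01, 02...) means each chapter sees the output of every predecessor.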
5. Automatic Formatting
Generated markdown files are automatically formatted using mdformat with your project's pyproject.toml configuration (line wrapping at 120 characters, GFM extensions, etc.).
6. Cost Tracking
Both scripts track token usage and display an estimated cost summary at the end of each run:
💰 Usage Summary
────────────────────────────────────
LLM Calls: 17
Input tokens: 245,000
Output tokens: 42,000
Total tokens: 287,000
Est. cost: $0.5940 USD
Creating a New Chapter
1. Create Chapter Directory
mkdir chapters/XX-chapter-name
2. Create Chapter Prompt: chapters/XX-chapter-name/prompt.md
# Chapter X: Your Title
## Chapter Objective
Describe what this chapter accomplishes...
## Target Audience
- Decision makers evaluating...
- Administrators planning...
## Key Topics to Cover
- Topic 1
- Topic 2
- Topic 3
## Questions This Chapter Must Answer
- Question 1?
- Question 2?
## Writing Style
- Tone: Business-focused, accessible
- Length: 5-6 pages (2000-2400 words)
3. Discover Sources Automatically
# Let LLM find relevant documentation
python scripts/generate-sources.py XX-chapter-name
# Review the generated sources file
cat chapters/XX-chapter-name/sources.md
Alternatively, create chapters/XX-chapter-name/sources.md manually:
# Source Documentation for Chapter X
# Paths relative to docs/docs/
2_platform/8_knowledges/index.en.md
2_platform/5_agents/2_rag_agent/index.en.md
4. Generate the Chapter
python scripts/generate-whitepaper.py XX-chapter-name
5. Review and Iterate
- Review chapters/XX-chapter-name/output.md
- Refine prompt if needed
- Regenerate until satisfied
Glossary Management
The config/glossary.md file defines standard terminology. When adding terms:
- Verwendung: How to write it (e.g., "Agenten-Profil")
- Definition: What it means
- Kontext: Additional context or usage notes
The glossary is automatically included in every chapter generation prompt.
Model Selection
Different models have different strengths:
- gemini-3-pro-preview (default): Powerhouse, best quality
- gemini-2.5-flash: Fast, cost-effective, good quality
- claude-3-5-sonnet: Best for nuanced business writing
- gpt-4: Strong technical accuracy
Troubleshooting
"llm command not found"
pipx install llm
"API key not set"
llm keys set gemini # or anthropic, openai
Output is too technical
Update the chapter prompt to emphasize:
- "Write in business language accessible to non-technical decision makers"
- "Focus on WHAT and WHY, not HOW"
Output doesn't cover requirements
Add to the chapter prompt:
- "This chapter must address: [specific requirements]"
- "Provide concrete examples for each requirement"
Generation fails
- Verify source file paths are correct
- Check API rate limits
- Try a different model
- Check chapters/XX/sources.md for typos
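Typos in sources.md can be caught before any LLM call with a small check like the Python sketch below. It assumes the format of the sources.md example in this README (one path per line, relative to docs/docs/, with lines starting with # treated as comments); the function names are hypothetical.

```python
from pathlib import Path


def read_source_paths(sources_md: Path) -> list[str]:
    """Return the paths listed in sources.md, skipping blank lines and # comments."""
    paths: list[str] = []
    for line in sources_md.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            paths.append(line)
    return paths


def missing_sources(sources_md: Path, docs_dir: Path) -> list[str]:
    """List entries that do not resolve to a file under docs_dir (typos, renamed docs)."""
    return [p for p in read_source_paths(sources_md) if not (docs_dir / p).is_file()]
```

Running missing_sources for each chapters/*/sources.md lists every stale or misspelled entry at once.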
No line breaks in output
Make sure mdformat plugins are installed:
pip install mdformat-pyproject mdformat-gfm mdformat-frontmatter mdformat-myst
Best Practices
- Run source discovery first when documentation changes significantly
- Generate sequentially (00, 01, 02...) for best consistency
- Review and iterate on prompts before moving to next chapter
- Commit everything - prompts, sources, outputs, and glossary
- Update glossary first when introducing new terminology
- Test with different models to find the best fit for each chapter
- Review auto-generated sources - LLM discovery is good but not perfect; adjust manually if needed
- Use dry-run mode (--dry-run) to preview source discovery before committing
Advanced: Template Customization
Edit templates/full_prompt.j2 to customize the prompt structure:
## MY CUSTOM SECTION
{{ my_custom_variable }}
{% if condition %}
Conditional content here
{% endif %}
{% for item in items %}
- {{ item }}
{% endfor %}
Then update scripts/generate-whitepaper.py to pass the new variables to template.render().
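A minimal sketch of the matching render() call, assuming Jinja2 is installed (see Prerequisites). To stay self-contained it loads the template from a dictionary via DictLoader instead of reading templates/full_prompt.j2; the variable values and the render_prompt helper are hypothetical, and enabling StrictUndefined is a suggestion rather than necessarily what the script does.

```python
from jinja2 import DictLoader, Environment, StrictUndefined

# Inline stand-in for templates/full_prompt.j2 (same body as the example above);
# the real script would more likely use FileSystemLoader("templates").
TEMPLATES = {
    "full_prompt.j2": (
        "## MY CUSTOM SECTION\n"
        "{{ my_custom_variable }}\n"
        "{% if condition %}\n"
        "Conditional content here\n"
        "{% endif %}\n"
        "{% for item in items %}\n"
        "- {{ item }}\n"
        "{% endfor %}"
    )
}

# StrictUndefined makes a forgotten render() argument raise an error instead of
# rendering as an empty string; trim_blocks drops the newline after block tags.
env = Environment(loader=DictLoader(TEMPLATES), undefined=StrictUndefined, trim_blocks=True)


def render_prompt(**variables: object) -> str:
    """Render the template with the variables the generator script would pass."""
    return env.get_template("full_prompt.j2").render(**variables)


# Usage: every variable referenced in the template must be supplied.
prompt = render_prompt(my_custom_variable="Data sovereignty", condition=True, items=["Hosting", "Access control"])
```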
Support
- LLM CLI: https://github.com/simonw/llm
- Jinja2 Docs: https://jinja.palletsprojects.com/
- Project Repo: https://github.com/bbvch-ai/aihub-core
