Overview
DocSmith is a full-stack document generation and workspace collaboration platform built to automate the creation of complex Word and Excel documents. The platform uses dynamic TypeScript templates, AI-powered context from AnythingLLM, and asynchronous job tracking. It includes customer management, secure file handling, and both desktop (Electron) and web interfaces.
Project at a Glance
- Project Name: DocSmith
- Type: Full-Stack Document Automation Platform
- Stack: Express + TypeScript Backend, React Frontend, Electron Desktop
- Timeline: 8-10 weeks initial development
- Deployment: Internal Enterprise Use (Desktop & Web)
The Challenge
The organization needed to automate complex document creation while maintaining flexibility and quality:
- Manual Process: Creating custom Word and Excel documents by hand was slow and labor-intensive
- Template Management: Needed an organized system for document templates and customer files
- AI Context: Required AI assistance for dynamic content generation
- Job Tracking: Needed visibility into document generation progress and errors
- Multi-Platform: Required both desktop and web access
- Customer Isolation: Customer data and files had to be strictly separated
The DocSmith Solution
I built DocSmith as a comprehensive document automation platform with three main components: an Express + TypeScript backend, a React frontend with shadcn/ui, and an Electron desktop app. The system integrates with AnythingLLM for AI-powered workspace collaboration and document context.
Core Features
- Customer Management: Full CRUD operations for customer records with secure file uploads and AI prompt history
- Template System: TypeScript-based document generators producing Word and Excel files from dynamic templates
- AI Integration: AnythingLLM workspace integration with streaming chat, document embedding, and context-aware generation
- Asynchronous Job Tracking: Persistent job status monitoring with detailed logs in both database and file system
- Customer-Centric File System: Organized folder structure with template library and isolated customer data
- Desktop & Web Support: Electron app for local file access alongside React SPA for web access
- Document Generation Pipeline: Pandoc-based document assembly with html-to-docx fallback
- Settings Management: Configurable API endpoints, first-run wizard, and development tools
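Here is a minimal sketch of a TypeScript generator that emits Word XML (WML), to show what a code-based template can look like. The names `GeneratorInput`, `TemplateGenerator`, `paragraph`, `proposalGenerator`, and `wrapDocument` are hypothetical and not DocSmith's actual API. Only the WML elements (`w:p`, `w:r`, `w:rPr`, `w:b`, `w:t`) and the WordprocessingML namespace URI are standard.

```typescript
// Hypothetical input for a template generator: customer data plus AI-derived context.
interface GeneratorInput {
  customerName: string;
  aiSummary: string; // e.g. text returned from an AnythingLLM workspace query
  lineItems: string[];
}

// A generator is just a function from input data to WML body markup.
type TemplateGenerator = (input: GeneratorInput) => string;

// Escape XML special characters so dynamic (possibly AI-written) text cannot break the markup.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// One WML paragraph (<w:p>) with a single run (<w:r>) of text (<w:t>).
// Run properties (<w:rPr>) must come before the text; <w:b/> makes it bold.
// xml:space="preserve" keeps leading and trailing spaces intact.
function paragraph(text: string, bold = false): string {
  const runProps = bold ? "<w:rPr><w:b/></w:rPr>" : "";
  return `<w:p><w:r>${runProps}<w:t xml:space="preserve">${escapeXml(text)}</w:t></w:r></w:p>`;
}

// Example generator: a bold heading, the AI summary, then one paragraph per line item.
const proposalGenerator: TemplateGenerator = (input) =>
  [
    paragraph(`Proposal for ${input.customerName}`, true),
    paragraph(input.aiSummary),
    ...input.lineItems.map((item) => paragraph(`- ${item}`)),
  ].join("");

// Wrap body markup in a minimal w:document element with the WordprocessingML namespace.
function wrapDocument(body: string): string {
  return (
    '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">' +
    `<w:body>${body}</w:body></w:document>`
  );
}
```

Because the generator is ordinary code, it can be unit-tested and diffed in version control like any other module. Escaping every dynamic value is the key design choice. Without it, AI-generated text containing `&` or `<` would corrupt the document.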
Document Generation Pipeline
- Template Discovery: System finds TypeScript generator in `data/templates/{TemplateName}/` directory
- AI Context Building: Queries AnythingLLM workspace to gather relevant context and customer information
- Dynamic Generation: TypeScript generator produces Word XML (WML) markup with dynamic content
- Document Assembly: Pandoc converts markup to DOCX format (with html-to-docx as fallback)
- Job Tracking: Status and logs persisted in `.jobs/jobs.json` and `gen_cards` database table
- File Storage: Completed documents saved to customer-specific folders with secure access
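The file-side half of the job tracking step could look roughly like the sketch below, which persists job records to `.jobs/jobs.json`. The `JobRecord` fields and the `FileJobStore` class are assumptions for illustration. The matching write to the `gen_cards` table is left out so the example runs on Node's standard library alone.

```typescript
import { promises as fs } from "node:fs";
import * as path from "node:path";

// Hypothetical job shape; DocSmith's actual stored fields are not documented here.
type JobStatus = "queued" | "running" | "done" | "error";

interface JobRecord {
  id: string;
  template: string;
  customerId: string;
  status: JobStatus;
  logs: string[];
  updatedAt: string;
}

// File-backed half of a dual job tracker. A database mirror (e.g. the gen_cards
// table) would be updated alongside each write; it is omitted from this sketch.
class FileJobStore {
  private readonly file: string;

  constructor(rootDir: string) {
    this.file = path.join(rootDir, ".jobs", "jobs.json");
  }

  async readAll(): Promise<Record<string, JobRecord>> {
    try {
      return JSON.parse(await fs.readFile(this.file, "utf8"));
    } catch (err) {
      // A missing file just means no jobs have been recorded yet.
      if ((err as NodeJS.ErrnoException).code === "ENOENT") return {};
      throw err;
    }
  }

  // Write to a temp file, then rename it over the target, so a reader never
  // sees a half-written JSON document.
  private async writeAll(jobs: Record<string, JobRecord>): Promise<void> {
    await fs.mkdir(path.dirname(this.file), { recursive: true });
    const tmp = `${this.file}.${process.pid}.tmp`;
    await fs.writeFile(tmp, JSON.stringify(jobs, null, 2), "utf8");
    await fs.rename(tmp, this.file);
  }

  // Merge a status/metadata patch into a job and optionally append one log line.
  async update(
    id: string,
    patch: Partial<Omit<JobRecord, "id" | "logs" | "updatedAt">>,
    logLine?: string,
  ): Promise<JobRecord> {
    const jobs = await this.readAll();
    const existing: JobRecord = jobs[id] ?? {
      id, template: "", customerId: "", status: "queued", logs: [], updatedAt: "",
    };
    const next: JobRecord = {
      ...existing,
      ...patch,
      id,
      logs: logLine ? [...existing.logs, logLine] : existing.logs,
      updatedAt: new Date().toISOString(),
    };
    jobs[id] = next;
    await this.writeAll(jobs);
    return next;
  }
}
```

The temp-file-then-rename step keeps the JSON file readable even if the process dies mid-write. Each update does a read-modify-write, so concurrent writers would need a lock or a single writer queue.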
The Results
DocSmith changed document generation workflows in several concrete ways:
- Automated Workflows: Complete end-to-end document generation from templates to final output
- Multi-Platform Access: Desktop app and web interface for flexible deployment
- Job Visibility: Real-time tracking of all document generation jobs with persistent status
- Template Reusability: TypeScript-based generators enabling maintainable, version-controlled templates
- AI-Enhanced Context: AnythingLLM integration providing intelligent document context and chat assistance
- Customer Isolation: Secure, organized file structure preventing data cross-contamination
Key Innovation: Full-Stack Document Pipeline
DocSmith's most powerful feature is its comprehensive document generation pipeline combining multiple technologies:
- TypeScript generators produce Word XML (WML) for precise document structure control
- Pandoc conversion creates final DOCX files with html-to-docx fallback
- AnythingLLM integration provides AI-powered context building and workspace collaboration
- Persistent job tracking in both the SQLite database (`gen_cards` table) and the file system (`.jobs/jobs.json`)
- Customer-centric file organization with separate folders and template library
- Desktop and web deployment via Electron and React SPA
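The Pandoc-first, html-to-docx-second conversion can be written as a small fallback wrapper. In this sketch each engine is an injected `Converter` function (a hypothetical type), so the ordering logic can be tested without Pandoc installed. `pandocConverter` uses Pandoc's standard `-f`, `-t`, and `-o` options, but treating the input as HTML is an assumption. A fallback based on html-to-docx would be wrapped as another `Converter`.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// A converter reads inputPath and writes a DOCX file to outputPath.
type Converter = (inputPath: string, outputPath: string) => Promise<void>;

// Primary engine: shell out to the pandoc CLI (assumed to be on PATH).
// execFile passes arguments without a shell, so paths are not shell-interpreted.
function pandocConverter(from = "html"): Converter {
  return async (inputPath, outputPath) => {
    await execFileAsync("pandoc", [inputPath, "-f", from, "-t", "docx", "-o", outputPath]);
  };
}

interface ConversionResult {
  engine: "primary" | "fallback";
  primaryError?: string;
}

// Try the primary engine first. If it throws (pandoc missing, bad input, ...),
// record why and retry with the fallback. If the fallback also fails, its error
// propagates so the caller can mark the job as failed.
async function convertWithFallback(
  primary: Converter,
  fallback: Converter,
  inputPath: string,
  outputPath: string,
): Promise<ConversionResult> {
  try {
    await primary(inputPath, outputPath);
    return { engine: "primary" };
  } catch (err) {
    const primaryError = err instanceof Error ? err.message : String(err);
    await fallback(inputPath, outputPath);
    return { engine: "fallback", primaryError };
  }
}
```

Returning `primaryError` lets the caller write the reason for the fallback into the job log. That keeps silent degradations visible.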
Security & Compliance
DocSmith implements multiple layers of security controls to protect customer data and prevent vulnerabilities:
Security Features
- SQL Injection Prevention: All database queries use parameterized statements (`db.prepare`)
- Path Traversal Protection: Sanitized customer IDs, canonical directory validation, restricted file access
- Electron IPC Safety: Whitelisted channels, contextBridge isolation, input validation on all IPC calls
- Customer Data Isolation: Separate folders per customer, no cross-customer file access, secure upload handling
- Log Management: Automatic log rotation, no sensitive data in logs, temp file cleanup, winston logging
- API Security: Configurable API keys for AnythingLLM integration, environment-based configuration
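Path traversal protection of this kind usually combines two checks. First, reject customer IDs that could act as path segments. Second, resolve the final path and confirm it still sits under that customer's folder. The sketch below shows both checks. The allowed ID pattern and the function names are assumptions, not DocSmith's actual rules.

```typescript
import * as path from "node:path";

// Hypothetical rule: customer IDs may contain only letters, digits, "-" and "_".
// Anything else (slashes, "..", NUL bytes) is rejected rather than "cleaned".
const CUSTOMER_ID = /^[A-Za-z0-9_-]{1,64}$/;

function sanitizeCustomerId(id: string): string {
  if (!CUSTOMER_ID.test(id)) throw new Error(`Invalid customer id: ${JSON.stringify(id)}`);
  return id;
}

// Resolve a user-supplied relative path inside one customer's folder and verify
// that the normalized result has not escaped that folder.
function resolveCustomerPath(customersRoot: string, customerId: string, relPath: string): string {
  const base = path.resolve(customersRoot, sanitizeCustomerId(customerId));
  const target = path.resolve(base, relPath);
  const rel = path.relative(base, target);
  // rel begins with ".." (or is absolute, e.g. another drive on Windows) when target is outside base.
  if (rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel)) {
    throw new Error(`Path escapes customer folder: ${relPath}`);
  }
  return target;
}
```

`path.resolve` normalizes `..` segments but does not follow symbolic links. If customer folders can contain symlinks, comparing `fs.realpathSync` results instead gives a stricter canonical check.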
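A common pattern for Electron IPC controls is a preload-side guard that forwards only whitelisted channels with validated arguments. This sketch keeps the guard free of Electron imports so it runs standalone. The channel names and validators are hypothetical. The `contextBridge.exposeInMainWorld` / `ipcRenderer.invoke` wiring appears only in a comment.

```typescript
// A validator checks the argument list for one channel.
type Validator = (args: unknown[]) => boolean;

// Hypothetical whitelist; DocSmith's real channel names are not published here.
const ALLOWED_CHANNELS: Record<string, Validator> = {
  "customers:list": (args) => args.length === 0,
  "files:open": (args) => {
    const [p] = args;
    return args.length === 1 && typeof p === "string" && p.length < 1024;
  },
  "jobs:status": (args) => {
    const [id] = args;
    return args.length === 1 && typeof id === "string" && /^[\w-]+$/.test(id);
  },
};

type Invoke = (channel: string, ...args: unknown[]) => Promise<unknown>;

// Wrap a raw invoke function so only whitelisted channels with valid args get through.
function makeSafeInvoke(invoke: Invoke): Invoke {
  return (channel, ...args) => {
    // hasOwnProperty avoids matching inherited keys such as "__proto__" or "toString".
    const validate = Object.prototype.hasOwnProperty.call(ALLOWED_CHANNELS, channel)
      ? ALLOWED_CHANNELS[channel]
      : undefined;
    if (!validate) return Promise.reject(new Error(`Blocked IPC channel: ${channel}`));
    if (!validate(args)) return Promise.reject(new Error(`Invalid arguments for ${channel}`));
    return invoke(channel, ...args);
  };
}

// In preload.ts (requires Electron):
//   import { contextBridge, ipcRenderer } from "electron";
//   const safeInvoke = makeSafeInvoke((ch, ...a) => ipcRenderer.invoke(ch, ...a));
//   contextBridge.exposeInMainWorld("docsmith", { invoke: safeInvoke });
```

Renderer-side checks are defense in depth. Handlers registered with `ipcMain.handle` in the main process should still validate their own inputs, because the main process is the real trust boundary.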
Lessons Learned & Best Practices
Building DocSmith as a full-stack document automation platform provided valuable technical insights:
- Raw SQL Over ORMs: Direct SQLite queries with prepared statements provided better control and security than ORM abstractions
- TypeScript for Templates: Code-based document generators enabled version control, testing, and maintainability vs. static templates
- Dual Tracking Systems: Combining database (`gen_cards` table) and file system (`.jobs/jobs.json`) job tracking provided resilience and debugging capability
- Electron Security is Complex: Proper IPC channel whitelisting, contextBridge usage, and preload scripts are critical for desktop app security
- AI Integration Requires Flexibility: AnythingLLM workspace and chat integration needed careful error handling, streaming support (SSE), and fallback mechanisms
- Customer-Centric File Structure: Organizing data by customer folders simplified access control, backups, and data isolation
- Pandoc is Powerful: Leveraging Pandoc for document conversion reduced custom code while maintaining flexibility with fallback options
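The SSE streaming mentioned in the lessons above needs careful handling because network chunks can split an event mid-line, so the client must buffer. Below is a minimal incremental SSE parser. The `ChatChunk` shape with a `textResponse` field is an assumption about AnythingLLM's streaming payloads. Check the field names against the API documentation for your AnythingLLM version.

```typescript
// Incremental Server-Sent Events parser. Incomplete lines are buffered until the
// next chunk; a blank line ends an event, and each "data:" line adds a payload line.
class SseParser {
  private buffer = "";
  private dataLines: string[] = [];

  // Feed a decoded text chunk; returns the data payloads of any completed events.
  push(chunk: string): string[] {
    this.buffer += chunk;
    const events: string[] = [];
    let newline: number;
    while ((newline = this.buffer.indexOf("\n")) !== -1) {
      // Strip a trailing "\r" so CRLF streams behave like LF streams.
      const line = this.buffer.slice(0, newline).replace(/\r$/, "");
      this.buffer = this.buffer.slice(newline + 1);
      if (line === "") {
        if (this.dataLines.length > 0) events.push(this.dataLines.join("\n"));
        this.dataLines = [];
      } else if (line.startsWith("data:")) {
        // A single space after the colon is not part of the value.
        this.dataLines.push(line.slice(5).replace(/^ /, ""));
      }
      // Comment lines (":...") and event:/id:/retry: fields are ignored in this sketch.
    }
    return events;
  }
}

// Assumed payload shape; verify against your AnythingLLM version.
interface ChatChunk {
  textResponse?: string;
  close?: boolean;
}

// Concatenate the text of a batch of JSON payloads.
function collectText(payloads: string[]): string {
  return payloads
    .map((p) => JSON.parse(p) as ChatChunk)
    .map((c) => c.textResponse ?? "")
    .join("");
}
```

The parser handles LF and CRLF line endings but not bare CR. It also ignores `event:`, `id:`, and `retry:` fields. That is enough for a chat stream but falls short of a full EventSource implementation.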
Use Cases Beyond This Project
DocSmith's architecture can be adapted for various document automation scenarios:
- Professional Services: Client-specific report generation with template libraries and AI context assistance
- Document Workflows: Multi-step document creation with job tracking, approval workflows, and audit trails
- Template Management: Centralized template libraries with TypeScript-based generators for complex document structures
- Data-Driven Documents: Database-to-document pipelines converting structured data to formatted Word/Excel files
- AI-Enhanced Generation: Intelligent document creation using workspace collaboration and embedded context
- Cross-Platform Solutions: Desktop and web access to document generation systems for flexible deployment
- Customer Management: Applications requiring customer-centric data organization with file uploads and secure isolation
Ready to Automate Your Document Workflows?
Let's discuss how AI can transform your team's productivity. Whether it's document generation, data processing, or workflow automation, we can build an AI solution tailored to your needs.