Find what matters in your codebase automatically
An AI-powered 5-stage workflow that discovers relevant files, filters them, and narrows your codebase down to the context needed for implementation planning. Go from thousands of files to focused context.
Multi-Stage Intelligence
5-stage AI workflow with regex filtering, relevance assessment, and path discovery to identify the most relevant files.
Cost-Effective Operation
Token-optimized workflow with intelligent batching. Cost tracking built into every stage.
Real-Time Progress
Live progress tracking with stage-by-stage updates. See exactly what the AI is discovering.
The 5-Stage Discovery Process
Root Folder Selection
AI analyzes your directory structure (up to 2 levels deep) to identify relevant project areas. For each area it decides whether to select the parent folder or specific subdirectories, so a selected parent never appears alongside its own children.
- Hierarchical directory analysis
- Smart parent/subdirectory selection
- Avoids redundant nested selections
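The "avoids redundant nested selections" rule can be illustrated with a minimal sketch. The function name `prune_nested` and the use of POSIX-style relative paths are assumptions made for illustration; the product's actual selection logic is not described here.

```python
from pathlib import PurePosixPath


def prune_nested(selected):
    """Drop any folder whose ancestor is already selected (hypothetical helper)."""
    # Visit shallow paths first so parents are kept before their children are seen.
    paths = sorted(
        {PurePosixPath(p) for p in selected},
        key=lambda p: (len(p.parts), str(p)),
    )
    kept = []
    for path in paths:
        # path.parents lists every ancestor of the path, nearest first.
        if not any(parent in path.parents for parent in kept):
            kept.append(path)
    return [str(p) for p in kept]


roots = prune_nested(["src", "src/api", "docs/guides", "src/api/v1"])
print(roots)
```

Here `src/api` and `src/api/v1` are dropped because `src` already covers them, while `docs/guides` survives on its own.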
Regex Pattern Generation & Filtering
Generates task-specific regex patterns and uses them for initial file filtering. File listing goes through git, so .gitignore rules are respected, and binary files are filtered out.
- Dynamic regex pattern creation
- Git ls-files integration
- Binary file detection and exclusion
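As a rough sketch of the regex filtering step, the snippet below applies include and exclude patterns to a list of paths. The function `filter_paths` and the example patterns are hypothetical; in the real workflow the patterns are generated by the AI for each task.

```python
import re


def filter_paths(paths, include_patterns, exclude_patterns=()):
    """Keep paths matching any include pattern and no exclude pattern."""
    include = [re.compile(p) for p in include_patterns]
    exclude = [re.compile(p) for p in exclude_patterns]
    return [
        path
        for path in paths
        if any(rx.search(path) for rx in include)
        and not any(rx.search(path) for rx in exclude)
    ]


candidates = [
    "src/auth/login.ts",
    "src/auth/login.test.ts",
    "docs/auth.md",
    "src/ui/button.tsx",
]
# Example patterns for a task about authentication code (illustrative only).
matches = filter_paths(
    candidates,
    include_patterns=[r"(^|/)auth/.*\.ts$"],
    exclude_patterns=[r"\.test\.ts$"],
)
print(matches)
```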
AI File Relevance Assessment
Deep content analysis uses an LLM to assess each file's relevance to your task. Files are grouped into batches using content-aware token estimation so that each request stays within its token budget.
- Content-based relevance scoring
- Intelligent token-aware batching
- 2000-token prompt overhead reservation
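Token-aware batching can be sketched as a greedy packer. The 2000-token overhead and the 100-file batch limit come from this page; the function name `batch_by_tokens`, the `max_tokens` parameter, and the greedy strategy itself are assumptions, not a description of the actual implementation.

```python
def batch_by_tokens(files, max_tokens, overhead=2000, max_files=100):
    """Greedily pack (path, estimated_tokens) pairs into batches.

    Each batch keeps its total estimate within max_tokens minus the
    reserved prompt overhead, and holds at most max_files files.
    """
    budget = max_tokens - overhead
    if budget <= 0:
        raise ValueError("max_tokens must exceed the prompt overhead")

    batches, current, used = [], [], 0
    for path, tokens in files:
        # Start a new batch when adding this file would break either limit.
        if current and (used + tokens > budget or len(current) >= max_files):
            batches.append(current)
            current, used = [], 0
        # A file larger than the budget still gets a batch of its own.
        current.append(path)
        used += tokens
    if current:
        batches.append(current)
    return batches


estimates = [("a.py", 3000), ("b.py", 2500), ("c.py", 1000), ("d.py", 7000)]
print(batch_by_tokens(estimates, max_tokens=8000))
```

With an 8000-token limit the usable budget is 6000 tokens, so `a.py` and `b.py` share a batch while `c.py` and the oversized `d.py` each land in their own.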
Extended Path Discovery
Discovers additional contextually relevant files through relationship analysis. Analyzes imports, configurations, and project structure to find related files.
- Import statement analysis
- Dependency graph traversal
- Configuration file discovery
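One way to picture dependency graph traversal is a breadth-first walk over an import map. This is a sketch under stated assumptions: the `imports` dictionary, the `expand_related` name, and the `max_depth` cutoff are all hypothetical, and building the import map from real source files is left out.

```python
from collections import deque


def expand_related(seeds, imports, max_depth=2):
    """Return files reachable from the seeds within max_depth import hops."""
    seen = set(seeds)
    queue = deque((seed, 0) for seed in seeds)
    discovered = []
    while queue:
        path, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not follow imports beyond the depth limit
        for dep in imports.get(path, ()):
            if dep not in seen:
                seen.add(dep)
                discovered.append(dep)
                queue.append((dep, depth + 1))
    return discovered


# Hypothetical import map: each file lists the files it imports.
import_map = {
    "app.py": ["auth.py", "db.py"],
    "auth.py": ["crypto.py"],
    "crypto.py": ["rand.py"],
}
print(expand_related(["app.py"], import_map, max_depth=2))
```

`rand.py` is three hops from `app.py`, so it falls outside the depth limit in this example.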
Path Validation & Correction
Validates file paths and corrects inconsistencies. Ensures all discovered files exist, are accessible, and have normalized paths for cross-platform compatibility.
- File existence validation
- Path normalization
- Symbolic link resolution
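The validation checks listed above might look roughly like this. The function `validate_paths` is hypothetical, as are two choices it makes: treating backslashes as separators, and rejecting paths that resolve outside the project root.

```python
import os


def validate_paths(root, candidates):
    """Split candidate paths into (valid, rejected).

    Valid paths exist as files under root and are returned relative to it
    with forward slashes. os.path.realpath resolves symbolic links.
    """
    root = os.path.realpath(root)
    valid, rejected = [], []
    for raw in candidates:
        # Assumption: backslashes are Windows-style separators, not filename characters.
        normalized = raw.replace("\\", "/")
        resolved = os.path.realpath(os.path.join(root, normalized))
        inside_root = os.path.commonpath([root, resolved]) == root
        if inside_root and os.path.isfile(resolved):
            relative = os.path.relpath(resolved, root).replace(os.sep, "/")
            if relative not in valid:  # collapse duplicates after normalization
                valid.append(relative)
        else:
            rejected.append(raw)
    return valid, rejected
```

Calling `validate_paths(project_root, ["src/a.py", "./src/../src/a.py"])` would return a single normalized entry for `src/a.py` if that file exists.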
Advanced Discovery Capabilities
Smart Token Management
Content-aware token estimation sizes each batch. The estimator uses a different characters-per-token ratio for each content type: 5 for JSON/XML, 3 for code, and 4 for plain text.
- Dynamic chunk sizing per file type
- 2000-token prompt overhead reservation
- Batch processing (100 files default)
- 30-second file caching TTL
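The characters-per-token ratios above translate into a simple estimator. The ratios come from this page; the mapping from file extension to content type, and the names `estimate_tokens`, `STRUCTURED_EXTS`, and `TEXT_EXTS`, are assumptions for illustration.

```python
import math
import os

# Ratios stated on this page: characters per token by content type.
CHARS_PER_TOKEN = {"structured": 5, "code": 3, "text": 4}

# Assumed extension mapping; anything not listed is treated as code.
STRUCTURED_EXTS = {".json", ".xml"}
TEXT_EXTS = {".md", ".txt", ".rst"}


def estimate_tokens(path, content):
    """Estimate the token count of a file from its length and content type."""
    ext = os.path.splitext(path)[1].lower()
    if ext in STRUCTURED_EXTS:
        kind = "structured"
    elif ext in TEXT_EXTS:
        kind = "text"
    else:
        kind = "code"
    return math.ceil(len(content) / CHARS_PER_TOKEN[kind])


sample = "x" * 100
print(estimate_tokens("data.json", sample))  # 100 / 5 = 20
print(estimate_tokens("README.md", sample))  # 100 / 4 = 25
print(estimate_tokens("main.rs", sample))    # ceil(100 / 3) = 34
```

The same 100 characters cost more tokens as code than as JSON, which is why batches of source files hold fewer characters than batches of structured data.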
Distributed Workflow Orchestration
The WorkflowOrchestrator manages the workflow lifecycle with lazy initialization, dependency scheduling, and orphaned job recovery. Each stage runs as an independent background job.
- Stage dependency management
- Event-driven progress updates via Tauri
- WorkflowIntermediateData persistence
- Exponential backoff retry logic
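Exponential backoff retry can be sketched in a few lines. The attempt count, base delay, and growth factor below are placeholder values, and `retry_with_backoff` is a hypothetical helper rather than part of the WorkflowOrchestrator.

```python
import time


def retry_with_backoff(job, max_attempts=4, base_delay=1.0, factor=2.0, sleep=time.sleep):
    """Run job(), retrying on failure with delays of base_delay * factor**attempt."""
    for attempt in range(max_attempts):
        try:
            return job()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * factor ** attempt)


# Demonstration with a job that fails twice before succeeding.
# Delays are recorded instead of slept so the example runs instantly.
attempts = {"count": 0}
delays = []


def flaky_stage():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("transient failure")
    return "ok"


result = retry_with_backoff(flaky_stage, sleep=delays.append)
print(result, delays)
```

The injectable `sleep` argument is a design choice for testability; production code would leave it at the default.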
Git Repository Integration
Executes `git ls-files --cached --others --exclude-standard` to list tracked and untracked files while respecting .gitignore rules. Falls back to the git2 library if the command fails.
- Git ls-files with .gitignore respect
- Binary file detection and filtering
- Extension-based exclusion (97 types)
- Content analysis for binary detection
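A minimal sketch of the listing and binary-filtering steps follows. The git command is the one quoted above; everything else is assumed for illustration: the function names, the short extension set (a stand-in for the 97 excluded types, which are not listed here), and the NUL-byte check as the content heuristic.

```python
import os
import subprocess

# Illustrative subset only; the real exclusion list is not given on this page.
BINARY_EXTS = {".png", ".jpg", ".zip", ".pdf", ".exe"}


def list_repo_files(repo_dir):
    """List tracked and untracked files, honoring .gitignore rules."""
    result = subprocess.run(
        ["git", "ls-files", "--cached", "--others", "--exclude-standard"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError so a caller can fall back
    )
    return [line for line in result.stdout.splitlines() if line]


def looks_binary(path, sample):
    """Flag a file as binary by extension or by a NUL byte in a content sample."""
    ext = os.path.splitext(path)[1].lower()
    return ext in BINARY_EXTS or b"\0" in sample


print(looks_binary("logo.png", b""))
print(looks_binary("main.py", b"print('hi')\n"))
print(looks_binary("blob.dat", b"\x00\x01\x02"))
```

In practice `sample` would be the first few kilobytes read from each file returned by `list_repo_files`.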
Implementation Plan Integration
Discovered files feed directly into the implementation planning system. Context is preserved and optimized for plan generation, ensuring comprehensive and accurate results.
- Seamless plan generation integration
- Context preservation across sessions
- Multi-model plan generation support
- Architectural synthesis preparation
Cost-Effective and Fast
Typical Cost
Measured per workflow run. Token optimization keeps costs low without sacrificing discovery quality.
Processing Time
Depends on repository size and complexity. Real-time progress tracking with stage-by-stage updates.
Accuracy Rate
Multi-stage refinement with AI-powered relevance assessment and relationship analysis.
Experience Intelligent File Discovery
Let AI navigate your codebase intelligently. From repository analysis to implementation-ready context, this is how file discovery should work: smart, efficient, and cost-effective.