This document explains a critical refactoring that unified transaction schemas across domains, fixed a movement_type extraction bug, and improved the overall architecture of our financial document parsing system.
The Problem We Faced
Initial Issue
Users were getting validation errors when uploading bank statements:
Through systematic debugging, we discovered the issue wasn't with AI extraction (OpenAI was correctly extracting movement_type), but with our data flow architecture.
Architecture Problems
1. Duplicate Transaction Schemas
We had two different transaction schemas serving the same purpose:
2. Data Loss During Conversion
The statements service was manually converting between transaction formats:
Result: Even though OpenAI extracted movement_type correctly, it was dropped during conversion.
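The failure mode can be sketched as follows. This is a minimal illustration, not the actual service code: `AITransaction`, `StatementTransaction`, `convert_manually`, and the sample values are hypothetical, and only the `movement_type` field name comes from the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AITransaction:
    """Hypothetical schema produced by the AI extraction layer."""
    description: str
    amount: float
    movement_type: Optional[str] = None


@dataclass
class StatementTransaction:
    """Hypothetical second schema used by the statements service."""
    description: str
    amount: float
    movement_type: Optional[str] = None


def convert_manually(tx: AITransaction) -> StatementTransaction:
    # Field-by-field copy: movement_type was never listed here, so it
    # silently falls back to its default (None).
    return StatementTransaction(description=tx.description, amount=tx.amount)


extracted = AITransaction("Coffee shop", 4.50, movement_type="debit")
converted = convert_manually(extracted)

print(extracted.movement_type)  # debit
print(converted.movement_type)  # None
```

Because the optional field has a default, the conversion raises no error at the point where the value is lost. The problem only surfaces later, when validation expects the field to be present.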
3. Wrong Domain Ownership
❌ BEFORE: AI Layer defines TransactionData
- AI concerns mixed with business logic
- Transaction schema owned by infrastructure layer
✅ Reduced complexity - fewer lines of code, fewer bugs
Long-term Improvements
✅ Easier maintenance - single place to update transaction schema
✅ Automatic compatibility - changes flow through automatically
✅ Better architecture - proper domain boundaries
✅ Future-proof - easier to add new transaction fields
Performance Gains
✅ No unnecessary object creation during conversion
✅ Direct object usage reduces memory allocations
✅ Simpler code paths improve readability and performance
Best Practices Established
Schema Design
Dependency Management
Data Transformation
Future Considerations
Schema Evolution
Add new fields only to transactions domain
Changes automatically propagate to all consumers
Version migration can be handled in one place
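As a rough sketch of what single-place evolution could look like, assume one canonical `Transaction` class owned by the transactions domain. The class layout, the `category` field, and the `from_payload` helper are all hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional


@dataclass
class Transaction:
    """Hypothetical canonical schema owned by the transactions domain."""
    description: str
    amount: float
    movement_type: Optional[str] = None
    # New field (hypothetical) added in one place; the default keeps
    # payloads that predate the field valid.
    category: Optional[str] = None


def from_payload(payload: Dict[str, Any]) -> Transaction:
    """Single entry point where payload-to-schema migration is handled."""
    known = {f.name for f in fields(Transaction)}
    # Ignore keys the schema does not know about instead of failing.
    return Transaction(**{k: v for k, v in payload.items() if k in known})


old = from_payload({"description": "Rent", "amount": 900.0, "movement_type": "debit"})
new = from_payload(
    {"description": "Rent", "amount": 900.0, "movement_type": "debit", "category": "housing"}
)

print(old.category)  # None
print(new.category)  # housing
```

Every consumer that imports `Transaction` sees the new field without any conversion code being touched.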
Testing Strategy
Test schema consistency across domains
Validate data flow from AI to database
Monitor for schema drift in CI/CD
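A schema consistency check along these lines could run in CI. The sketch below assumes schemas are plain dataclasses; `Transaction`, `StaleTransactionCopy`, and `missing_fields` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, fields
from typing import Optional, Set


@dataclass
class Transaction:
    """Hypothetical canonical schema owned by the transactions domain."""
    description: str
    amount: float
    movement_type: Optional[str] = None


@dataclass
class StaleTransactionCopy:
    """Hypothetical duplicate schema that has drifted out of date."""
    description: str
    amount: float


def missing_fields(canonical: type, other: type) -> Set[str]:
    """Return canonical field names that `other` does not define."""
    canonical_names = {f.name for f in fields(canonical)}
    other_names = {f.name for f in fields(other)}
    return canonical_names - other_names


def test_no_schema_drift() -> None:
    # After unification, consumers use the canonical class itself,
    # so nothing can be missing.
    assert missing_fields(Transaction, Transaction) == set()


drift = missing_fields(Transaction, StaleTransactionCopy)
print(sorted(drift))  # ['movement_type']
```

A non-empty result names exactly the fields that would be dropped if data flowed through the drifted schema.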
Monitoring
Log schema field counts to detect missing fields
Track AI extraction success rates for new fields
Alert on validation failures during parsing
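One possible shape for the field-count logging, using only the standard `logging` module. The logger name, the `Transaction` layout, and both helper functions are assumptions made for this sketch, not the system's actual monitoring code.

```python
import logging
from dataclasses import dataclass, fields
from typing import Dict, List, Optional

logger = logging.getLogger("statement_parsing")  # hypothetical logger name


@dataclass
class Transaction:
    """Hypothetical canonical schema owned by the transactions domain."""
    description: str
    amount: float
    movement_type: Optional[str] = None


def populated_field_counts(transactions: List[Transaction]) -> Dict[str, int]:
    """Count, per schema field, how many transactions carry a non-None value."""
    counts = {f.name: 0 for f in fields(Transaction)}
    for tx in transactions:
        for name in counts:
            if getattr(tx, name) is not None:
                counts[name] += 1
    return counts


def log_field_coverage(transactions: List[Transaction]) -> Dict[str, int]:
    counts = populated_field_counts(transactions)
    logger.info("parsed %d transactions, field counts: %s", len(transactions), counts)
    for name, count in counts.items():
        # A field that is empty across a whole batch suggests it was
        # dropped somewhere between extraction and storage.
        if transactions and count == 0:
            logger.warning("field %r missing from every parsed transaction", name)
    return counts


batch = [Transaction("Salary", 2500.0, "credit"), Transaction("Groceries", 61.2)]
counts = log_field_coverage(batch)
```

Comparing these counts over time gives an early signal when a field such as `movement_type` stops arriving, before users hit validation errors.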
Conclusion
This refactor demonstrates the importance of proper domain-driven design and schema management. By moving transaction schemas to their rightful domain and eliminating redundant conversions, we:
Fixed immediate bugs (movement_type extraction)
Improved system architecture (proper domain boundaries)
Reduced the future maintenance burden (single source of truth)
Enhanced debuggability (cleaner data flow)
The lesson: Architecture problems often manifest as data transformation bugs. When debugging, look beyond the immediate error to understand the underlying structural issues.