Authors: Luke Boening, Github, ChatGPT and Claude Sonnet 4.5
•
Last update: December 14, 2025 03:04:31
•
2 min read
Problem Solution
Products and Problems¶
Problem 1: Selective copying of files and folders¶
Problem Statement¶
Environment:
- Production Server: Windows Server 2022 hosting millions of document files (TIFF images, PDFs, etc.)
- Test Server: Windows Server 2022 requiring only a small subset (~hundreds) of documents for application testing
- Document Management System: IIS-hosted application that retrieves documents by metadata (document type + keywords)
- Database: Oracle database with complex relational structure mapping document metadata to physical file paths
- Document Structure: One logical document may consist of multiple physical files (e.g., multi-page TIFFs stored as individual files)
The Challenge: How to efficiently identify and synchronize only the specific files needed for test documents, when:
- File paths are not directly known - they must be derived through complex SQL queries
- One "document" maps to multiple physical files
- The Oracle database schema requires joining multiple tables to resolve paths
- Test server needs only 0.01% of production files but must mirror exact folder structure
- Direct SMB/network access between servers may not be available
- Manual copying is error-prone and doesn't scale
Solution Architecture¶
Three-Phase Approach:
Phase 1: Path Discovery (Database-Driven)¶
-- Query Oracle to identify file paths for specific documents
SELECT file_path, document_id, page_number
FROM document_files df
JOIN document_metadata dm ON df.document_id = dm.id
WHERE dm.document_type = 'Site Closure'
AND dm.site_name = 'Test Site A'
AND dm.status = 'ACTIVE'
Phase 2: Selective Collection¶
- Export path list to JSON/CSV configuration file
- PowerShell script reads configuration and builds directory structure
- Compress only required files maintaining full folder hierarchy
- Upload archive to Azure Blob Storage (intermediary)
Phase 3: Targeted Deployment¶
- Test server downloads compressed archive from Azure
- Extract files preserving complete folder structure
- Verify file counts and paths match database expectations
Key Benefits:
- ✅ Only transfers needed files (99.99% reduction in storage)
- ✅ Maintains exact folder structure required by IIS application
- ✅ Database-driven: paths determined by actual application logic
- ✅ Works without direct server-to-server connectivity
- ✅ Repeatable: Re-run queries to refresh test data as needed
- ✅ Audit trail: Database records which documents/files were selected
Implementation Notes:
- Use Oracle SQL to generate dynamic file path lists based on test scenarios
- Leverage Azure Blob Storage as secure transfer point
- PowerShell scripts handle compression, upload, download, extraction
- Schedule periodic syncs to keep test environment current
Flowchart¶
flowchart TD
A[Oracle Database] -->|SQL Query| B[Generate File Path List]
B --> C[Export to JSON/CSV]
C --> D[PowerShell: Collect Files]
D --> E[Compress Archive]
E --> F[Upload to Azure Blob]
F --> G[Test Server Downloads]
G --> H[Extract with Structure]
H --> I[Verify File Counts]
I --> J[IIS Application Ready]
Class diagram¶
classDiagram
DOCUMENT_METADATA "1" --> "*" DOCUMENT_FILES : contains
DOCUMENT_METADATA "*" --> "1" DOCUMENT_TYPES : has_type
DOCUMENT_METADATA "*" --> "1" SITES : belongs_to
class DOCUMENT_METADATA{
+int document_id
+string document_type
+string site_name
+string status
+date created_date
}
class DOCUMENT_FILES{
+int file_id
+int document_id
+string file_path
+int page_number
}
class DOCUMENT_TYPES{
+int type_id
+string type_name
}
class SITES{
+int site_id
+string site_name
}