AWS Lambda Excel Processing: Cut Costs 90% for Financial Data | BFSI Cloud Solutions

AWS Lambda Excel Processing: Cut Costs 90% for Financial Data | BFSI Cloud Solutions
First published on Dec 2, 2024 in Linkedin.
Imagine this: Your AWS Lambda function keeps crashing while handling big Excel files, running into memory constraints and timeout limits time and time again.
Ring a bell? We just went through the same scenario of performance lag and increased costs when handling large data in cloud environments.
But each barrier is an opportunity to learn.
This blog documents our whole journey of transformation – maturing from memory-intensive Excel processing challenges to quick and economical operations for large Excel file generation. We addressed GraphQL data fetching performance challenges and DynamoDB token management challenges at the same time, transforming problems into triumph.
The Challenge: Big Data Processing Issues
Our requirements were to have large Excel files via GraphQL API data fetching and S3 uploads. Early usage of Apache POI’s XSSF library resulted in
- Memory allocation mistakes
- Periodic function timeouts
- Slow processing times
The Solution Framework: Strategic Optimization Approach
We redesigned our strategy entirely with methodical enhancements.
This is what we did to tackle root causes:
1. Enhanced DynamoDB Token Management Strategy
Previous State:
Session token retrieval for DynamoDB operations was around 20 seconds per query, leading to bottlenecks and workflow interruption. Token expiration and in-process failure impacted system reliability significantly.
Existing Implementation:
Our system automatically manages credentials and refreshes tokens before they expire. This ensures uninterrupted access to resources by AppSync and DynamoDB services, improving reliability and scalability while preserving AWS security best practices. Our process now runs smoothly without disruption or threat.
Successful Outcomes:
- Processing duration for the data decreased from 1 hour 23 minutes to 6.41 minutes.
- Processing failures eliminated, overall reliability attained
- Monthly spending decreased from $18.26 to $1.65

IMPROVEMENTS

2. Excel-to-CSV Conversion: Adopting Streamlined Processing
Previous State:
Working with large data in Excel meant bringing entire workbooks into memory, resulting in memory overflow and system crashes. 100,000 rows would take 2-3 hours and 1.5GB of memory.
Current Implementation:
Data conversion into CSV format enables lightweight, incremental processing. Each processing run writes data out to temporary CSV files and appends to S3 master file.
Performance Improvements:
- Processing time reduced from hours to 8-10 minutes
- 988MB of memory saved up
- About $10 per month cost savings
Core Benefits:
- Low Memory Footprint: CSV format avoids expensive operations
- Optimized Appending: FileChannel offers bulk large file appending with no performance loss
- Continuous Processing: Data is processed sequentially, offering reliable operations

IMPROVEMENTS

3. Streaming Excel for Big Dataset Management
Earlier State:
Apache POI XSSF Workbook implementation caused Lambda memory limit exceptions due to XSSF’s behavior of loading whole workbooks into memory, which caused excessive dataset processing failures. The process was unstable and time-consuming.
Current Implementation:
We utilized SXSSF (Streaming Excel Spreadsheet Format) in Excel conversion from CSV, leveraging effective row caching and file streaming capabilities.
Technical Benefits
- Effective File Operations: Data goes into files instead of memory, avoiding memory overhead
- Row Caching System: Caches in memory only a sliding window of rows
- Large Dataset Support: Continuously supports big Excel files without impacting performance
Measurable Outcomes:
- Processing time reduced from 2-3 hours to 3-5 minutes
- Used Memory decreased by 988MB
- Utilized 10GB of Ephemeral Lambda Storage
- Attained near-perfect reliability
- Monthly charges dropped from $2.97 down to $0.0011

IMPROVEMENTS

4. GraphQL Operations Utility Singleton Implementation
Earlier State:
Having multiple S3 and GraphQL classes per operation was memory-intensive and introduced redundant usage patterns.
Existing Implementation:
We used a thread-safe Singleton pattern for GraphQL operations to reduce memory usage, eliminate redundancy, and aggregate accesses for uniform operations. The design is easy to maintain, supports thread-safe operations, and improves performance, making it ideal for synchronized data processing.
Performance Benefits:
- Monthly memory reserves of 256MB
- Reducing operation time by 50%, saving an average of 5 minutes
- Cost savings of $0.0021918

IMPROVEMENTS

The Outcomes:
- Improved Performance, Reliability, and Cost-Effectiveness.
- Through these strategic improvements, we transformed an inefficient system into a high-performance one.
- Key Insights Performance tuning isn’t merely a matter of cost reduction—it makes systems smart rather than forceful.
In the case of handling vast amounts of data or designing scalable cloud offerings, deliberate performance optimization reaps enormous rewards.
Having the same issues or interested in applying to your business? We’d love to discuss and share ideas!