AWS Lambda Excel Processing: Cut Costs 90% for Financial Data | BFSI Cloud Solutions

First published on Dec 2, 2024 in Linkedin.

Imagine this: Your AWS Lambda function keeps crashing while handling big Excel files, running into memory constraints and timeout limits time and time again.

Ring a bell? We just went through the same scenario of performance lag and increased costs when handling large data in cloud environments.

But each barrier is an opportunity to learn.

This blog documents our whole journey of transformation – maturing from memory-intensive Excel processing challenges to quick and economical operations for large Excel file generation. We addressed GraphQL data fetching performance challenges and DynamoDB token management challenges at the same time, transforming problems into triumph.

The Challenge: Big Data Processing Issues

Our requirements were to have large Excel files via GraphQL API data fetching and S3 uploads. Early usage of Apache POI’s XSSF library resulted in

Memory allocation mistakes

Periodic function timeouts

Slow processing times

The Solution Framework: Strategic Optimization Approach

We redesigned our strategy entirely with methodical enhancements.

This is what we did to tackle root causes:

1. Enhanced DynamoDB Token Management Strategy

Previous State:

Session token retrieval for DynamoDB operations was around 20 seconds per query, leading to bottlenecks and workflow interruption. Token expiration and in-process failure impacted system reliability significantly.

Existing Implementation:

Our system automatically manages credentials and refreshes tokens before they expire. This ensures uninterrupted access to resources by AppSync and DynamoDB services, improving reliability and scalability while preserving AWS security best practices. Our process now runs smoothly without disruption or threat.

Successful Outcomes:

Processing duration for the data decreased from 1 hour 23 minutes to 6.41 minutes.

Processing failures eliminated, overall reliability attained

Monthly spending decreased from $18.26 to $1.65

IMPROVEMENTS

2. Excel-to-CSV Conversion: Adopting Streamlined Processing

Previous State:

Working with large data in Excel meant bringing entire workbooks into memory, resulting in memory overflow and system crashes. 100,000 rows would take 2-3 hours and 1.5GB of memory.

Current Implementation:

Data conversion into CSV format enables lightweight, incremental processing. Each processing run writes data out to temporary CSV files and appends to S3 master file.

Performance Improvements:

Processing time reduced from hours to 8-10 minutes

988MB of memory saved up

About $10 per month cost savings

Core Benefits:

Low Memory Footprint: CSV format avoids expensive operations

Optimized Appending: FileChannel offers bulk large file appending with no performance loss

Continuous Processing: Data is processed sequentially, offering reliable operations

IMPROVEMENTS

3. Streaming Excel for Big Dataset Management

Earlier State:

Apache POI XSSF Workbook implementation caused Lambda memory limit exceptions due to XSSF’s behavior of loading whole workbooks into memory, which caused excessive dataset processing failures. The process was unstable and time-consuming.

Current Implementation:

We utilized SXSSF (Streaming Excel Spreadsheet Format) in Excel conversion from CSV, leveraging effective row caching and file streaming capabilities.

Technical Benefits

Effective File Operations: Data goes into files instead of memory, avoiding memory overhead

Row Caching System: Caches in memory only a sliding window of rows

Large Dataset Support: Continuously supports big Excel files without impacting performance

Measurable Outcomes:

Processing time reduced from 2-3 hours to 3-5 minutes

Used Memory decreased by 988MB

Utilized 10GB of Ephemeral Lambda Storage

Attained near-perfect reliability

Monthly charges dropped from $2.97 down to $0.0011

IMPROVEMENTS

4. GraphQL Operations Utility Singleton Implementation

Earlier State:

Having multiple S3 and GraphQL classes per operation was memory-intensive and introduced redundant usage patterns.

Existing Implementation:

We used a thread-safe Singleton pattern for GraphQL operations to reduce memory usage, eliminate redundancy, and aggregate accesses for uniform operations. The design is easy to maintain, supports thread-safe operations, and improves performance, making it ideal for synchronized data processing.

Performance Benefits:

Monthly memory reserves of 256MB

Reducing operation time by 50%, saving an average of 5 minutes

Cost savings of $0.0021918

IMPROVEMENTS

Excel processing improvement in time, cost and memory — Improvement in time, cost and memory

The Outcomes:

Improved Performance, Reliability, and Cost-Effectiveness.

Through these strategic improvements, we transformed an inefficient system into a high-performance one.

Key Insights Performance tuning isn’t merely a matter of cost reduction—it makes systems smart rather than forceful.

In the case of handling vast amounts of data or designing scalable cloud offerings, deliberate performance optimization reaps enormous rewards.

Having the same issues or interested in applying to your business? We’d love to discuss and share ideas!

AWS Lambda Excel Processing: Cut Costs 90% for Financial Data | BFSI Cloud Solutions

The Challenge: Big Data Processing Issues

The Solution Framework: Strategic Optimization Approach

1. Enhanced DynamoDB Token Management Strategy

Previous State:

Existing Implementation:

Successful Outcomes:

IMPROVEMENTS

2. Excel-to-CSV Conversion: Adopting Streamlined Processing

Previous State:

Current Implementation:

Performance Improvements:

Core Benefits:

IMPROVEMENTS

3. Streaming Excel for Big Dataset Management

Earlier State:

Current Implementation:

Technical Benefits

Measurable Outcomes:

IMPROVEMENTS

4. GraphQL Operations Utility Singleton Implementation

Earlier State:

Existing Implementation:

Performance Benefits:

IMPROVEMENTS

The Outcomes:

Recent Posts

Webapp to WhatsApp Chatbot for Senior Management

From Idea to MVP: Building Your GenAI Agent

Nexttrophy Backend: Designing a Scalable Microservices Architecture in 2017

All Categories

Related Blogs