Fintech startup · Series A · Payment processing platform
The client operated a payment processing platform serving several hundred businesses. The core payment API handled transaction initiation, provider communication (Stripe and ACH), status tracking, and webhook processing. The system was built as a single synchronous flow — every payment went through one request-response cycle with no retry logic, no event separation, and no audit trail.
Architecture diagram components:
Client App: React SPA
API Gateway: rate limiting + auth
Webhook Receiver: external events
Payment Service: event-driven, idempotent
PostgreSQL: transactions
Event Bus: async processing
Audit Log: compliance trail
Payment Provider: Stripe / ACH
Notification Service: email + Slack
Transactions failed silently under peak load because the synchronous pipeline could not handle concurrent requests above a certain threshold. There was no queue, no backpressure, and no retry mechanism.
The Stripe webhook handler was tightly coupled to the main transaction flow. When Stripe sent a delayed webhook, the system sometimes processed it out of order, causing balance mismatches that required manual reconciliation.
There was no audit trail. When a transaction failed or a client disputed a charge, the engineering team had to reconstruct what happened by reading application logs, which consisted of unstructured console.log output.
The compliance team had flagged the lack of traceability as a blocker for their next regulatory review.
We traced every payment from initiation to settlement, documenting the 14 distinct states a transaction could be in and the transitions between them. This revealed three race conditions that were causing the balance mismatches.
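One common way to make a documented state model enforceable is an explicit transition table that rejects anything not listed. The sketch below is illustrative only: the source does not list the 14 states, so the five states shown (including `failed`) and the helper names `canTransition` and `applyTransition` are assumptions, not the client's actual model.

```typescript
// Hypothetical subset of payment states; the real system documented 14.
type PaymentState =
  | "initiated"
  | "provider_submitted"
  | "confirmed"
  | "settled"
  | "failed";

// Every legal transition is listed explicitly; anything else is rejected.
const allowedTransitions: Record<PaymentState, PaymentState[]> = {
  initiated: ["provider_submitted", "failed"],
  provider_submitted: ["confirmed", "failed"],
  confirmed: ["settled"],
  settled: [],
  failed: [],
};

function canTransition(from: PaymentState, to: PaymentState): boolean {
  return allowedTransitions[from].indexOf(to) !== -1;
}

// Returns the new state, or throws if the move is not in the table.
function applyTransition(current: PaymentState, next: PaymentState): PaymentState {
  if (!canTransition(current, next)) {
    throw new Error("Illegal transition: " + current + " -> " + next);
  }
  return next;
}
```

With a guard like this, a late or duplicated update that tries to move a settled payment backwards fails loudly instead of silently changing a balance.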
Instead of one synchronous flow, we broke the process into events: payment.initiated, payment.provider_submitted, payment.confirmed, payment.settled. Each event is processed independently with its own handler.
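A rough sketch of how the four events might be typed and routed to independent handlers. The event names come from the text above; the `PaymentEvent` fields, the handler bodies, and the `dispatch` function are hypothetical and stand in for real work such as calling the provider or updating the database.

```typescript
// The four event types named in the case study.
type PaymentEventType =
  | "payment.initiated"
  | "payment.provider_submitted"
  | "payment.confirmed"
  | "payment.settled";

// Assumed event shape; field names are illustrative.
interface PaymentEvent {
  id: string;
  type: PaymentEventType;
  transactionId: string;
}

type Handler = (event: PaymentEvent) => string;

// One handler per event type. The Record type makes the compiler
// reject a handler map that forgets an event.
const handlers: Record<PaymentEventType, Handler> = {
  "payment.initiated": (e) => "recorded " + e.transactionId,
  "payment.provider_submitted": (e) => "submitted " + e.transactionId,
  "payment.confirmed": (e) => "confirmed " + e.transactionId,
  "payment.settled": (e) => "settled " + e.transactionId,
};

// Route an event to its handler; handlers do not call each other.
function dispatch(event: PaymentEvent): string {
  return handlers[event.type](event);
}
```

Keeping handlers separate means a failure in one step, for example settlement, can be retried without re-running the earlier steps.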
Every event handler is idempotent — processing the same event twice produces the same result. Failed events go into a retry queue with exponential backoff. After three failures, they move to a dead letter queue for manual review.
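The idempotency and retry rules can be sketched as follows. This is a simplified in-memory illustration, not the production code: the three-failure limit comes from the text, while the one-second base delay, the `EventProcessor` class, and the in-memory maps (a real system would presumably use a durable store and a real queue) are assumptions.

```typescript
interface QueuedEvent {
  id: string;
  type: string;
  attempts: number; // failed attempts so far
}

type Outcome = "done" | "duplicate" | "retry" | "dead_letter";

const MAX_FAILURES = 3; // from the text: three failures, then dead letter
const BASE_DELAY_MS = 1000; // assumed base delay

// Exponential backoff: 1s after the first failure, 2s after the second, ...
function backoffDelayMs(failures: number): number {
  return BASE_DELAY_MS * Math.pow(2, failures - 1);
}

class EventProcessor {
  // IDs of events already handled successfully (idempotency record).
  private processed: Record<string, boolean> = {};
  retryQueue: { event: QueuedEvent; delayMs: number }[] = [];
  deadLetters: QueuedEvent[] = [];

  constructor(private handle: (event: QueuedEvent) => void) {}

  process(event: QueuedEvent): Outcome {
    // A repeated delivery of an already-processed event is a no-op.
    if (this.processed[event.id] === true) return "duplicate";
    try {
      this.handle(event);
    } catch (err) {
      const failed: QueuedEvent = {
        id: event.id,
        type: event.type,
        attempts: event.attempts + 1,
      };
      if (failed.attempts >= MAX_FAILURES) {
        this.deadLetters.push(failed); // held for manual review
        return "dead_letter";
      }
      this.retryQueue.push({ event: failed, delayMs: backoffDelayMs(failed.attempts) });
      return "retry";
    }
    this.processed[event.id] = true;
    return "done";
  }
}
```

In this sketch a handler that keeps failing is retried after 1 s and 2 s and lands in `deadLetters` on its third failure, while a redelivered event that already succeeded never reaches the handler again.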
Every state transition, every external API call, and every webhook received is logged in a structured audit table with timestamps, actor IDs, and payload snapshots. The compliance team can now trace any transaction end-to-end without engineering help.
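To illustrate what a structured audit record might contain, here is a minimal in-memory sketch. The three entry kinds and the timestamp, actor ID, and payload snapshot fields follow the description above; the exact field names, the `AuditLog` class, and the in-memory array (standing in for the audit table) are assumptions.

```typescript
// The three kinds of activity the text says are logged.
type AuditKind = "state_transition" | "external_api_call" | "webhook_received";

interface AuditEntry {
  transactionId: string;
  kind: AuditKind;
  actorId: string;
  occurredAt: string; // ISO 8601 timestamp
  payloadSnapshot: string; // JSON serialized at write time
}

class AuditLog {
  private entries: AuditEntry[] = [];

  // Append an entry. The payload is serialized immediately, so later
  // mutation of the original object cannot change the record.
  record(
    transactionId: string,
    kind: AuditKind,
    actorId: string,
    payload: unknown,
    now: Date = new Date()
  ): AuditEntry {
    const entry: AuditEntry = {
      transactionId: transactionId,
      kind: kind,
      actorId: actorId,
      occurredAt: now.toISOString(),
      payloadSnapshot: JSON.stringify(payload),
    };
    this.entries.push(entry);
    return entry;
  }

  // All entries for one transaction, oldest first.
  trace(transactionId: string): AuditEntry[] {
    return this.entries
      .filter((e) => e.transactionId === transactionId)
      .sort((a, b) => (a.occurredAt < b.occurredAt ? -1 : a.occurredAt > b.occurredAt ? 1 : 0));
  }
}
```

Because `toISOString` always produces UTC timestamps in a fixed-width format, plain string comparison is enough to order the entries chronologically.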
Incoming webhooks are validated, logged, and placed into the event queue immediately. They are no longer processed inline with the transaction flow, eliminating the ordering problem entirely.
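The validate, log, enqueue sequence could look roughly like the sketch below. All names here (`IncomingWebhook`, `WebhookDeps`, `receiveWebhook`) are hypothetical, and the `verify` function is injected rather than implemented: in practice it would wrap the provider's own signature check, which is not shown. Logging only after validation and returning 400 for rejected payloads are also assumptions.

```typescript
// Assumed shape of a raw incoming webhook.
interface IncomingWebhook {
  provider: "stripe" | "ach";
  eventId: string;
  signature: string;
  body: string;
}

// Collaborators are injected so the receiver stays thin and testable.
interface WebhookDeps {
  verify: (webhook: IncomingWebhook) => boolean; // provider signature check (assumed)
  audit: (webhook: IncomingWebhook) => void; // write to the audit log
  enqueue: (webhook: IncomingWebhook) => void; // hand off to the event queue
}

// Returns an HTTP status code. No payment logic runs here: the
// receiver only validates, logs, and enqueues.
function receiveWebhook(webhook: IncomingWebhook, deps: WebhookDeps): number {
  if (!deps.verify(webhook)) {
    return 400; // rejected payloads never reach the queue
  }
  deps.audit(webhook);
  deps.enqueue(webhook);
  return 202; // accepted; processed later by the event handlers
}
```

Acknowledging quickly and deferring the work means a delayed webhook no longer runs in the middle of a transaction's request cycle; the queue consumer decides when and in what order it is applied.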
backend · data · integrations · infrastructure
Timeline: 10 weeks
Team: 1 senior engineer (AxionvexTech) + 2 internal engineers on the client side
Transaction failures under load
Before: Regular failures during peak hours
After: Zero transaction-level failures in the 3 months following launch

Balance reconciliation
Before: Weekly manual reconciliation taking 2–3 hours
After: Automated, with discrepancies flagged in real time

Incident investigation
Before: Grep through logs, reconstruct manually
After: Full audit trail searchable by transaction ID in seconds

Compliance readiness
Before: Flagged as a blocker by compliance team
After: Passed regulatory review without additional remediation
“The system has been in production for over six months. The client's CTO noted that the engineering team's confidence in the payment system changed noticeably — they went from avoiding payment-related tickets to actively picking them up.”