
Avoiding Race Conditions in Payment Gateway Transaction IDs: Lessons from Real-World Experience

In distributed systems, generating unique and reliable transaction reference numbers can be a challenging task, especially in payment gateway setups where race conditions may occur. During my time working on such a system, I ran into race conditions caused by database triggers. In this guide, we’ll dive into the pitfalls of generating transaction reference numbers, the approaches you can take to avoid these issues, and the lessons I learned along the way.

Understanding the Problem

Transaction reference numbers are essential in any payment gateway. These unique identifiers ensure every transaction is traceable and auditable, helping with both internal processing and external reporting. Initially, we implemented a trigger-based approach that generated structured reference numbers using a combination of prefixes, dates, and sequential numbers.

While this approach worked well at low transaction volumes, the story changed when we scaled out. As transaction volumes grew and multiple application instances accessed the database concurrently, race conditions began to appear: the triggers could not keep up with concurrent requests, producing “tuple concurrently updated” errors and failed transactions, a serious issue in any payment system.

Exploring Alternatives for Unique Transaction IDs

When addressing race conditions in a distributed system, two main approaches are worth considering:

  1. Using UUIDs for Transaction Reference Numbers
  2. Using Database-Generated Sequences with Structured Formatting

Let’s dive into the pros and cons of each approach.

Approach 1: UUIDs for Transaction Reference Numbers

UUIDs (Universally Unique Identifiers) offer a powerful solution for creating globally unique transaction identifiers without relying on a database sequence or a trigger.

Pros of UUIDs:

  • Guaranteed Uniqueness: UUIDs are, for all practical purposes, globally unique, making collisions across distributed systems vanishingly unlikely.
  • Concurrency-Friendly: UUIDs are generated independently of the database, making them ideal for high-concurrency environments.
  • Efficient Generation: Creating UUIDs is fast and doesn’t require a database round trip, making it a non-blocking process.

Cons of UUIDs:

  • Readability: UUIDs are lengthy and lack a logical order, making them harder to read and trace for support teams and end-users.
  • Index and Storage Costs: Due to their size, UUIDs consume more storage and may impact index performance slightly compared to shorter, structured IDs.
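
For illustration, a UUID can be generated either at the application layer or directly in the database. Here is a minimal PostgreSQL sketch (not our exact production code):

-- gen_random_uuid() is built into PostgreSQL 13 and later; older versions need the pgcrypto extension
SELECT gen_random_uuid() AS transaction_uuid;
-- e.g. 3f2b8c0e-9d41-4c7a-b5a2-1d0e6f8a9c3b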

Approach 2: Database-Generated Sequence with Structured Formatting

In this approach, reference numbers are generated using a database sequence combined with additional formatting, such as <prefix><year><month><padded-sequence>, like TX2410000000001.
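
A minimal sketch of this pattern in PostgreSQL (the sequence and alias names here are hypothetical):

CREATE SEQUENCE txn_ref_seq;

SELECT 'TX'
  || to_char(now(), 'YYMM')
  || lpad(nextval('txn_ref_seq')::text, 9, '0') AS reference_no;
-- produces identifiers like TX2410000000001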

Pros of Structured Sequences:

  • Readability and Context: Structured IDs are human-readable, providing useful context such as the transaction type and date.
  • Easier Debugging: These IDs are helpful for troubleshooting, allowing support teams to identify transactions quickly.
  • Sequential Order: Numeric sequences are naturally ordered, aiding in index performance and making transaction tracing straightforward.

Cons of Structured Sequences:

  • Database Dependency: Sequence-based IDs rely on the database for uniqueness, which can create bottlenecks, especially in high-volume, distributed environments.
  • Potential Race Conditions: Triggers and sequences are prone to race conditions when accessed by multiple instances, leading to “tuple concurrently updated” errors if not handled properly.
  • Setup Complexity: Setting up and maintaining triggers and sequences requires careful planning, especially as your system scales.

Adjusting the Triggers: The Short-term Fix

Given the complexity of overhauling our reference number generation, we implemented a targeted adjustment in the database triggers. Initially, our reference numbers were generated using a database sequence in the format <prefix><year><month><padded-sequence>, resulting in identifiers like TX2410000000001. To avoid the race conditions caused by sequences in a distributed system, we shifted to an epoch-based approach combined with an MD5 hash.

Now, the reference number format is <prefix><year><month><md5hash>, resulting in identifiers like TX2410060C4F2FA. The new reference number is constructed from two main components:

Epoch Component:

epoch text := extract(epoch FROM clock_timestamp())::bigint::text;

The epoch time in seconds (e.g., 1730700917) changes only once per second, not per millisecond or microsecond. This means that multiple calls within the same second will produce the same epoch value, which is why it is combined with other inputs below.

MD5 Hash:

upper(left(md5(random()::text || epoch || pg_backend_pid()), 9))

MD5 produces a 32-character hexadecimal digest (128 bits), which has a low collision rate for general-purpose hashing.

Using upper(left(md5(...), 9)) truncates this to 9 characters (36 bits), which shrinks the hash space considerably and increases the chance of collisions.
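
As a quick demonstration of the truncation, using the well-known MD5 test vector for 'abc':

SELECT upper(left(md5('abc'), 9));
-- returns '900150983' (the full digest is 900150983cd24fb0d6963f7d28e17f72)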

The backend process ID (pg_backend_pid()) is stable within a session but differs across sessions, adding extra variability to the hash input.
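
Putting the pieces together, the revised trigger logic looked roughly like this (the function, trigger, table, and column names are illustrative, not our exact schema):

CREATE OR REPLACE FUNCTION set_reference_no() RETURNS trigger AS $$
DECLARE
  -- clock_timestamp() keeps advancing inside a transaction, unlike now()
  epoch text := extract(epoch FROM clock_timestamp())::bigint::text;
BEGIN
  NEW.reference_no := 'TX'
      || to_char(now(), 'YYMM')
      -- salt the hash with randomness, the epoch, and the session's backend PID
      || upper(left(md5(random()::text || epoch || pg_backend_pid()), 9));
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_set_reference_no
  BEFORE INSERT ON transactions
  FOR EACH ROW EXECUTE FUNCTION set_reference_no();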

Analyzing Collision Risk with Additional Constraints

To better understand the robustness of this approach, let’s examine the collision probability of our new reference number structure.

Total Variability

  1. The prefix (TX) and date segment (YYMM) scope ID generation to month-by-month batches; IDs generated in different months differ in their date segment.
  2. The 9-character MD5 hash segment provides 16⁹ ≈ 68.7 billion unique possibilities per month.

Monthly Collision Rate

Each month, the pool of possible reference numbers (from the MD5 segment) stays the same: approximately 68.7 billion values from the 9-character hash.

  • For lower volumes (a few thousand IDs per month), the collision probability remains extremely low.
  • For high-volume use cases (hundreds of thousands to millions of IDs per month), the birthday effect makes collisions increasingly likely within the 36-bit hash space, so a unique constraint on the column, with a retry on conflict, is a sensible safeguard. The monthly reset of the date segment only prevents collisions across months, not within one.
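
To make this concrete, the standard birthday approximation estimates the probability of at least one collision among k random IDs drawn from N = 16⁹ ≈ 68.7 billion possibilities:

P(collision) ≈ 1 − e^(−k² / (2N))

  • k = 10,000 → ≈ 0.07%
  • k = 100,000 → ≈ 7%
  • k = 1,000,000 → ≈ 99.9%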

Summary of Collision Probability with Monthly Constraints

  1. Per Month: We have around 68.7 billion possible IDs. At tens of thousands of IDs per month the collision probability stays low; at millions per month, the approximation above shows collisions become likely, which is one more reason we treated this as a short-term fix.
  2. Across Months: Each month starts fresh with a new YYMM combination, so there is no cross-month collision risk; the date segment also separates years.

This revised approach achieves a high degree of collision resistance while keeping the implementation simple and efficient. By combining the epoch with an MD5 hash, reference numbers are effectively unique on a per-month basis at moderate volumes, without the overhead of sequence management, which made the fix serviceable in our distributed environment while we planned the longer-term migration.

Transitioning Away from Triggers: The Long-term Migration Process

Having identified the limitations of the trigger-based approach, we decided that, for the long term, we had to transition to UUIDs for transaction IDs in production. Here’s the step-by-step migration strategy we used to avoid disruptions:

  1. Introduce a New Transaction ID Column:
    We added a new column, TransactionUUID, to store the UUIDs without disrupting the existing reference number column. This allowed us to generate the new IDs independently while keeping the old ones for historical data (a SQL sketch of the schema changes follows this list).
  2. Update Application Code:
    Next, we modified our application logic to generate UUIDs at the application layer instead of relying on the database. This eliminated the need for a database trigger and ensured each instance could generate unique IDs independently.
  3. Run Tests in Staging:
    Before deploying to production, we extensively tested the new UUID-based approach in a staging environment. This involved running performance benchmarks to ensure that UUID generation met our latency requirements under load.
  4. Phase Out the Old Trigger Gradually:
    After verifying that the new UUIDs worked correctly, we disabled the trigger in production. By monitoring for any anomalies, we were able to confirm that the new system functioned as expected without causing interruptions.
  5. Backfill Existing Transactions (Optional):
    To maintain uniformity, it’s worth considering backfilling UUIDs for existing transactions. However, if historical consistency isn’t a priority, you may retain the old reference numbers and use UUIDs only for new transactions.
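
For reference, the schema side of steps 1, 4, and 5 can be sketched in PostgreSQL as follows (identifiers are hypothetical; gen_random_uuid() is built in from PostgreSQL 13, or available via pgcrypto before that):

-- Step 1: add the new column alongside the existing reference number
ALTER TABLE transactions ADD COLUMN transaction_uuid uuid;
CREATE UNIQUE INDEX CONCURRENTLY idx_transactions_uuid ON transactions (transaction_uuid);

-- Step 4: retire the old trigger once the application generates UUIDs itself
DROP TRIGGER trg_set_reference_no ON transactions;

-- Step 5 (optional): backfill historical rows
UPDATE transactions SET transaction_uuid = gen_random_uuid() WHERE transaction_uuid IS NULL;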

Lessons Learned and Best Practices

This experience taught us several valuable lessons about handling transaction IDs in distributed systems. Here are some best practices:

  • Avoid Database Triggers in Distributed Systems: Relying on triggers for ID generation can lead to race conditions as your application scales. Using UUIDs or another distributed approach is generally safer and more scalable.
  • Use UUIDs for Scalability: UUIDs are particularly well-suited to distributed systems, where the guarantee of uniqueness across instances is critical. While they aren’t as readable as structured sequences, they offer greater reliability in high-concurrency environments.
  • Consider a Hybrid Approach: If readability is essential, consider generating structured, user-friendly IDs for display purposes while using UUIDs internally for database records (see the sketch after this list).
  • Test Extensively in Staging: Before making changes to ID generation in production, test your approach in a staging environment with simulated load to catch potential issues early.
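
A minimal sketch of such a hybrid schema, with hypothetical table and column names:

CREATE TABLE payments (
  transaction_uuid uuid PRIMARY KEY DEFAULT gen_random_uuid(),  -- internal, collision-safe ID
  display_ref text NOT NULL UNIQUE,                             -- human-friendly ID shown to users
  amount numeric(12,2) NOT NULL,
  created_at timestamptz NOT NULL DEFAULT now()
);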

Conclusion

Choosing the right approach for generating transaction reference numbers is critical in payment systems. By transitioning from a trigger-based approach to UUIDs, we were able to eliminate race conditions and ensure reliable ID generation for our transactions. Whether you choose UUIDs or a database sequence, it’s essential to plan for scalability, minimize dependencies, and test thoroughly before making changes in production.

This journey taught us valuable lessons that can help others facing similar challenges. By sharing our experience, I hope to save others from the pitfalls of race conditions and help them design resilient and scalable payment systems. 
