Skip to main content

Sharding an Existing Payment Gateway Transaction Table: A Complete Guide

Photo by Erol Ahmed on Unsplash


In payment gateway systems, the transaction table often grows rapidly, leading to performance bottlenecks. One effective way to scale a transactional database is sharding — splitting the database into smaller, more manageable pieces based on a shard key. In this guide we will walk through the process of sharding an existing transaction table in PostgreSQL, explaining key considerations, risks, and two popular sharding approaches: row-based and table-based.

Key Considerations and Risks

Before diving into the implementation of sharding, it’s important to understand the key considerations and risks associated with this approach:

Considerations:

  • Sharding Key: The selection of a shard key is crucial. A good shard key ensures even distribution of data, while also aligning with the most frequent query patterns. For payment transactions, transaction_date or customer_id can be used.
  • Database Complexity: Sharding increases the complexity of your database management. You’ll need to manage multiple shards, ensure data consistency across shards, and optimize cross-shard queries.
  • Data Migration: Migrating existing data to shards can be resource-intensive, especially when you have a large amount of data in the original transactions table.
  • Maintenance Overhead: Ongoing maintenance, such as schema updates, backup, and scaling the database, becomes more complex with sharded systems.

Risks:

  • Cross-Shard Queries: Queries that need to aggregate data across multiple shards can become slower and more complicated.
  • Data Imbalance: If the shard key does not evenly distribute the data, some shards may become overloaded, while others are underutilized.
  • Application Changes: Your application logic will need to be adapted to handle sharding, such as routing queries to the correct shard based on the shard key.

What Needs to Be Done First?

Before implementing sharding, follow these initial steps:

  1. Assess Current Data Volume: Evaluate the current size of the transactions table. If the table is too large, it’s a sign that sharding might be necessary.
  2. Choose a Shard Key: Choose a shard key that will ensure even distribution of data. For transaction-based data, a time-based shard key (e.g., transaction_date) or customer_id might be appropriate.
  3. Decide on Sharding Strategy: Decide whether to use row-based or table-based sharding. Each approach has its pros and cons, which we’ll dive into shortly.
  4. Prepare for Data Migration: Plan for migrating data from the existing transactions table to the newly created shards. This process may require downtime or a strategy for migrating data incrementally.

Row-Based vs Table-Based Sharding: Which Is Best?

Row-Based Sharding

In row-based sharding, data is distributed across different schemas based on the shard key. For example, you may create a schema for each year (e.g., shard_2022shard_2023), and each schema will have the same table structure, with the data filtered by the year.

Pros:

  • Logical Isolation: Each year or time period has its own schema, making it easy to manage and isolate data.
  • Easier Schema Evolution: Schema changes can be made to individual shards without affecting other shards.

Cons:

  • Limited Flexibility: The database grows based on the chosen shard key, so if the volume increases drastically, you may hit limits with the number of schemas.
  • Cross-Shard Queries: Queries that span multiple years (e.g., to retrieve data from different years) will require querying multiple schemas.

Table-Based Sharding

In table-based sharding, each shard is a separate table within the same schema (e.g., transactions_2022transactions_2023). The data is split into tables based on the shard key.

Pros:

  • Fine-Grained Control: You can split data across tables in the same schema or distribute the tables across different physical instances.
  • Scalability: Easier to distribute the shards across multiple database instances for horizontal scaling.

Cons:

  • Schema Clutter: The same schema will have many tables, potentially cluttering it.
  • Cross-Table Queries: Aggregating data across multiple tables can be cumbersome.

Sharding Example 1: Row-Based Sharding

In this example, we will shard the transactions table by the year extracted from transaction_date.

Step 1: Create Shard Schemas

-- Create schemas for each year (shard)
CREATE SCHEMA IF NOT EXISTS shard_2022;
CREATE SCHEMA IF NOT EXISTS shard_2023;

Step 2: Create Identical Tables in Each Shard

-- Create the same table in each schema
CREATE TABLE shard_2022.transactions (
transaction_id UUID PRIMARY KEY,
customer_id INT,
amount NUMERIC(10, 2),
transaction_date TIMESTAMPTZ,
status VARCHAR(50)
);

CREATE TABLE shard_2023.transactions (
transaction_id UUID PRIMARY KEY,
customer_id INT,
amount NUMERIC(10, 2),
transaction_date TIMESTAMPTZ,
status VARCHAR(50)
);

Step 3: Migrate Data to Shards

-- Insert data into the respective shards based on the year of TransactionDate
INSERT INTO shard_2022.transactions
SELECT * FROM transactions WHERE EXTRACT(YEAR FROM transaction_date) = 2022;

INSERT INTO shard_2023.transactions
SELECT * FROM transactions WHERE EXTRACT(YEAR FROM transaction_date) = 2023;

Step 4: Querying Data

-- Query transactions for 2022
SELECT * FROM shard_2022.transactions WHERE customer_id = 101;

-- Query transactions for 2023
SELECT * FROM shard_2023.transactions WHERE customer_id = 202;

Sharding Example 2: Table-Based Sharding

In this example, we will shard the transactions table by the year, but instead of using schemas, we will create separate tables for each year in the same schema.

Step 1: Create Shard Tables

-- Create separate tables for each year in the same schema
CREATE TABLE transactions_2022 (
transaction_id UUID PRIMARY KEY,
customer_id INT,
amount NUMERIC(10, 2),
transaction_date TIMESTAMPTZ,
status VARCHAR(50)
);

CREATE TABLE transactions_2023 (
transaction_id UUID PRIMARY KEY,
customer_id INT,
amount NUMERIC(10, 2),
transaction_date TIMESTAMPTZ,
status VARCHAR(50)
);

Step 2: Migrate Data to Shard Tables

-- Insert data into the respective shard tables based on the year
INSERT INTO transactions_2022
SELECT * FROM transactions WHERE EXTRACT(YEAR FROM transaction_date) = 2022;

INSERT INTO transactions_2023
SELECT * FROM transactions WHERE EXTRACT(YEAR FROM transaction_date) = 2023;

Step 3: Querying Data

-- Query transactions for 2022
SELECT * FROM transactions_2022 WHERE customer_id = 101;

-- Query transactions for 2023
SELECT * FROM transactions_2023 WHERE customer_id = 202;

Summary and Comparison: Row-Based vs. Table-Based Sharding

Row-Based Sharding (Schemas)

  • Logical Separation: Better isolation with separate schemas.
  • Cross-Shard Queries: Requires cross-schema queries.
  • Scalability: Limited by the number of schemas.
  • Operational Overhead: Higher complexity in schema management.

Table-Based Sharding (Tables)

  • Logical Separation: Same schema with multiple tables.
  • Cross-Shard Queries: Requires union or cross-table queries.
  • Scalability: Easier to distribute tables across instances.
  • Operational Overhead: Simpler table management, but more tables.

Conclusion

Sharding your transactions table is an effective way to scale a payment gateway system. Whether you choose row-based or table-based sharding depends on your specific use case. Row-based sharding offers better isolation, while table-based sharding allows for greater scalability. Both approaches have their pros and cons, and choosing the right strategy requires a deep understanding of your system's requirements and future growth potential.

By following the detailed steps and examples provided, you can successfully implement sharding in your PostgreSQL database, optimizing both scalability and performance for your payment gateway system.

Comments

Popular posts from this blog

Understanding Number Systems: Decimal, Binary, and Hexadecimal

In everyday life, we use numbers all the time, whether for counting, telling time, or handling money. The number system we’re most familiar with is the   decimal system , but computers use other systems, such as   binary   and   hexadecimal . Let’s break down these number systems to understand how they work. What is a Number System? A number system is a way of representing numbers using a set of symbols and rules. The most common number systems are: Decimal (Base 10) Binary (Base 2) Hexadecimal (Base 16) Each system has a different “base” that tells us how many unique digits (symbols) are used to represent numbers. Decimal Number System (Base 10) This is the system we use daily. It has  10 digits , ranging from  0 to 9 . Example: The number  529  in decimal means: 5 × 1⁰² + 2 × 1⁰¹ + 9 × 1⁰⁰ =  500 + 20 + 9 = 529 Each position represents a power of 10, starting from the rightmost digit. Why Base 10? Decimal is base 10 because it has 10 digits...

How to Monetize Your API as an Individual Developer While Hosting on Your Own Server?

In the API economy, cloud services like AWS, Google Cloud, and Azure offer many conveniences, such as scaling and infrastructure management. However, some developers prefer more control and autonomy, opting to host their APIs on personal servers. Whether for cost efficiency, data privacy, or customization, hosting your own API comes with both advantages and challenges. But, even without cloud platforms, there are effective ways to monetize your API. This guide will explore how individual developers can successfully monetize their APIs while hosting them on their own servers. Why Host Your API on Your Own Server? Hosting your own API gives you full control over the infrastructure and potentially lower long-term costs. Here’s why some developers choose this approach: Cost Control : Instead of paying ongoing cloud fees, you may opt for a one-time or lower-cost hosting solution that fits your budget and resource needs. Data Ownership : You have full control over data, which is critical if ...

API Testing with Jest and Supertest: A Step-by-Step Guide

API testing is essential to ensure your endpoints behave as expected across all scenarios. In this guide, we’ll explore how to use Jest and Supertest to test a sample API with various response types, including success, authentication errors, and validation errors. By the end, you’ll understand how to apply these tools to check for different response structures and status codes. 0. Prerequisites: Setting Up Your Environment Before diving into API testing, it’s important to ensure that your development environment is properly set up. Here’s what you need to do: Step 1: Install Node.js and npm Node.js  is a JavaScript runtime that allows you to run JavaScript code on the server side. It comes with  npm  (Node Package Manager), which helps you install and manage packages. Installation Steps: Download and install Node.js from the  official website . To verify the installation, open your terminal and run: node -v npm -v This should display the installed versions of Node.js...

The Weight of Responsibility: A Developer’s Journey to Balance Passion and Reality

For the past several years, Eddie has been on a steady climb in his career as a developer, but recently, he found himself at a crossroads — caught between the weight of his responsibilities and the desire to pursue his true passions. His journey began with a three-month internship as a web developer, which led to nearly four years in an application developer role. After that, he spent almost a year as a systems associate, managing tasks across systems analysis, quality assurance, and business analysis. Eventually, he returned to full-time software development for another two years before transitioning into more complex roles. For over a year, he worked as a multi-role software developer and database administrator before stepping into his current position as a senior software developer, database administrator, and cloud administrator — occasionally handling security tasks as well. Now, with over 8 years of professional experience, he also leads a small team of developers, which has been...

Avoiding Confusion in API Design: The Importance of Clear Responses

In today’s fast-paced software development landscape, APIs play a crucial role in connecting services and enabling functionality. However, poor design choices can lead to confusion and inefficiency for both developers and users. One such choice is the omission of a response body for successful requests, a practice I recently encountered in an enterprise API designed for bill payments. The Case of the No-Response API The API in question serves two main endpoints: one for inquiring about account validity and another for confirming payment. When successful, the API returned a  200 OK  status but no response body. This design choice led to significant confusion during our integration process. Even the internal team who developed the said API struggled to justify this approach, revealing a lack of clarity around the rationale behind it. Pros of This Design Choice While the intention behind this design may have been to streamline responses, several potential benefits can be identifi...