Published on September 21, 2014

Scaling Existing Applications: Key Observations and Measurements

Whether an application runs in a public, private, or hybrid cloud, scaling it up by moving to bigger hardware is not always the most effective approach. It can be expensive, and it may not fix scalability issues such as inconsistent performance, limited availability, or transaction throughput bottlenecks. In such cases, a distributed MySQL database can be a viable solution.

A distributed MySQL database retains its relational principles by applying a declarative, policy-based data distribution process. The goal is for each read and each write to complete using data held within a single database instance, or shard. When a statement can be processed entirely within one instance, it avoids cross-shard coordination, which improves both application performance and database scalability.

Data Distribution (Analysis) for Reads

When examining queries or reads, it is important to identify the related data within various tables that should be kept together on one machine. This can be done by analyzing joins, sub-queries, or unions to determine which pieces of data are accessed together. By keeping related data localized, query execution can be optimized.
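The co-access analysis described above can be sketched roughly as follows. This is only an illustration: the regular expression is a simplification rather than a SQL parser, and the table names, `QUERIES`, `tables_in`, and `co_access_counts` are hypothetical names, not part of any product mentioned in this post.

```python
import re
from collections import Counter
from itertools import combinations

# Captures the identifier that follows FROM or JOIN.
# Assumption: simple, unquoted table names; this is not a full SQL parser.
TABLE_RE = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][A-Za-z0-9_]*)", re.IGNORECASE)


def tables_in(query):
    """Return the sorted, de-duplicated table names referenced by one query."""
    return sorted({name.lower() for name in TABLE_RE.findall(query)})


def co_access_counts(queries):
    """Count how often each pair of tables appears in the same query."""
    counts = Counter()
    for query in queries:
        for pair in combinations(tables_in(query), 2):
            counts[pair] += 1
    return counts


# Hypothetical query log covering a join and a sub-query.
QUERIES = [
    "SELECT * FROM posts JOIN comments ON comments.post_id = posts.id",
    "SELECT * FROM users JOIN posts ON posts.user_id = users.id",
    "SELECT * FROM posts WHERE id IN (SELECT post_id FROM comments)",
]

if __name__ == "__main__":
    for pair, count in co_access_counts(QUERIES).most_common():
        print(pair, count)
```

Pairs with high counts are candidates for being placed on the same shard; a real analysis would also weigh query frequency and cost.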

Data Distribution for Writes

For transactions or writes, it is crucial to place each addition to the database in the same partitioned database instance, or shard, as its related data. This keeps every transaction contained within a single shard and eliminates the need for distributed transactions coordinated by a two-phase commit. Writing related data together therefore improves transaction efficiency.
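A minimal sketch of single-shard write routing, assuming hash-modulo placement (the post describes distribution as policy-based and does not name a specific scheme). `NUM_SHARDS`, `shard_for`, and `route_write` are hypothetical names, and the shards are plain in-memory lists standing in for database instances.

```python
import zlib

NUM_SHARDS = 4  # assumed shard count, for illustration only


def shard_for(shard_key):
    """Map a shard key to a shard index.

    crc32 is used because it is stable across processes, unlike the
    built-in hash() for strings.
    """
    return zlib.crc32(str(shard_key).encode("utf-8")) % NUM_SHARDS


def route_write(shards, shard_key, rows):
    """Append all rows of one transaction to the single shard chosen by shard_key."""
    target = shard_for(shard_key)
    shards[target].extend(rows)
    return target


if __name__ == "__main__":
    shards = [[] for _ in range(NUM_SHARDS)]
    # A post and its comment share the same shard key (the owning user's id),
    # so both writes land on the same shard.
    first = route_write(shards, 42, [("posts", {"id": 100, "user_id": 42})])
    second = route_write(shards, 42, [("comments", {"id": 1, "post_id": 100})])
    print(first == second)
```

Because both writes are routed by the same key, a transaction that touches the post and its comments never spans shards.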

Denormalization – Not the Best Solution

While denormalization may seem like a solution to data placement issues, it can create additional problems. Instead, a cascading key lookup solution can efficiently resolve data placement issues without the need for denormalization. However, in certain cases where the distribution process becomes complex, denormalization may be necessary.
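The post does not spell out how a cascading key lookup works. One plausible reading, sketched here under that assumption, is that a row without its own distribution key follows its foreign keys up the parent chain until it reaches the table that owns the key. The schema (`users`, `posts`, `comments`), `PARENT_OF`, and `resolve_distribution_key` are hypothetical.

```python
# Hypothetical schema: users own posts, posts own comments.
# Only `users` carries the distribution key directly (its own id),
# so child tables do not need a denormalized copy of it.
PARENT_OF = {
    "comments": ("posts", "post_id"),
    "posts": ("users", "user_id"),
}


def resolve_distribution_key(table, row, lookup):
    """Walk parent references until reaching the table that owns the key.

    `lookup(table, primary_key)` must return the parent row as a dict.
    """
    while table in PARENT_OF:
        parent_table, fk_column = PARENT_OF[table]
        row = lookup(parent_table, row[fk_column])
        table = parent_table
    return row["id"]


if __name__ == "__main__":
    data = {
        "users": {7: {"id": 7}},
        "posts": {100: {"id": 100, "user_id": 7}},
    }
    comment = {"id": 1, "post_id": 100}
    key = resolve_distribution_key("comments", comment, lambda t, pk: data[t][pk])
    print(key)
```

The comment is placed with user 7's data even though the `comments` table itself stores no `user_id` column.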

Null Columns

A shard key directs data and commands to the right instance within a distributed database, so the fields used for routing must never be empty or null. Every piece of data must be born with a distribution key that remains unchanged throughout its life. In practice, every row inserted into a distributed table must arrive with its shard key already populated.
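The two rules above (no null shard key, no change after insert) could be enforced at the application layer along these lines. This is a sketch under assumptions: `ShardKeyError`, `validate_insert`, and `validate_update` are hypothetical helpers, and rows are modeled as plain dictionaries.

```python
class ShardKeyError(ValueError):
    """Raised when a row violates a shard key rule."""


def validate_insert(row, key_column):
    """Reject rows whose shard key is missing or null."""
    if row.get(key_column) is None:
        raise ShardKeyError(f"{key_column} must be set before insert")
    return row


def validate_update(old_row, changes, key_column):
    """Reject updates that would change the shard key of an existing row."""
    if key_column in changes and changes[key_column] != old_row[key_column]:
        raise ShardKeyError(f"{key_column} cannot change after insert")
    return {**old_row, **changes}


if __name__ == "__main__":
    post = validate_insert({"id": 100, "user_id": 7}, "user_id")
    post = validate_update(post, {"title": "Hello"}, "user_id")
    print(post)
    try:
        validate_update(post, {"user_id": 8}, "user_id")
    except ShardKeyError as err:
        print("rejected:", err)
```

A `NOT NULL` constraint on the key column gives the same guarantee for inserts at the database level.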

New Applications: Design for Scale from the Start

When building new applications, it is essential to design for scalability from the start. The same data distribution principles applied to existing applications should also be applied to new ones. By selecting a distribution key and understanding the link between tables within each shard during the design process, data can be stored and accessed together on the same database.

ScaleBase has created a guide called “Building a New Application with Massive Database Scalability – Getting Started with ScaleBase” that demonstrates how to build a new application that plans for massive database scalability right from the start. It outlines the steps involved and provides a walkthrough using a sample application called ‘Blog System’.

If you are exploring distributed databases or facing challenges with distributed relational databases, feel free to share your thoughts and suggestions for future blog posts.
