Should I run a database on Kubernetes?

Can a database run on Kubernetes? If so, which types of databases and data are best suited to K8s? Let’s take a look.

This article was translated by the RadonDB open source community. Original link: https://ift.tt/sRe7Okx

Author | Ricardo Castro

Produced | RadonDB Open Source Community

Kubernetes is an open source container orchestration solution for automatically deploying, scaling, and managing containerized applications. Although Kubernetes was originally designed for stateless applications, with the growing popularity of stateful workloads, Kubernetes can also be used to manage stateful applications.

Typically, containers are stateless: if a container crashes or needs to be restarted, any data written inside it is lost. As a container orchestrator, Kubernetes routinely restarts containers and moves them between nodes. Stateful workloads, however, need their data to survive whatever Kubernetes does to the containers running them, which makes this an important issue.

As we all know, a database server is a stateful application.

How does a database run on Kubernetes? Does Kubernetes have a mechanism to manage such applications? If so, which types of databases and data suit it best?

In this article, we will find out.

Different ways to run a database

Take the different ways of running a database server in an enterprise as an example:

Self-managed databases: Many companies still choose to host database servers on virtual machines, either on-premises or in the cloud. The enterprise is responsible for setting up the database servers, securing them, installing patches, upgrading, configuring storage, providing high availability, scaling, backing up, and performing other database administration tasks. This is the most manual approach, but it gives full control over the database and the data.

Cloud-hosted databases: Most modern businesses opt for solutions such as Amazon RDS, Azure Database, Google Cloud Database, or Instaclustr, which make it easier to deploy and scale database servers in the cloud. The vendor is responsible for storage, compute, network bandwidth, installation, upgrades, and high availability. The business, as a consumer, simply hosts its database on a vendor-provided instance running the database engine of its choice (such as SQL or NoSQL).

Kubernetes-managed databases: This approach is a hybrid of the two above. You can run Kubernetes on-premises or in the cloud, or use a managed Kubernetes service. With this approach, you can take advantage of many of the benefits of Kubernetes, such as automatic scheduling, self-healing, and horizontal scaling. But database operations (such as performance tuning, backup, and recovery) still require your attention and may differ slightly because of containerization.

Persistent storage and other features of K8s

Although Kubernetes was developed to manage containerized applications that did not require data persistence, it now also provides solutions for managing stateful applications. Persistent volumes (PV for short) provide an API that allows Kubernetes administrators to manage volumes. Together with storage classes, they provide a secure and abstract way to store and manage data.
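To make the storage abstraction concrete, here is a minimal sketch of how a workload might request storage through a PersistentVolumeClaim. The manifest is built as a plain Python dictionary and printed as JSON. The claim name "db-data", the size "10Gi", and the storage class name "fast-ssd" are illustrative assumptions, not values from the article; the storage classes actually available depend on the cluster.

```python
import json


def make_pvc(name: str, size: str, storage_class: str) -> dict:
    """Build a PersistentVolumeClaim manifest as a plain dictionary.

    The claim asks for `size` of storage from the given storage class.
    All argument values used below are hypothetical examples.
    """
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # ReadWriteOnce: the volume can be mounted read-write by one node.
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    # Hypothetical names; adjust to the storage classes your cluster offers.
    print(json.dumps(make_pvc("db-data", "10Gi", "fast-ssd"), indent=2))
```

The claim describes what the workload needs (size, access mode, class) without naming a specific disk, which is the abstraction the paragraph above refers to.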

However, the cloud is unpredictable, and Kubernetes often needs to restart and rebuild pods. As a result, it becomes difficult to move data between nodes while ensuring that persistent volumes stay attached to the correct containers. To complicate matters, some databases need to run in a multi-node cluster configuration.

Several constructs help with these issues. StatefulSets, available since Kubernetes version 1.5, ensure that pods are created from the same container specification and keep a unique, stable ID even if they are moved to another node. Coupling pods with persistent volumes through these unique IDs preserves workload state even when pods are rescheduled. DaemonSets, while slightly more complex, are also a way to run a copy of a pod on each node of the cluster.
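The stable identity that a StatefulSet gives each pod can be sketched as a naming scheme. The code below is a rough illustration, not part of the original article: it derives the pod name, the DNS name, and the persistent volume claim name for each ordinal. The StatefulSet name "mysql", the headless service "mysql-headless", the claim template "data", and the default cluster domain "cluster.local" are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ReplicaIdentity:
    pod_name: str   # stable name: <statefulset>-<ordinal>
    dns_name: str   # stable DNS entry behind a headless service
    pvc_name: str   # claim created from the volume claim template


def statefulset_identities(
    name: str,
    service: str,
    namespace: str,
    replicas: int,
    claim_template: str,
    cluster_domain: str = "cluster.local",  # assumed default cluster domain
) -> List[ReplicaIdentity]:
    """Return the identity each replica keeps across rescheduling."""
    return [
        ReplicaIdentity(
            pod_name=f"{name}-{i}",
            dns_name=f"{name}-{i}.{service}.{namespace}.svc.{cluster_domain}",
            pvc_name=f"{claim_template}-{name}-{i}",
        )
        for i in range(replicas)
    ]


if __name__ == "__main__":
    # Hypothetical StatefulSet with three replicas.
    for identity in statefulset_identities(
        "mysql", "mysql-headless", "default", 3, "data"
    ):
        print(identity)
```

Because the claim name is derived from the ordinal, a rescheduled "mysql-1" pod looks for the same claim again, which is how state follows the pod from node to node.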

Distributed stateful workloads often require a complex set of operations that cannot be handled by predefined resources. For example, a distributed database might need to perform a specific set of actions when a database node (in Kubernetes, a pod) fails. Examples of such operations could be electing a leader, balancing data, etc.

Native Kubernetes features can’t really handle these cases, but custom resources can help. Custom resources extend the Kubernetes API with new resource types, and custom controllers supply the domain-specific logic that acts on them. The Operator pattern combines the two to manage applications and their components.

Open source frameworks such as kubebuilder and Operator Framework provide building blocks for creating Operators. Examples of Operators include Postgres Operator, MySQL Operator for Kubernetes, Elastic Cloud on Kubernetes (ECK), and K8ssandra.
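The kind of domain-specific logic an Operator adds can be sketched as a reconcile step: compare the observed state of the database cluster with what is wanted and decide which actions to take. The following is a simplified, self-contained simulation in plain Python. It does not use kubebuilder, Operator Framework, or any real Kubernetes client, and the Member fields (including the applied_position replication counter) are hypothetical names invented for this illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Member:
    name: str
    healthy: bool
    role: str              # "primary" or "replica"
    applied_position: int  # hypothetical replication position


def reconcile(members: List[Member]) -> List[str]:
    """Return the actions needed to bring the cluster back to a good state."""
    actions: List[str] = []

    # Step 1: any failed member needs its pod recreated.
    for member in members:
        if not member.healthy:
            actions.append(f"recreate pod {member.name}")

    # Step 2: if no healthy primary remains, elect the most up-to-date
    # healthy replica as the new leader.
    healthy = [m for m in members if m.healthy]
    if healthy and not any(m.role == "primary" for m in healthy):
        candidate = max(healthy, key=lambda m: m.applied_position)
        actions.append(f"promote {candidate.name} to primary")

    return actions


if __name__ == "__main__":
    cluster = [
        Member("db-0", healthy=False, role="primary", applied_position=120),
        Member("db-1", healthy=True, role="replica", applied_position=118),
        Member("db-2", healthy=True, role="replica", applied_position=120),
    ]
    for action in reconcile(cluster):
        print(action)
```

A real Operator would run logic like this in a loop driven by API events and would carry out the actions through the Kubernetes API and the database's own tooling; the sketch only shows the decision step.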

Features of Distributed Databases

Most database engines provide one or more ways to distribute data and make it highly available. When choosing a database to run on Kubernetes, you need to consider the following features:

Replication: Does the database support replication? If so, what types of replication does it support (e.g., bidirectional replication, transactional replication, or snapshots)? Replication helps improve reliability, fault tolerance, and availability.

Sharding: Is the database capable of partitioning the data and keeping different shards in different instances (i.e. pods)? This can help optimize redundancy and spread the load.

Failover: Can the database switch from the primary (read-write) node to a read-only node and promote that node to primary? This also helps improve reliability, fault tolerance, and availability.

Scalability: Is the database scalable (scale-in and scale-out)? Kubernetes paves the way for horizontal scaling, but the database itself needs to be able to add or remove instances as needed. This can help handle increased load or reduce costs when load drops.

Databases with these characteristics (e.g., MySQL, PostgreSQL, ClickHouse, Elasticsearch, MongoDB, or Cassandra) can more easily cope with the uncertainty of heterogeneous cloud environments.
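As an illustration of the sharding feature above, the sketch below maps a record key to a shard and then to the pod that would hold it. It is a deliberately simple hash-modulo scheme chosen for this example, not the method of any particular database engine. The StatefulSet name "db" and the key format "user:42" are hypothetical.

```python
import hashlib


def shard_for_key(key: str, shard_count: int) -> int:
    """Map a key to a shard index using a stable hash.

    hashlib is used instead of the built-in hash() so that the result
    is the same across processes and restarts.
    """
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def pod_for_key(key: str, statefulset_name: str, shard_count: int) -> str:
    """Return the name of the pod assumed to hold the shard for this key."""
    return f"{statefulset_name}-{shard_for_key(key, shard_count)}"


if __name__ == "__main__":
    # Hypothetical keys spread across three shards, one shard per pod.
    for key in ("user:42", "user:43", "order:7"):
        print(key, "->", pod_for_key(key, "db", 3))
```

One design caveat: with plain modulo hashing, changing the number of shards remaps most keys, which is why engines that scale in and out frequently tend to use range partitioning or consistent hashing instead.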

Data Availability Considerations

Because pods and compute nodes are often ephemeral, Kubernetes is better suited to certain types of data. It is important to understand how critical the data is and to what extent it must be available.

To achieve high availability, some database engines use a so-called eventual consistency model. Eventual consistency guarantees that, if no new updates are made to a given piece of data, all reads of it will eventually return the last updated value. It accepts that, at any point in time, different nodes may hold inconsistent data (depending on which node is read) because updates are still propagating. Once propagation is complete, all nodes hold the same copy and all client requests get the same data. When you run a database system on Kubernetes, you need to decide whether this is acceptable from a business perspective.
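The behavior described above can be shown with a small simulation. This is a toy model under stated assumptions, not how any specific engine is implemented: each replica stores a version number next to the value, the highest version wins, and a sync() step stands in for whatever background propagation the engine performs. The Replica class and the key "stock" are invented for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Replica:
    name: str
    # key -> (version, value); a higher version means a newer write
    store: Dict[str, Tuple[int, str]] = field(default_factory=dict)

    def write(self, key: str, version: int, value: str) -> None:
        """Accept the write only if it is newer than what is stored."""
        current = self.store.get(key)
        if current is None or version > current[0]:
            self.store[key] = (version, value)

    def read(self, key: str) -> Optional[str]:
        entry = self.store.get(key)
        return entry[1] if entry is not None else None


def sync(replicas: List[Replica]) -> None:
    """Propagate the newest version of every key to all replicas."""
    newest: Dict[str, Tuple[int, str]] = {}
    for replica in replicas:
        for key, (version, value) in replica.store.items():
            if key not in newest or version > newest[key][0]:
                newest[key] = (version, value)
    for replica in replicas:
        for key, (version, value) in newest.items():
            replica.write(key, version, value)


if __name__ == "__main__":
    a, b, c = Replica("a"), Replica("b"), Replica("c")
    for replica in (a, b, c):
        replica.write("stock", 1, "10")

    a.write("stock", 2, "9")  # the update has reached only replica a
    print([r.read("stock") for r in (a, b, c)])  # reads disagree

    sync([a, b, c])  # updates stop and propagation finishes
    print([r.read("stock") for r in (a, b, c)])  # reads agree
```

Between the write and the sync, a client reading from replica b or c sees the old value. Whether that window is tolerable is the business question raised above.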

Some database engines can handle failover (for example, when a pod running the primary replica of the data is rescheduled or crashes), but it may take some time for a standby node to take over the primary role. You need to consider how much unavailability you can afford in this case, and whether serving stale data is acceptable.
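One way to reason about how much unavailability is affordable is to turn an availability target into a yearly downtime budget and compare it with the expected cost of failovers. The arithmetic below is a back-of-the-envelope sketch. The failover counts, failover durations, and the 99.99% target are made-up numbers for illustration, not measurements or recommendations.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 (non-leap year)


def allowed_downtime_seconds(target_availability: float) -> float:
    """Yearly downtime budget implied by an availability target (0..1)."""
    return (1.0 - target_availability) * SECONDS_PER_YEAR


def annual_downtime_seconds(failovers_per_year: int,
                            seconds_per_failover: float) -> float:
    """Expected yearly downtime if each failover makes data unavailable."""
    return failovers_per_year * seconds_per_failover


def meets_target(failovers_per_year: int,
                 seconds_per_failover: float,
                 target_availability: float) -> bool:
    """True if expected failover downtime fits within the budget."""
    expected = annual_downtime_seconds(failovers_per_year, seconds_per_failover)
    return expected <= allowed_downtime_seconds(target_availability)


if __name__ == "__main__":
    # Hypothetical workload: the primary pod is rescheduled about once a month.
    print(allowed_downtime_seconds(0.9999))  # budget in seconds per year
    print(meets_target(12, 60, 0.9999))      # one-minute failovers
    print(meets_target(12, 600, 0.9999))     # ten-minute failovers
```

With these example numbers, twelve one-minute failovers (720 seconds) fit inside a 99.99% budget of roughly 3,154 seconds per year, while twelve ten-minute failovers (7,200 seconds) do not.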

As you can see, it all depends on business needs. Workloads that deal with transient data (such as cache layers), read-only data (such as lookup tables), or data that can be easily reconstructed (such as API outputs) are clearly more suited to Kubernetes.

Summary

As a container orchestration technology, Kubernetes simplifies many common operational problems such as scheduling, autoscaling, and failover. While it works well for stateless workloads, stateful workloads (like databases) have additional issues to address. We have seen that:

Persistent volumes and storage classes provide a safe and abstract way to manage data;
StatefulSets and DaemonSets build on these concepts by allowing pods to be bound to persistent data;
Custom resources and operators can help provide custom logic for applications that require data persistence.

However, it is important to consider the support available for the database engine you plan to run on Kubernetes, as well as the type of data to be stored and its availability requirements. Running a service in Kubernetes requires dealing with a certain level of volatility.

Therefore, Kubernetes is more suitable for deploying databases that can handle replication, sharding, and failover. Likewise, the ideal data for Kubernetes to host is data that can be easily and quickly regenerated. Ultimately, it will depend on the fault tolerance required by the business.

The text and pictures in this article are from CSDN

This article is reprinted from https://www.techug.com/post/should-i-run-the-database-on-kubernetes693b2eff5bac09f984c8/
This site is for inclusion only, and the copyright belongs to the original author.
