Become a ClusterControl DBA: Making your DB components HA via Load Balancers

Choosing your HA topology

There are various ways to retain high availability with databases. You can use Virtual IPs (VRRP) to manage host availability, you can use resource managers like Zookeeper and Etcd to (re)configure your applications or use load balancers/proxies to distribute the workload over all available hosts.

The Virtual IPs need either an application to manage them (MHA, Orchestrator), some scripting (Keepalived, Pacemaker/Corosync) or an engineer to fail over manually, and the decision making in the process can become complex. The Virtual IP failover itself is straightforward and simple: the IP address is removed from one host, assigned to another, and arping is used to send a gratuitous ARP response. In theory a Virtual IP can be moved in a second, but it will take a few seconds before the failover management application is sure the host has failed and acts accordingly; in reality this is somewhere between 10 and 30 seconds. Another limitation of Virtual IPs is that some cloud providers do not allow you to manage your own Virtual IPs or assign them at all. For example, Google does not allow you to do that on their compute nodes.
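For illustration, a minimal manual VIP failover on Linux could look like the sketch below; the interface name, subnet and address are placeholders, and tools like Keepalived automate exactly these steps.

# On the old primary (if still reachable): release the Virtual IP
ip addr del 192.168.10.100/24 dev eth0

# On the new primary: claim the Virtual IP
ip addr add 192.168.10.100/24 dev eth0

# Send gratuitous ARP replies so switches and clients update their ARP caches
arping -A -c 3 -I eth0 192.168.10.100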

Resource managers like Zookeeper and Etcd can monitor your databases and (re)configure your applications once a host fails or a slave gets promoted to master. In general this is a good idea but implementing your checks with Zookeeper and Etcd is a complex task.

A load balancer or proxy will sit in between the application and the database host and work transparently as if the client would connect to the database host directly. Just like with the Virtual IP and resource managers, the load balancers and proxies also need to monitor the hosts and redirect the traffic if one host is down. ClusterControl supports two proxies: HAProxy and MaxScale, and both are supported for MySQL master-slave replication and Galera cluster. HAProxy and MaxScale both have their own use cases; we will describe them in this post as well.

Why do you need a load balancer?

In theory you don’t need a load balancer but in practice you will prefer one. We’ll explain why. 

If you have virtual IPs set up, all you have to do is point your application to the correct (virtual) IP address and everything should be fine connection wise. But suppose you have scaled out the number of read replicas; you might want to provide virtual IPs for each of those read replicas as well, for maintenance or availability reasons. This might become a very large pool of virtual IPs that you have to manage. If one of those read replicas had a failure, you need to re-assign the virtual IP to another host, or else your application will connect to either a host that is down or, in the worst case, a lagging server with stale data. The application managing the virtual IPs therefore needs to keep track of the replication state.

Also for Galera there is a similar challenge: in theory you can add as many hosts as you’d like to your application config and pick one at random. The same problem arises when this host is down: you might end up connecting to an unavailable host. Using all hosts for both reads and writes might also cause rollbacks due to the optimistic locking in Galera: if two connections try to write to the same row at the same time, one of them will receive a rollback. If your workload has such concurrent updates, it is advised to write to only one Galera node. Therefore you want a manager that keeps track of the internal state of your database cluster.

Both HAProxy and MaxScale will offer you the functionality to monitor the database hosts and keep state of your cluster and its topology. For replication setups, in case a slave replica is down, both HAProxy and MaxScale can redistribute the connections to another host. But if a replication master is down, HAProxy will deny the connection and MaxScale will give back a proper error to the client. For Galera setups, both load balancers can elect a master node from the Galera cluster and only send the write operations to that specific node.

On the surface HAProxy and MaxScale may seem to be similar solutions, but they differ a lot in features and the way they distribute connections and queries. Both HAProxy and MaxScale can distribute connections using round-robin. You can also use round-robin to split reads and writes by designating one port that sends reads to the slaves and another port that sends writes to the master; your application will have to decide whether to use the read or write port. Since MaxScale is an intelligent proxy, it is database aware and is also able to analyze your queries. MaxScale is able to do read/write splitting on a single port by detecting whether you are performing a read or write operation and connecting to the designated slaves or master in your cluster. MaxScale includes additional functionality like binlog routing, audit logging and query rewriting, but we will have to cover these in a separate article.
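To make the port-based read/write split concrete, here is a minimal sketch of what the two HAProxy listeners could look like; the hostnames, ports and options are placeholders and the configuration ClusterControl generates is more elaborate.

# Sketch only: append a write listener (master) and a read listener (slaves,
# round-robin) to haproxy.cfg. Backend hostnames are placeholders.
cat >> /etc/haproxy/haproxy.cfg <<'EOF'
listen mysql_writes
    bind *:3307
    mode tcp
    server master db1.example.com:3306 check

listen mysql_reads
    bind *:3308
    mode tcp
    balance roundrobin
    server slave1 db2.example.com:3306 check
    server slave2 db3.example.com:3306 check
EOF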

That should be enough background information on this topic, so let’s see how you can deploy both load balancers for MySQL replication and Galera topologies.

Deploying HAProxy

Using ClusterControl to deploy HAProxy on a Galera cluster is easy: go to the relevant cluster and select “Add Load Balancer”:


And you will be able to deploy an HAProxy instance by adding the host address and selecting the server instances you wish to include in the configuration:


By default the HAProxy instance will be configured to send connections to the server instances receiving the least number of connections, but you can change that policy to either round robin or source. 

Under advanced settings you can set timeouts, the maximum number of connections and even secure the proxy by whitelisting an IP range for the connections.

Under the nodes tab of that cluster, the HAProxy node will appear:


Now your Galera cluster is also available via the newly deployed HAProxy node on port 3307. Don’t forget to GRANT your application access from the HAProxy IP, as the traffic will now come from the proxy instead of the application hosts. Also, remember to point your application connection to the HAProxy node.
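As a hedged example, such a grant could look like the snippet below; the user, schema, password and the 10.0.0.10 proxy address are placeholders.

# Run on one of the Galera nodes; the grant replicates to the other nodes.
mysql -u root -p -e "
  CREATE USER 'appuser'@'10.0.0.10' IDENTIFIED BY 'secret';
  GRANT SELECT, INSERT, UPDATE, DELETE ON appdb.* TO 'appuser'@'10.0.0.10';"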

Now suppose one server instance goes down: HAProxy will notice this within a few seconds and stop sending traffic to that instance:


The two other nodes are still fine and will keep receiving traffic. This keeps the cluster highly available without the client even noticing the difference.

Deploying a secondary HAProxy node

Now that we have moved the responsibility of retaining high availability over the database connections from the client to HAProxy, what if the proxy node dies? The answer is to create another HAProxy instance and use a virtual IP controlled by Keepalived.

The benefit compared to using virtual IPs on the database nodes is that the logic for MySQL is at the proxy level and the failover for the proxies is simple.

So let’s deploy a secondary HAProxy node:


After we have deployed a secondary HAProxy node, we need to add Keepalived:


And after Keepalived has been added, your nodes overview will look like this:


So now, instead of pointing your application connections to the HAProxy node directly, you point them to the virtual IP.
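Conceptually, the Keepalived configuration on the HAProxy hosts is similar to the sketch below; the interface, router id, priorities and VIP are placeholders and the files ClusterControl generates will differ in detail.

cat > /etc/keepalived/keepalived.conf <<'EOF'
vrrp_script chk_haproxy {
    script "killall -0 haproxy"   # succeeds while an haproxy process exists
    interval 2
}

vrrp_instance VI_HAPROXY {
    interface eth0
    state MASTER             # BACKUP on the secondary HAProxy host
    virtual_router_id 51
    priority 101             # lower value (e.g. 100) on the secondary
    virtual_ipaddress {
        10.0.0.100
    }
    track_script {
        chk_haproxy
    }
}
EOF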

In the example here, we used separate hosts to run HAProxy on, but you could easily add them to existing server instances as well. HAProxy does not bring much overhead, although you should keep in mind that in case of a server failure, you will lose both the database node and the proxy.

Deploying MaxScale

Deploying MaxScale to your cluster is done in a similar way to HAProxy: ‘Add Load Balancer’ in the cluster list.


ClusterControl will deploy MaxScale with both the round-robin router and the read/write splitter. The CLI port will be used to enable you to administer MaxScale from ClusterControl.

After MaxScale has been deployed, it will be available under the Nodes tab:


Opening the MaxScale node overview will present you with the interface that grants you access to the MaxScale CLI, so there is no reason to log into MaxScale on the node anymore.

For MaxScale, the grants are slightly different: as you are proxying, you need to allow connections from the proxy - just like with HAProxy. But since MaxScale is also performing local authentication and authorization, you need to grant access to your application hosts as well.
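A hedged example of such a double grant is shown below; the user, schema, the MaxScale host 10.0.0.20 and the application subnet 10.0.0.% are placeholders.

# Grant for the MaxScale host itself plus the application hosts it proxies for.
mysql -u root -p -e "
  CREATE USER 'appuser'@'10.0.0.20' IDENTIFIED BY 'secret';
  CREATE USER 'appuser'@'10.0.0.%'  IDENTIFIED BY 'secret';
  GRANT SELECT, INSERT, UPDATE, DELETE ON appdb.* TO 'appuser'@'10.0.0.20';
  GRANT SELECT, INSERT, UPDATE, DELETE ON appdb.* TO 'appuser'@'10.0.0.%';"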

Deploying Garbd

Galera implements a quorum-based algorithm to select a primary component through which it enforces consistency. The primary component needs to have a majority of votes (50% + 1 node), so in a two node system there would be no majority, resulting in split brain. Fortunately, it is possible to add garbd (Galera Arbitrator Daemon), a lightweight, stateless daemon that can act as the odd node. An added benefit of the Galera Arbitrator is that you can now get by with only two data nodes in your cluster.
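ClusterControl takes care of the deployment, but for reference, starting garbd by hand is roughly as simple as pointing it at the cluster; the node addresses and group name below are placeholders.

# The group name must match wsrep_cluster_name of the Galera cluster.
garbd --address "gcomm://10.0.0.1:4567,10.0.0.2:4567" \
      --group my_galera_cluster \
      --log /var/log/garbd.log \
      --daemon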

If ClusterControl detects that your Galera cluster consists of an even number of nodes, you will be given the warning/advice by ClusterControl to extend the cluster to an odd number of nodes:


Choose wisely the host to deploy garbd on, as it will receive all replicated data. Make sure the network can handle the traffic and is secure enough. You could choose one of the HAProxy or MaxScale hosts to deploy garbd on, like in the example below:


Alternatively you could install garbd on the ClusterControl host.

After installing garbd, you will see it appear next to your two Galera nodes:


Final thoughts

We showed you how to make your MySQL master-slave and Galera cluster setups more robust and retain high availability using HAProxy and MaxScale. Also garbd is a nice daemon that can save the extra third node in your Galera cluster. 

This finalizes the deployment side of ClusterControl. In our next blog, we will show you how to integrate ClusterControl within your organization by using groups and assigning certain roles to users.


ClusterControl 1.5 - PostgreSQL Load Balancing & ProxySQL Enhancements

Load balancers are an essential component in database high availability; especially when making topology changes transparent to applications and implementing read-write split functionality. ClusterControl provides an array of features to securely deploy, monitor and configure the industry's leading open source load balancing technologies.

In the past year we have added support for ProxySQL and added multiple enhancements for HAProxy and MariaDB’s MaxScale. We continue this tradition with the latest release of ClusterControl 1.5.

Based on feedback we received from our users, we have improved how ProxySQL is managed. We also added support for HAProxy and Keepalived to run on top of PostgreSQL clusters.

In this blog post, we’ll have a look at these improvements...

ProxySQL - User Management Enhancements

Previously, the UI would only allow you to create a new user or add an existing one, one at a time. One piece of feedback we got from our users was that it is quite hard to manage a large number of users this way. We listened, and in ClusterControl 1.5 it is now possible to import large batches of users. Let’s take a look at how you can do that. First of all, you need to have ProxySQL deployed. Then go to the ProxySQL node, and in the Users tab you should see an “Import Users” button.

Once you click on it, a new dialog box will open:

Here you can see all of the users that ClusterControl detected on your cluster. You can scroll through them and pick the ones you want to import. You can also select or deselect all of the users from a current view.

Once you start to type in the Search box, ClusterControl will filter out non-matching results, narrowing the list only to users relevant to your search.

You can use the “Select All” button to select all users which match your search. Of course, after you selected users you want to import, you can clear the search box and start another search:

Please note “(7 selected)” - it tells you how many users, in total (not just from this search), you have selected to import. You can also click on it to see only the users you selected to import.

Once you are happy with your choice, you can click “Next” to go to the next screen.

Here you need to decide what the default hostgroup should be for each user. You can do that on a per-user basis or globally, for the whole set or a subset of users resulting from a search.

Once you click on the “Import Users” button, users will be imported and they will show up in the Users tab.
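If you want to double-check the result from the command line, the imported users end up in ProxySQL’s mysql_users table, which you can query through the admin interface; the default admin credentials and port 6032 are assumed here.

mysql -u admin -padmin -h 127.0.0.1 -P 6032 -e \
  "SELECT username, active, default_hostgroup FROM mysql_users;"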

ProxySQL - Scheduler Management

ProxySQL’s scheduler is a cron-like module which allows ProxySQL to start external scripts at a regular interval. The schedule can be quite granular - up to one execution every millisecond. Typically, the scheduler is used to execute Galera checker scripts (like proxysql_galera_checker.sh), but it can also be used to execute any other script that you like. In the past, ClusterControl used the scheduler to deploy the Galera checker script, but this was not exposed in the UI. Starting with ClusterControl 1.5, you now have full control.

As you can see, one script has been scheduled to run every 2 seconds (2000 milliseconds) - this is the default configuration for Galera cluster.

The above screenshot shows us options for editing existing entries. Please note that ProxySQL supports up to 5 arguments to the scripts it’ll execute through the scheduler.

If you want a new script to be added to the scheduler, you can click on the “Add New Script” button and you will be presented with a screen like the one above. You can also preview what the full script will look like when executed. After you have filled in all the “Argument” fields and defined the interval, you can click on the “Add New Script” button.

As a result, a script will be added to the scheduler and it’ll be visible on the list of scheduled scripts.
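The same can be done straight from the ProxySQL admin interface; a hedged sketch follows (the script path, its arguments and the admin credentials are placeholders).

mysql -u admin -padmin -h 127.0.0.1 -P 6032 <<'EOF'
-- Run a custom script every 5 seconds with two arguments
INSERT INTO scheduler (id, active, interval_ms, filename, arg1, arg2)
VALUES (2, 1, 5000, '/usr/local/bin/my_check.sh', '--hostgroup', '10');
LOAD SCHEDULER TO RUNTIME;
SAVE SCHEDULER TO DISK;
EOF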

PostgreSQL - Building the High Availability Stack

Setting up replication with auto failover is good, but applications need a simple way to track the writeable master. So we added support for HAProxy and Keepalived on top of the PostgreSQL clusters. This allows our PostgreSQL users to deploy a full high availability stack using ClusterControl.

From the Load Balancer sub tab, you can now deploy HAProxy - if you are familiar with how ClusterControl deploys MySQL replication, it is a very similar setup. We install HAProxy on a given host with two backends: reads on port 3308 and writes on port 3307. It uses tcp-check, expecting a particular string to be returned. To produce that string, the following steps are executed on all of the database nodes. First of all, xinetd is configured to run a service on port 9201 (to avoid confusion with the MySQL setup, which uses port 9200).

# default: on
# description: postgreschk
service postgreschk
{
        flags           = REUSE
        socket_type     = stream
        port            = 9201
        wait            = no
        user            = root
        server          = /usr/local/sbin/postgreschk
        log_on_failure  += USERID
        disable         = no
        #only_from       = 0.0.0.0/0
        only_from       = 0.0.0.0/0
        per_source      = UNLIMITED
}

The service executes the /usr/local/sbin/postgreschk script, which validates the state of PostgreSQL and tells whether a given host is available and what type of host it is (master or slave). If everything is ok, it returns the string expected by HAProxy.
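The actual postgreschk script is more involved, but conceptually it boils down to something like the sketch below: check pg_is_in_recovery() and answer with a status HAProxy can match on. The connection parameters and response strings are illustrative only.

#!/bin/bash
# Minimal sketch of a postgreschk-style health check served through xinetd.
STATE=$(psql -U postgres -h 127.0.0.1 -p 5432 -tAc "SELECT pg_is_in_recovery();" 2>/dev/null)

if [ "$STATE" = "f" ]; then
    # Primary: read-write
    echo -e "HTTP/1.1 200 OK\r\n\r\nmaster is running.\r\n"
elif [ "$STATE" = "t" ]; then
    # Standby: read-only
    echo -e "HTTP/1.1 206 Partial Content\r\n\r\nslave is running.\r\n"
else
    echo -e "HTTP/1.1 503 Service Unavailable\r\n\r\nnode is down.\r\n"
fi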

Just like with MySQL, HAProxy nodes in PostgreSQL clusters are seen in the UI and the status page can be accessed:

Here you can see both backends and verify that only the master is up for the r/w backend and all nodes can be accessed through the read-only backend. You can also get some statistics about traffic and connections.

HAProxy helps to improve high availability, but it can become a single point of failure. We need to go the extra mile and configure redundancy with the help of Keepalived.

Under Manage -> Load balancer -> Keepalived, you pick the HAProxy hosts you’d like to use and Keepalived will be deployed on top of them with a Virtual IP attached to the interface of your choice.

From now on, all connectivity should go to the VIP, which will be attached to one of the HAProxy nodes. If that node goes down, Keepalived will take the VIP down on that node and bring it up on another HAProxy node.

That’s it for the load balancing features introduced in ClusterControl 1.5. Do try them and let us know how you like them.

Updated: Become a ClusterControl DBA: Making your DB components HA via Load Balancers

Choosing your HA topology

There are various ways to retain high availability with databases. You can use Virtual IPs (VRRP) to manage host availability, you can use resource managers like Zookeeper and Etcd to (re)configure your applications or use load balancers/proxies to distribute the workload over all available hosts.

The Virtual IPs need either an application to manage them (MHA, Orchestrator), some scripting (Keepalived, Pacemaker/Corosync) or an engineer to fail over manually, and the decision making in the process can become complex. The Virtual IP failover itself is straightforward and simple: the IP address is removed from one host, assigned to another, and arping is used to send a gratuitous ARP response. In theory a Virtual IP can be moved in a second, but it will take a few seconds before the failover management application is sure the host has failed and acts accordingly; in reality this is somewhere between 10 and 30 seconds. Another limitation of Virtual IPs is that some cloud providers do not allow you to manage your own Virtual IPs or assign them at all. For example, Google does not allow you to do that on their compute nodes.

Resource managers like Zookeeper and Etcd can monitor your databases and (re)configure your applications once a host fails or a slave gets promoted to master. In general this is a good idea but implementing your checks with Zookeeper and Etcd is a complex task.

A load balancer or proxy will sit in between the application and the database host and work transparently as if the client would connect to the database host directly. Just like with the Virtual IP and resource managers, the load balancers and proxies also need to monitor the hosts and redirect the traffic if one host is down. ClusterControl supports two proxies: HAProxy and ProxySQL, and both are supported for MySQL master-slave replication and Galera cluster. HAProxy and ProxySQL both have their own use cases; we will describe them in this post as well.

Why do you need a load balancer?

In theory you don’t need a load balancer but in practice you will prefer one. We’ll explain why.

If you have virtual IPs set up, all you have to do is point your application to the correct (virtual) IP address and everything should be fine connection wise. But suppose you have scaled out the number of read replicas; you might want to provide virtual IPs for each of those read replicas as well, for maintenance or availability reasons. This might become a very large pool of virtual IPs that you have to manage. If one of those read replicas had a failure, you need to re-assign the virtual IP to another host, or else your application will connect to either a host that is down or, in the worst case, a lagging server with stale data. The application managing the virtual IPs therefore needs to keep track of the replication state.

Also for Galera there is a similar challenge: in theory you can add as many hosts as you’d like to your application config and pick one at random. The same problem arises when this host is down: you might end up connecting to an unavailable host. Using all hosts for both reads and writes might also cause rollbacks due to the optimistic locking in Galera: if two connections try to write to the same row at the same time, one of them will receive a rollback. If your workload has such concurrent updates, it is advised to write to only one Galera node. Therefore you want a manager that keeps track of the internal state of your database cluster.

Both HAProxy and ProxySQL will offer you the functionality to monitor the MySQL/MariaDB database hosts and keep state of your cluster and its topology. For replication setups, in case a slave replica is down, both HAProxy and ProxySQL can redistribute the connections to another host. But if a replication master is down, HAProxy will deny the connection and ProxySQL will give back a proper error to the client. For Galera setups, both load balancers can elect a master node from the Galera cluster and only send the write operations to that specific node.

On the surface HAProxy and ProxySQL may seem to be similar solutions, but they differ a lot in features and the way they distribute connections and queries. HAProxy supports a number of balancing algorithms like least connections, source, random and round-robin while ProxySQL distributes connections using the weight-based round-robin algorithm (equal weight means equal distribution). Since ProxySQL is an intelligent proxy, it is database aware and is also able to analyze your queries. ProxySQL is able to do read/write splitting based on query rules where you can forward the queries to the designated slaves or master in your cluster. ProxySQL includes additional functionality like query rewriting, caching and query firewall with real-time, in-depth statistics generation about the workload.

That should be enough background information on this topic, so let’s see how you can deploy both load balancers for MySQL replication and Galera topologies.

Deploying HAProxy

Using ClusterControl to deploy HAProxy on a Galera cluster is easy: go to the relevant cluster and select “Add Load Balancer”:

And you will be able to deploy an HAProxy instance by adding the host address and selecting the server instances you wish to include in the configuration:

By default the HAProxy instance will be configured to send connections to the server instances receiving the least number of connections, but you can change that policy to either round robin or source.

Under advanced settings you can set timeouts, the maximum number of connections and even secure the proxy by whitelisting an IP range for the connections.

Under the nodes tab of that cluster, the HAProxy node will appear:

Now your Galera cluster is also available via the newly deployed HAProxy node on port 3307. Don’t forget to GRANT your application access from the HAProxy IP, as the traffic will now come from the proxy instead of the application hosts. Also, remember to point your application connection to the HAProxy node.

Now suppose one server instance goes down: HAProxy will notice this within a few seconds and stop sending traffic to that instance:

The two other nodes are still fine and will keep receiving traffic. This keeps the cluster highly available without the client even noticing the difference.

Deploying a secondary HAProxy node

Now that we have moved the responsibility of retaining high availability over the database connections from the client to HAProxy, what if the proxy node dies? The answer is to create another HAProxy instance and use a virtual IP controlled by Keepalived.

The benefit compared to using virtual IPs on the database nodes is that the logic for MySQL is at the proxy level and the failover for the proxies is simple.

So let’s deploy a secondary HAProxy node:

After we have deployed a secondary HAProxy node, we need to add Keepalived:

And after Keepalived has been added, your nodes overview will look like this:

So now, instead of pointing your application connections to the HAProxy node directly, you point them to the virtual IP.

In the example here, we used separate hosts to run HAProxy on, but you could easily add them to existing server instances as well. HAProxy does not bring much overhead, although you should keep in mind that in case of a server failure, you will lose both the database node and the proxy.

Deploying ProxySQL

Deploying ProxySQL to your cluster is done in a similar way to HAProxy: “Add Load Balancer” in the cluster list, under the ProxySQL tab.

In the deployment wizard, specify where ProxySQL will be installed, the administration user/password, and the monitoring user/password used to connect to the MySQL backends. From ClusterControl, you can either create a new user to be used by the application (the user will be created on both MySQL and ProxySQL) or use existing database users (the users will be created on ProxySQL only). Set whether you are using implicit transactions or not. Basically, if you do not use SET autocommit=0 to create new transactions, ClusterControl will configure read/write splitting.
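Under the hood, the read/write split ends up as ProxySQL query rules; a hedged sketch of what such rules could look like is shown below, with the hostgroup numbers and admin credentials being placeholders.

mysql -u admin -padmin -h 127.0.0.1 -P 6032 <<'EOF'
-- SELECT ... FOR UPDATE goes to the writer hostgroup (10), other SELECTs to readers (20)
INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
VALUES (100, 1, '^SELECT .* FOR UPDATE$', 10, 1),
       (200, 1, '^SELECT', 20, 1);
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;
EOF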

After ProxySQL has been deployed, it will be available under the Nodes tab:

Opening the ProxySQL node overview will present you with the ProxySQL monitoring and management interface, so there is no reason to log into ProxySQL on the node anymore. ClusterControl covers most of the important ProxySQL stats like memory utilization, query cache, query processor and so on, as well as other metrics like hostgroups, backend servers, query rule hits, top queries and ProxySQL variables. On the management side, you can manage the query rules, backend servers, users, configuration and scheduler right from the UI.

Check out our ProxySQL tutorial page, which covers extensively how to perform database load balancing for MySQL and MariaDB with ProxySQL.

Deploying Garbd

Galera implements a quorum-based algorithm to select a primary component through which it enforces consistency. The primary component needs to have a majority of votes (50% + 1 node), so in a two node system there would be no majority, resulting in split brain. Fortunately, it is possible to add garbd (Galera Arbitrator Daemon), a lightweight, stateless daemon that can act as the odd node. An added benefit of the Galera Arbitrator is that you can now get by with only two data nodes in your cluster.

If ClusterControl detects that your Galera cluster consists of an even number of nodes, you will be given the warning/advice by ClusterControl to extend the cluster to an odd number of nodes:

Choose wisely the host to deploy garbd on, as it will receive all replicated data. Make sure the network can handle the traffic and is secure enough. You could choose one of the HAProxy or ProxySQL hosts to deploy garbd on, like in the example below:

Take note that starting from ClusterControl 1.5.1, garbd cannot be installed on the same host as ClusterControl due to risk of package conflicts.

After installing garbd, you will see it appear next to your two Galera nodes:

Final thoughts

We showed you how to make your MySQL master-slave and Galera cluster setups more robust and retain high availability using HAProxy and ProxySQL. Also garbd is a nice daemon that can save the extra third node in your Galera cluster.

This finalizes the deployment side of ClusterControl. In our next blog, we will show you how to integrate ClusterControl within your organization by using groups and assigning certain roles to users.

A Guide to Using pgBouncer for PostgreSQL

When reading the PostgreSQL Getting Started documentation, you see the line: “The PostgreSQL server can handle multiple concurrent connections from clients. To achieve this, it starts (“forks”) a new process for each connection. From that point on, the client and the new server process communicate without intervention by the original postgres process. Thus, the master server process is always running, waiting for client connections, whereas client and associated server processes come and go.”

Brilliant idea. And yet it means that every new connection spins up a new process, reserving RAM and possibly getting too heavy with multiple sessions. To avoid problems, postgres has the max_connections setting, with a default of 100 connections. Of course you can increase it, but such a change requires a restart (pg_settings.context is ‘postmaster’):

t=# select name,setting,short_desc,context from pg_settings where name = 'max_connections';
-[ RECORD 1 ]--------------------------------------------------
name       | max_connections
setting    | 100
short_desc | Sets the maximum number of concurrent connections.
context    | postmaster

And even after increasing it, at some point you might need even more connections (and of course urgently, as always on a running production system). Why is increasing it so uncomfortable? Because if it were comfortable, you would probably end up with uncontrolled, spontaneous increases of the number until the cluster starts lagging. Meaning old connections get slower, so they take more time, so you need ever more new ones. To avoid such a possible avalanche and add some flexibility, we have superuser_reserved_connections, to be able to connect and fix problems as a superuser when max_connections is exhausted. And we obviously see the need for a connection pooler, as we want new connection candidates to wait in a queue instead of failing with the exception FATAL: sorry, too many clients already, and without risking the postmaster.
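For illustration, raising the limit takes an ALTER SYSTEM (or a postgresql.conf edit) plus a full restart; the service name below is the Debian/Ubuntu default and may differ on your system.

psql -U postgres -c "ALTER SYSTEM SET max_connections = 300;"
# The new value lands in postgresql.auto.conf but is not applied yet (9.5+ shows pending_restart):
psql -U postgres -c "SELECT name, setting, pending_restart FROM pg_settings WHERE name = 'max_connections';"
# Only a restart - not a reload - picks it up:
sudo systemctl restart postgresql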

Connection pooling is offered at some level by many popular “clients”. You have been able to use it with jdbc for quite a while. Recently node-postgres offered its own node-pg-pool. More or less the implementation is simple (as is the idea): the pooler opens connections towards the database and keeps them. A client connecting to the db only gets a “shared” existing connection, and after closing it, the connection goes back to the pool. We also have much more sophisticated software, like pgPool. And yet pgbouncer is an extremely popular choice for the task. Why? Because it does only the pooling part, but does it right. It’s free. It’s fairly simple to set up. And you see it recommended or used by most of the biggest service providers, e.g. citusdata, aws, heroku and other highly respected resources.

So let us look closer at what it can do and how you use it. In my setup I use the default pool_mode = transaction (in the [pgbouncer] section), which is a very popular choice. This way we not only queue the connections exceeding max_connections, but also reuse sessions without waiting for the previous connection to close:

[databases]
mon = host=1.1.1.1 port=5432 dbname=mon
mons = host=1.1.1.1 port=5432 dbname=mon pool_mode = session pool_size=2 max_db_connections=2
monst = host=1.1.1.1 port=5432 dbname=mon pool_mode = statement
[pgbouncer]
listen_addr = 1.1.1.1
listen_port = 6432
unix_socket_dir = /tmp
auth_file = /pg/pgbouncer/bnc_users.txt
auth_type = hba
auth_hba_file = /pg/pgbouncer/bnc_hba.conf
admin_users = root vao
pool_mode = transaction
server_reset_query = RESET ALL; --DEALLOCATE ALL; /* custom */
ignore_startup_parameters = extra_float_digits
application_name_add_host = 1
max_client_conn = 10000
autodb_idle_timeout = 3600
default_pool_size = 100
max_db_connections = 100
max_user_connections = 100
#server_reset_query_always = 1 #uncomment if you want older global behaviour

Short overview of the most popular settings and tips and tricks:

  • server_reset_query is very handy and important. In session pooling mode, it “wipes” previous session “artifacts”. Otherwise you would have problems with identical names for prepared statements, session settings affecting the next session, and so on. The default is DISCARD ALL, which “resets” all session state. Yet you can choose more sophisticated values, e.g., RESET ALL; DEALLOCATE ALL; to forget only SET SESSION and prepared statements, keeping TEMP tables and plans “shared”. Or the opposite - you might want to make prepared statements “global” from any session. Such a configuration is doable, though risky. You have to make pgbouncer reuse the session for all (thus making either a very small pool size or avalanching the sessions), which is not completely reliable. Anyway, it is a useful ability, especially in setups where you want client sessions to eventually (not immediately) change to the configured pooled session settings. A very important point here is the session pool mode. Before 1.6, this setting affected other pool modes as well, so if you relied on it, you need to use the new setting server_reset_query_always = 1. Probably at some point people will want server_reset_query to be even more flexible and configurable per db/user pair (and a client_reset_query instead). But as of this writing, March 2018, it’s not an option. The idea behind making this setting valid by default for session mode only was: if you share a connection at the transaction or statement level, you cannot rely on the session settings at all.

  • auth_type = hba. Before 1.7, the big problem with pgbouncer was the absence of host based authentication - a “postgres firewall”. Of course you still had it for the postgres cluster connection, but pgbouncer was “open” to any source. Now we can use the same hba.conf format to limit connections per host/db/user based on the connection network.

  • connect_query is not performed on every client “connection” to pgbouncer, but rather when pgbouncer connects to a Postgres instance. Thus you can’t use it for setting or overriding “default” settings. In session mode, sessions do not affect each other, and on disconnect the reset query discards everything - so you don’t need to mess with it. In transaction pooling mode, you would hope to use it to override settings erroneously set by other sessions, but it won’t work, alas. E.g. you want to share a prepared statement between “sessions” in transaction mode, so you set something like

    trns = dbname=mon pool_mode = transaction connect_query = 'do $$ begin raise warning $w$%$w$, $b$new connection$b$; end; $$; prepare s(int) as select $1;'

    and indeed - every new client sees the prepared statements (unless you left server_reset_query_always on, so pgbouncer discards them on commit). But if some client runs DISCARD s; in its session, it affects all clients on this connection, and new clients connecting to it won’t see the prepared statements anymore. But if you want to have some initial setting for postgres connections coming from pgbouncer, then this is the place.

  • application_name_add_host was added in 1.6 and has a similar limitation. It “puts” the client IP into application_name, so you can easily find the source of a bad query, but it is easily overridden by a simple set application_name TO ‘wasn’’t me’;. Still, you can “heal” this using views - follow this post to get the idea, or even use these short instructions. Basically the idea is that show clients; will show the client’s IP, so you can query it directly from the pgbouncer database on each select from pg_stat_activity, to check whether it was reset. But of course using a simple setting is much simpler and cosier, though it does not guarantee the result.

  • pool_mode can be specified as the default, per database and per user, making it very flexible. Mixing modes makes pgbouncer extremely effective for pooling (a small demo of how transaction pooling reuses server connections follows after this list). This is a powerful feature, but one has to be careful when using it. Often users use it without understanding the results, ending up with chaotic mixes of per transaction/per session/per user/per database/global settings that behave differently for the same user or database, due to the different pooling modes with pgbouncer. This is the box of matches you don’t give to children without supervision. Many other options are also configurable as defaults, per db and per user.

  • Please don’t take it literally, but you can “compare” the different sections of the ini with SET and ALTER: SET LOCAL affects transactions and is good to use when pool_mode=transaction, SET SESSION affects sessions and is safe to use when pool_mode=session, ALTER USER SET affects roles and will interfere with the [users] section of pgbouncer.ini, ALTER DATABASE SET affects databases and will interfere with the [databases] section of pgbouncer.ini, and ALTER SYSTEM SET or editing postgresql.conf globally affects defaults and is comparable in effect to the default section of pgbouncer.ini.

  • Once again - use pool mode responsibly. Prepared statements or session-wide settings will be a mess in transaction pooling mode, just as an SQL transaction makes no sense in statement pooling mode. Choose a suitable pooling mode for suitable connections. A good practice is creating roles with the idea that:

    • some will run only fast selects, and thus can share one session without transactions for hundreds of concurrent tiny, unimportant selects.
    • Some role members are safe for session level concurrency, and ALWAYS use transactions. Thus they can safely share several sessions for hundreds of concurrent transactions.
    • Some roles are just too messy or complicated to share their session with others. So you use session pooling mode for them to avoid errors on connection when all “slots” are already taken.
  • Don’t use it instead of HAProxy or some other load balancer. Despite the fact that pgbouncer has several configurable features addressing what a load balancer addresses, like dns_max_ttl, and you can set up a DNS configuration for it, most production environments use HAProxy or some other load balancer for HA. This is because HAProxy is really good at load balancing across live servers in a round-robin fashion, better than pgbouncer. Although pgbouncer is better for postgres connection pooling, it might be better to use one small daemon that perfectly performs one task, instead of a bigger one that does two tasks, but worse.

  • Configuration changes can be tricky. Some changes to pgbouncer.ini require a restart (listen_port and such), while others, such as admin_users, require a reload or SIGHUP. Changes inside auth_hba_file require a reload, while changes to auth_file do not.
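As promised above, a small sketch of transaction pooling in action, reusing the “mon” database entry and the listen address from the sample config; with pool_mode = transaction, many client connections are served by a handful of server backends.

# Open several short client connections through pgbouncer and count
# the distinct server backend PIDs they were served by.
for i in 1 2 3 4 5; do
  psql -h 1.1.1.1 -p 6432 -U vao -d mon -XAtc "SELECT pg_backend_pid();"
done | sort | uniq -c
# With pooling, the same backend PIDs are reused across client connections;
# connecting to Postgres directly would give a new PID for every psql call.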

The extremely short overview of settings above is limited by the format. I invite you to take a look at the complete list. Pgbouncer is the kind of software with a very small number of “boring settings” - they all have huge potential and are of amazing interest.

And lastly, moving from a short enthusiastic review to something where you might be less happy - the installation. The process is clearly described in this section of the documentation. The only option described is building from git sources. But everybody knows there are packages! Trying the two most popular:

sudo yum install pgbouncer
sudo apt-get install pgbouncer

can work. But sometimes you have to do an extra step. E.g., when no pgbouncer package is available, try this.

Or even:

sudo yum install pgbouncer
Loaded plugins: priorities, update-motd, upgrade-helper
amzn-main                                                                                                                    | 2.1 kB  00:00:00
amzn-updates                                                                                                                 | 2.5 kB  00:00:00
docker-ce-edge                                                                                                               | 2.9 kB  00:00:00
docker-ce-stable                                                                                                             | 2.9 kB  00:00:00
docker-ce-test                                                                                                               | 2.9 kB  00:00:00
pgdg10                                                                                                                       | 4.1 kB  00:00:00
pgdg95                                                                                                                       | 4.1 kB  00:00:00
pgdg96                                                                                                                       | 4.1 kB  00:00:00
pglogical                                                                                                                    | 3.0 kB  00:00:00
sensu                                                                                                                        | 2.5 kB  00:00:00
(1/3): pgdg96/x86_64/primary_db                                                                                              | 183 kB  00:00:00
(2/3): pgdg10/primary_db                                                                                                     | 151 kB  00:00:00
(3/3): pgdg95/x86_64/primary_db                                                                                              | 204 kB  00:00:00
50 packages excluded due to repository priority protections
Resolving Dependencies
--> Running transaction check
---> Package pgbouncer.x86_64 0:1.8.1-1.rhel6 will be installed
--> Processing Dependency: libevent2 >= 2.0 for package: pgbouncer-1.8.1-1.rhel6.x86_64
--> Processing Dependency: c-ares for package: pgbouncer-1.8.1-1.rhel6.x86_64
--> Processing Dependency: libcares.so.2()(64bit) for package: pgbouncer-1.8.1-1.rhel6.x86_64
--> Running transaction check
---> Package c-ares.x86_64 0:1.13.0-1.5.amzn1 will be installed
---> Package pgbouncer.x86_64 0:1.8.1-1.rhel6 will be installed
--> Processing Dependency: libevent2 >= 2.0 for package: pgbouncer-1.8.1-1.rhel6.x86_64
--> Finished Dependency Resolution
Error: Package: pgbouncer-1.8.1-1.rhel6.x86_64 (pgdg10)
           Requires: libevent2 >= 2.0
 You could try using --skip-broken to work around the problem
 You could try running: rpm -Va --nofiles --nodigest

Of course adding pgdg to /etc/yum.repos.d/ won’t help anymore. Neither will --skip-broken nor rpm -Va --nofiles --nodigest. A simple

sudo yum install libevent2
Loaded plugins: priorities, update-motd, upgrade-helper
50 packages excluded due to repository priority protections
No package libevent2 available.
Error: Nothing to do

would be too easy. So you have to build libevent2 yourself, bringing you back to the position where you have to compile things yourself - either pgbouncer or one of its dependencies.
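If you do end up compiling, a rough sketch of the usual dance follows; the libevent version and URLs are examples, and the exact pgbouncer build steps may differ per release, so check the respective docs.

# Build libevent 2.x from source (example version)
wget https://github.com/libevent/libevent/releases/download/release-2.1.8-stable/libevent-2.1.8-stable.tar.gz
tar xzf libevent-2.1.8-stable.tar.gz
(cd libevent-2.1.8-stable && ./configure && make && sudo make install)

# Then build pgbouncer itself from git
git clone https://github.com/pgbouncer/pgbouncer.git
cd pgbouncer
git submodule init && git submodule update
./autogen.sh && ./configure && make && sudo make install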

Again - digging too deep into the particularities of installation is out of scope. You should know that you have a good chance of installing it as a package.

Lastly - questions like “why doesn’t postgres offer a native session pooler?” come up over and over. There are even very fresh suggestions and thoughts on it. But so far the most popular approach here is using pgbouncer.

Webinar Replay: How to Design Open Source Databases for High Availability

Thanks for joining this week’s webinar on how to design open source databases for high availability with Ashraf Sharif, Senior Support Engineer at Severalnines. From discussing high availability concepts through to failover or switch over mechanisms, Ashraf covered all the need-to-know information when it comes to building highly available database infrastructures.

It’s been said that not designing for failure leads to failure; but what is the best way to design a database system from the ground up to withstand failure?

Designing open source databases for high availability can be a challenge as failures happen in many different ways, which sometimes go beyond imagination. This is one of the consequences of the complexity of today’s open source database environments.

At Severalnines we’re big fans of high availability databases and have seen our fair share of failure scenarios across the thousands of database deployment attempts that we come across every year.

In this webinar replay, we look at the different types of failures you might encounter and what mechanisms can be used to address them. We also look at some of the popular high availability solutions used today, and how they can help you achieve different levels of availability.

Watch the replay

Agenda

  • Why design for High Availability?
  • High availability concepts
    • CAP theorem
    • PACELC theorem
  • Trade offs
    • Deployment and operational cost
    • System complexity
    • Performance issues
    • Lock management
  • Architecting databases for failures
    • Capacity planning
    • Redundancy
    • Load balancing
    • Failover and switchover
    • Quorum and split brain
    • Fencing
    • Multi datacenter and multi-cloud setups
    • Recovery policy
  • High availability solutions
    • Database architecture determines Availability
    • Active-Standby failover solution with shared storage or DRBD
    • Master-slave replication
    • Master-master cluster
  • Failover and switchover mechanisms
    • Reverse proxy
    • Caching
    • Virtual IP address
    • Application connector

Watch the replay

Speaker

Ashraf Sharif is System Support Engineer at Severalnines. He was previously involved in the hosting world and the LAMP stack, where he worked as a principal consultant and head of a support team, and delivered clustering solutions for large websites in the South East Asia region. His professional interests are system scalability and high availability.

New Webinar: How to Measure Database Availability

Join us on April 24th for Part 2 of our database high availability webinar special!

In this session we will focus on how to measure database availability. It is notoriously hard to measure and report on, although it is an important KPI in any SLA between you and your customer. With that in mind, we will discuss the different factors that affect database availability and see how you can measure your database availability in a realistic way.

It is common enough to define availability in terms of 9s (e.g. 99.9% or 99.999%) - especially here at Severalnines - although there are often different opinions as to what these numbers actually mean, or how they are measured.

Is the database available if an instance is up and running, but it is unable to serve any requests? Or if response times are excessively long, so that users consider the service unusable? Is the impact of one longer outage the same as multiple shorter outages? How do partial outages affect database availability, where some users are unable to use the service while others are completely unaffected?

Not agreeing on precise definitions with your customers might lead to dissatisfaction. The database team might be reporting that they have met their availability goals, while the customer is dissatisfied with the service.

Join us for this webinar during which we will discuss the different factors that affect database availability and see how to measure database availability in a realistic way.

Register for the webinar

Date, Time & Registration

Europe/MEA/APAC

Tuesday, April 24th at 09:00 BST / 10:00 CEST (Germany, France, Sweden)

Register Now

North America/LatAm

Tuesday, April 24th at 09:00 PDT (US) / 12:00 EDT (US)

Register Now

Agenda

  • Defining availability targets
    • Critical business functions
    • Customer needs
    • Duration and frequency of downtime
    • Planned vs unplanned downtime
    • SLA
  • Measuring the database availability
    • Failover/Switchover time
    • Recovery time
    • Upgrade time
    • Query latency
    • Restoration time from backup
    • Service outage time
  • Instrumentation and tools to measure database availability:
    • Free & open-source tools
    • CC's Operational Report
    • Paid tools

Register for the webinar

Speaker

Bartlomiej Oles is a MySQL and Oracle DBA, with over 15 years experience in managing highly available production systems at IBM, Nordea Bank, Acxiom, Lufthansa, and other Fortune 500 companies. In the past five years, his focus has been on building and applying automation tools to manage multi-datacenter database environments.

A Guide to Pgpool for PostgreSQL - Part One

Pgpool is less relevant today than it used to be 10 years ago, when it was the default part of a production PostgreSQL setup. Often when somebody was talking about a PostgreSQL cluster, they were referring to PostgreSQL behind pgpool and not to the PostgreSQL instance itself (which is the right term). Pgpool is recognised among the most influential Postgres players: the postgresql community, commandprompt, 2ndquadrant, EDB, citusdata, postgrespro (ordered by age, not influence). I realize the level of recognition in my links is very different - I just want to emphasize the overall impact of pgpool in the postgres world. Some of the best-known current postgres “vendors” were founded after pgpool was already famous. So what makes it so famous?

Just the list of the most in-demand features it offers makes it look great:

  • native replication
  • connection pooling
  • load balancing for read scalability
  • high availability (watchdog with virtual IP, online recovery & failover)

Well, let’s make a sandbox and play. My sample setup is master-slave mode. I would assume it is the most popular today, because you typically use streaming replication together with load balancing. Replication mode is barely used these days. Most DBAs skip it in favour of streaming replication and pglogical, and previously of slony.

The replication mode has many interesting settings and certainly interesting functionality. But most DBAs have a master/multi-slave setup by the time they get to pgpool. So they are looking for automatic failover and a load balancer, and pgpool offers that out of the box for existing master/multi-slave environments. Not to mention that as of Postgres 9.4, streaming replication works with no major bugs, and from 10 replication of hash indexes is supported, so there is barely anything to stop you from using it. Also, streaming replication is asynchronous by default (configurable to synchronous, and even to complicated not-“linear” synchronization setups), while native pgpool replication is synchronous (which means slower data changes) with no alternative, and additional limitations apply. The pgpool manual itself suggests preferring streaming replication over pgpool’s native replication when possible. And so this is my choice here.

Ah, but first we need to install it - right?

Installation (of a newer version, on Ubuntu).

First, check the Ubuntu version with lsb_release -a. For me the repo is:

root@u:~# sudo add-apt-repository 'deb http://apt.postgresql.org/pub/repos/apt/ xenial-pgdg main'
wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | \
>   sudo apt-key add -
OK
root@u:~# sudo apt-get update

Lastly, the installation itself:

sudo apt-get install pgpool2=3.7.2-1.pgdg16.04+1

Config:

I use the default config for the recommended mode:

zcat /usr/share/doc/pgpool2/examples/pgpool.conf.sample-stream.gz > /etc/pgpool2/pgpool.conf

Starting:

If you missed the config step, you will see:

2018-03-22 13:52:53.284 GMT [13866] FATAL:  role "nobody" does not exist

Ah, true - my bad, but it’s easily fixable (doable blindly with a one-liner if you want the same user for all health checks and recovery):

root@u:~# sed -i s/'nobody'/'pgpool'/g /etc/pgpool2/pgpool.conf

And before we go any further, let’s create the database pgpool and the user pgpool in all clusters (in my sandbox they are master, failover and slave, so I need to run it on the master only):

t=# create database pgpool;
CREATE DATABASE
t=# create user pgpool;
CREATE ROLE

At last - starting:

postgres@u:~$ /usr/sbin/service pgpool2 start
postgres@u:~$ /usr/sbin/service pgpool2 status
pgpool2.service - pgpool-II
   Loaded: loaded (/lib/systemd/system/pgpool2.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2018-04-09 10:25:16 IST; 4h 14min ago
     Docs: man:pgpool(8)
  Process: 19231 ExecReload=/bin/kill -HUP $MAINPID (code=exited, status=0/SUCCESS)
 Main PID: 8770 (pgpool)
    Tasks: 10
   Memory: 5.5M
      CPU: 18.250s
   CGroup: /system.slice/pgpool2.service
           ├─ 7658 pgpool: wait for connection reques
           ├─ 7659 pgpool: wait for connection reques
           ├─ 7660 pgpool: wait for connection reques
           ├─ 8770 /usr/sbin/pgpool -n
           ├─ 8887 pgpool: PCP: wait for connection reques
           ├─ 8889 pgpool: health check process(0
           ├─ 8890 pgpool: health check process(1
           ├─ 8891 pgpool: health check process(2
           ├─19915 pgpool: postgres t ::1(58766) idl
           └─23730 pgpool: worker proces

Great - so we can proceed to the first feature: let’s check load balancing. It has some requirements to be used, supports hints (e.g. to balance within the same session), has black- and white-listed functions, and has a regular-expression-based redirect preference list. It is sophisticated. Alas, going thoroughly over all that functionality would be out of the scope of this blog, thus we will check the simplest demos:

First, something very simple will show which node is used for select (in my setup, master spins on 5400, slave on 5402 and failover on 5401, while pgpool itself is on 5433, as I have another cluster running and did not want to interfere with it):

vao@u:~$ psql -h localhost -p 5433 t -c "select current_setting('port') from ts limit 1"
 current_setting
-----------------
 5400
(1 row)

Then in loop:

vao@u:~$ (for i in $(seq 1 99); do psql -h localhost -p 5433 t -c "select current_setting('port') from ts limit 1" -XAt; done) | sort| uniq -c
      9 5400
     30 5401
     60 5402

Great. It definitely balances load between nodes, but it does not seem to balance equally - maybe it’s so smart it knows the weight of each statement? Let’s check the distribution against the expected results:

t=# show pool_nodes;
 node_id | hostname  | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay
---------+-----------+------+--------+-----------+---------+------------+-------------------+-------------------
 0       | localhost | 5400 | up     | 0.125000  | primary | 122        | false             | 0
 1       | localhost | 5401 | up     | 0.312500  | standby | 169        | false             | 0
 2       | localhost | 5402 | up     | 0.562500  | standby | 299        | true              | 0
(3 rows)

No - pgpool does not analyze the weight of statements - it was the DBA with her settings again! The settings (see the lb_weight attribute) reconcile with the actual query destination targets. You can easily change them (as we did here) by changing the corresponding setting, e.g.:

root@u:~$ grep weight /etc/pgpool2/pgpool.conf
backend_weight0 =0.2
backend_weight1 = 0.5
backend_weight2 = 0.9
root@u:~# sed -i s/'backend_weight2 = 0.9'/'backend_weight2 = 0.2'/ /etc/pgpool2/pgpool.conf
root@u:~# grep backend_weight2 /etc/pgpool2/pgpool.conf
backend_weight2 = 0.2
root@u:~# pgpool reload
root@u:~$ (for i in $(seq 1 9); do psql -h localhost -p 5433 t -c "select current_setting('port') from ts limit 1" -XAt; done) | sort| uniq -c
      6 5401
      3 5402

Great! The next great feature offered is connection pooling. With 3.5 the “thundering herd problem” is solved by serializing accept() calls, greatly speeding up “client connection” time. And yet this feature is pretty straightforward. It does not offer several levels of pooling or several pools configured for the same database (pgpool does let you choose where to run selects with the database_redirect_preference_list load balancing setting, though), or other flexible features offered by pgBouncer.

So, a short demo:

t=# select pid,usename,backend_type, state, left(query,33) from pg_stat_activity where usename='vao' and pid <> pg_backend_pid();
 pid  | usename |  backend_type  | state |     left
------+---------+----------------+-------+--------------
 8911 | vao     | client backend | idle  |  DISCARD ALL
 8901 | vao     | client backend | idle  |  DISCARD ALL
 7828 | vao     | client backend | idle  |  DISCARD ALL
 8966 | vao     | client backend | idle  |  DISCARD ALL
(4 rows)
Hm - did I set up this little number of children?
t=# pgpool show num_init_children;
 num_init_children
-------------------
 4
(1 row)

Ah, true, I changed it to something lower than the default 32, so the output would not take several pages. Well then, let’s try exceeding the number of sessions (below I open postgres sessions asynchronously in a loop, so the 6 sessions will be requested at more or less the same time):

vao@u:~$ for i in $(seq 1 6); do (psql -h localhost -p 5433 t -U vao -c "select pg_backend_pid(), pg_sleep(1), current_setting('port'), clock_timestamp()"&);  done
vao@u:~$  pg_backend_pid | pg_sleep | current_setting |        clock_timestamp
----------------+----------+-----------------+-------------------------------
           8904 |          | 5402            | 2018-04-10 12:46:55.626206+01
(1 row)

 pg_backend_pid | pg_sleep | current_setting |        clock_timestamp
----------------+----------+-----------------+-------------------------------
           9391 |          | 5401            | 2018-04-10 12:46:55.630175+01
(1 row)

 pg_backend_pid | pg_sleep | current_setting |       clock_timestamp
----------------+----------+-----------------+------------------------------
           8911 |          | 5400            | 2018-04-10 12:46:55.64933+01
(1 row)

 pg_backend_pid | pg_sleep | current_setting |        clock_timestamp
----------------+----------+-----------------+-------------------------------
           8904 |          | 5402            | 2018-04-10 12:46:56.629555+01
(1 row)

 pg_backend_pid | pg_sleep | current_setting |        clock_timestamp
----------------+----------+-----------------+-------------------------------
           9392 |          | 5402            | 2018-04-10 12:46:56.633092+01
(1 row)

 pg_backend_pid | pg_sleep | current_setting |       clock_timestamp
----------------+----------+-----------------+------------------------------
           8910 |          | 5402            | 2018-04-10 12:46:56.65543+01
(1 row)

It lets sessions in three at a time - expected, as one slot is taken by the session above (the one selecting from pg_stat_activity), so 4-1=3. As soon as pg_sleep finishes its one-second nap and the session is closed by postgres, the next one is let in. So after the first three end, the next three step in. What happens to the rest? They are queued until the next connection slot frees up. Then the process described for serialize_accept happens and the client gets connected.

Huh? Just session pooling in session mode? Is that all?.. No, here the caching steps in! Look:

postgres=# /*NO LOAD BALANCE*/ select 1;
 ?column?
----------
        1
(1 row)

Checking the pg_stat_activity:

postgres=# select pid, datname, state, left(query,33),state_change::time(0), now()::time(0) from pg_stat_activity where usename='vao' and query not like '%DISCARD%';
  pid  | datname  | state |               left                | state_change |   now
-------+----------+-------+-----------------------------------+--------------+----------
 15506 | postgres | idle  | /*NO LOAD BALANCE*/ select 1, now | 13:35:44     | 13:37:19
(1 row)

Then run the first statement again and observe that state_change does not change, which means you don't even reach the database to get the already-known result! Of course, if you put in a non-immutable function, results won't be cached. Experiment with:

postgres=# /*NO LOAD BALANCE*/ select 1, now();
 ?column? |             now
----------+------------------------------
        1 | 2018-04-10 13:35:44.41823+01
(1 row)

You will find that state_change changes as does the result.

Last point here - why /*NO LOAD BALANCE*/?.. To be sure we check pg_stat_activity on the master and run the query on the master as well. Similarly, you can use the /*NO QUERY CACHE*/ hint to avoid getting a cached result.
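
For completeness, the in-memory query cache itself is switched on in pgpool.conf. A minimal sketch with assumed values (sizes and expiry are illustrative, not the ones used in this demo):

memory_cache_enabled = on          # enable the in-memory query cache
memqcache_method = 'shmem'         # 'shmem' or 'memcached'
memqcache_total_size = 67108864    # 64MB of shared memory for cached results
memqcache_expire = 60              # lifetime of a cached result in seconds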

That is already a lot for a short review, isn't it? And we have not even touched the HA part, the feature many users look to pgpool for in the first place. Well, this is not the end of the story, only the end of part one. Part two is coming, where we will briefly cover HA and some other tips on using pgpool...

A Guide to Pgpool for PostgreSQL - Part Two


This is the second part of the blog "A Guide to Pgpool for PostgreSQL". The first part, covering load balancing, session pooling, the in-memory cache and installation, can be found here.

Many users look towards pgpool specifically for its High Availability features, and it has plenty to offer. There are already quite a lot of instructions for pgpool HA on the web (e.g. a longer one and a shorter one), so it would not make any sense to repeat them. Neither do we want to provide yet another blind set of configuration values. Instead I suggest playing against the rules and trying to do it the wrong way, so we'll see some interesting behaviour. One of the most anticipated features (at least it's at the top of the page) is the ability to recognise the usability of a "dead" ex-master and re-use it with pg_rewind. It could save hours of bringing back the new standby with big data (as we skip rsync or pg_basebackup, which effectively copy ALL files over from the new master). Strictly speaking, pg_rewind is meant for planned failover (during an upgrade or a migration to new hardware). But we've seen that it greatly helps with unplanned yet graceful shutdowns and automated failover - e.g., ClusterControl makes use of it when performing automatic failover of replication slaves. So let's assume we have this case: we need (any) master to be accessible as much as possible. If for some reason (network failure, max connections exceeded or any other "failure" that forbids new sessions to start) we can no longer use the master for RW operations, and we have a failover cluster configured with slaves that can accept connections, we can promote one of the slaves and fail over to it.

First let’s assume we have three nodes:

  • 10.1.10.124:5400 with /pg/10/m (pgpool spins here as well)
  • 10.1.10.147:5401 with /pg/10/m2
  • 10.1.10.124:5402 with /pg/10/s2

Those are effectively the same nodes as in part one, but the failover node is moved to a different host and $PGDATA. I did it to make sure I did not mistype or forget some extra quote in the remote ssh command. Also, the debugging info will look simpler because the IP addresses are different. Finally, I was not sure I would be able to make this unsupported use case work, so I had to see it with my own eyes.

Failover

First we set failover_command, run pgpool reload and try to fail over. In the command below, %D, %H and %R are pgpool placeholders for the detached node's data directory, the new master's hostname and the new master's data directory, respectively. Here and further on, I will echo some info to /tmp/d on the pgpool server, so I can tail -f /tmp/d to see the flow.

postgres@u:~$ grep failover_command /etc/pgpool2/pgpool.conf
failover_command = 'bash /pg/10/fo.sh %D %H %R'

postgres@u:~$ cat /pg/10/fo.sh
rem_cmd="pg_ctl -D $3 promote"
cmd="ssh -T postgres@$2 $rem_cmd"
echo "$(date) $cmd">>/tmp/d
$cmd &>>/tmp/d

NB: Do you have $PATH set in .bashrc on remote host?..

Let's stop the master (I know it's not how disaster strikes - you expect at least some huge monkey or shiny red robot to smash the server with a big hammer, or at least the boring hard disks to die - but I'm using this graceful variant to demo the possible use of pg_rewind; here the failover will be the result of a human error or of a network failure lasting half a second longer than the health_check_period), so:

/usr/lib/postgresql/10/bin/pg_ctl -D /pg/10/m stop
2018-04-18 13:53:55.469 IST [27433]  LOG:  received fast shutdown request
waiting for server to shut down....2018-04-18 13:53:55.478 IST [27433]  LOG:  aborting any active transactions
2018-04-18 13:53:55.479 IST [28855] postgres t FATAL:  terminating connection due to administrator command
2018-04-18 13:53:55.483 IST [27433]  LOG:  worker process: logical replication launcher (PID 27440) exited with exit code 1
2018-04-18 13:53:55.484 IST [27435]  LOG:  shutting down
2018-04-18 13:53:55.521 IST [27433]  LOG:  database system is shut down
 done
server stopped

Now checking the failover command output:

postgres@u:~$ cat /tmp/d
Wed Apr 18 13:54:05 IST 2018 ssh -T postgres@localhost
pg_ctl -D /pg/10/f promote
waiting for server to promote.... done
server promoted

And checking after a while:

t=# select nid,port,st, role from dblink('host=localhost port=5433','show pool_nodes') as t (nid int,hostname text,port int,st text,lb_weight float,role text,cnt int,cur_node text,del int);
 nid | port |  st  |  role
-----+------+------+---------
   0 | 5400 | down | standby
   1 | 5401 | up   | primary
   2 | 5402 | up   | standby
(3 rows)

We also see this in the logs of the ex-failover (now promoted) cluster:

2018-04-13 14:26:20.823 IST [20713]  LOG:  received promote request
2018-04-13 14:26:20.823 IST [20713]  LOG:  redo done at 0/951EC20
2018-04-13 14:26:20.823 IST [20713]  LOG:  last completed transaction was at log time 2018-04-13 10:41:54.355274+01
2018-04-13 14:26:20.872 IST [20713]  LOG:  selected new timeline ID: 2
2018-04-13 14:26:20.966 IST [20713]  LOG:  archive recovery complete
2018-04-13 14:26:20.998 IST [20712]  LOG:  database system is ready to accept connections

Checking replication:

postgres@u:~$ psql -p 5401 t -c "select now() into test"
SELECT 1
postgres@u:~$ psql -p 5402 t -c "select * from test"
              now
-------------------------------
 2018-04-13 14:33:19.569245+01
(1 row)

The slave /pg/10/s2:5402 switched to the new timeline thanks to recovery_target_timeline = latest in recovery.conf, so we are good. We don't need to adjust recovery.conf to point to the new master, because it points to the pgpool IP and port, and they stay the same no matter which node is performing the primary role.
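
In other words, each standby's recovery.conf can look roughly like this (a sketch illustrating the claim above; the postgres user and the trigger file path are assumptions of this sandbox):

standby_mode              = 'on'
primary_conninfo          = 'host=10.1.10.124 port=5433 user=postgres'
trigger_file              = '/tmp/tg_file'
recovery_target_timeline  = latest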

Checking load balancing:

postgres@u:~$ (for i in $(seq 1 9); do psql -h localhost -p 5433 t -c "select current_setting('port') from ts limit 1" -XAt; done) | sort| uniq -c
      6 5401
      3 5402

Nice. Apps behind pgpool will notice only a momentary outage and continue to work.

Reusing ex-master

Now we can turn the ex-master into a failover standby and bring it back (without adding a new node to pgpool, as it exists there already). If you don't have wal_log_hints or data checksums enabled (a comprehensive comparison of these options is here), you have to recreate the cluster on the ex-master to follow the new timeline:

postgres@u:~$ rm -fr /pg/10/m
postgres@u:~$ pg_basebackup -h localhost -p 5401 -D /pg/10/m/

But don't rush to run the statements above! If you took care of wal_log_hints (it requires a restart), you can try using pg_rewind for a much faster switch of the ex-master to a new slave.
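
If you are not sure whether the ex-master meets those prerequisites, pg_controldata can tell you even while the cluster is stopped (a sketch; in this sandbox wal_log_hints is on and data checksums are disabled):

postgres@u:~$ /usr/lib/postgresql/10/bin/pg_controldata /pg/10/m | grep -iE 'wal_log_hints|checksum'
wal_log_hints setting:                on
Data page checksum version:           0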

So at the moment we have the ex-master offline and the new master started on the next timeline. If the ex-master went offline due to a temporary network failure and it comes back, we need to shut it down first. In the case above we know it's down, so we can just try rewinding:

postgres@u:~$ pg_rewind -D /pg/10/m2 --source-server="port=5401 host=10.1.10.147"
servers diverged at WAL location 0/40605C0 on timeline 2
rewinding from last common checkpoint at 0/4060550 on timeline 2
Done!

And again:

postgres@u:~$ pg_ctl -D /pg/10/m2 start
server started
...blah blah 
postgres@u:~$ 2018-04-16 12:08:50.303 IST [24699]  LOG:  started streaming WAL from primary at 0/B000000 on timeline 2

t=# select nid,port,st,role from dblink('host=localhost port=5433','show pool_nodes') as t (nid int,hostname text,port int,st text,lb_weight float,role text,cnt int,cur_node text,del int);
 nid | port |  st  |  role
-----+------+------+---------
   0 | 5400 | down | standby
   1 | 5401 | up   | primary
   2 | 5402 | up   | standby
(3 rows)

Oops. Despite the fact that the cluster at port 5400 is online and follows the new timeline, we need to tell pgpool to recognize it:

postgres@u:~$ pcp_attach_node -w -h 127.0.0.1 -U vao -n 0
 pcp_attach_node  -- Command Successful

Now all three are up (and pgpool knows it) and in sync:

postgres@u:~$ sql="select ts.i::timestamp(0), current_setting('data_directory'),case when pg_is_in_recovery() then 'recovering' else 'mastering' end stream from ts order by ts desc"
postgres@u:~$ psql -h 10.1.10.147 -p 5401 t -c "$sql";
          i          | current_setting |  stream
---------------------+-----------------+-----------
 2018-04-30 14:34:36 | /pg/10/m2       | mastering
(1 row)

postgres@u:~$ psql -h 10.1.10.124 -p 5402 t -c "$sql";
          i          | current_setting |   stream
---------------------+-----------------+------------
 2018-04-30 14:34:36 | /pg/10/s2       | recovering
(1 row)

postgres@u:~$ psql -h 10.1.10.124 -p 5400 t -c "$sql";
          i          | current_setting |   stream
---------------------+-----------------+------------
 2018-04-30 14:34:36 | /pg/10/m        | recovering
(1 row)

Now I'll try using recovery_1st_stage_command for reusing the ex-master:

root@u:~# grep 1st /etc/pgpool2/pgpool.conf
recovery_1st_stage_command = 'or_1st.sh'

But recovery_1st_stage_command does not offer the arguments needed for pg_rewind, which I can see if I add this to the script:

echo "online recovery started on $(hostname) $(date --iso-8601) $0 $1 $2 $3 $4"; exit 1;

The output:

online recovery started on u2 2018-04-30 /pg/10/m2/or_1st.sh /pg/10/m2 10.1.10.124 /pg/10/m 5401

Well - using pg_rewind is still only on the todo list - what did I expect?.. So I need to do some monkey hack to get the master IP and port (remember, they will keep changing after each failover).

A monkey hack

So I have something like this in recovery_1st_stage_command:

root@u:~# cat /pg/10/or_1st.sh
pgpool_host=10.1.10.124
pgpool_port=5433
echo "online recovery started on $(hostname) $(date --iso-8601) $0 $1 $2 $3 $4" | ssh -T $pgpool_host "cat >> /tmp/d"
master_port=$(psql -XAt -h $pgpool_host -p $pgpool_port t -c "select port from dblink('host=$pgpool_host port=$pgpool_port','show pool_nodes') as t (nid int,hostname text,port int,st text,lb_weight float,role text,cnt int,cur_node text,del int) where role='primary'")
master_host=$(psql -XAt -h $pgpool_host -p $pgpool_port t -c "select hostname from dblink('host=$pgpool_host port=$pgpool_port','show pool_nodes') as t (nid int,hostname text,port int,st text,lb_weight float,role text,cnt int,cur_node text,del int) where role='primary'")
failover_host=$(psql -XAt -h $pgpool_host -p $pgpool_port t -c "select hostname from dblink('host=$pgpool_host port=$pgpool_port','show pool_nodes') as t (nid int,hostname text,port int,st text,lb_weight float,role text,cnt int,cur_node text,del int) where role!='primary' order by port limit 1")
src='"port=$master_port host=$master_host"'
rem_cmd="'pg_rewind -D $3 --source-server=\"port=$master_port host=$master_host\"'"
cmd="ssh -T $failover_host $rem_cmd"
echo $cmd | ssh -T $pgpool_host "cat >> /tmp/d"
$cmd

tmp=/tmp/rec_file_tmp
cat > $tmp <<EOF
standby_mode          = 'on'
primary_conninfo      = 'host=$master_host port=$master_port user=postgres'
trigger_file = '/tmp/tg_file'
recovery_target_timeline  = latest
EOF

scp $tmp $failover_host:$3/recovery.conf

rem_cmd="pg_ctl -D $3 start"
cmd="ssh -T $failover_host $rem_cmd"
echo $cmd | ssh -T $pgpool_host "cat >> /tmp/d"
$cmd
echo "OR finished $(date --iso-8601)" | ssh -T $pgpool_host "cat >> /tmp/d"
exit 0;

Now what a mess! Well - if you decide to use a feature that does not exist yet, be prepared: it will look bad, work worse and you will permanently feel ashamed of what you did. So, step by step:

  • I need the pgpool IP and port to connect to it remotely, both to query "show pool_nodes", to log the steps and to run commands.
  • I'm piping some debug info to /tmp/d over ssh, because the command will be executed on the master side, which will change after failing over.
  • I can use the result of "show pool_nodes" to get the running master's connection info simply by filtering with a WHERE clause.
  • I need double quotes in the argument for pg_rewind, which has to run over ssh, so I just split the command for readability, then echo it and run it.
  • I prepare recovery.conf based on the output of "show pool_nodes" (writing it, I ask myself - why did I not just use the pgpool IP and port instead?..).
  • I start the new failover slave (I know I'm supposed to use the 2nd stage - I just skipped it to avoid fetching all the IPs and ports over again).

Now what's left is trying to use this mess with pcp:

root@u:~# pcp_recovery_node -h 127.0.0.1 -U vao -n 0 -w
pcp_recovery_node -- Command Successful
root@u:~# psql -h localhost -p 5433 t -c"select nid,port,st,role from dblink('host=10.1.10.124 port=5433','show pool_nodes') as t (nid int,hostname text,port int,st text,lb_weight float,role text,cnt int,cur_node text,del int)"
 nid | port | st |  role
-----+------+----+---------
   0 | 5400 | up | standby
   1 | 5401 | up | primary
   2 | 5402 | up | standby
(3 rows)

Checking the /tmp/d on pgpool server:

root@u:~# cat /tmp/d
Tue May  1 11:37:59 IST 2018 ssh -T postgres@10.1.10.147 /usr/lib/postgresql/10/bin/pg_ctl -D /pg/10/m2 promote
waiting for server to promote.... done
server promoted
online recovery started on u2 2018-05-01 /pg/10/m2/or_1st.sh /pg/10/m2
ssh -T 10.1.10.124 'pg_rewind -D --source-server="port=5401 host=10.1.10.147"'
ssh -T 10.1.10.124 pg_ctl -D start
OR finished 2018-05-01

Now obviously we want to roll it over again to see if it works on any host:

postgres@u:~$ ssh -T 10.1.10.147 pg_ctl -D /pg/10/m2 stop
waiting for server to shut down.... done
server stopped
postgres@u:~$ psql -h localhost -p 5433 t -c"select nid,port,st,role from dblink('host=10.1.10.124 port=5433','show pool_nodes') as t (nid int,hostname text,port int,st text,lb_weight float,role text,cnt int,cur_node text,del int)"
 nid | port |  st  |  role
-----+------+------+---------
   0 | 5400 | up   | primary
   1 | 5401 | down | standby
   2 | 5402 | up   | standby
(3 rows)

root@u:~# pcp_recovery_node -h 127.0.0.1 -U vao -n 1 -w

postgres@u:~$ psql -h localhost -p 5433 t -c"select nid,port,st,role from dblink('host=10.1.10.124 port=5433','show pool_nodes') as t (nid int,hostname text,port int,st text,lb_weight float,role text,cnt int,cur_node text,del int)"
 nid | port | st |  role
-----+------+----+---------
   0 | 5400 | up | primary
   1 | 5401 | up | standby
   2 | 5402 | up | standby
(3 rows)

The log looks similar - only the IPs and ports have changed:

 Tue May  1 11:44:01 IST 2018 ssh -T postgres@10.1.10.124 /usr/lib/postgresql/10/bin/pg_ctl -D /pg/10/m promote
waiting for server to promote.... done
server promoted
online recovery started on u 2018-05-01 /pg/10/m/or_1st.sh /pg/10/m 10.1.10.147 /pg/10/m2 5400
ssh -T 10.1.10.147 'pg_rewind -D /pg/10/m2 --source-server="port=5400 host=10.1.10.124"'
ssh -T 10.1.10.147 pg_ctl -D /pg/10/m2 start
online recovery started on u 2018-05-01 /pg/10/m/or_1st.sh /pg/10/m
ssh -T 10.1.10.147 'pg_rewind -D --source-server="port=5400 host=10.1.10.124"'
ssh -T 10.1.10.147 pg_ctl -D start
OR finished 2018-05-01

In this sandbox, the master moved to 5401 on failover and, after living there for a while, moved back to 5400. Using pg_rewind should make this as fast as possible. Previously, the scary part of automatic failover was that if you really messed up the config and did not foresee some force majeure, you could run into automatic failover to the next slave, and the next, and the next, until no free slave was left. After that, you would just end up with several split-brained masters and no failover spare. It's a poor consolation in such a scenario to have even more slaves to fail over to, but without pg_rewind you would not have even that. "Traditional" rsync or pg_basebackup copies ALL of $PGDATA over to create a standby, and cannot reuse the "not too different" ex-master.

In conclusion to this experiment I would like to emphasize once again - this is not a solution suitable for blind copy-pasting. The usage of pg_rewind with pgpool is not encouraged; it is not usable at all at the moment. I wanted to bring some fresh air to pgpool HA configuration, for newbies like me to observe a little more closely how it works, and for the experts to smile at the naive approach and maybe see it through our newbie eyes.


A Guide to PGpool Part Three - Hints & Observations


In the previous part I dared to play with a not-yet-implemented feature, fantasising about how it would work. Well, HA is in the first place a matter of design, and only then of implementation. That does not excuse a bad implementation, nor does it make naive design look smart. Yet after you have covered all possible scenarios and found an adequately good rule for most cases, sometimes a very small, primitive change can ruin the stronghold. Below I want to sandbox such a case.

What Happens When pgpool Should Failover, But Can’t?

When the health check fails for the master, failover_command is fired to degenerate the failed node and promote the next slave to primary. Sounds solid. But what if the failover itself fails, e.g. the ssh connection fails (because some other, careless admin removed the key from ~/.ssh/authorized_keys)? What do we have then?

As soon as health_check_timeout (default 20 seconds) runs out (also affected by the retry delay, max retries and so on), the node is marked as dead, so:

t=# select nid,port,st from dblink('host=localhost port=5433','show pool_nodes') as t (nid int,hostname text,port int,st text,lb_weight float,role text,cnt int,cur_node text,del int);
 nid | port |  st
-----+------+------
   0 | 5400 | down
   1 | 5401 | up
   2 | 5402 | up
(3 rows)

So no retries are left and the failover has failed. The first option, obviously, is doing the failover manually. But if the failover failed because of some silly error, the master is back on the rails, and the only problem you have is pgpool thinking the master is offline - you would probably want to leave things as they were before the accident instead, right? Of course, just moving the master back online is not enough. Pgpool has already "degenerated" the primary. Simply adding it as a new node will not help either. The worst thing is that, after the event, pgpool will not try to check whether the old master is pg_is_in_recovery() or not, and thus will never accept it as the primary. According to the bug tracker, you have to "Discard pgpool_status file and do not restore previous status" with the pgpool -D command.
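
On my Ubuntu test box that boils down to something like this (a sketch; paths and service management differ between installations):

root@u:~# pgpool stop
root@u:~# pgpool -D    # start pgpool discarding pgpool_status, i.e. without restoring the previous backend status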

After discarding the status, we reconnect (to avoid the "server closed the connection unexpectedly" error) and run:

t=# select nid,port,st,role from dblink('host=localhost port=5433','show pool_nodes') as t (nid int,hostname text,port int,st text,lb_weight float,role text,cnt int,cur_node text,del int);
 nid | port | st |  role
-----+------+----+---------
   0 | 5400 | up | primary
   1 | 5401 | up | standby
   2 | 5402 | up | standby
(3 rows)

All nodes are back up and running, pgpool recognises the master.

Finally I want to cover some hints and observations on using pgpool:

  • Changing backend settings is a little tricky: hostname, port and directory require a reload when adding new nodes, but a restart when editing existing ones, while weight and flag can be altered with just a reload.

  • Don't confuse the load_balance_node column values with configuration. If you see just one node with true, it's not only OK - it's meant to be so. It does not mean you have only one node in the balancing pool - it just shows which node is chosen for this particular session. Below is a query result with all three nodes participating in SELECT statement balancing, with node id 2 chosen:

    t=# show pool_nodes;
     node_id | hostname  | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay
    ---------+-----------+------+--------+-----------+---------+------------+-------------------+-------------------
     0       | localhost | 5400 | up     | 0.125000  | primary | 61         | false             | 0
     1       | localhost | 5401 | up     | 0.312500  | standby | 8          | false             | 0
     2       | localhost | 5402 | up     | 0.562500  | standby | 11         | true              | 0
    (3 rows)
  • You can check which node was chosen for load balancing with show pool_nodes, but you care about which node was used for your query, not for the "show" one, so such a check is not always informative enough. You can, however, monitor which node serves the current query with something like:

    t=# select *,current_setting('port') from now();
                  now              | current_setting
    -------------------------------+-----------------
     2018-04-09 13:56:17.501779+01 | 5401
    (1 row)

Important! But not like this:

t=# select now, setting from now() join pg_settings on name='port';
             now             | setting
-----------------------------+---------
 2018-04-09 13:57:17.5229+01 | 5400
(1 row)

As it will ALWAYS return the master's port. The same applies to any SELECT against pg_catalog.

  • As you noticed in the previous parts, I use a more complicated way than just show pool_nodes to list nodes with their state. I do it deliberately to demonstrate how you can make the result manageable. Using WHERE makes the query longer, but the result clearer, skipping everything that distracts attention from our particular task. Compare:

t=# select nid,port,st,role from dblink('host=localhost port=5433','show pool_nodes') as t (nid int,hostname text,port int,st text,lb_weight float,role text,cnt int,cur_node text,del int);
 nid | port | st |  role
-----+------+----+---------
   0 | 5400 | up | primary
   1 | 5401 | up | standby
   2 | 5402 | up | standby

With the output of the original show pool_nodes...

  • You can't really compare pgBouncer and pgpool. But if you do, the most important thing to know is that query parsing in pgpool depends on the PostgreSQL version. So when upgrading PostgreSQL, you need to upgrade pgpool as well, while one pgBouncer instance can hold the configuration for clusters running PostgreSQL 8, 9 and 10 in the same ini file.

  • Why can't I just use a failover script instead of pgpool? You can. But pgpool offers failover ALONG with memcached-backed caching, connection pooling, load balancing and split-brain control, and it is proven by well over a decade of usage.

  • A bug tracking system is in place - it is worth visiting if you work with pgpool: https://www.pgpool.net/mantisbt/my_view_page.php

  • Numerous typos in the documentation, like bakance (backend + balance?..), statemnet, allowd, or mismatches across versions (pool_nodes used to be int and is now an enum, but the link to the old values in pcp_node-info is still there), spoil the impression of this wonderful product. A form to report a "bug" found in the documentation (just like "submit correction" in the postgres docs) would greatly improve things, though.

  • Important tip: before relying on any step, check it. E.g., after promoting a node you can't re-promote it (here promoting is not a postgres operation, but rather the registration of the node as master with pgpool):

    root@u:~# sudo -u postgres pcp_promote_node -w -h 127.0.0.1 -U vao -n 1
    pcp_promote_node -- Command Successful
    root@u:~# sudo -u postgres pcp_promote_node -w -h 127.0.0.1 -U vao -n 1
    FATAL:  invalid pgpool mode for process recovery request
    DETAIL:  specified node is already primary node, can't promote node id 1

Sounds logical and looks great. Yet, if you run this against the wrong node (e.g. node 0, for which pg_is_in_recovery() is false):

root@u:~# for i in $(seq 1 3); do pcp_promote_node -w -h 127.0.0.1 -U vao -n 0; echo $?; done
pcp_promote_node -- Command Successful
0
pcp_promote_node -- Command Successful
0
pcp_promote_node -- Command Successful
0

Which is bad, because you can re-promote the node again and again while expecting an error, yet you keep getting exit status 0...


Important tip: Don’t play too much. Never play on prod!

While playing with recovery_1st_stage_command using pg_rewind, out of curiosity I thought I would try another monkey hack - querying pgpool_recovery() without arguments (as I ignore them in my setup anyway) and then just trying to attach the node to pgpool:

root@u:~# psql -p 5433 -h localhost template1 -c "SELECT pgpool_recovery('or_1st.sh', '', '', '')"
 pgpool_recovery
-----------------
 t
(1 row)

root@u:~# pcp_attach_node -h 127.0.0.1 -U vao -w -n 1
pcp_attach_node -- Command Successful

This stupid idea brought me to:

root@u:~# ps -aef | grep pgpool
postgres 15227     1  0 11:22 ?        00:00:00 pgpool -D
postgres 15240 15227  0 11:22 ?        00:00:00 pgpool: health check process(0)
postgres 15241 15227  0 11:22 ?        00:00:00 pgpool: health check process(1)
postgres 15242 15227  0 11:22 ?        00:00:00 pgpool: health check process(2)
postgres 15648 15227  0 11:24 ?        00:00:00 [pgpool] <defunct>
postgres 16264 15227  0 11:26 ?        00:00:00 pgpool: PCP: wait for connection request
postgres 16266 15227  0 11:26 ?        00:00:00 [pgpool] <defunct>
postgres 16506 16264  0 11:26 ?        00:00:00 pgpool: PCP: processing recovery request
postgres 16560 15227  0 11:26 ?        00:00:00 [pgpool] <defunct>
postgres 16835 15227  0 11:26 ?        00:00:00 [pgpool] <defunct>
postgres 16836 15227  0 11:26 ?        00:00:00 [pgpool] <defunct>

There was no escape - I had to:

root@u:~# kill -9 
root@u:~# rm /var/run/pgpoolql/.s.PGSQL.5433
root@u:~# rm /var/run/pgpoolql/.s.PGSQL.9898

Above, 5433 is the pgpool port and 9898 is the pcp port. Obviously, after a crash the socket files are not swept away, so you have to remove them manually.

  • Read carefully and play a lot before taking pgpool to production. It's much harder to find help with pgpool than with postgres itself. Some questions are never answered, especially when asked in the wrong place (I answered that one based on the right place to get the answer)...
  • Don't forget the latest timeline for cascading replication (not really a pgpool hint, but people often don't understand that in order to pick up a new master it is not enough to specify the right endpoint for the receiver).
  • The architecture, with a diagram, can be found here.

Conclusion

In 10 years, new promising features (watchdog and virtual IP) and important fixes (e.g. serialize_accept) have appeared, but overall it leaves an undervalued impression. The docs have typos that have lived there for 10 years. I don't believe that no one reads the docs. I don't believe that no one noticed. There is simply no easy way to report them. There are plenty of loaded guns lying prepared on the documentation site for the novice user to pick up, point at their foot and pull the trigger. I have no reasonable idea how to improve this - I'm just warning the shooters. Misinterpreting one parameter can throw you into a desperate position of reverse engineering to find your mistake. All these years pgpool has been, and remains, a product for advanced users. Reading the documentation I could not help recalling the old Russian joke about Sherlock Holmes: Sherlock and Watson fly in a balloon. Suddenly a strong wind blows them thousands of miles away. When they land, they see a girl grazing sheep. Holmes asks the girl: "Darling, where are we?" and the girl replies "You are in a balloon!". Sherlock thanks her and, as they take off, says "The wind took us very far - we are in Russia". "But how do you know?" Watson asks. "It's obvious - only in Russia do coders graze sheep," Sherlock replies. "But how do you know the girl is a coder?" - "It's obvious - she gave us an absolutely precise and totally useless answer".

How to Cluster Your ProxySQL Load Balancers


A proxy layer between applications and databases typically consists of multiple proxy nodes for high availability. This is no different for ProxySQL. ProxySQL, just like other modern proxies, can be used to build complex logic for routing queries. You can add query rules to send queries to particular hosts, you can cache queries, you can add and remove backend servers, or manage users that are allowed to connect to ProxySQL and MySQL. However, having numerous ProxySQL nodes in the proxy layer introduces another problem - synchronization across distributed instances. Any rules or other logic need to be synchronized across instances to ensure they behave in the same way. Even if not all of the proxies are handling traffic, they still work as standbys. If they ever need to take over the work, you don't want any surprises because the instance taking over does not have the most recent configuration changes.

It is quite cumbersome to ensure this manually - to make the changes by hand on all of the nodes. You can utilize tools like Ansible, Chef or Puppet to manage configurations, but the sync process has to be coded and tested. ClusterControl can help you here through an option to sync configurations between ProxySQL instances, and it can also set up and manage the other components required for high availability, e.g., Virtual IP. In addition, starting from version 1.4.2, ProxySQL offers a native clustering and configuration syncing mechanism. In this blog post, we will discuss how to set it up with a mix of actions taken in ClusterControl and the ProxySQL command line admin interface.

First of all, let’s take a look at a typical replication environment deployed by ClusterControl.

As you can see from the screenshot, this is a MySQL replication setup with three ProxySQL instances. ProxySQL high availability is implemented through Keepalived and a Virtual IP that is always assigned to one of the ProxySQL nodes. There are a couple of steps we have to take in order to configure ProxySQL clustering. First, we have to define which user ProxySQL should use to exchange information between the nodes. Let's define a new one on top of the existing administrative user:

Next, we need to define that user in the admin-cluster_username and admin-cluster_password settings.
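
For reference, doing the same directly from the ProxySQL admin interface would look roughly like this (a sketch; 'cluster_admin' and 'clusterpass' are made-up credentials, and we assume the default admin:admin account is still in place):

mysql> SET admin-admin_credentials='admin:admin;cluster_admin:clusterpass';
mysql> SET admin-cluster_username='cluster_admin';
mysql> SET admin-cluster_password='clusterpass';
mysql> LOAD ADMIN VARIABLES TO RUNTIME;
mysql> SAVE ADMIN VARIABLES TO DISK;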

This has been done on just one of the nodes (10.0.0.126). Let’s sync this configuration change to the remaining ProxySQL nodes.

As we stated, ClusterControl allows you to synchronize configuration between ProxySQL nodes with just a couple of steps. When the job that synced 10.0.0.127 with 10.0.0.126 has ended, there is just one last node left to sync.

After this, we need to make a small change in the ProxySQL administrative command line interface, which is typically reachable on port 6032. We have to create entries in the ‘proxysql_servers’ table which would define the nodes in our ProxySQL cluster.

mysql> INSERT INTO proxysql_servers (hostname) VALUES ('10.0.0.126'), ('10.0.0.127'), ('10.0.0.128');
Query OK, 3 rows affected (0.00 sec)
mysql> LOAD PROXYSQL SERVERS TO RUNTIME;
Query OK, 0 rows affected (0.01 sec)
mysql> SAVE PROXYSQL SERVERS TO DISK;
Query OK, 0 rows affected (0.01 sec)

After loading the change to runtime, ProxySQL should start syncing the nodes. There are a couple of places where you can track the state of the cluster.

mysql> SELECT * FROM stats_proxysql_servers_checksums;
+------------+------+-------------------+---------+------------+--------------------+------------+------------+------------+
| hostname   | port | name              | version | epoch      | checksum           | changed_at | updated_at | diff_check |
+------------+------+-------------------+---------+------------+--------------------+------------+------------+------------+
| 10.0.0.128 | 6032 | admin_variables   | 0       | 0          |                    | 0          | 1539773916 | 0          |
| 10.0.0.128 | 6032 | mysql_query_rules | 2       | 1539772933 | 0x3FEC69A5C9D96848 | 1539773546 | 1539773916 | 0          |
| 10.0.0.128 | 6032 | mysql_servers     | 4       | 1539772933 | 0x3659DCF3E53498A0 | 1539773546 | 1539773916 | 0          |
| 10.0.0.128 | 6032 | mysql_users       | 2       | 1539772933 | 0xDD5F0BB01235E930 | 1539773546 | 1539773916 | 0          |
| 10.0.0.128 | 6032 | mysql_variables   | 0       | 0          |                    | 0          | 1539773916 | 0          |
| 10.0.0.128 | 6032 | proxysql_servers  | 2       | 1539773835 | 0x8EB13E2B48C3FDB0 | 1539773835 | 1539773916 | 0          |
| 10.0.0.127 | 6032 | admin_variables   | 0       | 0          |                    | 0          | 1539773916 | 0          |
| 10.0.0.127 | 6032 | mysql_query_rules | 3       | 1539773719 | 0x3FEC69A5C9D96848 | 1539773546 | 1539773916 | 0          |
| 10.0.0.127 | 6032 | mysql_servers     | 5       | 1539773719 | 0x3659DCF3E53498A0 | 1539773546 | 1539773916 | 0          |
| 10.0.0.127 | 6032 | mysql_users       | 3       | 1539773719 | 0xDD5F0BB01235E930 | 1539773546 | 1539773916 | 0          |
| 10.0.0.127 | 6032 | mysql_variables   | 0       | 0          |                    | 0          | 1539773916 | 0          |
| 10.0.0.127 | 6032 | proxysql_servers  | 2       | 1539773812 | 0x8EB13E2B48C3FDB0 | 1539773813 | 1539773916 | 0          |
| 10.0.0.126 | 6032 | admin_variables   | 0       | 0          |                    | 0          | 1539773916 | 0          |
| 10.0.0.126 | 6032 | mysql_query_rules | 1       | 1539770578 | 0x3FEC69A5C9D96848 | 1539773546 | 1539773916 | 0          |
| 10.0.0.126 | 6032 | mysql_servers     | 3       | 1539771053 | 0x3659DCF3E53498A0 | 1539773546 | 1539773916 | 0          |
| 10.0.0.126 | 6032 | mysql_users       | 1       | 1539770578 | 0xDD5F0BB01235E930 | 1539773546 | 1539773916 | 0          |
| 10.0.0.126 | 6032 | mysql_variables   | 0       | 0          |                    | 0          | 1539773916 | 0          |
| 10.0.0.126 | 6032 | proxysql_servers  | 2       | 1539773546 | 0x8EB13E2B48C3FDB0 | 1539773546 | 1539773916 | 0          |
+------------+------+-------------------+---------+------------+--------------------+------------+------------+------------+
18 rows in set (0.00 sec)

The stats_proxysql_servers_checksums table contains, among others, a list of nodes in the cluster, tables that are synced, versions and checksum of the table. If the checksum is not in line, ProxySQL will attempt to get the latest version from a cluster peer. More detailed information about the contents of this table can be found in ProxySQL documentation.

Another source of information about the process is ProxySQL’s log (by default it is located in /var/lib/proxysql/proxysql.log).

2018-10-17 11:00:25 [INFO] Cluster: detected a new checksum for mysql_query_rules from peer 10.0.0.126:6032, version 2, epoch 1539774025, checksum 0xD615D5416F61AA72 . Not syncing yet …
2018-10-17 11:00:27 [INFO] Cluster: detected a peer 10.0.0.126:6032 with mysql_query_rules version 2, epoch 1539774025, diff_check 3. Own version: 2, epoch: 1539772933. Proceeding with remote sync
2018-10-17 11:00:28 [INFO] Cluster: detected a peer 10.0.0.126:6032 with mysql_query_rules version 2, epoch 1539774025, diff_check 4. Own version: 2, epoch: 1539772933. Proceeding with remote sync
2018-10-17 11:00:28 [INFO] Cluster: detected peer 10.0.0.126:6032 with mysql_query_rules version 2, epoch 1539774025
2018-10-17 11:00:28 [INFO] Cluster: Fetching MySQL Query Rules from peer 10.0.0.126:6032 started
2018-10-17 11:00:28 [INFO] Cluster: Fetching MySQL Query Rules from peer 10.0.0.126:6032 completed
2018-10-17 11:00:28 [INFO] Cluster: Loading to runtime MySQL Query Rules from peer 10.0.0.126:6032
2018-10-17 11:00:28 [INFO] Cluster: Saving to disk MySQL Query Rules from peer 10.0.0.126:6032

As you can see, we have here information that a new checksum has been detected and the sync process is in place.

Let's stop for a moment here and discuss how ProxySQL handles configuration updates from multiple sources. First of all, ProxySQL tracks checksums to detect when a configuration has changed. It also stores when it happened - this data is stored as a timestamp, so it has one-second resolution. ProxySQL has two variables which also impact how changes are synchronized.

cluster_check_interval_ms - determines how often ProxySQL should check for configuration changes. By default it is 1000 ms.

cluster_mysql_servers_diffs_before_sync - tells us how many times a check should detect a configuration change before it gets synced. The default setting is 3.
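
Both are exposed as admin-prefixed variables, so you can inspect or tune them from the admin interface. A hedged example (the 2000 ms value is just an illustration):

mysql> SELECT variable_name, variable_value FROM global_variables WHERE variable_name LIKE 'admin-cluster%';
mysql> SET admin-cluster_check_interval_ms=2000;
mysql> LOAD ADMIN VARIABLES TO RUNTIME;
mysql> SAVE ADMIN VARIABLES TO DISK;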

This means that, even if you make configuration changes on the same host, if you make them less than 4 seconds apart, the remaining ProxySQL nodes may not be able to synchronize them, because a new change will show up before the previous one has been synchronized. It also means that if you make configuration changes on multiple ProxySQL instances, you should make them with at least a 4 second break between them, as otherwise some of the changes will be lost and, as a result, configurations will diverge. For example, you add Server1 on Proxy1, and after 2 seconds you add Server2 on Proxy2. All other proxies will reject the change on Proxy1 because they will detect that Proxy2 has a newer configuration. 4 seconds after the change on Proxy2, all proxies (including Proxy1) will pull the configuration from Proxy2.

As the intra-cluster communication is not synchronous, if the ProxySQL node you made the changes on fails, those changes may not be replicated in time. The best approach is to make the same change on two ProxySQL nodes. This way, unless both fail at exactly the same time, one of them will be able to propagate the new configuration.

Also worth noting is that the ProxySQL cluster topology can be quite flexible. In our case we have three nodes, all have three entries in the proxysql_servers table. Such nodes form the cluster where you can write to any node and the changes will be propagated. On top of that, it is possible to add external nodes which would work in a “read-only” mode, which means that they would only synchronize changes made to the “core” cluster but they won’t propagate changes that were performed directly on themselves. All you need on the new node is to have just the “core” cluster nodes configured in proxysql_servers and, as a result, it will connect to those nodes and get the data changes, but it will not be queried by the rest of the cluster for its configuration changes. This setup could be used to create a source of truth with several nodes in the cluster, and other ProxySQL nodes, which just get the configuration from the main “core” cluster.

In addition to all of that, ProxySQL cluster supports automatic rejoining of the nodes - they will sync their configuration while starting. It can also be easily scaled out by adding more nodes.

We hope this blog post gives you an insight into how ProxySQL cluster can be configured. ProxySQL cluster will be transparent to ClusterControl and it will not impact any of the operations you may want to execute from the ClusterControl UI.

MySQL & MariaDB Query Caching with ProxySQL & ClusterControl


Queries have to be cached in every heavily loaded database; there is simply no way for a database to handle all the traffic with reasonable performance otherwise. There are various mechanisms in which a query cache can be implemented, starting from the MySQL query cache, which used to work just fine for mostly read-only, low concurrency workloads but has no place in highly concurrent workloads (to the extent that Oracle removed it in MySQL 8.0), to external key-value stores like Redis, memcached or CouchBase.

The main problem with using an external dedicated data store (as we would not recommend to use MySQL query cache to anyone) is that this is yet another datastore to manage. It is yet another environment to maintain, scaling issues to handle, bugs to debug and so on.

So why not kill two birds with one stone by leveraging your proxy? The assumption here is that you are using a proxy in your production environment, as it helps load balance queries across instances and masks the underlying database topology by providing a simple endpoint to applications. ProxySQL is a great tool for the job, as it can additionally function as a caching layer. In this blog post, we'll show you how to cache queries in ProxySQL using ClusterControl.

How Does the Query Cache Work in ProxySQL?

First of all, a bit of background. ProxySQL manages traffic through query rules, and it can accomplish query caching using the same mechanism. ProxySQL stores cached queries in a memory structure. Cached data is evicted using a time-to-live (TTL) setting. The TTL can be defined for each query rule individually, so it is up to the user to decide whether query rules are to be defined for each individual query, with distinct TTLs, or whether a couple of rules matching the majority of the traffic are enough.

There are two configuration settings that define how the query cache should be used. First, mysql-query_cache_size_MB, which defines a soft limit on the query cache size. It is not a hard limit, so ProxySQL may use slightly more memory than that, but it is enough to keep memory utilization under control. The second setting you can tweak is mysql-query_cache_stores_empty_result. It defines whether an empty result set is cached or not.
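
Both can be inspected and changed from the ProxySQL admin interface, for example (a sketch; the 256 MB value matches what we use later, the rest is illustrative):

mysql> SELECT variable_name, variable_value FROM global_variables WHERE variable_name LIKE 'mysql-query_cache%';
mysql> SET mysql-query_cache_size_MB='256';
mysql> SET mysql-query_cache_stores_empty_result='true';
mysql> LOAD MYSQL VARIABLES TO RUNTIME;
mysql> SAVE MYSQL VARIABLES TO DISK;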

ProxySQL query cache is designed as a key-value store. The value is the result set of a query and the key is composed from concatenated values like: user, schema and query text. Then a hash is created off that string and that hash is used as the key.

Setting up ProxySQL as a Query Cache Using ClusterControl

As the initial setup, we have a replication cluster of one master and one slave. We also have a single ProxySQL.

This is by no means a production-grade setup as we would have to implement some sort of high availability for the proxy layer (for example by deploying more than one ProxySQL instance, and then keepalived on top of them for floating Virtual IP), but it will be more than enough for our tests.

First, we are going to verify the ProxySQL configuration to make sure query cache settings are what we want them to be.

256 MB of query cache should be about right, and we also want to cache empty result sets - sometimes a query which returns no data still has to do a lot of work to verify there's nothing to return.

The next step is to create query rules which will match the queries you want to cache. There are two ways to do that in ClusterControl.

Manually Adding Query Rules

The first way requires a bit more manual work. Using ClusterControl you can easily create any query rule you want, including query rules that do the caching. First, let's take a look at the list of rules:

At this point, we have a set of query rules to perform the read/write split. The first rule has an ID of 100. Our new query rule has to be processed before that one, so we will use a lower rule ID. Let's create a query rule which will cache queries similar to this one:

SELECT DISTINCT c FROM sbtest8 WHERE id BETWEEN 5041 AND 5140 ORDER BY c

There are three ways of matching the query: Digest, Match Digest and Match Pattern. Let’s talk a bit about them here. First, Match Digest. We can set here a regular expression that will match a generalized query string that represents some query type. For example, for our query:

SELECT DISTINCT c FROM sbtest8 WHERE id BETWEEN 5041 AND 5140 ORDER BY c

The generic representation will be:

SELECT DISTINCT c FROM sbtest8 WHERE id BETWEEN ? AND ? ORDER BY c

As you can see, it stripped the arguments of the WHERE clause, therefore all queries of this type are represented by a single string. This option is quite nice to use because it matches a whole query type and, what's even more important, it is stripped of any extra whitespace. This makes it so much easier to write a regular expression, as you don't have to account for weird line breaks, whitespace at the beginning or end of the string, and so on.

Digest is basically a hash that ProxySQL calculates over the Match Digest form.

Finally, Match Pattern matches against full query text, as it was sent by the client. In our case, the query will have a form of:

SELECT DISTINCT c FROM sbtest8 WHERE id BETWEEN 5041 AND 5140 ORDER BY c

We are going to use Match Digest as we want all of those queries to be covered by the query rule. If we wanted to cache just that particular query, a good option would be to use Match Pattern.

The regular expression that we use is:

SELECT DISTINCT c FROM sbtest[0-9]+ WHERE id BETWEEN \? AND \? ORDER BY c

We are matching literally the exact generalized query string, with one exception - we know that this query hits multiple tables, therefore we added a regular expression part to match all of them.
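
Under the hood, such a rule is just a row in the mysql_query_rules table. If you were to create it directly in the admin interface instead of through ClusterControl, it could look roughly like this (a sketch; rule_id 50 and the 10-second TTL are assumptions, and cache_ttl is expressed in milliseconds):

mysql> INSERT INTO mysql_query_rules (rule_id, active, match_digest, cache_ttl, apply)
    -> VALUES (50, 1, 'SELECT DISTINCT c FROM sbtest[0-9]+ WHERE id BETWEEN \? AND \? ORDER BY c', 10000, 1);
mysql> LOAD MYSQL QUERY RULES TO RUNTIME;
mysql> SAVE MYSQL QUERY RULES TO DISK;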

Once this is done, we can see if the query rule is in effect or not.
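
From the command line, the same check can be done against the admin interface (a sketch, referencing the hypothetical rule_id 50 from above):

mysql> SELECT rule_id, hits FROM stats_mysql_query_rules WHERE rule_id = 50;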

We can see that 'Hits' are increasing, which means that our query rule is being used. Next, we'll look at another way to create a query rule.

Using ClusterControl to Create Query Rules

ProxySQL has a useful feature of collecting statistics on the queries it routes. You can track data like execution time, how many times a given query was executed and so on. This data is also available in ClusterControl:

What is even better, if you point at a given query type, you can create a query rule related to it. You can also easily cache this particular query type.

As you can see, some of the data like Rule ID, Cache TTL or Schema Name are already filled in. ClusterControl will also fill in data based on which matching mechanism you decided to use. We can easily use either the hash for a given query type, or we can use Match Digest or Match Pattern if we would like to fine-tune the regular expression (for example, doing the same as we did earlier and extending the regular expression to match all the tables in the sbtest schema).

This is all you need to easily create query cache rules in ProxySQL. Download ClusterControl to try it today.

Master High Availability Manager (MHA) Has Crashed! What Do I Do Now?


MySQL Replication is a very popular way of building highly available database layers. It is very well known, tested and robust. It is not without limitations, though. One of them, definitely, is the fact that it utilizes only one "entry point" - you have a dedicated server in the topology, the master, and it is the only node in the cluster to which you can issue writes. This leads to severe consequences - the master is the single point of failure and, should it fail, no write can be executed by the application. It is not a surprise that much work has been put into developing tools which reduce the impact of losing the master. Sure, there are discussions about how to approach the topic and whether automated failover is better than the manual kind. Eventually, this is a business decision to take, but should you decide to follow the automation path, you will be looking for tools to help you achieve that. One of the tools which is still very popular is MHA (Master High Availability). While it is perhaps not actively maintained anymore, it is still in a stable shape, and its huge popularity still makes it the backbone of many highly available MySQL replication setups. What would happen, though, if MHA itself became unavailable? Can it become a single point of failure? Is there a way to prevent that from happening? In this blog post we will take a look at some of the scenarios.

First things first, if you plan to use MHA, make sure you use the latest version from the repo. Do not use binary releases as they do not contain all the fixes. The installation is fairly simple. MHA consists of two parts, manager and node. Node is to be deployed on your database servers. Manager will be deployed on a separate host, along with node. So, database servers: node, management host: manager and node.

It is quite easy to compile MHA. Go to GitHub and clone the repositories:

https://github.com/yoshinorim/mha4mysql-manager

https://github.com/yoshinorim/mha4mysql-node

Then it’s all about:

perl Makefile.PL
make
make install

You may have to install some perl dependencies if you don't have all of the required packages already installed. In our case, on Ubuntu 16.04, we had to install the following:

perl -MCPAN -e "install Config::Tiny"
perl -MCPAN -e "install Log::Dispatch"
perl -MCPAN -e "install Parallel::ForkManager"
perl -MCPAN -e "install Module::Install"

Once you have MHA installed, you need to configure it. We will not go into any details here; there are many resources on the internet which cover this part. A sample config (definitely a non-production one) may look like this:

root@mha-manager:~# cat /etc/app1.cnf
[server default]
user=cmon
password=pass
ssh_user=root
# working directory on the manager
manager_workdir=/var/log/masterha/app1
# working directory on MySQL servers
remote_workdir=/var/log/masterha/app1
[server1]
hostname=node1
candidate_master=1
[server2]
hostname=node2
candidate_master=1
[server3]
hostname=node3
no_master=1

The next step is to see if everything works and how MHA sees the replication:

root@mha-manager:~# masterha_check_repl --conf=/etc/app1.cnf
Tue Apr  9 08:17:04 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Apr  9 08:17:04 2019 - [info] Reading application default configuration from /etc/app1.cnf..
Tue Apr  9 08:17:04 2019 - [info] Reading server configuration from /etc/app1.cnf..
Tue Apr  9 08:17:04 2019 - [info] MHA::MasterMonitor version 0.58.
Tue Apr  9 08:17:05 2019 - [error][/usr/local/share/perl/5.22.1/MHA/MasterMonitor.pm, ln427] Error happened on checking configurations. Redundant argument in sprintf at /usr/local/share/perl/5.22.1/MHA/NodeUtil.pm line 195.
Tue Apr  9 08:17:05 2019 - [error][/usr/local/share/perl/5.22.1/MHA/MasterMonitor.pm, ln525] Error happened on monitoring servers.
Tue Apr  9 08:17:05 2019 - [info] Got exit code 1 (Not master dead).

Well, it crashed. This is because MHA attempts to parse the MySQL version string and does not expect hyphens in it. Luckily, the fix is easy to find: https://github.com/yoshinorim/mha4mysql-manager/issues/116.

Now, we have MHA ready for work.

root@mha-manager:~# masterha_manager --conf=/etc/app1.cnf
Tue Apr  9 13:00:00 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Apr  9 13:00:00 2019 - [info] Reading application default configuration from /etc/app1.cnf..
Tue Apr  9 13:00:00 2019 - [info] Reading server configuration from /etc/app1.cnf..
Tue Apr  9 13:00:00 2019 - [info] MHA::MasterMonitor version 0.58.
Tue Apr  9 13:00:01 2019 - [info] GTID failover mode = 1
Tue Apr  9 13:00:01 2019 - [info] Dead Servers:
Tue Apr  9 13:00:01 2019 - [info] Alive Servers:
Tue Apr  9 13:00:01 2019 - [info]   node1(10.0.0.141:3306)
Tue Apr  9 13:00:01 2019 - [info]   node2(10.0.0.142:3306)
Tue Apr  9 13:00:01 2019 - [info]   node3(10.0.0.143:3306)
Tue Apr  9 13:00:01 2019 - [info] Alive Slaves:
Tue Apr  9 13:00:01 2019 - [info]   node2(10.0.0.142:3306)  Version=5.7.25-28-log (oldest major version between slaves) log-bin:enabled
Tue Apr  9 13:00:01 2019 - [info]     GTID ON
Tue Apr  9 13:00:01 2019 - [info]     Replicating from 10.0.0.141(10.0.0.141:3306)
Tue Apr  9 13:00:01 2019 - [info]     Primary candidate for the new Master (candidate_master is set)
Tue Apr  9 13:00:01 2019 - [info]   node3(10.0.0.143:3306)  Version=5.7.25-28-log (oldest major version between slaves) log-bin:enabled
Tue Apr  9 13:00:01 2019 - [info]     GTID ON
Tue Apr  9 13:00:01 2019 - [info]     Replicating from 10.0.0.141(10.0.0.141:3306)
Tue Apr  9 13:00:01 2019 - [info]     Not candidate for the new Master (no_master is set)
Tue Apr  9 13:00:01 2019 - [info] Current Alive Master: node1(10.0.0.141:3306)
Tue Apr  9 13:00:01 2019 - [info] Checking slave configurations..
Tue Apr  9 13:00:01 2019 - [info] Checking replication filtering settings..
Tue Apr  9 13:00:01 2019 - [info]  binlog_do_db= , binlog_ignore_db=
Tue Apr  9 13:00:01 2019 - [info]  Replication filtering check ok.
Tue Apr  9 13:00:01 2019 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Tue Apr  9 13:00:01 2019 - [info] Checking SSH publickey authentication settings on the current master..
Tue Apr  9 13:00:02 2019 - [info] HealthCheck: SSH to node1 is reachable.
Tue Apr  9 13:00:02 2019 - [info]
node1(10.0.0.141:3306) (current master)
 +--node2(10.0.0.142:3306)
 +--node3(10.0.0.143:3306)

Tue Apr  9 13:00:02 2019 - [warning] master_ip_failover_script is not defined.
Tue Apr  9 13:00:02 2019 - [warning] shutdown_script is not defined.
Tue Apr  9 13:00:02 2019 - [info] Set master ping interval 3 seconds.
Tue Apr  9 13:00:02 2019 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Tue Apr  9 13:00:02 2019 - [info] Starting ping health check on node1(10.0.0.141:3306)..
Tue Apr  9 13:00:02 2019 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..

As you can see, MHA is monitoring our replication topology, checking if the master node is available or not. Let’s consider a couple of scenarios.

Scenario 1 - MHA Crashed

Let's assume MHA is not available. How does this affect the environment? Obviously, as MHA is responsible for monitoring the master's health and triggering failover, neither will happen when MHA is down. A master crash will not be detected and failover will not happen. The problem is that you cannot really run multiple MHA instances at the same time. Technically, you can do it, although MHA will complain about the lock file:

root@mha-manager:~# masterha_manager --conf=/etc/app1.cnf
Tue Apr  9 13:05:38 2019 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Apr  9 13:05:38 2019 - [info] Reading application default configuration from /etc/app1.cnf..
Tue Apr  9 13:05:38 2019 - [info] Reading server configuration from /etc/app1.cnf..
Tue Apr  9 13:05:38 2019 - [info] MHA::MasterMonitor version 0.58.
Tue Apr  9 13:05:38 2019 - [warning] /var/log/masterha/app1/app1.master_status.health already exists. You might have killed manager with SIGKILL(-9), may run two or more monitoring process for the same application, or use the same working directory. Check for details, and consider setting --workdir separately.

It will start, though, and it will attempt to monitor the environment. The problem arises when both of them start to execute actions on the cluster. The worst case would be if they decide to use different slaves as the master candidate and failovers are executed at the same time (MHA uses a lock file which prevents subsequent failovers from happening, but if everything happens at the same time - and it did in our tests - this safety measure is not enough).

Unfortunately, there is no built-in way of running MHA in a highly available manner. The simplest solution is to write a script which tests if MHA is running and, if not, starts it. Such a script would have to be executed from cron or written in the form of a daemon, if the 1-minute granularity of cron is not enough.
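
As an illustration, a minimal watchdog sketch is shown below; the config path, log location and the decision to simply restart the manager are assumptions you would adapt to your own setup (for example, you may want to respect MHA’s --ignore_last_failover behaviour rather than blindly restarting):

#!/bin/bash
# Hypothetical watchdog for masterha_manager - paths are examples only.
CONF=/etc/app1.cnf
LOG=/var/log/masterha/app1/watchdog.log

# If no masterha_manager process is running for this configuration, start one.
if ! pgrep -f "masterha_manager --conf=${CONF}" > /dev/null; then
    echo "$(date) masterha_manager not running, starting it" >> "${LOG}"
    nohup masterha_manager --conf="${CONF}" >> "${LOG}" 2>&1 &
fi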


Scenario 2 - MHA Manager Node Lost Network Connection to the Master

Let’s be honest, this is a really bad situation. As soon as MHA cannot connect to the master, it will attempt to perform a failover. The only exception is if secondary_check_script is defined and it verifies that the master is alive. It is up to the user to define exactly what actions MHA should perform to verify the master’s status - it all depends on the environment and exact setup. Another very important script to define is master_ip_failover_script - this is executed upon failover and it should be used, among other things, to ensure that the old master does not show up again. If you happen to have access to additional ways of reaching and stopping the old master, that’s really great. It can be remote management tools like Integrated Lights-out, it can be access to manageable power sockets (where you can just power off the server), it can be access to the cloud provider’s CLI, which makes it possible to stop the virtual instance. It is of utmost importance to stop the old master - otherwise, after the network issue is gone, you may end up with two writable nodes in the system, which is a perfect recipe for split brain, a condition in which data diverges between two parts of the same cluster.
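
For illustration only, the relevant settings could look roughly like the snippet below in the [server default] section of /etc/app1.cnf; the IP addresses and script paths are placeholders, and masterha_secondary_check is the helper shipped with MHA that probes the master over additional routes:

[server default]
# Check the master through two other hosts (hypothetical IPs) before declaring it dead
secondary_check_script=masterha_secondary_check -s 10.0.0.151 -s 10.0.0.152
# Custom scripts (example paths) that move the writer IP and fence the old master
master_ip_failover_script=/usr/local/bin/master_ip_failover
shutdown_script=/usr/local/bin/power_off_old_master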

As you can see, MHA can handle MySQL failover pretty well. It definitely requires careful configuration and you will have to write external scripts, which will be used to kill the old master and ensure that split brain will not happen. Having said that, we would still recommend using more advanced failover management tools like Orchestrator or ClusterControl, which can perform more advanced analysis of the replication topology state (for example, by utilizing slaves or proxies to assess the master’s availability) and which are and will be maintained in the future. If you are interested in learning how ClusterControl performs failover, we would like to invite you to read this blog post on the failover process in ClusterControl. You can also learn how ClusterControl interacts with ProxySQL delivering smooth, transparent failover for your application. You can always test ClusterControl by downloading it for free.

MySQL Load Balancing: Migrating ProxySQL from On-Prem to AWS EC2


Migrations between different environments are not uncommon in the database world: migrations from one provider to another, or moving from one datacenter to another. All of this happens on a regular basis. Organisations search for expense reduction, better flexibility and velocity. Those who own their datacenter look forward to switching to one of the cloud providers, where they can benefit from better scalability and easier handling of capacity changes. Migrations touch all elements of the database environment - the databases themselves, but also the proxy and caching layer. Moving databases around is tricky, but it is also hard to manage multiple proxy instances and ensure that the configuration is in sync across all of them.

In this blog post we will take a look at the challenges related to one particular piece of the migration - moving the ProxySQL proxy layer from an on-prem environment to EC2. Please keep in mind this is just an example; the truth is, the majority of migration scenarios will look pretty much the same, so the suggestions in this blog post should apply to most cases. Let’s take a look at the initial setup.

Initial On-Prem Environment

The initial, on-prem setup is fairly simple - we have a three node Galera Cluster and two ProxySQL instances which are configured to route the traffic to the backend databases. Each ProxySQL instance has Keepalived colocated. Keepalived manages a Virtual IP and assigns it to one of the available ProxySQL instances. Should that instance fail, the VIP is moved to the other ProxySQL, which commences serving the traffic. Application servers use the VIP to connect, and they are not aware of the layout of the proxy and database tiers.

Migrating to an AWS EC2 Environment

There are a couple of prerequisites that are required before we can plan a migration. Migrating the proxy layer is no different in this regard. First of all, we have to have network access between the existing, on-prem environment and EC2. We will not go into details here as there are numerous options to accomplish that. AWS provides services like AWS Direct Connect or hybrid cloud integration in Amazon Virtual Private Cloud. You can use solutions like setting up an OpenVPN server or even use SSH tunneling to do the trick. It all depends on the available hardware and software options at your disposal and how flexible you want the solution to be. Once the connectivity is there, let’s stop a bit and think about how the setup should look.

From the ProxySQL standpoint, there is one main concern - how do we ensure that the configuration of the ProxySQL instances in EC2 is in sync with the configuration of the ProxySQL instances on-prem? This may not be a big deal if your configuration is pretty much stable - you are not adding query rules, you are not tweaking the configuration. In that case it will be enough to just apply the existing configuration to the newly created ProxySQL instances in EC2. There are a couple of ways to do that. First of all, if you are a ClusterControl user, the simplest way will be to use the “Synchronize Instances” job, which is designed to do exactly this.

Another option is to use the dump command from SQLite, which is where ProxySQL stores its configuration: http://www.sqlitetutorial.net/sqlite-dump/
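
A rough sketch of that approach is shown below; the configuration database path is the usual default but may differ in your installation, and the target ProxySQL should be stopped while its configuration database is replaced:

# On the on-prem ProxySQL host (default configuration database location assumed):
sqlite3 /var/lib/proxysql/proxysql.db .dump > proxysql_config.sql
scp proxysql_config.sql ec2-proxysql:/tmp/

# On the EC2 ProxySQL host:
service proxysql stop
mv /var/lib/proxysql/proxysql.db /var/lib/proxysql/proxysql.db.bak
sqlite3 /var/lib/proxysql/proxysql.db < /tmp/proxysql_config.sql
service proxysql start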

If you store your configuration as a part of some sort of infrastructure orchestration tool (Ansible, Chef, Puppet), you can easily reuse those scripts to provision new ProxySQL instances with proper configuration.

What if the configuration changes quite often? Well, there are additional options to consider. First of all, most likely all the solutions above would still work, as long as the ProxySQL configuration is not changing every couple of minutes (which is highly unlikely) - you can always sync the configuration right before you do the switchover.

For the cases where the configuration changes quite often, you can consider setting up a ProxySQL cluster. The setup has been explained in detail in our blog post: https://severalnines.com/blog/how-cluster-your-proxysql-load-balancers. If you would like to use this solution in a hybrid setup, over a WAN connection, you may want to increase cluster_check_interval_ms a bit from the default 1 second to a higher value (5 - 10 seconds). A ProxySQL cluster will ensure that all the configuration changes made in the on-prem setup will be replicated by the ProxySQL instances in EC2.
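
For example, assuming the default admin credentials and admin port, the variable could be raised on each instance through the ProxySQL admin interface along these lines:

# Connect to the ProxySQL admin interface (default credentials and port are assumptions)
mysql -u admin -padmin -h 127.0.0.1 -P 6032

UPDATE global_variables SET variable_value='5000' WHERE variable_name='admin-cluster_check_interval_ms';
LOAD ADMIN VARIABLES TO RUNTIME;
SAVE ADMIN VARIABLES TO DISK;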

The final thing to consider is how to point ProxySQL at the correct servers. The gist is that ProxySQL stores a list of backend MySQL servers to connect to. It tracks their health and monitors latency. In the setup we discuss, the on-prem ProxySQL servers hold a list of backend servers which are also located on-prem. This is the configuration we will sync to the EC2 ProxySQL servers. This is not a hard problem to tackle and there are a couple of ways to work around it.

For example, you can add servers in OFFLINE_HARD mode in a separate hostgroup - this will imply that the nodes are not available and using a new hostgroup for them will ensure that ProxySQL will not check their state like it does for Galera nodes configured in hostgroups used for read/write splitting.

Alternatively, you can simply skip those nodes for now and, while doing the switchover, remove the existing servers and run a couple of INSERT commands to add the backend nodes from EC2.
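
A hypothetical switchover sketch is shown below, run against the ProxySQL admin interface; the hostgroup IDs and EC2 IP addresses are examples only and have to match your own read/write split configuration:

-- Remove the on-prem backends from the writer (10) and reader (20) hostgroups
DELETE FROM mysql_servers WHERE hostgroup_id IN (10, 20);
-- Add the EC2 Galera nodes (example IPs)
INSERT INTO mysql_servers (hostgroup_id, hostname, port) VALUES (10, '10.20.0.11', 3306);
INSERT INTO mysql_servers (hostgroup_id, hostname, port) VALUES (20, '10.20.0.11', 3306);
INSERT INTO mysql_servers (hostgroup_id, hostname, port) VALUES (20, '10.20.0.12', 3306);
INSERT INTO mysql_servers (hostgroup_id, hostname, port) VALUES (20, '10.20.0.13', 3306);
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;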

Conclusion

As you can see, the process of migrating ProxySQL from an on-prem setup to the cloud is quite easy to accomplish - as long as you have network connectivity, the remaining steps are far from complex. We hope this short blog post helped you to understand what’s required in this process and how to plan it.

Making Your Database Components Highly Available (HA) via Load Balancers


Choosing Your HA Topology

There are various ways to retain high availability with databases. You can use Virtual IPs (VRRP) to manage host availability, you can use resource managers like Zookeeper and Etcd to (re)configure your applications or use load balancers/proxies to distribute the workload over all available hosts.

The Virtual IPs need either an application to manage them (MHA, Orchestrator), some scripting (Keepalived, Pacemaker/Corosync) or an engineer to manually fail over and the decision making in the process can become complex. The Virtual IP failover is a straightforward and simple process by removing the IP address from one host, assigning it to another and use arping to send a gratuitous ARP response. In theory a Virtual IP can be moved in a second but it will take a few seconds before the failover management application is sure the host has failed and acts accordingly. In reality this should be somewhere between 10 and 30 seconds. Another limitation of Virtual IPs is that some cloud providers do not allow you to manage your own Virtual IPs or assign them at all. E.g., Google does not allow you to do that on their compute nodes.

Resource managers like Zookeeper and Etcd can monitor your databases and (re)configure your applications once a host fails or a slave gets promoted to master. In general this is a good idea but implementing your checks with Zookeeper and Etcd is a complex task.

A load balancer or proxy will sit in between the application and the database host and work transparently as if the client would connect to the database host directly. Just like with the Virtual IP and resource managers, the load balancers and proxies also need to monitor the hosts and redirect the traffic if one host is down. ClusterControl supports two proxies: HAProxy and ProxySQL and both are supported for MySQL master-slave replication and Galera cluster. HAProxy and ProxySQL both have their own use cases, we will describe them in this post as well.

Why do you Need a Load Balancer?

In theory you don’t need a load balancer but in practice you will prefer one. We’ll explain why.

If you have virtual IPs setup, all you have to do is point your application to the correct (virtual) IP address and everything should be fine connection wise. But suppose you have scaled out the number of read replicas, you might want to provide virtual IPs for each of those read replicas as well because of maintenance or availability reasons. This might become a very large pool of virtual IPs that you have to manage. If one of those read replicas had a failure, you need to re-assign the virtual IP to another host or else your application will connect to either a host that is down or in worst case, a lagging server with stale data. Keeping the replication state to the application managing the virtual IPs is therefore necessary.

Also for Galera there is a similar challenge: you can in theory add as many hosts as you’d like to your application config and pick one at random. The same problem arises when this host is down: you might end up connecting to an unavailable host. Also using all hosts for both reads and writes might also cause rollbacks due to the optimistic locking in Galera. If two connections try to write to the same row at the same time, one of them will receive a roll back. In case your workload has such concurrent updates, it is advised to only use one node in Galera to write to. Therefore you want a manager that keeps track of the internal state of your database cluster.

Both HAProxy and ProxySQL will offer you the functionality to monitor the MySQL/MariaDB database hosts and keep state of your cluster and its topology. For replication setups, in case a slave replica is down, both HAProxy and ProxySQL can redistribute the connections to another host. But if a replication master is down, HAProxy will deny the connection and ProxySQL will give back a proper error to the client. For Galera setups, both load balancers can elect a master node from the Galera cluster and only send the write operations to that specific node.

On the surface HAProxy and ProxySQL may seem to be similar solutions, but they differ a lot in features and the way they distribute connections and queries. HAProxy supports a number of balancing algorithms like least connections, source, random and round-robin while ProxySQL distributes connections using the weight-based round-robin algorithm (equal weight means equal distribution). Since ProxySQL is an intelligent proxy, it is database aware and is also able to analyze your queries. ProxySQL is able to do read/write splitting based on query rules where you can forward the queries to the designated slaves or master in your cluster. ProxySQL includes additional functionality like query rewriting, caching and query firewall with real-time, in-depth statistics generation about the workload.

That should be enough background information on this topic, so let’s see how you can deploy both load balancers for MySQL replication and Galera topologies.

Deploying HAProxy

Using ClusterControl to deploy HAProxy on a Galera cluster is easy: go to the relevant cluster and select “Add Load Balancer”:

And you will be able to deploy an HAProxy instance by adding the host address and selecting the server instances you wish to include in the configuration:

By default the HAProxy instance will be configured to send connections to the server instances receiving the least number of connections, but you can change that policy to either round robin or source.

Under advanced settings you can set timeouts, the maximum number of connections, and even secure the proxy by whitelisting an IP range for the connections.

Under the nodes tab of that cluster, the HAProxy node will appear:

Now your Galera cluster is also available via the newly deployed HAProxy node on port 3307. Don’t forget to GRANT your application access from the HAProxy IP, as now the traffic will be incoming from the proxy instead of the application hosts. Also, remember to point your application connection to the HAProxy node.
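
For example, assuming the HAProxy node runs on 10.0.0.10 and the application uses a database called appdb (both placeholders), the grant could look like this:

CREATE USER 'appuser'@'10.0.0.10' IDENTIFIED BY 'S3cr3tPassw0rd';
GRANT ALL PRIVILEGES ON appdb.* TO 'appuser'@'10.0.0.10';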

Now suppose one server instance goes down; HAProxy will notice this within a few seconds and stop sending traffic to that instance:

The two other nodes are still fine and will keep receiving traffic. This retains the cluster highly available without the client even noticing the difference.

Deploying a Secondary HAProxy Node

Now that we have moved the responsibility of retaining high availability over the database connections from the client to HAProxy, what if the proxy node dies? The answer is to create another HAProxy instance and use a virtual IP controlled by Keepalived as shown in this diagram:

The benefit compared to using virtual IPs on the database nodes is that the logic for MySQL is at the proxy level and the failover for the proxies is simple.

So let’s deploy a secondary HAProxy node:

After we have deployed a secondary HAProxy node, we need to add Keepalived:

And after Keepalived has been added, your nodes overview will look like this:

So now instead of pointing your application connections to the HAProxy node directly you have to point them to the virtual IP instead.
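
To give an idea of what Keepalived does under the hood, here is a minimal sketch of a keepalived.conf for the active proxy; the interface name, virtual_router_id, priorities and the VIP are all assumptions (ClusterControl generates its own configuration for you):

vrrp_script chk_haproxy {
    script "killall -0 haproxy"    # exits non-zero when haproxy is not running
    interval 2
}

vrrp_instance VI_HAPROXY {
    interface eth0                 # adjust to your network interface
    state MASTER                   # BACKUP on the secondary HAProxy host
    virtual_router_id 51
    priority 101                   # use a lower priority (e.g. 100) on the secondary
    virtual_ipaddress {
        10.0.0.100                 # the virtual IP your application connects to
    }
    track_script {
        chk_haproxy
    }
}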

In the example here, we used separate hosts to run HAProxy on, but you could easily add them to existing server instances as well. HAProxy does not bring much overhead, although you should keep in mind that in case of a server failure, you will lose both the database node and the proxy.

Deploying ProxySQL

Deploying ProxySQL to your cluster is done in a similar way to HAProxy: select “Add Load Balancer” in the cluster list, under the ProxySQL tab.

In the deployment wizard, specify where ProxySQL will be installed, the administration user/password, and the monitoring user/password to connect to the MySQL backends. From ClusterControl, you can either create a new user to be used by the application (the user will be created on both MySQL and ProxySQL) or use existing database users (the user will be created on ProxySQL only). Set whether you are using implicit transactions or not. Basically, if you don’t use SET autocommit=0 to create new transactions, ClusterControl will configure read/write split.

After ProxySQL has been deployed, it will be available under the Nodes tab:

Opening the ProxySQL node overview will present you with the ProxySQL monitoring and management interface, so there is no reason to log into ProxySQL on the node anymore. ClusterControl covers most of the important ProxySQL stats like memory utilization, query cache, query processor and so on, as well as other metrics like hostgroups, backend servers, query rule hits, top queries and ProxySQL variables. On the ProxySQL management side, you can manage the query rules, backend servers, users, configuration and scheduler right from the UI.

Check out our ProxySQL tutorial page, which covers extensively how to perform database load balancing for MySQL and MariaDB with ProxySQL.

Deploying Garbd

Galera implements a quorum-based algorithm to select a primary component through which it enforces consistency. The primary component needs to have a majority of votes (50% + 1 node), so in a 2 node system there would be no majority, resulting in split brain. Fortunately, it is possible to add garbd (Galera Arbitrator Daemon), which is a lightweight stateless daemon that can act as the odd node. The added benefit of the Galera Arbitrator is that you can now make do with only two data nodes in your cluster.
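
ClusterControl deploys garbd for you, but to illustrate how lightweight it is, starting it manually boils down to something like the line below, where the peer addresses and the cluster name are placeholders that must match your wsrep settings:

garbd --address gcomm://10.0.0.141:4567,10.0.0.142:4567 --group my_galera_cluster --daemon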

If ClusterControl detects that your Galera cluster consists of an even number of nodes, you will be given the warning/advice by ClusterControl to extend the cluster to an odd number of nodes:

Choose wisely the host to deploy garbd on, as it will receive all replicated data. Make sure the network can handle the traffic and is secure enough. You could choose one of the HAProxy or ProxySQL hosts to deploy garbd on, like in the example below:

Take note that starting from ClusterControl 1.5.1, garbd cannot be installed on the same host as ClusterControl due to risk of package conflicts.

After installing garbd, you will see it appear next to your two Galera nodes:

Final Thoughts

We showed you how to make your MySQL master-slave and Galera cluster setups more robust and retain high availability using HAProxy and ProxySQL. Also, garbd is a nice daemon that can save you the extra third node in your Galera cluster.

This finalizes the deployment side of ClusterControl. In our next blog, we will show you how to integrate ClusterControl within your organization by using groups and assigning certain roles to users.

Getting Started with ProxySQL - MySQL & MariaDB Load Balancing Tutorial


We’re excited to announce a major update to our tutorial “Database Load Balancing for MySQL and MariaDB with ProxySQL”.

ProxySQL is a lightweight yet complex protocol-aware proxy that sits between the MySQL clients and servers. It is a gate, which basically separates clients from databases, and is therefore an entry point used to access all the database servers.

In this new update we’ve…

  • Updated the information about how to best deploy ProxySQL via ClusterControl
  • Revamped the section “Getting Started with ProxySQL”
  • Added a new section on Data Masking
  • Added new frequently asked questions (FAQs)

Load balancing and high availability go hand-in-hand. ClusterControl makes it easy to deploy and configure several different load balancing technologies for MySQL and MariaDB with a point-and-click graphical interface, allowing you to easily try them out and see which ones work best for your unique needs.

ClusterControl for ProxySQL

Included in ClusterControl Advanced and Enterprise, ProxySQL enables MySQL, MariaDB and Percona XtraDB database systems to easily manage intense, high-traffic database applications without losing availability. ClusterControl offers advanced, point-and-click configuration management features for the load balancing technologies we support. We know the issues regularly faced and make it easy to customize and configure the load balancer for your unique application needs.

We know load balancing and support many different technologies. ClusterControl has many things preconfigured to get you started with a couple of clicks. If you run into challenges, we also provide resources and on-the-spot support to help ensure your configurations are running at peak performance.

ClusterControl delivers on an array of features to help deploy and manage ProxySQL

  • Advanced Graphical Interface - ClusterControl provides the only GUI on the market for the easy deployment, configuration and management of ProxySQL.
  • Point and Click Deployment - With ClusterControl you’re able to apply point and click deployments to MySQL, MySQL Replication, MySQL Cluster, Galera Cluster, MariaDB, MariaDB Galera Cluster, and Percona XtraDB technologies, as well as the top related load balancers: HAProxy, MaxScale and ProxySQL.
  • Suite of monitoring graphs - With comprehensive reports you have a clear view of data points like connections, queries, data transfer and utilization, and more.
  • Configuration Management - Easily configure and manage your ProxySQL deployments with a simple UI. With ClusterControl you can create servers, re-orientate your setup, create users, set rules, manage query routing, and enable variable configurations.

Make sure to check out the updated tutorial today!


How to Deploy PostgreSQL for High Availability


Introduction

Nowadays, high availability is a requirement for many systems, no matter what technology we use. This is especially important for databases, as they store data that applications rely upon. There are different ways to replicate data across multiple servers, and to fail over traffic when, for example, a primary server stops responding.

Architecture

There are several architectures for PostgreSQL high availability, but the basic ones would be master-slave and master-master architectures.

Master-Slave

This may be the most basic HA architecture we can set up, and oftentimes it is also the easiest to set up and maintain. It is based on one master database with one or more standby servers. These standby databases will remain synchronized (or almost synchronized) with the master, depending on whether the replication is synchronous or asynchronous. If the main server fails, the standby contains almost all of the data of the main server, and can quickly be turned into the new master database server.

We can have two categories of standby databases, based on the nature of the replication:

  • Logical standbys - The replication between the master and the slaves is made via SQL statements.
  • Physical standbys - The replication between the master and the slaves is made via the internal data structure modifications.

In the case of PostgreSQL, a stream of write-ahead log (WAL) records is used to keep the standby databases synchronized. This can be synchronous or asynchronous, and the entire database server is replicated.

From version 10, PostgreSQL includes a built in option to setup logical replication which is based on constructing a stream of logical data modifications from the information in the WAL. This replication method allows the data changes from individual tables to be replicated without the need of designating a master server. It also allows data to flow in multiple directions.

But a master-slave setup is not enough to effectively ensure high availability, as we also need to handle failures. To handle failures, we need to be able to detect them. Once we know there is a failure, e.g. errors on the master or the master not responding, we can then select a slave and fail over to it with the smallest delay possible. It is important that this process is as efficient as possible, in order to restore full functionality so the applications can start working again. PostgreSQL itself does not include an automatic failover mechanism, so that will require a custom script or third party tools for this automation.

After a failover happens, the application(s) need to be notified accordingly, so they can start using the new master. Also, we need to evaluate the state of our architecture after a failover, because we can run into a situation where we only have the new master running (i.e., we had a master and only one slave before the issue). In that case, we will need to add a slave somehow so as to re-create the master-slave setup we originally had for HA.

Master-Master Architectures

This architecture provides a way of minimizing the impact of an error on one of the nodes, as the other node(s) can take care of all the traffic, maybe slightly affecting the performance, but never losing functionality. This architecture is often used with the dual purpose of not only creating an HA environment, but also to scale horizontally (as compared to the concept of vertical scalability where we add more resources to a server).

PostgreSQL does not yet support this architecture "natively", so you will have to refer to third party tools and implementations. When choosing a solution you must keep in mind that there are a lot of projects/tools, but some of them are not being supported anymore, while others are new and might not be battle-tested in production.

For further reading on HA/Clustering architectures for Postgres, please refer to this blog.

Load Balancing

Load balancers are tools that can be used to manage the traffic from your application to get the most out of your database architecture.

Not only is it useful for balancing the load of our databases, it also helps applications get redirected to the available/healthy nodes and even specify ports with different roles.

HAProxy is a load balancer that distributes traffic from one origin to one or more destinations and can define specific rules and/or protocols for this task. If any of the destinations stops responding, it is marked as offline, and the traffic is sent to the rest of the available destinations.

Keepalived is a service that allows us to configure a virtual IP within an active/passive group of servers. This virtual IP is assigned to an active server. If this server fails, the IP is automatically migrated to the “Secondary” passive server, allowing it to continue working with the same IP in a transparent way for the systems.

Let's see how to implement, using ClusterControl, a master-slave PostgreSQL cluster with load balancer servers and keepalived configured between them, all this from a friendly and easy to use interface.

For our example we will create:

  • 3 PostgreSQL servers (one master and two slaves).
  • 2 HAProxy Load Balancers.
  • Keepalived configured between the load balancer servers.
Architecture diagram

Database Deployment

To perform a deployment from ClusterControl, simply select the option “Deploy” and follow the instructions that appear.

ClusterControl PostgreSQL Deploy 1

When selecting PostgreSQL, we must specify User, Key or Password and port to connect by SSH to our servers. We also need the name for our new cluster and if we want ClusterControl to install the corresponding software and configurations for us.

ClusterControl PostgreSQL Deploy 2

After setting up the SSH access information, we must define the database user, version and datadir (optional). We can also specify which repository to use.

In the next step, we need to add our servers to the cluster that we are going to create.

ClusterControl PostgreSQL Deploy 3

When adding our servers, we can enter IP or hostname.

In the last step, we can choose if our replication will be Synchronous or Asynchronous.

ClusterControl PostgreSQL Deploy 4

We can monitor the status of the creation of our new cluster from the ClusterControl activity monitor.

ClusterControl PostgreSQL Deploy 5

Once the task is finished, we can see our cluster in the main ClusterControl screen.

ClusterControl Cluster View

Once we have our cluster created, we can perform several tasks on it, like adding a load balancer (HAProxy) or a new replica.

Load Balancer Deployment

To perform a load balancer deployment, select the option “Add Load Balancer” in the cluster actions and fill in the requested information.

ClusterControl PostgreSQL Load Balancer

We only need to add IP/Name, port, policy and the nodes we are going to use.

Keepalived Deployment

To perform a Keepalived deployment, select the cluster, go to the “Manage” menu and the “Load Balancer” section, and then select the “Keepalived” option.

ClusterControl PostgreSQL Keepalived

For our HA environment, we need to select the load balancer servers and the virtual IP address.

Keepalived uses a virtual IP and migrates it from one load balancer to another in case of failure, so our setup can continue to function normally.

If we followed the previous steps, we should have the following topology:

ClusterControl PostgreSQL Topology

In the “Node” section, we can check the status and some metrics of our current servers in the cluster.

ClusterControl PostgreSQL Nodes

ClusterControl Failover

If the “Autorecovery” option is ON, in case of master failure ClusterControl will promote the most advanced slave (if it is not blacklisted) to master, as well as notify us of the problem. It also fails over the rest of the slaves to replicate from the new master.

HAProxy is configured with two different ports, one read-write and one read-only.

In our read-write port, we have our master server as online and the rest of our nodes as offline, and in the read-only port we have both the master and the slaves online.

When HAProxy detects that one of our nodes, either master or slave, is not accessible, it automatically marks it as offline and does not take it into account when sending traffic. Detection is done by health check scripts that are configured by ClusterControl at deployment time. These check whether the instances are up, whether they are undergoing recovery, or are read-only.
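
As a rough illustration (not the exact configuration ClusterControl generates), the two listeners could look something like the sketch below; the bind ports, backend IPs and the health-check port are assumptions:

listen haproxy_rw
    bind *:5433
    mode tcp
    balance leastconn
    option httpchk
    server node1 10.0.0.141:5432 check port 9201
    server node2 10.0.0.142:5432 check port 9201
    server node3 10.0.0.143:5432 check port 9201

listen haproxy_ro
    bind *:5434
    mode tcp
    balance leastconn
    option httpchk
    server node1 10.0.0.141:5432 check port 9201
    server node2 10.0.0.142:5432 check port 9201
    server node3 10.0.0.143:5432 check port 9201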

When ClusterControl promotes a slave to master, our HAProxy marks the old master as offline (for both ports) and puts the promoted node online (in the read-write port).

If our active HAProxy, which is assigned a Virtual IP address to which our systems connect, fails, Keepalived migrates this IP to our passive HAProxy automatically. This means that our systems are then able to continue to function normally.

In this way, our systems continue to operate normally and without our intervention.

Considerations

If we manage to recover our old failed master, it will NOT be re-introduced automatically to the cluster. We need to do it manually. One reason for this is that, if our replica was delayed at the time of the failure, adding the old master to the cluster would mean loss of information or inconsistency of data across nodes. We might also want to analyze the issue in detail. If we just re-introduced the failed node into the cluster, we would possibly lose diagnostic information.

Also, if failover fails, no further attempts are made. Manual intervention is required to analyze the problem and perform the corresponding actions. This is to avoid the situation where ClusterControl, as the high availability manager, tries to promote the next slave and the next one. There might be a problem and we need to check this.

Security

One important thing we cannot forget before going into production with our HA environment is to ensure its security.

There are several security aspects such as encryption, role management and access restriction by IP address. These topics were covered in depth in a previous blog, so we will only point them out here.

In our PostgreSQL database, we have the pg_hba.conf file which handles the client authentication. We can limit the type of connection, the source IP or network, which database we can connect to and with which users. Therefore, this file is a very important piece for our security.

We can configure our PostgreSQL database from the postgresql.conf file, so it only listens on a specific network interface, and on a different port than the default one (5432), thus avoiding basic connection attempts from unwanted sources.
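
To make this concrete, a couple of hedged examples follow; the database, user, subnet, interface address and port are all placeholders for your own values:

# pg_hba.conf - allow the application user to reach only its database from the app subnet
# TYPE  DATABASE  USER     ADDRESS        METHOD
host    appdb     appuser  10.0.0.0/24    md5

# postgresql.conf - listen only on the private interface and on a non-default port
listen_addresses = '10.0.0.141'
port = 5433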

A correct user management, either using secure passwords or limiting access and privileges, is also an important piece of the security settings. It is recommended to assign the minimum amount of privileges possible to users, as well as to specify, if possible, the source of the connection.

We can also enable data encryption, either in transit or at rest, avoiding access to information to unauthorized persons.

Auditing is important to know what happens or happened in our database. PostgreSQL allows you to configure several parameters for logging, or you can even use the pgAudit extension for this task.

Last but not least, it is recommended to keep our database and servers up to date with the latest patches, to avoid security risks. For this, ClusterControl gives us the possibility to generate operational reports to verify if we have updates available, and it can even help us to update our database.

Conclusion

In this blog we have reviewed some concepts regarding HA. We went through some possible architectures and the necessary components to set them up effectively.

After that we explained how ClusterControl makes use of these components to deploy a complete HA environment for PostgreSQL.

And finally we reviewed some important security aspects to take into account before going live.


Monitoring HAProxy Metrics And How They Change With Time


HAProxy is one of the most popular load balancers for MySQL and MariaDB.

Feature-wise, it cannot be compared to ProxySQL or MaxScale, but it is fast and robust and may work perfectly fine in any environment as long as the application can perform the read/write split and send SELECT queries to one backend and all writes and SELECT … FOR UPDATE to a separate backend.

Keeping track of the metrics made available by HAProxy is very important. You have to be able to know the state of your proxy, especially when you encounter any issues.
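
Besides the status page, the raw counters can also be pulled from HAProxy’s admin socket, for example along these lines (the socket path is an assumption and must be enabled in haproxy.cfg; CSV column positions can vary between HAProxy versions):

# Show proxy name, server name, current sessions, total sessions and status
echo "show stat" | socat stdio /var/run/haproxy.socket | cut -d, -f1,2,5,8,18 | column -s, -t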

ClusterControl has always made an HAProxy status page available, which shows the state of the proxy in real time. With the new, Prometheus-based SCUMM (Severalnines ClusterControl Unified Monitoring & Management) dashboards, it is now possible to track how those metrics change over time.

In this blog post we will go over the different metrics presented in the HAProxy SCUMM dashboard.

First of all, Prometheus and the SCUMM dashboards are disabled in ClusterControl by default. Deploying them for a given cluster is just a matter of one click. If you have multiple clusters monitored by ClusterControl, you can reuse the same Prometheus instance for every one of them.

Once deployed, we can access the HAProxy dashboard. We will go over the data shown in it.

As you can see, it starts with the information about the state of the backends. Here, please note that this may depend on the cluster type and how you deployed HAProxy. In this case, it was a Galera cluster and HAProxy was deployed in a round-robin fashion; therefore you see three backends for reads and three for writes, six in total. It is also the reason why you see all backends marked as up. In case of a replication cluster, things look different, as HAProxy will be deployed in a read/write split and the scripts will keep only one host (the master) up and running in the writer backend:

This is why on the screen above, you can see two backend servers marked as “down”.

The next graph focuses on the data sent and received by both the backend (from HAProxy to the database servers) and the frontend (between HAProxy and the client hosts).

We can also check the traffic distribution between backends that are configured in HAProxy. In this case we have two backends and the queries are sent via port 3308, which acts as the round-robin access point to our Galera Cluster.

We can also find graphs showing how the traffic was distributed across all backend servers. In this case, due to round-robin access pattern, data was more or less evenly distributed across all three backend Galera servers.

Next, we can see information about sessions: how many sessions were opened from HAProxy to the backend servers and how many times per second a new session was opened to the backend. You can also check how those metrics look when examined on a per backend server basis.

The next two graphs show the maximum number of sessions per backend server and when connectivity issues showed up. This can be quite useful for debugging purposes when you hit a configuration error on your HAProxy instance and connections start being dropped.

The next graph might be even more valuable as it shows different metrics related to error handling - response errors, request errors, retries on the backend side and so on. Then we have a Sessions graph, which shows an overview of the session metrics.

On the next graph we can track connection errors over time; this can also be useful to pinpoint the time when the issue started to evolve.

Finally, there are two graphs related to queued requests. HAProxy queues requests to a backend if the backend servers are saturated. This can point to, for example, overloaded database servers which cannot cope with more traffic.

As you can see, ClusterControl tracks the most important metrics of HAProxy and can show how they change in time. This is very useful in pinpointing when an issue started and, to some extent, what could be the root cause of it. Try it out (it’s free) for yourself.

PostgreSQL High Availability with Master-Slave & Master-Master Architectures


Below is an excerpt from our whitepaper “PostgreSQL Management and Automation with ClusterControl” which can be downloaded for free.

Revision Note: Keep in mind that the term Master-Slave used in this blog is synonymous with the Master-Standby terminology used by PostgreSQL. We’re using Master-Slave to keep the parallelism with other technologies.



For an HA configuration we can have several architectures, but the basic ones would be master-slave and master-master architectures. Database servers can work together to allow a second server to take over quickly if the primary server fails (high availability), or to allow several computers to serve the same data (load balancing).

PostgreSQL Master-Slave Architectures

These architectures enable us to maintain a master database with one or more standby servers ready to take over operations if the primary server fails. These standby databases will remain synchronized (or almost synchronized) with the master.

The replication between the master and the slaves can be made via SQL statements (logical standbys) or via the internal data structure modifications (physical standbys). PostgreSQL uses a stream of write-ahead log (WAL) records to keep the standby databases synchronized. If the main server fails, the standby contains almost all of the data of the main server, and can be quickly made the new master database server. This can be synchronous or asynchronous and can only be done for the entire database server.

Setting up streaming replication is a task that requires some steps to be followed thoroughly. For those steps and some more background on this subject, please see: Become a PostgreSQL DBA - How to Setup Streaming Replication for High Availability.

From version 10, PostgreSQL includes the option to setup logical replication.

Logical replication allows a database server to send a stream of data modifications to another server. PostgreSQL logical replication constructs a stream of logical data modifications from the WAL. Logical replication allows the data changes from individual tables to be replicated. It doesn’t require a particular server to be designated as a master or a replica but allows data to flow in multiple directions.

You can find more information regarding logical replication: Blog: An Overview of Logical Replication in PostgreSQL.

To effectively ensure high availability, it is not enough to have a master-slave architecture. We also need to enable some automatic form of failover, so if something fails we can have the smallest possible delay in resuming normal functionality. PostgreSQL does not include an automatic failover mechanism to identify failures on the master database and notify the slave to take ownership, so that will require a little bit of work on the DBA’s side. You should work on a script that includes the pg_ctl promote command, which will promote the slave to a new master. There are also some third party tools for this automation. Many such tools exist and are well integrated with the operating system facilities required for successful failover, such as IP address migration.
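
As a very simplified sketch of what such a script revolves around (the host, port and data directory are placeholders, and no fencing of the old master is shown):

#!/bin/bash
PGDATA=/var/lib/postgresql/11/main      # data directory of the standby (assumption)

# If the primary no longer answers, promote the standby to become the new master.
if ! pg_isready -h 10.0.0.141 -p 5432 -q; then
    pg_ctl promote -D "${PGDATA}"
fi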

After a failover happens, you need to modify your application accordingly to work with the new master. You will also have only one server working, so the master-slave architecture needs to be re-created, so that we get back to the same normal situation we had before the issue.


PostgreSQL Master-Master Architectures

This architecture provides a way of minimizing the impact of an error in one of the nodes, as the other node can take care of all the traffic, maybe slightly affecting the performance, but never losing functionality. It is also used to accomplish (and maybe this is even a more interesting point) horizontal scalability (scale-out), as opposed to the concept of vertical scalability where we add more resources to a server (scale-up).

For implementing this architecture, you will need to use external tools, as this feature is not (yet) natively supported by PostgreSQL.

You must be very careful when choosing a solution for implementing master-master, as there are many different products. A lot of them are still “green”, with few serious users or success cases. Other projects have, on the other hand, been abandoned, as there are no active maintainers.

For more information on the available tools please refer to: Blog: Top PG Clustering HA Solutions for PostgreSQL.

Load Balancing and Connection Pooling

There are several load balancer tools that can be used to manage the traffic from your application to get the most out of your database architecture. In the same way, there are some others that can help you manage the way the application connects to the database, by pooling these connections and reusing them between different requests.

There are some products that are used for both purposes, like the well-known pgpool, and some others that focus on only one of these features, like pgbouncer (connection pooling) and HAProxy (used for load balancing).

Understanding the Effects of High Latency in High Availability MySQL and MariaDB Solutions


High availability refers to the percentage of time that the system is working and responding according to the business needs. For production database systems it is typically the highest priority to keep it close to 100%. We build database clusters to eliminate all single points of failure. If an instance becomes unavailable, another node should be able to take over the workload and carry on from there. In a perfect world, a database cluster would solve all of our system availability problems. Unfortunately, while all may look good on paper, the reality is often different. So where can it go wrong?

Transactional database systems come with sophisticated storage engines. Keeping data consistent across multiple nodes makes this task way harder. Clustering introduces a number of new variables that highly depend on the network and the underlying infrastructure. It is not uncommon for a standalone database instance that was running fine on a single node to suddenly perform poorly in a cluster environment.

Among the many things that can affect cluster availability, latency issues play a crucial role. But what is latency? Is it only related to the network?

The term "latency" actually refers to several kinds of delays incurred in the processing of data. It’s how long it takes for a piece of information to move from stage to another.

In this blog post, we’ll look at the two main high availability solutions for MySQL and MariaDB, and how they can each be affected by latency issues.

At the end of the article, we take a look at modern load balancers and discuss how they can help you address some types of latency issues.

In a previous article, my colleague Krzysztof Książek wrote about "Dealing with Unreliable Networks When Crafting an HA Solution for MySQL or MariaDB". You will find tips which can help you to design your production ready HA architecture, and avoid some of the issues described here.

Master-Slave Replication for High Availability

MySQL master-slave replication is probably the most popular database cluster type on the planet. One of the main things you want to monitor while running your master-slave replication cluster is the slave lag. Depending on your application requirements and the way you utilize your database, the replication latency (slave lag) may determine whether the data can be read from the slave node or not. Data committed on the master but not yet available on an asynchronous slave means that the slave has an older state. When it’s not OK to read from a slave, you need to go to the master, and that can affect application performance. In the worst case scenario, your system will not be able to handle all the workload on the master.

Slave lag and stale data

To check the status of master-slave replication, you should start with the command below:

SHOW SLAVE STATUS\G
MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 10.0.3.100
                  Master_User: rpl_user
                  Master_Port: 3306
                Connect_Retry: 10
              Master_Log_File: binlog.000021
          Read_Master_Log_Pos: 5101
               Relay_Log_File: relay-bin.000002
                Relay_Log_Pos: 809
        Relay_Master_Log_File: binlog.000021
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 5101
              Relay_Log_Space: 1101
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 3
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
                   Using_Gtid: Slave_Pos
                  Gtid_IO_Pos: 0-3-1179
      Replicate_Do_Domain_Ids: 
  Replicate_Ignore_Domain_Ids: 
                Parallel_Mode: conservative
1 row in set (0.01 sec)

Using the above information you can determine how good the overall replication latency is. The lower the value you see in "Seconds_Behind_Master", the smaller the replication lag.

Another way to monitor slave lag is to use ClusterControl replication monitoring. In this screenshot we can see the replication status of an asynchronous Master-Slave (2x) cluster with ProxySQL.

There are a number of things that can affect replication time. The most obvious is the network throughput and how much data you can transfer. MySQL comes with multiple configuration options to optimize the replication process. The essential replication-related parameters are:

  • Parallel apply
  • Logical clock algorithm
  • Compression
  • Selective master-slave replication
  • Replication mode

Parallel apply

It’s not uncommon to start replication tuning by enabling parallel apply. The reason for that is that, by default, MySQL applies the binary log sequentially, while a typical database server comes with several CPU cores to use.

To get around sequential log apply, both MariaDB and MySQL offer parallel replication. The implementation may differ per vendor and version. E.g. MySQL 5.6 offers parallel replication as long as a schema separates the queries while MariaDB (starting version 10.0) and MySQL 5.7 both can handle parallel replication across schemas. Different vendors and versions come with their limitations and feature so always check the documentation.

Executing queries via parallel slave threads may speed up your replication stream if you are write heavy. However, if you aren’t, it would be best to stick to the traditional single-threaded replication. To enable parallel processing, change slave_parallel_workers to the number of CPU threads you want to involve in the process. It is recommended to keep the value lower than the number of available CPU threads.
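
As a sketch for MySQL 5.7 (MariaDB uses slave_parallel_threads and slave_parallel_mode instead), enabling four applier threads could look like this:

STOP SLAVE SQL_THREAD;
SET GLOBAL slave_parallel_type = 'LOGICAL_CLOCK';
SET GLOBAL slave_parallel_workers = 4;   -- keep this below the number of available CPU threads
START SLAVE SQL_THREAD;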

Parallel replication works best with group commits. To check if you have group commits happening, run the following query:

show global status like 'binlog_%commits';

The bigger the ratio between these two values the better.

Logical clock

slave_parallel_type=LOGICAL_CLOCK is an implementation of a Lamport clock algorithm. When using a multithreaded slave, this variable specifies the method used to decide which transactions are allowed to execute in parallel on the slave. The variable has no effect on slaves for which multithreading is not enabled, so make sure slave_parallel_workers is set higher than 0.

MariaDB users should also check the optimistic mode introduced in version 10.1.3, as it may also give better results.

GTID

MariaDB comes with its own implementation of GTID. MariaDB’s sequence consists of a domain, a server and a transaction. Domains allow multi-source replication with distinct IDs. Different domain IDs can be used to replicate portions of data out-of-order (in parallel). As long as that is acceptable for your application, this can reduce replication latency.

A similar technique applies to MySQL 5.7, which can also use multi-source replication and independent replication channels.

Compression

CPU power is getting less expensive over time, so using it for binlog compression could be a good option for many database environments. The slave_compressed_protocol parameter tells MySQL to use compression if both master and slave support it. By default, this parameter is disabled.

Starting from MariaDB 10.2.3, selected events in the binary log can be optionally compressed, to save on network transfers.
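
For example, the following settings enable compression (they can also be made persistent in my.cnf); note that slave_compressed_protocol has to be supported on both ends and only takes effect when the slave I/O thread reconnects, and the binlog compression variable applies to MariaDB 10.2.3 and later:

-- Compress the replication stream between master and slave
SET GLOBAL slave_compressed_protocol = ON;

-- MariaDB 10.2.3+: compress selected events inside the binary log
SET GLOBAL log_bin_compress = ON;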

Replication formats

MySQL offers several replication formats (statement-based, row-based and mixed). Choosing the right replication format helps to minimize the time needed to pass data between the cluster nodes.

Multimaster Replication For High Availability

Some applications can not afford to operate on outdated data.

In such cases, you may want to enforce consistency across the nodes with synchronous replication. Keeping data synchronous requires an additional plugin, and for some, the best solution on the market for that is Galera Cluster.

Galera Cluster comes with the wsrep API, which is responsible for transmitting transactions to all nodes and executing them according to a cluster-wide ordering. This can block the execution of subsequent queries until the node has applied all write-sets from its applier queue. While it’s a good solution for consistency, you may hit some architectural limitations. The common latency issues can be related to:

  • The slowest node in the cluster
  • Horizontal scaling and write operations
  • Geolocated clusters
  • High Ping
  • Transaction size

The slowest node in the cluster

By design, the write performance of the cluster cannot be higher than the performance of the slowest node in the cluster. Start your cluster review by checking the machine resources and verify the configuration files to make sure they all run on the same performance settings.

Parallelization

Parallel threads do not guarantee better performance, but they may speed up the synchronization of new nodes with the cluster. The wsrep_cert_deps_distance status variable tells us the possible degree of parallelization: it is the average distance between the highest and lowest seqno values that can possibly be applied in parallel. You can use it to determine the maximum number of slave threads worth configuring.
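
In practice, you would check the status variable and then size the applier threads accordingly, for example (the value 8 is just an illustration):

SHOW GLOBAL STATUS LIKE 'wsrep_cert_deps_distance';
SET GLOBAL wsrep_slave_threads = 8;   -- or set it in my.cnf if your version does not allow changing it at runtime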

Horizontal scaling

By adding more nodes to the cluster, we can tolerate more node failures; however, the information needs to travel across multiple instances before it is committed, which multiplies the response times. If you need scalable writes, consider an architecture based on sharding. A good solution can be the Spider storage engine.

In some cases, to reduce the information shared across the cluster nodes, you can consider having only one writer at a time. It is relatively easy to implement using a load balancer, as shown in the sketch below. If you do this manually, make sure you have a procedure to change the DNS entry when your writer node goes down.
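A minimal HAProxy sketch of the single-writer pattern, assuming three Galera nodes (addresses and names are placeholders): only node1 receives traffic, the other nodes are marked as backup and take over only when it fails.

listen galera_writer
    bind *:3307
    mode tcp
    option tcpka
    server node1 10.0.0.1:3306 check
    server node2 10.0.0.2:3306 check backup
    server node3 10.0.0.3:3306 check backup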

Geolocated clusters

Although Galera Cluster is synchronous, it is possible to deploy a Galera Cluster across data centers. Synchronous replication like MySQL Cluster (NDB) implements a two-phase commit, where messages are sent to all nodes in a cluster in a 'prepare' phase, and another set of messages are sent in a 'commit' phase. This approach is usually not suitable for geographically disparate nodes, because of the latencies in sending messages between nodes.

High Ping

Galera Cluster with the default settings does not handle high network latency well. If you have a network with a node that shows a high ping time, consider changing the evs.send_window and evs.user_send_window parameters. These variables define the maximum number of data packets in replication at a time. For WAN setups, they can be set to a considerably higher value than the defaults; it is common to set them to 512. These parameters are part of wsrep_provider_options.

--wsrep_provider_options="evs.send_window=512;evs.user_send_window=512"

Transaction size

One of the things you need to consider while running Galera Cluster is the size of the transaction. Finding the balance between transaction size, performance and the Galera certification process is something you have to evaluate for your application. You can find more information about that in the article How to Improve Performance of Galera Cluster for MySQL or MariaDB by Ashraf Sharif.
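If you want to protect the cluster against oversized transactions, Galera lets you cap the write-set size; a sketch (the limits below are arbitrary examples, not recommendations):

[mysqld]
# reject transactions whose write-set exceeds these limits
wsrep_max_ws_rows = 1048576
wsrep_max_ws_size = 1073741824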

Load Balancer Causal Consistency Reads

Even with a minimized risk of data latency issues, standard MySQL asynchronous replication cannot guarantee consistency. It is still possible that data has not yet been replicated to a slave while your application is reading it from there. Synchronous replication can solve this problem, but it has architectural limitations and may not fit your application requirements (e.g., intensive bulk writes). So how do you overcome it?

The first step to avoid reading stale data is to make the application aware of replication delay. This is usually programmed in application code. Fortunately, there are modern database load balancers with support for adaptive query routing based on GTID tracking. The most popular are ProxySQL and MaxScale.

ProxySQL 2.0

The ProxySQL Binlog Reader allows ProxySQL to know in real time which GTID has been executed on every MySQL server, slaves and the master itself. Thanks to this, when a client executes a read that requires causal consistency, ProxySQL immediately knows on which server the query can be executed. If for whatever reason the write has not been applied on any slave yet, ProxySQL knows that it was executed on the master and sends the read there.
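A rough sketch of how this can be wired up through the ProxySQL admin interface, assuming the ProxySQL Binlog Reader is already running on every MySQL server on port 3307 and that hostgroup 10 holds the writer and hostgroup 20 the readers (all numbers are placeholders):

UPDATE mysql_servers SET gtid_port = 3307;
LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK;
-- route SELECTs to the reader hostgroup, but only to slaves that have already
-- applied the client's last write from the writer hostgroup
INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, gtid_from_hostgroup, apply)
VALUES (100, 1, '^SELECT', 20, 10, 1);
LOAD MYSQL QUERY RULES TO RUNTIME; SAVE MYSQL QUERY RULES TO DISK;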

Maxscale 2.3

MariaDB introduced causal reads in MaxScale 2.3.0. The way it works is similar to ProxySQL 2.0: when causal_reads is enabled, any subsequent reads performed on slave servers are done in a manner that prevents replication lag from affecting the results. If the slave has not caught up with the master within the configured time, the query will be retried on the master.
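A sketch of the relevant readwritesplit service settings in maxscale.cnf, assuming MaxScale 2.3 or later (service name, server list and credentials are placeholders; causal_reads_timeout is the retry window mentioned above, in seconds):

[Splitter-Service]
type=service
router=readwritesplit
servers=server1,server2,server3
user=maxscale
password=***
causal_reads=true
causal_reads_timeout=10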
