Slony-I REL_1_1_0 Documentation
Slony-I is an asynchronous replication system. Because of that, it is almost certain that at the moment the current origin of a set fails, the final transactions committed at the origin will not yet have propagated to the subscribers. Systems are particularly likely to fail under heavy load; that is one of the corollaries of Murphy's Law. Therefore the principal goal is to prevent the main server from failing. The best way to do that is frequent maintenance.
Opening the case of a running server is not exactly what we should consider a "professional" way to do system maintenance. And interestingly, those users who found it valuable to use replication for backup and failover purposes are the very ones that have the lowest tolerance for terms like "system downtime." To help support these requirements, Slony-I not only offers failover capabilities, but also the notion of controlled origin transfer.
It is assumed in this document that the reader is familiar with the slonik utility and knows at least how to set up a simple 2 node replication system with Slony-I.
We assume a current "origin" at node1 with one "subscriber" at node2 (i.e., a slave). A web application on a third server is accessing the database on node1. Both databases are up and running, and replication is more or less in sync. We perform a controlled switchover using MOVE SET.
At the time of this writing, switchover to another server requires the application to reconnect to the database. To avoid any complications, we simply shut down the web server. Users who use pgpool for the application's database connections merely have to shut down the pool.
A small slonik script executes the following commands:
lock set (id = 1, origin = 1);
wait for event (origin = 1, confirmed = 2);
move set (id = 1, old origin = 1, new origin = 2);
wait for event (origin = 1, confirmed = 2);
After these commands, the origin (master role) of data set 1 has been transferred to node2. And it is not simply transferred; it is done in a fashion such that node1 becomes a fully synchronized subscriber, actively replicating the set. So the two nodes have switched roles completely.
After reconfiguring the web application (or pgpool) to connect to the database on node2, the web server is restarted and resumes normal operation.
If a single shell script handles the application shutdown, the slonik run, the configuration file changes, and the application startup, this entire procedure is likely to take less than 10 seconds.
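The single-script approach described above might be sketched as follows. This is only an illustration under stated assumptions: the cluster name, the conninfo strings, and the three application hooks are hypothetical placeholders that must be replaced with site-specific values, and the slonik commands are the four listed earlier, preceded by the usual slonik preamble. The script is fed to slonik on standard input.

```shell
#!/bin/sh
# Sketch of a one-shot controlled switchover of set 1 from node1 to node2.
# Hypothetical values: the cluster name, the conninfo strings, and the
# application hooks below are placeholders, not taken from the Slony-I docs.

CLUSTER="${CLUSTER:-mycluster}"
NODE1_CONNINFO="${NODE1_CONNINFO:-dbname=appdb host=node1 user=slony}"
NODE2_CONNINFO="${NODE2_CONNINFO:-dbname=appdb host=node2 user=slony}"
SLONIK="${SLONIK:-slonik}"

# Print the slonik script: preamble plus the four switchover commands.
switchover_script() {
    cat <<EOF
cluster name = $CLUSTER;
node 1 admin conninfo = '$NODE1_CONNINFO';
node 2 admin conninfo = '$NODE2_CONNINFO';
lock set (id = 1, origin = 1);
wait for event (origin = 1, confirmed = 2);
move set (id = 1, old origin = 1, new origin = 2);
wait for event (origin = 1, confirmed = 2);
EOF
}

# Site-specific hooks; replace the bodies with real commands.
stop_application()    { echo "stop web server or connection pool here"; }
repoint_application() { echo "install configuration pointing at node2 here"; }
start_application()   { echo "start web server or connection pool here"; }

# Run the whole procedure; stop at the first step that fails.
switchover() {
    stop_application || return 1
    if ! switchover_script | "$SLONIK"; then
        # The application stays stopped so an administrator can inspect
        # the cluster before anything reconnects.
        echo "slonik failed; application left stopped" >&2
        return 1
    fi
    repoint_application && start_application
}
```

Calling `switchover` at the end of the script runs the full sequence. Setting `SLONIK=cat` gives a dry run that prints the slonik script instead of executing it.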
You may now simply shut down the server hosting node1 and do whatever is required to maintain the server. When the slon process for node1 is restarted later, it will start replicating again and soon catch up. At this point the procedure to switch origins is executed again to restore the original configuration.
This is the preferred way to handle things; it runs quickly, under control of the administrators, and there is no need for there to be any loss of data.
If some more serious problem occurs on the "origin" server, it may be necessary to FAILOVER to a backup server. This is a highly undesirable circumstance, as transactions "committed" on the origin, but not applied to the subscribers, will be lost. You may have reported these transactions as "successful" to outside users. As a result, failover should be considered a last resort. If the "injured" origin server can be brought up to the point where it can limp along long enough to do a controlled switchover, that is greatly preferable.
Slony-I does not provide any automatic detection for failed systems. Abandoning committed transactions is a business decision that cannot be made by a database system. If someone wants to put the commands below into a script executed automatically from the network monitoring system, well ... it's your data, and it's your failover policy.
The slonik command
failover (id = 1, backup node = 2);
causes node2 to assume the ownership (origin) of all sets that have node1 as their current origin. If there are additional nodes in the Slony-I cluster, all direct subscribers of node1 are informed that this is happening. Slonik will also query all direct subscribers to determine which node has the highest replication status (i.e., the latest committed transaction) for each set, and the configuration will be changed so that node2 first applies those final transactions before actually allowing write access to the tables.
In addition, all nodes that subscribed directly to node1 will now use node2 as data provider for the set. This means that after the failover command has succeeded, no node in the entire replication setup will receive anything from node1 any more.
Reconfigure and restart the application (or pgpool) to cause it to reconnect to node2.
After the failover is complete and node2 accepts write operations against the tables, remove all remnants of node1's configuration information with the DROP NODE command:
drop node (id = 1, event node = 2);
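Because failover abandons committed transactions, any wrapper around these commands should probably require an explicit human decision. The following is a minimal sketch under that assumption, not a recommended policy. The cluster name and conninfo string are hypothetical placeholders, and the preamble lists only the surviving node2; whether slonik also needs an entry for the failed node is not covered here, so its omission is an assumption. A cluster with more surviving nodes would need an admin conninfo line for each of them.

```shell
#!/bin/sh
# Sketch of a manually confirmed failover from node1 to node2.
# Hypothetical values: the cluster name and conninfo string are placeholders.

CLUSTER="${CLUSTER:-mycluster}"
NODE2_CONNINFO="${NODE2_CONNINFO:-dbname=appdb host=node2 user=slony}"
SLONIK="${SLONIK:-slonik}"

# Preamble naming only the surviving node (assumption: node1 is unreachable).
preamble() {
    echo "cluster name = $CLUSTER;"
    echo "node 2 admin conninfo = '$NODE2_CONNINFO';"
}

# slonik script that makes node2 the origin of the sets node1 owned.
failover_script() {
    preamble
    echo "failover (id = 1, backup node = 2);"
}

# slonik script that removes node1's configuration after the failover.
drop_node_script() {
    preamble
    echo "drop node (id = 1, event node = 2);"
}

# Require the operator to type the word "failover" before proceeding.
confirm_failover() {
    printf 'Type "failover" to abandon unreplicated transactions on node1: ' >&2
    read -r answer
    [ "$answer" = "failover" ]
}

run_failover() {
    confirm_failover || { echo "aborted" >&2; return 1; }
    failover_script | "$SLONIK"
}
```

An operator would call `run_failover`, reconnect the application to node2, and only then run `drop_node_script | "$SLONIK"`. Keeping the two steps separate leaves time to confirm that node2 accepts writes before node1's configuration is removed.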
After the above failover, the data stored on node1 will rapidly become increasingly out of sync with the rest of the nodes, and must be treated as corrupt. Therefore, the only way to get node1 back and transfer the origin role back to it is to rebuild it from scratch as a subscriber, let it catch up, and then follow the switchover procedure.
A good reason not to do this automatically is the fact that important updates (from a business perspective) may have been committed on the failing system. You probably want to analyze the last few transactions that made it into the failed node to see if some of them need to be reapplied on the "live" cluster. For instance, if someone was entering bank deposits affecting customer accounts at the time of failure, you wouldn't want to lose that information.
Warning
It has been observed that there can be some very confusing results if a node is "failed" due to a persistent network outage as opposed to failure of data storage. In such a scenario, the "failed" node has a database in perfectly fine form; it is just that since it was cut off, it "screams in silence." If the network connection is repaired, that node could reappear, and as far as its configuration is concerned, all is well, and it should communicate with the rest of its Slony-I cluster. In fact, the only confusion taking place is on that node. The other nodes in the cluster are not confused at all; they know that this node is "dead," and that they should ignore it. But there is no way to know this by looking at the "failed" node. This points back to the design point that Slony-I is not a network monitoring tool. You need to have clear methods of communicating to applications and users what database hosts are to be used. If those methods are lacking, adding replication to the mix will worsen the potential for confusion, and failover will be the point at which there is the greatest potential for confusion.
If the database is very large, it may take many hours to recover node1 as a functioning Slony-I node; that is another reason to consider failover an undesirable "last resort."