Durability, the D of ACID | Software Engineering Dictionary

Published: 15 October 2021
on channel: Studying With Alex

Durability is a property of a database that guarantees that completed updates will not be lost if a system crashes.

In general, when you ask databases to store data, they write that data to a form of persistent storage like a disk drive. It’s called persistent because once the data is written, the system could crash, or even suffer a power outage, and the data will be safe.

However, some databases will delay writing to persistent storage until later so that they can write several updates in one batch. For example, if I ask the database to write data 1, then data 2, it might hang on to them in memory and then write them in one batch later. This technique, called batching, increases performance. However, if the database has data that hasn’t yet been written and there’s a power outage, that data is lost. These databases do not have durability.
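To make the failure mode concrete, here is a minimal sketch of a batching store. It is a toy model, not how any particular database works: the class `BatchingStore`, its `batch_size` parameter, and the `crash` method are all hypothetical names invented for illustration, and a plain Python list stands in for persistent storage.

```python
class BatchingStore:
    """Toy store that buffers writes in memory and persists them in batches."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.pending = []    # acknowledged writes that exist only in memory
        self.persisted = []  # stand-in for persistent storage (assumption)

    def write(self, item):
        self.pending.append(item)
        if len(self.pending) >= self.batch_size:
            self.flush()
        return "done"        # acknowledged even if nothing was persisted yet

    def flush(self):
        # Move the whole batch to "persistent storage" in one step.
        self.persisted.extend(self.pending)
        self.pending.clear()

    def crash(self):
        """Simulate a power outage: memory is wiped, storage survives."""
        self.pending.clear()


store = BatchingStore(batch_size=3)
for item in ["data 1", "data 2", "data 3", "data 4"]:
    store.write(item)

store.crash()
print(store.persisted)
```

After the simulated crash, the first three items survive because they filled a batch, while "data 4" was acknowledged with "done" but never left memory. That gap between "acknowledged" and "persisted" is exactly what durability rules out.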

If the database doesn’t use batching and writes to persistent storage on every request, is it durable? Not quite. A write to persistent storage actually passes through three different layers: the operating system, the disk drive controller, and the actual physical atoms and electrons used for storage. The operating system and disk drive controller also perform batching for performance improvements. In a power outage, data queued for batching at the operating system or disk drive controller will also be lost because it didn’t make it to the actual atoms and electrons.

To get around this batching, it’s possible to tell the operating system and disk drive controller to write everything currently queued, which is called flushing. A database with durability both executes a write to the disk drive on every update and tells the operating system and disk drive controller to flush that write before reporting it as done.
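As a rough illustration of write-then-flush, here is a small Python sketch. The function name `durable_append` and the file name `updates.log` are made up for this example. `file.flush()` and `os.fsync()` are standard Python calls, but whether the drive's own cache is also flushed depends on the operating system, filesystem, and hardware, so treat this as a sketch rather than a guarantee.

```python
import os
import tempfile


def durable_append(path, record):
    """Append one record; return only after asking the OS to push it to the device."""
    with open(path, "ab") as f:
        f.write(record)       # lands in Python's own user-space buffer
        f.flush()             # hand the bytes to the operating system
        os.fsync(f.fileno())  # ask the OS to write its cached data for this file to the device


# Hypothetical log file in a temporary directory, just for the demo.
path = os.path.join(tempfile.mkdtemp(), "updates.log")
durable_append(path, b"data 1\n")
durable_append(path, b"data 2\n")

with open(path, "rb") as f:
    print(f.read())
```

Note that there are two separate steps: `flush()` only moves data from the program to the operating system, and `os.fsync()` is what asks the operating system to stop holding on to it.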

In summary, a database is considered to have durability if it can guarantee that by the time it says “done writing”, the data is encoded in real physical atoms and electrons. The downside is that forcing a physical write for every update makes updates slower, so there is a tradeoff between durability and performance.

For the final bit, let’s look at Redis, a database with configurable durability policies. There’s a handy post on their site about persistence, and the link is in the video description. Looks like Redis has three kinds of persistence options. Let’s cover the third one, then the first one, then the second one, so it’s in order of complexity.

The third one is no persistence. When the database process exits or there’s a power outage, all data is lost. That’s easy enough to understand. It could be useful if your application is OK with all data being wiped at any moment, like a short-lived cache.

Next, we have RDB, which provides point-in-time snapshots. In other words, every N seconds, it saves all the data in the database to disk. If you have a power outage, you’ll lose all the data written since the last save, which will be at most N seconds ago. This doesn’t meet the criteria for durability, but it shows the tradeoff between persistence and performance. If you save more often, you lose less data, but you have to spend more time saving.

And the last one is AOF, which records every write as soon as it’s received. Since it records every write, is it durable? It depends on the flush policy, which we can see if we scroll down. Redis uses the term fsync, short for file sync, as a synonym for flushing.

There are three options here. The first is always, which flushes on every write. This is durable! But as the post notes, it’s very slow. The second option is every second, which is not durable, but limits data loss to at most 1 second, similar to RDB snapshots taken every second. And the last doesn’t explicitly flush and leaves it up to the operating system to decide when to write to disk.

Which settings are best is up to you, the application developer, to decide, based on what you’re using the technology for.
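To get a feel for how the three flush policies differ in cost, here is a toy append-only log in Python. It is not Redis's implementation: the class `AppendOnlyLog`, the helper `run`, the file name `toy.aof`, and the simulated clock are all assumptions made for this sketch. The policy names `always`, `everysec`, and `no` match the values of Redis's `appendfsync` setting. One simplification to be aware of: this sketch only checks the one-second timer when a write arrives, which is not necessarily how Redis schedules it.

```python
import os
import tempfile


class AppendOnlyLog:
    """Toy append-only log with three flush policies (always, everysec, no)."""

    def __init__(self, path, policy, clock):
        assert policy in ("always", "everysec", "no")
        self.file = open(path, "ab")
        self.policy = policy
        self.clock = clock        # callable returning seconds; injected so the demo is repeatable
        self.last_sync = clock()
        self.sync_count = 0

    def append(self, record):
        self.file.write(record)
        self.file.flush()         # hand the bytes to the operating system
        if self.policy == "always":
            self._sync()
        elif self.policy == "everysec" and self.clock() - self.last_sync >= 1.0:
            self._sync()
        # policy "no": never call fsync; the OS decides when to write

    def _sync(self):
        os.fsync(self.file.fileno())
        self.last_sync = self.clock()
        self.sync_count += 1

    def close(self):
        self.file.close()


def run(policy, writes=10, step=0.25):
    """Append `writes` records spaced `step` simulated seconds apart; return the fsync count."""
    now = [0.0]  # simulated clock, advanced by hand
    path = os.path.join(tempfile.mkdtemp(), "toy.aof")
    log = AppendOnlyLog(path, policy, lambda: now[0])
    for i in range(writes):
        now[0] += step
        log.append(b"SET key%d value\n" % i)
    log.close()
    return log.sync_count


counts = {policy: run(policy) for policy in ("always", "everysec", "no")}
print(counts)
```

With ten writes spread over 2.5 simulated seconds, the run reports 10 fsync calls for `always`, 2 for `everysec`, and 0 for `no`. Fewer fsync calls means less waiting on the disk, but every write that lands between two fsync calls is a write that a power outage could take with it.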

00:00 Intro
00:07 What Is Persistent Storage?
00:21 Uh Oh: Batching Makes Databases Not Durable
00:49 The Operating System and Disk Drive Also Batch
01:17 Tell Them Not To Batch With The Flush Command
01:32 Summary
01:52 Redis Durability Research

