At last year's Linux Storage, Filesystem, Memory-Management, and BPF Summit (LSFMM+BPF), a discussion about atomic writes was accompanied by patches to support the feature in the block layer and for direct I/O on XFS. That work was merged, but another piece of that discussion concerned adding the feature for buffered I/O, in part because the PostgreSQL database currently has to jump through hoops to ensure that its writes are not "torn" (partially written) when there is an error or crash. Luis Chamberlain led a combined storage and filesystem session at this year's summit to revisit the idea of providing atomic (or untorn) writes for buffered I/O.
Chamberlain said that some believe it does not make sense to work on buffered atomic I/O simply to work around a missing feature in PostgreSQL; in that view, the database should just support direct I/O. It turns out that the default storage engine for MongoDB supports both buffered and direct I/O, he said, but MongoDB recommends using buffered I/O. The reason is that MongoDB compresses data on disk by default and keeps the data uncompressed in its cache. The data can be accessed via mmap(), which is not compatible with direct I/O.
He thinks that the database developers should be able to decide on the architecture that works best for their needs. Providing untorn buffered writes allows the databases to eliminate the double-buffering they are doing now as a workaround. There are configuration options to turn off the double-buffering for MySQL and PostgreSQL, which can be used to test the impacts of the change.
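To make the double-buffering workaround concrete, here is a minimal Python sketch of the general idea: write a checksummed copy of the page to a side area first, then write it in place, so that a torn in-place write can be detected and repaired after a crash. This is a toy in-memory model, not the actual MySQL or PostgreSQL implementation; the page size, sector size, and all function names (`seal`, `is_intact`, `torn_write`, `recover`) are assumptions made for illustration.

```python
import zlib

PAGE_SIZE = 8192    # database page size (assumed for illustration)
SECTOR_SIZE = 512   # unit the simulated device writes atomically (assumed)


def seal(payload: bytes) -> bytes:
    """Prefix a payload with its CRC32 so a torn page can be detected."""
    assert len(payload) == PAGE_SIZE - 4
    return zlib.crc32(payload).to_bytes(4, "little") + payload


def is_intact(page: bytes) -> bool:
    """Return True if the stored CRC32 matches the page payload."""
    return int.from_bytes(page[:4], "little") == zlib.crc32(page[4:])


def torn_write(old: bytes, new: bytes, sectors_done: int) -> bytes:
    """Simulate a crash after only `sectors_done` sectors reached the device."""
    cut = sectors_done * SECTOR_SIZE
    return new[:cut] + old[cut:]


def recover(in_place: bytes, doublewrite_copy: bytes) -> bytes:
    """Repair a torn in-place page from the copy written beforehand."""
    if is_intact(in_place):
        return in_place
    if is_intact(doublewrite_copy):
        return doublewrite_copy
    raise ValueError("both copies are damaged")


old_page = seal(b"A" * (PAGE_SIZE - 4))
new_page = seal(b"B" * (PAGE_SIZE - 4))

# Step 1: the full new page goes to the doublewrite area and is flushed.
doublewrite_area = new_page
# Step 2: the in-place write is interrupted after 3 of 16 sectors.
on_disk = torn_write(old_page, new_page, sectors_done=3)

torn_detected = not is_intact(on_disk)
repaired = recover(on_disk, doublewrite_area)
```

Every page is written twice in this scheme, which is the overhead that untorn buffered writes would let a database drop. The configuration options mentioned in the talk are presumably MySQL's `innodb_doublewrite` and PostgreSQL's `full_page_writes`, though the session summary does not name them.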
The atomic-write API could eventually be u