Following a database read to the metal

In my research to understand how a simple SQL query sent from the app ends up on disk, I learned that the terms <code>page</code> and <code>block</code> are perhaps the most overloaded concepts in software engineering. There is a database page, an operating system virtual memory page, a file system block, an SSD page, and two types of SSD blocks: the logical block, which maps to the file system, and the larger erase unit, which contains multiple pages. All of these units can have different sizes; some match, some don’t. In this post I walk through a SELECT statement and how the different levels of I/O are performed, all the way to the metal on disk. Here is a full picture of what we will explain:

[Figure: Following a database read to the metal]

<h2>But first, fundamentals</h2>

When you create a table in a database, a file is created on disk and the data is laid out in fixed-size database pages. How the data is laid out in a page depends on whether the engine is a row store or a column store. Think of a page as a structure with a header and data; the data portion is where the rows live. A database page can be 8KB (Postgres), 16KB (e.g. MySQL InnoDB), or more.

The table is stored as an array of pages in the file, where the page index and the page size tell the database exactly which offset to seek to and how much to read. For example, assuming the database page size is 8KB and pages are numbered from 0, to read page 7 from disk you seek to offset 7*8192 = 57344 and read 8192 bytes.

<a href="https://medium.com/@hnasr/following-a-database-read-to-the-metal-a187541333c2">Click Here</a>
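
To make the page arithmetic from the fundamentals concrete, here is a minimal Python sketch. It assumes pages are numbered from 0, that the file has no header in front of the first page, and that the page size is 8KB. The names <code>page_offset</code>, <code>read_page</code> and <code>make_demo_file</code> are made up for illustration and do not come from any database engine; real engines add their own layout details on top of this idea.

```python
import os
import tempfile

PAGE_SIZE = 8192  # assumption: 8KB pages, as in the Postgres example


def page_offset(page_index: int, page_size: int = PAGE_SIZE) -> int:
    """Byte offset of a zero-indexed page in a headerless file of fixed-size pages."""
    if page_index < 0:
        raise ValueError("page_index must be non-negative")
    return page_index * page_size


def read_page(path: str, page_index: int, page_size: int = PAGE_SIZE) -> bytes:
    """Seek to the page's offset and read exactly one page."""
    with open(path, "rb") as f:
        f.seek(page_offset(page_index, page_size))
        data = f.read(page_size)
    if len(data) != page_size:
        raise EOFError(f"page {page_index} is missing or truncated")
    return data


def make_demo_file(num_pages: int = 10, page_size: int = PAGE_SIZE) -> str:
    """Write a toy 'table file' where page i is filled with the byte value i."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        for i in range(num_pages):
            f.write(bytes([i]) * page_size)
    return path


if __name__ == "__main__":
    path = make_demo_file()
    try:
        page = read_page(path, 7)
        # Every byte of page 7 should carry the value 7 in this toy file.
        print(page_offset(7), len(page), page[0])
    finally:
        os.remove(path)
```

The toy file fills each page with its own index, so reading page 7 and checking its first byte is a quick way to confirm the seek landed on the right page boundary.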