[CMU 15-445/645] Project#1 Buffer Pool Manager

[CMU 15-445/645] Project#1 Buffer Pool Manager

[CMU 15-445/645] Project#1 Buffer Pool Manager

Project地址:Buffer Pool Manager

概要

The first programming project is to implement a buffer pool in your storage manager. The buffer pool is responsible for moving physical pages back and forth from main memory to disk. It allows a DBMS to support databases that are larger than the amount of memory that is available to the system. The buffer pool's operations are transparent to other parts in the system. For example, the system asks the buffer pool for a page using its unique identifier (page_id_t) and it does not know whether that page is already in memory or whether the system has to go retrieve it from disk.

Your implementation will need to be thread-safe. Multiple threads will be accessing the internal data structures at the same and thus you need to make sure that their critical sections are protected with latches (these are called "locks" in operating systems).

该部分是实现一个缓冲池的管理。由于通常主存不足以容纳整个数据库系统,因此我们需要管理物理页与主存之间的关系,包括将物理页调入到主存中或者将主存中的页写回物理磁盘中。该部分在本书推荐教材的Chapter12以及在计算机组成原理,操作系统中都有涉及。

我们需要完成的就是实现这整个调入调出的过程,以及调入调出的策略。需要注意的是我们的实现需要是线程安全的,因此在这个过程中我们需要锁机制。

项目规范

For each of the following components, we are providing you with stub classes that contain the API that you need to implement. You should not modify the signatures for the pre-defined functions in these classes. If you modify the signatures, the test code that we use for grading will break and you will get no credit for the project. You also should not add additional classes in the source code for these components. These components should be entirely self-contained.

If a class already contains data members, you should not remove them. For example, the BufferPoolManager contains DiskManager and Replacer objects. These are required to implement the functionality that is needed by the rest of the system. On the other hand, you may need to add data members to these classes in order to correctly implement the required functionality. You can also add additional helper functions to these classes. The choice is yours.

You are allowed to use any built-in C++17 containers in your project unless specified otherwise. It is up to you to decide which ones you want to use. Note that these containers are not thread-safe and that you will need to include latches in your implementation to protect them. You may not bring in additional third-party dependencies (e.g. boost).

不能修改已有的预先定义的函数,也不能增加一些额外的类,也不能删除原有的数据成员。可以使用c++ 17内置的容器,但不能使用一些第三方依赖例如boost。

TASK #1 - LRU REPLACEMENT POLICY

概述

This component is responsible for tracking page usage in the buffer pool. You will implement a new sub-class called LRUReplacer in src/include/buffer/lru_replacer.h and its corresponding implementation file in src/buffer/lru_replacer.cpp. LRUReplacer extends the abstract Replacer class (src/include/buffer/replacer.h), which contains the function specifications.

The size of the LRUReplacer is the same as buffer pool since it contains placeholders for all of the frames in the BufferPoolManager. However, not all the frames are considered as in the LRUReplacer. The LRUReplacer is initialized to have no frame in it. Then, only the newly unpinned ones will be considered in the LRUReplacer.

这个是用来管理缓冲池中的页的使用,使用的是LRU的替换策略。LRU就是最近最久未使用的算法,其例子就有手机的后台管理,每次我们使用应用时,这个应用在后台管理中就会被置于最前面,而最近最久未使用的,就在后台管理的最后面,当我们想要淘汰一个应用时,就会淘汰最后一个。这个是LRU的基本思想,但是在这里稍微有点变动,这个后面实现Unpin(T)时会提到。LRU的算法实现也很简单,能够通过哈希表与双向链表来让添加和查询的时间复杂度为O(1).

Leetcode习题:146. LRU 缓存机制 可以学习到LRU的基本写法。

要求

LRUReplacer 要求我们实现以下几个成员函数:

  • Victim(T*) : 当 LRUReplacer 为空时返回 false ,否则从 LRUReplacer 中找出最近最近最久未使用的对象并将其内容存储在输出内容中
  • Pin(T) : 当 BufferPoolManager 将一个页固定在页框中后,调用该函数。该函数会将容纳这个页的页框从 LRUReplacer 中删除。
  • Unpin(T) : 当一个页面的 pin_count 变成0的时候会调用这个方法。 这个方法会将没有固定页的页框加入到 LRUReplacer 中。
  • Size() : 返回 LRUReplacer 中的页框数。

这个具体的实现细节由我们决定,可以使用STL容器,可以假设内存不会被耗尽,但必须保证线程安全。

实现

这个实现比较简单,利用C++ STL自带的list以及unordered_map可以实现LRU的维护。

  • Victim(T*) : 假设双向链表尾是需要被淘汰的,那么我们直接从双向链表尾取出队尾元素并赋值给输出参数即可。然后更新LRU列表。
  • Pin(T) : 通过unordered_map找到该元素并在list中删除。
  • Unpin(T) : 这里有比较坑的点,一般LRU访问某个元素后,会将该元素提至链表头,但这里不需要,并且当LRU列表满了之后,不需要任何操作,因此这两种情况我们可以直接return掉,剩下的直接添加到链表头,并在unordered_map中添加就行
  • Size() : 返回 LRUReplacer 中的页框数。
  • 线程安全:我在这里是直接使用了std::scope_lock以及std::mutex来保证线程安全。

TASK #2 - BUFFER POOL MANAGER

概述

Next, you need to implement the buffer pool manager in your system (BufferPoolManager). The BufferPoolManager is responsible for fetching database pages from the DiskManager and storing them in memory. The BufferPoolManager can also write dirty pages out to disk when it is either explicitly instructed to do so or when it needs to evict a page to make space for a new page.

To make sure that your implementation works correctly with the rest of the system, we will provide you with some of the functions already filled in. You will also not need to implement the code that actually reads and writes data to disk (this is called the DiskManager in our implementation). We will provide that functionality for you.

All in-memory pages in the system are represented by Page objects. The BufferPoolManager does not need to understand the contents of these pages. But it is important for you as the system developer to understand that Page objects are just containers for memory in the buffer pool and thus are not specific to a unique page. That is, each Page object contains a block of memory that the DiskManager will use as a location to copy the contents of a physical page that it reads from disk. The BufferPoolManager will reuse the same Page object to store data as it moves back and forth to disk. This means that the same Page object may contain a different physical page throughout the life of the system. The Page object's identifer (page_id) keeps track of what physical page it contains; if a Page object does not contain a physical page, then its page_id must be set to INVALID_PAGE_ID.

Each Page object also maintains a counter for the number of threads that have "pinned" that page. Your BufferPoolManager is not allowed to free a Page that is pinned. Each Page object also keeps track of whether it is dirty or not. It is your job to record whether a page was modified before it is unpinned. Your BufferPoolManager must write the contents of a dirty Page back to disk before that object can be reused.

这里就是描述了 BufferPoolManager 的一些基本情况,要实现这个类我们需要对 DiskManager 类以及 Page 类有一些了解,这些类都在项目代码中可以查看。其中 DiskManager 是对物理页以及磁盘的管理,可以分配新的物理页以及对磁盘的读写。Page 类则是在内存中的页,是物理页的拷贝, BufferPoolManager 正是对Page 的管理。

实现

需要实现以下几个函数:

  • FetchPageImpl(page_id)
  • NewPageImpl(page_id)
  • UnpinPageImpl(page_id, is_dirty)
  • FlushPageImpl(page_id)
  • DeletePageImpl(page_id)
  • FlushAllPagesImpl()

其详细步骤以及功能在代码中都有描述。同样要保证线程安全, BufferPoolManager 中有提供一个锁 latch_ ,可以使用std::scope_lock来进行多线程的管理。

这里遇到的一个坑点就是UnpinPageImpl(page_id, is_dirty)中的描述为

/**
   * Unpin the target page from the buffer pool.
   * @param page_id id of page to be unpinned
   * @param is_dirty true if the page should be marked as dirty, false otherwise
   * @return false if the page pin count is <= 0 before this call, true otherwise
   */

也就是只有在 pin_count<=0 的情况下才返回 false,否则一律返回 true

同样遵循课程要求这里不公开完整实现代码

测试

本地

  • LRUReplacer: test/buffer/lru_replacer_test.cpp

  • BufferPoolManager: test/buffer/buffer_pool_manager_test.cpp

首先需要将以上两个文件中测试样例的 DISABLED_ 前缀删掉。然后

# LRUReplacer
cd build
make lru_replacer_test
./test/lru_replacer_test
# BufferPoolManager
cd build
make buffer_pool_manager_test
./test/buffer_pool_manager_test

在线

需要包括的文件

  • src/include/buffer/lru_replacer.h
  • src/buffer/lru_replacer.cpp
  • src/include/buffer/buffer_pool_manager.h
  • src/buffer/buffer_pool_manager.cpp

打包脚本

zip -r p1_submission.zip \
src/include/buffer/lru_replacer.h \
src/buffer/lru_replacer.cpp \
src/include/buffer/buffer_pool_manager.h \
src/buffer/buffer_pool_manager.cpp

测试结果
image.png