[RFC] Introduce generalized data temperature estimation framework

[PROBLEM STATEMENT]
An efficient data placement policy is a Holy Grail for data
storage and file system engineers. The goal is as important
as it is hard to achieve. Multiple data storage and file
system technologies have been invented to manage data
placement (for example, COW, ZNS, FDP, etc.). But these
technologies still require hints about the nature of the
data from the application side.

[DATA "TEMPERATURE" CONCEPT]
One widely used and intuitively clear way to describe the
nature of data is data "temperature" (cold, warm, hot
data). However, data "temperature" is as elusive a
definition as it is intuitive. Generally speaking,
thermodynamics defines temperature as a measure of the
average kinetic energy of vibrating atoms in a substance.
But there is no direct analogy between data "temperature"
and temperature in physics, because data has no kinetic
energy.

[WHAT IS GENERALIZED DATA "TEMPERATURE" ESTIMATION]
We usually assume that if some data is updated more
frequently than other data, then it is hotter.
But this raises several problems:
(1) How can we estimate data "hotness" in a
quantitative way? (2) We can state that data is "hot"
only after some number of updates, so this definition
describes the state of the data in the past.
Will this data continue to be "hot" in the future?
Generally speaking, the crucial problem is how to define
the nature of the data, or its "temperature", in the
future, because this knowledge is the fundamental basis
for elaborating an efficient data placement policy.
The generalized data "temperature" estimation framework
suggests a way to define the future state of the data
and a basis for quantitative measurement of data
"temperature".

[ARCHITECTURE OF FRAMEWORK]
Usually, a file system has a page cache for every inode.
Memory pages first become dirty in the page cache and are
eventually written back to the storage device. Technically
speaking, the number of dirty pages in a particular page
cache is a quantitative measurement of the current "hotness"
of a file. But the number of dirty pages alone is not a
stable basis for quantitative measurement of data
"temperature". We suggest using the total number of
logical blocks in a file as the unit for one degree of data
"temperature". As a result, if the whole file has been
updated several times, then the "temperature" of the file
has increased by several degrees. And if the file is under
continuous updates, then the file "temperature" keeps growing.
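
As an illustration of the degree unit described above, here is
a minimal user-space sketch. It is not code from this patch:
the struct, its field names, and temp_degrees() are
hypothetical and only model the arithmetic, assuming whole
degrees are obtained by integer division.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical user-space model; names are illustrative, not from the patch. */
struct file_temp_model {
	uint64_t file_blocks;    /* total logical blocks in the file: one degree */
	uint64_t updated_blocks; /* logical blocks rewritten so far */
};

/* Whole degrees: how many times the file's worth of blocks was rewritten. */
static uint64_t temp_degrees(const struct file_temp_model *m)
{
	assert(m->file_blocks > 0); /* an empty file has no defined degree */
	return m->updated_blocks / m->file_blocks;
}
```

Under these assumptions, a file of 100 logical blocks with 350
rewritten blocks sits at 3 degrees; the remaining 50 blocks
have not yet added up to another full degree.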

We need to keep not only the current number of dirty pages,
but also the number of pages updated in the recent past,
in order to accumulate the total "temperature" of a file.
Generally speaking, the total number of pages updated in the
recent past defines the aggregated "temperature" of the file,
and the number of dirty pages defines the delta of
"temperature" growth for the current update operation.
This approach defines the mechanism of "temperature" growth.
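
The two counters can be modeled in user space roughly as
follows. This is a sketch under stated assumptions, not the
patch's implementation: the struct and function names are
hypothetical, and summing both counters to get the current
estimate is an assumption made here for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical counters; names are illustrative, not from the patch. */
struct temp_counters {
	uint64_t dirty_pages;   /* delta: pages dirtied by the current update */
	uint64_t updated_pages; /* aggregate: pages already sent to writeback */
};

/* A page became dirty: the delta of the current update grows. */
static void temp_account_dirtied(struct temp_counters *c, uint64_t nr)
{
	c->dirty_pages += nr;
}

/* Pages go to writeback: move them from the delta to the aggregate. */
static void temp_account_writeback(struct temp_counters *c, uint64_t nr)
{
	assert(nr <= c->dirty_pages); /* cannot write back more than is dirty */
	c->dirty_pages -= nr;
	c->updated_pages += nr;
}

/* Assumption: the current estimate counts aggregate plus delta. */
static uint64_t temp_total_pages(const struct temp_counters *c)
{
	return c->updated_pages + c->dirty_pages;
}
```

Note that writeback alone leaves the total unchanged in this
model; only newly dirtied pages make the "temperature" grow.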

But if there are no more updates to the file, then its
"temperature" needs to decrease. The starting and ending
timestamps of an update operation can work as a basis for
decreasing the "temperature" of a file. If we know the number
of updated logical blocks of the file, then we can divide
the duration of the update operation by the number of
updated logical blocks. This gives the time duration per
logical block. Multiplying this value by the total number
of logical blocks in the file gives the time it takes for
the "temperature" to decrease by one degree. Finally,
dividing the idle time (between the end of the last update
operation and the beginning of the new update operation)
by the time per degree gives the number of degrees that
should be subtracted from the current "temperature"
of the file.
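
The cooling arithmetic above can be sketched in user space as
below. The function name, the nanosecond timestamps, and the
handling of degenerate inputs (returning zero) are assumptions
made for illustration; the patch itself may differ.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not from the patch: number of degrees to
 * subtract when a new update starts at next_start_ns after the
 * previous update ran from start_ns to end_ns.
 */
static uint64_t temp_cooling_degrees(uint64_t start_ns, uint64_t end_ns,
				     uint64_t updated_blocks,
				     uint64_t file_blocks,
				     uint64_t next_start_ns)
{
	uint64_t per_block_ns, per_degree_ns;

	assert(end_ns >= start_ns); /* an update cannot end before it starts */

	/* Nothing was updated or no idle time passed: no cooling. */
	if (updated_blocks == 0 || next_start_ns <= end_ns)
		return 0;

	/* Time duration per one logical block. */
	per_block_ns = (end_ns - start_ns) / updated_blocks;

	/* Time needed for the "temperature" to drop by one degree. */
	per_degree_ns = per_block_ns * file_blocks;
	if (per_degree_ns == 0)
		return 0; /* assumption: avoid dividing by zero */

	/* Idle time divided by the time per degree. */
	return (next_start_ns - end_ns) / per_degree_ns;
}
```

For example, an update of 10 blocks that took 1000 ns gives
100 ns per block; for a file of 50 blocks, one degree lasts
5000 ns, so an idle gap of 12500 ns removes 2 degrees.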

[HOW TO USE THE APPROACH]
The lifetime of the data "temperature" value for a file
can be explained in steps: (1) the iget() method sets up
the data "temperature" object; (2) folio_account_dirtied()
accounts the number of dirty memory pages and
tries to estimate the current temperature of the file;
(3) folio_clear_dirty_for_io() decreases the number of
dirty memory pages and increases the number of updated
pages; (4) folio_account_dirtied() also decreases the
file's "temperature" if no updates have happened for some
time; (5) the file system can get the file's temperature
and share the hint with the block layer; (6) the inode
eviction method removes and frees the data "temperature"
object.
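
For step (5), a file system needs some way to turn the
numeric "temperature" into a hint. The sketch below is purely
hypothetical: the enum, the function name, and the thresholds
are not part of this patch and are chosen only to show the
idea of mapping degrees onto the cold/warm/hot classes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical hint classes; not defined by the patch. */
enum temp_class {
	TEMP_COLD,
	TEMP_WARM,
	TEMP_HOT,
};

/* Arbitrary illustrative thresholds, in degrees. */
#define TEMP_WARM_DEGREES 1
#define TEMP_HOT_DEGREES  4

/* Map a temperature in degrees onto a coarse placement hint. */
static enum temp_class temp_to_class(uint64_t degrees)
{
	assert(TEMP_WARM_DEGREES < TEMP_HOT_DEGREES); /* thresholds must be ordered */

	if (degrees >= TEMP_HOT_DEGREES)
		return TEMP_HOT;
	if (degrees >= TEMP_WARM_DEGREES)
		return TEMP_WARM;
	return TEMP_COLD;
}
```

A real file system would pick its own thresholds and its own
hint representation for the block layer.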

Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
---
 fs/Kconfig                             |   2 +
 fs/Makefile                            |   1 +
 fs/data-temperature/Kconfig            |  11 +
 fs/data-temperature/Makefile           |   3 +
 fs/data-temperature/data_temperature.c | 347 +++++++++++++++++++++++++
 include/linux/data_temperature.h       | 124 +++++++++
 include/linux/fs.h                     |   4 +
 mm/page-writeback.c                    |   9 +
 8 files changed, 501 insertions(+)
 create mode 100644 fs/data-temperature/Kconfig
 create mode 100644 fs/data-temperature/Makefile
 create mode 100644 fs/data-temperature/data_temperature.c
 create mode 100644 include/linux/data_temperature.h

Message ID	20250123202455.11338-1-slava@dubeyko.com (mailing list archive)
State	New
Headers:
  Received: from mail-oi1-f176.google.com (mail-oi1-f176.google.com [209.85.167.176]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 566DD1F8F0B for <linux-fsdevel@vger.kernel.org>; Thu, 23 Jan 2025 20:25:41 +0000 (UTC)
  From: Viacheslav Dubeyko <slava@dubeyko.com>
  To: linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org
  Cc: linux-mm@kvack.org, javier.gonz@samsung.com, Slava.Dubeyko@ibm.com, Viacheslav Dubeyko <slava@dubeyko.com>
  Subject: [RFC PATCH] Introduce generalized data temperature estimation framework
  Date: Thu, 23 Jan 2025 12:24:55 -0800
  Message-ID: <20250123202455.11338-1-slava@dubeyko.com>
  Precedence: bulk
  MIME-Version: 1.0
  Content-Transfer-Encoding: 8bit