[v4,03/34] ext4: trim delalloc extent

Message ID 20240410142948.2817554-4-yi.zhang@huaweicloud.com (mailing list archive)
State New
Series ext4: use iomap for regular file's buffered IO path and enable large folio

Commit Message

Zhang Yi April 10, 2024, 2:29 p.m. UTC
From: Zhang Yi <yi.zhang@huawei.com>

The cached delalloc or hole extent should be trimmed to map->m_len
if we map delalloc blocks in ext4_da_map_blocks(). This doesn't
trigger any issue now because map->m_len is always set to one and we
always insert one delayed block at a time. Fix this by trimming the extent
once we get one from the cached extent tree, preparing for mapping an
extent with multiple delalloc blocks.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
 fs/ext4/inode.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

Comments

Ritesh Harjani (IBM) May 1, 2024, 2:31 p.m. UTC | #1
Zhang Yi <yi.zhang@huaweicloud.com> writes:

> From: Zhang Yi <yi.zhang@huawei.com>
>
> The cached delalloc or hole extent should be trimmed to map->m_len
> if we map delalloc blocks in ext4_da_map_blocks().

Why do you say the cached delalloc extent should also be trimmed to
m_len? Because we are only inserting delalloc blocks of
min(hole_len, m_len), right?

If we find delalloc blocks, we don't need to insert anything in the ES
cache, so we just return 0 from this function in that case.


> This doesn't
> trigger any issue now because map->m_len is always set to one and we
> always insert one delayed block at a time. Fix this by trimming the extent
> once we get one from the cached extent tree, preparing for mapping an
> extent with multiple delalloc blocks.
>

Yes, it wasn't clear until I looked at the discussion in the other
thread. It would be helpful if you could use that example in the commit
msg here for clarity.


"""
Yeah, for now we only trim the map length if we find an unwritten or written
extent in the cache. This isn't okay if we find a hole once
ext4_insert_delayed_block() and ext4_da_map_blocks() support inserting
map->m_len blocks. If we find a hole whose es->es_len is shorter than the
length we want to write, we could delay more blocks than we expected.

Please assume we write data [A, C) to a file that contains a hole extent
[A, B) and a written extent [B, D) in cache.

                      A     B  C  D
before da write:   ...hhhhhh|wwwwww....

Then we will get extent [A, B). We should trim map->m_len to B-A before
inserting new delalloc blocks; if not, the range [B, C) is duplicated.

"""
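
To make the arithmetic in that example concrete, here is a minimal
userspace sketch of the clamp, not the kernel code. struct extent and
trim_len() are hypothetical stand-ins for the cached extent (es_lblk,
es_len) and for the trimming applied to map->m_len; the block numbers
used below are made up.

```c
#include <assert.h>

/* Hypothetical stand-in for the cached extent (es_lblk, es_len). */
struct extent {
	unsigned int lblk;	/* first logical block of the extent */
	unsigned int len;	/* number of blocks in the extent */
};

/*
 * Clamp a request of m_len blocks starting at iblock to the part of
 * the cached extent that lies at or after iblock.
 */
static unsigned int trim_len(const struct extent *es, unsigned int iblock,
			     unsigned int m_len)
{
	unsigned int remain;

	/* iblock must fall inside the extent that the lookup returned. */
	assert(iblock >= es->lblk && iblock - es->lblk < es->len);
	remain = es->len - (iblock - es->lblk);
	return remain < m_len ? remain : m_len;
}
```

Taking A = 10, B = 16 and C = 19, the hole is { .lblk = 10, .len = 6 }
and the write asks for 9 blocks at block 10. trim_len() gives 6, so only
[A, B) is delayed in this round and [B, C) is left for the next lookup,
which finds the written extent.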

Minor nit: the ext4_da_map_blocks() function comment has become stale.
It isn't clear about its return value, the locks it uses, etc. While we
are at it, we might as well fix the function description.

-ritesh
Zhang Yi May 6, 2024, 6:15 a.m. UTC | #2
On 2024/5/1 22:31, Ritesh Harjani (IBM) wrote:
> Zhang Yi <yi.zhang@huaweicloud.com> writes:
> 
>> From: Zhang Yi <yi.zhang@huawei.com>
>>
>> The cached delalloc or hole extent should be trimmed to map->m_len
>> if we map delalloc blocks in ext4_da_map_blocks().
> 
> Why do you say the cached delalloc extent should also be trimmed to
> m_len? Because we are only inserting delalloc blocks of
> min(hole_len, m_len), right?
> 
> If we find delalloc blocks, we don't need to insert anything in the ES
> cache, so we just return 0 from this function in that case.
> 

I'm sorry for the clerical error: it should not be trimmed to m_len, it
should be trimmed to es->es_len. If we find a delalloc entry that is shorter
than map->m_len, it means the front part of this write range has
already been delayed. We can't insert the delalloc extent that contains
the latter part in this round, so we need to trim map->m_len and return 0;
the caller will then advance the position and call ext4_da_map_blocks() again.
For example, please assume we write data [A, C) to a file that contains a
delayed extent [A, B) in the cache.

                      A     B  C
before da write:   ...dddddd|hhh....

Then we will get delayed extent [A, B). We should trim map->m_len to B-A
and return 0; if not, the caller will incorrectly assume that the write
is complete and won't insert [B, C) later.
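
The caller-side effect can be sketched with a toy userspace model. This
is only an illustration under simplifying assumptions, not the ext4
code: blk_state, run_len(), map_round() and da_write() are hypothetical
names, each block is tracked individually, and the "trim" flag switches
the clamp on or off so both behaviours can be compared.

```c
#include <assert.h>

enum blk_state { HOLE, DELAYED };

/* Length of the run of blocks starting at pos with the same state as pos. */
static unsigned int run_len(const enum blk_state *blk, unsigned int nblks,
			    unsigned int pos)
{
	unsigned int n = 1;

	assert(pos < nblks);
	while (pos + n < nblks && blk[pos + n] == blk[pos])
		n++;
	return n;
}

/*
 * Toy model of one mapping round: look up the run at pos, optionally
 * trim *m_len to it, and delay blocks only if the run is a hole. An
 * already delayed run needs no insertion. Always returns 0.
 */
static int map_round(enum blk_state *blk, unsigned int nblks,
		     unsigned int pos, unsigned int *m_len, int trim)
{
	unsigned int i, run = run_len(blk, nblks, pos);

	if (trim && *m_len > run)
		*m_len = run;
	if (blk[pos] == HOLE)
		for (i = 0; i < *m_len && pos + i < nblks; i++)
			blk[pos + i] = DELAYED;
	return 0;
}

/* Caller: advance by the mapped length until [start, end) is covered. */
static void da_write(enum blk_state *blk, unsigned int nblks,
		     unsigned int start, unsigned int end, int trim)
{
	unsigned int pos = start;

	while (pos < end) {
		unsigned int m_len = end - pos;

		map_round(blk, nblks, pos, &m_len, trim);
		pos += m_len;
	}
}
```

With blocks { d, d, d, h, h } (A = 0, B = 3, C = 5), da_write() with
trim enabled needs two rounds and ends with [B, C) delayed. With trim
disabled, the first round reports all 5 blocks as handled and [B, C)
stays a hole.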

> 
>> This doesn't
>> trigger any issue now because map->m_len is always set to one and we
>> always insert one delayed block at a time. Fix this by trimming the extent
>> once we get one from the cached extent tree, preparing for mapping an
>> extent with multiple delalloc blocks.
>>
> 
> Yes, it wasn't clear until I looked at the discussion in the other
> thread. It would be helpful if you could use that example in the commit
> msg here for clarity.
> 
> 
> """
> Yeah, for now we only trim the map length if we find an unwritten or written
> extent in the cache. This isn't okay if we find a hole once
> ext4_insert_delayed_block() and ext4_da_map_blocks() support inserting
> map->m_len blocks. If we find a hole whose es->es_len is shorter than the
> length we want to write, we could delay more blocks than we expected.
> 
> Please assume we write data [A, C) to a file that contains a hole extent
> [A, B) and a written extent [B, D) in cache.
> 
>                       A     B  C  D
> before da write:   ...hhhhhh|wwwwww....
> 
> Then we will get extent [A, B). We should trim map->m_len to B-A before
> inserting new delalloc blocks; if not, the range [B, C) is duplicated.
> 
> """
> 
> Minor nit: the ext4_da_map_blocks() function comment has become stale.
> It isn't clear about its return value, the locks it uses, etc. While we
> are at it, we might as well fix the function description.
> 

Thanks for the reminder, I will update it in patch 9 since it does
some cleanup and also changes the return value.

Thanks,
Yi.

Patch

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 118b0497a954..e4043ddb07a5 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1734,6 +1734,11 @@  static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
 
 	/* Lookup extent status tree firstly */
 	if (ext4_es_lookup_extent(inode, iblock, NULL, &es)) {
+		retval = es.es_len - (iblock - es.es_lblk);
+		if (retval > map->m_len)
+			retval = map->m_len;
+		map->m_len = retval;
+
 		if (ext4_es_is_hole(&es))
 			goto add_delayed;
 
@@ -1750,10 +1755,6 @@  static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
 		}
 
 		map->m_pblk = ext4_es_pblock(&es) + iblock - es.es_lblk;
-		retval = es.es_len - (iblock - es.es_lblk);
-		if (retval > map->m_len)
-			retval = map->m_len;
-		map->m_len = retval;
 		if (ext4_es_is_written(&es))
 			map->m_flags |= EXT4_MAP_MAPPED;
 		else if (ext4_es_is_unwritten(&es))
@@ -1788,6 +1789,11 @@  static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
 	 * whitout holding i_rwsem and folio lock.
 	 */
 	if (ext4_es_lookup_extent(inode, iblock, NULL, &es)) {
+		retval = es.es_len - (iblock - es.es_lblk);
+		if (retval > map->m_len)
+			retval = map->m_len;
+		map->m_len = retval;
+
 		if (!ext4_es_is_hole(&es)) {
 			up_write(&EXT4_I(inode)->i_data_sem);
 			goto found;