Message ID | 20240731043835.1828697-1-yangerkun@huawei.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | libfs: fix infinite directory reads for offset dir | expand |
On Wed 31-07-24 12:38:35, yangerkun wrote: > After we switch tmpfs dir operations from simple_dir_operations to > simple_offset_dir_operations, every rename happened will fill new dentry > to dest dir's maple tree(&SHMEM_I(inode)->dir_offsets->mt) with a free > key starting with octx->newx_offset, and then set newx_offset equals to > free key + 1. This will lead to infinite readdir combine with rename > happened at the same time, which fail generic/736 in xfstests(detail show > as below). > > 1. create 5000 files(1 2 3...) under one dir > 2. call readdir(man 3 readdir) once, and get one entry > 3. rename(entry, "TEMPFILE"), then rename("TEMPFILE", entry) > 4. loop 2~3, until readdir return nothing or we loop too many > times(tmpfs break test with the second condition) > > We choose the same logic what commit 9b378f6ad48cf ("btrfs: fix infinite > directory reads") to fix it, record the last_index when we open dir, and > do not emit the entry which index >= last_index. The file->private_data > now used in offset dir can use directly to do this, and we also update > the last_index when we llseek the dir file. The patch looks good! Just I'm not sure about the llseek part. As far as I understand it was added due to this sentence in the standard: "If a file is removed from or added to the directory after the most recent call to opendir() or rewinddir(), whether a subsequent call to readdir() returns an entry for that file is unspecified." So if the offset used in offset_dir_llseek() is 0, then we should update last_index. But otherwise I'd leave it alone because IMHO it would do more harm than good. Honza > > Fixes: a2e459555c5f ("shmem: stable directory offsets") > Signed-off-by: yangerkun <yangerkun@huawei.com> > --- > fs/libfs.c | 34 +++++++++++++++++++++++----------- > 1 file changed, 23 insertions(+), 11 deletions(-) > > diff --git a/fs/libfs.c b/fs/libfs.c > index 8aa34870449f..38b306738c00 100644 > --- a/fs/libfs.c > +++ b/fs/libfs.c > @@ -450,6 +450,14 @@ void simple_offset_destroy(struct offset_ctx *octx) > mtree_destroy(&octx->mt); > } > > +static int offset_dir_open(struct inode *inode, struct file *file) > +{ > + struct offset_ctx *ctx = inode->i_op->get_offset_ctx(inode); > + > + file->private_data = (void *)ctx->next_offset; > + return 0; > +} > + > /** > * offset_dir_llseek - Advance the read position of a directory descriptor > * @file: an open directory whose position is to be updated > @@ -463,6 +471,9 @@ void simple_offset_destroy(struct offset_ctx *octx) > */ > static loff_t offset_dir_llseek(struct file *file, loff_t offset, int whence) > { > + struct inode *inode = file->f_inode; > + struct offset_ctx *ctx = inode->i_op->get_offset_ctx(inode); > + > switch (whence) { > case SEEK_CUR: > offset += file->f_pos; > @@ -476,7 +487,7 @@ static loff_t offset_dir_llseek(struct file *file, loff_t offset, int whence) > } > > /* In this case, ->private_data is protected by f_pos_lock */ > - file->private_data = NULL; > + file->private_data = (void *)ctx->next_offset; > return vfs_setpos(file, offset, LONG_MAX); > } > > @@ -507,7 +518,7 @@ static bool offset_dir_emit(struct dir_context *ctx, struct dentry *dentry) > inode->i_ino, fs_umode_to_dtype(inode->i_mode)); > } > > -static void *offset_iterate_dir(struct inode *inode, struct dir_context *ctx) > +static void offset_iterate_dir(struct inode *inode, struct dir_context *ctx, long last_index) > { > struct offset_ctx *octx = inode->i_op->get_offset_ctx(inode); > struct dentry *dentry; > @@ -515,17 +526,21 @@ static void *offset_iterate_dir(struct inode *inode, struct dir_context *ctx) > while (true) { > dentry = offset_find_next(octx, ctx->pos); > if (!dentry) > - return ERR_PTR(-ENOENT); > + return; > + > + if (dentry2offset(dentry) >= last_index) { > + dput(dentry); > + return; > + } > > if (!offset_dir_emit(ctx, dentry)) { > dput(dentry); > - break; > + return; > } > > ctx->pos = dentry2offset(dentry) + 1; > dput(dentry); > } > - return NULL; > } > > /** > @@ -552,22 +567,19 @@ static void *offset_iterate_dir(struct inode *inode, struct dir_context *ctx) > static int offset_readdir(struct file *file, struct dir_context *ctx) > { > struct dentry *dir = file->f_path.dentry; > + long last_index = (long)file->private_data; > > lockdep_assert_held(&d_inode(dir)->i_rwsem); > > if (!dir_emit_dots(file, ctx)) > return 0; > > - /* In this case, ->private_data is protected by f_pos_lock */ > - if (ctx->pos == DIR_OFFSET_MIN) > - file->private_data = NULL; > - else if (file->private_data == ERR_PTR(-ENOENT)) > - return 0; > - file->private_data = offset_iterate_dir(d_inode(dir), ctx); > + offset_iterate_dir(d_inode(dir), ctx, last_index); > return 0; > } > > const struct file_operations simple_offset_dir_operations = { > + .open = offset_dir_open, > .llseek = offset_dir_llseek, > .iterate_shared = offset_readdir, > .read = generic_read_dir, > -- > 2.39.2 >
Hi! 在 2024/7/31 19:51, Jan Kara 写道: > On Wed 31-07-24 12:38:35, yangerkun wrote: >> After we switch tmpfs dir operations from simple_dir_operations to >> simple_offset_dir_operations, every rename happened will fill new dentry >> to dest dir's maple tree(&SHMEM_I(inode)->dir_offsets->mt) with a free >> key starting with octx->newx_offset, and then set newx_offset equals to >> free key + 1. This will lead to infinite readdir combine with rename >> happened at the same time, which fail generic/736 in xfstests(detail show >> as below). >> >> 1. create 5000 files(1 2 3...) under one dir >> 2. call readdir(man 3 readdir) once, and get one entry >> 3. rename(entry, "TEMPFILE"), then rename("TEMPFILE", entry) >> 4. loop 2~3, until readdir return nothing or we loop too many >> times(tmpfs break test with the second condition) >> >> We choose the same logic what commit 9b378f6ad48cf ("btrfs: fix infinite >> directory reads") to fix it, record the last_index when we open dir, and >> do not emit the entry which index >= last_index. The file->private_data >> now used in offset dir can use directly to do this, and we also update >> the last_index when we llseek the dir file. > > The patch looks good! Just I'm not sure about the llseek part. As far as I > understand it was added due to this sentence in the standard: > > "If a file is removed from or added to the directory after the most recent > call to opendir() or rewinddir(), whether a subsequent call to readdir() > returns an entry for that file is unspecified." > > So if the offset used in offset_dir_llseek() is 0, then we should update > last_index. But otherwise I'd leave it alone because IMHO it would do more > harm than good. IIUC, what you means is that we should only reset the private_data to new last_index when we call rewinddir(which will call lseek to set offset of dir file to 0)? Yeah, I prefer the logic you describle! Besides, we may also change btrfs that do the same(e60aa5da14d0 ("btrfs: refresh dir last index during a rewinddir(3) call")). Filipe, how do you think? Thanks, Erkun. > Honza > >> >> Fixes: a2e459555c5f ("shmem: stable directory offsets") >> Signed-off-by: yangerkun <yangerkun@huawei.com> >> --- >> fs/libfs.c | 34 +++++++++++++++++++++++----------- >> 1 file changed, 23 insertions(+), 11 deletions(-) >> >> diff --git a/fs/libfs.c b/fs/libfs.c >> index 8aa34870449f..38b306738c00 100644 >> --- a/fs/libfs.c >> +++ b/fs/libfs.c >> @@ -450,6 +450,14 @@ void simple_offset_destroy(struct offset_ctx *octx) >> mtree_destroy(&octx->mt); >> } >> >> +static int offset_dir_open(struct inode *inode, struct file *file) >> +{ >> + struct offset_ctx *ctx = inode->i_op->get_offset_ctx(inode); >> + >> + file->private_data = (void *)ctx->next_offset; >> + return 0; >> +} >> + >> /** >> * offset_dir_llseek - Advance the read position of a directory descriptor >> * @file: an open directory whose position is to be updated >> @@ -463,6 +471,9 @@ void simple_offset_destroy(struct offset_ctx *octx) >> */ >> static loff_t offset_dir_llseek(struct file *file, loff_t offset, int whence) >> { >> + struct inode *inode = file->f_inode; >> + struct offset_ctx *ctx = inode->i_op->get_offset_ctx(inode); >> + >> switch (whence) { >> case SEEK_CUR: >> offset += file->f_pos; >> @@ -476,7 +487,7 @@ static loff_t offset_dir_llseek(struct file *file, loff_t offset, int whence) >> } >> >> /* In this case, ->private_data is protected by f_pos_lock */ >> - file->private_data = NULL; >> + file->private_data = (void *)ctx->next_offset; >> return vfs_setpos(file, offset, LONG_MAX); >> } >> >> @@ -507,7 +518,7 @@ static bool offset_dir_emit(struct dir_context *ctx, struct dentry *dentry) >> inode->i_ino, fs_umode_to_dtype(inode->i_mode)); >> } >> >> -static void *offset_iterate_dir(struct inode *inode, struct dir_context *ctx) >> +static void offset_iterate_dir(struct inode *inode, struct dir_context *ctx, long last_index) >> { >> struct offset_ctx *octx = inode->i_op->get_offset_ctx(inode); >> struct dentry *dentry; >> @@ -515,17 +526,21 @@ static void *offset_iterate_dir(struct inode *inode, struct dir_context *ctx) >> while (true) { >> dentry = offset_find_next(octx, ctx->pos); >> if (!dentry) >> - return ERR_PTR(-ENOENT); >> + return; >> + >> + if (dentry2offset(dentry) >= last_index) { >> + dput(dentry); >> + return; >> + } >> >> if (!offset_dir_emit(ctx, dentry)) { >> dput(dentry); >> - break; >> + return; >> } >> >> ctx->pos = dentry2offset(dentry) + 1; >> dput(dentry); >> } >> - return NULL; >> } >> >> /** >> @@ -552,22 +567,19 @@ static void *offset_iterate_dir(struct inode *inode, struct dir_context *ctx) >> static int offset_readdir(struct file *file, struct dir_context *ctx) >> { >> struct dentry *dir = file->f_path.dentry; >> + long last_index = (long)file->private_data; >> >> lockdep_assert_held(&d_inode(dir)->i_rwsem); >> >> if (!dir_emit_dots(file, ctx)) >> return 0; >> >> - /* In this case, ->private_data is protected by f_pos_lock */ >> - if (ctx->pos == DIR_OFFSET_MIN) >> - file->private_data = NULL; >> - else if (file->private_data == ERR_PTR(-ENOENT)) >> - return 0; >> - file->private_data = offset_iterate_dir(d_inode(dir), ctx); >> + offset_iterate_dir(d_inode(dir), ctx, last_index); >> return 0; >> } >> >> const struct file_operations simple_offset_dir_operations = { >> + .open = offset_dir_open, >> .llseek = offset_dir_llseek, >> .iterate_shared = offset_readdir, >> .read = generic_read_dir, >> -- >> 2.39.2 >>
On Wed 31-07-24 20:51:05, yangerkun wrote: > 在 2024/7/31 19:51, Jan Kara 写道: > > On Wed 31-07-24 12:38:35, yangerkun wrote: > > > After we switch tmpfs dir operations from simple_dir_operations to > > > simple_offset_dir_operations, every rename happened will fill new dentry > > > to dest dir's maple tree(&SHMEM_I(inode)->dir_offsets->mt) with a free > > > key starting with octx->newx_offset, and then set newx_offset equals to > > > free key + 1. This will lead to infinite readdir combine with rename > > > happened at the same time, which fail generic/736 in xfstests(detail show > > > as below). > > > > > > 1. create 5000 files(1 2 3...) under one dir > > > 2. call readdir(man 3 readdir) once, and get one entry > > > 3. rename(entry, "TEMPFILE"), then rename("TEMPFILE", entry) > > > 4. loop 2~3, until readdir return nothing or we loop too many > > > times(tmpfs break test with the second condition) > > > > > > We choose the same logic what commit 9b378f6ad48cf ("btrfs: fix infinite > > > directory reads") to fix it, record the last_index when we open dir, and > > > do not emit the entry which index >= last_index. The file->private_data > > > now used in offset dir can use directly to do this, and we also update > > > the last_index when we llseek the dir file. > > > > The patch looks good! Just I'm not sure about the llseek part. As far as I > > understand it was added due to this sentence in the standard: > > > > "If a file is removed from or added to the directory after the most recent > > call to opendir() or rewinddir(), whether a subsequent call to readdir() > > returns an entry for that file is unspecified." > > > > So if the offset used in offset_dir_llseek() is 0, then we should update > > last_index. But otherwise I'd leave it alone because IMHO it would do more > > harm than good. > > IIUC, what you means is that we should only reset the private_data to > new last_index when we call rewinddir(which will call lseek to set > offset of dir file to 0)? Yes, exactly. Sorry for being a bit vague. Honza
On Wed, Jul 31, 2024 at 1:51 PM yangerkun <yangerkun@huaweicloud.com> wrote: > > Hi! > > 在 2024/7/31 19:51, Jan Kara 写道: > > On Wed 31-07-24 12:38:35, yangerkun wrote: > >> After we switch tmpfs dir operations from simple_dir_operations to > >> simple_offset_dir_operations, every rename happened will fill new dentry > >> to dest dir's maple tree(&SHMEM_I(inode)->dir_offsets->mt) with a free > >> key starting with octx->newx_offset, and then set newx_offset equals to > >> free key + 1. This will lead to infinite readdir combine with rename > >> happened at the same time, which fail generic/736 in xfstests(detail show > >> as below). > >> > >> 1. create 5000 files(1 2 3...) under one dir > >> 2. call readdir(man 3 readdir) once, and get one entry > >> 3. rename(entry, "TEMPFILE"), then rename("TEMPFILE", entry) > >> 4. loop 2~3, until readdir return nothing or we loop too many > >> times(tmpfs break test with the second condition) > >> > >> We choose the same logic what commit 9b378f6ad48cf ("btrfs: fix infinite > >> directory reads") to fix it, record the last_index when we open dir, and > >> do not emit the entry which index >= last_index. The file->private_data > >> now used in offset dir can use directly to do this, and we also update > >> the last_index when we llseek the dir file. > > > > The patch looks good! Just I'm not sure about the llseek part. As far as I > > understand it was added due to this sentence in the standard: > > > > "If a file is removed from or added to the directory after the most recent > > call to opendir() or rewinddir(), whether a subsequent call to readdir() > > returns an entry for that file is unspecified." > > > > So if the offset used in offset_dir_llseek() is 0, then we should update > > last_index. But otherwise I'd leave it alone because IMHO it would do more > > harm than good. > > IIUC, what you means is that we should only reset the private_data to > new last_index when we call rewinddir(which will call lseek to set > offset of dir file to 0)? > > Yeah, I prefer the logic you describle! Besides, we may also change > btrfs that do the same(e60aa5da14d0 ("btrfs: refresh dir last index > during a rewinddir(3) call")). Filipe, how do you think? What problem does it solve? The standard doesn't forbid it, and I can't see anything wrong with it. > > Thanks, > Erkun. > > > Honza > > > >> > >> Fixes: a2e459555c5f ("shmem: stable directory offsets") > >> Signed-off-by: yangerkun <yangerkun@huawei.com> > >> --- > >> fs/libfs.c | 34 +++++++++++++++++++++++----------- > >> 1 file changed, 23 insertions(+), 11 deletions(-) > >> > >> diff --git a/fs/libfs.c b/fs/libfs.c > >> index 8aa34870449f..38b306738c00 100644 > >> --- a/fs/libfs.c > >> +++ b/fs/libfs.c > >> @@ -450,6 +450,14 @@ void simple_offset_destroy(struct offset_ctx *octx) > >> mtree_destroy(&octx->mt); > >> } > >> > >> +static int offset_dir_open(struct inode *inode, struct file *file) > >> +{ > >> + struct offset_ctx *ctx = inode->i_op->get_offset_ctx(inode); > >> + > >> + file->private_data = (void *)ctx->next_offset; > >> + return 0; > >> +} > >> + > >> /** > >> * offset_dir_llseek - Advance the read position of a directory descriptor > >> * @file: an open directory whose position is to be updated > >> @@ -463,6 +471,9 @@ void simple_offset_destroy(struct offset_ctx *octx) > >> */ > >> static loff_t offset_dir_llseek(struct file *file, loff_t offset, int whence) > >> { > >> + struct inode *inode = file->f_inode; > >> + struct offset_ctx *ctx = inode->i_op->get_offset_ctx(inode); > >> + > >> switch (whence) { > >> case SEEK_CUR: > >> offset += file->f_pos; > >> @@ -476,7 +487,7 @@ static loff_t offset_dir_llseek(struct file *file, loff_t offset, int whence) > >> } > >> > >> /* In this case, ->private_data is protected by f_pos_lock */ > >> - file->private_data = NULL; > >> + file->private_data = (void *)ctx->next_offset; > >> return vfs_setpos(file, offset, LONG_MAX); > >> } > >> > >> @@ -507,7 +518,7 @@ static bool offset_dir_emit(struct dir_context *ctx, struct dentry *dentry) > >> inode->i_ino, fs_umode_to_dtype(inode->i_mode)); > >> } > >> > >> -static void *offset_iterate_dir(struct inode *inode, struct dir_context *ctx) > >> +static void offset_iterate_dir(struct inode *inode, struct dir_context *ctx, long last_index) > >> { > >> struct offset_ctx *octx = inode->i_op->get_offset_ctx(inode); > >> struct dentry *dentry; > >> @@ -515,17 +526,21 @@ static void *offset_iterate_dir(struct inode *inode, struct dir_context *ctx) > >> while (true) { > >> dentry = offset_find_next(octx, ctx->pos); > >> if (!dentry) > >> - return ERR_PTR(-ENOENT); > >> + return; > >> + > >> + if (dentry2offset(dentry) >= last_index) { > >> + dput(dentry); > >> + return; > >> + } > >> > >> if (!offset_dir_emit(ctx, dentry)) { > >> dput(dentry); > >> - break; > >> + return; > >> } > >> > >> ctx->pos = dentry2offset(dentry) + 1; > >> dput(dentry); > >> } > >> - return NULL; > >> } > >> > >> /** > >> @@ -552,22 +567,19 @@ static void *offset_iterate_dir(struct inode *inode, struct dir_context *ctx) > >> static int offset_readdir(struct file *file, struct dir_context *ctx) > >> { > >> struct dentry *dir = file->f_path.dentry; > >> + long last_index = (long)file->private_data; > >> > >> lockdep_assert_held(&d_inode(dir)->i_rwsem); > >> > >> if (!dir_emit_dots(file, ctx)) > >> return 0; > >> > >> - /* In this case, ->private_data is protected by f_pos_lock */ > >> - if (ctx->pos == DIR_OFFSET_MIN) > >> - file->private_data = NULL; > >> - else if (file->private_data == ERR_PTR(-ENOENT)) > >> - return 0; > >> - file->private_data = offset_iterate_dir(d_inode(dir), ctx); > >> + offset_iterate_dir(d_inode(dir), ctx, last_index); > >> return 0; > >> } > >> > >> const struct file_operations simple_offset_dir_operations = { > >> + .open = offset_dir_open, > >> .llseek = offset_dir_llseek, > >> .iterate_shared = offset_readdir, > >> .read = generic_read_dir, > >> -- > >> 2.39.2 > >> >
On Wed, Jul 31, 2024 at 12:38:35PM +0800, yangerkun wrote: > After we switch tmpfs dir operations from simple_dir_operations to > simple_offset_dir_operations, every rename happened will fill new dentry > to dest dir's maple tree(&SHMEM_I(inode)->dir_offsets->mt) with a free > key starting with octx->newx_offset, and then set newx_offset equals to > free key + 1. This will lead to infinite readdir combine with rename > happened at the same time, which fail generic/736 in xfstests(detail show > as below). > > 1. create 5000 files(1 2 3...) under one dir > 2. call readdir(man 3 readdir) once, and get one entry > 3. rename(entry, "TEMPFILE"), then rename("TEMPFILE", entry) > 4. loop 2~3, until readdir return nothing or we loop too many > times(tmpfs break test with the second condition) > > We choose the same logic what commit 9b378f6ad48cf ("btrfs: fix infinite > directory reads") to fix it, record the last_index when we open dir, and > do not emit the entry which index >= last_index. The file->private_data > now used in offset dir can use directly to do this, and we also update > the last_index when we llseek the dir file. > > Fixes: a2e459555c5f ("shmem: stable directory offsets") > Signed-off-by: yangerkun <yangerkun@huawei.com> I agree with Jan's down-thread comments about llseek, but other than that: Reviewed-by: Chuck Lever <chuck.lever@oracle.com> > --- > fs/libfs.c | 34 +++++++++++++++++++++++----------- > 1 file changed, 23 insertions(+), 11 deletions(-) > > diff --git a/fs/libfs.c b/fs/libfs.c > index 8aa34870449f..38b306738c00 100644 > --- a/fs/libfs.c > +++ b/fs/libfs.c > @@ -450,6 +450,14 @@ void simple_offset_destroy(struct offset_ctx *octx) > mtree_destroy(&octx->mt); > } > > +static int offset_dir_open(struct inode *inode, struct file *file) > +{ > + struct offset_ctx *ctx = inode->i_op->get_offset_ctx(inode); > + > + file->private_data = (void *)ctx->next_offset; > + return 0; > +} > + > /** > * offset_dir_llseek - Advance the read position of a directory descriptor > * @file: an open directory whose position is to be updated > @@ -463,6 +471,9 @@ void simple_offset_destroy(struct offset_ctx *octx) > */ > static loff_t offset_dir_llseek(struct file *file, loff_t offset, int whence) > { > + struct inode *inode = file->f_inode; > + struct offset_ctx *ctx = inode->i_op->get_offset_ctx(inode); > + > switch (whence) { > case SEEK_CUR: > offset += file->f_pos; > @@ -476,7 +487,7 @@ static loff_t offset_dir_llseek(struct file *file, loff_t offset, int whence) > } > > /* In this case, ->private_data is protected by f_pos_lock */ > - file->private_data = NULL; > + file->private_data = (void *)ctx->next_offset; > return vfs_setpos(file, offset, LONG_MAX); > } > > @@ -507,7 +518,7 @@ static bool offset_dir_emit(struct dir_context *ctx, struct dentry *dentry) > inode->i_ino, fs_umode_to_dtype(inode->i_mode)); > } > > -static void *offset_iterate_dir(struct inode *inode, struct dir_context *ctx) > +static void offset_iterate_dir(struct inode *inode, struct dir_context *ctx, long last_index) > { > struct offset_ctx *octx = inode->i_op->get_offset_ctx(inode); > struct dentry *dentry; > @@ -515,17 +526,21 @@ static void *offset_iterate_dir(struct inode *inode, struct dir_context *ctx) > while (true) { > dentry = offset_find_next(octx, ctx->pos); > if (!dentry) > - return ERR_PTR(-ENOENT); > + return; > + > + if (dentry2offset(dentry) >= last_index) { > + dput(dentry); > + return; > + } > > if (!offset_dir_emit(ctx, dentry)) { > dput(dentry); > - break; > + return; > } > > ctx->pos = dentry2offset(dentry) + 1; > dput(dentry); > } > - return NULL; > } > > /** > @@ -552,22 +567,19 @@ static void *offset_iterate_dir(struct inode *inode, struct dir_context *ctx) > static int offset_readdir(struct file *file, struct dir_context *ctx) > { > struct dentry *dir = file->f_path.dentry; > + long last_index = (long)file->private_data; > > lockdep_assert_held(&d_inode(dir)->i_rwsem); > > if (!dir_emit_dots(file, ctx)) > return 0; > > - /* In this case, ->private_data is protected by f_pos_lock */ > - if (ctx->pos == DIR_OFFSET_MIN) > - file->private_data = NULL; > - else if (file->private_data == ERR_PTR(-ENOENT)) > - return 0; > - file->private_data = offset_iterate_dir(d_inode(dir), ctx); > + offset_iterate_dir(d_inode(dir), ctx, last_index); > return 0; > } > > const struct file_operations simple_offset_dir_operations = { > + .open = offset_dir_open, > .llseek = offset_dir_llseek, > .iterate_shared = offset_readdir, > .read = generic_read_dir, > -- > 2.39.2 > >
On Wed, 31 Jul 2024 12:38:35 +0800, yangerkun wrote: > After we switch tmpfs dir operations from simple_dir_operations to > simple_offset_dir_operations, every rename happened will fill new dentry > to dest dir's maple tree(&SHMEM_I(inode)->dir_offsets->mt) with a free > key starting with octx->newx_offset, and then set newx_offset equals to > free key + 1. This will lead to infinite readdir combine with rename > happened at the same time, which fail generic/736 in xfstests(detail show > as below). > > [...] @Chuck, @Jan I did the requested change directly. Please check! --- Applied to the vfs.fixes branch of the vfs/vfs.git tree. Patches in the vfs.fixes branch should appear in linux-next soon. Please report any outstanding bugs that were missed during review in a new review to the original patch series allowing us to drop it. It's encouraged to provide Acked-bys and Reviewed-bys even though the patch has now been applied. If possible patch trailers will be updated. Note that commit hashes shown below are subject to change due to rebase, trailer updates or similar. If in doubt, please check the listed branch. tree: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git branch: vfs.fixes [1/1] libfs: fix infinite directory reads for offset dir https://git.kernel.org/vfs/vfs/c/fad90bfe412e
On Wed 31-07-24 16:16:42, Christian Brauner wrote: > On Wed, 31 Jul 2024 12:38:35 +0800, yangerkun wrote: > > After we switch tmpfs dir operations from simple_dir_operations to > > simple_offset_dir_operations, every rename happened will fill new dentry > > to dest dir's maple tree(&SHMEM_I(inode)->dir_offsets->mt) with a free > > key starting with octx->newx_offset, and then set newx_offset equals to > > free key + 1. This will lead to infinite readdir combine with rename > > happened at the same time, which fail generic/736 in xfstests(detail show > > as below). > > > > [...] > > @Chuck, @Jan I did the requested change directly. Please check! Thanks! Looks good. Feel free to add: Reviewed-by: Jan Kara <jack@suse.cz> Honza > > --- > > Applied to the vfs.fixes branch of the vfs/vfs.git tree. > Patches in the vfs.fixes branch should appear in linux-next soon. > > Please report any outstanding bugs that were missed during review in a > new review to the original patch series allowing us to drop it. > > It's encouraged to provide Acked-bys and Reviewed-bys even though the > patch has now been applied. If possible patch trailers will be updated. > > Note that commit hashes shown below are subject to change due to rebase, > trailer updates or similar. If in doubt, please check the listed branch. > > tree: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git > branch: vfs.fixes > > [1/1] libfs: fix infinite directory reads for offset dir > https://git.kernel.org/vfs/vfs/c/fad90bfe412e >
Hi! 在 2024/7/31 21:10, Filipe Manana 写道: > On Wed, Jul 31, 2024 at 1:51 PM yangerkun <yangerkun@huaweicloud.com> wrote: >> >> Hi! >> >> 在 2024/7/31 19:51, Jan Kara 写道: >>> On Wed 31-07-24 12:38:35, yangerkun wrote: >>>> After we switch tmpfs dir operations from simple_dir_operations to >>>> simple_offset_dir_operations, every rename happened will fill new dentry >>>> to dest dir's maple tree(&SHMEM_I(inode)->dir_offsets->mt) with a free >>>> key starting with octx->newx_offset, and then set newx_offset equals to >>>> free key + 1. This will lead to infinite readdir combine with rename >>>> happened at the same time, which fail generic/736 in xfstests(detail show >>>> as below). >>>> >>>> 1. create 5000 files(1 2 3...) under one dir >>>> 2. call readdir(man 3 readdir) once, and get one entry >>>> 3. rename(entry, "TEMPFILE"), then rename("TEMPFILE", entry) >>>> 4. loop 2~3, until readdir return nothing or we loop too many >>>> times(tmpfs break test with the second condition) >>>> >>>> We choose the same logic what commit 9b378f6ad48cf ("btrfs: fix infinite >>>> directory reads") to fix it, record the last_index when we open dir, and >>>> do not emit the entry which index >= last_index. The file->private_data >>>> now used in offset dir can use directly to do this, and we also update >>>> the last_index when we llseek the dir file. >>> >>> The patch looks good! Just I'm not sure about the llseek part. As far as I >>> understand it was added due to this sentence in the standard: >>> >>> "If a file is removed from or added to the directory after the most recent >>> call to opendir() or rewinddir(), whether a subsequent call to readdir() >>> returns an entry for that file is unspecified." >>> >>> So if the offset used in offset_dir_llseek() is 0, then we should update >>> last_index. But otherwise I'd leave it alone because IMHO it would do more >>> harm than good. >> >> IIUC, what you means is that we should only reset the private_data to >> new last_index when we call rewinddir(which will call lseek to set >> offset of dir file to 0)? >> >> Yeah, I prefer the logic you describle! Besides, we may also change >> btrfs that do the same(e60aa5da14d0 ("btrfs: refresh dir last index >> during a rewinddir(3) call")). Filipe, how do you think? > > What problem does it solve? > The standard doesn't forbid it, and I can't see anything wrong with it. Yeah, I didn't find any obvious bugs for this behavior. It's a choose does this seems better, and I think it's also ok to keep btrfs not change now. Thanks for your reply! > >> >> Thanks, >> Erkun. >> >>> Honza >>> >>>> >>>> Fixes: a2e459555c5f ("shmem: stable directory offsets") >>>> Signed-off-by: yangerkun <yangerkun@huawei.com> >>>> --- >>>> fs/libfs.c | 34 +++++++++++++++++++++++----------- >>>> 1 file changed, 23 insertions(+), 11 deletions(-) >>>> >>>> diff --git a/fs/libfs.c b/fs/libfs.c >>>> index 8aa34870449f..38b306738c00 100644 >>>> --- a/fs/libfs.c >>>> +++ b/fs/libfs.c >>>> @@ -450,6 +450,14 @@ void simple_offset_destroy(struct offset_ctx *octx) >>>> mtree_destroy(&octx->mt); >>>> } >>>> >>>> +static int offset_dir_open(struct inode *inode, struct file *file) >>>> +{ >>>> + struct offset_ctx *ctx = inode->i_op->get_offset_ctx(inode); >>>> + >>>> + file->private_data = (void *)ctx->next_offset; >>>> + return 0; >>>> +} >>>> + >>>> /** >>>> * offset_dir_llseek - Advance the read position of a directory descriptor >>>> * @file: an open directory whose position is to be updated >>>> @@ -463,6 +471,9 @@ void simple_offset_destroy(struct offset_ctx *octx) >>>> */ >>>> static loff_t offset_dir_llseek(struct file *file, loff_t offset, int whence) >>>> { >>>> + struct inode *inode = file->f_inode; >>>> + struct offset_ctx *ctx = inode->i_op->get_offset_ctx(inode); >>>> + >>>> switch (whence) { >>>> case SEEK_CUR: >>>> offset += file->f_pos; >>>> @@ -476,7 +487,7 @@ static loff_t offset_dir_llseek(struct file *file, loff_t offset, int whence) >>>> } >>>> >>>> /* In this case, ->private_data is protected by f_pos_lock */ >>>> - file->private_data = NULL; >>>> + file->private_data = (void *)ctx->next_offset; >>>> return vfs_setpos(file, offset, LONG_MAX); >>>> } >>>> >>>> @@ -507,7 +518,7 @@ static bool offset_dir_emit(struct dir_context *ctx, struct dentry *dentry) >>>> inode->i_ino, fs_umode_to_dtype(inode->i_mode)); >>>> } >>>> >>>> -static void *offset_iterate_dir(struct inode *inode, struct dir_context *ctx) >>>> +static void offset_iterate_dir(struct inode *inode, struct dir_context *ctx, long last_index) >>>> { >>>> struct offset_ctx *octx = inode->i_op->get_offset_ctx(inode); >>>> struct dentry *dentry; >>>> @@ -515,17 +526,21 @@ static void *offset_iterate_dir(struct inode *inode, struct dir_context *ctx) >>>> while (true) { >>>> dentry = offset_find_next(octx, ctx->pos); >>>> if (!dentry) >>>> - return ERR_PTR(-ENOENT); >>>> + return; >>>> + >>>> + if (dentry2offset(dentry) >= last_index) { >>>> + dput(dentry); >>>> + return; >>>> + } >>>> >>>> if (!offset_dir_emit(ctx, dentry)) { >>>> dput(dentry); >>>> - break; >>>> + return; >>>> } >>>> >>>> ctx->pos = dentry2offset(dentry) + 1; >>>> dput(dentry); >>>> } >>>> - return NULL; >>>> } >>>> >>>> /** >>>> @@ -552,22 +567,19 @@ static void *offset_iterate_dir(struct inode *inode, struct dir_context *ctx) >>>> static int offset_readdir(struct file *file, struct dir_context *ctx) >>>> { >>>> struct dentry *dir = file->f_path.dentry; >>>> + long last_index = (long)file->private_data; >>>> >>>> lockdep_assert_held(&d_inode(dir)->i_rwsem); >>>> >>>> if (!dir_emit_dots(file, ctx)) >>>> return 0; >>>> >>>> - /* In this case, ->private_data is protected by f_pos_lock */ >>>> - if (ctx->pos == DIR_OFFSET_MIN) >>>> - file->private_data = NULL; >>>> - else if (file->private_data == ERR_PTR(-ENOENT)) >>>> - return 0; >>>> - file->private_data = offset_iterate_dir(d_inode(dir), ctx); >>>> + offset_iterate_dir(d_inode(dir), ctx, last_index); >>>> return 0; >>>> } >>>> >>>> const struct file_operations simple_offset_dir_operations = { >>>> + .open = offset_dir_open, >>>> .llseek = offset_dir_llseek, >>>> .iterate_shared = offset_readdir, >>>> .read = generic_read_dir, >>>> -- >>>> 2.39.2 >>>> >>
Hi! 在 2024/7/31 22:16, Christian Brauner 写道: > On Wed, 31 Jul 2024 12:38:35 +0800, yangerkun wrote: >> After we switch tmpfs dir operations from simple_dir_operations to >> simple_offset_dir_operations, every rename happened will fill new dentry >> to dest dir's maple tree(&SHMEM_I(inode)->dir_offsets->mt) with a free >> key starting with octx->newx_offset, and then set newx_offset equals to >> free key + 1. This will lead to infinite readdir combine with rename >> happened at the same time, which fail generic/736 in xfstests(detail show >> as below). >> >> [...] > > @Chuck, @Jan I did the requested change directly. Please check! Thanks for applied this patch, the suggestions from Jan and Chuck will be a separates patch! > > --- > > Applied to the vfs.fixes branch of the vfs/vfs.git tree. > Patches in the vfs.fixes branch should appear in linux-next soon. > > Please report any outstanding bugs that were missed during review in a > new review to the original patch series allowing us to drop it. > > It's encouraged to provide Acked-bys and Reviewed-bys even though the > patch has now been applied. If possible patch trailers will be updated. > > Note that commit hashes shown below are subject to change due to rebase, > trailer updates or similar. If in doubt, please check the listed branch. > > tree: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git > branch: vfs.fixes > > [1/1] libfs: fix infinite directory reads for offset dir > https://git.kernel.org/vfs/vfs/c/fad90bfe412e
On Thu 01-08-24 11:32:25, yangerkun wrote: > Hi! > > 在 2024/7/31 22:16, Christian Brauner 写道: > > On Wed, 31 Jul 2024 12:38:35 +0800, yangerkun wrote: > > > After we switch tmpfs dir operations from simple_dir_operations to > > > simple_offset_dir_operations, every rename happened will fill new dentry > > > to dest dir's maple tree(&SHMEM_I(inode)->dir_offsets->mt) with a free > > > key starting with octx->newx_offset, and then set newx_offset equals to > > > free key + 1. This will lead to infinite readdir combine with rename > > > happened at the same time, which fail generic/736 in xfstests(detail show > > > as below). > > > > > > [...] > > > > @Chuck, @Jan I did the requested change directly. Please check! > > Thanks for applied this patch, the suggestions from Jan and Chuck will > be a separates patch! Christian already updated the patch as I've suggested so no need for you to send anything. Honza
在 2024/8/1 21:30, Jan Kara 写道: > On Thu 01-08-24 11:32:25, yangerkun wrote: >> Hi! >> >> 在 2024/7/31 22:16, Christian Brauner 写道: >>> On Wed, 31 Jul 2024 12:38:35 +0800, yangerkun wrote: >>>> After we switch tmpfs dir operations from simple_dir_operations to >>>> simple_offset_dir_operations, every rename happened will fill new dentry >>>> to dest dir's maple tree(&SHMEM_I(inode)->dir_offsets->mt) with a free >>>> key starting with octx->newx_offset, and then set newx_offset equals to >>>> free key + 1. This will lead to infinite readdir combine with rename >>>> happened at the same time, which fail generic/736 in xfstests(detail show >>>> as below). >>>> >>>> [...] >>> >>> @Chuck, @Jan I did the requested change directly. Please check! >> >> Thanks for applied this patch, the suggestions from Jan and Chuck will >> be a separates patch! > > Christian already updated the patch as I've suggested so no need for you > to send anything. OK, thanks for that! > > Honza >
diff --git a/fs/libfs.c b/fs/libfs.c index 8aa34870449f..38b306738c00 100644 --- a/fs/libfs.c +++ b/fs/libfs.c @@ -450,6 +450,14 @@ void simple_offset_destroy(struct offset_ctx *octx) mtree_destroy(&octx->mt); } +static int offset_dir_open(struct inode *inode, struct file *file) +{ + struct offset_ctx *ctx = inode->i_op->get_offset_ctx(inode); + + file->private_data = (void *)ctx->next_offset; + return 0; +} + /** * offset_dir_llseek - Advance the read position of a directory descriptor * @file: an open directory whose position is to be updated @@ -463,6 +471,9 @@ void simple_offset_destroy(struct offset_ctx *octx) */ static loff_t offset_dir_llseek(struct file *file, loff_t offset, int whence) { + struct inode *inode = file->f_inode; + struct offset_ctx *ctx = inode->i_op->get_offset_ctx(inode); + switch (whence) { case SEEK_CUR: offset += file->f_pos; @@ -476,7 +487,7 @@ static loff_t offset_dir_llseek(struct file *file, loff_t offset, int whence) } /* In this case, ->private_data is protected by f_pos_lock */ - file->private_data = NULL; + file->private_data = (void *)ctx->next_offset; return vfs_setpos(file, offset, LONG_MAX); } @@ -507,7 +518,7 @@ static bool offset_dir_emit(struct dir_context *ctx, struct dentry *dentry) inode->i_ino, fs_umode_to_dtype(inode->i_mode)); } -static void *offset_iterate_dir(struct inode *inode, struct dir_context *ctx) +static void offset_iterate_dir(struct inode *inode, struct dir_context *ctx, long last_index) { struct offset_ctx *octx = inode->i_op->get_offset_ctx(inode); struct dentry *dentry; @@ -515,17 +526,21 @@ static void *offset_iterate_dir(struct inode *inode, struct dir_context *ctx) while (true) { dentry = offset_find_next(octx, ctx->pos); if (!dentry) - return ERR_PTR(-ENOENT); + return; + + if (dentry2offset(dentry) >= last_index) { + dput(dentry); + return; + } if (!offset_dir_emit(ctx, dentry)) { dput(dentry); - break; + return; } ctx->pos = dentry2offset(dentry) + 1; dput(dentry); } - return NULL; } /** @@ -552,22 +567,19 @@ static void *offset_iterate_dir(struct inode *inode, struct dir_context *ctx) static int offset_readdir(struct file *file, struct dir_context *ctx) { struct dentry *dir = file->f_path.dentry; + long last_index = (long)file->private_data; lockdep_assert_held(&d_inode(dir)->i_rwsem); if (!dir_emit_dots(file, ctx)) return 0; - /* In this case, ->private_data is protected by f_pos_lock */ - if (ctx->pos == DIR_OFFSET_MIN) - file->private_data = NULL; - else if (file->private_data == ERR_PTR(-ENOENT)) - return 0; - file->private_data = offset_iterate_dir(d_inode(dir), ctx); + offset_iterate_dir(d_inode(dir), ctx, last_index); return 0; } const struct file_operations simple_offset_dir_operations = { + .open = offset_dir_open, .llseek = offset_dir_llseek, .iterate_shared = offset_readdir, .read = generic_read_dir,
After we switch tmpfs dir operations from simple_dir_operations to simple_offset_dir_operations, every rename happened will fill new dentry to dest dir's maple tree(&SHMEM_I(inode)->dir_offsets->mt) with a free key starting with octx->newx_offset, and then set newx_offset equals to free key + 1. This will lead to infinite readdir combine with rename happened at the same time, which fail generic/736 in xfstests(detail show as below). 1. create 5000 files(1 2 3...) under one dir 2. call readdir(man 3 readdir) once, and get one entry 3. rename(entry, "TEMPFILE"), then rename("TEMPFILE", entry) 4. loop 2~3, until readdir return nothing or we loop too many times(tmpfs break test with the second condition) We choose the same logic what commit 9b378f6ad48cf ("btrfs: fix infinite directory reads") to fix it, record the last_index when we open dir, and do not emit the entry which index >= last_index. The file->private_data now used in offset dir can use directly to do this, and we also update the last_index when we llseek the dir file. Fixes: a2e459555c5f ("shmem: stable directory offsets") Signed-off-by: yangerkun <yangerkun@huawei.com> --- fs/libfs.c | 34 +++++++++++++++++++++++----------- 1 file changed, 23 insertions(+), 11 deletions(-)