Message ID | 0e445145d0faff95d0c42e5ebac222d838bb0293.1690495785.git.boris@bur.io (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | btrfs: simple quotas | expand |
On Thu, Jul 27, 2023 at 03:13:01PM -0700, Boris Burkov wrote: > Consider the following sequence: > - enable quotas > - create subvol S id 256 at dir outer/ > - create a qgroup 1/100 > - add 0/256 (S's auto qgroup) to 1/100 > - create subvol T id 257 at dir outer/inner/ > > With full qgroups, there is no relationship between 0/257 and either of > 0/256 or 1/100. There is an inherit feature that the creator of inner/ > can use to specify it ought to be in 1/100. > > Simple quotas are targeted at container isolation, where such automatic > inheritance for not necessarily trusted/controlled nested subvol > creation would be quite helpful. Therefore, add a new default behavior > for simple quotas: when you create a nested subvol, automatically > inherit as parents any parents of the qgroup of the subvol the new inode > is going in. > > In our example, 257/0 would also be under 1/100, allowing easy control > of a total quota over an arbitrary hierarchy of subvolumes. > > I think this _might_ be a generally useful behavior, so it could be > interesting to put it behind a new inheritance flag that simple quotas > always use while traditional quotas let the user specify, but this is a > minimally intrusive change to start. > > Signed-off-by: Boris Burkov <boris@bur.io> > --- > fs/btrfs/ioctl.c | 2 +- > fs/btrfs/qgroup.c | 44 +++++++++++++++++++++++++++++++++++++++--- > fs/btrfs/qgroup.h | 6 +++--- > fs/btrfs/transaction.c | 13 +++++++++---- > 4 files changed, 54 insertions(+), 11 deletions(-) > > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c > index 9b61bc62e439..c9b069077fd0 100644 > --- a/fs/btrfs/ioctl.c > +++ b/fs/btrfs/ioctl.c > @@ -652,7 +652,7 @@ static noinline int create_subvol(struct mnt_idmap *idmap, > /* Tree log can't currently deal with an inode which is a new root. */ > btrfs_set_log_full_commit(trans); > > - ret = btrfs_qgroup_inherit(trans, 0, objectid, inherit); > + ret = btrfs_qgroup_inherit(trans, 0, objectid, root->root_key.objectid, inherit); > if (ret) > goto out; > > diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c > index dedc532669f4..58e9ed0deedd 100644 > --- a/fs/btrfs/qgroup.c > +++ b/fs/btrfs/qgroup.c > @@ -1550,8 +1550,7 @@ static int quick_update_accounting(struct btrfs_fs_info *fs_info, > return ret; > } > > -int btrfs_add_qgroup_relation(struct btrfs_trans_handle *trans, u64 src, > - u64 dst) > +int btrfs_add_qgroup_relation(struct btrfs_trans_handle *trans, u64 src, u64 dst) > { > struct btrfs_fs_info *fs_info = trans->fs_info; > struct btrfs_qgroup *parent; > @@ -2991,6 +2990,40 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans) > return ret; > } > > +static int qgroup_auto_inherit(struct btrfs_fs_info *fs_info, > + u64 inode_rootid, > + struct btrfs_qgroup_inherit **inherit) > +{ > + int i = 0; > + u64 num_qgroups = 0; > + struct btrfs_qgroup *inode_qg; > + struct btrfs_qgroup_list *qg_list; > + > + if (*inherit) > + return -EEXIST; > + > + inode_qg = find_qgroup_rb(fs_info, inode_rootid); > + if (!inode_qg) > + return -ENOENT; > + > + num_qgroups = list_count_nodes(&inode_qg->groups); > + > + if (!num_qgroups) > + return 0; > + > + *inherit = kzalloc(sizeof(**inherit) + num_qgroups * sizeof(u64), GFP_NOFS); > + if (!*inherit) > + return -ENOMEM; > + (*inherit)->num_qgroups = num_qgroups; > + > + list_for_each_entry(qg_list, &inode_qg->groups, next_group) { > + u64 qg_id = qg_list->group->qgroupid; > + *((u64 *)((*inherit)+1) + i) = qg_id; > + } > + > + return 0; > +} > + > /* > * Copy the accounting information between qgroups. This is necessary > * when a snapshot or a subvolume is created. Throwing an error will > @@ -2998,7 +3031,8 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans) > * when a readonly fs is a reasonable outcome. > */ > int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid, > - u64 objectid, struct btrfs_qgroup_inherit *inherit) > + u64 objectid, u64 inode_rootid, > + struct btrfs_qgroup_inherit *inherit) > { > int ret = 0; > int i; > @@ -3040,6 +3074,9 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid, > goto out; > } > > + if (!inherit && btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_SIMPLE) > + qgroup_auto_inherit(fs_info, inode_rootid, &inherit); > + > if (inherit) { > i_qgroups = (u64 *)(inherit + 1); > nums = inherit->num_qgroups + 2 * inherit->num_ref_copies + > @@ -3066,6 +3103,7 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid, > if (ret) > goto out; > > + Extraneous whitespace change. Once fixed you can add Reviewed-by: Josef Bacik <josef@toxicpanda.com> Thanks, Josef
On Thu, Jul 27, 2023 at 03:13:01PM -0700, Boris Burkov wrote: > Consider the following sequence: > - enable quotas > - create subvol S id 256 at dir outer/ > - create a qgroup 1/100 > - add 0/256 (S's auto qgroup) to 1/100 > - create subvol T id 257 at dir outer/inner/ > > With full qgroups, there is no relationship between 0/257 and either of > 0/256 or 1/100. There is an inherit feature that the creator of inner/ > can use to specify it ought to be in 1/100. > > Simple quotas are targeted at container isolation, where such automatic > inheritance for not necessarily trusted/controlled nested subvol > creation would be quite helpful. Therefore, add a new default behavior > for simple quotas: when you create a nested subvol, automatically > inherit as parents any parents of the qgroup of the subvol the new inode > is going in. > > In our example, 257/0 would also be under 1/100, allowing easy control > of a total quota over an arbitrary hierarchy of subvolumes. > > I think this _might_ be a generally useful behavior, so it could be > interesting to put it behind a new inheritance flag that simple quotas > always use while traditional quotas let the user specify, but this is a > minimally intrusive change to start. > > Signed-off-by: Boris Burkov <boris@bur.io> > --- > fs/btrfs/ioctl.c | 2 +- > fs/btrfs/qgroup.c | 44 +++++++++++++++++++++++++++++++++++++++--- > fs/btrfs/qgroup.h | 6 +++--- > fs/btrfs/transaction.c | 13 +++++++++---- > 4 files changed, 54 insertions(+), 11 deletions(-) > > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c > index 9b61bc62e439..c9b069077fd0 100644 > --- a/fs/btrfs/ioctl.c > +++ b/fs/btrfs/ioctl.c > @@ -652,7 +652,7 @@ static noinline int create_subvol(struct mnt_idmap *idmap, > /* Tree log can't currently deal with an inode which is a new root. */ > btrfs_set_log_full_commit(trans); > > - ret = btrfs_qgroup_inherit(trans, 0, objectid, inherit); > + ret = btrfs_qgroup_inherit(trans, 0, objectid, root->root_key.objectid, inherit); > if (ret) > goto out; > > diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c > index dedc532669f4..58e9ed0deedd 100644 > --- a/fs/btrfs/qgroup.c > +++ b/fs/btrfs/qgroup.c > @@ -1550,8 +1550,7 @@ static int quick_update_accounting(struct btrfs_fs_info *fs_info, > return ret; > } > > -int btrfs_add_qgroup_relation(struct btrfs_trans_handle *trans, u64 src, > - u64 dst) > +int btrfs_add_qgroup_relation(struct btrfs_trans_handle *trans, u64 src, u64 dst) > { > struct btrfs_fs_info *fs_info = trans->fs_info; > struct btrfs_qgroup *parent; > @@ -2991,6 +2990,40 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans) > return ret; > } > > +static int qgroup_auto_inherit(struct btrfs_fs_info *fs_info, > + u64 inode_rootid, > + struct btrfs_qgroup_inherit **inherit) > +{ > + int i = 0; > + u64 num_qgroups = 0; > + struct btrfs_qgroup *inode_qg; > + struct btrfs_qgroup_list *qg_list; > + > + if (*inherit) > + return -EEXIST; > + > + inode_qg = find_qgroup_rb(fs_info, inode_rootid); > + if (!inode_qg) > + return -ENOENT; > + > + num_qgroups = list_count_nodes(&inode_qg->groups); > + > + if (!num_qgroups) > + return 0; > + > + *inherit = kzalloc(sizeof(**inherit) + num_qgroups * sizeof(u64), GFP_NOFS); There's kcalloc that verifies the potential multiplication overflow. > + if (!*inherit) > + return -ENOMEM; > + (*inherit)->num_qgroups = num_qgroups; > + > + list_for_each_entry(qg_list, &inode_qg->groups, next_group) { > + u64 qg_id = qg_list->group->qgroupid; > + *((u64 *)((*inherit)+1) + i) = qg_id; What does this do? > + } Instead of reusing the *inherit for operations, please add a local variable and at the end assign it to *inherit. > + > + return 0; > +} > + > /* > * Copy the accounting information between qgroups. This is necessary > * when a snapshot or a subvolume is created. Throwing an error will > @@ -2998,7 +3031,8 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans) > * when a readonly fs is a reasonable outcome. > */ > int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid, > - u64 objectid, struct btrfs_qgroup_inherit *inherit) > + u64 objectid, u64 inode_rootid, > + struct btrfs_qgroup_inherit *inherit) > { > int ret = 0; > int i; > @@ -3040,6 +3074,9 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid, > goto out; > } > > + if (!inherit && btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_SIMPLE) Swap the conditions please > + qgroup_auto_inherit(fs_info, inode_rootid, &inherit); > + > if (inherit) { > i_qgroups = (u64 *)(inherit + 1); > nums = inherit->num_qgroups + 2 * inherit->num_ref_copies + > @@ -3066,6 +3103,7 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid, > if (ret) > goto out; > > + Stray newline > /* > * add qgroup to all inherited groups > */ > diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h > index 94d85b4fbebd..ce6fa8694ca7 100644 > --- a/fs/btrfs/qgroup.h > +++ b/fs/btrfs/qgroup.h > @@ -271,8 +271,7 @@ int btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info); > void btrfs_qgroup_rescan_resume(struct btrfs_fs_info *fs_info); > int btrfs_qgroup_wait_for_completion(struct btrfs_fs_info *fs_info, > bool interruptible); > -int btrfs_add_qgroup_relation(struct btrfs_trans_handle *trans, u64 src, > - u64 dst); > +int btrfs_add_qgroup_relation(struct btrfs_trans_handle *trans, u64 src, u64 dst); > int btrfs_del_qgroup_relation(struct btrfs_trans_handle *trans, u64 src, > u64 dst); > int btrfs_create_qgroup(struct btrfs_trans_handle *trans, u64 qgroupid); > @@ -366,7 +365,8 @@ int btrfs_qgroup_account_extent(struct btrfs_trans_handle *trans, u64 bytenr, > int btrfs_qgroup_account_extents(struct btrfs_trans_handle *trans); > int btrfs_run_qgroups(struct btrfs_trans_handle *trans); > int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid, > - u64 objectid, struct btrfs_qgroup_inherit *inherit); > + u64 objectid, u64 inode_rootid, > + struct btrfs_qgroup_inherit *inherit); > void btrfs_qgroup_free_refroot(struct btrfs_fs_info *fs_info, > u64 ref_root, u64 num_bytes, > enum btrfs_qgroup_rsv_type type); > diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c > index 25217888e897..fb857147df57 100644 > --- a/fs/btrfs/transaction.c > +++ b/fs/btrfs/transaction.c > @@ -1529,13 +1529,14 @@ static int qgroup_account_snapshot(struct btrfs_trans_handle *trans, > int ret; > > /* > - * Save some performance in the case that full qgroups are not > + * Save some performance in the case that qgroups are not > * enabled. If this check races with the ioctl, rescan will > * kick in anyway. > */ > if (btrfs_qgroup_mode(fs_info) != BTRFS_QGROUP_MODE_FULL) > return 0; > > + Again > /* > * Ensure dirty @src will be committed. Or, after coming > * commit_fs_roots() and switch_commit_roots(), any dirty but not > @@ -1572,7 +1573,7 @@ static int qgroup_account_snapshot(struct btrfs_trans_handle *trans, > > /* Now qgroup are all updated, we can inherit it to new qgroups */ > ret = btrfs_qgroup_inherit(trans, src->root_key.objectid, dst_objectid, > - inherit); > + parent->root_key.objectid, inherit); > if (ret < 0) > goto out; > > @@ -1839,8 +1840,12 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans, > * To co-operate with that hack, we do hack again. > * Or snapshot will be greatly slowed down by a subtree qgroup rescan > */ > - ret = qgroup_account_snapshot(trans, root, parent_root, > - pending->inherit, objectid); > + if (btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_FULL) > + ret = qgroup_account_snapshot(trans, root, parent_root, > + pending->inherit, objectid); > + else if (btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_SIMPLE) > + ret = btrfs_qgroup_inherit(trans, root->root_key.objectid, objectid, > + parent_root->root_key.objectid, pending->inherit); > if (ret < 0) > goto fail; > > -- > 2.41.0
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 9b61bc62e439..c9b069077fd0 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -652,7 +652,7 @@ static noinline int create_subvol(struct mnt_idmap *idmap, /* Tree log can't currently deal with an inode which is a new root. */ btrfs_set_log_full_commit(trans); - ret = btrfs_qgroup_inherit(trans, 0, objectid, inherit); + ret = btrfs_qgroup_inherit(trans, 0, objectid, root->root_key.objectid, inherit); if (ret) goto out; diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c index dedc532669f4..58e9ed0deedd 100644 --- a/fs/btrfs/qgroup.c +++ b/fs/btrfs/qgroup.c @@ -1550,8 +1550,7 @@ static int quick_update_accounting(struct btrfs_fs_info *fs_info, return ret; } -int btrfs_add_qgroup_relation(struct btrfs_trans_handle *trans, u64 src, - u64 dst) +int btrfs_add_qgroup_relation(struct btrfs_trans_handle *trans, u64 src, u64 dst) { struct btrfs_fs_info *fs_info = trans->fs_info; struct btrfs_qgroup *parent; @@ -2991,6 +2990,40 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans) return ret; } +static int qgroup_auto_inherit(struct btrfs_fs_info *fs_info, + u64 inode_rootid, + struct btrfs_qgroup_inherit **inherit) +{ + int i = 0; + u64 num_qgroups = 0; + struct btrfs_qgroup *inode_qg; + struct btrfs_qgroup_list *qg_list; + + if (*inherit) + return -EEXIST; + + inode_qg = find_qgroup_rb(fs_info, inode_rootid); + if (!inode_qg) + return -ENOENT; + + num_qgroups = list_count_nodes(&inode_qg->groups); + + if (!num_qgroups) + return 0; + + *inherit = kzalloc(sizeof(**inherit) + num_qgroups * sizeof(u64), GFP_NOFS); + if (!*inherit) + return -ENOMEM; + (*inherit)->num_qgroups = num_qgroups; + + list_for_each_entry(qg_list, &inode_qg->groups, next_group) { + u64 qg_id = qg_list->group->qgroupid; + *((u64 *)((*inherit)+1) + i) = qg_id; + } + + return 0; +} + /* * Copy the accounting information between qgroups. This is necessary * when a snapshot or a subvolume is created. Throwing an error will @@ -2998,7 +3031,8 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans) * when a readonly fs is a reasonable outcome. */ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid, - u64 objectid, struct btrfs_qgroup_inherit *inherit) + u64 objectid, u64 inode_rootid, + struct btrfs_qgroup_inherit *inherit) { int ret = 0; int i; @@ -3040,6 +3074,9 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid, goto out; } + if (!inherit && btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_SIMPLE) + qgroup_auto_inherit(fs_info, inode_rootid, &inherit); + if (inherit) { i_qgroups = (u64 *)(inherit + 1); nums = inherit->num_qgroups + 2 * inherit->num_ref_copies + @@ -3066,6 +3103,7 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid, if (ret) goto out; + /* * add qgroup to all inherited groups */ diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h index 94d85b4fbebd..ce6fa8694ca7 100644 --- a/fs/btrfs/qgroup.h +++ b/fs/btrfs/qgroup.h @@ -271,8 +271,7 @@ int btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info); void btrfs_qgroup_rescan_resume(struct btrfs_fs_info *fs_info); int btrfs_qgroup_wait_for_completion(struct btrfs_fs_info *fs_info, bool interruptible); -int btrfs_add_qgroup_relation(struct btrfs_trans_handle *trans, u64 src, - u64 dst); +int btrfs_add_qgroup_relation(struct btrfs_trans_handle *trans, u64 src, u64 dst); int btrfs_del_qgroup_relation(struct btrfs_trans_handle *trans, u64 src, u64 dst); int btrfs_create_qgroup(struct btrfs_trans_handle *trans, u64 qgroupid); @@ -366,7 +365,8 @@ int btrfs_qgroup_account_extent(struct btrfs_trans_handle *trans, u64 bytenr, int btrfs_qgroup_account_extents(struct btrfs_trans_handle *trans); int btrfs_run_qgroups(struct btrfs_trans_handle *trans); int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans, u64 srcid, - u64 objectid, struct btrfs_qgroup_inherit *inherit); + u64 objectid, u64 inode_rootid, + struct btrfs_qgroup_inherit *inherit); void btrfs_qgroup_free_refroot(struct btrfs_fs_info *fs_info, u64 ref_root, u64 num_bytes, enum btrfs_qgroup_rsv_type type); diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index 25217888e897..fb857147df57 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -1529,13 +1529,14 @@ static int qgroup_account_snapshot(struct btrfs_trans_handle *trans, int ret; /* - * Save some performance in the case that full qgroups are not + * Save some performance in the case that qgroups are not * enabled. If this check races with the ioctl, rescan will * kick in anyway. */ if (btrfs_qgroup_mode(fs_info) != BTRFS_QGROUP_MODE_FULL) return 0; + /* * Ensure dirty @src will be committed. Or, after coming * commit_fs_roots() and switch_commit_roots(), any dirty but not @@ -1572,7 +1573,7 @@ static int qgroup_account_snapshot(struct btrfs_trans_handle *trans, /* Now qgroup are all updated, we can inherit it to new qgroups */ ret = btrfs_qgroup_inherit(trans, src->root_key.objectid, dst_objectid, - inherit); + parent->root_key.objectid, inherit); if (ret < 0) goto out; @@ -1839,8 +1840,12 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans, * To co-operate with that hack, we do hack again. * Or snapshot will be greatly slowed down by a subtree qgroup rescan */ - ret = qgroup_account_snapshot(trans, root, parent_root, - pending->inherit, objectid); + if (btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_FULL) + ret = qgroup_account_snapshot(trans, root, parent_root, + pending->inherit, objectid); + else if (btrfs_qgroup_mode(fs_info) == BTRFS_QGROUP_MODE_SIMPLE) + ret = btrfs_qgroup_inherit(trans, root->root_key.objectid, objectid, + parent_root->root_key.objectid, pending->inherit); if (ret < 0) goto fail;
Consider the following sequence: - enable quotas - create subvol S id 256 at dir outer/ - create a qgroup 1/100 - add 0/256 (S's auto qgroup) to 1/100 - create subvol T id 257 at dir outer/inner/ With full qgroups, there is no relationship between 0/257 and either of 0/256 or 1/100. There is an inherit feature that the creator of inner/ can use to specify it ought to be in 1/100. Simple quotas are targeted at container isolation, where such automatic inheritance for not necessarily trusted/controlled nested subvol creation would be quite helpful. Therefore, add a new default behavior for simple quotas: when you create a nested subvol, automatically inherit as parents any parents of the qgroup of the subvol the new inode is going in. In our example, 257/0 would also be under 1/100, allowing easy control of a total quota over an arbitrary hierarchy of subvolumes. I think this _might_ be a generally useful behavior, so it could be interesting to put it behind a new inheritance flag that simple quotas always use while traditional quotas let the user specify, but this is a minimally intrusive change to start. Signed-off-by: Boris Burkov <boris@bur.io> --- fs/btrfs/ioctl.c | 2 +- fs/btrfs/qgroup.c | 44 +++++++++++++++++++++++++++++++++++++++--- fs/btrfs/qgroup.h | 6 +++--- fs/btrfs/transaction.c | 13 +++++++++---- 4 files changed, 54 insertions(+), 11 deletions(-)