Message ID | 20211101115324.374076-1-nborisov@suse.com (mailing list archive) |
---|---|
Series | Balance vs device add fixes |
On 01/11/2021 19:53, Nikolay Borisov wrote:
> This series enables adding of a device when balance is paused (i.e. an fs is mounted
> with skip_balance options). This is needed to give users a chance to gracefully
> handle an ENOSPC situation in the face of running balance. To achieve this introduce
> a new exclop - BALANCE_PAUSED which is made compatible with device add. More
> details in each patch.
>

Have you thought about allowing the device-add during the balance-running,
not only at the balance-paused? Is it much complicated? If not, then we
could drop the new exclop state BALANCE_PAUSED altogether.
On Tue, Nov 02, 2021 at 12:52:11PM +0800, Anand Jain wrote:
> On 01/11/2021 19:53, Nikolay Borisov wrote:
> > This series enables adding of a device when balance is paused (i.e. an fs is mounted
> > with skip_balance options). This is needed to give users a chance to gracefully
> > handle an ENOSPC situation in the face of running balance. To achieve this introduce
> > a new exclop - BALANCE_PAUSED which is made compatible with device add. More
> > details in each patch.
> >
>
> Have you thought about allowing the device-add during the balance-running,
> not only at the balance-paused? Is it much complicated? If not, then we
> could drop the new exclop state BALANCE_PAUSED altogether.
>

We could be changing the raid settings, adding chunks before the new device is
added and then ending up with chunks that don't have all of their stripes. The
number of bugs that could show up is scary, I think this is the safer choice for
now. Thanks,

Josef
On Mon, Nov 01, 2021 at 01:53:21PM +0200, Nikolay Borisov wrote:
> This series enables adding of a device when balance is paused (i.e. an fs is mounted
> with skip_balance options). This is needed to give users a chance to gracefully
> handle an ENOSPC situation in the face of running balance. To achieve this introduce
> a new exclop - BALANCE_PAUSED which is made compatible with device add. More
> details in each patch.
>
> I've tested this with an fstests which I will be posting in a bit.
>
> Nikolay Borisov (3):
>   btrfs: introduce BTRFS_EXCLOP_BALANCE_PAUSED exclusive state
>   btrfs: make device add compatible with paused balance in
>     btrfs_exclop_start_try_lock
>   btrfs: allow device add if balance is paused
>
>  fs/btrfs/ctree.h   |  1 +
>  fs/btrfs/ioctl.c   | 49 +++++++++++++++++++++++++++++++++++++++-------
>  fs/btrfs/volumes.c | 23 ++++++++++++++++++----
>  fs/btrfs/volumes.h |  2 +-
>  4 files changed, 63 insertions(+), 12 deletions(-)
>

A few things

1) Can we integrate the flipping into helpers? Something like

       btrfs_exclop_change_state(PAUSED);

   So the locking and stuff is all with the code that messes with the exclop?

2) The existing helpers do WRITE_ONCE(), is that needed here? I assume not
   because we're not actually exiting our exclop state, but still seems wonky.

3) Maybe have an __btrfs_exclop_finish(type), so instead of

       if (paused) {
               do thing;
       } else {
               btrfs_exclop_finish();
       }

   you can instead do

       type = BTRFS_EXCLOP_NONE;
       if (pause stuff) {
               do things;
               type = BTRFS_EXCLOP_BALANCE_PAUSED;
       }

       /* other stuff. */
       __btrfs_exclop_finish(type);

   then btrfs_exclop_finish just does __btrfs_exclop_finish(NONE);

Thanks,

Josef
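The third suggestion in the message above can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions, not the kernel implementation: `struct fs_info`, the `EXCLOP_*` enum, the `atomic_flag` spinlock and the names `exclop_finish_to()` and `balance_exit()` are all hypothetical stand-ins, and `exclop_finish_to()` plays the role of the proposed `__btrfs_exclop_finish(type)`.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical userspace model of the exclop state; not kernel code. */
enum exclop {
    EXCLOP_NONE,
    EXCLOP_BALANCE,
    EXCLOP_BALANCE_PAUSED,
};

struct fs_info {
    atomic_flag lock;                 /* stand-in for the kernel spinlock */
    enum exclop exclusive_operation;
};

static void fs_info_init(struct fs_info *fs)
{
    atomic_flag_clear(&fs->lock);
    fs->exclusive_operation = EXCLOP_NONE;
}

static void fs_lock(struct fs_info *fs)
{
    while (atomic_flag_test_and_set_explicit(&fs->lock, memory_order_acquire))
        ;   /* spin until the flag is released */
}

static void fs_unlock(struct fs_info *fs)
{
    atomic_flag_clear_explicit(&fs->lock, memory_order_release);
}

/* Models __btrfs_exclop_finish(type): leave the running op, land in `type`. */
static void exclop_finish_to(struct fs_info *fs, enum exclop type)
{
    /* In this model an operation can only end in NONE or BALANCE_PAUSED. */
    assert(type == EXCLOP_NONE || type == EXCLOP_BALANCE_PAUSED);

    fs_lock(fs);
    fs->exclusive_operation = type;
    fs_unlock(fs);
}

/* Models btrfs_exclop_finish(): the common case simply lands in NONE. */
static void exclop_finish(struct fs_info *fs)
{
    exclop_finish_to(fs, EXCLOP_NONE);
}

/* Caller shaped like the suggestion: choose the end state, then finish once. */
static void balance_exit(struct fs_info *fs, int paused)
{
    enum exclop type = EXCLOP_NONE;

    if (paused)
        type = EXCLOP_BALANCE_PAUSED;

    exclop_finish_to(fs, type);
}
```

With this shape the caller never touches the lock or the state field directly; it only decides which end state it wants, which is the point of keeping "the locking and stuff" next to the code that owns the exclop.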
On 2.11.21 г. 16:30, Josef Bacik wrote:
> On Mon, Nov 01, 2021 at 01:53:21PM +0200, Nikolay Borisov wrote:
>> This series enables adding of a device when balance is paused (i.e. an fs is mounted
>> with skip_balance options).
>> [...]
>
> A few things
>
> 1) Can we integrate the flipping into helpers? Something like
>
>        btrfs_exclop_change_state(PAUSED);
>
> So the locking and stuff is all with the code that messes with the exclop?

Right, I left the code flipping balance->paused open-coded because that's
really a special case. By all means I can add a specific helper so that
the ASSERT is not lost as well. The reason I didn't do it in the first
place is because PAUSED is really "special" in the sense it can be
entered only from BALANCE and it's not really generic. If you take a
look at how btrfs_exclop_start does it, for example, it simply checks we
don't have a running op and simply sets it to whatever is passed.

>
> 2) The existing helpers do WRITE_ONCE(), is that needed here? I assume not
> because we're not actually exiting our exclop state, but still seems wonky.

That got me thinking in the first place and actually initially I had a
patch which removed it. However, I *think* it might be required since
exclusive_operation is accessed without a lock in the sysfs code, i.e.
btrfs_exclusive_operation_show, so I guess that's why we need it.

Goldwyn, what's your take on this?

>
> 3) Maybe have an __btrfs_exclop_finish(type), so instead of
>
>        if (paused) {
>                do thing;
>        } else {
>                btrfs_exclop_finish();
>        }
>
> you can instead do
>
>        type = BTRFS_EXCLOP_NONE;
>        if (pause stuff) {
>                do things;
>                type = BTRFS_EXCLOP_BALANCE_PAUSED;
>        }
>
>        /* other stuff. */
>        __btrfs_exclop_finish(type);
>
> then btrfs_exclop_finish just does __btrfs_exclop_finish(NONE);

I'm having a hard time seeing how this would increase readability. What
should go into the __btrfs_exclop_finish function?

> Thanks,
>
> Josef
>
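The distinction drawn in the reply above, a generic start that only checks that nothing is running versus a PAUSED state that can only be entered from BALANCE, can be sketched in userspace C. Everything here is an assumption made for illustration: the `EXCLOP_*` enum, `struct fs_info`, the `atomic_flag` lock and the helper names `exclop_start()`, `exclop_balance_pause()` and `exclop_start_dev_add()` are hypothetical and do not claim to match the code in the series.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model; names are illustrative, not kernel API. */
enum exclop {
    EXCLOP_NONE,
    EXCLOP_BALANCE,
    EXCLOP_BALANCE_PAUSED,
    EXCLOP_DEV_ADD,
};

struct fs_info {
    atomic_flag lock;                 /* stand-in for the kernel spinlock */
    enum exclop exclusive_operation;
};

static void fs_info_init(struct fs_info *fs)
{
    atomic_flag_clear(&fs->lock);
    fs->exclusive_operation = EXCLOP_NONE;
}

static void fs_lock(struct fs_info *fs)
{
    while (atomic_flag_test_and_set_explicit(&fs->lock, memory_order_acquire))
        ;   /* spin until the flag is released */
}

static void fs_unlock(struct fs_info *fs)
{
    atomic_flag_clear_explicit(&fs->lock, memory_order_release);
}

/* Generic entry point: succeeds only when no exclusive op is running. */
static bool exclop_start(struct fs_info *fs, enum exclop type)
{
    bool ok = false;

    fs_lock(fs);
    if (fs->exclusive_operation == EXCLOP_NONE) {
        fs->exclusive_operation = type;
        ok = true;
    }
    fs_unlock(fs);
    return ok;
}

/* Special-case transition: PAUSED is only reachable from a running balance. */
static void exclop_balance_pause(struct fs_info *fs)
{
    fs_lock(fs);
    assert(fs->exclusive_operation == EXCLOP_BALANCE);
    fs->exclusive_operation = EXCLOP_BALANCE_PAUSED;
    fs_unlock(fs);
}

/*
 * One possible way to make device add compatible with a paused balance:
 * accept NONE or BALANCE_PAUSED and report which one we came from, so the
 * caller can restore the paused state afterwards.
 */
static bool exclop_start_dev_add(struct fs_info *fs, bool *was_paused)
{
    bool ok = false;

    fs_lock(fs);
    if (fs->exclusive_operation == EXCLOP_NONE ||
        fs->exclusive_operation == EXCLOP_BALANCE_PAUSED) {
        *was_paused = (fs->exclusive_operation == EXCLOP_BALANCE_PAUSED);
        fs->exclusive_operation = EXCLOP_DEV_ADD;
        ok = true;
    }
    fs_unlock(fs);
    return ok;
}
```

Keeping the pause transition in its own helper preserves the assertion about the previous state, which a generic "set whatever is passed" helper would not be able to express.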
On Tue, Nov 02, 2021 at 05:25:32PM +0200, Nikolay Borisov wrote:
> On 2.11.21 г. 16:30, Josef Bacik wrote:
> > [...]
> > 3) Maybe have an __btrfs_exclop_finish(type), so instead of
> >
> >        if (paused) {
> >                do thing;
> >        } else {
> >                btrfs_exclop_finish();
> >        }
> >
> > you can instead do
> >
> >        type = BTRFS_EXCLOP_NONE;
> >        if (pause stuff) {
> >                do things;
> >                type = BTRFS_EXCLOP_BALANCE_PAUSED;
> >        }
> >
> >        /* other stuff. */
> >        __btrfs_exclop_finish(type);
> >
> > then btrfs_exclop_finish just does __btrfs_exclop_finish(NONE);
>
> I'm having a hard time seeing how this would increase readability. What
> should go into the __btrfs_exclop_finish function?
>

btrfs_exclop_finish would become __btrfs_exclop_finish(type) and do all the
work, but instead of setting NONE it would set type.

Honestly I could go either way, having a helper would make it more readable
than it is, because then it's

       if (pause)
               btrfs_exclop_pause();
       else
               btrfs_exclop_finish();

I'm not strong on this, I think having a helper instead of open coding helps
given the number of places it's used. Perhaps just doing that step will make
it clean enough. Thanks,

Josef
On 17:25 02/11, Nikolay Borisov wrote:
> On 2.11.21 г. 16:30, Josef Bacik wrote:
> > [...]
> > 2) The existing helpers do WRITE_ONCE(), is that needed here? I assume not
> > because we're not actually exiting our exclop state, but still seems wonky.
>
> That got me thinking in the first place and actually initially I had a
> patch which removed it. However, I *think* it might be required since
> exclusive_operation is accessed without a lock in the sysfs code, i.e.
> btrfs_exclusive_operation_show, so I guess that's why we need it.
>
> Goldwyn, what's your take on this?

Yes, btrfs_exclusive_operation_show() does not lock, so it would be deemed
necessary. But do we really need to use *_ONCE, assuming
btrfs_exclusive_operation fits in 8 bits?
On 2.11.21 г. 19:25, Goldwyn Rodrigues wrote:
> But do we really need to use *_ONCE, assuming btrfs_exclusive_operation
> fits in 8 bits?
>

The way I understand it based on the LWN articles is that the effect of
_ONCE is twofold:

1. It prevents theoretical torn writes + forces the compiler to always
issue the access, i.e. prevents it from being optimized out - this could
be moot in our case.

2. It serves a documentation purpose where it states "this variable is
accessed in multithreaded contexts, possibly without an explicit lock" -
and this IMO is quite helpful in this particular context.
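The two effects listed above can be illustrated with a simplified userspace version of the macros. This is a sketch only: the `READ_ONCE`/`WRITE_ONCE` definitions below are reduced stand-ins built on the GCC/Clang `__typeof__` extension (the kernel's real macros are more elaborate), and `struct fs_info`, `exclop_set()`, `exclop_show()` and the output strings are hypothetical names chosen for the example. A volatile access does not by itself give a formal atomicity guarantee in ISO C; the sketch assumes, as the thread does, a naturally aligned scalar.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Simplified userspace stand-ins for the kernel macros. They rely on the
 * GCC/Clang __typeof__ extension and are only meant for scalar types.
 */
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical model of the state; names are illustrative, not kernel API. */
enum exclop {
    EXCLOP_NONE,
    EXCLOP_BALANCE,
    EXCLOP_BALANCE_PAUSED,
    EXCLOP_DEV_ADD,
};

struct fs_info {
    enum exclop exclusive_operation;
};

/* Writer side: in the kernel this store would happen under the exclop lock. */
static void exclop_set(struct fs_info *fs, enum exclop op)
{
    WRITE_ONCE(fs->exclusive_operation, op);
}

/* Reader side, modelled on a lockless sysfs "show" routine. */
static int exclop_show(struct fs_info *fs, char *buf, size_t len)
{
    const char *str;
    enum exclop op;

    assert(buf != NULL && len > 0);

    /* One marked load; the switch below only looks at the local copy. */
    op = READ_ONCE(fs->exclusive_operation);

    switch (op) {
    case EXCLOP_NONE:
        str = "none";
        break;
    case EXCLOP_BALANCE:
        str = "balance";
        break;
    case EXCLOP_BALANCE_PAUSED:
        str = "balance paused";
        break;
    case EXCLOP_DEV_ADD:
        str = "device add";
        break;
    default:
        str = "unknown";
        break;
    }
    return snprintf(buf, len, "%s\n", str);
}
```

The marked load makes the lockless read visible at the call site, which is the documentation effect described in point 2, and it stops the compiler from re-reading the field between the load and the switch.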