Message ID | cover.1659357652.git.dsterba@suse.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [GIT,PULL] Btrfs updates for 5.20 | expand |
On 2022/8/2 00:40, David Sterba wrote: > Hi, > > this update brings some long awaited changes, the send protocol bump, > otherwise lots of small improvements and fixes. The main core part is > reworking bio handling, cleaning up the submission and endio and > improving error handling. > > There are some non-btrfs patches adding helpers or updating API, > listed at the end of the changelog. > > Please pull, thanks. > > Features: > > - sysfs: > - export chunk size, in debug mode add tunable for setting its size > - show zoned among features (was only in debug mode) > - show commit stats (number, last/max/total duration) > > - send protocol updated to 2 > - new commands: > - ability write larger data chunks than 64K > - send raw compressed extents (uses the encoded data ioctls), ie. no > decompression on send side, no compression needed on receive side > if supported > - send 'otime' (inode creation time) among other timestamps > - send file attributes (a.k.a file flags and xflags) > - this is first version bump, backward compatibility on send and > receive side is provided > - there are still some known and wanted commands that will be > implemented in the near future, another version bump will be needed, > however we want to minimize that to avoid causing usability issues > > - print checksum type and implementation at mount time > > - don't print some messages at mount (mentioned as people asked about > it), we want to print messages namely for new features so let's make > some space for that > - big metadata - this has been supported for a long time and is not a > feature that's worth mentioning > - skinny metadata - same reason, set by default by mkfs > > Performance improvements: > > - reduced amount of reserved metadata for delayed items > - when inserted items can be batched into one leaf > - when deleting batched directory index items > - when deleting delayed items used for deletion > - overall improved count of files/sec, decreased subvolume lock > contention > > - metadata item access bounds checker micro-optimized, with a few > percent of improved runtime for metadata-heavy operations > > - increase direct io limit for read to 256 sectors, improved throughput > by 3x on sample workload > > Notable fixes: > > - raid56 > - reduce parity writes, skip sectors of stripe when there are no data > updates > - restore reading from stripe cache instead of triggering new read Small typo I guess. It's the opposite, restore reading from on-disk data instead of using stripe cache. In fact, these two modification mostly makes btrfs/125 and other recovery related scenarios (greatly reduce the chance of destructive RMW). In fact, these two may be way more important than write-intent/full-journal. Thanks, Qu > > - refuse to replay log with unknown incompat read-only feature bit set > > - zoned > - fix page locking when COW fails in the middle of allocation > - improved tracking of active zones, ZNS drives may limit the number > and there are ENOSPC errors due to that limit and not actual lack of > space > - adjust maximum extent size for zone append so it does not cause late > ENOSPC due to underreservation > > - mirror reading error messages show the mirror number > > - don't fallback to buffered IO for NOWAIT direct IO writes, we don't > have the NOWAIT semantics for buffered io yet > > - send, fix sending link commands for existing file paths when there are > deleted and created hardlinks for same files > > - repair all mirrors for profiles with more than 1 copy (raid1c34) > > - fix repair of compressed extents, unify where error detection and > repair happen > > Core changes: > > - bio completion cleanups > - don't double defer compression bios > - simplify endio workqueues > - add more data to btrfs_bio to avoid allocation for read requests > - rework bio error handling so it's same what block layer does, the > submission works and errors are consumed in endio > - when asynchronous bio offload fails fall back to synchronous > checksum calculation to avoid errors under writeback or memory > pressure > > - new trace points > - raid56 events > - ordered extent operations > > - super block log_root_transid deprecated (never used) > > - mixed_backref and big_metadata sysfs feature files removed, they've > been default for sufficiently long time, there are no known users and > mixed_backref could be confused with mixed_groups > > Non-btrfs changes, API updates: > > - minor highmem API update to cover const arguments > > - switch all kmap/kmap_atomic to kmap_local > > - remove redundant flush_dcache_page() > > - address_space_operations::writepage callback removed > > - add bdev_max_segments() helper > > ---------------------------------------------------------------- > The following changes since commit e0dccc3b76fb35bb257b4118367a883073d7390e: > > Linux 5.19-rc8 (2022-07-24 13:26:27 -0700) > > are available in the Git repository at: > > git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-5.20-tag > > for you to fetch changes up to 0b078d9db8793b1bd911e97be854e3c964235c78: > > btrfs: don't call btrfs_page_set_checked in finish_compressed_bio_read (2022-07-25 19:56:16 +0200) > > ---------------------------------------------------------------- > BingJing Chang (2): > btrfs: send: introduce recorded_ref_alloc and recorded_ref_free > btrfs: send: fix sending link commands for existing file paths > > Christoph Hellwig (37): > btrfs: factor out a helper to end a single sector buffer I/O > btrfs: refactor end_bio_extent_readpage code flow > btrfs: factor out a btrfs_csum_ptr helper > btrfs: use btrfs_bio_for_each_sector in btrfs_check_read_dio_bio > btrfs: move more work into btrfs_end_bioc > btrfs: simplify code flow in btrfs_submit_dio_bio > btrfs: split btrfs_submit_data_bio to read and write parts > btrfs: defer I/O completion based on the btrfs_raid_bio > btrfs: don't double-defer bio completions for compressed reads > btrfs: don't use btrfs_bio_wq_end_io for compressed writes > btrfs: centralize setting REQ_META > btrfs: remove btrfs_end_io_wq > btrfs: factor stripe submission logic out of btrfs_map_bio > btrfs: do not allocate a btrfs_bio for low-level bios > btrfs: don't use bio->bi_private to pass the inode to submit_one_bio > btrfs: merge end_write_bio and flush_write_bio > btrfs: pass the btrfs_bio_ctrl to submit_one_bio > btrfs: stop looking at btrfs_bio->iter in index_one_bio > btrfs: split discard handling out of btrfs_map_block > btrfs: remove the finish_func argument to btrfs_mark_ordered_io_finished > btrfs: increase direct io read size limit to 256 sectors > btrfs: remove extent writepage address space operation > btrfs: raid56: use fixed stripe length everywhere > btrfs: do not return errors from btrfs_map_bio > btrfs: do not return errors from raid56_parity_write > btrfs: do not return errors from raid56_parity_recover > btrfs: raid56: transfer the bio counter reference to the raid submission helpers > btrfs: simplify sync/async submission in btrfs_submit_data_write_bio > btrfs: handle allocation failure in btrfs_wq_submit_bio gracefully > btrfs: do not return errors from btrfs_submit_dio_bio > btrfs: merge btrfs_dev_stat_print_on_error with its only caller > btrfs: repair all known bad mirrors > btrfs: simplify the pending I/O counting in struct compressed_bio > btrfs: pass a btrfs_bio to btrfs_repair_one_sector > btrfs: remove the start argument to check_data_csum and export > btrfs: fix repair of compressed extents > btrfs: don't call btrfs_page_set_checked in finish_compressed_bio_read > > David Sterba (30): > btrfs: fix typos in comments > btrfs: remove redundant calls to flush_dcache_page > btrfs: remove redundant check in up check_setget_bounds > btrfs: sysfs: advertise zoned support among features > btrfs: open code rbtree search in split_state > btrfs: open code rbtree search in insert_state > btrfs: lift start and end parameters to callers of insert_state > btrfs: pass bits by value not by pointer for extent_state helpers > btrfs: add fast path for extent_state insertion > btrfs: remove node and parent parameters from insert_state > btrfs: open code inexact rbtree search in tree_search > btrfs: make tree search for insert more generic and use it for tree_search > btrfs: unify tree search helper returning prev and next nodes > btrfs: call inode_to_path directly and drop indirection > btrfs: simplify parameters of backref iterators > btrfs: sink iterator parameter to btrfs_ioctl_logical_to_ino > btrfs: remove unused typedefs get_extent_t and btrfs_work_func_t > btrfs: send: drop __KERNEL__ ifdef from send.h > btrfs: send: simplify includes > btrfs: send: remove old TODO regarding ERESTARTSYS > btrfs: send: use boolean types for current inode status > btrfs: send: add OTIME as utimes attribute for proto 2+ by default > btrfs: send: add new command FILEATTR for file attributes > btrfs: print checksum type and implementation at mount time > btrfs: use mask for all RAID1* profiles in btrfs_calc_avail_data_space > btrfs: merge calculations for simple striped profiles in btrfs_rmap_block > btrfs: clean up chained assignments > btrfs: switch btrfs_block_rsv::full to bool > btrfs: switch btrfs_block_rsv::failfast to bool > btrfs: use enum for btrfs_block_rsv::type > > Fabio M. De Francesco (7): > btrfs: replace kmap() with kmap_local_page() in inode.c > btrfs: replace kmap() with kmap_local_page() in lzo.c > highmem: Make __kunmap_{local,atomic}() take const void pointer > btrfs: zstd: replace kmap() with kmap_local_page() > btrfs: zlib: replace kmap() with kmap_local_page() in zlib_compress_pages() > btrfs: zlib: replace kmap() with kmap_local_page() in zlib_decompress_bio() > btrfs: replace kmap_atomic() with kmap_local_page() > > Fanjun Kong (1): > btrfs: use PAGE_ALIGNED instead of IS_ALIGNED > > Filipe Manana (18): > btrfs: balance btree dirty pages and delayed items after a rename > btrfs: free the path earlier when creating a new inode > btrfs: balance btree dirty pages and delayed items after clone and dedupe > btrfs: add assertions when deleting batches of delayed items > btrfs: deal with deletion errors when deleting delayed items > btrfs: refactor the delayed item deletion entry point > btrfs: improve batch deletion of delayed dir index items > btrfs: assert that delayed item is a dir index item when adding it > btrfs: improve batch insertion of delayed dir index items > btrfs: do not BUG_ON() on failure to reserve metadata for delayed item > btrfs: set delayed item type when initializing it > btrfs: reduce amount of reserved metadata for delayed item insertion > btrfs: remove the inode cache check at btrfs_is_free_space_inode() > btrfs: don't fallback to buffered IO for NOWAIT direct IO writes > btrfs: set the objectid of the btree inode's location key > btrfs: add optimized btrfs_ino() version for 64 bits systems > btrfs: send: always use the rbtree based inode ref management infrastructure > btrfs: join running log transaction when logging new name > > Ioannis Angelakopoulos (2): > btrfs: collect commit stats, count, duration > btrfs: sysfs: export commit stats > > Johannes Thumshirn (1): > btrfs: add tracepoints for ordered extents > > Josef Bacik (3): > btrfs: do not batch insert non-consecutive dir indexes during log replay > btrfs: tree-log: make the return value for log syncing consistent > btrfs: reset block group chunk force if we have to wait > > Naohiro Aota (17): > btrfs: ensure pages are unlocked on cow_file_range() failure > btrfs: extend btrfs_cleanup_ordered_extents for NULL locked_page > btrfs: fix error handling of fallback uncompress write > btrfs: replace unnecessary goto with direct return at cow_file_range() > block: add bdev_max_segments() helper > btrfs: zoned: revive max_zone_append_bytes > btrfs: replace BTRFS_MAX_EXTENT_SIZE with fs_info->max_extent_size > btrfs: convert count_max_extents() to use fs_info->max_extent_size > btrfs: use fs_info->max_extent_size in get_extent_max_capacity() > btrfs: let can_allocate_chunk return error > btrfs: zoned: finish least available block group on data bg allocation > btrfs: zoned: introduce space_info->active_total_bytes > btrfs: zoned: disable metadata overcommit for zoned > btrfs: zoned: activate metadata block group on flush_space > btrfs: zoned: activate necessary block group > btrfs: zoned: write out partially allocated region > btrfs: zoned: wait until zone is finished when allocation didn't progress > > Nikolay Borisov (9): > btrfs: introduce btrfs_try_lock_balance > btrfs: use btrfs_try_lock_balance in btrfs_ioctl_balance > btrfs: batch up release of reserved metadata for delayed items used for deletion > btrfs: properly flag filesystem with BTRFS_FEATURE_INCOMPAT_BIG_METADATA > btrfs: don't print 'flagging with big metadata' anymore on mount > btrfs: don't print 'has skinny extents' anymore on mount > btrfs: sysfs: remove MIXED_BACKREF feature file > btrfs: sysfs: remove BIG_METADATA feature files > btrfs: simplify error handling in btrfs_lookup_dentry > > Omar Sandoval (7): > btrfs: send: remove unused send_ctx::{total,cmd}_send_size > btrfs: send: explicitly number commands and attributes > btrfs: send: add stream v2 definitions > btrfs: send: write larger chunks when using stream v2 > btrfs: send: get send buffer pages for protocol v2 > btrfs: send: send compressed extents with encoded writes > btrfs: send: enable support for stream v2 and compressed writes > > Pankaj Raghav (1): > btrfs: zoned: fix comment description for sb_write_pointer logic > > Qu Wenruo (25): > btrfs: quit early if the fs has no RAID56 support for raid56 related checks > btrfs: introduce a data checksum checking helper > btrfs: remove duplicated parameters from submit_data_read_repair() > btrfs: add a helper to iterate through a btrfs_bio with sector sized chunks > btrfs: use integrated bitmaps for btrfs_raid_bio::dbitmap and finish_pbitmap > btrfs: use integrated bitmaps for scrub_parity::dbitmap and ebitmap > btrfs: only write the sectors in the vertical stripe which has data stripes > btrfs: update stripe_sectors::uptodate in steal_rbio > btrfs: add trace event for submitted RAID56 bio > btrfs: make btrfs_super_block::log_root_transid deprecated > btrfs: reject log replay if there is unsupported RO compat flag > btrfs: raid56: avoid double for loop inside finish_rmw() > btrfs: raid56: avoid double for loop inside __raid56_parity_recover() > btrfs: raid56: avoid double for loop inside alloc_rbio_essential_pages() > btrfs: raid56: avoid double for loop inside raid56_rmw_stripe() > btrfs: raid56: avoid double for loop inside raid56_parity_scrub_stripe() > btrfs: remove parameter dev_extent_len from scrub_stripe() > btrfs: use btrfs_chunk_max_errors() to replace tolerance calculation > btrfs: use btrfs_raid_array to calculate number of parity stripes > btrfs: use ncopies from btrfs_raid_array in btrfs_num_copies() > btrfs: use named constant for reserved device space > btrfs: warn about dev extents that are inside the reserved range > btrfs: raid56: don't trust any cached sector in __raid56_parity_recover() > btrfs: output mirror number for bad metadata > btrfs: return proper mapped length for RAID56 profiles in __btrfs_map_block() > > Stefan Roesch (3): > btrfs: store chunk size in space-info struct > btrfs: sysfs: export chunk size in space infos > btrfs: sysfs: add force_chunk_alloc trigger to force allocation > > arch/parisc/include/asm/cacheflush.h | 6 +- > arch/parisc/kernel/cache.c | 2 +- > fs/btrfs/async-thread.h | 1 - > fs/btrfs/backref.c | 88 ++-- > fs/btrfs/backref.h | 3 +- > fs/btrfs/block-group.c | 34 +- > fs/btrfs/block-rsv.c | 21 +- > fs/btrfs/block-rsv.h | 15 +- > fs/btrfs/btrfs_inode.h | 25 +- > fs/btrfs/compression.c | 359 ++++---------- > fs/btrfs/compression.h | 18 +- > fs/btrfs/ctree.h | 105 ++++- > fs/btrfs/delalloc-space.c | 6 +- > fs/btrfs/delayed-inode.c | 395 +++++++++++----- > fs/btrfs/delayed-inode.h | 11 + > fs/btrfs/delayed-ref.c | 4 +- > fs/btrfs/dev-replace.c | 3 +- > fs/btrfs/disk-io.c | 268 ++++------- > fs/btrfs/disk-io.h | 17 +- > fs/btrfs/extent-tree.c | 149 +++--- > fs/btrfs/extent_io.c | 873 ++++++++++++++++------------------- > fs/btrfs/extent_io.h | 15 +- > fs/btrfs/file.c | 29 +- > fs/btrfs/free-space-cache.c | 3 +- > fs/btrfs/inode.c | 764 +++++++++++++++--------------- > fs/btrfs/ioctl.c | 150 +++--- > fs/btrfs/lzo.c | 28 +- > fs/btrfs/ordered-data.c | 40 +- > fs/btrfs/ordered-data.h | 5 +- > fs/btrfs/raid56.c | 792 +++++++++++++++---------------- > fs/btrfs/raid56.h | 168 ++++++- > fs/btrfs/reflink.c | 19 +- > fs/btrfs/scrub.c | 71 ++- > fs/btrfs/send.c | 781 +++++++++++++++++++++---------- > fs/btrfs/send.h | 169 ++++--- > fs/btrfs/space-info.c | 110 ++++- > fs/btrfs/space-info.h | 8 +- > fs/btrfs/struct-funcs.c | 11 +- > fs/btrfs/subpage.c | 4 +- > fs/btrfs/super.c | 36 +- > fs/btrfs/sysfs.c | 186 +++++++- > fs/btrfs/tests/btrfs-tests.c | 1 + > fs/btrfs/tests/extent-buffer-tests.c | 3 +- > fs/btrfs/transaction.c | 26 +- > fs/btrfs/tree-log.c | 29 +- > fs/btrfs/tree-log.h | 3 + > fs/btrfs/volumes.c | 362 +++++++-------- > fs/btrfs/volumes.h | 46 +- > fs/btrfs/zlib.c | 42 +- > fs/btrfs/zoned.c | 131 +++++- > fs/btrfs/zoned.h | 18 + > fs/btrfs/zstd.c | 33 +- > include/linux/blkdev.h | 5 + > include/linux/highmem-internal.h | 10 +- > include/trace/events/btrfs.h | 158 +++++++ > include/uapi/linux/btrfs.h | 10 +- > mm/highmem.c | 2 +- > 57 files changed, 3842 insertions(+), 2829 deletions(-)
On Tue, Aug 02, 2022 at 09:11:09AM +0800, Qu Wenruo wrote: > On 2022/8/2 00:40, David Sterba wrote: > > Notable fixes: > > > > - raid56 > > - reduce parity writes, skip sectors of stripe when there are no data > > updates > > - restore reading from stripe cache instead of triggering new read > > Small typo I guess. > It's the opposite, restore reading from on-disk data instead of using > stripe cache. Thanks. Linus, please replace the sentence with - restore reading from on-disk data instead of using stripe cache, this reduces chances to damage correct data due to RMW cycle
The pull request you sent on Mon, 1 Aug 2022 18:40:03 +0200:
> git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-5.20-tag
has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/353767e4aaeb7bc818273dfacbb01dd36a9db47a
Thank you!