Message ID: 20230518150105.3160445-1-mcgrof@kernel.org
State:      Deferred, archived
Series:     [v2] Documentation: add initial iomap kdoc
On 5/18/23 08:01, Luis Chamberlain wrote:
> To help with iomap adoption / porting I set out the goal to try to
> help improve the iomap documentation and get general guidance for
> filesystem conversions over from buffer-head in time for this year's
> LSFMM. The end results thanks to the review of Darrick, Christoph and
> others is on the kernelnewbies wiki [0].
>
> This brings this forward a relevant subset of that documentation to
> the kernel in kdoc format and also kdoc'ifies the existing documentation
> on iomap.h.
>
> Tested with:
>
>   make htmldocs SPHINXDIRS="filesystems"
>
> Then looking at the docs produced on:
>
>   Documentation/output/filesystems/iomap.html
>
> [0] https://kernelnewbies.org/KernelProjects/iomap
>
> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> ---
>
> Changes on v2:
>
>   * use 80 char length as if we're in the 1980's

Well, like Jon said, long lines are difficult to read, even on printed
paper. That's (at least one reason) why newspapers(!) have narrow columns
of print. Anyway, thanks for doing it.

>
>  Documentation/filesystems/index.rst |   1 +
>  Documentation/filesystems/iomap.rst | 253 +++++++++++++++++++++
>  include/linux/iomap.h               | 336 ++++++++++++++++++----------
>  3 files changed, 468 insertions(+), 122 deletions(-)
>  create mode 100644 Documentation/filesystems/iomap.rst
>
> diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
> index fbb2b5ada95b..6186ab7c3ea8 100644
> --- a/Documentation/filesystems/index.rst
> +++ b/Documentation/filesystems/index.rst
> @@ -34,6 +34,7 @@ algorithms work.
>     seq_file
>     sharedsubtree
>     idmappings
> +   iomap
>
>     automount-support
>
> diff --git a/Documentation/filesystems/iomap.rst b/Documentation/filesystems/iomap.rst
> new file mode 100644
> index 000000000000..be487030fcff
> --- /dev/null
> +++ b/Documentation/filesystems/iomap.rst
> @@ -0,0 +1,253 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +.. _iomap:
> +
> +..
> +   Mapping of heading styles within this document:
> +   Heading 1 uses "====" above and below
> +   Heading 2 uses "===="
> +   Heading 3 uses "----"
> +   Heading 4 uses "````"
> +   Heading 5 uses "^^^^"
> +   Heading 6 uses "~~~~"
> +   Heading 7 uses "...."
> +
> +   Sections are manually numbered because apparently that's what everyone
> +   does in the kernel.
> +.. contents:: Table of Contents
> +   :local:
> +
> +=====
> +iomap
> +=====
> +
> +.. kernel-doc:: include/linux/iomap.h
> +
> +A modern block abstraction
> +==========================
> +
> +**iomap** allows filesystems to query storage media for data using *byte
> +ranges*. Since block mapping are provided for a *byte ranges* for cache data in

               mappings

> +memory, in the page cache, naturally this implies operations on block ranges
> +will also deal with *multipage* operations in the page cache. **Folios** are
> +used to help provide *multipage* operations in memory for the *byte ranges*
> +being worked on.
> +
> +
> +iomap IO interfaces
> +===================
> +
> +You call **iomap** depending on the type of filesystem operation you are working
> +on. We detail some of these interactions below.
> +
> +iomap for bufferred IO writes

             buffered

> +-----------------------------
> +
> +You call **iomap** for buffered IO with:
> +
> + * ``iomap_file_buffered_write()`` - for buffered writes
> + * ``iomap_page_mkwrite()`` - when dealing callbacks for

                                     dealing with callbacks for

> +   ``struct vm_operations_struct``
> +
> +   * ``struct vm_operations_struct.page_mkwrite()``
> +   * ``struct vm_operations_struct.fault()``
> +   * ``struct vm_operations_struct.huge_fault()``
> +   * ``struct vm_operations_struct`.pfn_mkwrite()``

What are these 4 structs for? Or why the blank line above them? Confusing.
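For illustration, the buffered-write entry points above could be wired up
roughly like this. This is a hypothetical sketch, not from the patch, loosely
modeled on existing iomap users; the example_* names and example_iomap_ops
are made up:

```c
/*
 * Illustrative only -- not from the patch. A hypothetical filesystem
 * routing write faults and buffered writes through iomap.
 * "example_iomap_ops" is an assumed filesystem-provided struct iomap_ops.
 */
static vm_fault_t example_page_mkwrite(struct vm_fault *vmf)
{
	/* iomap handles the folio locking/dirtying for the write fault */
	return iomap_page_mkwrite(vmf, &example_iomap_ops);
}

static const struct vm_operations_struct example_vm_ops = {
	.fault		= filemap_fault,
	.page_mkwrite	= example_page_mkwrite,
};

static ssize_t example_write_iter(struct kiocb *iocb, struct iov_iter *from)
{
	/* buffered writes funnel through iomap_file_buffered_write() */
	return iomap_file_buffered_write(iocb, from, &example_iomap_ops);
}
```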
> + > +You *may* use buffered writes to also deal with ``fallocate()``: > + > + * ``iomap_zero_range()`` on fallocate for zeroing > + * ``iomap_truncate_page()`` on fallocate for truncation > + > +Typically you'd also happen to use these on paths when updating an inode's size. > + > +iomap for direct IO > +------------------- > + > +You call **iomap** for direct IO with: > + > + * ``iomap_dio_rw()`` > + > +You **may** use direct IO writes to also deal with ``fallocate()``: > + > + * ``iomap_zero_range()`` on fallocate for zeroing > + * ``iomap_truncate_page()`` on fallocate for truncation > + > +Typically you'd also happen to use these on paths when updating an inode's size. also use these on paths ... > + > +iomap for reads > +--------------- > + > +You can call into **iomap** for reading, ie, dealing with the filesystems's i.e., > +``struct file_operations``: > + > + * ``struct file_operations.read_iter()``: note that depending on the type of > + read your filesystem might use ``iomap_dio_rw()`` for direct IO, > + generic_file_read_iter() for buffered IO and > + ``dax_iomap_rw()`` for DAX. > + * ``struct file_operations.remap_file_range()`` - currently the special > + ``dax_remap_file_range_prep()`` helper is provided for DAX mode reads. > + > +iomap for userspace file extent mapping > +--------------------------------------- > + > +The ``fiemap`` ioctl can be used to allow userspace to get a file extent > +mapping. The older ``bmap()`` (aka ``FIBMAP``) allows the VM to map logical > +block offset to physical block number. ``bmap()`` is a legacy block mapping > +operation supported only for the ioctl and two areas in the kernel which likely > +are broken (the default swapfile implementation and odd md bitmap code). > +``bmap()`` was only useful in the days of ext2 when there were no support for > +delalloc or unwritten extents. Consequently, the interface reports nothing for > +those types of mappings. 
Because of this we don't want filesystems to start > +exporting this interface if they don't already do so. > + > +The ``fiemap`` ioctl is supported through an inode ``struct > +inode_operations.fiemap()`` callback. > + > +You would use ``iomap_fiemap()`` to provide the mapping. You could use two > +seperate ``struct iomap_ops`` one for when requested to also map extended separate > +attributes (``FIEMAP_FLAG_XATTR``) and your another ``struct iomap_ops`` for yet > +regular read ``struct iomap_ops`` when there is no need for extended attributes. > +In the future **iomap** may provide its own dedicated ops structure for > +``fiemap``. > + > +``iomap_bmap()`` exists and should *only be used* by filesystems that > +**already** supported ``FIBMAP``. ``FIBMAP`` **should not be used** with the support > +address_space -- we have iomap readpages and writepages for that. > + > +iomap for assisting the VFS > +--------------------------- > + > +A filesystem also needs to call **iomap** when assisting the VFS manipulating a > +file into the page cache. > + > +iomap for VFS reading > +--------------------- > + > +A filesystem can call **iomap** to deal with the VFS reading a file into folios > +with: > + > + * ``iomap_bmap()`` - called to assist the VFS when manipulating page cache with > + ``struct address_space_operations.bmap()``, to help the VFS map a logical > + block offset to physical block number. > + * ``iomap_read_folio()`` - called to assist the page cache with > + ``struct address_space_operations.read_folio()`` > + * ``iomap_readahead()`` - called to assist the page cache with > + ``struct address_space_operations.readahead()`` > + > +iomap for VFS writepages > +------------------------ > + > +A filesystem can call **iomap** to deal with the VFS write out of pages back to > +backing store, that is to help deal with a filesystems's ``struct > +address_space_operations.writepages()``. 
The special ``iomap_writepages()`` is > +used for this case with its own respective filestems's ``struct iomap_ops`` for > +this. > + > +iomap for VFS llseek > +-------------------- > + > +A filesystem ``struct address_space_operations.llseek()`` is used by the VFS > +when it needs to move the current file offset, the file offset is in ``struct offset; or offset. The > +file.f_pos``. **iomap** has special support for the ``llseek`` ``SEEK_HOLE`` or > +``SEEK_DATA`` interfaces: > + > + * ``iomap_seek_hole()``: for when the > + ``struct address_space_operations.llseek()`` *whence* argument is > + ``SEEK_HOLE``, when looking for the file's next hole. > + * ``iomap_seek_data()``: for when the > + ``struct address_space_operations.llseek()`` *whence* argument isj is > + ``SEEK_DATA`` when looking for the file's next data area. > + > +Your own ``struct iomap_ops`` for this is encouraged. > + > +iomap for DAX > +------------- > +You can use ``dax_iomap_rw()`` when calling iomap from a DAX context, this is context. This is > +typically from the filesystems's ``struct file_operations.write_iter()`` > +callback. > + > +Converting filesystems from buffer-head to iomap guide > +====================================================== > + > +These are generic guidelines on converting a filesystem over to **iomap** from > +'''buffer-heads'''. > + > +One op at at time > +----------------- > + > +You may try to convert a filesystem with different clustered set of operations> +at time, below are a generic order you may strive to target: on at a time. Below is > + > + * direct io IO > + * miscellaneous helpers (seek/fiemap/bmap) > + * buffered io IO > + > +Defining a simple filesystem > +---------------------------- > + > +A simple filesystem is perhaps the easiest to convert over to **iomap**, a **iomap**. 
A

> +simple filesystem is one which:
> +
> + * does not use fsverify, fscrypt, compression
> + * has no Copy on Write support (reflinks)
> +
> +Converting a simple filesystem to iomap
> +---------------------------------------
> +
> +Simple filesystems should covert to IOMAP piecemeal wise first converting over

                                                      wise, first

> +**direct IO**, then the miscellaneous helpers (seek/fiemap/bmap) and last

No need for **..** on direct IO if not also used on buffered IO below.
s/ / /

> +should be buffered IO.
> +
> +Converting shared filesystem features
> +-------------------------------------
> +
> +Shared filesystems features such as fscrypt, compression, erasure coding, and
> +any other data transformations need to be ported to **iomap** first, as none of
> +the current **iomap** users require any of this functionality.
> +
> +Converting complex filesystems
> +------------------------------
> +
> +If your filesystem relies on any shared filesystem features mentioned above

                                                                        above,

> +those would need to be converted piecemeal wise. If reflinks are supported you

IMO "wise" is not needed here or above when piecemeal is used.

> +need to first ensure proper locking sanity in order to be able to address byte
> +ranges can be handled properly through **iomap** operations. An example
> +filesystem where this work is taking place is btrfs.
> +
> +IOMAP_F_BUFFER_HEAD considerations
> +----------------------------------
> +
> +``IOMAP_F_BUFFER_HEAD`` won't be removed until we have all filesystem fully
> +converted away from **buffer-heads**, and this could be never.
> +
> +``IOMAP_F_BUFFER_HEAD`` should be avoided as a stepping stone / to port
> +filesystems over to **iomap** as it's support for **buffer-heads** only apply to

                                    its                                  applies to

> +the buffered write path and nothing else including the read_folio/readahead and
> +writepages aops.
> +
> +Testing Direct IO
> +=================
> +
> +Other than fstests you can use LTP's dio, however this tests is limited as it

                                         dio. However, this test
However, this test > +does not test stale data. > + > +{{{ > +./runltp -f dio -d /mnt1/scratch/tmp/ > +}}} > + > +Known issues and future improvements > +==================================== > + > +We try to document known issues that folks should be aware of with **iomap** here. > + > + * write amplification on IOMAP when bs < ps: **iomap** needs improvements for Is that buffer size < page size? Preferably don't use such cryptic abbreviations. > + large folios for dirty bitmap tracking > + * filesystems which use buffer head helpers such as ``sb_bread()`` and friends > + will have to continue to use buffer heads as there is no generic iomap > + metadata read/write library yet. > + > +References > +========== > + > + * `Presentation on iomap evolution`<https://docs.google.com/presentation/d/e/2PACX-1vSN4TmhiTu1c6HNv6_gJZFqbFZpbF7GkABllSwJw5iLnSYKkkO-etQJ3AySYEbgJA/pub?start=true&loop=false&delayms=3000&slide=id.g189cfd05063_0_185>` > + * `LWN review on deprecating buffer-heads <https://lwn.net/Articles/930173/>` > diff --git a/include/linux/iomap.h b/include/linux/iomap.h > index e2b836c2e119..ee4b026995ac 100644 > --- a/include/linux/iomap.h > +++ b/include/linux/iomap.h > @@ -10,6 +10,30 @@ > #include <linux/mm_types.h> > #include <linux/blkdev.h> > > +/** > + * DOC: Introduction > + * > + * iomap allows filesystems to sequentially iterate over byte addressable block > + * ranges on an inode and apply operations to it. > + * > + * iomap grew out of the need to provide a modern block mapping abstraction for > + * filesystems with the different IO access methods they support and assisting > + * the VFS with manipulating files into the page cache. iomap helpers are > + * provided for each of these mechanisms. However, block mapping is just one of > + * the features of iomap, given iomap supports DAX IO for filesystems and also > + * supports such the ``lseek``/``llseek`` ``SEEK_DATA``/``SEEK_HOLE`` > + * interfaces. 
> + * > + * Block mapping provides a mapping between data cached in memory and the > + * location on persistent storage where that data lives. `LWN has an great > + * review of the old buffer-heads block-mapping and why they are inefficient > + * <https://lwn.net/Articles/930173/>`, since the inception of Linux. Since > + * **buffer-heads** work on a 512-byte block based paradigm, it creates an block-based > + * overhead for modern storage media which no longer necessarily works only on > + * 512-blocks. iomap is flexible providing block ranges in *bytes*. iomap, with 512-byte blocks. > + * the support of folios, provides a modern replacement for **buffer-heads**. > + */ > + > struct address_space; > struct fiemap_extent_info; > struct inode; > @@ -22,37 +46,43 @@ struct page; > struct vm_area_struct; > struct vm_fault; > > -/* > - * Types of block ranges for iomap mappings: > +/** > + * DOC: iomap block ranges types > + * > + * * IOMAP_HOLE - no blocks allocated, need allocation > + * * IOMAP_DELALLOC - delayed allocation blocks > + * * IOMAP_MAPPED - blocks allocated at @addr > + * * IOMAP_UNWRITTEN - blocks allocated at @addr in unwritten state > + * * IOMAP_INLINE - data inline in the inode > */ > -#define IOMAP_HOLE 0 /* no blocks allocated, need allocation */ > -#define IOMAP_DELALLOC 1 /* delayed allocation blocks */ > -#define IOMAP_MAPPED 2 /* blocks allocated at @addr */ > -#define IOMAP_UNWRITTEN 3 /* blocks allocated at @addr in unwritten state */ > -#define IOMAP_INLINE 4 /* data inline in the inode */ > +#define IOMAP_HOLE 0 > +#define IOMAP_DELALLOC 1 > +#define IOMAP_MAPPED 2 > +#define IOMAP_UNWRITTEN 3 > +#define IOMAP_INLINE 4 > > -/* > - * Flags reported by the file system from iomap_begin: > +/** > + * DOC: Flags reported by the file system from iomap_begin > * > - * IOMAP_F_NEW indicates that the blocks have been newly allocated and need > - * zeroing for areas that no data is copied to. 
> + * * IOMAP_F_NEW: indicates that the blocks have been newly allocated and need > + * zeroing for areas that no data is copied to. > * > - * IOMAP_F_DIRTY indicates the inode has uncommitted metadata needed to access > - * written data and requires fdatasync to commit them to persistent storage. > - * This needs to take into account metadata changes that *may* be made at IO > - * completion, such as file size updates from direct IO. > + * * IOMAP_F_DIRTY: indicates the inode has uncommitted metadata needed to access > + * written data and requires fdatasync to commit them to persistent storage. > + * This needs to take into account metadata changes that *may* be made at IO > + * completion, such as file size updates from direct IO. > * > - * IOMAP_F_SHARED indicates that the blocks are shared, and will need to be > - * unshared as part a write. > + * * IOMAP_F_SHARED: indicates that the blocks are shared, and will need to be > + * unshared as part a write. > * > - * IOMAP_F_MERGED indicates that the iomap contains the merge of multiple block > - * mappings. > + * * IOMAP_F_MERGED: indicates that the iomap contains the merge of multiple > + * block mappings. > * > - * IOMAP_F_BUFFER_HEAD indicates that the file system requires the use of > - * buffer heads for this mapping. > + * * IOMAP_F_BUFFER_HEAD: indicates that the file system requires the use of > + * buffer heads for this mapping. > * > - * IOMAP_F_XATTR indicates that the iomap is for an extended attribute extent > - * rather than a file data extent. > + * * IOMAP_F_XATTR: indicates that the iomap is for an extended attribute extent > + * rather than a file data extent. 
> */ > #define IOMAP_F_NEW (1U << 0) > #define IOMAP_F_DIRTY (1U << 1) > @@ -61,22 +91,20 @@ struct vm_fault; > #define IOMAP_F_BUFFER_HEAD (1U << 4) > #define IOMAP_F_XATTR (1U << 5) > > -/* > - * Flags set by the core iomap code during operations: > +/** > + * DOC: Flags set by the core iomap code during operations > + * > + * * IOMAP_F_SIZE_CHANGED: indicates to the iomap_end method that the file size > + * has changed as the result of this write operation. > * > - * IOMAP_F_SIZE_CHANGED indicates to the iomap_end method that the file size > - * has changed as the result of this write operation. > + * * IOMAP_F_STALE: indicates that the iomap is not valid any longer and the file > + * range it covers needs to be remapped by the high level before the > + * operation can proceed. > * > - * IOMAP_F_STALE indicates that the iomap is not valid any longer and the file > - * range it covers needs to be remapped by the high level before the operation > - * can proceed. > + * * IOMAP_F_PRIVATE: Flags from 0x1000 up are for file system specific usage > */ > #define IOMAP_F_SIZE_CHANGED (1U << 8) > #define IOMAP_F_STALE (1U << 9) > - > -/* > - * Flags from 0x1000 up are for file system specific usage: > - */ > #define IOMAP_F_PRIVATE (1U << 12) > > > @@ -124,73 +152,119 @@ static inline bool iomap_inline_data_valid(const struct iomap *iomap) > return iomap->length <= PAGE_SIZE - offset_in_page(iomap->inline_data); > } > > -/* > - * When a filesystem sets folio_ops in an iomap mapping it returns, get_folio > - * and put_folio will be called for each folio written to. This only applies > - * to buffered writes as unbuffered writes will not typically have folios > - * associated with them. > - * > - * When get_folio succeeds, put_folio will always be called to do any > - * cleanup work necessary. put_folio is responsible for unlocking and putting > - * @folio. > +/** > + * struct iomap_folio_ops - buffered writes folio folio reference count helpers double folio? 
> + * > + * A filesystem can optionally set folio_ops in a &struct iomap mapping it > + * returns to override the default get_folio and put_folio for each folio > + * written to. This only applies to buffered writes as unbuffered writes will > + * not typically have folios associated with them. > + * > + * @get_folio: iomap defaults to iomap_get_folio() (which calls > + * __filemap_get_folio()) if the filesystem did not provide a get folio op. > + * > + * @put_folio: when get_folio succeeds, put_folio will always be called to do > + * any cleanup work necessary. put_folio is responsible for unlocking and > + * putting @folio. > + * > + * @iomap_valid: check that the cached iomap still maps correctly to the > + * filesystem's internal extent map. FS internal extent maps can change > + * while iomap is iterating a cached iomap, so this hook allows iomap to > + * detect that the iomap needs to be refreshed during a long running write operation. > + * > + * The filesystem can store internal state (e.g. a sequence number) in > + * iomap->validity_cookie when the iomap is first mapped to be able to > + * detect changes between mapping time and whenever .iomap_valid() is > + * called. > + * > + * This is called with the folio over the specified file position held > + * locked by the iomap code. This is useful for filesystems that have > + * dynamic mappings (e.g. anything other than zonefs). An example reason > + * as to why this is necessary is writeback doesn't take the vfs locks. VFS > */ > struct iomap_folio_ops { > struct folio *(*get_folio)(struct iomap_iter *iter, loff_t pos, > unsigned len); > void (*put_folio)(struct inode *inode, loff_t pos, unsigned copied, > struct folio *folio); > - > - /* > - * Check that the cached iomap still maps correctly to the filesystem's > - * internal extent map. 
FS internal extent maps can change while iomap > - * is iterating a cached iomap, so this hook allows iomap to detect that > - * the iomap needs to be refreshed during a long running write > - * operation. > - * > - * The filesystem can store internal state (e.g. a sequence number) in > - * iomap->validity_cookie when the iomap is first mapped to be able to > - * detect changes between mapping time and whenever .iomap_valid() is > - * called. > - * > - * This is called with the folio over the specified file position held > - * locked by the iomap code. > - */ > bool (*iomap_valid)(struct inode *inode, const struct iomap *iomap); > }; > > -/* > - * Flags for iomap_begin / iomap_end. No flag implies a read. > +/** > + * DOC: Flags for iomap_begin / iomap_end. No flag implies a read. > + * > + * * IOMAP_WRITE: writing, must allocate blocks > + * * IOMAP_ZERO: zeroing operation, may skip holes > + * * IOMAP_REPORT: report extent status, e.g. FIEMAP > + * * IOMAP_FAULT: mapping for page fault > + * * IOMAP_DIRECT: direct I/O > + * * IOMAP_NOWAIT: do not block > + * * IOMAP_OVERWRITE_ONLY: only pure overwrites allowed > + * * IOMAP_UNSHARE: unshare_file_range > + * * IOMAP_DAX: DAX mapping > */ > -#define IOMAP_WRITE (1 << 0) /* writing, must allocate blocks */ > -#define IOMAP_ZERO (1 << 1) /* zeroing operation, may skip holes */ > -#define IOMAP_REPORT (1 << 2) /* report extent status, e.g. 
FIEMAP */ > -#define IOMAP_FAULT (1 << 3) /* mapping for page fault */ > -#define IOMAP_DIRECT (1 << 4) /* direct I/O */ > -#define IOMAP_NOWAIT (1 << 5) /* do not block */ > -#define IOMAP_OVERWRITE_ONLY (1 << 6) /* only pure overwrites allowed */ > -#define IOMAP_UNSHARE (1 << 7) /* unshare_file_range */ > +#define IOMAP_WRITE (1 << 0) > +#define IOMAP_ZERO (1 << 1) > +#define IOMAP_REPORT (1 << 2) > +#define IOMAP_FAULT (1 << 3) > +#define IOMAP_DIRECT (1 << 4) > +#define IOMAP_NOWAIT (1 << 5) > +#define IOMAP_OVERWRITE_ONLY (1 << 6) > +#define IOMAP_UNSHARE (1 << 7) > #ifdef CONFIG_FS_DAX > -#define IOMAP_DAX (1 << 8) /* DAX mapping */ > +#define IOMAP_DAX (1 << 8) > #else > #define IOMAP_DAX 0 > #endif /* CONFIG_FS_DAX */ > > +/** > + * struct iomap_ops - IO interface specific operations > + * > + * A filesystem is must provide a &struct iomap_ops for to deal with the s/is // s/for // s/the// > + * beginning an IO operation, iomap_begin(), and ending an IO operation on as a > + * block range, ``iomap_end()``. You would call iomap with a specialized iomap > + * operation depending on its filesystem or the VFS needs. > + * > + * For example iomap_dio_rw() would be used for for a filesystem when doing a double for checkpatch should have caught that. > + * block range read or write operation with direct IO. In this case your > + * filesystem's respective &struct file_operations.write_iter() would eventually > + * call iomap_dio_rw() on the filesystem's &struct file_operations.write_iter(). > + * > + * For buffered IO a filesystem would use iomap_file_buffered_write() on the > + * same &struct file_operations.write_iter(). But that is not the only situation > + * in which a filesystem would deal with buffered writes, you could also use writes. You > + * buffered writes when a filesystem has to deal with &struct > + * file_operations.fallocate(). However fallocate() can be used for *zeroing* or However, > + * for *truncation* purposes. 
A special respective iomap_zero_range() would be > + * used for *zeroing* and a iomap_truncate_page() would be used for and an > + * *truncation*. > + * > + * Experience with adopting iomap on filesystems have shown that the filesystem has > + * implementation of these operations can be simplified considerably if one > + * &struct iomap_ops is provided per major filesystem IO operation: > + * > + * * buffered io IO > + * * direct io IO > + * * DAX io IO > + * * fiemap for with extended attributes (``FIEMAP_FLAG_XATTR``) > + * * lseek > + * > + * @iomap_begin: return the existing mapping at pos, or reserve space starting > + * at pos for up to length, as long as we can do it as a single mapping. The > + * actual length is returned in iomap->length. The &struct iomap iomap must > + * always be set. The &struct iomap srcmap should be set if the range is > + * CoW. > + * > + * @iomap_end: commit and/or unreserve space previous allocated using > + * iomap_begin. Written indicates the length of the successful write > + * operation which needs to be committed, while the rest needs to be > + * unreserved. Written might be zero if no data was written. > + */ > struct iomap_ops { > - /* > - * Return the existing mapping at pos, or reserve space starting at > - * pos for up to length, as long as we can do it as a single mapping. > - * The actual length is returned in iomap->length. > - */ > int (*iomap_begin)(struct inode *inode, loff_t pos, loff_t length, > unsigned flags, struct iomap *iomap, > struct iomap *srcmap); > > - /* > - * Commit and/or unreserve space previous allocated using iomap_begin. > - * Written indicates the length of the successful write operation which > - * needs to be commited, while the rest needs to be unreserved. > - * Written might be zero if no data was written. 
> - */ > int (*iomap_end)(struct inode *inode, loff_t pos, loff_t length, > ssize_t written, unsigned flags, struct iomap *iomap); > }; > @@ -207,6 +281,7 @@ struct iomap_ops { > * @flags: Zero or more of the iomap_begin flags above. > * @iomap: Map describing the I/O iteration > * @srcmap: Source map for COW operations > + * @private: internal use > */ > struct iomap_iter { > struct inode *inode; > @@ -241,7 +316,7 @@ static inline u64 iomap_length(const struct iomap_iter *iter) > * @i: iteration structure > * > * Write operations on file systems with reflink support might require a > - * source and a destination map. This function retourns the source map > + * source and a destination map. This function returns the source map > * for a given operation, which may or may no be identical to the destination > * map in &i->iomap. > */ > @@ -281,42 +356,52 @@ loff_t iomap_seek_data(struct inode *inode, loff_t offset, > sector_t iomap_bmap(struct address_space *mapping, sector_t bno, > const struct iomap_ops *ops); > > -/* > - * Structure for writeback I/O completions. > +/** > + * struct iomap_ioend - for writeback I/O completions > + * > + * @io_list: next ioend in chain > + * @io_type: missing field description. > + * @io_flags: IOMAP_F_* > + * @io_folios: folios added to ioend > + * @io_inode: file being written to > + * @io_size: size of the extent > + * @io_offset: offset in the file > + * @io_sector: start sector of ioend > + * @io_bio: bio being built > + * @io_inline_bio: MUST BE LAST! 
> */ > struct iomap_ioend { > - struct list_head io_list; /* next ioend in chain */ > + struct list_head io_list; > u16 io_type; > - u16 io_flags; /* IOMAP_F_* */ > - u32 io_folios; /* folios added to ioend */ > - struct inode *io_inode; /* file being written to */ > - size_t io_size; /* size of the extent */ > - loff_t io_offset; /* offset in the file */ > - sector_t io_sector; /* start sector of ioend */ > - struct bio *io_bio; /* bio being built */ > - struct bio io_inline_bio; /* MUST BE LAST! */ > + u16 io_flags; > + u32 io_folios; > + struct inode *io_inode; > + size_t io_size; > + loff_t io_offset > + sector_t io_sector; > + struct bio *io_bio; > + struct bio io_inline_bio; > }; > > +/** > + * struct iomap_writeback_ops - used for writeback > + * > + * This structure is used to support dealing with a filesystem > + * ``struct address_space_operations.writepages()``, for writeback. > + * > + * @map_blocks: required, maps the blocks so that writeback can be performed on > + * the range starting at offset. > + * @prepare_ioend: optional, allows the file systems to perform actions just > + * before submitting the bio and/or override the bio end_io handler for > + * complex operations like copy on write extent manipulation or unwritten > + * extent conversions. > + * @discard_folio: optional, allows the file system to discard state on a page where > + * we failed to submit any I/O. > + */ > struct iomap_writeback_ops { > - /* > - * Required, maps the blocks so that writeback can be performed on > - * the range starting at offset. > - */ > int (*map_blocks)(struct iomap_writepage_ctx *wpc, struct inode *inode, > loff_t offset); > - > - /* > - * Optional, allows the file systems to perform actions just before > - * submitting the bio and/or override the bio end_io handler for complex > - * operations like copy on write extent manipulation or unwritten extent > - * conversions. 
> - */ > int (*prepare_ioend)(struct iomap_ioend *ioend, int status); > - > - /* > - * Optional, allows the file system to discard state on a page where > - * we failed to submit any I/O. > - */ > void (*discard_folio)(struct folio *folio, loff_t pos); > }; > > @@ -334,26 +419,33 @@ int iomap_writepages(struct address_space *mapping, > struct writeback_control *wbc, struct iomap_writepage_ctx *wpc, > const struct iomap_writeback_ops *ops); > > -/* > - * Flags for direct I/O ->end_io: > +/** > + * DOC: Flags for direct I/O ->end_io > + * > + * * IOMAP_DIO_UNWRITTEN: covers unwritten extent(s) > + * * IOMAP_DIO_COW: covers COW extent(s) > */ > -#define IOMAP_DIO_UNWRITTEN (1 << 0) /* covers unwritten extent(s) */ > -#define IOMAP_DIO_COW (1 << 1) /* covers COW extent(s) */ > +#define IOMAP_DIO_UNWRITTEN (1 << 0) > +#define IOMAP_DIO_COW (1 << 1) > > +/** > + * struct iomap_dio_ops - used for direct IO > + * > + * This is used to support direct IO. > + * > + * @end_io: > + * @submit_io: missing field descriptions. > + * @bio_set: Filesystems wishing to attach private information to a direct io IO > + * bio must provide a ->submit_io method that attaches the additional > + * information to the bio and changes the ->bi_end_io callback to a custom > + * function. This function should, at a minimum, perform any relevant > + * post-processing of the bio and end with a call to iomap_dio_bio_end_io. > + */ > struct iomap_dio_ops { > int (*end_io)(struct kiocb *iocb, ssize_t size, int error, > unsigned flags); > void (*submit_io)(const struct iomap_iter *iter, struct bio *bio, > loff_t file_offset); > - > - /* > - * Filesystems wishing to attach private information to a direct io bio > - * must provide a ->submit_io method that attaches the additional > - * information to the bio and changes the ->bi_end_io callback to a > - * custom function. 
This function should, at a minimum, perform any > - * relevant post-processing of the bio and end with a call to > - * iomap_dio_bio_end_io. > - */ > struct bio_set *bio_set; > }; > Thanks for doing all this work.
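On the kdoc's point above that one &struct iomap_ops per major IO operation
simplifies the implementation, a hypothetical sketch of what that pattern
looks like with the llseek SEEK_HOLE/SEEK_DATA helpers routed through a
read-side ops structure. All example_* names are made up, not from the patch:

```c
/*
 * Hypothetical sketch, not from the patch: one struct iomap_ops per IO
 * path, with SEEK_HOLE/SEEK_DATA served by the read-side ops. The
 * example_* iomap_begin/iomap_end callbacks are assumed to exist.
 */
static const struct iomap_ops example_read_iomap_ops = {
	.iomap_begin	= example_read_iomap_begin,
};

static const struct iomap_ops example_write_iomap_ops = {
	.iomap_begin	= example_write_iomap_begin,
	.iomap_end	= example_write_iomap_end,
};

static loff_t example_llseek(struct file *file, loff_t offset, int whence)
{
	struct inode *inode = file_inode(file);

	switch (whence) {
	case SEEK_HOLE:
		return iomap_seek_hole(inode, offset, &example_read_iomap_ops);
	case SEEK_DATA:
		return iomap_seek_data(inode, offset, &example_read_iomap_ops);
	}
	/* everything else is plain offset arithmetic */
	return generic_file_llseek(file, offset, whence);
}
```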
Hi Luis,

kernel test robot noticed the following build errors:

[auto build test ERROR on xfs-linux/for-next]
[also build test ERROR on lwn/docs-next linus/master v6.4-rc2 next-20230518]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Luis-Chamberlain/Documentation-add-initial-iomap-kdoc/20230518-231530
base:   https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git for-next
patch link:    https://lore.kernel.org/r/20230518150105.3160445-1-mcgrof%40kernel.org
patch subject: [PATCH v2] Documentation: add initial iomap kdoc
config: i386-randconfig-a003
compiler: gcc-11 (Debian 11.3.0-12) 11.3.0
reproduce (this is a W=1 build):
        # https://github.com/intel-lab-lkp/linux/commit/4db9a33eed82de4cb27f6e2890110a39bb4859af
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Luis-Chamberlain/Documentation-add-initial-iomap-kdoc/20230518-231530
        git checkout 4db9a33eed82de4cb27f6e2890110a39bb4859af
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        make W=1 O=build_dir ARCH=i386 olddefconfig
        make W=1 O=build_dir ARCH=i386 SHELL=/bin/bash fs/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202305190340.ukIywuHk-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from fs/gfs2/bmap.c:13:
>> include/linux/iomap.h:381:9: error: expected ':', ',', ';', '}' or '__attribute__' before 'sector_t'
     381 |         sector_t io_sector;
         |         ^~~~~~~~
--
   In file included from fs/iomap/buffered-io.c:9:
>> include/linux/iomap.h:381:9: error: expected ':', ',', ';', '}' or '__attribute__' before 'sector_t'
     381 |         sector_t io_sector;
         |         ^~~~~~~~
   fs/iomap/buffered-io.c: In function 'iomap_finish_ioend':
>>
fs/iomap/buffered-io.c:1323:33: error: 'struct iomap_ioend' has no member named 'io_inline_bio' 1323 | struct bio *bio = &ioend->io_inline_bio; | ^~ >> fs/iomap/buffered-io.c:1324:33: error: 'struct iomap_ioend' has no member named 'io_bio' 1324 | struct bio *last = ioend->io_bio, *next; | ^~ >> fs/iomap/buffered-io.c:1326:30: error: 'struct iomap_ioend' has no member named 'io_offset' 1326 | loff_t offset = ioend->io_offset; | ^~ fs/iomap/buffered-io.c:1330:26: error: 'struct iomap_ioend' has no member named 'io_inline_bio' 1330 | for (bio = &ioend->io_inline_bio; bio; bio = next) { | ^~ fs/iomap/buffered-io.c: In function 'iomap_ioend_can_merge': fs/iomap/buffered-io.c:1397:18: error: 'struct iomap_ioend' has no member named 'io_bio' 1397 | if (ioend->io_bio->bi_status != next->io_bio->bi_status) | ^~ fs/iomap/buffered-io.c:1397:45: error: 'struct iomap_ioend' has no member named 'io_bio' 1397 | if (ioend->io_bio->bi_status != next->io_bio->bi_status) | ^~ fs/iomap/buffered-io.c:1405:18: error: 'struct iomap_ioend' has no member named 'io_offset' 1405 | if (ioend->io_offset + ioend->io_size != next->io_offset) | ^~ fs/iomap/buffered-io.c:1405:54: error: 'struct iomap_ioend' has no member named 'io_offset' 1405 | if (ioend->io_offset + ioend->io_size != next->io_offset) | ^~ >> fs/iomap/buffered-io.c:1417:18: error: 'struct iomap_ioend' has no member named 'io_sector' 1417 | if (ioend->io_sector + (ioend->io_size >> 9) != next->io_sector) | ^~ fs/iomap/buffered-io.c:1417:61: error: 'struct iomap_ioend' has no member named 'io_sector' 1417 | if (ioend->io_sector + (ioend->io_size >> 9) != next->io_sector) | ^~ fs/iomap/buffered-io.c: In function 'iomap_ioend_compare': fs/iomap/buffered-io.c:1446:15: error: 'struct iomap_ioend' has no member named 'io_offset' 1446 | if (ia->io_offset < ib->io_offset) | ^~ fs/iomap/buffered-io.c:1446:31: error: 'struct iomap_ioend' has no member named 'io_offset' 1446 | if (ia->io_offset < ib->io_offset) | ^~ 
fs/iomap/buffered-io.c:1448:15: error: 'struct iomap_ioend' has no member named 'io_offset' 1448 | if (ia->io_offset > ib->io_offset) | ^~ fs/iomap/buffered-io.c:1448:31: error: 'struct iomap_ioend' has no member named 'io_offset' 1448 | if (ia->io_offset > ib->io_offset) | ^~ fs/iomap/buffered-io.c: In function 'iomap_submit_ioend': fs/iomap/buffered-io.c:1479:14: error: 'struct iomap_ioend' has no member named 'io_bio' 1479 | ioend->io_bio->bi_private = ioend; | ^~ fs/iomap/buffered-io.c:1480:14: error: 'struct iomap_ioend' has no member named 'io_bio' 1480 | ioend->io_bio->bi_end_io = iomap_writepage_end_bio; | ^~ fs/iomap/buffered-io.c:1491:22: error: 'struct iomap_ioend' has no member named 'io_bio' 1491 | ioend->io_bio->bi_status = errno_to_blk_status(error); | ^~ fs/iomap/buffered-io.c:1492:32: error: 'struct iomap_ioend' has no member named 'io_bio' 1492 | bio_endio(ioend->io_bio); | ^~ fs/iomap/buffered-io.c:1496:25: error: 'struct iomap_ioend' has no member named 'io_bio' 1496 | submit_bio(ioend->io_bio); | ^~ In file included from include/linux/container_of.h:5, from include/linux/list.h:5, from include/linux/module.h:12, from fs/iomap/buffered-io.c:6: fs/iomap/buffered-io.c: In function 'iomap_alloc_ioend': >> include/linux/container_of.h:20:54: error: 'struct iomap_ioend' has no member named 'io_inline_bio' 20 | static_assert(__same_type(*(ptr), ((type *)0)->member) || \ | ^~ include/linux/build_bug.h:78:56: note: in definition of macro '__static_assert' 78 | #define __static_assert(expr, msg, ...) 
_Static_assert(expr, msg) | ^~~~ include/linux/container_of.h:20:9: note: in expansion of macro 'static_assert' 20 | static_assert(__same_type(*(ptr), ((type *)0)->member) || \ | ^~~~~~~~~~~~~ include/linux/container_of.h:20:23: note: in expansion of macro '__same_type' 20 | static_assert(__same_type(*(ptr), ((type *)0)->member) || \ | ^~~~~~~~~~~ fs/iomap/buffered-io.c:1513:17: note: in expansion of macro 'container_of' 1513 | ioend = container_of(bio, struct iomap_ioend, io_inline_bio); | ^~~~~~~~~~~~ >> include/linux/compiler_types.h:338:27: error: expression in static assertion is not an integer 338 | #define __same_type(a, b) __builtin_types_compatible_p(typeof(a), typeof(b)) | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~ include/linux/build_bug.h:78:56: note: in definition of macro '__static_assert' 78 | #define __static_assert(expr, msg, ...) _Static_assert(expr, msg) | ^~~~ include/linux/container_of.h:20:9: note: in expansion of macro 'static_assert' 20 | static_assert(__same_type(*(ptr), ((type *)0)->member) || \ | ^~~~~~~~~~~~~ include/linux/container_of.h:20:23: note: in expansion of macro '__same_type' 20 | static_assert(__same_type(*(ptr), ((type *)0)->member) || \ | ^~~~~~~~~~~ fs/iomap/buffered-io.c:1513:17: note: in expansion of macro 'container_of' 1513 | ioend = container_of(bio, struct iomap_ioend, io_inline_bio); | ^~~~~~~~~~~~ In file included from include/uapi/linux/posix_types.h:5, from include/uapi/linux/types.h:14, from include/linux/types.h:6, from include/linux/kasan-checks.h:5, from include/asm-generic/rwonce.h:26, from ./arch/x86/include/generated/asm/rwonce.h:1, from include/linux/compiler.h:247, from include/linux/build_bug.h:5, from include/linux/container_of.h:5, from include/linux/list.h:5, from include/linux/module.h:12, from fs/iomap/buffered-io.c:6: >> include/linux/stddef.h:16:33: error: 'struct iomap_ioend' has no member named 'io_inline_bio' 16 | #define offsetof(TYPE, MEMBER) __builtin_offsetof(TYPE, MEMBER) | ^~~~~~~~~~~~~~~~~~ 
include/linux/container_of.h:23:28: note: in expansion of macro 'offsetof' 23 | ((type *)(__mptr - offsetof(type, member))); }) | ^~~~~~~~ fs/iomap/buffered-io.c:1513:17: note: in expansion of macro 'container_of' 1513 | ioend = container_of(bio, struct iomap_ioend, io_inline_bio); | ^~~~~~~~~~~~ fs/iomap/buffered-io.c:1520:14: error: 'struct iomap_ioend' has no member named 'io_offset' 1520 | ioend->io_offset = offset; | ^~ fs/iomap/buffered-io.c:1521:14: error: 'struct iomap_ioend' has no member named 'io_bio' 1521 | ioend->io_bio = bio; | ^~ fs/iomap/buffered-io.c:1522:14: error: 'struct iomap_ioend' has no member named 'io_sector' 1522 | ioend->io_sector = sector; | ^~ fs/iomap/buffered-io.c: In function 'iomap_can_add_to_ioend': fs/iomap/buffered-io.c:1557:33: error: 'struct iomap_ioend' has no member named 'io_offset' 1557 | if (offset != wpc->ioend->io_offset + wpc->ioend->io_size) | ^~ In file included from include/linux/blkdev.h:17, from include/linux/iomap.h:11, from fs/iomap/buffered-io.c:9: fs/iomap/buffered-io.c:1559:48: error: 'struct iomap_ioend' has no member named 'io_bio' 1559 | if (sector != bio_end_sector(wpc->ioend->io_bio)) | ^~ include/linux/bio.h:40:38: note: in definition of macro 'bvec_iter_end_sector' 40 | #define bvec_iter_end_sector(iter) ((iter).bi_sector + bvec_iter_sectors((iter))) | ^~~~ fs/iomap/buffered-io.c:1559:23: note: in expansion of macro 'bio_end_sector' 1559 | if (sector != bio_end_sector(wpc->ioend->io_bio)) | ^~~~~~~~~~~~~~ fs/iomap/buffered-io.c:1559:48: error: 'struct iomap_ioend' has no member named 'io_bio' 1559 | if (sector != bio_end_sector(wpc->ioend->io_bio)) | ^~ include/linux/bio.h:39:35: note: in definition of macro 'bvec_iter_sectors' 39 | #define bvec_iter_sectors(iter) ((iter).bi_size >> 9) | ^~~~ include/linux/bio.h:43:33: note: in expansion of macro 'bvec_iter_end_sector' 43 | #define bio_end_sector(bio) bvec_iter_end_sector((bio)->bi_iter) | ^~~~~~~~~~~~~~~~~~~~ fs/iomap/buffered-io.c:1559:23: note: in 
expansion of macro 'bio_end_sector' 1559 | if (sector != bio_end_sector(wpc->ioend->io_bio)) | ^~~~~~~~~~~~~~ fs/iomap/buffered-io.c: In function 'iomap_add_to_ioend': fs/iomap/buffered-io.c:1590:38: error: 'struct iomap_ioend' has no member named 'io_bio' 1590 | if (!bio_add_folio(wpc->ioend->io_bio, folio, len, poff)) { | ^~ fs/iomap/buffered-io.c:1591:27: error: 'struct iomap_ioend' has no member named 'io_bio' 1591 | wpc->ioend->io_bio = iomap_chain_bio(wpc->ioend->io_bio); | ^~ fs/iomap/buffered-io.c:1591:64: error: 'struct iomap_ioend' has no member named 'io_bio' 1591 | wpc->ioend->io_bio = iomap_chain_bio(wpc->ioend->io_bio); | ^~ fs/iomap/buffered-io.c:1592:41: error: 'struct iomap_ioend' has no member named 'io_bio' 1592 | bio_add_folio(wpc->ioend->io_bio, folio, len, poff); | ^~ In file included from include/uapi/linux/posix_types.h:5, from include/uapi/linux/types.h:14, from include/linux/types.h:6, from include/linux/kasan-checks.h:5, from include/asm-generic/rwonce.h:26, from ./arch/x86/include/generated/asm/rwonce.h:1, from include/linux/compiler.h:247, from include/linux/build_bug.h:5, from include/linux/container_of.h:5, from include/linux/list.h:5, from include/linux/module.h:12, from fs/iomap/buffered-io.c:6: fs/iomap/buffered-io.c: In function 'iomap_init': >> include/linux/stddef.h:16:33: error: 'struct iomap_ioend' has no member named 'io_inline_bio' 16 | #define offsetof(TYPE, MEMBER) __builtin_offsetof(TYPE, MEMBER) | ^~~~~~~~~~~~~~~~~~ fs/iomap/buffered-io.c:1830:28: note: in expansion of macro 'offsetof' 1830 | offsetof(struct iomap_ioend, io_inline_bio), | ^~~~~~~~ fs/iomap/buffered-io.c:1832:1: error: control reaches end of non-void function [-Werror=return-type] 1832 | } | ^ cc1: some warnings being treated as errors -- In file included from fs/gfs2/bmap.h:10, from fs/gfs2/glock.c:50: >> include/linux/iomap.h:381:9: error: expected ':', ',', ';', '}' or '__attribute__' before 'sector_t' 381 | sector_t io_sector; | ^~~~~~~~ In file 
included from fs/gfs2/trace_gfs2.h:641, from fs/gfs2/glock.c:52: include/trace/define_trace.h:95:42: fatal error: ./trace_gfs2.h: No such file or directory 95 | #include TRACE_INCLUDE(TRACE_INCLUDE_FILE) | ^ compilation terminated. -- In file included from fs/xfs/xfs_iomap.h:9, from fs/xfs/xfs_aops.c:15: >> include/linux/iomap.h:381:9: error: expected ':', ',', ';', '}' or '__attribute__' before 'sector_t' 381 | sector_t io_sector; | ^~~~~~~~ fs/xfs/xfs_aops.c: In function 'xfs_ioend_is_append': >> fs/xfs/xfs_aops.c:40:21: error: 'struct iomap_ioend' has no member named 'io_offset' 40 | return ioend->io_offset + ioend->io_size > | ^~ fs/xfs/xfs_aops.c: In function 'xfs_end_ioend': fs/xfs/xfs_aops.c:88:47: error: 'struct iomap_ioend' has no member named 'io_offset' 88 | xfs_off_t offset = ioend->io_offset; | ^~ >> fs/xfs/xfs_aops.c:115:42: error: 'struct iomap_ioend' has no member named 'io_bio' 115 | error = blk_status_to_errno(ioend->io_bio->bi_status); | ^~ fs/xfs/xfs_aops.c:134:50: error: 'struct iomap_ioend' has no member named 'io_offset' 134 | error = xfs_setfilesize(ip, ioend->io_offset, ioend->io_size); | ^~ fs/xfs/xfs_aops.c: In function 'xfs_prepare_ioend': fs/xfs/xfs_aops.c:439:38: error: 'struct iomap_ioend' has no member named 'io_offset' 439 | ioend->io_offset, ioend->io_size); | ^~ fs/xfs/xfs_aops.c:447:22: error: 'struct iomap_ioend' has no member named 'io_bio' 447 | ioend->io_bio->bi_end_io = xfs_end_bio; | ^~ fs/xfs/xfs_aops.c: In function 'xfs_ioend_is_append': fs/xfs/xfs_aops.c:42:1: error: control reaches end of non-void function [-Werror=return-type] 42 | } | ^ cc1: some warnings being treated as errors vim +381 include/linux/iomap.h 329 330 ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *from, 331 const struct iomap_ops *ops); 332 int iomap_file_buffered_write_punch_delalloc(struct inode *inode, 333 struct iomap *iomap, loff_t pos, loff_t length, ssize_t written, 334 int (*punch)(struct inode *inode, loff_t pos, loff_t 
length)); 335 336 int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops); 337 void iomap_readahead(struct readahead_control *, const struct iomap_ops *ops); 338 bool iomap_is_partially_uptodate(struct folio *, size_t from, size_t count); 339 struct folio *iomap_get_folio(struct iomap_iter *iter, loff_t pos); 340 bool iomap_release_folio(struct folio *folio, gfp_t gfp_flags); 341 void iomap_invalidate_folio(struct folio *folio, size_t offset, size_t len); 342 int iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len, 343 const struct iomap_ops *ops); 344 int iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, 345 bool *did_zero, const struct iomap_ops *ops); 346 int iomap_truncate_page(struct inode *inode, loff_t pos, bool *did_zero, 347 const struct iomap_ops *ops); 348 vm_fault_t iomap_page_mkwrite(struct vm_fault *vmf, 349 const struct iomap_ops *ops); 350 int iomap_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo, 351 u64 start, u64 len, const struct iomap_ops *ops); 352 loff_t iomap_seek_hole(struct inode *inode, loff_t offset, 353 const struct iomap_ops *ops); 354 loff_t iomap_seek_data(struct inode *inode, loff_t offset, 355 const struct iomap_ops *ops); 356 sector_t iomap_bmap(struct address_space *mapping, sector_t bno, 357 const struct iomap_ops *ops); 358 359 /** 360 * struct iomap_ioend - for writeback I/O completions 361 * 362 * @io_list: next ioend in chain 363 * @io_type: 364 * @io_flags: IOMAP_F_* 365 * @io_folios: folios added to ioend 366 * @io_inode: file being written to 367 * @io_size: size of the extent 368 * @io_offset: offset in the file 369 * @io_sector: start sector of ioend 370 * @io_bio: bio being built 371 * @io_inline_bio: MUST BE LAST! 
   372	 */
   373	struct iomap_ioend {
   374		struct list_head	io_list;
   375		u16			io_type;
   376		u16			io_flags;
   377		u32			io_folios;
   378		struct inode		*io_inode;
   379		size_t			io_size;
   380		loff_t			io_offset
 > 381		sector_t		io_sector;
   382		struct bio		*io_bio;
   383		struct bio		io_inline_bio;
   384	};
   385
Hi Luis, kernel test robot noticed the following build errors: [auto build test ERROR on xfs-linux/for-next] [also build test ERROR on lwn/docs-next linus/master v6.4-rc2 next-20230518] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Luis-Chamberlain/Documentation-add-initial-iomap-kdoc/20230518-231530 base: https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git for-next patch link: https://lore.kernel.org/r/20230518150105.3160445-1-mcgrof%40kernel.org patch subject: [PATCH v2] Documentation: add initial iomap kdoc config: i386-randconfig-a002 compiler: clang version 14.0.6 (https://github.com/llvm/llvm-project f28c006a5895fc0e329fe15fead81e37457cb1d1) reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # https://github.com/intel-lab-lkp/linux/commit/4db9a33eed82de4cb27f6e2890110a39bb4859af git remote add linux-review https://github.com/intel-lab-lkp/linux git fetch --no-tags linux-review Luis-Chamberlain/Documentation-add-initial-iomap-kdoc/20230518-231530 git checkout 4db9a33eed82de4cb27f6e2890110a39bb4859af # save the config file mkdir build_dir && cp config build_dir/.config COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 olddefconfig COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 SHELL=/bin/bash fs/ If you fix the issue, kindly add following tag where applicable | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202305190426.ohhxM3oA-lkp@intel.com/ All errors (new ones prefixed by >>): In file included from fs/f2fs/file.c:27: >> include/linux/iomap.h:380:20: error: expected ';' at end of declaration list loff_t io_offset ^ ; 1 error 
generated. -- In file included from fs/ext4/file.c:24: >> include/linux/iomap.h:380:20: error: expected ';' at end of declaration list loff_t io_offset ^ ; In file included from fs/ext4/file.c:31: include/linux/mman.h:154:9: warning: division by zero is undefined [-Wdivision-by-zero] _calc_vm_trans(flags, MAP_SYNC, VM_SYNC ) | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ include/linux/mman.h:132:21: note: expanded from macro '_calc_vm_trans' : ((x) & (bit1)) / ((bit1) / (bit2)))) ^ ~~~~~~~~~~~~~~~~~ 1 warning and 1 error generated. -- In file included from fs/iomap/buffered-io.c:9: >> include/linux/iomap.h:380:20: error: expected ';' at end of declaration list loff_t io_offset ^ ; >> fs/iomap/buffered-io.c:1417:13: error: no member named 'io_sector' in 'struct iomap_ioend' if (ioend->io_sector + (ioend->io_size >> 9) != next->io_sector) ~~~~~ ^ fs/iomap/buffered-io.c:1417:56: error: no member named 'io_sector' in 'struct iomap_ioend' if (ioend->io_sector + (ioend->io_size >> 9) != next->io_sector) ~~~~ ^ fs/iomap/buffered-io.c:1522:9: error: no member named 'io_sector' in 'struct iomap_ioend' ioend->io_sector = sector; ~~~~~ ^ 4 errors generated. 
vim +380 include/linux/iomap.h 329 330 ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *from, 331 const struct iomap_ops *ops); 332 int iomap_file_buffered_write_punch_delalloc(struct inode *inode, 333 struct iomap *iomap, loff_t pos, loff_t length, ssize_t written, 334 int (*punch)(struct inode *inode, loff_t pos, loff_t length)); 335 336 int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops); 337 void iomap_readahead(struct readahead_control *, const struct iomap_ops *ops); 338 bool iomap_is_partially_uptodate(struct folio *, size_t from, size_t count); 339 struct folio *iomap_get_folio(struct iomap_iter *iter, loff_t pos); 340 bool iomap_release_folio(struct folio *folio, gfp_t gfp_flags); 341 void iomap_invalidate_folio(struct folio *folio, size_t offset, size_t len); 342 int iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len, 343 const struct iomap_ops *ops); 344 int iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, 345 bool *did_zero, const struct iomap_ops *ops); 346 int iomap_truncate_page(struct inode *inode, loff_t pos, bool *did_zero, 347 const struct iomap_ops *ops); 348 vm_fault_t iomap_page_mkwrite(struct vm_fault *vmf, 349 const struct iomap_ops *ops); 350 int iomap_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo, 351 u64 start, u64 len, const struct iomap_ops *ops); 352 loff_t iomap_seek_hole(struct inode *inode, loff_t offset, 353 const struct iomap_ops *ops); 354 loff_t iomap_seek_data(struct inode *inode, loff_t offset, 355 const struct iomap_ops *ops); 356 sector_t iomap_bmap(struct address_space *mapping, sector_t bno, 357 const struct iomap_ops *ops); 358 359 /** 360 * struct iomap_ioend - for writeback I/O completions 361 * 362 * @io_list: next ioend in chain 363 * @io_type: 364 * @io_flags: IOMAP_F_* 365 * @io_folios: folios added to ioend 366 * @io_inode: file being written to 367 * @io_size: size of the extent 368 * @io_offset: offset in the file 369 * @io_sector: 
start sector of ioend
   370	 * @io_bio:		bio being built
   371	 * @io_inline_bio:	MUST BE LAST!
   372	 */
   373	struct iomap_ioend {
   374		struct list_head	io_list;
   375		u16			io_type;
   376		u16			io_flags;
   377		u32			io_folios;
   378		struct inode		*io_inode;
   379		size_t			io_size;
 > 380		loff_t			io_offset
   381		sector_t		io_sector;
   382		struct bio		*io_bio;
   383		struct bio		io_inline_bio;
   384	};
   385
On Thu, May 18, 2023 at 08:49:26AM -0700, Randy Dunlap wrote:
> On 5/18/23 08:01, Luis Chamberlain wrote:
> > * use 80 char length as if we're in the 1980's
>
> Well, like Jon said, long lines are difficult to read, even on printed paper.
> That's (at least one reason) why newspapers(!) have narrow columns of print.

Makes sense!

> Anyway, thanks for doing it.

Heh, yeah. I was just trying to follow the convention, but I didn't know the
bump to 100 was only a special case, and that it was definitely not wanted for
docs. It's easy to lose track, though: we have one limit for commit logs, one
for code (with an exception), and a clear preference for docs. Makes me wonder
if editors could pick up on project-specific requirements somehow. For
instance, I incorrectly used to have:

set textwidth=100
autocmd FileType gitcommit set textwidth=72
set colorcolumn=+1

Clearly that 100 is wrong now, and I've updated it back to 80. Could a
FileType rule be used for rst and ascii files too? If we shared something at
the top level that let developers optionally pick up project-specific
guidelines, this would be a less common problem to ping back and forth about.
I'm curious how often "length" is the reason a patch gets delayed going
upstream. You'd figure this would be a simple fix in the year 2023 :P

Thanks for the fix recommendations! I'll wait and see if others find more!

Luis
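For what it's worth, vim can express the per-filetype widths Luis is asking about. A minimal sketch (illustrative only, not an official kernel recommendation) might look like:

```vim
" ~/.vimrc sketch: per-filetype line lengths for kernel work
set textwidth=80                                  " default, e.g. for C code
autocmd FileType gitcommit setlocal textwidth=72  " commit log convention
autocmd FileType rst,text  setlocal textwidth=80  " docs prefer 80
set colorcolumn=+1                                " highlight column after textwidth
```

`setlocal` keeps the override scoped to the buffer, so the global default still applies elsewhere.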
Luis Chamberlain <mcgrof@kernel.org> writes: > To help with iomap adoption / porting I set out the goal to try to > help improve the iomap documentation and get general guidance for > filesystem conversions over from buffer-head in time for this year's > LSFMM. The end results thanks to the review of Darrick, Christoph and > others is on the kernelnewbies wiki [0]. > > This brings this forward a relevant subset of that documentation to > the kernel in kdoc format and also kdoc'ifies the existing documentation > on iomap.h. OK, I've had a read through it. Thanks again for doing it, we definitely need this. There are typos and such that Randy has already pointed out, so I won't bother with those. My main comment is mainly a high level one, along with a handful of nits. My high-level question is: who is the audience for this document? I'm guessing it's filesystem developers? Whoever it is, the document could benefit from an introductory section, aimed at that audience, saying how to get *started* with iomap. What include files do I need? How do I provide a set of iomap callbacks and make them visible to the VFS? Without that sort of stuff, it makes for a rough jumping-in experience. The nits: > diff --git a/Documentation/filesystems/iomap.rst b/Documentation/filesystems/iomap.rst > new file mode 100644 > index 000000000000..be487030fcff > --- /dev/null > +++ b/Documentation/filesystems/iomap.rst > @@ -0,0 +1,253 @@ > +.. SPDX-License-Identifier: GPL-2.0 > +.. _iomap: I don't think you use this label anywhere, so it doesn't need to be here. > +.. > + Mapping of heading styles within this document: > + Heading 1 uses "====" above and below > + Heading 2 uses "====" > + Heading 3 uses "----" > + Heading 4 uses "````" > + Heading 5 uses "^^^^" > + Heading 6 uses "~~~~" > + Heading 7 uses "...." We have a set of conventions for section headings, nicely documented in Documentation/doc-guide/sphinx.rst. 
This hierarchy doesn't quite match it, but you don't get far enough into it to hit the differences. I'd just take this out. > + > + Sections are manually numbered because apparently that's what everyone > + does in the kernel. The sections are *not* manually numbered, which I think is entirely fine. But that makes this comment a bit weird. > +.. contents:: Table of Contents > + :local: > + > +===== > +iomap > +===== > + > +.. kernel-doc:: include/linux/iomap.h > + > +A modern block abstraction > +========================== > + > +**iomap** allows filesystems to query storage media for data using *byte > +ranges*. Since block mapping are provided for a *byte ranges* for cache data in > +memory, in the page cache, naturally this implies operations on block ranges > +will also deal with *multipage* operations in the page cache. **Folios** are > +used to help provide *multipage* operations in memory for the *byte ranges* > +being worked on. This text (and the document as a whole) is a bit heavy on the markup. I'd consider taking some of it out to improve the plain-text readability. [...] > diff --git a/include/linux/iomap.h b/include/linux/iomap.h > index e2b836c2e119..ee4b026995ac 100644 > --- a/include/linux/iomap.h > +++ b/include/linux/iomap.h > @@ -10,6 +10,30 @@ > #include <linux/mm_types.h> > #include <linux/blkdev.h> > > +/** > + * DOC: Introduction > + * > + * iomap allows filesystems to sequentially iterate over byte addressable block > + * ranges on an inode and apply operations to it. > + * > + * iomap grew out of the need to provide a modern block mapping abstraction for > + * filesystems with the different IO access methods they support and assisting > + * the VFS with manipulating files into the page cache. iomap helpers are > + * provided for each of these mechanisms. 
However, block mapping is just one of > + * the features of iomap, given iomap supports DAX IO for filesystems and also > + * supports such the ``lseek``/``llseek`` ``SEEK_DATA``/``SEEK_HOLE`` > + * interfaces. > + * > + * Block mapping provides a mapping between data cached in memory and the > + * location on persistent storage where that data lives. `LWN has an great > + * review of the old buffer-heads block-mapping and why they are inefficient > + * <https://lwn.net/Articles/930173/>`, since the inception of Linux. Since > + * **buffer-heads** work on a 512-byte block based paradigm, it creates an > + * overhead for modern storage media which no longer necessarily works only on > + * 512-blocks. iomap is flexible providing block ranges in *bytes*. iomap, with > + * the support of folios, provides a modern replacement for **buffer-heads**. As much as I love to see LWN references embedded in as many kernel files as possible, I'm honestly not sure that the iomap documentation needs to talk about buffer heads at all - except maybe for help in replacing them. In particular, I don't think that the documentation needs to justify iomap's existence; we can look at the commit logs if we need to remind ourselves of that. Thanks, jon
On Thu, May 18, 2023 at 08:01:05AM -0700, Luis Chamberlain wrote: > To help with iomap adoption / porting I set out the goal to try to > help improve the iomap documentation and get general guidance for > filesystem conversions over from buffer-head in time for this year's > LSFMM. The end results thanks to the review of Darrick, Christoph and > others is on the kernelnewbies wiki [0]. > > This brings this forward a relevant subset of that documentation to > the kernel in kdoc format and also kdoc'ifies the existing documentation > on iomap.h. > > Tested with: > > make htmldocs SPHINXDIRS="filesystems" > > Then looking at the docs produced on: > > Documentation/output/filesystems/iomap.html > > [0] https://kernelnewbies.org/KernelProjects/iomap > > Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> > --- > > Changes on v2: > > * use 80 char length as if we're in the 1980's > > Documentation/filesystems/index.rst | 1 + > Documentation/filesystems/iomap.rst | 253 +++++++++++++++++++++ > include/linux/iomap.h | 336 ++++++++++++++++++---------- > 3 files changed, 468 insertions(+), 122 deletions(-) > create mode 100644 Documentation/filesystems/iomap.rst > > diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst > index fbb2b5ada95b..6186ab7c3ea8 100644 > --- a/Documentation/filesystems/index.rst > +++ b/Documentation/filesystems/index.rst > @@ -34,6 +34,7 @@ algorithms work. > seq_file > sharedsubtree > idmappings > + iomap > > automount-support > > diff --git a/Documentation/filesystems/iomap.rst b/Documentation/filesystems/iomap.rst > new file mode 100644 > index 000000000000..be487030fcff > --- /dev/null > +++ b/Documentation/filesystems/iomap.rst > @@ -0,0 +1,253 @@ > +.. SPDX-License-Identifier: GPL-2.0 > +.. _iomap: > + > +.. 
> + Mapping of heading styles within this document:
> + Heading 1 uses "====" above and below
> + Heading 2 uses "===="
> + Heading 3 uses "----"
> + Heading 4 uses "````"
> + Heading 5 uses "^^^^"
> + Heading 6 uses "~~~~"
> + Heading 7 uses "...."
> +
> + Sections are manually numbered because apparently that's what everyone
> + does in the kernel.
> +.. contents:: Table of Contents
> +   :local:
> +
> +=====
> +iomap
> +=====
> +
> +.. kernel-doc:: include/linux/iomap.h
> +
> +A modern block abstraction
> +==========================

iomap is not a "block" abstraction layer. It's a mapping layer that
maps -file offset- ranges to -extents- that describe a LBA range in a
block device address space.

> +**iomap** allows filesystems to query storage media for data using *byte
> +ranges*. Since block mapping are provided for a *byte ranges* for cache data in
> +memory, in the page cache, naturally this implies operations on block ranges
> +will also deal with *multipage* operations in the page cache.

I don't know what this means.

> **Folios** are
> +used to help provide *multipage* operations in memory for the *byte ranges*
> +being worked on.

No. iomap naturally does multi-page operations that span the mapped
extent that is returned. We do not need folios for this (never have)
but folios can optimise it.

Any discussion that tries to describe iomap in page cache terms has
started off in completely the wrong direction. The whole point of iomap
is that it is filesystem centric rather than page cache or mm centric.
i.e. the fundamental structure of the iomap infrastructure is this:

	user IO
	    loop for file IO range {
	        loop for each mapped extent {
	            if (buffered) {
	                loop for each page/folio {
	                    instantiate page cache
	                    copy data to/from page cache
	                    update page cache state
	                }
	            } else { /* direct IO */
	                loop for each bio {
	                    pack user pages into bio
	                    submit bio
	                }
	            }
	        }
	    }

IOWs, if you are trying to describe iomap in terms of page cache
behaviour, then you've got it completely back to front.
The whole point of iomap is that it is "completely back to front" w.r.t. the
old "get_blocks()" per-page callback mechanism the page cache used to use to
interact with filesystems.

The introduction should actually give some grounding in how the infrastructure
works - how it iterates, when mapping callbacks run, why we have iomap_begin
and iomap_end operations and when it is necessary to supply both, etc.
Spreading all that information across this file and the structure and function
definitions in the code does not make for coherent, readable or understandable
documentation...

Describing the required locking hierarchy also helps to explain how things
work. iomap assumes two layers of locking. It requires locking above the iomap
layer for IO serialisation (i_rwsem, invalidation lock), which is generally
taken before calling into iomap functions. There is also locking below iomap
for mapping/allocation serialisation on an inode (e.g. XFS_ILOCK, i_data_sem
in ext4, etc) that is usually taken inside the mapping methods filesystems
supply to the iomap infrastructure. This layer of locking needs to be
independent of the IO path serialisation locking, as it nests inside the IO
path but is also used without the filesystem IO path locking protecting it
(e.g. in the iomap writeback path).

> +iomap IO interfaces
> +===================
> +
> +You call **iomap** depending on the type of filesystem operation you are working
> +on. We detail some of these interactions below.
> +
> +iomap for bufferred IO writes
> +-----------------------------
> +
> +You call **iomap** for buffered IO with:
> +
> + * ``iomap_file_buffered_write()`` - for buffered writes

This is not sufficient to implement a filesystem based on iomap.
There's no discussion of:

- locking requirements
- IOCB_NOWAIT requirements
- the fact that write validity checking must have already been performed
- the fact that it doesn't do things like update timestamps or strip
  privileges
- the ops structure that needs to be supplied to this function, which also
  needs to be documented: when those callbacks are run and what they are
  supposed to do.

> + * ``iomap_page_mkwrite()`` - when dealing callbacks for
> + ``struct vm_operations_struct``

Similar issues here.

> +
> + * ``struct vm_operations_struct.page_mkwrite()``
> + * ``struct vm_operations_struct.fault()``
> + * ``struct vm_operations_struct.huge_fault()``
> + * ``struct vm_operations_struct`.pfn_mkwrite()``

No idea what this is doing in an iomap document. :/

> +You *may* use buffered writes to also deal with ``fallocate()``:
> +
> + * ``iomap_zero_range()`` on fallocate for zeroing
> + * ``iomap_truncate_page()`` on fallocate for truncation

These are *not* fallocate() operations. fallocate() implementations may use
iomap_zero_range(), but that's because fallocate() ops might need to -zero
file offset ranges within EOF-.

IOWs, iomap_zero_range() is a standalone helper that converts a range of the
file to contain zeros. That's all it does. It also needs documentation of
what locks need to be held when it's called, along with a discussion of how
it handles partial block zeroing and what happens when the range spans
unwritten extents.

iomap_truncate_page() is only used when removing file data from the file. It
is a destructive operation, and it is typically used to remove cached data
from beyond EOF when a truncate down operation is making the file smaller. It
has nothing to do with fallocate() at all....
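To make the caller responsibilities listed above concrete, here is a rough, non-runnable sketch (C syntax, kernel style) of the shape a buffered ``->write_iter`` built on iomap typically takes. All ``myfs_*`` names are invented for illustration; only ``iomap_file_buffered_write()`` is the real iomap entry point, and existing users in the tree should be consulted for the authoritative pattern.

```c
/*
 * Hedged sketch only: illustrates that locking, IOCB_NOWAIT handling,
 * write validity checks, and timestamp/privilege updates are all the
 * caller's job -- iomap does none of this for you.
 */
static ssize_t myfs_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
{
	struct inode *inode = file_inode(iocb->ki_filp);
	ssize_t ret;

	/* IO serialisation lives above iomap: take i_rwsem here. */
	if (iocb->ki_flags & IOCB_NOWAIT) {
		if (!inode_trylock(inode))
			return -EAGAIN;
	} else {
		inode_lock(inode);
	}

	/* Write validity checks must happen before calling into iomap. */
	ret = generic_write_checks(iocb, from);
	if (ret <= 0)
		goto out;

	/* Timestamps and privilege stripping are also the caller's job. */
	ret = file_modified(iocb->ki_filp);
	if (ret)
		goto out;

	ret = iomap_file_buffered_write(iocb, from, &myfs_iomap_ops);
out:
	inode_unlock(inode);
	return ret;
}
```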
There are also additional data integrity concerns with serialising concurrent sub-block DIO that the filesystem has to handle itself. > +You **may** use direct IO writes to also deal with ``fallocate()``: > + > + * ``iomap_zero_range()`` on fallocate for zeroing > + * ``iomap_truncate_page()`` on fallocate for truncation This makes no sense at all. iomap_dio_rw() does its own sub-block zeroing if the mapping returned to it by the filesystem requires it. That's where all the additional data integrity concerns with concurrent sub-block DIO come from. > +Typically you'd also happen to use these on paths when updating an inode's size. What has that got to do with discussing how iomap_dio_rw() works? > +iomap for reads > +--------------- > + > +You can call into **iomap** for reading, ie, dealing with the filesystems's > +``struct file_operations``: > + > + * ``struct file_operations.read_iter()``: note that depending on the type of > + read your filesystem might use ``iomap_dio_rw()`` for direct IO, > + generic_file_read_iter() for buffered IO and > + ``dax_iomap_rw()`` for DAX. > + * ``struct file_operations.remap_file_range()`` - currently the special > + ``dax_remap_file_range_prep()`` helper is provided for DAX mode reads. This is confusing. Why mention struct file_operations here and not in the write path? Why mention DAX here, and not in the write path? Why mention remap_file_range here, when that is an operation that simply layers over the read and write IO paths? Indeed, you don't discuss anything about the IO path, locking requirements, how reads and writes can be serialised against each other with iomap (e.g. via coarse grained per-IO exclusion via i_rwsem or fine-grained via folio locks), etc. Those are things that people implementing filesystems actually need to know.... > +iomap for userspace file extent mapping > +--------------------------------------- > + > +The ``fiemap`` ioctl can be used to allow userspace to get a file extent > +mapping.
The older ``bmap()`` (aka ``FIBMAP``) allows the VM to map logical > +block offset to physical block number. ``bmap()`` is a legacy block mapping > +operation supported only for the ioctl and two areas in the kernel which likely > +are broken (the default swapfile implementation and odd md bitmap code). > +``bmap()`` was only useful in the days of ext2 when there were no support for > +delalloc or unwritten extents. Consequently, the interface reports nothing for > +those types of mappings. Because of this we don't want filesystems to start > +exporting this interface if they don't already do so. Discussion of FIBMAP problems has no place in a description of how to use iomap to support FIEMAP. > +The ``fiemap`` ioctl is supported through an inode ``struct > +inode_operations.fiemap()`` callback. Well, yes, but that's largely irrelevant to how to use iomap to implement it? > +You would use ``iomap_fiemap()`` to provide the mapping. You could use two > +seperate ``struct iomap_ops`` one for when requested to also map extended > +attributes (``FIEMAP_FLAG_XATTR``) and your another ``struct iomap_ops`` for > +regular read ``struct iomap_ops`` when there is no need for extended attributes. Yay, finally the 'struct iomap_ops' is mentioned, but without any description of what it is or how it gets used, this is just confusing. "iomap_fiemap() will return the fiemap data ready to be returned to userspace based on the mapping function provided by the caller. Note that FIEMAP may ask for different mappings from the same inode (e.g. attribute vs data) so care must be taken to ensure the mapping function provided returns the information the caller asked for." > +In the future **iomap** may provide its own dedicated ops structure for > +``fiemap``. unnecessary. > +``iomap_bmap()`` exists and should *only be used* by filesystems that > +**already** supported ``FIBMAP``. ``FIBMAP`` **should not be used** with the > +address_space -- we have iomap readpages and writepages for that. 
I really don't understand what you are trying to say "do not do" here. Nothing in a filesystem implementation needs to call iomap_bmap(); it's a function to implement the fs ->bmap method and nothing else. > +iomap for assisting the VFS > +--------------------------- > + > +A filesystem also needs to call **iomap** when assisting the VFS manipulating a > +file into the page cache. I have no idea what this means. > +iomap for VFS reading > +--------------------- > + > +A filesystem can call **iomap** to deal with the VFS reading a file into folios > +with: > + > + * ``iomap_bmap()`` - called to assist the VFS when manipulating page cache with > + ``struct address_space_operations.bmap()``, to help the VFS map a logical > + block offset to physical block number. What? Nobody should be using ->bmap() except for FIBMAP and the legacy swapfile mapping path. Any filesystem implementing iomap that supports swapfiles should be using iomap_swapfile_activate(). Swapfiles are most definitely not a "VFS reading a file into folios" situation. > + * ``iomap_read_folio()`` - called to assist the page cache with > + ``struct address_space_operations.read_folio()`` > + * ``iomap_readahead()`` - called to assist the page cache with > + ``struct address_space_operations.readahead()`` Oh, these aren't "VFS reading" operations. These are page cache IO interfaces. > + > +iomap for VFS writepages > +------------------------ > + > +A filesystem can call **iomap** to deal with the VFS write out of pages back to > +backing store, that is to help deal with a filesystems's ``struct > +address_space_operations.writepages()``. The special ``iomap_writepages()`` is > +used for this case with its own respective filestems's ``struct iomap_ops`` for > +this. That is .... very brief. There's no discussion of:

 - the iomap_writepage_ctx and how to use that
 - calling context constraints (e.g. no direct reclaim calls)
 - what happens when the page spans or is beyond EOF
 - what the ->map_blocks() callback is supposed to do
 - how errors during write submission are handled
 - when ->discard_folio() should be implemented
 - data integrity requirements
 - how ioends work
 - when a filesystem might need to implement ->prepare_ioend() (e.g. for COW state updates)
 - how ioend completion merging can be implemented
 - how to attach custom IO completion handlers (e.g. for unwritten extent conversion, COW state updates, file size updates, etc)

> +iomap for VFS llseek > +-------------------- > + > +A filesystem ``struct address_space_operations.llseek()`` is used by the VFS > +when it needs to move the current file offset, the file offset is in ``struct > +file.f_pos``. **iomap** has special support for the ``llseek`` ``SEEK_HOLE`` or > +``SEEK_DATA`` interfaces: All this needs to say is: "iomap provides generic support for SEEK_HOLE and SEEK_DATA via iomap_seek_hole() and iomap_seek_data(). This requires the filesystem to supply a mapping callback to allow targeted iteration of the current inode's extent map." > + > + * ``iomap_seek_hole()``: for when the > + ``struct address_space_operations.llseek()`` *whence* argument is > + ``SEEK_HOLE``, when looking for the file's next hole. > + * ``iomap_seek_data()``: for when the > + ``struct address_space_operations.llseek()`` *whence* argument isj > + ``SEEK_DATA`` when looking for the file's next data area. > + > +Your own ``struct iomap_ops`` for this is encouraged. What does that mean? The structure with a valid ->iomap_begin method is required for correct operation.... > +iomap for DAX > +------------- > +You can use ``dax_iomap_rw()`` when calling iomap from a DAX context, this is > +typically from the filesystems's ``struct file_operations.write_iter()`` > +callback. Ugh. That is *not useful*. It's worse than not documenting it at all. DAX has a separate iomap API and iterator.
It implements many of the same functions using the same API. e.g. dax_zero_range(), dax_truncate_page(), dax_iomap_fault(), dax_file_unshare(), dax_remap_file_range_prep(), etc. Any filesystem that uses the iomap API should be able to branch operations to the DAX iomap operations if IS_DAX() is set on the inode. Other than that, implementation of DAX support is way outside the scope of implementing iomap support in a filesystem. > +Converting filesystems from buffer-head to iomap guide > +====================================================== > + > +These are generic guidelines on converting a filesystem over to **iomap** from > +'''buffer-heads'''. > + > +One op at at time > +----------------- > + > +You may try to convert a filesystem with different clustered set of operations > +at time, below are a generic order you may strive to target: > + > + * direct io > + * miscellaneous helpers (seek/fiemap/bmap) > + * buffered io That doesn't tell anyone how to actually go about this. The right approach is to first implement ->iomap_begin and (if necessary) ->iomap_end to allow iomap to obtain a read-only mapping of a file range. In most cases, this is a relatively trivial conversion of the existing get_block() callback for read-only mappings. FIEMAP is a really good first target because it is trivial to implement support for it and then to determine that the extent map iteration is correct from userspace. i.e. if FIEMAP is returning the correct information, it's a good sign that other read-only mapping operations will also do the right thing. Next, rewrite the filesystem's get_block(create = false) implementation to use the new ->iomap_begin() implementation. i.e. get_block wraps around the outside and converts the information in the iomap to the bufferhead-based map that getblock returns. This will convert all the existing read-only mapping users to use the new iomap mapping function internally.
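A rough, uncompilable sketch of what that wrapper stage can look like - all myfs_* names are hypothetical, and only the read-only path is shown:

```c
static int myfs_get_block(struct inode *inode, sector_t iblock,
		struct buffer_head *bh, int create)
{
	loff_t pos = (loff_t)iblock << inode->i_blkbits;
	struct iomap iomap = { };
	struct iomap srcmap = { };
	int ret;

	if (create)		/* write mappings get converted later */
		return myfs_get_block_create(inode, iblock, bh);

	/* Use the new iomap mapping function for read-only lookups. */
	ret = myfs_iomap_begin(inode, pos, i_blocksize(inode), 0,
			       &iomap, &srcmap);
	if (ret)
		return ret;

	/* Convert the byte-based iomap back into the bufferhead map. */
	if (iomap.type == IOMAP_MAPPED)
		map_bh(bh, inode->i_sb,
		       (iomap.addr + (pos - iomap.offset)) >> inode->i_blkbits);
	return 0;
}
```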
This way the iomap mapping function can be further tested without needing to implement any other iomap APIs. Once everything is working like this, then convert all the other read-only mapping operations to use iomap. Done one at a time, regressions should be self evident. The only likely complexity at this point will be the buffered read IO path because bufferheads. The buffered read IO paths don't need to be converted yet, though the direct IO read path should be converted. The next thing to do is implement get_blocks(create = true) functionality in the ->iomap_begin/end() methods. Then convert the direct IO write path to iomap, and start running fsx w/ DIO enabled in earnest on the filesystem. This will flush out lots of data integrity corner case bugs that the new write mapping implementation introduces. At this point, converting the entire get_blocks() path to call the iomap functions and convert the iomaps to bufferhead maps is possible. This will get the entire filesystem using the new mapping functions, and they should largely be debugged and working correctly after this step. This now largely leaves the buffered read and write paths to be converted. The mapping functions should all work correctly, so all that needs to be done is rewriting all the code that interfaces with bufferheads to interface with iomap and folios. This is left as an exercise for the reader, as it will be different for every filesystem. > +Defining a simple filesystem > +---------------------------- > + > +A simple filesystem is perhaps the easiest to convert over to **iomap**, a > +simple filesystem is one which: > + > + * does not use fsverify, fscrypt, compression > + * has no Copy on Write support (reflinks) > + > +Converting a simple filesystem to iomap > +--------------------------------------- > + > +Simple filesystems should covert to IOMAP piecemeal wise first converting over > +**direct IO**, then the miscellaneous helpers (seek/fiemap/bmap) and last > +should be buffered IO.
> + > +Converting shared filesystem features > +------------------------------------- > + > +Shared filesystems features such as fscrypt, compression, erasure coding, and > +any other data transformations need to be ported to **iomap** first, as none of > +the current **iomap** users require any of this functionality. > + > +Converting complex filesystems > +------------------------------ > + > +If your filesystem relies on any shared filesystem features mentioned above > +those would need to be converted piecemeal wise. "piecemeal wise"? Anyway, didn't you just address this in the previous section? Why mention it again? > If reflinks are supported you > +need to first ensure proper locking sanity in order to be able to address byte > +ranges can be handled properly through **iomap** operations. An example > +filesystem where this work is taking place is btrfs. Ah, what? To use iomap at all, you first need to ensure proper locking sanity across the filesystem operations. Reflink or any other filesystem functionality doesn't magically create locking issues with iomap - the iomap infrastructure *always* assumes the caller is holding the correct locks when iomap functions are called, and that the methods will take the right locks to ensure filesystem mapping operations are serialised against each other correctly. i.e. there is no locking in the iomap infrastructure except for internal state protection.... > +IOMAP_F_BUFFER_HEAD considerations > +---------------------------------- > + > +``IOMAP_F_BUFFER_HEAD`` won't be removed until we have all filesystem fully > +converted away from **buffer-heads**, and this could be never. > + > +``IOMAP_F_BUFFER_HEAD`` should be avoided as a stepping stone / to port > +filesystems over to **iomap** as it's support for **buffer-heads** only apply to > +the buffered write path and nothing else including the read_folio/readahead and > +writepages aops.
> + > +Testing Direct IO > +================= > + > +Other than fstests you can use LTP's dio, however this tests is limited as it > +does not test stale data. > + > +{{{ > +./runltp -f dio -d /mnt1/scratch/tmp/ > +}}} This should point to all the data integrity tests and IO path stress tests in fstests (i.e. '-g rw -g aio -g fiemap -g seek'). I think those cover far more cases (including stale data and buffered/mmap IO) than what you are suggesting here. > + > +Known issues and future improvements > +==================================== > + > +We try to document known issues that folks should be aware of with **iomap** here. > + > + * write amplification on IOMAP when bs < ps: **iomap** needs improvements for > + large folios for dirty bitmap tracking > + * filesystems which use buffer head helpers such as ``sb_bread()`` and friends > + will have to continue to use buffer heads as there is no generic iomap > + metadata read/write library yet. This will get out of date rapidly. I would not bother to document stuff like this - if it needs to be documented it should be in the code where the problems are being fixed, not in the documentation. > +References > +========== > + > + * `Presentation on iomap evolution`<https://docs.google.com/presentation/d/e/2PACX-1vSN4TmhiTu1c6HNv6_gJZFqbFZpbF7GkABllSwJw5iLnSYKkkO-etQJ3AySYEbgJA/pub?start=true&loop=false&delayms=3000&slide=id.g189cfd05063_0_185>` Someone did "git grep iomap fs/xfs" and turned it into a slide set? Yeah, they did. The total contents of Slide 11: | XFS IOMAP usage today | | git grep "struct buffer_head" fs/xfs |wc -l | 0 That's really not a useful reference at all. Maybe what was said during the talk was more insightful, but as reference material it is not useful at all.
> + * `LWN review on deprecating buffer-heads <https://lwn.net/Articles/930173/>` > diff --git a/include/linux/iomap.h b/include/linux/iomap.h > index e2b836c2e119..ee4b026995ac 100644 > --- a/include/linux/iomap.h > +++ b/include/linux/iomap.h > @@ -10,6 +10,30 @@ > #include <linux/mm_types.h> > #include <linux/blkdev.h> > > +/** > + * DOC: Introduction > + * > + * iomap allows filesystems to sequentially iterate over byte addressable block > + * ranges on an inode and apply operations to it. > + * > + * iomap grew out of the need to provide a modern block mapping abstraction for > + * filesystems with the different IO access methods they support and assisting > + * the VFS with manipulating files into the page cache. iomap helpers are > + * provided for each of these mechanisms. However, block mapping is just one of > + * the features of iomap, given iomap supports DAX IO for filesystems and also > + * supports such the ``lseek``/``llseek`` ``SEEK_DATA``/``SEEK_HOLE`` > + * interfaces. > + * > + * Block mapping provides a mapping between data cached in memory and the > + * location on persistent storage where that data lives. `LWN has an great > + * review of the old buffer-heads block-mapping and why they are inefficient > + * <https://lwn.net/Articles/930173/>`, since the inception of Linux. Since > + * **buffer-heads** work on a 512-byte block based paradigm, it creates an > + * overhead for modern storage media which no longer necessarily works only on > + * 512-blocks. iomap is flexible providing block ranges in *bytes*. iomap, with > + * the support of folios, provides a modern replacement for **buffer-heads**. > + */ This commentary belongs in the documentation, not the code. If you are going to document anything in the code, it should be the requirements the code is implementing, the design behind the implementation in the code, implementation constraints and assumptions, etc. 
> + > struct address_space; > struct fiemap_extent_info; > struct inode; > @@ -22,37 +46,43 @@ struct page; > struct vm_area_struct; > struct vm_fault; > > -/* > - * Types of block ranges for iomap mappings: > +/** > + * DOC: iomap block ranges types I seriously dislike this "DOC:" keyword appearing everywhere. We've already got a "this is a comment for documentation" annotation in the "/**" comment prefix, having to add "DOC:" is entirely redundant and unnecessary noise. > + * > + * * IOMAP_HOLE - no blocks allocated, need allocation > + * * IOMAP_DELALLOC - delayed allocation blocks > + * * IOMAP_MAPPED - blocks allocated at @addr > + * * IOMAP_UNWRITTEN - blocks allocated at @addr in unwritten state > + * * IOMAP_INLINE - data inline in the inode Why do we need the "double *" here? This just makes the comment look weird. > */ > -#define IOMAP_HOLE 0 /* no blocks allocated, need allocation */ > -#define IOMAP_DELALLOC 1 /* delayed allocation blocks */ > -#define IOMAP_MAPPED 2 /* blocks allocated at @addr */ > -#define IOMAP_UNWRITTEN 3 /* blocks allocated at @addr in unwritten state */ > -#define IOMAP_INLINE 4 /* data inline in the inode */ > +#define IOMAP_HOLE 0 > +#define IOMAP_DELALLOC 1 > +#define IOMAP_MAPPED 2 > +#define IOMAP_UNWRITTEN 3 > +#define IOMAP_INLINE 4 > > -/* > - * Flags reported by the file system from iomap_begin: > +/** > + * DOC: Flags reported by the file system from iomap_begin > * > - * IOMAP_F_NEW indicates that the blocks have been newly allocated and need > - * zeroing for areas that no data is copied to. > + * * IOMAP_F_NEW: indicates that the blocks have been newly allocated and need > + * zeroing for areas that no data is copied to. > * > - * IOMAP_F_DIRTY indicates the inode has uncommitted metadata needed to access > - * written data and requires fdatasync to commit them to persistent storage. > - * This needs to take into account metadata changes that *may* be made at IO > - * completion, such as file size updates from direct IO.
+ * * IOMAP_F_DIRTY: indicates the inode has uncommitted metadata needed to access > + * written data and requires fdatasync to commit them to persistent storage. > + * This needs to take into account metadata changes that *may* be made at IO > + * completion, such as file size updates from direct IO. This breaks all the 80 column wrapping in this file. If you are going to indent comments, they need to be reflowed to stay within 80 columns. Which brings me back to: why do we want all these extra comment formatting constraints? Next question: Why is the comment formatting here different to the first set of "DOC" comments you added? > @@ -61,22 +91,20 @@ struct vm_fault; > #define IOMAP_F_BUFFER_HEAD (1U << 4) > #define IOMAP_F_XATTR (1U << 5) > > -/* > - * Flags set by the core iomap code during operations: > +/** > + * DOC: Flags set by the core iomap code during operations > + * > + * * IOMAP_F_SIZE_CHANGED: indicates to the iomap_end method that the file size > + * has changed as the result of this write operation. > * > - * IOMAP_F_SIZE_CHANGED indicates to the iomap_end method that the file size > - * has changed as the result of this write operation. > + * * IOMAP_F_STALE: indicates that the iomap is not valid any longer and the file > + * range it covers needs to be remapped by the high level before the > + * operation can proceed. Again beyond 80 columns. > * > - * IOMAP_F_STALE indicates that the iomap is not valid any longer and the file > - * range it covers needs to be remapped by the high level before the operation > - * can proceed.
> + * * IOMAP_F_PRIVATE: Flags from 0x1000 up are for file system specific usage > */ > #define IOMAP_F_SIZE_CHANGED (1U << 8) > #define IOMAP_F_STALE (1U << 9) > - > -/* > - * Flags from 0x1000 up are for file system specific usage: > - */ > #define IOMAP_F_PRIVATE (1U << 12) > > > @@ -124,73 +152,119 @@ static inline bool iomap_inline_data_valid(const struct iomap *iomap) > return iomap->length <= PAGE_SIZE - offset_in_page(iomap->inline_data); > } > > -/* > - * When a filesystem sets folio_ops in an iomap mapping it returns, get_folio > - * and put_folio will be called for each folio written to. This only applies > - * to buffered writes as unbuffered writes will not typically have folios > - * associated with them. > - * > - * When get_folio succeeds, put_folio will always be called to do any > - * cleanup work necessary. put_folio is responsible for unlocking and putting > - * @folio. > +/** > + * struct iomap_folio_ops - buffered writes folio folio reference count helpers folio folio folio folio.... > + * > + * A filesystem can optionally set folio_ops in a &struct iomap mapping it > + * returns to override the default get_folio and put_folio for each folio > + * written to. That's much worse than the comment it replaces. "get_folio and put_folio will be called for each folio written to" is *much clearer* compared to "override the default". > > This only applies to buffered writes as unbuffered writes will > + * not typically have folios associated with them. > + * > + * @get_folio: iomap defaults to iomap_get_folio() (which calls > + * __filemap_get_folio()) if the filesystem did not provide a get folio op. Nope. Document what the method is provided with and is expected to perform, not what iomap_get_folio() does. > + * @put_folio: when get_folio succeeds, put_folio will always be called to do > + * any cleanup work necessary. put_folio is responsible for unlocking and > + * putting @folio. 
See, you didn't mention that get_folio is responsible for returning a locked, referenced folio.... > + * @iomap_valid: check that the cached iomap still maps correctly to the > + * filesystem's internal extent map. FS internal extent maps can change > + * while iomap is iterating a cached iomap, so this hook allows iomap to > + * detect that the iomap needs to be refreshed during a long running write operation. Line length. > + * > + * The filesystem can store internal state (e.g. a sequence number) in > + * iomap->validity_cookie when the iomap is first mapped to be able to > + * detect changes between mapping time and whenever .iomap_valid() is > + * called. > + * > + * This is called with the folio over the specified file position held > + * locked by the iomap code. This is useful for filesystems that have > + * dynamic mappings (e.g. anything other than zonefs). An example reason > + * as to why this is necessary is writeback doesn't take the vfs locks. I don't think these last two sentences add any value here. The first paragraph makes it pretty clear that it is related to dynamic mappings, and not taking VFS locks in writeback does not mean a filesystem needs to implement ->iomap_valid(). Indeed, the problem this fixes with XFS is related to stale unwritten extent mappings and memory reclaim racing with write() - it has nothing to do with the locking in ->writepages so at minimum the comment doesn't reflect the actual reason for this method existing.... > -/* > - * Flags for iomap_begin / iomap_end. No flag implies a read. > +/** > + * DOC: Flags for iomap_begin / iomap_end. No flag implies a read. > + * > + * * IOMAP_WRITE: writing, must allocate blocks > + * * IOMAP_ZERO: zeroing operation, may skip holes > + * * IOMAP_REPORT: report extent status, e.g.
FIEMAP > + * * IOMAP_FAULT: mapping for page fault > + * * IOMAP_DIRECT: direct I/O > + * * IOMAP_NOWAIT: do not block > + * * IOMAP_OVERWRITE_ONLY: only pure overwrites allowed > + * * IOMAP_UNSHARE: unshare_file_range > + * * IOMAP_DAX: DAX mapping Yet another different doc comment style. > +/** > + * struct iomap_ops - IO interface specific operations > + * > + * A filesystem is must provide a &struct iomap_ops for to deal with the > + * beginning an IO operation, iomap_begin(), and ending an IO operation on a > + * block range, ``iomap_end()``. You would call iomap with a specialized iomap > + * operation depending on its filesystem or the VFS needs. > + * > + * For example iomap_dio_rw() would be used for for a filesystem when doing a > + * block range read or write operation with direct IO. In this case your > + * filesystem's respective &struct file_operations.write_iter() would eventually > + * call iomap_dio_rw() on the filesystem's &struct file_operations.write_iter(). > + * > + * For buffered IO a filesystem would use iomap_file_buffered_write() on the > + * same &struct file_operations.write_iter(). But that is not the only situation > + * in which a filesystem would deal with buffered writes, you could also use > + * buffered writes when a filesystem has to deal with &struct > + * file_operations.fallocate(). However fallocate() can be used for *zeroing* or > + * for *truncation* purposes. A special respective iomap_zero_range() would be > + * used for *zeroing* and a iomap_truncate_page() would be used for > + * *truncation*. This isn't useful information. It's commentary - it doesn't document what the methods are supposed to do or provide to the iomap infrastructure. This sort of stuff belongs in the documentation, not the code.
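In the documentation, that sort of commentary carries much more weight as a short worked sketch than as comment prose - e.g. something like the following uncompilable fragment, with hypothetical myfs_* names:

```c
/* One iomap_ops per major IO path tends to keep each method simple: */
static const struct iomap_ops myfs_buffered_write_ops = {
	.iomap_begin	= myfs_buffered_write_iomap_begin,
	.iomap_end	= myfs_buffered_write_iomap_end,
};

static const struct iomap_ops myfs_read_ops = {
	.iomap_begin	= myfs_read_iomap_begin,
};

/* e.g. zeroing a sub-range of a file during fallocate(FALLOC_FL_ZERO_RANGE): */
static int myfs_zero_range(struct inode *inode, loff_t pos, loff_t len)
{
	bool did_zero;

	return iomap_zero_range(inode, pos, len, &did_zero,
				&myfs_buffered_write_ops);
}
```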
> + * > + * Experience with adopting iomap on filesystems have shown that the filesystem > + * implementation of these operations can be simplified considerably if one > + * &struct iomap_ops is provided per major filesystem IO operation: > + * > + * * buffered io > + * * direct io > + * * DAX io > + * * fiemap for with extended attributes (``FIEMAP_FLAG_XATTR``) > + * * lseek And this most definitely doesn't belong in the code. Put the "how to implement a filesystem" commentary in the documentation, leave the comments for the API to *describe the API*.

/**
 * struct iomap_ops - IO interface specific operations
 *
 * This structure provides the filesystem methods for mapping a file
 * offset range to a mapped extent that an IO operation could be
 * performed on.
 *
 * An iomap reflects a single contiguous range of filesystem address
 * space that either exists in memory or on a block device. The type
 * of the iomap determines what the range maps to, and there are
 * several different state flags that can be returned that indicate
 * how the iomap infrastructure should modify its behaviour to do
 * the right thing.
 *
 * The methods are designed to be used as pairs. The begin method
 * creates the iomap and attaches all the necessary state and
 * information subsequent iomap methods and callbacks might need,
 * and once the iomap infrastructure has finished working on the
 * iomap it will call the end method to allow the filesystem to tear
 * down any unused space and/or structures it created for the
 * specific iomap context.
 *
 * .....

> + * > + * @iomap_begin: return the existing mapping at pos, or reserve space starting > + * at pos for up to length, as long as we can do it as a single mapping. The > + * actual length is returned in iomap->length. The &struct iomap iomap must > + * always be set. The &struct iomap srcmap should be set if the range is > + * CoW. Discuss what @flags contains and what the flags mean. Describe what the @srcmap is used for when CoW is being set up.
Explain constraints on @srcmap because it is not passed to the ->iomap_end() method. > + * > + * @iomap_end: commit and/or unreserve space previous allocated using > + * iomap_begin. Written indicates the length of the successful write Shouldn't this be @written > + * operation which needs to be committed, while the rest needs to be > + * unreserved. Written might be zero if no data was written. @flags is undocumented. > + */ > struct iomap_ops { > - /* > - * Return the existing mapping at pos, or reserve space starting at > - * pos for up to length, as long as we can do it as a single mapping. > - * The actual length is returned in iomap->length. > - */ > int (*iomap_begin)(struct inode *inode, loff_t pos, loff_t length, > unsigned flags, struct iomap *iomap, > struct iomap *srcmap); > > - /* > - * Commit and/or unreserve space previous allocated using iomap_begin. > - * Written indicates the length of the successful write operation which > - * needs to be commited, while the rest needs to be unreserved. > - * Written might be zero if no data was written. > - */ > int (*iomap_end)(struct inode *inode, loff_t pos, loff_t length, > ssize_t written, unsigned flags, struct iomap *iomap); > }; > @@ -207,6 +281,7 @@ struct iomap_ops { > * @flags: Zero or more of the iomap_begin flags above. > * @iomap: Map describing the I/O iteration > * @srcmap: Source map for COW operations > + * @private: internal use Completely useless description. Wrong as well. iomap_dio_rw() allows the caller to pass a filesystem private pointer which gets stored in the iomap_iter. The iomap_iter is then passed to iomap_dio_ops->submit_io, and the filesystem can then grab whatever private structure it passed to the DIO from the iter and make use of it. i.e.
if you are going to document this, it is a pointer to a filesystem owned private structure that was passed to the high level iomap IO function so it is available to individual filesystem methods that are passed the iomap_iter structure as it is iterated. > */ > struct iomap_iter { > struct inode *inode; > @@ -241,7 +316,7 @@ static inline u64 iomap_length(const struct iomap_iter *iter) > * @i: iteration structure > * > * Write operations on file systems with reflink support might require a > - * source and a destination map. This function retourns the source map > + * source and a destination map. This function returns the source map > * for a given operation, which may or may no be identical to the destination > * map in &i->iomap. > */ > @@ -281,42 +356,52 @@ loff_t iomap_seek_data(struct inode *inode, loff_t offset, > sector_t iomap_bmap(struct address_space *mapping, sector_t bno, > const struct iomap_ops *ops); > > -/* > - * Structure for writeback I/O completions. > +/** > + * struct iomap_ioend - for writeback I/O completions > + * > + * @io_list: next ioend in chain > + * @io_type: Care to update this? > + * @io_flags: IOMAP_F_* > + * @io_folios: folios added to ioend > + * @io_inode: file being written to > + * @io_size: size of the extent > + * @io_offset: offset in the file > + * @io_sector: start sector of ioend > + * @io_bio: bio being built > + * @io_inline_bio: MUST BE LAST! Yet another different doc comment format.... That said, these comments lose a lot of their relevance when the variable type is stripped out. e.g. "io_folios" - is that a pointer to a list of folios, an array of folios, or something else? It's something else - it's a counter, and that's obvious from the "u32" type. It's not obvious from the comment in isolation. Hence I think just lifting the comments as "documentation" is less useful than not documenting them at all.
I'd still have to go look at the code to know what these variables mean, and at that point there's no reason to look at the documentation.... > */ > struct iomap_ioend { > - struct list_head io_list; /* next ioend in chain */ > + struct list_head io_list; > u16 io_type; > - u16 io_flags; /* IOMAP_F_* */ > - u32 io_folios; /* folios added to ioend */ > - struct inode *io_inode; /* file being written to */ > - size_t io_size; /* size of the extent */ > - loff_t io_offset; /* offset in the file */ > - sector_t io_sector; /* start sector of ioend */ > - struct bio *io_bio; /* bio being built */ > - struct bio io_inline_bio; /* MUST BE LAST! */ > + u16 io_flags; > + u32 io_folios; > + struct inode *io_inode; > + size_t io_size; > + loff_t io_offset; > + sector_t io_sector; > + struct bio *io_bio; > + struct bio io_inline_bio; > }; > > +/** > + * struct iomap_writeback_ops - used for writeback > + * > + * This structure is used to support dealing with a filesystem > + * ``struct address_space_operations.writepages()``, for writeback. > + * > + * @map_blocks: required, maps the blocks so that writeback can be performed on > + * the range starting at offset. > + * @prepare_ioend: optional, allows the file systems to perform actions just > + * before submitting the bio and/or override the bio end_io handler for > + * complex operations like copy on write extent manipulation or unwritten > + * extent conversions. > + * @discard_folio: optional, allows the file system to discard state on a page where > + * we failed to submit any I/O. > + */ And yet another different document comment format. Lines are still too long. This is also much harder to read than the original comments. The layout of the original comments makes it very clear what is required or optional; now it's hidden in a big chunk of dense text. > struct iomap_writeback_ops { > - /* > - * Required, maps the blocks so that writeback can be performed on > - * the range starting at offset.
> - */ > int (*map_blocks)(struct iomap_writepage_ctx *wpc, struct inode *inode, > loff_t offset); > - > - /* > - * Optional, allows the file systems to perform actions just before > - * submitting the bio and/or override the bio end_io handler for complex > - * operations like copy on write extent manipulation or unwritten extent > - * conversions. > - */ > int (*prepare_ioend)(struct iomap_ioend *ioend, int status); > - > - /* > - * Optional, allows the file system to discard state on a page where > - * we failed to submit any I/O. > - */ > void (*discard_folio)(struct folio *folio, loff_t pos); > }; > > @@ -334,26 +419,33 @@ int iomap_writepages(struct address_space *mapping, > struct writeback_control *wbc, struct iomap_writepage_ctx *wpc, > const struct iomap_writeback_ops *ops); > > -/* > - * Flags for direct I/O ->end_io: > +/** > + * DOC: Flags for direct I/O ->end_io > + * > + * * IOMAP_DIO_UNWRITTEN: covers unwritten extent(s) > + * * IOMAP_DIO_COW: covers COW extent(s) > */ > -#define IOMAP_DIO_UNWRITTEN (1 << 0) /* covers unwritten extent(s) */ > -#define IOMAP_DIO_COW (1 << 1) /* covers COW extent(s) */ > +#define IOMAP_DIO_UNWRITTEN (1 << 0) > +#define IOMAP_DIO_COW (1 << 1) > > +/** > + * struct iomap_dio_ops - used for direct IO > + * > + * This is used to support direct IO. > + * > + * @end_io: > + * @submit_io: > + * @bio_set: Filesystems wishing to attach private information to a direct io > + * bio must provide a ->submit_io method that attaches the additional > + * information to the bio and changes the ->bi_end_io callback to a custom > + * function. This function should, at a minimum, perform any relevant > + * post-processing of the bio and end with a call to iomap_dio_bio_end_io. > + */ Why does the documentation of bio_set document what end_io and submit_io do, and provide no information about what @bio_set actually does? Blindly lifting comments and making them documentation doesn't make for good documentation.... Cheers, Dave.
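Dave's description of the @private plumbing can be made concrete with a small user-space sketch. Everything below is invented for illustration (the sketch_* names are stand-ins, not the kernel's real iomap types); the real flow is: the filesystem hands a private pointer to iomap_dio_rw(), iomap stores it in the iomap_iter, and the filesystem's ->submit_io hook reads it back from the iter.

```c
/*
 * User-space sketch of the private-pointer hand-off described above.
 * The sketch_* names are stand-ins invented for illustration; they are
 * NOT the kernel's real iomap definitions.
 */
struct sketch_iomap_iter {
	long long pos;		/* current file position of the iteration */
	void *private;		/* filesystem-owned pointer carried by the iter */
};

struct sketch_fs_dio_state {
	int submitted;		/* whatever the filesystem hook wants to record */
};

/* Filesystem "submit_io"-style hook: recovers its state from the iter. */
static void sketch_submit_io(struct sketch_iomap_iter *iter)
{
	struct sketch_fs_dio_state *state = iter->private;

	state->submitted++;
}

/*
 * Stand-in for a high-level helper like iomap_dio_rw(): the caller's
 * private pointer is stashed in the iterator before any hook runs.
 */
static void sketch_dio_rw(void *fs_private,
			  void (*submit_io)(struct sketch_iomap_iter *))
{
	struct sketch_iomap_iter iter = { .pos = 0, .private = fs_private };

	submit_io(&iter);
}
```

A one-line kdoc for @private along the lines Dave suggests ("filesystem-owned pointer passed to the high level iomap function, available to methods that receive the iomap_iter") would describe exactly this hand-off.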
On Thu, May 18, 2023 at 08:01:05AM -0700, Luis Chamberlain wrote: > + Mapping of heading styles within this document: > + Heading 1 uses "====" above and below > + Heading 2 uses "====" > + Heading 3 uses "----" > + Heading 4 uses "````" > + Heading 5 uses "^^^^" > + Heading 6 uses "~~~~" > + Heading 7 uses "...." > + > + Sections are manually numbered because apparently that's what everyone Why are you picking different defaults than the rest of the kernel documentation? > + > +A modern block abstraction > +========================== > + > +**iomap** allows filesystems to query storage media for data using *byte > +ranges*. Since block mapping are provided for a *byte ranges* for cache data in > +memory, in the page cache, naturally this implies operations on block ranges > +will also deal with *multipage* operations in the page cache. **Folios** are > +used to help provide *multipage* operations in memory for the *byte ranges* > +being worked on. As mentioned the last time this information was circulated, this is not true. iomap itself has nothing to do with blocks, and even less so with the page cache per se. It just iterates over ranges of file data and applies work to it. > +iomap IO interfaces > +=================== > + > +You call **iomap** depending on the type of filesystem operation you are working > +on. We detail some of these interactions below. Who is you?
> + > +iomap for bufferred IO writes > +----------------------------- > + > +You call **iomap** for buffered IO with: > + > + * ``iomap_file_buffered_write()`` - for buffered writes > + * ``iomap_page_mkwrite()`` - when dealing callbacks for > + ``struct vm_operations_struct`` > + > + * ``struct vm_operations_struct.page_mkwrite()`` > + * ``struct vm_operations_struct.fault()`` > + * ``struct vm_operations_struct.huge_fault()`` > + * ``struct vm_operations_struct`.pfn_mkwrite()`` > + > +You *may* use buffered writes to also deal with ``fallocate()``: > + > + * ``iomap_zero_range()`` on fallocate for zeroing > + * ``iomap_truncate_page()`` on fallocate for truncation > + > +Typically you'd also happen to use these on paths when updating an inode's size. I'm not really sure what this is trying to explain. It basically looks like filler text generated by machine learning algorithms. The same is true for a large part of this document. > +A filesystem also needs to call **iomap** when assisting the VFS manipulating a > +file into the page cache. A file system doesn't _need_ to do anything. It may choose to do things, and the iomap based helpers might be useful for that. But again, I'm still not getting what this document is even trying to explain, as "to implement the method foo, use the iomap_foo" isn't really helping anyone. > +Converting filesystems from buffer-head to iomap guide > +====================================================== If you want such a guide, please keep it in a separate file from the iomap API documentation. I'd also suggest that you actually try such a conversion first, as that might help shape the documentation :) > +Testing Direct IO > +================= > + > +Other than fstests you can use LTP's dio, however this tests is limited as it > +does not test stale data. > + > +{{{ > +./runltp -f dio -d /mnt1/scratch/tmp/ > +}}} How does this belong in the iomap documentation?
If LTPs dio is really all that useful we should import it into xfstests, btw. I'm not sure it is, though. > +We try to document known issues that folks should be aware of with **iomap** here. Who is "we"? > + * DOC: Introduction > + * > + * iomap allows filesystems to sequentially iterate over byte addressable block > + * ranges on an inode and apply operations to it. > + * > + * iomap grew out of the need to provide a modern block mapping abstraction for > + * filesystems with the different IO access methods they support and assisting > + * the VFS with manipulating files into the page cache. iomap helpers are > + * provided for each of these mechanisms. However, block mapping is just one of > + * the features of iomap, given iomap supports DAX IO for filesystems and also > + * supports such the ``lseek``/``llseek`` ``SEEK_DATA``/``SEEK_HOLE`` > + * interfaces. > + * > + * Block mapping provides a mapping between data cached in memory and the > + * location on persistent storage where that data lives. `LWN has an great > + * review of the old buffer-heads block-mapping and why they are inefficient > + * <https://lwn.net/Articles/930173/>`, since the inception of Linux. Since > + * **buffer-heads** work on a 512-byte block based paradigm, it creates an > + * overhead for modern storage media which no longer necessarily works only on > + * 512-blocks. iomap is flexible providing block ranges in *bytes*. iomap, with > + * the support of folios, provides a modern replacement for **buffer-heads**. > + */ I really don't want random blurbs and links like this in the main header. If you want to ramble in a little howto that's fine, but the main header is not the place for it. Also please keep improvements to the header in a separate patch from adding Documentation/ documents.
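Christoph's model of iomap — it just iterates over ranges of file data and applies work to each mapping — can also be sketched in user-space C. All names below (sketch_map, sketch_iterate, the 4-byte extent cap) are invented for illustration; the kernel's real loop is iomap_iter() driven by the filesystem's ->iomap_begin.

```c
/*
 * User-space sketch of the iterate-and-apply model: a "begin"-style
 * callback reports how much of the remaining byte range one mapping
 * covers, and the core applies work to exactly that much before asking
 * again. All names here are invented for illustration.
 */
typedef long long sketch_off_t;

static sketch_off_t sketch_total;	/* bytes the apply step has processed */

/* Pretend every extent the filesystem returns is at most 4 bytes long. */
static sketch_off_t sketch_map(sketch_off_t pos, sketch_off_t length)
{
	(void)pos;	/* a real filesystem would look up the extent at pos */
	return length < 4 ? length : 4;
}

/* The "work" applied per mapping; here it just accounts the bytes. */
static void sketch_apply(sketch_off_t pos, sketch_off_t len)
{
	(void)pos;
	sketch_total += len;
}

/* Returns how many mappings were needed to cover [pos, pos + length). */
static int sketch_iterate(sketch_off_t pos, sketch_off_t length,
			  void (*apply)(sketch_off_t, sketch_off_t))
{
	int mappings = 0;

	while (length > 0) {
		sketch_off_t mapped = sketch_map(pos, length);

		apply(pos, mapped);
		pos += mapped;
		length -= mapped;
		mappings++;
	}
	return mappings;
}
```

Note there is nothing block- or page-cache-specific in the loop itself, which is the point Christoph is making: the blocks and folios live behind the callbacks.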
Dave Chinner <david@fromorbit.com> writes: One little point among all the more substantial stuff: >> -/* >> - * Types of block ranges for iomap mappings: >> +/** >> + * DOC: iomap block ranges types > > I seriously dislike this "DOC:" keyword appearing everywhere. > We've already got a "this is a comment for documentation" annotation > in the "/**" comment prefix, having to add "DOC:" is entirely > redudant and unnecessary noise. DOC: actually isn't redundant, it causes the kernel-doc directive to pull that text into the rendered documentation. This document shows both the advantages and disadvantages of that mechanism, IMO. It allows the documentation to be kept with the code, where optimistic people think it is more likely to be updated. But it also scatters the material to the detriment of readers of the plain-text documentation. The rendered version of iomap.rst is rather more complete and comprehensible than the RST file itself. Thanks, jon
On Thu, May 18, 2023 at 10:04:43PM -0700, Christoph Hellwig wrote: > On Thu, May 18, 2023 at 08:01:05AM -0700, Luis Chamberlain wrote: > > + Mapping of heading styles within this document: > > + Heading 1 uses "====" above and below > > + Heading 2 uses "====" > > + Heading 3 uses "----" > > + Heading 4 uses "````" > > + Heading 5 uses "^^^^" > > + Heading 6 uses "~~~~" > > + Heading 7 uses "...." > > + > > + Sections are manually numbered because apparently that's what everyone > > Why are you picking different defaults then the rest of the kernel > documentation? I bet Luis copied that from the online fsck document. IIRC the doc generator is smart enough to handle per-file heading usage. The rst parser source code doesn't seem to have hardcoded defaults; every time it sees an unfamiliar heading style in a .rst file, it adds that as the next level down in the hierarchy. Also, where are the "proper" headings documented for Documentation/? (Skip to the end; it's late and I don't have time right now to read the content of this patch.) > > + > > +A modern block abstraction > > +========================== > > + > > +**iomap** allows filesystems to query storage media for data using *byte > > +ranges*. Since block mapping are provided for a *byte ranges* for cache data in > > +memory, in the page cache, naturally this implies operations on block ranges > > +will also deal with *multipage* operations in the page cache. **Folios** are > > +used to help provide *multipage* operations in memory for the *byte ranges* > > +being worked on. > > As mentioned you list time this information was circulated this is not > true. iomap itself has nothing to with blocks, and even less so with > the page cache per se. It just iterates over ranges of file data and > applies work to it. > > > +iomap IO interfaces > > +=================== > > + > > +You call **iomap** depending on the type of filesystem operation you are working > > +on. We detail some of these interactions below.
> > Who is you? > > > + > > +iomap for bufferred IO writes > > +----------------------------- > > + > > +You call **iomap** for buffered IO with: > > + > > + * ``iomap_file_buffered_write()`` - for buffered writes > > + * ``iomap_page_mkwrite()`` - when dealing callbacks for > > + ``struct vm_operations_struct`` > > + > > + * ``struct vm_operations_struct.page_mkwrite()`` > > + * ``struct vm_operations_struct.fault()`` > > + * ``struct vm_operations_struct.huge_fault()`` > > + * ``struct vm_operations_struct`.pfn_mkwrite()`` > > + > > +You *may* use buffered writes to also deal with ``fallocate()``: > > + > > + * ``iomap_zero_range()`` on fallocate for zeroing > > + * ``iomap_truncate_page()`` on fallocate for truncation > > + > > +Typically you'd also happen to use these on paths when updating an inode's size. > > I'm not really sure what this is trying to explain. It basically looks > like filler text generated by machine learning algorithms.. > > The same is true for a large part of this document. > > > +A filesystem also needs to call **iomap** when assisting the VFS manipulating a > > +file into the page cache. > > A file systsem doesn't _need_ to do anything. It may chose to do > things, and the iomap based helpers might be useful for that. But > again, I'm still not getting what this document is even trying to > explain, as "to implement the method foo, use the iomap_foo" isn't > really helping anyone. > > > +Converting filesystems from buffer-head to iomap guide > > +====================================================== > > If you want such a guide, please keep it in a separate file from the > iomap API documentation. I'd also suggest that you actually try such > a conversion first, as that might help shaping the documentation :) > > > +Testing Direct IO > > +================= > > + > > +Other than fstests you can use LTP's dio, however this tests is limited as it > > +does not test stale data. 
> > + > > +{{{ > > +./runltp -f dio -d /mnt1/scratch/tmp/ > > +}}} > > How does this belong into an iomap documentation? If LTPs dio is really > all that useful we should import it into xfstests, btw. I'm not sure it > is, though. > > > +We try to document known issues that folks should be aware of with **iomap** here. > > Who is "we"? > > > + * DOC: Introduction > > + * > > + * iomap allows filesystems to sequentially iterate over byte addressable block > > + * ranges on an inode and apply operations to it. > > + * > > + * iomap grew out of the need to provide a modern block mapping abstraction for > > + * filesystems with the different IO access methods they support and assisting > > + * the VFS with manipulating files into the page cache. iomap helpers are > > + * provided for each of these mechanisms. However, block mapping is just one of > > + * the features of iomap, given iomap supports DAX IO for filesystems and also > > + * supports such the ``lseek``/``llseek`` ``SEEK_DATA``/``SEEK_HOLE`` > > + * interfaces. > > + * > > + * Block mapping provides a mapping between data cached in memory and the > > + * location on persistent storage where that data lives. `LWN has an great > > + * review of the old buffer-heads block-mapping and why they are inefficient > > + * <https://lwn.net/Articles/930173/>`, since the inception of Linux. Since > > + * **buffer-heads** work on a 512-byte block based paradigm, it creates an > > + * overhead for modern storage media which no longer necessarily works only on > > + * 512-blocks. iomap is flexible providing block ranges in *bytes*. iomap, with > > + * the support of folios, provides a modern replacement for **buffer-heads**. > > + */ > > I really don't want random blurbs and links like this in the main > header. If you want to ramble in a little howto that's fine, but the > main header is not the place for it. > > Also please keep improvements to the header in a separate patch from > adding Documentation/ documents. 
Frankly I don't really like the iomap.h changes -- that's going to blow up the git blame on that file, just to produce a stilted-language manpage. Someone who wants to port a filesystem to iomap (or write a new fs) will need a coherent narrative (you know, with paragraphs and sentences) about how to build this piece and that. The rst file under Documentation/ is the place for that, not trying to mash it into a C header. --D
On 5/22/23 18:20, Darrick J. Wong wrote: > On Thu, May 18, 2023 at 10:04:43PM -0700, Christoph Hellwig wrote: >> On Thu, May 18, 2023 at 08:01:05AM -0700, Luis Chamberlain wrote: >>> + Mapping of heading styles within this document: >>> + Heading 1 uses "====" above and below >>> + Heading 2 uses "====" >>> + Heading 3 uses "----" >>> + Heading 4 uses "````" >>> + Heading 5 uses "^^^^" >>> + Heading 6 uses "~~~~" >>> + Heading 7 uses "...." >>> + >>> + Sections are manually numbered because apparently that's what everyone >> >> Why are you picking different defaults then the rest of the kernel >> documentation? > > I bet Luis copied that from the online fsck document. > > IIRC the doc generator is smart enough to handle per-file heading usage. > The rst parser sourcecode doesn't seem to have harcoded defaults; every > time it sees an unfamiliar heading style in a .rst file, it adds that as > the next level down in the hierarchy. > > Also, where are the "proper" headings documented for Documentation/? Documentation/doc-guide/sphinx.rst: * Please stick to this order of heading adornments: and following lines.
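For reference, the adornment order that Documentation/doc-guide/sphinx.rst asks for is, from memory (so double-check the file itself): ``=`` with overline for the document title, then ``=``, ``-``, and ``~`` going down the hierarchy:

```rst
==============
Document title
==============

Chapter
=======

Section
-------

Subsection
~~~~~~~~~~
```

As Darrick notes, Sphinx infers the hierarchy from first use, so a per-file scheme still renders; the guideline is only about consistency across Documentation/.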
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst index fbb2b5ada95b..6186ab7c3ea8 100644 --- a/Documentation/filesystems/index.rst +++ b/Documentation/filesystems/index.rst @@ -34,6 +34,7 @@ algorithms work. seq_file sharedsubtree idmappings + iomap automount-support diff --git a/Documentation/filesystems/iomap.rst b/Documentation/filesystems/iomap.rst new file mode 100644 index 000000000000..be487030fcff --- /dev/null +++ b/Documentation/filesystems/iomap.rst @@ -0,0 +1,253 @@ +.. SPDX-License-Identifier: GPL-2.0 +.. _iomap: + +.. + Mapping of heading styles within this document: + Heading 1 uses "====" above and below + Heading 2 uses "====" + Heading 3 uses "----" + Heading 4 uses "````" + Heading 5 uses "^^^^" + Heading 6 uses "~~~~" + Heading 7 uses "...." + + Sections are manually numbered because apparently that's what everyone + does in the kernel. +.. contents:: Table of Contents + :local: + +===== +iomap +===== + +.. kernel-doc:: include/linux/iomap.h + +A modern block abstraction +========================== + +**iomap** allows filesystems to query storage media for data using *byte +ranges*. Since block mapping are provided for a *byte ranges* for cache data in +memory, in the page cache, naturally this implies operations on block ranges +will also deal with *multipage* operations in the page cache. **Folios** are +used to help provide *multipage* operations in memory for the *byte ranges* +being worked on. + + +iomap IO interfaces +=================== + +You call **iomap** depending on the type of filesystem operation you are working +on. We detail some of these interactions below. 
+ +iomap for bufferred IO writes +----------------------------- + +You call **iomap** for buffered IO with: + + * ``iomap_file_buffered_write()`` - for buffered writes + * ``iomap_page_mkwrite()`` - when dealing callbacks for + ``struct vm_operations_struct`` + + * ``struct vm_operations_struct.page_mkwrite()`` + * ``struct vm_operations_struct.fault()`` + * ``struct vm_operations_struct.huge_fault()`` + * ``struct vm_operations_struct`.pfn_mkwrite()`` + +You *may* use buffered writes to also deal with ``fallocate()``: + + * ``iomap_zero_range()`` on fallocate for zeroing + * ``iomap_truncate_page()`` on fallocate for truncation + +Typically you'd also happen to use these on paths when updating an inode's size. + +iomap for direct IO +------------------- + +You call **iomap** for direct IO with: + + * ``iomap_dio_rw()`` + +You **may** use direct IO writes to also deal with ``fallocate()``: + + * ``iomap_zero_range()`` on fallocate for zeroing + * ``iomap_truncate_page()`` on fallocate for truncation + +Typically you'd also happen to use these on paths when updating an inode's size. + +iomap for reads +--------------- + +You can call into **iomap** for reading, ie, dealing with the filesystems's +``struct file_operations``: + + * ``struct file_operations.read_iter()``: note that depending on the type of + read your filesystem might use ``iomap_dio_rw()`` for direct IO, + generic_file_read_iter() for buffered IO and + ``dax_iomap_rw()`` for DAX. + * ``struct file_operations.remap_file_range()`` - currently the special + ``dax_remap_file_range_prep()`` helper is provided for DAX mode reads. + +iomap for userspace file extent mapping +--------------------------------------- + +The ``fiemap`` ioctl can be used to allow userspace to get a file extent +mapping. The older ``bmap()`` (aka ``FIBMAP``) allows the VM to map logical +block offset to physical block number. 
``bmap()`` is a legacy block mapping +operation supported only for the ioctl and two areas in the kernel which likely +are broken (the default swapfile implementation and odd md bitmap code). +``bmap()`` was only useful in the days of ext2 when there were no support for +delalloc or unwritten extents. Consequently, the interface reports nothing for +those types of mappings. Because of this we don't want filesystems to start +exporting this interface if they don't already do so. + +The ``fiemap`` ioctl is supported through an inode ``struct +inode_operations.fiemap()`` callback. + +You would use ``iomap_fiemap()`` to provide the mapping. You could use two +seperate ``struct iomap_ops`` one for when requested to also map extended +attributes (``FIEMAP_FLAG_XATTR``) and your another ``struct iomap_ops`` for +regular read ``struct iomap_ops`` when there is no need for extended attributes. +In the future **iomap** may provide its own dedicated ops structure for +``fiemap``. + +``iomap_bmap()`` exists and should *only be used* by filesystems that +**already** supported ``FIBMAP``. ``FIBMAP`` **should not be used** with the +address_space -- we have iomap readpages and writepages for that. + +iomap for assisting the VFS +--------------------------- + +A filesystem also needs to call **iomap** when assisting the VFS manipulating a +file into the page cache. + +iomap for VFS reading +--------------------- + +A filesystem can call **iomap** to deal with the VFS reading a file into folios +with: + + * ``iomap_bmap()`` - called to assist the VFS when manipulating page cache with + ``struct address_space_operations.bmap()``, to help the VFS map a logical + block offset to physical block number. 
+ * ``iomap_read_folio()`` - called to assist the page cache with + ``struct address_space_operations.read_folio()`` + * ``iomap_readahead()`` - called to assist the page cache with + ``struct address_space_operations.readahead()`` + +iomap for VFS writepages +------------------------ + +A filesystem can call **iomap** to deal with the VFS write out of pages back to +backing store, that is to help deal with a filesystems's ``struct +address_space_operations.writepages()``. The special ``iomap_writepages()`` is +used for this case with its own respective filestems's ``struct iomap_ops`` for +this. + +iomap for VFS llseek +-------------------- + +A filesystem ``struct address_space_operations.llseek()`` is used by the VFS +when it needs to move the current file offset, the file offset is in ``struct +file.f_pos``. **iomap** has special support for the ``llseek`` ``SEEK_HOLE`` or +``SEEK_DATA`` interfaces: + + * ``iomap_seek_hole()``: for when the + ``struct address_space_operations.llseek()`` *whence* argument is + ``SEEK_HOLE``, when looking for the file's next hole. + * ``iomap_seek_data()``: for when the + ``struct address_space_operations.llseek()`` *whence* argument isj + ``SEEK_DATA`` when looking for the file's next data area. + +Your own ``struct iomap_ops`` for this is encouraged. + +iomap for DAX +------------- +You can use ``dax_iomap_rw()`` when calling iomap from a DAX context, this is +typically from the filesystems's ``struct file_operations.write_iter()`` +callback. + +Converting filesystems from buffer-head to iomap guide +====================================================== + +These are generic guidelines on converting a filesystem over to **iomap** from +'''buffer-heads'''. 
+ +One op at at time +----------------- + +You may try to convert a filesystem with different clustered set of operations +at time, below are a generic order you may strive to target: + + * direct io + * miscellaneous helpers (seek/fiemap/bmap) + * buffered io + +Defining a simple filesystem +---------------------------- + +A simple filesystem is perhaps the easiest to convert over to **iomap**, a +simple filesystem is one which: + + * does not use fsverify, fscrypt, compression + * has no Copy on Write support (reflinks) + +Converting a simple filesystem to iomap +--------------------------------------- + +Simple filesystems should covert to IOMAP piecemeal wise first converting over +**direct IO**, then the miscellaneous helpers (seek/fiemap/bmap) and last +should be buffered IO. + +Converting shared filesystem features +------------------------------------- + +Shared filesystems features such as fscrypt, compression, erasure coding, and +any other data transformations need to be ported to **iomap** first, as none of +the current **iomap** users require any of this functionality. + +Converting complex filesystems +------------------------------ + +If your filesystem relies on any shared filesystem features mentioned above +those would need to be converted piecemeal wise. If reflinks are supported you +need to first ensure proper locking sanity in order to be able to address byte +ranges can be handled properly through **iomap** operations. An example +filesystem where this work is taking place is btrfs. + +IOMAP_F_BUFFER_HEAD considerations +---------------------------------- + +``IOMAP_F_BUFFER_HEAD`` won't be removed until we have all filesystem fully +converted away from **buffer-heads**, and this could be never. 
+ +``IOMAP_F_BUFFER_HEAD`` should be avoided as a stepping stone / to port +filesystems over to **iomap** as it's support for **buffer-heads** only apply to +the buffered write path and nothing else including the read_folio/readahead and +writepages aops. + +Testing Direct IO +================= + +Other than fstests you can use LTP's dio, however this tests is limited as it +does not test stale data. + +{{{ +./runltp -f dio -d /mnt1/scratch/tmp/ +}}} + +Known issues and future improvements +==================================== + +We try to document known issues that folks should be aware of with **iomap** here. + + * write amplification on IOMAP when bs < ps: **iomap** needs improvements for + large folios for dirty bitmap tracking + * filesystems which use buffer head helpers such as ``sb_bread()`` and friends + will have to continue to use buffer heads as there is no generic iomap + metadata read/write library yet. + +References +========== + + * `Presentation on iomap evolution`<https://docs.google.com/presentation/d/e/2PACX-1vSN4TmhiTu1c6HNv6_gJZFqbFZpbF7GkABllSwJw5iLnSYKkkO-etQJ3AySYEbgJA/pub?start=true&loop=false&delayms=3000&slide=id.g189cfd05063_0_185>` + * `LWN review on deprecating buffer-heads <https://lwn.net/Articles/930173/>` diff --git a/include/linux/iomap.h b/include/linux/iomap.h index e2b836c2e119..ee4b026995ac 100644 --- a/include/linux/iomap.h +++ b/include/linux/iomap.h @@ -10,6 +10,30 @@ #include <linux/mm_types.h> #include <linux/blkdev.h> +/** + * DOC: Introduction + * + * iomap allows filesystems to sequentially iterate over byte addressable block + * ranges on an inode and apply operations to it. + * + * iomap grew out of the need to provide a modern block mapping abstraction for + * filesystems with the different IO access methods they support and assisting + * the VFS with manipulating files into the page cache. iomap helpers are + * provided for each of these mechanisms. 
However, block mapping is just one of + * the features of iomap, given iomap supports DAX IO for filesystems and also + * supports such the ``lseek``/``llseek`` ``SEEK_DATA``/``SEEK_HOLE`` + * interfaces. + * + * Block mapping provides a mapping between data cached in memory and the + * location on persistent storage where that data lives. `LWN has an great + * review of the old buffer-heads block-mapping and why they are inefficient + * <https://lwn.net/Articles/930173/>`, since the inception of Linux. Since + * **buffer-heads** work on a 512-byte block based paradigm, it creates an + * overhead for modern storage media which no longer necessarily works only on + * 512-blocks. iomap is flexible providing block ranges in *bytes*. iomap, with + * the support of folios, provides a modern replacement for **buffer-heads**. + */ + struct address_space; struct fiemap_extent_info; struct inode; @@ -22,37 +46,43 @@ struct page; struct vm_area_struct; struct vm_fault; -/* - * Types of block ranges for iomap mappings: +/** + * DOC: iomap block ranges types + * + * * IOMAP_HOLE - no blocks allocated, need allocation + * * IOMAP_DELALLOC - delayed allocation blocks + * * IOMAP_MAPPED - blocks allocated at @addr + * * IOMAP_UNWRITTEN - blocks allocated at @addr in unwritten state + * * IOMAP_INLINE - data inline in the inode */ -#define IOMAP_HOLE 0 /* no blocks allocated, need allocation */ -#define IOMAP_DELALLOC 1 /* delayed allocation blocks */ -#define IOMAP_MAPPED 2 /* blocks allocated at @addr */ -#define IOMAP_UNWRITTEN 3 /* blocks allocated at @addr in unwritten state */ -#define IOMAP_INLINE 4 /* data inline in the inode */ +#define IOMAP_HOLE 0 +#define IOMAP_DELALLOC 1 +#define IOMAP_MAPPED 2 +#define IOMAP_UNWRITTEN 3 +#define IOMAP_INLINE 4 -/* - * Flags reported by the file system from iomap_begin: +/** + * DOC: Flags reported by the file system from iomap_begin * - * IOMAP_F_NEW indicates that the blocks have been newly allocated and need - * zeroing for areas 
that no data is copied to. + * * IOMAP_F_NEW: indicates that the blocks have been newly allocated and need + * zeroing for areas that no data is copied to. * - * IOMAP_F_DIRTY indicates the inode has uncommitted metadata needed to access - * written data and requires fdatasync to commit them to persistent storage. - * This needs to take into account metadata changes that *may* be made at IO - * completion, such as file size updates from direct IO. + * * IOMAP_F_DIRTY: indicates the inode has uncommitted metadata needed to access + * written data and requires fdatasync to commit them to persistent storage. + * This needs to take into account metadata changes that *may* be made at IO + * completion, such as file size updates from direct IO. * - * IOMAP_F_SHARED indicates that the blocks are shared, and will need to be - * unshared as part a write. + * * IOMAP_F_SHARED: indicates that the blocks are shared, and will need to be + * unshared as part a write. * - * IOMAP_F_MERGED indicates that the iomap contains the merge of multiple block - * mappings. + * * IOMAP_F_MERGED: indicates that the iomap contains the merge of multiple + * block mappings. * - * IOMAP_F_BUFFER_HEAD indicates that the file system requires the use of - * buffer heads for this mapping. + * * IOMAP_F_BUFFER_HEAD: indicates that the file system requires the use of + * buffer heads for this mapping. * - * IOMAP_F_XATTR indicates that the iomap is for an extended attribute extent - * rather than a file data extent. + * * IOMAP_F_XATTR: indicates that the iomap is for an extended attribute extent + * rather than a file data extent. 
*/ #define IOMAP_F_NEW (1U << 0) #define IOMAP_F_DIRTY (1U << 1) @@ -61,22 +91,20 @@ struct vm_fault; #define IOMAP_F_BUFFER_HEAD (1U << 4) #define IOMAP_F_XATTR (1U << 5) -/* - * Flags set by the core iomap code during operations: +/** + * DOC: Flags set by the core iomap code during operations + * + * * IOMAP_F_SIZE_CHANGED: indicates to the iomap_end method that the file size + * has changed as the result of this write operation. * - * IOMAP_F_SIZE_CHANGED indicates to the iomap_end method that the file size - * has changed as the result of this write operation. + * * IOMAP_F_STALE: indicates that the iomap is not valid any longer and the file + * range it covers needs to be remapped by the high level before the + * operation can proceed. * - * IOMAP_F_STALE indicates that the iomap is not valid any longer and the file - * range it covers needs to be remapped by the high level before the operation - * can proceed. + * * IOMAP_F_PRIVATE: Flags from 0x1000 up are for file system specific usage */ #define IOMAP_F_SIZE_CHANGED (1U << 8) #define IOMAP_F_STALE (1U << 9) - -/* - * Flags from 0x1000 up are for file system specific usage: - */ #define IOMAP_F_PRIVATE (1U << 12) @@ -124,73 +152,119 @@ static inline bool iomap_inline_data_valid(const struct iomap *iomap) return iomap->length <= PAGE_SIZE - offset_in_page(iomap->inline_data); } -/* - * When a filesystem sets folio_ops in an iomap mapping it returns, get_folio - * and put_folio will be called for each folio written to. This only applies - * to buffered writes as unbuffered writes will not typically have folios - * associated with them. - * - * When get_folio succeeds, put_folio will always be called to do any - * cleanup work necessary. put_folio is responsible for unlocking and putting - * @folio. 
+/**
+ * struct iomap_folio_ops - buffered writes folio reference count helpers
+ *
+ * A filesystem can optionally set folio_ops in a &struct iomap mapping it
+ * returns to override the default get_folio and put_folio for each folio
+ * written to. This only applies to buffered writes as unbuffered writes will
+ * not typically have folios associated with them.
+ *
+ * @get_folio: iomap defaults to iomap_get_folio() (which calls
+ *	__filemap_get_folio()) if the filesystem did not provide a get folio
+ *	op.
+ *
+ * @put_folio: when get_folio succeeds, put_folio will always be called to do
+ *	any cleanup work necessary. put_folio is responsible for unlocking and
+ *	putting @folio.
+ *
+ * @iomap_valid: check that the cached iomap still maps correctly to the
+ *	filesystem's internal extent map. FS internal extent maps can change
+ *	while iomap is iterating a cached iomap, so this hook allows iomap to
+ *	detect that the iomap needs to be refreshed during a long running
+ *	write operation.
+ *
+ *	The filesystem can store internal state (e.g. a sequence number) in
+ *	iomap->validity_cookie when the iomap is first mapped to be able to
+ *	detect changes between mapping time and whenever .iomap_valid() is
+ *	called.
+ *
+ *	This is called with the folio over the specified file position held
+ *	locked by the iomap code. This is useful for filesystems that have
+ *	dynamic mappings (e.g. anything other than zonefs). An example reason
+ *	as to why this is necessary is writeback doesn't take the vfs locks.
  */
 struct iomap_folio_ops {
 	struct folio *(*get_folio)(struct iomap_iter *iter, loff_t pos,
 			unsigned len);
 	void (*put_folio)(struct inode *inode, loff_t pos, unsigned copied,
 			struct folio *folio);
-
-	/*
-	 * Check that the cached iomap still maps correctly to the filesystem's
-	 * internal extent map.  FS internal extent maps can change while iomap
-	 * is iterating a cached iomap, so this hook allows iomap to detect that
-	 * the iomap needs to be refreshed during a long running write
-	 * operation.
-	 *
-	 * The filesystem can store internal state (e.g. a sequence number) in
-	 * iomap->validity_cookie when the iomap is first mapped to be able to
-	 * detect changes between mapping time and whenever .iomap_valid() is
-	 * called.
-	 *
-	 * This is called with the folio over the specified file position held
-	 * locked by the iomap code.
-	 */
 	bool (*iomap_valid)(struct inode *inode, const struct iomap *iomap);
 };
 
-/*
- * Flags for iomap_begin / iomap_end.  No flag implies a read.
+/**
+ * DOC: Flags for iomap_begin / iomap_end. No flag implies a read.
+ *
+ * * IOMAP_WRITE: writing, must allocate blocks
+ * * IOMAP_ZERO: zeroing operation, may skip holes
+ * * IOMAP_REPORT: report extent status, e.g. FIEMAP
+ * * IOMAP_FAULT: mapping for page fault
+ * * IOMAP_DIRECT: direct I/O
+ * * IOMAP_NOWAIT: do not block
+ * * IOMAP_OVERWRITE_ONLY: only pure overwrites allowed
+ * * IOMAP_UNSHARE: unshare_file_range
+ * * IOMAP_DAX: DAX mapping
  */
-#define IOMAP_WRITE		(1 << 0) /* writing, must allocate blocks */
-#define IOMAP_ZERO		(1 << 1) /* zeroing operation, may skip holes */
-#define IOMAP_REPORT		(1 << 2) /* report extent status, e.g. FIEMAP */
-#define IOMAP_FAULT		(1 << 3) /* mapping for page fault */
-#define IOMAP_DIRECT		(1 << 4) /* direct I/O */
-#define IOMAP_NOWAIT		(1 << 5) /* do not block */
-#define IOMAP_OVERWRITE_ONLY	(1 << 6) /* only pure overwrites allowed */
-#define IOMAP_UNSHARE		(1 << 7) /* unshare_file_range */
+#define IOMAP_WRITE		(1 << 0)
+#define IOMAP_ZERO		(1 << 1)
+#define IOMAP_REPORT		(1 << 2)
+#define IOMAP_FAULT		(1 << 3)
+#define IOMAP_DIRECT		(1 << 4)
+#define IOMAP_NOWAIT		(1 << 5)
+#define IOMAP_OVERWRITE_ONLY	(1 << 6)
+#define IOMAP_UNSHARE		(1 << 7)
 #ifdef CONFIG_FS_DAX
-#define IOMAP_DAX		(1 << 8) /* DAX mapping */
+#define IOMAP_DAX		(1 << 8)
 #else
 #define IOMAP_DAX		0
 #endif /* CONFIG_FS_DAX */
 
+/**
+ * struct iomap_ops - IO interface specific operations
+ *
+ * A filesystem must provide a &struct iomap_ops to deal with beginning an IO
+ * operation, iomap_begin(), and ending an IO operation on a block range,
+ * iomap_end(). A specialized iomap helper is then called depending on the
+ * filesystem's or the VFS's needs.
+ *
+ * For example iomap_dio_rw() would be used by a filesystem when doing a
+ * block range read or write operation with direct IO. In this case your
+ * filesystem's respective &struct file_operations.write_iter() would
+ * eventually call iomap_dio_rw().
+ *
+ * For buffered IO a filesystem would use iomap_file_buffered_write() on the
+ * same &struct file_operations.write_iter(). But that is not the only
+ * situation in which a filesystem would deal with buffered writes, you could
+ * also use buffered writes when a filesystem has to deal with &struct
+ * file_operations.fallocate(). However fallocate() can be used for *zeroing*
+ * or for *truncation* purposes. A respective iomap_zero_range() would be used
+ * for *zeroing* and iomap_truncate_page() would be used for *truncation*.
+ *
+ * Experience with adopting iomap on filesystems has shown that the
+ * filesystem implementation of these operations can be simplified
+ * considerably if one &struct iomap_ops is provided per major filesystem IO
+ * operation:
+ *
+ * * buffered io
+ * * direct io
+ * * DAX io
+ * * fiemap with extended attributes (``FIEMAP_FLAG_XATTR``)
+ * * lseek
+ *
+ * @iomap_begin: return the existing mapping at pos, or reserve space starting
+ *	at pos for up to length, as long as we can do it as a single mapping.
+ *	The actual length is returned in iomap->length. The &struct iomap
+ *	@iomap must always be set. The &struct iomap @srcmap should be set if
+ *	the range is CoW.
+ *
+ * @iomap_end: commit and/or unreserve space previously allocated using
+ *	iomap_begin. Written indicates the length of the successful write
+ *	operation which needs to be committed, while the rest needs to be
+ *	unreserved. Written might be zero if no data was written.
+ */
 struct iomap_ops {
-	/*
-	 * Return the existing mapping at pos, or reserve space starting at
-	 * pos for up to length, as long as we can do it as a single mapping.
-	 * The actual length is returned in iomap->length.
-	 */
 	int (*iomap_begin)(struct inode *inode, loff_t pos, loff_t length,
 			unsigned flags, struct iomap *iomap,
 			struct iomap *srcmap);
 
-	/*
-	 * Commit and/or unreserve space previous allocated using iomap_begin.
-	 * Written indicates the length of the successful write operation which
-	 * needs to be commited, while the rest needs to be unreserved.
-	 * Written might be zero if no data was written.
-	 */
 	int (*iomap_end)(struct inode *inode, loff_t pos, loff_t length,
 			ssize_t written, unsigned flags, struct iomap *iomap);
 };
 
@@ -207,6 +281,7 @@ struct iomap_ops {
  * @flags: Zero or more of the iomap_begin flags above.
 * @iomap: Map describing the I/O iteration
  * @srcmap: Source map for COW operations
+ * @private: internal use
  */
 struct iomap_iter {
 	struct inode *inode;
@@ -241,7 +316,7 @@ static inline u64 iomap_length(const struct iomap_iter *iter)
  * @i: iteration structure
  *
  * Write operations on file systems with reflink support might require a
- * source and a destination map. This function retourns the source map
+ * source and a destination map. This function returns the source map
  * for a given operation, which may or may no be identical to the destination
  * map in &i->iomap.
  */
@@ -281,42 +356,52 @@ loff_t iomap_seek_data(struct inode *inode, loff_t offset,
 sector_t iomap_bmap(struct address_space *mapping, sector_t bno,
 		const struct iomap_ops *ops);
 
-/*
- * Structure for writeback I/O completions.
+/**
+ * struct iomap_ioend - for writeback I/O completions
+ *
+ * @io_list: next ioend in chain
+ * @io_type: type of the mapping (IOMAP_* mapping type)
+ * @io_flags: IOMAP_F_*
+ * @io_folios: folios added to ioend
+ * @io_inode: file being written to
+ * @io_size: size of the extent
+ * @io_offset: offset in the file
+ * @io_sector: start sector of ioend
+ * @io_bio: bio being built
+ * @io_inline_bio: embedded bio; MUST BE LAST!
  */
 struct iomap_ioend {
-	struct list_head	io_list;	/* next ioend in chain */
+	struct list_head	io_list;
 	u16			io_type;
-	u16			io_flags;	/* IOMAP_F_* */
-	u32			io_folios;	/* folios added to ioend */
-	struct inode		*io_inode;	/* file being written to */
-	size_t			io_size;	/* size of the extent */
-	loff_t			io_offset;	/* offset in the file */
-	sector_t		io_sector;	/* start sector of ioend */
-	struct bio		*io_bio;	/* bio being built */
-	struct bio		io_inline_bio;	/* MUST BE LAST! */
+	u16			io_flags;
+	u32			io_folios;
+	struct inode		*io_inode;
+	size_t			io_size;
+	loff_t			io_offset;
+	sector_t		io_sector;
+	struct bio		*io_bio;
+	struct bio		io_inline_bio;
 };
 
+/**
+ * struct iomap_writeback_ops - used for writeback
+ *
+ * This structure supports a filesystem's
+ * ``struct address_space_operations.writepages()`` implementation, for
+ * writeback.
+ *
+ * @map_blocks: required, maps the blocks so that writeback can be performed
+ *	on the range starting at offset.
+ * @prepare_ioend: optional, allows the file system to perform actions just
+ *	before submitting the bio and/or override the bio end_io handler for
+ *	complex operations like copy on write extent manipulation or unwritten
+ *	extent conversions.
+ * @discard_folio: optional, allows the file system to discard state on a
+ *	folio where we failed to submit any I/O.
+ */
 struct iomap_writeback_ops {
-	/*
-	 * Required, maps the blocks so that writeback can be performed on
-	 * the range starting at offset.
-	 */
	int (*map_blocks)(struct iomap_writepage_ctx *wpc, struct inode *inode,
				loff_t offset);
-
-	/*
-	 * Optional, allows the file systems to perform actions just before
-	 * submitting the bio and/or override the bio end_io handler for complex
-	 * operations like copy on write extent manipulation or unwritten extent
-	 * conversions.
-	 */
 	int (*prepare_ioend)(struct iomap_ioend *ioend, int status);
-
-	/*
-	 * Optional, allows the file system to discard state on a page where
-	 * we failed to submit any I/O.
-	 */
 	void (*discard_folio)(struct folio *folio, loff_t pos);
 };
 
@@ -334,26 +419,33 @@ int iomap_writepages(struct address_space *mapping,
 		struct writeback_control *wbc, struct iomap_writepage_ctx *wpc,
 		const struct iomap_writeback_ops *ops);
 
-/*
- * Flags for direct I/O ->end_io:
+/**
+ * DOC: Flags for direct I/O ->end_io
+ *
+ * * IOMAP_DIO_UNWRITTEN: covers unwritten extent(s)
+ * * IOMAP_DIO_COW: covers COW extent(s)
  */
-#define IOMAP_DIO_UNWRITTEN	(1 << 0)	/* covers unwritten extent(s) */
-#define IOMAP_DIO_COW		(1 << 1)	/* covers COW extent(s) */
+#define IOMAP_DIO_UNWRITTEN	(1 << 0)
+#define IOMAP_DIO_COW		(1 << 1)
 
+/**
+ * struct iomap_dio_ops - used for direct IO
+ *
+ * @end_io: called on completion of the direct IO
+ * @submit_io: optional, submits the bio. Filesystems wishing to attach
+ *	private information to a direct io bio must provide a ->submit_io
+ *	method that attaches the additional information to the bio and changes
+ *	the ->bi_end_io callback to a custom function. This function should,
+ *	at a minimum, perform any relevant post-processing of the bio and end
+ *	with a call to iomap_dio_bio_end_io.
+ * @bio_set: custom bio set to allocate direct IO bios from
+ */
 struct iomap_dio_ops {
 	int (*end_io)(struct kiocb *iocb, ssize_t size, int error,
 		      unsigned flags);
 	void (*submit_io)(const struct iomap_iter *iter, struct bio *bio,
 		          loff_t file_offset);
-
-	/*
-	 * Filesystems wishing to attach private information to a direct io bio
-	 * must provide a ->submit_io method that attaches the additional
-	 * information to the bio and changes the ->bi_end_io callback to a
-	 * custom function.  This function should, at a minimum, perform any
-	 * relevant post-processing of the bio and end with a call to
-	 * iomap_dio_bio_end_io.
-	 */
 	struct bio_set *bio_set;
 };