Message ID | 20230612135228.10702-5-sergei.shtepa@veeam.com (mailing list archive) |
---|---|
State | New, archived |
Series | blksnap - block devices snapshots module |
On Mon, Jun 12, 2023 at 03:52:21PM +0200, Sergei Shtepa wrote:
> The header file contains a set of declarations, structures and control
> requests (ioctl) that allows to manage the module from the user space.
>
> Co-developed-by: Christoph Hellwig <hch@infradead.org>
> Signed-off-by: Christoph Hellwig <hch@infradead.org>
> Tested-by: Donald Buczek <buczek@molgen.mpg.de>
> Signed-off-by: Sergei Shtepa <sergei.shtepa@veeam.com>
> ---
>  MAINTAINERS                  |   1 +
>  include/uapi/linux/blksnap.h | 421 +++++++++++++++++++++++++++++++++++
>  2 files changed, 422 insertions(+)
>  create mode 100644 include/uapi/linux/blksnap.h

.....

> +/**
> + * struct blksnap_snapshot_append_storage - Argument for the
> + *	&IOCTL_BLKSNAP_SNAPSHOT_APPEND_STORAGE control.
> + *
> + * @id:
> + *	Snapshot ID.
> + * @bdev_path:
> + *	Device path string buffer.
> + * @bdev_path_size:
> + *	Device path string buffer size.
> + * @count:
> + *	Size of @ranges in the number of &struct blksnap_sectors.
> + * @ranges:
> + *	Pointer to the array of &struct blksnap_sectors.
> + */
> +struct blksnap_snapshot_append_storage {
> +	struct blksnap_uuid id;
> +	__u64 bdev_path;
> +	__u32 bdev_path_size;
> +	__u32 count;
> +	__u64 ranges;
> +};
> +
> +/**
> + * define IOCTL_BLKSNAP_SNAPSHOT_APPEND_STORAGE - Append storage to the
> + *	difference storage of the snapshot.
> + *
> + * The snapshot difference storage can be set either before or after creating
> + * the snapshot images. This allows to dynamically expand the difference
> + * storage while holding the snapshot.
> + *
> + * Return: 0 if succeeded, negative errno otherwise.
> + */
> +#define IOCTL_BLKSNAP_SNAPSHOT_APPEND_STORAGE \
> +	_IOW(BLKSNAP, blksnap_ioctl_snapshot_append_storage, \
> +	     struct blksnap_snapshot_append_storage)

That's an API I'm extremely uncomfortable with. We've learnt the
lesson *many times* that userspace physical mappings of underlying
file storage are unreliable.

i.e. This is reliant on userspace telling the kernel the physical
mapping of the filesystem file to block device LBA space and then
providing a guarantee (somehow) that the mapping will always remain
unchanged. i.e. It's reliant on passing FIEMAP data from the
filesystem to userspace and then back into the kernel without it
becoming stale and somehow providing a guarantee that nothing (not
even the filesystem doing internal garbage collection) will change
it.

It is reliant on userspace detecting shared blocks in files and
avoiding them; it's reliant on userspace never being able to read,
write or modify that file; it's reliant on the -filesystem- never
modifying the layout of that file; it's even reliant on an internal
filesystem state that has to be locked down before the block mapping
can be delegated to a third party for IO control.

Further, we can't allow userspace to have any read access to the
snapshot file even after it is no longer in use by the blksnap
driver. The contents of the file will span multiple security
contexts, contain sensitive data, etc and so its contents must never
be exposed to userspace. We cannot rely on userspace to delete it
safely after use and hence we have to protect its contents from
exposure to userspace, too.

We already have a mechanism that provides all these guarantees to a
third party kernel subsystem: swap files.

We already have a trusted path in the kernel to allow internal block
mapping of a swap file to be retrieved by the mm subsystem. We also
have an inode flag that protects such files against access and
modification from anything other than internal kernel IO paths. We
also allow them to be allocated as unwritten extents using
fallocate() and they are never converted to written whilst in use as
a swapfile. Hence the contents of them cannot be exposed to userspace
even if the swapfile flag is removed and owner/permission changes are
made to the file after it is released by the kernel.

Swap files are an intrinsically safe mechanism for delegating fixed
file mappings to kernel subsystems that have requirements for secure,
trusted storage that userspace cannot tamper with.

I note that the code behind the
IOCTL_BLKSNAP_SNAPSHOT_APPEND_STORAGE ends up in
diff_storage_add_range(), which allocates an extent structure for
each range and links it into a linked list for later use.

This is effectively the same structure that the mm swapfile code
uses. It provides a swap_info_struct and a struct file to the
filesystem via aops->swap_activate. The filesystem then iterates the
extent list for the file and calls add_swap_extent() for each
physical range in the file. The mm code then allocates a new extent
structure for the range and links it into the extent rbtree in the
swap_info_struct. This is the mapping it uses later on in the IO
path.

Adding a similar, more generic mapping operation that allows a
private structure and a callback to be provided would allow the
filesystem to provide this callback directly to subsystems like
blksnap. Essentially diff_storage_add_range() becomes the iterator
callback for blksnap. This makes the whole "userspace provides the
mapping" problem go away and we can use the swapfile mechanisms to
provide all the other guarantees the kernel needs to ensure it can
trust the contents and mappings of the blksnap snapshot files....

Thoughts?

-Dave.
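The callback arrangement described above can be sketched in plain userspace C. This is only a model of the control flow, not kernel code: every example_* name is hypothetical, and the real swap path (aops->swap_activate, add_swap_extent()) and blksnap's diff_storage_add_range() have different signatures. The point it illustrates is that the owner of the mapping walks its own extents and hands each one to the consumer through a callback with a private pointer, so the mapping never round-trips through userspace.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for one physical range of a file, in sectors. */
struct example_extent {
	uint64_t start_sector;
	uint64_t nr_sectors;
};

/* The consumer receives each range together with its private data. */
typedef int (*example_extent_fn)(void *priv, uint64_t start, uint64_t nr);

/*
 * Hypothetical "filesystem side": walks the extents it owns and passes
 * each one to the consumer, stopping at the first error.
 */
static int example_iterate_extents(const struct example_extent *ext, size_t n,
				   example_extent_fn fn, void *priv)
{
	for (size_t i = 0; i < n; i++) {
		int ret = fn(priv, ext[i].start_sector, ext[i].nr_sectors);

		if (ret)
			return ret;
	}
	return 0;
}

/* Hypothetical "blksnap side": a linked list of difference storage ranges. */
struct example_range {
	uint64_t start;
	uint64_t nr;
	struct example_range *next;
};

struct example_diff_storage {
	struct example_range *head;
	uint64_t capacity;	/* total sectors collected so far */
};

/* Plays the role the message assigns to diff_storage_add_range(). */
static int example_add_range(void *priv, uint64_t start, uint64_t nr)
{
	struct example_diff_storage *ds = priv;
	struct example_range *r = malloc(sizeof(*r));

	if (!r)
		return -ENOMEM;
	r->start = start;
	r->nr = nr;
	r->next = ds->head;	/* push onto the front of the list */
	ds->head = r;
	ds->capacity += nr;
	return 0;
}
```

A caller would pass example_add_range and a pointer to its example_diff_storage into example_iterate_extents(); the iterator never needs to know what the consumer does with each range.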
On Wed, Jun 14, 2023 at 08:25:15AM +1000, Dave Chinner wrote:
> > + * Return: 0 if succeeded, negative errno otherwise.
> > + */
> > +#define IOCTL_BLKSNAP_SNAPSHOT_APPEND_STORAGE \
> > +	_IOW(BLKSNAP, blksnap_ioctl_snapshot_append_storage, \
> > +	     struct blksnap_snapshot_append_storage)
>
> That's an API I'm extremely uncomfortable with. We've learnt the
> lesson *many times* that userspace physical mappings of underlying
> file storage are unreliable.
>
> i.e. This is reliant on userspace telling the kernel the physical
> mapping of the filesystem file to block device LBA space and then
> providing a guarantee (somehow) that the mapping will always remain
> unchanged. i.e. It's reliant on passing FIEMAP data from the
> filesystem to userspace and then back into the kernel without it
> becoming stale and somehow providing a guarantee that nothing (not
> even the filesystem doing internal garbage collection) will change
> it.

Hmm, I never thought of this API as used on files that somewhere
had a logical to physical mapping applied to them.

Sergey, is that the intended use case? If so we really should
be going through the file system using direct I/O.
On 6/14/23 08:26, Christoph Hellwig wrote:
> On Wed, Jun 14, 2023 at 08:25:15AM +1000, Dave Chinner wrote:
>>> + * Return: 0 if succeeded, negative errno otherwise.
>>> + */
>>> +#define IOCTL_BLKSNAP_SNAPSHOT_APPEND_STORAGE \
>>> +	_IOW(BLKSNAP, blksnap_ioctl_snapshot_append_storage, \
>>> +	     struct blksnap_snapshot_append_storage)
>> That's an API I'm extremely uncomfortable with. We've learnt the
>> lesson *many times* that userspace physical mappings of underlying
>> file storage are unreliable.
>>
>> i.e. This is reliant on userspace telling the kernel the physical
>> mapping of the filesystem file to block device LBA space and then
>> providing a guarantee (somehow) that the mapping will always remain
>> unchanged. i.e. It's reliant on passing FIEMAP data from the
>> filesystem to userspace and then back into the kernel without it
>> becoming stale and somehow providing a guarantee that nothing (not
>> even the filesystem doing internal garbage collection) will change
>> it.
> Hmm, I never thought of this API as used on files that somewhere
> had a logical to physical mapping applied to them.
>
> Sergey, is that the intended use case? If so we really should
> be going through the file system using direct I/O.
>

Hi!
Thank you, Dave, for such a detailed comment.

Yes, everything is really as you described.

This code worked quite successfully for the veeamsnap module, on the
basis of which blksnap was created. Indeed, such an allocation of an
area on a block device using a file does not look safe.

We've already discussed this with Donald Buczek <buczek@molgen.mpg.de>.
Link: https://github.com/veeam/blksnap/issues/57#issuecomment-1576569075
And I have planned work on moving to a more secure ioctl in the future.
Link: https://github.com/veeam/blksnap/issues/61

Now, thanks to Dave, it becomes clear to me how to solve this problem best.
swapfile is a good example of how to do it right.

Fixing this vulnerability will entail transferring the algorithm for
allocating the difference storage from the user-space to the blksnap code.
The changes are quite significant. The UAPI will be changed.

So I agree that the blksnap module is not good enough for upstream yet.
On Wed, Jun 14, 2023 at 11:26:20AM +0200, Sergei Shtepa wrote:
> This code worked quite successfully for the veeamsnap module, on the
> basis of which blksnap was created. Indeed, such an allocation of an
> area on a block device using a file does not look safe.
>
> We've already discussed this with Donald Buczek <buczek@molgen.mpg.de>.
> Link: https://github.com/veeam/blksnap/issues/57#issuecomment-1576569075
> And I have planned work on moving to a more secure ioctl in the future.
> Link: https://github.com/veeam/blksnap/issues/61
>
> Now, thanks to Dave, it becomes clear to me how to solve this problem best.
> swapfile is a good example of how to do it right.

I don't actually think swapfile is a very good idea, in fact the Linux
swap code in general is not a very good place to look for inspiration
:)

IFF the usage is always to have a whole file for the diff storage the
overall API is very simple - just pass a fd to the kernel for the area,
and then use in-kernel direct I/O on it. Now if that file should also
be able to reside on the same file system that the snapshot is taken
of things get a little more complicated, because writes to it also need
to automatically set the BIO_REFFED flag. I have some ideas for that
and will share some draft code with you.
On 6/14/23 16:07, Christoph Hellwig wrote: > I don't actually think swapfile is a very good idea, in fact the Linux > swap code in general is not a very good place to look for inspirations >
On Wed, Jun 14, 2023 at 07:07:16AM -0700, Christoph Hellwig wrote:
> On Wed, Jun 14, 2023 at 11:26:20AM +0200, Sergei Shtepa wrote:
> > This code worked quite successfully for the veeamsnap module, on the
> > basis of which blksnap was created. Indeed, such an allocation of an
> > area on a block device using a file does not look safe.
> >
> > We've already discussed this with Donald Buczek <buczek@molgen.mpg.de>.
> > Link: https://github.com/veeam/blksnap/issues/57#issuecomment-1576569075
> > And I have planned work on moving to a more secure ioctl in the future.
> > Link: https://github.com/veeam/blksnap/issues/61
> >
> > Now, thanks to Dave, it becomes clear to me how to solve this problem best.
> > swapfile is a good example of how to do it right.
>
> I don't actually think swapfile is a very good idea, in fact the Linux
> swap code in general is not a very good place to look for inspirations
> :)

Yeah, the swapfile implementation isn't very nice, I was really just
using it as an example of how we can implement the requirements of
block mapping delegation in a safe manner to a kernel subsystem.

I think the important part is the swapfile inode flag, because that
is what keeps userspace from being able to screw with the file while
the kernel is using it and allows us to do read/write IO to
unwritten extents without converting them to written...

> IFF the usage is always to have a whole file for the diff storage the
> over all API is very simple - just pass a fd to the kernel for the area,
> and then use in-kernel direct I/O on it.

Yeah, I was thinking a fd is a better choice for the UAPI as it
frees up the kernel implementation, and it doesn't need us to pass a
separate bdev identifier in the ioctl. It also means we can pass a
regular file or a block device and the kernel code doesn't need to
care that they are different.

If you think direct IO is a better idea, then I have no objection to
that - I haven't looked into the implementation that deeply at this
point. I wanted to get an understanding of how all the pieces went
together first, so all I've read is the documentation and looked at
the UAPI.

I made a leap from that: the documentation keeps talking about using
files in the filesystem for the difference storage, but the only UAPI
for telling the kernel about storage regions it can use is this
physical bdev LBA mapping ioctl. Hence if file storage is being
used....

> Now if that file should also
> be able to reside on the same file system that the snapshot is taken
> of things get a little more complicated, because writes to it also need
> to automatically set the BIO_REFFED flag. I have some ideas for that
> and will share some draft code with you.

Cool, I look forward to the updates; I know of a couple of
applications that could make use of this functionality right
away....

Cheers,

Dave.
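To make the fd-based idea concrete, here is a minimal sketch of what the ioctl argument might shrink to if the path string and the sector ranges were replaced by a file descriptor. The struct, its name and its layout are assumptions made purely for illustration; nothing here comes from a posted patch, and a local UUID type is used so the fragment compiles on its own.

```c
#include <linux/types.h>

#define EXAMPLE_UUID_SIZE 16

/* Local copy of the 16-byte identifier, mirroring struct blksnap_uuid. */
struct example_blksnap_uuid {
	__u8 b[EXAMPLE_UUID_SIZE];
};

/*
 * Hypothetical argument: the snapshot ID plus an open file descriptor.
 * The kernel would resolve the fd itself, so userspace supplies no device
 * path and no physical sector ranges.
 */
struct example_append_storage_fd {
	struct example_blksnap_uuid id;
	__u32 fd;
};

/* Fill the hypothetical argument for a given snapshot and open fd. */
static struct example_append_storage_fd
example_make_append_arg(const struct example_blksnap_uuid *id, int fd)
{
	struct example_append_storage_fd arg = { .id = *id, .fd = (__u32)fd };

	return arg;
}
```

With this shape the same argument could describe a regular file or a block device, which is the property the message above highlights.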
On 2023-06-12 15:52:21+0200, Sergei Shtepa wrote:

> [..]
> diff --git a/include/uapi/linux/blksnap.h b/include/uapi/linux/blksnap.h
> new file mode 100644
> index 000000000000..2fe3f2a43bc5
> --- /dev/null
> +++ b/include/uapi/linux/blksnap.h
> @@ -0,0 +1,421 @@
> [..]
> +/**
> + * struct blksnap_snapshotinfo - Result for the command
> + *	&blkfilter_ctl_blksnap.blkfilter_ctl_blksnap_snapshotinfo.
> + *
> + * @error_code:
> + *	Zero if there were no errors while holding the snapshot.
> + *	The error code -ENOSPC means that while holding the snapshot, a snapshot
> + *	overflow situation has occurred. Other error codes mean other reasons
> + *	for failure.
> + *	The error code is reset when the device is added to a new snapshot.
> + * @image:
> + *	If the snapshot was taken, it stores the block device name of the
> + *	image, or empty string otherwise.
> + */
> +struct blksnap_snapshotinfo {
> +	__s32 error_code;
> +	__u8 image[IMAGE_DISK_NAME_LEN];

Nitpick:

Seems a bit weird to have a signed error code that is always negative.
Couldn't this be an unsigned number or directly return the error from
the ioctl() itself?

> +};
> +
> +/**
> + * DOC: Interface for managing snapshots
> + *
> + * Control commands that are transmitted through the blksnap module interface.
> + */
> +enum blksnap_ioctl {
> +	blksnap_ioctl_version,
> +	blksnap_ioctl_snapshot_create,
> +	blksnap_ioctl_snapshot_destroy,
> +	blksnap_ioctl_snapshot_append_storage,
> +	blksnap_ioctl_snapshot_take,
> +	blksnap_ioctl_snapshot_collect,
> +	blksnap_ioctl_snapshot_wait_event,
> +};
> +
> +/**
> + * struct blksnap_version - Module version.
> + *
> + * @major:
> + *	Version major part.
> + * @minor:
> + *	Version minor part.
> + * @revision:
> + *	Revision number.
> + * @build:
> + *	Build number. Should be zero.
> + */
> +struct blksnap_version {
> +	__u16 major;
> +	__u16 minor;
> +	__u16 revision;
> +	__u16 build;
> +};
> +
> +/**
> + * define IOCTL_BLKSNAP_VERSION - Get module version.
> + *
> + * The version may increase when the API changes. But linking the user space
> + * behavior to the version code does not seem to be a good idea.
> + * To ensure backward compatibility, API changes should be made by adding new
> + * ioctl without changing the behavior of existing ones. The version should be
> + * used for logs.
> + *
> + * Return: 0 if succeeded, negative errno otherwise.
> + */
> +#define IOCTL_BLKSNAP_VERSION \
> +	_IOW(BLKSNAP, blksnap_ioctl_version, struct blksnap_version)

Shouldn't this be _IOR()?

  "_IOW means userland is writing and kernel is reading. _IOR
  means userland is reading and kernel is writing."

The other ioctl definitions seem to need a review, too.
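The direction question can be checked mechanically, because the direction is encoded in the command number itself. The sketch below uses the standard _IOC_DIR(), _IOC_READ and _IOC_WRITE macros from <linux/ioctl.h>. The struct is a local copy of struct blksnap_version from the patch, and the EXAMPLE_* macros and helper functions are names invented for this illustration.

```c
#include <linux/ioctl.h>
#include <linux/types.h>

/* Local copy of struct blksnap_version from the patch under review. */
struct blksnap_version {
	__u16 major;
	__u16 minor;
	__u16 revision;
	__u16 build;
};

#define BLKSNAP 'V'

/* blksnap_ioctl_version is the first enumerator, so its number is 0. */
#define EXAMPLE_VERSION_AS_POSTED \
	_IOW(BLKSNAP, 0, struct blksnap_version)
#define EXAMPLE_VERSION_SUGGESTED \
	_IOR(BLKSNAP, 0, struct blksnap_version)

/* _IOC_READ: userland reads, so the kernel copies data out to userspace. */
static int example_kernel_copies_out(unsigned int cmd)
{
	return (_IOC_DIR(cmd) & _IOC_READ) != 0;
}

/* _IOC_WRITE: userland writes, so the kernel copies data in. */
static int example_kernel_copies_in(unsigned int cmd)
{
	return (_IOC_DIR(cmd) & _IOC_WRITE) != 0;
}
```

For a "get version" call the kernel fills the struct, so example_kernel_copies_out() should hold; that is true only for the _IOR() variant. The two encodings also produce different command numbers, so changing the macro changes the ABI.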
Hi!
Thanks for the review.

On 7/17/23 20:57, Thomas Weißschuh wrote:
> On 2023-06-12 15:52:21+0200, Sergei Shtepa wrote:
>
>> [..]
>> diff --git a/include/uapi/linux/blksnap.h b/include/uapi/linux/blksnap.h
>> new file mode 100644
>> index 000000000000..2fe3f2a43bc5
>> --- /dev/null
>> +++ b/include/uapi/linux/blksnap.h
>> @@ -0,0 +1,421 @@
>> [..]
>> +/**
>> + * struct blksnap_snapshotinfo - Result for the command
>> + *	&blkfilter_ctl_blksnap.blkfilter_ctl_blksnap_snapshotinfo.
>> + *
>> + * @error_code:
>> + *	Zero if there were no errors while holding the snapshot.
>> + *	The error code -ENOSPC means that while holding the snapshot, a snapshot
>> + *	overflow situation has occurred. Other error codes mean other reasons
>> + *	for failure.
>> + *	The error code is reset when the device is added to a new snapshot.
>> + * @image:
>> + *	If the snapshot was taken, it stores the block device name of the
>> + *	image, or empty string otherwise.
>> + */
>> +struct blksnap_snapshotinfo {
>> +	__s32 error_code;
>> +	__u8 image[IMAGE_DISK_NAME_LEN];
> Nitpick:
>
> Seems a bit weird to have a signed error code that is always negative.
> Couldn't this be an unsigned number or directly return the error from
> the ioctl() itself?

Yes, it's a good idea to pass the error code as an unsigned value.
And this positive value can be passed in case of successful execution
of ioctl(), but I would not like to put different error signs in one value.

>
>> +};
>> +
>> +/**
>> + * DOC: Interface for managing snapshots
>> + *
>> + * Control commands that are transmitted through the blksnap module interface.
>> + */
>> +enum blksnap_ioctl {
>> +	blksnap_ioctl_version,
>> +	blksnap_ioctl_snapshot_create,
>> +	blksnap_ioctl_snapshot_destroy,
>> +	blksnap_ioctl_snapshot_append_storage,
>> +	blksnap_ioctl_snapshot_take,
>> +	blksnap_ioctl_snapshot_collect,
>> +	blksnap_ioctl_snapshot_wait_event,
>> +};
>> +
>> +/**
>> + * struct blksnap_version - Module version.
>> + *
>> + * @major:
>> + *	Version major part.
>> + * @minor:
>> + *	Version minor part.
>> + * @revision:
>> + *	Revision number.
>> + * @build:
>> + *	Build number. Should be zero.
>> + */
>> +struct blksnap_version {
>> +	__u16 major;
>> +	__u16 minor;
>> +	__u16 revision;
>> +	__u16 build;
>> +};
>> +
>> +/**
>> + * define IOCTL_BLKSNAP_VERSION - Get module version.
>> + *
>> + * The version may increase when the API changes. But linking the user space
>> + * behavior to the version code does not seem to be a good idea.
>> + * To ensure backward compatibility, API changes should be made by adding new
>> + * ioctl without changing the behavior of existing ones. The version should be
>> + * used for logs.
>> + *
>> + * Return: 0 if succeeded, negative errno otherwise.
>> + */
>> +#define IOCTL_BLKSNAP_VERSION \
>> +	_IOW(BLKSNAP, blksnap_ioctl_version, struct blksnap_version)
> Shouldn't this be _IOR()?
>
>   "_IOW means userland is writing and kernel is reading. _IOR
>   means userland is reading and kernel is writing."
>
> The other ioctl definitions seem to need a review, too.
>

Yeah. I need to replace _IOR and _IOW in all ioctl.
Thanks!
On Tue, Jul 18, 2023 at 11:53:54AM +0200, Sergei Shtepa wrote:
> > Seems a bit weird to have a signed error code that is always negative.
> > Couldn't this be an unsigned number or directly return the error from
> > the ioctl() itself?
>
> Yes, it's a good idea to pass the error code as an unsigned value.
> And this positive value can be passed in case of successful execution
> of ioctl(), but I would not like to put different error signs in one value.

Linux tends to use negative error values in basically all interfaces. I
think it will be less confusing to stick to that.
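For illustration, this is roughly how a userspace consumer might interpret the error_code field if it keeps the negative-errno convention documented in the patch. The struct is a local copy of struct blksnap_snapshotinfo; the enum and the helper are hypothetical names, and the three-way split is just one reading of the kernel-doc text (zero, -ENOSPC for overflow, anything else a failure).

```c
#include <errno.h>
#include <linux/types.h>

#define IMAGE_DISK_NAME_LEN 32

/* Local copy of struct blksnap_snapshotinfo from the patch. */
struct blksnap_snapshotinfo {
	__s32 error_code;
	__u8 image[IMAGE_DISK_NAME_LEN];
};

/* Hypothetical classification used only in this sketch. */
enum example_snapshot_state {
	EXAMPLE_SNAPSHOT_OK,
	EXAMPLE_SNAPSHOT_OVERFLOW,
	EXAMPLE_SNAPSHOT_FAILED,
};

static enum example_snapshot_state
example_classify(const struct blksnap_snapshotinfo *info)
{
	if (info->error_code == 0)
		return EXAMPLE_SNAPSHOT_OK;
	/* Per the kernel-doc comment, -ENOSPC signals snapshot overflow. */
	if (info->error_code == -ENOSPC)
		return EXAMPLE_SNAPSHOT_OVERFLOW;
	return EXAMPLE_SNAPSHOT_FAILED;
}
```

Keeping the stored value negative means the same comparison against -ENOSPC works for this field and for values returned along kernel-internal paths, which is the consistency argument made above.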
diff --git a/MAINTAINERS b/MAINTAINERS
index c7dabe785cf1..76b14ad604dc 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3594,6 +3594,7 @@ M:	Sergei Shtepa <sergei.shtepa@veeam.com>
 L:	linux-block@vger.kernel.org
 S:	Supported
 F:	Documentation/block/blksnap.rst
+F:	include/uapi/linux/blksnap.h
 
 BLOCK LAYER
 M:	Jens Axboe <axboe@kernel.dk>
diff --git a/include/uapi/linux/blksnap.h b/include/uapi/linux/blksnap.h
new file mode 100644
index 000000000000..2fe3f2a43bc5
--- /dev/null
+++ b/include/uapi/linux/blksnap.h
@@ -0,0 +1,421 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/* Copyright (C) 2023 Veeam Software Group GmbH */
+#ifndef _UAPI_LINUX_BLKSNAP_H
+#define _UAPI_LINUX_BLKSNAP_H
+
+#include <linux/types.h>
+
+#define BLKSNAP_CTL "blksnap-control"
+#define BLKSNAP_IMAGE_NAME "blksnap-image"
+#define BLKSNAP 'V'
+
+/**
+ * DOC: Block device filter interface.
+ *
+ * Control commands that are transmitted through the block device filter
+ * interface.
+ */
+
+/**
+ * enum blkfilter_ctl_blksnap - List of commands for BLKFILTER_CTL ioctl
+ *
+ * @blkfilter_ctl_blksnap_cbtinfo:
+ *	Get CBT information.
+ *	The result of executing the command is a &struct blksnap_cbtinfo.
+ *	Return 0 if succeeded, negative errno otherwise.
+ * @blkfilter_ctl_blksnap_cbtmap:
+ *	Read the CBT map.
+ *	The option passes the &struct blksnap_cbtmap.
+ *	The size of the table can be quite large. Thus, the table is read in
+ *	a loop, in each cycle of which the next offset is set to
+ *	&blksnap_tracker_read_cbt_bitmap.offset.
+ *	Return a count of bytes read if succeeded, negative errno otherwise.
+ * @blkfilter_ctl_blksnap_cbtdirty:
+ *	Set dirty blocks in the CBT map.
+ *	The option passes the &struct blksnap_cbtdirty.
+ *	There are cases when some blocks need to be marked as changed.
+ *	This ioctl allows to do this.
+ *	Return 0 if succeeded, negative errno otherwise.
+ * @blkfilter_ctl_blksnap_snapshotadd:
+ *	Add device to snapshot.
+ *	The option passes the &struct blksnap_snapshotadd.
+ *	Return 0 if succeeded, negative errno otherwise.
+ * @blkfilter_ctl_blksnap_snapshotinfo:
+ *	Get information about snapshot.
+ *	The result of executing the command is a &struct blksnap_snapshotinfo.
+ *	Return 0 if succeeded, negative errno otherwise.
+ */
+enum blkfilter_ctl_blksnap {
+	blkfilter_ctl_blksnap_cbtinfo,
+	blkfilter_ctl_blksnap_cbtmap,
+	blkfilter_ctl_blksnap_cbtdirty,
+	blkfilter_ctl_blksnap_snapshotadd,
+	blkfilter_ctl_blksnap_snapshotinfo,
+};
+
+#ifndef UUID_SIZE
+#define UUID_SIZE 16
+#endif
+
+/**
+ * struct blksnap_uuid - Unique 16-byte identifier.
+ *
+ * @b:
+ *	An array of 16 bytes.
+ */
+struct blksnap_uuid {
+	__u8 b[UUID_SIZE];
+};
+
+/**
+ * struct blksnap_cbtinfo - Result for the command
+ *	&blkfilter_ctl_blksnap.blkfilter_ctl_blksnap_cbtinfo.
+ *
+ * @device_capacity:
+ *	Device capacity in bytes.
+ * @block_size:
+ *	Block size in bytes.
+ * @block_count:
+ *	Number of blocks.
+ * @generation_id:
+ *	Unique identifier of change tracking generation.
+ * @changes_number:
+ *	Current changes number.
+ */
+struct blksnap_cbtinfo {
+	__u64 device_capacity;
+	__u32 block_size;
+	__u32 block_count;
+	struct blksnap_uuid generation_id;
+	__u8 changes_number;
+};
+
+/**
+ * struct blksnap_cbtmap - Option for the command
+ *	&blkfilter_ctl_blksnap.blkfilter_ctl_blksnap_cbtmap.
+ *
+ * @offset:
+ *	Offset from the beginning of the CBT bitmap in bytes.
+ * @length:
+ *	Size of @buffer in bytes.
+ * @buffer:
+ *	Pointer to the buffer for output.
+ */
+struct blksnap_cbtmap {
+	__u32 offset;
+	__u32 length;
+	__u64 buffer;
+};
+
+/**
+ * struct blksnap_sectors - Description of the block device region.
+ *
+ * @offset:
+ *	Offset from the beginning of the disk in sectors.
+ * @count:
+ *	Count of sectors.
+ */
+struct blksnap_sectors {
+	__u64 offset;
+	__u64 count;
+};
+
+/**
+ * struct blksnap_cbtdirty - Option for the command
+ *	&blkfilter_ctl_blksnap.blkfilter_ctl_blksnap_cbtdirty.
+ *
+ * @count:
+ *	Count of elements in the @dirty_sectors.
+ * @dirty_sectors:
+ *	Pointer to the array of &struct blksnap_sectors.
+ */
+struct blksnap_cbtdirty {
+	__u32 count;
+	__u64 dirty_sectors;
+};
+
+/**
+ * struct blksnap_snapshotadd - Option for the command
+ *	&blkfilter_ctl_blksnap.blkfilter_ctl_blksnap_snapshotadd.
+ *
+ * @id:
+ *	ID of the snapshot to which the block device should be added.
+ */
+struct blksnap_snapshotadd {
+	struct blksnap_uuid id;
+};
+
+#define IMAGE_DISK_NAME_LEN 32
+
+/**
+ * struct blksnap_snapshotinfo - Result for the command
+ *	&blkfilter_ctl_blksnap.blkfilter_ctl_blksnap_snapshotinfo.
+ *
+ * @error_code:
+ *	Zero if there were no errors while holding the snapshot.
+ *	The error code -ENOSPC means that while holding the snapshot, a snapshot
+ *	overflow situation has occurred. Other error codes mean other reasons
+ *	for failure.
+ *	The error code is reset when the device is added to a new snapshot.
+ * @image:
+ *	If the snapshot was taken, it stores the block device name of the
+ *	image, or empty string otherwise.
+ */
+struct blksnap_snapshotinfo {
+	__s32 error_code;
+	__u8 image[IMAGE_DISK_NAME_LEN];
+};
+
+/**
+ * DOC: Interface for managing snapshots
+ *
+ * Control commands that are transmitted through the blksnap module interface.
+ */
+enum blksnap_ioctl {
+	blksnap_ioctl_version,
+	blksnap_ioctl_snapshot_create,
+	blksnap_ioctl_snapshot_destroy,
+	blksnap_ioctl_snapshot_append_storage,
+	blksnap_ioctl_snapshot_take,
+	blksnap_ioctl_snapshot_collect,
+	blksnap_ioctl_snapshot_wait_event,
+};
+
+/**
+ * struct blksnap_version - Module version.
+ *
+ * @major:
+ *	Version major part.
+ * @minor:
+ *	Version minor part.
+ * @revision:
+ *	Revision number.
+ * @build:
+ *	Build number. Should be zero.
+ */
+struct blksnap_version {
+	__u16 major;
+	__u16 minor;
+	__u16 revision;
+	__u16 build;
+};
+
+/**
+ * define IOCTL_BLKSNAP_VERSION - Get module version.
+ *
+ * The version may increase when the API changes. But linking the user space
+ * behavior to the version code does not seem to be a good idea.
+ * To ensure backward compatibility, API changes should be made by adding new
+ * ioctl without changing the behavior of existing ones. The version should be
+ * used for logs.
+ *
+ * Return: 0 if succeeded, negative errno otherwise.
+ */
+#define IOCTL_BLKSNAP_VERSION \
+	_IOW(BLKSNAP, blksnap_ioctl_version, struct blksnap_version)
+
+
+/**
+ * define IOCTL_BLKSNAP_SNAPSHOT_CREATE - Create snapshot.
+ *
+ * Creates a snapshot structure in the memory and allocates an identifier for
+ * it. Further interaction with the snapshot is possible by this identifier.
+ * A snapshot is created for several block devices at once.
+ * Several snapshots can be created at the same time, but with the condition
+ * that one block device can only be included in one snapshot.
+ *
+ * Return: 0 if succeeded, negative errno otherwise.
+ */
+#define IOCTL_BLKSNAP_SNAPSHOT_CREATE \
+	_IOW(BLKSNAP, blksnap_ioctl_snapshot_create, \
+	     struct blksnap_uuid)
+
+
+/**
+ * define IOCTL_BLKSNAP_SNAPSHOT_DESTROY - Release and destroy the snapshot.
+ *
+ * Destroys snapshot with &blksnap_snapshot_destroy.id. This leads to the
+ * deletion of all block device images of the snapshot. The difference storage
+ * is being released. But the change tracker keeps tracking.
+ *
+ * Return: 0 if succeeded, negative errno otherwise.
+ */
+#define IOCTL_BLKSNAP_SNAPSHOT_DESTROY \
+	_IOR(BLKSNAP, blksnap_ioctl_snapshot_destroy, \
+	     struct blksnap_uuid)
+
+/**
+ * struct blksnap_snapshot_append_storage - Argument for the
+ *	&IOCTL_BLKSNAP_SNAPSHOT_APPEND_STORAGE control.
+ *
+ * @id:
+ *	Snapshot ID.
+ * @bdev_path:
+ *	Device path string buffer.
+ * @bdev_path_size:
+ *	Device path string buffer size.
+ * @count:
+ *	Size of @ranges in the number of &struct blksnap_sectors.
+ * @ranges:
+ *	Pointer to the array of &struct blksnap_sectors.
+ */
+struct blksnap_snapshot_append_storage {
+	struct blksnap_uuid id;
+	__u64 bdev_path;
+	__u32 bdev_path_size;
+	__u32 count;
+	__u64 ranges;
+};
+
+/**
+ * define IOCTL_BLKSNAP_SNAPSHOT_APPEND_STORAGE - Append storage to the
+ *	difference storage of the snapshot.
+ *
+ * The snapshot difference storage can be set either before or after creating
+ * the snapshot images. This allows to dynamically expand the difference
+ * storage while holding the snapshot.
+ *
+ * Return: 0 if succeeded, negative errno otherwise.
+ */
+#define IOCTL_BLKSNAP_SNAPSHOT_APPEND_STORAGE \
+	_IOW(BLKSNAP, blksnap_ioctl_snapshot_append_storage, \
+	     struct blksnap_snapshot_append_storage)
+
+/**
+ * define IOCTL_BLKSNAP_SNAPSHOT_TAKE - Take snapshot.
+ *
+ * Creates snapshot images of block devices and switches change trackers tables.
+ * The snapshot must be created before this call, and the areas of block
+ * devices should be added to the difference storage.
+ *
+ * Return: 0 if succeeded, negative errno otherwise.
+ */
+#define IOCTL_BLKSNAP_SNAPSHOT_TAKE \
+	_IOR(BLKSNAP, blksnap_ioctl_snapshot_take, \
+	     struct blksnap_uuid)
+
+/**
+ * struct blksnap_snapshot_collect - Argument for the
+ *	&IOCTL_BLKSNAP_SNAPSHOT_COLLECT control.
+ *
+ * @count:
+ *	Size of &blksnap_snapshot_collect.ids in the number of 16-byte UUID.
+ * @ids:
+ *	Pointer to the array of struct blksnap_uuid for output.
+ */
+struct blksnap_snapshot_collect {
+	__u32 count;
+	__u64 ids;
+};
+
+/**
+ * define IOCTL_BLKSNAP_SNAPSHOT_COLLECT - Get collection of created snapshots.
+ *
+ * Multiple snapshots can be created at the same time. This allows for one
+ * system to create backups for different data with a independent schedules.
+ *
+ * If in &blksnap_snapshot_collect.count is less than required to store the
+ * &blksnap_snapshot_collect.ids, the array is not filled, and the ioctl
+ * returns the required count for &blksnap_snapshot_collect.ids.
+ *
+ * So, it is recommended to call the ioctl twice. The first call with an null
+ * pointer &blksnap_snapshot_collect.ids and a zero value in
+ * &blksnap_snapshot_collect.count. It will set the required array size in
+ * &blksnap_snapshot_collect.count. The second call with a pointer
+ * &blksnap_snapshot_collect.ids to an array of the required size will allow to
+ * get collection of active snapshots.
+ *
+ * Return: 0 if succeeded, -ENODATA if there is not enough space in the array
+ * to store collection of active snapshots, or negative errno otherwise.
+ */
+#define IOCTL_BLKSNAP_SNAPSHOT_COLLECT \
+	_IOW(BLKSNAP, blksnap_ioctl_snapshot_collect, \
+	     struct blksnap_snapshot_collect)
+
+/**
+ * enum blksnap_event_codes - Variants of event codes.
+ *
+ * @blksnap_event_code_low_free_space:
+ *	Low free space in difference storage event.
+ *	If the free space in the difference storage is reduced to the specified
+ *	limit, the module generates this event.
+ * @blksnap_event_code_corrupted:
+ *	Snapshot image is corrupted event.
+ *	If a chunk could not be allocated when trying to save data to the
+ *	difference storage, this event is generated. However, this does not mean
+ *	that the backup process was interrupted with an error. If the snapshot
+ *	image has been read to the end by this time, the backup process is
+ *	considered successful.
+ */
+enum blksnap_event_codes {
+	blksnap_event_code_low_free_space,
+	blksnap_event_code_corrupted,
+};
+
+/**
+ * struct blksnap_snapshot_event - Argument for the
+ *	&IOCTL_BLKSNAP_SNAPSHOT_WAIT_EVENT control.
+ *
+ * @id:
+ *	Snapshot ID.
+ * @timeout_ms:
+ *	Timeout for waiting in milliseconds.
+ * @code:
+ *	Code of the received event &enum blksnap_event_codes.
+ * @time_label:
+ *	Timestamp of the received event.
+ * @data:
+ *	The received event body.
+ */
+struct blksnap_snapshot_event {
+	struct blksnap_uuid id;
+	__u32 timeout_ms;
+	__u32 code;
+	__s64 time_label;
+	__u8 data[4096 - 32];
+};
+
+/**
+ * define IOCTL_BLKSNAP_SNAPSHOT_WAIT_EVENT - Wait and get the event from the
+ *	snapshot.
+ *
+ * While holding the snapshot, the kernel module can transmit information about
+ * changes in its state in the form of events to the user level.
+ * It is very important to receive these events as quickly as possible, so the
+ * user's thread is in the state of interruptable sleep.
+ *
+ * Return: 0 if succeeded, negative errno otherwise.
+ */
+#define IOCTL_BLKSNAP_SNAPSHOT_WAIT_EVENT \
+	_IOW(BLKSNAP, blksnap_ioctl_snapshot_wait_event, \
+	     struct blksnap_snapshot_event)
+
+/**
+ * struct blksnap_event_low_free_space - Data for the
+ *	&blksnap_event_code_low_free_space event.
+ *
+ * @requested_nr_sect:
+ *	The required number of sectors.
+ */
+struct blksnap_event_low_free_space {
+	__u64 requested_nr_sect;
+};
+
+/**
+ * struct blksnap_event_corrupted - Data for the
+ *	&blksnap_event_code_corrupted event.
+ *
+ * @dev_id_mj:
+ *	Major part of original device ID.
+ * @dev_id_mn:
+ *	Minor part of original device ID.
+ * @err_code:
+ *	Error code.
+ */
+struct blksnap_event_corrupted {
+	__u32 dev_id_mj;
+	__u32 dev_id_mn;
+	__s32 err_code;
+};
+
+#endif /* _UAPI_LINUX_BLKSNAP_H */
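The two-call pattern that the IOCTL_BLKSNAP_SNAPSHOT_COLLECT comment recommends can be sketched as follows. To keep the fragment self-contained and testable without the module, the ioctl is abstracted behind a function pointer that returns 0 or a negative errno; a real caller would invoke ioctl() and translate errno instead. The example_* names and the fake backend are hypothetical, and the two structs are local copies of the ones in the header. Because the comment is ambiguous about the first call's return value, the sketch tolerates both 0 and -ENODATA there.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <linux/types.h>

/* Local copies of the UAPI structs from the patch. */
struct blksnap_uuid {
	__u8 b[16];
};

struct blksnap_snapshot_collect {
	__u32 count;
	__u64 ids;
};

/* Hypothetical transport: returns 0 or a negative errno. */
typedef int (*example_collect_fn)(struct blksnap_snapshot_collect *arg);

/* Query the required count first, then fetch the IDs into a fresh array. */
static int example_collect_all(example_collect_fn collect,
			       struct blksnap_uuid **out, __u32 *out_count)
{
	struct blksnap_snapshot_collect arg = { .count = 0, .ids = 0 };
	struct blksnap_uuid *ids;
	int ret;

	*out = NULL;
	*out_count = 0;

	/* First call: null pointer and zero count, to learn the size. */
	ret = collect(&arg);
	if (ret && ret != -ENODATA)
		return ret;
	if (arg.count == 0)
		return 0;	/* no snapshots exist */

	ids = calloc(arg.count, sizeof(*ids));
	if (!ids)
		return -ENOMEM;

	/* Second call: pass an array of the reported size. */
	arg.ids = (__u64)(uintptr_t)ids;
	ret = collect(&arg);
	if (ret) {
		free(ids);
		return ret;
	}
	*out = ids;
	*out_count = arg.count;
	return 0;
}

/* Fake backend holding two snapshots, standing in for the real ioctl. */
static int example_fake_collect(struct blksnap_snapshot_collect *arg)
{
	static const struct blksnap_uuid known[2] = { { { 1 } }, { { 2 } } };

	if (arg->count < 2 || !arg->ids) {
		arg->count = 2;	/* report the required count */
		return -ENODATA;
	}
	memcpy((void *)(uintptr_t)arg->ids, known, sizeof(known));
	arg->count = 2;
	return 0;
}
```

The number of snapshots can change between the two calls, so production code would probably retry when the second call reports -ENODATA; the sketch simply returns the error in that case.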