Message ID | 051232485e4ac1a1a5fd35de7328208385c59f65.1637179348.git.osandov@fb.com (mailing list archive)
---|---
State | New, archived
Series | btrfs: add ioctls and send/receive support for reading/writing compressed data
On Wed, Nov 17, 2021 at 12:19:24PM -0800, Omar Sandoval wrote:
> From: Omar Sandoval <osandov@fb.com>
>
> The length field of the send stream TLV header is 16 bits. This means
> that the maximum amount of data that can be sent for one write is 64k
> minus one. However, encoded writes must be able to send the maximum
> compressed extent (128k) in one command. To support this, send stream
> version 2 encodes the DATA attribute differently: it has no length
> field, and the length is implicitly up to the end of the containing
> command (which has a 32-bit length field). Although this is necessary
> for encoded writes, normal writes can benefit from it, too.
>
> Also add a check to enforce that the DATA attribute is last. It is
> only strictly necessary for v2, but we might as well make v1
> consistent with it.
>
> For v2, let's bump up the send buffer to the maximum compressed extent
> size plus 16k for the other metadata (144k total).

I'm not sure we want to set the number like that, it feels quite
limiting for potential compression enhancements.

> Since this will most likely be vmalloc'd (and always will be after the
> next commit), we round it up to the next page since we might as well
> use the rest of the page on systems with >16k pages.

Would it work also without the virtual mappings? For speedup it makes
sense to use the vmalloc area, but as a fallback, writing in smaller
portions or page by page should eventually also be possible. For that
reason I don't think we should set the maximum to anything other than
what fits into a 32-bit number minus some overhead.
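For context, the framing being discussed here is defined in fs/btrfs/send.h: the 16-bit tlv_len is what caps a v1 attribute at 64k minus one, while the 32-bit len in the command header is what v2 relies on instead. The struct layouts below match the kernel header; the helper at the end is only a sketch of the v2 length rule, with v2_data_len and consumed being illustrative names, not kernel code:

```c
#include <linux/types.h>

/* Stream framing as defined in fs/btrfs/send.h. */
struct btrfs_cmd_header {
	__le32 len;	/* length of the command, excluding this header */
	__le16 cmd;
	__le32 crc;	/* crc32c of the command, with this field zeroed */
} __attribute__ ((__packed__));

struct btrfs_tlv_header {
	__le16 tlv_type;
	__le16 tlv_len;	/* 16 bits: caps a v1 attribute at 64k - 1 */
} __attribute__ ((__packed__));

/*
 * Sketch of the v2 rule: the DATA attribute carries only the 2-byte
 * type, so its length is whatever remains of the command. "consumed"
 * is the number of payload bytes parsed before the DATA attribute.
 */
static inline u32 v2_data_len(u32 cmd_len, u32 consumed)
{
	return cmd_len - consumed - sizeof(__le16);	/* minus the type */
}
```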
On Thu, Nov 18, 2021 at 04:50:37PM +0100, David Sterba wrote:
> On Wed, Nov 17, 2021 at 12:19:24PM -0800, Omar Sandoval wrote:
> > From: Omar Sandoval <osandov@fb.com>
> >
> > The length field of the send stream TLV header is 16 bits. This means
> > that the maximum amount of data that can be sent for one write is 64k
> > minus one. However, encoded writes must be able to send the maximum
> > compressed extent (128k) in one command. To support this, send stream
> > version 2 encodes the DATA attribute differently: it has no length
> > field, and the length is implicitly up to the end of the containing
> > command (which has a 32-bit length field). Although this is necessary
> > for encoded writes, normal writes can benefit from it, too.
> >
> > Also add a check to enforce that the DATA attribute is last. It is
> > only strictly necessary for v2, but we might as well make v1
> > consistent with it.
> >
> > For v2, let's bump up the send buffer to the maximum compressed extent
> > size plus 16k for the other metadata (144k total).
>
> I'm not sure we want to set the number like that, it feels quite
> limiting for potential compression enhancements.

This is all we need for now, but we can always raise it in the future.
I amended the protocol and the progs send parsing code to assume no
hard limit.

> > Since this will most likely be vmalloc'd (and always will be after
> > the next commit), we round it up to the next page since we might as
> > well use the rest of the page on systems with >16k pages.
>
> Would it work also without the virtual mappings? For speedup it makes
> sense to use the vmalloc area, but as a fallback, writing in smaller
> portions or page by page should eventually also be possible. For that
> reason I don't think we should set the maximum to anything other than
> what fits into a 32-bit number minus some overhead.

I think you're saying that we could allocate a smaller buffer and do
smaller reads that we immediately write to the send pipe/file? So
something like:

send_write() {
	write_tlv_metadata_to_pipe();
	while (written < to_write) {
		read_small_chunk();
		write_small_chunk_to_pipe();
		written += size_of_small_chunk();
	}
}

And from the protocol's point of view, it's still one big command,
although we didn't have to keep it all in memory at once.

If I'm understanding correctly, then yes, I think that's something we
could do eventually. And my description of v2 allows this:

-#define BTRFS_SEND_BUF_SIZE		SZ_64K
+/*
+ * In send stream v1, no command is larger than 64k. In send stream v2,
+ * no limit should be assumed.
+ */
+#define BTRFS_SEND_BUF_SIZE_V1		SZ_64K

Although receive would have to be more intelligent about reading huge
commands.
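To make the fallback sketched in the reply above concrete, here is a hedged kernel-style version of the chunked variant. The chunk size and the read_data_chunk(), write_to_pipe(), and data_chunk_buf names are all hypothetical; the point is only that one v2 command can be emitted in pieces without holding the whole payload in memory:

```c
/* Hypothetical sketch: stream one large v2 write command in pieces. */
#define SEND_CHUNK_SIZE	SZ_16K

static int send_write_chunked(struct send_ctx *sctx, u64 offset, u32 to_write)
{
	u32 written = 0;
	int ret;

	/* Command header, path/offset attributes, and the bare DATA type. */
	ret = write_to_pipe(sctx, sctx->send_buf, sctx->send_size);
	if (ret < 0)
		return ret;

	while (written < to_write) {
		u32 len = min_t(u32, to_write - written, SEND_CHUNK_SIZE);

		/* Read the next piece of file data into a small buffer. */
		ret = read_data_chunk(sctx, offset + written, len);
		if (ret < 0)
			return ret;
		ret = write_to_pipe(sctx, sctx->data_chunk_buf, len);
		if (ret < 0)
			return ret;
		written += len;
	}
	/* From the receiver's point of view, this is still one command. */
	return 0;
}
```

One wrinkle a real implementation would have to solve is the crc field in the command header, which covers the entire command and so would have to be computed before (or while) streaming the data out.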
diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 53b3cc2276ea..12844cb20584 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -81,6 +81,7 @@ struct send_ctx {
 	char *send_buf;
 	u32 send_size;
 	u32 send_max_size;
+	bool put_data;
 	u64 flags;	/* 'flags' member of btrfs_ioctl_send_args is u64 */
 	/* Protocol version compatibility requested */
 	u32 proto;
@@ -584,6 +585,9 @@ static int tlv_put(struct send_ctx *sctx, u16 attr, const void *data, int len)
 	int total_len = sizeof(*hdr) + len;
 	int left = sctx->send_max_size - sctx->send_size;

+	if (WARN_ON_ONCE(sctx->put_data))
+		return -EINVAL;
+
 	if (unlikely(left < total_len))
 		return -EOVERFLOW;

@@ -721,6 +725,7 @@ static int send_cmd(struct send_ctx *sctx)
 					&sctx->send_off);

 	sctx->send_size = 0;
+	sctx->put_data = false;

 	return ret;
 }
@@ -4902,14 +4907,30 @@ static inline u64 max_send_read_size(const struct send_ctx *sctx)

 static int put_data_header(struct send_ctx *sctx, u32 len)
 {
-	struct btrfs_tlv_header *hdr;
+	if (WARN_ON_ONCE(sctx->put_data))
+		return -EINVAL;
+	sctx->put_data = true;
+	if (sctx->proto >= 2) {
+		/*
+		 * In v2, the data attribute header doesn't include a length; it
+		 * is implicitly to the end of the command.
+		 */
+		if (sctx->send_max_size - sctx->send_size < 2 + len)
+			return -EOVERFLOW;
+		put_unaligned_le16(BTRFS_SEND_A_DATA,
+				   sctx->send_buf + sctx->send_size);
+		sctx->send_size += 2;
+	} else {
+		struct btrfs_tlv_header *hdr;

-	if (sctx->send_max_size - sctx->send_size < sizeof(*hdr) + len)
-		return -EOVERFLOW;
-	hdr = (struct btrfs_tlv_header *)(sctx->send_buf + sctx->send_size);
-	put_unaligned_le16(BTRFS_SEND_A_DATA, &hdr->tlv_type);
-	put_unaligned_le16(len, &hdr->tlv_len);
-	sctx->send_size += sizeof(*hdr);
+		if (sctx->send_max_size - sctx->send_size < sizeof(*hdr) + len)
+			return -EOVERFLOW;
+		hdr = (struct btrfs_tlv_header *)(sctx->send_buf +
+						  sctx->send_size);
+		put_unaligned_le16(BTRFS_SEND_A_DATA, &hdr->tlv_type);
+		put_unaligned_le16(len, &hdr->tlv_len);
+		sctx->send_size += sizeof(*hdr);
+	}
 	return 0;
 }

@@ -7292,7 +7313,12 @@ long btrfs_ioctl_send(struct file *mnt_file, struct btrfs_ioctl_send_args *arg)

 	sctx->clone_roots_cnt = arg->clone_sources_count;

-	sctx->send_max_size = BTRFS_SEND_BUF_SIZE_V1;
+	if (sctx->proto >= 2) {
+		sctx->send_max_size = ALIGN(SZ_16K + BTRFS_MAX_COMPRESSED,
+					    PAGE_SIZE);
+	} else {
+		sctx->send_max_size = BTRFS_SEND_BUF_SIZE_V1;
+	}
 	sctx->send_buf = kvmalloc(sctx->send_max_size, GFP_KERNEL);
 	if (!sctx->send_buf) {
 		ret = -ENOMEM;
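The note in the reply above, that receive would have to be more intelligent about huge commands, could work roughly as below: once BTRFS_SEND_A_DATA is seen in a v2 command, all remaining payload bytes are the data, which is also why the patch enforces that DATA comes last. This is a sketch against assumed parser state (buf, pos, cmd_len, and the process_attr()/process_data() helpers are all hypothetical names), not the actual btrfs-progs code:

```c
/*
 * Hedged sketch of receive-side attribute parsing for stream v2.
 * "buf" is the command payload, "cmd_len" the 32-bit length from the
 * command header, and "pos" the parser cursor.
 */
static int parse_attrs_v2(const u8 *buf, u32 cmd_len)
{
	u32 pos = 0;

	while (pos + sizeof(__le16) <= cmd_len) {
		u16 type = get_unaligned_le16(buf + pos);
		u16 len;

		pos += sizeof(__le16);
		if (type == BTRFS_SEND_A_DATA) {
			/*
			 * v2: no tlv_len follows; the data implicitly runs
			 * to the end of the command, which is why DATA must
			 * be the last attribute.
			 */
			return process_data(buf + pos, cmd_len - pos);
		}
		/* Every other attribute keeps the v1 TLV framing. */
		if (pos + sizeof(__le16) > cmd_len)
			return -EINVAL;
		len = get_unaligned_le16(buf + pos);
		pos += sizeof(__le16);
		if (pos + len > cmd_len)
			return -EINVAL;
		process_attr(type, buf + pos, len);
		pos += len;
	}
	return 0;
}
```

A real receiver would additionally want to consume the DATA portion in bounded chunks rather than buffering cmd_len bytes at once, per the streaming idea discussed earlier in the thread.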