Message ID | 1528375581-29538-1-git-send-email-pl@kamp.de (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 2018-06-07 14:46, Peter Lieven wrote: > We currently don't enforce that the sparse segments we detect during convert are > aligned. This leads to unnecessary and costly read-modify-write cycles either > internally in Qemu or in the background on the storage device as nearly all > modern filesystems or hardware has a 4k alignment internally. > > As we per default set the min_sparse size to 4k it makes perfectly sense to ensure > that these sparse holes in the file are placed at 4k boundaries. > > The number of RMW cycles when converting an example image [1] to a raw device that > has 4k sector size is about 4600 4k read requests to perform a total of about 15000 > write requests. With this path the 4600 additional read requests are eliminated. > > [1] https://cloud-images.ubuntu.com/releases/16.04/release/ubuntu-16.04-server-cloudimg-amd64-disk1.vmdk > > Signed-off-by: Peter Lieven <pl@kamp.de> > --- > qemu-img.c | 21 +++++++++++++++------ > 1 file changed, 15 insertions(+), 6 deletions(-) I like the idea, but it doesn't seem guaranteed that is_allocated_sectors() is called on aligned offsets, so this alignment work may still leave things unaligned. Furthermore, we should probably not blindly assume 4k but instead use some block limit of the target, like pwrite_zeroes_alignment, or pdiscard_alignment, depending on the case. (Or probably still min_sparse, if that's less.) Since is_allocated_sectors_min() (the only caller of is_allocated_sectors()) is called from just a single place, taking those factors into account should be possible. Max
Am 11.06.2018 um 15:30 schrieb Max Reitz: > On 2018-06-07 14:46, Peter Lieven wrote: >> We currently don't enforce that the sparse segments we detect during convert are >> aligned. This leads to unnecessary and costly read-modify-write cycles either >> internally in Qemu or in the background on the storage device as nearly all >> modern filesystems or hardware has a 4k alignment internally. >> >> As we per default set the min_sparse size to 4k it makes perfectly sense to ensure >> that these sparse holes in the file are placed at 4k boundaries. >> >> The number of RMW cycles when converting an example image [1] to a raw device that >> has 4k sector size is about 4600 4k read requests to perform a total of about 15000 >> write requests. With this path the 4600 additional read requests are eliminated. >> >> [1] https://cloud-images.ubuntu.com/releases/16.04/release/ubuntu-16.04-server-cloudimg-amd64-disk1.vmdk >> >> Signed-off-by: Peter Lieven <pl@kamp.de> >> --- >> qemu-img.c | 21 +++++++++++++++------ >> 1 file changed, 15 insertions(+), 6 deletions(-) > I like the idea, but it doesn't seem guaranteed that > is_allocated_sectors() is called on aligned offsets, so this alignment > work may still leave things unaligned. I can't image why this should happen. As long as the alignment devides the buffer size we either write or skip aligned bytes. Maybe get_block_status returns an unaligned number of sectors? > > Furthermore, we should probably not blindly assume 4k but instead use > some block limit of the target, like pwrite_zeroes_alignment, or > pdiscard_alignment, depending on the case. (Or probably still > min_sparse, if that's less.) > > Since is_allocated_sectors_min() (the only caller of > is_allocated_sectors()) is called from just a single place, taking those > factors into account should be possible. I also thought of this, but for instance for raw-posix I always get a request_alignment of 1. But maybe the alignments you proposed produce a better result. I will check that. Thanks, Peter
On 2018-06-11 15:59, Peter Lieven wrote: > Am 11.06.2018 um 15:30 schrieb Max Reitz: >> On 2018-06-07 14:46, Peter Lieven wrote: >>> We currently don't enforce that the sparse segments we detect during >>> convert are >>> aligned. This leads to unnecessary and costly read-modify-write >>> cycles either >>> internally in Qemu or in the background on the storage device as >>> nearly all >>> modern filesystems or hardware has a 4k alignment internally. >>> >>> As we per default set the min_sparse size to 4k it makes perfectly >>> sense to ensure >>> that these sparse holes in the file are placed at 4k boundaries. >>> >>> The number of RMW cycles when converting an example image [1] to a >>> raw device that >>> has 4k sector size is about 4600 4k read requests to perform a total >>> of about 15000 >>> write requests. With this path the 4600 additional read requests are >>> eliminated. >>> >>> [1] >>> https://cloud-images.ubuntu.com/releases/16.04/release/ubuntu-16.04-server-cloudimg-amd64-disk1.vmdk >>> >>> >>> Signed-off-by: Peter Lieven <pl@kamp.de> >>> --- >>> qemu-img.c | 21 +++++++++++++++------ >>> 1 file changed, 15 insertions(+), 6 deletions(-) >> I like the idea, but it doesn't seem guaranteed that >> is_allocated_sectors() is called on aligned offsets, so this alignment >> work may still leave things unaligned. > > I can't image why this should happen. As long as the alignment devides > the buffer size we either > write or skip aligned bytes. Maybe get_block_status returns an unaligned > number of sectors? Yes, because the source medium does not need to be the same as the destination (so the source may have e.g. 512-byte clusters). >> Furthermore, we should probably not blindly assume 4k but instead use >> some block limit of the target, like pwrite_zeroes_alignment, or >> pdiscard_alignment, depending on the case. (Or probably still >> min_sparse, if that's less.) >> >> Since is_allocated_sectors_min() (the only caller of >> is_allocated_sectors()) is called from just a single place, taking those >> factors into account should be possible. > > I also thought of this, but for instance for raw-posix I always get a > request_alignment of 1. Yes, because request_alignment is a hard requirement. With caching, you can send requests with any alignment, so it's 1. pwrite_zeroes_alignment and pdiscard_alignment are described as "Optimal alignment", so those should contain the values we/you want. If they are 0, then you should probably fall back to opt_transfer instead of request_alignment. Max > But maybe the alignments you proposed produce a better result. I will > check that.
Am 11.06.2018 um 16:04 schrieb Max Reitz: > On 2018-06-11 15:59, Peter Lieven wrote: >> Am 11.06.2018 um 15:30 schrieb Max Reitz: >>> On 2018-06-07 14:46, Peter Lieven wrote: >>>> We currently don't enforce that the sparse segments we detect during >>>> convert are >>>> aligned. This leads to unnecessary and costly read-modify-write >>>> cycles either >>>> internally in Qemu or in the background on the storage device as >>>> nearly all >>>> modern filesystems or hardware has a 4k alignment internally. >>>> >>>> As we per default set the min_sparse size to 4k it makes perfectly >>>> sense to ensure >>>> that these sparse holes in the file are placed at 4k boundaries. >>>> >>>> The number of RMW cycles when converting an example image [1] to a >>>> raw device that >>>> has 4k sector size is about 4600 4k read requests to perform a total >>>> of about 15000 >>>> write requests. With this path the 4600 additional read requests are >>>> eliminated. >>>> >>>> [1] >>>> https://cloud-images.ubuntu.com/releases/16.04/release/ubuntu-16.04-server-cloudimg-amd64-disk1.vmdk >>>> >>>> >>>> Signed-off-by: Peter Lieven <pl@kamp.de> >>>> --- >>>> qemu-img.c | 21 +++++++++++++++------ >>>> 1 file changed, 15 insertions(+), 6 deletions(-) >>> I like the idea, but it doesn't seem guaranteed that >>> is_allocated_sectors() is called on aligned offsets, so this alignment >>> work may still leave things unaligned. >> I can't image why this should happen. As long as the alignment devides >> the buffer size we either >> write or skip aligned bytes. Maybe get_block_status returns an unaligned >> number of sectors? > Yes, because the source medium does not need to be the same as the > destination (so the source may have e.g. 512-byte clusters). Okay, I will try to figure out how to cope with it. So the function needs to get the offset and the alignment to make the right "decision". > >>> Furthermore, we should probably not blindly assume 4k but instead use >>> some block limit of the target, like pwrite_zeroes_alignment, or >>> pdiscard_alignment, depending on the case. (Or probably still >>> min_sparse, if that's less.) >>> >>> Since is_allocated_sectors_min() (the only caller of >>> is_allocated_sectors()) is called from just a single place, taking those >>> factors into account should be possible. >> I also thought of this, but for instance for raw-posix I always get a >> request_alignment of 1. > Yes, because request_alignment is a hard requirement. With caching, you > can send requests with any alignment, so it's 1. > > pwrite_zeroes_alignment and pdiscard_alignment are described as "Optimal > alignment", so those should contain the values we/you want. If they are > 0, then you should probably fall back to opt_transfer instead of > request_alignment. I will check that for the targets that I can test and send a V2. Thanks for your feedback, Peter
Am 11.06.2018 um 16:04 schrieb Max Reitz: > On 2018-06-11 15:59, Peter Lieven wrote: >> Am 11.06.2018 um 15:30 schrieb Max Reitz: >>> On 2018-06-07 14:46, Peter Lieven wrote: >>>> We currently don't enforce that the sparse segments we detect during >>>> convert are >>>> aligned. This leads to unnecessary and costly read-modify-write >>>> cycles either >>>> internally in Qemu or in the background on the storage device as >>>> nearly all >>>> modern filesystems or hardware has a 4k alignment internally. >>>> >>>> As we per default set the min_sparse size to 4k it makes perfectly >>>> sense to ensure >>>> that these sparse holes in the file are placed at 4k boundaries. >>>> >>>> The number of RMW cycles when converting an example image [1] to a >>>> raw device that >>>> has 4k sector size is about 4600 4k read requests to perform a total >>>> of about 15000 >>>> write requests. With this path the 4600 additional read requests are >>>> eliminated. >>>> >>>> [1] >>>> https://cloud-images.ubuntu.com/releases/16.04/release/ubuntu-16.04-server-cloudimg-amd64-disk1.vmdk >>>> >>>> >>>> Signed-off-by: Peter Lieven <pl@kamp.de> >>>> --- >>>> qemu-img.c | 21 +++++++++++++++------ >>>> 1 file changed, 15 insertions(+), 6 deletions(-) >>> I like the idea, but it doesn't seem guaranteed that >>> is_allocated_sectors() is called on aligned offsets, so this alignment >>> work may still leave things unaligned. >> I can't image why this should happen. As long as the alignment devides >> the buffer size we either >> write or skip aligned bytes. Maybe get_block_status returns an unaligned >> number of sectors? > Yes, because the source medium does not need to be the same as the > destination (so the source may have e.g. 512-byte clusters). > >>> Furthermore, we should probably not blindly assume 4k but instead use >>> some block limit of the target, like pwrite_zeroes_alignment, or >>> pdiscard_alignment, depending on the case. (Or probably still >>> min_sparse, if that's less.) >>> >>> Since is_allocated_sectors_min() (the only caller of >>> is_allocated_sectors()) is called from just a single place, taking those >>> factors into account should be possible. >> I also thought of this, but for instance for raw-posix I always get a >> request_alignment of 1. > Yes, because request_alignment is a hard requirement. With caching, you > can send requests with any alignment, so it's 1. > > pwrite_zeroes_alignment and pdiscard_alignment are described as "Optimal > alignment", so those should contain the values we/you want. If they are > 0, then you should probably fall back to opt_transfer instead of > request_alignment. I am still trying to figure out what is the best solution. If I take the optima into account I might ending up transfering more data than necessary just to create an optimal request. I just want to avoid unnecessary RMW cycles. And even if modern byte interfaces advertise a request_alignment of 1 someone has to do the RMW cycle. Either the OS or the harddrive itself. I am thinking about sth like alignment = MAX(request_alignment, opt_transfer, min_sparse) as a starting point? I found that opt_transfer seems to be 0 for everything I found to test. So maybe even reduce the alignment to MAX(request_alignment, min_sparse). Peter
diff --git a/qemu-img.c b/qemu-img.c index 75f1610..68eefba 100644 --- a/qemu-img.c +++ b/qemu-img.c @@ -1096,24 +1096,33 @@ static int64_t find_nonzero(const uint8_t *buf, int64_t n) * * 'pnum' is set to the number of sectors (including and immediately following * the first one) that are known to be in the same allocated/unallocated state. + * The function will try to align 'pnum' to 8 sectors (4k) to avoid unnecassary + * RMW cycles on modern hardware. */ static int is_allocated_sectors(const uint8_t *buf, int n, int *pnum) { bool is_zero; - int i; + int i, alignment = 1; if (n <= 0) { *pnum = 0; return 0; } - is_zero = buffer_is_zero(buf, 512); - for(i = 1; i < n; i++) { - buf += 512; - if (is_zero != buffer_is_zero(buf, 512)) { + + if (!(n & 7)) { + /* the buffer size is dividable by 4k */ + alignment = 8; + n /= 8; + } + + is_zero = buffer_is_zero(buf, BDRV_SECTOR_SIZE * alignment); + for (i = 1; i < n; i++) { + buf += BDRV_SECTOR_SIZE * alignment; + if (is_zero != buffer_is_zero(buf, BDRV_SECTOR_SIZE * alignment)) { break; } } - *pnum = i; + *pnum = i * alignment; return !is_zero; }
We currently don't enforce that the sparse segments we detect during convert are aligned. This leads to unnecessary and costly read-modify-write cycles either internally in Qemu or in the background on the storage device as nearly all modern filesystems or hardware has a 4k alignment internally. As we per default set the min_sparse size to 4k it makes perfectly sense to ensure that these sparse holes in the file are placed at 4k boundaries. The number of RMW cycles when converting an example image [1] to a raw device that has 4k sector size is about 4600 4k read requests to perform a total of about 15000 write requests. With this path the 4600 additional read requests are eliminated. [1] https://cloud-images.ubuntu.com/releases/16.04/release/ubuntu-16.04-server-cloudimg-amd64-disk1.vmdk Signed-off-by: Peter Lieven <pl@kamp.de> --- qemu-img.c | 21 +++++++++++++++------ 1 file changed, 15 insertions(+), 6 deletions(-)