
qemu-img: Do not truncate before preallocation

Message ID 20170203195037.4238-1-nirsof@gmail.com (mailing list archive)
State New, archived

Commit Message

Nir Soffer Feb. 3, 2017, 7:50 p.m. UTC
When using a file system that does not support fallocate() (e.g. NFS <
4.2), truncating the file only when preallocation=OFF speeds up creating
a raw file.

Here is an example run, tested on a Fedora 24 machine, creating a raw file
on an NFS version 3 server.

$ time ./qemu-img-master create -f raw -o preallocation=falloc mnt/test 1g
Formatting 'mnt/test', fmt=raw size=1073741824 preallocation=falloc

real	0m21.185s
user	0m0.022s
sys	0m0.574s

$ time ./qemu-img-fix create -f raw -o preallocation=falloc mnt/test 1g
Formatting 'mnt/test', fmt=raw size=1073741824 preallocation=falloc

real	0m11.601s
user	0m0.016s
sys	0m0.525s

$ time dd if=/dev/zero of=mnt/test bs=1M count=1024 oflag=direct
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 15.6627 s, 68.6 MB/s

real	0m16.104s
user	0m0.009s
sys	0m0.220s

Running with strace we can see that without this change we do one
pread() and one pwrite() for each block. With this change, we do only
one pwrite() per block.

$ strace ./qemu-img-master create -f raw -o preallocation=falloc mnt/test 8192
...
pread64(9, "\0", 1, 4095)               = 1
pwrite64(9, "\0", 1, 4095)              = 1
pread64(9, "\0", 1, 8191)               = 1
pwrite64(9, "\0", 1, 8191)              = 1

$ strace ./qemu-img-fix create -f raw -o preallocation=falloc mnt/test 8192
...
pwrite64(9, "\0", 1, 4095)              = 1
pwrite64(9, "\0", 1, 8191)              = 1

This happens because posix_fallocate() checks whether each block is
allocated before writing a byte to it, and when the file is truncated
before preallocation, all blocks are unallocated.
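
The effect can be reproduced with a simplified stand-in for the fallback
path. The sketch below is an illustration only, loosely modeled on how a
libc may emulate posix_fallocate() when the file system lacks fallocate();
emulated_fallocate() and struct emu_stats are hypothetical names, not glibc
or QEMU code. It reads a byte only at offsets that already lie inside the
file, which is why truncating first adds one pread() per block.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Counters so a caller can see how many syscalls the emulation issued. */
struct emu_stats {
    long preads;
    long pwrites;
};

/*
 * Simplified emulation of a posix_fallocate() fallback: touch the last
 * byte of every block in [offset, offset + len).
 * Assumes len > 0 and blksize > 0.
 * Returns 0 on success or an errno value on failure.
 */
static int emulated_fallocate(int fd, off_t offset, off_t len, off_t blksize,
                              struct emu_stats *stats)
{
    struct stat st;
    off_t pos;

    if (fstat(fd, &st) != 0) {
        return errno;
    }

    for (pos = offset + (len - 1) % blksize; pos < offset + len;
         pos += blksize) {
        if (pos < st.st_size) {
            /* The offset is inside the file: check for existing data. */
            unsigned char c;
            ssize_t n = pread(fd, &c, 1, pos);

            stats->preads++;
            if (n < 0) {
                return errno;
            }
            if (n == 1 && c != 0) {
                continue; /* non-zero data: the block is already allocated */
            }
        }
        /* Write a single NUL byte to force allocation of this block. */
        if (pwrite(fd, "", 1, pos) != 1) {
            return errno ? errno : EIO;
        }
        stats->pwrites++;
    }
    return 0;
}
```

Called on an empty file with len=8192 and blksize=4096, this issues two
pwrite() calls at offsets 4095 and 8191. Called after ftruncate(fd, 8192),
it issues a pread() before each of those writes, matching the strace
output above.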

Signed-off-by: Nir Soffer <nirsof@gmail.com>
---

I sent this a week ago:
http://lists.nongnu.org/archive/html/qemu-devel/2017-01/msg06123.html

Sending again with improved commit message.

 block/file-posix.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

Comments

Nir Soffer Feb. 16, 2017, 5:38 p.m. UTC | #1
Ping

Kevin Wolf Feb. 16, 2017, 5:52 p.m. UTC | #2

Thanks, applied to the block branch.

I'm not completely sure if doing an ftruncate() first couldn't improve
PREALLOC_MODE_FULL somewhat in some cases, but I agree that the patch
should still result in correct images.

Kevin
Nir Soffer Feb. 16, 2017, 6:23 p.m. UTC | #3
On Thu, Feb 16, 2017 at 7:52 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> Thanks, applied to the block branch.
>
> I'm not completely sure if doing an ftruncate() first couldn't improve
> PREALLOC_MODE_FULL somewhat in some cases, but I agree that the patch
> should still result in correct images.

Good point, I'll do some tests with full mode to check this.

Do you know which cases can benefit from ftruncate() before full preallocation?

Nir
Kevin Wolf Feb. 16, 2017, 6:36 p.m. UTC | #4
On 16.02.2017 at 19:23, Nir Soffer wrote:
> Good point, I'll do some tests with full mode to check this.
> 
> Do you know which cases can benefit from ftruncate() before full preallocation?

Knowing the final size from the beginning could allow the file system
driver to do fewer allocations and possibly avoid fragmentation of the
file. I'm not sure if that's easily measurable, though.

I'd also expect that with XFS, you'll see less preallocation, or none at
all. 'du' often shows values much larger than the actual image size on XFS
because the file was growing a lot, so without knowing the actual size,
XFS takes a guess and does some generous preallocation.
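
The gap described here between what 'du' reports and the image size can be
checked by comparing st_size with st_blocks. A minimal sketch, assuming
Linux, where st_blocks counts 512-byte units; get_file_usage() and
struct file_usage are hypothetical helper names:

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/stat.h>
#include <sys/types.h>

/* Apparent size (what 'ls -l' shows) versus allocated space ('du'). */
struct file_usage {
    off_t apparent;  /* st_size, in bytes */
    off_t allocated; /* st_blocks converted to bytes */
};

/* Returns 0 on success, or -1 on failure with errno set by stat(). */
static int get_file_usage(const char *path, struct file_usage *out)
{
    struct stat st;

    if (stat(path, &st) != 0) {
        return -1;
    }
    out->apparent = st.st_size;
    /* Assumption: 512-byte units, as on Linux. POSIX does not fix this. */
    out->allocated = (off_t)st.st_blocks * 512;
    return 0;
}
```

An image whose allocated value is well above its apparent size right after
creation would be consistent with the file system having preallocated
extra space while the file was growing.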

Kevin

Patch

diff --git a/block/file-posix.c b/block/file-posix.c
index 2134e0e..442f080 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1591,12 +1591,6 @@ static int raw_create(const char *filename, QemuOpts *opts, Error **errp)
 #endif
     }
 
-    if (ftruncate(fd, total_size) != 0) {
-        result = -errno;
-        error_setg_errno(errp, -result, "Could not resize file");
-        goto out_close;
-    }
-
     switch (prealloc) {
 #ifdef CONFIG_POSIX_FALLOCATE
     case PREALLOC_MODE_FALLOC:
@@ -1636,6 +1630,10 @@ static int raw_create(const char *filename, QemuOpts *opts, Error **errp)
         break;
     }
     case PREALLOC_MODE_OFF:
+        if (ftruncate(fd, total_size) != 0) {
+            result = -errno;
+            error_setg_errno(errp, -result, "Could not resize file");
+        }
         break;
     default:
         result = -EINVAL;
@@ -1644,7 +1642,6 @@ static int raw_create(const char *filename, QemuOpts *opts, Error **errp)
         break;
     }
 
-out_close:
     if (qemu_close(fd) != 0 && result == 0) {
         result = -errno;
         error_setg_errno(errp, -result, "Could not close the new file");
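
To summarize the control flow after this patch, here is a simplified,
self-contained sketch. It is not the QEMU code: sketch_raw_create() and the
SKETCH_PREALLOC_* values are hypothetical stand-ins, error reporting is
reduced to a negative errno return, and the zero-writing loop for full mode
is an assumption about how such a mode could be written. The point is that
only the OFF branch calls ftruncate(); the other branches extend the file
as a side effect of allocating it.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for the preallocation modes used in the patch. */
enum sketch_prealloc {
    SKETCH_PREALLOC_OFF,
    SKETCH_PREALLOC_FALLOC,
    SKETCH_PREALLOC_FULL
};

/* Returns 0 on success or a negative errno value on failure. */
static int sketch_raw_create(const char *path, off_t total_size,
                             enum sketch_prealloc mode)
{
    int result = 0;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd < 0) {
        return -errno;
    }

    switch (mode) {
    case SKETCH_PREALLOC_FALLOC: {
        /* posix_fallocate() returns the error number instead of setting
         * errno, and grows the file to total_size by itself. */
        int err = posix_fallocate(fd, 0, total_size);
        if (err != 0) {
            result = -err;
        }
        break;
    }
    case SKETCH_PREALLOC_FULL: {
        /* Writing zeros sequentially also grows the file to total_size. */
        char buf[65536];
        off_t left = total_size;

        memset(buf, 0, sizeof(buf));
        while (left > 0) {
            size_t chunk = left < (off_t)sizeof(buf) ? (size_t)left
                                                     : sizeof(buf);
            ssize_t n = write(fd, buf, chunk);
            if (n < 0) {
                if (errno == EINTR) {
                    continue;
                }
                result = -errno;
                break; /* leaves the while loop */
            }
            left -= n;
        }
        if (result == 0 && fsync(fd) != 0) {
            result = -errno;
        }
        break;
    }
    case SKETCH_PREALLOC_OFF:
        /* Only the sparse case needs an explicit resize. */
        if (ftruncate(fd, total_size) != 0) {
            result = -errno;
        }
        break;
    default:
        result = -EINVAL;
        break;
    }

    if (close(fd) != 0 && result == 0) {
        result = -errno;
    }
    return result;
}
```

In every mode the resulting file should have the requested size; what
differs is whether the blocks behind it are allocated.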