Message ID | 1487318764-29513-1-git-send-email-pagupta@redhat.com (mailing list archive) |
---|---|
State | New, archived |
On 17.02.2017 at 09:06, Pankaj Gupta wrote:
> To maintain consistency at all the places use qemu_madvise wrapper
> in place of madvise call.
>
> Signed-off-by: Pankaj Gupta <pagupta@redhat.com>

Reviewed-by: Kevin Wolf <kwolf@redhat.com>

Juan/Dave, if one of you can give an Acked-by, I can take this through
my tree.

Kevin
* Kevin Wolf (kwolf@redhat.com) wrote:
> On 17.02.2017 at 09:06, Pankaj Gupta wrote:
> > To maintain consistency at all the places use qemu_madvise wrapper
> > in place of madvise call.
> >
> > Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
>
> Reviewed-by: Kevin Wolf <kwolf@redhat.com>
>
> Juan/Dave, if one of you can give an Acked-by, I can take this through
> my tree.

NACK

That's wrong; qemu_madvise can end up going through posix_madvise and
using POSIX_MADV_DONTNEED, which has different semantics from
madvise(MADV_DONTNEED). We need the semantics of madvise, i.e. it is
guaranteed to throw away the pages, whereas posix_madvise *may* throw
away the pages if the kernel feels like it.

Dave

> Kevin

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
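To illustrate the semantics Dave is relying on, here is a minimal sketch (not QEMU code), assuming a Linux host. The helper name first_byte_after_madvise_dontneed is hypothetical. It dirties one anonymous private page, calls madvise(MADV_DONTNEED) on it, and reads the page back.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not QEMU code): fill one anonymous private page with
 * 0xAA, discard it with madvise(MADV_DONTNEED), and return the first byte
 * read back afterwards. Returns -1 if mmap() or madvise() fails.
 */
int first_byte_after_madvise_dontneed(void)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0); /* the page size is always known on POSIX hosts */
    size_t len = (size_t)page;

    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return -1;
    }

    memset(p, 0xAA, len); /* dirty the page so there is data to lose */

    if (madvise(p, len, MADV_DONTNEED) != 0) {
        munmap(p, len);
        return -1;
    }

    int byte = p[0]; /* this access faults in a fresh page */
    munmap(p, len);
    return byte;
}
```

On Linux the function is expected to return 0, because the next access to a discarded anonymous private page faults in a zero-filled page. Other systems may give MADV_DONTNEED weaker, advisory semantics, so the result there can differ.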
On Fri 17 Feb 2017 09:06:04 AM CET, Pankaj Gupta wrote:
> To maintain consistency at all the places use qemu_madvise wrapper
> in place of madvise call.

>      if (length > 0) {
> -        madvise((uint8_t *) t + offset, length, MADV_DONTNEED);
> +        qemu_madvise((uint8_t *) t + offset, length, QEMU_MADV_DONTNEED);

This was changed two months ago from qemu_madvise() to madvise(). Is
there any reason why you want to revert that change? Those two calls are
not equivalent, please see commit 2f2c8d6b371cfc6689affb0b7e for an
explanation.

> -    if (madvise(start, length, MADV_DONTNEED)) {
> +    if (qemu_madvise(start, length, QEMU_MADV_DONTNEED)) {
>          error_report("%s MADV_DONTNEED: %s", __func__, strerror(errno));

And this is the same case.

Berto
Thanks for your comments. I have a query below.

>
> On Fri 17 Feb 2017 09:06:04 AM CET, Pankaj Gupta wrote:
> > To maintain consistency at all the places use qemu_madvise wrapper
> > in place of madvise call.
>
> >      if (length > 0) {
> > -        madvise((uint8_t *) t + offset, length, MADV_DONTNEED);
> > +        qemu_madvise((uint8_t *) t + offset, length, QEMU_MADV_DONTNEED);
>
> This was changed two months ago from qemu_madvise() to madvise(). Is
> there any reason why you want to revert that change? Those two calls are
> not equivalent, please see commit 2f2c8d6b371cfc6689affb0b7e for an
> explanation.
>
> > -    if (madvise(start, length, MADV_DONTNEED)) {
> > +    if (qemu_madvise(start, length, QEMU_MADV_DONTNEED)) {
> >          error_report("%s MADV_DONTNEED: %s", __func__, strerror(errno));

I only checked the history of the change related to 'postcopy'.

For my Linux machine:

./config-host.mak

CONFIG_MADVISE=y
CONFIG_POSIX_MADVISE=y

As both these options are set for Linux, every time we call
'qemu_madvise', "madvise(addr, len, advice);" will be compiled/called.
I don't understand why '2f2c8d6b371cfc6689affb0b7e' explicitly changed
this to "#ifdef CONFIG_LINUX". I think it's better to write a generic
function, maybe in a wrapper, than to conditionally set something at
different places.

int qemu_madvise(void *addr, size_t len, int advice)
{
    if (advice == QEMU_MADV_INVALID) {
        errno = EINVAL;
        return -1;
    }
#if defined(CONFIG_MADVISE)
    return madvise(addr, len, advice);
#elif defined(CONFIG_POSIX_MADVISE)
    return posix_madvise(addr, len, advice);
#else
    errno = EINVAL;
    return -1;
#endif
}

>
> And this is the same case.
>
> Berto
>
* Pankaj Gupta (pagupta@redhat.com) wrote:
>
> Thanks for your comments. I have a query below.
>
> >
> > On Fri 17 Feb 2017 09:06:04 AM CET, Pankaj Gupta wrote:
> > > To maintain consistency at all the places use qemu_madvise wrapper
> > > in place of madvise call.
> >
> > >      if (length > 0) {
> > > -        madvise((uint8_t *) t + offset, length, MADV_DONTNEED);
> > > +        qemu_madvise((uint8_t *) t + offset, length, QEMU_MADV_DONTNEED);
> >
> > This was changed two months ago from qemu_madvise() to madvise(). Is
> > there any reason why you want to revert that change? Those two calls are
> > not equivalent, please see commit 2f2c8d6b371cfc6689affb0b7e for an
> > explanation.
> >
> > > -    if (madvise(start, length, MADV_DONTNEED)) {
> > > +    if (qemu_madvise(start, length, QEMU_MADV_DONTNEED)) {
> > >          error_report("%s MADV_DONTNEED: %s", __func__, strerror(errno));
>
> I only checked the history of the change related to 'postcopy'.
>
> For my Linux machine:
>
> ./config-host.mak
>
> CONFIG_MADVISE=y
> CONFIG_POSIX_MADVISE=y
>
> As both these options are set for Linux, every time we call
> 'qemu_madvise', "madvise(addr, len, advice);" will be compiled/called.
> I don't understand why '2f2c8d6b371cfc6689affb0b7e' explicitly changed
> this to "#ifdef CONFIG_LINUX". I think it's better to write a generic
> function, maybe in a wrapper, than to conditionally set something at
> different places.

No; the problem is that the behaviours are different.
You're right that the current build on Linux defines MADVISE and thus we
are safe because qemu_madvise takes the CONFIG_MADVISE/madvise route, but
we need to be explicit that it's only the madvise() route that's safe,
not any of the calls implemented by qemu_madvise, because if in the
future someone was to rearrange qemu_madvise to prefer posix_madvise,
postcopy would break in a very subtle way.

IMHO it might even be better to remove the definition of
QEMU_MADV_DONTNEED altogether and make a name that wasn't ambiguous
between the two, since the posix definition is so different.

Dave

> int qemu_madvise(void *addr, size_t len, int advice)
> {
>     if (advice == QEMU_MADV_INVALID) {
>         errno = EINVAL;
>         return -1;
>     }
> #if defined(CONFIG_MADVISE)
>     return madvise(addr, len, advice);
> #elif defined(CONFIG_POSIX_MADVISE)
>     return posix_madvise(addr, len, advice);
> #else
>     errno = EINVAL;
>     return -1;
> #endif
> }
>
> >
> > And this is the same case.
> >
> > Berto
> >
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
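Dave's suggestion of an unambiguous name could look roughly like the sketch below. This is only an illustration under stated assumptions: discard_range_guaranteed and discard_range_self_check are hypothetical names that do not exist in QEMU, and the sketch assumes a POSIX host with <sys/mman.h>. The idea is that the helper either gives the guaranteed-discard behaviour or fails loudly, and never falls back to a best-effort hint.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of QEMU: discard a page-aligned range and
 * guarantee that its old contents are gone. Per the discussion above, only
 * Linux madvise(MADV_DONTNEED) is known to provide that, so every other
 * platform gets an explicit error instead of a silent best-effort hint.
 */
int discard_range_guaranteed(void *addr, size_t len)
{
#if defined(__linux__)
    return madvise(addr, len, MADV_DONTNEED);
#else
    (void)addr;
    (void)len;
    errno = ENOTSUP;
    return -1;
#endif
}

/*
 * Self-check: returns 1 if a dirtied anonymous page reads back as zero after
 * a successful discard, 0 if stale data survived, and -1 if setup or the
 * discard itself failed (for example on a non-Linux host).
 */
int discard_range_self_check(void)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    size_t len = (size_t)page;

    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return -1;
    }
    memset(p, 0x5A, len);

    int result = -1;
    if (discard_range_guaranteed(p, len) == 0) {
        result = (p[0] == 0 && p[len - 1] == 0) ? 1 : 0;
    }
    munmap(p, len);
    return result;
}
```

With a helper of this shape, a caller such as postcopy would see an error on platforms that cannot honour the guarantee, rather than continuing with pages that may still hold stale data.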
>
> * Pankaj Gupta (pagupta@redhat.com) wrote:
> >
> > Thanks for your comments. I have a query below.
> >
> > >
> > > On Fri 17 Feb 2017 09:06:04 AM CET, Pankaj Gupta wrote:
> > > > To maintain consistency at all the places use qemu_madvise wrapper
> > > > in place of madvise call.
> > >
> > > >      if (length > 0) {
> > > > -        madvise((uint8_t *) t + offset, length, MADV_DONTNEED);
> > > > +        qemu_madvise((uint8_t *) t + offset, length,
> > > > QEMU_MADV_DONTNEED);
> > >
> > > This was changed two months ago from qemu_madvise() to madvise(). Is
> > > there any reason why you want to revert that change? Those two calls are
> > > not equivalent, please see commit 2f2c8d6b371cfc6689affb0b7e for an
> > > explanation.
> > >
> > > > -    if (madvise(start, length, MADV_DONTNEED)) {
> > > > +    if (qemu_madvise(start, length, QEMU_MADV_DONTNEED)) {
> > > >          error_report("%s MADV_DONTNEED: %s", __func__,
> > > >          strerror(errno));
> >
> > I only checked the history of the change related to 'postcopy'.
> >
> > For my Linux machine:
> >
> > ./config-host.mak
> >
> > CONFIG_MADVISE=y
> > CONFIG_POSIX_MADVISE=y
> >
> > As both these options are set for Linux, every time we call
> > 'qemu_madvise', "madvise(addr, len, advice);" will be compiled/called.
> > I don't understand why '2f2c8d6b371cfc6689affb0b7e' explicitly changed
> > this to "#ifdef CONFIG_LINUX". I think it's better to write a generic
> > function, maybe in a wrapper, than to conditionally set something at
> > different places.
>
> No; the problem is that the behaviours are different.
> You're right that the current build on Linux defines MADVISE and thus we
> are safe because qemu_madvise takes the CONFIG_MADVISE/madvise route, but
> we need to be explicit that it's only the madvise() route that's safe,
> not any of the calls implemented by qemu_madvise, because if in the
> future someone was to rearrange qemu_madvise to prefer posix_madvise,
> postcopy would break in a very subtle way.

Agreed. Could we add a comment explaining this?

>
> IMHO it might even be better to remove the definition of
> QEMU_MADV_DONTNEED altogether and make a name that wasn't ambiguous
> between the two, since the posix definition is so different.

I think 'posix_madvise' was added for systems which did not have
'madvise'. If I look at the makefile, we first check which calls are
available and then set the config options accordingly. We give 'madvise'
precedence over 'posix_madvise' if both are present.

On systems which don't have the madvise call, 'posix_madvise' is called,
which as per this discussion is not the right thing for the 'DONTNEED'
option. It will not give the desired results.

Either we have to find the right alternative, or else it is already
broken for systems which don't support madvise.

>
> Dave
>
> > int qemu_madvise(void *addr, size_t len, int advice)
> > {
> >     if (advice == QEMU_MADV_INVALID) {
> >         errno = EINVAL;
> >         return -1;
> >     }
> > #if defined(CONFIG_MADVISE)
> >     return madvise(addr, len, advice);
> > #elif defined(CONFIG_POSIX_MADVISE)
> >     return posix_madvise(addr, len, advice);
> > #else
> >     errno = EINVAL;
> >     return -1;
> > #endif
> > }
> >
> > >
> > > And this is the same case.
> > >
> > > Berto
> > >
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
On Fri 17 Feb 2017 12:30:28 PM CET, Pankaj Gupta wrote:
>> > To maintain consistency at all the places use qemu_madvise wrapper
>> > in place of madvise call.
>>
>> > -        madvise((uint8_t *) t + offset, length, MADV_DONTNEED);
>> > +        qemu_madvise((uint8_t *) t + offset, length, QEMU_MADV_DONTNEED);
>>
>> Those two calls are not equivalent, please see commit
>> 2f2c8d6b371cfc6689affb0b7e for an explanation.

> I don't understand why '2f2c8d6b371cfc6689affb0b7e' explicitly changed
> this to "#ifdef CONFIG_LINUX". I think it's better to write a generic
> function, maybe in a wrapper, than to conditionally set something at
> different places.

The problem with qemu_madvise(QEMU_MADV_DONTNEED) is that it can mean
different things depending on the platform:

   posix_madvise(POSIX_MADV_DONTNEED)
   madvise(MADV_DONTNEED)

The first call is standard but it doesn't do what we need, so we cannot
use it.

The second call -- madvise(MADV_DONTNEED) -- is not standard, and it
doesn't do the same thing on all platforms. The only platform on which
it does what we need is Linux, hence the #ifdef CONFIG_LINUX and #if
defined(__linux__) that you see in the code.

I agree with David's comment that maybe it's better to remove
QEMU_MADV_DONTNEED altogether since it's not reliable.

Berto
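Berto's point that the standard call "doesn't do what we need" can be checked with a small sketch, assuming a Linux host with glibc (whose posix_madvise(3) man page documents POSIX_MADV_DONTNEED as a no-op). The helper name first_byte_after_posix_madvise_dontneed is hypothetical and not part of QEMU. It dirties one anonymous private page, issues the POSIX hint, and reads the page back.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not QEMU code): fill one anonymous private page with
 * 0xAA, pass POSIX_MADV_DONTNEED to posix_madvise(), and return the first
 * byte read back afterwards. Returns -1 if mmap() or posix_madvise() fails.
 */
int first_byte_after_posix_madvise_dontneed(void)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0); /* the page size is always known on POSIX hosts */
    size_t len = (size_t)page;

    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return -1;
    }

    memset(p, 0xAA, len);

    /* posix_madvise() returns an error number directly, 0 on success. */
    if (posix_madvise(p, len, POSIX_MADV_DONTNEED) != 0) {
        munmap(p, len);
        return -1;
    }

    int byte = p[0]; /* the hint is advisory, so the data may still be here */
    munmap(p, len);
    return byte;
}
```

On a glibc-based Linux host this is expected to return 0xAA: the call succeeds but the page contents survive, which is exactly the behaviour postcopy cannot tolerate. Other C libraries are free to implement the hint differently.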
On Fri 17 Feb 2017 01:30:09 PM CET, Pankaj Gupta wrote:
> I think 'posix_madvise' was added for systems which did not have
> 'madvise' [...] For the systems which don't have madvise call
> 'posix_madvise' is called which as per discussion is not right thing
> for 'DONTNEED' option. It will not give desired results.
>
> Either we have to find right alternative or else it is already broken
> for systems which don't support madvise.

Do you have an example of a call that is currently broken in the QEMU
code?

Berto
diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
index 1d25147..4991ca5 100644
--- a/block/qcow2-cache.c
+++ b/block/qcow2-cache.c
@@ -74,7 +74,7 @@ static void qcow2_cache_table_release(BlockDriverState *bs, Qcow2Cache *c,
     size_t offset = QEMU_ALIGN_UP((uintptr_t) t, align) - (uintptr_t) t;
     size_t length = QEMU_ALIGN_DOWN(mem_size - offset, align);
     if (length > 0) {
-        madvise((uint8_t *) t + offset, length, MADV_DONTNEED);
+        qemu_madvise((uint8_t *) t + offset, length, QEMU_MADV_DONTNEED);
     }
 #endif
 }
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index a40dddb..558fec1 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -213,7 +213,7 @@ int postcopy_ram_discard_range(MigrationIncomingState *mis, uint8_t *start,
                                size_t length)
 {
     trace_postcopy_ram_discard_range(start, length);
-    if (madvise(start, length, MADV_DONTNEED)) {
+    if (qemu_madvise(start, length, QEMU_MADV_DONTNEED)) {
         error_report("%s MADV_DONTNEED: %s", __func__, strerror(errno));
         return -1;
     }
To maintain consistency at all the places use qemu_madvise wrapper
in place of madvise call.

Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
---
 block/qcow2-cache.c      | 2 +-
 migration/postcopy-ram.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)
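The qcow2-cache.c hunk advises only the page-aligned interior of a cache table, because madvise() works on whole pages. The arithmetic can be sketched as follows. ALIGN_DOWN and ALIGN_UP are local stand-ins assumed to behave like QEMU_ALIGN_DOWN and QEMU_ALIGN_UP (integer rounding to a multiple of the alignment); struct aligned_span and aligned_interior are hypothetical names. The offset < mem_size guard is an addition of this sketch and does not appear in the hunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins assumed to match QEMU_ALIGN_DOWN / QEMU_ALIGN_UP. */
#define ALIGN_DOWN(n, m) ((n) / (m) * (m))
#define ALIGN_UP(n, m)   ALIGN_DOWN((n) + (m) - 1, (m))

/* Hypothetical result type: where the aligned region starts and how long it is. */
struct aligned_span {
    size_t offset; /* bytes to skip from base to reach the first aligned address */
    size_t length; /* size of the aligned region, a multiple of align (may be 0) */
};

/*
 * Compute the largest align-sized, align-aligned region that lies entirely
 * inside [base, base + mem_size).
 */
struct aligned_span aligned_interior(uintptr_t base, size_t mem_size,
                                     size_t align)
{
    assert(align != 0);

    struct aligned_span s = { 0, 0 };
    s.offset = (size_t)(ALIGN_UP(base, align) - base);

    /* Guard added in this sketch: avoid unsigned underflow for tiny buffers. */
    if (s.offset < mem_size) {
        s.length = ALIGN_DOWN(mem_size - s.offset, align);
    }
    return s;
}
```

For example, with base 4196, mem_size 12288 and align 4096, the first aligned address is 8192, so the offset is 3996 and the length is 8192 (two whole pages). A length of 0 means no full page fits, which is why the hunk checks length > 0 before calling madvise().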