Message ID: 20220707161656.41664-1-cohuck@redhat.com
Series: arm: enable MTE for QEMU + kvm
On 7/7/22 21:46, Cornelia Huck wrote:
> If I'm not misunderstanding things, we need a way to fault in a page together
> with the tag; doing that in one go is probably the only way that we can be
> sure that this is race-free on the QEMU side.

That's my understanding as well.

r~
* Cornelia Huck (cohuck@redhat.com) wrote:
> This series makes it possible to enable MTE for kvm guests, if the kernel
> supports it. Again, tested on the simulator via patiently waiting for the
> arm64/mte kselftests to finish successfully.
>
> For tcg, turning on mte on the machine level (to get tag memory) stays a
> requirement. If the new mte cpu feature is not explicitly specified, a tcg
> vm will get mte depending on the presence of tag memory (just as today).
>
> For kvm, mte stays off by default; this is because migration is not yet
> supported (postcopy will need an extension of the kernel interface, possibly
> an extension of the userfaultfd interface), and turning on mte will add a
> migration blocker.

My assumption was that a normal migration would need something as well
to retrieve and place the MTE flags; albeit not atomically.

> My biggest question going forward is actually concerning migration; I gather
> that we should not bother adding something unless postcopy is working as well?

I don't think that restriction is fair on you; just make sure
postcopy_ram_supported_by_host gains an arch call and fails cleanly;
that way if anyone tries to enable postcopy they'll find out with a
clean fail.

> If I'm not misunderstanding things, we need a way to fault in a page together
> with the tag; doing that in one go is probably the only way that we can be
> sure that this is race-free on the QEMU side. Comments welcome :)

I think it will.
But, ignoring postcopy for a minute, with KVM how do different types of
backing memory work - e.g. if I back a region of guest memory with
/dev/shm/something or a hugepage equivalent, where does the MTE memory
come from, and how do you set it?

Dave

> Changes v1->v2: [Thanks to Eric for the feedback!]
> - add documentation
> - switch the mte prop to OnOffAuto; this improves the interaction with the
>   existing mte machine prop
> - leave mte off for kvm by default
> - improve tests; the poking in QDicts feels a bit ugly, but seems to work
>
> Cornelia Huck (2):
>   arm/kvm: add support for MTE
>   qtests/arm: add some mte tests
>
>  docs/system/arm/cpu-features.rst |  21 +++++
>  target/arm/cpu.c                 |  18 ++---
>  target/arm/cpu.h                 |   1 +
>  target/arm/cpu64.c               | 132 +++++++++++++++++++++++++++++++
>  target/arm/internals.h           |   1 +
>  target/arm/kvm64.c               |   5 ++
>  target/arm/kvm_arm.h             |  12 +++
>  target/arm/monitor.c             |   1 +
>  tests/qtest/arm-cpu-features.c   |  77 ++++++++++++++++++
>  9 files changed, 256 insertions(+), 12 deletions(-)
>
> --
> 2.35.3
>
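[For illustration, a minimal sketch of the clean-fail approach suggested
above. Only postcopy_ram_supported_by_host is a real QEMU function name;
arch_postcopy_supported() and the MTE query are invented for this sketch
and are not code from this series or from QEMU.]

    #include <stdbool.h>
    #include <stdio.h>

    static bool vm_has_mte = true;   /* stand-in for the real per-VM query */

    /* Hypothetical per-arch veto, called from the generic capability check. */
    static bool arch_postcopy_supported(void)
    {
        if (vm_has_mte) {
            /* Tags cannot yet be placed atomically via userfaultfd. */
            fprintf(stderr, "postcopy: not supported with MTE enabled\n");
            return false;
        }
        return true;
    }

    bool postcopy_ram_supported_by_host(void)
    {
        if (!arch_postcopy_supported()) {
            return false;   /* enabling the capability fails cleanly */
        }
        /* ... the existing userfaultfd host checks would follow ... */
        return true;
    }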
On Mon, 11 Jul 2022 at 14:24, Dr. David Alan Gilbert
<dgilbert@redhat.com> wrote:
> But, ignoring postcopy for a minute, with KVM how do different types of
> backing memory work - e.g. if I back a region of guest memory with
> /dev/shm/something or a hugepage equivalent, where does the MTE memory
> come from, and how do you set it?

Generally in an MTE system anything that's "plain old RAM" is expected
to support tags. (The architecture manual calls this "conventional
memory". This isn't quite the same as "anything that looks RAM-like",
e.g. the graphics card framebuffer doesn't have to support tags!)

One plausible implementation is that the firmware and memory controller
are in cahoots and arrange that the appropriate fraction of the DRAM is
reserved for holding tags (and inaccessible as normal RAM even by the OS);
but where the tags are stored is entirely impdef and an implementation
could choose to put the tags in their own entirely separate storage if
it liked. The only way to access the tag storage is via the instructions
for getting and setting tags.

-- PMM
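[As a concrete illustration of that last point, a minimal user-space
sketch of the tag get/set instructions (IRG/STG/LDG). It assumes an
MTE-capable arm64 CPU and kernel; the fallback defines cover older
headers, and error handling is omitted for brevity.]

    /* Build: gcc -march=armv8.5-a+memtag -O2 mte-demo.c */
    #include <stdio.h>
    #include <stdint.h>
    #include <sys/mman.h>
    #include <sys/prctl.h>

    #ifndef PROT_MTE
    #define PROT_MTE 0x20                    /* arm64: mapping carries tags */
    #endif
    #ifndef PR_SET_TAGGED_ADDR_CTRL
    #define PR_SET_TAGGED_ADDR_CTRL 55
    #define PR_TAGGED_ADDR_ENABLE (1UL << 0)
    #endif
    #ifndef PR_MTE_TCF_SYNC
    #define PR_MTE_TCF_SYNC (1UL << 1)       /* synchronous tag-check faults */
    #endif
    #ifndef PR_MTE_TAG_SHIFT
    #define PR_MTE_TAG_SHIFT 3
    #endif

    int main(void)
    {
        /* Opt in to tagged addresses; allow all non-zero tags for IRG. */
        prctl(PR_SET_TAGGED_ADDR_CTRL,
              PR_TAGGED_ADDR_ENABLE | PR_MTE_TCF_SYNC |
              (0xfffeUL << PR_MTE_TAG_SHIFT), 0, 0, 0);

        uint8_t *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_MTE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        uint8_t *tagged, *readback = p;

        asm("irg %0, %1" : "=r"(tagged) : "r"(p));     /* random tag in ptr */
        asm("stg %0, [%0]" :: "r"(tagged) : "memory"); /* set granule's tag */
        asm("ldg %0, [%0]" : "+r"(readback));          /* read tag back     */

        printf("tagged %p, read back %p\n", tagged, readback);
        return 0;
    }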
* Dr. David Alan Gilbert (dgilbert@redhat.com) wrote:
> * Cornelia Huck (cohuck@redhat.com) wrote:
> > This series makes it possible to enable MTE for kvm guests, if the kernel
> > supports it. Again, tested on the simulator via patiently waiting for the
> > arm64/mte kselftests to finish successfully.
> >
> > For tcg, turning on mte on the machine level (to get tag memory) stays a
> > requirement. If the new mte cpu feature is not explicitly specified, a tcg
> > vm will get mte depending on the presence of tag memory (just as today).
> >
> > For kvm, mte stays off by default; this is because migration is not yet
> > supported (postcopy will need an extension of the kernel interface, possibly
> > an extension of the userfaultfd interface), and turning on mte will add a
> > migration blocker.
>
> My assumption was that a normal migration would need something as well
> to retrieve and place the MTE flags; albeit not atomically.
>
> > My biggest question going forward is actually concerning migration; I gather
> > that we should not bother adding something unless postcopy is working as well?
>
> I don't think that restriction is fair on you; just make sure
> postcopy_ram_supported_by_host gains an arch call and fails cleanly;
> that way if anyone tries to enable postcopy they'll find out with a
> clean fail.
>
> > If I'm not misunderstanding things, we need a way to fault in a page together
> > with the tag; doing that in one go is probably the only way that we can be
> > sure that this is race-free on the QEMU side. Comments welcome :)
>
> I think it will.
> But, ignoring postcopy for a minute, with KVM how do different types of
> backing memory work - e.g. if I back a region of guest memory with
> /dev/shm/something or a hugepage equivalent, where does the MTE memory
> come from, and how do you set it?

Another case that just came to mind is the data content optimisations;
we special-case all-zero pages, which I guess you still need to transmit
tags for, and the xbzrle page-difference code wouldn't notice differences
in tags.

Dave

> Dave
>
> > Changes v1->v2: [Thanks to Eric for the feedback!]
> > - add documentation
> > - switch the mte prop to OnOffAuto; this improves the interaction with the
> >   existing mte machine prop
> > - leave mte off for kvm by default
> > - improve tests; the poking in QDicts feels a bit ugly, but seems to work
> >
> > Cornelia Huck (2):
> >   arm/kvm: add support for MTE
> >   qtests/arm: add some mte tests
> >
> >  docs/system/arm/cpu-features.rst |  21 +++++
> >  target/arm/cpu.c                 |  18 ++---
> >  target/arm/cpu.h                 |   1 +
> >  target/arm/cpu64.c               | 132 +++++++++++++++++++++++++++++++
> >  target/arm/internals.h           |   1 +
> >  target/arm/kvm64.c               |   5 ++
> >  target/arm/kvm_arm.h             |  12 +++
> >  target/arm/monitor.c             |   1 +
> >  tests/qtest/arm-cpu-features.c   |  77 ++++++++++++++++++
> >  9 files changed, 256 insertions(+), 12 deletions(-)
> >
> > --
> > 2.35.3
> >
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
* Peter Maydell (peter.maydell@linaro.org) wrote:
> On Mon, 11 Jul 2022 at 14:24, Dr. David Alan Gilbert
> <dgilbert@redhat.com> wrote:
> > But, ignoring postcopy for a minute, with KVM how do different types of
> > backing memory work - e.g. if I back a region of guest memory with
> > /dev/shm/something or a hugepage equivalent, where does the MTE memory
> > come from, and how do you set it?
>
> Generally in an MTE system anything that's "plain old RAM" is expected
> to support tags. (The architecture manual calls this "conventional
> memory". This isn't quite the same as "anything that looks RAM-like",
> e.g. the graphics card framebuffer doesn't have to support tags!)

I guess things like non-volatile disks mapped as DAX are fun edge cases.

> One plausible implementation is that the firmware and memory controller
> are in cahoots and arrange that the appropriate fraction of the DRAM is
> reserved for holding tags (and inaccessible as normal RAM even by the OS);
> but where the tags are stored is entirely impdef and an implementation
> could choose to put the tags in their own entirely separate storage if
> it liked. The only way to access the tag storage is via the instructions
> for getting and setting tags.

Hmm OK; In postcopy, at the moment, the call qemu uses is a call that
atomically places a page of data in memory and then tells the vCPUs to
continue. I guess a variant that took an extra blob of MTE data would
do.

Note that other VMMs built on kvm work in different ways; the other
common way is to write into the backing file (i.e. the /dev/shm
whatever atomically somehow) and then do the userfault call to tell the
vcpus to continue. It looks like this is the way things will work in
the split hugepage mechanism Google are currently adding.

Dave

> -- PMM
>
On Mon, Jul 11 2022, "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:

> * Peter Maydell (peter.maydell@linaro.org) wrote:
>> On Mon, 11 Jul 2022 at 14:24, Dr. David Alan Gilbert
>> <dgilbert@redhat.com> wrote:
>> > But, ignoring postcopy for a minute, with KVM how do different types of
>> > backing memory work - e.g. if I back a region of guest memory with
>> > /dev/shm/something or a hugepage equivalent, where does the MTE memory
>> > come from, and how do you set it?
>>
>> Generally in an MTE system anything that's "plain old RAM" is expected
>> to support tags. (The architecture manual calls this "conventional
>> memory". This isn't quite the same as "anything that looks RAM-like",
>> e.g. the graphics card framebuffer doesn't have to support tags!)
>
> I guess things like non-volatile disks mapped as DAX are fun edge cases.
>
>> One plausible implementation is that the firmware and memory controller
>> are in cahoots and arrange that the appropriate fraction of the DRAM is
>> reserved for holding tags (and inaccessible as normal RAM even by the OS);
>> but where the tags are stored is entirely impdef and an implementation
>> could choose to put the tags in their own entirely separate storage if
>> it liked. The only way to access the tag storage is via the instructions
>> for getting and setting tags.
>
> Hmm OK; In postcopy, at the moment, the call qemu uses is a call that
> atomically places a page of data in memory and then tells the vCPUs to
> continue. I guess a variant that took an extra blob of MTE data would
> do.

Yes, the current idea is to extend UFFDIO_COPY with a flag so that we
get the tag data along with the page.

> Note that other VMMs built on kvm work in different ways; the other
> common way is to write into the backing file (i.e. the /dev/shm
> whatever atomically somehow) and then do the userfault call to tell the
> vcpus to continue. It looks like this is the way things will work in
> the split hugepage mechanism Google are currently adding.

Hmm... I had the impression that other VMMs had not cared about this
particular use case yet; if they need a slightly different mechanism,
it would complicate things a bit.
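[To make the shape of that extension concrete, a sketch of today's
atomic page placement, with comments marking where a tag-carrying flag
would slot in. UFFDIO_COPY_MODE_MTE and the mte_tags field do not exist
in the UAPI; they are assumptions about the proposed extension.]

    #include <string.h>
    #include <sys/ioctl.h>
    #include <linux/userfaultfd.h>

    int place_page(int uffd, void *host_addr, void *from, size_t pagesize)
    {
        struct uffdio_copy copy;

        memset(&copy, 0, sizeof(copy));
        copy.dst = (unsigned long)host_addr;
        copy.src = (unsigned long)from;
        copy.len = pagesize;
        copy.mode = 0;
        /*
         * Hypothetical extension: a mode flag plus a pointer to one tag
         * byte per 16-byte granule, applied atomically with the page:
         *
         *   copy.mode |= UFFDIO_COPY_MODE_MTE;
         *   copy.mte_tags = (unsigned long)tag_buf;
         */

        /* On success, faulting vCPUs wake and see page (and tags) together. */
        return ioctl(uffd, UFFDIO_COPY, &copy);
    }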
On Mon, Jul 11 2022, "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:

> * Cornelia Huck (cohuck@redhat.com) wrote:
>> For kvm, mte stays off by default; this is because migration is not yet
>> supported (postcopy will need an extension of the kernel interface, possibly
>> an extension of the userfaultfd interface), and turning on mte will add a
>> migration blocker.
>
> My assumption was that a normal migration would need something as well
> to retrieve and place the MTE flags; albeit not atomically.

There's KVM_ARM_MTE_COPY_TAGS, which should be sufficient to move tags
around for normal migration.

>
>> My biggest question going forward is actually concerning migration; I gather
>> that we should not bother adding something unless postcopy is working as well?
>
> I don't think that restriction is fair on you; just make sure
> postcopy_ram_supported_by_host gains an arch call and fails cleanly;
> that way if anyone tries to enable postcopy they'll find out with a
> clean fail.

Ok, if simply fencing off postcopy is fine, we can try to move forward
with what we have now. The original attempt at
https://lore.kernel.org/all/881871e8394fa18a656dfb105d42e6099335c721.1615972140.git.haibo.xu@linaro.org/
hooked itself directly into common code; maybe we should rather copy the
approach used for s390 storage keys (extra "device") instead?
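[For the precopy side, a sketch of pulling a page range's tags out of
the guest with that ioctl. It assumes arm64 headers and a VM with
KVM_CAP_ARM_MTE enabled; error handling is trimmed. Tags are one byte
per 16-byte granule, so tag_buf needs len/16 bytes.]

    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    int fetch_tags(int vm_fd, uint64_t guest_ipa, uint64_t len,
                   uint8_t *tag_buf)
    {
        struct kvm_arm_copy_mte_tags req = {
            .guest_ipa = guest_ipa,             /* page-aligned guest address */
            .length    = len,                   /* multiple of the page size  */
            .addr      = tag_buf,               /* receives len/16 tag bytes  */
            .flags     = KVM_ARM_TAGS_FROM_GUEST,
        };

        /* Returns the number of bytes copied, or <0 on error; the
         * mirror direction is KVM_ARM_TAGS_TO_GUEST on the target. */
        return ioctl(vm_fd, KVM_ARM_MTE_COPY_TAGS, &req);
    }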
* Cornelia Huck (cohuck@redhat.com) wrote:
> On Mon, Jul 11 2022, "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
>
> > * Cornelia Huck (cohuck@redhat.com) wrote:
> >> For kvm, mte stays off by default; this is because migration is not yet
> >> supported (postcopy will need an extension of the kernel interface, possibly
> >> an extension of the userfaultfd interface), and turning on mte will add a
> >> migration blocker.
> >
> > My assumption was that a normal migration would need something as well
> > to retrieve and place the MTE flags; albeit not atomically.
>
> There's KVM_ARM_MTE_COPY_TAGS, which should be sufficient to move tags
> around for normal migration.
>
> >
> >> My biggest question going forward is actually concerning migration; I gather
> >> that we should not bother adding something unless postcopy is working as well?
> >
> > I don't think that restriction is fair on you; just make sure
> > postcopy_ram_supported_by_host gains an arch call and fails cleanly;
> > that way if anyone tries to enable postcopy they'll find out with a
> > clean fail.
>
> Ok, if simply fencing off postcopy is fine, we can try to move forward
> with what we have now. The original attempt at
> https://lore.kernel.org/all/881871e8394fa18a656dfb105d42e6099335c721.1615972140.git.haibo.xu@linaro.org/
> hooked itself directly into common code; maybe we should rather copy the
> approach used for s390 storage keys (extra "device") instead?

I don't understand how a separate device would keep page-changed flags
coherent with the main RAM that the tags correspond to.

Dave
* Cornelia Huck (cohuck@redhat.com) wrote:
> On Mon, Jul 11 2022, "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
>
> > * Peter Maydell (peter.maydell@linaro.org) wrote:
> >> On Mon, 11 Jul 2022 at 14:24, Dr. David Alan Gilbert
> >> <dgilbert@redhat.com> wrote:
> >> > But, ignoring postcopy for a minute, with KVM how do different types of
> >> > backing memory work - e.g. if I back a region of guest memory with
> >> > /dev/shm/something or a hugepage equivalent, where does the MTE memory
> >> > come from, and how do you set it?
> >>
> >> Generally in an MTE system anything that's "plain old RAM" is expected
> >> to support tags. (The architecture manual calls this "conventional
> >> memory". This isn't quite the same as "anything that looks RAM-like",
> >> e.g. the graphics card framebuffer doesn't have to support tags!)
> >
> > I guess things like non-volatile disks mapped as DAX are fun edge cases.
> >
> >> One plausible implementation is that the firmware and memory controller
> >> are in cahoots and arrange that the appropriate fraction of the DRAM is
> >> reserved for holding tags (and inaccessible as normal RAM even by the OS);
> >> but where the tags are stored is entirely impdef and an implementation
> >> could choose to put the tags in their own entirely separate storage if
> >> it liked. The only way to access the tag storage is via the instructions
> >> for getting and setting tags.
> >
> > Hmm OK; In postcopy, at the moment, the call qemu uses is a call that
> > atomically places a page of data in memory and then tells the vCPUs to
> > continue. I guess a variant that took an extra blob of MTE data would
> > do.
>
> Yes, the current idea is to extend UFFDIO_COPY with a flag so that we
> get the tag data along with the page.
>
> > Note that other VMMs built on kvm work in different ways; the other
> > common way is to write into the backing file (i.e. the /dev/shm
> > whatever atomically somehow) and then do the userfault call to tell the
> > vcpus to continue. It looks like this is the way things will work in
> > the split hugepage mechanism Google are currently adding.
>
> Hmm... I had the impression that other VMMs had not cared about this
> particular use case yet; if they need a slightly different mechanism,
> it would complicate things a bit.

I think Google's internal VMM doesn't use UFFDIO_COPY - but I don't
have details to be sure of that.

Dave