| Message ID | 20241126080213.248-1-weichenforschung@gmail.com (mailing list archive) |
|---|---|
| State | New |
| Series | hw/virtio/virtio-mem: Prohibit unplugging when size <= requested_size |
> Please provide more information how this is supposed to work

We initially discovered that virtio-mem could be used by a malicious agent to trigger the Rowhammer vulnerability and further achieve a VM escape.

Simply speaking, Rowhammer is a DRAM vulnerability where frequent access to a memory location might cause voltage leakage to adjacent locations, effectively flipping bits in those locations. In other words, with Rowhammer, an adversary can modify the data stored in memory.

For a complete attack, an adversary needs to: a) determine which parts of memory are prone to bit flips, b) trick the system into storing important data on those parts of memory, and c) trigger bit flips to tamper with that important data.

Now, for an attacker who only has access to their VM but not to the hypervisor, one important challenge among the three is b), i.e., to give the memory they determine to be vulnerable back to the hypervisor. This is where the pitfall for virtio-mem lies: the attacker can modify the virtio-mem driver in the VM's kernel and unplug memory proactively. The current implementation of virtio-mem in QEMU does not check whether it is valid for the VM to unplug memory. Therefore, as our experiments prove, this method works in practice.

> whether this is a purely theoretical case, and how relevant this is in
> practice.

In our design, on a host machine equipped with certain Intel processors and inside a VM that a) has a passed-through PCI device, b) has a vIOMMU and c) has a virtio-mem device, an attacker can force the EPT to use pages that are prone to Rowhammer bit flips and thus modify the EPT to gain read and write privileges to an arbitrary memory location.

We conducted end-to-end attacks on two separate machines, with Core i3-10100 and Xeon E2124 processors respectively, and achieved successful VM escapes on both.

> Further, what about virtio-balloon, which does not even support
> rejecting requests?
virtio-balloon does not work with device passthrough currently, so we have yet to produce a feasible attack with it.

> I recall that that behavior was desired once the driver would support
> de-fragmenting unplugged memory blocks.

By "that behavior" do you mean unplugging memory when size <= requested_size? I am not sure how that is to be implemented.

> Note that VIRTIO_MEM_REQ_UNPLUG_ALL would still always be allowed

That is true, but the attacker will want the capability to release a specific sub-block. In fact, a sub-block is still somewhat coarse, because most likely there is only one page in a sub-block that contains potential bit flips. When the attacker spawns EPTEs, they have to spawn enough of them to make sure the target page is used to store the EPTEs.

A 2MB sub-block can store 2MB/4KB*512 = 262,144 EPTEs, equating to at least 1GB of memory. In other words, the attack program exhausts 1GB of memory just for the possibility that KVM uses the target page to store EPTEs.

Best regards,
Wei Chen

On 2024/11/26 20:29, David Hildenbrand wrote:
> On 26.11.24 09:02, Wei Chen wrote:
>> A malicious guest can exploit virtio-mem to release memory back to the
>> hypervisor and attempt Rowhammer attacks.
>
> Please provide more information how this is supposed to work, whether
> this is a purely theoretical case, and how relevant this is in practice.
>
> Because I am not sure how relevant and accurate this statement is, and
> if any action is needed at all.
>
> Further, what about virtio-balloon, which does not even support
> rejecting requests?
>
>> The only case reasonable for
>> unplugging is when the size > requested_size.
>
> I recall that that behavior was desired once the driver would support
> de-fragmenting unplugged memory blocks. I don't think drivers do that
> today (would have to double-check the Windows one). The spec does not
> document what is to happen in that case.
>
> Note that VIRTIO_MEM_REQ_UNPLUG_ALL would still always be allowed, so
> this change would not cover all cases. VIRTIO_MEM_REQ_UNPLUG_ALL could
> be ratelimited -- if there is a real issue here.
>
>> Signed-off-by: Wei Chen <weichenforschung@gmail.com>
>> Signed-off-by: Zhi Zhang <zzhangphd@gmail.com>
>> ---
>>   hw/virtio/virtio-mem.c | 4 ++++
>>   1 file changed, 4 insertions(+)
>>
>> diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
>> index 80ada89551..4ef67082a2 100644
>> --- a/hw/virtio/virtio-mem.c
>> +++ b/hw/virtio/virtio-mem.c
>> @@ -671,6 +671,10 @@ static int virtio_mem_state_change_request(VirtIOMEM *vmem, uint64_t gpa,
>>           return VIRTIO_MEM_RESP_NACK;
>>       }
>> +    if (!plug && vmem->size <= vmem->requested_size) {
>> +        return VIRTIO_MEM_RESP_NACK;
>> +    }
>> +
>>       /* test if really all blocks are in the opposite state */
>>       if ((plug && !virtio_mem_is_range_unplugged(vmem, gpa, size)) ||
>>           (!plug && !virtio_mem_is_range_plugged(vmem, gpa, size))) {
On 26.11.24 15:20, Wei Chen wrote:

Thanks for the information. A lot of what you wrote belongs in the patch description. Especially that this might currently only be relevant with device passthrough + viommu.

> Now, for an attacker who only has access to their VM but not to the
> hypervisor, one important challenge among the three is b), i.e., to give
> back the memory they determine as vulnerable to the hypervisor. This is
> where the pitfall for virtio-mem lies: the attacker can modify the
> virtio-mem driver in the VM's kernel and unplug memory proactively.

But b), as you write, is not only about giving back that memory to the hypervisor. How can you be sure (IOW trigger) that the system will store "important data" like EPTs?

> Our efforts involved conducting end-to-end attacks on two separate
> machines with the Core i3-10100 and the Xeon E2124 processors
> respectively, and achieved successful VM escapes.

Out of curiosity, are newer CPUs no longer affected?

> virtio-balloon does not work with device passthrough currently, so we
> have yet to produce a feasible attack with it.

So is one magic bit really that for your experiments, one needs a viommu?

The only mention of rowhammer + memory ballooning I found is:
https://www.whonix.org/pipermail/whonix-devel/2016-September/000746.html

> By "that behavior" do you mean to unplug memory when size <=
> requested_size? I am not sure how that is to be implemented.

To defragment, the idea was to unplug one additional block, so we can plug another block.

> That is true, but the attacker will want the capability to release a
> specific sub-block.

So it won't be sufficient to have a single sub-block plugged and then trigger VIRTIO_MEM_REQ_UNPLUG_ALL?

> A 2MB sub-block can store 2MB/4KB*512=262,144 EPTEs, equating to at
> least 1GB of memory. In other words, the attack program exhausts 1GB of
> memory just for the possibility that KVM uses the target page to store
> EPTEs.

Ah, that makes sense.

Can you compress what you wrote into the patch description? Further, I assume we want to add a Fixes: tag and Cc: QEMU Stable <qemu-stable@nongnu.org>

Thanks!
On 26.11.24 15:46, David Hildenbrand wrote:
[...]

I just recalled another scenario where we unplug memory: see virtio_mem_cleanup_pending_mb() in the Linux driver as one example.

We first plug memory, then add the memory to Linux. If that adding fails, we unplug the memory again.

So this change can turn the virtio_mem driver in Linux non-functional, unfortunately.
On 26.11.24 16:08, David Hildenbrand wrote:
[...]
> I just recalled another scenario where we unplug memory: see
> virtio_mem_cleanup_pending_mb() in the Linux driver as one example.
>
> We first plug memory, then add the memory to Linux. If that adding
> fails, we unplug the memory again.
>
> So this change can turn the virtio_mem driver in Linux non-functional,
> unfortunately.

Further, the Linux driver does not expect a NACK on unplug requests, see virtio_mem_send_unplug_request(). So this change won't work.

We could return VIRTIO_MEM_RESP_BUSY, but to handle what I raised above, we would still have to make it work every now and then (ratelimit), to not break the driver.

The alternative is to delay freeing of the memory in case we run into this condition. Hm ...
> How can you be sure (IOW trigger) that the system will store
> "important data" like EPTs?

We cannot, but we have designed the attack (see below) to improve the probability.

> So is one magic bit really that for your experiments, one needs a
> viommu?

Admittedly, the way we accomplish a VM escape is a bit arcane.

We require device passthrough because it pins the VM's memory down and converts it to MIGRATE_UNMOVABLE. Hotplugged memory will also be converted to MIGRATE_UNMOVABLE. That way, when we give memory back to the hypervisor, it stays UNMOVABLE. Otherwise we would have to convert the pages to UNMOVABLE or exhaust ALL MIGRATE_MOVABLE pages, neither of which can be easily accomplished.

Then we require a vIOMMU because vIOMMU mappings, much like EPTEs, use MIGRATE_UNMOVABLE pages as well. By spawning lots of meaningless vIOMMU entries, we exhaust UNMOVABLE page blocks of lower orders (<9). The next time KVM tries to allocate pages to store EPTEs, the kernel has to split an order-9 page block, which is exactly the size of a 2MB sub-block.

> Out of curiosity, are newer CPUs no longer affected?

When qemu pins down the VM's memory, it also establishes every possible mapping to the VM's memory in the EPT.

To spawn new EPTEs, we exploit KVM's fix for the iTLB multihit bug. Basically, we execute a bunch of no-op functions, and KVM has to split hugepages into 4KB pages. This process creates a large number of EPTEs.

The iTLB multihit bug, roughly speaking, is only present on non-Atom Intel CPUs manufactured before 2020.

> So it won't be sufficient to have a single sub-block plugged and then
> trigger VIRTIO_MEM_REQ_UNPLUG_ALL?

It could work in theory, but if the newly plugged sub-block does not contain vulnerable pages, there is no promise that the attacker would get a sub-block containing a different set of pages next time. It also depends heavily on the configuration of the virtio-mem device.
If there is not much non-virtio-mem memory for the VM, the attacker could easily run out of memory.

Best regards,
Wei Chen

On 2024/11/26 22:46, David Hildenbrand wrote:
[...]
Thanks for the information! I will try to come up with a V2 that does not impact virtio-mem's functionality.

Best regards,
Wei Chen

On 2024/11/26 23:14, David Hildenbrand wrote:
[...]
On 26.11.24 16:31, Wei Chen wrote: > > How can you be sure (IOW trigger) that the system will store > > "important data" like EPTs? > > We cannot, but we have designed the attack (see below) to improve the > possibility. > > > So is one magic bit really that for your experiments, one needs a > > viommu? > > Admittedly the way we accomplish a VM escape is a bit arcane. That's what I imagined :) > > We require device passthrough because it pins the VM's memory down and > converts them to MIGRATE_UNMOVABLE. Interesting, that's news to me. Can you share where GUP in the kernel would do that? > Hotplugged memory will also be > converted to MIGRATE_UNMOVABLE. But that's in the VM? Because we don't hotplug memory in the hypervisor. That way when we give memory back to the > hypervisor, they stay UNMOVABLE. Otherwise we will have to convert the > pages to UNMOVABLE or exhaust ALL MIGRATE_MOVALE pages, both of which > cannot be easily accomplished. > > Then we require vIOMMU because vIOMMU mappings, much like EPTEs, use > MIGRATE_UNMOVABLE pages as well. By spawning lots of meaningless vIOMMU > entries, we exhaust UNMOVABLE page blocks of lower orders (<9). Next > time KVM tries to allocate pages to store EPTEs, the kernel has to split > an order-9 page block, which is exactly the size of a 2MB sub-block. > Ah, so you also need a THP in the hypervisor I assume. > > Out of curiosity, are newer CPUs no longer affected? > > When qemu pins down the VM's memory, it also establishes every possible > mapping to the VM's memory in the EPT. > > To spawn new EPTEs, we exploit KVM's fix to the iTLB multihit bug. > Basically, we execute a bunch of no-op functions, and KVM will have to > split hugepages into 4KB pages. This process creates a large number of > EPTEs. > > The iTLB multihit bug roughly speaking is only present on non-Atom Intel > CPUs manufactured before 2020. Interesting, thanks! 
>> So it won't be sufficient to have a single sub-block plugged and then
>> trigger VIRTIO_MEM_REQ_UNPLUG_ALL?
>
> Could work in theory, but if the newly plugged sub-block does not
> contain vulnerable pages, there is no promise that the attacker would
> get a sub-block containing a different set of pages next time.

Right.
On 26.11.24 16:41, Wei Chen wrote:
> Thanks for the information! I will try to come up with V2 that does not
> impact virtio-mem's functionality.

Thanks. In case we want to go this path in this patch, we'd have to glue
the new behavior to a new feature flag, and implement support for that
in Linux (+Windows) drivers.

So if we can find a way to avoid that, it would be beneficial.
On Tue, Nov 26, 2024 at 11:52 PM David Hildenbrand <david@redhat.com> wrote:
> On 26.11.24 16:31, Wei Chen wrote:
>>> How can you be sure (IOW trigger) that the system will store
>>> "important data" like EPTs?
>>
>> We cannot, but we have designed the attack (see below) to improve the
>> possibility.
>>
>>> So is one magic bit really that for your experiments, one needs a
>>> viommu?
>>
>> Admittedly the way we accomplish a VM escape is a bit arcane.
>
> That's what I imagined :)
>
>> We require device passthrough because it pins the VM's memory down and
>> converts them to MIGRATE_UNMOVABLE.
>
> Interesting, that's news to me. Can you share where GUP in the kernel
> would do that?

In /drivers/vfio/vfio_iommu_type1.c, there is a function called
vfio_iommu_type1_pin_pages where the VM's memory is pinned down.

>> Hotplugged memory will also be
>> converted to MIGRATE_UNMOVABLE.
>
> But that's in the VM? Because we don't hotplug memory in the hypervisor.

Yes, the virtio-mem driver in the VM is modified to actively release
memory vulnerable to Rowhammer.

For more details, would you be interested in reading our paper? It was
recently submitted to ASPLOS for publication and we are happy to share
it with you.

Regards,
Zhi Zhang
On 27.11.24 03:00, zhi zhang wrote:
> On Tue, Nov 26, 2024 at 11:52 PM David Hildenbrand <david@redhat.com> wrote:
>> On 26.11.24 16:31, Wei Chen wrote:
>>>> How can you be sure (IOW trigger) that the system will store
>>>> "important data" like EPTs?
>>>
>>> We cannot, but we have designed the attack (see below) to improve the
>>> possibility.
>>>
>>>> So is one magic bit really that for your experiments, one needs a
>>>> viommu?
>>>
>>> Admittedly the way we accomplish a VM escape is a bit arcane.
>>
>> That's what I imagined :)
>>
>>> We require device passthrough because it pins the VM's memory down and
>>> converts them to MIGRATE_UNMOVABLE.
>>
>> Interesting, that's news to me. Can you share where GUP in the kernel
>> would do that?
>
> In /drivers/vfio/vfio_iommu_type1.c, there is a function called
> vfio_iommu_type1_pin_pages where the VM's memory is pinned down.

That doesn't explain the full story about MIGRATE_UNMOVABLE. I assume
one precondition is missing in your explanation.

VFIO will call pin_user_pages_remote(FOLL_LONGTERM). Two cases:

a) Memory is already allocated (which would mostly be MIGRATE_MOVABLE,
because it's ordinary user memory). We'll simply longterm pin the memory
without changing the migratetype.

b) Memory is not allocated yet. We'll call
faultin_page()->handle_mm_fault(). There is no FOLL_LONGTERM
special-casing, so you'll mostly get MIGRATE_MOVABLE.

Now, there is one corner case: we disallow longterm pinning on
ZONE_MOVABLE and MIGRATE_CMA. In case our user space allocation ended up
on there, check_and_migrate_movable_pages() would detect that the memory
resides on ZONE_MOVABLE or MIGRATE_CMA, and allocate a destination page
in migrate_longterm_unpinnable_folios() using "GFP_USER | __GFP_NOWARN".

So I assume one precondition is that your hypervisor has at least some
ZONE_MOVABLE or CMA memory? Otherwise I don't see how you would reliably
get MIGRATE_UNMOVABLE.
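[Editorial note: the two cases David walks through can be condensed into a toy decision function. This is a simplified model of the reasoning above, not kernel code; the enum and function names are invented for illustration.]

```c
/* Toy model of the longterm-pin precondition discussed above. */
enum zone_kind { ZONE_NORMAL_K, ZONE_MOVABLE_K };
enum mt_kind   { MT_MOVABLE, MT_UNMOVABLE, MT_CMA };

/*
 * Migratetype a user page effectively ends up with after a
 * FOLL_LONGTERM pin, per the explanation above: pages sitting on
 * ZONE_MOVABLE or MIGRATE_CMA cannot be longterm-pinned in place, so
 * they are migrated to a freshly allocated destination (typically
 * MIGRATE_UNMOVABLE); everything else is pinned where it is.
 */
static enum mt_kind after_longterm_pin(enum zone_kind zone, enum mt_kind mt)
{
    if (zone == ZONE_MOVABLE_K || mt == MT_CMA) {
        return MT_UNMOVABLE;   /* migrated before pinning */
    }
    return mt;                 /* pinned in place, migratetype unchanged */
}
```

Under this model, the attack's precondition is visible directly: only an allocation that landed on ZONE_MOVABLE or CMA reliably comes out of the pin as MIGRATE_UNMOVABLE, which matches David's question about the hypervisor configuration.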
>>> Hotplugged memory will also be
>>> converted to MIGRATE_UNMOVABLE.
>>
>> But that's in the VM? Because we don't hotplug memory in the hypervisor.
>
> Yes, the virtio-mem driver in the VM is modified to actively release
> memory vulnerable to Rowhammer.

I think I now understand that statement: Memory to-be-hotplugged to the
VM will be migrated to MIGRATE_UNMOVABLE during longterm pinning, if it
resides on ZONE_MOVABLE or MIGRATE_CMA.

> For more details, would you be interested in reading our paper? It was
> recently submitted to ASPLOS for publication and we are happy to share
> it with you.

Yes, absolutely! Please send a private mail :)
On 26.11.24 16:41, Wei Chen wrote:
> Thanks for the information! I will try to come up with V2 that does not
> impact virtio-mem's functionality.

So, thinking about this ... both UNPLUG_ALL and "over-UNPLUG" (exceeding
the request) currently happen very rarely in a sane environment. In many
setups, never at all.

We could likely limit them to "once every 60s" without causing real
harm.

Would that be sufficient to mitigate the problem? How often would you
usually have to retry in order to make it fly?
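[Editorial note: the rate-limiting idea floated here could look roughly like the following sketch. The struct, field, and constant names are hypothetical, made up for illustration, and are not part of QEMU's virtio-mem implementation; the intended behavior is only what the message above describes: permit an over-unplug at most once per interval and answer VIRTIO_MEM_RESP_BUSY otherwise, so the driver can retry.]

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: allow one "over-unplug" / UNPLUG_ALL per 60 seconds. */
#define OVER_UNPLUG_RATELIMIT_NS (60LL * 1000 * 1000 * 1000)

typedef struct {
    /* Monotonic timestamp of the last over-unplug we permitted. */
    int64_t last_over_unplug_ns;
} OverUnplugLimiter;

/*
 * Returns true if an unplug exceeding requested_size may proceed now,
 * consuming the budget; false means the device should respond
 * VIRTIO_MEM_RESP_BUSY and let the driver retry later.
 */
static bool over_unplug_allowed(OverUnplugLimiter *l, int64_t now_ns)
{
    if (now_ns - l->last_over_unplug_ns < OVER_UNPLUG_RATELIMIT_NS) {
        return false;  /* too soon: signal BUSY instead of NACK */
    }
    l->last_over_unplug_ns = now_ns;
    return true;
}
```

Answering BUSY rather than NACK matters because, as noted earlier in the thread, the Linux driver does not expect a NACK on unplug requests; a time-based limit like this slows a malicious guest's probe loop from many sub-blocks per second to one per minute while leaving the legitimate error-recovery path (e.g. virtio_mem_cleanup_pending_mb()) merely delayed, not broken.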
diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index 80ada89551..4ef67082a2 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -671,6 +671,10 @@ static int virtio_mem_state_change_request(VirtIOMEM *vmem, uint64_t gpa,
         return VIRTIO_MEM_RESP_NACK;
     }
 
+    if (!plug && vmem->size <= vmem->requested_size) {
+        return VIRTIO_MEM_RESP_NACK;
+    }
+
     /* test if really all blocks are in the opposite state */
     if ((plug && !virtio_mem_is_range_unplugged(vmem, gpa, size)) ||
         (!plug && !virtio_mem_is_range_plugged(vmem, gpa, size))) {