diff mbox series

hw/virtio/virtio-mem: Prohibit unplugging when size <= requested_size

Message ID 20241126080213.248-1-weichenforschung@gmail.com (mailing list archive)
State New
Series hw/virtio/virtio-mem: Prohibit unplugging when size <= requested_size

Commit Message

Wei Chen Nov. 26, 2024, 8:02 a.m. UTC
A malicious guest can exploit virtio-mem to release memory back to the
hypervisor and attempt Rowhammer attacks. The only reasonable case for
unplugging is when size > requested_size.

Signed-off-by: Wei Chen <weichenforschung@gmail.com>
Signed-off-by: Zhi Zhang <zzhangphd@gmail.com>
---
 hw/virtio/virtio-mem.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Wei Chen Nov. 26, 2024, 2:20 p.m. UTC | #1
> Please provide more information how this is supposed to work

We initially discovered that virtio-mem could be used by a malicious
agent to trigger the Rowhammer vulnerability and further achieve a VM
escape.

Simply speaking, Rowhammer is a DRAM vulnerability where frequent access
to a memory location might cause voltage leakage to adjacent locations,
effectively flipping bits in these locations. In other words, with
Rowhammer, an adversary can modify the data stored in the memory.

For a complete attack, an adversary needs to: a) determine which parts
of the memory are prone to bit flips, b) trick the system to store
important data on those parts of memory and c) trigger bit flips to
tamper with that important data.

Now, for an attacker who only has access to their VM but not to the
hypervisor, one important challenge among the three is b), i.e., to give
back the memory they determine as vulnerable to the hypervisor. This is
where the pitfall for virtio-mem lies: the attacker can modify the
virtio-mem driver in the VM's kernel and unplug memory proactively.

The current implementation of virtio-mem in QEMU does not check whether
it is valid for the VM to unplug memory. Therefore, as our experiments
demonstrate, this method works in practice.

 > whether this is a purely theoretical case, and how relevant this is in
 > practice.

In our design, on a host machine equipped with certain Intel processors
and inside a VM that a) has a passed-through PCI device, b) has a vIOMMU
and c) has a virtio-mem device, an attacker can force the EPT to use
pages that are prone to Rowhammer bit flips and thus modify the EPT to
gain read and write privileges to an arbitrary memory location.

Our efforts involved conducting end-to-end attacks on two separate
machines with the Core i3-10100 and the Xeon E2124 processors
respectively, and achieved successful VM escapes.

 > Further, what about virtio-balloon, which does not even support
 > rejecting requests?

virtio-balloon does not work with device passthrough currently, so we
have yet to produce a feasible attack with it.

 > I recall that that behavior was desired once the driver would support
 > de-fragmenting unplugged memory blocks.

By "that behavior" do you mean unplugging memory when size <=
requested_size? I am not sure how that would be implemented.

 > Note that VIRTIO_MEM_REQ_UNPLUG_ALL would still always be allowed

That is true, but the attacker will want the capability to release a
specific sub-block.

In fact, a sub-block is still somewhat coarse, because most likely there
is only one page in a sub-block that contains potential bit flips. When
the attacker spawns EPTEs, they have to spawn enough to make sure the
target page is used to store the EPTEs.

A 2MB sub-block can store 2MB/4KB*512=262,144 EPTEs, equating to at
least 1GB of memory. In other words, the attack program exhausts 1GB of
memory just for the possibility that KVM uses the target page to store
EPTEs.
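The arithmetic above can be sanity-checked with a small standalone computation (a sketch assuming 4 KiB page-table pages and 8-byte EPT entries, i.e. 512 entries per page; the helper names here are made up for illustration):

```c
#include <stdint.h>

/* Number of last-level EPT entries that fit into a sub-block of the
 * given size, assuming 4 KiB page-table pages holding 512 8-byte EPTEs. */
static uint64_t eptes_per_subblock(uint64_t subblock_bytes)
{
    return (subblock_bytes / 4096) * 512;
}

/* Guest memory that must be mapped so that every EPTE slot in the
 * sub-block is used, given that each last-level EPTE maps one 4 KiB page. */
static uint64_t memory_to_fill_subblock(uint64_t subblock_bytes)
{
    return eptes_per_subblock(subblock_bytes) * 4096;
}
```

For a 2 MiB sub-block this yields 262,144 EPTEs and 1 GiB of mapped guest memory, matching the figures above.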


Best regards,
Wei Chen

On 2024/11/26 20:29, David Hildenbrand wrote:
> On 26.11.24 09:02, Wei Chen wrote:
>> A malicious guest can exploit virtio-mem to release memory back to the
>> hypervisor and attempt Rowhammer attacks.
>
> Please provide more information how this is supposed to work, whether 
> this is a purely theoretical case, and how relevant this is in practice.
>
> Because I am not sure how relevant and accurate this statement is, and 
> if any action is needed at all.
>
> Further, what about virtio-balloon, which does not even support 
> rejecting requests?
>
> The only case reasonable for
>> unplugging is when the size > requested_size.
>
> I recall that that behavior was desired once the driver would support 
> de-fragmenting unplugged memory blocks. I don't think drivers do that 
> today (would have to double-check the Windows one). The spec does not 
> document what is to happen in that case.
>
> Note that VIRTIO_MEM_REQ_UNPLUG_ALL would still always be allowed, so 
> this change would not cover all cases. VIRTIO_MEM_REQ_UNPLUG_ALL could 
> be ratelimited -- if there is a real issue here.
>
>
>>
>> Signed-off-by: Wei Chen <weichenforschung@gmail.com>
>> Signed-off-by: Zhi Zhang <zzhangphd@gmail.com>
>> ---
>>   hw/virtio/virtio-mem.c | 4 ++++
>>   1 file changed, 4 insertions(+)
>>
>> diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
>> index 80ada89551..4ef67082a2 100644
>> --- a/hw/virtio/virtio-mem.c
>> +++ b/hw/virtio/virtio-mem.c
>> @@ -671,6 +671,10 @@ static int 
>> virtio_mem_state_change_request(VirtIOMEM *vmem, uint64_t gpa,
>>           return VIRTIO_MEM_RESP_NACK;
>>       }
>>   +    if (!plug && vmem->size <= vmem->requested_size) {
>> +        return VIRTIO_MEM_RESP_NACK;
>> +    }
>> +
>>       /* test if really all blocks are in the opposite state */
>>       if ((plug && !virtio_mem_is_range_unplugged(vmem, gpa, size)) ||
>>           (!plug && !virtio_mem_is_range_plugged(vmem, gpa, size))) {
>
>
David Hildenbrand Nov. 26, 2024, 2:46 p.m. UTC | #2
On 26.11.24 15:20, Wei Chen wrote:
>   > Please provide more information how this is supposed to work
> 

Thanks for the information. A lot of what you wrote belongs in the 
patch description, especially that this might currently only be 
relevant with device passthrough + vIOMMU.

> We initially discovered that virtio-mem could be used by a malicious
> agent to trigger the Rowhammer vulnerability and further achieve a VM
> escape.
> 
> Simply speaking, Rowhammer is a DRAM vulnerability where frequent access
> to a memory location might cause voltage leakage to adjacent locations,
> effectively flipping bits in these locations. In other words, with
> Rowhammer, an adversary can modify the data stored in the memory.
> 
> For a complete attack, an adversary needs to: a) determine which parts
> of the memory are prone to bit flips, b) trick the system to store
> important data on those parts of memory and c) trigger bit flips to
> tamper important data.
> 
> Now, for an attacker who only has access to their VM but not to the
> hypervisor, one important challenge among the three is b), i.e., to give
> back the memory they determine as vulnerable to the hypervisor. This is
> where the pitfall for virtio-mem lies: the attacker can modify the
> virtio-mem driver in the VM's kernel and unplug memory proactively.

But b), as you write, is not only about giving back that memory to the 
hypervisor. How can you be sure (IOW trigger) that the system will store 
"important data" like EPTs?

> 
> The current impl of virtio-mem in qemu does not check if it is valid for
> the VM to unplug memory. Therefore, as is proved by our experiments,
> this method works in practice.
> 
>   > whether this is a purely theoretical case, and how relevant this is in
>   > practice.
> 
> In our design, on a host machine equipped with certain Intel processors
> and inside a VM that a) has a passed-through PCI device, b) has a vIOMMU
> and c) has a virtio-mem device, an attacker can force the EPT to use
> pages that are prone to Rowhammer bit flips and thus modify the EPT to
> gain read and write privileges to an arbitrary memory location.
> 
> Our efforts involved conducting end-to-end attacks on two separate
> machines with the Core i3-10100 and the Xeon E2124 processors
> respectively, and has achieved successful VM escapes.

Out of curiosity, are newer CPUs no longer affected?

> 
>   > Further, what about virtio-balloon, which does not even support
>   > rejecting requests?
> 
> virtio-balloon does not work with device passthrough currently, so we
> have yet to produce a feasible attack with it.

So is one magic bit really that, for your experiments, one needs a vIOMMU?

The only mention of Rowhammer + memory ballooning I found is: 
https://www.whonix.org/pipermail/whonix-devel/2016-September/000746.html

> 
>   > I recall that that behavior was desired once the driver would support
>   > de-fragmenting unplugged memory blocks.
> 
> By "that behavior" do you mean to unplug memory when size <=
> requested_size? I am not sure how that is to be implemented.

To defragment, the idea was to unplug one additional block, so we can 
plug another block.

> 
>   > Note that VIRTIO_MEM_REQ_UNPLUG_ALL would still always be allowed
> 
> That is true, but the attacker will want the capability to release a
> specific sub-block.

So it won't be sufficient to have a single sub-block plugged and then 
trigger VIRTIO_MEM_REQ_UNPLUG_ALL?

> 
> In fact, a sub-block is still somewhat coarse, because most likely there
> is only one page in a sub-block that contains potential bit flips. When
> the attacker spawns EPTEs, they have to spawn enough to make sure the
> target page is used to store the EPTEs.
> 
> A 2MB sub-block can store 2MB/4KB*512=262,144 EPTEs, equating to at
> least 1GB of memory. In other words, the attack program exhausts 1GB of
> memory just for the possibility that KVM uses the target page to store
> EPTEs.

Ah, that makes sense.

Can you compress what you wrote into the patch description? Further, I 
assume we want to add a Fixes: tag and Cc: QEMU Stable 
<qemu-stable@nongnu.org>

Thanks!
David Hildenbrand Nov. 26, 2024, 3:08 p.m. UTC | #3
On 26.11.24 15:46, David Hildenbrand wrote:
> On 26.11.24 15:20, Wei Chen wrote:
>>    > Please provide more information how this is supposed to work
>>
> 
> Thanks for the information. A lot of what you wrote belongs into the
> patch description. Especially, that this might currently only be
> relevant with device passthrough + viommu.
> 
>> We initially discovered that virtio-mem could be used by a malicious
>> agent to trigger the Rowhammer vulnerability and further achieve a VM
>> escape.
>>
>> Simply speaking, Rowhammer is a DRAM vulnerability where frequent access
>> to a memory location might cause voltage leakage to adjacent locations,
>> effectively flipping bits in these locations. In other words, with
>> Rowhammer, an adversary can modify the data stored in the memory.
>>
>> For a complete attack, an adversary needs to: a) determine which parts
>> of the memory are prone to bit flips, b) trick the system to store
>> important data on those parts of memory and c) trigger bit flips to
>> tamper important data.
>>
>> Now, for an attacker who only has access to their VM but not to the
>> hypervisor, one important challenge among the three is b), i.e., to give
>> back the memory they determine as vulnerable to the hypervisor. This is
>> where the pitfall for virtio-mem lies: the attacker can modify the
>> virtio-mem driver in the VM's kernel and unplug memory proactively.
> 
> But b), as you write, is not only about giving back that memory to the
> hypervisor. How can you be sure (IOW trigger) that the system will store
> "important data" like EPTs?
> 
>>
>> The current impl of virtio-mem in qemu does not check if it is valid for
>> the VM to unplug memory. Therefore, as is proved by our experiments,
>> this method works in practice.
>>
>>    > whether this is a purely theoretical case, and how relevant this is in
>>    > practice.
>>
>> In our design, on a host machine equipped with certain Intel processors
>> and inside a VM that a) has a passed-through PCI device, b) has a vIOMMU
>> and c) has a virtio-mem device, an attacker can force the EPT to use
>> pages that are prone to Rowhammer bit flips and thus modify the EPT to
>> gain read and write privileges to an arbitrary memory location.
>>
>> Our efforts involved conducting end-to-end attacks on two separate
>> machines with the Core i3-10100 and the Xeon E2124 processors
>> respectively, and has achieved successful VM escapes.
> 
> Out of curiosity, are newer CPUs no longer affected?
> 
>>
>>    > Further, what about virtio-balloon, which does not even support
>>    > rejecting requests?
>>
>> virtio-balloon does not work with device passthrough currently, so we
>> have yet to produce a feasible attack with it.
> 
> So is one magic bit really that for your experiments, one needs a viommu?
> 
> The only mentioning of rohammer+memory ballooning I found is:
> https://www.whonix.org/pipermail/whonix-devel/2016-September/000746.html
> 
>>
>>    > I recall that that behavior was desired once the driver would support
>>    > de-fragmenting unplugged memory blocks.
>>
>> By "that behavior" do you mean to unplug memory when size <=
>> requested_size? I am not sure how that is to be implemented.
> 
> To defragment, the idea was to unplug one additional block, so we can
> plug another block.
> 
>>
>>    > Note that VIRTIO_MEM_REQ_UNPLUG_ALL would still always be allowed
>>
>> That is true, but the attacker will want the capability to release a
>> specific sub-block.
> 
> So it won't be sufficient to have a single sub-block plugged and then
> trigger VIRTIO_MEM_REQ_UNPLUG_ALL?
> 
>>
>> In fact, a sub-block is still somewhat coarse, because most likely there
>> is only one page in a sub-block that contains potential bit flips. When
>> the attacker spawns EPTEs, they have to spawn enough to make sure the
>> target page is used to store the EPTEs.
>>
>> A 2MB sub-block can store 2MB/4KB*512=262,144 EPTEs, equating to at
>> least 1GB of memory. In other words, the attack program exhausts 1GB of
>> memory just for the possibility that KVM uses the target page to store
>> EPTEs.
> 
> Ah, that makes sense.
> 
> Can you compress what you wrote into the patch description? Further, I
> assume we want to add a Fixes: tag and Cc: QEMU Stable
> <qemu-stable@nongnu.org>

I just recalled another scenario where we unplug memory: see 
virtio_mem_cleanup_pending_mb() in the Linux driver as one example.

We first plug memory, then add the memory to Linux. If that adding 
fails, we unplug the memory again.

So this change can render the virtio_mem driver in Linux non-functional, 
unfortunately.
David Hildenbrand Nov. 26, 2024, 3:14 p.m. UTC | #4
On 26.11.24 16:08, David Hildenbrand wrote:
> On 26.11.24 15:46, David Hildenbrand wrote:
>> On 26.11.24 15:20, Wei Chen wrote:
>>>     > Please provide more information how this is supposed to work
>>>
>>
>> Thanks for the information. A lot of what you wrote belongs into the
>> patch description. Especially, that this might currently only be
>> relevant with device passthrough + viommu.
>>
>>> We initially discovered that virtio-mem could be used by a malicious
>>> agent to trigger the Rowhammer vulnerability and further achieve a VM
>>> escape.
>>>
>>> Simply speaking, Rowhammer is a DRAM vulnerability where frequent access
>>> to a memory location might cause voltage leakage to adjacent locations,
>>> effectively flipping bits in these locations. In other words, with
>>> Rowhammer, an adversary can modify the data stored in the memory.
>>>
>>> For a complete attack, an adversary needs to: a) determine which parts
>>> of the memory are prone to bit flips, b) trick the system to store
>>> important data on those parts of memory and c) trigger bit flips to
>>> tamper important data.
>>>
>>> Now, for an attacker who only has access to their VM but not to the
>>> hypervisor, one important challenge among the three is b), i.e., to give
>>> back the memory they determine as vulnerable to the hypervisor. This is
>>> where the pitfall for virtio-mem lies: the attacker can modify the
>>> virtio-mem driver in the VM's kernel and unplug memory proactively.
>>
>> But b), as you write, is not only about giving back that memory to the
>> hypervisor. How can you be sure (IOW trigger) that the system will store
>> "important data" like EPTs?
>>
>>>
>>> The current impl of virtio-mem in qemu does not check if it is valid for
>>> the VM to unplug memory. Therefore, as is proved by our experiments,
>>> this method works in practice.
>>>
>>>     > whether this is a purely theoretical case, and how relevant this is in
>>>     > practice.
>>>
>>> In our design, on a host machine equipped with certain Intel processors
>>> and inside a VM that a) has a passed-through PCI device, b) has a vIOMMU
>>> and c) has a virtio-mem device, an attacker can force the EPT to use
>>> pages that are prone to Rowhammer bit flips and thus modify the EPT to
>>> gain read and write privileges to an arbitrary memory location.
>>>
>>> Our efforts involved conducting end-to-end attacks on two separate
>>> machines with the Core i3-10100 and the Xeon E2124 processors
>>> respectively, and has achieved successful VM escapes.
>>
>> Out of curiosity, are newer CPUs no longer affected?
>>
>>>
>>>     > Further, what about virtio-balloon, which does not even support
>>>     > rejecting requests?
>>>
>>> virtio-balloon does not work with device passthrough currently, so we
>>> have yet to produce a feasible attack with it.
>>
>> So is one magic bit really that for your experiments, one needs a viommu?
>>
>> The only mentioning of rohammer+memory ballooning I found is:
>> https://www.whonix.org/pipermail/whonix-devel/2016-September/000746.html
>>
>>>
>>>     > I recall that that behavior was desired once the driver would support
>>>     > de-fragmenting unplugged memory blocks.
>>>
>>> By "that behavior" do you mean to unplug memory when size <=
>>> requested_size? I am not sure how that is to be implemented.
>>
>> To defragment, the idea was to unplug one additional block, so we can
>> plug another block.
>>
>>>
>>>     > Note that VIRTIO_MEM_REQ_UNPLUG_ALL would still always be allowed
>>>
>>> That is true, but the attacker will want the capability to release a
>>> specific sub-block.
>>
>> So it won't be sufficient to have a single sub-block plugged and then
>> trigger VIRTIO_MEM_REQ_UNPLUG_ALL?
>>
>>>
>>> In fact, a sub-block is still somewhat coarse, because most likely there
>>> is only one page in a sub-block that contains potential bit flips. When
>>> the attacker spawns EPTEs, they have to spawn enough to make sure the
>>> target page is used to store the EPTEs.
>>>
>>> A 2MB sub-block can store 2MB/4KB*512=262,144 EPTEs, equating to at
>>> least 1GB of memory. In other words, the attack program exhausts 1GB of
>>> memory just for the possibility that KVM uses the target page to store
>>> EPTEs.
>>
>> Ah, that makes sense.
>>
>> Can you compress what you wrote into the patch description? Further, I
>> assume we want to add a Fixes: tag and Cc: QEMU Stable
>> <qemu-stable@nongnu.org>
> 
> I just recalled another scenario where we unplug memory: see
> virtio_mem_cleanup_pending_mb() in the Linux driver as one example.
> 
> We first plug memory, then add the memory to Linux. If that adding
> fails, we unplug the memory again.
> 
> So this change can turn the virtio_mem driver in Linux non-functional,
> unfortunately.

Further, the Linux driver does not expect a NACK on unplug requests, see 
virtio_mem_send_unplug_request().

So this change won't work.

We could return VIRTIO_MEM_RESP_BUSY, but to handle what I raised above, 
we would still have to make it work every now and then (ratelimit), to 
not break the driver.

The alternative is to delay freeing of the memory in case we run into 
this condition. Hm ...
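For illustration, the BUSY + ratelimit idea could be sketched roughly as follows. This is not actual QEMU code: the struct, the field name `last_excess_unplug_ns`, the helper name, and the one-second window are all invented for the sketch; only the VIRTIO_MEM_RESP_* values follow the virtio-mem spec.

```c
#include <stdint.h>

/* Response codes as defined by the virtio-mem spec. */
enum { VIRTIO_MEM_RESP_ACK = 0, VIRTIO_MEM_RESP_NACK = 1, VIRTIO_MEM_RESP_BUSY = 2 };

/* Minimal stand-in for the device state; "last_excess_unplug_ns" is a
 * hypothetical field, not something that exists in QEMU's VirtIOMEM. */
struct vmem {
    uint64_t size;
    uint64_t requested_size;
    int64_t last_excess_unplug_ns;
};

/* Decide how to answer an unplug request at time "now_ns": unplugs that
 * shrink toward requested_size are always allowed; "excess" unplugs
 * (size <= requested_size) are allowed at most once per window, and get
 * VIRTIO_MEM_RESP_BUSY otherwise, so drivers that legitimately unplug
 * during cleanup (e.g. virtio_mem_cleanup_pending_mb()) can retry. */
static int unplug_response(struct vmem *vmem, int64_t now_ns)
{
    const int64_t window_ns = 1000000000LL; /* hypothetical 1 s ratelimit */

    if (vmem->size > vmem->requested_size) {
        return VIRTIO_MEM_RESP_ACK;
    }
    if (now_ns - vmem->last_excess_unplug_ns >= window_ns) {
        vmem->last_excess_unplug_ns = now_ns;
        return VIRTIO_MEM_RESP_ACK;
    }
    return VIRTIO_MEM_RESP_BUSY;
}
```

The point of BUSY over NACK is that the Linux driver already treats BUSY as "try again later", whereas a NACK on unplug is unexpected, as noted above.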
Wei Chen Nov. 26, 2024, 3:31 p.m. UTC | #5
> How can you be sure (IOW trigger) that the system will store
 > "important data" like EPTs?

We cannot, but we have designed the attack (see below) to improve the
possibility.

 > So is one magic bit really that for your experiments, one needs a
 > viommu?

Admittedly the way we accomplish a VM escape is a bit arcane.

We require device passthrough because it pins the VM's memory down and
converts its pages to MIGRATE_UNMOVABLE. Hotplugged memory will also be
converted to MIGRATE_UNMOVABLE. That way, when we give memory back to the
hypervisor, the pages stay UNMOVABLE. Otherwise we would have to convert
the pages to UNMOVABLE or exhaust ALL MIGRATE_MOVABLE pages, neither of
which can be easily accomplished.

Then we require vIOMMU because vIOMMU mappings, much like EPTEs, use
MIGRATE_UNMOVABLE pages as well. By spawning lots of meaningless vIOMMU
entries, we exhaust UNMOVABLE page blocks of lower orders (<9). Next
time KVM tries to allocate pages to store EPTEs, the kernel has to split
an order-9 page block, which is exactly the size of a 2MB sub-block.
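As a quick sanity check on the order-9 claim (a sketch assuming the usual 4 KiB base page size of the buddy allocator on x86-64):

```c
#include <stdint.h>

/* Size in bytes of a buddy-allocator page block of the given order,
 * assuming 4 KiB base pages. An order-9 block is 512 pages, i.e. 2 MiB,
 * which matches the virtio-mem sub-block size discussed above. */
static uint64_t buddy_block_bytes(unsigned int order)
{
    return (1ULL << order) * 4096;
}
```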

 > Out of curiosity, are newer CPUs no longer affected?

When QEMU pins down the VM's memory, it also establishes every possible
mapping to the VM's memory in the EPT.

To spawn new EPTEs, we exploit KVM's fix to the iTLB multihit bug.
Basically, we execute a bunch of no-op functions, and KVM will have to
split hugepages into 4KB pages. This process creates a large number of
EPTEs.

The iTLB multihit bug, roughly speaking, is only present on non-Atom Intel
CPUs manufactured before 2020.

 > So it won't be sufficient to have a single sub-block plugged and then
 > trigger VIRTIO_MEM_REQ_UNPLUG_ALL?

Could work in theory, but if the newly plugged sub-block does not
contain vulnerable pages, there is no guarantee that the attacker would
get a sub-block containing a different set of pages next time.

It also depends heavily on the configuration of the virtio-mem device.
If there is not much non-virtio-mem memory for the VM, the attacker
could easily run out of memory.


Best regards,
Wei Chen

On 2024/11/26 22:46, David Hildenbrand wrote:
> On 26.11.24 15:20, Wei Chen wrote:
>>   > Please provide more information how this is supposed to work
>>
>
> Thanks for the information. A lot of what you wrote belongs into the 
> patch description. Especially, that this might currently only be 
> relevant with device passthrough + viommu.
>
>> We initially discovered that virtio-mem could be used by a malicious
>> agent to trigger the Rowhammer vulnerability and further achieve a VM
>> escape.
>>
>> Simply speaking, Rowhammer is a DRAM vulnerability where frequent access
>> to a memory location might cause voltage leakage to adjacent locations,
>> effectively flipping bits in these locations. In other words, with
>> Rowhammer, an adversary can modify the data stored in the memory.
>>
>> For a complete attack, an adversary needs to: a) determine which parts
>> of the memory are prone to bit flips, b) trick the system to store
>> important data on those parts of memory and c) trigger bit flips to
>> tamper important data.
>>
>> Now, for an attacker who only has access to their VM but not to the
>> hypervisor, one important challenge among the three is b), i.e., to give
>> back the memory they determine as vulnerable to the hypervisor. This is
>> where the pitfall for virtio-mem lies: the attacker can modify the
>> virtio-mem driver in the VM's kernel and unplug memory proactively.
>
> But b), as you write, is not only about giving back that memory to the 
> hypervisor. How can you be sure (IOW trigger) that the system will 
> store "important data" like EPTs?
>
>>
>> The current impl of virtio-mem in qemu does not check if it is valid for
>> the VM to unplug memory. Therefore, as is proved by our experiments,
>> this method works in practice.
>>
>>   > whether this is a purely theoretical case, and how relevant this 
>> is in
>>   > practice.
>>
>> In our design, on a host machine equipped with certain Intel processors
>> and inside a VM that a) has a passed-through PCI device, b) has a vIOMMU
>> and c) has a virtio-mem device, an attacker can force the EPT to use
>> pages that are prone to Rowhammer bit flips and thus modify the EPT to
>> gain read and write privileges to an arbitrary memory location.
>>
>> Our efforts involved conducting end-to-end attacks on two separate
>> machines with the Core i3-10100 and the Xeon E2124 processors
>> respectively, and has achieved successful VM escapes.
>
> Out of curiosity, are newer CPUs no longer affected?
>
>>
>>   > Further, what about virtio-balloon, which does not even support
>>   > rejecting requests?
>>
>> virtio-balloon does not work with device passthrough currently, so we
>> have yet to produce a feasible attack with it.
>
> So is one magic bit really that for your experiments, one needs a viommu?
>
> The only mentioning of rohammer+memory ballooning I found is: 
> https://www.whonix.org/pipermail/whonix-devel/2016-September/000746.html
>
>>
>>   > I recall that that behavior was desired once the driver would 
>> support
>>   > de-fragmenting unplugged memory blocks.
>>
>> By "that behavior" do you mean to unplug memory when size <=
>> requested_size? I am not sure how that is to be implemented.
>
> To defragment, the idea was to unplug one additional block, so we can 
> plug another block.
>
>>
>>   > Note that VIRTIO_MEM_REQ_UNPLUG_ALL would still always be allowed
>>
>> That is true, but the attacker will want the capability to release a
>> specific sub-block.
>
> So it won't be sufficient to have a single sub-block plugged and then 
> trigger VIRTIO_MEM_REQ_UNPLUG_ALL?
>
>>
>> In fact, a sub-block is still somewhat coarse, because most likely there
>> is only one page in a sub-block that contains potential bit flips. When
>> the attacker spawns EPTEs, they have to spawn enough to make sure the
>> target page is used to store the EPTEs.
>>
>> A 2MB sub-block can store 2MB/4KB*512=262,144 EPTEs, equating to at
>> least 1GB of memory. In other words, the attack program exhausts 1GB of
>> memory just for the possibility that KVM uses the target page to store
>> EPTEs.
>
> Ah, that makes sense.
>
> Can you compress what you wrote into the patch description? Further, I 
> assume we want to add a Fixes: tag and Cc: QEMU Stable 
> <qemu-stable@nongnu.org>
>
> Thanks!
>
Wei Chen Nov. 26, 2024, 3:41 p.m. UTC | #6
Thanks for the information! I will try to come up with V2 that does not
impact virtio-mem's functionality.


Best regards,
Wei Chen

On 2024/11/26 23:14, David Hildenbrand wrote:
> On 26.11.24 16:08, David Hildenbrand wrote:
>> On 26.11.24 15:46, David Hildenbrand wrote:
>>> On 26.11.24 15:20, Wei Chen wrote:
>>>>     > Please provide more information how this is supposed to work
>>>>
>>>
>>> Thanks for the information. A lot of what you wrote belongs into the
>>> patch description. Especially, that this might currently only be
>>> relevant with device passthrough + viommu.
>>>
>>>> We initially discovered that virtio-mem could be used by a malicious
>>>> agent to trigger the Rowhammer vulnerability and further achieve a VM
>>>> escape.
>>>>
>>>> Simply speaking, Rowhammer is a DRAM vulnerability where frequent 
>>>> access
>>>> to a memory location might cause voltage leakage to adjacent 
>>>> locations,
>>>> effectively flipping bits in these locations. In other words, with
>>>> Rowhammer, an adversary can modify the data stored in the memory.
>>>>
>>>> For a complete attack, an adversary needs to: a) determine which parts
>>>> of the memory are prone to bit flips, b) trick the system to store
>>>> important data on those parts of memory and c) trigger bit flips to
>>>> tamper important data.
>>>>
>>>> Now, for an attacker who only has access to their VM but not to the
>>>> hypervisor, one important challenge among the three is b), i.e., to 
>>>> give
>>>> back the memory they determine as vulnerable to the hypervisor. 
>>>> This is
>>>> where the pitfall for virtio-mem lies: the attacker can modify the
>>>> virtio-mem driver in the VM's kernel and unplug memory proactively.
>>>
>>> But b), as you write, is not only about giving back that memory to the
>>> hypervisor. How can you be sure (IOW trigger) that the system will 
>>> store
>>> "important data" like EPTs?
>>>
>>>>
>>>> The current impl of virtio-mem in qemu does not check if it is 
>>>> valid for
>>>> the VM to unplug memory. Therefore, as is proved by our experiments,
>>>> this method works in practice.
>>>>
>>>>     > whether this is a purely theoretical case, and how relevant 
>>>> this is in
>>>>     > practice.
>>>>
>>>> In our design, on a host machine equipped with certain Intel 
>>>> processors
>>>> and inside a VM that a) has a passed-through PCI device, b) has a 
>>>> vIOMMU
>>>> and c) has a virtio-mem device, an attacker can force the EPT to use
>>>> pages that are prone to Rowhammer bit flips and thus modify the EPT to
>>>> gain read and write privileges to an arbitrary memory location.
>>>>
>>>> Our efforts involved conducting end-to-end attacks on two separate
>>>> machines with the Core i3-10100 and the Xeon E2124 processors
>>>> respectively, and has achieved successful VM escapes.
>>>
>>> Out of curiosity, are newer CPUs no longer affected?
>>>
>>>>
>>>>     > Further, what about virtio-balloon, which does not even support
>>>>     > rejecting requests?
>>>>
>>>> virtio-balloon does not work with device passthrough currently, so we
>>>> have yet to produce a feasible attack with it.
>>>
>>> So is one magic bit really that for your experiments, one needs a 
>>> viommu?
>>>
>>> The only mentioning of rohammer+memory ballooning I found is:
>>> https://www.whonix.org/pipermail/whonix-devel/2016-September/000746.html 
>>>
>>>
>>>>
>>>>     > I recall that that behavior was desired once the driver would 
>>>> support
>>>>     > de-fragmenting unplugged memory blocks.
>>>>
>>>> By "that behavior" do you mean to unplug memory when size <=
>>>> requested_size? I am not sure how that is to be implemented.
>>>
>>> To defragment, the idea was to unplug one additional block, so we can
>>> plug another block.
>>>
>>>>
>>>>     > Note that VIRTIO_MEM_REQ_UNPLUG_ALL would still always be 
>>>> allowed
>>>>
>>>> That is true, but the attacker will want the capability to release a
>>>> specific sub-block.
>>>
>>> So it won't be sufficient to have a single sub-block plugged and then
>>> trigger VIRTIO_MEM_REQ_UNPLUG_ALL?
>>>
>>>>
>>>> In fact, a sub-block is still somewhat coarse, because most likely 
>>>> there
>>>> is only one page in a sub-block that contains potential bit flips. 
>>>> When
>>>> the attacker spawns EPTEs, they have to spawn enough to make sure the
>>>> target page is used to store the EPTEs.
>>>>
>>>> A 2MB sub-block can store 2MB/4KB*512=262,144 EPTEs, equating to at
>>>> least 1GB of memory. In other words, the attack program exhausts 
>>>> 1GB of
>>>> memory just for the possibility that KVM uses the target page to store
>>>> EPTEs.
>>>
>>> Ah, that makes sense.
>>>
>>> Can you compress what you wrote into the patch description? Further, I
>>> assume we want to add a Fixes: tag and Cc: QEMU Stable
>>> <qemu-stable@nongnu.org>
>>
>> I just recalled another scenario where we unplug memory: see
>> virtio_mem_cleanup_pending_mb() in the Linux driver as one example.
>>
>> We first plug memory, then add the memory to Linux. If that adding
>> fails, we unplug the memory again.
>>
>> So this change can turn the virtio_mem driver in Linux non-functional,
>> unfortunately.
>
> Further, the Linux driver does not expect a NACK on unplug requests, 
> see virtio_mem_send_unplug_request().
>
> So this change won't work.
>
> We could return VIRTIO_MEM_RESP_BUSY, but to handle what I raised 
> above, we would still have to make it work every now and then 
> (ratelimit), to not break the driver.
>
> The alternative is to delay freeing of the memory in case we run into 
> this condition. Hm ...
>
David Hildenbrand Nov. 26, 2024, 3:51 p.m. UTC | #7
On 26.11.24 16:31, Wei Chen wrote:
>   > How can you be sure (IOW trigger) that the system will store
>   > "important data" like EPTs?
> 
> We cannot, but we have designed the attack (see below) to improve the
> possibility.
> 
>   > So is one magic bit really that for your experiments, one needs a
>   > viommu?
> 
> Admittedly the way we accomplish a VM escape is a bit arcane.

That's what I imagined :)

> 
> We require device passthrough because it pins the VM's memory down and
> converts them to MIGRATE_UNMOVABLE. 

Interesting, that's news to me. Can you share where GUP in the kernel 
would do that?

> Hotplugged memory will also be
> converted to MIGRATE_UNMOVABLE. 

But that's in the VM? Because we don't hotplug memory in the hypervisor.

> That way when we give memory back to the
> hypervisor, they stay UNMOVABLE. Otherwise we will have to convert the
> pages to UNMOVABLE or exhaust ALL MIGRATE_MOVALE pages, both of which
> cannot be easily accomplished.
> 
> Then we require vIOMMU because vIOMMU mappings, much like EPTEs, use
> MIGRATE_UNMOVABLE pages as well. By spawning lots of meaningless vIOMMU
> entries, we exhaust UNMOVABLE page blocks of lower orders (<9). Next
> time KVM tries to allocate pages to store EPTEs, the kernel has to split
> an order-9 page block, which is exactly the size of a 2MB sub-block.
> 

Ah, so you also need a THP in the hypervisor I assume.

>   > Out of curiosity, are newer CPUs no longer affected?
> 
> When qemu pins down the VM's memory, it also establishes every possible
> mapping to the VM's memory in the EPT.
> 
> To spawn new EPTEs, we exploit KVM's fix to the iTLB multihit bug.
> Basically, we execute a bunch of no-op functions, and KVM will have to
> split hugepages into 4KB pages. This process creates a large number of
> EPTEs.
> 
> The iTLB multihit bug roughly speaking is only present on non-Atom Intel
> CPUs manufactured before 2020.

Interesting, thanks!

> 
>   > So it won't be sufficient to have a single sub-block plugged and then
>   > trigger VIRTIO_MEM_REQ_UNPLUG_ALL?
> 
> Could work in theory, but if the newly plugged sub-block does not
> contain vulnerable pages, there is no promise that the attacker would
> get a sub-block containing a different set of pages next time.

Right.
David Hildenbrand Nov. 26, 2024, 3:53 p.m. UTC | #8
On 26.11.24 16:41, Wei Chen wrote:
> Thanks for the information! I will try to come up with V2 that does not
> impact virtio-mem's functionality.

Thanks. In case we want to go this path in this patch, we'd have to glue 
the new behavior to a new feature flag, and implement support for that 
in Linux (+Windows) drivers.

So if we can find a way to avoid that, it would be beneficial.

Patch

diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index 80ada89551..4ef67082a2 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -671,6 +671,10 @@  static int virtio_mem_state_change_request(VirtIOMEM *vmem, uint64_t gpa,
         return VIRTIO_MEM_RESP_NACK;
     }
 
+    if (!plug && vmem->size <= vmem->requested_size) {
+        return VIRTIO_MEM_RESP_NACK;
+    }
+
     /* test if really all blocks are in the opposite state */
     if ((plug && !virtio_mem_is_range_unplugged(vmem, gpa, size)) ||
         (!plug && !virtio_mem_is_range_plugged(vmem, gpa, size))) {