Message ID | 1491308099-4751-1-git-send-email-rcojocaru@bitdefender.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 04/04/17 13:14, Razvan Cojocaru wrote: > Currently xc_translate_foreign_address only checks for PSE bit only on > level 2 entries (that's 2 MB pages on x64 and 32-bit with PAE, and 4 MB > pages on 32-bit). But linux kernel sometimes uses 1 GB pages. This patch > fixes that, and checks the PSE bit on level 3 entries if the guest has 4 > translation levels (that means 64-bit guests only). > > Signed-off-by: Cristian-Bogdan Sirb <csirb@bitdefender.com> This function is in a rather tragic state. Lucky it isn't used by code covered by Xen's security statement. This patch reintroduces XSA-176, and the existing code doesn't work for 4M superpages, or guests using PSE36. (I might be acutely aware of pagetable issues, following c/s 4c5d78a10dc89). Furthermore, the map/unmap overhead must be a large overhead. How often is this used by introspection? To properly walk the pagetables, you need access to the CPUID information (as well as the control register state), and that isn't yet available to the toolstack in Xen. I'm wondering whether it might be better to have a way of asking Xen to perform a virtual to linear translation in the context of a specific vcpu. My gut feeling is that it might be quicker than running this logic, and is definitely more simple than trying to fix this code not to have vulnerabilities in it. ~Andrew
On 04/04/2017 06:11 PM, Andrew Cooper wrote: > On 04/04/17 13:14, Razvan Cojocaru wrote: >> Currently xc_translate_foreign_address only checks for PSE bit only on >> level 2 entries (that's 2 MB pages on x64 and 32-bit with PAE, and 4 MB >> pages on 32-bit). But linux kernel sometimes uses 1 GB pages. This patch >> fixes that, and checks the PSE bit on level 3 entries if the guest has 4 >> translation levels (that means 64-bit guests only). >> >> Signed-off-by: Cristian-Bogdan Sirb <csirb@bitdefender.com> > > This function is in a rather tragic state. Lucky it isn't used by code > covered by Xen's security statement. > > This patch reintroduces XSA-176, and the existing code doesn't work for > 4M superpages, or guests using PSE36. (I might be acutely aware of > pagetable issues, following c/s > 4c5d78a10dc89). Furthermore, the map/unmap overhead must be a large > overhead. > > How often is this used by introspection? To properly walk the > pagetables, you need access to the CPUID information (as well as the > control register state), and that isn't yet available to the toolstack > in Xen. > > I'm wondering whether it might be better to have a way of asking Xen to > perform a virtual to linear translation in the context of a specific > vcpu. My gut feeling is that it might be quicker than running this > logic, and is definitely more simple than trying to fix this code not to > have vulnerabilities in it. We have a workaround looking directly into the guest but have felt that the proper thing to do was to contribute the fix back to Xen. Thanks, Razvan
On Tue, Apr 4, 2017 at 9:11 AM, Andrew Cooper <andrew.cooper3@citrix.com> wrote: > On 04/04/17 13:14, Razvan Cojocaru wrote: > > Currently xc_translate_foreign_address only checks for PSE bit only on > > level 2 entries (that's 2 MB pages on x64 and 32-bit with PAE, and 4 MB > > pages on 32-bit). But linux kernel sometimes uses 1 GB pages. This patch > > fixes that, and checks the PSE bit on level 3 entries if the guest has 4 > > translation levels (that means 64-bit guests only). > > > > Signed-off-by: Cristian-Bogdan Sirb <csirb@bitdefender.com> > > This function is in a rather tragic state. Lucky it isn't used by code > covered by Xen's security statement. > > This patch reintroduces XSA-176, and the existing code doesn't work for > 4M superpages, or guests using PSE36. (I might be acutely aware of > pagetable issues, following c/s > 4c5d78a10dc89). Furthermore, the map/unmap overhead must be a large > overhead. > Indeed it is, that's why in LibVMI there is actually a cache implemented for mapped pages so we throw them away only after they have been idle for a while. > > How often is this used by introspection? To properly walk the > pagetables, you need access to the CPUID information (as well as the > control register state), and that isn't yet available to the toolstack > in Xen. > Control register state is certainly available. > > I'm wondering whether it might be better to have a way of asking Xen to > perform a virtual to linear translation in the context of a specific > vcpu. My gut feeling is that it might be quicker than running this > logic, and is definitely more simple than trying to fix this code not to > have vulnerabilities in it. I don't think it would be necessary. IMHO doing this in Xen wouldn't really net us much performance-wise. Furthermore, there are certain PTE bits that are OS specific and Xen wouldn't necessary have the required information to do the translation properly (ie. present bit unset but transition bits used for Windows guests). Tamas
On 04/04/17 16:39, Tamas K Lengyel wrote: > > > On Tue, Apr 4, 2017 at 9:11 AM, Andrew Cooper > <andrew.cooper3@citrix.com <mailto:andrew.cooper3@citrix.com>> wrote: > > On 04/04/17 13:14, Razvan Cojocaru wrote: > > Currently xc_translate_foreign_address only checks for PSE bit > only on > > level 2 entries (that's 2 MB pages on x64 and 32-bit with PAE, > and 4 MB > > pages on 32-bit). But linux kernel sometimes uses 1 GB pages. > This patch > > fixes that, and checks the PSE bit on level 3 entries if the > guest has 4 > > translation levels (that means 64-bit guests only). > > > > Signed-off-by: Cristian-Bogdan Sirb <csirb@bitdefender.com > <mailto:csirb@bitdefender.com>> > > This function is in a rather tragic state. Lucky it isn't used by > code > covered by Xen's security statement. > > This patch reintroduces XSA-176, and the existing code doesn't > work for > 4M superpages, or guests using PSE36. (I might be acutely aware of > pagetable issues, following c/s > 4c5d78a10dc89). Furthermore, the map/unmap overhead must be a large > overhead. > > > Indeed it is, that's why in LibVMI there is actually a cache > implemented for mapped pages so we throw them away only after they > have been idle for a while. > > > > How often is this used by introspection? To properly walk the > pagetables, you need access to the CPUID information (as well as the > control register state), and that isn't yet available to the toolstack > in Xen. > > > Control register state is certainly available. > > > > I'm wondering whether it might be better to have a way of asking > Xen to > perform a virtual to linear translation in the context of a specific > vcpu. My gut feeling is that it might be quicker than running this > logic, and is definitely more simple than trying to fix this code > not to > have vulnerabilities in it. > > > I don't think it would be necessary. IMHO doing this in Xen wouldn't > really net us much performance-wise. Furthermore, there are certain > PTE bits that are OS specific and Xen wouldn't necessary have the > required information to do the translation properly (ie. present bit > unset but transition bits used for Windows guests). Ok. Xen needs to care only about the behaviour of real pointers in guest context, and whether they raise #PF. It sounds like the best thing to do is to provide userspace with all information it needs to actually perform the walk safely, and let the library apply guest-specific knowledge if it wants. Off the top of my head, the control information needed is: Hap/Shadow, hardwares views towards page1gb and pse36, whether EFER.NX is leaked from Xen, EFER.NX, CR0.{PG,WP}, CR4.{PAE,PSE,PCID,SMEP,SMAP,PKE}, and the PDPTRs for the 32bit PAE case, because the translation in use isn't necessary the value you would read by following CR3 in memory. ~Andrew
On Tue, Apr 4, 2017 at 9:58 AM, Andrew Cooper <andrew.cooper3@citrix.com> wrote: > On 04/04/17 16:39, Tamas K Lengyel wrote: > > > > On Tue, Apr 4, 2017 at 9:11 AM, Andrew Cooper <andrew.cooper3@citrix.com> > wrote: > >> On 04/04/17 13:14, Razvan Cojocaru wrote: >> > Currently xc_translate_foreign_address only checks for PSE bit only on >> > level 2 entries (that's 2 MB pages on x64 and 32-bit with PAE, and 4 MB >> > pages on 32-bit). But linux kernel sometimes uses 1 GB pages. This patch >> > fixes that, and checks the PSE bit on level 3 entries if the guest has 4 >> > translation levels (that means 64-bit guests only). >> > >> > Signed-off-by: Cristian-Bogdan Sirb <csirb@bitdefender.com> >> >> This function is in a rather tragic state. Lucky it isn't used by code >> covered by Xen's security statement. >> >> This patch reintroduces XSA-176, and the existing code doesn't work for >> 4M superpages, or guests using PSE36. (I might be acutely aware of >> pagetable issues, following c/s >> 4c5d78a10dc89). Furthermore, the map/unmap overhead must be a large >> overhead. >> > > Indeed it is, that's why in LibVMI there is actually a cache implemented > for mapped pages so we throw them away only after they have been idle for a > while. > > >> >> How often is this used by introspection? To properly walk the >> pagetables, you need access to the CPUID information (as well as the >> control register state), and that isn't yet available to the toolstack >> in Xen. >> > > Control register state is certainly available. > > >> >> I'm wondering whether it might be better to have a way of asking Xen to >> perform a virtual to linear translation in the context of a specific >> vcpu. My gut feeling is that it might be quicker than running this >> logic, and is definitely more simple than trying to fix this code not to >> have vulnerabilities in it. > > > I don't think it would be necessary. IMHO doing this in Xen wouldn't > really net us much performance-wise. Furthermore, there are certain PTE > bits that are OS specific and Xen wouldn't necessary have the required > information to do the translation properly (ie. present bit unset but > transition bits used for Windows guests). > > > Ok. Xen needs to care only about the behaviour of real pointers in guest > context, and whether they raise #PF. > > It sounds like the best thing to do is to provide userspace with all > information it needs to actually perform the walk safely, and let the > library apply guest-specific knowledge if it wants. > > Off the top of my head, the control information needed is: > > Hap/Shadow, hardwares views towards page1gb and pse36, whether EFER.NX is > leaked from Xen, EFER.NX, CR0.{PG,WP}, CR4.{PAE,PSE,PCID,SMEP,SMAP,PKE}, > and the PDPTRs for the 32bit PAE case, because the translation in use isn't > necessary the value you would read by following CR3 in memory. > The userspace already has access to these informations and we use them in LibVMI already (see https://github.com/libvmi/libvmi/blob/master/libvmi/memory.c#L37). It's only this libxc function that has not really been touched in a long time because I don't think anyone uses it.. Tamas
On 04/04/2017 07:08 PM, Tamas K Lengyel wrote: > > > On Tue, Apr 4, 2017 at 9:58 AM, Andrew Cooper <andrew.cooper3@citrix.com > <mailto:andrew.cooper3@citrix.com>> wrote: > > On 04/04/17 16:39, Tamas K Lengyel wrote: >> >> >> On Tue, Apr 4, 2017 at 9:11 AM, Andrew Cooper >> <andrew.cooper3@citrix.com <mailto:andrew.cooper3@citrix.com>> wrote: >> >> On 04/04/17 13:14, Razvan Cojocaru wrote: >> > Currently xc_translate_foreign_address only checks for PSE >> bit only on >> > level 2 entries (that's 2 MB pages on x64 and 32-bit with >> PAE, and 4 MB >> > pages on 32-bit). But linux kernel sometimes uses 1 GB >> pages. This patch >> > fixes that, and checks the PSE bit on level 3 entries if the >> guest has 4 >> > translation levels (that means 64-bit guests only). >> > >> > Signed-off-by: Cristian-Bogdan Sirb <csirb@bitdefender.com >> <mailto:csirb@bitdefender.com>> >> >> This function is in a rather tragic state. Lucky it isn't >> used by code >> covered by Xen's security statement. >> >> This patch reintroduces XSA-176, and the existing code doesn't >> work for >> 4M superpages, or guests using PSE36. (I might be acutely >> aware of >> pagetable issues, following c/s >> 4c5d78a10dc89). Furthermore, the map/unmap overhead must be a >> large >> overhead. >> >> >> Indeed it is, that's why in LibVMI there is actually a cache >> implemented for mapped pages so we throw them away only after they >> have been idle for a while. >> >> >> >> How often is this used by introspection? To properly walk the >> pagetables, you need access to the CPUID information (as well >> as the >> control register state), and that isn't yet available to the >> toolstack >> in Xen. >> >> >> Control register state is certainly available. >> >> >> >> I'm wondering whether it might be better to have a way of >> asking Xen to >> perform a virtual to linear translation in the context of a >> specific >> vcpu. My gut feeling is that it might be quicker than running >> this >> logic, and is definitely more simple than trying to fix this >> code not to >> have vulnerabilities in it. >> >> >> I don't think it would be necessary. IMHO doing this in Xen >> wouldn't really net us much performance-wise. Furthermore, there >> are certain PTE bits that are OS specific and Xen wouldn't >> necessary have the required information to do the translation >> properly (ie. present bit unset but transition bits used for >> Windows guests). > > Ok. Xen needs to care only about the behaviour of real pointers in > guest context, and whether they raise #PF. > > It sounds like the best thing to do is to provide userspace with all > information it needs to actually perform the walk safely, and let > the library apply guest-specific knowledge if it wants. > > Off the top of my head, the control information needed is: > > Hap/Shadow, hardwares views towards page1gb and pse36, whether > EFER.NX is leaked from Xen, EFER.NX, CR0.{PG,WP}, > CR4.{PAE,PSE,PCID,SMEP,SMAP,PKE}, and the PDPTRs for the 32bit PAE > case, because the translation in use isn't necessary the value you > would read by following CR3 in memory. > > > The userspace already has access to these informations and we use them > in LibVMI already (see > https://github.com/libvmi/libvmi/blob/master/libvmi/memory.c#L37). It's > only this libxc function that has not really been touched in a long time > because I don't think anyone uses it.. Should it then be removed altogether, or at least be marked with a #warning or a comment? Thanks, Razvan
On 04/04/17 17:45, Razvan Cojocaru wrote: > On 04/04/2017 07:08 PM, Tamas K Lengyel wrote: >> >> On Tue, Apr 4, 2017 at 9:58 AM, Andrew Cooper <andrew.cooper3@citrix.com >> <mailto:andrew.cooper3@citrix.com>> wrote: >> >> On 04/04/17 16:39, Tamas K Lengyel wrote: >>> >>> On Tue, Apr 4, 2017 at 9:11 AM, Andrew Cooper >>> <andrew.cooper3@citrix.com <mailto:andrew.cooper3@citrix.com>> wrote: >>> >>> On 04/04/17 13:14, Razvan Cojocaru wrote: >>> > Currently xc_translate_foreign_address only checks for PSE >>> bit only on >>> > level 2 entries (that's 2 MB pages on x64 and 32-bit with >>> PAE, and 4 MB >>> > pages on 32-bit). But linux kernel sometimes uses 1 GB >>> pages. This patch >>> > fixes that, and checks the PSE bit on level 3 entries if the >>> guest has 4 >>> > translation levels (that means 64-bit guests only). >>> > >>> > Signed-off-by: Cristian-Bogdan Sirb <csirb@bitdefender.com >>> <mailto:csirb@bitdefender.com>> >>> >>> This function is in a rather tragic state. Lucky it isn't >>> used by code >>> covered by Xen's security statement. >>> >>> This patch reintroduces XSA-176, and the existing code doesn't >>> work for >>> 4M superpages, or guests using PSE36. (I might be acutely >>> aware of >>> pagetable issues, following c/s >>> 4c5d78a10dc89). Furthermore, the map/unmap overhead must be a >>> large >>> overhead. >>> >>> >>> Indeed it is, that's why in LibVMI there is actually a cache >>> implemented for mapped pages so we throw them away only after they >>> have been idle for a while. >>> >>> >>> >>> How often is this used by introspection? To properly walk the >>> pagetables, you need access to the CPUID information (as well >>> as the >>> control register state), and that isn't yet available to the >>> toolstack >>> in Xen. >>> >>> >>> Control register state is certainly available. >>> >>> >>> >>> I'm wondering whether it might be better to have a way of >>> asking Xen to >>> perform a virtual to linear translation in the context of a >>> specific >>> vcpu. My gut feeling is that it might be quicker than running >>> this >>> logic, and is definitely more simple than trying to fix this >>> code not to >>> have vulnerabilities in it. >>> >>> >>> I don't think it would be necessary. IMHO doing this in Xen >>> wouldn't really net us much performance-wise. Furthermore, there >>> are certain PTE bits that are OS specific and Xen wouldn't >>> necessary have the required information to do the translation >>> properly (ie. present bit unset but transition bits used for >>> Windows guests). >> Ok. Xen needs to care only about the behaviour of real pointers in >> guest context, and whether they raise #PF. >> >> It sounds like the best thing to do is to provide userspace with all >> information it needs to actually perform the walk safely, and let >> the library apply guest-specific knowledge if it wants. >> >> Off the top of my head, the control information needed is: >> >> Hap/Shadow, hardwares views towards page1gb and pse36, whether >> EFER.NX is leaked from Xen, EFER.NX, CR0.{PG,WP}, >> CR4.{PAE,PSE,PCID,SMEP,SMAP,PKE}, and the PDPTRs for the 32bit PAE >> case, because the translation in use isn't necessary the value you >> would read by following CR3 in memory. >> >> >> The userspace already has access to these informations and we use them >> in LibVMI already (see >> https://github.com/libvmi/libvmi/blob/master/libvmi/memory.c#L37). It's >> only this libxc function that has not really been touched in a long time >> because I don't think anyone uses it.. > Should it then be removed altogether, or at least be marked with a > #warning or a comment? Removing it entirely will break the gdbsx build. As its current user is only for debugging, I think this functional fix as proposed is fine, as long as it also adds a comment at the top indicating that the use of this function is hazardous for your health in production scenarios. ~Andrew
On 04/04/2017 08:04 PM, Andrew Cooper wrote: > On 04/04/17 17:45, Razvan Cojocaru wrote: >> On 04/04/2017 07:08 PM, Tamas K Lengyel wrote: >>> >>> On Tue, Apr 4, 2017 at 9:58 AM, Andrew Cooper <andrew.cooper3@citrix.com >>> <mailto:andrew.cooper3@citrix.com>> wrote: >>> >>> On 04/04/17 16:39, Tamas K Lengyel wrote: >>>> >>>> On Tue, Apr 4, 2017 at 9:11 AM, Andrew Cooper >>>> <andrew.cooper3@citrix.com <mailto:andrew.cooper3@citrix.com>> wrote: >>>> >>>> On 04/04/17 13:14, Razvan Cojocaru wrote: >>>> > Currently xc_translate_foreign_address only checks for PSE >>>> bit only on >>>> > level 2 entries (that's 2 MB pages on x64 and 32-bit with >>>> PAE, and 4 MB >>>> > pages on 32-bit). But linux kernel sometimes uses 1 GB >>>> pages. This patch >>>> > fixes that, and checks the PSE bit on level 3 entries if the >>>> guest has 4 >>>> > translation levels (that means 64-bit guests only). >>>> > >>>> > Signed-off-by: Cristian-Bogdan Sirb <csirb@bitdefender.com >>>> <mailto:csirb@bitdefender.com>> >>>> >>>> This function is in a rather tragic state. Lucky it isn't >>>> used by code >>>> covered by Xen's security statement. >>>> >>>> This patch reintroduces XSA-176, and the existing code doesn't >>>> work for >>>> 4M superpages, or guests using PSE36. (I might be acutely >>>> aware of >>>> pagetable issues, following c/s >>>> 4c5d78a10dc89). Furthermore, the map/unmap overhead must be a >>>> large >>>> overhead. >>>> >>>> >>>> Indeed it is, that's why in LibVMI there is actually a cache >>>> implemented for mapped pages so we throw them away only after they >>>> have been idle for a while. >>>> >>>> >>>> >>>> How often is this used by introspection? To properly walk the >>>> pagetables, you need access to the CPUID information (as well >>>> as the >>>> control register state), and that isn't yet available to the >>>> toolstack >>>> in Xen. >>>> >>>> >>>> Control register state is certainly available. >>>> >>>> >>>> >>>> I'm wondering whether it might be better to have a way of >>>> asking Xen to >>>> perform a virtual to linear translation in the context of a >>>> specific >>>> vcpu. My gut feeling is that it might be quicker than running >>>> this >>>> logic, and is definitely more simple than trying to fix this >>>> code not to >>>> have vulnerabilities in it. >>>> >>>> >>>> I don't think it would be necessary. IMHO doing this in Xen >>>> wouldn't really net us much performance-wise. Furthermore, there >>>> are certain PTE bits that are OS specific and Xen wouldn't >>>> necessary have the required information to do the translation >>>> properly (ie. present bit unset but transition bits used for >>>> Windows guests). >>> Ok. Xen needs to care only about the behaviour of real pointers in >>> guest context, and whether they raise #PF. >>> >>> It sounds like the best thing to do is to provide userspace with all >>> information it needs to actually perform the walk safely, and let >>> the library apply guest-specific knowledge if it wants. >>> >>> Off the top of my head, the control information needed is: >>> >>> Hap/Shadow, hardwares views towards page1gb and pse36, whether >>> EFER.NX is leaked from Xen, EFER.NX, CR0.{PG,WP}, >>> CR4.{PAE,PSE,PCID,SMEP,SMAP,PKE}, and the PDPTRs for the 32bit PAE >>> case, because the translation in use isn't necessary the value you >>> would read by following CR3 in memory. >>> >>> >>> The userspace already has access to these informations and we use them >>> in LibVMI already (see >>> https://github.com/libvmi/libvmi/blob/master/libvmi/memory.c#L37). It's >>> only this libxc function that has not really been touched in a long time >>> because I don't think anyone uses it.. >> Should it then be removed altogether, or at least be marked with a >> #warning or a comment? > > Removing it entirely will break the gdbsx build. > > As its current user is only for debugging, I think this functional fix > as proposed is fine, as long as it also adds a comment at the top > indicating that the use of this function is hazardous for your health in > production scenarios. Right, so should we then submit V2 with a comment stating this in the header file? Thanks, Razvan
On Wed, Apr 05, 2017 at 02:58:48PM +0300, Razvan Cojocaru wrote: > > > > > As its current user is only for debugging, I think this functional fix > > as proposed is fine, as long as it also adds a comment at the top > > indicating that the use of this function is hazardous for your health in > > production scenarios. > > Right, so should we then submit V2 with a comment stating this in the > header file? Yes please. > > > Thanks, > Razvan
diff --git a/tools/libxc/xc_pagetab.c b/tools/libxc/xc_pagetab.c index 92eebd6..db25c20 100644 --- a/tools/libxc/xc_pagetab.c +++ b/tools/libxc/xc_pagetab.c @@ -93,7 +93,7 @@ unsigned long xc_translate_foreign_address(xc_interface *xch, uint32_t dom, return 0; } paddr = pte & 0x000ffffffffff000ull; - if (level == 2 && (pte & PTE_PSE)) { + if ((level == 2 || (level == 3 && pt_levels == 4)) && (pte & PTE_PSE)) { mask = ((mask ^ ~-mask) >> 1); /* All bits below first set bit */ return ((paddr & ~mask) | (virt & mask)) >> PAGE_SHIFT; }
Currently xc_translate_foreign_address only checks for PSE bit only on level 2 entries (that's 2 MB pages on x64 and 32-bit with PAE, and 4 MB pages on 32-bit). But linux kernel sometimes uses 1 GB pages. This patch fixes that, and checks the PSE bit on level 3 entries if the guest has 4 translation levels (that means 64-bit guests only). Signed-off-by: Cristian-Bogdan Sirb <csirb@bitdefender.com> --- tools/libxc/xc_pagetab.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)