Message ID | 1395074636-32337-1-git-send-email-dh.herrmann@gmail.com (mailing list archive)
---|---
State | New, archived
Hi!

On 03/17/2014 05:43 PM, David Herrmann wrote:
> We introduced render-nodes about 1/2 year ago and no problems showed up.
> Remove the drm_rnodes argument and enable them by default now.

So what about the malicious execbuf command stream problem? Do we require
all drivers that enable render-nodes to have a mechanism to prevent this
in place?

/Thomas
Hi

On Thu, Mar 20, 2014 at 7:43 AM, Thomas Hellstrom <thomas@shipmail.org> wrote:
> On 03/17/2014 05:43 PM, David Herrmann wrote:
>> We introduced render-nodes about 1/2 year ago and no problems showed up.
>> Remove the drm_rnodes argument and enable them by default now.
>
> So what about the malicious execbuf command stream problem? Do we require
> all drivers that enable render-nodes to have a mechanism to prevent this
> in place?

No, that's not a requirement. Render-nodes provide a secure API; if the
underlying driver does no command-stream validation (I guess for
performance reasons and lack of a VM), that's an implementation detail,
not part of the API. Furthermore, you can always set higher restrictions
on the render-node char-dev if this bothers you.

Cheers
David
On 03/20/2014 08:36 AM, David Herrmann wrote:
> No, that's not a requirement. Render-nodes provide a secure API; if the
> underlying driver does no command-stream validation (I guess for
> performance reasons and lack of a VM), that's an implementation detail,
> not part of the API. Furthermore, you can always set higher restrictions
> on the render-node char-dev if this bothers you.

I'm merely trying to envision the situation where a distro wants to
create, for example, a udev rule for the render nodes.

How should the distro know that the implementation is not insecure?

Historically, drm has refused to upstream drivers without a proper
command validation mechanism in place (the via driver, for example), but
that validation mechanism only needed to make sure no random system
memory was ever accessible to an authenticated DRM client.

Now, render nodes are designed to also provide user data isolation. But
if we allow insecure implementations, wouldn't that compromise the whole
idea?

Now that we have a secure API within reach, wouldn't it be reasonable to
require implementations to also be secure, following earlier DRM
practices? Or am I missing something?

/Thomas
Hi Thomas

On Thu, Mar 20, 2014 at 9:48 AM, Thomas Hellstrom <thellstrom@vmware.com> wrote:
> I'm merely trying to envision the situation where a distro wants to
> create, for example, a udev rule for the render nodes.
>
> How should the distro know that the implementation is not insecure?
>
> Historically, drm has refused to upstream drivers without a proper
> command validation mechanism in place (the via driver, for example), but
> that validation mechanism only needed to make sure no random system
> memory was ever accessible to an authenticated DRM client.

I expect all drivers to protect system memory. All that I am talking
about is one exec-buffer writing to memory of another _GPU_ buffer that
this client doesn't have access to. But maybe driver authors can give
some insights. I'd even expect non-texture/data GPU buffers to be
protected.

> Now, render nodes are designed to also provide user data isolation. But
> if we allow insecure implementations, wouldn't that compromise the whole
> idea?
>
> Now that we have a secure API within reach, wouldn't it be reasonable to
> require implementations to also be secure, following earlier DRM
> practices?

I don't understand what this has to do with render-nodes? The API _is_
secure. What would be the gain of disabling render-nodes for broken (even
deliberately broken) drivers? User-space is not going to assume drivers
are broken and rely on hand-crafted exec-buffers that cross buffer
boundaries. So yes, we're fooling ourselves with API guarantees that
drivers cannot deliver, but what's the downside?

Thanks
David
On 03/20/2014 10:05 AM, David Herrmann wrote:
> I don't understand what this has to do with render-nodes? The API _is_
> secure. What would be the gain of disabling render-nodes for broken (even
> deliberately broken) drivers? User-space is not going to assume drivers
> are broken and rely on hand-crafted exec-buffers that cross buffer
> boundaries. So yes, we're fooling ourselves with API guarantees that
> drivers cannot deliver, but what's the downside?

OK, let me give one example:

A user logs in to a system where DRI clients use render nodes. The system
grants rw permission on the render nodes for the console user. The user
starts editing a secret document and kicks off some GPGPU structural FEM
computations of the Pentagon building, then locks the screen and goes for
lunch.

A malicious user logs in using fast user switching and becomes the owner
of the render node. He tries to map a couple of random offsets, but that
fails due to security checks. So he crafts a malicious command stream to
dump all GPU memory to a file, steals the first user's secret document
and the intermediate Pentagon data, logs out and starts data mining.

Now if we require drivers to block these malicious command streams, this
can never happen, and distros can reliably grant rw access to the render
nodes to the user currently logged into the console.

I guess what I'm basically trying to say is that with the legacy concept
it was OK to access all GPU memory, because an authenticated X user
basically had the same permissions.

With render nodes we're allowing multiple users into the GPU at the same
time, and it's not OK anymore for a client to access another client's GPU
buffer through a malicious command stream.

/Thomas
Hi

On Thu, Mar 20, 2014 at 10:27 AM, Thomas Hellstrom <thellstrom@vmware.com> wrote:
> Now if we require drivers to block these malicious command streams, this
> can never happen, and distros can reliably grant rw access to the render
> nodes to the user currently logged into the console.
>
> I guess what I'm basically trying to say is that with the legacy concept
> it was OK to access all GPU memory, because an authenticated X user
> basically had the same permissions.
>
> With render nodes we're allowing multiple users into the GPU at the same
> time, and it's not OK anymore for a client to access another client's GPU
> buffer through a malicious command stream.

Yes, I understand the attack scenario, but that's not related to
render-nodes at all. The exact same races exist on the legacy node:

1) If you can do fast-user switching, you can spawn your own X-server,
get authenticated on your own server and you are allowed into the GPU.
You cannot map other users' buffers because they're on a different
master object, but you _can_ craft malicious GPU streams and access the
other user's buffers.

2) If you can do fast-user switching, switch to an empty VT, open the
legacy node and you automatically become DRM-Master because there is no
active master. Now you can do anything on the DRM node, including
crafting malicious GPU streams.

Given that the legacy node is always around and _always_ has these races,
why should we prevent render-nodes from appearing just because the
_driver_ is racy? I mean, there is no gain in that. If it opens a new
race, as you assumed, then yes, we should avoid it. But at least for all
drivers supporting render-nodes so far, they either are entirely safe or
the just-described races exist on both nodes.

Thanks
David
On 03/20/2014 10:43 AM, David Herrmann wrote:
> Yes, I understand the attack scenario, but that's not related to
> render-nodes at all. The exact same races exist on the legacy node:

I was under the impression that render nodes were designed to fix these
issues?

> 1) If you can do fast-user switching, you can spawn your own X-server,
> get authenticated on your own server and you are allowed into the GPU.
> You cannot map other users' buffers because they're on a different
> master object, but you _can_ craft malicious GPU streams and access the
> other user's buffers.

But with legacy nodes, drivers can (and should, IMO) throw out all data
from GPU memory on master drop, and then block dropped-master
authenticated clients from the GPU until their master becomes active
again or dies (in which case they are killed), in line with a previous
discussion we had. You can't do this with render nodes, so yes, they do
open up a new race that requires command stream validation.

> 2) If you can do fast-user switching, switch to an empty VT, open the
> legacy node and you automatically become DRM-Master because there is no
> active master. Now you can do anything on the DRM node, including
> crafting malicious GPU streams.

I believe the above solution should work for this case as well.

> Given that the legacy node is always around and _always_ has these races,
> why should we prevent render-nodes from appearing just because the
> _driver_ is racy? I mean, there is no gain in that. If it opens a new
> race, as you assumed, then yes, we should avoid it. But at least for all
> drivers supporting render-nodes so far, they either are entirely safe or
> the just-described races exist on both nodes.

My suggestion is actually not to prevent render nodes from appearing, but
rather that we should restrict them to drivers with command stream
verification and / or per-process virtual memory, and I also think we
should plug the above races on legacy nodes. That way legacy nodes would
use the old "master owns it all" model, while render nodes could allow
multiple users at the same time.

/Thomas
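The "throw out GPU memory on master drop, then block dropped-master
clients" policy Thomas sketches can be modelled as follows. This is a
self-contained toy model in C; every toy_* name is invented for
illustration and none of it is the real DRM master/client code.

#include <stdbool.h>

/* Illustrative stand-ins for the real DRM objects. */
struct toy_master {
	bool active;			/* is this master's session in the foreground? */
};

struct toy_client {
	struct toy_master *master;	/* master the client authenticated against */
};

/* Called when a master drops (VT switch / fast user switch). */
void toy_master_drop(struct toy_master *m)
{
	m->active = false;
	/* A real driver would also evict every buffer belonging to this
	 * master's clients out of VRAM/GTT here, so a future master
	 * cannot read their contents. */
}

/* Gate every command submission: clients of a dropped master are blocked
 * until their master becomes active again (or they are killed). */
int toy_submit_allowed(const struct toy_client *c)
{
	if (!c->master || !c->master->active)
		return -1;	/* would be -EACCES or similar in a real driver */
	return 0;
}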
On Thu, Mar 20, 2014 at 11:28:18AM +0100, Thomas Hellstrom wrote:
> My suggestion is actually not to prevent render nodes from appearing, but
> rather that we should restrict them to drivers with command stream
> verification and / or per-process virtual memory, and I also think we
> should plug the above races on legacy nodes. That way legacy nodes would
> use the old "master owns it all" model, while render nodes could allow
> multiple users at the same time.

FYI, both radeon and nouveau (last time I checked) cannot be abused to
access random VRAM or buffers bound by another process (except if the
buffer is shared, of course). So there is no way (modulo any bug in
command stream checking or hardware) to abuse a render node to access any
memory that you do not have the right to access. I am pretty sure this
stands for Intel too; I do not know if other drm drivers offer this kind
of assurance.

In any case this is not a render node issue, and there is no reason to
force full VRAM eviction or anything like that.

Cheers,
Jérôme
On Thu, Mar 20, 2014 at 10:36 AM, Jerome Glisse <j.glisse@gmail.com> wrote:
> FYI, both radeon and nouveau (last time I checked) cannot be abused to
> access random VRAM or buffers bound by another process (except if the
> buffer is shared, of course).

I'm not aware of any restrictions in nouveau that would prevent you from
accessing any vram... there are a number of copy engines accessible that
would allow you to copy one region to another, and I'm not aware of code
to validate pushbuf contents (it would have to parse the pushbuf and know
which methods store source/destination addresses). You could probably get
creative and use that to overwrite the MMU tables, to then gain the
ability to read/write any part of system memory... but perhaps the
engines won't allow you to do that, not sure. (Or perhaps the PDE isn't
mapped into VRAM. Again, not sure.)

  -ilia
On 03/20/2014 03:36 PM, Jerome Glisse wrote:
> In any case this is not a render node issue, and there is no reason to
> force full VRAM eviction or anything like that.

This comment suggests you haven't read the discussion fully.

VRAM eviction was discussed in the context of legacy nodes.

This is a render node issue because with legacy nodes you can work around
insufficient command checking. With render nodes you can't.

/Thomas
On Thu, Mar 20, 2014 at 03:59:17PM +0100, Thomas Hellstrom wrote:
> This comment suggests you haven't read the discussion fully.
>
> VRAM eviction was discussed in the context of legacy nodes.
>
> This is a render node issue because with legacy nodes you can work around
> insufficient command checking. With render nodes you can't.

On radeon you cannot abuse the GPU, period, legacy node or not. My
comment was about the fact that this is a driver issue and not a render
node issue. I would consider a driver that allows a GPU block to be
abused to access any memory as broken, no matter if we are talking about
the legacy node or the new render node.

Cheers,
Jérôme
On Thu, Mar 20, 2014 at 10:44:10AM -0400, Ilia Mirkin wrote:
> I'm not aware of any restrictions in nouveau that would prevent you from
> accessing any vram... there are a number of copy engines accessible that
> would allow you to copy one region to another, and I'm not aware of code
> to validate pushbuf contents (it would have to parse the pushbuf and know
> which methods store source/destination addresses). You could probably get
> creative and use that to overwrite the MMU tables, to then gain the
> ability to read/write any part of system memory... but perhaps the
> engines won't allow you to do that, not sure. (Or perhaps the PDE isn't
> mapped into VRAM. Again, not sure.)

Well, I obviously have not looked at that in a long time, but if the
nouveau API can be abused that way then I consider it broken by design. I
know command checking is painful and CPU intensive, but we have done it
in radeon for a reason.

Cheers,
Jérôme
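For readers who have not seen a checker like the one Jérôme refers to,
here is a deliberately simplified, self-contained sketch in C of the core
idea behind radeon-style command-stream validation: walk the submitted
packets and reject any GPU address that does not fall inside a buffer the
submitting client owns. The packet layout and all toy_* names are
invented for this example; the real radeon checker is considerably more
involved.

#include <stdint.h>
#include <stddef.h>

/* One GPU buffer object the submitting client is allowed to touch. */
struct toy_bo {
	uint64_t gpu_addr;
	uint64_t size;
};

/* A fake "packet": an opcode plus one GPU range it reads or writes. */
struct toy_packet {
	uint32_t opcode;
	uint64_t addr;
	uint64_t len;
};

static int addr_in_client_bos(const struct toy_bo *bos, size_t n_bos,
			      uint64_t addr, uint64_t len)
{
	for (size_t i = 0; i < n_bos; i++) {
		if (addr >= bos[i].gpu_addr &&
		    addr + len <= bos[i].gpu_addr + bos[i].size)
			return 1;
	}
	return 0;
}

/* Reject the whole submission if any packet references memory outside
 * the client's own buffers. This is the CPU cost Jérôme mentions: every
 * command buffer is parsed before it reaches the hardware. A real
 * checker would also dispatch on the opcode to find the address fields
 * and guard against arithmetic overflow. */
int toy_validate_cs(const struct toy_packet *pkts, size_t n_pkts,
		    const struct toy_bo *bos, size_t n_bos)
{
	for (size_t i = 0; i < n_pkts; i++) {
		if (!addr_in_client_bos(bos, n_bos, pkts[i].addr, pkts[i].len))
			return -1;	/* -EINVAL in a real driver */
	}
	return 0;
}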
On 03/20/2014 04:34 PM, Jerome Glisse wrote:
> On radeon you cannot abuse the GPU, period, legacy node or not. My
> comment was about the fact that this is a driver issue and not a render
> node issue. I would consider a driver that allows a GPU block to be
> abused to access any memory as broken, no matter if we are talking about
> the legacy node or the new render node.

Great. Then I assume you don't have an issue with not enabling
render-nodes for those broken drivers (or at least a sysfs property or
something similar flagging those device nodes as broken)?

Thanks,
Thomas
On Thu, Mar 20, 2014 at 04:49:42PM +0100, Thomas Hellstrom wrote:
> Great. Then I assume you don't have an issue with not enabling
> render-nodes for those broken drivers (or at least a sysfs property or
> something similar flagging those device nodes as broken)?

Yes, broken drivers should not have a render node, at least in my view.

Cheers,
Jérôme
On Thu, Mar 20, 2014 at 6:28 AM, Thomas Hellstrom <thellstrom@vmware.com> wrote:
> My suggestion is actually not to prevent render nodes from appearing, but
> rather that we should restrict them to drivers with command stream
> verification and / or per-process virtual memory, and I also think we
> should plug the above races on legacy nodes. That way legacy nodes would
> use the old "master owns it all" model, while render nodes could allow
> multiple users at the same time.

hmm, if you only have global gpu virtual memory (rather than
per-process), it would still be kinda nice to support render nodes if the
app had some way to figure out whether or not its gpu buffers were
secure. Ie. an app that was using the gpu for something secure could
simply choose not to if the hw/driver could not guarantee that another
process using the gpu could not get access to the buffers..

BR,
-R
On Thu, Mar 20, 2014 at 10:44 AM, Ilia Mirkin <imirkin@alum.mit.edu> wrote:
> I'm not aware of any restrictions in nouveau that would prevent you from
> accessing any vram...

Please ignore this totally wrong comment. While there is indeed no
pushbuf validation, every process (well, every open() of the drm node)
gets its own virtual memory space for all GPUs with MMUs (nv50+).

  -ilia
On 03/20/2014 06:34 PM, Rob Clark wrote:
> hmm, if you only have global gpu virtual memory (rather than
> per-process), it would still be kinda nice to support render nodes if the
> app had some way to figure out whether or not its gpu buffers were
> secure.

If there is only global GPU memory, there's of course also the option of
providing a command stream verifier.

> Ie. an app that was using the gpu for something secure could
> simply choose not to if the hw/driver could not guarantee that another
> process using the gpu could not get access to the buffers..

IMO that should work fine, but we need to provide a way for user-space to
determine whether the render node is secure or not. Perhaps a sysfs
attribute and / or a drm getparam() parameter?

/Thomas
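On the userspace side, Thomas's getparam/capability idea could look
roughly like the sketch below. drmGetCap() is the existing libdrm wrapper
around DRM_IOCTL_GET_CAP, but DRM_CAP_RENDER_ISOLATION is a made-up
capability name used purely for illustration; no such capability exists
at the time of this thread.

#include <stdint.h>
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <xf86drm.h>

/* Hypothetical capability number; NOT a real DRM_CAP_* value. */
#define DRM_CAP_RENDER_ISOLATION 0x1000

int main(void)
{
	uint64_t isolated = 0;
	int fd = open("/dev/dri/renderD128", O_RDWR);

	if (fd < 0) {
		perror("open render node");
		return 1;
	}

	/* Query the driver; an unknown capability simply fails, which the
	 * application treats the same as "no isolation guarantee". */
	if (drmGetCap(fd, DRM_CAP_RENDER_ISOLATION, &isolated) == 0 && isolated)
		printf("driver promises per-client GPU memory isolation\n");
	else
		printf("no isolation guarantee; treat GPU buffers as readable by others\n");

	close(fd);
	return 0;
}

An application doing something sensitive on the GPU could then fall back
to CPU processing, exactly as Rob suggests, when the answer is negative.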
On Thu, Mar 20, 2014 at 4:54 PM, Thomas Hellstrom <thellstrom@vmware.com> wrote:
> If there is only global GPU memory, there's of course also the option of
> providing a command stream verifier.

well, that wouldn't really help separate buffers from other contexts
(since a3xx and later has various load/store instructions)..

At the moment, I have two cases:

1) MMU... the gpu has no direct access to system memory, so other than
access to other contexts' buffers, the system is secure

2) no-MMU... vram carveout, and set some registers to limit access to
addresses within that range

In case #1 we could implement per-context vm.. just a matter of writing
some code. Doing it the naive way requires draining the command queue on
context switches and getting the CPU involved, which isn't so terribly
awesome. The downstream android driver does context switches on the CP
itself (ie. bang some registers to point the MMU at a new set of tables
and TLB flush).. but it needs to be confirmed that this can be done
securely (ie. restricted to the ringbuffer controlled by the kernel and
not an IB buffer from userspace). If it isn't restricted to the kernel
ringbuffer, then that sort of opens up an even bigger hole than it closes
;-)

> IMO that should work fine, but we need to provide a way for user-space to
> determine whether the render node is secure or not. Perhaps a sysfs
> attribute and / or a drm getparam() parameter?

I'd *assume* that that sort of thing would just be some sort of CL
extension?

But no objection to exposing it in a more common way.

I guess it is also an option to keep the bootarg to override the default
(with the default being 'enabled' for hw w/ per-context/process vm and
'disabled' otherwise).

BR,
-R
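The "naive way" Rob mentions, drain the GPU from the CPU, then repoint
the MMU and flush the TLB, would look roughly like the toy sketch below.
All toy_* types and helpers are invented stubs standing in for register
writes and fence waits; the real msm/adreno code does not look like this.

#include <stdint.h>

/* Invented stand-ins for the driver's per-GPU and per-context state. */
struct toy_gpu { uint64_t mmu_base_reg; };
struct toy_ctx { uint64_t pgtable_addr; };

/* Stubs for the hardware accesses a real driver would perform. */
static int  toy_wait_ring_idle(struct toy_gpu *gpu) { (void)gpu; return 0; }
static void toy_write_mmu_base(struct toy_gpu *gpu, uint64_t addr) { gpu->mmu_base_reg = addr; }
static void toy_flush_gpu_tlb(struct toy_gpu *gpu) { (void)gpu; }

/* Naive CPU-driven context switch: drain everything, swap page tables,
 * flush. Safe, but it stalls the GPU on every switch, which is exactly
 * the cost Rob wants to avoid by letting the CP do the switch from the
 * kernel-controlled ringbuffer instead. */
int toy_switch_ctx(struct toy_gpu *gpu, struct toy_ctx *ctx)
{
	int ret = toy_wait_ring_idle(gpu);

	if (ret)
		return ret;
	toy_write_mmu_base(gpu, ctx->pgtable_addr);
	toy_flush_gpu_tlb(gpu);
	return 0;
}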
On Thu, Mar 20, 2014 at 10:13 PM, Rob Clark <robdclark@gmail.com> wrote:
>> IMO that should work fine, but we need to provide a way for user-space to
>> determine whether the render node is secure or not. Perhaps a sysfs
>> attribute and / or a drm getparam() parameter?
>
> I'd *assume* that that sort of thing would just be some sort of CL
> extension?
>
> But no objection to exposing it in a more common way.
>
> I guess it is also an option to keep the bootarg to override the default
> (with the default being 'enabled' for hw w/ per-context/process vm and
> 'disabled' otherwise).

IMO if we expose this through sysfs we should always enable render nodes.
The udev scripts can then decide what permissions to set on them.

-Daniel
On 03/21/2014 08:10 AM, Daniel Vetter wrote:
> IMO if we expose this through sysfs we should always enable render nodes.
> The udev scripts can then decide what permissions to set on them.

Agreed.

Thomas
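As an illustration of the sysfs-plus-udev split Daniel describes, the
sketch below reads a hypothetical per-node attribute and picks
permissions accordingly. The render_node_secure attribute path is
invented for this example, and in practice the decision would live in a
udev rule rather than a standalone program.

#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical attribute; nothing like this exists in sysfs today. */
#define SECURE_ATTR "/sys/class/drm/renderD128/render_node_secure"

int main(void)
{
	char buf[8] = "";
	FILE *f = fopen(SECURE_ATTR, "r");
	mode_t mode;

	if (f) {
		if (!fgets(buf, sizeof(buf), f))
			buf[0] = '\0';
		fclose(f);
	}

	/* Loose permissions only if the driver claims isolation;
	 * otherwise keep the render node root-only. */
	mode = (strncmp(buf, "1", 1) == 0) ? 0666 : 0600;
	if (chmod("/dev/dri/renderD128", mode) != 0)
		perror("chmod /dev/dri/renderD128");

	return 0;
}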
diff --git a/drivers/gpu/drm/drm_stub.c b/drivers/gpu/drm/drm_stub.c
index c23eaf6..b6cf0b3 100644
--- a/drivers/gpu/drm/drm_stub.c
+++ b/drivers/gpu/drm/drm_stub.c
@@ -40,9 +40,6 @@
 unsigned int drm_debug = 0;	/* 1 to enable debug output */
 EXPORT_SYMBOL(drm_debug);
 
-unsigned int drm_rnodes = 0;	/* 1 to enable experimental render nodes API */
-EXPORT_SYMBOL(drm_rnodes);
-
 unsigned int drm_vblank_offdelay = 5000;    /* Default to 5000 msecs. */
 EXPORT_SYMBOL(drm_vblank_offdelay);
 
@@ -59,13 +56,11 @@ MODULE_AUTHOR(CORE_AUTHOR);
 MODULE_DESCRIPTION(CORE_DESC);
 MODULE_LICENSE("GPL and additional rights");
 MODULE_PARM_DESC(debug, "Enable debug output");
-MODULE_PARM_DESC(rnodes, "Enable experimental render nodes API");
 MODULE_PARM_DESC(vblankoffdelay, "Delay until vblank irq auto-disable [msecs]");
 MODULE_PARM_DESC(timestamp_precision_usec, "Max. error on timestamps [usecs]");
 MODULE_PARM_DESC(timestamp_monotonic, "Use monotonic timestamps");
 
 module_param_named(debug, drm_debug, int, 0600);
-module_param_named(rnodes, drm_rnodes, int, 0600);
 module_param_named(vblankoffdelay, drm_vblank_offdelay, int, 0600);
 module_param_named(timestamp_precision_usec, drm_timestamp_precision, int, 0600);
 module_param_named(timestamp_monotonic, drm_timestamp_monotonic, int, 0600);
@@ -505,7 +500,7 @@ struct drm_device *drm_dev_alloc(struct drm_driver *driver,
 		goto err_minors;
 	}
 
-	if (drm_core_check_feature(dev, DRIVER_RENDER) && drm_rnodes) {
+	if (drm_core_check_feature(dev, DRIVER_RENDER)) {
 		ret = drm_minor_alloc(dev, DRM_MINOR_RENDER);
 		if (ret)
 			goto err_minors;
diff --git a/include/drm/drmP.h b/include/drm/drmP.h
index 5380790..2d81c66 100644
--- a/include/drm/drmP.h
+++ b/include/drm/drmP.h
@@ -1454,7 +1454,6 @@ extern void drm_master_put(struct drm_master **master);
 extern void drm_put_dev(struct drm_device *dev);
 extern void drm_unplug_dev(struct drm_device *dev);
 extern unsigned int drm_debug;
-extern unsigned int drm_rnodes;
 extern unsigned int drm_vblank_offdelay;
 extern unsigned int drm_timestamp_precision;
 