
[v2] drm: enable render-nodes by default

Message ID 1395074636-32337-1-git-send-email-dh.herrmann@gmail.com (mailing list archive)
State New, archived

Commit Message

David Herrmann March 17, 2014, 4:43 p.m. UTC
We introduced render-nodes about half a year ago and no problems have shown up.
Remove the drm_rnodes module parameter and enable them by default now.

Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
v2:
 - rebased on drm-next

 drivers/gpu/drm/drm_stub.c | 7 +------
 include/drm/drmP.h         | 1 -
 2 files changed, 1 insertion(+), 7 deletions(-)

Comments

Thomas Hellström (VMware) March 20, 2014, 6:43 a.m. UTC | #1
Hi!

On 03/17/2014 05:43 PM, David Herrmann wrote:
> We introduced render-nodes about 1/2 year ago and no problems showed up.
> Remove the drm_rnodes argument and enable them by default now.

So what about the malicious execbuf command stream problem? Do we
require all drivers that enable render-nodes to have a mechanism in
place to prevent this?

/Thomas
David Herrmann March 20, 2014, 7:36 a.m. UTC | #2
Hi

On Thu, Mar 20, 2014 at 7:43 AM, Thomas Hellstrom <thomas@shipmail.org> wrote:
> On 03/17/2014 05:43 PM, David Herrmann wrote:
>> We introduced render-nodes about 1/2 year ago and no problems showed up.
>> Remove the drm_rnodes argument and enable them by default now.
>
> So what about the malicious execbuf command stream problem? Do we
> require all drivers that enable
> render-nodes to have a mechanism to prevent this in place?

No, that's not a requirement. Render-nodes provide a secure API; if the
underlying driver does no command-stream validation (I guess for
performance reasons and lack of a VM), that's an implementation detail,
not an API issue. Furthermore, you can always set stricter permissions
on the render-node char-dev if this bothers you.
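
For illustration, a minimal user-space sketch of what using the
render-node API looks like (assuming libdrm and that the first render
node shows up as /dev/dri/renderD128): the node is opened directly,
with no DRM-Master or drmAuthMagic() handshake involved, so access
control reduces to the file permissions on the char-dev:

    /* Minimal sketch: open a render node and query the driver name.
     * Assumes libdrm (link with -ldrm) and that the first render node
     * shows up as /dev/dri/renderD128. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <xf86drm.h>

    int main(void)
    {
            /* Render nodes never carry DRM-Master state, so there is no
             * drmSetMaster()/drmAuthMagic() handshake here; whoever may
             * open the char-dev may render. */
            int fd = open("/dev/dri/renderD128", O_RDWR | O_CLOEXEC);
            if (fd < 0) {
                    perror("open");
                    return 1;
            }

            drmVersionPtr ver = drmGetVersion(fd);
            if (ver) {
                    printf("render node driven by %s\n", ver->name);
                    drmFreeVersion(ver);
            }

            close(fd);
            return 0;
    }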

Cheers
David
Thomas Hellstrom March 20, 2014, 8:48 a.m. UTC | #3
On 03/20/2014 08:36 AM, David Herrmann wrote:
> Hi
>
> On Thu, Mar 20, 2014 at 7:43 AM, Thomas Hellstrom <thomas@shipmail.org> wrote:
>> On 03/17/2014 05:43 PM, David Herrmann wrote:
>>> We introduced render-nodes about 1/2 year ago and no problems showed up.
>>> Remove the drm_rnodes argument and enable them by default now.
>> So what about the malicious execbuf command stream problem? Do we
>> require all drivers that enable
>> render-nodes to have a mechanism to prevent this in place?
> No, that's no requirement. Render-nodes provide a secure API, if the
> underlying driver does no command-stream validation (I guess for
> performance-reasons and lack of VM), it's an implementation detail,
> not an API. Furthermore, you can always set higher restrictions on the
> render-node char-dev in case this bothers you.

I'm merely trying to envision the situation where a distro wants to
create, for example, a udev rule for the render nodes.

How should the distro know that the implementation is not insecure?

Historically, DRM has refused to upstream drivers without a proper
command validation mechanism in place (the via driver, for example),
but that validation mechanism only needed to make sure no random system
memory was ever accessible to an authenticated DRM client.

Now, render nodes are designed to also provide user-data isolation. But
if we allow insecure implementations, wouldn't that compromise the whole
idea?
Now that we have a secure API within reach, wouldn't it be reasonable to
require implementations to also be secure, following earlier DRM practices?

Or am I missing something?

/Thomas


>
> Cheers
> David
David Herrmann March 20, 2014, 9:05 a.m. UTC | #4
Hi Thomas

On Thu, Mar 20, 2014 at 9:48 AM, Thomas Hellstrom <thellstrom@vmware.com> wrote:
> I'm merely trying to envision the situation where a distro wants to
> create, for example an udev rule for the render nodes.
>
> How should the distro know that the implementation is not insecure?
>
> Historically drm has refused to upstream drivers without a proper
> command validation mechanism in place (via for example),
> but that validation mechanism only needed to make sure no random system
> memory was ever accessible to an authenticated DRM client.

I expect all drivers to protect system-memory. All that I am talking
about is one exec-buffer writing to memory of another _GPU_ buffer
that this client doesn't have access to. But maybe driver authors can
give some insights. I'd even expect non-texture/data GPU buffers to be
protected.

> Now, render nodes are designed to provide also user data isolation. But
> if we allow insecure implementations, wouldn't that compromise the whole
> idea?
> Now that we have a secure API within reach, wouldn't it be reasonable to
> require implementations to also be secure, following earlier DRM practices?

I don't understand what this has to do with render-nodes. The API _is_
secure. What would be the gain of disabling render-nodes for broken
(even deliberately so) drivers? User-space is not going to assume drivers
are broken and rely on hand-crafted exec-buffers that cross buffer
boundaries. So yes, we're fooling ourselves with API guarantees that
drivers cannot deliver, but what's the downside?

Thanks
David
Thomas Hellstrom March 20, 2014, 9:27 a.m. UTC | #5
On 03/20/2014 10:05 AM, David Herrmann wrote:
> Hi Thomas
>
> On Thu, Mar 20, 2014 at 9:48 AM, Thomas Hellstrom <thellstrom@vmware.com> wrote:
>> I'm merely trying to envision the situation where a distro wants to
>> create, for example an udev rule for the render nodes.
>>
>> How should the distro know that the implementation is not insecure?
>>
>> Historically drm has refused to upstream drivers without a proper
>> command validation mechanism in place (via for example),
>> but that validation mechanism only needed to make sure no random system
>> memory was ever accessible to an authenticated DRM client.
> I expect all drivers to protect system-memory. All that I am talking
> about is one exec-buffer writing to memory of another _GPU_ buffer
> that this client doesn't have access to. But maybe driver authors can
> give some insights. I'd even expect non-texture/data GPU buffers to be
> protected.
>
>> Now, render nodes are designed to provide also user data isolation. But
>> if we allow insecure implementations, wouldn't that compromise the whole
>> idea?
>> Now that we have a secure API within reach, wouldn't it be reasonable to
>> require implementations to also be secure, following earlier DRM practices?
> I don't understand what this has to do with render-nodes? The API _is_
> secure. What would be the gain of disabling render-nodes for broken
> (even deliberately) drivers? User-space is not going to assume drivers
> are broken and rely on hand-crafted exec-buffers that cross buffer
> boundaries. So yes, we're fooling our selves with API guarantees that
> drivers cannot deliver, but what's the downside?
>
> Thanks
> David

OK, let me give one example:

A user logs in to a system where DRI clients use render nodes. The
system grants rw permission on the render nodes for the console user. 
User starts editing a secret document, starts some GPGPU structural FEM
computations of  the Pentagon building. Locks the screen and goes for lunch.

A malicious user logs in using fast user switching and becomes the owner
of the render node. Tries to map a couple of random offsets, but that
fails, due to security checks. Now crafts a malicious command stream to
dump all GPU memory to a file. Steals the first user's secret document
and the intermediate Pentagon data. Logs out and starts data mining.

Now if we require drivers to block these malicious command streams this
can never happen, and distros can reliably grant rw access to the render
nodes to the user currently logged into the console.

I guess basically what I'm trying to say is that with the legacy concept
it was OK to access all GPU memory, because an authenticated X user
basically had the same permissions.

With render nodes we're allowing multiple users into the GPU at the same
time, and it's not OK anymore for a client to access another client's GPU
buffers through a malicious command stream.

/Thomas
David Herrmann March 20, 2014, 9:43 a.m. UTC | #6
Hi

On Thu, Mar 20, 2014 at 10:27 AM, Thomas Hellstrom
<thellstrom@vmware.com> wrote:
> A user logs in to a system where DRI clients use render nodes. The
> system grants rw permission on the render nodes for the console user.
> User starts editing a secret document, starts some GPGPU structural FEM
> computations of  the Pentagon building. Locks the screen and goes for lunch.
>
> A malicious user logs in using fast user switching and becomes the owner
> of the render node. Tries to map a couple of random offsets, but that
> fails, due to security checks. Now crafts a malicious command stream to
> dump all GPU memory to a file. Steals the first user's secret document
> and the intermediate Pentagon data. Logs out and starts data mining.
>
> Now if we require drivers to block these malicious command streams this
> can never happen, and distros can reliably grant rw access to the render
> nodes to the user currently logged into the console.
>
> I guest basically what I'm trying to say that with the legacy concept it
> was OK to access all GPU memory, because an authenticated X user
> basically had the same permissions.
>
> With render nodes we're allowing multiple users into the GPU at the same
> time, and it's not OK anymore for a client to access another clients GPU
> buffer through a malicious command stream.

Yes, I understand the attack scenario, but that's not related to
render-nodes at all. The exact same races exist on the legacy node:

1) If you can do fast-user switching, you can spawn your own X-server,
get authenticated on your own server and you are allowed into the GPU.
You cannot map other user's buffers because they're on a different
master-object, but you _can_ craft malicious GPU streams and access
the other user's buffer.

2) If you can do fast-user switching, switch to an empty VT, open the
legacy node and you automatically become DRM-Master because there is
no active master. Now you can do anything on the DRM node, including
crafting malicious GPU streams.

Given that the legacy node is always around and _always_ has these
races, why should we prevent render-nodes from appearing just because
the _driver_ is racy? I mean, there is no gain in that.. if it opens a
new race, as you assumed, then yes, we should avoid it. But at least
for all drivers supporting render-nodes so far, they either are
entirely safe or the just described races exist on both nodes.
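
To make race (2) above concrete, a minimal sketch (assuming libdrm and
the usual /dev/dri/card0 path; on kernels of this era the first open
with no active master is granted master implicitly, drmSetMaster() just
makes it explicit):

    /* Sketch of race (2): claim DRM-Master on an otherwise idle legacy
     * node. Assumes libdrm and that the legacy node is /dev/dri/card0. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <xf86drm.h>

    int main(void)
    {
            int fd = open("/dev/dri/card0", O_RDWR | O_CLOEXEC);
            if (fd < 0) {
                    perror("open");
                    return 1;
            }

            /* Succeeds whenever no other master is currently active,
             * e.g. after switching to an empty VT. From here on the
             * process can submit arbitrary command streams. */
            if (drmSetMaster(fd) == 0)
                    printf("became DRM-Master on card0\n");
            else
                    perror("drmSetMaster");

            close(fd);
            return 0;
    }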

Thanks
David
Thomas Hellstrom March 20, 2014, 10:28 a.m. UTC | #7
On 03/20/2014 10:43 AM, David Herrmann wrote:
> Hi
>
> On Thu, Mar 20, 2014 at 10:27 AM, Thomas Hellstrom
> <thellstrom@vmware.com> wrote:
>> A user logs in to a system where DRI clients use render nodes. The
>> system grants rw permission on the render nodes for the console user.
>> User starts editing a secret document, starts some GPGPU structural FEM
>> computations of  the Pentagon building. Locks the screen and goes for lunch.
>>
>> A malicious user logs in using fast user switching and becomes the owner
>> of the render node. Tries to map a couple of random offsets, but that
>> fails, due to security checks. Now crafts a malicious command stream to
>> dump all GPU memory to a file. Steals the first user's secret document
>> and the intermediate Pentagon data. Logs out and starts data mining.
>>
>> Now if we require drivers to block these malicious command streams this
>> can never happen, and distros can reliably grant rw access to the render
>> nodes to the user currently logged into the console.
>>
>> I guest basically what I'm trying to say that with the legacy concept it
>> was OK to access all GPU memory, because an authenticated X user
>> basically had the same permissions.
>>
>> With render nodes we're allowing multiple users into the GPU at the same
>> time, and it's not OK anymore for a client to access another clients GPU
>> buffer through a malicious command stream.
> Yes, I understand the attack scenario, but that's not related to
> render-nodes at all. The exact same races exist on the legacy node:

I was under the impression that render nodes were designed to fix these
issues?

>
> 1) If you can do fast-user switching, you can spawn your own X-server,
> get authenticated on your own server and you are allowed into the GPU.
> You cannot map other user's buffers because they're on a different
> master-object, but you _can_ craft malicious GPU streams and access
> the other user's buffer.

But with legacy nodes, drivers can (and should, IMO) throw out all data
from GPU memory on master drop, and then block clients authenticated
against the dropped master from the GPU until their master becomes
active again or dies (in which case they are killed), in line with a
previous discussion we had. Now you can't do this with render nodes,
so yes, they do open up a new race that requires command stream
validation.
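
A rough sketch of that idea, assuming a TTM-based driver; the hook
signature matches this kernel version, but foo_device and the
masters_blocked flag are invented driver-private details, not code from
any in-tree driver:

    /* Rough sketch only, not taken from any in-tree driver: evict VRAM
     * when the master goes away so dropped-master clients cannot read
     * stale data; foo_device and masters_blocked are invented. */
    #include <drm/drmP.h>
    #include <drm/ttm/ttm_bo_api.h>
    #include <drm/ttm/ttm_bo_driver.h>
    #include <drm/ttm/ttm_placement.h>

    struct foo_device {
            struct drm_device *ddev;
            struct ttm_bo_device bdev;
            bool masters_blocked;           /* checked in the execbuf ioctl */
    };

    static void foo_master_drop(struct drm_device *dev,
                                struct drm_file *file_priv,
                                bool from_release)
    {
            struct foo_device *fdev = dev->dev_private;

            /* Block further submissions from clients that were
             * authenticated against the dropped master ... */
            fdev->masters_blocked = true;

            /* ... and throw out everything resident in VRAM. */
            (void)ttm_bo_evict_mm(&fdev->bdev, TTM_PL_VRAM);
    }

    static struct drm_driver foo_driver = {
            /* other hooks elided ... */
            .master_drop = foo_master_drop,
    };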

>
> 2) If you can do fast-user switching, switch to an empty VT, open the
> legacy node and you automatically become DRM-Master because there is
> no active master. Now you can do anything on the DRM node, including
> crafting malicious GPU streams.

I believe the above solution should work for this case as well.

>
> Given that the legacy node is always around and _always_ has these
> races, why should we prevent render-nodes from appearing just because
> the _driver_ is racy? I mean, there is no gain in that.. if it opens a
> new race, as you assumed, then yes, we should avoid it. But at least
> for all drivers supporting render-nodes so far, they either are
> entirely safe or the just described races exist on both nodes.

My suggestion is actually not to prevent render nodes from appearing,
but rather that we should restrict them to drivers with command stream
verification and / or per process virtual memory, and I also think we
should plug the above races on legacy nodes. That way legacy nodes would
use the old "master owns it all" model, while render nodes could allow
multiple users at the same time.
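
Purely as an illustration of that suggestion (DRIVER_RENDER_SECURE is an
invented feature bit, it does not exist), the drm_dev_alloc() hunk
touched by this patch could gate render-node creation on such a
capability:

    /* Hypothetical sketch: DRIVER_RENDER_SECURE is an invented flag meaning
     * "command-stream verification and/or per-process VM in place"; only
     * then would the render minor be allocated in drm_dev_alloc(). */
    if (drm_core_check_feature(dev, DRIVER_RENDER) &&
        drm_core_check_feature(dev, DRIVER_RENDER_SECURE)) {
            ret = drm_minor_alloc(dev, DRM_MINOR_RENDER);
            if (ret)
                    goto err_minors;
    }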


/Thomas
Jerome Glisse March 20, 2014, 2:36 p.m. UTC | #8
On Thu, Mar 20, 2014 at 11:28:18AM +0100, Thomas Hellstrom wrote:
> On 03/20/2014 10:43 AM, David Herrmann wrote:
> > Hi
> >
> > On Thu, Mar 20, 2014 at 10:27 AM, Thomas Hellstrom
> > <thellstrom@vmware.com> wrote:
> >> A user logs in to a system where DRI clients use render nodes. The
> >> system grants rw permission on the render nodes for the console user.
> >> User starts editing a secret document, starts some GPGPU structural FEM
> >> computations of  the Pentagon building. Locks the screen and goes for lunch.
> >>
> >> A malicious user logs in using fast user switching and becomes the owner
> >> of the render node. Tries to map a couple of random offsets, but that
> >> fails, due to security checks. Now crafts a malicious command stream to
> >> dump all GPU memory to a file. Steals the first user's secret document
> >> and the intermediate Pentagon data. Logs out and starts data mining.
> >>
> >> Now if we require drivers to block these malicious command streams this
> >> can never happen, and distros can reliably grant rw access to the render
> >> nodes to the user currently logged into the console.
> >>
> >> I guest basically what I'm trying to say that with the legacy concept it
> >> was OK to access all GPU memory, because an authenticated X user
> >> basically had the same permissions.
> >>
> >> With render nodes we're allowing multiple users into the GPU at the same
> >> time, and it's not OK anymore for a client to access another clients GPU
> >> buffer through a malicious command stream.
> > Yes, I understand the attack scenario, but that's not related to
> > render-nodes at all. The exact same races exist on the legacy node:
> 
> I was under the impression that render nodes were designed to fix these
> issues?
> 
> >
> > 1) If you can do fast-user switching, you can spawn your own X-server,
> > get authenticated on your own server and you are allowed into the GPU.
> > You cannot map other user's buffers because they're on a different
> > master-object, but you _can_ craft malicious GPU streams and access
> > the other user's buffer.
> 
> But with legacy nodes, drivers can (and should IMO) throw out all data
> from GPU memory on master drop,
> and then block dropped master authenticated clients from GPU, until
> their master becomes active again or dies (in which case they are
> killed). In line with a previous discussion we had. Now you can't do
> this with render nodes, so yes they do open up
> a new race that requires command stream validation.
> 
> >
> > 2) If you can do fast-user switching, switch to an empty VT, open the
> > legacy node and you automatically become DRM-Master because there is
> > no active master. Now you can do anything on the DRM node, including
> > crafting malicious GPU streams.
> 
> I believe the above solution should work for this case as well.
> 
> >
> > Given that the legacy node is always around and _always_ has these
> > races, why should we prevent render-nodes from appearing just because
> > the _driver_ is racy? I mean, there is no gain in that.. if it opens a
> > new race, as you assumed, then yes, we should avoid it. But at least
> > for all drivers supporting render-nodes so far, they either are
> > entirely safe or the just described races exist on both nodes.
> 
> My suggestion is actually not to prevent render nodes from appearing,
> but rather that we should restrict them to drivers with command stream
> verification and / or per process virtual memory, and I also think we
> should plug the above races on legacy nodes. That way legacy nodes would
> use the old "master owns it all" model, while render nodes could allow
> multiple users at the same time.

FYI, neither radeon nor nouveau (last time I checked) can be abused to
access random VRAM or buffers bound by another process (except if the
buffer is shared, of course).

So there is no way (modulo any bug in command stream checking or hardware)
to abuse a render node to access any memory that you do not have the right
to access.

I am pretty sure this holds for Intel too; I do not know whether other DRM
drivers give this kind of assurance.

In any case this is not a render node issue, and there is no reason to force
full VRAM eviction or anything like that.

Cheers,
Jérôme
Ilia Mirkin March 20, 2014, 2:44 p.m. UTC | #9
On Thu, Mar 20, 2014 at 10:36 AM, Jerome Glisse <j.glisse@gmail.com> wrote:
> On Thu, Mar 20, 2014 at 11:28:18AM +0100, Thomas Hellstrom wrote:
>> On 03/20/2014 10:43 AM, David Herrmann wrote:
>> > On Thu, Mar 20, 2014 at 10:27 AM, Thomas Hellstrom
>> > <thellstrom@vmware.com> wrote:
>> > Given that the legacy node is always around and _always_ has these
>> > races, why should we prevent render-nodes from appearing just because
>> > the _driver_ is racy? I mean, there is no gain in that.. if it opens a
>> > new race, as you assumed, then yes, we should avoid it. But at least
>> > for all drivers supporting render-nodes so far, they either are
>> > entirely safe or the just described races exist on both nodes.
>>
>> My suggestion is actually not to prevent render nodes from appearing,
>> but rather that we should restrict them to drivers with command stream
>> verification and / or per process virtual memory, and I also think we
>> should plug the above races on legacy nodes. That way legacy nodes would
>> use the old "master owns it all" model, while render nodes could allow
>> multiple users at the same time.
>
> FYI both radeon and nouveau (last time i check) can not be abuse to access
> random VRAM or buffer bind by another process (except if the buffer is share
> of course).

I'm not aware of any restrictions in nouveau that would prevent you
from accessing any vram... there are a number of copy engines
accessible that would allow you to copy one region to another, and I'm
not aware of code to validate pushbuf contents (it would have to parse
the pushbuf and know which methods store source/destination
addresses). You could probably get creative and use that to overwrite
the MMU tables, to then gain the ability to read/write any part of
system memory... but perhaps the engines won't allow you to do that,
not sure. (Or perhaps the PDE isn't mapped into VRAM. Again, not
sure.)

  -ilia
Thomas Hellstrom March 20, 2014, 2:59 p.m. UTC | #10
On 03/20/2014 03:36 PM, Jerome Glisse wrote:
>
> In any case this is not a render node issue and there is no reasons to force
> full VRAM eviction or anything like that.

This comment suggests you haven't read the discussion fully.

VRAM eviction was discussed in the context of legacy nodes.

This is a render node issue because with legacy nodes you can work
around insufficient command checking.
With render nodes you can't.

/Thomas

> Cheers,
> Jérôme
Jerome Glisse March 20, 2014, 3:34 p.m. UTC | #11
On Thu, Mar 20, 2014 at 03:59:17PM +0100, Thomas Hellstrom wrote:
> On 03/20/2014 03:36 PM, Jerome Glisse wrote:
> >
> > In any case this is not a render node issue and there is no reasons to force
> > full VRAM eviction or anything like that.
> 
> This comment suggests you haven't read the discussion fully.
> 
> VRAM eviction was discussed in the context of legacy nodes.
> 
> This is a render node issue because with legacy nodes you can work
> around insufficient command checking.
> With render nodes you can't.

On radeon you cannot abuse the GPU, period, legacy node or not. My comment
was about the fact that this is a driver issue and not a render node issue.
I would consider a driver that allows abusing the GPU blocks to access
arbitrary memory as broken, no matter whether we are talking about the
legacy node or the new render node.

Cheers,
Jérôme


> 
> /Thomas
> 
> > Cheers,
> > Jérôme
Jerome Glisse March 20, 2014, 3:35 p.m. UTC | #12
On Thu, Mar 20, 2014 at 10:44:10AM -0400, Ilia Mirkin wrote:
> On Thu, Mar 20, 2014 at 10:36 AM, Jerome Glisse <j.glisse@gmail.com> wrote:
> > On Thu, Mar 20, 2014 at 11:28:18AM +0100, Thomas Hellstrom wrote:
> >> On 03/20/2014 10:43 AM, David Herrmann wrote:
> >> > On Thu, Mar 20, 2014 at 10:27 AM, Thomas Hellstrom
> >> > <thellstrom@vmware.com> wrote:
> >> > Given that the legacy node is always around and _always_ has these
> >> > races, why should we prevent render-nodes from appearing just because
> >> > the _driver_ is racy? I mean, there is no gain in that.. if it opens a
> >> > new race, as you assumed, then yes, we should avoid it. But at least
> >> > for all drivers supporting render-nodes so far, they either are
> >> > entirely safe or the just described races exist on both nodes.
> >>
> >> My suggestion is actually not to prevent render nodes from appearing,
> >> but rather that we should restrict them to drivers with command stream
> >> verification and / or per process virtual memory, and I also think we
> >> should plug the above races on legacy nodes. That way legacy nodes would
> >> use the old "master owns it all" model, while render nodes could allow
> >> multiple users at the same time.
> >
> > FYI both radeon and nouveau (last time i check) can not be abuse to access
> > random VRAM or buffer bind by another process (except if the buffer is share
> > of course).
> 
> I'm not aware of any restrictions in nouveau that would prevent you
> from accessing any vram... there are a number of copy engines
> accessible that would allow you to copy one region to another, and I'm
> not aware of code to validate pushbuf contents (it would have to parse
> the pushbuf and know which methods store source/destination
> addresses). You could probably get creative and use that to overwrite
> the MMU tables, to then gain the ability to read/write any part of
> system memory... but perhaps the engines won't allow you to do that,
> not sure. (Or perhaps the PDE isn't mapped into VRAM. Again, not
> sure.)
> 
>   -ilia

Well, I obviously have not looked at that in a long time, but if the nouveau
API can be abused then I consider it broken by design. I know command
checking is painful and CPU-intensive, but we have done it in radeon for
a reason.

Cheers,
Jérôme
Thomas Hellström (VMware) March 20, 2014, 3:49 p.m. UTC | #13
On 03/20/2014 04:34 PM, Jerome Glisse wrote:
> On Thu, Mar 20, 2014 at 03:59:17PM +0100, Thomas Hellstrom wrote:
>> On 03/20/2014 03:36 PM, Jerome Glisse wrote:
>>> In any case this is not a render node issue and there is no reasons to force
>>> full VRAM eviction or anything like that.
>> This comment suggests you haven't read the discussion fully.
>>
>> VRAM eviction was discussed in the context of legacy nodes.
>>
>> This is a render node issue because with legacy nodes you can work
>> around insufficient command checking.
>> With render nodes you can't.
> On radeon you can not abuse the GPU period legacy node or not. My comment
> was about the fact that this is a driver issue and not a render node issue.
> I would consider driver that allow to abuse the GPU block to access any
> memory as broken no matter if we are talking about legacy or new render
> node.
>
> Cheers,
> Jérôme
>

Great. Then I assume you don't have an issue with not enabling
render-nodes for those broken drivers (or at least with a sysfs property
or something similar flagging those device nodes as broken)?

Thanks,
Thomas
Jerome Glisse March 20, 2014, 5:04 p.m. UTC | #14
On Thu, Mar 20, 2014 at 04:49:42PM +0100, Thomas Hellstrom wrote:
> On 03/20/2014 04:34 PM, Jerome Glisse wrote:
> > On Thu, Mar 20, 2014 at 03:59:17PM +0100, Thomas Hellstrom wrote:
> >> On 03/20/2014 03:36 PM, Jerome Glisse wrote:
> >>> In any case this is not a render node issue and there is no reasons to force
> >>> full VRAM eviction or anything like that.
> >> This comment suggests you haven't read the discussion fully.
> >>
> >> VRAM eviction was discussed in the context of legacy nodes.
> >>
> >> This is a render node issue because with legacy nodes you can work
> >> around insufficient command checking.
> >> With render nodes you can't.
> > On radeon you can not abuse the GPU period legacy node or not. My comment
> > was about the fact that this is a driver issue and not a render node issue.
> > I would consider driver that allow to abuse the GPU block to access any
> > memory as broken no matter if we are talking about legacy or new render
> > node.
> >
> > Cheers,
> > Jérôme
> >
> 
> Great. Then I assume you don't have an issue with not enabling
> render-nodes for those broken drivers,
> (or at least a sysfs property or something similar flagging those device
> nodes as broken)?
> 
> Thanks,
> Thomas
> 
> 

Yes, broken drivers should not have render nodes, at least in my view.

Cheers,
Jérôme
Rob Clark March 20, 2014, 5:34 p.m. UTC | #15
On Thu, Mar 20, 2014 at 6:28 AM, Thomas Hellstrom <thellstrom@vmware.com> wrote:
> On 03/20/2014 10:43 AM, David Herrmann wrote:
>> Hi
>>
>> On Thu, Mar 20, 2014 at 10:27 AM, Thomas Hellstrom
>> <thellstrom@vmware.com> wrote:
>>> A user logs in to a system where DRI clients use render nodes. The
>>> system grants rw permission on the render nodes for the console user.
>>> User starts editing a secret document, starts some GPGPU structural FEM
>>> computations of  the Pentagon building. Locks the screen and goes for lunch.
>>>
>>> A malicious user logs in using fast user switching and becomes the owner
>>> of the render node. Tries to map a couple of random offsets, but that
>>> fails, due to security checks. Now crafts a malicious command stream to
>>> dump all GPU memory to a file. Steals the first user's secret document
>>> and the intermediate Pentagon data. Logs out and starts data mining.
>>>
>>> Now if we require drivers to block these malicious command streams this
>>> can never happen, and distros can reliably grant rw access to the render
>>> nodes to the user currently logged into the console.
>>>
>>> I guest basically what I'm trying to say that with the legacy concept it
>>> was OK to access all GPU memory, because an authenticated X user
>>> basically had the same permissions.
>>>
>>> With render nodes we're allowing multiple users into the GPU at the same
>>> time, and it's not OK anymore for a client to access another clients GPU
>>> buffer through a malicious command stream.
>> Yes, I understand the attack scenario, but that's not related to
>> render-nodes at all. The exact same races exist on the legacy node:
>
> I was under the impression that render nodes were designed to fix these
> issues?
>
>>
>> 1) If you can do fast-user switching, you can spawn your own X-server,
>> get authenticated on your own server and you are allowed into the GPU.
>> You cannot map other user's buffers because they're on a different
>> master-object, but you _can_ craft malicious GPU streams and access
>> the other user's buffer.
>
> But with legacy nodes, drivers can (and should IMO) throw out all data
> from GPU memory on master drop,
> and then block dropped master authenticated clients from GPU, until
> their master becomes active again or dies (in which case they are
> killed). In line with a previous discussion we had. Now you can't do
> this with render nodes, so yes they do open up
> a new race that requires command stream validation.
>
>>
>> 2) If you can do fast-user switching, switch to an empty VT, open the
>> legacy node and you automatically become DRM-Master because there is
>> no active master. Now you can do anything on the DRM node, including
>> crafting malicious GPU streams.
>
> I believe the above solution should work for this case as well.
>
>>
>> Given that the legacy node is always around and _always_ has these
>> races, why should we prevent render-nodes from appearing just because
>> the _driver_ is racy? I mean, there is no gain in that.. if it opens a
>> new race, as you assumed, then yes, we should avoid it. But at least
>> for all drivers supporting render-nodes so far, they either are
>> entirely safe or the just described races exist on both nodes.
>
> My suggestion is actually not to prevent render nodes from appearing,
> but rather that we should restrict them to drivers with command stream
> verification and / or per process virtual memory, and I also think we
> should plug the above races on legacy nodes. That way legacy nodes would
> use the old "master owns it all" model, while render nodes could allow
> multiple users at the same time.
>

hmm, if you only have global gpu virtual memory (rather than
per-process), it would still be kinda nice to support render nodes if
the app had some way to figure out whether or not its gpu buffers were
secure.  Ie. an app that was using the gpu for something secure could
simply choose not to if the hw/driver could not guarantee that another
process using the gpu could not get access to the buffers.

BR,
-R

>
> /Thomas
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel
Ilia Mirkin March 20, 2014, 5:39 p.m. UTC | #16
On Thu, Mar 20, 2014 at 10:44 AM, Ilia Mirkin <imirkin@alum.mit.edu> wrote:
> On Thu, Mar 20, 2014 at 10:36 AM, Jerome Glisse <j.glisse@gmail.com> wrote:
>> On Thu, Mar 20, 2014 at 11:28:18AM +0100, Thomas Hellstrom wrote:
>>> On 03/20/2014 10:43 AM, David Herrmann wrote:
>>> > On Thu, Mar 20, 2014 at 10:27 AM, Thomas Hellstrom
>>> > <thellstrom@vmware.com> wrote:
>>> > Given that the legacy node is always around and _always_ has these
>>> > races, why should we prevent render-nodes from appearing just because
>>> > the _driver_ is racy? I mean, there is no gain in that.. if it opens a
>>> > new race, as you assumed, then yes, we should avoid it. But at least
>>> > for all drivers supporting render-nodes so far, they either are
>>> > entirely safe or the just described races exist on both nodes.
>>>
>>> My suggestion is actually not to prevent render nodes from appearing,
>>> but rather that we should restrict them to drivers with command stream
>>> verification and / or per process virtual memory, and I also think we
>>> should plug the above races on legacy nodes. That way legacy nodes would
>>> use the old "master owns it all" model, while render nodes could allow
>>> multiple users at the same time.
>>
>> FYI both radeon and nouveau (last time i check) can not be abuse to access
>> random VRAM or buffer bind by another process (except if the buffer is share
>> of course).
>
> I'm not aware of any restrictions in nouveau that would prevent you
> from accessing any vram... there are a number of copy engines
> accessible that would allow you to copy one region to another, and I'm
> not aware of code to validate pushbuf contents (it would have to parse
> the pushbuf and know which methods store source/destination
> addresses). You could probably get creative and use that to overwrite
> the MMU tables, to then gain the ability to read/write any part of
> system memory... but perhaps the engines won't allow you to do that,
> not sure. (Or perhaps the PDE isn't mapped into VRAM. Again, not
> sure.)

Please ignore this totally wrong comment. While there is indeed no
pushbuf validation, every process (well, every open() of the drm node)
gets its own virtual memory space for all GPUs with MMUs (nv50+).

  -ilia
Thomas Hellstrom March 20, 2014, 8:54 p.m. UTC | #17
On 03/20/2014 06:34 PM, Rob Clark wrote:
> On Thu, Mar 20, 2014 at 6:28 AM, Thomas Hellstrom <thellstrom@vmware.com> wrote:
>> On 03/20/2014 10:43 AM, David Herrmann wrote:
>>> Hi
>>>
>>> On Thu, Mar 20, 2014 at 10:27 AM, Thomas Hellstrom
>>> <thellstrom@vmware.com> wrote:
>>>> A user logs in to a system where DRI clients use render nodes. The
>>>> system grants rw permission on the render nodes for the console user.
>>>> User starts editing a secret document, starts some GPGPU structural FEM
>>>> computations of  the Pentagon building. Locks the screen and goes for lunch.
>>>>
>>>> A malicious user logs in using fast user switching and becomes the owner
>>>> of the render node. Tries to map a couple of random offsets, but that
>>>> fails, due to security checks. Now crafts a malicious command stream to
>>>> dump all GPU memory to a file. Steals the first user's secret document
>>>> and the intermediate Pentagon data. Logs out and starts data mining.
>>>>
>>>> Now if we require drivers to block these malicious command streams this
>>>> can never happen, and distros can reliably grant rw access to the render
>>>> nodes to the user currently logged into the console.
>>>>
>>>> I guest basically what I'm trying to say that with the legacy concept it
>>>> was OK to access all GPU memory, because an authenticated X user
>>>> basically had the same permissions.
>>>>
>>>> With render nodes we're allowing multiple users into the GPU at the same
>>>> time, and it's not OK anymore for a client to access another clients GPU
>>>> buffer through a malicious command stream.
>>> Yes, I understand the attack scenario, but that's not related to
>>> render-nodes at all. The exact same races exist on the legacy node:
>> I was under the impression that render nodes were designed to fix these
>> issues?
>>
>>> 1) If you can do fast-user switching, you can spawn your own X-server,
>>> get authenticated on your own server and you are allowed into the GPU.
>>> You cannot map other user's buffers because they're on a different
>>> master-object, but you _can_ craft malicious GPU streams and access
>>> the other user's buffer.
>> But with legacy nodes, drivers can (and should IMO) throw out all data
>> from GPU memory on master drop,
>> and then block dropped master authenticated clients from GPU, until
>> their master becomes active again or dies (in which case they are
>> killed). In line with a previous discussion we had. Now you can't do
>> this with render nodes, so yes they do open up
>> a new race that requires command stream validation.
>>
>>> 2) If you can do fast-user switching, switch to an empty VT, open the
>>> legacy node and you automatically become DRM-Master because there is
>>> no active master. Now you can do anything on the DRM node, including
>>> crafting malicious GPU streams.
>> I believe the above solution should work for this case as well.
>>
>>> Given that the legacy node is always around and _always_ has these
>>> races, why should we prevent render-nodes from appearing just because
>>> the _driver_ is racy? I mean, there is no gain in that.. if it opens a
>>> new race, as you assumed, then yes, we should avoid it. But at least
>>> for all drivers supporting render-nodes so far, they either are
>>> entirely safe or the just described races exist on both nodes.
>> My suggestion is actually not to prevent render nodes from appearing,
>> but rather that we should restrict them to drivers with command stream
>> verification and / or per process virtual memory, and I also think we
>> should plug the above races on legacy nodes. That way legacy nodes would
>> use the old "master owns it all" model, while render nodes could allow
>> multiple users at the same time.
>>
> hmm, if you only have global gpu virtual memory (rather than
> per-process), it would still be kinda nice to support render nodes if
> app had some way to figure out whether or not it's gpu buffers were
> secure.

If there is only global GPU memory there's of course also the option of
providing a command stream verifier.

> Ie. an app that was using the gpu for something secure could
> simply choose not to if the hw/driver could not guarantee that another
> process using the gpu could not get access to the buffers..

IMO that should work fine, but we need to provide a way for user-space
to determine whether the render node is secure or not. Perhaps a sysfs
attribute and/or a drm getparam() parameter?
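
For example, a getparam/get_cap style query from user-space could look
like the sketch below; DRM_CAP_RENDER_SECURE is purely hypothetical and
not part of the current UAPI (libdrm assumed):

    /* Hypothetical sketch only: DRM_CAP_RENDER_SECURE is an invented
     * capability, not part of the current UAPI. Assumes libdrm. */
    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <xf86drm.h>

    #define DRM_CAP_RENDER_SECURE 0x100     /* invented value */

    int main(void)
    {
            uint64_t secure = 0;
            int fd = open("/dev/dri/renderD128", O_RDWR | O_CLOEXEC);

            if (fd < 0) {
                    perror("open");
                    return 1;
            }

            /* drmGetCap() wraps DRM_IOCTL_GET_CAP; a security-conscious
             * app could refuse to use the GPU when the driver cannot
             * isolate its buffers from other clients. */
            if (drmGetCap(fd, DRM_CAP_RENDER_SECURE, &secure) == 0 && secure)
                    printf("render node isolates GPU buffers\n");
            else
                    printf("no isolation guarantee, not using the GPU\n");

            close(fd);
            return 0;
    }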

/Thomas


>
> BR,
> -R
>
>> /Thomas
>> _______________________________________________
>> dri-devel mailing list
>> dri-devel@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/dri-devel
Rob Clark March 20, 2014, 9:13 p.m. UTC | #18
On Thu, Mar 20, 2014 at 4:54 PM, Thomas Hellstrom <thellstrom@vmware.com> wrote:
> On 03/20/2014 06:34 PM, Rob Clark wrote:
>> On Thu, Mar 20, 2014 at 6:28 AM, Thomas Hellstrom <thellstrom@vmware.com> wrote:
>>> On 03/20/2014 10:43 AM, David Herrmann wrote:
>>>> Hi
>>>>
>>>> On Thu, Mar 20, 2014 at 10:27 AM, Thomas Hellstrom
>>>> <thellstrom@vmware.com> wrote:
>>>>> A user logs in to a system where DRI clients use render nodes. The
>>>>> system grants rw permission on the render nodes for the console user.
>>>>> User starts editing a secret document, starts some GPGPU structural FEM
>>>>> computations of  the Pentagon building. Locks the screen and goes for lunch.
>>>>>
>>>>> A malicious user logs in using fast user switching and becomes the owner
>>>>> of the render node. Tries to map a couple of random offsets, but that
>>>>> fails, due to security checks. Now crafts a malicious command stream to
>>>>> dump all GPU memory to a file. Steals the first user's secret document
>>>>> and the intermediate Pentagon data. Logs out and starts data mining.
>>>>>
>>>>> Now if we require drivers to block these malicious command streams this
>>>>> can never happen, and distros can reliably grant rw access to the render
>>>>> nodes to the user currently logged into the console.
>>>>>
>>>>> I guest basically what I'm trying to say that with the legacy concept it
>>>>> was OK to access all GPU memory, because an authenticated X user
>>>>> basically had the same permissions.
>>>>>
>>>>> With render nodes we're allowing multiple users into the GPU at the same
>>>>> time, and it's not OK anymore for a client to access another clients GPU
>>>>> buffer through a malicious command stream.
>>>> Yes, I understand the attack scenario, but that's not related to
>>>> render-nodes at all. The exact same races exist on the legacy node:
>>> I was under the impression that render nodes were designed to fix these
>>> issues?
>>>
>>>> 1) If you can do fast-user switching, you can spawn your own X-server,
>>>> get authenticated on your own server and you are allowed into the GPU.
>>>> You cannot map other user's buffers because they're on a different
>>>> master-object, but you _can_ craft malicious GPU streams and access
>>>> the other user's buffer.
>>> But with legacy nodes, drivers can (and should IMO) throw out all data
>>> from GPU memory on master drop,
>>> and then block dropped master authenticated clients from GPU, until
>>> their master becomes active again or dies (in which case they are
>>> killed). In line with a previous discussion we had. Now you can't do
>>> this with render nodes, so yes they do open up
>>> a new race that requires command stream validation.
>>>
>>>> 2) If you can do fast-user switching, switch to an empty VT, open the
>>>> legacy node and you automatically become DRM-Master because there is
>>>> no active master. Now you can do anything on the DRM node, including
>>>> crafting malicious GPU streams.
>>> I believe the above solution should work for this case as well.
>>>
>>>> Given that the legacy node is always around and _always_ has these
>>>> races, why should we prevent render-nodes from appearing just because
>>>> the _driver_ is racy? I mean, there is no gain in that.. if it opens a
>>>> new race, as you assumed, then yes, we should avoid it. But at least
>>>> for all drivers supporting render-nodes so far, they either are
>>>> entirely safe or the just described races exist on both nodes.
>>> My suggestion is actually not to prevent render nodes from appearing,
>>> but rather that we should restrict them to drivers with command stream
>>> verification and / or per process virtual memory, and I also think we
>>> should plug the above races on legacy nodes. That way legacy nodes would
>>> use the old "master owns it all" model, while render nodes could allow
>>> multiple users at the same time.
>>>
>> hmm, if you only have global gpu virtual memory (rather than
>> per-process), it would still be kinda nice to support render nodes if
>> app had some way to figure out whether or not it's gpu buffers were
>> secure.
>
> If there is only global GPU memory there's of course also the option of
> providing a
> command stream verifier.

well, that wouldn't really help separate buffers from other contexts
(since a3xx and later have various load/store instructions)..

At the moment, I have two cases:

1) MMU... the gpu has no direct access to system memory, so other than
access to other contexts' buffers, the system is secure
2) no-MMU... vram carveout, and set some registers to limit access to
addresses within that range

In case #1 we could implement per-context vm.. just a matter of
writing some code.  Doing it the naive way requires draining the
command queue on context switches and getting the CPU involved, which
isn't so terribly awesome.

The downstream android driver does context switches on the CP itself
(ie. bang some registers to point the MMU at a new set of tables and
TLB flush)..  but it needs to be confirmed that this can be done
securely (ie. restricted to the ringbuffer controlled by the kernel and
not an IB buffer from userspace).  If it isn't restricted to the kernel
ringbuffer, then that sort of opens up an even bigger hole than it
closes ;-)

>> Ie. an app that was using the gpu for something secure could
>> simply choose not to if the hw/driver could not guarantee that another
>> process using the gpu could not get access to the buffers..
>
> IMO that should work fine, but we need to provide a way for user-space
> to determine whether
> the render node is secure or not. Perhaps a sysfs attribute and / or a
> drm getparam() parameter?

I'd *assume* that that sort of thing would just be some sort of CL extension?

But no objection to exposing it in a more common way.

I guess it is also an option to keep the bootarg to override default
(with the default being 'enabled' for hw w/ per-context/process vm and
'disabled' otherwise).

BR,
-R

> /Thomas
>
>
>>
>> BR,
>> -R
>>
>>> /Thomas
>>> _______________________________________________
>>> dri-devel mailing list
>>> dri-devel@lists.freedesktop.org
>>> http://lists.freedesktop.org/mailman/listinfo/dri-devel
Daniel Vetter March 21, 2014, 7:10 a.m. UTC | #19
On Thu, Mar 20, 2014 at 10:13 PM, Rob Clark <robdclark@gmail.com> wrote:
>>> Ie. an app that was using the gpu for something secure could
>>> simply choose not to if the hw/driver could not guarantee that another
>>> process using the gpu could not get access to the buffers..
>>
>> IMO that should work fine, but we need to provide a way for user-space
>> to determine whether
>> the render node is secure or not. Perhaps a sysfs attribute and / or a
>> drm getparam() parameter?
>
> I'd *assume* that that sort of thing would just be some sort of CL extension?
>
> But no objection to exposing it in a more common way.
>
> I guess it is also an option to keep the bootarg to override default
> (with the default being 'enabled' for hw w/ per-context/process vm and
> 'disabled' otherwise).

IMO, if we expose this through sysfs we should always enable
render nodes. The udev scripts can then decide what permissions to set
on them.
-Daniel
Thomas Hellstrom March 21, 2014, 8:29 a.m. UTC | #20
On 03/21/2014 08:10 AM, Daniel Vetter wrote:
> On Thu, Mar 20, 2014 at 10:13 PM, Rob Clark <robdclark@gmail.com> wrote:
>>>> Ie. an app that was using the gpu for something secure could
>>>> simply choose not to if the hw/driver could not guarantee that another
>>>> process using the gpu could not get access to the buffers..
>>> IMO that should work fine, but we need to provide a way for user-space
>>> to determine whether
>>> the render node is secure or not. Perhaps a sysfs attribute and / or a
>>> drm getparam() parameter?
>> I'd *assume* that that sort of thing would just be some sort of CL extension?
>>
>> But no objection to exposing it in a more common way.
>>
>> I guess it is also an option to keep the bootarg to override default
>> (with the default being 'enabled' for hw w/ per-context/process vm and
>> 'disabled' otherwise).
> Imo if we expose this through sysfs we should always enable
> rendernodes. The udev scripts can then decide what permissions to set
> on them.

Agreed.
Thomas


> -Daniel

Patch

diff --git a/drivers/gpu/drm/drm_stub.c b/drivers/gpu/drm/drm_stub.c
index c23eaf6..b6cf0b3 100644
--- a/drivers/gpu/drm/drm_stub.c
+++ b/drivers/gpu/drm/drm_stub.c
@@ -40,9 +40,6 @@ 
 unsigned int drm_debug = 0;	/* 1 to enable debug output */
 EXPORT_SYMBOL(drm_debug);
 
-unsigned int drm_rnodes = 0;	/* 1 to enable experimental render nodes API */
-EXPORT_SYMBOL(drm_rnodes);
-
 unsigned int drm_vblank_offdelay = 5000;    /* Default to 5000 msecs. */
 EXPORT_SYMBOL(drm_vblank_offdelay);
 
@@ -59,13 +56,11 @@  MODULE_AUTHOR(CORE_AUTHOR);
 MODULE_DESCRIPTION(CORE_DESC);
 MODULE_LICENSE("GPL and additional rights");
 MODULE_PARM_DESC(debug, "Enable debug output");
-MODULE_PARM_DESC(rnodes, "Enable experimental render nodes API");
 MODULE_PARM_DESC(vblankoffdelay, "Delay until vblank irq auto-disable [msecs]");
 MODULE_PARM_DESC(timestamp_precision_usec, "Max. error on timestamps [usecs]");
 MODULE_PARM_DESC(timestamp_monotonic, "Use monotonic timestamps");
 
 module_param_named(debug, drm_debug, int, 0600);
-module_param_named(rnodes, drm_rnodes, int, 0600);
 module_param_named(vblankoffdelay, drm_vblank_offdelay, int, 0600);
 module_param_named(timestamp_precision_usec, drm_timestamp_precision, int, 0600);
 module_param_named(timestamp_monotonic, drm_timestamp_monotonic, int, 0600);
@@ -505,7 +500,7 @@  struct drm_device *drm_dev_alloc(struct drm_driver *driver,
 			goto err_minors;
 	}
 
-	if (drm_core_check_feature(dev, DRIVER_RENDER) && drm_rnodes) {
+	if (drm_core_check_feature(dev, DRIVER_RENDER)) {
 		ret = drm_minor_alloc(dev, DRM_MINOR_RENDER);
 		if (ret)
 			goto err_minors;
diff --git a/include/drm/drmP.h b/include/drm/drmP.h
index 5380790..2d81c66 100644
--- a/include/drm/drmP.h
+++ b/include/drm/drmP.h
@@ -1454,7 +1454,6 @@  extern void drm_master_put(struct drm_master **master);
 extern void drm_put_dev(struct drm_device *dev);
 extern void drm_unplug_dev(struct drm_device *dev);
 extern unsigned int drm_debug;
-extern unsigned int drm_rnodes;
 
 extern unsigned int drm_vblank_offdelay;
 extern unsigned int drm_timestamp_precision;
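
With this patch applied, render-node creation depends only on the driver
setting DRIVER_RENDER in its feature flags. A minimal sketch of how a
driver advertises that ("foo" and all of its fields are placeholders,
not a real driver):

    /* Sketch only: with this patch, any driver that sets DRIVER_RENDER in
     * driver_features gets /dev/dri/renderD* created unconditionally by
     * drm_dev_alloc(); "foo" and its fields are placeholders. */
    #include <drm/drmP.h>

    static struct drm_driver foo_driver = {
            .driver_features = DRIVER_GEM | DRIVER_PRIME | DRIVER_RENDER,
            .name            = "foo",
            .desc            = "Example render-node capable driver",
            .date            = "20140317",
            .major           = 1,
            .minor           = 0,
            /* fops, GEM/PRIME callbacks, etc. elided ... */
    };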