Message ID | 1395074636-32337-1-git-send-email-dh.herrmann@gmail.com (mailing list archive)
---|---
State | New, archived
Hi!

On 03/17/2014 05:43 PM, David Herrmann wrote:
> We introduced render-nodes about 1/2 year ago and no problems showed up.
> Remove the drm_rnodes argument and enable them by default now.

So what about the malicious execbuf command stream problem? Do we require
all drivers that enable render-nodes to have a mechanism to prevent this
in place?

/Thomas
Hi

On Thu, Mar 20, 2014 at 7:43 AM, Thomas Hellstrom <thomas@shipmail.org> wrote:
> On 03/17/2014 05:43 PM, David Herrmann wrote:
>> We introduced render-nodes about 1/2 year ago and no problems showed up.
>> Remove the drm_rnodes argument and enable them by default now.
>
> So what about the malicious execbuf command stream problem? Do we require
> all drivers that enable render-nodes to have a mechanism to prevent this
> in place?

No, that's not a requirement. Render-nodes provide a secure API; if the
underlying driver does no command-stream validation (I guess for
performance reasons and lack of a VM), that's an implementation detail,
not part of the API. Furthermore, you can always set higher restrictions
on the render-node char-dev if this bothers you.

Cheers
David
On 03/20/2014 08:36 AM, David Herrmann wrote:
> No, that's not a requirement. Render-nodes provide a secure API; if the
> underlying driver does no command-stream validation (I guess for
> performance reasons and lack of a VM), that's an implementation detail,
> not part of the API. Furthermore, you can always set higher restrictions
> on the render-node char-dev if this bothers you.

I'm merely trying to envision the situation where a distro wants to
create, for example, a udev rule for the render nodes.

How should the distro know that the implementation is not insecure?

Historically, drm has refused to upstream drivers without a proper
command validation mechanism in place (the via driver, for example), but
that validation mechanism only needed to make sure no random system
memory was ever accessible to an authenticated DRM client.

Now, render nodes are designed to also provide user data isolation. But
if we allow insecure implementations, wouldn't that compromise the whole
idea?

Now that we have a secure API within reach, wouldn't it be reasonable to
require implementations to also be secure, following earlier DRM
practices? Or am I missing something?

/Thomas
Hi Thomas

On Thu, Mar 20, 2014 at 9:48 AM, Thomas Hellstrom <thellstrom@vmware.com> wrote:
> I'm merely trying to envision the situation where a distro wants to
> create, for example, a udev rule for the render nodes.
>
> How should the distro know that the implementation is not insecure?
>
> Historically, drm has refused to upstream drivers without a proper
> command validation mechanism in place (the via driver, for example), but
> that validation mechanism only needed to make sure no random system
> memory was ever accessible to an authenticated DRM client.

I expect all drivers to protect system memory. All that I am talking
about is one exec-buffer writing to memory of another _GPU_ buffer that
this client doesn't have access to. But maybe driver authors can give
some insights. I'd even expect non-texture/data GPU buffers to be
protected.

> Now, render nodes are designed to also provide user data isolation. But
> if we allow insecure implementations, wouldn't that compromise the whole
> idea?
>
> Now that we have a secure API within reach, wouldn't it be reasonable to
> require implementations to also be secure, following earlier DRM
> practices?

I don't understand what this has to do with render-nodes? The API _is_
secure. What would be the gain of disabling render-nodes for broken (even
deliberately broken) drivers? User-space is not going to assume drivers
are broken and rely on hand-crafted exec-buffers that cross buffer
boundaries. So yes, we're fooling ourselves with API guarantees that
drivers cannot deliver, but what's the downside?

Thanks
David
On 03/20/2014 10:05 AM, David Herrmann wrote:
> I don't understand what this has to do with render-nodes? The API _is_
> secure. What would be the gain of disabling render-nodes for broken (even
> deliberately broken) drivers? User-space is not going to assume drivers
> are broken and rely on hand-crafted exec-buffers that cross buffer
> boundaries. So yes, we're fooling ourselves with API guarantees that
> drivers cannot deliver, but what's the downside?

OK, let me give one example:

A user logs in to a system where DRI clients use render nodes. The system
grants rw permission on the render nodes for the console user. The user
starts editing a secret document and kicks off some GPGPU structural FEM
computations of the Pentagon building, then locks the screen and goes for
lunch.

A malicious user logs in using fast user switching and becomes the owner
of the render node. He tries to map a couple of random offsets, but that
fails due to security checks. So he crafts a malicious command stream to
dump all GPU memory to a file, steals the first user's secret document
and the intermediate Pentagon data, logs out and starts data mining.

Now if we require drivers to block these malicious command streams, this
can never happen, and distros can reliably grant rw access to the render
nodes to the user currently logged into the console.

I guess what I'm basically trying to say is that with the legacy concept
it was OK to access all GPU memory, because an authenticated X user
basically had the same permissions.

With render nodes we're allowing multiple users into the GPU at the same
time, and it's not OK anymore for a client to access another client's GPU
buffer through a malicious command stream.

/Thomas
Hi

On Thu, Mar 20, 2014 at 10:27 AM, Thomas Hellstrom <thellstrom@vmware.com> wrote:
> Now if we require drivers to block these malicious command streams, this
> can never happen, and distros can reliably grant rw access to the render
> nodes to the user currently logged into the console.
>
> I guess what I'm basically trying to say is that with the legacy concept
> it was OK to access all GPU memory, because an authenticated X user
> basically had the same permissions.
>
> With render nodes we're allowing multiple users into the GPU at the same
> time, and it's not OK anymore for a client to access another client's GPU
> buffer through a malicious command stream.

Yes, I understand the attack scenario, but that's not related to
render-nodes at all. The exact same races exist on the legacy node:

1) If you can do fast-user switching, you can spawn your own X-server,
get authenticated on your own server and you are allowed into the GPU.
You cannot map other users' buffers because they're on a different
master object, but you _can_ craft malicious GPU streams and access the
other user's buffers.

2) If you can do fast-user switching, switch to an empty VT, open the
legacy node and you automatically become DRM-Master because there is no
active master. Now you can do anything on the DRM node, including
crafting malicious GPU streams.

Given that the legacy node is always around and _always_ has these races,
why should we prevent render-nodes from appearing just because the
_driver_ is racy? I mean, there is no gain in that. If it opens a new
race, as you assumed, then yes, we should avoid it. But at least for all
drivers supporting render-nodes so far, they either are entirely safe or
the just-described races exist on both nodes.

Thanks
David
On 03/20/2014 10:43 AM, David Herrmann wrote:
> Yes, I understand the attack scenario, but that's not related to
> render-nodes at all. The exact same races exist on the legacy node:

I was under the impression that render nodes were designed to fix these
issues?

> 1) If you can do fast-user switching, you can spawn your own X-server,
> get authenticated on your own server and you are allowed into the GPU.
> You cannot map other users' buffers because they're on a different
> master object, but you _can_ craft malicious GPU streams and access the
> other user's buffers.

But with legacy nodes, drivers can (and should, IMO) throw out all data
from GPU memory on master drop, and then block dropped-master
authenticated clients from the GPU until their master becomes active
again or dies (in which case they are killed), in line with a previous
discussion we had. You can't do this with render nodes, so yes, they do
open up a new race that requires command stream validation.

> 2) If you can do fast-user switching, switch to an empty VT, open the
> legacy node and you automatically become DRM-Master because there is no
> active master. Now you can do anything on the DRM node, including
> crafting malicious GPU streams.

I believe the above solution should work for this case as well.

> Given that the legacy node is always around and _always_ has these races,
> why should we prevent render-nodes from appearing just because the
> _driver_ is racy? I mean, there is no gain in that. If it opens a new
> race, as you assumed, then yes, we should avoid it. But at least for all
> drivers supporting render-nodes so far, they either are entirely safe or
> the just-described races exist on both nodes.

My suggestion is actually not to prevent render nodes from appearing, but
rather that we should restrict them to drivers with command stream
verification and / or per-process virtual memory, and I also think we
should plug the above races on legacy nodes. That way legacy nodes would
use the old "master owns it all" model, while render nodes could allow
multiple users at the same time.

/Thomas
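The "throw out GPU memory on master drop, then block dropped-master
clients" policy Thomas sketches can be modelled as follows. This is a
self-contained toy model in C; every toy_* name is invented for
illustration and none of it is the real DRM master/client code.

#include <stdbool.h>

/* Illustrative stand-ins for the real DRM objects. */
struct toy_master {
	bool active;			/* is this master's session in the foreground? */
};

struct toy_client {
	struct toy_master *master;	/* master the client authenticated against */
};

/* Called when a master drops (VT switch / fast user switch). */
void toy_master_drop(struct toy_master *m)
{
	m->active = false;
	/* A real driver would also evict every buffer belonging to this
	 * master's clients out of VRAM/GTT here, so a future master
	 * cannot read their contents. */
}

/* Gate every command submission: clients of a dropped master are blocked
 * until their master becomes active again (or they are killed). */
int toy_submit_allowed(const struct toy_client *c)
{
	if (!c->master || !c->master->active)
		return -1;	/* would be -EACCES or similar in a real driver */
	return 0;
}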
On Thu, Mar 20, 2014 at 11:28:18AM +0100, Thomas Hellstrom wrote:
> My suggestion is actually not to prevent render nodes from appearing, but
> rather that we should restrict them to drivers with command stream
> verification and / or per-process virtual memory, and I also think we
> should plug the above races on legacy nodes. That way legacy nodes would
> use the old "master owns it all" model, while render nodes could allow
> multiple users at the same time.

FYI, both radeon and nouveau (last time I checked) cannot be abused to
access random VRAM or buffers bound by another process (except if the
buffer is shared, of course). So there is no way (modulo any bug in
command stream checking or hardware) to abuse a render node to access any
memory that you do not have the right to access. I am pretty sure this
stands for Intel too; I do not know if other drm drivers offer this kind
of assurance.

In any case this is not a render node issue, and there is no reason to
force full VRAM eviction or anything like that.

Cheers,
Jérôme
On Thu, Mar 20, 2014 at 10:36 AM, Jerome Glisse <j.glisse@gmail.com> wrote:
> FYI, both radeon and nouveau (last time I checked) cannot be abused to
> access random VRAM or buffers bound by another process (except if the
> buffer is shared, of course).

I'm not aware of any restrictions in nouveau that would prevent you from
accessing any vram... there are a number of copy engines accessible that
would allow you to copy one region to another, and I'm not aware of code
to validate pushbuf contents (it would have to parse the pushbuf and know
which methods store source/destination addresses). You could probably get
creative and use that to overwrite the MMU tables, to then gain the
ability to read/write any part of system memory... but perhaps the
engines won't allow you to do that, not sure. (Or perhaps the PDE isn't
mapped into VRAM. Again, not sure.)

  -ilia
On 03/20/2014 03:36 PM, Jerome Glisse wrote:
> In any case this is not a render node issue, and there is no reason to
> force full VRAM eviction or anything like that.

This comment suggests you haven't read the discussion fully.

VRAM eviction was discussed in the context of legacy nodes.

This is a render node issue because with legacy nodes you can work around
insufficient command checking. With render nodes you can't.

/Thomas
On Thu, Mar 20, 2014 at 03:59:17PM +0100, Thomas Hellstrom wrote:
> This comment suggests you haven't read the discussion fully.
>
> VRAM eviction was discussed in the context of legacy nodes.
>
> This is a render node issue because with legacy nodes you can work around
> insufficient command checking. With render nodes you can't.

On radeon you cannot abuse the GPU, period, legacy node or not. My
comment was about the fact that this is a driver issue and not a render
node issue. I would consider a driver that allows a GPU block to be
abused to access any memory as broken, no matter if we are talking about
the legacy node or the new render node.

Cheers,
Jérôme
On Thu, Mar 20, 2014 at 10:44:10AM -0400, Ilia Mirkin wrote:
> I'm not aware of any restrictions in nouveau that would prevent you from
> accessing any vram... there are a number of copy engines accessible that
> would allow you to copy one region to another, and I'm not aware of code
> to validate pushbuf contents (it would have to parse the pushbuf and know
> which methods store source/destination addresses). You could probably get
> creative and use that to overwrite the MMU tables, to then gain the
> ability to read/write any part of system memory... but perhaps the
> engines won't allow you to do that, not sure. (Or perhaps the PDE isn't
> mapped into VRAM. Again, not sure.)

Well, I obviously have not looked at that in a long time, but if the
nouveau API can be abused that way then I consider it broken by design. I
know command checking is painful and CPU intensive, but we have done it
in radeon for a reason.

Cheers,
Jérôme
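For readers who have not seen a checker like the one Jérôme refers to,
here is a deliberately simplified, self-contained sketch in C of the core
idea behind radeon-style command-stream validation: walk the submitted
packets and reject any GPU address that does not fall inside a buffer the
submitting client owns. The packet layout and all toy_* names are
invented for this example; the real radeon checker is considerably more
involved.

#include <stdint.h>
#include <stddef.h>

/* One GPU buffer object the submitting client is allowed to touch. */
struct toy_bo {
	uint64_t gpu_addr;
	uint64_t size;
};

/* A fake "packet": an opcode plus one GPU range it reads or writes. */
struct toy_packet {
	uint32_t opcode;
	uint64_t addr;
	uint64_t len;
};

static int addr_in_client_bos(const struct toy_bo *bos, size_t n_bos,
			      uint64_t addr, uint64_t len)
{
	for (size_t i = 0; i < n_bos; i++) {
		if (addr >= bos[i].gpu_addr &&
		    addr + len <= bos[i].gpu_addr + bos[i].size)
			return 1;
	}
	return 0;
}

/* Reject the whole submission if any packet references memory outside
 * the client's own buffers. This is the CPU cost Jérôme mentions: every
 * command buffer is parsed before it reaches the hardware. A real
 * checker would also dispatch on the opcode to find the address fields
 * and guard against arithmetic overflow. */
int toy_validate_cs(const struct toy_packet *pkts, size_t n_pkts,
		    const struct toy_bo *bos, size_t n_bos)
{
	for (size_t i = 0; i < n_pkts; i++) {
		if (!addr_in_client_bos(bos, n_bos, pkts[i].addr, pkts[i].len))
			return -1;	/* -EINVAL in a real driver */
	}
	return 0;
}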
On 03/20/2014 04:34 PM, Jerome Glisse wrote:
> On radeon you cannot abuse the GPU, period, legacy node or not. My
> comment was about the fact that this is a driver issue and not a render
> node issue. I would consider a driver that allows a GPU block to be
> abused to access any memory as broken, no matter if we are talking about
> the legacy node or the new render node.

Great. Then I assume you don't have an issue with not enabling
render-nodes for those broken drivers (or at least a sysfs property or
something similar flagging those device nodes as broken)?

Thanks,
Thomas
On Thu, Mar 20, 2014 at 04:49:42PM +0100, Thomas Hellstrom wrote:
> Great. Then I assume you don't have an issue with not enabling
> render-nodes for those broken drivers (or at least a sysfs property or
> something similar flagging those device nodes as broken)?

Yes, broken drivers should not have a render node, at least in my view.

Cheers,
Jérôme
On Thu, Mar 20, 2014 at 6:28 AM, Thomas Hellstrom <thellstrom@vmware.com> wrote:
> My suggestion is actually not to prevent render nodes from appearing, but
> rather that we should restrict them to drivers with command stream
> verification and / or per-process virtual memory, and I also think we
> should plug the above races on legacy nodes. That way legacy nodes would
> use the old "master owns it all" model, while render nodes could allow
> multiple users at the same time.

hmm, if you only have global gpu virtual memory (rather than
per-process), it would still be kinda nice to support render nodes if the
app had some way to figure out whether or not its gpu buffers were
secure. Ie. an app that was using the gpu for something secure could
simply choose not to if the hw/driver could not guarantee that another
process using the gpu could not get access to the buffers..

BR,
-R
On Thu, Mar 20, 2014 at 10:44 AM, Ilia Mirkin <imirkin@alum.mit.edu> wrote:
> I'm not aware of any restrictions in nouveau that would prevent you from
> accessing any vram...

Please ignore this totally wrong comment. While there is indeed no
pushbuf validation, every process (well, every open() of the drm node)
gets its own virtual memory space for all GPUs with MMUs (nv50+).

  -ilia
On 03/20/2014 06:34 PM, Rob Clark wrote:
> hmm, if you only have global gpu virtual memory (rather than
> per-process), it would still be kinda nice to support render nodes if the
> app had some way to figure out whether or not its gpu buffers were
> secure.

If there is only global GPU memory, there's of course also the option of
providing a command stream verifier.

> Ie. an app that was using the gpu for something secure could
> simply choose not to if the hw/driver could not guarantee that another
> process using the gpu could not get access to the buffers..

IMO that should work fine, but we need to provide a way for user-space to
determine whether the render node is secure or not. Perhaps a sysfs
attribute and / or a drm getparam() parameter?

/Thomas
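On the userspace side, Thomas's getparam/capability idea could look
roughly like the sketch below. drmGetCap() is the existing libdrm wrapper
around DRM_IOCTL_GET_CAP, but DRM_CAP_RENDER_ISOLATION is a made-up
capability name used purely for illustration; no such capability exists
at the time of this thread.

#include <stdint.h>
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <xf86drm.h>

/* Hypothetical capability number; NOT a real DRM_CAP_* value. */
#define DRM_CAP_RENDER_ISOLATION 0x1000

int main(void)
{
	uint64_t isolated = 0;
	int fd = open("/dev/dri/renderD128", O_RDWR);

	if (fd < 0) {
		perror("open render node");
		return 1;
	}

	/* Query the driver; an unknown capability simply fails, which the
	 * application treats the same as "no isolation guarantee". */
	if (drmGetCap(fd, DRM_CAP_RENDER_ISOLATION, &isolated) == 0 && isolated)
		printf("driver promises per-client GPU memory isolation\n");
	else
		printf("no isolation guarantee; treat GPU buffers as readable by others\n");

	close(fd);
	return 0;
}

An application doing something sensitive on the GPU could then fall back
to CPU processing, exactly as Rob suggests, when the answer is negative.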
On Thu, Mar 20, 2014 at 4:54 PM, Thomas Hellstrom <thellstrom@vmware.com> wrote:
> If there is only global GPU memory, there's of course also the option of
> providing a command stream verifier.

well, that wouldn't really help separate buffers from other contexts
(since a3xx and later has various load/store instructions)..

At the moment, I have two cases:

1) MMU... the gpu has no direct access to system memory, so other than
access to other contexts' buffers, the system is secure

2) no-MMU... vram carveout, and set some registers to limit access to
addresses within that range

In case #1 we could implement per-context vm.. just a matter of writing
some code. Doing it the naive way requires draining the command queue on
context switches and getting the CPU involved, which isn't so terribly
awesome. The downstream android driver does context switches on the CP
itself (ie. bang some registers to point the MMU at a new set of tables
and TLB flush).. but it needs to be confirmed that this can be done
securely (ie. restricted to the ringbuffer controlled by the kernel and
not an IB buffer from userspace). If it isn't restricted to the kernel
ringbuffer, then that sort of opens up an even bigger hole than it closes
;-)

> IMO that should work fine, but we need to provide a way for user-space to
> determine whether the render node is secure or not. Perhaps a sysfs
> attribute and / or a drm getparam() parameter?

I'd *assume* that that sort of thing would just be some sort of CL
extension?

But no objection to exposing it in a more common way.

I guess it is also an option to keep the bootarg to override the default
(with the default being 'enabled' for hw w/ per-context/process vm and
'disabled' otherwise).

BR,
-R
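The "naive way" Rob mentions, drain the GPU from the CPU, then repoint
the MMU and flush the TLB, would look roughly like the toy sketch below.
All toy_* types and helpers are invented stubs standing in for register
writes and fence waits; the real msm/adreno code does not look like this.

#include <stdint.h>

/* Invented stand-ins for the driver's per-GPU and per-context state. */
struct toy_gpu { uint64_t mmu_base_reg; };
struct toy_ctx { uint64_t pgtable_addr; };

/* Stubs for the hardware accesses a real driver would perform. */
static int  toy_wait_ring_idle(struct toy_gpu *gpu) { (void)gpu; return 0; }
static void toy_write_mmu_base(struct toy_gpu *gpu, uint64_t addr) { gpu->mmu_base_reg = addr; }
static void toy_flush_gpu_tlb(struct toy_gpu *gpu) { (void)gpu; }

/* Naive CPU-driven context switch: drain everything, swap page tables,
 * flush. Safe, but it stalls the GPU on every switch, which is exactly
 * the cost Rob wants to avoid by letting the CP do the switch from the
 * kernel-controlled ringbuffer instead. */
int toy_switch_ctx(struct toy_gpu *gpu, struct toy_ctx *ctx)
{
	int ret = toy_wait_ring_idle(gpu);

	if (ret)
		return ret;
	toy_write_mmu_base(gpu, ctx->pgtable_addr);
	toy_flush_gpu_tlb(gpu);
	return 0;
}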
On Thu, Mar 20, 2014 at 10:13 PM, Rob Clark <robdclark@gmail.com> wrote:
>> IMO that should work fine, but we need to provide a way for user-space to
>> determine whether the render node is secure or not. Perhaps a sysfs
>> attribute and / or a drm getparam() parameter?
>
> I'd *assume* that that sort of thing would just be some sort of CL
> extension?
>
> But no objection to exposing it in a more common way.
>
> I guess it is also an option to keep the bootarg to override the default
> (with the default being 'enabled' for hw w/ per-context/process vm and
> 'disabled' otherwise).

IMO if we expose this through sysfs we should always enable render nodes.
The udev scripts can then decide what permissions to set on them.

-Daniel
On 03/21/2014 08:10 AM, Daniel Vetter wrote:
> IMO if we expose this through sysfs we should always enable render nodes.
> The udev scripts can then decide what permissions to set on them.

Agreed.

Thomas
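As an illustration of the sysfs-plus-udev split Daniel describes, the
sketch below reads a hypothetical per-node attribute and picks
permissions accordingly. The render_node_secure attribute path is
invented for this example, and in practice the decision would live in a
udev rule rather than a standalone program.

#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical attribute; nothing like this exists in sysfs today. */
#define SECURE_ATTR "/sys/class/drm/renderD128/render_node_secure"

int main(void)
{
	char buf[8] = "";
	FILE *f = fopen(SECURE_ATTR, "r");
	mode_t mode;

	if (f) {
		if (!fgets(buf, sizeof(buf), f))
			buf[0] = '\0';
		fclose(f);
	}

	/* Loose permissions only if the driver claims isolation;
	 * otherwise keep the render node root-only. */
	mode = (strncmp(buf, "1", 1) == 0) ? 0666 : 0600;
	if (chmod("/dev/dri/renderD128", mode) != 0)
		perror("chmod /dev/dri/renderD128");

	return 0;
}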
diff --git a/drivers/gpu/drm/drm_stub.c b/drivers/gpu/drm/drm_stub.c
index c23eaf6..b6cf0b3 100644
--- a/drivers/gpu/drm/drm_stub.c
+++ b/drivers/gpu/drm/drm_stub.c
@@ -40,9 +40,6 @@
 unsigned int drm_debug = 0;	/* 1 to enable debug output */
 EXPORT_SYMBOL(drm_debug);
 
-unsigned int drm_rnodes = 0;	/* 1 to enable experimental render nodes API */
-EXPORT_SYMBOL(drm_rnodes);
-
 unsigned int drm_vblank_offdelay = 5000;    /* Default to 5000 msecs. */
 EXPORT_SYMBOL(drm_vblank_offdelay);
 
@@ -59,13 +56,11 @@ MODULE_AUTHOR(CORE_AUTHOR);
 MODULE_DESCRIPTION(CORE_DESC);
 MODULE_LICENSE("GPL and additional rights");
 MODULE_PARM_DESC(debug, "Enable debug output");
-MODULE_PARM_DESC(rnodes, "Enable experimental render nodes API");
 MODULE_PARM_DESC(vblankoffdelay, "Delay until vblank irq auto-disable [msecs]");
 MODULE_PARM_DESC(timestamp_precision_usec, "Max. error on timestamps [usecs]");
 MODULE_PARM_DESC(timestamp_monotonic, "Use monotonic timestamps");
 
 module_param_named(debug, drm_debug, int, 0600);
-module_param_named(rnodes, drm_rnodes, int, 0600);
 module_param_named(vblankoffdelay, drm_vblank_offdelay, int, 0600);
 module_param_named(timestamp_precision_usec, drm_timestamp_precision, int, 0600);
 module_param_named(timestamp_monotonic, drm_timestamp_monotonic, int, 0600);
@@ -505,7 +500,7 @@ struct drm_device *drm_dev_alloc(struct drm_driver *driver,
 		goto err_minors;
 	}
 
-	if (drm_core_check_feature(dev, DRIVER_RENDER) && drm_rnodes) {
+	if (drm_core_check_feature(dev, DRIVER_RENDER)) {
 		ret = drm_minor_alloc(dev, DRM_MINOR_RENDER);
 		if (ret)
 			goto err_minors;
diff --git a/include/drm/drmP.h b/include/drm/drmP.h
index 5380790..2d81c66 100644
--- a/include/drm/drmP.h
+++ b/include/drm/drmP.h
@@ -1454,7 +1454,6 @@ extern void drm_master_put(struct drm_master **master);
 extern void drm_put_dev(struct drm_device *dev);
 extern void drm_unplug_dev(struct drm_device *dev);
 extern unsigned int drm_debug;
-extern unsigned int drm_rnodes;
 extern unsigned int drm_vblank_offdelay;
 extern unsigned int drm_timestamp_precision;
 