diff mbox

[v2,08/27] drm/tegra: gr2d: Miscellaneous cleanups

Message ID 1381134884-5816-9-git-send-email-treding@nvidia.com (mailing list archive)
State New, archived
Headers show

Commit Message

Thierry Reding Oct. 7, 2013, 8:34 a.m. UTC
Rework the address table code for the host1x firewall. The previous
implementation allocated a bitfield but didn't check for a valid pointer
so it could potentially crash. Furthermore setting up a bitfield makes
the code more complex than it needs to be.

Don't annotate the driver's .remove() function __exit. Even if built in
the driver can be unloaded via sysfs, so .remove() needs to stick around
after initialization. Also remove the explicit initialization of the
driver's .owner field to THIS_MODULE because that's now handled by the
driver core.

Furthermore make an error message more consistent with other subdrivers,
index the syncpts array for better readability, remove a gratuituous
newline and reorder some variable declarations to make the code easier
to read.

Signed-off-by: Thierry Reding <treding@nvidia.com>
---
 drivers/gpu/host1x/drm/gr2d.c | 95 +++++++++++++++++++------------------------
 1 file changed, 41 insertions(+), 54 deletions(-)

Comments

Erik Faye-Lund Oct. 7, 2013, 11:34 a.m. UTC | #1
On Mon, Oct 7, 2013 at 10:34 AM, Thierry Reding
<thierry.reding@gmail.com> wrote:
> Rework the address table code for the host1x firewall. The previous
> implementation allocated a bitfield but didn't check for a valid pointer
> so it could potentially crash.

I don't think it could crash. The bitmaps was allocated as a 256-bit
field, and the register offset gets AND'ed with 0xFF before being
looked up. What am I missing?

> Furthermore setting up a bitfield makes
> the code more complex than it needs to be.

Doesn't this perform worse than the current implementation? Going from
1 to 13 checks per write sounds less than ideal to me...
Thierry Reding Oct. 7, 2013, 12:14 p.m. UTC | #2
On Mon, Oct 07, 2013 at 01:34:44PM +0200, Erik Faye-Lund wrote:
> On Mon, Oct 7, 2013 at 10:34 AM, Thierry Reding
> <thierry.reding@gmail.com> wrote:
> > Rework the address table code for the host1x firewall. The previous
> > implementation allocated a bitfield but didn't check for a valid pointer
> > so it could potentially crash.
> 
> I don't think it could crash. The bitmaps was allocated as a 256-bit
> field, and the register offset gets AND'ed with 0xFF before being
> looked up. What am I missing?

The pointer returned from the allocation is never checked, so it could
theoretically be NULL and be used regardless.

Also I'm not sure that AND'ing with 0xff is the right thing to do here.

> > Furthermore setting up a bitfield makes
> > the code more complex than it needs to be.
> 
> Doesn't this perform worse than the current implementation? Going from
> 1 to 13 checks per write sounds less than ideal to me...

I'm not so sure. Caching should alleviate the issue with the increased
amount of data. Then there's the fact that previously we needed to
divide and compute the remainder of the division for the BIT_MASK and
BIT_WORD operations.

One other added benefit of this approach is that the address registers
are stored in a const array and therefore could reside in a read-only
memory region. With the previous approach, once somebody had access to
the bitmap, it could easily be overwritten with zeros and effectively
make the firewall useless. I'm not sure how likely that would be, but it
could be done.

I guess one could actually go and run a benchmark against both versions
and balance the performance impact against the possible security
implications. But given that we don't really have any benchmarks that's
pretty hard to do.

Thierry
Terje Bergstrom Oct. 7, 2013, 12:52 p.m. UTC | #3
On 07.10.2013 15:14, Thierry Reding wrote:
> * PGP Signed by an unknown key
> 
> On Mon, Oct 07, 2013 at 01:34:44PM +0200, Erik Faye-Lund wrote:
>> On Mon, Oct 7, 2013 at 10:34 AM, Thierry Reding
>> <thierry.reding@gmail.com> wrote:
>>> Rework the address table code for the host1x firewall. The previous
>>> implementation allocated a bitfield but didn't check for a valid pointer
>>> so it could potentially crash.
>>
>> I don't think it could crash. The bitmaps was allocated as a 256-bit
>> field, and the register offset gets AND'ed with 0xFF before being
>> looked up. What am I missing?
> 
> The pointer returned from the allocation is never checked, so it could
> theoretically be NULL and be used regardless.
> 
> Also I'm not sure that AND'ing with 0xff is the right thing to do here.

Oops, there's a check for NULL missing. If that allocation fails, probe
should fail, so we just need to propagate the error condition. Otherwise
we might just leave the firewall off, and let everything go in unchecked.

AND 0xff is necessary, because the same registers are mirrored in
multiple contexts. AND removes the offset coming from context, and
leaves just the plain register offset.

> I'm not so sure. Caching should alleviate the issue with the increased
> amount of data. Then there's the fact that previously we needed to
> divide and compute the remainder of the division for the BIT_MASK and
> BIT_WORD operations.

Don't these get compiled into shifts and bitwise ands?

> One other added benefit of this approach is that the address registers
> are stored in a const array and therefore could reside in a read-only
> memory region. With the previous approach, once somebody had access to
> the bitmap, it could easily be overwritten with zeros and effectively
> make the firewall useless. I'm not sure how likely that would be, but it
> could be done.

If somebody gets access to the bitmap, he has access to kernel data
structures. GR2D firewall is the least of our worries in this case.

Terje
Erik Faye-Lund Oct. 7, 2013, 12:53 p.m. UTC | #4
On Mon, Oct 7, 2013 at 2:14 PM, Thierry Reding <thierry.reding@gmail.com> wrote:
> On Mon, Oct 07, 2013 at 01:34:44PM +0200, Erik Faye-Lund wrote:
>> On Mon, Oct 7, 2013 at 10:34 AM, Thierry Reding
>> <thierry.reding@gmail.com> wrote:
>> > Rework the address table code for the host1x firewall. The previous
>> > implementation allocated a bitfield but didn't check for a valid pointer
>> > so it could potentially crash.
>>
>> I don't think it could crash. The bitmaps was allocated as a 256-bit
>> field, and the register offset gets AND'ed with 0xFF before being
>> looked up. What am I missing?
>
> The pointer returned from the allocation is never checked, so it could
> theoretically be NULL and be used regardless.
>

Right. Thanks for clarifying.

> Also I'm not sure that AND'ing with 0xff is the right thing to do here.

Well, maybe not. I'm not entirely convinced it's *not*, though.

>> > Furthermore setting up a bitfield makes
>> > the code more complex than it needs to be.
>>
>> Doesn't this perform worse than the current implementation? Going from
>> 1 to 13 checks per write sounds less than ideal to me...
>
> I'm not so sure. Caching should alleviate the issue with the increased
> amount of data. Then there's the fact that previously we needed to
> divide and compute the remainder of the division for the BIT_MASK and
> BIT_WORD operations.

That's an AND and a shift, not division and modulo, both single-cycle
instructions on ARM. I'm pretty sure that'd be a win.

> One other added benefit of this approach is that the address registers
> are stored in a const array and therefore could reside in a read-only
> memory region. With the previous approach, once somebody had access to
> the bitmap, it could easily be overwritten with zeros and effectively
> make the firewall useless. I'm not sure how likely that would be, but it
> could be done.

Perhaps the bitmap could be generated compile-time and stuck in
read-only memory?

> I guess one could actually go and run a benchmark against both versions
> and balance the performance impact against the possible security
> implications. But given that we don't really have any benchmarks that's
> pretty hard to do.

Well, perhaps we could have both? :)
Erik Faye-Lund Oct. 7, 2013, 1:02 p.m. UTC | #5
On Mon, Oct 7, 2013 at 2:53 PM, Erik Faye-Lund <kusmabite@gmail.com> wrote:
> On Mon, Oct 7, 2013 at 2:14 PM, Thierry Reding <thierry.reding@gmail.com> wrote:
>> On Mon, Oct 07, 2013 at 01:34:44PM +0200, Erik Faye-Lund wrote:
>>> On Mon, Oct 7, 2013 at 10:34 AM, Thierry Reding
>>> <thierry.reding@gmail.com> wrote:
>>> > Rework the address table code for the host1x firewall. The previous
>>> > implementation allocated a bitfield but didn't check for a valid pointer
>>> > so it could potentially crash.
>>>
>>> I don't think it could crash. The bitmaps was allocated as a 256-bit
>>> field, and the register offset gets AND'ed with 0xFF before being
>>> looked up. What am I missing?
>>
>> The pointer returned from the allocation is never checked, so it could
>> theoretically be NULL and be used regardless.
>>
>> Also I'm not sure that AND'ing with 0xff is the right thing to do here.
>
> Well, maybe not. I'm not entirely convinced it's *not*, though.
>

I'm sorry, I intended to fill out a bit more here before I hit send,
and totally forgot.

Point is, it seems only 0x00..0x4c is used (and then repeated at
multiples of 0x4000 for each context, but the offsets don't have
enough bits to reach that far), from looking at the TRM.

So the question is really how the hardware treats writes to
non-existent registers. My guess would be that they are simply not
recorded, and if that's the case it doesn't matter what we do. And
doing an unconditional AND is faster than doing a bit-test followed by
a conditional branch.

Of course, if the hardware does *not* ignore writes to non-existent
register, then we might be in trouble. But for that, we'd probably
want a full map of valid registers, not just saying that they aren't
pointers, no?
Erik Faye-Lund Oct. 7, 2013, 1:05 p.m. UTC | #6
On Mon, Oct 7, 2013 at 2:52 PM, Terje Bergström <tbergstrom@nvidia.com> wrote:
> On 07.10.2013 15:14, Thierry Reding wrote:
>> * PGP Signed by an unknown key
>>
>> On Mon, Oct 07, 2013 at 01:34:44PM +0200, Erik Faye-Lund wrote:
>>> On Mon, Oct 7, 2013 at 10:34 AM, Thierry Reding
>>> <thierry.reding@gmail.com> wrote:
>>>> Rework the address table code for the host1x firewall. The previous
>>>> implementation allocated a bitfield but didn't check for a valid pointer
>>>> so it could potentially crash.
>>>
>>> I don't think it could crash. The bitmaps was allocated as a 256-bit
>>> field, and the register offset gets AND'ed with 0xFF before being
>>> looked up. What am I missing?
>>
>> The pointer returned from the allocation is never checked, so it could
>> theoretically be NULL and be used regardless.
>>
>> Also I'm not sure that AND'ing with 0xff is the right thing to do here.
>
> Oops, there's a check for NULL missing. If that allocation fails, probe
> should fail, so we just need to propagate the error condition. Otherwise
> we might just leave the firewall off, and let everything go in unchecked.
>
> AND 0xff is necessary, because the same registers are mirrored in
> multiple contexts. AND removes the offset coming from context, and
> leaves just the plain register offset.
>

The offsets in the commands don't have enough bits to reach over to
the next context. The contexts are repeated at multiples of 0x4000,
and 0xFFF is the largest encodable offset. So I don't really thing the
AND is needed for *that* purpose.

> If somebody gets access to the bitmap, he has access to kernel data
> structures. GR2D firewall is the least of our worries in this case.

Indeed, that's only a problem if someone is already on the other side
of the air-tight hatch.
Thierry Reding Oct. 7, 2013, 1:13 p.m. UTC | #7
On Mon, Oct 07, 2013 at 03:52:28PM +0300, Terje Bergström wrote:
> On 07.10.2013 15:14, Thierry Reding wrote:
> > * PGP Signed by an unknown key
> > 
> > On Mon, Oct 07, 2013 at 01:34:44PM +0200, Erik Faye-Lund wrote:
> >> On Mon, Oct 7, 2013 at 10:34 AM, Thierry Reding
> >> <thierry.reding@gmail.com> wrote:
> >>> Rework the address table code for the host1x firewall. The previous
> >>> implementation allocated a bitfield but didn't check for a valid pointer
> >>> so it could potentially crash.
> >>
> >> I don't think it could crash. The bitmaps was allocated as a 256-bit
> >> field, and the register offset gets AND'ed with 0xFF before being
> >> looked up. What am I missing?
> > 
> > The pointer returned from the allocation is never checked, so it could
> > theoretically be NULL and be used regardless.
> > 
> > Also I'm not sure that AND'ing with 0xff is the right thing to do here.
> 
> Oops, there's a check for NULL missing. If that allocation fails, probe
> should fail, so we just need to propagate the error condition. Otherwise
> we might just leave the firewall off, and let everything go in unchecked.

Yes, definitely.

> AND 0xff is necessary, because the same registers are mirrored in
> multiple contexts. AND removes the offset coming from context, and
> leaves just the plain register offset.

That looks like something which should go into a comment.

> > I'm not so sure. Caching should alleviate the issue with the increased
> > amount of data. Then there's the fact that previously we needed to
> > divide and compute the remainder of the division for the BIT_MASK and
> > BIT_WORD operations.
> 
> Don't these get compiled into shifts and bitwise ands?

Yes, they are.

> > One other added benefit of this approach is that the address registers
> > are stored in a const array and therefore could reside in a read-only
> > memory region. With the previous approach, once somebody had access to
> > the bitmap, it could easily be overwritten with zeros and effectively
> > make the firewall useless. I'm not sure how likely that would be, but it
> > could be done.
> 
> If somebody gets access to the bitmap, he has access to kernel data
> structures. GR2D firewall is the least of our worries in this case.

I see the point. Oh well, all my arguments are torn down. I'll drop this
patch. Or at least the part that rewrites the gr2d_is_addr() and make it
check for failed allocations properly.

For what it's worth, I still think the plain table lookup is much easier
to understand.

Thierry
Terje Bergstrom Oct. 8, 2013, 5:36 a.m. UTC | #8
On 07.10.2013 16:05, Erik Faye-Lund wrote:
> On Mon, Oct 7, 2013 at 2:52 PM, Terje Bergström <tbergstrom@nvidia.com> wrote:
>> AND 0xff is necessary, because the same registers are mirrored in
>> multiple contexts. AND removes the offset coming from context, and
>> leaves just the plain register offset.
> The offsets in the commands don't have enough bits to reach over to
> the next context. The contexts are repeated at multiples of 0x4000,
> and 0xFFF is the largest encodable offset. So I don't really thing the
> AND is needed for *that* purpose.

Well, that was embarrassing. Of course that is true. You can access all
registers via MMIO or with SETCLASS to correct context. But that was
still the reason I used when I wrote that 0xff, so it's bogus.

Real fix is to do what Thierry already did: limit the bitmap to max
register number we're interested in, and have a if() to fail fast if
register number falls outside that range.

Even better, get IOMMU up and running.

Terje
Terje Bergstrom Oct. 8, 2013, 5:48 a.m. UTC | #9
On 07.10.2013 16:02, Erik Faye-Lund wrote:
> So the question is really how the hardware treats writes to
> non-existent registers. My guess would be that they are simply not
> recorded, and if that's the case it doesn't matter what we do. And
> doing an unconditional AND is faster than doing a bit-test followed by
> a conditional branch.

Hardware ignores writes to non-existent registers. Sometimes
non-existent registers are taken into use in future versions, though.

Terje
Erik Faye-Lund Oct. 8, 2013, 11:11 a.m. UTC | #10
On Tue, Oct 8, 2013 at 7:48 AM, Terje Bergström <tbergstrom@nvidia.com> wrote:
> On 07.10.2013 16:02, Erik Faye-Lund wrote:
>> So the question is really how the hardware treats writes to
>> non-existent registers. My guess would be that they are simply not
>> recorded, and if that's the case it doesn't matter what we do. And
>> doing an unconditional AND is faster than doing a bit-test followed by
>> a conditional branch.
>
> Hardware ignores writes to non-existent registers. Sometimes
> non-existent registers are taken into use in future versions, though.

Right. That might be a motivation to treat non-existent registers as
errors. But then I start to wonder about "holes" in the pointer
address space should be treated differently. For instance, from the
TRM it looks like registers 0x3 - 0x8, 0xe, 0x15 and 0x42 are
undefined for T20. If writes to registers beyond 0x4c fails, shouldn't
these also fail? Or are somehow guaranteed that these holes will never
be plugged? Are they simply missing from the TRM?

I dunno...
diff mbox

Patch

diff --git a/drivers/gpu/host1x/drm/gr2d.c b/drivers/gpu/host1x/drm/gr2d.c
index 8d9a10f..07f0425 100644
--- a/drivers/gpu/host1x/drm/gr2d.c
+++ b/drivers/gpu/host1x/drm/gr2d.c
@@ -27,9 +27,8 @@ 
 
 struct gr2d {
 	struct host1x_client client;
-	struct clk *clk;
 	struct host1x_channel *channel;
-	unsigned long *addr_regs;
+	struct clk *clk;
 };
 
 static inline struct gr2d *to_gr2d(struct host1x_client *client)
@@ -37,8 +36,6 @@  static inline struct gr2d *to_gr2d(struct host1x_client *client)
 	return container_of(client, struct gr2d, client);
 }
 
-static int gr2d_is_addr_reg(struct device *dev, u32 class, u32 reg);
-
 static int gr2d_client_init(struct host1x_client *client,
 			    struct drm_device *drm)
 {
@@ -56,7 +53,6 @@  static int gr2d_open_channel(struct host1x_client *client,
 	struct gr2d *gr2d = to_gr2d(client);
 
 	context->channel = host1x_channel_get(gr2d->channel);
-
 	if (!context->channel)
 		return -ENOMEM;
 
@@ -87,11 +83,38 @@  static struct host1x_bo *host1x_bo_lookup(struct drm_device *drm,
 	return &bo->base;
 }
 
+static const u32 gr2d_addr_regs[] = {
+	0x1a, 0x1b, 0x26, 0x2b, 0x2c, 0x2d, 0x31, 0x32,
+	0x48, 0x49, 0x4a, 0x4b, 0x4c
+};
+
+static int gr2d_is_addr_reg(struct device *dev, u32 class, u32 offset)
+{
+	unsigned int i;
+
+	switch (class) {
+	case HOST1X_CLASS_HOST1X:
+		if (offset == 0x2b)
+			return 1;
+
+		break;
+
+	case HOST1X_CLASS_GR2D:
+	case HOST1X_CLASS_GR2D_SB:
+		for (i = 0; i < ARRAY_SIZE(gr2d_addr_regs); i++)
+			if (offset == gr2d_addr_regs[i])
+				return 1;
+
+		break;
+	}
+
+	return 0;
+}
+
 static int gr2d_submit(struct tegra_drm_context *context,
 		       struct drm_tegra_submit *args, struct drm_device *drm,
 		       struct drm_file *file)
 {
-	struct host1x_job *job;
 	unsigned int num_cmdbufs = args->num_cmdbufs;
 	unsigned int num_relocs = args->num_relocs;
 	unsigned int num_waitchks = args->num_waitchks;
@@ -102,6 +125,7 @@  static int gr2d_submit(struct tegra_drm_context *context,
 	struct drm_tegra_waitchk __user *waitchks =
 		(void * __user)(uintptr_t)args->waitchks;
 	struct drm_tegra_syncpt syncpt;
+	struct host1x_job *job;
 	int err;
 
 	/* We don't yet support other than one syncpt_incr struct per submit */
@@ -205,41 +229,6 @@  static struct host1x_client_ops gr2d_client_ops = {
 	.submit = gr2d_submit,
 };
 
-static void gr2d_init_addr_reg_map(struct device *dev, struct gr2d *gr2d)
-{
-	const u32 gr2d_addr_regs[] = {0x1a, 0x1b, 0x26, 0x2b, 0x2c, 0x2d, 0x31,
-				      0x32, 0x48, 0x49, 0x4a, 0x4b, 0x4c};
-	unsigned long *bitmap;
-	int i;
-
-	bitmap = devm_kzalloc(dev, DIV_ROUND_UP(256, BITS_PER_BYTE),
-			      GFP_KERNEL);
-
-	for (i = 0; i < ARRAY_SIZE(gr2d_addr_regs); ++i) {
-		u32 reg = gr2d_addr_regs[i];
-		bitmap[BIT_WORD(reg)] |= BIT_MASK(reg);
-	}
-
-	gr2d->addr_regs = bitmap;
-}
-
-static int gr2d_is_addr_reg(struct device *dev, u32 class, u32 reg)
-{
-	struct gr2d *gr2d = dev_get_drvdata(dev);
-
-	switch (class) {
-	case HOST1X_CLASS_HOST1X:
-		return reg == 0x2b;
-	case HOST1X_CLASS_GR2D:
-	case HOST1X_CLASS_GR2D_SB:
-		reg &= 0xff;
-		if (gr2d->addr_regs[BIT_WORD(reg)] & BIT_MASK(reg))
-			return 1;
-	default:
-		return 0;
-	}
-}
-
 static const struct of_device_id gr2d_match[] = {
 	{ .compatible = "nvidia,tegra30-gr2d" },
 	{ .compatible = "nvidia,tegra20-gr2d" },
@@ -248,11 +237,11 @@  static const struct of_device_id gr2d_match[] = {
 
 static int gr2d_probe(struct platform_device *pdev)
 {
+	struct tegra_drm *tegra = host1x_get_drm_data(pdev->dev.parent);
 	struct device *dev = &pdev->dev;
-	struct tegra_drm *tegra = host1x_get_drm_data(dev->parent);
-	int err;
-	struct gr2d *gr2d = NULL;
 	struct host1x_syncpt **syncpts;
+	struct gr2d *gr2d;
+	int err;
 
 	gr2d = devm_kzalloc(dev, sizeof(*gr2d), GFP_KERNEL);
 	if (!gr2d)
@@ -278,8 +267,8 @@  static int gr2d_probe(struct platform_device *pdev)
 	if (!gr2d->channel)
 		return -ENOMEM;
 
-	*syncpts = host1x_syncpt_request(dev, false);
-	if (!(*syncpts)) {
+	syncpts[0] = host1x_syncpt_request(dev, false);
+	if (!syncpts[0]) {
 		host1x_channel_free(gr2d->channel);
 		return -ENOMEM;
 	}
@@ -296,14 +285,12 @@  static int gr2d_probe(struct platform_device *pdev)
 		return err;
 	}
 
-	gr2d_init_addr_reg_map(dev, gr2d);
-
 	platform_set_drvdata(pdev, gr2d);
 
 	return 0;
 }
 
-static int __exit gr2d_remove(struct platform_device *pdev)
+static int gr2d_remove(struct platform_device *pdev)
 {
 	struct tegra_drm *tegra = host1x_get_drm_data(pdev->dev.parent);
 	struct gr2d *gr2d = platform_get_drvdata(pdev);
@@ -312,7 +299,8 @@  static int __exit gr2d_remove(struct platform_device *pdev)
 
 	err = host1x_unregister_client(tegra, &gr2d->client);
 	if (err < 0) {
-		dev_err(&pdev->dev, "failed to unregister client: %d\n", err);
+		dev_err(&pdev->dev, "failed to unregister host1x client: %d\n",
+			err);
 		return err;
 	}
 
@@ -326,11 +314,10 @@  static int __exit gr2d_remove(struct platform_device *pdev)
 }
 
 struct platform_driver tegra_gr2d_driver = {
-	.probe = gr2d_probe,
-	.remove = __exit_p(gr2d_remove),
 	.driver = {
-		.owner = THIS_MODULE,
 		.name = "gr2d",
 		.of_match_table = gr2d_match,
-	}
+	},
+	.probe = gr2d_probe,
+	.remove = gr2d_remove,
 };