Message ID | 1381134884-5816-9-git-send-email-treding@nvidia.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Mon, Oct 7, 2013 at 10:34 AM, Thierry Reding <thierry.reding@gmail.com> wrote: > Rework the address table code for the host1x firewall. The previous > implementation allocated a bitfield but didn't check for a valid pointer > so it could potentially crash. I don't think it could crash. The bitmaps was allocated as a 256-bit field, and the register offset gets AND'ed with 0xFF before being looked up. What am I missing? > Furthermore setting up a bitfield makes > the code more complex than it needs to be. Doesn't this perform worse than the current implementation? Going from 1 to 13 checks per write sounds less than ideal to me...
On Mon, Oct 07, 2013 at 01:34:44PM +0200, Erik Faye-Lund wrote: > On Mon, Oct 7, 2013 at 10:34 AM, Thierry Reding > <thierry.reding@gmail.com> wrote: > > Rework the address table code for the host1x firewall. The previous > > implementation allocated a bitfield but didn't check for a valid pointer > > so it could potentially crash. > > I don't think it could crash. The bitmaps was allocated as a 256-bit > field, and the register offset gets AND'ed with 0xFF before being > looked up. What am I missing? The pointer returned from the allocation is never checked, so it could theoretically be NULL and be used regardless. Also I'm not sure that AND'ing with 0xff is the right thing to do here. > > Furthermore setting up a bitfield makes > > the code more complex than it needs to be. > > Doesn't this perform worse than the current implementation? Going from > 1 to 13 checks per write sounds less than ideal to me... I'm not so sure. Caching should alleviate the issue with the increased amount of data. Then there's the fact that previously we needed to divide and compute the remainder of the division for the BIT_MASK and BIT_WORD operations. One other added benefit of this approach is that the address registers are stored in a const array and therefore could reside in a read-only memory region. With the previous approach, once somebody had access to the bitmap, it could easily be overwritten with zeros and effectively make the firewall useless. I'm not sure how likely that would be, but it could be done. I guess one could actually go and run a benchmark against both versions and balance the performance impact against the possible security implications. But given that we don't really have any benchmarks that's pretty hard to do. Thierry
On 07.10.2013 15:14, Thierry Reding wrote: > * PGP Signed by an unknown key > > On Mon, Oct 07, 2013 at 01:34:44PM +0200, Erik Faye-Lund wrote: >> On Mon, Oct 7, 2013 at 10:34 AM, Thierry Reding >> <thierry.reding@gmail.com> wrote: >>> Rework the address table code for the host1x firewall. The previous >>> implementation allocated a bitfield but didn't check for a valid pointer >>> so it could potentially crash. >> >> I don't think it could crash. The bitmaps was allocated as a 256-bit >> field, and the register offset gets AND'ed with 0xFF before being >> looked up. What am I missing? > > The pointer returned from the allocation is never checked, so it could > theoretically be NULL and be used regardless. > > Also I'm not sure that AND'ing with 0xff is the right thing to do here. Oops, there's a check for NULL missing. If that allocation fails, probe should fail, so we just need to propagate the error condition. Otherwise we might just leave the firewall off, and let everything go in unchecked. AND 0xff is necessary, because the same registers are mirrored in multiple contexts. AND removes the offset coming from context, and leaves just the plain register offset. > I'm not so sure. Caching should alleviate the issue with the increased > amount of data. Then there's the fact that previously we needed to > divide and compute the remainder of the division for the BIT_MASK and > BIT_WORD operations. Don't these get compiled into shifts and bitwise ands? > One other added benefit of this approach is that the address registers > are stored in a const array and therefore could reside in a read-only > memory region. With the previous approach, once somebody had access to > the bitmap, it could easily be overwritten with zeros and effectively > make the firewall useless. I'm not sure how likely that would be, but it > could be done. If somebody gets access to the bitmap, he has access to kernel data structures. GR2D firewall is the least of our worries in this case. Terje
On Mon, Oct 7, 2013 at 2:14 PM, Thierry Reding <thierry.reding@gmail.com> wrote: > On Mon, Oct 07, 2013 at 01:34:44PM +0200, Erik Faye-Lund wrote: >> On Mon, Oct 7, 2013 at 10:34 AM, Thierry Reding >> <thierry.reding@gmail.com> wrote: >> > Rework the address table code for the host1x firewall. The previous >> > implementation allocated a bitfield but didn't check for a valid pointer >> > so it could potentially crash. >> >> I don't think it could crash. The bitmaps was allocated as a 256-bit >> field, and the register offset gets AND'ed with 0xFF before being >> looked up. What am I missing? > > The pointer returned from the allocation is never checked, so it could > theoretically be NULL and be used regardless. > Right. Thanks for clarifying. > Also I'm not sure that AND'ing with 0xff is the right thing to do here. Well, maybe not. I'm not entirely convinced it's *not*, though. >> > Furthermore setting up a bitfield makes >> > the code more complex than it needs to be. >> >> Doesn't this perform worse than the current implementation? Going from >> 1 to 13 checks per write sounds less than ideal to me... > > I'm not so sure. Caching should alleviate the issue with the increased > amount of data. Then there's the fact that previously we needed to > divide and compute the remainder of the division for the BIT_MASK and > BIT_WORD operations. That's an AND and a shift, not division and modulo, both single-cycle instructions on ARM. I'm pretty sure that'd be a win. > One other added benefit of this approach is that the address registers > are stored in a const array and therefore could reside in a read-only > memory region. With the previous approach, once somebody had access to > the bitmap, it could easily be overwritten with zeros and effectively > make the firewall useless. I'm not sure how likely that would be, but it > could be done. Perhaps the bitmap could be generated compile-time and stuck in read-only memory? > I guess one could actually go and run a benchmark against both versions > and balance the performance impact against the possible security > implications. But given that we don't really have any benchmarks that's > pretty hard to do. Well, perhaps we could have both? :)
On Mon, Oct 7, 2013 at 2:53 PM, Erik Faye-Lund <kusmabite@gmail.com> wrote: > On Mon, Oct 7, 2013 at 2:14 PM, Thierry Reding <thierry.reding@gmail.com> wrote: >> On Mon, Oct 07, 2013 at 01:34:44PM +0200, Erik Faye-Lund wrote: >>> On Mon, Oct 7, 2013 at 10:34 AM, Thierry Reding >>> <thierry.reding@gmail.com> wrote: >>> > Rework the address table code for the host1x firewall. The previous >>> > implementation allocated a bitfield but didn't check for a valid pointer >>> > so it could potentially crash. >>> >>> I don't think it could crash. The bitmaps was allocated as a 256-bit >>> field, and the register offset gets AND'ed with 0xFF before being >>> looked up. What am I missing? >> >> The pointer returned from the allocation is never checked, so it could >> theoretically be NULL and be used regardless. >> >> Also I'm not sure that AND'ing with 0xff is the right thing to do here. > > Well, maybe not. I'm not entirely convinced it's *not*, though. > I'm sorry, I intended to fill out a bit more here before I hit send, and totally forgot. Point is, it seems only 0x00..0x4c is used (and then repeated at multiples of 0x4000 for each context, but the offsets don't have enough bits to reach that far), from looking at the TRM. So the question is really how the hardware treats writes to non-existent registers. My guess would be that they are simply not recorded, and if that's the case it doesn't matter what we do. And doing an unconditional AND is faster than doing a bit-test followed by a conditional branch. Of course, if the hardware does *not* ignore writes to non-existent register, then we might be in trouble. But for that, we'd probably want a full map of valid registers, not just saying that they aren't pointers, no?
On Mon, Oct 7, 2013 at 2:52 PM, Terje Bergström <tbergstrom@nvidia.com> wrote: > On 07.10.2013 15:14, Thierry Reding wrote: >> * PGP Signed by an unknown key >> >> On Mon, Oct 07, 2013 at 01:34:44PM +0200, Erik Faye-Lund wrote: >>> On Mon, Oct 7, 2013 at 10:34 AM, Thierry Reding >>> <thierry.reding@gmail.com> wrote: >>>> Rework the address table code for the host1x firewall. The previous >>>> implementation allocated a bitfield but didn't check for a valid pointer >>>> so it could potentially crash. >>> >>> I don't think it could crash. The bitmaps was allocated as a 256-bit >>> field, and the register offset gets AND'ed with 0xFF before being >>> looked up. What am I missing? >> >> The pointer returned from the allocation is never checked, so it could >> theoretically be NULL and be used regardless. >> >> Also I'm not sure that AND'ing with 0xff is the right thing to do here. > > Oops, there's a check for NULL missing. If that allocation fails, probe > should fail, so we just need to propagate the error condition. Otherwise > we might just leave the firewall off, and let everything go in unchecked. > > AND 0xff is necessary, because the same registers are mirrored in > multiple contexts. AND removes the offset coming from context, and > leaves just the plain register offset. > The offsets in the commands don't have enough bits to reach over to the next context. The contexts are repeated at multiples of 0x4000, and 0xFFF is the largest encodable offset. So I don't really thing the AND is needed for *that* purpose. > If somebody gets access to the bitmap, he has access to kernel data > structures. GR2D firewall is the least of our worries in this case. Indeed, that's only a problem if someone is already on the other side of the air-tight hatch.
On Mon, Oct 07, 2013 at 03:52:28PM +0300, Terje Bergström wrote: > On 07.10.2013 15:14, Thierry Reding wrote: > > * PGP Signed by an unknown key > > > > On Mon, Oct 07, 2013 at 01:34:44PM +0200, Erik Faye-Lund wrote: > >> On Mon, Oct 7, 2013 at 10:34 AM, Thierry Reding > >> <thierry.reding@gmail.com> wrote: > >>> Rework the address table code for the host1x firewall. The previous > >>> implementation allocated a bitfield but didn't check for a valid pointer > >>> so it could potentially crash. > >> > >> I don't think it could crash. The bitmaps was allocated as a 256-bit > >> field, and the register offset gets AND'ed with 0xFF before being > >> looked up. What am I missing? > > > > The pointer returned from the allocation is never checked, so it could > > theoretically be NULL and be used regardless. > > > > Also I'm not sure that AND'ing with 0xff is the right thing to do here. > > Oops, there's a check for NULL missing. If that allocation fails, probe > should fail, so we just need to propagate the error condition. Otherwise > we might just leave the firewall off, and let everything go in unchecked. Yes, definitely. > AND 0xff is necessary, because the same registers are mirrored in > multiple contexts. AND removes the offset coming from context, and > leaves just the plain register offset. That looks like something which should go into a comment. > > I'm not so sure. Caching should alleviate the issue with the increased > > amount of data. Then there's the fact that previously we needed to > > divide and compute the remainder of the division for the BIT_MASK and > > BIT_WORD operations. > > Don't these get compiled into shifts and bitwise ands? Yes, they are. > > One other added benefit of this approach is that the address registers > > are stored in a const array and therefore could reside in a read-only > > memory region. With the previous approach, once somebody had access to > > the bitmap, it could easily be overwritten with zeros and effectively > > make the firewall useless. I'm not sure how likely that would be, but it > > could be done. > > If somebody gets access to the bitmap, he has access to kernel data > structures. GR2D firewall is the least of our worries in this case. I see the point. Oh well, all my arguments are torn down. I'll drop this patch. Or at least the part that rewrites the gr2d_is_addr() and make it check for failed allocations properly. For what it's worth, I still think the plain table lookup is much easier to understand. Thierry
On 07.10.2013 16:05, Erik Faye-Lund wrote: > On Mon, Oct 7, 2013 at 2:52 PM, Terje Bergström <tbergstrom@nvidia.com> wrote: >> AND 0xff is necessary, because the same registers are mirrored in >> multiple contexts. AND removes the offset coming from context, and >> leaves just the plain register offset. > The offsets in the commands don't have enough bits to reach over to > the next context. The contexts are repeated at multiples of 0x4000, > and 0xFFF is the largest encodable offset. So I don't really thing the > AND is needed for *that* purpose. Well, that was embarrassing. Of course that is true. You can access all registers via MMIO or with SETCLASS to correct context. But that was still the reason I used when I wrote that 0xff, so it's bogus. Real fix is to do what Thierry already did: limit the bitmap to max register number we're interested in, and have a if() to fail fast if register number falls outside that range. Even better, get IOMMU up and running. Terje
On 07.10.2013 16:02, Erik Faye-Lund wrote: > So the question is really how the hardware treats writes to > non-existent registers. My guess would be that they are simply not > recorded, and if that's the case it doesn't matter what we do. And > doing an unconditional AND is faster than doing a bit-test followed by > a conditional branch. Hardware ignores writes to non-existent registers. Sometimes non-existent registers are taken into use in future versions, though. Terje
On Tue, Oct 8, 2013 at 7:48 AM, Terje Bergström <tbergstrom@nvidia.com> wrote: > On 07.10.2013 16:02, Erik Faye-Lund wrote: >> So the question is really how the hardware treats writes to >> non-existent registers. My guess would be that they are simply not >> recorded, and if that's the case it doesn't matter what we do. And >> doing an unconditional AND is faster than doing a bit-test followed by >> a conditional branch. > > Hardware ignores writes to non-existent registers. Sometimes > non-existent registers are taken into use in future versions, though. Right. That might be a motivation to treat non-existent registers as errors. But then I start to wonder about "holes" in the pointer address space should be treated differently. For instance, from the TRM it looks like registers 0x3 - 0x8, 0xe, 0x15 and 0x42 are undefined for T20. If writes to registers beyond 0x4c fails, shouldn't these also fail? Or are somehow guaranteed that these holes will never be plugged? Are they simply missing from the TRM? I dunno...
diff --git a/drivers/gpu/host1x/drm/gr2d.c b/drivers/gpu/host1x/drm/gr2d.c index 8d9a10f..07f0425 100644 --- a/drivers/gpu/host1x/drm/gr2d.c +++ b/drivers/gpu/host1x/drm/gr2d.c @@ -27,9 +27,8 @@ struct gr2d { struct host1x_client client; - struct clk *clk; struct host1x_channel *channel; - unsigned long *addr_regs; + struct clk *clk; }; static inline struct gr2d *to_gr2d(struct host1x_client *client) @@ -37,8 +36,6 @@ static inline struct gr2d *to_gr2d(struct host1x_client *client) return container_of(client, struct gr2d, client); } -static int gr2d_is_addr_reg(struct device *dev, u32 class, u32 reg); - static int gr2d_client_init(struct host1x_client *client, struct drm_device *drm) { @@ -56,7 +53,6 @@ static int gr2d_open_channel(struct host1x_client *client, struct gr2d *gr2d = to_gr2d(client); context->channel = host1x_channel_get(gr2d->channel); - if (!context->channel) return -ENOMEM; @@ -87,11 +83,38 @@ static struct host1x_bo *host1x_bo_lookup(struct drm_device *drm, return &bo->base; } +static const u32 gr2d_addr_regs[] = { + 0x1a, 0x1b, 0x26, 0x2b, 0x2c, 0x2d, 0x31, 0x32, + 0x48, 0x49, 0x4a, 0x4b, 0x4c +}; + +static int gr2d_is_addr_reg(struct device *dev, u32 class, u32 offset) +{ + unsigned int i; + + switch (class) { + case HOST1X_CLASS_HOST1X: + if (offset == 0x2b) + return 1; + + break; + + case HOST1X_CLASS_GR2D: + case HOST1X_CLASS_GR2D_SB: + for (i = 0; i < ARRAY_SIZE(gr2d_addr_regs); i++) + if (offset == gr2d_addr_regs[i]) + return 1; + + break; + } + + return 0; +} + static int gr2d_submit(struct tegra_drm_context *context, struct drm_tegra_submit *args, struct drm_device *drm, struct drm_file *file) { - struct host1x_job *job; unsigned int num_cmdbufs = args->num_cmdbufs; unsigned int num_relocs = args->num_relocs; unsigned int num_waitchks = args->num_waitchks; @@ -102,6 +125,7 @@ static int gr2d_submit(struct tegra_drm_context *context, struct drm_tegra_waitchk __user *waitchks = (void * __user)(uintptr_t)args->waitchks; struct drm_tegra_syncpt syncpt; + struct host1x_job *job; int err; /* We don't yet support other than one syncpt_incr struct per submit */ @@ -205,41 +229,6 @@ static struct host1x_client_ops gr2d_client_ops = { .submit = gr2d_submit, }; -static void gr2d_init_addr_reg_map(struct device *dev, struct gr2d *gr2d) -{ - const u32 gr2d_addr_regs[] = {0x1a, 0x1b, 0x26, 0x2b, 0x2c, 0x2d, 0x31, - 0x32, 0x48, 0x49, 0x4a, 0x4b, 0x4c}; - unsigned long *bitmap; - int i; - - bitmap = devm_kzalloc(dev, DIV_ROUND_UP(256, BITS_PER_BYTE), - GFP_KERNEL); - - for (i = 0; i < ARRAY_SIZE(gr2d_addr_regs); ++i) { - u32 reg = gr2d_addr_regs[i]; - bitmap[BIT_WORD(reg)] |= BIT_MASK(reg); - } - - gr2d->addr_regs = bitmap; -} - -static int gr2d_is_addr_reg(struct device *dev, u32 class, u32 reg) -{ - struct gr2d *gr2d = dev_get_drvdata(dev); - - switch (class) { - case HOST1X_CLASS_HOST1X: - return reg == 0x2b; - case HOST1X_CLASS_GR2D: - case HOST1X_CLASS_GR2D_SB: - reg &= 0xff; - if (gr2d->addr_regs[BIT_WORD(reg)] & BIT_MASK(reg)) - return 1; - default: - return 0; - } -} - static const struct of_device_id gr2d_match[] = { { .compatible = "nvidia,tegra30-gr2d" }, { .compatible = "nvidia,tegra20-gr2d" }, @@ -248,11 +237,11 @@ static const struct of_device_id gr2d_match[] = { static int gr2d_probe(struct platform_device *pdev) { + struct tegra_drm *tegra = host1x_get_drm_data(pdev->dev.parent); struct device *dev = &pdev->dev; - struct tegra_drm *tegra = host1x_get_drm_data(dev->parent); - int err; - struct gr2d *gr2d = NULL; struct host1x_syncpt **syncpts; + struct gr2d *gr2d; + int err; gr2d = devm_kzalloc(dev, sizeof(*gr2d), GFP_KERNEL); if (!gr2d) @@ -278,8 +267,8 @@ static int gr2d_probe(struct platform_device *pdev) if (!gr2d->channel) return -ENOMEM; - *syncpts = host1x_syncpt_request(dev, false); - if (!(*syncpts)) { + syncpts[0] = host1x_syncpt_request(dev, false); + if (!syncpts[0]) { host1x_channel_free(gr2d->channel); return -ENOMEM; } @@ -296,14 +285,12 @@ static int gr2d_probe(struct platform_device *pdev) return err; } - gr2d_init_addr_reg_map(dev, gr2d); - platform_set_drvdata(pdev, gr2d); return 0; } -static int __exit gr2d_remove(struct platform_device *pdev) +static int gr2d_remove(struct platform_device *pdev) { struct tegra_drm *tegra = host1x_get_drm_data(pdev->dev.parent); struct gr2d *gr2d = platform_get_drvdata(pdev); @@ -312,7 +299,8 @@ static int __exit gr2d_remove(struct platform_device *pdev) err = host1x_unregister_client(tegra, &gr2d->client); if (err < 0) { - dev_err(&pdev->dev, "failed to unregister client: %d\n", err); + dev_err(&pdev->dev, "failed to unregister host1x client: %d\n", + err); return err; } @@ -326,11 +314,10 @@ static int __exit gr2d_remove(struct platform_device *pdev) } struct platform_driver tegra_gr2d_driver = { - .probe = gr2d_probe, - .remove = __exit_p(gr2d_remove), .driver = { - .owner = THIS_MODULE, .name = "gr2d", .of_match_table = gr2d_match, - } + }, + .probe = gr2d_probe, + .remove = gr2d_remove, };
Rework the address table code for the host1x firewall. The previous implementation allocated a bitfield but didn't check for a valid pointer so it could potentially crash. Furthermore setting up a bitfield makes the code more complex than it needs to be. Don't annotate the driver's .remove() function __exit. Even if built in the driver can be unloaded via sysfs, so .remove() needs to stick around after initialization. Also remove the explicit initialization of the driver's .owner field to THIS_MODULE because that's now handled by the driver core. Furthermore make an error message more consistent with other subdrivers, index the syncpts array for better readability, remove a gratuituous newline and reorder some variable declarations to make the code easier to read. Signed-off-by: Thierry Reding <treding@nvidia.com> --- drivers/gpu/host1x/drm/gr2d.c | 95 +++++++++++++++++++------------------------ 1 file changed, 41 insertions(+), 54 deletions(-)