Message ID | 20200221143820.2795039-1-chris@chris-wilson.co.uk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | dma-buf: Precheck for a valid dma_fence before acquiring the reference | expand |
On Fri, Feb 21, 2020 at 3:38 PM Chris Wilson <chris@chris-wilson.co.uk> wrote: > dma_fence_get_rcu() is used to acquire a reference to under a dma-fence > under racey conditions -- a perfect recipe for a disaster. As we know > the caller may be handling stale memory, use kasan to confirm the > dma-fence, or rather its memory block, is valid before attempting to > acquire a reference. This should help us to more quickly and clearly > identify lost races. Hm ... I'm a bit lost on the purpose, and what this does. Fences need to be rcu-freed, and I have honestly no idea how kasan treats those. Are we throwing false positives, because kasan thinks the stuff is freed, but we're still accessing it (while the grace period hasn't passed, so anything freed is still guaranteed to be at least in the slab cache somewhere). I'm not seeing how this catches lost races quicker, since the refcount should get to 0 way before we get to the kfree. So the refcount check on the next line should catch strictly more races than the kasan check. -Daniel > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch> > --- > include/linux/dma-fence.h | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h > index 3347c54f3a87..2805edd74738 100644 > --- a/include/linux/dma-fence.h > +++ b/include/linux/dma-fence.h > @@ -301,6 +301,9 @@ static inline struct dma_fence *dma_fence_get(struct dma_fence *fence) > */ > static inline struct dma_fence *dma_fence_get_rcu(struct dma_fence *fence) > { > + if (unlikely(!kasan_check_read(fence, sizeof(*fence)))) > + return NULL; > + > if (kref_get_unless_zero(&fence->refcount)) > return fence; > else > -- > 2.25.1 >
Quoting Daniel Vetter (2020-02-21 15:17:24) > On Fri, Feb 21, 2020 at 3:38 PM Chris Wilson <chris@chris-wilson.co.uk> wrote: > > dma_fence_get_rcu() is used to acquire a reference to under a dma-fence > > under racey conditions -- a perfect recipe for a disaster. As we know > > the caller may be handling stale memory, use kasan to confirm the > > dma-fence, or rather its memory block, is valid before attempting to > > acquire a reference. This should help us to more quickly and clearly > > identify lost races. > > Hm ... I'm a bit lost on the purpose, and what this does. Fences need > to be rcu-freed, and I have honestly no idea how kasan treats those. > Are we throwing false positives, because kasan thinks the stuff is > freed, but we're still accessing it (while the grace period hasn't > passed, so anything freed is still guaranteed to be at least in the > slab cache somewhere). > > I'm not seeing how this catches lost races quicker, since the refcount > should get to 0 way before we get to the kfree. So the refcount check > on the next line should catch strictly more races than the kasan > check. It's not about the fence itself, but those pointing to the fence. That's where we may find garbage, and by returning NULL the kernel keeps working for a bit longer as you try to piece together the race. -Chris
Quoting Chris Wilson (2020-02-21 15:23:38) > Quoting Daniel Vetter (2020-02-21 15:17:24) > > On Fri, Feb 21, 2020 at 3:38 PM Chris Wilson <chris@chris-wilson.co.uk> wrote: > > > dma_fence_get_rcu() is used to acquire a reference to under a dma-fence > > > under racey conditions -- a perfect recipe for a disaster. As we know > > > the caller may be handling stale memory, use kasan to confirm the > > > dma-fence, or rather its memory block, is valid before attempting to > > > acquire a reference. This should help us to more quickly and clearly > > > identify lost races. > > > > Hm ... I'm a bit lost on the purpose, and what this does. Fences need > > to be rcu-freed, and I have honestly no idea how kasan treats those. > > Are we throwing false positives, because kasan thinks the stuff is > > freed, but we're still accessing it (while the grace period hasn't > > passed, so anything freed is still guaranteed to be at least in the > > slab cache somewhere). > > > > I'm not seeing how this catches lost races quicker, since the refcount > > should get to 0 way before we get to the kfree. So the refcount check > > on the next line should catch strictly more races than the kasan > > check. > > It's not about the fence itself, but those pointing to the fence. That's > where we may find garbage, and by returning NULL the kernel keeps > working for a bit longer as you try to piece together the race. Plus given all the inlining, the kasan warning for the refcount is not that clear about where it is called from, which is a bit of a nuisance, so an explicit warning was much easier for finding the culprit. -Chris
On Fri, Feb 21, 2020 at 4:26 PM Chris Wilson <chris@chris-wilson.co.uk> wrote: > > Quoting Chris Wilson (2020-02-21 15:23:38) > > Quoting Daniel Vetter (2020-02-21 15:17:24) > > > On Fri, Feb 21, 2020 at 3:38 PM Chris Wilson <chris@chris-wilson.co.uk> wrote: > > > > dma_fence_get_rcu() is used to acquire a reference to under a dma-fence > > > > under racey conditions -- a perfect recipe for a disaster. As we know > > > > the caller may be handling stale memory, use kasan to confirm the > > > > dma-fence, or rather its memory block, is valid before attempting to > > > > acquire a reference. This should help us to more quickly and clearly > > > > identify lost races. > > > > > > Hm ... I'm a bit lost on the purpose, and what this does. Fences need > > > to be rcu-freed, and I have honestly no idea how kasan treats those. > > > Are we throwing false positives, because kasan thinks the stuff is > > > freed, but we're still accessing it (while the grace period hasn't > > > passed, so anything freed is still guaranteed to be at least in the > > > slab cache somewhere). > > > > > > I'm not seeing how this catches lost races quicker, since the refcount > > > should get to 0 way before we get to the kfree. So the refcount check > > > on the next line should catch strictly more races than the kasan > > > check. > > > > It's not about the fence itself, but those pointing to the fence. That's > > where we may find garbage, and by returning NULL the kernel keeps > > working for a bit longer as you try to piece together the race. > > Plus given all the inlining, the kasan warning for the refcount is not > that clear about where it is called from, which is a bit of a nuisance, > so an explicit warning was much easier for finding the culprit. Hm ok, that makes a lot more sense. Can you pls amend the commit message (paste the above if you feel uninspired, this discussion explains pretty good what's the point). And maybe a link to bug report or include an example splat and how it's improved. With that stuff added Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> But maybe get another co-conspirator/bug-hunter to ack this too. -Daniel
diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h index 3347c54f3a87..2805edd74738 100644 --- a/include/linux/dma-fence.h +++ b/include/linux/dma-fence.h @@ -301,6 +301,9 @@ static inline struct dma_fence *dma_fence_get(struct dma_fence *fence) */ static inline struct dma_fence *dma_fence_get_rcu(struct dma_fence *fence) { + if (unlikely(!kasan_check_read(fence, sizeof(*fence)))) + return NULL; + if (kref_get_unless_zero(&fence->refcount)) return fence; else
dma_fence_get_rcu() is used to acquire a reference to under a dma-fence under racey conditions -- a perfect recipe for a disaster. As we know the caller may be handling stale memory, use kasan to confirm the dma-fence, or rather its memory block, is valid before attempting to acquire a reference. This should help us to more quickly and clearly identify lost races. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> --- include/linux/dma-fence.h | 3 +++ 1 file changed, 3 insertions(+)