Message ID | 53CE6FB0.90500@canonical.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
> No, you really shouldn't be doing much in the check anyway, it's meant to be a lightweight check. If you're not ready yet because of a lockup simply return not signaled yet. It's not only the lockup case from radeon I have in mind here. For userspace queues it might be necessary to call copy_from_user to figure out if a fence is signaled or not. Returning false all the time is probably not a good idea either. Christian. Am 22.07.2014 16:05, schrieb Maarten Lankhorst: > op 22-07-14 14:19, Christian König schreef: >> Am 22.07.2014 13:57, schrieb Daniel Vetter: >>> On Tue, Jul 22, 2014 at 01:46:07PM +0200, Daniel Vetter wrote: >>>> On Tue, Jul 22, 2014 at 10:43:13AM +0200, Christian König wrote: >>>>> Am 22.07.2014 06:05, schrieb Dave Airlie: >>>>>> On 9 July 2014 22:29, Maarten Lankhorst <maarten.lankhorst@canonical.com> wrote: >>>>>>> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com> >>>>>>> --- >>>>>>> drivers/gpu/drm/radeon/radeon.h | 15 +- >>>>>>> drivers/gpu/drm/radeon/radeon_device.c | 60 ++++++++- >>>>>>> drivers/gpu/drm/radeon/radeon_fence.c | 223 ++++++++++++++++++++++++++------ >>>>>>> 3 files changed, 248 insertions(+), 50 deletions(-) >>>>>>> >>>>>> From what I can see this is still suffering from the problem that we >>>>>> need to find a proper solution to, >>>>>> >>>>>> My summary of the issues after talking to Jerome and Ben and >>>>>> re-reading things is: >>>>>> >>>>>> We really need to work out a better interface into the drivers to be >>>>>> able to avoid random atomic entrypoints, >>>>> Which is exactly what I criticized from the very first beginning. Good to >>>>> know that I'm not the only one thinking that this isn't such a good idea. >>>> I guess I've lost context a bit, but which atomic entry point are we >>>> talking about? Afaics the only one that's mandatory is the is >>>> fence->signaled callback to check whether a fence really has been >>>> signalled. It's used internally by the fence code to avoid spurious >>>> wakeups. Afaik that should be doable already on any hardware. If that's >>>> not the case then we can always track the signalled state in software and >>>> double-check in a worker thread before updating the sw state. And wrap >>>> this all up into a special fence class if there's more than one driver >>>> needing this. >>> One thing I've forgotten: The i915 scheduler that's floating around runs >>> its bottom half from irq context. So I really want to be able to check >>> fence state from irq context and I also want to make it possible >>> (possible! not mandatory) to register callbacks which are run from any >>> context asap after the fence is signalled. >> NAK, that's just the bad design I've talked about. Checking fence state inside the same driver from interrupt context is OK, because it's the drivers interrupt that we are talking about here. >> >> Checking fence status from another drivers interrupt context is what really concerns me here, cause your driver doesn't have the slightest idea if the called driver is really capable of checking the fence right now. > I think there is a usecase for having atomic context allowed with fence_is_signaled, but I don't think there is one for interrupt context, so it's good with me if fence_is_signaled cannot be called in interrupt context, or with irqs disabled. > > fence_enable_sw_signaling disables interrupts because it holds fence->lock, so in theory it could be called from any context including interrupts. But no sane driver author does that, or at least I hope not.. > > Would a sanity check like the one below be enough to allay your fears? > 8<------- > > diff --git a/include/linux/fence.h b/include/linux/fence.h > index d174585b874b..c1a4519ba2f5 100644 > --- a/include/linux/fence.h > +++ b/include/linux/fence.h > @@ -143,6 +143,7 @@ struct fence_cb { > * the second time will be a noop since it was already signaled. > * > * Notes on signaled: > + * Called with interrupts enabled, and never from interrupt context. > * May set fence->status if returning true. > * > * Notes on wait: > @@ -268,15 +269,29 @@ fence_is_signaled_locked(struct fence *fence) > static inline bool > fence_is_signaled(struct fence *fence) > { > + bool ret; > + > if (test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->flags)) > return true; > > - if (fence->ops->signaled && fence->ops->signaled(fence)) { > + if (!fence->ops->signaled) > + return false; > + > + if (config_enabled(CONFIG_PROVE_LOCKING)) > + WARN_ON(in_interrupt() || irqs_disabled()); > + > + if (config_enabled(CONFIG_DEBUG_ATOMIC_SLEEP)) > + preempt_disable(); > + > + ret = fence->ops->signaled(fence); > + > + if (config_enabled(CONFIG_DEBUG_ATOMIC_SLEEP)) > + preempt_enable(); > + > + if (ret) > fence_signal(fence); > - return true; > - } > > - return false; > + return ret; > } > > /** > 8<-------- > >>> If the radeon hw/driver doesn't want to cope with that complexity we can >>> fully insolate it with the sw tracked fence state if you don't like >>> Maarten's radeon implementation. But forcing everyone to forgoe this just >>> because you don't like it and don't want to use it in radeon doesn't sound >>> right. >> While it's clearly a hack Maarten's solution for radeon would indeed work, but that's not really the point here. >> >> It's just that I think leaking interrupt context from one driver into another driver is just a really really bad idea from a design point of view. >> >> And calling into a driver while in atomic context to check for a fence being signaled doesn't sounds like a good idea either, cause that limits way to much what the called driver can do for checking the status of a fence. > No, you really shouldn't be doing much in the check anyway, it's meant to be a lightweight check. If you're not ready yet because of a lockup simply return not signaled yet. > > ~Maarten >
op 22-07-14 16:24, Christian König schreef: >> No, you really shouldn't be doing much in the check anyway, it's meant to be a lightweight check. If you're not ready yet because of a lockup simply return not signaled yet. > It's not only the lockup case from radeon I have in mind here. For userspace queues it might be necessary to call copy_from_user to figure out if a fence is signaled or not. > > Returning false all the time is probably not a good idea either. Having userspace implement a fence sounds like an awful idea, why would you want to do that? A fence could be exported to userspace, but that would only mean it can wait for it to be signaled with an interface like poll.. ~Maarten
Am 22.07.2014 16:27, schrieb Maarten Lankhorst: > op 22-07-14 16:24, Christian König schreef: >>> No, you really shouldn't be doing much in the check anyway, it's meant to be a lightweight check. If you're not ready yet because of a lockup simply return not signaled yet. >> It's not only the lockup case from radeon I have in mind here. For userspace queues it might be necessary to call copy_from_user to figure out if a fence is signaled or not. >> >> Returning false all the time is probably not a good idea either. > Having userspace implement a fence sounds like an awful idea, why would you want to do that? Marketing moves in mysterious ways. Don't ask me, but that the direction it currently moves with userspace queues and IOMMU etc... > A fence could be exported to userspace, but that would only mean it can wait for it to be signaled with an interface like poll.. Yeah agree totally, but the point for the fence interface is that I can't predict what's necessary to check if a fence is signaled or not on future hardware. For the currently available radeon hardware I can say that reading a value from a kernel page is pretty much all you need. But for older hardware that was reading from a register which might become very tricky if the hardware is power off or currently inside a reset cycle. Because off this I would avoid any such interface if it's not absolutely required by some use case, and currently I don't see this requirement because the functionality you want to archive could be implemented without this. Christian. > > ~Maarten >
op 22-07-14 16:39, Christian König schreef: > Am 22.07.2014 16:27, schrieb Maarten Lankhorst: >> op 22-07-14 16:24, Christian König schreef: >>>> No, you really shouldn't be doing much in the check anyway, it's meant to be a lightweight check. If you're not ready yet because of a lockup simply return not signaled yet. >>> It's not only the lockup case from radeon I have in mind here. For userspace queues it might be necessary to call copy_from_user to figure out if a fence is signaled or not. >>> >>> Returning false all the time is probably not a good idea either. >> Having userspace implement a fence sounds like an awful idea, why would you want to do that? > > Marketing moves in mysterious ways. Don't ask me, but that the direction it currently moves with userspace queues and IOMMU etc... > >> A fence could be exported to userspace, but that would only mean it can wait for it to be signaled with an interface like poll.. > > Yeah agree totally, but the point for the fence interface is that I can't predict what's necessary to check if a fence is signaled or not on future hardware. > > For the currently available radeon hardware I can say that reading a value from a kernel page is pretty much all you need. But for older hardware that was reading from a register which might become very tricky if the hardware is power off or currently inside a reset cycle. > > Because off this I would avoid any such interface if it's not absolutely required by some use case, and currently I don't see this requirement because the functionality you want to archive could be implemented without this. Oh? I've already done that in radeon_fence, there is no way enable_signaling will fiddle with hardware registers during a reset cycle. I've also made sure that __radeon_fence_is_signaled grabs exclusive_lock in read mode before touching any hw state. Older hardware also doesn't implement optimus, so I think power off is not much of a worry for them, if you could point me at the checking done for that I could make sure that this is the case. ~Maarten
Am 22.07.2014 16:47, schrieb Maarten Lankhorst: > op 22-07-14 16:39, Christian König schreef: >> Am 22.07.2014 16:27, schrieb Maarten Lankhorst: >>> op 22-07-14 16:24, Christian König schreef: >>>>> No, you really shouldn't be doing much in the check anyway, it's meant to be a lightweight check. If you're not ready yet because of a lockup simply return not signaled yet. >>>> It's not only the lockup case from radeon I have in mind here. For userspace queues it might be necessary to call copy_from_user to figure out if a fence is signaled or not. >>>> >>>> Returning false all the time is probably not a good idea either. >>> Having userspace implement a fence sounds like an awful idea, why would you want to do that? >> Marketing moves in mysterious ways. Don't ask me, but that the direction it currently moves with userspace queues and IOMMU etc... >> >>> A fence could be exported to userspace, but that would only mean it can wait for it to be signaled with an interface like poll.. >> Yeah agree totally, but the point for the fence interface is that I can't predict what's necessary to check if a fence is signaled or not on future hardware. >> >> For the currently available radeon hardware I can say that reading a value from a kernel page is pretty much all you need. But for older hardware that was reading from a register which might become very tricky if the hardware is power off or currently inside a reset cycle. >> >> Because off this I would avoid any such interface if it's not absolutely required by some use case, and currently I don't see this requirement because the functionality you want to archive could be implemented without this. > Oh? I've already done that in radeon_fence, there is no way enable_signaling will fiddle with hardware registers during a reset cycle. > I've also made sure that __radeon_fence_is_signaled grabs exclusive_lock in read mode before touching any hw state. > > Older hardware also doesn't implement optimus, so I think power off is not much of a worry for them, if you could point me at the checking done for that I could make sure that this is the case. I'm not talking about any specific radeon hardware or use case here. As far as I can see you indeed solved all driver problems with the current interface design. The question I'm raising is if the current interface design needs as complex as it is. And my answer to this is a clear *no*, so why do you want to stick with this design? I still haven't understood that. If it's just to support a further feature of direct synchronization in interrupt context between different drivers then I must clearly say that this is a NAK, cause you add complexity to the kernel that isn't necessary. Christian. > > ~Maarten >
On Tue, Jul 22, 2014 at 4:39 PM, Christian König <christian.koenig@amd.com> wrote: > Am 22.07.2014 16:27, schrieb Maarten Lankhorst: > >> op 22-07-14 16:24, Christian König schreef: >>>> >>>> No, you really shouldn't be doing much in the check anyway, it's meant >>>> to be a lightweight check. If you're not ready yet because of a lockup >>>> simply return not signaled yet. >>> >>> It's not only the lockup case from radeon I have in mind here. For >>> userspace queues it might be necessary to call copy_from_user to figure out >>> if a fence is signaled or not. >>> >>> Returning false all the time is probably not a good idea either. >> >> Having userspace implement a fence sounds like an awful idea, why would >> you want to do that? > > > Marketing moves in mysterious ways. Don't ask me, but that the direction it > currently moves with userspace queues and IOMMU etc... Fence-based syncing between userspace queues submitted stuff through doorbells and anything submitted by the general simply wont work. Which is why I think the doorbell is a stupid interface since I just don't see cameras and v4l devices implementing all that complexity to get a pure userspace side sync solution. But that's a different problem really, and I guess marketing will eventually figure this one out, too. -Daniel
On Tue, Jul 22, 2014 at 11:19 AM, Daniel Vetter <daniel@ffwll.ch> wrote: > On Tue, Jul 22, 2014 at 4:39 PM, Christian König > <christian.koenig@amd.com> wrote: >> Am 22.07.2014 16:27, schrieb Maarten Lankhorst: >> >>> op 22-07-14 16:24, Christian König schreef: >>>>> >>>>> No, you really shouldn't be doing much in the check anyway, it's meant >>>>> to be a lightweight check. If you're not ready yet because of a lockup >>>>> simply return not signaled yet. >>>> >>>> It's not only the lockup case from radeon I have in mind here. For >>>> userspace queues it might be necessary to call copy_from_user to figure out >>>> if a fence is signaled or not. >>>> >>>> Returning false all the time is probably not a good idea either. >>> >>> Having userspace implement a fence sounds like an awful idea, why would >>> you want to do that? >> >> >> Marketing moves in mysterious ways. Don't ask me, but that the direction it >> currently moves with userspace queues and IOMMU etc... > > Fence-based syncing between userspace queues submitted stuff through > doorbells and anything submitted by the general simply wont work. > Which is why I think the doorbell is a stupid interface since I just > don't see cameras and v4l devices implementing all that complexity to > get a pure userspace side sync solution. > Like it or not this is what a lot of application writers want (look at mantle and metal and similar new APIs or android synpts). Having queues and fences in userspace allows the application to structure things to best fit their own task graphs. The app can decide how to deal with dependencies and synchronization explicitly instead of blocking the queues in the kernel for everyone. Anyway, this is getting off topic. Alex
On Tue, Jul 22, 2014 at 5:42 PM, Alex Deucher <alexdeucher@gmail.com> wrote: >> Fence-based syncing between userspace queues submitted stuff through >> doorbells and anything submitted by the general simply wont work. >> Which is why I think the doorbell is a stupid interface since I just >> don't see cameras and v4l devices implementing all that complexity to >> get a pure userspace side sync solution. >> > > Like it or not this is what a lot of application writers want (look at > mantle and metal and similar new APIs or android synpts). Having > queues and fences in userspace allows the application to structure > things to best fit their own task graphs. The app can decide how to > deal with dependencies and synchronization explicitly instead of > blocking the queues in the kernel for everyone. Anyway, this is > getting off topic. Well there's explicit fences as used in opencl and android syncpts. My plan is actually to support that in i915 using Maarten's struct fence stuff (and there's just a very trivial patch for the android stuff in merging needed to get there). What doesn't work is fences created behind the kernel's back purely in userspace by giving shared memory locations special meaning. Those get the kernel completely out of the picture (as opposed to android syncpts, which just make sync explicit). I guess long-term we might need something like gpu futexes to make that pure userspace syncing integrate a bit better, but imo that's (at least for now) out of scope. For fences here I have the goal of one internally representation used by both implicit syncing (dma-buf on classic linux, e.g. prime) and explicit fencing on android or opencl or something like that. We don't have the code yet ready, but that's the direction i915 will move towards for the near future. Jesse is working on some patches already. -Daniel
On Tue, 22 Jul 2014 17:48:18 +0200 Daniel Vetter <daniel@ffwll.ch> wrote: > On Tue, Jul 22, 2014 at 5:42 PM, Alex Deucher <alexdeucher@gmail.com> wrote: > >> Fence-based syncing between userspace queues submitted stuff through > >> doorbells and anything submitted by the general simply wont work. > >> Which is why I think the doorbell is a stupid interface since I just > >> don't see cameras and v4l devices implementing all that complexity to > >> get a pure userspace side sync solution. > >> > > > > Like it or not this is what a lot of application writers want (look at > > mantle and metal and similar new APIs or android synpts). Having > > queues and fences in userspace allows the application to structure > > things to best fit their own task graphs. The app can decide how to > > deal with dependencies and synchronization explicitly instead of > > blocking the queues in the kernel for everyone. Anyway, this is > > getting off topic. > > Well there's explicit fences as used in opencl and android syncpts. My > plan is actually to support that in i915 using Maarten's struct fence > stuff (and there's just a very trivial patch for the android stuff in > merging needed to get there). What doesn't work is fences created > behind the kernel's back purely in userspace by giving shared memory > locations special meaning. Those get the kernel completely out of the > picture (as opposed to android syncpts, which just make sync > explicit). > > I guess long-term we might need something like gpu futexes to make > that pure userspace syncing integrate a bit better, but imo that's (at > least for now) out of scope. For fences here I have the goal of one Yeah, with a little kernel help you could have a mostly kernel independent sync mechanism using just shared mem in userspace. The kernel would just need to signal any interested clients when something happened (even if it didn't know what) and let userspace sort out the rest. I think that would be a nice thing to provide at some point, as it could allow for some fine grained CPU/GPU algorithms that use lightweight synchronization with and without busy looping on the CPU side. But all of that is definitely a lower priority than getting explicit fencing exported to userspace to work right, both for intra-driver sync and inter-driver sync. > internally representation used by both implicit syncing (dma-buf on > classic linux, e.g. prime) and explicit fencing on android or opencl > or something like that. > > We don't have the code yet ready, but that's the direction i915 will > move towards for the near future. Jesse is working on some patches > already. Yeah I'd like to get some feedback from Maarten on my bits so I can get them ready for upstream. I still need to add documentation and tests, but I'd like to make sure the interfaces and internals get acked first. Thanks,
On Tue, Jul 22, 2014 at 9:14 PM, Jesse Barnes <jbarnes@virtuousgeek.org> wrote: >> We don't have the code yet ready, but that's the direction i915 will >> move towards for the near future. Jesse is working on some patches >> already. > > Yeah I'd like to get some feedback from Maarten on my bits so I can get > them ready for upstream. I still need to add documentation and tests, > but I'd like to make sure the interfaces and internals get acked first. Review works better if you supply a pointer to the patches ;-) I asked Maarten whether he looked at it and he said he didn't know where ... -Daniel
On Wed, 23 Jul 2014 11:47:15 +0200 Daniel Vetter <daniel@ffwll.ch> wrote: > On Tue, Jul 22, 2014 at 9:14 PM, Jesse Barnes <jbarnes@virtuousgeek.org> wrote: > >> We don't have the code yet ready, but that's the direction i915 will > >> move towards for the near future. Jesse is working on some patches > >> already. > > > > Yeah I'd like to get some feedback from Maarten on my bits so I can get > > them ready for upstream. I still need to add documentation and tests, > > but I'd like to make sure the interfaces and internals get acked first. > > Review works better if you supply a pointer to the patches ;-) I asked > Maarten whether he looked at it and he said he didn't know where ... Oh I provided it in IRC earlier, figured Maarten was just busy. :) Tree is android-dma-buf-i915-fences in my fdo linux repo.
diff --git a/include/linux/fence.h b/include/linux/fence.h index d174585b874b..c1a4519ba2f5 100644 --- a/include/linux/fence.h +++ b/include/linux/fence.h @@ -143,6 +143,7 @@ struct fence_cb { * the second time will be a noop since it was already signaled. * * Notes on signaled: + * Called with interrupts enabled, and never from interrupt context. * May set fence->status if returning true. * * Notes on wait: @@ -268,15 +269,29 @@ fence_is_signaled_locked(struct fence *fence) static inline bool fence_is_signaled(struct fence *fence) { + bool ret; + if (test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->flags)) return true; - if (fence->ops->signaled && fence->ops->signaled(fence)) { + if (!fence->ops->signaled) + return false; + + if (config_enabled(CONFIG_PROVE_LOCKING)) + WARN_ON(in_interrupt() || irqs_disabled()); + + if (config_enabled(CONFIG_DEBUG_ATOMIC_SLEEP)) + preempt_disable(); + + ret = fence->ops->signaled(fence); + + if (config_enabled(CONFIG_DEBUG_ATOMIC_SLEEP)) + preempt_enable(); + + if (ret) fence_signal(fence); - return true; - } - return false; + return ret; } /**