Message ID | 20190220164200.31044-1-aaron.ma@canonical.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [1/2] Input: synaptics-rmi4 - clear irqs before set irqs | expand |
On Wed, 2019-02-20 at 17:41 +0100, Aaron Ma wrote: > CAUTION: Email originated externally, do not click links or open > attachments unless you recognize the sender and know the content is > safe. > > > rmi4 got spam data after S3 resume on some ThinkPads. > Then TrackPoint lost when be detected by psmouse. > Clear irqs status before set irqs will make TrackPoint back. > > BugLink: > https://urldefense.proofpoint.com/v2/url?u=https-3A__bugs.launchpad.net_bugs_1791427&d=DwIBAg&c=7dfBJ8cXbWjhc0BhImu8wQ&r=veOxv1_7HLXIaWG-OKLqp-qvZc3r7ucT1d-68JSWqpM&m=3nNm4ob6G1wtf502YFuxorJVkSQvdBasje2RrZLxhTM&s=Z0nHSPAKhQLzdoENAZBYuDC6QmZNOliyiE7h1OOVkBA&e= > Cc: <stable@vger.kernel.org> > Signed-off-by: Aaron Ma <aaron.ma@canonical.com> > --- > drivers/input/rmi4/rmi_driver.c | 11 +++++++++++ > 1 file changed, 11 insertions(+) > > diff --git a/drivers/input/rmi4/rmi_driver.c > b/drivers/input/rmi4/rmi_driver.c > index fc3ab93b7aea..20631b272f43 100644 > --- a/drivers/input/rmi4/rmi_driver.c > +++ b/drivers/input/rmi4/rmi_driver.c > @@ -374,6 +374,17 @@ static int rmi_driver_set_irq_bits(struct > rmi_device *rmi_dev, > struct device *dev = &rmi_dev->dev; > > mutex_lock(&data->irq_mutex); > + > + /* Dummy read in order to clear irqs */ > + error = rmi_read_block(rmi_dev, > + data->f01_container->fd.data_base_addr + 1, > + data->irq_status, data->num_of_irq_regs); > + if (error < 0) { > + dev_err(dev, "%s: Failed to read interrupt status!", > + __func__); > + goto error_unlock; > + } > + > bitmap_or(data->new_irq_mask, > data->current_irq_mask, mask, data->irq_count); > > -- > 2.17.1 > Sorry for the long delay in following up on this. I'm not sure this is a safe action, due to a race condition with the actual IRQ handler (rmi_process_interrupt_requests from rmi_driver.c). Remember that reading the IRQ status register clears all the IRQ bits. So you're faced with this possible scenario: - ATTN asserted, indicating new data in IRQ status register - rmi_driver_set_irq_bits called - rmi_driver_set_irq_bits reads IRQ status, clearing bits - rmi_process_interrupt_requests called - rmi_process_interrupt_request reads IRQ status, sees no bits set, nested IRQs are not handled This could lead to loss of data or inconsistent system state information. For example, a button up or down event may be lost, with consequent weird behavior by the user interface. CAVEAT: I haven't had much to do with the RMI4 driver for a long while, and am just starting to poke around with it again. I am not very familiar with the current IRQ handling implementation. Perhaps there is a guarantee in the kernel IRQ mechanism that once ATTN is asserted, then rmi_process_interrupt_requests will be called before anyone else gets a chance to read the IRQ status register. If that's the case, let me know I'm worried about nothing, and ignore this comment. Cheers, Chris
Hi, On 3/9/19 7:13 AM, Christopher Heiny wrote: > I'm not sure this is a safe action, due to a race condition with the > actual IRQ handler (rmi_process_interrupt_requests from rmi_driver.c). > Remember that reading the IRQ status register clears all the IRQ bits. > So you're faced with this possible scenario: > - ATTN asserted, indicating new data in IRQ status register > - rmi_driver_set_irq_bits called > - rmi_driver_set_irq_bits reads IRQ status, clearing bits > - rmi_process_interrupt_requests called > - rmi_process_interrupt_request reads IRQ status, sees no > bits set, nested IRQs are not handled > This could lead to loss of data or inconsistent system state > information. For example, a button up or down event may be lost, with > consequent weird behavior by the user interface. rmi_driver_set_irq_bits is only called to config and enable specific functions of RMI. Reading IRQ status before set irqs is supposed to clear spam data/irq status. spam data make probe/detect touchpad/trackpoint fail. rmi_smb_resume -> rmi_driver_reset_handler -> fn-config -> clear_irq_bits -> set_irq_bits -> enable_irq -> irq_handler -> rmi_process_interrupt_requests set_irq_bits will not be in interrupt context, it enables IRQ bits of RMI. Regards, Aaron
Hi Dmitry and Chiristopher: Do you have any suggestion about these 2 patches? Many users confirmed that they fixed issues of Trackpoint/Touchpad after S3. Will you consider them be accepted? Thanks, Aaron
On Thu, 2019-03-28 at 14:02 +0800, Aaron Ma wrote: > Hi Dmitry and Chiristopher: > > Do you have any suggestion about these 2 patches? > > Many users confirmed that they fixed issues of Trackpoint/Touchpad > after S3. > > Will you consider them be accepted? Hi Aaron, Sorry - I thought I'd replied on the NO SLEEP portion of these patches, but looking back I don't find the draft or the sent email. Sorry about that. I'll summarize here what I wrote last month. This isn't so much a "fix" as a "hacky workaround" for the issue. From the descriptions in the bug you linked in your original patch submission, it appears that the root cause is somewhere in SMBus system (could be SMBus driver, SMBus hardware, or the devices on the SMBus (touch devices or other devices) - it's hard to tell with the info available), where the SMBus is failing to come online correctly coming out of S3 state. Anyway, this patch doesn't fix the root cause, but merely works around it. Setting the NO SLEEP bit will force the touch sensor to remain in a high power consumption state while the rest of the system is in S3. While not a lot of power compared to things like the CPU, display, and others, it is still non-trivial and will result in shorter time-on- battery capability. If you have access to the power pin(s) for the touch sensor(s)/styk(s), it might be interesting to try turning power off on entering S3, and restoring it on exit. That's very hacky, and has the side effect of slightly delaying touchpad readiness on exit from S3. Plus you'll need to restore touch sensor configuration settings on exit. But it definitely reduces power consumption. Separately, I am still concerned about the possibility of dropped touch events in the IRQ clearing. I'm not convinced that the code is safe (as you mentioned in your reply to my earlier comment), so I'll have to study the implementation more carefully. Cheers, Chris > > Thanks, > Aaron
On 4/3/19 12:16 AM, Christopher Heiny wrote: > On Thu, 2019-03-28 at 14:02 +0800, Aaron Ma wrote: >> Hi Dmitry and Chiristopher: >> >> Do you have any suggestion about these 2 patches? >> >> Many users confirmed that they fixed issues of Trackpoint/Touchpad >> after S3. >> >> Will you consider them be accepted? > Hi Aaron, > > Sorry - I thought I'd replied on the NO SLEEP portion of these patches, > but looking back I don't find the draft or the sent email. Sorry about > that. I'll summarize here what I wrote last month. > > This isn't so much a "fix" as a "hacky workaround" for the issue. From > the descriptions in the bug you linked in your original patch > submission, it appears that the root cause is somewhere in SMBus system > (could be SMBus driver, SMBus hardware, or the devices on the SMBus > (touch devices or other devices) - it's hard to tell with the info > available), where the SMBus is failing to come online correctly coming > out of S3 state. Anyway, this patch doesn't fix the root cause, but > merely works around it. Users confirmed the 1st patch that clear irq status fixed their multiple issues on Touchpad and Trackpoint. I think it is a fix. NO SLEEP patch was tried to give users a choice a fix touchpad issues that I didn't reproduce. If you don't like this export, we can drop it now as users confirmed 1st patch works. > > Setting the NO SLEEP bit will force the touch sensor to remain in a > high power consumption state while the rest of the system is in S3. > While not a lot of power compared to things like the CPU, display, and > others, it is still non-trivial and will result in shorter time-on- > battery capability. Verified on s2idle and S3 and system running idle mode. no difference on power consumption of whole system with or without set 1 to nosleep. > > If you have access to the power pin(s) for the touch sensor(s)/styk(s), > it might be interesting to try turning power off on entering S3, and > restoring it on exit. That's very hacky, and has the side effect of > slightly delaying touchpad readiness on exit from S3. Plus you'll need > to restore touch sensor configuration settings on exit. But it > definitely reduces power consumption. > > > Separately, I am still concerned about the possibility of dropped touch > events in the IRQ clearing. I'm not convinced that the code is safe > (as you mentioned in your reply to my earlier comment), so I'll have to > study the implementation more carefully. Sure, take your time, if you have any questions let me know please. Thanks, Aaron > > Cheers, > Chris > > > >> Thanks, >> Aaron >
Hi Christopher: Have got time to review these 2 patches? Users reported it works fine since I sent out this patch. Thanks, Aaron On 4/3/19 9:58 PM, Aaron Ma wrote: > Sure, take your time, if you have any questions let me know please. > > Thanks, > Aaron
On Tue, 2019-06-04 at 10:45 +0800, Aaron Ma wrote: > Hi Christopher: > > Have got time to review these 2 patches? > Users reported it works fine since I sent out this patch. Hi Aaron, I've been poking around with this off and on. Unfortunately, more off than on :-( but here's my current take: rmi_driver_set_irq_bits() isn't going to be called all that often, and it's not going to be called at all during normal operation, which is where the most serious problem would occur. I haven't entirely convinced myself that there couldn't be a problem during repeated spontaneous device resets (for example, due to ESD, a dodgy charger, or firmware bug, among other things). On the other hand, all the scenarios I have come up with are both unlikely and so contrived that the system is probably hosed regardless of what we do in the driver. Given that, I'm willing to accept the patch as is. Cheers, Chris > > Thanks, > Aaron > > On 4/3/19 9:58 PM, Aaron Ma wrote: > > Sure, take your time, if you have any questions let me know please. > > > > Thanks, > > Aaron
Hi Dmitry: Will you apply them? Thanks, Aaron On 6/4/19 1:19 PM, Christopher Heiny wrote: > Given that, I'm willing to accept the patch as is. > > Cheers, > Chris
Hi Aaron, On Wed, Feb 20, 2019 at 05:41:59PM +0100, Aaron Ma wrote: > rmi4 got spam data after S3 resume on some ThinkPads. > Then TrackPoint lost when be detected by psmouse. > Clear irqs status before set irqs will make TrackPoint back. Could you please give me an idea as to what this spam data is? In F03 probe we clear all pending data before enabling the function, maybe the same needs to be done on resume, instead of changing the way we handle IRQ bits? Thanks,
On 6/10/19 12:55 AM, Dmitry Torokhov wrote: > Hi Aaron, > > On Wed, Feb 20, 2019 at 05:41:59PM +0100, Aaron Ma wrote: >> rmi4 got spam data after S3 resume on some ThinkPads. >> Then TrackPoint lost when be detected by psmouse. >> Clear irqs status before set irqs will make TrackPoint back. > Could you please give me an idea as to what this spam data is? > It should be some data 0 during suspend/resume. Actually I don't know how these data 0 is produced. Not all synaptics touchpads have this issue. > In F03 probe we clear all pending data before enabling the function, Yes we did, but not after resume. > maybe the same needs to be done on resume, instead of changing the way > we handle IRQ bits? This patch is supposed to clear irq status like it in fn probe. Not changing IRQ bits. Thanks, Aaron > > Thanks, > > -- Dmitry >
On Tue, Jun 11, 2019 at 12:55:58AM +0800, Aaron Ma wrote: > > On 6/10/19 12:55 AM, Dmitry Torokhov wrote: > > Hi Aaron, > > > > On Wed, Feb 20, 2019 at 05:41:59PM +0100, Aaron Ma wrote: > >> rmi4 got spam data after S3 resume on some ThinkPads. > >> Then TrackPoint lost when be detected by psmouse. > >> Clear irqs status before set irqs will make TrackPoint back. > > Could you please give me an idea as to what this spam data is? > > > > It should be some data 0 during suspend/resume. > Actually I don't know how these data 0 is produced. > Not all synaptics touchpads have this issue. > > > In F03 probe we clear all pending data before enabling the function, > > Yes we did, but not after resume. Yes, I understand that. The question I was asking: if we add code consuming all pending data to f03->suspend(), similarly to what we are doing at probe time, will it fix the issue with trackstick losing synchronization and attempting disconnect? > > > maybe the same needs to be done on resume, instead of changing the way > > we handle IRQ bits? > > This patch is supposed to clear irq status like it in fn probe. Not > changing IRQ bits. What I meant is changing how we enable IRQ bits. I would really prefer we did not lose IRQ state for other functions when we enable interrupts for given function. Thanks.
On 6/12/19 1:35 AM, Dmitry Torokhov wrote: > On Tue, Jun 11, 2019 at 12:55:58AM +0800, Aaron Ma wrote: >> On 6/10/19 12:55 AM, Dmitry Torokhov wrote: >>> Hi Aaron, >>> >>> On Wed, Feb 20, 2019 at 05:41:59PM +0100, Aaron Ma wrote: >>>> rmi4 got spam data after S3 resume on some ThinkPads. >>>> Then TrackPoint lost when be detected by psmouse. >>>> Clear irqs status before set irqs will make TrackPoint back. >>> Could you please give me an idea as to what this spam data is? >>> >> It should be some data 0 during suspend/resume. >> Actually I don't know how these data 0 is produced. >> Not all synaptics touchpads have this issue. >> >>> In F03 probe we clear all pending data before enabling the function, >> Yes we did, but not after resume. > Yes, I understand that. The question I was asking: if we add code > consuming all pending data to f03->suspend(), similarly to what we are > doing at probe time, will it fix the issue with trackstick losing > synchronization and attempting disconnect? > I just do some test via adding code in suspend or resume. But they didn't work out. >>> maybe the same needs to be done on resume, instead of changing the way >>> we handle IRQ bits? >> This patch is supposed to clear irq status like it in fn probe. Not >> changing IRQ bits. > What I meant is changing how we enable IRQ bits. I would really prefer > we did not lose IRQ state for other functions when we enable interrupts > for given function. > Not only F03 with problem, F12 too which is touchpad . User verified this patch fixes problem of F12 too. Clear IRQ status before enable IRQ should be safe. Or we can add code before enable IRQ in F03/F12. Thanks, Aaron > Thanks. > > -- Dmitry >
Hi Dmitry, > On Jun 14, 2019, at 12:26, Aaron Ma <aaron.ma@canonical.com> wrote: > > On 6/12/19 1:35 AM, Dmitry Torokhov wrote: >> On Tue, Jun 11, 2019 at 12:55:58AM +0800, Aaron Ma wrote: >>> On 6/10/19 12:55 AM, Dmitry Torokhov wrote: >>>> Hi Aaron, >>>> >>>> On Wed, Feb 20, 2019 at 05:41:59PM +0100, Aaron Ma wrote: >>>>> rmi4 got spam data after S3 resume on some ThinkPads. >>>>> Then TrackPoint lost when be detected by psmouse. >>>>> Clear irqs status before set irqs will make TrackPoint back. >>>> Could you please give me an idea as to what this spam data is? >>>> >>> It should be some data 0 during suspend/resume. >>> Actually I don't know how these data 0 is produced. >>> Not all synaptics touchpads have this issue. >>> >>>> In F03 probe we clear all pending data before enabling the function, >>> Yes we did, but not after resume. >> Yes, I understand that. The question I was asking: if we add code >> consuming all pending data to f03->suspend(), similarly to what we are >> doing at probe time, will it fix the issue with trackstick losing >> synchronization and attempting disconnect? >> > > I just do some test via adding code in suspend or resume. > But they didn't work out. > >>>> maybe the same needs to be done on resume, instead of changing the way >>>> we handle IRQ bits? >>> This patch is supposed to clear irq status like it in fn probe. Not >>> changing IRQ bits. >> What I meant is changing how we enable IRQ bits. I would really prefer >> we did not lose IRQ state for other functions when we enable interrupts >> for given function. >> > > Not only F03 with problem, F12 too which is touchpad . > User verified this patch fixes problem of F12 too. > Clear IRQ status before enable IRQ should be safe. > > Or we can add code before enable IRQ in F03/F12. Users reported that patch [1/2] alone can solve the issue. Do we need more information before making this fix merged? Kai-Heng > > Thanks, > Aaron > >> Thanks. >> >> -- Dmitry
diff --git a/drivers/input/rmi4/rmi_driver.c b/drivers/input/rmi4/rmi_driver.c index fc3ab93b7aea..20631b272f43 100644 --- a/drivers/input/rmi4/rmi_driver.c +++ b/drivers/input/rmi4/rmi_driver.c @@ -374,6 +374,17 @@ static int rmi_driver_set_irq_bits(struct rmi_device *rmi_dev, struct device *dev = &rmi_dev->dev; mutex_lock(&data->irq_mutex); + + /* Dummy read in order to clear irqs */ + error = rmi_read_block(rmi_dev, + data->f01_container->fd.data_base_addr + 1, + data->irq_status, data->num_of_irq_regs); + if (error < 0) { + dev_err(dev, "%s: Failed to read interrupt status!", + __func__); + goto error_unlock; + } + bitmap_or(data->new_irq_mask, data->current_irq_mask, mask, data->irq_count);
rmi4 got spam data after S3 resume on some ThinkPads. Then TrackPoint lost when be detected by psmouse. Clear irqs status before set irqs will make TrackPoint back. BugLink: https://bugs.launchpad.net/bugs/1791427 Cc: <stable@vger.kernel.org> Signed-off-by: Aaron Ma <aaron.ma@canonical.com> --- drivers/input/rmi4/rmi_driver.c | 11 +++++++++++ 1 file changed, 11 insertions(+)