mbox series

[RFC,0/7] usb: typec: ucsi: Polling the alt modes and PDOs

Message ID 20210607131442.20121-1-heikki.krogerus@linux.intel.com (mailing list archive)
Headers show
Series usb: typec: ucsi: Polling the alt modes and PDOs | expand

Message

Heikki Krogerus June 7, 2021, 1:14 p.m. UTC
Hi,

This is the RFC series I promised [1].

I'm sorry it took this long to prepare these. I had to concentrate on
other task for a while.

Let me know if you still see any problems.

[1] https://lore.kernel.org/linux-usb/YKT3oEt%2F9fX8k8mw@kuha.fi.intel.com/

thanks,

Heikki Krogerus (7):
  usb: typec: ucsi: Always cancel the command if PPM reports BUSY
    condition
  usb: typec: ucsi: Don't stop alt mode registration on busy condition
  usb: typec: ucsi: Add poll worker for alternate modes
  usb: typec: ucsi: acpi: Reduce the command completion timeout
  usb: typec: ucsi: Process every connector change as unique connector
    state
  usb: typec: ucsi: Filter out spurious events
  usb: typec: ucsi: Read the PDOs in separate work

 drivers/usb/typec/ucsi/ucsi.c      | 317 ++++++++++++++++-------------
 drivers/usb/typec/ucsi/ucsi.h      |   3 +-
 drivers/usb/typec/ucsi/ucsi_acpi.c |   2 +-
 3 files changed, 183 insertions(+), 139 deletions(-)

Comments

Benjamin Berg June 7, 2021, 8:09 p.m. UTC | #1
Hi Heikki,

On Mon, 2021-06-07 at 16:14 +0300, Heikki Krogerus wrote:
> This is the RFC series I promised [1].

Cool.

> I'm sorry it took this long to prepare these. I had to concentrate on
> other task for a while.
> 
> Let me know if you still see any problems.

Hmm, I am not sure this got better. I applied the patchset on top of of
the 5.12.9 Fedora 34 kernel. On the machine in question (X1 Carbon 8),
I see the online state getting stuck at 1 occasionally. This can happen
for example when quickly plugging and unplugging a USB-C charger.

Benjamin

> [1]   
> https://lore.kernel.org/linux-usb/YKT3oEt%2F9fX8k8mw@kuha.fi.intel.com/
> 
> thanks,
> 
> Heikki Krogerus (7):
>   usb: typec: ucsi: Always cancel the command if PPM reports BUSY
>     condition
>   usb: typec: ucsi: Don't stop alt mode registration on busy
> condition
>   usb: typec: ucsi: Add poll worker for alternate modes
>   usb: typec: ucsi: acpi: Reduce the command completion timeout
>   usb: typec: ucsi: Process every connector change as unique
> connector
>     state
>   usb: typec: ucsi: Filter out spurious events
>   usb: typec: ucsi: Read the PDOs in separate work
> 
>  drivers/usb/typec/ucsi/ucsi.c      | 317 ++++++++++++++++-----------
> --
>  drivers/usb/typec/ucsi/ucsi.h      |   3 +-
>  drivers/usb/typec/ucsi/ucsi_acpi.c |   2 +-
>  3 files changed, 183 insertions(+), 139 deletions(-)
>
Heikki Krogerus June 8, 2021, 6:42 a.m. UTC | #2
On Mon, Jun 07, 2021 at 10:09:58PM +0200, Benjamin Berg wrote:
> Hi Heikki,
> 
> On Mon, 2021-06-07 at 16:14 +0300, Heikki Krogerus wrote:
> > This is the RFC series I promised [1].
> 
> Cool.
> 
> > I'm sorry it took this long to prepare these. I had to concentrate on
> > other task for a while.
> > 
> > Let me know if you still see any problems.
> 
> Hmm, I am not sure this got better. I applied the patchset on top of of
> the 5.12.9 Fedora 34 kernel. On the machine in question (X1 Carbon 8),
> I see the online state getting stuck at 1 occasionally. This can happen
> for example when quickly plugging and unplugging a USB-C charger.

Please check does the partner device get removed. What do you have
under /sys/class/typec after that happens?

thanks,
Heikki Krogerus June 8, 2021, 6:54 a.m. UTC | #3
On Tue, Jun 08, 2021 at 09:42:09AM +0300, Heikki Krogerus wrote:
> On Mon, Jun 07, 2021 at 10:09:58PM +0200, Benjamin Berg wrote:
> > Hi Heikki,
> > 
> > On Mon, 2021-06-07 at 16:14 +0300, Heikki Krogerus wrote:
> > > This is the RFC series I promised [1].
> > 
> > Cool.
> > 
> > > I'm sorry it took this long to prepare these. I had to concentrate on
> > > other task for a while.
> > > 
> > > Let me know if you still see any problems.
> > 
> > Hmm, I am not sure this got better. I applied the patchset on top of of
> > the 5.12.9 Fedora 34 kernel. On the machine in question (X1 Carbon 8),
> > I see the online state getting stuck at 1 occasionally. This can happen
> > for example when quickly plugging and unplugging a USB-C charger.
> 
> Please check does the partner device get removed. What do you have
> under /sys/class/typec after that happens?

Oh yes. Could you also share the trace output when that happens?

        cd /sys/kernel/debug/tracing
        echo 1 > events/ucsi/enable
        # now reproduce the issue
        cat trace > ucsi.trace

thanks,
Benjamin Berg June 8, 2021, 7:32 p.m. UTC | #4
On Tue, 2021-06-08 at 09:54 +0300, Heikki Krogerus wrote:
> On Tue, Jun 08, 2021 at 09:42:09AM +0300, Heikki Krogerus wrote:
> > Please check does the partner device get removed. What do you have
> > under /sys/class/typec after that happens?
> 
> Oh yes. Could you also share the trace output when that happens?
> 
>         cd /sys/kernel/debug/tracing
>         echo 1 > events/ucsi/enable
>         # now reproduce the issue
>         cat trace > ucsi.trace

So, the partner device is still there when this happens (see below). I
also only see a single event in the trace for the fast plug/unplug
case:
   kworker/u16:8-1771    [003] .... 18848.872145: ucsi_connector_change: port1 status: change=4a04, opmode=5, connected=1, sourcing=0, partner_flags=1, partner_type=1, request_data_obj=1304b12c, BC status=1

The typec port/partner states is:

port1-partner/accessory_mode:none
port1-partner/supports_usb_power_delivery:no
port1-partner/uevent:DEVTYPE=typec_partner
port1-partner/usb_power_delivery_revision:0.0

and

port0/data_role:host [device]
port0/power_operation_mode:default
port0/power_role:source [sink]
port0/supported_accessory_modes:none
port0/uevent:DEVTYPE=typec_port
port0/uevent:TYPEC_PORT=port0
port0/usb_power_delivery_revision:2.0
port0/usb_typec_revision:1.0
port0/vconn_source:no
port1/data_role:host [device]
port1/power_operation_mode:3.0A
port1/power_role:source [sink]
port1/supported_accessory_modes:none
port1/uevent:DEVTYPE=typec_port
port1/uevent:TYPEC_PORT=port1
port1/usb_power_delivery_revision:2.0
port1/usb_typec_revision:1.0
port1/vconn_source:no

Note that for a normal plug I am usually getting a second event. This
second event is occasionally missing though:
  kworker/u16:38-1800    [001] .... 19522.325885: ucsi_connector_change: port1 status: change=4a04, opmode=5, connected=1, sourcing=0, partner_flags=1, partner_type=1, request_data_obj=1304b12c, BC status=1
  kworker/u16:38-1800    [004] .... 19522.552613: ucsi_connector_change: port1 status: change=0044, opmode=3, connected=1, sourcing=0, partner_flags=1, partner_type=1, request_data_obj=1304b12c, BC status=1

and a working unplug consistently looks like:
   kworker/u16:8-1771    [003] .... 19670.020085: ucsi_connector_change: port1 status: change=4804, opmode=0, connected=0, sourcing=0, partner_flags=1, partner_type=1, request_data_obj=1304b12c, BC status=0

Benjamin
Heikki Krogerus June 9, 2021, 11:25 a.m. UTC | #5
On Tue, Jun 08, 2021 at 09:32:01PM +0200, Benjamin Berg wrote:
> On Tue, 2021-06-08 at 09:54 +0300, Heikki Krogerus wrote:
> > On Tue, Jun 08, 2021 at 09:42:09AM +0300, Heikki Krogerus wrote:
> > > Please check does the partner device get removed. What do you have
> > > under /sys/class/typec after that happens?
> > 
> > Oh yes. Could you also share the trace output when that happens?
> > 
> >         cd /sys/kernel/debug/tracing
> >         echo 1 > events/ucsi/enable
> >         # now reproduce the issue
> >         cat trace > ucsi.trace
> 
> So, the partner device is still there when this happens (see below). I
> also only see a single event in the trace for the fast plug/unplug
> case:
>    kworker/u16:8-1771    [003] .... 18848.872145: ucsi_connector_change: port1 status: change=4a04, opmode=5, connected=1, sourcing=0, partner_flags=1, partner_type=1, request_data_obj=1304b12c, BC status=1

OK. Sorry I had to double check because you were only talking about
the psy online state.

Can you now try this HACK on top of these patches:

diff --git a/drivers/usb/typec/ucsi/ucsi.c b/drivers/usb/typec/ucsi/ucsi.c
index bd39fe2cb1d0b..99f072700ce7f 100644
--- a/drivers/usb/typec/ucsi/ucsi.c
+++ b/drivers/usb/typec/ucsi/ucsi.c
@@ -843,7 +843,8 @@ static void ucsi_handle_connector_change(struct work_struct *work)
 
        if (!status.change) {
                dev_dbg(con->ucsi->dev, "con%d: spurious event\n", con->num);
-               goto out_ack;
+               /* XXX Force connection check. */
+               status.change = UCSI_CONSTAT_CONNECT_CHANGE;
        }
 
        event = kzalloc(sizeof(*event), GFP_KERNEL);

thanks,
Heikki Krogerus June 9, 2021, 12:18 p.m. UTC | #6
On Wed, Jun 09, 2021 at 02:26:00PM +0300, Heikki Krogerus wrote:
> On Tue, Jun 08, 2021 at 09:32:01PM +0200, Benjamin Berg wrote:
> > On Tue, 2021-06-08 at 09:54 +0300, Heikki Krogerus wrote:
> > > On Tue, Jun 08, 2021 at 09:42:09AM +0300, Heikki Krogerus wrote:
> > > > Please check does the partner device get removed. What do you have
> > > > under /sys/class/typec after that happens?
> > > 
> > > Oh yes. Could you also share the trace output when that happens?
> > > 
> > >         cd /sys/kernel/debug/tracing
> > >         echo 1 > events/ucsi/enable
> > >         # now reproduce the issue
> > >         cat trace > ucsi.trace
> > 
> > So, the partner device is still there when this happens (see below). I
> > also only see a single event in the trace for the fast plug/unplug
> > case:
> >    kworker/u16:8-1771    [003] .... 18848.872145: ucsi_connector_change: port1 status: change=4a04, opmode=5, connected=1, sourcing=0, partner_flags=1, partner_type=1, request_data_obj=1304b12c, BC status=1
> 
> OK. Sorry I had to double check because you were only talking about
> the psy online state.
> 
> Can you now try this HACK on top of these patches:
> 
> diff --git a/drivers/usb/typec/ucsi/ucsi.c b/drivers/usb/typec/ucsi/ucsi.c
> index bd39fe2cb1d0b..99f072700ce7f 100644
> --- a/drivers/usb/typec/ucsi/ucsi.c
> +++ b/drivers/usb/typec/ucsi/ucsi.c
> @@ -843,7 +843,8 @@ static void ucsi_handle_connector_change(struct work_struct *work)
>  
>         if (!status.change) {
>                 dev_dbg(con->ucsi->dev, "con%d: spurious event\n", con->num);
> -               goto out_ack;
> +               /* XXX Force connection check. */
> +               status.change = UCSI_CONSTAT_CONNECT_CHANGE;
>         }
>  
>         event = kzalloc(sizeof(*event), GFP_KERNEL);

No, that's not enough. Sorry.

I'm trying to get a confirmation on my suspecion that we do always
actually get an event from the EC firmware, but we just end up
filtering it out in this case because we are too slow in the driver. I
have an idea what could be done about that, but I need to test if that
really is the case.

I'll prepare a new version out of this entire series.

thanks,
Heikki Krogerus June 9, 2021, 12:56 p.m. UTC | #7
On Wed, Jun 09, 2021 at 03:18:04PM +0300, Heikki Krogerus wrote:
> On Wed, Jun 09, 2021 at 02:26:00PM +0300, Heikki Krogerus wrote:
> > On Tue, Jun 08, 2021 at 09:32:01PM +0200, Benjamin Berg wrote:
> > > On Tue, 2021-06-08 at 09:54 +0300, Heikki Krogerus wrote:
> > > > On Tue, Jun 08, 2021 at 09:42:09AM +0300, Heikki Krogerus wrote:
> > > > > Please check does the partner device get removed. What do you have
> > > > > under /sys/class/typec after that happens?
> > > > 
> > > > Oh yes. Could you also share the trace output when that happens?
> > > > 
> > > >         cd /sys/kernel/debug/tracing
> > > >         echo 1 > events/ucsi/enable
> > > >         # now reproduce the issue
> > > >         cat trace > ucsi.trace
> > > 
> > > So, the partner device is still there when this happens (see below). I
> > > also only see a single event in the trace for the fast plug/unplug
> > > case:
> > >    kworker/u16:8-1771    [003] .... 18848.872145: ucsi_connector_change: port1 status: change=4a04, opmode=5, connected=1, sourcing=0, partner_flags=1, partner_type=1, request_data_obj=1304b12c, BC status=1
> > 
> > OK. Sorry I had to double check because you were only talking about
> > the psy online state.
> > 
> > Can you now try this HACK on top of these patches:
> > 
> > diff --git a/drivers/usb/typec/ucsi/ucsi.c b/drivers/usb/typec/ucsi/ucsi.c
> > index bd39fe2cb1d0b..99f072700ce7f 100644
> > --- a/drivers/usb/typec/ucsi/ucsi.c
> > +++ b/drivers/usb/typec/ucsi/ucsi.c
> > @@ -843,7 +843,8 @@ static void ucsi_handle_connector_change(struct work_struct *work)
> >  
> >         if (!status.change) {
> >                 dev_dbg(con->ucsi->dev, "con%d: spurious event\n", con->num);
> > -               goto out_ack;
> > +               /* XXX Force connection check. */
> > +               status.change = UCSI_CONSTAT_CONNECT_CHANGE;
> >         }
> >  
> >         event = kzalloc(sizeof(*event), GFP_KERNEL);
> 
> No, that's not enough. Sorry.
> 
> I'm trying to get a confirmation on my suspecion that we do always
> actually get an event from the EC firmware, but we just end up
> filtering it out in this case because we are too slow in the driver. I
> have an idea what could be done about that, but I need to test if that
> really is the case.
> 
> I'll prepare a new version out of this entire series.

Actually, it's easier if you could just test this attached patch on
top of this series. It makes sure the every single event is
considered. I'm sorry about the hassle.

thanks,
Benjamin Berg June 9, 2021, 5:39 p.m. UTC | #8
On Wed, 2021-06-09 at 15:56 +0300, Heikki Krogerus wrote:
> On Wed, Jun 09, 2021 at 03:18:04PM +0300, Heikki Krogerus wrote:
> > 
> > I'm trying to get a confirmation on my suspecion that we do always
> > actually get an event from the EC firmware, but we just end up
> > filtering it out in this case because we are too slow in the driver.
> > I
> > have an idea what could be done about that, but I need to test if
> > that
> > really is the case.
> > 
> > I'll prepare a new version out of this entire series.
> 
> Actually, it's easier if you could just test this attached patch on
> top of this series. It makes sure the every single event is
> considered. I'm sorry about the hassle.

No worries! I probably should have included some more information
earlier (i.e. enabling the debug print for spurious events).

With the patch, I am seeing the following on plug

   kworker/u16:1-6847    [002] ....  1375.485010: ucsi_connector_change: port1 status: change=4a04, opmode=5, connected=1, sourcing=0, partner_flags=1, partner_type=1, request_data_obj=1304b12c, BC status=1
   kworker/u16:2-6848    [006] ....  1375.561811: ucsi_connector_change: port1 status: change=4000, opmode=5, connected=1, sourcing=0, partner_flags=1, partner_type=1, request_data_obj=1304b12c, BC status=1
   kworker/u16:2-6848    [007] ....  1375.634275: ucsi_connector_change: port1 status: change=4000, opmode=5, connected=1, sourcing=0, partner_flags=1, partner_type=1, request_data_obj=1304b12c, BC status=1
   kworker/u16:2-6848    [003] ....  1375.743161: ucsi_connector_change: port1 status: change=4000, opmode=3, connected=1, sourcing=0, partner_flags=1, partner_type=1, request_data_obj=1304b12c, BC status=1

and unplug

   kworker/u16:1-6847    [005] ....  1394.062501: ucsi_connector_change: port1 status: change=4804, opmode=0, connected=0, sourcing=0, partner_flags=0, partner_type=0, request_data_obj=00000000, BC status=0
   kworker/u16:1-6847    [005] ....  1394.161612: ucsi_connector_change: port1 status: change=4000, opmode=0, connected=0, sourcing=0, partner_flags=0, partner_type=0, request_data_obj=00000000, BC status=0
   kworker/u16:1-6847    [005] ....  1394.251503: ucsi_connector_change: port1 status: change=4000, opmode=0, connected=0, sourcing=0, partner_flags=0, partner_type=0, request_data_obj=00000000, BC status=0

where all but the first event are spurious events. I believe that in
the above spurious event with the change to opmode=3, the PPM should be
reporting change=0004 (i.e. UCSI_CONSTAT_POWER_OPMODE_CHANGE).

Occasionally I also see the following on plug. Note the non-spurious
event with change=0040 (UCSI_CONSTAT_POWER_LEVEL_CHANGE) right before
the event where opmode changes.

  kworker/u16:11-2201    [001] ....  3240.124431: ucsi_connector_change: port1 status: change=4a04, opmode=5, connected=1, sourcing=0, partner_flags=1, partner_type=1, request_data_obj=1304b12c, BC status=1
   kworker/u16:3-7469    [003] ....  3240.222799: ucsi_connector_change: port1 status: change=4000, opmode=5, connected=1, sourcing=0, partner_flags=1, partner_type=1, request_data_obj=1304b12c, BC status=1
   kworker/u16:3-7469    [003] ....  3240.325946: ucsi_connector_change: port1 status: change=0040, opmode=5, connected=1, sourcing=0, partner_flags=1, partner_type=1, request_data_obj=1304b12c, BC status=1
   kworker/u16:3-7469    [003] ....  3240.423503: ucsi_connector_change: port1 status: change=4000, opmode=3, connected=1, sourcing=0, partner_flags=1, partner_type=1, request_data_obj=1304b12c, BC status=1
   kworker/u16:3-7469    [003] ....  3240.861986: ucsi_connector_change: port1 status: change=4000, opmode=3, connected=1, sourcing=0, partner_flags=1, partner_type=1, request_data_obj=1304b12c, BC status=1
   kworker/u16:3-7469    [007] ....  3240.999048: ucsi_connector_change: port1 status: change=4000, opmode=3, connected=1, sourcing=0, partner_flags=1, partner_type=1, request_data_obj=1304b12c, BC status=1


My thought when I first ran into the issue was that the PPM simply
resets the change bitfield on ACK, effectively discarding any changes
that happened after the last GET_CONNECTOR_STATUS call. I believed to
have confirmed this by inserting an msleep in between.
However, I have to admit that I have never really looked for
alternative explanations.

Benjamin
Heikki Krogerus June 10, 2021, 12:07 p.m. UTC | #9
On Wed, Jun 09, 2021 at 07:39:41PM +0200, Benjamin Berg wrote:
> My thought when I first ran into the issue was that the PPM simply
> resets the change bitfield on ACK, effectively discarding any changes
> that happened after the last GET_CONNECTOR_STATUS call. I believed to
> have confirmed this by inserting an msleep in between.
> However, I have to admit that I have never really looked for
> alternative explanations.

Thanks a lot for testing these. I now have pretty good idea about the
problem. The problem is not what you though it is, but the driver is
also not too slow like I suspected.

I'm quite certain now that the PPM simply does not create the event at
all in this case, regardless of what the driver does. We do need to
workaround that problem, but I think we can do it in a better way. I
have an idea for that.

thanks,