diff mbox series

ACPI: resource: revert "Remove "Zen" specific match and quirks"

Message ID 20230806151453.10690-1-hdegoede@redhat.com (mailing list archive)
State Superseded, archived
Headers show
Series ACPI: resource: revert "Remove "Zen" specific match and quirks" | expand

Commit Message

Hans de Goede Aug. 6, 2023, 3:14 p.m. UTC
Commit a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and
quirks") is causing keyboard problems for quite a log of AMD based
laptop users, leading to many bug reports.

Revert this change for now, until we can come up with
a better fix for the PS/2 IRQ trigger-type/polarity problems
on some x86 laptops.

Fixes: a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and quirks")
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2229165
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2229317
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217726
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Linux regressions mailing list <regressions@lists.linux.dev>
Cc: stable@vger.kernel.org
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
---
 drivers/acpi/resource.c | 60 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 60 insertions(+)

Comments

August Wikerfors Aug. 6, 2023, 3:51 p.m. UTC | #1
On 2023-08-06 17:14, Hans de Goede wrote:
> Commit a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and
> quirks") is causing keyboard problems for quite a log of AMD based
> laptop users, leading to many bug reports.
> 
> Revert this change for now, until we can come up with
> a better fix for the PS/2 IRQ trigger-type/polarity problems
> on some x86 laptops.
> 
> Fixes: a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and quirks")
> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2229165
> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2229317
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217726

Other reports for various Lenovo laptops:
https://lore.kernel.org/all/596b9c4a-fb83-a8ab-3a44-6052d83fa546@augustwikerfors.se/T/
https://bugzilla.kernel.org/show_bug.cgi?id=217718
https://bugzilla.kernel.org/show_bug.cgi?id=217731

> Cc: Mario Limonciello <mario.limonciello@amd.com>
> Cc: Linux regressions mailing list <regressions@lists.linux.dev>
> Cc: stable@vger.kernel.org
> Signed-off-by: Hans de Goede <hdegoede@redhat.com>

FWIW only reverting a9c4a912b7dc will also cause a regression because it 
did also fix the keyboard on some devices, see the links in its commit 
message.

Regards,
August Wikerfors
Thorsten Leemhuis Aug. 6, 2023, 3:58 p.m. UTC | #2
On 06.08.23 17:14, Hans de Goede wrote:
> Commit a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and
> quirks") is causing keyboard problems for quite a log of AMD based
> laptop users, leading to many bug reports.
> 
> Revert this change for now, until we can come up with
> a better fix for the PS/2 IRQ trigger-type/polarity problems
> on some x86 laptops.

Thx for this.

> Fixes: a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and quirks")
> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2229165
> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2229317
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217726

Not that it matters much, but there is at least one more:

https://lore.kernel.org/lkml/596b9c4a-fb83-a8ab-3a44-6052d83fa546@augustwikerfors.se/

> Cc: Mario Limonciello <mario.limonciello@amd.com>
> Cc: Linux regressions mailing list <regressions@lists.linux.dev>
> Cc: stable@vger.kernel.org
> Signed-off-by: Hans de Goede <hdegoede@redhat.com>

I tend to think this is the right thing to do. But for completeness to
ensure everybody is aware of this: I'm pretty sure (Mario will know for
sure) this will cause a regression for a few users, because a9c4a912b7dc
was meant fixing a regression for them. Not sure how many users this
might affect, but I guess it are at least those linked to in a9c4a912b7dc:

https://bugzilla.kernel.org/show_bug.cgi?id=217336
https://bugzilla.kernel.org/show_bug.cgi?id=217406
https://bugzilla.kernel.org/show_bug.cgi?id=217394

I assume that could be avoided by adding quirk entries for their
machines. Some of the tickets have the data to do so.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.
Mario Limonciello Aug. 6, 2023, 5:13 p.m. UTC | #3
On 8/6/23 10:14, Hans de Goede wrote:
> Commit a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and
> quirks") is causing keyboard problems for quite a log of AMD based
> laptop users, leading to many bug reports.
> 
> Revert this change for now, until we can come up with
> a better fix for the PS/2 IRQ trigger-type/polarity problems
> on some x86 laptops.

Reverting it is going to help a bunch of machines but cause regressions 
for others.  How do you prioritize which to fix when it comes to a 
regression?

> 
> Fixes: a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and quirks")
> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2229165
> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2229317
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217726
> Cc: Mario Limonciello <mario.limonciello@amd.com>
> Cc: Linux regressions mailing list <regressions@lists.linux.dev>
> Cc: stable@vger.kernel.org
> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
> ---
>   drivers/acpi/resource.c | 60 +++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 60 insertions(+)
> 
> diff --git a/drivers/acpi/resource.c b/drivers/acpi/resource.c
> index 1dd8d5aebf67..0800a9d77558 100644
> --- a/drivers/acpi/resource.c
> +++ b/drivers/acpi/resource.c
> @@ -470,6 +470,52 @@ static const struct dmi_system_id asus_laptop[] = {
>   	{ }
>   };
>   
> +static const struct dmi_system_id lenovo_laptop[] = {
> +	{
> +		.ident = "LENOVO IdeaPad Flex 5 14ALC7",
> +		.matches = {
> +			DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
> +			DMI_MATCH(DMI_PRODUCT_NAME, "82R9"),
> +		},
> +	},
> +	{
> +		.ident = "LENOVO IdeaPad Flex 5 16ALC7",
> +		.matches = {
> +			DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
> +			DMI_MATCH(DMI_PRODUCT_NAME, "82RA"),
> +		},
> +	},
> +	{ }
> +};
> +
> +static const struct dmi_system_id tongfang_gm_rg[] = {
> +	{
> +		.ident = "TongFang GMxRGxx/XMG CORE 15 (M22)/TUXEDO Stellaris 15 Gen4 AMD",
> +		.matches = {
> +			DMI_MATCH(DMI_BOARD_NAME, "GMxRGxx"),
> +		},
> +	},
> +	{ }
> +};
> +
> +static const struct dmi_system_id maingear_laptop[] = {
> +	{
> +		.ident = "MAINGEAR Vector Pro 2 15",
> +		.matches = {
> +			DMI_MATCH(DMI_SYS_VENDOR, "Micro Electronics Inc"),
> +			DMI_MATCH(DMI_PRODUCT_NAME, "MG-VCP2-15A3070T"),
> +		}
> +	},
> +	{
> +		.ident = "MAINGEAR Vector Pro 2 17",
> +		.matches = {
> +			DMI_MATCH(DMI_SYS_VENDOR, "Micro Electronics Inc"),
> +			DMI_MATCH(DMI_PRODUCT_NAME, "MG-VCP2-17A3070T"),
> +		},
> +	},
> +	{ }
> +};
> +
>   static const struct dmi_system_id lg_laptop[] = {
>   	{
>   		.ident = "LG Electronics 17U70P",
> @@ -493,6 +539,10 @@ struct irq_override_cmp {
>   static const struct irq_override_cmp override_table[] = {
>   	{ medion_laptop, 1, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, false },
>   	{ asus_laptop, 1, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, false },
> +	{ lenovo_laptop, 6, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, true },
> +	{ lenovo_laptop, 10, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, true },
> +	{ tongfang_gm_rg, 1, ACPI_EDGE_SENSITIVE, ACPI_ACTIVE_LOW, 1, true },
> +	{ maingear_laptop, 1, ACPI_EDGE_SENSITIVE, ACPI_ACTIVE_LOW, 1, true },
>   	{ lg_laptop, 1, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, false },
>   };
>   
> @@ -512,6 +562,16 @@ static bool acpi_dev_irq_override(u32 gsi, u8 triggering, u8 polarity,
>   			return entry->override;
>   	}
>   
> +#ifdef CONFIG_X86
> +	/*
> +	 * IRQ override isn't needed on modern AMD Zen systems and
> +	 * this override breaks active low IRQs on AMD Ryzen 6000 and
> +	 * newer systems. Skip it.
> +	 */
> +	if (boot_cpu_has(X86_FEATURE_ZEN))
> +		return false;
> +#endif
> +
>   	return true;
>   }
>
Hans de Goede Aug. 6, 2023, 6:20 p.m. UTC | #4
Hi Mario,

On 8/6/23 19:13, Mario Limonciello wrote:
> On 8/6/23 10:14, Hans de Goede wrote:
>> Commit a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and
>> quirks") is causing keyboard problems for quite a log of AMD based
>> laptop users, leading to many bug reports.
>>
>> Revert this change for now, until we can come up with
>> a better fix for the PS/2 IRQ trigger-type/polarity problems
>> on some x86 laptops.
> 
> Reverting it is going to help a bunch of machines but cause regressions for others.  How do you prioritize which to fix when it comes to a regression?

I understand where you are coming from. But we are dealing with a lot of bug reports from users of recent kernel versions (so not on LTS distros) where things used to work fine. Which is pretty much the definition of a regression.

OTOH the do not override IRQ on Zen behavior has been with us for a while now, so things which are broken by it have been broken for a while and have only started working since 6.4.7.

So to me it is clear that we first need to revert to the old state, so that users for which everything was working fine get back to a working system.

I realize this will unfix some very recently fixed systems, but notice the very recently bit here most users are no in 6.4.7 or .8 yet, so most users will not even have gotten the fixing effect OTOH many users are seeing breakage now.

So IMHO we should first get back to the known bad state, instead of introducing a new unknown bad state like a9c4a912b7dc did.

If there are known systems which will need an override with a9c4a912b7dc reverted, then we should probably add those to the override table right away.

And I think what we also need is a way to specify an IRQ trigger-type override for IRQ 1 on the kernel commandline, so that we can easily ask users to try a different trigger-type without them needing to build a kernel with an updated quirk table.

As Linus mentioned already, we really need to find a proper fix for this, but for now my main priority is to stop the influx of new Fedora bugzilla and bugzilla.kernel.org bugs caused by a9c4a912b7dc, counting the extra bug links added in this thread it seems we are up to 6 known bug reports for this and that likely is just the tip of the iceberg.

Regards,

Hans



> 
>>
>> Fixes: a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and quirks")
>> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2229165
>> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2229317
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217726
>> Cc: Mario Limonciello <mario.limonciello@amd.com>
>> Cc: Linux regressions mailing list <regressions@lists.linux.dev>
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
>> ---
>>   drivers/acpi/resource.c | 60 +++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 60 insertions(+)
>>
>> diff --git a/drivers/acpi/resource.c b/drivers/acpi/resource.c
>> index 1dd8d5aebf67..0800a9d77558 100644
>> --- a/drivers/acpi/resource.c
>> +++ b/drivers/acpi/resource.c
>> @@ -470,6 +470,52 @@ static const struct dmi_system_id asus_laptop[] = {
>>       { }
>>   };
>>   +static const struct dmi_system_id lenovo_laptop[] = {
>> +    {
>> +        .ident = "LENOVO IdeaPad Flex 5 14ALC7",
>> +        .matches = {
>> +            DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
>> +            DMI_MATCH(DMI_PRODUCT_NAME, "82R9"),
>> +        },
>> +    },
>> +    {
>> +        .ident = "LENOVO IdeaPad Flex 5 16ALC7",
>> +        .matches = {
>> +            DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
>> +            DMI_MATCH(DMI_PRODUCT_NAME, "82RA"),
>> +        },
>> +    },
>> +    { }
>> +};
>> +
>> +static const struct dmi_system_id tongfang_gm_rg[] = {
>> +    {
>> +        .ident = "TongFang GMxRGxx/XMG CORE 15 (M22)/TUXEDO Stellaris 15 Gen4 AMD",
>> +        .matches = {
>> +            DMI_MATCH(DMI_BOARD_NAME, "GMxRGxx"),
>> +        },
>> +    },
>> +    { }
>> +};
>> +
>> +static const struct dmi_system_id maingear_laptop[] = {
>> +    {
>> +        .ident = "MAINGEAR Vector Pro 2 15",
>> +        .matches = {
>> +            DMI_MATCH(DMI_SYS_VENDOR, "Micro Electronics Inc"),
>> +            DMI_MATCH(DMI_PRODUCT_NAME, "MG-VCP2-15A3070T"),
>> +        }
>> +    },
>> +    {
>> +        .ident = "MAINGEAR Vector Pro 2 17",
>> +        .matches = {
>> +            DMI_MATCH(DMI_SYS_VENDOR, "Micro Electronics Inc"),
>> +            DMI_MATCH(DMI_PRODUCT_NAME, "MG-VCP2-17A3070T"),
>> +        },
>> +    },
>> +    { }
>> +};
>> +
>>   static const struct dmi_system_id lg_laptop[] = {
>>       {
>>           .ident = "LG Electronics 17U70P",
>> @@ -493,6 +539,10 @@ struct irq_override_cmp {
>>   static const struct irq_override_cmp override_table[] = {
>>       { medion_laptop, 1, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, false },
>>       { asus_laptop, 1, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, false },
>> +    { lenovo_laptop, 6, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, true },
>> +    { lenovo_laptop, 10, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, true },
>> +    { tongfang_gm_rg, 1, ACPI_EDGE_SENSITIVE, ACPI_ACTIVE_LOW, 1, true },
>> +    { maingear_laptop, 1, ACPI_EDGE_SENSITIVE, ACPI_ACTIVE_LOW, 1, true },
>>       { lg_laptop, 1, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, false },
>>   };
>>   @@ -512,6 +562,16 @@ static bool acpi_dev_irq_override(u32 gsi, u8 triggering, u8 polarity,
>>               return entry->override;
>>       }
>>   +#ifdef CONFIG_X86
>> +    /*
>> +     * IRQ override isn't needed on modern AMD Zen systems and
>> +     * this override breaks active low IRQs on AMD Ryzen 6000 and
>> +     * newer systems. Skip it.
>> +     */
>> +    if (boot_cpu_has(X86_FEATURE_ZEN))
>> +        return false;
>> +#endif
>> +
>>       return true;
>>   }
>>   
>
Mario Limonciello Aug. 7, 2023, 4:38 a.m. UTC | #5
On 8/6/23 13:20, Hans de Goede wrote:
> Hi Mario,
> 
> On 8/6/23 19:13, Mario Limonciello wrote:
>> On 8/6/23 10:14, Hans de Goede wrote:
>>> Commit a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and
>>> quirks") is causing keyboard problems for quite a log of AMD based
>>> laptop users, leading to many bug reports.
>>>
>>> Revert this change for now, until we can come up with
>>> a better fix for the PS/2 IRQ trigger-type/polarity problems
>>> on some x86 laptops.
>>
>> Reverting it is going to help a bunch of machines but cause regressions for others.  How do you prioritize which to fix when it comes to a regression?
> 
> I understand where you are coming from. But we are dealing with a lot of bug reports from users of recent kernel versions (so not on LTS distros) where things used to work fine. Which is pretty much the definition of a regression.
> 
> OTOH the do not override IRQ on Zen behavior has been with us for a while now, so things which are broken by it have been broken for a while and have only started working since 6.4.7.
> 
> So to me it is clear that we first need to revert to the old state, so that users for which everything was working fine get back to a working system.
> 
> I realize this will unfix some very recently fixed systems, but notice the very recently bit here most users are no in 6.4.7 or .8 yet, so most users will not even have gotten the fixing effect OTOH many users are seeing breakage now.
> 
> So IMHO we should first get back to the known bad state, instead of introducing a new unknown bad state like a9c4a912b7dc did.
> 
> If there are known systems which will need an override with a9c4a912b7dc reverted, then we should probably add those to the override table right away.
> 
> And I think what we also need is a way to specify an IRQ trigger-type override for IRQ 1 on the kernel commandline, so that we can easily ask users to try a different trigger-type without them needing to build a kernel with an updated quirk table.
> 
> As Linus mentioned already, we really need to find a proper fix for this, but for now my main priority is to stop the influx of new Fedora bugzilla and bugzilla.kernel.org bugs caused by a9c4a912b7dc, counting the extra bug links added in this thread it seems we are up to 6 known bug reports for this and that likely is just the tip of the iceberg.
> 
> Regards,
> 
> Hans
> 
> 

We haven't even given a try to fixing it; I think the revert is still hasty.

I don't have a machine that can reproduce this failure, but I did 
confirm that my keyboard still works with this:

diff --git a/drivers/acpi/resource.c b/drivers/acpi/resource.c
index 1dd8d5aebf678..b74d7d8cc8630 100644
--- a/drivers/acpi/resource.c
+++ b/drivers/acpi/resource.c
@@ -19,7 +19,7 @@
  #include <linux/dmi.h>

  #ifdef CONFIG_X86
-#define valid_IRQ(i) (((i) != 0) && ((i) != 2))
+#define valid_IRQ(i) ((i) > 2)
  static inline bool acpi_iospace_resource_valid(struct resource *res)
  {
         /* On X86 IO space is limited to the [0 - 64K] IO port range */

Can we perhaps see if that works instead for some affected people?

If it does we should be able to throw away the entire quirks table.

> 
>>
>>>
>>> Fixes: a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and quirks")
>>> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2229165
>>> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2229317
>>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217726
>>> Cc: Mario Limonciello <mario.limonciello@amd.com>
>>> Cc: Linux regressions mailing list <regressions@lists.linux.dev>
>>> Cc: stable@vger.kernel.org
>>> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
>>> ---
>>>    drivers/acpi/resource.c | 60 +++++++++++++++++++++++++++++++++++++++++
>>>    1 file changed, 60 insertions(+)
>>>
>>> diff --git a/drivers/acpi/resource.c b/drivers/acpi/resource.c
>>> index 1dd8d5aebf67..0800a9d77558 100644
>>> --- a/drivers/acpi/resource.c
>>> +++ b/drivers/acpi/resource.c
>>> @@ -470,6 +470,52 @@ static const struct dmi_system_id asus_laptop[] = {
>>>        { }
>>>    };
>>>    +static const struct dmi_system_id lenovo_laptop[] = {
>>> +    {
>>> +        .ident = "LENOVO IdeaPad Flex 5 14ALC7",
>>> +        .matches = {
>>> +            DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
>>> +            DMI_MATCH(DMI_PRODUCT_NAME, "82R9"),
>>> +        },
>>> +    },
>>> +    {
>>> +        .ident = "LENOVO IdeaPad Flex 5 16ALC7",
>>> +        .matches = {
>>> +            DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
>>> +            DMI_MATCH(DMI_PRODUCT_NAME, "82RA"),
>>> +        },
>>> +    },
>>> +    { }
>>> +};
>>> +
>>> +static const struct dmi_system_id tongfang_gm_rg[] = {
>>> +    {
>>> +        .ident = "TongFang GMxRGxx/XMG CORE 15 (M22)/TUXEDO Stellaris 15 Gen4 AMD",
>>> +        .matches = {
>>> +            DMI_MATCH(DMI_BOARD_NAME, "GMxRGxx"),
>>> +        },
>>> +    },
>>> +    { }
>>> +};
>>> +
>>> +static const struct dmi_system_id maingear_laptop[] = {
>>> +    {
>>> +        .ident = "MAINGEAR Vector Pro 2 15",
>>> +        .matches = {
>>> +            DMI_MATCH(DMI_SYS_VENDOR, "Micro Electronics Inc"),
>>> +            DMI_MATCH(DMI_PRODUCT_NAME, "MG-VCP2-15A3070T"),
>>> +        }
>>> +    },
>>> +    {
>>> +        .ident = "MAINGEAR Vector Pro 2 17",
>>> +        .matches = {
>>> +            DMI_MATCH(DMI_SYS_VENDOR, "Micro Electronics Inc"),
>>> +            DMI_MATCH(DMI_PRODUCT_NAME, "MG-VCP2-17A3070T"),
>>> +        },
>>> +    },
>>> +    { }
>>> +};
>>> +
>>>    static const struct dmi_system_id lg_laptop[] = {
>>>        {
>>>            .ident = "LG Electronics 17U70P",
>>> @@ -493,6 +539,10 @@ struct irq_override_cmp {
>>>    static const struct irq_override_cmp override_table[] = {
>>>        { medion_laptop, 1, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, false },
>>>        { asus_laptop, 1, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, false },
>>> +    { lenovo_laptop, 6, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, true },
>>> +    { lenovo_laptop, 10, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, true },
>>> +    { tongfang_gm_rg, 1, ACPI_EDGE_SENSITIVE, ACPI_ACTIVE_LOW, 1, true },
>>> +    { maingear_laptop, 1, ACPI_EDGE_SENSITIVE, ACPI_ACTIVE_LOW, 1, true },
>>>        { lg_laptop, 1, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, false },
>>>    };
>>>    @@ -512,6 +562,16 @@ static bool acpi_dev_irq_override(u32 gsi, u8 triggering, u8 polarity,
>>>                return entry->override;
>>>        }
>>>    +#ifdef CONFIG_X86
>>> +    /*
>>> +     * IRQ override isn't needed on modern AMD Zen systems and
>>> +     * this override breaks active low IRQs on AMD Ryzen 6000 and
>>> +     * newer systems. Skip it.
>>> +     */
>>> +    if (boot_cpu_has(X86_FEATURE_ZEN))
>>> +        return false;
>>> +#endif
>>> +
>>>        return true;
>>>    }
>>>    
>>
>
Thorsten Leemhuis Aug. 7, 2023, 5:52 a.m. UTC | #6
On 07.08.23 06:38, Mario Limonciello wrote:
> On 8/6/23 13:20, Hans de Goede wrote:
>> On 8/6/23 19:13, Mario Limonciello wrote:
>>> On 8/6/23 10:14, Hans de Goede wrote:
>>>> Commit a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and
>>>> quirks") is causing keyboard problems for quite a log of AMD based
>>>> laptop users, leading to many bug reports.
>>>>
>>>> Revert this change for now, until we can come up with
>>>> a better fix for the PS/2 IRQ trigger-type/polarity problems
>>>> on some x86 laptops.
>>>
>>> Reverting it is going to help a bunch of machines but cause
>>> regressions for others.  How do you prioritize which to fix when it
>>> comes to a regression?

It's up to Linus in the end, but to echo what Hans already said: I guess
Linus stance in this case would be along the lines of "let's get back to
a known state, even if we known that state has problems as well". And
that state is the mainline release 6.4 . Sure, Greg picked this up for
stable, but apart from that things boil down to "we tried something in
6.5-rc1, it did work well, so let's revert this until we work our a
proper solution that can be applied in a later cycle, no harm done".

> We haven't even given a try to fixing it; I think the revert is still
> hasty.

Some urgency is required, as (a) the patch made it into a stable release
(b) get closer to the end of the merge window and want to avoid last
minute changes.
> [...]
> Can we perhaps see if that works instead for some affected people?
> [...]

I'd say: it would be best to resolve this now with a revert, then we
might even reach the next stable-rc release (Greg unusually releases
those on Mon/Tue/Wed) and thus the next stable release. But let's please
definitely resolve this this week in mainline before -rc6.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.
Hans de Goede Aug. 7, 2023, 7:44 a.m. UTC | #7
Hi,

On 8/7/23 06:38, Mario Limonciello wrote:
> On 8/6/23 13:20, Hans de Goede wrote:
>> Hi Mario,
>>
>> On 8/6/23 19:13, Mario Limonciello wrote:
>>> On 8/6/23 10:14, Hans de Goede wrote:
>>>> Commit a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and
>>>> quirks") is causing keyboard problems for quite a log of AMD based
>>>> laptop users, leading to many bug reports.
>>>>
>>>> Revert this change for now, until we can come up with
>>>> a better fix for the PS/2 IRQ trigger-type/polarity problems
>>>> on some x86 laptops.
>>>
>>> Reverting it is going to help a bunch of machines but cause regressions for others.  How do you prioritize which to fix when it comes to a regression?
>>
>> I understand where you are coming from. But we are dealing with a lot of bug reports from users of recent kernel versions (so not on LTS distros) where things used to work fine. Which is pretty much the definition of a regression.
>>
>> OTOH the do not override IRQ on Zen behavior has been with us for a while now, so things which are broken by it have been broken for a while and have only started working since 6.4.7.
>>
>> So to me it is clear that we first need to revert to the old state, so that users for which everything was working fine get back to a working system.
>>
>> I realize this will unfix some very recently fixed systems, but notice the very recently bit here most users are no in 6.4.7 or .8 yet, so most users will not even have gotten the fixing effect OTOH many users are seeing breakage now.
>>
>> So IMHO we should first get back to the known bad state, instead of introducing a new unknown bad state like a9c4a912b7dc did.
>>
>> If there are known systems which will need an override with a9c4a912b7dc reverted, then we should probably add those to the override table right away.
>>
>> And I think what we also need is a way to specify an IRQ trigger-type override for IRQ 1 on the kernel commandline, so that we can easily ask users to try a different trigger-type without them needing to build a kernel with an updated quirk table.
>>
>> As Linus mentioned already, we really need to find a proper fix for this, but for now my main priority is to stop the influx of new Fedora bugzilla and bugzilla.kernel.org bugs caused by a9c4a912b7dc, counting the extra bug links added in this thread it seems we are up to 6 known bug reports for this and that likely is just the tip of the iceberg.
>>
>> Regards,
>>
>> Hans
>>
>>
> 
> We haven't even given a try to fixing it; I think the revert is still hasty.
> 
> I don't have a machine that can reproduce this failure, but I did confirm that my keyboard still works with this:
> 
> diff --git a/drivers/acpi/resource.c b/drivers/acpi/resource.c
> index 1dd8d5aebf678..b74d7d8cc8630 100644
> --- a/drivers/acpi/resource.c
> +++ b/drivers/acpi/resource.c
> @@ -19,7 +19,7 @@
>  #include <linux/dmi.h>
> 
>  #ifdef CONFIG_X86
> -#define valid_IRQ(i) (((i) != 0) && ((i) != 2))
> +#define valid_IRQ(i) ((i) > 2)
>  static inline bool acpi_iospace_resource_valid(struct resource *res)
>  {
>         /* On X86 IO space is limited to the [0 - 64K] IO port range */
> 
> Can we perhaps see if that works instead for some affected people?

That does not just skip the override stuff, it will make the kernel return irqresource_disabled(res, 1) for the kbd IRQ:

static inline void irqresource_disabled(struct resource *res, u32 irq)
{
        res->start = irq;
        res->end = irq;
        res->flags |= IORESOURCE_IRQ | IORESOURCE_DISABLED | IORESOURCE_UNSET;
}

Now not much code seems to actually check the IORESOURCE_DISABLED | IORESOURCE_UNSET flags, so this might still work but it does not seem like the right thing to do.

Regards,

Hans






> 
> If it does we should be able to throw away the entire quirks table.
> 
>>
>>>
>>>>
>>>> Fixes: a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and quirks")
>>>> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2229165
>>>> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2229317
>>>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217726
>>>> Cc: Mario Limonciello <mario.limonciello@amd.com>
>>>> Cc: Linux regressions mailing list <regressions@lists.linux.dev>
>>>> Cc: stable@vger.kernel.org
>>>> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
>>>> ---
>>>>    drivers/acpi/resource.c | 60 +++++++++++++++++++++++++++++++++++++++++
>>>>    1 file changed, 60 insertions(+)
>>>>
>>>> diff --git a/drivers/acpi/resource.c b/drivers/acpi/resource.c
>>>> index 1dd8d5aebf67..0800a9d77558 100644
>>>> --- a/drivers/acpi/resource.c
>>>> +++ b/drivers/acpi/resource.c
>>>> @@ -470,6 +470,52 @@ static const struct dmi_system_id asus_laptop[] = {
>>>>        { }
>>>>    };
>>>>    +static const struct dmi_system_id lenovo_laptop[] = {
>>>> +    {
>>>> +        .ident = "LENOVO IdeaPad Flex 5 14ALC7",
>>>> +        .matches = {
>>>> +            DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
>>>> +            DMI_MATCH(DMI_PRODUCT_NAME, "82R9"),
>>>> +        },
>>>> +    },
>>>> +    {
>>>> +        .ident = "LENOVO IdeaPad Flex 5 16ALC7",
>>>> +        .matches = {
>>>> +            DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
>>>> +            DMI_MATCH(DMI_PRODUCT_NAME, "82RA"),
>>>> +        },
>>>> +    },
>>>> +    { }
>>>> +};
>>>> +
>>>> +static const struct dmi_system_id tongfang_gm_rg[] = {
>>>> +    {
>>>> +        .ident = "TongFang GMxRGxx/XMG CORE 15 (M22)/TUXEDO Stellaris 15 Gen4 AMD",
>>>> +        .matches = {
>>>> +            DMI_MATCH(DMI_BOARD_NAME, "GMxRGxx"),
>>>> +        },
>>>> +    },
>>>> +    { }
>>>> +};
>>>> +
>>>> +static const struct dmi_system_id maingear_laptop[] = {
>>>> +    {
>>>> +        .ident = "MAINGEAR Vector Pro 2 15",
>>>> +        .matches = {
>>>> +            DMI_MATCH(DMI_SYS_VENDOR, "Micro Electronics Inc"),
>>>> +            DMI_MATCH(DMI_PRODUCT_NAME, "MG-VCP2-15A3070T"),
>>>> +        }
>>>> +    },
>>>> +    {
>>>> +        .ident = "MAINGEAR Vector Pro 2 17",
>>>> +        .matches = {
>>>> +            DMI_MATCH(DMI_SYS_VENDOR, "Micro Electronics Inc"),
>>>> +            DMI_MATCH(DMI_PRODUCT_NAME, "MG-VCP2-17A3070T"),
>>>> +        },
>>>> +    },
>>>> +    { }
>>>> +};
>>>> +
>>>>    static const struct dmi_system_id lg_laptop[] = {
>>>>        {
>>>>            .ident = "LG Electronics 17U70P",
>>>> @@ -493,6 +539,10 @@ struct irq_override_cmp {
>>>>    static const struct irq_override_cmp override_table[] = {
>>>>        { medion_laptop, 1, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, false },
>>>>        { asus_laptop, 1, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, false },
>>>> +    { lenovo_laptop, 6, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, true },
>>>> +    { lenovo_laptop, 10, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, true },
>>>> +    { tongfang_gm_rg, 1, ACPI_EDGE_SENSITIVE, ACPI_ACTIVE_LOW, 1, true },
>>>> +    { maingear_laptop, 1, ACPI_EDGE_SENSITIVE, ACPI_ACTIVE_LOW, 1, true },
>>>>        { lg_laptop, 1, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, false },
>>>>    };
>>>>    @@ -512,6 +562,16 @@ static bool acpi_dev_irq_override(u32 gsi, u8 triggering, u8 polarity,
>>>>                return entry->override;
>>>>        }
>>>>    +#ifdef CONFIG_X86
>>>> +    /*
>>>> +     * IRQ override isn't needed on modern AMD Zen systems and
>>>> +     * this override breaks active low IRQs on AMD Ryzen 6000 and
>>>> +     * newer systems. Skip it.
>>>> +     */
>>>> +    if (boot_cpu_has(X86_FEATURE_ZEN))
>>>> +        return false;
>>>> +#endif
>>>> +
>>>>        return true;
>>>>    }
>>>>    
>>>
>>
>
Mario Limonciello Aug. 7, 2023, 11:39 a.m. UTC | #8
>> We haven't even given a try to fixing it; I think the revert is still hasty.
>>
>> I don't have a machine that can reproduce this failure, but I did confirm that my keyboard still works with this:
>>
>> diff --git a/drivers/acpi/resource.c b/drivers/acpi/resource.c
>> index 1dd8d5aebf678..b74d7d8cc8630 100644
>> --- a/drivers/acpi/resource.c
>> +++ b/drivers/acpi/resource.c
>> @@ -19,7 +19,7 @@
>>   #include <linux/dmi.h>
>>
>>   #ifdef CONFIG_X86
>> -#define valid_IRQ(i) (((i) != 0) && ((i) != 2))
>> +#define valid_IRQ(i) ((i) > 2)
>>   static inline bool acpi_iospace_resource_valid(struct resource *res)
>>   {
>>          /* On X86 IO space is limited to the [0 - 64K] IO port range */
>>
>> Can we perhaps see if that works instead for some affected people?
> 
> That does not just skip the override stuff, it will make the kernel return irqresource_disabled(res, 1) for the kbd IRQ:
> 
> static inline void irqresource_disabled(struct resource *res, u32 irq)
> {
>          res->start = irq;
>          res->end = irq;
>          res->flags |= IORESOURCE_IRQ | IORESOURCE_DISABLED | IORESOURCE_UNSET;
> }
> 
> Now not much code seems to actually check the IORESOURCE_DISABLED | IORESOURCE_UNSET flags, so this might still work but it does not seem like the right thing to do.
> 

Right; so it makes the resource get skipped when PNP is enumerated.

To me it seems that it should match the theme of 'don't reprogram the 
polarity and triggering for the IRQ'.  When the IRQ is set up by i8042 
then it will try to go and get a resource and wouldn't find one.  So 
when the IRQ is set up it should match the values already programmed.

I do wonder if both IRQ 1 and IRQ 12 both would need to be skipped for 
this to work (if people are affected by both i8042 devices).
Hans de Goede Aug. 7, 2023, 3:19 p.m. UTC | #9
Hi All,

On 8/6/23 17:14, Hans de Goede wrote:
> Commit a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and
> quirks") is causing keyboard problems for quite a log of AMD based
> laptop users, leading to many bug reports.
> 
> Revert this change for now, until we can come up with
> a better fix for the PS/2 IRQ trigger-type/polarity problems
> on some x86 laptops.
> 
> Fixes: a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and quirks")
> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2229165
> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2229317
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217726
> Cc: Mario Limonciello <mario.limonciello@amd.com>
> Cc: Linux regressions mailing list <regressions@lists.linux.dev>
> Cc: stable@vger.kernel.org
> Signed-off-by: Hans de Goede <hdegoede@redhat.com>

I've spend most of today analysing the situation / this problem :

213031 - MEDION notebook internal keyboard not recognized / not working correctly
https://bugzilla.kernel.org/show_bug.cgi?id=213031

This is the bug that started it all, the issue here was overriding
a level low DSDT entry:

                IRQ (Level, ActiveLow, Exclusive, )
                    {1}

With an edge high entry from the MADT, note that edge high is the default
mp_irqs[idx].irqflags value for legacy/ISA IRQs. The dmesg for the Ice Lake
Medion M15T this bug is about shows no INT_SRC_OVR entry for IRQ 1
in the MADT, it does show INT_SRC_OVR entries for IRQ 0 and 9.

At first a fix was attempted to not use the MADT override unless
the DSDT entry was edge high. But that caused regressions, so a switch
to a DMI based approach was used instead. Noteworthy is that some of
the regressions benefitted from a MADT override to high edge for
IRQ 3 and 4 (UART IRQs) even though there are no INT_SRC_OVR messages
in the dmesg of the machine with the regression.

*** fast forward to today ***

The DMI quirk based approach seems to have worked well for the Ice Lake
era problems from approx. 3 years ago. But on AMD Zen based systems
the situation seems to be more complex. Not using the MADT override
is a problem for quite a few models. But using the MADT override
is a problem on quite a few other models ...

Looking at the status quo for v6.4 where MADT overriding by default
is not used, 3 bugs have been filed where the override is actually
necessary (note dmesg snippets with patched kernel to enable
MADT override):

217394 - IRQ override skipping breaks the Aya Neo Air Plus 6800U keyboard buttons 
https://bugzilla.kernel.org/show_bug.cgi?id=217394

Aya Neo Air Plus - AMD Ryzen 7 6800U

[    0.003333] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[    0.003333] ACPI: INT_SRC_OVR (bus 0 bus_irq 1 global_irq 1 low edge)
[    0.003333] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
[    0.410670] ACPI: IRQ 1 override to edge, low(!)

217406 - very slow keyboard typing without IRQ override with new AMD Ryzen CPUs
https://bugzilla.kernel.org/show_bug.cgi?id=217406

HP Pavilion Aero 13 - AMD Ryzen 7735U 

[    0.026135] ACPI: INT_SRC_OVR (bus 0 bus_irq 1 global_irq 1 low edge)
[    0.026136] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[    0.026137] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)

[    0.361640] ACPI: IRQ 1 override to edge, low(!)

217336 - keyboard not working Asus TUF FA617NS 
https://bugzilla.kernel.org/show_bug.cgi?id=217336

Asus TUF FA617NS - AMD Ryzen 7 7735HS

Noteworthy DSTD keyboard resource:

                IRQNoFlags ()
                    {1}

ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 1 global_irq 1 low edge)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
ACPI: IRQ 1 override to edge, low(!)

So for all 3 do use MADT override on Zen bugs we have an INT_SRC_OVR dmesg entry
for IRQ 1.

Unfortunately the "MAINGEAR Vector Pro 2 17" / "MG-VCP2-17A3070T" for
which a quirk was added in commit 9946e39fe8d0 to force the override
even though it it Zen based breaks this pattern:

[    0.073733] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[    0.073734] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
[    0.341347] ACPI: IRQ 1 override to edge, high(!)

Still the presence of an INT_SRC_OVR for a specific legacy IRQ seems
to be a strong indicator that MADT overriding should be used in that
case and can be used to at least reduce the amount of DMI quirks.

Another interesting data point is that all the devices for which
DMI quirks are present for which MADT overriding should not be used
for IRQ 1 all have a DSDT entry with the IRQ configured as level low
and exclusive.

I think that the best thing to do might be to go back to
the original approach from:

https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/commit/?id=0ec4e55e9f571f08970ed115ec0addc691eda613

and then limit this to IRQ1. Also maybe inverting the check to:

static bool irq_is_legacy(struct acpi_resource_irq *irq)
{
	return !(irq->triggering == ACPI_LEVEL_SENSITIVE &&
		 irq->polarity == ACPI_ACTIVE_LOW &&
		 irq->shareable == ACPI_EXCLUSIVE);
}

But I need to check if this will work for all the new Zen models
for which we got bug reports after the recent dropping of
9946e39fe8d0 ("ACPI: resource: skip IRQ override on AMD Zen platforms")

Regards,

Hans





> ---
>  drivers/acpi/resource.c | 60 +++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 60 insertions(+)
> 
> diff --git a/drivers/acpi/resource.c b/drivers/acpi/resource.c
> index 1dd8d5aebf67..0800a9d77558 100644
> --- a/drivers/acpi/resource.c
> +++ b/drivers/acpi/resource.c
> @@ -470,6 +470,52 @@ static const struct dmi_system_id asus_laptop[] = {
>  	{ }
>  };
>  
> +static const struct dmi_system_id lenovo_laptop[] = {
> +	{
> +		.ident = "LENOVO IdeaPad Flex 5 14ALC7",
> +		.matches = {
> +			DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
> +			DMI_MATCH(DMI_PRODUCT_NAME, "82R9"),
> +		},
> +	},
> +	{
> +		.ident = "LENOVO IdeaPad Flex 5 16ALC7",
> +		.matches = {
> +			DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
> +			DMI_MATCH(DMI_PRODUCT_NAME, "82RA"),
> +		},
> +	},
> +	{ }
> +};
> +
> +static const struct dmi_system_id tongfang_gm_rg[] = {
> +	{
> +		.ident = "TongFang GMxRGxx/XMG CORE 15 (M22)/TUXEDO Stellaris 15 Gen4 AMD",
> +		.matches = {
> +			DMI_MATCH(DMI_BOARD_NAME, "GMxRGxx"),
> +		},
> +	},
> +	{ }
> +};
> +
> +static const struct dmi_system_id maingear_laptop[] = {
> +	{
> +		.ident = "MAINGEAR Vector Pro 2 15",
> +		.matches = {
> +			DMI_MATCH(DMI_SYS_VENDOR, "Micro Electronics Inc"),
> +			DMI_MATCH(DMI_PRODUCT_NAME, "MG-VCP2-15A3070T"),
> +		}
> +	},
> +	{
> +		.ident = "MAINGEAR Vector Pro 2 17",
> +		.matches = {
> +			DMI_MATCH(DMI_SYS_VENDOR, "Micro Electronics Inc"),
> +			DMI_MATCH(DMI_PRODUCT_NAME, "MG-VCP2-17A3070T"),
> +		},
> +	},
> +	{ }
> +};
> +
>  static const struct dmi_system_id lg_laptop[] = {
>  	{
>  		.ident = "LG Electronics 17U70P",
> @@ -493,6 +539,10 @@ struct irq_override_cmp {
>  static const struct irq_override_cmp override_table[] = {
>  	{ medion_laptop, 1, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, false },
>  	{ asus_laptop, 1, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, false },
> +	{ lenovo_laptop, 6, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, true },
> +	{ lenovo_laptop, 10, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, true },
> +	{ tongfang_gm_rg, 1, ACPI_EDGE_SENSITIVE, ACPI_ACTIVE_LOW, 1, true },
> +	{ maingear_laptop, 1, ACPI_EDGE_SENSITIVE, ACPI_ACTIVE_LOW, 1, true },
>  	{ lg_laptop, 1, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, false },
>  };
>  
> @@ -512,6 +562,16 @@ static bool acpi_dev_irq_override(u32 gsi, u8 triggering, u8 polarity,
>  			return entry->override;
>  	}
>  
> +#ifdef CONFIG_X86
> +	/*
> +	 * IRQ override isn't needed on modern AMD Zen systems and
> +	 * this override breaks active low IRQs on AMD Ryzen 6000 and
> +	 * newer systems. Skip it.
> +	 */
> +	if (boot_cpu_has(X86_FEATURE_ZEN))
> +		return false;
> +#endif
> +
>  	return true;
>  }
>
Hans de Goede Aug. 7, 2023, 5:06 p.m. UTC | #10
Hi,

On 8/7/23 13:39, Mario Limonciello wrote:
>>> We haven't even given a try to fixing it; I think the revert is still hasty.
>>>
>>> I don't have a machine that can reproduce this failure, but I did confirm that my keyboard still works with this:
>>>
>>> diff --git a/drivers/acpi/resource.c b/drivers/acpi/resource.c
>>> index 1dd8d5aebf678..b74d7d8cc8630 100644
>>> --- a/drivers/acpi/resource.c
>>> +++ b/drivers/acpi/resource.c
>>> @@ -19,7 +19,7 @@
>>>   #include <linux/dmi.h>
>>>
>>>   #ifdef CONFIG_X86
>>> -#define valid_IRQ(i) (((i) != 0) && ((i) != 2))
>>> +#define valid_IRQ(i) ((i) > 2)
>>>   static inline bool acpi_iospace_resource_valid(struct resource *res)
>>>   {
>>>          /* On X86 IO space is limited to the [0 - 64K] IO port range */
>>>
>>> Can we perhaps see if that works instead for some affected people?
>>
>> That does not just skip the override stuff, it will make the kernel return irqresource_disabled(res, 1) for the kbd IRQ:
>>
>> static inline void irqresource_disabled(struct resource *res, u32 irq)
>> {
>>          res->start = irq;
>>          res->end = irq;
>>          res->flags |= IORESOURCE_IRQ | IORESOURCE_DISABLED | IORESOURCE_UNSET;
>> }
>>
>> Now not much code seems to actually check the IORESOURCE_DISABLED | IORESOURCE_UNSET flags, so this might still work but it does not seem like the right thing to do.
>>
> 
> Right; so it makes the resource get skipped when PNP is enumerated.

Which AFAICT means that PNP enumerated i8042-s will not have any IRQ assigned at all and this will not work.

Regards,

Hans
Mario Limonciello Aug. 7, 2023, 5:08 p.m. UTC | #11
On 8/7/2023 12:06, Hans de Goede wrote:
> Hi,
> 
> On 8/7/23 13:39, Mario Limonciello wrote:
>>>> We haven't even given a try to fixing it; I think the revert is still hasty.
>>>>
>>>> I don't have a machine that can reproduce this failure, but I did confirm that my keyboard still works with this:
>>>>
>>>> diff --git a/drivers/acpi/resource.c b/drivers/acpi/resource.c
>>>> index 1dd8d5aebf678..b74d7d8cc8630 100644
>>>> --- a/drivers/acpi/resource.c
>>>> +++ b/drivers/acpi/resource.c
>>>> @@ -19,7 +19,7 @@
>>>>    #include <linux/dmi.h>
>>>>
>>>>    #ifdef CONFIG_X86
>>>> -#define valid_IRQ(i) (((i) != 0) && ((i) != 2))
>>>> +#define valid_IRQ(i) ((i) > 2)
>>>>    static inline bool acpi_iospace_resource_valid(struct resource *res)
>>>>    {
>>>>           /* On X86 IO space is limited to the [0 - 64K] IO port range */
>>>>
>>>> Can we perhaps see if that works instead for some affected people?
>>>
>>> That does not just skip the override stuff, it will make the kernel return irqresource_disabled(res, 1) for the kbd IRQ:
>>>
>>> static inline void irqresource_disabled(struct resource *res, u32 irq)
>>> {
>>>           res->start = irq;
>>>           res->end = irq;
>>>           res->flags |= IORESOURCE_IRQ | IORESOURCE_DISABLED | IORESOURCE_UNSET;
>>> }
>>>
>>> Now not much code seems to actually check the IORESOURCE_DISABLED | IORESOURCE_UNSET flags, so this might still work but it does not seem like the right thing to do.
>>>
>>
>> Right; so it makes the resource get skipped when PNP is enumerated.
> 
> Which AFAICT means that PNP enumerated i8042-s will not have any IRQ assigned at all and this will not work.
> 
> Regards,
> 
> Hans
> 

Yeah; I did some experimentation with some other machines and confirmed 
this doesn't work reliably on everything.

Your newly proposed direction sounds good to me though.  If you can post 
a new patch for that I'll test on the stuff I have.
Hans de Goede Aug. 8, 2023, 8:36 a.m. UTC | #12
Hi,

On 8/7/23 17:19, Hans de Goede wrote:
> Hi All,
> 
> On 8/6/23 17:14, Hans de Goede wrote:
>> Commit a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and
>> quirks") is causing keyboard problems for quite a log of AMD based
>> laptop users, leading to many bug reports.
>>
>> Revert this change for now, until we can come up with
>> a better fix for the PS/2 IRQ trigger-type/polarity problems
>> on some x86 laptops.
>>
>> Fixes: a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and quirks")
>> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2229165
>> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2229317
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217726
>> Cc: Mario Limonciello <mario.limonciello@amd.com>
>> Cc: Linux regressions mailing list <regressions@lists.linux.dev>
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
> 
> I've spend most of today analysing the situation / this problem :
> 
> 213031 - MEDION notebook internal keyboard not recognized / not working correctly
> https://bugzilla.kernel.org/show_bug.cgi?id=213031
> 
> This is the bug that started it all, the issue here was overriding
> a level low DSDT entry:
> 
>                 IRQ (Level, ActiveLow, Exclusive, )
>                     {1}
> 
> With an edge high entry from the MADT, note that edge high is the default
> mp_irqs[idx].irqflags value for legacy/ISA IRQs. The dmesg for the Ice Lake
> Medion M15T this bug is about shows no INT_SRC_OVR entry for IRQ 1
> in the MADT, it does show INT_SRC_OVR entries for IRQ 0 and 9.
> 
> At first a fix was attempted to not use the MADT override unless
> the DSDT entry was edge high. But that caused regressions, so a switch
> to a DMI based approach was used instead. Noteworthy is that some of
> the regressions benefitted from a MADT override to high edge for
> IRQ 3 and 4 (UART IRQs) even though there are no INT_SRC_OVR messages
> in the dmesg of the machine with the regression.
> 
> *** fast forward to today ***
> 
> The DMI quirk based approach seems to have worked well for the Ice Lake
> era problems from approx. 3 years ago. But on AMD Zen based systems
> the situation seems to be more complex. Not using the MADT override
> is a problem for quite a few models. But using the MADT override
> is a problem on quite a few other models ...
> 
> Looking at the status quo for v6.4 where MADT overriding by default
> is not used, 3 bugs have been filed where the override is actually
> necessary (note dmesg snippets with patched kernel to enable
> MADT override):
> 
> 217394 - IRQ override skipping breaks the Aya Neo Air Plus 6800U keyboard buttons 
> https://bugzilla.kernel.org/show_bug.cgi?id=217394
> 
> Aya Neo Air Plus - AMD Ryzen 7 6800U
> 
> [    0.003333] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
> [    0.003333] ACPI: INT_SRC_OVR (bus 0 bus_irq 1 global_irq 1 low edge)
> [    0.003333] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
> [    0.410670] ACPI: IRQ 1 override to edge, low(!)
> 
> 217406 - very slow keyboard typing without IRQ override with new AMD Ryzen CPUs
> https://bugzilla.kernel.org/show_bug.cgi?id=217406
> 
> HP Pavilion Aero 13 - AMD Ryzen 7735U 
> 
> [    0.026135] ACPI: INT_SRC_OVR (bus 0 bus_irq 1 global_irq 1 low edge)
> [    0.026136] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
> [    0.026137] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
> 
> [    0.361640] ACPI: IRQ 1 override to edge, low(!)
> 
> 217336 - keyboard not working Asus TUF FA617NS 
> https://bugzilla.kernel.org/show_bug.cgi?id=217336
> 
> Asus TUF FA617NS - AMD Ryzen 7 7735HS
> 
> Noteworthy DSTD keyboard resource:
> 
>                 IRQNoFlags ()
>                     {1}
> 
> ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
> ACPI: INT_SRC_OVR (bus 0 bus_irq 1 global_irq 1 low edge)
> ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
> ACPI: IRQ 1 override to edge, low(!)
> 
> So for all 3 do use MADT override on Zen bugs we have an INT_SRC_OVR dmesg entry
> for IRQ 1.
> 
> Unfortunately the "MAINGEAR Vector Pro 2 17" / "MG-VCP2-17A3070T" for
> which a quirk was added in commit 9946e39fe8d0 to force the override
> even though it it Zen based breaks this pattern:
> 
> [    0.073733] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
> [    0.073734] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
> [    0.341347] ACPI: IRQ 1 override to edge, high(!)
> 
> Still the presence of an INT_SRC_OVR for a specific legacy IRQ seems
> to be a strong indicator that MADT overriding should be used in that
> case and can be used to at least reduce the amount of DMI quirks.
> 
> Another interesting data point is that all the devices for which
> DMI quirks are present for which MADT overriding should not be used
> for IRQ 1 all have a DSDT entry with the IRQ configured as level low
> and exclusive.
> 
> I think that the best thing to do might be to go back to
> the original approach from:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/commit/?id=0ec4e55e9f571f08970ed115ec0addc691eda613
> 
> and then limit this to IRQ1. Also maybe inverting the check to:
> 
> static bool irq_is_legacy(struct acpi_resource_irq *irq)
> {
> 	return !(irq->triggering == ACPI_LEVEL_SENSITIVE &&
> 		 irq->polarity == ACPI_ACTIVE_LOW &&
> 		 irq->shareable == ACPI_EXCLUSIVE);
> }
> 
> But I need to check if this will work for all the new Zen models
> for which we got bug reports after the recent dropping of
> 9946e39fe8d0 ("ACPI: resource: skip IRQ override on AMD Zen platforms")

So today I have started with continueing the investigation looking
at laptop models where we used to not override because of:

        if (boot_cpu_has(X86_FEATURE_ZEN))
                return false;

And where removing this and thus using the override has led to a
regression.

Looking at the acpidump-s from the following bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=2229165
https://bugzilla.redhat.com/show_bug.cgi?id=2229317
https://bugzilla.kernel.org/show_bug.cgi?id=217726

All of these use the following settings for the kbd in the DSDT:

         IRQ (Edge, ActiveLow, Shared, )
             {1}

So we know these are at least 3 models with "Edge, ActiveLow, Shared" IRQ 1 settings which must not use the override. But the existing quirks before a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and quirks") contain:

        { tongfang_gm_rg, 1, ACPI_EDGE_SENSITIVE, ACPI_ACTIVE_LOW, 1, true },
        { maingear_laptop, 1, ACPI_EDGE_SENSITIVE, ACPI_ACTIVE_LOW, 1, true },

IOW models with "Edge, ActiveLow, Shared" IRQ 1 settings which OTOH must use the override, so as I was already afraid there is no easy for these DSDT IRQ 1 settings skip the override solution :|

So I have 2 plans to move forward with this:

Plan 1. Short them for 6.5 and backporting to 6.4.y (and other stable series):

1. Revert a9c4a912b7dc
2. Limit the acpi_dev_irq_override() check to only ever skip the IRQ override
   (return false) for GSI 1.
3. Add a check if there is a INT_SRC_OVR MADT entry for GSI 1 and in that case
   use the override even on ZEN, fixing:

217394 - IRQ override skipping breaks the Aya Neo Air Plus 6800U keyboard buttons 
 https://bugzilla.kernel.org/show_bug.cgi?id=217394
217406 - very slow keyboard typing without IRQ override with new AMD Ryzen CPUs
 https://bugzilla.kernel.org/show_bug.cgi?id=217406
217336 - keyboard not working Asus TUF FA617NS 
 https://bugzilla.kernel.org/show_bug.cgi?id=217336

Which are known AMD ZEN based laptops which do need the override for IRQ 1.

This short term plan is not ideal, but it does fix all currently known issues / models and does so in a way which will hopefully not cause regressions on any other models.


Plan 2. Long term, see if I can come up with a way to read back the actual trigger type set in the IOAPIC for IRQ 1 at boot (in drivers/acpi/resource.c) and use that.


Regards,

Hans
Rafael J. Wysocki Aug. 8, 2023, 11:18 a.m. UTC | #13
On Tue, Aug 8, 2023 at 10:36 AM Hans de Goede <hdegoede@redhat.com> wrote:
>
> Hi,
>
> On 8/7/23 17:19, Hans de Goede wrote:
> > Hi All,
> >
> > On 8/6/23 17:14, Hans de Goede wrote:
> >> Commit a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and
> >> quirks") is causing keyboard problems for quite a log of AMD based
> >> laptop users, leading to many bug reports.
> >>
> >> Revert this change for now, until we can come up with
> >> a better fix for the PS/2 IRQ trigger-type/polarity problems
> >> on some x86 laptops.
> >>
> >> Fixes: a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and quirks")
> >> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2229165
> >> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2229317
> >> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217726
> >> Cc: Mario Limonciello <mario.limonciello@amd.com>
> >> Cc: Linux regressions mailing list <regressions@lists.linux.dev>
> >> Cc: stable@vger.kernel.org
> >> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
> >
> > I've spend most of today analysing the situation / this problem :
> >
> > 213031 - MEDION notebook internal keyboard not recognized / not working correctly
> > https://bugzilla.kernel.org/show_bug.cgi?id=213031
> >
> > This is the bug that started it all, the issue here was overriding
> > a level low DSDT entry:
> >
> >                 IRQ (Level, ActiveLow, Exclusive, )
> >                     {1}
> >
> > With an edge high entry from the MADT, note that edge high is the default
> > mp_irqs[idx].irqflags value for legacy/ISA IRQs. The dmesg for the Ice Lake
> > Medion M15T this bug is about shows no INT_SRC_OVR entry for IRQ 1
> > in the MADT, it does show INT_SRC_OVR entries for IRQ 0 and 9.
> >
> > At first a fix was attempted to not use the MADT override unless
> > the DSDT entry was edge high. But that caused regressions, so a switch
> > to a DMI based approach was used instead. Noteworthy is that some of
> > the regressions benefitted from a MADT override to high edge for
> > IRQ 3 and 4 (UART IRQs) even though there are no INT_SRC_OVR messages
> > in the dmesg of the machine with the regression.
> >
> > *** fast forward to today ***
> >
> > The DMI quirk based approach seems to have worked well for the Ice Lake
> > era problems from approx. 3 years ago. But on AMD Zen based systems
> > the situation seems to be more complex. Not using the MADT override
> > is a problem for quite a few models. But using the MADT override
> > is a problem on quite a few other models ...
> >
> > Looking at the status quo for v6.4 where MADT overriding by default
> > is not used, 3 bugs have been filed where the override is actually
> > necessary (note dmesg snippets with patched kernel to enable
> > MADT override):
> >
> > 217394 - IRQ override skipping breaks the Aya Neo Air Plus 6800U keyboard buttons
> > https://bugzilla.kernel.org/show_bug.cgi?id=217394
> >
> > Aya Neo Air Plus - AMD Ryzen 7 6800U
> >
> > [    0.003333] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
> > [    0.003333] ACPI: INT_SRC_OVR (bus 0 bus_irq 1 global_irq 1 low edge)
> > [    0.003333] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
> > [    0.410670] ACPI: IRQ 1 override to edge, low(!)
> >
> > 217406 - very slow keyboard typing without IRQ override with new AMD Ryzen CPUs
> > https://bugzilla.kernel.org/show_bug.cgi?id=217406
> >
> > HP Pavilion Aero 13 - AMD Ryzen 7735U
> >
> > [    0.026135] ACPI: INT_SRC_OVR (bus 0 bus_irq 1 global_irq 1 low edge)
> > [    0.026136] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
> > [    0.026137] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
> >
> > [    0.361640] ACPI: IRQ 1 override to edge, low(!)
> >
> > 217336 - keyboard not working Asus TUF FA617NS
> > https://bugzilla.kernel.org/show_bug.cgi?id=217336
> >
> > Asus TUF FA617NS - AMD Ryzen 7 7735HS
> >
> > Noteworthy DSTD keyboard resource:
> >
> >                 IRQNoFlags ()
> >                     {1}
> >
> > ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
> > ACPI: INT_SRC_OVR (bus 0 bus_irq 1 global_irq 1 low edge)
> > ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
> > ACPI: IRQ 1 override to edge, low(!)
> >
> > So for all 3 do use MADT override on Zen bugs we have an INT_SRC_OVR dmesg entry
> > for IRQ 1.
> >
> > Unfortunately the "MAINGEAR Vector Pro 2 17" / "MG-VCP2-17A3070T" for
> > which a quirk was added in commit 9946e39fe8d0 to force the override
> > even though it it Zen based breaks this pattern:
> >
> > [    0.073733] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
> > [    0.073734] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
> > [    0.341347] ACPI: IRQ 1 override to edge, high(!)
> >
> > Still the presence of an INT_SRC_OVR for a specific legacy IRQ seems
> > to be a strong indicator that MADT overriding should be used in that
> > case and can be used to at least reduce the amount of DMI quirks.
> >
> > Another interesting data point is that all the devices for which
> > DMI quirks are present for which MADT overriding should not be used
> > for IRQ 1 all have a DSDT entry with the IRQ configured as level low
> > and exclusive.
> >
> > I think that the best thing to do might be to go back to
> > the original approach from:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/commit/?id=0ec4e55e9f571f08970ed115ec0addc691eda613
> >
> > and then limit this to IRQ1. Also maybe inverting the check to:
> >
> > static bool irq_is_legacy(struct acpi_resource_irq *irq)
> > {
> >       return !(irq->triggering == ACPI_LEVEL_SENSITIVE &&
> >                irq->polarity == ACPI_ACTIVE_LOW &&
> >                irq->shareable == ACPI_EXCLUSIVE);
> > }
> >
> > But I need to check if this will work for all the new Zen models
> > for which we got bug reports after the recent dropping of
> > 9946e39fe8d0 ("ACPI: resource: skip IRQ override on AMD Zen platforms")
>
> So today I have started with continueing the investigation looking
> at laptop models where we used to not override because of:
>
>         if (boot_cpu_has(X86_FEATURE_ZEN))
>                 return false;
>
> And where removing this and thus using the override has led to a
> regression.
>
> Looking at the acpidump-s from the following bugs:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=2229165
> https://bugzilla.redhat.com/show_bug.cgi?id=2229317
> https://bugzilla.kernel.org/show_bug.cgi?id=217726
>
> All of these use the following settings for the kbd in the DSDT:
>
>          IRQ (Edge, ActiveLow, Shared, )
>              {1}
>
> So we know these are at least 3 models with "Edge, ActiveLow, Shared" IRQ 1 settings which must not use the override. But the existing quirks before a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and quirks") contain:
>
>         { tongfang_gm_rg, 1, ACPI_EDGE_SENSITIVE, ACPI_ACTIVE_LOW, 1, true },
>         { maingear_laptop, 1, ACPI_EDGE_SENSITIVE, ACPI_ACTIVE_LOW, 1, true },
>
> IOW models with "Edge, ActiveLow, Shared" IRQ 1 settings which OTOH must use the override, so as I was already afraid there is no easy for these DSDT IRQ 1 settings skip the override solution :|
>
> So I have 2 plans to move forward with this:
>
> Plan 1. Short them for 6.5 and backporting to 6.4.y (and other stable series):
>
> 1. Revert a9c4a912b7dc
> 2. Limit the acpi_dev_irq_override() check to only ever skip the IRQ override
>    (return false) for GSI 1.
> 3. Add a check if there is a INT_SRC_OVR MADT entry for GSI 1 and in that case
>    use the override even on ZEN, fixing:
>
> 217394 - IRQ override skipping breaks the Aya Neo Air Plus 6800U keyboard buttons
>  https://bugzilla.kernel.org/show_bug.cgi?id=217394
> 217406 - very slow keyboard typing without IRQ override with new AMD Ryzen CPUs
>  https://bugzilla.kernel.org/show_bug.cgi?id=217406
> 217336 - keyboard not working Asus TUF FA617NS
>  https://bugzilla.kernel.org/show_bug.cgi?id=217336
>
> Which are known AMD ZEN based laptops which do need the override for IRQ 1.
>
> This short term plan is not ideal, but it does fix all currently known issues / models and does so in a way which will hopefully not cause regressions on any other models.
>
>
> Plan 2. Long term, see if I can come up with a way to read back the actual trigger type set in the IOAPIC for IRQ 1 at boot (in drivers/acpi/resource.c) and use that.

Sounds reasonable to me.

It looks like using the IRQ 1 configuration left by the BIOS as is
would be the best choice unless that is not viable for some reason.
Hans de Goede Aug. 8, 2023, 1:31 p.m. UTC | #14
Hi,

On 8/8/23 13:18, Rafael J. Wysocki wrote:
> On Tue, Aug 8, 2023 at 10:36 AM Hans de Goede <hdegoede@redhat.com> wrote:
>>
>> Hi,
>>
>> On 8/7/23 17:19, Hans de Goede wrote:
>>> Hi All,
>>>
>>> On 8/6/23 17:14, Hans de Goede wrote:
>>>> Commit a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and
>>>> quirks") is causing keyboard problems for quite a log of AMD based
>>>> laptop users, leading to many bug reports.
>>>>
>>>> Revert this change for now, until we can come up with
>>>> a better fix for the PS/2 IRQ trigger-type/polarity problems
>>>> on some x86 laptops.
>>>>
>>>> Fixes: a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and quirks")
>>>> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2229165
>>>> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2229317
>>>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217726
>>>> Cc: Mario Limonciello <mario.limonciello@amd.com>
>>>> Cc: Linux regressions mailing list <regressions@lists.linux.dev>
>>>> Cc: stable@vger.kernel.org
>>>> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
>>>
>>> I've spend most of today analysing the situation / this problem :
>>>
>>> 213031 - MEDION notebook internal keyboard not recognized / not working correctly
>>> https://bugzilla.kernel.org/show_bug.cgi?id=213031
>>>
>>> This is the bug that started it all, the issue here was overriding
>>> a level low DSDT entry:
>>>
>>>                 IRQ (Level, ActiveLow, Exclusive, )
>>>                     {1}
>>>
>>> With an edge high entry from the MADT, note that edge high is the default
>>> mp_irqs[idx].irqflags value for legacy/ISA IRQs. The dmesg for the Ice Lake
>>> Medion M15T this bug is about shows no INT_SRC_OVR entry for IRQ 1
>>> in the MADT, it does show INT_SRC_OVR entries for IRQ 0 and 9.
>>>
>>> At first a fix was attempted to not use the MADT override unless
>>> the DSDT entry was edge high. But that caused regressions, so a switch
>>> to a DMI based approach was used instead. Noteworthy is that some of
>>> the regressions benefitted from a MADT override to high edge for
>>> IRQ 3 and 4 (UART IRQs) even though there are no INT_SRC_OVR messages
>>> in the dmesg of the machine with the regression.
>>>
>>> *** fast forward to today ***
>>>
>>> The DMI quirk based approach seems to have worked well for the Ice Lake
>>> era problems from approx. 3 years ago. But on AMD Zen based systems
>>> the situation seems to be more complex. Not using the MADT override
>>> is a problem for quite a few models. But using the MADT override
>>> is a problem on quite a few other models ...
>>>
>>> Looking at the status quo for v6.4 where MADT overriding by default
>>> is not used, 3 bugs have been filed where the override is actually
>>> necessary (note dmesg snippets with patched kernel to enable
>>> MADT override):
>>>
>>> 217394 - IRQ override skipping breaks the Aya Neo Air Plus 6800U keyboard buttons
>>> https://bugzilla.kernel.org/show_bug.cgi?id=217394
>>>
>>> Aya Neo Air Plus - AMD Ryzen 7 6800U
>>>
>>> [    0.003333] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
>>> [    0.003333] ACPI: INT_SRC_OVR (bus 0 bus_irq 1 global_irq 1 low edge)
>>> [    0.003333] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
>>> [    0.410670] ACPI: IRQ 1 override to edge, low(!)
>>>
>>> 217406 - very slow keyboard typing without IRQ override with new AMD Ryzen CPUs
>>> https://bugzilla.kernel.org/show_bug.cgi?id=217406
>>>
>>> HP Pavilion Aero 13 - AMD Ryzen 7735U
>>>
>>> [    0.026135] ACPI: INT_SRC_OVR (bus 0 bus_irq 1 global_irq 1 low edge)
>>> [    0.026136] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
>>> [    0.026137] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
>>>
>>> [    0.361640] ACPI: IRQ 1 override to edge, low(!)
>>>
>>> 217336 - keyboard not working Asus TUF FA617NS
>>> https://bugzilla.kernel.org/show_bug.cgi?id=217336
>>>
>>> Asus TUF FA617NS - AMD Ryzen 7 7735HS
>>>
>>> Noteworthy DSTD keyboard resource:
>>>
>>>                 IRQNoFlags ()
>>>                     {1}
>>>
>>> ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
>>> ACPI: INT_SRC_OVR (bus 0 bus_irq 1 global_irq 1 low edge)
>>> ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
>>> ACPI: IRQ 1 override to edge, low(!)
>>>
>>> So for all 3 do use MADT override on Zen bugs we have an INT_SRC_OVR dmesg entry
>>> for IRQ 1.
>>>
>>> Unfortunately the "MAINGEAR Vector Pro 2 17" / "MG-VCP2-17A3070T" for
>>> which a quirk was added in commit 9946e39fe8d0 to force the override
>>> even though it it Zen based breaks this pattern:
>>>
>>> [    0.073733] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
>>> [    0.073734] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
>>> [    0.341347] ACPI: IRQ 1 override to edge, high(!)
>>>
>>> Still the presence of an INT_SRC_OVR for a specific legacy IRQ seems
>>> to be a strong indicator that MADT overriding should be used in that
>>> case and can be used to at least reduce the amount of DMI quirks.
>>>
>>> Another interesting data point is that all the devices for which
>>> DMI quirks are present for which MADT overriding should not be used
>>> for IRQ 1 all have a DSDT entry with the IRQ configured as level low
>>> and exclusive.
>>>
>>> I think that the best thing to do might be to go back to
>>> the original approach from:
>>>
>>> https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/commit/?id=0ec4e55e9f571f08970ed115ec0addc691eda613
>>>
>>> and then limit this to IRQ1. Also maybe inverting the check to:
>>>
>>> static bool irq_is_legacy(struct acpi_resource_irq *irq)
>>> {
>>>       return !(irq->triggering == ACPI_LEVEL_SENSITIVE &&
>>>                irq->polarity == ACPI_ACTIVE_LOW &&
>>>                irq->shareable == ACPI_EXCLUSIVE);
>>> }
>>>
>>> But I need to check if this will work for all the new Zen models
>>> for which we got bug reports after the recent dropping of
>>> 9946e39fe8d0 ("ACPI: resource: skip IRQ override on AMD Zen platforms")
>>
>> So today I have started with continueing the investigation looking
>> at laptop models where we used to not override because of:
>>
>>         if (boot_cpu_has(X86_FEATURE_ZEN))
>>                 return false;
>>
>> And where removing this and thus using the override has led to a
>> regression.
>>
>> Looking at the acpidump-s from the following bugs:
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=2229165
>> https://bugzilla.redhat.com/show_bug.cgi?id=2229317
>> https://bugzilla.kernel.org/show_bug.cgi?id=217726
>>
>> All of these use the following settings for the kbd in the DSDT:
>>
>>          IRQ (Edge, ActiveLow, Shared, )
>>              {1}
>>
>> So we know these are at least 3 models with "Edge, ActiveLow, Shared" IRQ 1 settings which must not use the override. But the existing quirks before a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and quirks") contain:
>>
>>         { tongfang_gm_rg, 1, ACPI_EDGE_SENSITIVE, ACPI_ACTIVE_LOW, 1, true },
>>         { maingear_laptop, 1, ACPI_EDGE_SENSITIVE, ACPI_ACTIVE_LOW, 1, true },
>>
>> IOW models with "Edge, ActiveLow, Shared" IRQ 1 settings which OTOH must use the override, so as I was already afraid there is no easy for these DSDT IRQ 1 settings skip the override solution :|
>>
>> So I have 2 plans to move forward with this:
>>
>> Plan 1. Short them for 6.5 and backporting to 6.4.y (and other stable series):
>>
>> 1. Revert a9c4a912b7dc
>> 2. Limit the acpi_dev_irq_override() check to only ever skip the IRQ override
>>    (return false) for GSI 1.
>> 3. Add a check if there is a INT_SRC_OVR MADT entry for GSI 1 and in that case
>>    use the override even on ZEN, fixing:
>>
>> 217394 - IRQ override skipping breaks the Aya Neo Air Plus 6800U keyboard buttons
>>  https://bugzilla.kernel.org/show_bug.cgi?id=217394
>> 217406 - very slow keyboard typing without IRQ override with new AMD Ryzen CPUs
>>  https://bugzilla.kernel.org/show_bug.cgi?id=217406
>> 217336 - keyboard not working Asus TUF FA617NS
>>  https://bugzilla.kernel.org/show_bug.cgi?id=217336
>>
>> Which are known AMD ZEN based laptops which do need the override for IRQ 1.
>>
>> This short term plan is not ideal, but it does fix all currently known issues / models and does so in a way which will hopefully not cause regressions on any other models.
>>
>>
>> Plan 2. Long term, see if I can come up with a way to read back the actual trigger type set in the IOAPIC for IRQ 1 at boot (in drivers/acpi/resource.c) and use that.
> 
> Sounds reasonable to me.
> 
> It looks like using the IRQ 1 configuration left by the BIOS as is
> would be the best choice unless that is not viable for some reason.

Agreed, do you have a suggestion how to do that ? I've been looking at this but I've gotten a bit lost in all the layers of ioapic code.

It looks like we need to add a flag to acpi_register_gsi() for it to register a gsi while keeping the IOAPIC settings as is (or a new function) but it is not clear to me how to implement this.

An alternative method would be to call irq_get_trigger_type() for IRQ 1 and use that for the IRQ trigger info when calling acpi_register_gsi(), but I think we need to have the IRQ registered / added to the IRQ domain first ?

Regards,

Hans
Rafael J. Wysocki Aug. 8, 2023, 1:37 p.m. UTC | #15
Hi,

On Tue, Aug 8, 2023 at 3:31 PM Hans de Goede <hdegoede@redhat.com> wrote:
>
> Hi,
>
> On 8/8/23 13:18, Rafael J. Wysocki wrote:
> > On Tue, Aug 8, 2023 at 10:36 AM Hans de Goede <hdegoede@redhat.com> wrote:
> >>
> >> Hi,
> >>
> >> On 8/7/23 17:19, Hans de Goede wrote:
> >>> Hi All,
> >>>
> >>> On 8/6/23 17:14, Hans de Goede wrote:
> >>>> Commit a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and
> >>>> quirks") is causing keyboard problems for quite a log of AMD based
> >>>> laptop users, leading to many bug reports.
> >>>>
> >>>> Revert this change for now, until we can come up with
> >>>> a better fix for the PS/2 IRQ trigger-type/polarity problems
> >>>> on some x86 laptops.
> >>>>
> >>>> Fixes: a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and quirks")
> >>>> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2229165
> >>>> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2229317
> >>>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217726
> >>>> Cc: Mario Limonciello <mario.limonciello@amd.com>
> >>>> Cc: Linux regressions mailing list <regressions@lists.linux.dev>
> >>>> Cc: stable@vger.kernel.org
> >>>> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
> >>>
> >>> I've spend most of today analysing the situation / this problem :
> >>>
> >>> 213031 - MEDION notebook internal keyboard not recognized / not working correctly
> >>> https://bugzilla.kernel.org/show_bug.cgi?id=213031
> >>>
> >>> This is the bug that started it all, the issue here was overriding
> >>> a level low DSDT entry:
> >>>
> >>>                 IRQ (Level, ActiveLow, Exclusive, )
> >>>                     {1}
> >>>
> >>> With an edge high entry from the MADT, note that edge high is the default
> >>> mp_irqs[idx].irqflags value for legacy/ISA IRQs. The dmesg for the Ice Lake
> >>> Medion M15T this bug is about shows no INT_SRC_OVR entry for IRQ 1
> >>> in the MADT, it does show INT_SRC_OVR entries for IRQ 0 and 9.
> >>>
> >>> At first a fix was attempted to not use the MADT override unless
> >>> the DSDT entry was edge high. But that caused regressions, so a switch
> >>> to a DMI based approach was used instead. Noteworthy is that some of
> >>> the regressions benefitted from a MADT override to high edge for
> >>> IRQ 3 and 4 (UART IRQs) even though there are no INT_SRC_OVR messages
> >>> in the dmesg of the machine with the regression.
> >>>
> >>> *** fast forward to today ***
> >>>
> >>> The DMI quirk based approach seems to have worked well for the Ice Lake
> >>> era problems from approx. 3 years ago. But on AMD Zen based systems
> >>> the situation seems to be more complex. Not using the MADT override
> >>> is a problem for quite a few models. But using the MADT override
> >>> is a problem on quite a few other models ...
> >>>
> >>> Looking at the status quo for v6.4 where MADT overriding by default
> >>> is not used, 3 bugs have been filed where the override is actually
> >>> necessary (note dmesg snippets with patched kernel to enable
> >>> MADT override):
> >>>
> >>> 217394 - IRQ override skipping breaks the Aya Neo Air Plus 6800U keyboard buttons
> >>> https://bugzilla.kernel.org/show_bug.cgi?id=217394
> >>>
> >>> Aya Neo Air Plus - AMD Ryzen 7 6800U
> >>>
> >>> [    0.003333] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
> >>> [    0.003333] ACPI: INT_SRC_OVR (bus 0 bus_irq 1 global_irq 1 low edge)
> >>> [    0.003333] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
> >>> [    0.410670] ACPI: IRQ 1 override to edge, low(!)
> >>>
> >>> 217406 - very slow keyboard typing without IRQ override with new AMD Ryzen CPUs
> >>> https://bugzilla.kernel.org/show_bug.cgi?id=217406
> >>>
> >>> HP Pavilion Aero 13 - AMD Ryzen 7735U
> >>>
> >>> [    0.026135] ACPI: INT_SRC_OVR (bus 0 bus_irq 1 global_irq 1 low edge)
> >>> [    0.026136] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
> >>> [    0.026137] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
> >>>
> >>> [    0.361640] ACPI: IRQ 1 override to edge, low(!)
> >>>
> >>> 217336 - keyboard not working Asus TUF FA617NS
> >>> https://bugzilla.kernel.org/show_bug.cgi?id=217336
> >>>
> >>> Asus TUF FA617NS - AMD Ryzen 7 7735HS
> >>>
> >>> Noteworthy DSTD keyboard resource:
> >>>
> >>>                 IRQNoFlags ()
> >>>                     {1}
> >>>
> >>> ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
> >>> ACPI: INT_SRC_OVR (bus 0 bus_irq 1 global_irq 1 low edge)
> >>> ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
> >>> ACPI: IRQ 1 override to edge, low(!)
> >>>
> >>> So for all 3 do use MADT override on Zen bugs we have an INT_SRC_OVR dmesg entry
> >>> for IRQ 1.
> >>>
> >>> Unfortunately the "MAINGEAR Vector Pro 2 17" / "MG-VCP2-17A3070T" for
> >>> which a quirk was added in commit 9946e39fe8d0 to force the override
> >>> even though it it Zen based breaks this pattern:
> >>>
> >>> [    0.073733] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
> >>> [    0.073734] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
> >>> [    0.341347] ACPI: IRQ 1 override to edge, high(!)
> >>>
> >>> Still the presence of an INT_SRC_OVR for a specific legacy IRQ seems
> >>> to be a strong indicator that MADT overriding should be used in that
> >>> case and can be used to at least reduce the amount of DMI quirks.
> >>>
> >>> Another interesting data point is that all the devices for which
> >>> DMI quirks are present for which MADT overriding should not be used
> >>> for IRQ 1 all have a DSDT entry with the IRQ configured as level low
> >>> and exclusive.
> >>>
> >>> I think that the best thing to do might be to go back to
> >>> the original approach from:
> >>>
> >>> https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/commit/?id=0ec4e55e9f571f08970ed115ec0addc691eda613
> >>>
> >>> and then limit this to IRQ1. Also maybe inverting the check to:
> >>>
> >>> static bool irq_is_legacy(struct acpi_resource_irq *irq)
> >>> {
> >>>       return !(irq->triggering == ACPI_LEVEL_SENSITIVE &&
> >>>                irq->polarity == ACPI_ACTIVE_LOW &&
> >>>                irq->shareable == ACPI_EXCLUSIVE);
> >>> }
> >>>
> >>> But I need to check if this will work for all the new Zen models
> >>> for which we got bug reports after the recent dropping of
> >>> 9946e39fe8d0 ("ACPI: resource: skip IRQ override on AMD Zen platforms")
> >>
> >> So today I have started with continueing the investigation looking
> >> at laptop models where we used to not override because of:
> >>
> >>         if (boot_cpu_has(X86_FEATURE_ZEN))
> >>                 return false;
> >>
> >> And where removing this and thus using the override has led to a
> >> regression.
> >>
> >> Looking at the acpidump-s from the following bugs:
> >>
> >> https://bugzilla.redhat.com/show_bug.cgi?id=2229165
> >> https://bugzilla.redhat.com/show_bug.cgi?id=2229317
> >> https://bugzilla.kernel.org/show_bug.cgi?id=217726
> >>
> >> All of these use the following settings for the kbd in the DSDT:
> >>
> >>          IRQ (Edge, ActiveLow, Shared, )
> >>              {1}
> >>
> >> So we know these are at least 3 models with "Edge, ActiveLow, Shared" IRQ 1 settings which must not use the override. But the existing quirks before a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and quirks") contain:
> >>
> >>         { tongfang_gm_rg, 1, ACPI_EDGE_SENSITIVE, ACPI_ACTIVE_LOW, 1, true },
> >>         { maingear_laptop, 1, ACPI_EDGE_SENSITIVE, ACPI_ACTIVE_LOW, 1, true },
> >>
> >> IOW models with "Edge, ActiveLow, Shared" IRQ 1 settings which OTOH must use the override, so as I was already afraid there is no easy for these DSDT IRQ 1 settings skip the override solution :|
> >>
> >> So I have 2 plans to move forward with this:
> >>
> >> Plan 1. Short them for 6.5 and backporting to 6.4.y (and other stable series):
> >>
> >> 1. Revert a9c4a912b7dc
> >> 2. Limit the acpi_dev_irq_override() check to only ever skip the IRQ override
> >>    (return false) for GSI 1.
> >> 3. Add a check if there is a INT_SRC_OVR MADT entry for GSI 1 and in that case
> >>    use the override even on ZEN, fixing:
> >>
> >> 217394 - IRQ override skipping breaks the Aya Neo Air Plus 6800U keyboard buttons
> >>  https://bugzilla.kernel.org/show_bug.cgi?id=217394
> >> 217406 - very slow keyboard typing without IRQ override with new AMD Ryzen CPUs
> >>  https://bugzilla.kernel.org/show_bug.cgi?id=217406
> >> 217336 - keyboard not working Asus TUF FA617NS
> >>  https://bugzilla.kernel.org/show_bug.cgi?id=217336
> >>
> >> Which are known AMD ZEN based laptops which do need the override for IRQ 1.
> >>
> >> This short term plan is not ideal, but it does fix all currently known issues / models and does so in a way which will hopefully not cause regressions on any other models.
> >>
> >>
> >> Plan 2. Long term, see if I can come up with a way to read back the actual trigger type set in the IOAPIC for IRQ 1 at boot (in drivers/acpi/resource.c) and use that.
> >
> > Sounds reasonable to me.
> >
> > It looks like using the IRQ 1 configuration left by the BIOS as is
> > would be the best choice unless that is not viable for some reason.
>
> Agreed, do you have a suggestion how to do that ? I've been looking at this but I've gotten a bit lost in all the layers of ioapic code.
>
> It looks like we need to add a flag to acpi_register_gsi() for it to register a gsi while keeping the IOAPIC settings as is (or a new function) but it is not clear to me how to implement this.
>
> An alternative method would be to call irq_get_trigger_type() for IRQ 1 and use that for the IRQ trigger info when calling acpi_register_gsi(), but I think we need to have the IRQ registered / added to the IRQ domain first ?

I need to take a deeper look into this, which I guess is going to take
some time.

If you figure this out in the meantime, please let me know.
Hans de Goede Aug. 8, 2023, 1:59 p.m. UTC | #16
Hi Rafael,

On 8/8/23 15:37, Rafael J. Wysocki wrote:
> Hi,
> 
> On Tue, Aug 8, 2023 at 3:31 PM Hans de Goede <hdegoede@redhat.com> wrote:
>>
>> Hi,
>>
>> On 8/8/23 13:18, Rafael J. Wysocki wrote:
>>> On Tue, Aug 8, 2023 at 10:36 AM Hans de Goede <hdegoede@redhat.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> On 8/7/23 17:19, Hans de Goede wrote:
>>>>> Hi All,
>>>>>
>>>>> On 8/6/23 17:14, Hans de Goede wrote:
>>>>>> Commit a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and
>>>>>> quirks") is causing keyboard problems for quite a log of AMD based
>>>>>> laptop users, leading to many bug reports.
>>>>>>
>>>>>> Revert this change for now, until we can come up with
>>>>>> a better fix for the PS/2 IRQ trigger-type/polarity problems
>>>>>> on some x86 laptops.
>>>>>>
>>>>>> Fixes: a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and quirks")
>>>>>> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2229165
>>>>>> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2229317
>>>>>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217726
>>>>>> Cc: Mario Limonciello <mario.limonciello@amd.com>
>>>>>> Cc: Linux regressions mailing list <regressions@lists.linux.dev>
>>>>>> Cc: stable@vger.kernel.org
>>>>>> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
>>>>>
>>>>> I've spend most of today analysing the situation / this problem :
>>>>>
>>>>> 213031 - MEDION notebook internal keyboard not recognized / not working correctly
>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=213031
>>>>>
>>>>> This is the bug that started it all, the issue here was overriding
>>>>> a level low DSDT entry:
>>>>>
>>>>>                 IRQ (Level, ActiveLow, Exclusive, )
>>>>>                     {1}
>>>>>
>>>>> With an edge high entry from the MADT, note that edge high is the default
>>>>> mp_irqs[idx].irqflags value for legacy/ISA IRQs. The dmesg for the Ice Lake
>>>>> Medion M15T this bug is about shows no INT_SRC_OVR entry for IRQ 1
>>>>> in the MADT, it does show INT_SRC_OVR entries for IRQ 0 and 9.
>>>>>
>>>>> At first a fix was attempted to not use the MADT override unless
>>>>> the DSDT entry was edge high. But that caused regressions, so a switch
>>>>> to a DMI based approach was used instead. Noteworthy is that some of
>>>>> the regressions benefitted from a MADT override to high edge for
>>>>> IRQ 3 and 4 (UART IRQs) even though there are no INT_SRC_OVR messages
>>>>> in the dmesg of the machine with the regression.
>>>>>
>>>>> *** fast forward to today ***
>>>>>
>>>>> The DMI quirk based approach seems to have worked well for the Ice Lake
>>>>> era problems from approx. 3 years ago. But on AMD Zen based systems
>>>>> the situation seems to be more complex. Not using the MADT override
>>>>> is a problem for quite a few models. But using the MADT override
>>>>> is a problem on quite a few other models ...
>>>>>
>>>>> Looking at the status quo for v6.4 where MADT overriding by default
>>>>> is not used, 3 bugs have been filed where the override is actually
>>>>> necessary (note dmesg snippets with patched kernel to enable
>>>>> MADT override):
>>>>>
>>>>> 217394 - IRQ override skipping breaks the Aya Neo Air Plus 6800U keyboard buttons
>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=217394
>>>>>
>>>>> Aya Neo Air Plus - AMD Ryzen 7 6800U
>>>>>
>>>>> [    0.003333] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
>>>>> [    0.003333] ACPI: INT_SRC_OVR (bus 0 bus_irq 1 global_irq 1 low edge)
>>>>> [    0.003333] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
>>>>> [    0.410670] ACPI: IRQ 1 override to edge, low(!)
>>>>>
>>>>> 217406 - very slow keyboard typing without IRQ override with new AMD Ryzen CPUs
>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=217406
>>>>>
>>>>> HP Pavilion Aero 13 - AMD Ryzen 7735U
>>>>>
>>>>> [    0.026135] ACPI: INT_SRC_OVR (bus 0 bus_irq 1 global_irq 1 low edge)
>>>>> [    0.026136] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
>>>>> [    0.026137] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
>>>>>
>>>>> [    0.361640] ACPI: IRQ 1 override to edge, low(!)
>>>>>
>>>>> 217336 - keyboard not working Asus TUF FA617NS
>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=217336
>>>>>
>>>>> Asus TUF FA617NS - AMD Ryzen 7 7735HS
>>>>>
>>>>> Noteworthy DSTD keyboard resource:
>>>>>
>>>>>                 IRQNoFlags ()
>>>>>                     {1}
>>>>>
>>>>> ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
>>>>> ACPI: INT_SRC_OVR (bus 0 bus_irq 1 global_irq 1 low edge)
>>>>> ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
>>>>> ACPI: IRQ 1 override to edge, low(!)
>>>>>
>>>>> So for all 3 do use MADT override on Zen bugs we have an INT_SRC_OVR dmesg entry
>>>>> for IRQ 1.
>>>>>
>>>>> Unfortunately the "MAINGEAR Vector Pro 2 17" / "MG-VCP2-17A3070T" for
>>>>> which a quirk was added in commit 9946e39fe8d0 to force the override
>>>>> even though it it Zen based breaks this pattern:
>>>>>
>>>>> [    0.073733] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
>>>>> [    0.073734] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
>>>>> [    0.341347] ACPI: IRQ 1 override to edge, high(!)
>>>>>
>>>>> Still the presence of an INT_SRC_OVR for a specific legacy IRQ seems
>>>>> to be a strong indicator that MADT overriding should be used in that
>>>>> case and can be used to at least reduce the amount of DMI quirks.
>>>>>
>>>>> Another interesting data point is that all the devices for which
>>>>> DMI quirks are present for which MADT overriding should not be used
>>>>> for IRQ 1 all have a DSDT entry with the IRQ configured as level low
>>>>> and exclusive.
>>>>>
>>>>> I think that the best thing to do might be to go back to
>>>>> the original approach from:
>>>>>
>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/commit/?id=0ec4e55e9f571f08970ed115ec0addc691eda613
>>>>>
>>>>> and then limit this to IRQ1. Also maybe inverting the check to:
>>>>>
>>>>> static bool irq_is_legacy(struct acpi_resource_irq *irq)
>>>>> {
>>>>>       return !(irq->triggering == ACPI_LEVEL_SENSITIVE &&
>>>>>                irq->polarity == ACPI_ACTIVE_LOW &&
>>>>>                irq->shareable == ACPI_EXCLUSIVE);
>>>>> }
>>>>>
>>>>> But I need to check if this will work for all the new Zen models
>>>>> for which we got bug reports after the recent dropping of
>>>>> 9946e39fe8d0 ("ACPI: resource: skip IRQ override on AMD Zen platforms")
>>>>
>>>> So today I have started with continueing the investigation looking
>>>> at laptop models where we used to not override because of:
>>>>
>>>>         if (boot_cpu_has(X86_FEATURE_ZEN))
>>>>                 return false;
>>>>
>>>> And where removing this and thus using the override has led to a
>>>> regression.
>>>>
>>>> Looking at the acpidump-s from the following bugs:
>>>>
>>>> https://bugzilla.redhat.com/show_bug.cgi?id=2229165
>>>> https://bugzilla.redhat.com/show_bug.cgi?id=2229317
>>>> https://bugzilla.kernel.org/show_bug.cgi?id=217726
>>>>
>>>> All of these use the following settings for the kbd in the DSDT:
>>>>
>>>>          IRQ (Edge, ActiveLow, Shared, )
>>>>              {1}
>>>>
>>>> So we know these are at least 3 models with "Edge, ActiveLow, Shared" IRQ 1 settings which must not use the override. But the existing quirks before a9c4a912b7dc ("ACPI: resource: Remove "Zen" specific match and quirks") contain:
>>>>
>>>>         { tongfang_gm_rg, 1, ACPI_EDGE_SENSITIVE, ACPI_ACTIVE_LOW, 1, true },
>>>>         { maingear_laptop, 1, ACPI_EDGE_SENSITIVE, ACPI_ACTIVE_LOW, 1, true },
>>>>
>>>> IOW models with "Edge, ActiveLow, Shared" IRQ 1 settings which OTOH must use the override, so as I was already afraid there is no easy for these DSDT IRQ 1 settings skip the override solution :|
>>>>
>>>> So I have 2 plans to move forward with this:
>>>>
>>>> Plan 1. Short them for 6.5 and backporting to 6.4.y (and other stable series):
>>>>
>>>> 1. Revert a9c4a912b7dc
>>>> 2. Limit the acpi_dev_irq_override() check to only ever skip the IRQ override
>>>>    (return false) for GSI 1.
>>>> 3. Add a check if there is a INT_SRC_OVR MADT entry for GSI 1 and in that case
>>>>    use the override even on ZEN, fixing:
>>>>
>>>> 217394 - IRQ override skipping breaks the Aya Neo Air Plus 6800U keyboard buttons
>>>>  https://bugzilla.kernel.org/show_bug.cgi?id=217394
>>>> 217406 - very slow keyboard typing without IRQ override with new AMD Ryzen CPUs
>>>>  https://bugzilla.kernel.org/show_bug.cgi?id=217406
>>>> 217336 - keyboard not working Asus TUF FA617NS
>>>>  https://bugzilla.kernel.org/show_bug.cgi?id=217336
>>>>
>>>> Which are known AMD ZEN based laptops which do need the override for IRQ 1.
>>>>
>>>> This short term plan is not ideal, but it does fix all currently known issues / models and does so in a way which will hopefully not cause regressions on any other models.
>>>>
>>>>
>>>> Plan 2. Long term, see if I can come up with a way to read back the actual trigger type set in the IOAPIC for IRQ 1 at boot (in drivers/acpi/resource.c) and use that.
>>>
>>> Sounds reasonable to me.
>>>
>>> It looks like using the IRQ 1 configuration left by the BIOS as is
>>> would be the best choice unless that is not viable for some reason.
>>
>> Agreed, do you have a suggestion how to do that ? I've been looking at this but I've gotten a bit lost in all the layers of ioapic code.
>>
>> It looks like we need to add a flag to acpi_register_gsi() for it to register a gsi while keeping the IOAPIC settings as is (or a new function) but it is not clear to me how to implement this.
>>
>> An alternative method would be to call irq_get_trigger_type() for IRQ 1 and use that for the IRQ trigger info when calling acpi_register_gsi(), but I think we need to have the IRQ registered / added to the IRQ domain first ?
> 
> I need to take a deeper look into this, which I guess is going to take
> some time.
> 
> If you figure this out in the meantime, please let me know.

I don't see myself having time to dive deeper into this anytime soon,
so if you can take a look (as time permits) that would be great.

Regards,

Hans
diff mbox series

Patch

diff --git a/drivers/acpi/resource.c b/drivers/acpi/resource.c
index 1dd8d5aebf67..0800a9d77558 100644
--- a/drivers/acpi/resource.c
+++ b/drivers/acpi/resource.c
@@ -470,6 +470,52 @@  static const struct dmi_system_id asus_laptop[] = {
 	{ }
 };
 
+static const struct dmi_system_id lenovo_laptop[] = {
+	{
+		.ident = "LENOVO IdeaPad Flex 5 14ALC7",
+		.matches = {
+			DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
+			DMI_MATCH(DMI_PRODUCT_NAME, "82R9"),
+		},
+	},
+	{
+		.ident = "LENOVO IdeaPad Flex 5 16ALC7",
+		.matches = {
+			DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
+			DMI_MATCH(DMI_PRODUCT_NAME, "82RA"),
+		},
+	},
+	{ }
+};
+
+static const struct dmi_system_id tongfang_gm_rg[] = {
+	{
+		.ident = "TongFang GMxRGxx/XMG CORE 15 (M22)/TUXEDO Stellaris 15 Gen4 AMD",
+		.matches = {
+			DMI_MATCH(DMI_BOARD_NAME, "GMxRGxx"),
+		},
+	},
+	{ }
+};
+
+static const struct dmi_system_id maingear_laptop[] = {
+	{
+		.ident = "MAINGEAR Vector Pro 2 15",
+		.matches = {
+			DMI_MATCH(DMI_SYS_VENDOR, "Micro Electronics Inc"),
+			DMI_MATCH(DMI_PRODUCT_NAME, "MG-VCP2-15A3070T"),
+		}
+	},
+	{
+		.ident = "MAINGEAR Vector Pro 2 17",
+		.matches = {
+			DMI_MATCH(DMI_SYS_VENDOR, "Micro Electronics Inc"),
+			DMI_MATCH(DMI_PRODUCT_NAME, "MG-VCP2-17A3070T"),
+		},
+	},
+	{ }
+};
+
 static const struct dmi_system_id lg_laptop[] = {
 	{
 		.ident = "LG Electronics 17U70P",
@@ -493,6 +539,10 @@  struct irq_override_cmp {
 static const struct irq_override_cmp override_table[] = {
 	{ medion_laptop, 1, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, false },
 	{ asus_laptop, 1, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, false },
+	{ lenovo_laptop, 6, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, true },
+	{ lenovo_laptop, 10, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, true },
+	{ tongfang_gm_rg, 1, ACPI_EDGE_SENSITIVE, ACPI_ACTIVE_LOW, 1, true },
+	{ maingear_laptop, 1, ACPI_EDGE_SENSITIVE, ACPI_ACTIVE_LOW, 1, true },
 	{ lg_laptop, 1, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, false },
 };
 
@@ -512,6 +562,16 @@  static bool acpi_dev_irq_override(u32 gsi, u8 triggering, u8 polarity,
 			return entry->override;
 	}
 
+#ifdef CONFIG_X86
+	/*
+	 * IRQ override isn't needed on modern AMD Zen systems and
+	 * this override breaks active low IRQs on AMD Ryzen 6000 and
+	 * newer systems. Skip it.
+	 */
+	if (boot_cpu_has(X86_FEATURE_ZEN))
+		return false;
+#endif
+
 	return true;
 }