diff mbox

[Bugfix,v4] PCI, ACPI: Fix regressions caused by resource_size_t overflow with 32-bit kernel

Message ID 1436340399-19695-1-git-send-email-jiang.liu@linux.intel.com (mailing list archive)
State New, archived
Delegated to: Bjorn Helgaas
Headers show

Commit Message

Jiang Liu July 8, 2015, 7:26 a.m. UTC
Zoltan Boszormenyi reported this regression:
  "There's a Realtek RTL8111/8168/8411 (PCI ID 10ec:8168, Subsystem ID
   1565:230e) network chip on the mainboard. After the r8169 driver loaded
   the IRQs in the machine went berserk. Keyboard keypressed arrived with
   considerable latency and duplicated, so no real work was possible.
   The machine responded to the power button but didn't actually power
   down. It just stuck at the powering down message. I had to press the
   power button for 4 seconds to power it down.

   The computer is a POS machine with a big battery inside. Because of this,
   either ACPI or the Realtek chip kept the bad state and after rebooting,
   the network chip didn't even show up in lspci. Not even the PXE ROM
   announced itself during boot. I had to disconnect the battery to beat
   some sense back to the computer.

   The regression happens with 4.0.5, 4.1.0-rc8 and 4.1.0-final. 3.18.16 was
   good."

The regression is caused by commit 593669c2ac0f ("x86/PCI/ACPI: Use common
ACPI resource interfaces to simplify implementation"). Since commit
593669c2ac0f, x86 PCI ACPI host bridge driver validates ACPI resources by
first converting an ACPI resource to a 'struct resource' structure and
then applying checks against the converted resource structure. The 'start'
and 'end' fields in 'struct resource' are defined to be type of
resource_size_t, which may be 32 bits or 64 bits depending on
CONFIG_PHYS_ADDR_T_64BIT.

This may cause incorrect resource validation results with 32-bit kernels
because 64-bit ACPI resource descriptors may get truncated when converting
to 32-bit 'start' and 'end' fields in 'struct resource'. It eventually
affects PCI resource allocation subsystem and makes some PCI devices and
the system behave abnormally due to incorrect resource assignment.

So enhance the ACPI resource parsing interfaces to ignore ACPI resource
descriptors with address/offset above 4G when running in 32-bit mode.

With the fix applied, the behavior of the machine was restored to how
3.18.16 worked, i.e. the memory range that is over 4GB is ignored again,
and lspci -vvxxx shows that everything is at the same memory window as
they were with 3.18.16.

Reported-and-Tested-by: Boszormenyi Zoltan <zboszor@pr.hu>
Fixes: 593669c2ac0f ("x86/PCI/ACPI: Use common ACPI resource interfaces to simplify implementation")
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: stable@vger.kernel.org # 4.0
---
 drivers/acpi/resource.c |   24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

Comments

Rafael J. Wysocki July 10, 2015, 1:10 a.m. UTC | #1
On Wednesday, July 08, 2015 03:26:39 PM Jiang Liu wrote:
> Zoltan Boszormenyi reported this regression:
>   "There's a Realtek RTL8111/8168/8411 (PCI ID 10ec:8168, Subsystem ID
>    1565:230e) network chip on the mainboard. After the r8169 driver loaded
>    the IRQs in the machine went berserk. Keyboard keypressed arrived with
>    considerable latency and duplicated, so no real work was possible.
>    The machine responded to the power button but didn't actually power
>    down. It just stuck at the powering down message. I had to press the
>    power button for 4 seconds to power it down.
> 
>    The computer is a POS machine with a big battery inside. Because of this,
>    either ACPI or the Realtek chip kept the bad state and after rebooting,
>    the network chip didn't even show up in lspci. Not even the PXE ROM
>    announced itself during boot. I had to disconnect the battery to beat
>    some sense back to the computer.
> 
>    The regression happens with 4.0.5, 4.1.0-rc8 and 4.1.0-final. 3.18.16 was
>    good."
> 
> The regression is caused by commit 593669c2ac0f ("x86/PCI/ACPI: Use common
> ACPI resource interfaces to simplify implementation"). Since commit
> 593669c2ac0f, x86 PCI ACPI host bridge driver validates ACPI resources by
> first converting an ACPI resource to a 'struct resource' structure and
> then applying checks against the converted resource structure. The 'start'
> and 'end' fields in 'struct resource' are defined to be type of
> resource_size_t, which may be 32 bits or 64 bits depending on
> CONFIG_PHYS_ADDR_T_64BIT.
> 
> This may cause incorrect resource validation results with 32-bit kernels
> because 64-bit ACPI resource descriptors may get truncated when converting
> to 32-bit 'start' and 'end' fields in 'struct resource'. It eventually
> affects PCI resource allocation subsystem and makes some PCI devices and
> the system behave abnormally due to incorrect resource assignment.
> 
> So enhance the ACPI resource parsing interfaces to ignore ACPI resource
> descriptors with address/offset above 4G when running in 32-bit mode.
> 
> With the fix applied, the behavior of the machine was restored to how
> 3.18.16 worked, i.e. the memory range that is over 4GB is ignored again,
> and lspci -vvxxx shows that everything is at the same memory window as
> they were with 3.18.16.
> 
> Reported-and-Tested-by: Boszormenyi Zoltan <zboszor@pr.hu>
> Fixes: 593669c2ac0f ("x86/PCI/ACPI: Use common ACPI resource interfaces to simplify implementation")
> Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
> Cc: stable@vger.kernel.org # 4.0

OK, I'm happy with the above changelog, so I'm going to apply the patch.

If anyone has any objections, please let me know.

Thanks,
Rafael

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tomasz Nowicki Nov. 2, 2015, 3:27 p.m. UTC | #2
On 08.07.2015 09:26, Jiang Liu wrote:
> Zoltan Boszormenyi reported this regression:
>    "There's a Realtek RTL8111/8168/8411 (PCI ID 10ec:8168, Subsystem ID
>     1565:230e) network chip on the mainboard. After the r8169 driver loaded
>     the IRQs in the machine went berserk. Keyboard keypressed arrived with
>     considerable latency and duplicated, so no real work was possible.
>     The machine responded to the power button but didn't actually power
>     down. It just stuck at the powering down message. I had to press the
>     power button for 4 seconds to power it down.
>
>     The computer is a POS machine with a big battery inside. Because of this,
>     either ACPI or the Realtek chip kept the bad state and after rebooting,
>     the network chip didn't even show up in lspci. Not even the PXE ROM
>     announced itself during boot. I had to disconnect the battery to beat
>     some sense back to the computer.
>
>     The regression happens with 4.0.5, 4.1.0-rc8 and 4.1.0-final. 3.18.16 was
>     good."
>
> The regression is caused by commit 593669c2ac0f ("x86/PCI/ACPI: Use common
> ACPI resource interfaces to simplify implementation"). Since commit
> 593669c2ac0f, x86 PCI ACPI host bridge driver validates ACPI resources by
> first converting an ACPI resource to a 'struct resource' structure and
> then applying checks against the converted resource structure. The 'start'
> and 'end' fields in 'struct resource' are defined to be type of
> resource_size_t, which may be 32 bits or 64 bits depending on
> CONFIG_PHYS_ADDR_T_64BIT.
>
> This may cause incorrect resource validation results with 32-bit kernels
> because 64-bit ACPI resource descriptors may get truncated when converting
> to 32-bit 'start' and 'end' fields in 'struct resource'. It eventually
> affects PCI resource allocation subsystem and makes some PCI devices and
> the system behave abnormally due to incorrect resource assignment.
>
> So enhance the ACPI resource parsing interfaces to ignore ACPI resource
> descriptors with address/offset above 4G when running in 32-bit mode.
>
> With the fix applied, the behavior of the machine was restored to how
> 3.18.16 worked, i.e. the memory range that is over 4GB is ignored again,
> and lspci -vvxxx shows that everything is at the same memory window as
> they were with 3.18.16.
>
> Reported-and-Tested-by: Boszormenyi Zoltan <zboszor@pr.hu>
> Fixes: 593669c2ac0f ("x86/PCI/ACPI: Use common ACPI resource interfaces to simplify implementation")
> Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
> Cc: stable@vger.kernel.org # 4.0
> ---
>   drivers/acpi/resource.c |   24 +++++++++++++++---------
>   1 file changed, 15 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/acpi/resource.c b/drivers/acpi/resource.c
> index 10561ce16ed1..e8d281739cbc 100644
> --- a/drivers/acpi/resource.c
> +++ b/drivers/acpi/resource.c
> @@ -194,6 +194,7 @@ static bool acpi_decode_space(struct resource_win *win,
>   	u8 iodec = attr->granularity == 0xfff ? ACPI_DECODE_10 : ACPI_DECODE_16;
>   	bool wp = addr->info.mem.write_protect;
>   	u64 len = attr->address_length;
> +	u64 start, end, offset = 0;
>   	struct resource *res = &win->res;
>
>   	/*
> @@ -205,9 +206,6 @@ static bool acpi_decode_space(struct resource_win *win,
>   		pr_debug("ACPI: Invalid address space min_addr_fix %d, max_addr_fix %d, len %llx\n",
>   			 addr->min_address_fixed, addr->max_address_fixed, len);
>
> -	res->start = attr->minimum;
> -	res->end = attr->maximum;
> -
>   	/*
>   	 * For bridges that translate addresses across the bridge,
>   	 * translation_offset is the offset that must be added to the
> @@ -215,12 +213,22 @@ static bool acpi_decode_space(struct resource_win *win,
>   	 * primary side. Non-bridge devices must list 0 for all Address
>   	 * Translation offset bits.
>   	 */
> -	if (addr->producer_consumer == ACPI_PRODUCER) {
> -		res->start += attr->translation_offset;
> -		res->end += attr->translation_offset;
> -	} else if (attr->translation_offset) {
> +	if (addr->producer_consumer == ACPI_PRODUCER)
> +		offset = attr->translation_offset;
> +	else if (attr->translation_offset)
>   		pr_debug("ACPI: translation_offset(%lld) is invalid for non-bridge device.\n",
>   			 attr->translation_offset);
> +	start = attr->minimum + offset;
> +	end = attr->maximum + offset;

I still see the issue for this area, I mean ACPI_IO_RANGE. You are 
adding translation offset to attr->minimum, build resource structure 
which is then passed to acpi_dev_ioresource_flags and compared against 
0x10003. It causes some IO ranges to be ignored.

> +
> +	win->offset = offset;
> +	res->start = start;
> +	res->end = end;
> +	if (sizeof(resource_size_t) < sizeof(u64) &&
> +	    (offset != win->offset || start != res->start || end != res->end)) {
> +		pr_warn("acpi resource window ([%#llx-%#llx] ignored, not CPU addressable)\n",
> +			attr->minimum, attr->maximum);
> +		return false;
>   	}
>
>   	switch (addr->resource_type) {
> @@ -237,8 +245,6 @@ static bool acpi_decode_space(struct resource_win *win,
>   		return false;
>   	}
>
> -	win->offset = attr->translation_offset;
> -
>   	if (addr->producer_consumer == ACPI_PRODUCER)
>   		res->flags |= IORESOURCE_WINDOW;
>
>

Thanks,
Tomasz
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tomasz Nowicki Nov. 5, 2015, 12:53 p.m. UTC | #3
On 02.11.2015 16:27, Tomasz Nowicki wrote:
> On 08.07.2015 09:26, Jiang Liu wrote:
>> Zoltan Boszormenyi reported this regression:
>>    "There's a Realtek RTL8111/8168/8411 (PCI ID 10ec:8168, Subsystem ID
>>     1565:230e) network chip on the mainboard. After the r8169 driver
>> loaded
>>     the IRQs in the machine went berserk. Keyboard keypressed arrived
>> with
>>     considerable latency and duplicated, so no real work was possible.
>>     The machine responded to the power button but didn't actually power
>>     down. It just stuck at the powering down message. I had to press the
>>     power button for 4 seconds to power it down.
>>
>>     The computer is a POS machine with a big battery inside. Because
>> of this,
>>     either ACPI or the Realtek chip kept the bad state and after
>> rebooting,
>>     the network chip didn't even show up in lspci. Not even the PXE ROM
>>     announced itself during boot. I had to disconnect the battery to beat
>>     some sense back to the computer.
>>
>>     The regression happens with 4.0.5, 4.1.0-rc8 and 4.1.0-final.
>> 3.18.16 was
>>     good."
>>
>> The regression is caused by commit 593669c2ac0f ("x86/PCI/ACPI: Use
>> common
>> ACPI resource interfaces to simplify implementation"). Since commit
>> 593669c2ac0f, x86 PCI ACPI host bridge driver validates ACPI resources by
>> first converting an ACPI resource to a 'struct resource' structure and
>> then applying checks against the converted resource structure. The
>> 'start'
>> and 'end' fields in 'struct resource' are defined to be type of
>> resource_size_t, which may be 32 bits or 64 bits depending on
>> CONFIG_PHYS_ADDR_T_64BIT.
>>
>> This may cause incorrect resource validation results with 32-bit kernels
>> because 64-bit ACPI resource descriptors may get truncated when
>> converting
>> to 32-bit 'start' and 'end' fields in 'struct resource'. It eventually
>> affects PCI resource allocation subsystem and makes some PCI devices and
>> the system behave abnormally due to incorrect resource assignment.
>>
>> So enhance the ACPI resource parsing interfaces to ignore ACPI resource
>> descriptors with address/offset above 4G when running in 32-bit mode.
>>
>> With the fix applied, the behavior of the machine was restored to how
>> 3.18.16 worked, i.e. the memory range that is over 4GB is ignored again,
>> and lspci -vvxxx shows that everything is at the same memory window as
>> they were with 3.18.16.
>>
>> Reported-and-Tested-by: Boszormenyi Zoltan <zboszor@pr.hu>
>> Fixes: 593669c2ac0f ("x86/PCI/ACPI: Use common ACPI resource
>> interfaces to simplify implementation")
>> Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
>> Cc: stable@vger.kernel.org # 4.0
>> ---
>>   drivers/acpi/resource.c |   24 +++++++++++++++---------
>>   1 file changed, 15 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/acpi/resource.c b/drivers/acpi/resource.c
>> index 10561ce16ed1..e8d281739cbc 100644
>> --- a/drivers/acpi/resource.c
>> +++ b/drivers/acpi/resource.c
>> @@ -194,6 +194,7 @@ static bool acpi_decode_space(struct resource_win
>> *win,
>>       u8 iodec = attr->granularity == 0xfff ? ACPI_DECODE_10 :
>> ACPI_DECODE_16;
>>       bool wp = addr->info.mem.write_protect;
>>       u64 len = attr->address_length;
>> +    u64 start, end, offset = 0;
>>       struct resource *res = &win->res;
>>
>>       /*
>> @@ -205,9 +206,6 @@ static bool acpi_decode_space(struct resource_win
>> *win,
>>           pr_debug("ACPI: Invalid address space min_addr_fix %d,
>> max_addr_fix %d, len %llx\n",
>>                addr->min_address_fixed, addr->max_address_fixed, len);
>>
>> -    res->start = attr->minimum;
>> -    res->end = attr->maximum;
>> -
>>       /*
>>        * For bridges that translate addresses across the bridge,
>>        * translation_offset is the offset that must be added to the
>> @@ -215,12 +213,22 @@ static bool acpi_decode_space(struct
>> resource_win *win,
>>        * primary side. Non-bridge devices must list 0 for all Address
>>        * Translation offset bits.
>>        */
>> -    if (addr->producer_consumer == ACPI_PRODUCER) {
>> -        res->start += attr->translation_offset;
>> -        res->end += attr->translation_offset;
>> -    } else if (attr->translation_offset) {
>> +    if (addr->producer_consumer == ACPI_PRODUCER)
>> +        offset = attr->translation_offset;
>> +    else if (attr->translation_offset)
>>           pr_debug("ACPI: translation_offset(%lld) is invalid for
>> non-bridge device.\n",
>>                attr->translation_offset);
>> +    start = attr->minimum + offset;
>> +    end = attr->maximum + offset;
>
> I still see the issue for this area, I mean ACPI_IO_RANGE. You are
> adding translation offset to attr->minimum, build resource structure
> which is then passed to acpi_dev_ioresource_flags and compared against
> 0x10003. It causes some IO ranges to be ignored.
>

Kindly reminder, any comments?

Tomasz
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/acpi/resource.c b/drivers/acpi/resource.c
index 10561ce16ed1..e8d281739cbc 100644
--- a/drivers/acpi/resource.c
+++ b/drivers/acpi/resource.c
@@ -194,6 +194,7 @@  static bool acpi_decode_space(struct resource_win *win,
 	u8 iodec = attr->granularity == 0xfff ? ACPI_DECODE_10 : ACPI_DECODE_16;
 	bool wp = addr->info.mem.write_protect;
 	u64 len = attr->address_length;
+	u64 start, end, offset = 0;
 	struct resource *res = &win->res;
 
 	/*
@@ -205,9 +206,6 @@  static bool acpi_decode_space(struct resource_win *win,
 		pr_debug("ACPI: Invalid address space min_addr_fix %d, max_addr_fix %d, len %llx\n",
 			 addr->min_address_fixed, addr->max_address_fixed, len);
 
-	res->start = attr->minimum;
-	res->end = attr->maximum;
-
 	/*
 	 * For bridges that translate addresses across the bridge,
 	 * translation_offset is the offset that must be added to the
@@ -215,12 +213,22 @@  static bool acpi_decode_space(struct resource_win *win,
 	 * primary side. Non-bridge devices must list 0 for all Address
 	 * Translation offset bits.
 	 */
-	if (addr->producer_consumer == ACPI_PRODUCER) {
-		res->start += attr->translation_offset;
-		res->end += attr->translation_offset;
-	} else if (attr->translation_offset) {
+	if (addr->producer_consumer == ACPI_PRODUCER)
+		offset = attr->translation_offset;
+	else if (attr->translation_offset)
 		pr_debug("ACPI: translation_offset(%lld) is invalid for non-bridge device.\n",
 			 attr->translation_offset);
+	start = attr->minimum + offset;
+	end = attr->maximum + offset;
+
+	win->offset = offset;
+	res->start = start;
+	res->end = end;
+	if (sizeof(resource_size_t) < sizeof(u64) &&
+	    (offset != win->offset || start != res->start || end != res->end)) {
+		pr_warn("acpi resource window ([%#llx-%#llx] ignored, not CPU addressable)\n",
+			attr->minimum, attr->maximum);
+		return false;
 	}
 
 	switch (addr->resource_type) {
@@ -237,8 +245,6 @@  static bool acpi_decode_space(struct resource_win *win,
 		return false;
 	}
 
-	win->offset = attr->translation_offset;
-
 	if (addr->producer_consumer == ACPI_PRODUCER)
 		res->flags |= IORESOURCE_WINDOW;