
NULL pointer dereference: IP: [<f828a00c>] sr_runtime_suspend+0xc/0x20 [sr_mod]

Message ID 1444982077.2350.0.camel@giantmonkey.de (mailing list archive)
State New, archived

Commit Message

Paul Menzel Oct. 16, 2015, 7:54 a.m. UTC
Package: linux-image-4.2.0-1-686-pae
Version: 4.2.3-2
Severity: important


Dear Linux SCSI folks,


please don’t include the address submit@bugs.debian.org in your reply.


On Friday, 16.10.2015 at 03:05 +0200, Paul Menzel wrote:

> Using Debian Sid/unstable with Linux 4.2.3-1, after upgrading systemd
> from 227-1 to 227-2 [1] along with other packages, the system no longer
> starts up. The /dev/md1 device doesn’t seem to be found, and I am
> dropped into a shell from the initramfs (BusyBox).
> 
> Having only wireless LAN and no serial or USB debug capabilities, and
> since mounting a USB storage device did not work, I manually copied the
> beginning of the Oops.
> 
> ```
> BUG: unable to handle kernel NULL pointer dereference at 00000014
> IP: [<f828a00c>] sr_runtime_suspend+0xc/0x20 [sr_mod]
> *pdpt = 000000003696e001 *pde = 000000000000000000
> Oops: 0000 [#1] SMP
> Modules linked in: sd_mod(+) sr_mod(+) cdrom ata_generic ohci_pci ahci libahci pata_amd firewire_ohci firewire_core crc_itu_t forcedeth libata scsi_mod ohci_hcd ehci_pci ehci_hcd usbcore usb_common fan thermal thermal_sys floppy(+)
> CPU: 1 PID: 73 Comm: systemd-udevd Not tainted 4.2.0-1-686-pae #1 Debian 4.2.3-1
> Hardware name: Packard Bell imedia S3210/WMCP78M, BIOS P01-B2 11/06/2009
> task: f68dd040 ti: f6988000 task.ti: f6988000
> EIP: 0060:[<f828a00c>] EFLAGS: 00010246 CPU: 1
> EIP is at sr_runtime_suspend+0xc/0x20 [sr_mod]
> EAX: 00000000 EBX: f6a30cd8 ECX: f6c03d2c EDX: 00000000
> ESI: 00000000 EDI: f828e100 EBP: f6989ba8 ESP: f6989b88
>  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> CR0: 8005003b CR2: 00000014 CR3: 3696d780 CR4: 000006f0
> Stack:
>  af83346c3 00000000 00000001 fffffff5 f6a7d150 f6a30cd8 f6a30d3c 00000000
>  f6989bbc c1390cb7 f6a30cd8 f8334660 00000000 f6989bd0 c1390d0f f6a30cd8
>  f8334660 00000000 f6989c0c c13916cb f694a614 f68dd040 00000000 00000008
> Call Trace:
>  […] ? scsi_runtime_suspend+0x63/0xa0 [scsi_mod]
>  […] ? __rpm_callback+0x27/0x60
> […]
> ```
> 
> I also tried booting Linux 4.1, and it fails the same way.
> 
> Is that a known problem, and has it been fixed in the meantime? It’d be
> great if you could help me get the system to boot again. Please tell me
> if you need more information to debug this issue and I’ll do my best to
> get it.

Ben Hutchings asked me to test the patch below to get more debug
information.

```
```

I’ll try that as soon as a spare drive has arrived to which I can copy
the data as a backup.

More thoughts are welcome, especially on whether that error suggests a
failing drive.


Thanks,

Paul


> [1] http://metadata.ftp-master.debian.org/changelogs//main/s/systemd/systemd_227-2_changelog

Comments

Paul Menzel Oct. 16, 2015, 8:52 a.m. UTC | #1
Dear Linux SCSI folks,


On Friday, 16.10.2015 at 09:54 +0200, Paul Menzel wrote:
> Package: linux-image-4.2.0-1-686-pae
> Version: 4.2.3-2
> Severity: important

> please don’t include the address submit@bugs.debian.org in your reply.

this issue is now also tracked in the Debian Bug Tracking System [2] and
has the number #801925 [3]. Please keep that address in CC.

> On Friday, 16.10.2015 at 03:05 +0200, Paul Menzel wrote:
> 
> > Using Debian Sid/unstable with Linux 4.2.3-1, after upgrading systemd
> > from 227-1 to 227-2 [1] along with other packages, the system no longer
> > starts up. The /dev/md1 device doesn’t seem to be found, and I am
> > dropped into a shell from the initramfs (BusyBox).
> > 
> > Having only wireless LAN and no serial or USB debug capabilities, and
> > since mounting a USB storage device did not work, I manually copied the
> > beginning of the Oops.
> > 
> > ```
> > BUG: unable to handle kernel NULL pointer dereference at 00000014
> > IP: [<f828a00c>] sr_runtime_suspend+0xc/0x20 [sr_mod]
> > *pdpt = 000000003696e001 *pde = 000000000000000000
> > Oops: 0000 [#1] SMP
> > Modules linked in: sd_mod(+) sr_mod(+) cdrom ata_generic ohci_pci ahci libahci pata_amd firewire_ohci firewire_core crc_itu_t forcedeth libata scsi_mod ohci_hcd ehci_pci ehci_hcd usbcore usb_common fan thermal thermal_sys floppy(+)
> > CPU: 1 PID: 73 Comm: systemd-udevd Not tainted 4.2.0-1-686-pae #1 Debian 4.2.3-1
> > Hardware name: Packard Bell imedia S3210/WMCP78M, BIOS P01-B2 11/06/2009
> > task: f68dd040 ti: f6988000 task.ti: f6988000
> > EIP: 0060:[<f828a00c>] EFLAGS: 00010246 CPU: 1
> > EIP is at sr_runtime_suspend+0xc/0x20 [sr_mod]
> > EAX: 00000000 EBX: f6a30cd8 ECX: f6c03d2c EDX: 00000000
> > ESI: 00000000 EDI: f828e100 EBP: f6989ba8 ESP: f6989b88
> >  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> > CR0: 8005003b CR2: 00000014 CR3: 3696d780 CR4: 000006f0
> > Stack:
> >  af83346c3 00000000 00000001 fffffff5 f6a7d150 f6a30cd8 f6a30d3c 00000000
> >  f6989bbc c1390cb7 f6a30cd8 f8334660 00000000 f6989bd0 c1390d0f f6a30cd8
> >  f8334660 00000000 f6989c0c c13916cb f694a614 f68dd040 00000000 00000008
> > Call Trace:
> >  […] ? scsi_runtime_suspend+0x63/0xa0 [scsi_mod]
> >  […] ? __rpm_callback+0x27/0x60
> > […]
> > ```
> > 
> > I also tried booting Linux 4.1, and it fails the same way.
> > 
> > Is that a known problem, and has it been fixed in the meantime? It’d be
> > great if you could help me get the system to boot again. Please tell me
> > if you need more information to debug this issue and I’ll do my best to
> > get it.
> 
> Ben Hutchings asked me to test the patch below to get more debug
> information.
> 
> ```
> diff --git a/drivers/scsi/sr.c b/drivers/scsi/sr.c
> index 8bd54a6..dd5b5b2 100644
> --- a/drivers/scsi/sr.c
> +++ b/drivers/scsi/sr.c
> @@ -144,6 +144,12 @@ static int sr_runtime_suspend(struct device *dev)
>  {
>  	struct scsi_cd *cd = dev_get_drvdata(dev);
>  
> +	if (WARN_ON(!cd)) {
> +		pr_info("%s: cd == NULL; power.usage_count = %d\n",
> +			__func__, atomic_read(&dev->power.usage_count));
> +		return 0;
> +	}
> +
>  	if (cd->media_present)
>  		return -EBUSY;
>  	else
> @@ -652,7 +658,13 @@ static int sr_probe(struct device *dev)
>  	struct scsi_cd *cd;
>  	int minor, error;
>  
> -	scsi_autopm_get_device(sdev);
> +	error = scsi_autopm_get_device(sdev);
> +	if (error) {
> +		pr_err("%s: scsi_autopm_get_device returned %d\n",
> +		       __func__, error);
> +		return error;
> +	}
> +
>  	error = -ENODEV;
>  	if (sdev->type != TYPE_ROM && sdev->type != TYPE_WORM)
>  		goto fail;
> @@ -719,6 +731,9 @@ static int sr_probe(struct device *dev)
>  	if (register_cdrom(&cd->cdi))
>  		goto fail_put;
>  
> +	pr_info("%s: power.usage_count = %d\n",
> +		__func__, atomic_read(&dev->power.usage_count));
> +
>  	/*
>  	 * Initialize block layer runtime PM stuffs before the
>  	 * periodic event checking request gets started in add_disk.
> ```
> 
> I’ll try that as soon as a spare drive has arrived to which I can copy
> the data as a backup.
> 
> More thoughts are welcome, especially on whether that error suggests a
> failing drive.


Thanks,

Paul


> > [1] http://metadata.ftp-master.debian.org/changelogs//main/s/systemd/systemd_227-2_changelog
[2] https://www.debian.org/Bugs/
[3] https://bugs.debian.org/801925
Ben Hutchings Oct. 20, 2015, 1:39 a.m. UTC | #2
On Fri, 2015-10-16 at 09:54 +0200, Paul Menzel wrote:
[...]
> > BUG: unable to handle kernel NULL pointer dereference at 00000014
> > IP: [<f828a00c>] sr_runtime_suspend+0xc/0x20 [sr_mod]
> > *pdpt = 000000003696e001 *pde = 000000000000000000
> > Oops: 0000 [#1] SMP
> > Modules linked in: sd_mod(+) sr_mod(+) cdrom ata_generic ohci_pci ahci libahci pata_amd firewire_ohci firewire_core crc_itu_t forcedeth libata scsi_mod ohci_hcd ehci_pci ehci_hcd usbcore usb_common fan thermal thermal_sys floppy(+)
> > CPU: 1 PID: 73 Comm: systemd-udevd Not tainted 4.2.0-1-686-pae #1 Debian 4.2.3-1
> > Hardware name: Packard Bell imedia S3210/WMCP78M, BIOS P01-B2 11/06/2009
> > task: f68dd040 ti: f6988000 task.ti: f6988000
> > EIP: 0060:[<f828a00c>] EFLAGS: 00010246 CPU: 1
> > EIP is at sr_runtime_suspend+0xc/0x20 [sr_mod]
> > EAX: 00000000 EBX: f6a30cd8 ECX: f6c03d2c EDX: 00000000
> > ESI: 00000000 EDI: f828e100 EBP: f6989ba8 ESP: f6989b88
> >  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> > CR0: 8005003b CR2: 00000014 CR3: 3696d780 CR4: 000006f0
> > Stack:
> >  af83346c3 00000000 00000001 fffffff5 f6a7d150 f6a30cd8 f6a30d3c 00000000
> >  f6989bbc c1390cb7 f6a30cd8 f8334660 00000000 f6989bd0 c1390d0f f6a30cd8
> >  f8334660 00000000 f6989c0c c13916cb f694a614 f68dd040 00000000 00000008
> > Call Trace:
> >  […] ? scsi_runtime_suspend+0x63/0xa0 [scsi_mod]
> >  […] ? __rpm_callback+0x27/0x60
> > […]
[...]
> Ben Hutchings asked me to test the patch below to get more debug
> information.
[...]

Well, that didn't help much.  Paul hit another oops, this time in
sd_mod but again apparently related to runtime PM.  My patch only
touched sr_mod.

This time he sent photos of the complete oops; see
<https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=801925;filename=20151020_005.jpg;att=4;msg=15>
and
<https://bugs.debian.org/cgi-bin/bugreport.cgi?filename=20151020_006.jpg;bug=801925;att=3;msg=15>

Ben.
Paul Menzel Oct. 31, 2015, 9:39 a.m. UTC | #3
Control: notfound -1 3.19-1~exp1
Control: found -1 4.2.5-1


On Tuesday, 20.10.2015 at 02:39 +0100, Ben Hutchings wrote:
> On Fri, 2015-10-16 at 09:54 +0200, Paul Menzel wrote:
> [...]
> > > BUG: unable to handle kernel NULL pointer dereference at 00000014
> > > IP: [<f828a00c>] sr_runtime_suspend+0xc/0x20 [sr_mod]
> > > *pdpt = 000000003696e001 *pde = 000000000000000000
> > > Oops: 0000 [#1] SMP
> > > Modules linked in: sd_mod(+) sr_mod(+) cdrom ata_generic ohci_pci ahci libahci pata_amd firewire_ohci firewire_core crc_itu_t forcedeth libata scsi_mod ohci_hcd ehci_pci ehci_hcd usbcore usb_common fan thermal thermal_sys floppy(+)
> > > CPU: 1 PID: 73 Comm: systemd-udevd Not tainted 4.2.0-1-686-pae #1 Debian 4.2.3-1
> > > Hardware name: Packard Bell imedia S3210/WMCP78M, BIOS P01-B2 11/06/2009
> > > task: f68dd040 ti: f6988000 task.ti: f6988000
> > > EIP: 0060:[<f828a00c>] EFLAGS: 00010246 CPU: 1
> > > EIP is at sr_runtime_suspend+0xc/0x20 [sr_mod]
> > > EAX: 00000000 EBX: f6a30cd8 ECX: f6c03d2c EDX: 00000000
> > > ESI: 00000000 EDI: f828e100 EBP: f6989ba8 ESP: f6989b88
> > >  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> > > CR0: 8005003b CR2: 00000014 CR3: 3696d780 CR4: 000006f0
> > > Stack:
> > >  af83346c3 00000000 00000001 fffffff5 f6a7d150 f6a30cd8 f6a30d3c 00000000
> > >  f6989bbc c1390cb7 f6a30cd8 f8334660 00000000 f6989bd0 c1390d0f f6a30cd8
> > >  f8334660 00000000 f6989c0c c13916cb f694a614 f68dd040 00000000 00000008
> > > Call Trace:
> > >  […] ? scsi_runtime_suspend+0x63/0xa0 [scsi_mod]
> > >  […] ? __rpm_callback+0x27/0x60
> > > […]
> [...]
> > Ben Hutchings asked me to test the patch below to get more debug
> > information.
> [...]
> 
> Well, that didn't help much.  Paul hit another oops, this time in
> sd_mod but again apparently related to runtime PM.  My patch only
> touched sr_mod.
> 
> This time he sent photos of the complete oops; see
> <https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=801925;filename=20151020_005.jpg;att=4;msg=15>
> and
> <https://bugs.debian.org/cgi-bin/bugreport.cgi?filename=20151020_006.jpg;bug=801925;att=3;msg=15>

After backing up my data, I tested a little more: with Linux 3.19, the
drive is detected and the system boots.

Does anything stand out that changed in this area between Linux 3.19 and
4.1?


Thanks

Paul
Alan Stern Nov. 1, 2015, 1:56 a.m. UTC | #4
On Sat, 31 Oct 2015, Paul Menzel wrote:

> > Well, that didn't help much.  Paul hit another oops, this time in
> > sd_mod but again apparently related to runtime PM.  My patch only
> > touched sr_mod.
> > 
> > This time he sent photos of the complete oops; see
> > <https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=801925;filename=20151020_005.jpg;att=4;msg=15>
> > and
> > <https://bugs.debian.org/cgi-bin/bugreport.cgi?filename=20151020_006.jpg;bug=801925;att=3;msg=15>
> 
> After backing up my data, I tested a little more: with Linux 3.19, the
> drive is detected and the system boots.
> 
> Does anything stand out that changed in this area between Linux 3.19 and
> 4.1?

I believe the problem shown in that photo was fixed by commit
49718f0fb8c9 ("SCSI: Fix NULL pointer dereference in runtime PM"),
which was merged in 4.2 and has been back-ported to various stable 
releases.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Stern Nov. 1, 2015, 2:05 a.m. UTC | #5
On Sat, 31 Oct 2015, Alan Stern wrote:

> I believe the problem shown in that photo was fixed by commit
> 49718f0fb8c9 ("SCSI: Fix NULL pointer dereference in runtime PM"),
> which was merged in 4.2 and has been back-ported to various stable 
> releases.

On second thought, it seems more likely that this issue was
_caused_ by that commit.  The fix can be found in these two emails:

	http://marc.info/?l=linux-scsi&m=144185206825609&w=2
	http://marc.info/?l=linux-scsi&m=144185208525611&w=2

which have not been merged yet as far as I know even though they were
submitted back in September.

Alan Stern

Paul Menzel Jan. 9, 2016, 3:23 p.m. UTC | #6
Version: 4.4~rc8-1~exp1

Dear Alan,


Thank you for your help!

There were some follow-ups to the bug report [1], but I think you and I
were not in CC.

On Saturday, 31.10.2015 at 22:05 -0400, Alan Stern wrote:
> On Sat, 31 Oct 2015, Alan Stern wrote:
> 
> > I believe the problem shown in that photo was fixed by commit
> > 49718f0fb8c9 ("SCSI: Fix NULL pointer dereference in runtime PM"),
> > which was merged in 4.2 and has been back-ported to various stable 
> > releases.
> 
> On second thought, it seems more likely that this issue was
> _caused_ by that commit.  The fix can be found in these two emails:
> 
> 	http://marc.info/?l=linux-scsi&m=144185206825609&w=2
> 	http://marc.info/?l=linux-scsi&m=144185208525611&w=2
> 
> which have not been merged yet as far as I know even though they were
> submitted back in September.

I can only say that I am still unable to boot my system with Linux
4.4-rc8 [2]. Are these patches included there?


Thanks,

Paul


[1] https://bugs.debian.org/801925
[2] https://packages.debian.org/experimental/linux-image-4.4.0-rc8-686-pae-dbg
Alan Stern Jan. 9, 2016, 4:36 p.m. UTC | #7
On Sat, 9 Jan 2016, Paul Menzel wrote:

> Version: 4.4~rc8-1~exp1
> 
> Dear Alan,
> 
> 
> Thank you for your help!
> 
> There were some follow-ups to the bug report [1], but I think you and I
> were not in CC.

I wasn't.

> > 	http://marc.info/?l=linux-scsi&m=144185206825609&w=2
> > 	http://marc.info/?l=linux-scsi&m=144185208525611&w=2

> I can only say that I am still unable to boot my system with Linux
> 4.4-rc8 [2]. Are these patches included there?

They are.  I don't see how they could cause a NULL pointer dereference 
in sd_resume(), though.  If you revert them, does the problem go away?

Also, can you add some debugging statements to sd_resume() so we can 
see where the NULL pointer comes from?

Alan Stern

Erich Schubert Jan. 10, 2016, 11:44 a.m. UTC | #8
Hi all,
4.4-rc8 does not fix the problem for me.
Anything beyond 4.1.0 remains unable to boot this computer.

Unfortunately, because the error occurs during early SCSI
initialization, I do not have easy access to the log - no disk, no
network.
It happens during SATA initialization: "scsi_runtime_resume".
So my back trace looks different from Alex's in
https://bugs.debian.org/cgi-bin/bugreport.cgi?msg=42;filename=scsi-null-pointer-dereference.log;bug=801925;att=1
but like the one Paul is seeing:
https://bugs.debian.org/cgi-bin/bugreport.cgi?msg=15;filename=20151020_006.jpg;bug=801925;att=3
I will try to take a photo next time, too.

Here is some dmesg output from a successful boot on 4.1.0:
Note there are some ACPI Errors there (but probably not related).
---
ahci 0000:00:1f.2: version 3.0
ahci 0000:00:1f.2: SSS flag set, parallel bus scan disabled
ahci 0000:00:1f.2: AHCI 0001.0300 32 slots 6 ports 3 Gbps 0x1 impl SATA mode
ahci 0000:00:1f.2: flags: 64bit ncq sntf stag pm led clo pio slum part ems apst
scsi host0: ahci
scsi host1: ahci
scsi host2: ahci
scsi host3: ahci
scsi host4: ahci
scsi host5: ahci
ata1: SATA max UDMA/133 abar m2048@0xc0728000 port 0xc0728100 irq 30
ata2: DUMMY
ata3: DUMMY
ata4: DUMMY
ata5: DUMMY
ata6: DUMMY
usb 3-1: new high-speed USB device number 2 using ehci-pci
usb 4-1: new high-speed USB device number 2 using ehci-pci
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ACPI Error: [GTF0] Namespace lookup failure, AE_NOT_FOUND (20150410/psargs-359)
ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.PRT0._SDD] (Node ffff8802458b1608), AE_NOT_FOUND (20150410/psparse-536)
ACPI Error: [GTF0] Namespace lookup failure, AE_NOT_FOUND (20150410/psargs-359)
ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.PRT0._GTF] (Node ffff8802458b15e0), AE_NOT_FOUND (20150410/psparse-536)
ata1.00: ATA-8: TOSHIBA THNSNS256GMCP, TA2ABBF0, max UDMA/133
ata1.00: 500118192 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
ACPI Error: [GTF0] Namespace lookup failure, AE_NOT_FOUND (20150410/psargs-359)
ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.PRT0._SDD] (Node ffff8802458b1608), AE_NOT_FOUND (20150410/psparse-536)
ACPI Error: [GTF0] Namespace lookup failure, AE_NOT_FOUND (20150410/psargs-359)
ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.PRT0._GTF] (Node ffff8802458b15e0), AE_NOT_FOUND (20150410/psparse-536)
ata1.00: configured for UDMA/133
scsi 0:0:0:0: Direct-Access     ATA      TOSHIBA THNSNS25 BBF0 PQ: 0 ANSI: 5
sd 0:0:0:0: [sda] 500118192 512-byte logical blocks: (256 GB/238 GiB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sda: sda1 sda2 sda3 sda4 < sda5 sda6 >
sd 0:0:0:0: [sda] Attached SCSI disk
PM: Starting manual resume from disk
PM: Hibernation image partition 8:6 present
PM: Looking for hibernation image.
PM: Image not found (code -22)
PM: Hibernation image not present or could not be loaded.
---

On Sat, Jan 9, 2016 at 5:36 PM, Alan Stern <stern@rowland.harvard.edu> wrote:
> On Sat, 9 Jan 2016, Paul Menzel wrote:
>
>> Version: 4.4~rc8-1~exp1
>>
>> Dear Alan,
>>
>>
>> Thank you for your help!
>>
>> There were some follow-ups to the bug report [1], but I think you and I
>> were not in CC.
>
> I wasn't.
>
>> >     http://marc.info/?l=linux-scsi&m=144185206825609&w=2
>> >     http://marc.info/?l=linux-scsi&m=144185208525611&w=2
>
>> I can only say that I am still unable to boot my system with Linux
>> 4.4-rc8 [2]. Are these patches included there?
>
> They are.  I don't see how they could cause a NULL pointer dereference
> in sd_resume(), though.  If you revert them, does the problem go away?
>
> Also, can you add some debugging statements to sd_resume() so we can
> see where the NULL pointer comes from?
>
> Alan Stern
>
Alan Stern Jan. 10, 2016, 3:32 p.m. UTC | #9
On Sun, 10 Jan 2016, Erich Schubert wrote:

> Hi all,
> 4.4-rc8 does not fix the problem for me.
> Anything beyond 4.1.0 remains unable to boot this computer.
> 
> Unfortunately, because the error occurs during early SCSI
> initialization, I do not have easy access to the log - no disk, no
> network.
> It happens during SATA initialization: "scsi_runtime_resume".

You didn't include any debugging information.  However...

> So my back trace looks different from Alex's in
> https://bugs.debian.org/cgi-bin/bugreport.cgi?msg=42;filename=scsi-null-pointer-dereference.log;bug=801925;att=1
> but like the one Paul is seeing:
> https://bugs.debian.org/cgi-bin/bugreport.cgi?msg=15;filename=20151020_006.jpg;bug=801925;att=3

The information in that bug report says that the failure happens in
sr_runtime_resume, not in scsi_runtime_resume.  Compare with the
Subject: line in this email thread.

> I will try to take a photo next time, too.

If I send you a patch, can you build and test it?

Alan Stern


Patch

diff --git a/drivers/scsi/sr.c b/drivers/scsi/sr.c
index 8bd54a6..dd5b5b2 100644
--- a/drivers/scsi/sr.c
+++ b/drivers/scsi/sr.c
@@ -144,6 +144,12 @@  static int sr_runtime_suspend(struct device *dev)
 {
 	struct scsi_cd *cd = dev_get_drvdata(dev);
 
+	if (WARN_ON(!cd)) {
+		pr_info("%s: cd == NULL; power.usage_count = %d\n",
+			__func__, atomic_read(&dev->power.usage_count));
+		return 0;
+	}
+
 	if (cd->media_present)
 		return -EBUSY;
 	else
@@ -652,7 +658,13 @@  static int sr_probe(struct device *dev)
 	struct scsi_cd *cd;
 	int minor, error;
 
-	scsi_autopm_get_device(sdev);
+	error = scsi_autopm_get_device(sdev);
+	if (error) {
+		pr_err("%s: scsi_autopm_get_device returned %d\n",
+		       __func__, error);
+		return error;
+	}
+
 	error = -ENODEV;
 	if (sdev->type != TYPE_ROM && sdev->type != TYPE_WORM)
 		goto fail;
@@ -719,6 +731,9 @@  static int sr_probe(struct device *dev)
 	if (register_cdrom(&cd->cdi))
 		goto fail_put;
 
+	pr_info("%s: power.usage_count = %d\n",
+		__func__, atomic_read(&dev->power.usage_count));
+
 	/*
 	 * Initialize block layer runtime PM stuffs before the
 	 * periodic event checking request gets started in add_disk.