diff mbox

crash in dm-io when signal is pending

Message ID Pine.LNX.4.64.0901212203090.31606@hs20-bc2-1.build.redhat.com (mailing list archive)
State Accepted, archived
Delegated to: Alasdair Kergon
Headers show

Commit Message

Mikulas Patocka Jan. 22, 2009, 3:04 a.m. UTC
If someone sends signal to a process performing synchronous dm-io call,
the kernel may crash.

The function sync_io attempts to exit with -EINTR if it has pending signal,
however the structure "io" is allocated on stack, so already submitted io
requests end up touching unallocated stack space and corrupting kernel memory.

sync_io sets its state to TASK_UNINTERRUPTIBLE, so the signal can't break out
of io_schedule() --- however, if the signal was pending before sync_io entered
while (1) loop, the corruption of kernel memory will happen.

There is no way to cancel in-progress IOs, so the best solution is to ignore
signals at this point.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

---
 drivers/md/dm-io.c |    5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

Comments

Alasdair G Kergon Jan. 23, 2009, 10:57 a.m. UTC | #1
On Wed, Jan 21, 2009 at 10:04:39PM -0500, Mikulas Patocka wrote:
> If someone sends signal to a process performing synchronous dm-io call,
> the kernel may crash.
 
> There is no way to cancel in-progress IOs, so the best solution is to ignore
> signals at this point.
 
So what is the impact of this patch at a higher level?
- What userspace operations are there that you can interrupt now, but that
after this patch you won't be able to?
(Are there any situations where the io will not complete without a reboot,
that could actually be safe today?)

Alasdair
Mikulas Patocka Jan. 23, 2009, 7:59 p.m. UTC | #2
On Fri, 23 Jan 2009, Alasdair G Kergon wrote:

> On Wed, Jan 21, 2009 at 10:04:39PM -0500, Mikulas Patocka wrote:
> > If someone sends signal to a process performing synchronous dm-io call,
> > the kernel may crash.
>  
> > There is no way to cancel in-progress IOs, so the best solution is to ignore
> > signals at this point.
>  
> So what is the impact of this patch at a higher level?

Avoid crash if the admin kills lvm or dmsetup with SIGKILL at a certain 
point.

AFAIK lvm blocks all the blockable signals while it is performing critical 
operations, so there should be no crash from pressing ^C, terminal loss or 
so.

> - What userspace operations are there that you can interrupt now, but that
> after this patch you won't be able to?

When I grepped for interruptible sleep, I found one another possibility: 
aborting a suspend with signal. I didn't find crash condition that could 
be caused by this, but it could unfortunatelly confuse targets.

If suspend is aborted this way, presuspend method is called, but 
postsuspend, preresume and resume isn't --- this will confuse target 
drivers --- you end up with an active mirror that stopped recovering or 
active snapshot that stopped merging.

I don't know if aborting suspend this way should be allowed or not.

> (Are there any situations where the io will not complete without a reboot,
> that could actually be safe today?)

If the io will not complete, you can't reboot with normal reboot script. 
Unmount/remount-ro waits for ios on a filesystem to complete, so they will 
deadlock.

Mikulas

> Alasdair
> -- 
> agk@redhat.com
> 

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Mikulas Patocka Jan. 27, 2009, 8:50 p.m. UTC | #3
On Fri, 23 Jan 2009, Mikulas Patocka wrote:

> On Fri, 23 Jan 2009, Alasdair G Kergon wrote:
> 
> > On Wed, Jan 21, 2009 at 10:04:39PM -0500, Mikulas Patocka wrote:
> > > If someone sends signal to a process performing synchronous dm-io call,
> > > the kernel may crash.
> >  
> > > There is no way to cancel in-progress IOs, so the best solution is to ignore
> > > signals at this point.
> >  
> > So what is the impact of this patch at a higher level?
> 
> Avoid crash if the admin kills lvm or dmsetup with SIGKILL at a certain 
> point.
> 
> AFAIK lvm blocks all the blockable signals while it is performing critical 
> operations, so there should be no crash from pressing ^C, terminal loss or 
> so.
> 
> > - What userspace operations are there that you can interrupt now, but that
> > after this patch you won't be able to?
> 
> When I grepped for interruptible sleep, I found one another possibility: 
> aborting a suspend with signal. I didn't find crash condition that could 
> be caused by this, but it could unfortunatelly confuse targets.
> 
> If suspend is aborted this way, presuspend method is called, but 
> postsuspend, preresume and resume isn't --- this will confuse target 
> drivers --- you end up with an active mirror that stopped recovering or 
> active snapshot that stopped merging.
> 
> I don't know if aborting suspend this way should be allowed or not.
> 
> > (Are there any situations where the io will not complete without a reboot,
> > that could actually be safe today?)
> 
> If the io will not complete, you can't reboot with normal reboot script. 
> Unmount/remount-ro waits for ios on a filesystem to complete, so they will 
> deadlock.
> 
> Mikulas
> 
> > Alasdair
> > -- 
> > agk@redhat.com

Regarding the other possibilities you suggested on the phone call:

There are main architectural design decisions that can't be changed 
without major code rewrite:
- submitted bio can't be cancelled
- the device must not be closed when some bios are submitted on it

So, if the function sync_io() was modified so that the structure would be 
allocated with kmalloc and wouldn't be on stack (so that the function 
could be interrupted and exit while bios are still pending), we would have 
to somehow make sure be that the device wouldn't be closed until all the 
bios finish.

So there would be no benefit for the user --- the user still wouldn't be 
able to interrupt target contructor --- because in the error path, devices 
are closed and these close calls would have to wait for the interrupted 
bio to finish. And there would be major code blow coming from the fact 
that these interrupted bios would have to be tracked somehow and 
dm_put_device would have to wait for them.

So the best solution is to make dm-io ios uninterruptible.

Mikulas

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
diff mbox

Patch

Index: linux-2.6.29-rc1-devel/drivers/md/dm-io.c
===================================================================
--- linux-2.6.29-rc1-devel.orig/drivers/md/dm-io.c	2009-01-22 03:46:09.000000000 +0100
+++ linux-2.6.29-rc1-devel/drivers/md/dm-io.c	2009-01-22 03:46:50.000000000 +0100
@@ -368,16 +368,13 @@  static int sync_io(struct dm_io_client *
 	while (1) {
 		set_current_state(TASK_UNINTERRUPTIBLE);
 
-		if (!atomic_read(&io.count) || signal_pending(current))
+		if (!atomic_read(&io.count))
 			break;
 
 		io_schedule();
 	}
 	set_current_state(TASK_RUNNING);
 
-	if (atomic_read(&io.count))
-		return -EINTR;
-
 	if (error_bits)
 		*error_bits = io.error_bits;