Message ID | 20090203103925.25703074@free.fr (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
On Tue, 3 Feb 2009, Jean-Francois Moine wrote: > On Mon, 2 Feb 2009 20:36:31 -0600 (CST) > kilgota@banach.math.auburn.edu wrote: >> Just now when I logged in, a fortune came up which says: >> >> "A little experience often upsets a lot of theory." >> >> It struck me funny after our recent experiences, so I thought I would >> share it with both of you. > > Hello, > > May be this message made me to look again at the gspca code. Well, it's > my fault: I did not check the previous patch. Sorry for all trouble. > > The patch is simply: > > diff -r 3f4a7bc53d8e linux/drivers/media/video/gspca/gspca.c > --- a/linux/drivers/media/video/gspca/gspca.c Mon Feb 02 20:25:38 2009 +0100 > +++ b/linux/drivers/media/video/gspca/gspca.c Tue Feb 03 10:37:51 2009 +0100 > @@ -435,7 +435,7 @@ > break; > > gspca_dev->urb[i] = NULL; > - if (!gspca_dev->present) > + if (gspca_dev->present) > usb_kill_urb(urb); > if (urb->transfer_buffer != NULL) > usb_buffer_free(gspca_dev->dev, > Well, OK. But this will solve the problem which comes from disconnect while streaming? Sure, I will try this and see if it does the job or not. But whatever the outcome of that might be, I do not follow the logic. Theodore Kilgore -- To unsubscribe from this list: send the line "unsubscribe linux-media" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 3 Feb 2009 12:21:55 -0600 (CST) kilgota@banach.math.auburn.edu wrote: > I should add to the above that now I have tested, and indeed this > change does not solve the problem of kernel oops after disconnect > while streaming. It does make one change. The xterm does not go wild > with error messages. But it is still not possible to close the svv > window. Moreover, ps ax reveals that [svv] is running as an > unkillable process, with [sq905/0] and [sq905/1], equally unkillable, > in supporting roles. And dmesg reveals an oops. The problem is after > all notorious by now, so I do not see much need for yet another log > of debug output unless specifically asked for such. Why is there 2 sq905 processes? What is the oops? (Your last trace was good, because it gave the last gspca/sq905 messages and the full oops) > Perhaps we also need to do what Adam suggested yesterday, and add a > call to destroy_urbs() in gspca_disconnect()? Surely not! The destroy_urbs() must be called at the right time, i.e. on close().
On Tue, 3 Feb 2009, kilgota@banach.math.auburn.edu wrote: > > > On Tue, 3 Feb 2009, Jean-Francois Moine wrote: > >> On Mon, 2 Feb 2009 20:36:31 -0600 (CST) >> kilgota@banach.math.auburn.edu wrote: >>> Just now when I logged in, a fortune came up which says: >>> >>> "A little experience often upsets a lot of theory." >>> >>> It struck me funny after our recent experiences, so I thought I would >>> share it with both of you. >> >> Hello, >> >> May be this message made me to look again at the gspca code. Well, it's >> my fault: I did not check the previous patch. Sorry for all trouble. >> >> The patch is simply: >> >> diff -r 3f4a7bc53d8e linux/drivers/media/video/gspca/gspca.c >> --- a/linux/drivers/media/video/gspca/gspca.c Mon Feb 02 20:25:38 2009 >> +0100 >> +++ b/linux/drivers/media/video/gspca/gspca.c Tue Feb 03 10:37:51 2009 >> +0100 >> @@ -435,7 +435,7 @@ >> break; >> >> gspca_dev->urb[i] = NULL; >> - if (!gspca_dev->present) >> + if (gspca_dev->present) >> usb_kill_urb(urb); >> if (urb->transfer_buffer != NULL) >> usb_buffer_free(gspca_dev->dev, >> > > Well, OK. But this will solve the problem which comes from disconnect while > streaming? Sure, I will try this and see if it does the job or not. But > whatever the outcome of that might be, I do not follow the logic. > > Theodore Kilgore I should add to the above that now I have tested, and indeed this change does not solve the problem of kernel oops after disconnect while streaming. It does make one change. The xterm does not go wild with error messages. But it is still not possible to close the svv window. Moreover, ps ax reveals that [svv] is running as an unkillable process, with [sq905/0] and [sq905/1], equally unkillable, in supporting roles. And dmesg reveals an oops. The problem is after all notorious by now, so I do not see much need for yet another log of debug output unless specifically asked for such. Perhaps we also need to do what Adam suggested yesterday, and add a call to destroy_urbs() in gspca_disconnect()? That really did appear to cure the problem. But the way I did it is void gspca_disconnect(struct usb_interface *intf) { struct gspca_dev *gspca_dev = usb_get_intfdata(intf); gspca_dev->present = 0; destroy_urbs(gspca_dev); usb_set_intfdata(intf, NULL); /* release the device */ /* (this will call gspca_release() immediatly or on last close) */ video_unregister_device(&gspca_dev->vdev); PDEBUG(D_PROBE, "disconnect complete"); } and it would appear that with your changes above it would work better to do here first destroy_urbs(gspca_dev); and then only after that gspca_dev->present = 0; Or? Theodore Kilgore -- To unsubscribe from this list: send the line "unsubscribe linux-media" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 3 Feb 2009, Jean-Francois Moine wrote: > On Tue, 3 Feb 2009 12:21:55 -0600 (CST) > kilgota@banach.math.auburn.edu wrote: > >> I should add to the above that now I have tested, and indeed this >> change does not solve the problem of kernel oops after disconnect >> while streaming. It does make one change. The xterm does not go wild >> with error messages. But it is still not possible to close the svv >> window. Moreover, ps ax reveals that [svv] is running as an >> unkillable process, with [sq905/0] and [sq905/1], equally unkillable, >> in supporting roles. And dmesg reveals an oops. The problem is after >> all notorious by now, so I do not see much need for yet another log >> of debug output unless specifically asked for such. > > Why is there 2 sq905 processes? I of course do not fully understand why there are two such processes. However, I would suspect that [sq905/0] is running on processor 0 and [sq905/1] is running on processor 1. As I remember, there is only one [sq905] process which runs on a single-core machine. > > What is the oops? (Your last trace was good, because it gave the last > gspca/sq905 messages and the full oops) Well, I can do it again, I suppose. So you get that in a few minutes. But my private speculation is it will look just about like the last one, because the problem is not addressed. > >> Perhaps we also need to do what Adam suggested yesterday, and add a >> call to destroy_urbs() in gspca_disconnect()? > > Surely not! The destroy_urbs() must be called at the right time, i.e. > on close(). Hmmm. well, the problem is that we are down there in a workqueue. We have to force the workqueue to close. I do not see a way to do that, even with the exit routines at the end of the workqueue, if it is not possible to call upon the functions there which will do the job (and it appears to me it is not thus possible, because those gspca functions are not public). And if we try to close the workqueue without doing something like destroy_urbs() as a consequence of a violent disconnect, then the workqueue just does not understand it is supposed to stop and merrily goes on its way in spite of all. Then it can not be closed because it is "busy" talking to a now-nonexistent piece of hardware, and the calling program can not be closed, either, because it is "busy" as well. Unless I get interrupted by something extraneous, I should have a log sent along in a few minutes. God knows, it is easy enough to create one. Theodore Kilgore -- To unsubscribe from this list: send the line "unsubscribe linux-media" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 3 Feb 2009 13:15:58 -0600 (CST) kilgota@banach.math.auburn.edu wrote: > > Why is there 2 sq905 processes? > > I of course do not fully understand why there are two such processes. > However, I would suspect that [sq905/0] is running on processor 0 and > [sq905/1] is running on processor 1. As I remember, there is only one > [sq905] process which runs on a single-core machine. Indeed, the problem is there! You must have only one process reading the webcam! I do not see how this can work with these 2 processes...
On Tue, 3 Feb 2009, Jean-Francois Moine wrote: > On Tue, 3 Feb 2009 12:21:55 -0600 (CST) > kilgota@banach.math.auburn.edu wrote: > >> I should add to the above that now I have tested, and indeed this >> change does not solve the problem of kernel oops after disconnect >> while streaming. It does make one change. The xterm does not go wild >> with error messages. But it is still not possible to close the svv >> window. Moreover, ps ax reveals that [svv] is running as an >> unkillable process, with [sq905/0] and [sq905/1], equally unkillable, >> in supporting roles. And dmesg reveals an oops. The problem is after >> all notorious by now, so I do not see much need for yet another log >> of debug output unless specifically asked for such. > > Why is there 2 sq905 processes? > > What is the oops? (Your last trace was good, because it gave the last > gspca/sq905 messages and the full oops) Right, here is the output. I have not searched for precise differences, but a glance at it leaves me with the feeling that it is the same old same old. This was done on the Pentium 4 Dual Core, with gspca_main loaded with option debug=255, and this is the dmesg output which resulted when I pulled the cord of the camera. Theodore Kilgore
On Tue, 3 Feb 2009 13:54:15 -0600 (CST) kilgota@banach.math.auburn.edu wrote: > > Indeed, the problem is there! You must have only one process > > reading the webcam! I do not see how this can work with these 2 > > processes... > > The problem, then, would seem to me to boil down to the question of > whether that is up to us. Apparently, a decision like that is not up > to us, but rather it is up to the compiler and to the rest of the > kernel to decide. Which, incidentally, appears to me to be a very > logical way to arrange things. Presumably, a dual- or multi-core > machine gives certain advantages, or it ought to, but it also > requires certain accommodations. Yes, a multiprocessor machine is a plus, but, you must run only one process to handle streaming. If you have not seen it yet, this is done changing the line 373 of sq905.c (if I have the same source as yours) from: dev->work_thread = create_workqueue(MODULE_NAME); to dev->work_thread = create_singlethread_workqueue(MODULE_NAME);
On Tue, 3 Feb 2009, Jean-Francois Moine wrote: > > Perhaps we also need to do what Adam suggested yesterday, and add a > > call to destroy_urbs() in gspca_disconnect()? > > Surely not! The destroy_urbs() must be called at the right time, i.e. > on close(). Theodore was pointing out that destroy_urbs() must also be called during disconnect. If a camera is unplugged while it is in use then waiting until close() is no good -- it will cause destroy_urbs() to pass a stale pointer to usb_buffer_free(). That's the reason for the oops. Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-media" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 3 Feb 2009, Jean-Francois Moine wrote: > On Tue, 3 Feb 2009 13:15:58 -0600 (CST) > kilgota@banach.math.auburn.edu wrote: > >>> Why is there 2 sq905 processes? >> >> I of course do not fully understand why there are two such processes. >> However, I would suspect that [sq905/0] is running on processor 0 and >> [sq905/1] is running on processor 1. As I remember, there is only one >> [sq905] process which runs on a single-core machine. > > Indeed, the problem is there! You must have only one process reading the > webcam! I do not see how this can work with these 2 processes... The problem, then, would seem to me to boil down to the question of whether that is up to us. Apparently, a decision like that is not up to us, but rather it is up to the compiler and to the rest of the kernel to decide. Which, incidentally, appears to me to be a very logical way to arrange things. Presumably, a dual- or multi-core machine gives certain advantages, or it ought to, but it also requires certain accommodations. You know, Jean-Francois, in a way it is a lucky accident that I got this machine for Christmas. I would never have fired up the Pentium 4, at least until sometime in the unforeseeable future, because in fact I was getting quite adequate performance out of the old Sempron box. Thus, I would not have been aware of this problem, either. We would have gone right ahead, Adam and I, blissfully ignorant, and published a module which has a flaw on a dual-core machine. So, in spite of the problems I say it is better to face the problems now. Theodore Kilgore -- To unsubscribe from this list: send the line "unsubscribe linux-media" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 3 Feb 2009 kilgota@banach.math.auburn.edu wrote: > > > On Tue, 3 Feb 2009, Jean-Francois Moine wrote: > > > On Tue, 3 Feb 2009 13:15:58 -0600 (CST) > > kilgota@banach.math.auburn.edu wrote: > > > >>> Why is there 2 sq905 processes? > >> > >> I of course do not fully understand why there are two such processes. > >> However, I would suspect that [sq905/0] is running on processor 0 and > >> [sq905/1] is running on processor 1. As I remember, there is only one > >> [sq905] process which runs on a single-core machine. > > > > Indeed, the problem is there! You must have only one process reading the > > webcam! I do not see how this can work with these 2 processes... > > The problem, then, would seem to me to boil down to the question of > whether that is up to us. Apparently, a decision like that is not up to > us, but rather it is up to the compiler and to the rest of the kernel to > decide. Nonsense. It's simply a matter of how you create your workqueue. In the code you sent me, you call create_workqueue(). Instead, just call create_singlethread_workqueue(). Or maybe even create_freezeable_workqueue(). Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-media" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tuesday 03 February 2009, Jean-Francois Moine wrote: > Indeed, the problem is there! You must have only one process reading the > webcam! I do not see how this can work with these 2 processes... Although 2 processes are created only one ever gets used. create_singlethread_workqueue would therefore be less wasteful but make no other difference. Google searches for create_freezeable_workqueue bring up suggestions to avoid it so I think I will although I guess we ought to check at some point how well the camera copes with suspend while streaming. At 19:53:31 Alan Stern wrote > If a camera is unplugged while it is in use then > waiting until close() is no good -- it will cause destroy_urbs() to > pass a stale pointer to usb_buffer_free(). That's the reason for the > oops. I'd go further and suggest that gspca_disconnect should, after calling destroy_urbs() do gspca_dev->dev = NULL; Any use of that pointer past that point is a bug (the downside is that doing so would have turned the current bug into a hard to spot memory leak). At 03:37:35 Alan Stern wrote > On Mon, 2 Feb 2009, Adam Baker wrote: >> This shouldn't be a problem for sq905 (which doesn't use these URBs) or >> isochronous cameras (which don't need to resubmit URBs) but the finepix >> driver (the other supported bulk device) will need some careful >> consideration to avoid a race between killing the URB and resubmitting it. > > You shouldn't need to take any special action. usb_kill_urb() solves > these races for you; it guarantees not to return until the URB has been > unlinked and the completion handler has finished, and it guarantees > that attempts by the completion handler to resubmit the URB will fail. The sq905 driver doesn't use the URBs provided by gspca, it uses usb_control_msg and usb_bulk_msg which I presume do the right thing internally. There would be a tiny window in between when it checks the dev->streaming flag and when it sends a new USB msg for the disconnect to occur and invalidate the dev pointer. That could be fixed by holding gspca_dev->usb_lock in gspca_disconnect when it sets gspca_dev->present = 0. That would also address the race between open and disconnect. Unfortunately the finepix driver sometimes uses calls to schedule_delayed_work in the completion handler which then makes the call to usb_submit_urb. Fixing that will require someone with access to a suitable camera to test it otherwise there is a significant risk of adding deadlocks. It already suffers from this bug so we aren't making it worse. Calling destroy_urbs from dev_close rather than gspca_disconnect could also trigger the same bug on isochronous cameras. I haven't looked at them closely but they probably also have the race between open and disconnect which taking the lock in disconnect is likely to fix. I'll do some testing and post a patch if it looks promising. Adam -- To unsubscribe from this list: send the line "unsubscribe linux-media" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 3 Feb 2009, Alan Stern wrote: > On Tue, 3 Feb 2009 kilgota@banach.math.auburn.edu wrote: > >> >> >> On Tue, 3 Feb 2009, Jean-Francois Moine wrote: >> >>> On Tue, 3 Feb 2009 13:15:58 -0600 (CST) >>> kilgota@banach.math.auburn.edu wrote: >>> >>>>> Why is there 2 sq905 processes? >>>> >>>> I of course do not fully understand why there are two such processes. >>>> However, I would suspect that [sq905/0] is running on processor 0 and >>>> [sq905/1] is running on processor 1. As I remember, there is only one >>>> [sq905] process which runs on a single-core machine. >>> >>> Indeed, the problem is there! You must have only one process reading the >>> webcam! I do not see how this can work with these 2 processes... >> >> The problem, then, would seem to me to boil down to the question of >> whether that is up to us. Apparently, a decision like that is not up to >> us, but rather it is up to the compiler and to the rest of the kernel to >> decide. > > Nonsense. It's simply a matter of how you create your workqueue. In > the code you sent me, you call create_workqueue(). Instead, just call > create_singlethread_workqueue(). Or maybe even > create_freezeable_workqueue(). > > Alan Stern > OK, seems one way out, might even work. I will definitely try that. Update. I did try it. No it does not work, sorry. :/ I changed the line dev->work_thread = create_workqueue(MODULE_NAME); to read dev->work_thread = create_singlethread_workqueue(MODULE_NAME); As a result, I can definitely report that only _two_ processes were frozen when the cord was pulled, named [svv] and [sq905]. I am sure that the attached log file of the oops looks a little bit different from the previous ones. It must, after all. While you have this matter on your mind, I am curious about the following: As the code for the sq905 module evolved through various stages, the only occasion on which any real trouble arose was at the end, when we put in the mutex locks which you can see in the code now. Before they were put in, these problems which we are discussing now did not occur. Specifically, there was not any such problem about an oops caused by camera removal until the mutex locks were put in the code. And I strongly suspect -- nay, I am almost certain -- that with that the same code you are looking at now, the oops would go away if all those mutex locks were simply commented out and the code re-compiled and reinstalled. Can you explain this? I am just curious about why.
On Tue, 3 Feb 2009, Adam Baker wrote: > On Tuesday 03 February 2009, Jean-Francois Moine wrote: >> Indeed, the problem is there! You must have only one process reading the >> webcam! I do not see how this can work with these 2 processes... > > Although 2 processes are created only one ever gets used. How do you know that? Just curious. > create_singlethread_workqueue would therefore be less wasteful but make no > other difference. In at least one respect, you seem to be right. The oops still occurs. See my previous mail. Theodore Kilgore -- To unsubscribe from this list: send the line "unsubscribe linux-media" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 3 Feb 2009, Adam Baker wrote: > The sq905 driver doesn't use the URBs provided by gspca, it uses > usb_control_msg and usb_bulk_msg which I presume do the right thing > internally. There would be a tiny window in between when it checks the > dev->streaming flag and when it sends a new USB msg for the disconnect to > occur and invalidate the dev pointer. That could be fixed by holding > gspca_dev->usb_lock in gspca_disconnect when it sets gspca_dev->present = 0. > > That would also address the race between open and disconnect. > > Unfortunately the finepix driver sometimes uses calls to schedule_delayed_work > in the completion handler which then makes the call to usb_submit_urb. Fixing > that will require someone with access to a suitable camera to test it > otherwise there is a significant risk of adding deadlocks. It already suffers > from this bug so we aren't making it worse. If the driver submits URBs from a work routine then usb_kill_urb's guarantees don't apply. You'll need to synchronize all three routines: disconnect, the completion handler, and the work routine. That means you'll have to use a spinlock, since a completion handler isn't allowed to acquire a mutex. Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-media" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 3 Feb 2009 kilgota@banach.math.auburn.edu wrote: > > Nonsense. It's simply a matter of how you create your workqueue. In > > the code you sent me, you call create_workqueue(). Instead, just call > > create_singlethread_workqueue(). Or maybe even > > create_freezeable_workqueue(). > > > > Alan Stern > > > > OK, seems one way out, might even work. I will definitely try that. > > Update. I did try it. > > No it does not work, sorry. :/ Again, nonsense. Of course it works. It causes the kernel to create only one workqueue thread instead of two. That's what it's supposed to do -- it was never intended to fix your oops. > While you have this matter on your mind, I am curious about the following: > > As the code for the sq905 module evolved through various stages, the > only occasion on which any real trouble arose was at the end, when we put > in the mutex locks which you can see in the code now. Before they were put > in, these problems which we are discussing now did not occur. > Specifically, there was not any such problem about an oops caused by > camera removal until the mutex locks were put in the code. And I strongly > suspect -- nay, I am almost certain -- that with that the same code you > are looking at now, the oops would go away if all those mutex locks were > simply commented out and the code re-compiled and reinstalled. Can you > explain this? I am just curious about why. You're wrong, the oops would not go away. It would just become a lot less likely to occur -- and thereby all the more insidious. Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-media" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 2009-02-03 at 22:09 +0000, Adam Baker wrote: > On Tuesday 03 February 2009, Jean-Francois Moine wrote: > > Indeed, the problem is there! You must have only one process reading the > > webcam! I do not see how this can work with these 2 processes... > > Although 2 processes are created only one ever gets used. > create_singlethread_workqueue would therefore be less wasteful but make no > other difference. This is generally not the case. There is a single workqueue as far as a driver is concerned. Work items submitted to the queue by the driver are set to be processed by the same CPU submitting the work item (unless you call queue_work_on() to specify the CPU). However, there is no guarantee that the same CPU will be submitting work requests to the workqueue every time. For most drivers this is a moot point though, because they only ever instantiate and submit one work object ever per device. This means the workqueue depth never exceeds 1 for most drivers. So the correct statement would be, I believe, "only one sq905 worker thread ever gets used (per device per capture) at a time in sq905.c" (For the cx18 driver, where the workqueue can have up to 70 items queued for all ongoing capture streams per device, it makes quite a difference on whether a single thread or multiple threads process the work queue. To preserve ordering of the work items, as needed for work orders for moving video buffers from one place to another, a singlethreaded workqueue had to be used.) I did look at the patch as submitted on Jan 19, and do have what I intend to be constructive criticisms (sorry if they're overcome by events by now): Creating and destroying the worker thread(s) at the start and stop of each capture is a bit overkill. It's akin to registering and unregistering a device's interrupt handler before and after every capture, but it's a bit worse overhead-wise. It's probably better to just instantiate the workqueue when the device "appears" (I'm not a USB guy so insert appropraite term there) and destroy the workqueue and worker threads(s) when the device is going to "disappear". Or if it will meet your performance requirements, create and destroy the workqueue when you init and remove the module. The workqueue and its thread(s) are essentially the bottom half of your interrupt handler to handle deferrable work - no point in killing them off until you really don't need them anymore. Also, you've created the workqueue threads with a non-unique name: the expansion of MODULE_NAME. You're basically saying that you only need one workqueue, with it's per CPU thread(s), for all instantiations of an sq905 device - which *can* be a valid design choice. However, you're bringing them up and tearing them down on a per capture basis. That's a problem that needs to be corrected if you intend to support multiple sq905 devices on a single machine. What happens when you attempt to have two sq905 devices do simultaneous captures? I don't know myself; I've never tried to create 2 separate instantiations of a workqueue object with the same name. Regards, Andy -- To unsubscribe from this list: send the line "unsubscribe linux-media" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 3 Feb 2009, Alan Stern wrote: > On Tue, 3 Feb 2009 kilgota@banach.math.auburn.edu wrote: > >>> Nonsense. It's simply a matter of how you create your workqueue. In >>> the code you sent me, you call create_workqueue(). Instead, just call >>> create_singlethread_workqueue(). Or maybe even >>> create_freezeable_workqueue(). >>> >>> Alan Stern >>> >> >> OK, seems one way out, might even work. I will definitely try that. >> >> Update. I did try it. >> >> No it does not work, sorry. :/ > > Again, nonsense. Of course it works. It causes the kernel to create > only one workqueue thread instead of two. OK, yes, it did do that. That's what it's supposed to > do -- it was never intended to fix your oops. OK. > >> While you have this matter on your mind, I am curious about the following: >> >> As the code for the sq905 module evolved through various stages, the >> only occasion on which any real trouble arose was at the end, when we put >> in the mutex locks which you can see in the code now. Before they were put >> in, these problems which we are discussing now did not occur. >> Specifically, there was not any such problem about an oops caused by >> camera removal until the mutex locks were put in the code. And I strongly >> suspect -- nay, I am almost certain -- that with that the same code you >> are looking at now, the oops would go away if all those mutex locks were >> simply commented out and the code re-compiled and reinstalled. Can you >> explain this? I am just curious about why. > > You're wrong, the oops would not go away. It would just become a lot > less likely to occur -- and thereby all the more insidious. How nice. Thanks for explaining. > > Alan Stern > -- To unsubscribe from this list: send the line "unsubscribe linux-media" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wednesday 04 February 2009, Andy Walls wrote: > On Tue, 2009-02-03 at 22:09 +0000, Adam Baker wrote: > > On Tuesday 03 February 2009, Jean-Francois Moine wrote: > > > Indeed, the problem is there! You must have only one process reading > > > the webcam! I do not see how this can work with these 2 processes... > > > > Although 2 processes are created only one ever gets used. > > create_singlethread_workqueue would therefore be less wasteful but make > > no other difference. > > This is generally not the case. There is a single workqueue as far as a > driver is concerned. Work items submitted to the queue by the driver > are set to be processed by the same CPU submitting the work item (unless > you call queue_work_on() to specify the CPU). However, there is no > guarantee that the same CPU will be submitting work requests to the > workqueue every time. > > For most drivers this is a moot point though, because they only ever > instantiate and submit one work object ever per device. This means the > workqueue depth never exceeds 1 for most drivers. So the correct > statement would be, I believe, "only one sq905 worker thread ever gets > used (per device per capture) at a time in sq905.c" Yes, I did mean only one gets used in the case of sq905.c (because the queue is created per capture and only one item gets submitted to it). <snip> > > I did look at the patch as submitted on Jan 19, and do have what I > intend to be constructive criticisms (sorry if they're overcome by > events by now): > > Creating and destroying the worker thread(s) at the start and stop of > each capture is a bit overkill. It's akin to registering and > unregistering a device's interrupt handler before and after every > capture, but it's a bit worse overhead-wise. It's probably better to > just instantiate the workqueue when the device "appears" (I'm not a USB > guy so insert appropraite term there) and destroy the workqueue and > worker threads(s) when the device is going to "disappear". Or if it > will meet your performance requirements, create and destroy the > workqueue when you init and remove the module. The workqueue and its > thread(s) are essentially the bottom half of your interrupt handler to > handle deferrable work - no point in killing them off until you really > don't need them anymore. > My thought was that the camera was likely to remain plugged in even if it wasn't being used so it was best to clean up as much as possible when it wasn't in use. I don't really know how the overheads of creating a workqueue when you do need it compares to leaving an unused one around sitting on the not ready queue in the process table but starting a capture is going to take many ms just for the USB traffic so a little extra overhead doesn't seem too worrying. > Also, you've created the workqueue threads with a non-unique name: the > expansion of MODULE_NAME. You're basically saying that you only need > one workqueue, with it's per CPU thread(s), for all instantiations of an > sq905 device - which *can* be a valid design choice. However, you're > bringing them up and tearing them down on a per capture basis. That's a > problem that needs to be corrected if you intend to support multiple > sq905 devices on a single machine. What happens when you attempt to > have two sq905 devices do simultaneous captures? I don't know myself; > I've never tried to create 2 separate instantiations of a workqueue > object with the same name. > I see multiple instance of [pdflush] and [nfsd] which seem to work fine so I believe the name doesn't need to be unique, just a guide to the user of what is eating CPU time. I don't have 2 sq905 cameras to test it but I have had left over workqueues caused by a driver bug stopping it shut down and new ones that started also worked fine. > > Regards, > Andy -- To unsubscribe from this list: send the line "unsubscribe linux-media" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 4 Feb 2009, Adam Baker wrote: > On Wednesday 04 February 2009, Andy Walls wrote: >> On Tue, 2009-02-03 at 22:09 +0000, Adam Baker wrote: >>> On Tuesday 03 February 2009, Jean-Francois Moine wrote: >>>> Indeed, the problem is there! You must have only one process reading >>>> the webcam! I do not see how this can work with these 2 processes... >>> >>> Although 2 processes are created only one ever gets used. >>> create_singlethread_workqueue would therefore be less wasteful but make >>> no other difference. >> >> This is generally not the case. There is a single workqueue as far as a >> driver is concerned. Work items submitted to the queue by the driver >> are set to be processed by the same CPU submitting the work item (unless >> you call queue_work_on() to specify the CPU). However, there is no >> guarantee that the same CPU will be submitting work requests to the >> workqueue every time. >> >> For most drivers this is a moot point though, because they only ever >> instantiate and submit one work object ever per device. This means the >> workqueue depth never exceeds 1 for most drivers. So the correct >> statement would be, I believe, "only one sq905 worker thread ever gets >> used (per device per capture) at a time in sq905.c" > > Yes, I did mean only one gets used in the case of sq905.c (because the queue > is created per capture and only one item gets submitted to it). > > <snip> >> >> I did look at the patch as submitted on Jan 19, and do have what I >> intend to be constructive criticisms (sorry if they're overcome by >> events by now): >> >> Creating and destroying the worker thread(s) at the start and stop of >> each capture is a bit overkill. It's akin to registering and >> unregistering a device's interrupt handler before and after every >> capture, but it's a bit worse overhead-wise. It's probably better to >> just instantiate the workqueue when the device "appears" (I'm not a USB >> guy so insert appropraite term there) and destroy the workqueue and >> worker threads(s) when the device is going to "disappear". Or if it >> will meet your performance requirements, create and destroy the >> workqueue when you init and remove the module. The workqueue and its >> thread(s) are essentially the bottom half of your interrupt handler to >> handle deferrable work - no point in killing them off until you really >> don't need them anymore. >> > > My thought was that the camera was likely to remain plugged in even if it > wasn't being used so it was best to clean up as much as possible when it > wasn't in use. I don't really know how the overheads of creating a workqueue > when you do need it compares to leaving an unused one around sitting on the > not ready queue in the process table but starting a capture is going to take > many ms just for the USB traffic so a little extra overhead doesn't seem too > worrying. > >> Also, you've created the workqueue threads with a non-unique name: the >> expansion of MODULE_NAME. You're basically saying that you only need >> one workqueue, with it's per CPU thread(s), for all instantiations of an >> sq905 device - which *can* be a valid design choice. However, you're >> bringing them up and tearing them down on a per capture basis. That's a >> problem that needs to be corrected if you intend to support multiple >> sq905 devices on a single machine. What happens when you attempt to >> have two sq905 devices do simultaneous captures? I don't know myself; >> I've never tried to create 2 separate instantiations of a workqueue >> object with the same name. >> > > I see multiple instance of [pdflush] and [nfsd] which seem to work fine so I > believe the name doesn't need to be unique, just a guide to the user of what > is eating CPU time. I don't have 2 sq905 cameras to test it ... Well, I do. Here is what happens: 1. Plugged in one of them, started the streaming. 2. Plugged in the second one. First one continues to stream, no problem. Attempt to start a stream from the second camera while the first one is streaming results in the error message, VIDIOC_REQBUFS error 16, Device or resource busy and the command line reappears. No obvious interference with the stream from the first camera is apparent. 3. After the stream from the first camera is closed, the streaming can be started again. However, the stream is again from the first camera which was plugged in. 4. After removing the first camera which was plugged in, I tried to start the stream from the second one. The stream will not start. A message says that Cannot identify 'dev/video0': 2. No such file or directory. 5. Second camera left attached, first camera plugged in again. The first camera can stream again. The second one can not. 6. When the first camera is removed and the second one is then re-plugged, the second one will stream. So, we can safely conclude that, as the module is presently constituted, there can be only one stream at a time. The second camera plugged in cannot stream. Neither can it conflict with the first one. The only thing which puzzles me a little bit is item 5. Apparently, the question is about which port the camera was connected to. So let's try three cameras. I will hook up camera number one, get a stream, close it, hook up number two through a different port, then remove number one and replace it with number three on the same port where number one was. What I suspected would happen is what happened. Number two would not stream. Number three would stream just as number one did. In other words, this was determined by which port was in use for number one, which is now also in use for number three. So, now we know what happens. Theodore Kilgore -- To unsubscribe from this list: send the line "unsubscribe linux-media" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wednesday 04 February 2009, kilgota@banach.math.auburn.edu wrote: <snip description of attempting to stream from 2 cameras at once> > 4. After removing the first camera which was plugged in, I tried to start > the stream from the second one. The stream will not start. A message says > that > > Cannot identify 'dev/video0': 2. No such file or directory. This line points to an error in your test method. You need to start the second stream with svv -d /dev/video1 to tell it to pick the second camera. Adam -- To unsubscribe from this list: send the line "unsubscribe linux-media" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 4 Feb 2009, Adam Baker wrote: > On Wednesday 04 February 2009, kilgota@banach.math.auburn.edu wrote: > <snip description of attempting to stream from 2 cameras at once> >> 4. After removing the first camera which was plugged in, I tried to start >> the stream from the second one. The stream will not start. A message says >> that >> >> Cannot identify 'dev/video0': 2. No such file or directory. > > This line points to an error in your test method. > > You need to start the second stream with svv -d /dev/video1 to tell it to pick > the second camera. > > Adam > Oops, right. Well, in that case I have to report that two cameras work simultaneously just fine. No problem at all. Theodore Kilgore -- To unsubscribe from this list: send the line "unsubscribe linux-media" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 4 Feb 2009, kilgota@banach.math.auburn.edu wrote: > > > On Wed, 4 Feb 2009, Adam Baker wrote: > >> On Wednesday 04 February 2009, kilgota@banach.math.auburn.edu wrote: >> <snip description of attempting to stream from 2 cameras at once> >>> 4. After removing the first camera which was plugged in, I tried to start >>> the stream from the second one. The stream will not start. A message says >>> that >>> >>> Cannot identify 'dev/video0': 2. No such file or directory. >> >> This line points to an error in your test method. >> >> You need to start the second stream with svv -d /dev/video1 to tell it to >> pick >> the second camera. >> >> Adam >> > > Oops, right. > > Well, in that case I have to report that two cameras work simultaneously just > fine. No problem at all. For completeness, I should add the following: ps ax now shows two svv processes. One of them is svv -d /dev/video1 and is followed by [sq905] and the second one is svv followed by [sq905]. Why in reverse order? Well, probably it is that I now fired up the second camera first and the first one after that. The workqueue in the module is started with dev->work_thread = create_singlethread_workqueue(MODULE_NAME); so these are distinct instances of [sq905]. Now am I supposed to check what happens if I start jerking them off of their tethers without closing the capture windows? No thanks, not this time. Let's wait until that problem is solved with just one camera. :) Theodore Kilgore -- To unsubscribe from this list: send the line "unsubscribe linux-media" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff -r 3f4a7bc53d8e linux/drivers/media/video/gspca/gspca.c --- a/linux/drivers/media/video/gspca/gspca.c Mon Feb 02 20:25:38 2009 +0100 +++ b/linux/drivers/media/video/gspca/gspca.c Tue Feb 03 10:37:51 2009 +0100 @@ -435,7 +435,7 @@ break; gspca_dev->urb[i] = NULL; - if (!gspca_dev->present) + if (gspca_dev->present) usb_kill_urb(urb); if (urb->transfer_buffer != NULL) usb_buffer_free(gspca_dev->dev,