Message ID | 1368457290-1734-1-git-send-email-dh.herrmann@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Jiri Kosina |
Headers | show |
2013/5/13 David Herrmann <dh.herrmann@gmail.com>: > Anyway, I'd still like to see this patch applied so we have this annoying > bug fixed. I'd be willing to change the power_supply core, too, if one of the > maintainers agrees on the design (David? Anton?). > > And, @Daniel, can you check whether this actually fixes the bug? I don't own > a device that reports battery-information, but it at least fixes the same bug > for the hid-wiimote driver (tested by me). Yes, it does fixes the bug. Now udevadm properly show add and remove events and upower happily get's them. But there is around 15 seconds block on the bluetooth stack, in other words when connecting a device it seems to probe the device which blocks till a timeout, and while that timeout doesn't finish other bluetooth devices are also blocked. It seems the devices aren't able to be probed so soon, possibly because bluetooth didn't finished setting them up. Looking at udevadm output I clearly see that it locks for around 3 times. My kernel knowledge is rather limited but if you can give me tips or patches to test I really want to see this code finally working properly, and sorry for not realizing when I sent it that it had this issue... -- Daniel Nicoletti KDE Developer - http://dantti.wordpress.com -- To unsubscribe from this list: send the line "unsubscribe linux-input" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, May 13, 2013 at 05:01:30PM +0200, David Herrmann wrote: [..] > I really dislike the way power_supply core calls into the drivers during the > "add" uevent. If a driver holds an I/O mutex (or anything else), it might > even deadlock in a very non-obvious way. Is there a reason why we need to > pass _all_ battery properties along "add" and "remove" uevents? Isn't it > enough to pass them with "change" uevents? This would guarantee that the > power_supply callbacks are only called from user-context and "change" events. I don't think that there is a particular reason for that, but if you want to change that, then I'd suggest to still keep uevent reporting of all the properties on "add" and "remove" events, but don't actually call the drivers' callback, just assume ENODATA. This way we well preserve the behaviour of the older kernels, so that userland will not break if, for example, it allocates needed memory on "add" event, and then assumes that "change" will follow the pattern. Thanks, Anton -- To unsubscribe from this list: send the line "unsubscribe linux-input" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Anton On Tue, May 14, 2013 at 1:20 AM, Anton Vorontsov <anton@enomsg.org> wrote: > On Mon, May 13, 2013 at 05:01:30PM +0200, David Herrmann wrote: > [..] >> I really dislike the way power_supply core calls into the drivers during the >> "add" uevent. If a driver holds an I/O mutex (or anything else), it might >> even deadlock in a very non-obvious way. Is there a reason why we need to >> pass _all_ battery properties along "add" and "remove" uevents? Isn't it >> enough to pass them with "change" uevents? This would guarantee that the >> power_supply callbacks are only called from user-context and "change" events. > > I don't think that there is a particular reason for that, but if you want > to change that, then I'd suggest to still keep uevent reporting of all the > properties on "add" and "remove" events, but don't actually call the > drivers' callback, just assume ENODATA. In case of ENODATA the property is entirely skipped and not included in the uevent. Therefore, "assuming ENODATA" would be skipping all properties. > This way we well preserve the behaviour of the older kernels, so that > userland will not break if, for example, it allocates needed memory on > "add" event, and then assumes that "change" will follow the pattern. I tried fixing this several ways, but there is one deadlock that we cannot overcome. If we assume a driver holds a lock during power_supply_unregister() that it also acquires in the get_property callback, then we will always deadlock. Just think of one CPU currently calling the power_supply_changed_work() callback while another CPU currently holds the driver's lock and calls power_supply_unregister(). power_supply_changed_work() will wait for the lock, while power_supply_unregister() will wait synchronously for the work to finish => deadlock As a side-note, in *_uevent() callbacks we have no chance to see what kind of event this is. We would have to fix lib/kobject_uevent.c to provide this information. So I am back to fixing drivers to allow I/O / safe callbacks while calling power_supply_register()/unregister(). This might be easy for HID-USB drivers to implement (there is hid_device_io_start/stop()), but for Bluetooth HID devices, this will not work as there are bluetooth locks that are still held. And fixing HIDP isn't enough, we would actually have to go all the way down to fix Bluetooth HCI core to provide more fine-grained locking (at least one per hci_conn). And even then I am not sure how to allow I/O while loading/unloading drivers on it. So if anybody wants to step up and fix this mess, feel free to go. I will give up here. I might try to extend Bluetooth HIDP to allow I/O during setup, but I currently cannot figure out _any_ way how we can allow this during destruction/unregistration. Sorry. Regards David -- To unsubscribe from this list: send the line "unsubscribe linux-input" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi On Wed, May 15, 2013 at 4:58 PM, David Herrmann <dh.herrmann@gmail.com> wrote: > Hi Anton > > On Tue, May 14, 2013 at 1:20 AM, Anton Vorontsov <anton@enomsg.org> wrote: >> On Mon, May 13, 2013 at 05:01:30PM +0200, David Herrmann wrote: >> [..] >>> I really dislike the way power_supply core calls into the drivers during the >>> "add" uevent. If a driver holds an I/O mutex (or anything else), it might >>> even deadlock in a very non-obvious way. Is there a reason why we need to >>> pass _all_ battery properties along "add" and "remove" uevents? Isn't it >>> enough to pass them with "change" uevents? This would guarantee that the >>> power_supply callbacks are only called from user-context and "change" events. >> >> I don't think that there is a particular reason for that, but if you want >> to change that, then I'd suggest to still keep uevent reporting of all the >> properties on "add" and "remove" events, but don't actually call the >> drivers' callback, just assume ENODATA. > > In case of ENODATA the property is entirely skipped and not included > in the uevent. Therefore, "assuming ENODATA" would be skipping all > properties. > >> This way we well preserve the behaviour of the older kernels, so that >> userland will not break if, for example, it allocates needed memory on >> "add" event, and then assumes that "change" will follow the pattern. > > I tried fixing this several ways, but there is one deadlock that we > cannot overcome. If we assume a driver holds a lock during > power_supply_unregister() that it also acquires in the get_property > callback, then we will always deadlock. Just think of one CPU > currently calling the power_supply_changed_work() callback while > another CPU currently holds the driver's lock and calls > power_supply_unregister(). power_supply_changed_work() will wait for > the lock, while power_supply_unregister() will wait synchronously for > the work to finish => deadlock > > As a side-note, in *_uevent() callbacks we have no chance to see what > kind of event this is. We would have to fix lib/kobject_uevent.c to > provide this information. > > So I am back to fixing drivers to allow I/O / safe callbacks while > calling power_supply_register()/unregister(). This might be easy for > HID-USB drivers to implement (there is hid_device_io_start/stop()), > but for Bluetooth HID devices, this will not work as there are > bluetooth locks that are still held. And fixing HIDP isn't enough, we > would actually have to go all the way down to fix Bluetooth HCI core > to provide more fine-grained locking (at least one per hci_conn). And > even then I am not sure how to allow I/O while loading/unloading > drivers on it. > > So if anybody wants to step up and fix this mess, feel free to go. I > will give up here. I might try to extend Bluetooth HIDP to allow I/O It was a bit naive to think my brain would just let this go.. So while I tried to sleep, my brain decided to continue working on this and despite it being a fairly hard to solve issue, I could come up with a race-free HIDP fix. With that prepared, we can actually make HID core allow I/O during device registration. During unregistration, the I/O layer will report ENODEV/EIO, anyway, so we don't have this hang in that case. I will try to move power_supply registration to a place where we can safely allow I/O in HID device registration. Then we should at least be good to go for HID devices. Cheers David -- To unsubscribe from this list: send the line "unsubscribe linux-input" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi On Mon, May 13, 2013 at 11:17 PM, Daniel Nicoletti <dantti12@gmail.com> wrote: > 2013/5/13 David Herrmann <dh.herrmann@gmail.com>: >> Anyway, I'd still like to see this patch applied so we have this annoying >> bug fixed. I'd be willing to change the power_supply core, too, if one of the >> maintainers agrees on the design (David? Anton?). >> >> And, @Daniel, can you check whether this actually fixes the bug? I don't own >> a device that reports battery-information, but it at least fixes the same bug >> for the hid-wiimote driver (tested by me). > > Yes, it does fixes the bug. Now udevadm properly show add and remove events > and upower happily get's them. > But there is around 15 seconds block on the bluetooth stack, in other words > when connecting a device it seems to probe the device which blocks till > a timeout, and while that timeout doesn't finish other bluetooth > devices are also > blocked. It seems the devices aren't able to be probed so soon, possibly > because bluetooth didn't finished setting them up. Looking at udevadm output > I clearly see that it locks for around 3 times. > My kernel knowledge is rather limited but if you can give me tips or patches to > test I really want to see this code finally working properly, and sorry for > not realizing when I sent it that it had this issue... The block occurs because power_supply core now tries all properties in a row instead of aborting if the first fails (which is what we want). However, bluetooth-core didn't allow I/O during probe so the timeout got quite huge considering 3s for 5 different properties instead of 3s once (which no-one noticed, yet..) This is now fixed with the HIDP-patch pending on linux-input and linux-bluetooth. For usbhid and uhid we already allow I/O during probe so this does not happen there. So I hope we can apply this for linux-next (probably after gustavo applied the HIDP fix)? Cheers David -- To unsubscribe from this list: send the line "unsubscribe linux-input" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 24 May 2013, David Herrmann wrote: > So I hope we can apply this for linux-next (probably after gustavo > applied the HIDP fix)? Applied to my tree, together with the hidp fix. Thanks David, thanks Daniel!
diff --git a/drivers/hid/hid-input.c b/drivers/hid/hid-input.c index 945b815..c526a3c 100644 --- a/drivers/hid/hid-input.c +++ b/drivers/hid/hid-input.c @@ -354,10 +354,10 @@ static int hidinput_get_battery_property(struct power_supply *psy, dev->battery_report_type); if (ret != 2) { - if (ret >= 0) - ret = -EINVAL; + ret = -ENODATA; break; } + ret = 0; if (dev->battery_min < dev->battery_max && buf[1] >= dev->battery_min &&
power_supply core has the bad habit of calling our battery callbacks from within power_supply_register(). Furthermore, if the callbacks fail with an unhandled error code, it will skip any uevent that it might currently process. So if HID-core registers battery devices, an "add" uevent is generated and the battery callbacks are called. These will gracefully fail due to timeouts as they might still hold locks on event processing. One could argue that this should be fixed in power_supply core, but the least we can do is to signal ENODATA so power_supply core will just skip the property and continue with the uevent. This fixes a bug where "add" and "remove" uevents are skipped for battery devices. upower is unable to track these devices and currently needs to ignore them. This patch also overwrites any other error code. I cannot see any reason why we should forward protocol- or I/O-errors to the power_supply core. We handle these errors in hid_ll_driver later, anyway, so just skip them. power_supply core cannot do anything useful with them, anyway, and we avoid skipping important uevents and confusing user-space. Thanks a lot to Daniel Nicoletti for pushing and investigating on this. Cc: Jiri Kosina <jkosina@suse.cz> Cc: Anton Vorontsov <cbou@mail.ru> Cc: David Woodhouse <dwmw2@infradead.org> Reported-by: Daniel Nicoletti <dantti12@gmail.com> Signed-off-by: David Herrmann <dh.herrmann@gmail.com> --- Hi I really dislike the way power_supply core calls into the drivers during the "add" uevent. If a driver holds an I/O mutex (or anything else), it might even deadlock in a very non-obvious way. Is there a reason why we need to pass _all_ battery properties along "add" and "remove" uevents? Isn't it enough to pass them with "change" uevents? This would guarantee that the power_supply callbacks are only called from user-context and "change" events. Anyway, I'd still like to see this patch applied so we have this annoying bug fixed. I'd be willing to change the power_supply core, too, if one of the maintainers agrees on the design (David? Anton?). And, @Daniel, can you check whether this actually fixes the bug? I don't own a device that reports battery-information, but it at least fixes the same bug for the hid-wiimote driver (tested by me). Cheers David drivers/hid/hid-input.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)