Message ID | 20180806222918.12644-13-jmkrzyszt@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [RFC,v2,01/12] mtd: rawnand: ams-delta: Assign mtd->dev.parent, not mtd->owner | expand |
On Tue, Aug 7, 2018 at 12:29 AM Janusz Krzysztofik <jmkrzyszt@gmail.com> wrote: Hi Janusz! > Certain GPIO descriptor arrays returned by gpio_get_array() may contain > information on a single GPIO chip driving array member pins in hardware > order. In such cases, bitmaps of values can be passed directly to the > chip callback functions without wasting time on iterations. > > Add respective code to gpiod_get/set_array_bitmap_complex() functions. > > Signed-off-by: Janusz Krzysztofik <jmkrzyszt@gmail.com> I think it would be disappointing to leave all the users of the old array API without the performance improvement. I think we need to deal with this in a way such that everyone can benefit from it. Also it is kludgy if users (consumers) would need to handle the case where all lines are on the same chip separately, through the bitmap API. What we need is an API that: - All drivers handling arrays can use (including current users). - Enables speed-up if the lines are all on the same chip/register. - Doesn't require consumers to know if they are all on the same chip or not. This means a deep API with a small surface. How do we achieve this the best way? Yours, Linus Walleij
On Tuesday, August 7, 2018 1:43:56 AM CEST Linus Walleij wrote: > On Tue, Aug 7, 2018 at 12:29 AM Janusz Krzysztofik <jmkrzyszt@gmail.com> wrote: > > Hi Janusz! > > > Certain GPIO descriptor arrays returned by gpio_get_array() may contain > > information on a single GPIO chip driving array member pins in hardware > > order. In such cases, bitmaps of values can be passed directly to the > > chip callback functions without wasting time on iterations. > > > > Add respective code to gpiod_get/set_array_bitmap_complex() functions. > > > > Signed-off-by: Janusz Krzysztofik <jmkrzyszt@gmail.com> > > I think it would be disappointing to leave all the users of the old > array API without the performance improvement. I think we need to > deal with this in a way such that everyone can benefit from it. There are a few issues to be resolved: 1) array size limited by bitmap size: - are we ready to limit array size to a single bitmap for all users? - if not, how can we pass a bitmap of an arbitrary size? - if as an array of bitmaps, is that still clear enough and easy to use? - other ideas? 2) arbitrary array support: - are we ready to drop that? - if not, do we agree to require all users to pack their arbitrary arrays inside the gpio_descs structure? Maybe more. > Also it is kludgy if users (consumers) would need to handle the case > where all lines are on the same chip separately, through the bitmap > API. Not true as long as array size fits (arbitrary arrays can be packed by users), but I see your point. > What we need is an API that: > > - All drivers handling arrays can use (including current users). > > - Enables speed-up if the lines are all on the same chip/register. > > - Doesn't require consumers to know if they are all on the same > chip or not. > > This means a deep API with a small surface. > > How do we achieve this the best way? I think widely accepted solutions to those two issues I've mentioned above can give the answer. Thanks, Janusz
Hi Janusz, On Tue, 07 Aug 2018 19:29:53 +0200 Janusz Krzysztofik <jmkrzyszt@gmail.com> wrote: > On Tuesday, August 7, 2018 1:43:56 AM CEST Linus Walleij wrote: > > On Tue, Aug 7, 2018 at 12:29 AM Janusz Krzysztofik <jmkrzyszt@gmail.com> > wrote: > > > > Hi Janusz! > > > > > Certain GPIO descriptor arrays returned by gpio_get_array() may contain > > > information on a single GPIO chip driving array member pins in hardware > > > order. In such cases, bitmaps of values can be passed directly to the > > > chip callback functions without wasting time on iterations. > > > > > > Add respective code to gpiod_get/set_array_bitmap_complex() functions. > > > > > > Signed-off-by: Janusz Krzysztofik <jmkrzyszt@gmail.com> > > > > I think it would be disappointing to leave all the users of the old > > array API without the performance improvement. I think we need to > > deal with this in a way such that everyone can benefit from it. I agree with Linus on that one. When I initially proposed the gpio bitbanging API I had something more advanced in mind where the GPIO framework would be responsible for toggling the GPIOs on its own when it's given an array of bytes to transmit (this way you avoid going back and forth between the GPIO user and the GPIO framework). But this approach would clearly be more invasive than what you propose here (turning the int array into a bitmap and optimizing). So, if we go for the "int array -> bitmap" approach I think all users should be converted so that we end up with a single API. > > There are a few issues to be resolved: > > 1) array size limited by bitmap size: > - are we ready to limit array size to a single bitmap for all users? > - if not, how can we pass a bitmap of an arbitrary size? > - if as an array of bitmaps, is that still clear enough and easy to use? > - other ideas? What we call a bitmap is an array of unsigned longs each entry containing NBITS_PER_LONG bits, so yes, it's an arbitrary size (see the bitmap API here [1]). > > 2) arbitrary array support: > - are we ready to drop that? > - if not, do we agree to require all users to pack their arbitrary arrays > inside the gpio_descs structure? I could only find one user, and it's the core itself (for the ioctl), so that shouldn't be too hard to convert all users. Did you find more. > > Maybe more. > > > Also it is kludgy if users (consumers) would need to handle the case > > where all lines are on the same chip separately, through the bitmap > > API. > > Not true as long as array size fits (arbitrary arrays can be packed by users), > but I see your point. I think the API should be the same and the framework should decide to take the fast path if all gpios belong to the same chip (which AFAICT is already the case, except it's putting the result in an int array instead of a bitmap) > > > What we need is an API that: > > > > - All drivers handling arrays can use (including current users). > > > > - Enables speed-up if the lines are all on the same chip/register. > > > > - Doesn't require consumers to know if they are all on the same > > chip or not. > > > > This means a deep API with a small surface. > > > > How do we achieve this the best way? > > I think widely accepted solutions to those two issues I've mentioned above can > give the answer. I'd still like to see how far we are from the initial perfs (the one poking the GPIO regs directly) with this approach, and what's the improvement compared to the int array solution we already have in place. Regards, Boris [1]https://elixir.bootlin.com/linux/v4.18-rc8/source/include/linux/bitmap.h
On Tue, Aug 7, 2018 at 7:47 PM Boris Brezillon <boris.brezillon@bootlin.com> wrote: > Janusz Krzysztofik <jmkrzyszt@gmail.com> wrote: > > > On Tuesday, August 7, 2018 1:43:56 AM CEST Linus Walleij wrote: > > > On Tue, Aug 7, 2018 at 12:29 AM Janusz Krzysztofik <jmkrzyszt@gmail.com> > > wrote: > > > > > > Hi Janusz! > > > > > > > Certain GPIO descriptor arrays returned by gpio_get_array() may contain > > > > information on a single GPIO chip driving array member pins in hardware > > > > order. In such cases, bitmaps of values can be passed directly to the > > > > chip callback functions without wasting time on iterations. > > > > > > > > Add respective code to gpiod_get/set_array_bitmap_complex() functions. > > > > > > > > Signed-off-by: Janusz Krzysztofik <jmkrzyszt@gmail.com> > > > > > > I think it would be disappointing to leave all the users of the old > > > array API without the performance improvement. I think we need to > > > deal with this in a way such that everyone can benefit from it. > > I agree with Linus on that one. When I initially proposed the gpio > bitbanging API I had something more advanced in mind where the GPIO > framework would be responsible for toggling the GPIOs on its own when > it's given an array of bytes to transmit (this way you avoid going > back and forth between the GPIO user and the GPIO framework). But this > approach would clearly be more invasive than what you propose > here (turning the int array into a bitmap and optimizing). So, if we go > for the "int array -> bitmap" approach I think all users should be > converted so that we end up with a single API. I thought about this the recent days and something must have gone wrong in the development process of the array API because this was the (mine atleast) intention all the time. If we look at the GPIOchip driver API it looks like this: int (*get_multiple)(struct gpio_chip *chip, unsigned long *mask, unsigned long *bits); void (*set_multiple)(struct gpio_chip *chip, unsigned long *mask, unsigned long *bits); So there is nothing hindering the drivers from optimizing a call here into a single register write, which is what e.g. the gpio-mmio.c driver does: if the hardware has a dedicated register for clearing and setting lines, it will simply just write the register with whatever is passed in, also cache the current value so it doesn't need to read back the register every time. When an array comes down to gpiod_set_array_value_complex() it loops over the descriptors in order to handle e.g. open drain settings separately. Then the remainder (lines that should just be set 1/0) is pushed to the .set_multiple() callback if they are on the same chip. This is assuming: 1. The CPU is not the bottleneck so we can do a bit of complex looping over structs etc in each write. 2. We want to perform as much in a single register write as possible to avoid I/O and glitches as all lines (e.g. clock and data) get written at the same time, if possible. (No skew.) It seems Janusz has problems with assumption (1) and therefore is trying to optimize the read/write path. This can be done if all descs are on the same chip and none of them is using open drain or open source. To keep track of "quick path" the array needs to have a state. So a magic "cookie" or something like that needs to be passed to the array API. I would suggest that struct gpiod_descs contain a magic cookie returned from [devm_]gpiod_get_array[_optional]() that can be passed along to get/set array operations or left as NULL to just fall back to the default: void gpiod_set_array_value(unsigned int array_size, struct gpio_desc **desc_array, int *value_array, struct gpiod_array_cookie *cookie); Cookie is just a dummy name, I don't know what makes most sense. It reflects a state for the entire array. If this cookie exist in some struct gpio_chip state variable, it informs gpiolib that this array: - Has all descriptors in the same gpiochip - Has no open drain or open source-flagged lines It can thenbypass the complex check and just write the values directly by calling down into .set_multiple(). Maybe this cookie could just be a bool that is true when the above is fulfilled. But it's best if that is hidden from the consumers I guess, they shouldn't try to half-guess if the criteria is true, gpiolib should do that. The current users would have to be augmented to store the cookie and pass it along when getting/setting arrays, but it would be pretty simple and straight-forward compared to adding a new API. Yours, Linus Walleij
diff --git a/Documentation/driver-api/gpio/consumer.rst b/Documentation/driver-api/gpio/consumer.rst index bec4eab3b87c..b82f134dc352 100644 --- a/Documentation/driver-api/gpio/consumer.rst +++ b/Documentation/driver-api/gpio/consumer.rst @@ -409,6 +409,12 @@ descriptor arrays, only those of type struct gpio_descs returned by gpiod_get_array() and its variants. Supported array size is limited to the size of the bitmap, i.e., sizeof(unsigned long). +If the .chip member of the array structure, filled in by gpiod_get_array() in +certain circumstances, contains a valid GPIO chip descriptor, the raw variants +of the functions can take fast processing paths, passing bitmap arguments +directly to the chip callback functions. That allows for utilization of GPIO +banks as data I/O ports without much loss of performance. + GPIOs mapped to IRQs -------------------- diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c index 5b541364dee0..bf95f2964bc5 100644 --- a/drivers/gpio/gpiolib.c +++ b/drivers/gpio/gpiolib.c @@ -2846,6 +2846,12 @@ int gpiod_get_array_bitmap_complex(bool raw, bool can_sleep, if (array->ndescs > sizeof(*bits)) return -EINVAL; + if (raw && !IS_ERR_OR_NULL(array->chip)) { + unsigned long mask = (1ULL << array->ndescs) - 1; + + return gpio_chip_get_multiple(array->chip, &mask, bits); + } + i = gpiod_get_array_value_complex(raw, can_sleep, array->ndescs, array->desc, value_array); if (i) @@ -3156,6 +3162,14 @@ int gpiod_set_array_bitmap_complex(bool raw, bool can_sleep, if (array->ndescs > sizeof(*bits)) return -EINVAL; + if (raw && !IS_ERR_OR_NULL(array->chip)) { + unsigned long mask = (1ULL << array->ndescs) - 1; + + gpio_chip_set_multiple(array->chip, &mask, bits); + + return 0; + } + for (i = 0; i < array->ndescs; i++) value_array[i] = test_bit(i, bits);
Certain GPIO descriptor arrays returned by gpio_get_array() may contain information on a single GPIO chip driving array member pins in hardware order. In such cases, bitmaps of values can be passed directly to the chip callback functions without wasting time on iterations. Add respective code to gpiod_get/set_array_bitmap_complex() functions. Signed-off-by: Janusz Krzysztofik <jmkrzyszt@gmail.com> --- Documentation/driver-api/gpio/consumer.rst | 6 ++++++ drivers/gpio/gpiolib.c | 14 ++++++++++++++ 2 files changed, 20 insertions(+)