Message ID | 20200528011154.30355-1-rentao.bupt@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | usb: gadget: aspeed: fixup vhub port irq handling | expand |
Hi Ben, I sent out the follow-on patch per Greg's suggestion, and the purpose is to include the latest version (v4) of the original commit. As v4 was already Acked-by you, can I add the tag for this patch? Or are you willing to Ack it again? Cheers, Tao On Wed, May 27, 2020 at 06:11:54PM -0700, rentao.bupt@gmail.com wrote: > From: Tao Ren <rentao.bupt@gmail.com> > > This is a follow-on patch for commit a23be4ed8f48 ("usb: gadget: aspeed: > improve vhub port irq handling"): for_each_set_bit() is replaced with > simple for() loop because for() loop runs faster on ASPEED BMC. > > Signed-off-by: Tao Ren <rentao.bupt@gmail.com> > --- > drivers/usb/gadget/udc/aspeed-vhub/core.c | 10 +++------- > drivers/usb/gadget/udc/aspeed-vhub/vhub.h | 3 +++ > 2 files changed, 6 insertions(+), 7 deletions(-) > > diff --git a/drivers/usb/gadget/udc/aspeed-vhub/core.c b/drivers/usb/gadget/udc/aspeed-vhub/core.c > index cdf96911e4b1..be7bb64e3594 100644 > --- a/drivers/usb/gadget/udc/aspeed-vhub/core.c > +++ b/drivers/usb/gadget/udc/aspeed-vhub/core.c > @@ -135,13 +135,9 @@ static irqreturn_t ast_vhub_irq(int irq, void *data) > > /* Handle device interrupts */ > if (istat & vhub->port_irq_mask) { > - unsigned long bitmap = istat; > - int offset = VHUB_IRQ_DEV1_BIT; > - int size = VHUB_IRQ_DEV1_BIT + vhub->max_ports; > - > - for_each_set_bit_from(offset, &bitmap, size) { > - i = offset - VHUB_IRQ_DEV1_BIT; > - ast_vhub_dev_irq(&vhub->ports[i].dev); > + for (i = 0; i < vhub->max_ports; i++) { > + if (istat & VHUB_DEV_IRQ(i)) > + ast_vhub_dev_irq(&vhub->ports[i].dev); > } > } > > diff --git a/drivers/usb/gadget/udc/aspeed-vhub/vhub.h b/drivers/usb/gadget/udc/aspeed-vhub/vhub.h > index 2e5a1ef14a75..87a5dea12d3c 100644 > --- a/drivers/usb/gadget/udc/aspeed-vhub/vhub.h > +++ b/drivers/usb/gadget/udc/aspeed-vhub/vhub.h > @@ -67,6 +67,9 @@ > #define VHUB_IRQ_HUB_EP0_SETUP (1 << 0) > #define VHUB_IRQ_ACK_ALL 0x1ff > > +/* Downstream device IRQ mask. */ > +#define VHUB_DEV_IRQ(n) (VHUB_IRQ_DEVICE1 << (n)) > + > /* SW reset reg */ > #define VHUB_SW_RESET_EP_POOL (1 << 9) > #define VHUB_SW_RESET_DMA_CONTROLLER (1 << 8) > -- > 2.17.1 >
Hi, rentao.bupt@gmail.com writes: > From: Tao Ren <rentao.bupt@gmail.com> > > This is a follow-on patch for commit a23be4ed8f48 ("usb: gadget: aspeed: > improve vhub port irq handling"): for_each_set_bit() is replaced with > simple for() loop because for() loop runs faster on ASPEED BMC. > > Signed-off-by: Tao Ren <rentao.bupt@gmail.com> > --- > drivers/usb/gadget/udc/aspeed-vhub/core.c | 10 +++------- > drivers/usb/gadget/udc/aspeed-vhub/vhub.h | 3 +++ > 2 files changed, 6 insertions(+), 7 deletions(-) > > diff --git a/drivers/usb/gadget/udc/aspeed-vhub/core.c b/drivers/usb/gadget/udc/aspeed-vhub/core.c > index cdf96911e4b1..be7bb64e3594 100644 > --- a/drivers/usb/gadget/udc/aspeed-vhub/core.c > +++ b/drivers/usb/gadget/udc/aspeed-vhub/core.c > @@ -135,13 +135,9 @@ static irqreturn_t ast_vhub_irq(int irq, void *data) > > /* Handle device interrupts */ > if (istat & vhub->port_irq_mask) { > - unsigned long bitmap = istat; > - int offset = VHUB_IRQ_DEV1_BIT; > - int size = VHUB_IRQ_DEV1_BIT + vhub->max_ports; > - > - for_each_set_bit_from(offset, &bitmap, size) { > - i = offset - VHUB_IRQ_DEV1_BIT; > - ast_vhub_dev_irq(&vhub->ports[i].dev); > + for (i = 0; i < vhub->max_ports; i++) { > + if (istat & VHUB_DEV_IRQ(i)) > + ast_vhub_dev_irq(&vhub->ports[i].dev); how have you measured your statement above? for_each_set_bit() does exactly what you did. Unless your architecture has an instruction which helps finds the next set bit (like cls on ARM), which, then, makes it much faster.
On Mon, Aug 17, 2020 at 04:49:32PM +0300, Felipe Balbi wrote: > > Hi, > > rentao.bupt@gmail.com writes: > > From: Tao Ren <rentao.bupt@gmail.com> > > > > This is a follow-on patch for commit a23be4ed8f48 ("usb: gadget: aspeed: > > improve vhub port irq handling"): for_each_set_bit() is replaced with > > simple for() loop because for() loop runs faster on ASPEED BMC. > > > > Signed-off-by: Tao Ren <rentao.bupt@gmail.com> > > --- > > drivers/usb/gadget/udc/aspeed-vhub/core.c | 10 +++------- > > drivers/usb/gadget/udc/aspeed-vhub/vhub.h | 3 +++ > > 2 files changed, 6 insertions(+), 7 deletions(-) > > > > diff --git a/drivers/usb/gadget/udc/aspeed-vhub/core.c b/drivers/usb/gadget/udc/aspeed-vhub/core.c > > index cdf96911e4b1..be7bb64e3594 100644 > > --- a/drivers/usb/gadget/udc/aspeed-vhub/core.c > > +++ b/drivers/usb/gadget/udc/aspeed-vhub/core.c > > @@ -135,13 +135,9 @@ static irqreturn_t ast_vhub_irq(int irq, void *data) > > > > /* Handle device interrupts */ > > if (istat & vhub->port_irq_mask) { > > - unsigned long bitmap = istat; > > - int offset = VHUB_IRQ_DEV1_BIT; > > - int size = VHUB_IRQ_DEV1_BIT + vhub->max_ports; > > - > > - for_each_set_bit_from(offset, &bitmap, size) { > > - i = offset - VHUB_IRQ_DEV1_BIT; > > - ast_vhub_dev_irq(&vhub->ports[i].dev); > > + for (i = 0; i < vhub->max_ports; i++) { > > + if (istat & VHUB_DEV_IRQ(i)) > > + ast_vhub_dev_irq(&vhub->ports[i].dev); > > how have you measured your statement above? for_each_set_bit() does > exactly what you did. Unless your architecture has an instruction which > helps finds the next set bit (like cls on ARM), which, then, makes it > much faster. I did some testing and result shows for() loop runs faster than for_each_set_bit() loop. Please refer to details below (discussion with Benjamin in the original patch) and kindly let me know your suggestions. > On Mon, 2020-04-06 at 23:02 -0700, Tao Ren wrote: > > I ran some testing on my ast2400 and ast2500 BMC and looks like the > > for() loop runs faster than for_each_set_bit_from() loop in my > > environment. I'm not sure if something needs to be revised in my test > > code, but please kindly share your suggestions: > > > > I use get_cycles() to calculate execution time of 2 different loops, and > > ast_vhub_dev_irq() is replaced with barrier() to avoid "noise"; below > > are the results: > > > > - when downstream port number is 5 and only 1 irq bit is set, it takes > > ~30 cycles to finish for_each_set_bit() loop, and 20-25 cycles to > > finish the for() loop. > > > > - if downstream port number is 5 and all 5 bits are set, then > > for_each_set_bit() loop takes ~50 cycles and for() loop takes ~25 > > cycles. > > > > - when I increase downsteam port number to 16 and set 1 irq bit, the > > for_each_set_bit() loop takes ~30 cycles and for() loop takes 25 > > cycles. It's a little surprise to me because I thought for() loop > > would cost 60+ cycles (3 times of the value when port number is 5). > > > > - if downstream port number is 16 and all irq status bits are set, > > then for_each_set_bit() loop takes 60-70 cycles and for() loop takes > > 30+ cycles. Cheers, Tao
Hi, Tao Ren <rentao.bupt@gmail.com> writes: > On Mon, Aug 17, 2020 at 04:49:32PM +0300, Felipe Balbi wrote: >> >> Hi, >> >> rentao.bupt@gmail.com writes: >> > From: Tao Ren <rentao.bupt@gmail.com> >> > >> > This is a follow-on patch for commit a23be4ed8f48 ("usb: gadget: aspeed: >> > improve vhub port irq handling"): for_each_set_bit() is replaced with >> > simple for() loop because for() loop runs faster on ASPEED BMC. >> > >> > Signed-off-by: Tao Ren <rentao.bupt@gmail.com> >> > --- >> > drivers/usb/gadget/udc/aspeed-vhub/core.c | 10 +++------- >> > drivers/usb/gadget/udc/aspeed-vhub/vhub.h | 3 +++ >> > 2 files changed, 6 insertions(+), 7 deletions(-) >> > >> > diff --git a/drivers/usb/gadget/udc/aspeed-vhub/core.c b/drivers/usb/gadget/udc/aspeed-vhub/core.c >> > index cdf96911e4b1..be7bb64e3594 100644 >> > --- a/drivers/usb/gadget/udc/aspeed-vhub/core.c >> > +++ b/drivers/usb/gadget/udc/aspeed-vhub/core.c >> > @@ -135,13 +135,9 @@ static irqreturn_t ast_vhub_irq(int irq, void *data) >> > >> > /* Handle device interrupts */ >> > if (istat & vhub->port_irq_mask) { >> > - unsigned long bitmap = istat; >> > - int offset = VHUB_IRQ_DEV1_BIT; >> > - int size = VHUB_IRQ_DEV1_BIT + vhub->max_ports; >> > - >> > - for_each_set_bit_from(offset, &bitmap, size) { >> > - i = offset - VHUB_IRQ_DEV1_BIT; >> > - ast_vhub_dev_irq(&vhub->ports[i].dev); >> > + for (i = 0; i < vhub->max_ports; i++) { >> > + if (istat & VHUB_DEV_IRQ(i)) >> > + ast_vhub_dev_irq(&vhub->ports[i].dev); >> >> how have you measured your statement above? for_each_set_bit() does >> exactly what you did. Unless your architecture has an instruction which >> helps finds the next set bit (like cls on ARM), which, then, makes it >> much faster. > > I did some testing and result shows for() loop runs faster than > for_each_set_bit() loop. Please refer to details below (discussion with > Benjamin in the original patch) and kindly let me know your suggestions. no strong feelings, just surprised that you're already worried about 20~40 cycles of cpu time ;-) Patch applied for next merge window
Hi, Tao Ren <rentao.bupt@gmail.com> writes: > On Mon, Aug 17, 2020 at 04:49:32PM +0300, Felipe Balbi wrote: >> >> Hi, >> >> rentao.bupt@gmail.com writes: >> > From: Tao Ren <rentao.bupt@gmail.com> >> > >> > This is a follow-on patch for commit a23be4ed8f48 ("usb: gadget: aspeed: >> > improve vhub port irq handling"): for_each_set_bit() is replaced with >> > simple for() loop because for() loop runs faster on ASPEED BMC. >> > >> > Signed-off-by: Tao Ren <rentao.bupt@gmail.com> >> > --- >> > drivers/usb/gadget/udc/aspeed-vhub/core.c | 10 +++------- >> > drivers/usb/gadget/udc/aspeed-vhub/vhub.h | 3 +++ >> > 2 files changed, 6 insertions(+), 7 deletions(-) >> > >> > diff --git a/drivers/usb/gadget/udc/aspeed-vhub/core.c b/drivers/usb/gadget/udc/aspeed-vhub/core.c >> > index cdf96911e4b1..be7bb64e3594 100644 >> > --- a/drivers/usb/gadget/udc/aspeed-vhub/core.c >> > +++ b/drivers/usb/gadget/udc/aspeed-vhub/core.c >> > @@ -135,13 +135,9 @@ static irqreturn_t ast_vhub_irq(int irq, void *data) >> > >> > /* Handle device interrupts */ >> > if (istat & vhub->port_irq_mask) { >> > - unsigned long bitmap = istat; >> > - int offset = VHUB_IRQ_DEV1_BIT; >> > - int size = VHUB_IRQ_DEV1_BIT + vhub->max_ports; >> > - >> > - for_each_set_bit_from(offset, &bitmap, size) { >> > - i = offset - VHUB_IRQ_DEV1_BIT; >> > - ast_vhub_dev_irq(&vhub->ports[i].dev); >> > + for (i = 0; i < vhub->max_ports; i++) { >> > + if (istat & VHUB_DEV_IRQ(i)) >> > + ast_vhub_dev_irq(&vhub->ports[i].dev); >> >> how have you measured your statement above? for_each_set_bit() does >> exactly what you did. Unless your architecture has an instruction which >> helps finds the next set bit (like cls on ARM), which, then, makes it >> much faster. > > I did some testing and result shows for() loop runs faster than > for_each_set_bit() loop. Please refer to details below (discussion with > Benjamin in the original patch) and kindly let me know your suggestions. no strong feelings, just surprised that you're already worried about 20~40 cycles of cpu time ;-) Patch applied for next merge window
Hi, Tao Ren <rentao.bupt@gmail.com> writes: >> > diff --git a/drivers/usb/gadget/udc/aspeed-vhub/core.c b/drivers/usb/gadget/udc/aspeed-vhub/core.c >> > index cdf96911e4b1..be7bb64e3594 100644 >> > --- a/drivers/usb/gadget/udc/aspeed-vhub/core.c >> > +++ b/drivers/usb/gadget/udc/aspeed-vhub/core.c >> > @@ -135,13 +135,9 @@ static irqreturn_t ast_vhub_irq(int irq, void *data) >> > >> > /* Handle device interrupts */ >> > if (istat & vhub->port_irq_mask) { >> > - unsigned long bitmap = istat; >> > - int offset = VHUB_IRQ_DEV1_BIT; >> > - int size = VHUB_IRQ_DEV1_BIT + vhub->max_ports; >> > - >> > - for_each_set_bit_from(offset, &bitmap, size) { >> > - i = offset - VHUB_IRQ_DEV1_BIT; >> > - ast_vhub_dev_irq(&vhub->ports[i].dev); >> > + for (i = 0; i < vhub->max_ports; i++) { >> > + if (istat & VHUB_DEV_IRQ(i)) >> > + ast_vhub_dev_irq(&vhub->ports[i].dev); >> >> how have you measured your statement above? for_each_set_bit() does >> exactly what you did. Unless your architecture has an instruction which >> helps finds the next set bit (like cls on ARM), which, then, makes it >> much faster. > > I did some testing and result shows for() loop runs faster than > for_each_set_bit() loop. Please refer to details below (discussion with > Benjamin in the original patch) and kindly let me know your > suggestions. no strong feelings, just surprised you're already worried about 20~40 cycles of cpu time ;-) patch queued for next merge window
On Mon, Aug 31, 2020 at 12:54:57PM +0300, Felipe Balbi wrote: > > Hi, > > Tao Ren <rentao.bupt@gmail.com> writes: > > On Mon, Aug 17, 2020 at 04:49:32PM +0300, Felipe Balbi wrote: > >> > >> Hi, > >> > >> rentao.bupt@gmail.com writes: > >> > From: Tao Ren <rentao.bupt@gmail.com> > >> > > >> > This is a follow-on patch for commit a23be4ed8f48 ("usb: gadget: aspeed: > >> > improve vhub port irq handling"): for_each_set_bit() is replaced with > >> > simple for() loop because for() loop runs faster on ASPEED BMC. > >> > > >> > Signed-off-by: Tao Ren <rentao.bupt@gmail.com> > >> > --- > >> > drivers/usb/gadget/udc/aspeed-vhub/core.c | 10 +++------- > >> > drivers/usb/gadget/udc/aspeed-vhub/vhub.h | 3 +++ > >> > 2 files changed, 6 insertions(+), 7 deletions(-) > >> > > >> > diff --git a/drivers/usb/gadget/udc/aspeed-vhub/core.c b/drivers/usb/gadget/udc/aspeed-vhub/core.c > >> > index cdf96911e4b1..be7bb64e3594 100644 > >> > --- a/drivers/usb/gadget/udc/aspeed-vhub/core.c > >> > +++ b/drivers/usb/gadget/udc/aspeed-vhub/core.c > >> > @@ -135,13 +135,9 @@ static irqreturn_t ast_vhub_irq(int irq, void *data) > >> > > >> > /* Handle device interrupts */ > >> > if (istat & vhub->port_irq_mask) { > >> > - unsigned long bitmap = istat; > >> > - int offset = VHUB_IRQ_DEV1_BIT; > >> > - int size = VHUB_IRQ_DEV1_BIT + vhub->max_ports; > >> > - > >> > - for_each_set_bit_from(offset, &bitmap, size) { > >> > - i = offset - VHUB_IRQ_DEV1_BIT; > >> > - ast_vhub_dev_irq(&vhub->ports[i].dev); > >> > + for (i = 0; i < vhub->max_ports; i++) { > >> > + if (istat & VHUB_DEV_IRQ(i)) > >> > + ast_vhub_dev_irq(&vhub->ports[i].dev); > >> > >> how have you measured your statement above? for_each_set_bit() does > >> exactly what you did. Unless your architecture has an instruction which > >> helps finds the next set bit (like cls on ARM), which, then, makes it > >> much faster. > > > > I did some testing and result shows for() loop runs faster than > > for_each_set_bit() loop. Please refer to details below (discussion with > > Benjamin in the original patch) and kindly let me know your suggestions. > > no strong feelings, just surprised that you're already worried about > 20~40 cycles of cpu time ;-) > > Patch applied for next merge window Thanks Felipe. Ben had some concerns about interrupt handling cost on AST2400 BMC (ARM9), hence we did the comparison and noticed the small difference :) Cheers, Tao
diff --git a/drivers/usb/gadget/udc/aspeed-vhub/core.c b/drivers/usb/gadget/udc/aspeed-vhub/core.c index cdf96911e4b1..be7bb64e3594 100644 --- a/drivers/usb/gadget/udc/aspeed-vhub/core.c +++ b/drivers/usb/gadget/udc/aspeed-vhub/core.c @@ -135,13 +135,9 @@ static irqreturn_t ast_vhub_irq(int irq, void *data) /* Handle device interrupts */ if (istat & vhub->port_irq_mask) { - unsigned long bitmap = istat; - int offset = VHUB_IRQ_DEV1_BIT; - int size = VHUB_IRQ_DEV1_BIT + vhub->max_ports; - - for_each_set_bit_from(offset, &bitmap, size) { - i = offset - VHUB_IRQ_DEV1_BIT; - ast_vhub_dev_irq(&vhub->ports[i].dev); + for (i = 0; i < vhub->max_ports; i++) { + if (istat & VHUB_DEV_IRQ(i)) + ast_vhub_dev_irq(&vhub->ports[i].dev); } } diff --git a/drivers/usb/gadget/udc/aspeed-vhub/vhub.h b/drivers/usb/gadget/udc/aspeed-vhub/vhub.h index 2e5a1ef14a75..87a5dea12d3c 100644 --- a/drivers/usb/gadget/udc/aspeed-vhub/vhub.h +++ b/drivers/usb/gadget/udc/aspeed-vhub/vhub.h @@ -67,6 +67,9 @@ #define VHUB_IRQ_HUB_EP0_SETUP (1 << 0) #define VHUB_IRQ_ACK_ALL 0x1ff +/* Downstream device IRQ mask. */ +#define VHUB_DEV_IRQ(n) (VHUB_IRQ_DEVICE1 << (n)) + /* SW reset reg */ #define VHUB_SW_RESET_EP_POOL (1 << 9) #define VHUB_SW_RESET_DMA_CONTROLLER (1 << 8)