diff mbox

[v4,09/12,media] vivid: Local optimization

Message ID 1468845736-19651-10-git-send-email-ricardo.ribalda@gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Ricardo Ribalda Delgado July 18, 2016, 12:42 p.m. UTC
Avoid duplicated data shifts when possible.

Signed-off-by: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com>
---
 drivers/media/common/v4l2-tpg/v4l2-tpg-core.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

Comments

Philipp Zabel July 18, 2016, 1:13 p.m. UTC | #1
Hi Ricardo,

Am Montag, den 18.07.2016, 14:42 +0200 schrieb Ricardo Ribalda Delgado:
> Avoid duplicated data shifts when possible.
> 
> Signed-off-by: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com>
> ---
>  drivers/media/common/v4l2-tpg/v4l2-tpg-core.c | 13 +++++++------
>  1 file changed, 7 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/media/common/v4l2-tpg/v4l2-tpg-core.c b/drivers/media/common/v4l2-tpg/v4l2-tpg-core.c
> index a26172575e56..7f284c591f25 100644
> --- a/drivers/media/common/v4l2-tpg/v4l2-tpg-core.c
> +++ b/drivers/media/common/v4l2-tpg/v4l2-tpg-core.c
> @@ -919,13 +919,14 @@ static void precalculate_color(struct tpg_data *tpg, int k)
>  			color_to_ycbcr(tpg, r, g, b, &y, &cb, &cr);
>  
>  		if (tpg->real_quantization == V4L2_QUANTIZATION_LIM_RANGE) {
> -			y = clamp(y, 16 << 4, 235 << 4);
> -			cb = clamp(cb, 16 << 4, 240 << 4);
> -			cr = clamp(cr, 16 << 4, 240 << 4);
> +			y = clamp(y >> 4, 16, 235);
> +			cb = clamp(cb >> 4, 16, 240);
> +			cr = clamp(cr >> 4, 16, 240);

Since the constant expressions are evaluated at compile time, you are
not actually removing shifts. The code generated for precalculate_color
by gcc 5.4 even grows by one asr instruction with this patch.

> +		} else {
> +			y = clamp(y >> 4, 1, 254);
> +			cb = clamp(cb >> 4, 1, 254);
> +			cr = clamp(cr >> 4, 1, 254);
>  		}
> -		y = clamp(y >> 4, 1, 254);
> -		cb = clamp(cb >> 4, 1, 254);
> -		cr = clamp(cr >> 4, 1, 254);
>  		switch (tpg->fourcc) {
>  		case V4L2_PIX_FMT_YUV444:
>  			y >>= 4;

regards
Philipp

--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ricardo Ribalda Delgado July 18, 2016, 1:21 p.m. UTC | #2
Hi Philipp

On Mon, Jul 18, 2016 at 3:13 PM, Philipp Zabel <p.zabel@pengutronix.de> wrote:
> Since the constant expressions are evaluated at compile time, you are
> not actually removing shifts. The code generated for precalculate_color
> by gcc 5.4 even grows by one asr instruction with this patch.
>

I dont think that I follow you completely here. The original code was

if (a)
   y= clamp(y, 16<<4, 235<<4)

y = clamp(y>>4, 1, 254)


And now is

if (a)
   y= clamp(y >>4, 16, 235)
else
    y = clamp(y, 1, 254)


On the previous case, when a was true there was 2 clamp operations.
Now it is only one.


Best regards!
Philipp Zabel July 18, 2016, 2:16 p.m. UTC | #3
Hi Ricardo,

Am Montag, den 18.07.2016, 15:21 +0200 schrieb Ricardo Ribalda Delgado:
> Hi Philipp
> 
> On Mon, Jul 18, 2016 at 3:13 PM, Philipp Zabel <p.zabel@pengutronix.de> wrote:
> > Since the constant expressions are evaluated at compile time, you are
> > not actually removing shifts. The code generated for precalculate_color
> > by gcc 5.4 even grows by one asr instruction with this patch.
> >
> 
> I dont think that I follow you completely here. The original code was

Sorry, I forgot to mention I compiled both versions for ARMv7-A, saw
that object size increased, had a look the diff between objdump -d
outputs and noticed an additional shift instruction. I have not checked
this for x86_64.

> if (a)
>    y= clamp(y, 16<<4, 235<<4)
> 
> y = clamp(y>>4, 1, 254)
>
> And now is
> 
> if (a)
>    y= clamp(y >>4, 16, 235)
> else
>     y = clamp(y, 1, 254)
     y = clamp(y >>4, 1, 254)

> On the previous case, when a was true there was 2 clamp operations.
> Now it is only one.

Yes. And now there's two shift operations (overall, still just one in
each conditional path).

It seems in my case the compiler was not clever enough to move all the
right shifts out of the conditional paths, so I ended up with one more
than before. You are right that in the limited range path the second
clamps are now avoided though. Basically, feel free to disregard my
comment.

I had the best looking result with this variant, btw:

	y >>= 4;
	cb >>= 4;
	cr >>= 4;
	if (tpg->real_quantization == V4L2_QUANTIZATION_LIM_RANGE) {
		y = clamp(y, 16, 235);
		cb = clamp(cb, 16, 240);
		cr = clamp(cr, 16, 240);
	} else {
		y = clamp(y, 1, 254);
		cb = clamp(cb, 1, 254);
		cr = clamp(cr, 1, 254);
	}

regards
Philipp

--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ricardo Ribalda Delgado July 18, 2016, 7:41 p.m. UTC | #4
Hi Philipp:

On Mon, Jul 18, 2016 at 4:16 PM, Philipp Zabel <p.zabel@pengutronix.de> wrote:

> I had the best looking result with this variant, btw:
>
>         y >>= 4;
>         cb >>= 4;
>         cr >>= 4;
>         if (tpg->real_quantization == V4L2_QUANTIZATION_LIM_RANGE) {
>                 y = clamp(y, 16, 235);
>                 cb = clamp(cb, 16, 240);
>                 cr = clamp(cr, 16, 240);
>         } else {
>                 y = clamp(y, 1, 254);
>                 cb = clamp(cb, 1, 254);
>                 cr = clamp(cr, 1, 254);
>         }

I like this variant much better than mine. I have applied to my local
tree. So it will be what I post on v5

Thanks for you comments :)


btw: it is scary what the compiler optimizations are capable of doing these days
diff mbox

Patch

diff --git a/drivers/media/common/v4l2-tpg/v4l2-tpg-core.c b/drivers/media/common/v4l2-tpg/v4l2-tpg-core.c
index a26172575e56..7f284c591f25 100644
--- a/drivers/media/common/v4l2-tpg/v4l2-tpg-core.c
+++ b/drivers/media/common/v4l2-tpg/v4l2-tpg-core.c
@@ -919,13 +919,14 @@  static void precalculate_color(struct tpg_data *tpg, int k)
 			color_to_ycbcr(tpg, r, g, b, &y, &cb, &cr);
 
 		if (tpg->real_quantization == V4L2_QUANTIZATION_LIM_RANGE) {
-			y = clamp(y, 16 << 4, 235 << 4);
-			cb = clamp(cb, 16 << 4, 240 << 4);
-			cr = clamp(cr, 16 << 4, 240 << 4);
+			y = clamp(y >> 4, 16, 235);
+			cb = clamp(cb >> 4, 16, 240);
+			cr = clamp(cr >> 4, 16, 240);
+		} else {
+			y = clamp(y >> 4, 1, 254);
+			cb = clamp(cb >> 4, 1, 254);
+			cr = clamp(cr >> 4, 1, 254);
 		}
-		y = clamp(y >> 4, 1, 254);
-		cb = clamp(cb >> 4, 1, 254);
-		cr = clamp(cr >> 4, 1, 254);
 		switch (tpg->fourcc) {
 		case V4L2_PIX_FMT_YUV444:
 			y >>= 4;