diff mbox series

[2/2] util/bufferiszero: improve avx2 accelerator

Message ID 1585119021-46593-2-git-send-email-robert.hu@linux.intel.com (mailing list archive)
State New, archived
Headers show
Series [1/2] util/bufferiszero: assign length_to_accel value for each accelerator case | expand

Commit Message

Robert Hoo March 25, 2020, 6:50 a.m. UTC
By increasing avx2 length_to_accel to 128, we can simplify its logic and reduce a
branch.

The authorship of this patch actually belongs to Richard Henderson <richard.henderson@linaro.org>,
I just fix a boudary case on his original patch.

Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
---
 util/bufferiszero.c | 26 +++++++++-----------------
 1 file changed, 9 insertions(+), 17 deletions(-)
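The shape of the simplified routine can be illustrated with a scalar analogue. This is a hedged sketch, not the patched code: 8-byte words stand in for the 32-byte AVX2 vectors, `buffer_zero_sketch` and `ld8` are illustrative names, and the minimum length of 4 * 8 = 32 bytes plays the role of length_to_accel = 128. The point is the same: once the caller guarantees at least four full blocks, the unaligned head, the aligned middle loop, and the final unaligned block of four words always cover the buffer, so the old `p > e` special case and `last2` label are unnecessary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Unaligned 8-byte load (the scalar stand-in for _mm256_loadu_si256). */
static uint64_t ld8(const void *p)
{
    uint64_t v;
    memcpy(&v, p, 8);
    return v;
}

/* Scalar analogue of the reshaped buffer_zero_avx2.  The caller must
 * guarantee len >= 32 (the analogue of length_to_accel = 128). */
static bool buffer_zero_sketch(const void *buf, size_t len)
{
    assert(len >= 4 * 8);

    /* Begin with an unaligned head of 8 bytes. */
    uint64_t t = ld8(buf);
    const uint64_t *p = (const uint64_t *)(((uintptr_t)buf + 5 * 8) & -8);
    const uint64_t *e = (const uint64_t *)(((uintptr_t)buf + len) & -8);

    /* Loop over 8-byte aligned blocks of 32; check the previous
     * accumulator before loading the next block. */
    while (p <= e) {
        if (t) {
            return false;
        }
        t = p[-4] | p[-3] | p[-2] | p[-1];
        p += 4;
    }

    /* Finish the last block of 32 unaligned; it may overlap the data
     * already covered, which is harmless for an is-zero test. */
    t |= ld8((const char *)buf + len - 4 * 8);
    t |= ld8((const char *)buf + len - 3 * 8);
    t |= ld8((const char *)buf + len - 2 * 8);
    t |= ld8((const char *)buf + len - 1 * 8);

    return t == 0;
}
```

With len >= 32 the head covers bytes [0, 8), the tail covers [len - 32, len), and the aligned loop fills the middle, so every byte is examined exactly because of the overlap, with no short-buffer branch.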

Comments

Eric Blake March 25, 2020, 12:54 p.m. UTC | #1
On 3/25/20 1:50 AM, Robert Hoo wrote:
> By increasing avx2 length_to_accel to 128, we can simplify its logic and reduce a
> branch.
> 
> The authorship of this patch actually belongs to Richard Henderson <richard.henderson@linaro.org>,

Long line; it's nice to wrap commit messages around column 70 or so 
(because reading 'git log' in an 80-column window adds indentation).

> I just fix a boudary case on his original patch.

boundary

> 
> Suggested-by: Richard Henderson <richard.henderson@linaro.org>
> Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
> ---
>   util/bufferiszero.c | 26 +++++++++-----------------
>   1 file changed, 9 insertions(+), 17 deletions(-)
>
Robert Hoo March 26, 2020, 2:09 a.m. UTC | #2
(Don't know why my Linux-Evolution missed this mail.)
> -----Original Message-----
> From: Eric Blake <eblake@redhat.com>
> Sent: Wednesday, March 25, 2020 20:54
> To: Robert Hoo <robert.hu@linux.intel.com>; qemu-devel@nongnu.org;
> pbonzini@redhat.com; richard.henderson@linaro.org
> Cc: Hu, Robert <robert.hu@intel.com>
> Subject: Re: [PATCH 2/2] util/bufferiszero: improve avx2 accelerator
> 
> On 3/25/20 1:50 AM, Robert Hoo wrote:
> > By increasing avx2 length_to_accel to 128, we can simplify its logic
> > and reduce a branch.
> >
> > The authorship of this patch actually belongs to Richard Henderson
> > <richard.henderson@linaro.org>,
> 
> Long line; it's nice to wrap commit messages around column 70 or so (because
> reading 'git log' in an 80-column window adds indentation).
> 
[Hu, Robert] 
I think I set my vim to wrap; this long line probably slipped through via paste.
I ran checkpatch.pl on the patches before sending. It escaped the check, but not
your eagle eye.
Paolo Bonzini March 26, 2020, 9:43 a.m. UTC | #3
On 26/03/20 03:09, Hu, Robert wrote:
> BTW, do I need to resend these 2 patches?

No, thanks!  I have queued them.

Paolo
Eric Blake March 26, 2020, 1:26 p.m. UTC | #4
On 3/25/20 9:09 PM, Hu, Robert wrote:
> (Don't know why my Linux-Evolution missed this mail.)
>> -----Original Message-----

>> Long line; it's nice to wrap commit messages around column 70 or so (because
>> reading 'git log' in an 80-column window adds indentation).
>>
> [Hu, Robert]
> I think I set my vim on wrap. This probably escaped by paste.
> I ran checkpatch.pl on the patches before sending. It escaped check but didn't
> escaped your eagle eye
Robert Hoo March 26, 2020, 1:51 p.m. UTC | #5
On Thu, 2020-03-26 at 08:26 -0500, Eric Blake wrote:
> On 3/25/20 9:09 PM, Hu, Robert wrote:
> > (Don't know why my Linux-Evolution missed this mail.)
> > > -----Original Message-----
> > > Long line; it's nice to wrap commit messages around column 70 or
> > > so (because
> > > reading 'git log' in an 80-column window adds indentation).
> > > 
> > 
> > [Hu, Robert]
> > I think I set my vim on wrap. This probably escaped by paste.
> > I ran checkpatch.pl on the patches before sending. It escaped check
> > but didn't
> > escaped your eagle eye
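For context on the second hunk below: length_to_accel is the threshold under which buffer_is_zero() skips the accelerator entirely, so raising it to 128 guarantees buffer_zero_avx2 only ever sees at least four full 32-byte vectors. A simplified sketch of that dispatch follows; the names are abridged from util/bufferiszero.c and the byte loop stands in for the real accelerator, so treat the details as illustrative rather than as the actual QEMU code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Fallback: plain byte-wise scan, correct for any length. */
static bool buffer_zero_byte(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    for (size_t i = 0; i < len; i++) {
        if (p[i]) {
            return false;
        }
    }
    return true;
}

/* In QEMU these are chosen once at startup by init_accel(); here the
 * byte loop stands in for buffer_zero_avx2.  With length_to_accel
 * raised from 64 to 128, buffers of 64..127 bytes now take the
 * fallback path, which is what lets the vector routine drop its
 * short-buffer branch. */
static bool (*fn)(const void *, size_t) = buffer_zero_byte;
static size_t length_to_accel = 128;

bool buffer_is_zero(const void *buf, size_t len)
{
    if (len == 0) {
        return true;
    }
    /* Only buffers at least length_to_accel bytes long are worth the
     * accelerator's setup cost; shorter ones go byte-wise. */
    if (len >= length_to_accel) {
        return fn(buf, len);
    }
    return buffer_zero_byte(buf, len);
}
```

The accelerator thus never needs to defend against buffers shorter than its minimum; that invariant is exactly what the first hunk of the patch exploits.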

Patch

diff --git a/util/bufferiszero.c b/util/bufferiszero.c
index b801253..695bb4c 100644
--- a/util/bufferiszero.c
+++ b/util/bufferiszero.c
@@ -158,27 +158,19 @@  buffer_zero_avx2(const void *buf, size_t len)
     __m256i *p = (__m256i *)(((uintptr_t)buf + 5 * 32) & -32);
     __m256i *e = (__m256i *)(((uintptr_t)buf + len) & -32);
 
-    if (likely(p <= e)) {
-        /* Loop over 32-byte aligned blocks of 128.  */
-        do {
-            __builtin_prefetch(p);
-            if (unlikely(!_mm256_testz_si256(t, t))) {
-                return false;
-            }
-            t = p[-4] | p[-3] | p[-2] | p[-1];
-            p += 4;
-        } while (p <= e);
-    } else {
-        t |= _mm256_loadu_si256(buf + 32);
-        if (len <= 128) {
-            goto last2;
+    /* Loop over 32-byte aligned blocks of 128.  */
+    while (p <= e) {
+        __builtin_prefetch(p);
+        if (unlikely(!_mm256_testz_si256(t, t))) {
+            return false;
+        }
-    }
+        t = p[-4] | p[-3] | p[-2] | p[-1];
+        p += 4;
+    } ;
 
     /* Finish the last block of 128 unaligned.  */
     t |= _mm256_loadu_si256(buf + len - 4 * 32);
     t |= _mm256_loadu_si256(buf + len - 3 * 32);
- last2:
     t |= _mm256_loadu_si256(buf + len - 2 * 32);
     t |= _mm256_loadu_si256(buf + len - 1 * 32);
 
@@ -263,7 +255,7 @@  static void init_accel(unsigned cache)
     }
     if (cache & CACHE_AVX2) {
         fn = buffer_zero_avx2;
-        length_to_accel = 64;
+        length_to_accel = 128;
     }
 #endif
 #ifdef CONFIG_AVX512F_OPT