diff mbox series

[GSoC,v2] Optimize ewah_bitmap.c for efficiency using trailing zeros for set bit iteration

Message ID 20240313223751.50816-1-garyan447@gmail.com (mailing list archive)
State New, archived

Commit Message

Aryan Gupta March 13, 2024, 10:37 p.m. UTC
Signed-off-by: Aryan Gupta <garyan447@gmail.com>
---

Thank you Vicent for the guidance. I am still not sure how 
to do the performance measurement for this improvement. Any 
guidance would be appreciated. 

 ewah/ewah_bitmap.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

Comments

karthik nayak March 18, 2024, 3:08 a.m. UTC | #1
Aryan Gupta <garyan447@gmail.com> writes:

Hello,

> Signed-off-by: Aryan Gupta <garyan447@gmail.com>
> ---
>
> Thank you Vicent for the guidance. I am still not sure how
> to do the performance measurement for this improvement. Any
> guidance would be appreciated.
>

I guess there is some off-list discussion here. That along with the fact
that the commit message is missing makes it really hard to understand
how this is better than what was here already.

The guidelines ('Documentation/SubmittingPatches') also state how to
draft the commit message. This patch only seems to have a title, it is
recommended to add a description as to why this change is being made.

>
>  ewah/ewah_bitmap.c | 13 ++++++++-----
>  1 file changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/ewah/ewah_bitmap.c b/ewah/ewah_bitmap.c
> index 8785cbc54a..1a75f50682 100644
> --- a/ewah/ewah_bitmap.c
> +++ b/ewah/ewah_bitmap.c
> @@ -257,12 +257,15 @@ void ewah_each_bit(struct ewah_bitmap *self, void (*callback)(size_t, void*), vo
>  		for (k = 0; k < rlw_get_literal_words(word); ++k) {
>  			int c;
>
> -			/* todo: zero count optimization */
> -			for (c = 0; c < BITS_IN_EWORD; ++c, ++pos) {
> -				if ((self->buffer[pointer] & ((eword_t)1 << c)) != 0)
> -					callback(pos, payload);
> +			eword_t bitset = self->buffer[pointer];
> +			while(bitset != 0) {
> +				eword_t t = bitset & -bitset;
> +				int r = __builtin_ctzl(bitset);
> +				bitset ^= t;
> +				callback(pos+r, payload);
>  			}
> -
> +			
> +			pos += BITS_IN_EWORD;
>  			++pointer;
>  		}
>  	}

The bit manipulation done here is slightly hard to comprehend, it would
be nice if you could also add some comments as to what is being done
here and why.
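
For readers following the review, the loop in the patch can be read as "visit only the set bits, lowest first". The following is a minimal commented sketch of that idea, not the submitted code: the helper name collect_set_bits() and its output array are hypothetical, it assumes eword_t is a 64-bit unsigned type, and it uses __builtin_ctzll() (a GCC/Clang builtin) rather than __builtin_ctzl(), because unsigned long may be only 32 bits wide on some platforms.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper for illustration only; not part of ewah_bitmap.c.
 * Writes the absolute position of every set bit in `bitset` to `out`
 * (which must have room for 64 entries) and returns how many were found.
 * `base` plays the role of `pos` in the patch.
 */
static size_t collect_set_bits(uint64_t bitset, size_t base, size_t *out)
{
	size_t n = 0;

	while (bitset != 0) {
		/*
		 * Two's complement negation flips every bit above the
		 * lowest set bit, so ANDing with the original isolates
		 * that bit: 0b101000 -> 0b001000.
		 */
		uint64_t lowest = bitset & (~bitset + 1);

		/* The index of that bit equals the number of trailing zeros. */
		int r = __builtin_ctzll(bitset);

		/* Clear the bit just visited so the loop makes progress. */
		bitset ^= lowest;

		out[n++] = base + (size_t)r;
	}
	return n;
}
```

The loop body runs once per set bit instead of once per bit position, which is where any saving for sparse literal words would come from.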
Junio C Hamano March 18, 2024, 3:42 p.m. UTC | #2
Karthik Nayak <karthik.188@gmail.com> writes:

> Aryan Gupta <garyan447@gmail.com> writes:
>
> Hello,
>
>> Signed-off-by: Aryan Gupta <garyan447@gmail.com>
>> ---
>>
>> Thank you Vicent for the guidance. I am still not sure how
>> to do the performance measurement for this improvement. Any
>> guidance would be appreciated.
>>
>
> I guess there is some off-list discussion here. That along with the fact
> that the commit message is missing makes it really hard to understand
> how this is better than what was here already.
>
> The guidelines ('Documentation/SubmittingPatches') also state how to
> draft the commit message. This patch only seems to have a title, it is
> recommended to add a description as to why this change is being made.

Yes.

>> diff --git a/ewah/ewah_bitmap.c b/ewah/ewah_bitmap.c
>> index 8785cbc54a..1a75f50682 100644
>> --- a/ewah/ewah_bitmap.c
>> +++ b/ewah/ewah_bitmap.c
>> @@ -257,12 +257,15 @@ void ewah_each_bit(struct ewah_bitmap *self, void (*callback)(size_t, void*), vo
>>  		for (k = 0; k < rlw_get_literal_words(word); ++k) {
>>  			int c;
>>
>> -			/* todo: zero count optimization */
>> -			for (c = 0; c < BITS_IN_EWORD; ++c, ++pos) {
>> -				if ((self->buffer[pointer] & ((eword_t)1 << c)) != 0)
>> -					callback(pos, payload);
>> +			eword_t bitset = self->buffer[pointer];
>> +			while(bitset != 0) {
>> +				eword_t t = bitset & -bitset;
>> +				int r = __builtin_ctzl(bitset);
>> +				bitset ^= t;
>> +				callback(pos+r, payload);
>>  			}
>> -
>> +			
>> +			pos += BITS_IN_EWORD;
>>  			++pointer;
>>  		}
>>  	}
>
> The bit manipulation done here is slightly hard to comprehend, it would
> be nice if you could also add some comments as to what is being done
> here and why.

In addition, this patch assumes that the __builtin_ctzl() function is
always available no matter what environment the code is built on,
which I am not sure is safe.  Quite honestly, I suspect that the
whole of the "todo" is to seamlessly detect the presence of the builtin
support to count the top zero bit, use it only when it is there, and
give a fallback implementation when it does not exist.  The code
itself to use the builtin is only 20% of that effort ;-)
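
One possible shape for the detect-and-fall-back approach described above is sketched here. Everything in it is an assumption made for illustration: the macro EWAH_HAVE_BUILTIN_CTZLL and the functions ewah_ctz64() and ewah_ctz64_portable() are hypothetical names, and checking __GNUC__ / __clang__ is only one of several ways the builtin could be detected. Both functions require a non-zero argument.

```c
#include <stdint.h>

/* Hypothetical feature macro; a real build might set this differently. */
#if defined(__GNUC__) || defined(__clang__)
#define EWAH_HAVE_BUILTIN_CTZLL 1
#endif

/*
 * Portable fallback: count trailing zeros of a non-zero 64-bit word
 * with a binary search over halves of decreasing width.
 */
static inline int ewah_ctz64_portable(uint64_t x)
{
	int n = 0;

	if (!(x & 0xFFFFFFFFULL)) { n += 32; x >>= 32; }
	if (!(x & 0xFFFFULL))     { n += 16; x >>= 16; }
	if (!(x & 0xFFULL))       { n += 8;  x >>= 8; }
	if (!(x & 0xFULL))        { n += 4;  x >>= 4; }
	if (!(x & 0x3ULL))        { n += 2;  x >>= 2; }
	if (!(x & 0x1ULL))        { n += 1; }
	return n;
}

/* Use the compiler builtin when detected, the fallback otherwise. */
static inline int ewah_ctz64(uint64_t x)
{
#ifdef EWAH_HAVE_BUILTIN_CTZLL
	return __builtin_ctzll(x);
#else
	return ewah_ctz64_portable(x);
#endif
}
```

Keeping the fallback compiled unconditionally makes it possible to check it against the builtin on platforms that have both.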

And of course, there is the benchmark.  To show how much better
performance gets for people with that function, and more importantly
to show that the performance does not degrade for those who are
without.

Thanks.
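
As a starting point for the measurement question raised in this thread, a standalone micro-benchmark could compare the two loop shapes directly. The sketch below is an assumption-laden illustration, not a substitute for an end-to-end measurement with Git's own t/perf suite: the function names are hypothetical, words are assumed to be 64 bits wide, __builtin_ctzll() is a GCC/Clang builtin, and clock() measures CPU time at coarse resolution. Each loop returns the sum of the bit positions it visits, so the two results can be compared to confirm that both visit the same bits.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Baseline: test all 64 positions of each word, like the pre-patch loop. */
static size_t visit_naive(const uint64_t *words, size_t nr)
{
	size_t sum = 0, i;
	int c;

	for (i = 0; i < nr; i++)
		for (c = 0; c < 64; c++)
			if (words[i] & ((uint64_t)1 << c))
				sum += i * 64 + (size_t)c;
	return sum;
}

/* Candidate: jump from set bit to set bit using trailing-zero counts. */
static size_t visit_ctz(const uint64_t *words, size_t nr)
{
	size_t sum = 0, i;

	for (i = 0; i < nr; i++) {
		uint64_t bitset = words[i];

		while (bitset != 0) {
			sum += i * 64 + (size_t)__builtin_ctzll(bitset);
			bitset &= bitset - 1; /* clear the lowest set bit */
		}
	}
	return sum;
}

/* Run one variant, store its checksum, and return CPU seconds used. */
static double time_it(size_t (*fn)(const uint64_t *, size_t),
		      const uint64_t *words, size_t nr, size_t *result)
{
	clock_t start = clock();

	*result = fn(words, nr);
	return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

A meaningful run would call time_it() on a large array for several bit densities (sparse, half full, dense) and repeat each measurement, since the candidate loop is expected to help most when few bits are set.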

Patch

diff --git a/ewah/ewah_bitmap.c b/ewah/ewah_bitmap.c
index 8785cbc54a..1a75f50682 100644
--- a/ewah/ewah_bitmap.c
+++ b/ewah/ewah_bitmap.c
@@ -257,12 +257,15 @@  void ewah_each_bit(struct ewah_bitmap *self, void (*callback)(size_t, void*), vo
 		for (k = 0; k < rlw_get_literal_words(word); ++k) {
 			int c;
 
-			/* todo: zero count optimization */
-			for (c = 0; c < BITS_IN_EWORD; ++c, ++pos) {
-				if ((self->buffer[pointer] & ((eword_t)1 << c)) != 0)
-					callback(pos, payload);
+			eword_t bitset = self->buffer[pointer]; 
+			while(bitset != 0) {
+				eword_t t = bitset & -bitset; 
+				int r = __builtin_ctzl(bitset); 
+				bitset ^= t; 
+				callback(pos+r, payload);
 			}
-
+			
+			pos += BITS_IN_EWORD;
 			++pointer;
 		}
 	}