[v2,07/67] fscache: Implement a hash function

Message ID 163906888735.143852.10944614318596881429.stgit@warthog.procyon.org.uk (mailing list archive)
State New, archived
Series fscache, cachefiles: Rewrite

Commit Message

David Howells Dec. 9, 2021, 4:54 p.m. UTC
Implement a function to generate hashes.  It needs to be stable over time
and endianness-independent as the hashes will appear on disk in future
patches.  It can assume that its input is a multiple of four bytes in size
and aligned on a four-byte boundary.

This is borrowed from the VFS and simplified.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/163819586113.215744.1699465806130102367.stgit@warthog.procyon.org.uk/ # v1
---

 fs/fscache/internal.h |    2 ++
 fs/fscache/main.c     |   39 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 41 insertions(+)

Comments

Linus Torvalds Dec. 9, 2021, 5:12 p.m. UTC | #1
On Thu, Dec 9, 2021 at 8:54 AM David Howells <dhowells@redhat.com> wrote:
>
> Implement a function to generate hashes.  It needs to be stable over time
> and endianness-independent as the hashes will appear on disk in future
> patches.

I'm not actually seeing this being endianness-independent.

Is the input just regular 32-bit data in native word order? Because
then it's not endianness-independent; it's that there *is* no
endianness to the data at all, it is simply native data.

So the code may be correct, but the explanation is confusing. There is
absolutely nothing here that is about endianness.

           Linus
David Howells Dec. 9, 2021, 9:57 p.m. UTC | #2
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> > Implement a function to generate hashes.  It needs to be stable over time
> > and endianness-independent as the hashes will appear on disk in future
> > patches.
> 
> I'm not actually seeing this being endianness-independent.
> 
> Is the input just regular 32-bit data in native word order? Because
> then it's not endianness-independent, it's purely that there *is* no
> endianness to the data at all and it is purely native data.
>
> So the code may be correct, but the explanation is confusing. There is
> absolutely nothing here that is about endianness.

What I'm trying to get at is that the hash needs to be consistent, no matter
the endianness of the cpu, for any particular input blob.  The hashing
function shouldn't need to know the structure of the input blob.  In the case
of the volume key, it's a padded printable string; in the case of the cookie
key, it's probably some sort of structured blob, quite possibly an actual
array of be32.

The reason it needs to be consistent is that people seem to like seeding the
cache by tarring up the cache from one machine and untarring it on another.

And looking again at my code:

unsigned int fscache_hash(unsigned int salt, unsigned int *data, unsigned int n)
{
	unsigned int a, x = 0, y = salt;

	for (; n; n--) {
		a = *data++;   <<<<<<<
		HASH_MIX(x, y, a);
	}
	return fold_hash(x, y);
}

The marked line should probably use something like le/be32_to_cpu().

I also need to fix 9p to canonicalise its cookie key.

Thanks for catching that,
David
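
To illustrate the canonicalisation mentioned above: a minimal sketch,
assuming a hypothetical two-field 9p cookie key (the struct and helper names
are invented for illustration and are not part of this series).  Each field
is stored in a fixed byte order so the key blob, and hence its hash, comes
out the same on little- and big-endian hosts.

#include <linux/types.h>
#include <asm/byteorder.h>

/* Hypothetical 9p cookie key: every field held in a fixed
 * (little-endian) byte order, independent of the CPU. */
struct p9_cookie_key {
	__le32 path;
	__le32 version;
};

static void p9_canonicalise_key(struct p9_cookie_key *key,
				u32 path, u32 version)
{
	key->path    = cpu_to_le32(path);    /* native to fixed order */
	key->version = cpu_to_le32(version);
}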
Linus Torvalds Dec. 9, 2021, 10:07 p.m. UTC | #3
On Thu, Dec 9, 2021 at 1:57 PM David Howells <dhowells@redhat.com> wrote:
>
> What I'm trying to get at is that the hash needs to be consistent, no matter
> the endianness of the cpu, for any particular input blob.

Yeah, if that's the case, then you should probably make that "unsigned
int *data" argument just be "void *" and then:

>                 a = *data++;   <<<<<<<
>                 HASH_MIX(x, y, a);
>         }
>         return fold_hash(x, y);
> }
>
> The marked line should probably use something like le/be32_to_cpu().

Yes, it should be using a '__le32 *' inside that function and you
should use le32_to_cpu(). Obviously, BE would work too, but it would cause
unnecessary work on common hardware.

But as mentioned for the other patches, you should then also be a lot
more careful about always using the end result as an 'unsigned int'
(or maybe 'u32') too, and when comparing hashes for binary search or
otherwise, you should always do the compare in some stable format.

Because doing

        return (long)hash_a - (long)hash_b;

and looking at the sign doesn't actually result in a stable ordering
on 32-bit architectures. You don't get a transitive ordering (ie a < b
and b < c doesn't imply a < c).

And presumably if the hashes are meaningful across machines, then hash
comparisons should also be meaningful across machines.

So when comparing hashes, you need to compare them either in a truly
bigger signed type (and make sure that doesn't get truncated) - kind
of like how a lot of 'memcmp()' functions do 'unsigned char'
subtractions in an 'int' - or you need to compare them _as_ 'unsigned
int'.

Otherwise the comparisons will be all kinds of messed up.

          Linus
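
To see the non-transitivity concretely: a standalone userspace sketch, with
int32_t standing in for a 32-bit 'long' and three made-up hash values.

#include <stdint.h>
#include <stdio.h>

/* What "(long)a - (long)b" computes on a 32-bit arch, written as a
 * well-defined unsigned wrap-around reinterpreted as signed. */
static int32_t bad_cmp(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b);
}

int main(void)
{
	uint32_t a = 0x00000000, b = 0x60000000, c = 0xc0000000;

	printf("cmp(a,b) = %d\n", bad_cmp(a, b)); /* negative: "a < b" */
	printf("cmp(b,c) = %d\n", bad_cmp(b, c)); /* negative: "b < c" */
	printf("cmp(a,c) = %d\n", bad_cmp(a, c)); /* positive: "a > c" */
	return 0;
}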
David Howells Dec. 10, 2021, 2:35 p.m. UTC | #4
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> > What I'm trying to get at is that the hash needs to be consistent, no matter
> > the endianness of the cpu, for any particular input blob.
> 
> Yeah, if that's the case, then you should probably make that "unsigned
> int *data" argument just be "void *" and then:
> 
> >                 a = *data++;   <<<<<<<
> >                 HASH_MIX(x, y, a);
> >         }
> >         return fold_hash(x, y);
> > }
> >
> > The marked line should probably use something like le/be32_to_cpu().
> 
> Yes, it should be using a '__le32 *' inside that function and you
> should use le32_to_cpu(). Obviously, BE would work too, but it would cause
> unnecessary work on common hardware.

Okay, how about I make the attached change to make the hashing stable?  This
will make fscache_hash() take an opaque buffer and a length in bytes (the
length must be a multiple of four).

David
---
diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c
index e287952292c5..65cf2ae22a70 100644
--- a/fs/fscache/cookie.c
+++ b/fs/fscache/cookie.c
@@ -269,22 +269,23 @@ EXPORT_SYMBOL(fscache_caching_failed);
 static int fscache_set_key(struct fscache_cookie *cookie,
 			   const void *index_key, size_t index_key_len)
 {
-	u32 *buf;
-	int bufs;
+	void *buf;
+	size_t buf_size;
 
-	bufs = DIV_ROUND_UP(index_key_len, sizeof(*buf));
+	buf_size = round_up(index_key_len, sizeof(__le32));
 
 	if (index_key_len > sizeof(cookie->inline_key)) {
-		buf = kcalloc(bufs, sizeof(*buf), GFP_KERNEL);
+		buf = kzalloc(buf_size, GFP_KERNEL);
 		if (!buf)
 			return -ENOMEM;
 		cookie->key = buf;
 	} else {
-		buf = (u32 *)cookie->inline_key;
+		buf = cookie->inline_key;
 	}
 
 	memcpy(buf, index_key, index_key_len);
-	cookie->key_hash = fscache_hash(cookie->volume->key_hash, buf, bufs);
+	cookie->key_hash = fscache_hash(cookie->volume->key_hash,
+					buf, buf_size);
 	return 0;
 }
 
diff --git a/fs/fscache/internal.h b/fs/fscache/internal.h
index 87884f4b34fb..f121c21590dc 100644
--- a/fs/fscache/internal.h
+++ b/fs/fscache/internal.h
@@ -86,7 +86,7 @@ static inline void fscache_end_operation(struct netfs_cache_resources *cres)
  */
 extern unsigned fscache_debug;
 
-extern unsigned int fscache_hash(unsigned int salt, unsigned int *data, unsigned int n);
+extern unsigned int fscache_hash(unsigned int salt, const void *data, size_t len);
 
 /*
  * proc.c
diff --git a/fs/fscache/main.c b/fs/fscache/main.c
index 01d57433702c..dad85fd84f6f 100644
--- a/fs/fscache/main.c
+++ b/fs/fscache/main.c
@@ -53,15 +53,16 @@ static inline unsigned int fold_hash(unsigned long x, unsigned long y)
 /*
  * Generate a hash.  This is derived from full_name_hash(), but we want to be
  * sure it is arch independent and that it doesn't change as bits of the
- * computed hash value might appear on disk.  The caller also guarantees that
- * the hashed data will be a series of aligned 32-bit words.
+ * computed hash value might appear on disk.  The caller must guarantee that
+ * the source data is a multiple of four bytes in size.
  */
-unsigned int fscache_hash(unsigned int salt, unsigned int *data, unsigned int n)
+unsigned int fscache_hash(unsigned int salt, const void *data, size_t len)
 {
-	unsigned int a, x = 0, y = salt;
+	const __le32 *p = data;
+	unsigned int a, x = 0, y = salt, n = len / sizeof(__le32);
 
 	for (; n; n--) {
-		a = *data++;
+		a = le32_to_cpu(*p++);
 		HASH_MIX(x, y, a);
 	}
 	return fold_hash(x, y);
diff --git a/fs/fscache/volume.c b/fs/fscache/volume.c
index edd3c245010e..26a6b8f315e1 100644
--- a/fs/fscache/volume.c
+++ b/fs/fscache/volume.c
@@ -131,7 +131,7 @@ static long fscache_compare_volume(const struct fscache_volume *a,
 	if (a->key[0] != b->key[0])
 		return (long)a->key[0]   - (long)b->key[0];
 
-	klen = round_up(a->key[0] + 1, sizeof(unsigned int));
+	klen = round_up(a->key[0] + 1, sizeof(__le32));
 	return memcmp(a->key, b->key, klen);
 }
 
@@ -225,7 +225,7 @@ static struct fscache_volume *fscache_alloc_volume(const char *volume_key,
 	 * hashing easier.
 	 */
 	klen = strlen(volume_key);
-	hlen = round_up(1 + klen + 1, sizeof(unsigned int));
+	hlen = round_up(1 + klen + 1, sizeof(__le32));
 	key = kzalloc(hlen, GFP_KERNEL);
 	if (!key)
 		goto err_vol;
@@ -233,8 +233,7 @@ static struct fscache_volume *fscache_alloc_volume(const char *volume_key,
 	memcpy(key + 1, volume_key, klen);
 
 	volume->key = key;
-	volume->key_hash = fscache_hash(0, (unsigned int *)key,
-					hlen / sizeof(unsigned int));
+	volume->key_hash = fscache_hash(0, key, hlen);
 
 	volume->debug_id = atomic_inc_return(&fscache_volume_debug_id);
 	down_write(&fscache_addremove_sem);
David Howells Dec. 10, 2021, 2:41 p.m. UTC | #5
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> But as mentioned for the other patches, you should then also be a lot
> more careful about always using the end result as an 'unsigned int'
> (or maybe 'u32') too, and when comparing hashes for binary search or
> otherwise, you should always do the compare in some stable format.
> 
> Because doing
> 
>         return (long)hash_a - (long)hash_b;
> 
> and looking at the sign doesn't actually result in a stable ordering
> on 32-bit architectures. You don't get a transitive ordering (ie a < b
> and b < c doesn't imply a < c).
> 
> And presumably if the hashes are meaningful across machines, then hash
> comparisons should also be meaningful across machines.
> 
> So when comparing hashes, you need to compare them either in a truly
> bigger signed type (and make sure that doesn't get truncated) - kind
> of like how a lot of 'memcmp()' functions do 'unsigned char'
> subtractions in an 'int' - or you need to compare them _as_ 'unsigned
> int'.
> 
> Otherwise the comparisons will be all kinds of messed up.

I don't think it should matter in this case, as the in-memory hash tables are
independent of what's on disk (and not even necessarily the same size).
They're only used for duplicate detection.  The idea was to shorten the scan
of a hash bucket by half on average, but I didn't make it do that; I just use
the stored hash value as a quick check first.

However, since the comparator functions are only used to tell whether two
objects are the same or different, the attached change makes them return bool
instead; no cast or subtraction is required.

David
---
diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c
index 65cf2ae22a70..ca36b598d6b5 100644
--- a/fs/fscache/cookie.c
+++ b/fs/fscache/cookie.c
@@ -289,17 +289,15 @@ static int fscache_set_key(struct fscache_cookie *cookie,
 	return 0;
 }
 
-static long fscache_compare_cookie(const struct fscache_cookie *a,
-				   const struct fscache_cookie *b)
+static bool fscache_cookie_same(const struct fscache_cookie *a,
+				const struct fscache_cookie *b)
 {
 	const void *ka, *kb;
 
-	if (a->key_hash != b->key_hash)
-		return (long)a->key_hash - (long)b->key_hash;
-	if (a->volume != b->volume)
-		return (long)a->volume - (long)b->volume;
-	if (a->key_len != b->key_len)
-		return (long)a->key_len - (long)b->key_len;
+	if (a->key_hash	!= b->key_hash ||
+	    a->volume	!= b->volume ||
+	    a->key_len	!= b->key_len)
+		return false;
 
 	if (a->key_len <= sizeof(a->inline_key)) {
 		ka = &a->inline_key;
@@ -308,7 +306,7 @@ static long fscache_compare_cookie(const struct fscache_cookie *a,
 		ka = a->key;
 		kb = b->key;
 	}
-	return memcmp(ka, kb, a->key_len);
+	return memcmp(ka, kb, a->key_len) == 0;
 }
 
 static atomic_t fscache_cookie_debug_id = ATOMIC_INIT(1);
@@ -399,7 +397,7 @@ static bool fscache_hash_cookie(struct fscache_cookie *candidate)
 
 	hlist_bl_lock(h);
 	hlist_bl_for_each_entry(cursor, p, h, hash_link) {
-		if (fscache_compare_cookie(candidate, cursor) == 0) {
+		if (fscache_cookie_same(candidate, cursor)) {
 			if (!test_bit(FSCACHE_COOKIE_RELINQUISHED, &cursor->flags))
 				goto collision;
 			wait_for = fscache_get_cookie(cursor,
diff --git a/fs/fscache/volume.c b/fs/fscache/volume.c
index 26a6b8f315e1..0e80fd8fd14a 100644
--- a/fs/fscache/volume.c
+++ b/fs/fscache/volume.c
@@ -119,20 +119,18 @@ void fscache_end_volume_access(struct fscache_volume *volume,
 }
 EXPORT_SYMBOL(fscache_end_volume_access);
 
-static long fscache_compare_volume(const struct fscache_volume *a,
-				   const struct fscache_volume *b)
+static bool fscache_volume_same(const struct fscache_volume *a,
+				const struct fscache_volume *b)
 {
 	size_t klen;
 
-	if (a->key_hash != b->key_hash)
-		return (long)a->key_hash - (long)b->key_hash;
-	if (a->cache != b->cache)
-		return (long)a->cache    - (long)b->cache;
-	if (a->key[0] != b->key[0])
-		return (long)a->key[0]   - (long)b->key[0];
+	if (a->key_hash	!= b->key_hash ||
+	    a->cache	!= b->cache ||
+	    a->key[0]	!= b->key[0])
+		return false;
 
 	klen = round_up(a->key[0] + 1, sizeof(__le32));
-	return memcmp(a->key, b->key, klen);
+	return memcmp(a->key, b->key, klen) == 0;
 }
 
 static bool fscache_is_acquire_pending(struct fscache_volume *volume)
@@ -170,7 +168,7 @@ static bool fscache_hash_volume(struct fscache_volume *candidate)
 
 	hlist_bl_lock(h);
 	hlist_bl_for_each_entry(cursor, p, h, hash_link) {
-		if (fscache_compare_volume(candidate, cursor) == 0) {
+		if (fscache_volume_same(candidate, cursor)) {
 			if (!test_bit(FSCACHE_VOLUME_RELINQUISHED, &cursor->flags))
 				goto collision;
 			fscache_see_volume(cursor, fscache_volume_get_hash_collision);
@@ -335,7 +333,7 @@ static void fscache_wake_pending_volume(struct fscache_volume *volume,
 	struct hlist_bl_node *p;
 
 	hlist_bl_for_each_entry(cursor, p, h, hash_link) {
-		if (fscache_compare_volume(cursor, volume) == 0) {
+		if (fscache_volume_same(cursor, volume)) {
 			fscache_see_volume(cursor, fscache_volume_see_hash_wake);
 			clear_bit(FSCACHE_VOLUME_ACQUIRE_PENDING, &cursor->flags);
 			wake_up_bit(&cursor->flags, FSCACHE_VOLUME_ACQUIRE_PENDING);
Linus Torvalds Dec. 10, 2021, 5:33 p.m. UTC | #6
On Fri, Dec 10, 2021 at 6:41 AM David Howells <dhowells@redhat.com> wrote:
>
> However, since the comparator functions are only used to see if they're the
> same or different, the attached change makes them return bool instead, no
> cast or subtraction required.

Ok, thanks, that resolves my worries.

Which is not to say it all works - I obviously only scanned the
patches rather than testing anything.

                Linus

Patch

diff --git a/fs/fscache/internal.h b/fs/fscache/internal.h
index ea52f8594a77..64767992bd15 100644
--- a/fs/fscache/internal.h
+++ b/fs/fscache/internal.h
@@ -22,6 +22,8 @@ 
  */
 extern unsigned fscache_debug;
 
+extern unsigned int fscache_hash(unsigned int salt, unsigned int *data, unsigned int n);
+
 /*
  * proc.c
  */
diff --git a/fs/fscache/main.c b/fs/fscache/main.c
index 819de2ee1276..a4afba1b9d3b 100644
--- a/fs/fscache/main.c
+++ b/fs/fscache/main.c
@@ -24,6 +24,45 @@  MODULE_PARM_DESC(fscache_debug,
 struct workqueue_struct *fscache_wq;
 EXPORT_SYMBOL(fscache_wq);
 
+/*
+ * Mixing scores (in bits) for (7,20):
+ * Input delta: 1-bit      2-bit
+ * 1 round:     330.3     9201.6
+ * 2 rounds:   1246.4    25475.4
+ * 3 rounds:   1907.1    31295.1
+ * 4 rounds:   2042.3    31718.6
+ * Perfect:    2048      31744
+ *            (32*64)   (32*31/2 * 64)
+ */
+#define HASH_MIX(x, y, a)	\
+	(	x ^= (a),	\
+	y ^= x,	x = rol32(x, 7),\
+	x += y,	y = rol32(y,20),\
+	y *= 9			)
+
+static inline unsigned int fold_hash(unsigned long x, unsigned long y)
+{
+	/* Use arch-optimized multiply if one exists */
+	return __hash_32(y ^ __hash_32(x));
+}
+
+/*
+ * Generate a hash.  This is derived from full_name_hash(), but we want to be
+ * sure it is arch independent and that it doesn't change as bits of the
+ * computed hash value might appear on disk.  The caller also guarantees that
+ * the hashed data will be a series of aligned 32-bit words.
+ */
+unsigned int fscache_hash(unsigned int salt, unsigned int *data, unsigned int n)
+{
+	unsigned int a, x = 0, y = salt;
+
+	for (; n; n--) {
+		a = *data++;
+		HASH_MIX(x, y, a);
+	}
+	return fold_hash(x, y);
+}
+
 /*
  * initialise the fs caching module
  */
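
For orientation, a minimal sketch of driving the v1 interface above.  The
wrapper function is invented for illustration, but the key layout mirrors the
fscache_alloc_volume() hunk quoted earlier in the thread.

/* Sketch (kernel context assumed): build a zero-padded key blob whose
 * size is a whole number of 32-bit words, then hash it. */
static int example_hash_volume_key(const char *volume_key,
				   unsigned int *hash_out)
{
	size_t klen = strlen(volume_key);
	size_t hlen = round_up(1 + klen + 1, sizeof(unsigned int));
	u8 *key = kzalloc(hlen, GFP_KERNEL);

	if (!key)
		return -ENOMEM;
	key[0] = klen;                     /* leading length byte */
	memcpy(key + 1, volume_key, klen); /* tail stays zeroed */

	*hash_out = fscache_hash(0, (unsigned int *)key,
				 hlen / sizeof(unsigned int));
	kfree(key);
	return 0;
}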