diff mbox series

[GSoC,v2] t: port helper/test-oid-array.c to unit-tests/t-oid-array.c

Message ID 20240803132206.72166-1-shyamthakkar001@gmail.com (mailing list archive)
State Superseded
Headers show
Series [GSoC,v2] t: port helper/test-oid-array.c to unit-tests/t-oid-array.c | expand

Commit Message

Ghanshyam Thakkar Aug. 3, 2024, 1:21 p.m. UTC
helper/test-oid-array.c along with t0064-oid-array.sh test the
oid-array.h library, which provides storage and processing
efficiency over large lists of object identifiers.

Migrate them to the unit testing framework for better runtime
performance and efficiency. Also 'the_hash_algo' is used internally in
oid_array_lookup(), but we do not initialize a repository directory,
therefore initialize the_hash_algo manually. And
init_hash_algo():lib-oid.c can aid in this process, so make it public.

Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Helped-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Ghanshyam Thakkar <shyamthakkar001@gmail.com>
---
Changes in v2:
- removed the use of internal test__run_*() functions.
- TEST_LOOKUP() is entirely changed where it now accepts a lower bound
  and an upper bound for checking return values of oid_array_lookup(),
  instead of passing the condition as a whole literal. (i.e.
   v1: TEST_LOOKUP(..., ret < 0, ...)
   v2: TEST_LOOKUP(..., INT_MIN, -1, ...)
  )
- TEST_ENUMERATION() remains unchanged.

I didn't include the range-diff because the TEST_LOOKUP() has changed
quite a lot, so it would have to be reviewed from the beginning
anyways and I also rebased it on top of latest master since the last
version was sent 2 months ago.

Thanks.

 Makefile                   |   2 +-
 t/helper/test-oid-array.c  |  49 --------------
 t/helper/test-tool.c       |   1 -
 t/helper/test-tool.h       |   1 -
 t/t0064-oid-array.sh       | 122 ----------------------------------
 t/unit-tests/lib-oid.c     |   2 +-
 t/unit-tests/lib-oid.h     |   6 ++
 t/unit-tests/t-oid-array.c | 132 +++++++++++++++++++++++++++++++++++++
 8 files changed, 140 insertions(+), 175 deletions(-)
 delete mode 100644 t/helper/test-oid-array.c
 delete mode 100755 t/t0064-oid-array.sh
 create mode 100644 t/unit-tests/t-oid-array.c

Comments

Christian Couder Aug. 19, 2024, 4:55 p.m. UTC | #1
On Sat, Aug 3, 2024 at 3:22 PM Ghanshyam Thakkar
<shyamthakkar001@gmail.com> wrote:
>
> helper/test-oid-array.c along with t0064-oid-array.sh test the
> oid-array.h library, which provides storage and processing

Nit: I think "oid-array.h" is more an API than a library.

> efficiency over large lists of object identifiers.
>
> Migrate them to the unit testing framework for better runtime
> performance and efficiency. Also 'the_hash_algo' is used internally in

It doesn't seem to me that a variable called 'the_hash_algo' is used
internally in oid_array_lookup() anymore.

> oid_array_lookup(), but we do not initialize a repository directory,
> therefore initialize the_hash_algo manually. And
> init_hash_algo():lib-oid.c can aid in this process, so make it public.
>
> Mentored-by: Christian Couder <chriscool@tuxfamily.org>
> Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
> Helped-by: Phillip Wood <phillip.wood123@gmail.com>
> Signed-off-by: Ghanshyam Thakkar <shyamthakkar001@gmail.com>
> ---
> Changes in v2:
> - removed the use of internal test__run_*() functions.
> - TEST_LOOKUP() is entirely changed where it now accepts a lower bound
>   and an upper bound for checking return values of oid_array_lookup(),
>   instead of passing the condition as a whole literal. (i.e.
>    v1: TEST_LOOKUP(..., ret < 0, ...)
>    v2: TEST_LOOKUP(..., INT_MIN, -1, ...)
>   )

Nice improvements.

> - TEST_ENUMERATION() remains unchanged.

[...]

> +/*
> + * Returns one of GIT_HASH_{SHA1, SHA256, UNKNOWN} based on the value of
> + * GIT_TEST_DEFAULT_HASH. The fallback value in case of absence of
> + * GIT_TEST_DEFAULT_HASH is GIT_HASH_SHA1.
> + */

In this comment, it might be helpful to say that GIT_TEST_DEFAULT_HASH
is an environment variable (while GIT_HASH_* are #defined values).
Also while at it, it might be helpful to say that the function uses
check(algo != GIT_HASH_UNKNOWN) before returning to verify that
GIT_TEST_DEFAULT_HASH is either unset or properly set.

> +int init_hash_algo(void);

[...]

> +static void t_enumeration(const char **input_args, size_t input_sz,
> +                         const char **result, size_t result_sz)
> +{
> +       struct oid_array input = OID_ARRAY_INIT, expect = OID_ARRAY_INIT,
> +                        actual = OID_ARRAY_INIT;
> +       size_t i;
> +
> +       if (fill_array(&input, input_args, input_sz))
> +               return;
> +       if (fill_array(&expect, result, result_sz))
> +               return;

It would have been nice if the arguments were called 'expect_args' and
'expect_sz' in the same way as for 'input'. Is there a reason why we
couldn't just use 'expect' (or maybe 'expected') everywhere instead of
'result'?

Also after the above 'input.nr' is equal to 'input_sz' and 'expect.nr'
is equal to 'result_sz' otherwise we would have already returned fron
the current function.

> +       oid_array_for_each_unique(&input, add_to_oid_array, &actual);
> +       check_uint(actual.nr, ==, expect.nr);

I think it might be better to return if this check fails. Otherwise it
means that we likely messed something up in the 'input_args' or
'result' arguments we passed to the function, and then...

> +       for (i = 0; i < test_min(actual.nr, expect.nr); i++) {
> +               if (!check(oideq(&actual.oid[i], &expect.oid[i])))

...we might not compare here the input oid with the corresponding
result oid we intended to compare it to. This might result in a lot of
not very relevant output.

Returning if check_uint(actual.nr, ==, expect.nr) fails would avoid
such output and also enable us to just use 'actual.nr' instead of
'test_min(actual.nr, expect.nr)' in the 'for' loop above.

> +                       test_msg("expected: %s\n       got: %s\n     index: %" PRIuMAX,
> +                                oid_to_hex(&expect.oid[i]), oid_to_hex(&actual.oid[i]),
> +                                (uintmax_t)i);
> +       }
> +       check_uint(i, ==, result_sz);

As we saw above that 'expect.nr' is equal to 'result_sz', this check
can fail only if 'actual.nr' is different from 'expect.nr' which we
already checked above. So I think this check is redundant and we might
want to get rid of it.

> +       oid_array_clear(&actual);
> +       oid_array_clear(&input);
> +       oid_array_clear(&expect);
> +}

[...]

> +static void t_lookup(const char **input_hexes, size_t n, const char *query_hex,
> +                    int lower_bound, int upper_bound)
> +{
> +       struct oid_array array = OID_ARRAY_INIT;
> +       struct object_id oid_query;
> +       int ret;
> +
> +       if (get_oid_arbitrary_hex(query_hex, &oid_query))
> +               return;

In fill_array() above, we use check_int() to check the result of
get_oid_arbitrary_hex() like this:

              if (!check_int(get_oid_arbitrary_hex(hexes[i], &oid), ==, 0))

It doesn't look consistent to not use check_int() to check the result
of get_oid_arbitrary_hex() here. Or is there a specific reason to do
it in one place but not in another?

> +       if (fill_array(&array, input_hexes, n))
> +               return;
> +       ret = oid_array_lookup(&array, &oid_query);
> +
> +       if (!check_int(ret, <=, upper_bound) ||
> +           !check_int(ret, >=, lower_bound))
> +               test_msg("oid query for lookup: %s", oid_to_hex(&oid_query));
> +
> +       oid_array_clear(&array);
> +}
> +
> +#define TEST_LOOKUP(input_hexes, query, lower_bound, upper_bound, desc) \
> +       TEST(t_lookup(input_hexes, ARRAY_SIZE(input_hexes), query,      \
> +                     lower_bound, upper_bound),                        \
> +            desc " works")
> +
> +static void setup(void)
> +{
> +       int algo = init_hash_algo();
> +       /* because the_hash_algo is used by oid_array_lookup() internally */

I think this comment should be above the first line in this function
as it also explains why we need to use init_hash_algo().

Also something like "/* The hash algo is used by oid_array_lookup()
internally */" seems better to me as there is no 'the_hash_algo'
global variable used by oid_array_lookup().

> +       if (check_int(algo, !=, GIT_HASH_UNKNOWN))
> +               repo_set_hash_algo(the_repository, algo);
> +}
> +
> +int cmd_main(int argc UNUSED, const char **argv UNUSED)
> +{
> +       const char *arr_input[] = { "88", "44", "aa", "55" };
> +       const char *arr_input_dup[] = { "88", "44", "aa", "55",
> +                                       "88", "44", "aa", "55",
> +                                       "88", "44", "aa", "55" };
> +       const char *res_sorted[] = { "44", "55", "88", "aa" };
> +       const char *nearly_55;
> +
> +       if (!TEST(setup(), "setup"))
> +               test_skip_all("hash algo initialization failed");

Nice that we skip all the other tests with a helpful message if setup() fails.

> +       TEST_ENUMERATION(arr_input, res_sorted, "ordered enumeration");
> +       TEST_ENUMERATION(arr_input_dup, res_sorted,
> +                        "ordered enumeration with duplicate suppression");
> +
> +       /* ret is the return value of oid_array_lookup() */

This comment is not relevant anymore in this version.

> +       TEST_LOOKUP(arr_input, "55", 1, 1, "lookup");
> +       TEST_LOOKUP(arr_input, "33", INT_MIN, -1, "lookup non-existent entry");
> +       TEST_LOOKUP(arr_input_dup, "55", 3, 5, "lookup with duplicates");
> +       TEST_LOOKUP(arr_input_dup, "66", INT_MIN, -1,
> +                   "lookup non-existent entry with duplicates");
> +
> +       nearly_55 = init_hash_algo() == GIT_HASH_SHA1 ?
> +                       "5500000000000000000000000000000000000001" :
> +                       "5500000000000000000000000000000000000000000000000000000000000001";
> +       TEST_LOOKUP(((const char *[]){ "55", nearly_55 }), "55", 0, 0,
> +                   "lookup with almost duplicate values");
> +       TEST_LOOKUP(((const char *[]){ "55", "55" }), "55", 0, 1,
> +                   "lookup with single duplicate value");
> +
> +       return test_done();
> +}
> --
> 2.46.0
>
Ghanshyam Thakkar Aug. 24, 2024, 5 p.m. UTC | #2
Christian Couder <christian.couder@gmail.com> wrote:
> On Sat, Aug 3, 2024 at 3:22 PM Ghanshyam Thakkar
> <shyamthakkar001@gmail.com> wrote:
> > Migrate them to the unit testing framework for better runtime
> > performance and efficiency. Also 'the_hash_algo' is used internally in
>
> It doesn't seem to me that a variable called 'the_hash_algo' is used
> internally in oid_array_lookup() anymore.

It is. oid_array_lookup() uses oid_pos():hash-lookup.c, which uses
'the_hash_algo'.

> > +static void t_enumeration(const char **input_args, size_t input_sz,
> > +                         const char **result, size_t result_sz)
> > +{
> > +       struct oid_array input = OID_ARRAY_INIT, expect = OID_ARRAY_INIT,
> > +                        actual = OID_ARRAY_INIT;
> > +       size_t i;
> > +
> > +       if (fill_array(&input, input_args, input_sz))
> > +               return;
> > +       if (fill_array(&expect, result, result_sz))
> > +               return;
>
> It would have been nice if the arguments were called 'expect_args' and
> 'expect_sz' in the same way as for 'input'. Is there a reason why we
> couldn't just use 'expect' (or maybe 'expected') everywhere instead of
> 'result'?

I have changed them to 'expect' in v3.

> Also after the above 'input.nr' is equal to 'input_sz' and 'expect.nr'
> is equal to 'result_sz' otherwise we would have already returned fron
> the current function.
>
> > +       oid_array_for_each_unique(&input, add_to_oid_array, &actual);
> > +       check_uint(actual.nr, ==, expect.nr);
>
> I think it might be better to return if this check fails. Otherwise it
> means that we likely messed something up in the 'input_args' or
> 'result' arguments we passed to the function, and then...
>
> > +       for (i = 0; i < test_min(actual.nr, expect.nr); i++) {
> > +               if (!check(oideq(&actual.oid[i], &expect.oid[i])))
>
> ...we might not compare here the input oid with the corresponding
> result oid we intended to compare it to. This might result in a lot of
> not very relevant output.
>
> Returning if check_uint(actual.nr, ==, expect.nr) fails would avoid
> such output and also enable us to just use 'actual.nr' instead of
> 'test_min(actual.nr, expect.nr)' in the 'for' loop above.

Changed this in v3.

>
> > +                       test_msg("expected: %s\n       got: %s\n     index: %" PRIuMAX,
> > +                                oid_to_hex(&expect.oid[i]), oid_to_hex(&actual.oid[i]),
> > +                                (uintmax_t)i);
> > +       }
> > +       check_uint(i, ==, result_sz);
>
> As we saw above that 'expect.nr' is equal to 'result_sz', this check
> can fail only if 'actual.nr' is different from 'expect.nr' which we
> already checked above. So I think this check is redundant and we might
> want to get rid of it.

Removed in v3.

>
> In fill_array() above, we use check_int() to check the result of
> get_oid_arbitrary_hex() like this:
>
> if (!check_int(get_oid_arbitrary_hex(hexes[i], &oid), ==, 0))
>
> It doesn't look consistent to not use check_int() to check the result
> of get_oid_arbitrary_hex() here. Or is there a specific reason to do
> it in one place but not in another?

Not in particular. Added check_int() in v3.

Thanks for the review.
diff mbox series

Patch

diff --git a/Makefile b/Makefile
index d6479092a0..c6f11dc453 100644
--- a/Makefile
+++ b/Makefile
@@ -808,7 +808,6 @@  TEST_BUILTINS_OBJS += test-lazy-init-name-hash.o
 TEST_BUILTINS_OBJS += test-match-trees.o
 TEST_BUILTINS_OBJS += test-mergesort.o
 TEST_BUILTINS_OBJS += test-mktemp.o
-TEST_BUILTINS_OBJS += test-oid-array.o
 TEST_BUILTINS_OBJS += test-online-cpus.o
 TEST_BUILTINS_OBJS += test-pack-mtimes.o
 TEST_BUILTINS_OBJS += test-parse-options.o
@@ -1336,6 +1335,7 @@  UNIT_TEST_PROGRAMS += t-ctype
 UNIT_TEST_PROGRAMS += t-example-decorate
 UNIT_TEST_PROGRAMS += t-hash
 UNIT_TEST_PROGRAMS += t-mem-pool
+UNIT_TEST_PROGRAMS += t-oid-array
 UNIT_TEST_PROGRAMS += t-oidmap
 UNIT_TEST_PROGRAMS += t-oidtree
 UNIT_TEST_PROGRAMS += t-prio-queue
diff --git a/t/helper/test-oid-array.c b/t/helper/test-oid-array.c
deleted file mode 100644
index 076b849cbf..0000000000
--- a/t/helper/test-oid-array.c
+++ /dev/null
@@ -1,49 +0,0 @@ 
-#define USE_THE_REPOSITORY_VARIABLE
-
-#include "test-tool.h"
-#include "hex.h"
-#include "oid-array.h"
-#include "setup.h"
-#include "strbuf.h"
-
-static int print_oid(const struct object_id *oid, void *data UNUSED)
-{
-	puts(oid_to_hex(oid));
-	return 0;
-}
-
-int cmd__oid_array(int argc UNUSED, const char **argv UNUSED)
-{
-	struct oid_array array = OID_ARRAY_INIT;
-	struct strbuf line = STRBUF_INIT;
-	int nongit_ok;
-
-	setup_git_directory_gently(&nongit_ok);
-	if (nongit_ok)
-		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
-
-	while (strbuf_getline(&line, stdin) != EOF) {
-		const char *arg;
-		struct object_id oid;
-
-		if (skip_prefix(line.buf, "append ", &arg)) {
-			if (get_oid_hex(arg, &oid))
-				die("not a hexadecimal oid: %s", arg);
-			oid_array_append(&array, &oid);
-		} else if (skip_prefix(line.buf, "lookup ", &arg)) {
-			if (get_oid_hex(arg, &oid))
-				die("not a hexadecimal oid: %s", arg);
-			printf("%d\n", oid_array_lookup(&array, &oid));
-		} else if (!strcmp(line.buf, "clear"))
-			oid_array_clear(&array);
-		else if (!strcmp(line.buf, "for_each_unique"))
-			oid_array_for_each_unique(&array, print_oid, NULL);
-		else
-			die("unknown command: %s", line.buf);
-	}
-
-	strbuf_release(&line);
-	oid_array_clear(&array);
-
-	return 0;
-}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index da3e69128a..353d2aaaa4 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -43,7 +43,6 @@  static struct test_cmd cmds[] = {
 	{ "match-trees", cmd__match_trees },
 	{ "mergesort", cmd__mergesort },
 	{ "mktemp", cmd__mktemp },
-	{ "oid-array", cmd__oid_array },
 	{ "online-cpus", cmd__online_cpus },
 	{ "pack-mtimes", cmd__pack_mtimes },
 	{ "parse-options", cmd__parse_options },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index 642a34578c..d3d8aa28e0 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -64,7 +64,6 @@  int cmd__scrap_cache_tree(int argc, const char **argv);
 int cmd__serve_v2(int argc, const char **argv);
 int cmd__sha1(int argc, const char **argv);
 int cmd__sha1_is_sha1dc(int argc, const char **argv);
-int cmd__oid_array(int argc, const char **argv);
 int cmd__sha256(int argc, const char **argv);
 int cmd__sigchain(int argc, const char **argv);
 int cmd__simple_ipc(int argc, const char **argv);
diff --git a/t/t0064-oid-array.sh b/t/t0064-oid-array.sh
deleted file mode 100755
index de74b692d0..0000000000
--- a/t/t0064-oid-array.sh
+++ /dev/null
@@ -1,122 +0,0 @@ 
-#!/bin/sh
-
-test_description='basic tests for the oid array implementation'
-
-TEST_PASSES_SANITIZE_LEAK=true
-. ./test-lib.sh
-
-echoid () {
-	prefix="${1:+$1 }"
-	shift
-	while test $# -gt 0
-	do
-		echo "$prefix$ZERO_OID" | sed -e "s/00/$1/g"
-		shift
-	done
-}
-
-test_expect_success 'without repository' '
-	cat >expect <<-EOF &&
-	4444444444444444444444444444444444444444
-	5555555555555555555555555555555555555555
-	8888888888888888888888888888888888888888
-	aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
-	EOF
-	cat >input <<-EOF &&
-	append 4444444444444444444444444444444444444444
-	append 5555555555555555555555555555555555555555
-	append 8888888888888888888888888888888888888888
-	append aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
-	for_each_unique
-	EOF
-	nongit test-tool oid-array <input >actual &&
-	test_cmp expect actual
-'
-
-test_expect_success 'ordered enumeration' '
-	echoid "" 44 55 88 aa >expect &&
-	{
-		echoid append 88 44 aa 55 &&
-		echo for_each_unique
-	} | test-tool oid-array >actual &&
-	test_cmp expect actual
-'
-
-test_expect_success 'ordered enumeration with duplicate suppression' '
-	echoid "" 44 55 88 aa >expect &&
-	{
-		echoid append 88 44 aa 55 &&
-		echoid append 88 44 aa 55 &&
-		echoid append 88 44 aa 55 &&
-		echo for_each_unique
-	} | test-tool oid-array >actual &&
-	test_cmp expect actual
-'
-
-test_expect_success 'lookup' '
-	{
-		echoid append 88 44 aa 55 &&
-		echoid lookup 55
-	} | test-tool oid-array >actual &&
-	n=$(cat actual) &&
-	test "$n" -eq 1
-'
-
-test_expect_success 'lookup non-existing entry' '
-	{
-		echoid append 88 44 aa 55 &&
-		echoid lookup 33
-	} | test-tool oid-array >actual &&
-	n=$(cat actual) &&
-	test "$n" -lt 0
-'
-
-test_expect_success 'lookup with duplicates' '
-	{
-		echoid append 88 44 aa 55 &&
-		echoid append 88 44 aa 55 &&
-		echoid append 88 44 aa 55 &&
-		echoid lookup 55
-	} | test-tool oid-array >actual &&
-	n=$(cat actual) &&
-	test "$n" -ge 3 &&
-	test "$n" -le 5
-'
-
-test_expect_success 'lookup non-existing entry with duplicates' '
-	{
-		echoid append 88 44 aa 55 &&
-		echoid append 88 44 aa 55 &&
-		echoid append 88 44 aa 55 &&
-		echoid lookup 66
-	} | test-tool oid-array >actual &&
-	n=$(cat actual) &&
-	test "$n" -lt 0
-'
-
-test_expect_success 'lookup with almost duplicate values' '
-	# n-1 5s
-	root=$(echoid "" 55) &&
-	root=${root%5} &&
-	{
-		id1="${root}5" &&
-		id2="${root}f" &&
-		echo "append $id1" &&
-		echo "append $id2" &&
-		echoid lookup 55
-	} | test-tool oid-array >actual &&
-	n=$(cat actual) &&
-	test "$n" -eq 0
-'
-
-test_expect_success 'lookup with single duplicate value' '
-	{
-		echoid append 55 55 &&
-		echoid lookup 55
-	} | test-tool oid-array >actual &&
-	n=$(cat actual) &&
-	test "$n" -ge 0 &&
-	test "$n" -le 1
-'
-
-test_done
diff --git a/t/unit-tests/lib-oid.c b/t/unit-tests/lib-oid.c
index 37105f0a8f..8f0ccac532 100644
--- a/t/unit-tests/lib-oid.c
+++ b/t/unit-tests/lib-oid.c
@@ -3,7 +3,7 @@ 
 #include "strbuf.h"
 #include "hex.h"
 
-static int init_hash_algo(void)
+int init_hash_algo(void)
 {
 	static int algo = -1;
 
diff --git a/t/unit-tests/lib-oid.h b/t/unit-tests/lib-oid.h
index 8d2acca768..011a2d88de 100644
--- a/t/unit-tests/lib-oid.h
+++ b/t/unit-tests/lib-oid.h
@@ -13,5 +13,11 @@ 
  * environment variable.
  */
 int get_oid_arbitrary_hex(const char *s, struct object_id *oid);
+/*
+ * Returns one of GIT_HASH_{SHA1, SHA256, UNKNOWN} based on the value of
+ * GIT_TEST_DEFAULT_HASH. The fallback value in case of absence of
+ * GIT_TEST_DEFAULT_HASH is GIT_HASH_SHA1.
+ */
+int init_hash_algo(void);
 
 #endif /* LIB_OID_H */
diff --git a/t/unit-tests/t-oid-array.c b/t/unit-tests/t-oid-array.c
new file mode 100644
index 0000000000..baa7c68d7b
--- /dev/null
+++ b/t/unit-tests/t-oid-array.c
@@ -0,0 +1,132 @@ 
+#define USE_THE_REPOSITORY_VARIABLE
+
+#include "test-lib.h"
+#include "lib-oid.h"
+#include "oid-array.h"
+#include "hex.h"
+
+static inline size_t test_min(size_t a, size_t b)
+{
+	return a <= b ? a : b;
+}
+
+static int fill_array(struct oid_array *array, const char *hexes[], size_t n)
+{
+	for (size_t i = 0; i < n; i++) {
+		struct object_id oid;
+
+		if (!check_int(get_oid_arbitrary_hex(hexes[i], &oid), ==, 0))
+			return -1;
+		oid_array_append(array, &oid);
+	}
+	if (!check_uint(array->nr, ==, n))
+		return -1;
+	return 0;
+}
+
+static int add_to_oid_array(const struct object_id *oid, void *data)
+{
+	struct oid_array *array = data;
+
+	oid_array_append(array, oid);
+	return 0;
+}
+
+static void t_enumeration(const char **input_args, size_t input_sz,
+			  const char **result, size_t result_sz)
+{
+	struct oid_array input = OID_ARRAY_INIT, expect = OID_ARRAY_INIT,
+			 actual = OID_ARRAY_INIT;
+	size_t i;
+
+	if (fill_array(&input, input_args, input_sz))
+		return;
+	if (fill_array(&expect, result, result_sz))
+		return;
+
+	oid_array_for_each_unique(&input, add_to_oid_array, &actual);
+	check_uint(actual.nr, ==, expect.nr);
+
+	for (i = 0; i < test_min(actual.nr, expect.nr); i++) {
+		if (!check(oideq(&actual.oid[i], &expect.oid[i])))
+			test_msg("expected: %s\n       got: %s\n     index: %" PRIuMAX,
+				 oid_to_hex(&expect.oid[i]), oid_to_hex(&actual.oid[i]),
+				 (uintmax_t)i);
+	}
+	check_uint(i, ==, result_sz);
+
+	oid_array_clear(&actual);
+	oid_array_clear(&input);
+	oid_array_clear(&expect);
+}
+
+#define TEST_ENUMERATION(input, result, desc)                                     \
+	TEST(t_enumeration(input, ARRAY_SIZE(input), result, ARRAY_SIZE(result)), \
+			   desc " works")
+
+static void t_lookup(const char **input_hexes, size_t n, const char *query_hex,
+		     int lower_bound, int upper_bound)
+{
+	struct oid_array array = OID_ARRAY_INIT;
+	struct object_id oid_query;
+	int ret;
+
+	if (get_oid_arbitrary_hex(query_hex, &oid_query))
+		return;
+	if (fill_array(&array, input_hexes, n))
+		return;
+	ret = oid_array_lookup(&array, &oid_query);
+
+	if (!check_int(ret, <=, upper_bound) ||
+	    !check_int(ret, >=, lower_bound))
+		test_msg("oid query for lookup: %s", oid_to_hex(&oid_query));
+
+	oid_array_clear(&array);
+}
+
+#define TEST_LOOKUP(input_hexes, query, lower_bound, upper_bound, desc) \
+	TEST(t_lookup(input_hexes, ARRAY_SIZE(input_hexes), query,      \
+		      lower_bound, upper_bound),                        \
+	     desc " works")
+
+static void setup(void)
+{
+	int algo = init_hash_algo();
+	/* because the_hash_algo is used by oid_array_lookup() internally */
+	if (check_int(algo, !=, GIT_HASH_UNKNOWN))
+		repo_set_hash_algo(the_repository, algo);
+}
+
+int cmd_main(int argc UNUSED, const char **argv UNUSED)
+{
+	const char *arr_input[] = { "88", "44", "aa", "55" };
+	const char *arr_input_dup[] = { "88", "44", "aa", "55",
+					"88", "44", "aa", "55",
+					"88", "44", "aa", "55" };
+	const char *res_sorted[] = { "44", "55", "88", "aa" };
+	const char *nearly_55;
+
+	if (!TEST(setup(), "setup"))
+		test_skip_all("hash algo initialization failed");
+
+	TEST_ENUMERATION(arr_input, res_sorted, "ordered enumeration");
+	TEST_ENUMERATION(arr_input_dup, res_sorted,
+			 "ordered enumeration with duplicate suppression");
+
+	/* ret is the return value of oid_array_lookup() */
+	TEST_LOOKUP(arr_input, "55", 1, 1, "lookup");
+	TEST_LOOKUP(arr_input, "33", INT_MIN, -1, "lookup non-existent entry");
+	TEST_LOOKUP(arr_input_dup, "55", 3, 5, "lookup with duplicates");
+	TEST_LOOKUP(arr_input_dup, "66", INT_MIN, -1,
+		    "lookup non-existent entry with duplicates");
+
+	nearly_55 = init_hash_algo() == GIT_HASH_SHA1 ?
+			"5500000000000000000000000000000000000001" :
+			"5500000000000000000000000000000000000000000000000000000000000001";
+	TEST_LOOKUP(((const char *[]){ "55", nearly_55 }), "55", 0, 0,
+		    "lookup with almost duplicate values");
+	TEST_LOOKUP(((const char *[]){ "55", "55" }), "55", 0, 1,
+		    "lookup with single duplicate value");
+
+	return test_done();
+}