Message ID | cover.1718106284.git.ps@pks.im (mailing list archive) |
---|---|
Headers | show |
Series | Introduce `USE_THE_REPOSITORY_VARIABLE` macro | expand |
On 2024-06-11 at 11:57:33, Patrick Steinhardt wrote: > Hi, > > use of the `the_repository` variable is nowadays considered to be > deprecated, and over time we want to convert our codebase to stop using > it in favor of explicitly passing down the repository to functions via > parameters. This effort faces some important problems though. > > - It is hard to prove that a certain code unit does not use > `the_repository` anymore when sending patch series. The reviewer has > no way to verify that it's not used anymore without reading through > the code itself. > > - It is easy to sneak in new usages of `the_repository` by accident > into a code unit that is already `the_repository`-clean. > > - There are many functions which implicitly use `the_repository`, > which is really hard to spot. > > This patch series aims to address those problems by introducing a new > `USE_THE_REPOSITORY_VARIABLE` macro. When unset, then the declarations > of `the_repository`, `the_hash_algo` and some functions that implicitly > depend on them will be hidden away. This makes it trivial to demonstrate > that a code unit is `the_repository`-free by removing the definition of > any such macro. Overall, I left a few comments, but I think this definitely moves us in the right direction and I'm glad to see it. This obviously improves the experience with libification and unit testing in a lot of ways, which is good. My only caution is that using the *_any functions will cause us a world of pain if we ever adopt another 256-bit hash function, since it will be ambiguous which algorithm is to be used. That's why, traditionally, we haven't assumed a hash algorithm based on the object ID length. I don't think the amount of uses we have is excessive, even with your changes, but we'll need to be mindful of that going forward.
On Tue, Jun 11, 2024 at 11:24:30PM +0000, brian m. carlson wrote: > On 2024-06-11 at 11:57:33, Patrick Steinhardt wrote: > > Hi, > > > > use of the `the_repository` variable is nowadays considered to be > > deprecated, and over time we want to convert our codebase to stop using > > it in favor of explicitly passing down the repository to functions via > > parameters. This effort faces some important problems though. > > > > - It is hard to prove that a certain code unit does not use > > `the_repository` anymore when sending patch series. The reviewer has > > no way to verify that it's not used anymore without reading through > > the code itself. > > > > - It is easy to sneak in new usages of `the_repository` by accident > > into a code unit that is already `the_repository`-clean. > > > > - There are many functions which implicitly use `the_repository`, > > which is really hard to spot. > > > > This patch series aims to address those problems by introducing a new > > `USE_THE_REPOSITORY_VARIABLE` macro. When unset, then the declarations > > of `the_repository`, `the_hash_algo` and some functions that implicitly > > depend on them will be hidden away. This makes it trivial to demonstrate > > that a code unit is `the_repository`-free by removing the definition of > > any such macro. > > Overall, I left a few comments, but I think this definitely moves us in > the right direction and I'm glad to see it. This obviously improves the > experience with libification and unit testing in a lot of ways, which is > good. > > My only caution is that using the *_any functions will cause us a world > of pain if we ever adopt another 256-bit hash function, since it will be > ambiguous which algorithm is to be used. That's why, traditionally, we > haven't assumed a hash algorithm based on the object ID length. I don't > think the amount of uses we have is excessive, even with your changes, > but we'll need to be mindful of that going forward. The only cases where I add new calls to `_any()` are in test helpers: - "t/helper/test-oidtree.c". This one is getting converted to a unit test by Ghanshyam, so I'll leave it to him to improve this. - "t/helper/test-proc-receive.c". Here we don't care about the actual algorithm, the only thing we care about is that we can correctly parse them and then eventually emit them via `oid_to_hex()` again. So even if we introduce a second hash function with the same length this code would continue to work alright. So I think it should be fine in the context of this series. But the remark is certainly valid and something we should be cautious about going forward. Thanks! Patrick
On Wed, 12 Jun 2024, Patrick Steinhardt <ps@pks.im> wrote: > On Tue, Jun 11, 2024 at 11:24:30PM +0000, brian m. carlson wrote: > > On 2024-06-11 at 11:57:33, Patrick Steinhardt wrote: > > > Hi, > > > > > > use of the `the_repository` variable is nowadays considered to be > > > deprecated, and over time we want to convert our codebase to stop using > > > it in favor of explicitly passing down the repository to functions via > > > parameters. This effort faces some important problems though. > > > > > > - It is hard to prove that a certain code unit does not use > > > `the_repository` anymore when sending patch series. The reviewer has > > > no way to verify that it's not used anymore without reading through > > > the code itself. > > > > > > - It is easy to sneak in new usages of `the_repository` by accident > > > into a code unit that is already `the_repository`-clean. > > > > > > - There are many functions which implicitly use `the_repository`, > > > which is really hard to spot. > > > > > > This patch series aims to address those problems by introducing a new > > > `USE_THE_REPOSITORY_VARIABLE` macro. When unset, then the declarations > > > of `the_repository`, `the_hash_algo` and some functions that implicitly > > > depend on them will be hidden away. This makes it trivial to demonstrate > > > that a code unit is `the_repository`-free by removing the definition of > > > any such macro. > > > > Overall, I left a few comments, but I think this definitely moves us in > > the right direction and I'm glad to see it. This obviously improves the > > experience with libification and unit testing in a lot of ways, which is > > good. > > > > My only caution is that using the *_any functions will cause us a world > > of pain if we ever adopt another 256-bit hash function, since it will be > > ambiguous which algorithm is to be used. That's why, traditionally, we > > haven't assumed a hash algorithm based on the object ID length. I don't > > think the amount of uses we have is excessive, even with your changes, > > but we'll need to be mindful of that going forward. > > The only cases where I add new calls to `_any()` are in test helpers: > > - "t/helper/test-oidtree.c". This one is getting converted to a unit > test by Ghanshyam, so I'll leave it to him to improve this. Yeah, I don't use '_any()', and explicitly give algo using '_algop()'. link:https://lore.kernel.org/git/20240608165731.29467-1-shyamthakkar001@gmail.com/ Thanks. > - "t/helper/test-proc-receive.c". Here we don't care about the actual > algorithm, the only thing we care about is that we can correctly > parse them and then eventually emit them via `oid_to_hex()` again. > So even if we introduce a second hash function with the same length > this code would continue to work alright. > > So I think it should be fine in the context of this series. But the > remark is certainly valid and something we should be cautious about > going forward. > > Thanks! > > Patrick