Message ID: 20240719093338.55117-1-linyunsheng@huawei.com (mailing list archive)
Series: Replace page_frag with page_frag_cache for sk_page_frag()
On Fri, Jul 19, 2024 at 2:36 AM Yunsheng Lin <linyunsheng@huawei.com> wrote: > > After [1], there are still two implementations for page frag: > > 1. mm/page_alloc.c: net stack seems to be using it in the > rx part with 'struct page_frag_cache' and the main API > being page_frag_alloc_align(). > 2. net/core/sock.c: net stack seems to be using it in the > tx part with 'struct page_frag' and the main API being > skb_page_frag_refill(). > > This patchset tries to unfiy the page frag implementation > by replacing page_frag with page_frag_cache for sk_page_frag() > first. net_high_order_alloc_disable_key for the implementation > in net/core/sock.c doesn't seems matter that much now as pcp > is also supported for high-order pages: > commit 44042b449872 ("mm/page_alloc: allow high-order pages to > be stored on the per-cpu lists") > > As the related change is mostly related to networking, so > targeting the net-next. And will try to replace the rest > of page_frag in the follow patchset. So in reality I would say something like the first 4 patches are probably more applicable to mm than they are to the net-next tree. Especially given that we are having to deal with the mm_task_types.h in order to sort out the include order issues. Given that I think it might make more sense to look at breaking this into 2 or more patch sets with the first being more mm focused since the test module and pulling the code out of page_alloc.c, gfp.h, and mm_types.h would be pretty impactful on mm more than it is on the networking stack. After those changes then I would agree that we are mostly just impacting the network stack. > After this patchset: > 1. Unify the page frag implementation by taking the best out of > two the existing implementations: we are able to save some space > for the 'page_frag_cache' API user, and avoid 'get_page()' for > the old 'page_frag' API user. > 2. 
Future bugfix and performance can be done in one place, hence > improving maintainability of page_frag's implementation. > > Kernel Image changing: > Linux Kernel total | text data bss > ------------------------------------------------------ > after 45250307 | 27274279 17209996 766032 > before 45254134 | 27278118 17209984 766032 > delta -3827 | -3839 +12 +0 It might be interesting if we can track some of this on a per-patch basis for the significant patches. For example I would be curious how much of this is just from patch 9 that replaces alloc_pages_node with __alloc_pages. > Performance validation: > 1. Using micro-benchmark ko added in patch 1 to test aligned and > non-aligned API performance impact for the existing users, there > is no notiable performance degradation. Instead we seems to some > minor performance boot for both aligned and non-aligned API after > this patchset as below. > > 2. Use the below netcat test case, we also have some minor > performance boot for repalcing 'page_frag' with 'page_frag_cache' > after this patchset. > server: taskset -c 32 nc -l -k 1234 > /dev/null > client: perf stat -r 200 -- taskset -c 0 head -c 20G /dev/zero | taskset -c 1 nc 127.0.0.1 1234 > > In order to avoid performance noise as much as possible, the testing > is done in system without any other laod and have enough iterations to > prove the data is stable enogh, complete log for testing is below: Please go through and run a spell check over your patches. I have seen multiple spelling errors in a few of the patch descriptions as well.
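As background for the API discussion above, here is a rough user-space model of the 'page_frag_cache' style of allocation (carving aligned fragments out of one larger allocation, in the spirit of page_frag_alloc_align()). This is only an illustrative sketch: the names FAKE_PAGE_SIZE, struct frag_cache and frag_alloc_align() are made up, and the real mm/page_alloc.c code additionally handles page refcounting, pfmemalloc pages and cache refill, none of which is modeled here.

```c
#include <stdlib.h>

/* Hypothetical, simplified stand-in for 'struct page_frag_cache'. */
#define FAKE_PAGE_SIZE 4096u

struct frag_cache {
	unsigned char *va;   /* backing "page" */
	unsigned int offset; /* next free byte within it */
};

/* Carve an aligned fragment out of the cache; 'align' must be a
 * power of two.  The real page_frag_alloc_align() would instead
 * refill the cache with a fresh (high-order) page on exhaustion
 * and manage its refcount. */
static void *frag_alloc_align(struct frag_cache *fc, unsigned int size,
			      unsigned int align)
{
	if (!fc->va) {
		fc->va = malloc(FAKE_PAGE_SIZE); /* stands in for page alloc */
		fc->offset = 0;
	}
	fc->offset = (fc->offset + align - 1) & ~(align - 1);
	if (!fc->va || fc->offset + size > FAKE_PAGE_SIZE)
		return NULL; /* real code refills instead of failing */
	void *p = fc->va + fc->offset;
	fc->offset += size;
	return p;
}
```

With this shape, two consecutive 100-byte allocations at 64-byte alignment land at offsets 0 and 128 of the same backing page, which is the sharing that lets the unified implementation avoid a get_page() per fragment for the old 'page_frag' users.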
> taskset -c 32 nc -l -k 1234 > /dev/null > perf stat -r 200 -- insmod ./page_frag_test.ko test_push_cpu=16 test_pop_cpu=17 > perf stat -r 200 -- insmod ./page_frag_test.ko test_push_cpu=16 test_pop_cpu=17 test_align=1 > perf stat -r 200 -- taskset -c 0 head -c 20G /dev/zero | taskset -c 1 nc 127.0.0.1 1234 > > *After* this patchset: > > Performance counter stats for 'insmod ./page_frag_test.ko test_push_cpu=16 test_pop_cpu=17' (200 runs): ... > 23.856064365 seconds time elapsed ( +- 0.08% ) > > Performance counter stats for 'insmod ./page_frag_test.ko test_push_cpu=16 test_pop_cpu=17 test_align=1' (200 runs): ... > 23.443927103 seconds time elapsed ( +- 0.05% ) > > Performance counter stats for 'taskset -c 0 head -c 20G /dev/zero' (200 runs): ... > 27.370305077 seconds time elapsed ( +- 0.01% ) > > > *Before* this patchset: > > Performance counter stats for 'insmod ./page_frag_test.ko test_push_cpu=16 test_pop_cpu=17' (200 runs): ... > 23.918006193 seconds time elapsed ( +- 0.10% ) > > Performance counter stats for 'insmod ./page_frag_test.ko test_push_cpu=16 test_pop_cpu=17 test_align=1' (200 runs): ... > 23.876306259 seconds time elapsed ( +- 0.13% ) > > Performance counter stats for 'taskset -c 0 head -c 20G /dev/zero' (200 runs): ... > 27.842745343 seconds time elapsed ( +- 0.02% ) I'm not seeing any sort of really significant improvement here. If I am understanding correctly the change amounts to: 23.9 -> 23.9 (-0.26%) 23.9 -> 23.4 (-1.8%) 27.8 -> 27.4 (-1.7%) The trade-off is that we are adding a bunch more complexity to the code, and refactoring it for reuse, but somehow in the process this patch set increases the total code footprint in terms of lines of code and will make maintenance that much harder as the code seems much more tangled up and brittle than it was previously. I think we need to think about splitting this patch set into at least 2. Basically an mm side of the patch set to move the code and do most of your refactoring work.
The second half of the set being to add the refactoring of the network side to reuse the existing code. I think it will make it easier to review and might get you more engagement as I am pretty sure several of these patches likely wouldn't fly with some of the maintainers. So specifically I would like to see patches 1 (refactored as selftest), 2, 3, 5, 7, 8, 13 (current APIs), and 14 done as more of an mm focused set since many of the issues you seem to have are problems building due to mm build issues, dependencies, and the like. That is the foundation for this patch set and it seems like we keep seeing issues there so that needs to be solid before we can do the new API work. If focused on mm you might get more eyes on it as not many networking folks are that familiar with the memory management side of things. As for the other patches, specifically 10, 11, 12, and 13 (prepare, probe, commit API), they could then be spun up as a netdev centered set. I took a brief look at them but they need some serious refactor as I think they are providing page as a return value in several cases where they don't need to. In my opinion with a small bit of refactoring patch 4 can just be dropped. I don't think the renaming is necessary and it just adds noise to the commit logs for the impacted drivers. It will require tweaks to the other patches but I think it will be better that way in the long run. Looking at patch 6 I am left scratching my head and wondering if you have another build issue of some sort that you haven't mentioned. I really don't think it belongs in this patch set and should probably be a fix on its own if you have some reason to justify it. Otherwise you might also just look at refactoring it to take "__builtin_constant_p(size)" into account by copying/pasting the first bits from the generic version into the function since I am assuming there is a performance benefit to doing it in assembler. It should be a net win if you just add the accounting for constants. 
Patch 9 could probably be a standalone patch or included in the more mm centered set. However it would need to be redone to fix the underlying issue at its source rather than working around it by changing which function is called. No point in improving it for one case when you can cover multiple cases with a single change.
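For reference, the relative changes quoted in the review above can be recomputed directly from the elapsed times in the perf logs; a throwaway helper like the following (all names made up) reproduces the sub-2% range being discussed:

```c
#include <stdio.h>

/* Relative change in elapsed time, in percent; negative means the
 * "after" kernel is faster. */
static double pct_delta(double before, double after)
{
	return (after - before) / before * 100.0;
}

/* Print the three deltas, using the elapsed times copied from the
 * perf logs quoted in the thread. */
static void print_page_frag_deltas(void)
{
	printf("page_frag_test.ko:         %+.2f%%\n",
	       pct_delta(23.918006193, 23.856064365));
	printf("page_frag_test.ko aligned: %+.2f%%\n",
	       pct_delta(23.876306259, 23.443927103));
	printf("netcat 20G:                %+.2f%%\n",
	       pct_delta(27.842745343, 27.370305077));
}
```

That works out to roughly -0.26%, -1.81% and -1.70%, i.e. improvements of a couple of percent at best, consistent with the "minor" characterization in the cover letter.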
On 2024/7/22 7:49, Alexander Duyck wrote: > On Fri, Jul 19, 2024 at 2:36 AM Yunsheng Lin <linyunsheng@huawei.com> wrote: >> >> After [1], there are still two implementations for page frag: >> >> 1. mm/page_alloc.c: net stack seems to be using it in the >> rx part with 'struct page_frag_cache' and the main API >> being page_frag_alloc_align(). >> 2. net/core/sock.c: net stack seems to be using it in the >> tx part with 'struct page_frag' and the main API being >> skb_page_frag_refill(). >> >> This patchset tries to unfiy the page frag implementation >> by replacing page_frag with page_frag_cache for sk_page_frag() >> first. net_high_order_alloc_disable_key for the implementation >> in net/core/sock.c doesn't seems matter that much now as pcp >> is also supported for high-order pages: >> commit 44042b449872 ("mm/page_alloc: allow high-order pages to >> be stored on the per-cpu lists") >> >> As the related change is mostly related to networking, so >> targeting the net-next. And will try to replace the rest >> of page_frag in the follow patchset. > > So in reality I would say something like the first 4 patches are > probably more applicable to mm than they are to the net-next tree. > Especially given that we are having to deal with the mm_task_types.h > in order to sort out the include order issues. > > Given that I think it might make more sense to look at breaking this > into 2 or more patch sets with the first being more mm focused since > the test module and pulling the code out of page_alloc.c, gfp.h, and > mm_types.h would be pretty impactful on mm more than it is on the > networking stack. After those changes then I would agree that we are > mostly just impacting the network stack. I am sure there are plenty of good precedents for how to handle a patchset that affects multiple subsystems. Let's be more specific about what the options are here: 1. Keeping all changes as one patchset targeting the net-next tree, as this version does. 2.
Breaking all changes into two patchsets, the one affecting current APIs targeting the mm tree and the one supporting new APIs targeting the net-next tree. 3. Breaking all changes into two patchsets as option 2 does, but with both patchsets targeting the net-next tree, to avoid waiting for the changes in the mm tree to be merged back into the net-next tree before adding support for the new APIs. I am not sure whether your preference is option 2 or option 3 here, or whether there are other options; it would be better to be more specific about your preferred option here. Option 2 doesn't seem to make much sense if all the existing users/callers of page_frag belong to networking, for testing reasons, and the original code seemed to go through the net-next tree too: https://github.com/torvalds/linux/commit/b63ae8ca096dfdbfeef6a209c30a93a966518853 And the main reason I chose option 1 over option 2 is: it is hard to tell how much change is needed to support the new use case, so it is better to keep everything in one patchset to have the bigger picture here. Yes, it may make the patchset harder to review, but that is the trade-off we need to make here. To my understanding, option 1 seems to be the common practice for handling changes that affect multiple subsystems. Especially as you had similar doubts about the changes affecting current APIs as below, it seems hard to explain them without a new use case: https://lore.kernel.org/all/68d1c7d3dfcd780fa3bed0bb71e41d7fb0a8c15d.camel@gmail.com/ > ... > > So specifically I would like to see patches 1 (refactored as > selftest), 2, 3, 5, 7, 8, 13 (current APIs), and 14 done as more of an > mm focused set since many of the issues you seem to have are problems > building due to mm build issues, dependencies, and the like. That is > the foundation for this patch set and it seems like we keep seeing > issues there so that needs to be solid before we can do the new API > work.
If focused on mm you might get more eyes on it as not many > networking folks are that familiar with the memory management side of > things. I am not sure if breaking it into more patchsets is the common practice to 'get more eyes' here. Anyway, it is a fair enough ask if there is more concrete reasoning behind it, and it is common practice to do that, and I would love to break it into more patchsets to perhaps make the discussion easier. > > As for the other patches, specifically 10, 11, 12, and 13 (prepare, > probe, commit API), they could then be spun up as a netdev centered > set. I took a brief look at them but they need some serious refactor > as I think they are providing page as a return value in several cases > where they don't need to. The above is one of the reasons I am not willing to do the splitting. It is hard for someone to tell whether the refactoring affecting the current APIs will be enough for the new use case without supporting that use case; isn't it possible that some refactoring may prove to be unnecessary or wrong? It would be better to be more specific about what you mean by 'they are providing page as a return value in several cases where they don't need to' above. > > In my opinion with a small bit of refactoring patch 4 can just be > dropped. I don't think the renaming is necessary and it just adds > noise to the commit logs for the impacted drivers. It will require > tweaks to the other patches but I think it will be better that way in > the long run. It would be better to be more specific about the above too, so that we don't have to have more refactoring patchsets for the current APIs. > > Looking at patch 6 I am left scratching my head and wondering if you > have another build issue of some sort that you haven't mentioned. I > really don't think it belongs in this patch set and should probably be > a fix on its own if you have some reason to justify it.
Otherwise you > might also just look at refactoring it to take > "__builtin_constant_p(size)" into account by copying/pasting the first > bits from the generic version into the function since I am assuming > there is a performance benefit to doing it in assembler. It should be > a net win if you just add the accounting for constants. I am not sure if the commit log in patch 6 needs some rephrasing to answer your question above: "As the get_order() implemented by xtensa supporting 'nsau' instruction seems be the same as the generic implementation in include/asm-generic/getorder.h when size is not a constant value as the generic implementation calling the fls*() is also utilizing the 'nsau' instruction for xtensa. So remove the get_order() implemented by xtensa, as using the generic implementation may enable the compiler to do the computing when size is a constant value instead of runtime computing and enable the using of get_order() in BUILD_BUG_ON() macro in next patch." See below from the next patch, where PAGE_FRAG_CACHE_MAX_ORDER uses get_order(): BUILD_BUG_ON(PAGE_FRAG_CACHE_MAX_ORDER > PAGE_FRAG_CACHE_ORDER_MASK); > > Patch 9 could probably be a standalone patch or included in the more > mm centered set. However it would need to be redone to fix the > underlying issue rather than working around it by changing the > function called rather than fixing the function. No point in improving > it for one case when you can cover multiple cases with a single > change. Sure, it is just that there is only 24h a day for me to do things more generically. So perhaps I should remove patch 9 for now so that we can improve things more generically.
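For readers following the get_order() discussion: the generic implementation in include/asm-generic/getorder.h has roughly the shape sketched below (a user-space approximation; the sketch_get_order name and the hard-coded PAGE_SHIFT of 12 are assumptions for illustration). The __builtin_constant_p() branch is what allows a constant size to fold to an integer constant expression, which in turn is what lets get_order() appear in the BUILD_BUG_ON() quoted above.

```c
#include <limits.h>

#define PAGE_SHIFT 12 /* assumed 4K pages for this sketch */
#define BITS_PER_LONG ((int)(CHAR_BIT * sizeof(long)))

/* Sketch of the generic get_order(): the smallest 'order' such that
 * (1UL << order) pages cover 'size' bytes. */
static inline int sketch_get_order(unsigned long size)
{
	if (__builtin_constant_p(size)) {
		/* Constant-folded path: usable inside BUILD_BUG_ON(). */
		if (!size)
			return BITS_PER_LONG - PAGE_SHIFT;
		if (size < (1UL << PAGE_SHIFT))
			return 0;
		/* equivalent to ilog2(size - 1) - PAGE_SHIFT + 1 */
		return BITS_PER_LONG - __builtin_clzl(size - 1) - PAGE_SHIFT;
	}
	/* Runtime path: fls()-style bit search on the page count. */
	size--;
	size >>= PAGE_SHIFT;
	return size ? BITS_PER_LONG - __builtin_clzl(size) : 0;
}
```

The runtime path is the part the xtensa-specific version implemented with the 'nsau' instruction; per the commit log quoted above, fls*() also compiles to 'nsau' on xtensa, so dropping the custom version should cost nothing at runtime while gaining compile-time folding for constant sizes.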
On Mon, Jul 22, 2024 at 5:41 AM Yunsheng Lin <linyunsheng@huawei.com> wrote: > > On 2024/7/22 7:49, Alexander Duyck wrote: > > On Fri, Jul 19, 2024 at 2:36 AM Yunsheng Lin <linyunsheng@huawei.com> wrote: > >> > >> After [1], there are still two implementations for page frag: > >> > >> 1. mm/page_alloc.c: net stack seems to be using it in the > >> rx part with 'struct page_frag_cache' and the main API > >> being page_frag_alloc_align(). > >> 2. net/core/sock.c: net stack seems to be using it in the > >> tx part with 'struct page_frag' and the main API being > >> skb_page_frag_refill(). > >> > >> This patchset tries to unfiy the page frag implementation > >> by replacing page_frag with page_frag_cache for sk_page_frag() > >> first. net_high_order_alloc_disable_key for the implementation > >> in net/core/sock.c doesn't seems matter that much now as pcp > >> is also supported for high-order pages: > >> commit 44042b449872 ("mm/page_alloc: allow high-order pages to > >> be stored on the per-cpu lists") > >> > >> As the related change is mostly related to networking, so > >> targeting the net-next. And will try to replace the rest > >> of page_frag in the follow patchset. > > > > So in reality I would say something like the first 4 patches are > > probably more applicable to mm than they are to the net-next tree. > > Especially given that we are having to deal with the mm_task_types.h > > in order to sort out the include order issues. > > > > Given that I think it might make more sense to look at breaking this > > into 2 or more patch sets with the first being more mm focused since > > the test module and pulling the code out of page_alloc.c, gfp.h, and > > mm_types.h would be pretty impactful on mm more than it is on the > > networking stack. After those changes then I would agree that we are > > mostly just impacting the network stack. > > I am sure there are plenty of good precedents about how to handling a > patchset that affecting multi subsystems. 
> Let's be more specific about what are the options here: > 1. Keeping all changing as one patchset targetting the net-next tree > as this version does. > 2. Breaking all changing into two patchsets, the one affecting current APIs > targetting the mm tree and the one supporting new APIs targetting > net-next tree. > 3. Breaking all changing into two patchset as option 2 does, but both patchsets > targetting net-next tree to aovid waiting for the changing in mm tree > to merged back to net-next tree for adding supporting of new APIs. > > I am not sure your perference is option 2 or option 3 here, or there are others > options here, it would be better to be more specific about your option here. As > option 2 doesn't seems to make much sense if all the existing users/callers of > page_frag seems to be belonged to networking for testing reasons, and the original > code seemed to go through net-next tree too: > https://github.com/torvalds/linux/commit/b63ae8ca096dfdbfeef6a209c30a93a966518853 I am suggesting option 2. The main issue is that this patch set has had a number of issues that fall into the realm of mm more than netdev. The issue is that I only have a limited amount of time for review and I feel like having this be reviewed as a submission for mm would bring in more people familiar with that side of things to review it. As it stands, trying to submit this through netdev is eating up a significant amount of my time as there aren't many people on the netdev side of things that can review the mm bits. If you insist on this needing to go through net-next my inclination would be to just reject the set as it is bound to introduce a number of issues due to the sheer size of the refactor and the fact that it is providing little if any benefit. > And the main reason I chose option 1 over option 2 is: it is hard to tell how > much changing needed to support the new usecase, so it is better to keep them > in one patchset to have a bigger picture here. 
Yes, it may make the patchset > harder to review, but that is the tradeoff we need to make here. As my > understanding, option 1 seem to be the common practice to handle the changing > affecting multi subsystems. Especially you had similar doubt about the changing > affecting current APIs as below, it seems hard to explain it without a new case: > > https://lore.kernel.org/all/68d1c7d3dfcd780fa3bed0bb71e41d7fb0a8c15d.camel@gmail.com/ The issue as I see it is that you aren't getting any engagement from the folks on the mm side. In fact from what I can tell it looks like you didn't even CC this patch set to them. The patches I called out below are very memory subsystem centric. I would say this patchset has no way forward if the patches I called out below aren't reviewed by folks from the memory subsystem maintainers. > > > > ... > > > > > So specifically I would like to see patches 1 (refactored as > > selftest), 2, 3, 5, 7, 8, 13 (current APIs), and 14 done as more of an > > mm focused set since many of the issues you seem to have are problems > > building due to mm build issues, dependencies, and the like. That is > > the foundation for this patch set and it seems like we keep seeing > > issues there so that needs to be solid before we can do the new API > > work. If focused on mm you might get more eyes on it as not many > > networking folks are that familiar with the memory management side of > > things. > > I am not sure if breaking it into more patchset is the common practice > to 'get more eyes' here. > Anyways, it is fair enough ask if there is more concrete reasoning > behind the asking and it is common practice to do that, and I would > love to break it to more patchsets to perhaps make the discussion > easier. The issue here is that this patchset is 2/3 memory subsystem, and you didn't seem to include anyone from the memory subsystem side of things on the Cc list. 
> > > > As for the other patches, specifically 10, 11, 12, and 13 (prepare, > > probe, commit API), they could then be spun up as a netdev centered > > set. I took a brief look at them but they need some serious refactor > > as I think they are providing page as a return value in several cases > > where they don't need to. > > The above is one of the reason I am not willing to do the spliting. > It is hard for someone to tell if the refactoring affecting current APIs > will be enough for the new usecase without supporting the new usecase, > isn't it possible that some refactoring may be proved to be unnecessary > or wrong? > > It would be better to be more specific about what do you mean by > 'they are providing page as a return value in several cases where they > don't need to' as above. This patchset isn't moving forward in its current state. Part of the issue is that it is kind of an unwieldy mess and has been difficult to review due to things like refactoring code you had already refactored. Ideally each change should be self contained and you shouldn't have to change things more than once. That is why I have suggested splitting things the way I did. It would give you a logical set where you do the initial refactor to enable your changes, and then you make those changes. It is not uncommon to see this done within the kernel community. For example if I recall correctly the folio changes when in as a few patch sets in order to take care of the necessary enabling and then enable their use in the various subsystems. > > > > In my opinion with a small bit of refactoring patch 4 can just be > > dropped. I don't think the renaming is necessary and it just adds > > noise to the commit logs for the impacted drivers. It will require > > tweaks to the other patches but I think it will be better that way in > > the long run. > > It would be better to be more specific about above too so that we don't > have to have more refactoring patchsets for the current APIs. 
I provided the review feedback in the patch. Specifically, don't rename existing APIs. It would be better to just come up with an alternative scheme such as a double underscore that would represent the page based version while the regular version stays the same. > > > > Looking at patch 6 I am left scratching my head and wondering if you > > have another build issue of some sort that you haven't mentioned. I > > really don't think it belongs in this patch set and should probably be > > a fix on its own if you have some reason to justify it. Otherwise you > > might also just look at refactoring it to take > > "__builtin_constant_p(size)" into account by copying/pasting the first > > bits from the generic version into the function since I am assuming > > there is a performance benefit to doing it in assembler. It should be > > a net win if you just add the accounting for constants. > > I am not sure if the commit log in patch 6 needs some rephasing to > answer your question above: > "As the get_order() implemented by xtensa supporting 'nsau' > instruction seems be the same as the generic implementation > in include/asm-generic/getorder.h when size is not a constant > value as the generic implementation calling the fls*() is also > utilizing the 'nsau' instruction for xtensa. > > So remove the get_order() implemented by xtensa, as using the > generic implementation may enable the compiler to do the > computing when size is a constant value instead of runtime > computing and enable the using of get_order() in BUILD_BUG_ON() > macro in next patch." > > See the below in the next patch, as the PAGE_FRAG_CACHE_MAX_ORDER > is using the get_order(): > BUILD_BUG_ON(PAGE_FRAG_CACHE_MAX_ORDER > PAGE_FRAG_CACHE_ORDER_MASK); Are you saying that the compiler translates the get_order call into the nsau instruction? I'm still not entirely convinced and would really like to see a review by the maintainer for that architecture to be comfortable with it. 
Otherwise as I said before my thought would be to simply copy over the bits for __builtin_constant_p from the generic version of get_order so that we don't run the risk of somehow messing up the non-constant case. > > > > Patch 9 could probably be a standalone patch or included in the more > > mm centered set. However it would need to be redone to fix the > > underlying issue rather than working around it by changing the > > function called rather than fixing the function. No point in improving > > it for one case when you can cover multiple cases with a single > > change. > > Sure, it is just that there is only 24h a day for me to do things > more generically. So perhaps I should remove patch 9 for now so > that we can improve thing more generically. I'm not sure what that is supposed to mean. The change I am suggesting is no bigger than what you have already done. It would just mean fixing the issue at the source instead of working around the issue. Taking that approach would yield a much better return than just doing the workaround. I could make the same argument about reviewing this patch set. I feel like I only have so much time in the day. I have already caught a few places where you were circumventing issues instead of addressing them such as using macros to cover up #include ordering issues resulting in static inline functions blowing up. It feels like labeling this as a networking patch set is an attempt to circumvent working with the mm tree by going in and touching as much networking code as you can to claim this is a networking patch when only 3 patches (5, 10 and 12) really need to touch anything in networking. I am asking you to consider my suggestions for your own benefit as otherwise I am pretty much the only reviewer for these patches and the fact is I am not a regular contributor within the mm subsystem myself.
I would really like to have input from the mm subsystem maintainer on things like your first patch which is adding a new test module to the mm tree currently. I am assuming that they wouldn't want us to place the test module in there, but I could be wrong. That is why I am suggesting breaking this up and submitting the mm bits as more mm focused so that we can get that additional input.
+cc Andrew Morton On 2024/7/22 23:21, Alexander Duyck wrote: > On Mon, Jul 22, 2024 at 5:41 AM Yunsheng Lin <linyunsheng@huawei.com> wrote: >> >> On 2024/7/22 7:49, Alexander Duyck wrote: >>> On Fri, Jul 19, 2024 at 2:36 AM Yunsheng Lin <linyunsheng@huawei.com> wrote: >>>> >>>> After [1], there are still two implementations for page frag: >>>> >>>> 1. mm/page_alloc.c: net stack seems to be using it in the >>>> rx part with 'struct page_frag_cache' and the main API >>>> being page_frag_alloc_align(). >>>> 2. net/core/sock.c: net stack seems to be using it in the >>>> tx part with 'struct page_frag' and the main API being >>>> skb_page_frag_refill(). >>>> >>>> This patchset tries to unfiy the page frag implementation >>>> by replacing page_frag with page_frag_cache for sk_page_frag() >>>> first. net_high_order_alloc_disable_key for the implementation >>>> in net/core/sock.c doesn't seems matter that much now as pcp >>>> is also supported for high-order pages: >>>> commit 44042b449872 ("mm/page_alloc: allow high-order pages to >>>> be stored on the per-cpu lists") >>>> >>>> As the related change is mostly related to networking, so >>>> targeting the net-next. And will try to replace the rest >>>> of page_frag in the follow patchset. >>> >>> So in reality I would say something like the first 4 patches are >>> probably more applicable to mm than they are to the net-next tree. >>> Especially given that we are having to deal with the mm_task_types.h >>> in order to sort out the include order issues. >>> >>> Given that I think it might make more sense to look at breaking this >>> into 2 or more patch sets with the first being more mm focused since >>> the test module and pulling the code out of page_alloc.c, gfp.h, and >>> mm_types.h would be pretty impactful on mm more than it is on the >>> networking stack. After those changes then I would agree that we are >>> mostly just impacting the network stack. 
>> I am sure there are plenty of good precedents about how to handling a >> patchset that affecting multi subsystems. >> Let's be more specific about what are the options here: >> 1. Keeping all changing as one patchset targetting the net-next tree >> as this version does. >> 2. Breaking all changing into two patchsets, the one affecting current APIs >> targetting the mm tree and the one supporting new APIs targetting >> net-next tree. >> 3. Breaking all changing into two patchset as option 2 does, but both patchsets >> targetting net-next tree to aovid waiting for the changing in mm tree >> to merged back to net-next tree for adding supporting of new APIs. >> >> I am not sure your perference is option 2 or option 3 here, or there are others >> options here, it would be better to be more specific about your option here. As >> option 2 doesn't seems to make much sense if all the existing users/callers of >> page_frag seems to be belonged to networking for testing reasons, and the original >> code seemed to go through net-next tree too: >> https://github.com/torvalds/linux/commit/b63ae8ca096dfdbfeef6a209c30a93a966518853 > > I am suggesting option 2. The main issue is that this patch set has > had a number of issues that fall into the realm of mm more than > netdev. The issue is that I only have a limited amount of time for > review and I feel like having this be reviewed as a submission for mm > would bring in more people familiar with that side of things to review > it. I agree with the 'bring in more people familiar with that side of things to review it' part, but I am just not sure whether asking for option 2 is reasonable or a fair enough ask, as breaking this patchset into two patchsets may make it harder to discuss the reasoning behind the refactoring affecting the current APIs, and may make reviewing harder too. It seems we might need to consider option 4 too: 4. Keeping all changes as one patchset targeting the mm tree.
One of the problems I see with option 4 is that it makes it harder to use the CI from netdev. > > As it stands, trying to submit this through netdev is eating up a > significant amount of my time as there aren't many people on the > netdev side of things that can review the mm bits. If you insist on > this needing to go through net-next my inclination would be to just > reject the set as it is bound to introduce a number of issues due to > the sheer size of the refactor and the fact that it is providing > little if any benefit. > >> And the main reason I chose option 1 over option 2 is: it is hard to tell how >> much changing needed to support the new usecase, so it is better to keep them >> in one patchset to have a bigger picture here. Yes, it may make the patchset >> harder to review, but that is the tradeoff we need to make here. As my >> understanding, option 1 seem to be the common practice to handle the changing >> affecting multi subsystems. Especially you had similar doubt about the changing >> affecting current APIs as below, it seems hard to explain it without a new case: >> >> https://lore.kernel.org/all/68d1c7d3dfcd780fa3bed0bb71e41d7fb0a8c15d.camel@gmail.com/ > > The issue as I see it is that you aren't getting any engagement from > the folks on the mm side. In fact from what I can tell it looks like > you didn't even CC this patch set to them. The patches I called out Below are the gitconfig and cmd used to send the patchset. I am not sure if there is someone specific that you have in mind that needs CC'ing, as the mailing list for the mm subsystem seems to be cc'ed for each mm-focused patch you mentioned, and Andrew Morton is CC'ed on most of the mm-focused patches too.
[sendemail.netdev]
    to = "davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com"
    cc = "netdev@vger.kernel.org, linux-kernel@vger.kernel.org"
    cccmd = "/home/*/net-next/scripts/get_maintainer.pl --nogit --nogit-fallback --norolestats"

git send-email --identity=netdev

> below are very memory subsystem centric. I would say this patchset has
> no way forward if the patches I called out below aren't reviewed by
> folks from the memory subsystem maintainers.

Andrew Morton is the maintainer of the mm subsystem according to the
MAINTAINERS file; let's see if we can get some feedback on his
preferred option for handling this patchset by CC'ing him.

>
>>>
>> ...
>>
>>> So specifically I would like to see patches 1 (refactored as
>>> selftest), 2, 3, 5, 7, 8, 13 (current APIs), and 14 done as more of an
>>> mm focused set since many of the issues you seem to have are problems
>>> building due to mm build issues, dependencies, and the like. That is
>>> the foundation for this patch set and it seems like we keep seeing
>>> issues there so that needs to be solid before we can do the new API
>>> work. If focused on mm you might get more eyes on it as not many
>>> networking folks are that familiar with the memory management side of
>>> things.
>>
>> I am not sure if breaking it into more patchsets is the common
>> practice to 'get more eyes' here.
>> Anyway, it is a fair enough ask if there is concrete reasoning behind
>> it and it is common practice to do so, and I would be happy to break
>> it into more patchsets to perhaps make the discussion easier.
>
> The issue here is that this patchset is 2/3 memory subsystem, and you
> didn't seem to include anyone from the memory subsystem side of things
> on the Cc list.

I am not familiar enough with the mm subsystem yet, so I depended on
get_maintainer.pl to sort out the CC list. If there is a CC list in
your mind, please give the list, and I would be very grateful and happy
to include them in the next version.
>
>>> As for the other patches, specifically 10, 11, 12, and 13 (prepare,
>>> probe, commit API), they could then be spun up as a netdev centered
>>> set. I took a brief look at them but they need some serious refactor
>>> as I think they are providing page as a return value in several cases
>>> where they don't need to.
>>
>> The above is one of the reasons I am not willing to do the splitting.
>> It is hard for someone to tell whether the refactoring affecting the
>> current APIs will be enough for the new use case without supporting
>> the new use case; isn't it possible that some of the refactoring may
>> turn out to be unnecessary or wrong?
>>
>> It would be better to be more specific about what you mean by 'they
>> are providing page as a return value in several cases where they
>> don't need to' above.
>
> This patchset isn't moving forward in its current state. Part of the
> issue is that it is kind of an unwieldy mess and has been difficult to
> review due to things like refactoring code you had already refactored.
> Ideally each change should be self contained and you shouldn't have to
> change things more than once. That is why I have suggested splitting
> things the way I did. It would give you a logical set where you do the
> initial refactor to enable your changes, and then you make those
> changes. It is not uncommon to see this done within the kernel
> community. For example if I recall correctly the folio changes went in
> as a few patch sets in order to take care of the necessary enabling
> and then enable their use in the various subsystems.

The first folio changes did seem to come with a use case, as below:
"This converts just parts of the core MM and the page cache. For 5.17,
we intend to convert various filesystems (XFS and AFS are ready; other
filesystems may make it) and also convert more of the MM and page
cache to folios. For 5.18, multi-page folios should be ready."
It is just that the use case for folio happened to be in the mm
subsystem, but it seems we can't do that for page_frag, as the current
and likely future use cases are mostly networking related. I suppose
that is one of the reasons there is not much engagement from the mm
side, and I suspect there will not be as much engagement as we expect
even if we target the mm tree.

https://github.com/torvalds/linux/commit/49f8275c7d92

>
>>> In my opinion with a small bit of refactoring patch 4 can just be
>>> dropped. I don't think the renaming is necessary and it just adds
>>> noise to the commit logs for the impacted drivers. It will require
>>> tweaks to the other patches but I think it will be better that way in
>>> the long run.
>>
>> It would be better to be more specific about the above too, so that we
>> don't have to have more refactoring patchsets for the current APIs.
>
> I provided the review feedback in the patch. Specifically, don't
> rename existing APIs. It would be better to just come up with an
> alternative scheme such as a double underscore that would represent
> the page based version while the regular version stays the same.

It seems you provided similar feedback in v2, but it seems we need
three APIs for the different use cases, at least on the allocation
side:
"Depending on different use cases, callers expecting to deal with va,
page or both va and page for them may call page_frag_alloc_va*,
page_frag_alloc_pg*, or page_frag_alloc* API accordingly."
https://lore.kernel.org/all/18ca19fa64267b84bee10473a81cbc63f53104a0.camel@gmail.com/

And yes, you suggested dropping it, but that does not make the
disagreement disappear; we still need to figure out an appropriate
naming for the new APIs that we are both OK with.

>
>>>
>>> Looking at patch 6 I am left scratching my head and wondering if you
>>> have another build issue of some sort that you haven't mentioned.
>>> I really don't think it belongs in this patch set and should probably
>>> be a fix on its own if you have some reason to justify it. Otherwise
>>> you might also just look at refactoring it to take
>>> "__builtin_constant_p(size)" into account by copying/pasting the first
>>> bits from the generic version into the function since I am assuming
>>> there is a performance benefit to doing it in assembler. It should be
>>> a net win if you just add the accounting for constants.
>>
>> I am not sure if the commit log in patch 6 needs some rephrasing to
>> answer your question above:
>> "The get_order() implemented by xtensa using the 'nsau'
>> instruction seems to be the same as the generic implementation
>> in include/asm-generic/getorder.h when size is not a constant
>> value, as the generic implementation calling fls*() also
>> utilizes the 'nsau' instruction on xtensa.
>>
>> So remove the get_order() implemented by xtensa, as using the
>> generic implementation may enable the compiler to do the
>> computing when size is a constant value instead of computing at
>> runtime, and enables the use of get_order() in the BUILD_BUG_ON()
>> macro in the next patch."
>>
>> See the below in the next patch, as PAGE_FRAG_CACHE_MAX_ORDER
>> uses get_order():
>> BUILD_BUG_ON(PAGE_FRAG_CACHE_MAX_ORDER > PAGE_FRAG_CACHE_ORDER_MASK);
>
> Are you saying that the compiler translates the get_order call into
> the nsau instruction? I'm still not entirely convinced and would
> really like to see a review by the maintainer for that architecture to
> be comfortable with it.
That patch does carry an Acked-by tag from Max Filippov, who is the
maintainer for the xtensa architecture according to the MAINTAINERS
file:
https://lore.kernel.org/all/CAMo8BfJ5KwXjFDKGs2oBSTH1C7Vnnsbvcm6-qfV5gYh30+VvUQ@mail.gmail.com/

>
> Otherwise as I said before my thought would be to simply copy over the
> bits for __builtin_constant_p from the generic version of get_order so
> that we don't run the risk of somehow messing up the non-constant
> case.
>
>>> Patch 9 could probably be a standalone patch or included in the more
>>> mm centered set. However it would need to be redone to fix the
>>> underlying issue rather than working around it by changing the
>>> function called rather than fixing the function. No point in improving
>>> it for one case when you can cover multiple cases with a single
>>> change.
>>
>> Sure, it is just that there are only 24 hours in a day for me to do
>> things more generically. So perhaps I should drop patch 9 for now so
>> that we can improve things more generically later.
>
> I'm not sure what that is supposed to mean. The change I am suggesting
> is no bigger than what you have already done. It would just mean
> fixing the issue at the source instead of working around the issue.
> Taking that approach would yield a much better return than just doing
> the workaround.
>
> I could make the same argument about reviewing this patch set. I feel
> like I only have so much time in the day. I have already caught a

Yes, I think we need to consider things from each other's perspective.
We can always slow things down by sending a message to notify each
other when there are busy days.

> few places where you were circumventing issues instead of addressing
> them such as using macros to cover up #include ordering issues
> resulting in static inline functions blowing up. It feels like
> labeling this as a networking patch set is an attempt to circumvent

Circumventing is never my intention.
In fact, I think it is better to catch mistakes during review than to
have to debug them in the field, and better ideas sometimes come out of
the discussion. As a matter of fact, the idea of reusing existing space
to reduce the size of 'page_frag_cache' came out of the discussion with
you in the 'remove page frag implementation in vhost_net' patchset.

> working with the mm tree by going in and touching as much networking
> code as you can to claim this is a networking patch when only 3
> patches (5, 10 and 12) really need to touch anything in networking.
>
> I am asking you to consider my suggestions for your own benefit as
> otherwise I am pretty much the only reviewer for these patches and the

Thanks for the time and effort spent reviewing, but there are still
other reviewers, mainly Mat, Subbaraya and Max.

> fact is I am not a regular contributor within the mm subsystem myself.
> I would really like to have input from the mm subsystem maintainer on
> things like your first patch which is adding a new test module to the
> mm tree currently. I am assuming that they wouldn't want us to place
> the test module in there, but I could be wrong. That is why I am
> suggesting breaking this up and submitting the mm bits as more mm
> focused so that we can get that additional input.