Message ID | cover.1626879395.git.pabeni@redhat.com (mailing list archive) |
---|---|
Headers | show |
Series | sk_buff: optimize layout for GRO | expand |
On 7/21/2021 9:44 AM, Paolo Abeni wrote: > This is a very early draft - in a different world would be > replaced by hallway discussion at in-person conference - aimed at > outlining some ideas and collect feedback on the overall outlook. > There are still bugs to be fixed, more test and benchmark need, etc. > > There are 3 main goals: > - [try to] avoid the overhead for uncommon conditions at GRO time > (patches 1-4) > - enable backpressure for the veth GRO path (patches 5-6) > - reduce the number of cacheline used by the sk_buff lifecycle > from 4 to 3, at least in some common scenarios (patches 1,7-9). > The idea here is avoid the initialization of some fields and > control their validity with a bitmask, as presented by at least > Florian and Jesper in the past. If I understand correctly, you're creating an optimized case which excludes ct, secmark, vlan and UDP tunnel. Is this correct, and if so, why those particular fields? What impact will this have in the non-optimal (with any of the excluded fields) case? > > The above requires a bit of code churn in some places and, yes, > a few new bits in the sk_buff struct (using some existing holes) > > Paolo Abeni (9): > sk_buff: track nfct status in newly added skb->_state > sk_buff: track dst status in skb->_state > sk_buff: move the active_extensions into the state bitfield > net: optimize GRO for the common case. > skbuff: introduce has_sk state bit. > veth: use skb_prepare_for_gro() > sk_buff: move inner header fields after tail > sk_buff: move vlan field after tail. > sk_buff: access secmark via getter/setter > > drivers/net/veth.c | 2 +- > include/linux/skbuff.h | 117 ++++++++++++++++++++++--------- > include/net/dst.h | 3 + > include/net/sock.h | 9 +++ > net/core/dev.c | 31 +++++--- > net/core/skbuff.c | 40 +++++++---- > net/netfilter/nfnetlink_queue.c | 6 +- > net/netfilter/nft_meta.c | 6 +- > net/netfilter/xt_CONNSECMARK.c | 8 +-- > net/netfilter/xt_SECMARK.c | 2 +- > security/apparmor/lsm.c | 15 ++-- > security/selinux/hooks.c | 10 +-- > security/smack/smack_lsm.c | 4 +- > security/smack/smack_netfilter.c | 4 +- > 14 files changed, 175 insertions(+), 82 deletions(-) >
Hello, On Wed, 2021-07-21 at 11:15 -0700, Casey Schaufler wrote: > On 7/21/2021 9:44 AM, Paolo Abeni wrote: > > This is a very early draft - in a different world would be > > replaced by hallway discussion at in-person conference - aimed at > > outlining some ideas and collect feedback on the overall outlook. > > There are still bugs to be fixed, more test and benchmark need, etc. > > > > There are 3 main goals: > > - [try to] avoid the overhead for uncommon conditions at GRO time > > (patches 1-4) > > - enable backpressure for the veth GRO path (patches 5-6) > > - reduce the number of cacheline used by the sk_buff lifecycle > > from 4 to 3, at least in some common scenarios (patches 1,7-9). > > The idea here is avoid the initialization of some fields and > > control their validity with a bitmask, as presented by at least > > Florian and Jesper in the past. > > If I understand correctly, you're creating an optimized case > which excludes ct, secmark, vlan and UDP tunnel. Is this correct, > and if so, why those particular fields? What impact will this have > in the non-optimal (with any of the excluded fields) case? Thank you for the feedback. There are 2 different relevant points: - the GRO stage. packets carring any of CT, dst, sk or skb_ext will do 2 additional conditionals per gro_receive WRT the current code. My understanding is that having any of such field set at GRO receive time is quite exceptional for real nic. All others packet will do 4 or 5 less conditionals, and will traverse a little less code. - sk_buff lifecycle * packets carrying vlan and UDP will not see any differences: sk_buff lifecycle will stil use 4 cachelines, as currently does, and no additional conditional is introduced. * packets carring nfct or secmark will see an additional conditional every time such field is accessed. The number of cacheline used will still be 4, as in the current code. My understanding is that when such access happens, there is already a relevant amount of "additional" code to be executed, the conditional overhead should not be measurable. Cheers, Paolo
On 7/22/2021 12:10 AM, Paolo Abeni wrote: > Hello, > > On Wed, 2021-07-21 at 11:15 -0700, Casey Schaufler wrote: >> On 7/21/2021 9:44 AM, Paolo Abeni wrote: >>> This is a very early draft - in a different world would be >>> replaced by hallway discussion at in-person conference - aimed at >>> outlining some ideas and collect feedback on the overall outlook. >>> There are still bugs to be fixed, more test and benchmark need, etc. >>> >>> There are 3 main goals: >>> - [try to] avoid the overhead for uncommon conditions at GRO time >>> (patches 1-4) >>> - enable backpressure for the veth GRO path (patches 5-6) >>> - reduce the number of cacheline used by the sk_buff lifecycle >>> from 4 to 3, at least in some common scenarios (patches 1,7-9). >>> The idea here is avoid the initialization of some fields and >>> control their validity with a bitmask, as presented by at least >>> Florian and Jesper in the past. >> If I understand correctly, you're creating an optimized case >> which excludes ct, secmark, vlan and UDP tunnel. Is this correct, >> and if so, why those particular fields? What impact will this have >> in the non-optimal (with any of the excluded fields) case? > Thank you for the feedback. You're most welcome. You did request comments. > > There are 2 different relevant points: > > - the GRO stage. > packets carring any of CT, dst, sk or skb_ext will do 2 additional > conditionals per gro_receive WRT the current code. My understanding is > that having any of such field set at GRO receive time is quite > exceptional for real nic. All others packet will do 4 or 5 less > conditionals, and will traverse a little less code. > > - sk_buff lifecycle > * packets carrying vlan and UDP will not see any differences: sk_buff > lifecycle will stil use 4 cachelines, as currently does, and no > additional conditional is introduced. > * packets carring nfct or secmark will see an additional conditional > every time such field is accessed. The number of cacheline used will > still be 4, as in the current code. My understanding is that when such > access happens, there is already a relevant amount of "additional" code > to be executed, the conditional overhead should not be measurable. I'm responsible for some of that "additonal" code. If the secmark is considered to be outside the performance critical data there are changes I would like to make that will substantially improve the performance of that "additional" code that would include a u64 secmark. If use of a secmark is considered indicative of a "slow" path, the rationale for restricting it to u32, that it might impact the "usual" case performance, seems specious. I can't say that I understand all the nuances and implications involved. It does appear that the changes you've suggested could negate the classic argument that requires the u32 secmark. > > Cheers, > > Paolo >
On Thu, 2021-07-22 at 09:04 -0700, Casey Schaufler wrote: > On 7/22/2021 12:10 AM, Paolo Abeni wrote: > > On Wed, 2021-07-21 at 11:15 -0700, Casey Schaufler wrote: > > > On 7/21/2021 9:44 AM, Paolo Abeni wrote: > > > > This is a very early draft - in a different world would be > > > > replaced by hallway discussion at in-person conference - aimed at > > > > outlining some ideas and collect feedback on the overall outlook. > > > > There are still bugs to be fixed, more test and benchmark need, etc. > > > > > > > > There are 3 main goals: > > > > - [try to] avoid the overhead for uncommon conditions at GRO time > > > > (patches 1-4) > > > > - enable backpressure for the veth GRO path (patches 5-6) > > > > - reduce the number of cacheline used by the sk_buff lifecycle > > > > from 4 to 3, at least in some common scenarios (patches 1,7-9). > > > > The idea here is avoid the initialization of some fields and > > > > control their validity with a bitmask, as presented by at least > > > > Florian and Jesper in the past. > > > If I understand correctly, you're creating an optimized case > > > which excludes ct, secmark, vlan and UDP tunnel. Is this correct, > > > and if so, why those particular fields? What impact will this have > > > in the non-optimal (with any of the excluded fields) case? > > Thank you for the feedback. > > You're most welcome. You did request comments. > > > There are 2 different relevant points: > > > > - the GRO stage. > > packets carring any of CT, dst, sk or skb_ext will do 2 additional > > conditionals per gro_receive WRT the current code. My understanding is > > that having any of such field set at GRO receive time is quite > > exceptional for real nic. All others packet will do 4 or 5 less > > conditionals, and will traverse a little less code. > > > > - sk_buff lifecycle > > * packets carrying vlan and UDP will not see any differences: sk_buff > > lifecycle will stil use 4 cachelines, as currently does, and no > > additional conditional is introduced. > > * packets carring nfct or secmark will see an additional conditional > > every time such field is accessed. The number of cacheline used will > > still be 4, as in the current code. My understanding is that when such > > access happens, there is already a relevant amount of "additional" code > > to be executed, the conditional overhead should not be measurable. > > I'm responsible for some of that "additonal" code. If the secmark > is considered to be outside the performance critical data there are > changes I would like to make that will substantially improve the > performance of that "additional" code that would include a u64 > secmark. If use of a secmark is considered indicative of a "slow" > path, the rationale for restricting it to u32, that it might impact > the "usual" case performance, seems specious. I can't say that I > understand all the nuances and implications involved. It does > appear that the changes you've suggested could negate the classic > argument that requires the u32 secmark. I see now I did not reply to one of you questions - why I picked-up vlan, tunnel secmark fields to move them at sk_buff tail. Tow main drivers on my side: - there are use cases/deployments that do not use them. - moving them around was doable in term of required changes. There are no "slow-path" implications on my side. For example, vlan_* fields are very critical performance wise, if the traffic is tagged. But surely there are busy servers not using tagget traffic which will enjoy the reduced cachelines footprint, and this changeset will not impact negatively the first case. WRT to the vlan example, secmark and nfct require an extra conditional to fetch the data. My understanding is that such additional conditional is not measurable performance-wise when benchmarking the security modules (or conntrack) because they have to do much more intersting things after fetching a few bytes from an already hot cacheline. Not sure if the above somehow clarify my statements. As for expanding secmark to 64 bits, I guess that could be an interesting follow-up discussion :) Cheers, Paolo
On Thu, Jul 22, 2021 at 12:59 PM Paolo Abeni <pabeni@redhat.com> wrote: > On Thu, 2021-07-22 at 09:04 -0700, Casey Schaufler wrote: > > On 7/22/2021 12:10 AM, Paolo Abeni wrote: > > > On Wed, 2021-07-21 at 11:15 -0700, Casey Schaufler wrote: > > > > On 7/21/2021 9:44 AM, Paolo Abeni wrote: > > > > > This is a very early draft - in a different world would be > > > > > replaced by hallway discussion at in-person conference - aimed at > > > > > outlining some ideas and collect feedback on the overall outlook. > > > > > There are still bugs to be fixed, more test and benchmark need, etc. > > > > > > > > > > There are 3 main goals: > > > > > - [try to] avoid the overhead for uncommon conditions at GRO time > > > > > (patches 1-4) > > > > > - enable backpressure for the veth GRO path (patches 5-6) > > > > > - reduce the number of cacheline used by the sk_buff lifecycle > > > > > from 4 to 3, at least in some common scenarios (patches 1,7-9). > > > > > The idea here is avoid the initialization of some fields and > > > > > control their validity with a bitmask, as presented by at least > > > > > Florian and Jesper in the past. > > > > If I understand correctly, you're creating an optimized case > > > > which excludes ct, secmark, vlan and UDP tunnel. Is this correct, > > > > and if so, why those particular fields? What impact will this have > > > > in the non-optimal (with any of the excluded fields) case? > > > Thank you for the feedback. > > > > You're most welcome. You did request comments. > > > > > There are 2 different relevant points: > > > > > > - the GRO stage. > > > packets carring any of CT, dst, sk or skb_ext will do 2 additional > > > conditionals per gro_receive WRT the current code. My understanding is > > > that having any of such field set at GRO receive time is quite > > > exceptional for real nic. All others packet will do 4 or 5 less > > > conditionals, and will traverse a little less code. > > > > > > - sk_buff lifecycle > > > * packets carrying vlan and UDP will not see any differences: sk_buff > > > lifecycle will stil use 4 cachelines, as currently does, and no > > > additional conditional is introduced. > > > * packets carring nfct or secmark will see an additional conditional > > > every time such field is accessed. The number of cacheline used will > > > still be 4, as in the current code. My understanding is that when such > > > access happens, there is already a relevant amount of "additional" code > > > to be executed, the conditional overhead should not be measurable. > > > > I'm responsible for some of that "additonal" code. If the secmark > > is considered to be outside the performance critical data there are > > changes I would like to make that will substantially improve the > > performance of that "additional" code that would include a u64 > > secmark. If use of a secmark is considered indicative of a "slow" > > path, the rationale for restricting it to u32, that it might impact > > the "usual" case performance, seems specious. I can't say that I > > understand all the nuances and implications involved. It does > > appear that the changes you've suggested could negate the classic > > argument that requires the u32 secmark. > > I see now I did not reply to one of you questions - why I picked-up > vlan, tunnel secmark fields to move them at sk_buff tail. > > Tow main drivers on my side: > - there are use cases/deployments that do not use them. > - moving them around was doable in term of required changes. > > There are no "slow-path" implications on my side. For example, vlan_* > fields are very critical performance wise, if the traffic is tagged. > But surely there are busy servers not using tagget traffic which will > enjoy the reduced cachelines footprint, and this changeset will not > impact negatively the first case. > > WRT to the vlan example, secmark and nfct require an extra conditional > to fetch the data. My understanding is that such additional conditional > is not measurable performance-wise when benchmarking the security > modules (or conntrack) because they have to do much more intersting > things after fetching a few bytes from an already hot cacheline. > > Not sure if the above somehow clarify my statements. > > As for expanding secmark to 64 bits, I guess that could be an > interesting follow-up discussion :) The intersection between netdev and the LSM has a long and somewhat tortured past with each party making sacrifices along the way to get where we are at today. It is far from perfect, at least from a LSM perspective, but it is what we've got and since performance is usually used as a club to beat back any changes proposed by the LSM side, I would like to object to these changes that negatively impact the LSM performance without some concession in return. It has been a while since Casey and I have spoken about this, but I think the prefered option would be to exchange the current __u32 "sk_buff.secmark" field with a void* "sk_buff.security" field, like so many other kernel level objects. Previous objections have eventually boiled down to the additional space in the sk_buff for the extra bits (there is some additional editorializing that could be done here, but I'll refrain), but based on the comments thus far in this thread it sounds like perhaps we can now make a deal here: move the LSM field down to a "colder" cacheline in exchange for converting the LSM field to a proper pointer. Thoughts?
Paul Moore <paul@paul-moore.com> wrote: > Tow main drivers on my side: > > - there are use cases/deployments that do not use them. > > - moving them around was doable in term of required changes. > > > > There are no "slow-path" implications on my side. For example, vlan_* > > fields are very critical performance wise, if the traffic is tagged. > > But surely there are busy servers not using tagget traffic which will > > enjoy the reduced cachelines footprint, and this changeset will not > > impact negatively the first case. > > > > WRT to the vlan example, secmark and nfct require an extra conditional > > to fetch the data. My understanding is that such additional conditional > > is not measurable performance-wise when benchmarking the security > > modules (or conntrack) because they have to do much more intersting > > things after fetching a few bytes from an already hot cacheline. > > > > Not sure if the above somehow clarify my statements. > > > > As for expanding secmark to 64 bits, I guess that could be an > > interesting follow-up discussion :) > > The intersection between netdev and the LSM has a long and somewhat > tortured past with each party making sacrifices along the way to get > where we are at today. It is far from perfect, at least from a LSM > perspective, but it is what we've got and since performance is usually > used as a club to beat back any changes proposed by the LSM side, I > would like to object to these changes that negatively impact the LSM > performance without some concession in return. It has been a while > since Casey and I have spoken about this, but I think the prefered > option would be to exchange the current __u32 "sk_buff.secmark" field > with a void* "sk_buff.security" field, like so many other kernel level > objects. Previous objections have eventually boiled down to the > additional space in the sk_buff for the extra bits (there is some > additional editorializing that could be done here, but I'll refrain), > but based on the comments thus far in this thread it sounds like > perhaps we can now make a deal here: move the LSM field down to a > "colder" cacheline in exchange for converting the LSM field to a > proper pointer. > > Thoughts? Is there a summary disucssion somewhere wrt. what exactly LSMs need? There is the skb extension infra, does that work for you?
On Sat, Jul 24, 2021 at 2:51 PM Florian Westphal <fw@strlen.de> wrote: > Paul Moore <paul@paul-moore.com> wrote: > > Tow main drivers on my side: > > > - there are use cases/deployments that do not use them. > > > - moving them around was doable in term of required changes. > > > > > > There are no "slow-path" implications on my side. For example, vlan_* > > > fields are very critical performance wise, if the traffic is tagged. > > > But surely there are busy servers not using tagget traffic which will > > > enjoy the reduced cachelines footprint, and this changeset will not > > > impact negatively the first case. > > > > > > WRT to the vlan example, secmark and nfct require an extra conditional > > > to fetch the data. My understanding is that such additional conditional > > > is not measurable performance-wise when benchmarking the security > > > modules (or conntrack) because they have to do much more intersting > > > things after fetching a few bytes from an already hot cacheline. > > > > > > Not sure if the above somehow clarify my statements. > > > > > > As for expanding secmark to 64 bits, I guess that could be an > > > interesting follow-up discussion :) > > > > The intersection between netdev and the LSM has a long and somewhat > > tortured past with each party making sacrifices along the way to get > > where we are at today. It is far from perfect, at least from a LSM > > perspective, but it is what we've got and since performance is usually > > used as a club to beat back any changes proposed by the LSM side, I > > would like to object to these changes that negatively impact the LSM > > performance without some concession in return. It has been a while > > since Casey and I have spoken about this, but I think the prefered > > option would be to exchange the current __u32 "sk_buff.secmark" field > > with a void* "sk_buff.security" field, like so many other kernel level > > objects. Previous objections have eventually boiled down to the > > additional space in the sk_buff for the extra bits (there is some > > additional editorializing that could be done here, but I'll refrain), > > but based on the comments thus far in this thread it sounds like > > perhaps we can now make a deal here: move the LSM field down to a > > "colder" cacheline in exchange for converting the LSM field to a > > proper pointer. > > > > Thoughts? > > Is there a summary disucssion somewhere wrt. what exactly LSMs need? My network access is limited for the next week so I don't have the ability to dig through the list archives, but if you look through the netdev/LSM/lists over the past decade (maybe go back ~15 years?) you will see multiple instances where we/I've brought up different solutions with the netdev folks only to hit a brick wall. The LSM ask for sk_buff is really the same as any other kernel object that we want to control with LSM access controls, e.g. inodes; we basically want a void* blob with the necessary hooks so that the opaque blob can be managed through the skb's lifetime. > There is the skb extension infra, does that work for you? I was hopeful that when the skb_ext capability was introduced we might be able to use it for the LSM(s), but when I asked netdev if they would be willing to accept patches to leverage the skb_ext infrastructure I was told "no".
Paul Moore <paul@paul-moore.com> wrote: > > There is the skb extension infra, does that work for you? > > I was hopeful that when the skb_ext capability was introduced we might > be able to use it for the LSM(s), but when I asked netdev if they > would be willing to accept patches to leverage the skb_ext > infrastructure I was told "no". I found https://lore.kernel.org/netdev/CAHC9VhSz1_KA1tCJtNjwK26BOkGhKGbPT7v1O82mWPduvWwd4A@mail.gmail.com/#r and from what I gather from your comments and that of Casey I think skb extensions is the correct thing for this (i.e., needs netlabel/secid config/enablement so typically won't be active on a distro kernel by default). It certainly makes more sense to me than doing lookups in a hashtable based on a ID (I tried to do that to get rid of skb->nf_bridge pointer years ago and it I could not figure out how to invalidate an entry without adding a new skb destructor callback).
On 7/25/2021 9:25 AM, Florian Westphal wrote: > Paul Moore <paul@paul-moore.com> wrote: >>> There is the skb extension infra, does that work for you? >> I was hopeful that when the skb_ext capability was introduced we might >> be able to use it for the LSM(s), but when I asked netdev if they >> would be willing to accept patches to leverage the skb_ext >> infrastructure I was told "no". > I found > > https://lore.kernel.org/netdev/CAHC9VhSz1_KA1tCJtNjwK26BOkGhKGbPT7v1O82mWPduvWwd4A@mail.gmail.com/#r > > and from what I gather from your comments and that of Casey > I think skb extensions is the correct thing for this (i.e., needs > netlabel/secid config/enablement so typically won't be active on > a distro kernel by default). RedHat and android use SELinux and will want this. Ubuntu doesn't yet, but netfilter in in the AppArmor task list. Tizen definitely uses it with Smack. The notion that security modules are only used in fringe cases is antiquated. > It certainly makes more sense to me than doing lookups > in a hashtable based on a ID Agreed. The data burden required to support a hash scheme for the security module stacking case is staggering. > (I tried to do that to get rid of skb->nf_bridge > pointer years ago and it I could not figure out how to invalidate an entry > without adding a new skb destructor callback).
Casey Schaufler <casey@schaufler-ca.com> wrote: > RedHat and android use SELinux and will want this. Ubuntu doesn't > yet, but netfilter in in the AppArmor task list. Tizen definitely > uses it with Smack. The notion that security modules are only used > in fringe cases is antiquated. I was not talking about LSM in general, I was referring to the extended info that Paul mentioned. If thats indeed going to be used on every distro then skb extensions are not suitable for this, it would result in extr akmalloc for every skb. > > It certainly makes more sense to me than doing lookups > > in a hashtable based on a ID > > Agreed. The data burden required to support a hash scheme > for the security module stacking case is staggering. It depends on the type of data (and its lifetime). I suspect you have something that is more like skb->dev/dst, i.e. reference to object that persists after the skb is free'd.
On 7/25/2021 3:52 PM, Florian Westphal wrote: > Casey Schaufler <casey@schaufler-ca.com> wrote: >> RedHat and android use SELinux and will want this. Ubuntu doesn't >> yet, but netfilter in in the AppArmor task list. Tizen definitely >> uses it with Smack. The notion that security modules are only used >> in fringe cases is antiquated. > I was not talking about LSM in general, I was referring to the > extended info that Paul mentioned. > > If thats indeed going to be used on every distro then skb extensions > are not suitable for this, it would result in extr akmalloc for every > skb. I am explicitly talking about the use of secmarks. All my references are uses of secmarks. >>> It certainly makes more sense to me than doing lookups >>> in a hashtable based on a ID >> Agreed. The data burden required to support a hash scheme >> for the security module stacking case is staggering. > It depends on the type of data (and its lifetime). > > I suspect you have something that is more like skb->dev/dst, > i.e. reference to object that persists after the skb is free'd. Just so. Only to make it more complicated, SELinux and Smack, the two LSMs currently using secmarks, use them differently. SELinux uses u32 "secids" natively, but Smack suffers serious performance degradation because it has to look up (efficiently, but look up nonetheless) the real Smack value on every packet. Please, I know about hash caches, cache hashes and all sorts of clever tricks to reduce the impact. Nothing beats having the end value up front.
On Mon, Jul 26, 2021 at 11:13 AM Casey Schaufler <casey@schaufler-ca.com> wrote: > On 7/25/2021 3:52 PM, Florian Westphal wrote: > > Casey Schaufler <casey@schaufler-ca.com> wrote: > >> RedHat and android use SELinux and will want this. Ubuntu doesn't > >> yet, but netfilter in in the AppArmor task list. Tizen definitely > >> uses it with Smack. The notion that security modules are only used > >> in fringe cases is antiquated. > > I was not talking about LSM in general, I was referring to the > > extended info that Paul mentioned. > > > > If thats indeed going to be used on every distro then skb extensions > > are not suitable for this, it would result in extr akmalloc for every > > skb. > > I am explicitly talking about the use of secmarks. All my > references are uses of secmarks. I'm talking about a void* which would contain LSM specific data; as I said earlier, think of inodes. This LSM specific data would include the existing secmark data as well as network peer security information which would finally (!!!) allow us to handle forwarded traffic and enable a number of other fixes and performance improvements. (The details are a bit beyond this discussion but it basically revolves around us not having to investigate the import the packet headers every time we want to determine the network peer security attributes, we could store the resolved LSM information in the sk_buff.security blob.)
On Mon, 2021-07-26 at 22:51 -0400, Paul Moore wrote: > On Mon, Jul 26, 2021 at 11:13 AM Casey Schaufler <casey@schaufler-ca.com> wrote: > > On 7/25/2021 3:52 PM, Florian Westphal wrote: > > > Casey Schaufler <casey@schaufler-ca.com> wrote: > > > > RedHat and android use SELinux and will want this. Ubuntu doesn't > > > > yet, but netfilter in in the AppArmor task list. Tizen definitely > > > > uses it with Smack. The notion that security modules are only used > > > > in fringe cases is antiquated. > > > I was not talking about LSM in general, I was referring to the > > > extended info that Paul mentioned. > > > > > > If thats indeed going to be used on every distro then skb extensions > > > are not suitable for this, it would result in extr akmalloc for every > > > skb. > > > > I am explicitly talking about the use of secmarks. All my > > references are uses of secmarks. > > I'm talking about a void* which would contain LSM specific data; as I > said earlier, think of inodes. This LSM specific data would include > the existing secmark data as well as network peer security information > which would finally (!!!) allow us to handle forwarded traffic and > enable a number of other fixes and performance improvements. > > (The details are a bit beyond this discussion but it basically > revolves around us not having to investigate the import the packet > headers every time we want to determine the network peer security > attributes, we could store the resolved LSM information in the > sk_buff.security blob.) I've investigated the feasibility of extending the secmark field to long/void*. I think that performance wise it should be doable on top of this series: the amount of allocated memory for sk_buff will not change, nor the amount of memory memseted at skb initialization time. I stumbled upon some uAPIs issues, as CT/nft expose a secmark related field via uAPI, changing that size without breaking esisting user-space looks hard to me. Additionally, even patch 7/9 is problematic, as there are some in kernel users accessing and using the inner_ field regardless skb- >encapsulation. That works while inner_* field are always initializared/zeored, but will break with the mentioned patch. The fix is doable, but large and complex. To keep the scope of this series sane, I'll drop in the next iteration all the problematic patches - that is: no sk_buff layout change at all. If there is interest for such thing, it could still be added incrementally. Cheers, Paolo