diff mbox series

[iwl-next,v2,10/13] ice: use <linux/packing.h> for Tx and Rx queue context data

Message ID 20240828-e810-live-migration-jk-prep-ctx-functions-v2-10-558ab9e240f5@intel.com (mailing list archive)
State Awaiting Upstream
Delegated to: Netdev Maintainers
Headers show
Series ice: use <linux/packing.h> for Tx and Rx queue context data | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Guessed tree name to be net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 16 this patch: 16
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers warning 3 maintainers not CCed: pabeni@redhat.com kuba@kernel.org edumazet@google.com
netdev/build_clang success Errors and warnings before: 16 this patch: 16
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 18 this patch: 18
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 338 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 242 this patch: 241
netdev/source_inline success Was 0 now: 0

Commit Message

Jacob Keller Aug. 28, 2024, 8:57 p.m. UTC
The ice driver needs to write the Tx and Rx queue context when programming
Tx and Rx queues. This is currently done using some bespoke custom logic
via the ice_set_ctx() and its helper functions, along with bit position
definitions in the ice_tlan_ctx_info and ice_rlan_ctx_info structures.

This logic does work, but is problematic for several reasons:

1) ice_set_ctx requires a helper function for each byte size being packed,
   as it uses a separate function to pack u8, u16, u32, and u64 fields.
   This requires 4 functions which contain near-duplicate logic with the
   types changed out.

2) The logic in the ice_pack_ctx_word, ice_pack_ctx_dword, and
   ice_pack_ctx_qword does not handle values which straddle alignment
   boundaries very well. This requires that several fields in the
   ice_tlan_ctx_info and ice_rlan_ctx_info be a size larger than their bit
   size should require.

3) Future support for live migration will require adding unpacking
   functions to take the packed hardware context and unpack it into the
   ice_rlan_ctx and ice_tlan_ctx structures. Implementing this would
   require implementing ice_get_ctx, and its associated helper functions,
   which essentially doubles the amount of code required.

Since Linux 5.2, commit 554aae35007e ("lib: Add support for generic packing
operations") has had a robust bit packing library function which can
correctly pack or unpack data between the CPU-useful unpacked structures
and bitpacked buffers used by hardware. This API was previously broken if
the buffer size was not aligned to 4 bytes. However, this has recently been
fixed.

The major difference with <linux/packing.h> is that it expects the unpacked
data will always be a u64. This is somewhat limiting, but can be worked
around by using a local temporary u64.

Replace the ice-specific ice_set_ctx() logic with a <linux/packing.h>
based implementation, ice_ctx_pack(). It takes pointers to the relevant
packed and unpacked storage, as well as the least significant bit and the
width. A temporary local u64 value is used to interact with the pack() API.

As with the other implementations, we handle the error codes from pack()
with a pr_err and a call to dump_stack. These are unexpected as they should
only happen due to programmer error.

Note that I initially tried implementing this as functions which just
repeatably called the ice_ctx_pack() function instead of using the
ice_rlan_ctx_info and ice_tlan_ctx_info arrays. This does work, but it has
a couple of downsides:

 1) it wastes a significant amount of bytes in the text section, vs the
    savings from removing the RO data of the arrays.

 2) this cost is made worse after implementing an unpack function, as we
    must duplicate the list of packings for the unpack function.

It does have the slight upside of allowing integer type promotion when
packing. However, the unpack() function still requires a pointer to the u64
value, so a temporary variable would still be required when unpacking.

Finally, I opted to use the pack() interface instead of the packing()
interface since it ends up producing code I found easier to read overall.
The unpack support will ultimately add a few extra bytes to implement
separate ice_ctx_unpack(), and the __ice_unpack_queue_ctx(). However, I
think the resulting code is still simpler and will avoid requiring the
packing() to stick around forever. The overall cost is not high since we
still store the actual packing tables as shared RO data.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_common.h |  13 +-
 drivers/net/ethernet/intel/ice/ice_base.c   |   3 +-
 drivers/net/ethernet/intel/ice/ice_common.c | 285 ++++++++--------------------
 3 files changed, 96 insertions(+), 205 deletions(-)

Comments

Vladimir Oltean Sept. 3, 2024, 12:08 a.m. UTC | #1
On Wed, Aug 28, 2024 at 01:57:26PM -0700, Jacob Keller wrote:
> The major difference with <linux/packing.h> is that it expects the unpacked
> data will always be a u64. This is somewhat limiting, but can be worked
> around by using a local temporary u64.
> 
> As with the other implementations, we handle the error codes from pack()
> with a pr_err and a call to dump_stack. These are unexpected as they should
> only happen due to programmer error.
> 
> Note that I initially tried implementing this as functions which just
> repeatably called the ice_ctx_pack() function instead of using the
> ice_rlan_ctx_info and ice_tlan_ctx_info arrays. This does work, but it has
> a couple of downsides:
> 
>  1) it wastes a significant amount of bytes in the text section, vs the
>     savings from removing the RO data of the arrays.
> 
>  2) this cost is made worse after implementing an unpack function, as we
>     must duplicate the list of packings for the unpack function.

I agree with your concerns and with your decision of keeping the
ICE_CTX_STORE tables. Actually I have some more unposted lib/packing
changes which operate on struct packed_field arrays, very, very similar
to the ICE_CTX_STORE tables. Essentially two new calls: pack_fields()
and unpack_fields(), which perform the iteration inside the core library.
(the only real difference being that I went for startbit and endbit in
their definitions, rather than LSB+size).

I came to the realization that this API would be nice exactly because
otherwise, you need to duplicate the field definitions, once for the
pack() call and once for the unpack(). But if they're tables, you don't.

I'm looking at ways in which this new API could solve all the shortcomings
I don't like with lib/packing in general. Without being even aware of
ICE_CTX_STORE, I had actually implemented the type conversion to smaller
unpacked u8/u16/u32 types in exactly the same way.

I also wish to do something about the way in which lib/packing deals
with errors. I don't think it's exactly great for every driver to have
to dump stack and ignore them. And also, since they're programming
errors, it's odd (and bad for performance) to perform the sanity checks
for every function call, instead of just once, say at boot time. So I
had thought of a third new API call: packed_fields_validate(), which
runs at module_init() in the consumer driver (ice, sja1105, ...) and
operates on the field tables.

Basically it is a singular replacement for these existing verifications
in pack() and unpack():

	/* startbit is expected to be larger than endbit, and both are
	 * expected to be within the logically addressable range of the buffer.
	 */
	if (unlikely(startbit < endbit || startbit >= 8 * pbuflen))
		/* Invalid function call */
		return -EINVAL;

	value_width = startbit - endbit + 1;
	if (unlikely(value_width > 64))
		return -ERANGE;

so you actually need to tell packed_fields_validate() what is the size
of the buffer you intend to run pack_fields() and unpack_fields() on.
I don't see it as a problem at all that the packed buffer size must be
fixed and known ahead of time - you also operate on fixed ICE_RXQ_CTX_SZ
and ... hmmm... txq_ctx[22]? sized buffers.

But this packed_fields_validate() idea quickly creates another set of 2
problems which I didn't get to the bottom of:

1. Some sanity checks in pack() are dynamic and cannot be run at
   module_init() time. Like this:

	/* Check if "uval" fits in "value_width" bits.
	 * If value_width is 64, the check will fail, but any
	 * 64-bit uval will surely fit.
	 */
	if (value_width < 64 && uval >= (1ull << value_width))
		/* Cannot store "uval" inside "value_width" bits.
		 * Truncating "uval" is most certainly not desirable,
		 * so simply erroring out is appropriate.
		 */
		return -ERANGE;

   The worse part is that the check is actually useful. It led to the
   quick identification of the bug behind commit 24deec6b9e4a ("net:
   dsa: sja1105: disallow C45 transactions on the BASE-TX MDIO bus"),
   rather than seeing a silent failure. But... I would still really like
   the revised lib/packing API to return void for pack_fields() and
   unpack_fields(). Not really sure how to reconcile these.

2. How to make sure that the pbuflen passed to packed_fields_validate()
   will be the same one as the pbuflen passed to all subsequent pack_fields()
   calls validated against that very pbuflen?

Sorry for not posting code and just telling a story about it. There
isn't much to show other than unfinished ideas with conflicting
requirements. So I thought maybe I could use a little help with some
brainstorming. Of course, do let me know if you are not that interested
in making the ICE_CTX_STORE tables idea a part of the packing core.

Thanks!
Jacob Keller Sept. 3, 2024, 9:16 p.m. UTC | #2
On 9/2/2024 5:08 PM, Vladimir Oltean wrote:
> On Wed, Aug 28, 2024 at 01:57:26PM -0700, Jacob Keller wrote:
>> The major difference with <linux/packing.h> is that it expects the unpacked
>> data will always be a u64. This is somewhat limiting, but can be worked
>> around by using a local temporary u64.
>>
>> As with the other implementations, we handle the error codes from pack()
>> with a pr_err and a call to dump_stack. These are unexpected as they should
>> only happen due to programmer error.
>>
>> Note that I initially tried implementing this as functions which just
>> repeatably called the ice_ctx_pack() function instead of using the
>> ice_rlan_ctx_info and ice_tlan_ctx_info arrays. This does work, but it has
>> a couple of downsides:
>>
>>  1) it wastes a significant amount of bytes in the text section, vs the
>>     savings from removing the RO data of the arrays.
>>
>>  2) this cost is made worse after implementing an unpack function, as we
>>     must duplicate the list of packings for the unpack function.
> 
> I agree with your concerns and with your decision of keeping the
> ICE_CTX_STORE tables. Actually I have some more unposted lib/packing
> changes which operate on struct packed_field arrays, very, very similar
> to the ICE_CTX_STORE tables. Essentially two new calls: pack_fields()
> and unpack_fields(), which perform the iteration inside the core library.
> (the only real difference being that I went for startbit and endbit in
> their definitions, rather than LSB+size).
> 

I only kept my interface in terms of lsb+size because I did not want to
attempt to re-write the table. I actually did re-write the table at
first, and discovered a documentation bug because our documentation for
the table has incorrect lsb/msb for some of the fields in some versions
of the doc!

I ultimately don't mind switching to the packing convention of start/end
(though my brain does have trouble sometimes thinking of the start as
the higher bit...)

> I came to the realization that this API would be nice exactly because
> otherwise, you need to duplicate the field definitions, once for the
> pack() call and once for the unpack(). But if they're tables, you don't.
> 
> I'm looking at ways in which this new API could solve all the shortcomings
> I don't like with lib/packing in general. Without being even aware of
> ICE_CTX_STORE, I had actually implemented the type conversion to smaller
> unpacked u8/u16/u32 types in exactly the same way.

I think having this in the core API with a standardized table, along
with support for unpacking the types would be great!

> 
> I also wish to do something about the way in which lib/packing deals
> with errors. I don't think it's exactly great for every driver to have
> to dump stack and ignore them. And also, since they're programming
> errors, it's odd (and bad for performance) to perform the sanity checks
> for every function call, instead of just once, say at boot time. So I
> had thought of a third new API call: packed_fields_validate(), which
> runs at module_init() in the consumer driver (ice, sja1105, ...) and
> operates on the field tables.
> 

It does seem reasonable to me that we can move the sanity checks here,
especially since the main programmer error is if this table is incorrect
in one of these ways.

> Basically it is a singular replacement for these existing verifications
> in pack() and unpack():
> 
> 	/* startbit is expected to be larger than endbit, and both are
> 	 * expected to be within the logically addressable range of the buffer.
> 	 */
> 	if (unlikely(startbit < endbit || startbit >= 8 * pbuflen))
> 		/* Invalid function call */
> 		return -EINVAL;
> 
> 	value_width = startbit - endbit + 1;
> 	if (unlikely(value_width > 64))
> 		return -ERANGE;
> 
> so you actually need to tell packed_fields_validate() what is the size
> of the buffer you intend to run pack_fields() and unpack_fields() on.
> I don't see it as a problem at all that the packed buffer size must be
> fixed and known ahead of time - you also operate on fixed ICE_RXQ_CTX_SZ
> and ... hmmm... txq_ctx[22]? sized buffers.
> 

Yea, these are fixed sizes. Strictly I think we could have a macro
defining the size of the Tx queue context as well....

> But this packed_fields_validate() idea quickly creates another set of 2
> problems which I didn't get to the bottom of:
> 
> 1. Some sanity checks in pack() are dynamic and cannot be run at
>    module_init() time. Like this:
> 
> 	/* Check if "uval" fits in "value_width" bits.
> 	 * If value_width is 64, the check will fail, but any
> 	 * 64-bit uval will surely fit.
> 	 */
> 	if (value_width < 64 && uval >= (1ull << value_width))
> 		/* Cannot store "uval" inside "value_width" bits.
> 		 * Truncating "uval" is most certainly not desirable,
> 		 * so simply erroring out is appropriate.
> 		 */
> 		return -ERANGE;
> 

If we support u8/u16/u32/u64 sizes as well, you could check that the
size of the unpacked variable too. Could this data be in the table? Oh I
guess technically not because we are checking if the actual value passed
fits. I think keeping this but making it WARN would be sufficient?

>    The worse part is that the check is actually useful. It led to the
>    quick identification of the bug behind commit 24deec6b9e4a ("net:
>    dsa: sja1105: disallow C45 transactions on the BASE-TX MDIO bus"),
>    rather than seeing a silent failure. But... I would still really like
>    the revised lib/packing API to return void for pack_fields() and
>    unpack_fields(). Not really sure how to reconcile these.
> 

Since this is generally programmer error (not something where uAPI can
affect it) what about making these WARN in the core?

> 2. How to make sure that the pbuflen passed to packed_fields_validate()
>    will be the same one as the pbuflen passed to all subsequent pack_fields()
>    calls validated against that very pbuflen?
> 

I guess you either duplicate the check and WARN, or you don't, and let
it panic on the invalid memory? But I guess that would only actually
panic with something like KASAN

> Sorry for not posting code and just telling a story about it. There
> isn't much to show other than unfinished ideas with conflicting
> requirements. So I thought maybe I could use a little help with some
> brainstorming. Of course, do let me know if you are not that interested
> in making the ICE_CTX_STORE tables idea a part of the packing core.
> 
> Thanks!

I think moving this into core would be fantastic. Since pretty much
every driver handles these sanity checks the same way, I also think that
moving those into the core and making them WARN or similar seems
reasonable, so we can make the pack/unpack as void.

It would be interesting to see a comparison of the resulting module size.

How much of this do you have code for now?
Vladimir Oltean Sept. 3, 2024, 10:13 p.m. UTC | #3
On Tue, Sep 03, 2024 at 02:16:59PM -0700, Jacob Keller wrote:
> I only kept my interface in terms of lsb+size because I did not want to
> attempt to re-write the table. I actually did re-write the table at
> first, and discovered a documentation bug because our documentation for
> the table has incorrect lsb/msb for some of the fields in some versions
> of the doc!
> 
> I ultimately don't mind switching to the packing convention of start/end
> (though my brain does have trouble sometimes thinking of the start as
> the higher bit...)

Well, you can keep the ICE_CTX_STORE() macro formatted in whichever way
you want, including Width+LSB, but reimplement it to generate a struct
packed_field, formatted as startbit+endbit (endbit=LSB, startbit=LSB+Width-1).
I actually encourage any user of the packing library to write the code
in the closest way possible to the field definitions as they appear in
the documentation. What do you think?

> > I came to the realization that this API would be nice exactly because
> > otherwise, you need to duplicate the field definitions, once for the
> > pack() call and once for the unpack(). But if they're tables, you don't.
> > 
> > I'm looking at ways in which this new API could solve all the shortcomings
> > I don't like with lib/packing in general. Without being even aware of
> > ICE_CTX_STORE, I had actually implemented the type conversion to smaller
> > unpacked u8/u16/u32 types in exactly the same way.
> 
> I think having this in the core API with a standardized table, along
> with support for unpacking the types would be great!

Cool then!

> > I also wish to do something about the way in which lib/packing deals
> > with errors. I don't think it's exactly great for every driver to have
> > to dump stack and ignore them. And also, since they're programming
> > errors, it's odd (and bad for performance) to perform the sanity checks
> > for every function call, instead of just once, say at boot time. So I
> > had thought of a third new API call: packed_fields_validate(), which
> > runs at module_init() in the consumer driver (ice, sja1105, ...) and
> > operates on the field tables.
> > 
> 
> It does seem reasonable to me that we can move the sanity checks here,
> especially since the main programmer error is if this table is incorrect
> in one of these ways.

Actually I did manage to cook something up which I like even more than
packed_fields_validate(): a compile-time sanity check. I'm not completely
happy with it, just reasonably happy. You'll see.

> > Basically it is a singular replacement for these existing verifications
> > in pack() and unpack():
> > 
> > 	/* startbit is expected to be larger than endbit, and both are
> > 	 * expected to be within the logically addressable range of the buffer.
> > 	 */
> > 	if (unlikely(startbit < endbit || startbit >= 8 * pbuflen))
> > 		/* Invalid function call */
> > 		return -EINVAL;
> > 
> > 	value_width = startbit - endbit + 1;
> > 	if (unlikely(value_width > 64))
> > 		return -ERANGE;
> > 
> > so you actually need to tell packed_fields_validate() what is the size
> > of the buffer you intend to run pack_fields() and unpack_fields() on.
> > I don't see it as a problem at all that the packed buffer size must be
> > fixed and known ahead of time - you also operate on fixed ICE_RXQ_CTX_SZ
> > and ... hmmm... txq_ctx[22]? sized buffers.
> > 
> 
> Yea, these are fixed sizes. Strictly I think we could have a macro
> defining the size of the Tx queue context as well....

Yeah, you'll need than when you'll see what I've prepared below :)

> > But this packed_fields_validate() idea quickly creates another set of 2
> > problems which I didn't get to the bottom of:
> > 
> > 1. Some sanity checks in pack() are dynamic and cannot be run at
> >    module_init() time. Like this:
> > 
> > 	/* Check if "uval" fits in "value_width" bits.
> > 	 * If value_width is 64, the check will fail, but any
> > 	 * 64-bit uval will surely fit.
> > 	 */
> > 	if (value_width < 64 && uval >= (1ull << value_width))
> > 		/* Cannot store "uval" inside "value_width" bits.
> > 		 * Truncating "uval" is most certainly not desirable,
> > 		 * so simply erroring out is appropriate.
> > 		 */
> > 		return -ERANGE;
> > 
> 
> If we support u8/u16/u32/u64 sizes as well, you could check that the
> size of the unpacked variable too. Could this data be in the table? Oh I
> guess technically not because we are checking if the actual value passed
> fits. I think keeping this but making it WARN would be sufficient?

The u8/u16/u32/u64 unpacked field size will absolutely be held in the
struct packed_field tables.

> >    The worse part is that the check is actually useful. It led to the
> >    quick identification of the bug behind commit 24deec6b9e4a ("net:
> >    dsa: sja1105: disallow C45 transactions on the BASE-TX MDIO bus"),
> >    rather than seeing a silent failure. But... I would still really like
> >    the revised lib/packing API to return void for pack_fields() and
> >    unpack_fields(). Not really sure how to reconcile these.
> > 
> 
> Since this is generally programmer error (not something where uAPI can
> affect it) what about making these WARN in the core?

Yes, I went for demoting the -ERANGE error in the truncation case to a
WARN() that is printed for both the "int pack()" as well as the new
"void __pack()" call. Practically, it doesn't make a big difference
whether we bail out or do nothing, as long as we loudly complain in
either case. It's the complaint that leads to the bug getting easily
identified and fixed.

> > 2. How to make sure that the pbuflen passed to packed_fields_validate()
> >    will be the same one as the pbuflen passed to all subsequent pack_fields()
> >    calls validated against that very pbuflen?
> 
> I guess you either duplicate the check and WARN, or you don't, and let
> it panic on the invalid memory? But I guess that would only actually
> panic with something like KASAN

Yeah, I'm still not too sure here. The length of the packed buffer still
needs to be "obvious to the eye", since the same length must be manually
passed to the sanity check as well as to the actual pack_fields() call,
otherwise "bad things" will happen. I believe in the users' ability to
structure their code in such a way that this is not hard. Especially
since the sanity checks now rely on BUILD_BUG_ON(), and that can be
technically be placed anywhere in the code, I expect the best place to
put it is exactly near the pack_fields() call. That way, it's the most
obvious that the buffer length declared for the sanity check is
identical to the one during actual usage. Especially if the usage can be
restricted to just one function or two.

> > Sorry for not posting code and just telling a story about it. There
> > isn't much to show other than unfinished ideas with conflicting
> > requirements. So I thought maybe I could use a little help with some
> > brainstorming. Of course, do let me know if you are not that interested
> > in making the ICE_CTX_STORE tables idea a part of the packing core.
> > 
> > Thanks!
> 
> I think moving this into core would be fantastic. Since pretty much
> every driver handles these sanity checks the same way, I also think that
> moving those into the core and making them WARN or similar seems
> reasonable, so we can make the pack/unpack as void.
> 
> It would be interesting to see a comparison of the resulting module size.

Yeah, I'm also very much interested in comparisons: text size vs rodata
size vs dynamic memory usage. With the pack_fields() API in the sja1105
driver, I can shrink an enormous amount of u64 unpacked structure fields
to just u8. I also really like the idea of compile-time sanity checks,
and I'm curious how much that matters in a benchmark or something.
Anyway, I don't have a complete rework at the moment, so there's that.

> How much of this do you have code for now?

Well, I do have code, but it's not yet complete :)
I've updated by https://github.com/vladimiroltean/linux/tree/packing-selftests
branch on top of your patch set. Until HEAD~1, I've tested the sja1105
driver and it still works. The last patch - the bulk of the conversion
actually - is extremely tedious and is not yet ready (it doesn't even
compile). Yet, with a bit of imagination, it should be enough to provide
an example and hopefully move the discussion forward.
Jacob Keller Sept. 3, 2024, 10:24 p.m. UTC | #4
On 9/3/2024 3:13 PM, Vladimir Oltean wrote:
> On Tue, Sep 03, 2024 at 02:16:59PM -0700, Jacob Keller wrote:
>> I only kept my interface in terms of lsb+size because I did not want to
>> attempt to re-write the table. I actually did re-write the table at
>> first, and discovered a documentation bug because our documentation for
>> the table has incorrect lsb/msb for some of the fields in some versions
>> of the doc!
>>
>> I ultimately don't mind switching to the packing convention of start/end
>> (though my brain does have trouble sometimes thinking of the start as
>> the higher bit...)
> 
> Well, you can keep the ICE_CTX_STORE() macro formatted in whichever way
> you want, including Width+LSB, but reimplement it to generate a struct
> packed_field, formatted as startbit+endbit (endbit=LSB, startbit=LSB+Width-1).
> I actually encourage any user of the packing library to write the code
> in the closest way possible to the field definitions as they appear in
> the documentation. What do you think?

Our documentation has both lsb, msb, and the width. I agree, and i think
it would benefit if the packing documentation suggests this as well.

> 
>>> I came to the realization that this API would be nice exactly because
>>> otherwise, you need to duplicate the field definitions, once for the
>>> pack() call and once for the unpack(). But if they're tables, you don't.
>>>
>>> I'm looking at ways in which this new API could solve all the shortcomings
>>> I don't like with lib/packing in general. Without being even aware of
>>> ICE_CTX_STORE, I had actually implemented the type conversion to smaller
>>> unpacked u8/u16/u32 types in exactly the same way.
>>
>> I think having this in the core API with a standardized table, along
>> with support for unpacking the types would be great!
> 
> Cool then!
> 
>>> I also wish to do something about the way in which lib/packing deals
>>> with errors. I don't think it's exactly great for every driver to have
>>> to dump stack and ignore them. And also, since they're programming
>>> errors, it's odd (and bad for performance) to perform the sanity checks
>>> for every function call, instead of just once, say at boot time. So I
>>> had thought of a third new API call: packed_fields_validate(), which
>>> runs at module_init() in the consumer driver (ice, sja1105, ...) and
>>> operates on the field tables.
>>>
>>
>> It does seem reasonable to me that we can move the sanity checks here,
>> especially since the main programmer error is if this table is incorrect
>> in one of these ways.
> 
> Actually I did manage to cook something up which I like even more than
> packed_fields_validate(): a compile-time sanity check. I'm not completely
> happy with it, just reasonably happy. You'll see.>>> Basically it is a singular replacement for these existing verifications
>>> in pack() and unpack():
>>>
>>> 	/* startbit is expected to be larger than endbit, and both are
>>> 	 * expected to be within the logically addressable range of the buffer.
>>> 	 */
>>> 	if (unlikely(startbit < endbit || startbit >= 8 * pbuflen))
>>> 		/* Invalid function call */
>>> 		return -EINVAL;
>>>
>>> 	value_width = startbit - endbit + 1;
>>> 	if (unlikely(value_width > 64))
>>> 		return -ERANGE;
>>>
>>> so you actually need to tell packed_fields_validate() what is the size
>>> of the buffer you intend to run pack_fields() and unpack_fields() on.
>>> I don't see it as a problem at all that the packed buffer size must be
>>> fixed and known ahead of time - you also operate on fixed ICE_RXQ_CTX_SZ
>>> and ... hmmm... txq_ctx[22]? sized buffers.
>>>
>>
>> Yea, these are fixed sizes. Strictly I think we could have a macro
>> defining the size of the Tx queue context as well....
> 
> Yeah, you'll need than when you'll see what I've prepared below :)
> 
>>> But this packed_fields_validate() idea quickly creates another set of 2
>>> problems which I didn't get to the bottom of:
>>>
>>> 1. Some sanity checks in pack() are dynamic and cannot be run at
>>>    module_init() time. Like this:
>>>
>>> 	/* Check if "uval" fits in "value_width" bits.
>>> 	 * If value_width is 64, the check will fail, but any
>>> 	 * 64-bit uval will surely fit.
>>> 	 */
>>> 	if (value_width < 64 && uval >= (1ull << value_width))
>>> 		/* Cannot store "uval" inside "value_width" bits.
>>> 		 * Truncating "uval" is most certainly not desirable,
>>> 		 * so simply erroring out is appropriate.
>>> 		 */
>>> 		return -ERANGE;
>>>
>>
>> If we support u8/u16/u32/u64 sizes as well, you could check that the
>> size of the unpacked variable too. Could this data be in the table? Oh I
>> guess technically not because we are checking if the actual value passed
>> fits. I think keeping this but making it WARN would be sufficient?
> 
> The u8/u16/u32/u64 unpacked field size will absolutely be held in the
> struct packed_field tables.
> 
>>>    The worse part is that the check is actually useful. It led to the
>>>    quick identification of the bug behind commit 24deec6b9e4a ("net:
>>>    dsa: sja1105: disallow C45 transactions on the BASE-TX MDIO bus"),
>>>    rather than seeing a silent failure. But... I would still really like
>>>    the revised lib/packing API to return void for pack_fields() and
>>>    unpack_fields(). Not really sure how to reconcile these.
>>>
>>
>> Since this is generally programmer error (not something where uAPI can
>> affect it) what about making these WARN in the core?
> 
> Yes, I went for demoting the -ERANGE error in the truncation case to a
> WARN() that is printed for both the "int pack()" as well as the new
> "void __pack()" call. Practically, it doesn't make a big difference
> whether we bail out or do nothing, as long as we loudly complain in
> either case. It's the complaint that leads to the bug getting easily
> identified and fixed.
>

Right.


>>> 2. How to make sure that the pbuflen passed to packed_fields_validate()
>>>    will be the same one as the pbuflen passed to all subsequent pack_fields()
>>>    calls validated against that very pbuflen?
>>
>> I guess you either duplicate the check and WARN, or you don't, and let
>> it panic on the invalid memory? But I guess that would only actually
>> panic with something like KASAN
> 
> Yeah, I'm still not too sure here. The length of the packed buffer still
> needs to be "obvious to the eye", since the same length must be manually
> passed to the sanity check as well as to the actual pack_fields() call,
> otherwise "bad things" will happen. I believe in the users' ability to
> structure their code in such a way that this is not hard. Especially
> since the sanity checks now rely on BUILD_BUG_ON(), and that can be
> technically be placed anywhere in the code, I expect the best place to
> put it is exactly near the pack_fields() call. That way, it's the most
> obvious that the buffer length declared for the sanity check is
> identical to the one during actual usage. Especially if the usage can be
> restricted to just one function or two.
> 

Makes sense. I may need to do a little work on the ice driver to make
that fit well :D I think currently we have a packing for the Tx queue
context in different places.

>>> Sorry for not posting code and just telling a story about it. There
>>> isn't much to show other than unfinished ideas with conflicting
>>> requirements. So I thought maybe I could use a little help with some
>>> brainstorming. Of course, do let me know if you are not that interested
>>> in making the ICE_CTX_STORE tables idea a part of the packing core.
>>>
>>> Thanks!
>>
>> I think moving this into core would be fantastic. Since pretty much
>> every driver handles these sanity checks the same way, I also think that
>> moving those into the core and making them WARN or similar seems
>> reasonable, so we can make the pack/unpack as void.
>>
>> It would be interesting to see a comparison of the resulting module size.
> 
> Yeah, I'm also very much interested in comparisons: text size vs rodata
> size vs dynamic memory usage. With the pack_fields() API in the sja1105
> driver, I can shrink an enormous amount of u64 unpacked structure fields
> to just u8. I also really like the idea of compile-time sanity checks,
> and I'm curious how much that matters in a benchmark or something.
> Anyway, I don't have a complete rework at the moment, so there's that.
> 

Yea. Actually my entire initial motivation for starting looking at
reworking ice was the weird sizes used in our unpacked context
structure... :D

>> How much of this do you have code for now?
> 
> Well, I do have code, but it's not yet complete :)
> I've updated by https://github.com/vladimiroltean/linux/tree/packing-selftests
> branch on top of your patch set. Until HEAD~1, I've tested the sja1105
> driver and it still works. The last patch - the bulk of the conversion
> actually - is extremely tedious and is not yet ready (it doesn't even
> compile). Yet, with a bit of imagination, it should be enough to provide
> an example and hopefully move the discussion forward.

Ok sounds good. I'll take a look at this.
diff mbox series

Patch

diff --git a/drivers/net/ethernet/intel/ice/ice_common.h b/drivers/net/ethernet/intel/ice/ice_common.h
index 27208a60cece..550fdd08e6aa 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.h
+++ b/drivers/net/ethernet/intel/ice/ice_common.h
@@ -5,6 +5,7 @@ 
 #define _ICE_COMMON_H_
 
 #include <linux/bitfield.h>
+#include <linux/packing.h>
 
 #include "ice.h"
 #include "ice_type.h"
@@ -93,8 +94,16 @@  bool ice_check_sq_alive(struct ice_hw *hw, struct ice_ctl_q_info *cq);
 int ice_aq_q_shutdown(struct ice_hw *hw, bool unloading);
 void ice_fill_dflt_direct_cmd_desc(struct ice_aq_desc *desc, u16 opcode);
 extern const struct ice_ctx_ele ice_tlan_ctx_info[];
-int ice_set_ctx(struct ice_hw *hw, u8 *src_ctx, u8 *dest_ctx,
-		const struct ice_ctx_ele *ce_info);
+extern const struct ice_ctx_ele ice_rlan_ctx_info[];
+
+void __ice_pack_queue_ctx(const void *ctx, void *buf, size_t len,
+			  const struct ice_ctx_ele *ctx_info);
+
+#define ice_pack_rxq_ctx(rlan_ctx, buf) \
+	__ice_pack_queue_ctx((rlan_ctx), (buf), sizeof(buf), ice_rlan_ctx_info)
+
+#define ice_pack_txq_ctx(tlan_ctx, buf) \
+	__ice_pack_queue_ctx((tlan_ctx), (buf), sizeof(buf), ice_tlan_ctx_info)
 
 extern struct mutex ice_global_cfg_lock_sw;
 
diff --git a/drivers/net/ethernet/intel/ice/ice_base.c b/drivers/net/ethernet/intel/ice/ice_base.c
index f448d3a84564..1881ce8105ca 100644
--- a/drivers/net/ethernet/intel/ice/ice_base.c
+++ b/drivers/net/ethernet/intel/ice/ice_base.c
@@ -907,8 +907,7 @@  ice_vsi_cfg_txq(struct ice_vsi *vsi, struct ice_tx_ring *ring,
 	ice_setup_tx_ctx(ring, &tlan_ctx, pf_q);
 	/* copy context contents into the qg_buf */
 	qg_buf->txqs[0].txq_id = cpu_to_le16(pf_q);
-	ice_set_ctx(hw, (u8 *)&tlan_ctx, qg_buf->txqs[0].txq_ctx,
-		    ice_tlan_ctx_info);
+	ice_pack_txq_ctx(&tlan_ctx, qg_buf->txqs[0].txq_ctx);
 
 	/* init queue specific tail reg. It is referred as
 	 * transmit comm scheduler queue doorbell.
diff --git a/drivers/net/ethernet/intel/ice/ice_common.c b/drivers/net/ethernet/intel/ice/ice_common.c
index c4b24ba36b5e..09a94c20e16d 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.c
+++ b/drivers/net/ethernet/intel/ice/ice_common.c
@@ -1387,8 +1387,89 @@  ice_copy_rxq_ctx_to_hw(struct ice_hw *hw, u8 *ice_rxq_ctx, u32 rxq_index)
 	return 0;
 }
 
+/**
+ * ice_ctx_pack - Pack Tx/Rx queue context data
+ * @pbuf: pointer to the packed buffer
+ * @pbuflen: size of the packed buffer
+ * @val: pointer to storage for the unpacked data
+ * @len: size of the unpacked data
+ * @width: width of bits to pack
+ * @lsb: least significant bit in the packed buffer
+ *
+ * Pack the given field of the Tx or Rx queue context into the hardware data
+ * buffer. The packed contents are in full Little Endian ordering with the
+ * least significant 4-byte block first.
+ *
+ * The only time that pack() should produce an error is due to invalid bit
+ * offsets. Thus, errors are logged along with a stack dump.
+ */
+static void ice_ctx_pack(void *pbuf, size_t pbuflen, const void *val,
+			 size_t len, size_t width, size_t lsb)
+{
+	size_t msb = lsb + width - 1;
+	u64 uval;
+	int err;
+
+	switch (len) {
+	case sizeof(u8):
+		uval = *((u8 *)val);
+		break;
+	case sizeof(u16):
+		uval = *((u16 *)val);
+		break;
+	case sizeof(u32):
+		uval = *((u32 *)val);
+		break;
+	case sizeof(u64):
+		uval = *((u64 *)val);
+		break;
+	default:
+		WARN_ONCE(1, "Unexpected size %zd when packing queue context",
+			  len);
+		return;
+	}
+
+	err = pack(pbuf, uval, msb, lsb, pbuflen,
+		   QUIRK_LITTLE_ENDIAN | QUIRK_LSW32_IS_FIRST);
+	if (unlikely(err)) {
+		if (err == -EINVAL) {
+			pr_err("MSB (%zd) expected to be larger than LSB (%zd)\n",
+			       msb, lsb);
+		} else if (err == -ERANGE) {
+			if ((msb - lsb + 1) > 64)
+				pr_err("Field %zd-%zd too large for 64 bits!\n",
+				       lsb, msb);
+			else
+				pr_err("Cannot store %llx inside field %zd-%zd (would truncate)\n",
+				       uval, lsb, msb);
+		}
+		dump_stack();
+	}
+}
+
+/**
+ * __ice_pack_queue_ctx - Pack Tx or Rx queue context
+ * @ctx: Unpacked queue context structure
+ * @buf: packed buffer storage
+ * @len: size of the packed buffer
+ * @ctx_info: array describing the packing layout
+ */
+void __ice_pack_queue_ctx(const void *ctx, void *buf, size_t len,
+			  const struct ice_ctx_ele *ctx_info)
+{
+	for (int f = 0; ctx_info[f].width; f++) {
+		const struct ice_ctx_ele *field_info = &ctx_info[f];
+		const void *field;
+
+		field = (const void *)ctx + field_info->offset;
+
+		ice_ctx_pack(buf, len, field, field_info->size_of,
+			     field_info->width, field_info->lsb);
+	}
+}
+
 /* LAN Rx Queue Context */
-static const struct ice_ctx_ele ice_rlan_ctx_info[] = {
+const struct ice_ctx_ele ice_rlan_ctx_info[] = {
 	/* Field		Width	LSB */
 	ICE_CTX_STORE(ice_rlan_ctx, head,		13,	0),
 	ICE_CTX_STORE(ice_rlan_ctx, cpuid,		8,	13),
@@ -1433,7 +1514,8 @@  int ice_write_rxq_ctx(struct ice_hw *hw, struct ice_rlan_ctx *rlan_ctx,
 
 	rlan_ctx->prefena = 1;
 
-	ice_set_ctx(hw, (u8 *)rlan_ctx, ctx_buf, ice_rlan_ctx_info);
+	ice_pack_rxq_ctx(rlan_ctx, ctx_buf);
+
 	return ice_copy_rxq_ctx_to_hw(hw, ctx_buf, rxq_index);
 }
 
@@ -4526,205 +4608,6 @@  ice_aq_add_rdma_qsets(struct ice_hw *hw, u8 num_qset_grps,
 
 /* End of FW Admin Queue command wrappers */
 
-/**
- * ice_pack_ctx_byte - write a byte to a packed context structure
- * @src_ctx: unpacked source context structure
- * @dest_ctx: packed destination context data
- * @ce_info: context element description
- */
-static void ice_pack_ctx_byte(u8 *src_ctx, u8 *dest_ctx,
-			      const struct ice_ctx_ele *ce_info)
-{
-	u8 src_byte, dest_byte, mask;
-	u8 *from, *dest;
-	u16 shift_width;
-
-	/* copy from the next struct field */
-	from = src_ctx + ce_info->offset;
-
-	/* prepare the bits and mask */
-	shift_width = ce_info->lsb % 8;
-	mask = GENMASK(ce_info->width - 1 + shift_width, shift_width);
-
-	src_byte = *from;
-	src_byte <<= shift_width;
-	src_byte &= mask;
-
-	/* get the current bits from the target bit string */
-	dest = dest_ctx + (ce_info->lsb / 8);
-
-	memcpy(&dest_byte, dest, sizeof(dest_byte));
-
-	dest_byte &= ~mask;	/* get the bits not changing */
-	dest_byte |= src_byte;	/* add in the new bits */
-
-	/* put it all back */
-	memcpy(dest, &dest_byte, sizeof(dest_byte));
-}
-
-/**
- * ice_pack_ctx_word - write a word to a packed context structure
- * @src_ctx: unpacked source context structure
- * @dest_ctx: packed destination context data
- * @ce_info: context element description
- */
-static void ice_pack_ctx_word(u8 *src_ctx, u8 *dest_ctx,
-			      const struct ice_ctx_ele *ce_info)
-{
-	u16 src_word, mask;
-	__le16 dest_word;
-	u8 *from, *dest;
-	u16 shift_width;
-
-	/* copy from the next struct field */
-	from = src_ctx + ce_info->offset;
-
-	/* prepare the bits and mask */
-	shift_width = ce_info->lsb % 8;
-	mask = GENMASK(ce_info->width - 1 + shift_width, shift_width);
-
-	/* don't swizzle the bits until after the mask because the mask bits
-	 * will be in a different bit position on big endian machines
-	 */
-	src_word = *(u16 *)from;
-	src_word <<= shift_width;
-	src_word &= mask;
-
-	/* get the current bits from the target bit string */
-	dest = dest_ctx + (ce_info->lsb / 8);
-
-	memcpy(&dest_word, dest, sizeof(dest_word));
-
-	dest_word &= ~(cpu_to_le16(mask));	/* get the bits not changing */
-	dest_word |= cpu_to_le16(src_word);	/* add in the new bits */
-
-	/* put it all back */
-	memcpy(dest, &dest_word, sizeof(dest_word));
-}
-
-/**
- * ice_pack_ctx_dword - write a dword to a packed context structure
- * @src_ctx: unpacked source context structure
- * @dest_ctx: packed destination context data
- * @ce_info: context element description
- */
-static void ice_pack_ctx_dword(u8 *src_ctx, u8 *dest_ctx,
-			       const struct ice_ctx_ele *ce_info)
-{
-	u32 src_dword, mask;
-	__le32 dest_dword;
-	u8 *from, *dest;
-	u16 shift_width;
-
-	/* copy from the next struct field */
-	from = src_ctx + ce_info->offset;
-
-	/* prepare the bits and mask */
-	shift_width = ce_info->lsb % 8;
-	mask = GENMASK(ce_info->width - 1 + shift_width, shift_width);
-
-	/* don't swizzle the bits until after the mask because the mask bits
-	 * will be in a different bit position on big endian machines
-	 */
-	src_dword = *(u32 *)from;
-	src_dword <<= shift_width;
-	src_dword &= mask;
-
-	/* get the current bits from the target bit string */
-	dest = dest_ctx + (ce_info->lsb / 8);
-
-	memcpy(&dest_dword, dest, sizeof(dest_dword));
-
-	dest_dword &= ~(cpu_to_le32(mask));	/* get the bits not changing */
-	dest_dword |= cpu_to_le32(src_dword);	/* add in the new bits */
-
-	/* put it all back */
-	memcpy(dest, &dest_dword, sizeof(dest_dword));
-}
-
-/**
- * ice_pack_ctx_qword - write a qword to a packed context structure
- * @src_ctx: unpacked source context structure
- * @dest_ctx: packed destination context data
- * @ce_info: context element description
- */
-static void ice_pack_ctx_qword(u8 *src_ctx, u8 *dest_ctx,
-			       const struct ice_ctx_ele *ce_info)
-{
-	u64 src_qword, mask;
-	__le64 dest_qword;
-	u8 *from, *dest;
-	u16 shift_width;
-
-	/* copy from the next struct field */
-	from = src_ctx + ce_info->offset;
-
-	/* prepare the bits and mask */
-	shift_width = ce_info->lsb % 8;
-	mask = GENMASK_ULL(ce_info->width - 1 + shift_width, shift_width);
-
-	/* don't swizzle the bits until after the mask because the mask bits
-	 * will be in a different bit position on big endian machines
-	 */
-	src_qword = *(u64 *)from;
-	src_qword <<= shift_width;
-	src_qword &= mask;
-
-	/* get the current bits from the target bit string */
-	dest = dest_ctx + (ce_info->lsb / 8);
-
-	memcpy(&dest_qword, dest, sizeof(dest_qword));
-
-	dest_qword &= ~(cpu_to_le64(mask));	/* get the bits not changing */
-	dest_qword |= cpu_to_le64(src_qword);	/* add in the new bits */
-
-	/* put it all back */
-	memcpy(dest, &dest_qword, sizeof(dest_qword));
-}
-
-/**
- * ice_set_ctx - set context bits in packed structure
- * @hw: pointer to the hardware structure
- * @src_ctx:  pointer to a generic non-packed context structure
- * @dest_ctx: pointer to memory for the packed structure
- * @ce_info: List of Rx context elements
- */
-int ice_set_ctx(struct ice_hw *hw, u8 *src_ctx, u8 *dest_ctx,
-		const struct ice_ctx_ele *ce_info)
-{
-	int f;
-
-	for (f = 0; ce_info[f].width; f++) {
-		/* We have to deal with each element of the FW response
-		 * using the correct size so that we are correct regardless
-		 * of the endianness of the machine.
-		 */
-		if (ce_info[f].width > (ce_info[f].size_of * BITS_PER_BYTE)) {
-			ice_debug(hw, ICE_DBG_QCTX, "Field %d width of %d bits larger than size of %d byte(s) ... skipping write\n",
-				  f, ce_info[f].width, ce_info[f].size_of);
-			continue;
-		}
-		switch (ce_info[f].size_of) {
-		case sizeof(u8):
-			ice_pack_ctx_byte(src_ctx, dest_ctx, &ce_info[f]);
-			break;
-		case sizeof(u16):
-			ice_pack_ctx_word(src_ctx, dest_ctx, &ce_info[f]);
-			break;
-		case sizeof(u32):
-			ice_pack_ctx_dword(src_ctx, dest_ctx, &ce_info[f]);
-			break;
-		case sizeof(u64):
-			ice_pack_ctx_qword(src_ctx, dest_ctx, &ce_info[f]);
-			break;
-		default:
-			return -EINVAL;
-		}
-	}
-
-	return 0;
-}
-
 /**
  * ice_get_lan_q_ctx - get the LAN queue context for the given VSI and TC
  * @hw: pointer to the HW struct