Message ID | 20230526171057.66876-3-sebastian.reichel@collabora.com (mailing list archive) |
---|---|
State | Changes Requested, archived |
Series | Fix 64 bit issues in common clock framework |
On 26/05/23 19:10, Sebastian Reichel wrote:
> The clock framework handles clock rates as "unsigned long", so u32 on
> 32-bit architectures and u64 on 64-bit architectures.
>
> The current code pointlessly casts the dividend to u64 on 32-bit
> architectures and thus pointlessly reducing the performance.
>
> On the other hand on 64-bit architectures the divisor is masked and only
> the lower 32-bit are used. Thus requesting a frequency >= 4.3GHz results
> in incorrect values. For example requesting 4300000000 (4.3 GHz) will
> effectively request ca. 5 MHz. Requesting clk_round_rate(clk, ULONG_MAX)
> is a bit of a special case, since that still returns correct values as
> long as the parent clock is below 8.5 GHz.
>
> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>

Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Quoting Sebastian Reichel (2023-05-26 10:10:57)
> The clock framework handles clock rates as "unsigned long", so u32 on
> 32-bit architectures and u64 on 64-bit architectures.
>
> The current code pointlessly casts the dividend to u64 on 32-bit
> architectures and thus pointlessly reducing the performance.

It looks like that was done to make the DIV_ROUND_UP() macro not
overflow the dividend on 32-bit machines (from 9556f9dad8f5):

    DIV_ROUND_UP(3000000000, 1500000000) = (3.0G + 1.5G - 1) / 1.5G
                                         = OVERFLOW / 1.5G

but I agree, the u64 cast is not necessary if DIV_ROUND_UP_ULL() is used
as that macro casts the dividend to unsigned long long anyway.

>
> On the other hand on 64-bit architectures the divisor is masked and only
> the lower 32-bit are used. Thus requesting a frequency >= 4.3GHz results
> in incorrect values. For example requesting 4300000000 (4.3 GHz) will
> effectively request ca. 5 MHz.

Nice catch. But I'm concerned that the case above is broken by changing
to DIV_ROUND_UP(). As this code is generic, I fear we'll have to change
this code that divides rates to use DIV64_U64_ROUND_UP() because we
don't know how large the rate is (i.e. it could be larger than 32-bits
on a 64-bit machine).

> Requesting clk_round_rate(clk, ULONG_MAX)
> is a bit of a special case, since that still returns correct values as
> long as the parent clock is below 8.5 GHz.
>
> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
> ---
>  drivers/clk/clk-divider.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/clk/clk-divider.c b/drivers/clk/clk-divider.c
> index a2c2b5203b0a..c38e8aa60e54 100644
> --- a/drivers/clk/clk-divider.c
> +++ b/drivers/clk/clk-divider.c
> @@ -220,7 +220,7 @@ static int _div_round_up(const struct clk_div_table *table,
>  			 unsigned long parent_rate, unsigned long rate,
>  			 unsigned long flags)
>  {
> -	int div = DIV_ROUND_UP_ULL((u64)parent_rate, rate);
> +	int div = DIV_ROUND_UP(parent_rate, rate);
>
>  	if (flags & CLK_DIVIDER_POWER_OF_TWO)
>  		div = __roundup_pow_of_two(div);
> @@ -237,7 +237,7 @@ static int _div_round_closest(const struct clk_div_table *table,
>  	int up, down;
>  	unsigned long up_rate, down_rate;
>
> -	up = DIV_ROUND_UP_ULL((u64)parent_rate, rate);
> +	up = DIV_ROUND_UP(parent_rate, rate);
>  	down = parent_rate / rate;
>
>  	if (flags & CLK_DIVIDER_POWER_OF_TWO) {
> @@ -473,7 +473,7 @@ int divider_get_val(unsigned long rate, unsigned long parent_rate,
>  {
>  	unsigned int div, value;
>
> -	div = DIV_ROUND_UP_ULL((u64)parent_rate, rate);
> +	div = DIV_ROUND_UP(parent_rate, rate);
>
>  	if (!_is_valid_div(table, div, flags))
>  		return -EINVAL;

This is undoing parts of commit 9556f9dad8f5 ("clk: divider: handle
integer overflow when dividing large clock rates"). Please pair this
patch with extensive kunit tests in a new test suite clk-divider_test.c
file. I don't know if UML supports changing sizeof(long), but that would
be a cool feature to tease out these sorts of issues. I suppose we'll
just have to run the kunit tests on various architectures to cover the
possibilities.
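The 32-bit overflow described in the reply above can be reproduced in plain userspace C. The following is only a sketch, not kernel code: `div_round_up_u32()` and `div_round_up_u64()` are hypothetical helper names that restate the `(n + d - 1) / d` shape of DIV_ROUND_UP(), with `uint32_t` standing in for a 32-bit "unsigned long".

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for DIV_ROUND_UP() with a 32-bit "unsigned long".
 * The intermediate sum n + d - 1 wraps modulo 2^32 before the division.
 */
static inline uint32_t div_round_up_u32(uint32_t n, uint32_t d)
{
	assert(d != 0);
	return (uint32_t)(n + d - 1) / d;
}

/*
 * Same formula with a 64-bit dividend, roughly what the u64 handling
 * from 9556f9dad8f5 buys on a 32-bit machine: the sum no longer wraps.
 */
static inline uint64_t div_round_up_u64(uint64_t n, uint64_t d)
{
	assert(d != 0);
	return (n + d - 1) / d;
}
```

With the values from the example, 3000000000 + 1500000000 - 1 is 4499999999, which wraps to 205032703 in 32 bits, so `div_round_up_u32(3000000000u, 1500000000u)` yields 0 where the 64-bit variant yields the expected 2.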
From: Stephen Boyd
> Sent: 13 June 2023 01:42
>
> Quoting Sebastian Reichel (2023-05-26 10:10:57)
> > The clock framework handles clock rates as "unsigned long", so u32 on
> > 32-bit architectures and u64 on 64-bit architectures.
> >
> > The current code pointlessly casts the dividend to u64 on 32-bit
> > architectures and thus pointlessly reducing the performance.
>
> It looks like that was done to make the DIV_ROUND_UP() macro not
> overflow the dividend on 32-bit machines (from 9556f9dad8f5):
>
>     DIV_ROUND_UP(3000000000, 1500000000) = (3.0G + 1.5G - 1) / 1.5G
>                                          = OVERFLOW / 1.5G

Maybe add:

    #define DIV_ROUND_UP_NZ(x, y) (((x) - 1)/(y) + 1)

which doesn't overflow but requires x != 0.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
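A quick userspace check of the DIV_ROUND_UP_NZ() suggestion, as a sketch only: the macro is copied from the message above, while the wrapper `div_round_up_nz_u32()` is a hypothetical name added here to pin the operands to 32 bits and to enforce the x != 0 precondition.

```c
#include <assert.h>
#include <stdint.h>

/* Macro as proposed in the message above; only valid for x != 0. */
#define DIV_ROUND_UP_NZ(x, y) (((x) - 1) / (y) + 1)

/*
 * Hypothetical 32-bit wrapper. Since x - 1 never exceeds x, there is no
 * intermediate sum that could wrap, unlike (x + y - 1) / y.
 */
static inline uint32_t div_round_up_nz_u32(uint32_t x, uint32_t y)
{
	assert(x != 0 && y != 0);
	return DIV_ROUND_UP_NZ(x, y);
}
```

For the 3.0G / 1.5G case this evaluates to (2999999999 / 1500000000) + 1 = 2 without leaving 32 bits. Callers would still need to handle a dividend of 0 separately, which is the trade-off noted in the message.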
diff --git a/drivers/clk/clk-divider.c b/drivers/clk/clk-divider.c
index a2c2b5203b0a..c38e8aa60e54 100644
--- a/drivers/clk/clk-divider.c
+++ b/drivers/clk/clk-divider.c
@@ -220,7 +220,7 @@ static int _div_round_up(const struct clk_div_table *table,
 			 unsigned long parent_rate, unsigned long rate,
 			 unsigned long flags)
 {
-	int div = DIV_ROUND_UP_ULL((u64)parent_rate, rate);
+	int div = DIV_ROUND_UP(parent_rate, rate);
 
 	if (flags & CLK_DIVIDER_POWER_OF_TWO)
 		div = __roundup_pow_of_two(div);
@@ -237,7 +237,7 @@ static int _div_round_closest(const struct clk_div_table *table,
 	int up, down;
 	unsigned long up_rate, down_rate;
 
-	up = DIV_ROUND_UP_ULL((u64)parent_rate, rate);
+	up = DIV_ROUND_UP(parent_rate, rate);
 	down = parent_rate / rate;
 
 	if (flags & CLK_DIVIDER_POWER_OF_TWO) {
@@ -473,7 +473,7 @@ int divider_get_val(unsigned long rate, unsigned long parent_rate,
 {
 	unsigned int div, value;
 
-	div = DIV_ROUND_UP_ULL((u64)parent_rate, rate);
+	div = DIV_ROUND_UP(parent_rate, rate);
 
 	if (!_is_valid_div(table, div, flags))
 		return -EINVAL;
The clock framework handles clock rates as "unsigned long", so u32 on
32-bit architectures and u64 on 64-bit architectures.

The current code pointlessly casts the dividend to u64 on 32-bit
architectures and thus pointlessly reducing the performance.

On the other hand on 64-bit architectures the divisor is masked and only
the lower 32-bit are used. Thus requesting a frequency >= 4.3GHz results
in incorrect values. For example requesting 4300000000 (4.3 GHz) will
effectively request ca. 5 MHz. Requesting clk_round_rate(clk, ULONG_MAX)
is a bit of a special case, since that still returns correct values as
long as the parent clock is below 8.5 GHz.

Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
---
 drivers/clk/clk-divider.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
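The "ca. 5 MHz" figure in the commit message follows from keeping only the low 32 bits of the divisor. The sketch below models that in userspace C; it assumes the masking behaves like a plain cast to `uint32_t`, and the helper names and the 1.2 GHz parent rate are hypothetical values picked for illustration, not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Models the masking described above: only the low 32 bits survive. */
static inline uint32_t masked_divisor(uint64_t rate)
{
	return (uint32_t)rate;
}

/*
 * Hypothetical model of a round-up division with a 64-bit dividend but
 * a divisor truncated to 32 bits.
 */
static inline uint64_t div_round_up_masked(uint64_t parent_rate, uint64_t rate)
{
	uint32_t d = masked_divisor(rate);

	assert(d != 0); /* a rate that is a multiple of 2^32 would mask to 0 */
	return (parent_rate + d - 1) / d;
}
```

Here `masked_divisor(4300000000ull)` is 4300000000 - 4294967296 = 5032704, i.e. about 5 MHz. With an assumed 1.2 GHz parent, `div_round_up_masked(1200000000ull, 4300000000ull)` gives 239, whereas a full 64-bit division would give a divider of 1.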