Message ID: 20190703011020.151615-1-saravanak@google.com
Series: Introduce Bandwidth OPPs for interconnect paths
On 02-07-19, 18:10, Saravana Kannan wrote:
> Interconnects and interconnect paths quantify their performance levels in
> terms of bandwidth and not in terms of frequency. So similar to how we have
> frequency based OPP tables in DT and in the OPP framework, we need
> bandwidth OPP table support in the OPP framework and in DT. Since there can
> be more than one interconnect path used by a device, we also need a way to
> assign a bandwidth OPP table to an interconnect path.
>
> This patch series:
> - Adds opp-peak-KBps and opp-avg-KBps properties to OPP DT bindings
> - Adds interconnect-opp-table property to interconnect DT bindings
> - Adds OPP helper functions for bandwidth OPP tables
> - Adds icc_get_opp_table() to get the OPP table for an interconnect path
>
> So with the DT bindings added in this patch series, the DT for a GPU
> that does bandwidth voting from GPU to Cache and GPU to DDR would look
> something like this:

And what changed since V2 ?
On Tue, Jul 2, 2019 at 11:36 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 02-07-19, 18:10, Saravana Kannan wrote:
> > Interconnects and interconnect paths quantify their performance levels in
> > terms of bandwidth and not in terms of frequency. [...]
>
> And what changed since V2 ?

Sorry, forgot to put that in. I just dropped a lot of patches that
weren't relevant to the idea of BW OPPs and tying them to
interconnects.

-Saravana
On 02-07-19, 18:10, Saravana Kannan wrote:
> Interconnects and interconnect paths quantify their performance levels in
> terms of bandwidth and not in terms of frequency. [...]
>
> So with the DT bindings added in this patch series, the DT for a GPU
> that does bandwidth voting from GPU to Cache and GPU to DDR would look
> something like this:
>
> gpu_cache_opp_table: gpu_cache_opp_table {
>     compatible = "operating-points-v2";
>
>     gpu_cache_3000: opp-3000 {
>         opp-peak-KBps = <3000>;
>         opp-avg-KBps = <1000>;
>     };
>     gpu_cache_6000: opp-6000 {
>         opp-peak-KBps = <6000>;
>         opp-avg-KBps = <2000>;
>     };
>     gpu_cache_9000: opp-9000 {
>         opp-peak-KBps = <9000>;
>         opp-avg-KBps = <9000>;
>     };
> };
>
> gpu_ddr_opp_table: gpu_ddr_opp_table {
>     compatible = "operating-points-v2";
>
>     gpu_ddr_1525: opp-1525 {
>         opp-peak-KBps = <1525>;
>         opp-avg-KBps = <452>;
>     };
>     gpu_ddr_3051: opp-3051 {
>         opp-peak-KBps = <3051>;
>         opp-avg-KBps = <915>;
>     };
>     gpu_ddr_7500: opp-7500 {
>         opp-peak-KBps = <7500>;
>         opp-avg-KBps = <3000>;
>     };
> };

Who is going to use the above tables and how ? These are the maximum
BW available over these paths, right ?

> gpu_opp_table: gpu_opp_table {
>     compatible = "operating-points-v2";
>     opp-shared;
>
>     opp-200000000 {
>         opp-hz = /bits/ 64 <200000000>;
>     };
>     opp-400000000 {
>         opp-hz = /bits/ 64 <400000000>;
>     };
> };

Shouldn't this link back to the above tables via required-opp, etc ?
How will we know how much BW is required by the GPU device for all the
paths ?

> gpu@7864000 {
>     ...
>     operating-points-v2 = <&gpu_opp_table>, <&gpu_cache_opp_table>,
>                           <&gpu_ddr_opp_table>;
>     interconnects = <&mmnoc MASTER_GPU_1 &bimc SLAVE_SYSTEM_CACHE>,
>                     <&mmnoc MASTER_GPU_1 &bimc SLAVE_DDR>;
>     interconnect-names = "gpu-cache", "gpu-mem";
>     interconnect-opp-table = <&gpu_cache_opp_table>, <&gpu_ddr_opp_table>;
> };
On Wed, Jul 17, 2019 at 3:32 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 02-07-19, 18:10, Saravana Kannan wrote:
> > Interconnects and interconnect paths quantify their performance levels in
> > terms of bandwidth and not in terms of frequency. [...]
> >
> > gpu_cache_opp_table: gpu_cache_opp_table {
> > [...]
> >
> > gpu_ddr_opp_table: gpu_ddr_opp_table {
> > [...]
>
> Who is going to use the above tables and how ?

In this example the GPU driver would use these. It'll go through these
and then decide what peak and average bw to pick based on whatever
criteria.

> These are the maximum
> BW available over these paths, right ?

I wouldn't call them "maximum" because there can't be multiple
maximums :) But yes, these are the meaningful bandwidth levels from
the GPU's perspective to use over these paths.

> > gpu_opp_table: gpu_opp_table {
> > [...]
>
> Shouldn't this link back to the above tables via required-opp, etc ?
> How will we know how much BW is required by the GPU device for all the
> paths ?

If that's what the GPU driver wants to do, then yes. But the GPU
driver could also choose to scale the bandwidth for these paths based
on multiple other signals. Eg: bus port busy percentage, measured
bandwidth, etc.

Or they might even decide to pick the BW OPP that matches the index of
their GPU OPP. It's up to them how they actually do the heuristic.

-Saravana
I know you have explained lots of things earlier as well, but they are
available over multiple threads and I don't know where to reply now :)

Let's have a proper discussion (once again) here and be done with it.
Sorry for the trouble of explaining things again.

On 17-07-19, 13:34, Saravana Kannan wrote:
> On Wed, Jul 17, 2019 at 3:32 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > On 02-07-19, 18:10, Saravana Kannan wrote:
> > > gpu_cache_opp_table: gpu_cache_opp_table {
> > > [...]
> > > gpu_ddr_opp_table: gpu_ddr_opp_table {
> > > [...]
> >
> > Who is going to use the above tables and how ?
>
> In this example the GPU driver would use these. It'll go through these
> and then decide what peak and average bw to pick based on whatever
> criteria.

Are you saying that the GPU driver will decide which bandwidth to
choose while running at a particular frequency (say 2 GHz) ? And that
it can choose 1525 or 3051 or 7500 from the ddr path ?

Will it be possible to publicly share how we arrive at these
decisions ?

The thing is I don't like these separate OPP tables which will not be
used by anyone else, but just the GPU (or a single device). I would
like to put this data in the GPU OPP table only. What about putting a
range in the GPU OPP table for the bandwidth, if it can change so much
for the same frequency ?

> > These are the maximum
> > BW available over these paths, right ?
>
> I wouldn't call them "maximum" because there can't be multiple
> maximums :) But yes, these are the meaningful bandwidth levels from
> the GPU's perspective to use over these paths.
>
> > > gpu_opp_table: gpu_opp_table {
> > > [...]
> >
> > Shouldn't this link back to the above tables via required-opp, etc ?
> > How will we know how much BW is required by the GPU device for all the
> > paths ?
>
> If that's what the GPU driver wants to do, then yes. But the GPU
> driver could also choose to scale the bandwidth for these paths based
> on multiple other signals. Eg: bus port busy percentage, measured
> bandwidth, etc.

Let's say that the GPU is running at 2 GHz right now, and based on the
above inputs it wants to increase the bandwidth to 7500 for the ddr
path. Now does it make sense to run at 4 GHz instead of 2, so we
utilize the bandwidth to the best of our ability and waste less power ?

If something like that is acceptable, then what about keeping the
bandwidth fixed per frequency, and rather scaling the frequency of the
GPU on the inputs you provided (like bus port busy percentage, etc.) ?

The current proposal makes me wonder why we should try to reuse OPP
tables for providing these bandwidth values, as the OPP tables for
interconnect paths aren't really a lot of data (only bandwidth, all
the time) and there is no linking from the device's OPP table either.
On Wed, Jul 17, 2019 at 10:37 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> I know you have explained lots of things earlier as well, but they are
> available over multiple threads and I don't know where to reply now :)
>
> Let's have a proper discussion (once again) here and be done with it.
> Sorry for the trouble of explaining things again.
>
> On 17-07-19, 13:34, Saravana Kannan wrote:
> > In this example the GPU driver would use these. It'll go through these
> > and then decide what peak and average bw to pick based on whatever
> > criteria.
>
> Are you saying that the GPU driver will decide which bandwidth to
> choose while running at a particular frequency (say 2 GHz) ? And that
> it can choose 1525 or 3051 or 7500 from the ddr path ?
>
> Will it be possible to publicly share how we arrive at these
> decisions ?

GPU is just an example, so I can't really speak for how a random GPU
driver might decide the bandwidth to pick. But one obvious way is to
start at the lowest bandwidth and check the bus port busy%. If it's
more than 80% busy, it'll pick the next bandwidth, etc. So, something
like what the cpufreq ondemand or conservative governors used to do.

> The thing is I don't like these separate OPP tables which will not be
> used by anyone else, but just the GPU (or a single device).

The BW OPP table isn't always a secondary OPP table. It can be a
primary OPP table too. For example, if you have a bandwidth monitoring
device/HW IP that can measure traffic on a path and make requests for
that path, it'll have a BW OPP table and it'll pick one of those BW
OPP levels based on the hardware measurements. It will have its own
device driver. This is basically no different from a device being the
only user of a freq OPP table.

> I would like
> to put this data in the GPU OPP table only. What about putting a
> range in the GPU OPP table for the bandwidth, if it can change so much
> for the same frequency ?

I don't think the range is going to work. If a GPU is doing purely
computational work, it's not unreasonable for it to vote for the
lowest bandwidth for any GPU frequency.

> > These are the maximum
> > BW available over these paths, right ?
>
> > I wouldn't call them "maximum" because there can't be multiple
> > maximums :) But yes, these are the meaningful bandwidth levels from
> > the GPU's perspective to use over these paths.
>
> > If that's what the GPU driver wants to do, then yes. But the GPU
> > driver could also choose to scale the bandwidth for these paths based
> > on multiple other signals. Eg: bus port busy percentage, measured
> > bandwidth, etc.
>
> Let's say that the GPU is running at 2 GHz right now, and based on the
> above inputs it wants to increase the bandwidth to 7500 for the ddr
> path. Now does it make sense to run at 4 GHz instead of 2, so we
> utilize the bandwidth to the best of our ability and waste less power ?

This is kinda hard to explain, but I'll try. Firstly, the GPU power
increase might be so high that you might not want to do this anyway.
Also, what you are proposing *might* improve the perf/mW (efficiency),
but it doesn't decrease the actual power consumption. So this doesn't
really work towards saving power for mobile devices.

Also, if the GPU is generating a lot of traffic to DDR and you
increase the GPU frequency, it's only going to generate even more
traffic. So you'll end up in a positive feedback loop that maxes out
the frequency and bandwidth. Definitely not something you want for a
mobile device.

> If something like that is acceptable, then what about keeping the
> bandwidth fixed per frequency, and rather scaling the frequency of the
> GPU on the inputs you provided (like bus port busy percentage, etc.) ?

I don't think it's acceptable.

> The current proposal makes me wonder why we should try to reuse OPP
> tables for providing these bandwidth values, as the OPP tables for
> interconnect paths aren't really a lot of data (only bandwidth, all
> the time) and there is no linking from the device's OPP table either.

I think everyone is getting too tied up on mapping device frequency to
bandwidth requests. That's useful for a limited set of cases, but it
doesn't work for a lot of use cases.

A couple of benefits of using BW OPPs instead of listing bandwidth
values as part of frequency OPP tables:

- Works better when the interconnect path has more useful levels than
  the device has frequency levels. I think this might even be true on
  the SDM845 for GPU and DDR. The link from freq OPP to BW OPP could
  list the minimum bandwidth level to use for a particular device freq
  and then let the hardware monitoring heuristic take it higher from
  there.

- Works even if no freq-to-bandwidth mapping heuristic is used, but
  the device needs to skip certain bandwidth levels for the platform's
  power/perf reasons.

- More scalable as more properties are added to BW OPP levels. Traffic
  priority is one natural extension of the BW OPP "rows". Explicit
  latency is another possibility.

- Currently, devices that have use case specific bandwidth levels
  (that aren't computed at runtime) have no way of capturing those
  bandwidth needs in DT, so everyone is inventing their own scheme.
  Having a BW OPP table would allow them to capture all the use case
  specific bandwidth levels in DT and then pick one using the
  index/phandle/etc. We could even allow naming OPP rows and pick one
  that way. Not saying this is a main reason for BW OPP tables or that
  we should do this, but it is a possibility to consider.

Long story short, BW OPP tables make a lot of sense for anyone who has
actually done bandwidth scaling on a commercial platform.

If people are getting too tied up about the interconnect-opp-table
property, we can just drop it. I added it to avoid any implicit
ordering between the tables in the operating-points-v2 property and
the interconnects property, and to call the association out more
explicitly. But it's not a hill worth dying on.

-Saravana
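[The ondemand-style stepping Saravana describes can be sketched in a few
lines of plain C. This is a hypothetical illustration, not code from the
patch series: the table values come from the cover letter, while the
thresholds and the struct/function names are made up.]

```c
#include <assert.h>

/* A bandwidth OPP level, mirroring opp-peak-KBps / opp-avg-KBps. */
struct bw_opp {
	unsigned int peak_kbps;
	unsigned int avg_kbps;
};

/* The gpu-ddr BW OPP table from the cover letter. */
static const struct bw_opp gpu_ddr_opps[] = {
	{ 1525,  452 },
	{ 3051,  915 },
	{ 7500, 3000 },
};

/* Illustrative thresholds, in the spirit of cpufreq ondemand. */
enum { UP_THRESHOLD = 80, DOWN_THRESHOLD = 30 };

/*
 * Given the index of the currently requested BW OPP and the measured
 * bus-port busy percentage, return the index to request next: step up
 * when too busy, step down when mostly idle, otherwise hold.
 */
static int pick_bw_opp(int cur, unsigned int busy_pct, int nr_opps)
{
	if (busy_pct > UP_THRESHOLD && cur < nr_opps - 1)
		return cur + 1;
	if (busy_pct < DOWN_THRESHOLD && cur > 0)
		return cur - 1;
	return cur;
}
```

[The chosen index would then be turned into an icc bandwidth request for
the path; how that request is made is out of scope for this sketch.]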
On 18-07-19, 21:12, Saravana Kannan wrote:
> On Wed, Jul 17, 2019 at 10:37 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > I would like
> > to put this data in the GPU OPP table only. What about putting a
> > range in the GPU OPP table for the bandwidth, if it can change so much
> > for the same frequency ?
>
> I don't think the range is going to work.

Any specific reason for that ?

> If a GPU is doing purely
> computational work, it's not unreasonable for it to vote for the
> lowest bandwidth for any GPU frequency.

I think that is fine, but if the GPU is able to find how much
bandwidth it needs, why can't it just pass that value without needing
another OPP table for the path ? The interconnect can then take all
the requests and use its own OPP table to help decide what stable
bandwidth to use.
On Mon, Jul 29, 2019 at 2:24 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 18-07-19, 21:12, Saravana Kannan wrote:
> > I don't think the range is going to work.
>
> Any specific reason for that ?

The next sentence was literally explaining this :) Fine to debate
that, but ignoring it and asking this question is kinda funny.

> > If a GPU is doing purely
> > computational work, it's not unreasonable for it to vote for the
> > lowest bandwidth for any GPU frequency.
>
> I think that is fine, but if the GPU is able to find how much
> bandwidth it needs, why can't it just pass that value without needing
> another OPP table for the path ?

You were asking this question in the context of "can the GPU OPP just
list the range of bandwidth it might use per GPU frequency". My point
is that the range would be useless because it would be the entire
available bandwidth range (because purely compute work might not need
any bandwidth). Whereas, what the GPU's algorithm actually needs might
be the list of "useful" bandwidth levels to use.

Also, as we add more ICC request properties, this range idea will not
scale.

-Saravana
On 29-07-19, 13:12, Saravana Kannan wrote:
> On Mon, Jul 29, 2019 at 2:24 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > > I don't think the range is going to work.
> >
> > Any specific reason for that ?
>
> The next sentence was literally explaining this :) Fine to debate
> that, but ignoring it and asking this question is kinda funny.

Okay, but ...

> > > If a GPU is doing purely
> > > computational work, it's not unreasonable for it to vote for the
> > > lowest bandwidth for any GPU frequency.

... it wasn't clear to me even after reading this sentence again now
:)

I understand that you may have to vote for the lowest bandwidth, but
that doesn't explain why a range can't work (sorry if it is just me
who didn't understand it :)).

> > I think that is fine, but if the GPU is able to find how much
> > bandwidth it needs, why can't it just pass that value without needing
> > another OPP table for the path ?
>
> You were asking this question in the context of "can the GPU OPP just
> list the range of bandwidth it might use per GPU frequency". My point
> is that the range would be useless because it would be the entire
> available bandwidth range (because purely compute work might not need
> any bandwidth).

If it is useless to have the entire range here, then why bother
providing one ? Why can't the GPU request what it needs in exact
terms, based on its calculations ? And then, based on these requests,
let the interconnect find the best/stable values it really wants to
program the path for (and for that the interconnect can use its own
OPP table, which would be fine).

> Whereas, what the GPU's algorithm actually needs might
> be the list of "useful" bandwidth levels to use.

Hmm, I am not sure the GPU's algorithm needs this table, AFAIU based
on all the conversations we have had until now. It is very capable of
finding how much bandwidth it needs; you just want the GPU driver to
finally align that with a stable bandwidth for the platform later on.
And what I am asking is that it is not required for the end driver to
look for stable values; it can just request what it wants and let the
interconnect core/driver decide the stable values.

Very much like the clock framework: most user drivers just ask for a
clk value to be programmed, and it is the clock driver which keeps a
table of the stable values and then aligns the requested value to one
of those.

> Also, as we add more ICC request properties, this range idea will not
> scale.

I am not sure what kind of properties there can be and where they
should land. My feedback is completely based on what you have
presented in this patchset.
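[For context on what the interconnect core actually reconciles: its
default aggregation sums the average-bandwidth requests on a node and
takes the maximum of the peak-bandwidth requests. A minimal sketch in
plain C; the struct and function names here are illustrative, not the
framework's.]

```c
#include <assert.h>

/* One client's request on an interconnect path. */
struct bw_req {
	unsigned int avg_kbps;
	unsigned int peak_kbps;
};

/*
 * Aggregate concurrent requests the way the interconnect framework's
 * default aggregation does: average bandwidths add up (total expected
 * traffic), peak bandwidths are max'd (worst-case burst).
 */
static void aggregate_bw(const struct bw_req *reqs, int n,
			 unsigned int *agg_avg, unsigned int *agg_peak)
{
	int i;

	*agg_avg = 0;
	*agg_peak = 0;
	for (i = 0; i < n; i++) {
		*agg_avg += reqs[i].avg_kbps;
		if (reqs[i].peak_kbps > *agg_peak)
			*agg_peak = reqs[i].peak_kbps;
	}
}
```

[The provider then programs the path fast enough for the aggregated
values, which is where its own notion of stable/supported frequencies
comes in.]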
Resending due to HTML.

On Mon, Jul 29, 2019 at 8:02 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 29-07-19, 13:12, Saravana Kannan wrote:
> [...]
>
> I understand that you may have to vote for the lowest bandwidth, but
> that doesn't explain why a range can't work (sorry if it is just me
> who didn't understand it :)).

Well, it doesn't work in the sense that it doesn't give any additional
info. I can just vote for 0 or UINT_MAX if I want to stay at the
lowest or highest bandwidth. Having the actual values of the lowest or
highest point doesn't help for cases where you need to skip
intermediate bandwidth levels when going from low to high (as the need
increases).

> > My point
> > is that the range would be useless because it would be the entire
> > available bandwidth range (because purely compute work might not need
> > any bandwidth).
>
> If it is useless to have the entire range here, then why bother
> providing one ? Why can't the GPU request what it needs in exact
> terms, based on its calculations ? And then, based on these requests,
> let the interconnect find the best/stable values it really wants to
> program the path for (and for that the interconnect can use its own
> OPP table, which would be fine).

Let's say the actual path can support 1, 2, 3, 4, 5, 6, 7, 8, 9 and
10 GB/s. Let's say 2, 3 and 4 need the same voltage level as 5 for
this path. So, for the GPU's needs, using 2, 3 and 4 GB/s might not be
good, because the power savings from the frequency difference are not
worth the performance and power impact (if you run the interconnect
slower, the GPU has to run faster to achieve the same performance)
compared to running the interconnect at 5 GB/s. Similarly it might
skip 6 GB/s.

So even if the GPU can somehow calculate the exact bandwidth required
(or, say, measure it), it'll need to know to skip 2, 3 and 4 because
they aren't power/perf efficient levels to use.

But all these bandwidth levels might be usable for a smaller HW IP
whose power cost isn't high. The power savings from running the
interconnect at 3 GB/s might be worth it for that IP, because even if
the small HW IP ran faster to achieve the performance, the power
increase in the HW IP won't be higher than the power savings from
running the interconnect slower.

> > Whereas, what the GPU's algorithm actually needs might
> > be the list of "useful" bandwidth levels to use.
>
> Hmm, I am not sure the GPU's algorithm needs this table, AFAIU based
> on all the conversations we have had until now. It is very capable of
> finding how much bandwidth it needs,

Not really. If you have a monitor that can actually measure the
bandwidth, yes. Most often that's not the case. If you just have a
monitor that can give you the bus port busy%, then it'll have to use
this table to pick the useful levels. As in, in the example above, if
the bus is still too busy at 1 GB/s, it would directly ask for 5 GB/s
instead of going through 2, 3 and 4.

> you just want the GPU driver to
> finally align that with a stable bandwidth for the platform later on.
> And what I am asking is that it is not required for the end driver to
> look for stable values; it can just request what it wants and let the
> interconnect core/driver decide the stable values.

I think you've misunderstood my prior statements then. The
interconnect driver would still have to aggregate the requests and
pick the final frequency for the interconnect. That's where it comes
in: executing/implementing the requests of all the clients.

> Very much like the clock framework: most user drivers just ask for a
> clk value to be programmed, and it is the clock driver which keeps a
> table of the stable values and then aligns the requested value to one
> of those.

Think of this as similar to the clock API and the OPP tables for CPUs.
The clock API could run the CPU at many different frequencies, but the
CPU driver uses the CPU OPP table to pick a certain set of frequencies
that are "useful". If your CPU clock is shared with another block, say
L3, and there's an L3 driver requesting a different frequency range
(using clk_set_rate_range(x, UINT_MAX)), that's where the clock driver
aggregates their requests and sets the final clock frequency.

Similarly, the GPU driver wants to pick useful interconnect path
bandwidth levels using BW OPP tables, and the interconnect framework
aggregates the requests across multiple drivers requesting different
bandwidths.

Does that make sense?

-Saravana