Message ID | 20190627171603.14767-1-douglas.raillard@arm.com (mailing list archive) |
---|---|
Series | sched/cpufreq: Make schedutil energy aware |
On Thu, Jun 27, 2019 at 06:15:58PM +0100, Douglas RAILLARD wrote:
> Make schedutil cpufreq governor energy-aware.
>
> - patch 1 introduces a function to retrieve a frequency given a base
> frequency and an energy cost margin.
> - patch 2 links Energy Model perf_domain to sugov_policy.
> - patch 3 updates get_next_freq() to make use of the Energy Model.
>
> 1) Selecting the highest possible frequency for a given cost. Some
> platforms can have lower frequencies that are less efficient than
> higher ones, in which case they should be skipped for most purposes.
> They can still be useful to give more freedom to thermal throttling
> mechanisms, but not under normal circumstances.
> note: the EM framework will warn about such OPPs "hertz/watts ratio
> non-monotonically decreasing"

Humm, for some reason I was thinking we explicitly skipped those OPPs
and they already weren't used.

This isn't in fact so, and these first few patches make it so?
On Thu, Jun 27, 2019 at 06:15:58PM +0100, Douglas RAILLARD wrote:
> Make schedutil cpufreq governor energy-aware.
>
> - patch 4 adds sugov_cpu_ramp_boost() function.
> - patch 5 updates sugov_update_(single|shared)() to make use of
> sugov_cpu_ramp_boost().
>
> The benefits of using the EM in schedutil are twofold:

> 2) Driving the frequency selection with power in mind, in addition to
> maximizing the utilization of the non-idle CPUs in the system.

> Point 2) is enabled in
> "sched/cpufreq: Boost schedutil frequency ramp up". It allows using
> higher frequencies when it is known that the true utilization of
> currently running tasks is exceeding their previous stable point.
> The benefits are:
>
> * Boosting the frequency when the behavior of a runnable task changes,
> leading to an increase in utilization. That shortens the frequency
> ramp up duration, which in turn allows the utilization signal to
> reach stable values quicker. Since the allowed frequency boost is
> bounded in energy, it will behave consistently across platforms,
> regardless of the OPP cost range.
>
> * The boost is only transient, and should not significantly impact the
> energy consumed by workloads with very stable utilization signals.

So you're allowing a higher pick when the EWMA exceeds the enqueue
thing.

This then obviously has relation to Patrick's patch that makes the EWMA
asymmetric, but I'm thinking that the interaction is mostly favourable?

I'm not immediately seeing how it is transient; that is, PELT has a
wobble in its steady state, is that accounted for?
Hi Peter,

On 7/2/19 4:44 PM, Peter Zijlstra wrote:
> On Thu, Jun 27, 2019 at 06:15:58PM +0100, Douglas RAILLARD wrote:
>> Make schedutil cpufreq governor energy-aware.
>>
>> - patch 1 introduces a function to retrieve a frequency given a base
>> frequency and an energy cost margin.
>> - patch 2 links Energy Model perf_domain to sugov_policy.
>> - patch 3 updates get_next_freq() to make use of the Energy Model.
>
>>
>> 1) Selecting the highest possible frequency for a given cost. Some
>> platforms can have lower frequencies that are less efficient than
>> higher ones, in which case they should be skipped for most purposes.
>> They can still be useful to give more freedom to thermal throttling
>> mechanisms, but not under normal circumstances.
>> note: the EM framework will warn about such OPPs "hertz/watts ratio
>> non-monotonically decreasing"
>
> Humm, for some reason I was thinking we explicitly skipped those OPPs
> and they already weren't used.
>
> This isn't in fact so, and these first few patches make it so?

That's correct, the cost information about each OPP has been introduced recently in mainline
by the energy model series. Without that info, the only way to skip them that comes to my
mind is to set a policy min frequency, since these inefficient OPPs are usually located
at the lower end.

Thanks,
Douglas
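A minimal Python sketch of what "inefficient OPP" means here, under stated assumptions: the OPP table (frequency in kHz, power in mW) is hypothetical, the EM cost of an OPP is taken as power * max_freq / freq (the cost definition used by the kernel Energy Model), and the helper names em_costs(), inefficient_opps() and efficient_freq() are made up for illustration. An OPP counts as inefficient when some higher OPP has a cost no larger than its own; this sketch compares each OPP against all higher ones.

```python
# Hypothetical OPP table for one perf domain: (frequency in kHz, power in mW).
# The 500 MHz OPP is deliberately made inefficient for the example.
OPPS = [(500_000, 100), (800_000, 150), (1_200_000, 250),
        (1_600_000, 400), (2_000_000, 650)]


def em_costs(opps):
    """(freq, cost) pairs, with cost = power * max_freq / freq (integer math)."""
    opps = sorted(opps)
    fmax = opps[-1][0]
    return [(freq, fmax * power // freq) for freq, power in opps]


def inefficient_opps(opps):
    """Frequencies for which some higher OPP has a cost no larger than theirs."""
    costs = em_costs(opps)
    return [freq for i, (freq, cost) in enumerate(costs)
            if any(other <= cost for _, other in costs[i + 1:])]


def efficient_freq(opps, freq):
    """Highest frequency costing no more than the OPP that would serve `freq`.

    `freq` is first rounded up to the lowest OPP >= freq (assumed to exist).
    """
    costs = em_costs(opps)
    base_cost = next(cost for f, cost in costs if f >= freq)
    return max(f for f, cost in costs if cost <= base_cost)


print(em_costs(OPPS))
print(inefficient_opps(OPPS))         # [500000]
print(efficient_freq(OPPS, 400_000))  # 800000: the 500 MHz OPP is skipped
```

With this table the 500 MHz OPP costs more per unit of work than the 800 MHz one, so a request that would land on 500 MHz is served at 800 MHz instead, which is the "highest possible frequency for a given cost" behaviour of point 1).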
On 7/2/19 4:51 PM, Peter Zijlstra wrote:
> On Thu, Jun 27, 2019 at 06:15:58PM +0100, Douglas RAILLARD wrote:
>> Make schedutil cpufreq governor energy-aware.
>>
>> - patch 4 adds sugov_cpu_ramp_boost() function.
>> - patch 5 updates sugov_update_(single|shared)() to make use of
>> sugov_cpu_ramp_boost().
>>
>> The benefits of using the EM in schedutil are twofold:
>
>> 2) Driving the frequency selection with power in mind, in addition to
>> maximizing the utilization of the non-idle CPUs in the system.
>
>> Point 2) is enabled in
>> "sched/cpufreq: Boost schedutil frequency ramp up". It allows using
>> higher frequencies when it is known that the true utilization of
>> currently running tasks is exceeding their previous stable point.
>> The benefits are:
>>
>> * Boosting the frequency when the behavior of a runnable task changes,
>> leading to an increase in utilization. That shortens the frequency
>> ramp up duration, which in turn allows the utilization signal to
>> reach stable values quicker. Since the allowed frequency boost is
>> bounded in energy, it will behave consistently across platforms,
>> regardless of the OPP cost range.
>>
>> * The boost is only transient, and should not significantly impact the
>> energy consumed by workloads with very stable utilization signals.
>
[reordered original comments]
> This then obviously has relation to Patrick's patch that makes the EWMA
> asymmetric, but I'm thinking that the interaction is mostly favourable?

Making task_ue.ewma larger makes cpu_ue.enqueued larger, so Patrick's patch helps
increase the utilisation as seen by schedutil during that transient time (see the
discussion of schedutil signals at the bottom). That goes in the same direction as
this series.

> So you're allowing a higher pick when the EWMA exceeds the enqueue
> thing.

TLDR: Schedutil ramp boost works on CPU rq signals, for which util est EWMA is not
defined, but the idea is the same (replace util est EWMA by util_avg).
The important point here is that when util_avg for the task becomes higher than
task_ue.enqueued, it means the knowledge of the actual needs of the task is turned into
a lower bound (=task_ue.enqueued) rather than an exact value. This means that selecting
a higher frequency than that is:
a) necessary: the task needs more computational power to do its job.
b) a shot in the dark, as it's impossible to predict exactly how much it will need
   without a crystal ball.

When adding ramp boost, the bill is split: part of the "shot in the dark" comes from the
growing CPU's util_avg (see the schedutil_u definition at the bottom), and part of it comes
from the ramp boost. We don't want to make the boost too costly either, since it's a shot
in the dark. Therefore, we make the boost proportional to a battery life cost rather than
to some guessed utilisation.

Now that I think about it, it may make sense to let ramp boost completely handle this
"future util prediction" case, as it's neither better nor worse than util_avg at that
(since it's based on it), but it allows better control over the cost of a (mis)prediction.

>
> I'm not immediately seeing how it is transient; that is, PELT has a
> wobble in its steady state, is that accounted for?
>

The transient-ness of the ramp boost I'm introducing comes from the fact that for a
periodic task at steady state, task_ue.enqueued <= task_u when the task is executing.
That is because task_ue.enqueued is sampled at dequeue time, precisely at the moment
at which task_u is reaching its max for that task.
Since we only take into account positive boosts, ramp boost will
only have an impact in the "increase transients".
About signals schedutil is based on
===================================

To my knowledge, here are the signals schedutil uses to compute the final "chosen_freq":

# let's define short names to talk about
task_ue = se.avg.util_est
task_u  = se.avg.util_avg
cpu_ue  = cfs_rq->avg.util_est
cpu_u   = cfs_rq->avg.util_avg

# How things are defined
task_u ~= LOW_PASS_FILTER(task_activations)
task_ue.enqueued = SAMPLE_AT_DEQUEUE_AND_HOLD(task_u)
task_ue.ewma = LOW_PASS_FILTER(task_ue.enqueued)

# Patrick's patch amends task_ue.ewma definition this way:
task_ue.ewma =
    | task_ue.enqueued > task_ue.ewma: task_ue.enqueued
    | otherwise                      : LOW_PASS_FILTER(task_ue.enqueued)

cpu_ue.enqueued = SUM[MAX(task_ue.ewma, task_ue.enqueued) forall task_ue in enqueued_tasks]
cpu_u = SUM[task_u forall task_u in enqueued_tasks]

# What schedutil considers when taking freq decisions
non_cfs_u = util of deadline + rt + irq
schedutil_u = non_cfs_u + APPLY_UCLAMP(MAX(cpu_ue.enqueued, cpu_u)) + iowait_boost
schedutil_base_freq = MAP_UTIL_FREQ(schedutil_u)

STABLE(signal) =
    | signal equal to the last time it was sampled by caller: True
    | otherwise                                             : False

# A diff between two util signals is converted to an EM_COST_MARGIN_SCALE value.
# They are different units, but the conversion factor is 1 in practice.
ramp_boost =
    | cpu_u > cpu_ue.enqueued && STABLE(cpu_ue.enqueued):
          (cpu_u - cpu_ue.enqueued) * (EM_COST_MARGIN_SCALE/SCHED_CAPACITY_SCALE)
    | otherwise: 0

APPLY_RAMP_BOOST(boost, base_freq) = boosted_freq with
    acceptable_cost = ENERGY_MODEL_COST(base_freq) * (EM_COST_MARGIN_SCALE + boost) / EM_COST_MARGIN_SCALE
    boosted_freq = MAX[freq forall freqs if ENERGY_MODEL_COST(freq) <= acceptable_cost]

# ramp-boost is applied on a freq instead of a util (unlike iowait_boost), since
# the function ENERGY_MODEL_COST(freq) is provided by the EM, and an equivalent
# ENERGY_MODEL_COST(util) would need extra calls to MAP_UTIL_FREQ().
schedutil_freq = APPLY_RAMP_BOOST(ramp_boost, schedutil_base_freq)

REAL_FREQ(ideal_freq) = MIN[freq forall freqs if freq >= ideal_freq]

POLICY_CLAMP(freq) =
    | freq < policy_min_freq: policy_min_freq
    | freq > policy_max_freq: policy_max_freq
    | otherwise             : freq

# Frequency finally used for the policy
chosen_freq = POLICY_CLAMP(REAL_FREQ(schedutil_freq))

Thanks,
Douglas
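The pseudocode above can be turned into a small executable model of the frequency selection path. This is a sketch under explicit assumptions, not the code of the series: the OPP table and its EM costs are hypothetical (the 500 MHz OPP is made slightly inefficient), EM_COST_MARGIN_SCALE is assumed equal to SCHED_CAPACITY_SCALE (1024) in line with the "conversion factor is 1" remark, non-CFS util, uclamp and iowait boost are left out, and the cost of a base frequency that is not an OPP is taken at the next OPP up. MAP_UTIL_FREQ follows schedutil's 1.25 headroom (freq = 1.25 * max_freq * util / capacity). The boost is applied while cpu_u exceeds a stable cpu_ue.enqueued, which is the direction described in the prose of this thread. The helper names only mirror the pseudocode and are not kernel functions.

```python
SCHED_CAPACITY_SCALE = 1024
EM_COST_MARGIN_SCALE = 1024  # assumption: same scale as utilization

# Hypothetical single-CPU policy: (frequency in kHz, EM cost), sorted by frequency.
OPPS = [(500_000, 320), (800_000, 300), (1_200_000, 330),
        (1_600_000, 380), (2_000_000, 480)]
MAX_FREQ = OPPS[-1][0]


def map_util_freq(util):
    """schedutil's mapping: 1.25 * max_freq * util / capacity."""
    return (MAX_FREQ + (MAX_FREQ >> 2)) * util // SCHED_CAPACITY_SCALE


def real_freq(ideal_freq):
    """REAL_FREQ: lowest OPP >= ideal_freq (the highest OPP if none is high enough)."""
    return min((f for f, _ in OPPS if f >= ideal_freq), default=MAX_FREQ)


def em_cost(freq):
    """EM cost of the OPP that would actually serve `freq` (assumption)."""
    return dict(OPPS)[real_freq(freq)]


def ramp_boost(cpu_u, cpu_ue_enqueued, prev_cpu_ue_enqueued):
    """Boost only while util_avg exceeds a stable util_est.enqueued."""
    stable = cpu_ue_enqueued == prev_cpu_ue_enqueued
    if stable and cpu_u > cpu_ue_enqueued:
        return (cpu_u - cpu_ue_enqueued) * EM_COST_MARGIN_SCALE // SCHED_CAPACITY_SCALE
    return 0


def apply_ramp_boost(boost, base_freq):
    """Highest OPP whose cost fits within the base cost plus the boost margin."""
    acceptable = em_cost(base_freq) * (EM_COST_MARGIN_SCALE + boost) // EM_COST_MARGIN_SCALE
    return max(f for f, cost in OPPS if cost <= acceptable)


def chosen_freq(cpu_u, cpu_ue_enqueued, prev_cpu_ue_enqueued,
                policy_min=OPPS[0][0], policy_max=MAX_FREQ):
    schedutil_u = max(cpu_ue_enqueued, cpu_u)  # non-CFS util, uclamp, iowait omitted
    base_freq = map_util_freq(schedutil_u)
    boost = ramp_boost(cpu_u, cpu_ue_enqueued, prev_cpu_ue_enqueued)
    freq = real_freq(apply_ramp_boost(boost, base_freq))
    return min(max(freq, policy_min), policy_max)  # POLICY_CLAMP


print(chosen_freq(100, 100, 100))  # 800000: no boost, inefficient 500 MHz skipped
print(chosen_freq(400, 200, 200))  # 1600000: util_avg above a stable enqueued
print(chosen_freq(400, 200, 150))  # 1200000: enqueued just changed, no boost
print(chosen_freq(400, 200, 200, policy_max=1_200_000))  # 1200000: clamped
```

The second and fourth calls show the design choice discussed in the thread: the boost is expressed as an energy cost margin rather than as extra utilization, so it buys the 1.6 GHz OPP here only because its cost (380) fits within 330 * (1 + 200/1024), and the policy clamp still has the last word.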
On 03-Jul 17:36, Douglas Raillard wrote:
> On 7/2/19 4:51 PM, Peter Zijlstra wrote:
> > On Thu, Jun 27, 2019 at 06:15:58PM +0100, Douglas RAILLARD wrote:

[...]

> > I'm not immediately seeing how it is transient; that is, PELT has a
> > wobble in its steady state, is that accounted for?
> >
>
> The transient-ness of the ramp boost I'm introducing comes from the fact that for a
> periodic task at steady state, task_ue.enqueued <= task_u when the task is executing.
                ^^^^^^^^^^^^^^^

I find your above "at steady state" a bit confusing.

The condition "task_ue.enqueue <= task_u" is true only for the first
task's big activation after a series of small activations, e.g. a task
switching from 20% to 70%.
That's the transient state you refer to, isn't it?

> That is because task_ue.enqueued is sampled at dequeue time, precisely at the moment
> at which task_u is reaching its max for that task.

Right, so in the example above we will have enqueued=20% while task_u
is going above to converge towards 70%

> Since we only take into account positive boosts, ramp boost will
> only have an impact in the "increase transients".

Right.

I think Peter was referring to the smallish wobbles we see when the
task already converged to 70%. If that's the case I would say they are
already fully covered also by the current util_est.

You are also correct in pointing out that in the steady state
ramp_boost will not be triggered.

IMU, that's for two main reasons:
a) it's very likely that enqueued <= util_avg
b) even in case enqueued should turn out to be _slightly_ bigger than
   util_avg, the corresponding (proportional) ramp_boost would be so
   tiny as to not have any noticeable effect on OPP selection.

Am I correct on point b) above?

Could you maybe come up with some experimental numbers related to that
case specifically?

Best,
Patrick
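Patrick's 20% to 70% example can be reproduced with a toy model to see why the boost is transient. This is a simplified sketch, not kernel code: it assumes 1 ms ticks, a PELT-like geometric average with a 32 ms half-life, no frequency or CPU invariance, a 20 ms task period, and reduces util_est to its enqueued field sampled at dequeue. For each period it records the largest positive value of task_u - task_ue.enqueued while the task runs, which is the gap a ramp boost would be proportional to.

```python
# Toy model (assumption): 1 ms ticks, PELT-like decay with a 32 ms half-life,
# no frequency or CPU invariance, util_est reduced to its "enqueued" field.
Y = 0.5 ** (1 / 32)   # per-tick decay factor, so that Y ** 32 is about 0.5
SCALE = 1024          # SCHED_CAPACITY_SCALE
PERIOD = 20           # task period in ticks (assumed)


def simulate(phases):
    """Run a periodic task through phases given as (run_ticks, n_periods).

    Returns, for each period, the largest positive gap task_u - enqueued
    observed while the task was running.
    """
    util = 0.0       # task_u: PELT-like running average
    enqueued = 0.0   # task_ue.enqueued: task_u sampled and held at dequeue
    gaps = []
    for run_ticks, n_periods in phases:
        for _ in range(n_periods):
            max_gap = 0.0  # only positive gaps would produce a boost
            for _ in range(run_ticks):
                util = util * Y + SCALE * (1 - Y)  # running: util grows
                max_gap = max(max_gap, util - enqueued)
            enqueued = util                        # dequeue: sample and hold
            for _ in range(PERIOD - run_ticks):
                util *= Y                          # sleeping: util decays
            gaps.append(max_gap)
    return gaps


# 200 periods at a 20% duty cycle (4/20 ticks), then 200 periods at 70% (14/20).
gaps = simulate([(4, 200), (14, 200)])
print(f"last 20% period:  {gaps[199]:.6f}")
print(f"first 70% period: {gaps[200]:.1f}")
print(f"10th 70% period:  {gaps[210]:.1f}")
print(f"last 70% period:  {gaps[-1]:.6f}")
```

In this model the first 70% period shows a gap of roughly 150 util units. Because both phases use the same period, the gap is then multiplied by 2**(-20/32) (about 0.65) every period and is numerically zero once util_avg has converged, which matches point b) above: small residual gaps only lead to a proportionally tiny boost.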
On 03-Jul 14:38, Douglas Raillard wrote:
> Hi Peter,
>
> On 7/2/19 4:44 PM, Peter Zijlstra wrote:
> > On Thu, Jun 27, 2019 at 06:15:58PM +0100, Douglas RAILLARD wrote:
> > > Make schedutil cpufreq governor energy-aware.
> > >
> > > - patch 1 introduces a function to retrieve a frequency given a base
> > > frequency and an energy cost margin.
> > > - patch 2 links Energy Model perf_domain to sugov_policy.
> > > - patch 3 updates get_next_freq() to make use of the Energy Model.
> >
> > >
> > > 1) Selecting the highest possible frequency for a given cost. Some
> > > platforms can have lower frequencies that are less efficient than
> > > higher ones, in which case they should be skipped for most purposes.
> > > They can still be useful to give more freedom to thermal throttling
> > > mechanisms, but not under normal circumstances.
> > > note: the EM framework will warn about such OPPs "hertz/watts ratio
> > > non-monotonically decreasing"
> >
> > Humm, for some reason I was thinking we explicitly skipped those OPPs
> > and they already weren't used.
> >
> > This isn't in fact so, and these first few patches make it so?
>
> That's correct, the cost information about each OPP has been introduced recently in mainline
> by the energy model series. Without that info, the only way to skip them that comes to my
> mind is to set a policy min frequency, since these inefficient OPPs are usually located
> at the lower end.

Perhaps it's also worth pointing out that the alternative approach you
point out above is a system-wide solution.

The ramp_boost thingy you propose, on the other hand, is a more fine-grained
mechanism which could be extended in the future to have a per-task
side. IOW, it could contribute to better user-space hints, for
example to ramp_boost certain tasks and not others.

Best,
Patrick
Hi Patrick,

On 7/8/19 12:09 PM, Patrick Bellasi wrote:
> On 03-Jul 17:36, Douglas Raillard wrote:
>> On 7/2/19 4:51 PM, Peter Zijlstra wrote:
>>> On Thu, Jun 27, 2019 at 06:15:58PM +0100, Douglas RAILLARD wrote:
>
> [...]
>
>>> I'm not immediately seeing how it is transient; that is, PELT has a
>>> wobble in its steady state, is that accounted for?
>>>
>>
>> The transient-ness of the ramp boost I'm introducing comes from the fact that for a
>> periodic task at steady state, task_ue.enqueued <= task_u when the task is executing.
>                ^^^^^^^^^^^^^^^
>
> I find your above "at steady state" a bit confusing.
>
> The condition "task_ue.enqueue <= task_u" is true only for the first
> task's big activation after a series of small activations, e.g. a task
> switching from 20% to 70%.

I actually made a typo and meant "task_u <= task_ue.enqueue". The rest of the paragraph is
consistent with that condition; sorry for the confusion.

> That's the transient state you refer to, isn't it?
>
>> That is because task_ue.enqueued is sampled at dequeue time, precisely at the moment
>> at which task_u is reaching its max for that task.
>
> Right, so in the example above we will have enqueued=20% while task_u
> is going above to converge towards 70%
>
>> Since we only take into account positive boosts, ramp boost will
>> only have an impact in the "increase transients".
>
> Right.
>
> I think Peter was referring to the smallish wobbles we see when the
> task already converged to 70%. If that's the case I would say they are
> already fully covered also by the current util_est.

Yes, that's covered by the "task_u <= task_ue.enqueue" condition, since task_ue.enqueued
does not contain any of the "mid freq" content that we call wobble here. Util est enqueued
acts as an adaptive filter that kills frequencies higher than 1/task_period, task_period
being the delta between the two previous "enqueue events".
What (mostly) remains after that is util variation over longer periods, with a positive
shift that increases with the task period (mean(enqueued) = mean(util_avg) + f(task_period)).

> You are also correct in pointing out that in the steady state
> ramp_boost will not be triggered.
>
> IMU, that's for two main reasons:
> a) it's very likely that enqueued <= util_avg
> b) even in case enqueued should turn out to be _slightly_ bigger than
>    util_avg, the corresponding (proportional) ramp_boost would be so
>    tiny as to not have any noticeable effect on OPP selection.
>
> Am I correct on point b) above?

Assuming you meant "util_avg slightly bigger than enqueued" (which is when boosting triggers),
then yes since ramp_boost effect is proportional to "task_ue.enqueue - task_u". It makes it robust
against that.

>
> Could you maybe come up with some experimental numbers related to that
> case specifically?

With:
* an rt-app task ramping up from 5% to 75% util in one big step. The
  whole cycle is 0.6s long (0.3s at 5% followed by 0.3s at 75%). This
  cycle is repeated 20 times and the average of boosting is taken.

* a hikey 960 (this impacts the frequency at which the test runs at
  the beginning of the 75% phase, which impacts the number of missed
  activations before the util ramped up).

* assuming an OPP exists for each util value (i.e. 1024 OPPs, so the
  effect of boost on consumption is not impacted by OPP capacity
  granularity)

Then the boosting feature would increase the average power
consumption by 3.1%, out of which 0.12% can be considered "spurious
boosting" due to the util taking some time to really converge to its
steady state value.

In practice, the impact of small boosts will be even lower since
they are less likely to trigger the selection of a high OPP due to OPP
capacity granularity > 1 util unit.

>
> Best,
> Patrick
>

Best regards,
Douglas
On 7/8/19 12:13 PM, Patrick Bellasi wrote:
> On 03-Jul 14:38, Douglas Raillard wrote:
>> Hi Peter,
>>
>> On 7/2/19 4:44 PM, Peter Zijlstra wrote:
>>> On Thu, Jun 27, 2019 at 06:15:58PM +0100, Douglas RAILLARD wrote:
>>>> Make schedutil cpufreq governor energy-aware.
>>>>
>>>> - patch 1 introduces a function to retrieve a frequency given a base
>>>> frequency and an energy cost margin.
>>>> - patch 2 links Energy Model perf_domain to sugov_policy.
>>>> - patch 3 updates get_next_freq() to make use of the Energy Model.
>>>
>>>>
>>>> 1) Selecting the highest possible frequency for a given cost. Some
>>>> platforms can have lower frequencies that are less efficient than
>>>> higher ones, in which case they should be skipped for most purposes.
>>>> They can still be useful to give more freedom to thermal throttling
>>>> mechanisms, but not under normal circumstances.
>>>> note: the EM framework will warn about such OPPs "hertz/watts ratio
>>>> non-monotonically decreasing"
>>>
>>> Humm, for some reason I was thinking we explicitly skipped those OPPs
>>> and they already weren't used.
>>>
>>> This isn't in fact so, and these first few patches make it so?
>>
>> That's correct, the cost information about each OPP has been introduced recently in mainline
>> by the energy model series. Without that info, the only way to skip them that comes to my
>> mind is to set a policy min frequency, since these inefficient OPPs are usually located
>> at the lower end.
>
> Perhaps it's also worth pointing out that the alternative approach you
> point out above is a system-wide solution.
>
> The ramp_boost thingy you propose, on the other hand, is a more fine-grained
> mechanism which could be extended in the future to have a per-task
> side. IOW, it could contribute to better user-space hints, for
> example to ramp_boost certain tasks and not others.
ramp_boost and the situation you describe relate more to point 2) (which has been cut out
of that answer); this point "1)" is really just about avoiding the selection of some OPPs,
regardless of task util. IOW, it's better to skip the OPPs we talk about here, and race to
idle at a higher OPP regardless of what the task needs.

> Best,
> Patrick
>

Cheers,
Douglas
On 08-Jul 14:46, Douglas Raillard wrote:
> Hi Patrick,
>
> On 7/8/19 12:09 PM, Patrick Bellasi wrote:
> > On 03-Jul 17:36, Douglas Raillard wrote:
> > > On 7/2/19 4:51 PM, Peter Zijlstra wrote:
> > > > On Thu, Jun 27, 2019 at 06:15:58PM +0100, Douglas RAILLARD wrote:

[...]

> > You are also correct in pointing out that in the steady state
> > ramp_boost will not be triggered.
> >
> > IMU, that's for two main reasons:
> > a) it's very likely that enqueued <= util_avg
> > b) even in case enqueued should turn out to be _slightly_ bigger than
> >    util_avg, the corresponding (proportional) ramp_boost would be so
> >    tiny as to not have any noticeable effect on OPP selection.
> >
> > Am I correct on point b) above?
>
> Assuming you meant "util_avg slightly bigger than enqueued" (which is when boosting triggers),
> then yes since ramp_boost effect is proportional to "task_ue.enqueue - task_u". It makes it robust
> against that.

Right :)

> > Could you maybe come up with some experimental numbers related to that
> > case specifically?
>
> With:
> * an rt-app task ramping up from 5% to 75% util in one big step. The
>   whole cycle is 0.6s long (0.3s at 5% followed by 0.3s at 75%). This
>   cycle is repeated 20 times and the average of boosting is taken.
>
> * a hikey 960 (this impacts the frequency at which the test runs at
>   the beginning of the 75% phase, which impacts the number of missed
>   activations before the util ramped up).
>
> * assuming an OPP exists for each util value (i.e. 1024 OPPs, so the
>   effect of boost on consumption is not impacted by OPP capacity
>   granularity)
>
> Then the boosting feature would increase the average power
> consumption by 3.1%, out of which 0.12% can be considered "spurious
> boosting" due to the util taking some time to really converge to its
> steady state value.
>
> In practice, the impact of small boosts will be even lower since
> they are less likely to trigger the selection of a high OPP due to OPP
> capacity granularity > 1 util unit.

That's ok for the energy side: you estimate ~3% more energy in the
worst case on that specific target.

By boosting I expect the negative slack to improve.
Do you also have numbers/stats related to the negative slack?
Can you share a percentage figure for that improvement?

Best,
Patrick
Hi Patrick,

On 7/9/19 11:37 AM, Patrick Bellasi wrote:
> On 08-Jul 14:46, Douglas Raillard wrote:
>> Hi Patrick,
>>
>> On 7/8/19 12:09 PM, Patrick Bellasi wrote:
>>> On 03-Jul 17:36, Douglas Raillard wrote:
>>>> On 7/2/19 4:51 PM, Peter Zijlstra wrote:
>>>>> On Thu, Jun 27, 2019 at 06:15:58PM +0100, Douglas RAILLARD wrote:
>
> [...]
>
>>> You are also correct in pointing out that in the steady state
>>> ramp_boost will not be triggered.
>>>
>>> IMU, that's for two main reasons:
>>> a) it's very likely that enqueued <= util_avg
>>> b) even in case enqueued should turn out to be _slightly_ bigger than
>>>    util_avg, the corresponding (proportional) ramp_boost would be so
>>>    tiny as to not have any noticeable effect on OPP selection.
>>>
>>> Am I correct on point b) above?
>>
>> Assuming you meant "util_avg slightly bigger than enqueued" (which is when boosting triggers),
>> then yes since ramp_boost effect is proportional to "task_ue.enqueue - task_u". It makes it robust
>> against that.
>
> Right :)
>
>>> Could you maybe come up with some experimental numbers related to that
>>> case specifically?
>>
>> With:
>> * an rt-app task ramping up from 5% to 75% util in one big step. The
>>   whole cycle is 0.6s long (0.3s at 5% followed by 0.3s at 75%). This
>>   cycle is repeated 20 times and the average of boosting is taken.
>>
>> * a hikey 960 (this impacts the frequency at which the test runs at
>>   the beginning of the 75% phase, which impacts the number of missed
>>   activations before the util ramped up).
>>
>> * assuming an OPP exists for each util value (i.e. 1024 OPPs, so the
>>   effect of boost on consumption is not impacted by OPP capacity
>>   granularity)
>>
>> Then the boosting feature would increase the average power
>> consumption by 3.1%, out of which 0.12% can be considered "spurious
>> boosting" due to the util taking some time to really converge to its
>> steady state value.
>>
>> In practice, the impact of small boosts will be even lower since
>> they are less likely to trigger the selection of a high OPP due to OPP
>> capacity granularity > 1 util unit.
>
> That's ok for the energy side: you estimate ~3% more energy in the
> worst case on that specific target.
>
> By boosting I expect the negative slack to improve.
> Do you also have numbers/stats related to the negative slack?
> Can you share a percentage figure for that improvement?

I'm now testing on a Google Pixel 3 (Qcom Snapdragon 845) phone, with the same workload,
pinned on a big core. It has a lot more OPPs than a hikey 960, so gradations in boosting
are better reflected in frequency selection.

avg slack (higher=better):
  Average time between task sleep and its next periodic activation.

avg negative slack (lower in absolute value=better):
  Same as avg slack, but only taking into account negative values.
  Negative slack means a task activation did not have enough time to complete before
  the next periodic activation fired, which is what we want to avoid.

boost energy overhead (lower=better):
  Extra power consumption induced by ramp boost, assuming a continuous OPP space
  (infinite number of OPPs) and single-CPU policies. In practice, a finite number of
  OPPs decreases this value, and more CPUs per policy increase it, since
  boost(policy) = max(boost(cpu of policy)).
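The three metrics can be sketched in Python as follows. This is an illustration, not the analysis code behind the numbers in this message: it assumes a hypothetical per-activation record of (time the task went to sleep, time of its next periodic activation) in microseconds, and models the per-policy boost as the maximum of per-CPU boosts, as stated above. The function names are made up for the example.

```python
from statistics import mean


def slack_metrics(records):
    """records: (sleep_ts_us, next_activation_ts_us) pairs for one periodic task.

    Returns (avg slack, avg negative slack). Slack is negative when the task was
    still running when its next periodic activation fired.
    """
    slacks = [next_act - sleep for sleep, next_act in records]
    negative = [s for s in slacks if s < 0]
    return mean(slacks), (mean(negative) if negative else 0.0)


def policy_boost(cpu_boosts):
    """A policy is boosted as much as its most boosted CPU."""
    return max(cpu_boosts)


# Three activations: two finish early, the last one overruns by 300 us.
avg, avg_neg = slack_metrics([(100, 1100), (900, 1000), (1300, 1000)])
print(avg, avg_neg)
print(policy_boost([0, 12, 3]))
```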
Without ramp boost:
+--------------------+--------------------+
|avg slack (us)      |avg negative slack  |
|                    |(us)                |
+--------------------+--------------------+
|6598.72             |-10217.13           |
|6595.49             |-10200.13           |
|6613.72             |-10401.06           |
|6600.29             |-9860.872           |
|6605.53             |-10057.64           |
|6612.05             |-10267.50           |
|6599.01             |-9939.60            |
|6593.79             |-9445.633           |
|6613.56             |-10276.75           |
|6595.44             |-9751.770           |
+--------------------+--------------------+
|average                                  |
+--------------------+--------------------+
|6602.76             |-10041.81           |
+--------------------+--------------------+

With ramp boost enabled:
+--------------------+--------------------+--------------------+
|boost energy        |avg slack (us)      |avg negative slack  |
|overhead (%)        |                    |(us)                |
+--------------------+--------------------+--------------------+
|3.05                |7148.93             |-5664.26            |
|3.04                |7144.69             |-5667.77            |
|3.05                |7149.05             |-5698.31            |
|2.97                |7126.71             |-6040.23            |
|3.02                |7140.28             |-5826.78            |
|3.03                |7135.11             |-5749.62            |
|3.05                |7140.24             |-5750.0             |
|3.05                |7144.84             |-5667.04            |
|3.07                |7157.30             |-5656.65            |
|3.06                |7154.65             |-5653.76            |
+--------------------+--------------------+--------------------+
|average                                                       |
+--------------------+--------------------+--------------------+
|3.039000            |7144.18             |-5737.44            |
+--------------------+--------------------+--------------------+

The negative slack is due to missed activations while the utilization signals increase
during the big utilization step. Ramp boost is designed to boost frequency during that
phase, which materializes as 1.75 times less negative slack (about 43% less), for an
extra power consumption of about 3%.

> Best,
> Patrick
>

Thanks,
Douglas
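The averages and the 1.75 factor can be rechecked from the per-run values in the two tables. The short script below (values copied from the tables) also turns the improvement into the percentage Patrick asked for: about 43% less negative slack, against a mean boost energy overhead of about 3.04%.

```python
from statistics import mean

# Per-run values copied from the two tables (slack in us, overhead in %).
neg_slack_without = [-10217.13, -10200.13, -10401.06, -9860.872, -10057.64,
                     -10267.50, -9939.60, -9445.633, -10276.75, -9751.770]
neg_slack_with = [-5664.26, -5667.77, -5698.31, -6040.23, -5826.78,
                  -5749.62, -5750.0, -5667.04, -5656.65, -5653.76]
energy_overhead = [3.05, 3.04, 3.05, 2.97, 3.02, 3.03, 3.05, 3.05, 3.07, 3.06]

before, after = mean(neg_slack_without), mean(neg_slack_with)
ratio = before / after          # how many times less negative slack
reduction = 1 - after / before  # relative reduction of negative slack

print(f"avg negative slack: {before:.2f} us -> {after:.2f} us")
print(f"ratio: {ratio:.2f}, reduction: {reduction:.1%}")
print(f"boost energy overhead: {mean(energy_overhead):.3f}%")
```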