diff mbox series

drm/vmwgfx: Filter modes which exceed 3/4 of graphics memory.

Message ID 20240112203803.6421-1-ian.forbes@broadcom.com (mailing list archive)
State New, archived
Headers show
Series drm/vmwgfx: Filter modes which exceed 3/4 of graphics memory. | expand

Commit Message

Ian Forbes Jan. 12, 2024, 8:38 p.m. UTC
SVGA requires surfaces to fit within graphics memory (max_mob_pages) which
means that modes with a final buffer size that would exceed graphics memory
must be pruned otherwise creation will fail.

Additionally, device commands which use multiple graphics resources must
have all their resources fit within graphics memory for the duration of the
command. Thus we need a small carve out of 1/4 of graphics memory to ensure
commands likes surface copies to the primary framebuffer for cursor
composition or damage clips can fit within graphics memory.

This fixes an issue where VMs with low graphics memory (< 64MiB) configured
with high resolution mode boot to a black screen because surface creation
fails.

Signed-off-by: Ian Forbes <ian.forbes@broadcom.com>
---
 drivers/gpu/drm/vmwgfx/vmwgfx_kms.c | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

Comments

Thomas Zimmermann Jan. 15, 2024, 8:21 a.m. UTC | #1
Hi

Am 12.01.24 um 21:38 schrieb Ian Forbes:
> SVGA requires surfaces to fit within graphics memory (max_mob_pages) which
> means that modes with a final buffer size that would exceed graphics memory
> must be pruned otherwise creation will fail.
> 
> Additionally, device commands which use multiple graphics resources must
> have all their resources fit within graphics memory for the duration of the
> command. Thus we need a small carve out of 1/4 of graphics memory to ensure
> commands likes surface copies to the primary framebuffer for cursor
> composition or damage clips can fit within graphics memory.
> 
> This fixes an issue where VMs with low graphics memory (< 64MiB) configured
> with high resolution mode boot to a black screen because surface creation
> fails.

That is a long-standing problem, which we have observed with other 
drivers as well. On low-memory devices, TTM doesn't play well. The real 
fix would be to export all modes that possibly fit and sort out the 
invalid configurations in atomic_check. It's just a lot more work.

Did you consider simply ignoring vmwgfx devices with less than 64 MiB of 
VRAM?

> 
> Signed-off-by: Ian Forbes <ian.forbes@broadcom.com>
> ---
>   drivers/gpu/drm/vmwgfx/vmwgfx_kms.c | 20 ++++++++++++--------
>   1 file changed, 12 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
> index 28ff30e32fab..39d6d17fc488 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
> @@ -2854,12 +2854,12 @@ enum drm_mode_status vmw_connector_mode_valid(struct drm_connector *connector,
>   	struct vmw_private *dev_priv = vmw_priv(dev);
>   	u32 max_width = dev_priv->texture_max_width;
>   	u32 max_height = dev_priv->texture_max_height;
> -	u32 assumed_cpp = 4;
> +	u32 assumed_cpp = dev_priv->assume_16bpp ? 2 : 4;
> +	u32 pitch = mode->hdisplay * assumed_cpp;
> +	u64 total = mode->vdisplay * pitch;
> +	bool using_stdu = dev_priv->active_display_unit == vmw_du_screen_target;
>   
> -	if (dev_priv->assume_16bpp)
> -		assumed_cpp = 2;
> -
> -	if (dev_priv->active_display_unit == vmw_du_screen_target) {
> +	if (using_stdu) {
>   		max_width  = min(dev_priv->stdu_max_width,  max_width);
>   		max_height = min(dev_priv->stdu_max_height, max_height);
>   	}
> @@ -2870,9 +2870,13 @@ enum drm_mode_status vmw_connector_mode_valid(struct drm_connector *connector,
>   	if (max_height < mode->vdisplay)
>   		return MODE_BAD_VVALUE;
>   
> -	if (!vmw_kms_validate_mode_vram(dev_priv,
> -			mode->hdisplay * assumed_cpp,
> -			mode->vdisplay))
> +	if (using_stdu &&
> +		(total > (dev_priv->max_mob_pages * PAGE_SIZE * 3 / 4) ||

IDK the details of vmwgfx's memory management, but instead of this '3/4 
test', wouldn't it be better to partition the VRAM via TTM and test 
against the partition sizes?

Best regards
Thomas

> +		total > dev_priv->max_mob_size)) {
> +		return MODE_MEM;
> +	}
> +
> +	if (!vmw_kms_validate_mode_vram(dev_priv, pitch, mode->vdisplay))
>   		return MODE_MEM;
>   
>   	return MODE_OK;
Zack Rusin Jan. 18, 2024, 6:25 p.m. UTC | #2
On Mon, Jan 15, 2024 at 3:21 AM Thomas Zimmermann <tzimmermann@suse.de> wrote:
>
> Hi
>
> Am 12.01.24 um 21:38 schrieb Ian Forbes:
> > SVGA requires surfaces to fit within graphics memory (max_mob_pages) which
> > means that modes with a final buffer size that would exceed graphics memory
> > must be pruned otherwise creation will fail.
> >
> > Additionally, device commands which use multiple graphics resources must
> > have all their resources fit within graphics memory for the duration of the
> > command. Thus we need a small carve out of 1/4 of graphics memory to ensure
> > commands likes surface copies to the primary framebuffer for cursor
> > composition or damage clips can fit within graphics memory.
> >
> > This fixes an issue where VMs with low graphics memory (< 64MiB) configured
> > with high resolution mode boot to a black screen because surface creation
> > fails.
>
> That is a long-standing problem, which we have observed with other
> drivers as well. On low-memory devices, TTM doesn't play well. The real
> fix would be to export all modes that possibly fit and sort out the
> invalid configurations in atomic_check. It's just a lot more work.
>
> Did you consider simply ignoring vmwgfx devices with less than 64 MiB of
> VRAM?

Unfortunately we can't do that because on new esx servers without
gpu's the default is 16MB. A lot of people are still running their esx
boxes with 4MB, which is in general the most common problem because
with 4MB people still tend to like to set 1280x800 which with 32bpp fb
takes 4096000 bytes and with 4MB available that leaves only 96KB
available and we need more to also allocate things like the cursor.
Even if ttm did everything right technically 1280x800 @ 32bpp
resolution will fit in a 4MB graphics memory, but then the system will
not be able to have a hardware (well, virtualized) cursor. It's
extremely unlikely people would even be aware of this tradeoff when
making the decision to increase resolution.

So the driver either needs to preallocate all the memory it possibly
might need for all the basic functionality before modesetting or the
available modes need to be validated with some constraints.

z
Thomas Zimmermann Jan. 19, 2024, 9:22 a.m. UTC | #3
Hi

Am 18.01.24 um 19:25 schrieb Zack Rusin:
> On Mon, Jan 15, 2024 at 3:21 AM Thomas Zimmermann <tzimmermann@suse.de> wrote:
>>
>> Hi
>>
>> Am 12.01.24 um 21:38 schrieb Ian Forbes:
>>> SVGA requires surfaces to fit within graphics memory (max_mob_pages) which
>>> means that modes with a final buffer size that would exceed graphics memory
>>> must be pruned otherwise creation will fail.
>>>
>>> Additionally, device commands which use multiple graphics resources must
>>> have all their resources fit within graphics memory for the duration of the
>>> command. Thus we need a small carve out of 1/4 of graphics memory to ensure
>>> commands likes surface copies to the primary framebuffer for cursor
>>> composition or damage clips can fit within graphics memory.
>>>
>>> This fixes an issue where VMs with low graphics memory (< 64MiB) configured
>>> with high resolution mode boot to a black screen because surface creation
>>> fails.
>>
>> That is a long-standing problem, which we have observed with other
>> drivers as well. On low-memory devices, TTM doesn't play well. The real
>> fix would be to export all modes that possibly fit and sort out the
>> invalid configurations in atomic_check. It's just a lot more work.
>>
>> Did you consider simply ignoring vmwgfx devices with less than 64 MiB of
>> VRAM?
> 
> Unfortunately we can't do that because on new esx servers without
> gpu's the default is 16MB. A lot of people are still running their esx
> boxes with 4MB, which is in general the most common problem because
> with 4MB people still tend to like to set 1280x800 which with 32bpp fb
> takes 4096000 bytes and with 4MB available that leaves only 96KB
> available and we need more to also allocate things like the cursor.
> Even if ttm did everything right technically 1280x800 @ 32bpp
> resolution will fit in a 4MB graphics memory, but then the system will
> not be able to have a hardware (well, virtualized) cursor. It's
> extremely unlikely people would even be aware of this tradeoff when
> making the decision to increase resolution.

Do you allocate buffer storage directly in the provided VRAM? If so how 
do you do page flips then? You'd need for the example of 1280x800-32, 
you'd need around 8 MiB to keep front and back buffer in VRAM. I guess, 
you only support the framebuffer console (which doesn't do pageflips)?

In mgag200 and ast, I had the luxury for replacing TTM with SHMEM 
helpers, which worked around the problem easily. Maybe that's an option 
for low-memory systems?

Best regards
Thomas

> 
> So the driver either needs to preallocate all the memory it possibly
> might need for all the basic functionality before modesetting or the
> available modes need to be validated with some constraints.
> 
> z
Zack Rusin Jan. 30, 2024, 6:38 p.m. UTC | #4
On Fri, Jan 19, 2024 at 4:22 AM Thomas Zimmermann <tzimmermann@suse.de> wrote:
>
> Hi
>
> Am 18.01.24 um 19:25 schrieb Zack Rusin:
> > On Mon, Jan 15, 2024 at 3:21 AM Thomas Zimmermann <tzimmermann@suse.de> wrote:
> >>
> >> Hi
> >>
> >> Am 12.01.24 um 21:38 schrieb Ian Forbes:
> >>> SVGA requires surfaces to fit within graphics memory (max_mob_pages) which
> >>> means that modes with a final buffer size that would exceed graphics memory
> >>> must be pruned otherwise creation will fail.
> >>>
> >>> Additionally, device commands which use multiple graphics resources must
> >>> have all their resources fit within graphics memory for the duration of the
> >>> command. Thus we need a small carve out of 1/4 of graphics memory to ensure
> >>> commands likes surface copies to the primary framebuffer for cursor
> >>> composition or damage clips can fit within graphics memory.
> >>>
> >>> This fixes an issue where VMs with low graphics memory (< 64MiB) configured
> >>> with high resolution mode boot to a black screen because surface creation
> >>> fails.
> >>
> >> That is a long-standing problem, which we have observed with other
> >> drivers as well. On low-memory devices, TTM doesn't play well. The real
> >> fix would be to export all modes that possibly fit and sort out the
> >> invalid configurations in atomic_check. It's just a lot more work.
> >>
> >> Did you consider simply ignoring vmwgfx devices with less than 64 MiB of
> >> VRAM?
> >
> > Unfortunately we can't do that because on new esx servers without
> > gpu's the default is 16MB. A lot of people are still running their esx
> > boxes with 4MB, which is in general the most common problem because
> > with 4MB people still tend to like to set 1280x800 which with 32bpp fb
> > takes 4096000 bytes and with 4MB available that leaves only 96KB
> > available and we need more to also allocate things like the cursor.
> > Even if ttm did everything right technically 1280x800 @ 32bpp
> > resolution will fit in a 4MB graphics memory, but then the system will
> > not be able to have a hardware (well, virtualized) cursor. It's
> > extremely unlikely people would even be aware of this tradeoff when
> > making the decision to increase resolution.
>
> Do you allocate buffer storage directly in the provided VRAM? If so how
> do you do page flips then? You'd need for the example of 1280x800-32,
> you'd need around 8 MiB to keep front and back buffer in VRAM. I guess,
> you only support the framebuffer console (which doesn't do pageflips)?

In general, yes. Of course it's a little more convoluted because we'll
act like OpenGL runtime here (i.e. glXSwapBuffers), i.e. our driver
will fake page-flips because the only memory we'll have is a single
buffer as the actual page-flipping happens in the presentation code on
the host. So the guest is not aware of the actual presentation (it's
also why we don't have any sort of vblank signaling in vmwgfx, the
concept just doesn't exist for us). i.e. on para-virtualized drivers
the actual page-flips will be property of the presentation code that's
outside of the guest. It's definitely one those things that I wanted
to have a good solution for in a while, in particular to have a better
story behind vblank handling, but it's difficult because
"presentation" on vm's is in general difficult to define - it might be
some vnc connected host on the other continent. Having said that
that's basically a wonky VRR display so we should be able to handle
our presentation as VRR and give more control of updates to the guest,
but we haven't done it yet.

> In mgag200 and ast, I had the luxury for replacing TTM with SHMEM
> helpers, which worked around the problem easily. Maybe that's an option
> for low-memory systems?

Our current device doesn't have the ability to present out of
unspecified memory in the guest, i.e. the host, which is doing the
presentation, is not aware of how guest kernel lays out the memory so
we need to basically create a page-table for every graphics object
(VMW_PL_MOB placement in vmwgfx) so that the host can actually find
the memory it needs to read. So the shmem helpers would need something
extra for us to be able to generate those page tables for the
drm_gem_object's it deals with.

z
Zack Rusin Jan. 30, 2024, 7:38 p.m. UTC | #5
On Fri, Jan 12, 2024 at 4:20 PM Ian Forbes <ian.forbes@broadcom.com> wrote:
>
> SVGA requires surfaces to fit within graphics memory (max_mob_pages) which
> means that modes with a final buffer size that would exceed graphics memory
> must be pruned otherwise creation will fail.
>
> Additionally, device commands which use multiple graphics resources must
> have all their resources fit within graphics memory for the duration of the
> command. Thus we need a small carve out of 1/4 of graphics memory to ensure
> commands likes surface copies to the primary framebuffer for cursor
> composition or damage clips can fit within graphics memory.
>
> This fixes an issue where VMs with low graphics memory (< 64MiB) configured
> with high resolution mode boot to a black screen because surface creation
> fails.
>
> Signed-off-by: Ian Forbes <ian.forbes@broadcom.com>
> ---
>  drivers/gpu/drm/vmwgfx/vmwgfx_kms.c | 20 ++++++++++++--------
>  1 file changed, 12 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
> index 28ff30e32fab..39d6d17fc488 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
> @@ -2854,12 +2854,12 @@ enum drm_mode_status vmw_connector_mode_valid(struct drm_connector *connector,
>         struct vmw_private *dev_priv = vmw_priv(dev);
>         u32 max_width = dev_priv->texture_max_width;
>         u32 max_height = dev_priv->texture_max_height;
> -       u32 assumed_cpp = 4;
> +       u32 assumed_cpp = dev_priv->assume_16bpp ? 2 : 4;
> +       u32 pitch = mode->hdisplay * assumed_cpp;
> +       u64 total = mode->vdisplay * pitch;
> +       bool using_stdu = dev_priv->active_display_unit == vmw_du_screen_target;
>
> -       if (dev_priv->assume_16bpp)
> -               assumed_cpp = 2;
> -
> -       if (dev_priv->active_display_unit == vmw_du_screen_target) {
> +       if (using_stdu) {
>                 max_width  = min(dev_priv->stdu_max_width,  max_width);
>                 max_height = min(dev_priv->stdu_max_height, max_height);
>         }
> @@ -2870,9 +2870,13 @@ enum drm_mode_status vmw_connector_mode_valid(struct drm_connector *connector,
>         if (max_height < mode->vdisplay)
>                 return MODE_BAD_VVALUE;
>
> -       if (!vmw_kms_validate_mode_vram(dev_priv,
> -                       mode->hdisplay * assumed_cpp,
> -                       mode->vdisplay))
> +       if (using_stdu &&
> +               (total > (dev_priv->max_mob_pages * PAGE_SIZE * 3 / 4) ||

Could you export that computation somewhere where we could document
why we're doing this? Just to not leave the awkward "* 3 /4" that
everyone reading this code will wonder about?
And also make sure you indent this correctly, "dim checkpatch" should
warn about this.

z
Daniel Stone Jan. 30, 2024, 11:50 p.m. UTC | #6
Hi,

On Tue, 30 Jan 2024 at 18:39, Zack Rusin <zack.rusin@broadcom.com> wrote:
> In general, yes. Of course it's a little more convoluted because we'll
> act like OpenGL runtime here (i.e. glXSwapBuffers), i.e. our driver
> will fake page-flips because the only memory we'll have is a single
> buffer as the actual page-flipping happens in the presentation code on
> the host. So the guest is not aware of the actual presentation (it's
> also why we don't have any sort of vblank signaling in vmwgfx, the
> concept just doesn't exist for us). i.e. on para-virtualized drivers
> the actual page-flips will be property of the presentation code that's
> outside of the guest. It's definitely one those things that I wanted
> to have a good solution for in a while, in particular to have a better
> story behind vblank handling, but it's difficult because
> "presentation" on vm's is in general difficult to define - it might be
> some vnc connected host on the other continent. Having said that
> that's basically a wonky VRR display so we should be able to handle
> our presentation as VRR and give more control of updates to the guest,
> but we haven't done it yet.

Please don't.

Photon time is _a_ useful metric, but only backwards-informational.
It's nice to give userspace a good forward estimate of when pixels
will hit retinas, but as it's not fully reliable, the main part is
being able to let it know when it did happen so it can adjust. Given
that it's not reliable, we can't use it as a basis for preparing
submissions though, so we don't, even on bare-metal drivers.

As you've noted though, it really falls apart on non-bare-metal cases,
especially where latency vastly exceeds throughput, or when either is
hugely variable. So we don't ever use it as a basis.

VRR is worse though. The FRR model is 'you can display new content
every $period, and here's your basis so you can calibrate phase'. The
VRR model is 'you can display new content so rapidly it's not worth
trying to quantise, just fire it as rapidly as possible'. That's a
world away from 'errrr ... might be 16ms, might be 500? dunno really'.

The entire model we have is that basis timing flows backwards. The
'hardware' gives us a deadline, KMS angles to meet that with a small
margin, the compositor angles to meet that with a margin again, and it
lines up client repaints to hit that window too. Everything works on
that model, so it's not super surprising that using svga is - to quote
one of Weston's DRM-backend people who uses ESXi - 'a juddery mess'.

Given that the entire ecosystem is based on this model, I don't think
there's an easy way out where svga just does something wildly
different. The best way to fix it is to probably work on predictable
quantisation with updates: pick 5/12/47/60Hz to quantise to based on
your current throughput, with something similar to hotplug/LINK_STATUS
and faked EDID to let userspace know when the period changes. If you
have variability within the cycle, e.g. dropped frames, then just suck
it up and keep the illusion alive to userspace that it's presenting to
a fixed period, and if/when you calculate there's a better
quantisation then let userspace know what it is so it can adjust.

But there's really no future in just doing random presentation rates,
because that's not the API anyone has written for.

Cheers,
Daniel
Zack Rusin Jan. 31, 2024, 2:31 a.m. UTC | #7
On Tue, Jan 30, 2024 at 6:50 PM Daniel Stone <daniel@fooishbar.org> wrote:
>
> Hi,
>
> On Tue, 30 Jan 2024 at 18:39, Zack Rusin <zack.rusin@broadcom.com> wrote:
> > In general, yes. Of course it's a little more convoluted because we'll
> > act like OpenGL runtime here (i.e. glXSwapBuffers), i.e. our driver
> > will fake page-flips because the only memory we'll have is a single
> > buffer as the actual page-flipping happens in the presentation code on
> > the host. So the guest is not aware of the actual presentation (it's
> > also why we don't have any sort of vblank signaling in vmwgfx, the
> > concept just doesn't exist for us). i.e. on para-virtualized drivers
> > the actual page-flips will be property of the presentation code that's
> > outside of the guest. It's definitely one those things that I wanted
> > to have a good solution for in a while, in particular to have a better
> > story behind vblank handling, but it's difficult because
> > "presentation" on vm's is in general difficult to define - it might be
> > some vnc connected host on the other continent. Having said that
> > that's basically a wonky VRR display so we should be able to handle
> > our presentation as VRR and give more control of updates to the guest,
> > but we haven't done it yet.
>
> Please don't.
>
> Photon time is _a_ useful metric, but only backwards-informational.
> It's nice to give userspace a good forward estimate of when pixels
> will hit retinas, but as it's not fully reliable, the main part is
> being able to let it know when it did happen so it can adjust. Given
> that it's not reliable, we can't use it as a basis for preparing
> submissions though, so we don't, even on bare-metal drivers.
>
> As you've noted though, it really falls apart on non-bare-metal cases,
> especially where latency vastly exceeds throughput, or when either is
> hugely variable. So we don't ever use it as a basis.
>
> VRR is worse though. The FRR model is 'you can display new content
> every $period, and here's your basis so you can calibrate phase'. The
> VRR model is 'you can display new content so rapidly it's not worth
> trying to quantise, just fire it as rapidly as possible'. That's a
> world away from 'errrr ... might be 16ms, might be 500? dunno really'.
>
> The entire model we have is that basis timing flows backwards. The
> 'hardware' gives us a deadline, KMS angles to meet that with a small
> margin, the compositor angles to meet that with a margin again, and it
> lines up client repaints to hit that window too. Everything works on
> that model, so it's not super surprising that using svga is - to quote
> one of Weston's DRM-backend people who uses ESXi - 'a juddery mess'.

That's very hurtful. Or it would be but of course you didn't believe
them because they're working on Weston so clearly don't make good
choices in general, right? The presentation on esxi is just as smooth
as it is by default on Ubuntu on new hardware...

> Given that the entire ecosystem is based on this model, I don't think
> there's an easy way out where svga just does something wildly
> different. The best way to fix it is to probably work on predictable
> quantisation with updates: pick 5/12/47/60Hz to quantise to based on
> your current throughput, with something similar to hotplug/LINK_STATUS
> and faked EDID to let userspace know when the period changes. If you
> have variability within the cycle, e.g. dropped frames, then just suck
> it up and keep the illusion alive to userspace that it's presenting to
> a fixed period, and if/when you calculate there's a better
> quantisation then let userspace know what it is so it can adjust.
>
> But there's really no future in just doing random presentation rates,
> because that's not the API anyone has written for.

See, my hope was that with vrr we could layer the weird remote
presentation semantics of virtualized guest on top of the same
infrastructure that would be used on real hardware. If you're saying
that it's not the way userspace will work, then yea, that doesn't
help. My issue, that's general for para-virtualized drivers, is that
any behavior that differs from hw drivers means that it's going to
break at some point, we see that even for basic things like the
update-layout hotplug events that have been largely standardized for
many years. I'm assuming that refresh-rate-changed will result in the
same regressions, but fwiw if I can implement FRR correctly and punt
any issues that arise due to changes in the FRR as issues in userspace
then that does make my life a lot easier, so I'm not going to object
to that.

z
Thomas Zimmermann Jan. 31, 2024, 9:11 a.m. UTC | #8
Hi Zack

Am 30.01.24 um 20:38 schrieb Zack Rusin:
> On Fri, Jan 12, 2024 at 4:20 PM Ian Forbes <ian.forbes@broadcom.com> wrote:
>>
>> SVGA requires surfaces to fit within graphics memory (max_mob_pages) which
>> means that modes with a final buffer size that would exceed graphics memory
>> must be pruned otherwise creation will fail.
>>
>> Additionally, device commands which use multiple graphics resources must
>> have all their resources fit within graphics memory for the duration of the
>> command. Thus we need a small carve out of 1/4 of graphics memory to ensure
>> commands likes surface copies to the primary framebuffer for cursor
>> composition or damage clips can fit within graphics memory.
>>
>> This fixes an issue where VMs with low graphics memory (< 64MiB) configured
>> with high resolution mode boot to a black screen because surface creation
>> fails.
>>
>> Signed-off-by: Ian Forbes <ian.forbes@broadcom.com>
>> ---
>>   drivers/gpu/drm/vmwgfx/vmwgfx_kms.c | 20 ++++++++++++--------
>>   1 file changed, 12 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
>> index 28ff30e32fab..39d6d17fc488 100644
>> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
>> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
>> @@ -2854,12 +2854,12 @@ enum drm_mode_status vmw_connector_mode_valid(struct drm_connector *connector,
>>          struct vmw_private *dev_priv = vmw_priv(dev);
>>          u32 max_width = dev_priv->texture_max_width;
>>          u32 max_height = dev_priv->texture_max_height;
>> -       u32 assumed_cpp = 4;
>> +       u32 assumed_cpp = dev_priv->assume_16bpp ? 2 : 4;
>> +       u32 pitch = mode->hdisplay * assumed_cpp;
>> +       u64 total = mode->vdisplay * pitch;
>> +       bool using_stdu = dev_priv->active_display_unit == vmw_du_screen_target;
>>
>> -       if (dev_priv->assume_16bpp)
>> -               assumed_cpp = 2;
>> -
>> -       if (dev_priv->active_display_unit == vmw_du_screen_target) {
>> +       if (using_stdu) {
>>                  max_width  = min(dev_priv->stdu_max_width,  max_width);
>>                  max_height = min(dev_priv->stdu_max_height, max_height);
>>          }
>> @@ -2870,9 +2870,13 @@ enum drm_mode_status vmw_connector_mode_valid(struct drm_connector *connector,
>>          if (max_height < mode->vdisplay)
>>                  return MODE_BAD_VVALUE;
>>
>> -       if (!vmw_kms_validate_mode_vram(dev_priv,
>> -                       mode->hdisplay * assumed_cpp,
>> -                       mode->vdisplay))
>> +       if (using_stdu &&
>> +               (total > (dev_priv->max_mob_pages * PAGE_SIZE * 3 / 4) ||
> 
> Could you export that computation somewhere where we could document
> why we're doing this? Just to not leave the awkward "* 3 /4" that
> everyone reading this code will wonder about?

There's code in VRAM helpers that does a similar test. [1] But the VRAM 
helpers are obsolete. I guess that TTM could contain such a test 
somewhere by testing against a max_bo_size for each TTM resource. 
Whether that is feasible in practice is a different question. Maybe ask 
the TTM maintainers.

Best regards
Thomas

[1] 
https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/drm_gem_vram_helper.c#L1156

> And also make sure you indent this correctly, "dim checkpatch" should
> warn about this.
> 
> z
Daniel Stone Feb. 6, 2024, 2:58 p.m. UTC | #9
On Wed, 31 Jan 2024 at 02:31, Zack Rusin <zack.rusin@broadcom.com> wrote:
> On Tue, Jan 30, 2024 at 6:50 PM Daniel Stone <daniel@fooishbar.org> wrote:
> > The entire model we have is that basis timing flows backwards. The
> > 'hardware' gives us a deadline, KMS angles to meet that with a small
> > margin, the compositor angles to meet that with a margin again, and it
> > lines up client repaints to hit that window too. Everything works on
> > that model, so it's not super surprising that using svga is - to quote
> > one of Weston's DRM-backend people who uses ESXi - 'a juddery mess'.
>
> That's very hurtful. Or it would be but of course you didn't believe
> them because they're working on Weston so clearly don't make good
> choices in general, right? The presentation on esxi is just as smooth
> as it is by default on Ubuntu on new hardware...

Yeah sorry, that wasn't a 'VMware is bad' dig, that was a 'oh that
explains so much if you're deliberately doing the other thing'
realisation. I'm not suggesting anyone else use my life choices as a
template :)

> > Given that the entire ecosystem is based on this model, I don't think
> > there's an easy way out where svga just does something wildly
> > different. The best way to fix it is to probably work on predictable
> > quantisation with updates: pick 5/12/47/60Hz to quantise to based on
> > your current throughput, with something similar to hotplug/LINK_STATUS
> > and faked EDID to let userspace know when the period changes. If you
> > have variability within the cycle, e.g. dropped frames, then just suck
> > it up and keep the illusion alive to userspace that it's presenting to
> > a fixed period, and if/when you calculate there's a better
> > quantisation then let userspace know what it is so it can adjust.
> >
> > But there's really no future in just doing random presentation rates,
> > because that's not the API anyone has written for.
>
> See, my hope was that with vrr we could layer the weird remote
> presentation semantics of virtualized guest on top of the same
> infrastructure that would be used on real hardware. If you're saying
> that it's not the way userspace will work, then yea, that doesn't
> help. My issue, that's general for para-virtualized drivers, is that
> any behavior that differs from hw drivers means that it's going to
> break at some point, we see that even for basic things like the
> update-layout hotplug events that have been largely standardized for
> many years. I'm assuming that refresh-rate-changed will result in the
> same regressions, but fwiw if I can implement FRR correctly and punt
> any issues that arise due to changes in the FRR as issues in userspace
> then that does make my life a lot easier, so I'm not going to object
> to that.

Yeah, I think that's the best way forward ... modelling the full
pipeline in all its glory starts to look way less like KMS and way
more like something like GStreamer. Trying to encapsulate all that
reasonably in the kernel would've required - at the very least - a
KMS-side queue with target display times in order to be at all useful,
and that seemed like way too much complexity when the majority of
hardware could be handled with 'I'll fire an ioctl at you and you
update at the next 16ms boundary'.

I'd be super happy to review any uAPI extensions which added feedback
to userspace to let it know that the optimal presentation cadence had
changed.

Cheers,
Daniel
diff mbox series

Patch

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
index 28ff30e32fab..39d6d17fc488 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
@@ -2854,12 +2854,12 @@  enum drm_mode_status vmw_connector_mode_valid(struct drm_connector *connector,
 	struct vmw_private *dev_priv = vmw_priv(dev);
 	u32 max_width = dev_priv->texture_max_width;
 	u32 max_height = dev_priv->texture_max_height;
-	u32 assumed_cpp = 4;
+	u32 assumed_cpp = dev_priv->assume_16bpp ? 2 : 4;
+	u32 pitch = mode->hdisplay * assumed_cpp;
+	u64 total = mode->vdisplay * pitch;
+	bool using_stdu = dev_priv->active_display_unit == vmw_du_screen_target;
 
-	if (dev_priv->assume_16bpp)
-		assumed_cpp = 2;
-
-	if (dev_priv->active_display_unit == vmw_du_screen_target) {
+	if (using_stdu) {
 		max_width  = min(dev_priv->stdu_max_width,  max_width);
 		max_height = min(dev_priv->stdu_max_height, max_height);
 	}
@@ -2870,9 +2870,13 @@  enum drm_mode_status vmw_connector_mode_valid(struct drm_connector *connector,
 	if (max_height < mode->vdisplay)
 		return MODE_BAD_VVALUE;
 
-	if (!vmw_kms_validate_mode_vram(dev_priv,
-			mode->hdisplay * assumed_cpp,
-			mode->vdisplay))
+	if (using_stdu &&
+		(total > (dev_priv->max_mob_pages * PAGE_SIZE * 3 / 4) ||
+		total > dev_priv->max_mob_size)) {
+		return MODE_MEM;
+	}
+
+	if (!vmw_kms_validate_mode_vram(dev_priv, pitch, mode->vdisplay))
 		return MODE_MEM;
 
 	return MODE_OK;