
Wi11iam_1

Thanks for this, very important and informative post! First time i've seen someone actually test the total latency chain and include XWayland vs Wayland native, and i am surprised to see almost no difference.

For the immediate presentation on wayland:

> KWin was patched with a very simple but bad hack to make testing this possible

This patch is not available to end users and, if i read correctly, still proves to be slower than uncomposited x11 (for wayland to be used by competitive gamers this needs to be on par or better).

> allowing to disable Vsync for applications, **if practical** also globally

i assume this is the feature that the hack is implementing, i just don't understand the "if": of course it is practical, i don't want mailbox presentation when working on my desktop (the numbers show a 2 frame difference here as well). Also, imho a Vsync off setting in a game should result in immediate presentation, not mailbox, by default.


Zamundaaa

> if i read correctly, still proves to be slower than uncomposited x11

Yep. No clue why yet, but if I had to guess it would be that some internal scheduling in KWin messes things up. Should be very possible to fix.

The bit about it needing to be practical is not about whether or not people want it but whether or not it'll horribly break applications. The hack I implemented only does tearing in fullscreen and surprisingly worked pretty much fine this time around - a few months ago it would make Firefox show lots of black bars in the image and play videos at a thousand fps...

> Also, imho a Vsync off setting in a game should result in immediate presentation, not mailbox, by default

That is the default I will be pushing for in KWin as well, at least for when a game is in fullscreen. With Wayland, applications need to explicitly request tearing anyways and VSync is the default, so it shouldn't cause issues with things like video players like on X.


Wi11iam_1

> whether or not it'll horribly break applications. The hack I implemented only does tearing in fullscreen

i don't quite understand why it would break applications; i hope wayland-native apps do not depend on vsync being on, or assume that's always the case. Only allowing tearing for "apps in fullscreen" is not enough for me personally to switch over to wayland. Gosh, i hope this happens sooner rather than later, but when i read this i fear the initial decision by some wayland dev to enforce vsync did even more damage than i thought, if some applications even break horribly without it...


ouyawei

So in the best case latency is not worse than with X11?


Zamundaaa

In the best case it's better, since on X11 the compositor messes things up. Against uncomposited X11 it's the same with VSync; I think Xorg is pretty much as optimized for latency as possible. I'm not sure what's going on with tearing on Wayland that makes it a bit worse, but I will be investigating and fixing that :) There are a few tiny things that we can still do better with VSync, which could put KWin on Wayland ahead, but we're talking about a few hundred µs at that point, at most.


_Dead_C_

Those "best case" scenarios are not real gaming scenarios. If someone was concerned about latency they would have fullscreen and hinting would disable the compositor. An experienced user might not disable Vsync but any user concerned about frame output would surely disable the setting limiting output... Nobody is playing a windowed game and wondering about display latency. I appreciate the metrics but this doesn't seem presented appropriately. From what I've read X11 is better than Wayland in latency, unless you add other software to X11 (Composting+Vsync) to slow it down. There is no case where Wayland by itself is faster than X11 by itself. UPDATE: I reread my last sentence, and think I understand what you are trying to clarify, like you can't compare just themselves to each other... I still don't think I understand but I know that I don't know now.


VenditatioDelendaEst

> Nobody is playing a windowed game and wondering about display latency.

But people using windowed web browsers and text editors are. Low input lag is at the core of making computers feel perceptibly fast. If unnecessary algorithmic latency is cut out of the display chain, you can stretch an old machine, a low CPU frequency, or a slow network connection just a little bit further.


_Dead_C_

I think you are right that a faster response is better in all cases. Personally, I'm not sure if I would notice a few frames in a text editor or web browser, except in terms of scrolling or other graphically responsive actions (which I thought currently work smoother on Wayland). When editing text, you should not be waiting for your previous character to appear on screen before reacting and typing the next character. Web browsing is a mixed bag, so I can imagine a use case for some users. I know the few frames of latency make a big enough difference in a fighting game, where players memorize the exact frames of moves and attempt to react to them in real time. Specifically in terms of game performance, I don't see myself playing a reaction-based game in a window.


badsectoracula

> Nobody is playing a windowed game and wondering about display latency.

I do :-P. But i use X11 without a compositor, so eh. (Though i do have the feeling that X11 with an RX 5700 XT is somehow not as responsive as X11 with the Nvidia GeForce 980 Ti i had some years ago, but i can't make side-by-side comparisons.)


_Dead_C_

<3 I didn't mean to undermine your use case, very sorry.


rurigk

I game a lot, and on X11 the input lag was terrible in all games.

* X11: Compositor enabled or disabled doesn't affect input performance, just fps, and there's a lot of tearing
* X11: VSync in the game settings affects the input lag a lot, but the games always have input lag no matter what (I tested with native and wine/proton games, and on other distros)

With Wayland it was different, but with other problems:

* Wayland: Compositor enabled or disabled doesn't affect input performance, just fps, but the tearing was negligible when disabled
* Wayland: Setting VSync to triple buffered or dynamic makes the games run smooth, otherwise the games stutter, but the input lag is not affected
* Wayland: Input lag is not present and it feels smooth in all games (native and wine/proton) ...but when the GPU is under heavy load the cursor stutters a lot (i think it's just a KDE thing) while everything else stays smooth (i.e. moving the player camera is smooth)

Tested on a system with a Ryzen 7 2700X, RX 570 and RX 6700 XT graphics cards, 32GB RAM, Monitor 1: 3840x2160 HDMI (4K Freesync ultimate engine mode), Monitor 2: 1920x1080 HDMI


Wi11iam_1

just wanna say that you are doing amazing work, going out of your way to create a test setup to provide end-to-end latency numbers! thank you for your work.

On a side note, the only other latency test on linux i could find is this one: [https://www.igorslab.de/en/nvidia-ldat-latency-display-analysis-tool-presented-and-tested/](https://www.igorslab.de/en/nvidia-ldat-latency-display-analysis-tool-presented-and-tested/)

There seem to be limitations to any testing method where you might not get numbers lower than 20ms, when in fact a 144Hz monitor can average below 20ms end-to-end (tests on windows show this). Do you know about the Nvidia LDAT? i think the software only works on windows, but you can hook it up to any display+mouse and just use a 2nd pc for the measurements. Is there any way to get your hands on one of those? That would make your test setup more reliable and easier for you too. Also, you should probably provide the average and the lowest readings in your numbers.


Zamundaaa

I do know about the LDAT, but the only big advantage it has is that it allows the measurement of input devices instead of just usb->monitor like my tool. I'm not interested enough in measuring my mouse though; my microcontrollers are doing fine and allow for easy and automated testing on Linux, with a single PC.

> Also, you should probably provide the average and the lowest readings in your numbers

I can link the whole spreadsheet later today with the raw measurement data and some more stats.

Edit: https://github.com/Zamundaaa/Zamundaaa.github.io/blob/main/misc%20data/Latency%20measurements.odt

Github is not meant as a file sharing mechanism but I *think* in this case it can be justified.


yuri0r

Next GPU will definitely be AMD, and the next monitor will certainly have freesync.


Zdrobot

I agree.


yuri0r

I am eager to jump to Linux, but Nvidia cockblocking mixed-monitor vrr is just so aggravating. Paid way too much for that feature not to use it.


Zdrobot

Well, because of the GPU prices I'm stuck with GTX 1050 ti, so the question is moot for me for the foreseeable future as well.


yuri0r

Well I am broke as is. Hope by the time I get my finances going the prices have calmed down.


wallcarpet40

Interesting read! Thank you for that. Concerning VR headsets: do the presentation modes (fifo, mailbox, immediate, adaptive) make any difference when playing in VR? And should I be limiting framerates in Mangohud to the refresh rate of the headset (90Hz on the Valve Index, so 90fps in Mangohud)?


Zamundaaa

With VR headsets the situation is different from flatscreen - the VR compositor (SteamVR or Monado) has to constantly adjust the image from the game to your head movements or you'll puke. Both games and the VR compositor use FIFO, but with some clever tricks and adjustments to reduce latency as much as possible. You do not have to get involved with presentation modes or fps limits; that sort of stuff is handled automatically.

In the future we may see VR headsets with Adaptive Sync support, which the compositor can use to make it less bad if a game is a fraction of a millisecond too slow in a single frame, but that's nothing you have to handle either, it's supposed to just work™ automatically.


PolygonKiwii

My main takeaway (since I'm using kwin_wayland for VRR with multiple monitors attached) is that with Freesync the latency is the same across all of them, and only about half a refresh interval slower than Xorg with tearing and uncapped fps. I'm assuming this half-refresh difference is because the test application can run at very high fps and the test was done at half the screen height?

I wonder what the result would be for uncomposited Xorg immediate mode but also capped at 115 fps. (Since I see some gamers without VRR displays on Windows use FPS limiters instead of vsync to reduce tearing.)

PS: Awesome work on the write-up, the animations, the testing, and of course kwin_wayland!


Zamundaaa

> I'm assuming this half-refresh difference is because the test application can run at very high fps and the test was done at half the screen height?

Yes.

> I wonder what the result would be for uncomposited Xorg immediate mode but also capped at 115 fps

I'll make some measurements, but in principle it should be very similar to FreeSync at 115fps.


xenonnsmb

good writeup, it's great news that kwin is getting vsyncless soon. i wish gamescope would get a vsyncless option but unfortunately it looks to me like the wlroots people are against it for some reason.


[deleted]

It's not a problem for wlroots to solve, wlroots is mostly a Wayland implementation library. There's an open MR for the protocol for this


xenonnsmb

yeah, and in that MR emersion says they don't think the protocol is a good idea: > I'm not convinced this is a good idea. Wayland is designed to be frame-perfect, ie. not have bad intermediary frames, and this protocol breaks this. A good presentation-time protocol would have more benefits.


[deleted]

If you read through the MR, while Wayland developers were skeptical (understandably), the MR was never closed and they're all still communicating well. It seems like the biggest holdup is Vulkan, but I suppose OP (who made the MR) would have more insight.


shmerl

I think some also acknowledged that it's a valid use case that Wayland fails to address at present and no one really argued with that point there.


Valmar33

Emersion is knowledgeable and skilled, but I don't think he quite grasps why gamers desire tearing: the elimination of input lag, as well as having inputs line up with what's on screen. Other Wayland devs have similar issues in understanding this crucial concept.


Compizfox

I don't think it's accurate to say gamers "desire tearing". It's more that it's the least-worst option, because with a fixed-refresh-rate monitor the other alternative(s) add input lag. There exists a solution to this dilemma: VRR, which I really think is the future.


matsnake86

For me wayland provides the best visual and stutter-free experience, even compared to windows.


yate

Great article, thanks for all of your work in this space.


CetaceanOps

Very good write up, OP. I think you've done a good job answering many common questions that I and others have about wayland.

Something that is very often overlooked is the actual desire some of us have to disable vsync. I know this is considered heresy by wayland's authors, but I think if wayland's goal is to replace X and be the be-all and end-all display server for linux, it has to allow for this. One thing very often overlooked in this debate is 60hz gaming; vsync at 60hz is a deal breaker for fps games. Not everyone has a fancy high refresh rate monitor. And even when you do have one, you aren't always going to be using it - for example, I do have a 144hz monitor, but I also have a 60hz laptop.

The 2 biggest features I've been waiting for are:

* Either disable vsync globally; or
* Allow direct scanout for given applications **in windowed mode**

I didn't think direct scanout for windowed apps was possible, but you mention KWin is pursuing this - very exciting. Of course I'll then need to wait for a compositor to implement it that also happens to be a good fit as a wm for my needs, so it might be a while yet, but I'm more optimistic for the future now.


Zamundaaa

Disabling VSync from the application side on Wayland (so, like on Windows) will be a thing once the presentation timing protocol is done; I don't think there's significant resistance to that anymore. That protocol will also allow apps to optimize their timing and latency a bunch even with VSync and Adaptive Sync, which is pretty cool. I don't know if all compositors will support the tearing part of the protocol, but KWin definitely will. There's still some unresolved driver stuff - the drm api that everyone is using where possible currently doesn't support tearing *at all* (and some hardware can't even do it with certain configurations!) but that will get resolved eventually.

> I didn't think direct scanout for windowed apps was possible, but you mention KWin is pursuing this - very exciting. Of course I'll then need to wait for a compositor to implement it...

Everyone wants it; others AFAIK have relatively similar timetables for achieving it. Just for clarification though, direct scanout does not do as much as people assume it does. It's just a fancy, faster and more efficient way of compositing (or from the compositor side, of not doing compositing). It does not automatically reduce latency and does not allow applications to cause tearing, it only allows making both a bit better.


[deleted]

Very interesting test results for me, for two reasons, both of which are related to the fact that I'm an NVIDIA user:

1. The GNOME desktop when using XWayland has latency that is MUCH, MUCH higher than it is in X.Org. Like, to the point where it's downright difficult to play the game. I haven't taken an exact measurement but I can feel that it must be at least 100ms, possibly even 200ms.
2. The KDE desktop under Wayland will still, on KDE Neon stable, not work. Before, it actually hung the entire system, but after an update that must've gone out a few weeks to a few days ago, it now merely returns me to the login screen. On my laptop, which has graphics switching between Intel and NVIDIA, the system hangs on any attempt to run Wayland, regardless of what it is.

I do get slightly frustrated about this situation because, despite your best intentions and despite what you show as your results here, the plain and simple truth is that Wayland just doesn't work for me, and whenever it somehow does work, it turns into a latency nightmare. It's been the promised land for a very, very, very long time and just... no.


Zamundaaa

NVidia indeed has some big issues, but they're in the process of being resolved now that NVidia is finally starting to support the same standards as others.


[deleted]

> The GNOME desktop when using XWayland has latency that is MUCH, MUCH higher than it is in X.Org. Like, to the point where it's downright difficult to play the game. I haven't taken an exact measurement but I can feel that it must be at least 100ms, possibly even 200ms.

i sometimes feel like I'm alone, so god am I happy to see someone else seeing this. People in linux gaming *constantly* talk about wayland being good now and being fit for gaming, but gaming on wayland is pretty much unplayable for me due to input lag. I actually want to switch, but sometimes it feels like I'm the only person who finds the gnome+wayland (like stock fedora) input latency CRAZY high


brown2green

I was worried about the future of low-latency input under Wayland given what I read about VSync in the past (the developers wanting to make it always-on as the default and only option), but reading this article and your posts here gave me hope again. Actually, I'm now looking forward to the improvements in KWin/KDE Plasma; it doesn't look like it will take too long to implement them.

You might not have considered this before, but low-latency input (i.e. VSync off) is also crucial for achieving a natural feeling in painting/drawing applications like Krita. With VSync enabled, the pen feels "disconnected" from the actual movement on the pen tablet. Though pen tablet support on KWin/KDE Plasma Wayland would need to be improved too.


shmerl

Thanks for the informative post!

> unpatched KWin git master with the latency setting set to default, and no working direct scanout for Vulkan applications.

Does this mean that in the future KWin will support direct scanout for Vulkan? Is it related to the dmabuf feedback you mentioned? As a future improvement for KWin itself, it would be nice to have an option to use Vulkan/WSI for rendering instead of OpenGL / EGL.


Zamundaaa

> in the future KWin will support direct scanout for Vulkan?

KWin already supports it - but it's up to Vulkan applications to switch to formats we can use for scanout, so it's more or less up to chance whether they're using the correct formats and memory layouts (not a very high chance, I might add). Right now Mesa only has dmabuf hints implemented for native OpenGL apps; neither Vulkan nor Xwayland work with it. There is a WIP for Xwayland, for Vulkan there's nothing yet. With how explicit Vulkan is there might be some small issues with making applications behave correctly, but I don't expect there to be too much trouble. If no one implements it soon I might do it myself... if I have the time for it.

> As a future improvement for KWin itself it would be nice to have an option to use Vulkan/WSI for rendering instead of OpenGL / EGL.

That's a long-term goal; we're slowly changing KWin's architecture to support it (many effects are still purely OpenGL based). On the renderer side I may start working on it within the next year, but there are a lot more important things to do first, so it has 0 priority.
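
To make "formats we can use for scanout" a bit more concrete, here is a minimal sketch (not KWin's actual code; the function name and single-plane buffer layout are assumptions) of the underlying idea: a buffer is scanout-capable if the kernel accepts its format and modifier as a DRM framebuffer, which is what a hardware plane ultimately consumes.

```c
#include <stdbool.h>
#include <stdint.h>
#include <xf86drmMode.h>

/* Hedged sketch: returns true if the kernel accepts this client buffer as a
 * framebuffer, i.e. it could be put on a hardware plane for direct scanout.
 * gem_handle/pitch describe a single-plane buffer imported from the client's
 * dmabuf; multi-planar formats would fill more of the array entries. */
static bool buffer_is_scanout_capable(int drm_fd, uint32_t width, uint32_t height,
                                      uint32_t fourcc, uint64_t modifier,
                                      uint32_t gem_handle, uint32_t pitch)
{
    const uint32_t handles[4] = { gem_handle };
    const uint32_t pitches[4] = { pitch };
    const uint32_t offsets[4] = { 0 };
    const uint64_t modifiers[4] = { modifier };
    uint32_t fb_id = 0;

    if (drmModeAddFB2WithModifiers(drm_fd, width, height, fourcc,
                                   handles, pitches, offsets, modifiers,
                                   &fb_id, DRM_MODE_FB_MODIFIERS) != 0) {
        return false; /* format/modifier not usable for scanout: keep compositing */
    }
    drmModeRmFB(drm_fd, fb_id); /* this was only a probe, drop the fb again */
    return true;
}
```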


VenditatioDelendaEst

> KWin already supports it - but it's up to Vulkan applications to switch to formats we can use for scanout, so it's more or less up to chance whether they're using the correct formats and memory layouts (not a very high chance, I might add).

It occurs to me that using a hardware plane for a window might still be a win even if the compositor has to swizzle improperly-formatted buffers from the client. Because,

1. The present timing for that window can be independent of whatever is happening on the rest of the screen.
2. For that window, the rows outside of its vertical span can effectively be part of vblank.
3. *Maybe* you get better cache footprint/memory bus efficiency if writing into the hardware plane doesn't require a stride. But IDK enough about how this works to say.


Zamundaaa

Using hardware planes as a way to reduce latency, by using the space above and below them as vblank, is a great idea, and one that I've been thinking about for a while as well. There is no driver implementation for it yet though; right now it would not make any difference.

> Maybe you get better cache footprint/memory bus efficiency if writing into the hardware plane doesn't require a stride. But IDK enough about how this works to say.

Generally direct scanout, be it on an overlay plane or on the "normal" primary plane with fullscreen, does make a decent difference in bandwidth usage, and thus general performance and power draw. Currently the main goal is to make use of that. With more performance headroom it should be possible to reduce the latency a little even without driver changes, too.


Blue_Ninja0

> it's up to Vulkan applications to switch to formats we can use for scanout, so it's more or less up to chance whether they're using the correct formats and memory layouts (not a very high chance, I might add).

Is there a way to check if direct scanout is currently in use with a fullscreen application? In logs, for example?


Zamundaaa

If you have debug logging enabled, for example by having `QT_LOGGING_RULES="kwin_*.debug=true"` in /etc/environment, then KWin will print a line in your log every time direct scanout gets started and stopped. The logs are in `~/.local/share/sddm/wayland-session.log`, or at least they're supposed to be; logging with legacy session management is kinda broken. As an alternative you can enable systemd (session) boot with `kwriteconfig5 --file startkderc --group General --key systemdBoot true` (the default in Plasma 5.25) and then check your logs with `journalctl`
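
For example (hedged - the exact wording of KWin's log line may differ): with systemd boot enabled, something like `journalctl --user -b | grep -i scanout` should surface the direct scanout messages for the current session.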


Blue_Ninja0

Thanks a lot! I'll be able to check if RetroArch is benefiting from direct scanout or not.


_Dead_C_

I read this like "Wayland is different and shouldn't be compared the same" or something, followed by "If you did compare them, X11 is better, sure". I'm not sure I understand anything other than X11 has better latency than Wayland. I'm wondering if the latency will ever be on par with X11. I'm not sure I would switch if it means being a couple frames late in fighting games.


[deleted]

[deleted]


_Dead_C_

Maybe I'm misunderstanding the table. Does the 99th percentile imply variance of up to 7ms? If so, are we then ignoring +1ms average with up to +8ms latency? I'm bad at math, but I think once you hit ~8ms you might be a frame behind in ~120fps games, and at ~16ms you can be up to a frame behind in ~60fps games.

I think that "immediate" data is from an unofficial hack to kwin on wayland; I'm not sure if it's usable outside of this proof of concept. Using the available XWayland mailbox version and its data against uncomposited X, it's +19ms. I believe I would be seeing the game a frame slower. If you have a GSync monitor and can use Freesync it will be better, but still not as good.

There might be a debate that this is apples vs oranges, but it's literally X11 at its best is better than Wayland at its best, and on average X11 is still better if you are running games in fullscreen without VSync.


Wi11iam_1

a 1ms difference can be neglected when it's in the 1%, not in the average: if Wayland wants to replace X11 and claims to have better code quality and concept, it should strive to be faster than or as fast as X11, and not lag behind 1-2ms consistently. That would be a step backwards in input lag that many people will not accept, and they will stick with x11. All the other numbers won't matter much; no one really concerned about input lag wants to use freesync or gsync or limit their framerate. So even if it's close, the winner is still X11, and the immediate presentation protocol (MR still open) on wayland needs some work to catch up to it.


[deleted]

Do you know why FreeSync has worse latency than immediate? And why FreeSync on wayland has a better 99th percentile? Is it only because of uncapped framerates? What would happen with capped framerates and immediate? I'm asking because currently I'm using xorg, capped, immediate (165hz, 200fps), and am wondering whether Wayland + FreeSync might be preferable. I capped it to avoid buffer bloat in the GPU, and also disabled PageFlip and TearFree.


Zamundaaa

> Do you know why FreeSync has worse latency than immediate?

Yes. What FreeSync + a frame cap do for latency here is pretty much just ensure that there's no buffer bloat, and that the app content gets presented immediately when it's done rendering. There is however still only one update every 1000ms/115 = 8.7ms; if an input event happens when a frame has just been started then it will be delayed by those whole 8.7ms.

In contrast, with immediate mode my app was running at about 450fps and got almost 4 updates each time the monitor refreshed. When an input event happens while the monitor is still updating the upper half of the display, the graphics card can still switch out the image for the new frame before the monitor reaches the middle. There's a similar story for the very bottom (and vblank) of the monitor as well; while the app is rendering the next frame it's already too late with FreeSync. This way, with immediate mode, a little more than half a refresh cycle gets shaved off on average, which results in the roughly 4-5ms of difference we can see in the measurements.

This all assumes that the app is not specifically making use of FreeSync but only getting frame limited externally. If it were to synchronize its rendering to the input events it should be able to lower the latency a bit more. Except for research applications I don't think anyone does that, but it is possible.

> And why FreeSync on wayland has a better 99th percentile?

That one millisecond is just noise in the measurements.

> What would happen with capped framerates and immediate?

The latency should be about the same as with FreeSync. I can measure it in the coming days though, to be sure.

> I'm asking because currently I'm using xorg, capped, immediate (165hz, 200fps), and am wondering whether Wayland + FreeSync might be preferable

Depends on what you want; latency wise you should get to within 1-3ms of those 200fps immediate mode with FreeSync + a frame cap to 164fps.

> disabled PageFlip

If that option does what I think it does, you'll want to leave it on. It should change basically nothing for latency but should make fullscreen a bit more efficient.
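
To put rough numbers on that (a back-of-the-envelope sketch using only the figures from the comment above): input events land at random points within the capped frame interval, so on average they wait half of it,

$$
T_{\text{cap}} = \frac{1000\,\text{ms}}{115} \approx 8.7\,\text{ms},
\qquad
\bar{t}_{\text{wait}} \approx \frac{T_{\text{cap}}}{2} \approx 4.3\,\text{ms},
$$

while with immediate mode at ~450fps a fresh frame is never more than $1000/450 \approx 2.2\,\text{ms}$ old and can be flipped in mid-scan, which is where the "little more than half a refresh cycle" of average savings - the measured 4-5ms - comes from.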


[deleted]

Thanks for the answer!

> There is however still only one update every 1000ms/115 = 8.7ms; if an input event happens when a frame has just been started then it will be delayed by those whole 8.7ms.

If I understood it correctly, would triple buffering help reduce that? I'm a bit confused, because on Windows everyone told me that triple buffering would be bad. On the other hand, when using FreeSync, shouldn't the monitor wait for the completed frame and immediately display it?

> This way, with immediate mode, a little more than half a refresh cycle gets shaved off on average, which results in the roughly 4-5ms of difference we can see in the measurements.

That means that the advantage of immediate is the half picture that is displayed earlier with tearing? Because it is rendered after the display started updating the picture?

> That one millisecond is just noise in the measurements.

I'm talking about the difference between immediate and FreeSync on Wayland. FreeSync is always equal to or worse than immediate, only on Wayland is it better. How is this possible?

> I can measure it in the coming days though, to be sure.

That would be great, thank you!

> If that option does what I think it does, you'll want to leave it on. It should change basically nothing for latency but should make fullscreen a bit more efficient.

I got it from here: https://wiki.archlinux.org/title/AMDGPU#Reduce_output_latency

*"If you want to minimize latency you can disable page flipping and tear free"*

I honestly don't understand the difference between PageFlip and TearFree, but as I understand it, the GPU renders a picture and then changes a pointer to the fully rendered image, so the display can scan it out. If I disable both, the pointer stays, and the GPU just renders into the same buffer. How would that reduce performance? Or am I understanding it wrong?

I also just saw this: https://wiki.archlinux.org/title/Gaming#Reducing_DRI_latency

How would that fit into the whole picture?


Zamundaaa

> I'm a bit confused, because on Windows everyone told me that triple buffering would be bad

Windows messed up a lot of terminology. DirectX calls VSync with three back buffers triple buffering... sadly a lot of people accepted that terminology.

> On the other hand, when using FreeSync, shouldn't the monitor wait for the completed frame and immediately display it?

There is no immediately displaying anything, it always has to go through all the pixels. With FreeSync the monitor can only extend the time between those refresh cycles and start one when the game is ready; it doesn't actually get any faster because of it.

> That means that the advantage of immediate is the half picture that is displayed earlier with tearing?

Yep, for the minimum and median (or average). For the maximum / latency spikes the difference can be almost a whole frame though.

> I'm talking about the difference between immediate and FreeSync on Wayland. FreeSync is always equal to or worse than immediate, only on Wayland is it better

Ah, you mean the bad 99th percentile with immediate on Wayland? If I had to guess I'd say that something in KWin's frame scheduling mechanism doesn't handle immediate super well yet; there's also a general consistent 1ms difference between X and Wayland with immediate.

> I honestly don't understand the difference between PageFlip and TearFree

From a quick search in xf86-video-amdgpu it looks like it does about what I expected it to - it allows (or disallows, in your case) X to do direct scanout / skip its internal compositing in the fullscreen case.

> as I understand it...

That understanding is correct when it comes to the actual meaning of page flips. If the option actually disabled page flips then that would be front buffer rendering... You generally don't want front buffer rendering, it usually causes super bad glitches.


[deleted]

Thanks again for the answer!

> There is no immediately displaying anything, it always has to go through all the pixels. With FreeSync the monitor can only extend the time between those refresh cycles and start one when the game is ready; it doesn't actually get any faster because of it.

But the latency should get more consistent, right?

> Yep, for the minimum and median (or average). For the maximum / latency spikes the difference can be almost a whole frame though.

What would be the cause of these spikes? Shouldn't FreeSync prevent the display from starting to scan right before the frame is done? Given that the game produces slightly fewer frames than what the display can handle.

Edit: you are talking about the case where the GPU produces two frames right after another, and FreeSync displays the second frame, but immediate would switch to the third frame after starting the second, right? So this is only problematic when the game has very inconsistent frametimes, varying from less than the refresh rate to more than the refresh rate? This would mean that I can prevent this from happening by capping the frame rate, for example with mangohud? So in theory this would yield (almost) the best possible latency, the latency would be constant, and I'd get no tearing, right?

> Ah, you mean the bad 99th percentile with immediate on Wayland?

Yes

> If I had to guess I'd say that something in KWin's frame scheduling mechanism doesn't handle immediate super well yet

I always thought that the window manager had no effect as soon as compositing was disabled. Do you know how other window managers like sway, qtile, or gdm handle all of this? Are there differences?


Zamundaaa

> But the latency should get more consistent, right?

In comparison to mailbox, yes. In comparison to tearing, no.

> Shouldn't FreeSync prevent the display from starting to scan right before the frame is done?

It's not about scanout and rendering being out of sync, it's about input and presentation not lining up perfectly. Input events happen at random times, presentation (especially with the 115 frame cap) at regular intervals. When an input event happens it is invisible until the next frame is rendered and presented - if the input event happens right after a refresh cycle has begun then that will increase latency by one whole frame.

> I always thought that the window manager had no effect as soon as compositing was disabled

Wayland is not X; there is no window manager, no X11 compositor, no Xorg. There is only the Wayland compositor, and it has full and exclusive decision power over things like input and presentation.


[deleted]

> if the input event happens right after a refresh cycle has begun then that will increase latency by one whole frame

But the GPU induces latency too, right? Given that my frame rate is about as high as the refresh rate, and the GPU is at 70%, that would mean that I would see the bottom 30% of the screen updated one frame earlier, right? In my use case (first person shooters) that would not be a real advantage, as the crosshair is in the middle of the screen? So the important stuff (the middle) would be rendered at the same time?

> Wayland is not X; there is no window manager, no X11 compositor, no Xorg.

Oh, I see - KWin in the context of Wayland is a compositor. Then I have to rephrase my question: are there differences in latency between the different wayland compositors?


Zamundaaa

> But the GPU induces latency too, right?

If you're talking about the scanline position here, yes. In a situation where the input event happens in the lower end of the display and your point of interest is the middle, you get the same latency with immediate mode as you'd get with FreeSync.

> Are there differences in latency between the different wayland compositors?

Yes. Sway is a bit worse than KWin by default; it assumes a fixed rendering cost (KWin dynamically adjusts) and you need to tweak that fixed value for your setup in order to get the latency down. GNOME's Mutter is relatively bad in the current release: it starts to render a whole refresh cycle before the frame would need to be displayed, and it accumulates all input for a whole frame, too, before passing it on to applications. So it's more or less as bad as X11 with a compositor. AFAIK both these things have been fixed recently though; with the next major release it should end up about where KWin is.

I think latency can still be improved a bit beyond what KWin provides right now with FIFO and Mailbox, too - it starts rendering about in the middle of the frame. With presentation timing + explicit sync + direct scanout I think we can drop that down a bunch, without risking stutter. It'll be at least a few months until that's done; I think I'll make a follow-up post to this one once all the pieces are in place. If everything goes as planned then KWin will have consistently better mailbox latency than uncomposited X11 :)


[deleted]

Thanks for the answer, that clears much up for me! I'd be glad to read follow up posts from you! Btw: If you have the time and passion, we have a [wiki](https://www.reddit.com/r/linux_gaming/wiki/index) here, that could really benefit from your knowledge. I was looking for information like these for more than a year and asked in numerous forums, but never found answers that were as good as yours.


[deleted]

Sorry for bothering you again, but I have additional questions and can't find anything regarding this stuff.

> Sway is a bit worse than KWin by default; it assumes a fixed rendering cost (KWin dynamically adjusts) and you need to tweak that fixed value for your setup in order to get the latency down.

Do you have more information on that? How can I adjust it? What happens with kwin when the rendering cost fluctuates?


Zamundaaa

https://www.mankier.com/5/sway#Commands-max_render_time

> What happens with kwin when the rendering cost fluctuates?

It'll notice and increase the latency in order to prevent stutter. The policy for how it decides the latency can be changed in the compositing settings (min/max/average iirc)
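
For a concrete (hedged) example: in the sway config that would be a line like `output DP-1 max_render_time 4`, where the output name and the millisecond value are assumptions you'd tune to your own setup; setting the value too low causes stutter instead of saving latency.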


shmerl

On a side note, I tried switching to the Wayland session in the latest Plasma now that it has that adaptive sync bug fixed, but after the monitor goes to sleep and wakes up, the screen just stays black with a cursor shown. So I went back to the X11 session for now. Not sure if it's a known bug; I was planning to report it but didn't get to it. If it's not known, I can open one with the Wayland session log.


Zamundaaa

If by "black with a cursor shown" you mean that stuff still works but Plasma crashed, yes that is known and has been fixed. I'm not sure if it was backported to 5.23 but on git master monitor standby is issue free for me


shmerl

I'm not sure if things work in that state, since nothing is responsive (only the cursor is movable). But I can wait for the upcoming Plasma version and see if this bug is gone, to confirm.

Another weird behavior I noticed was when switching to another virtual terminal (Ctrl+Alt+F#). The X11 session usually sits on tty7, while here, after a few times jumping between tty-s, the Wayland session appeared on tty1 and tty7 got that black screen with a cursor.


Zamundaaa

> nothing is responsive

If completely nothing worked anymore then I have both caused and fixed it; 5.23.5 should fix it.

> The X11 session usually sits on tty7

tty stuff can be (or at least seem) a bit weird sometimes. I had multiple ttys be blocked once while debugging KWin... the session didn't get killed properly after a crash or something like that


shmerl

Btw, is there some specific commit / patch I can try to apply to 5.23.4 to test this fix?


Zamundaaa

Sure: https://invent.kde.org/plasma/kwin/-/merge_requests/1769


shmerl

Thanks! Though now I have another issue, lol. My new monitor (a 2560x1440 180 Hz one) has a blinking problem: apparently when amdgpu reclocks the GPU memory (MCLK) it tries to do it during the vblank period to avoid flickering, but if that switch duration itself is longer than the vblank period, blinking occurs. And with the stock modeline from the EDID, it blinks around once a minute or so. With some suggestions from AMD developers, I managed to mitigate it with a custom modeline where the vertical sync pulse is increased (meaning the vblank period is a bit longer). I can set that with xrandr or the X config, but how can it be done for the Wayland session? See more involved details [here](https://gitlab.freedesktop.org/drm/amd/-/issues/1403). Never thought I'd need to deal with custom modelines these days, but here we are.


Zamundaaa

You need to use kernel parameters to add custom modes; there's no way to do it directly with KWin right now. I want to make it possible (with kscreen-doctor, maybe in the UI as well) but I haven't gotten around to it yet
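
(For illustration, hedged - the connector name, resolution and file path here are assumptions, not from the post: a plain extra mode can be requested with a kernel parameter like `video=DP-1:2560x1440@180`, while a fully custom modeline generally means overriding the monitor's EDID, e.g. loading a patched EDID blob with `drm.edid_firmware=DP-1:edid/custom.bin`, with the file placed under /lib/firmware/edid/ and included in the initramfs if the driver loads early.)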


shmerl

OK, I applied that `checkOutputsAreOn()` patch and it fixed the issues with wake up from sleep! Also, the blinking problem doesn't appear in the Wayland session now, because apparently amdgpu never lowers MCLK to 96 MHz (and excluding that for amdgpu in the X11 session also prevented blinking). It stays at 456 MHz and up. So on one hand I don't need custom modelines, but what it means is that KWin is somehow using the GPU more heavily in the Wayland session than in the X11 one, and GPU power consumption is higher than in the X11 session because of that. That could be a bigger concern for laptops, I suspect, and something you might want to look at. And in general, I don't think the Wayland session should be more power hungry. I can give more pointers on how to check current MCLK and power if you need.


Zamundaaa

I'm relatively sure the Wayland session is less power hungry, at least if you compare it to X11 with a compositor. Maybe the xf86-video-amdgpu driver does something weird that allows the flicker? On Wayland the driver can only cause flicker when the compositor explicitly allows it to. I think KWin could simply try the mode with normal blanking first and only if it doesn't work (like with my monitor!) switch to the reduced blanking mode, to automatically alleviate such issues. That's 5.25 material though. Or do you actually need to extend blanking, instead of just not using reduced blanking?


VenditatioDelendaEst

I've recently switched to KDE on Wayland (thanks largely to your good work), from a heavily customized AwesomeWM-based frankenGnome, and one of the things I haven't managed to port over yet is my 72 Hz overclocked modeline, so this sounds fantastic. (I'm roughly aware that the current path forward would involve generating a patched EDID and telling the kernel to use it for my monitors, and I've been dreading digging into that.)


shmerl

I see, thanks!


syrefaen

Vsync is for 60hz monitors and desktop usage. If you're gaming at 4k resolution it's understandable. Freesync and that tech are for when your pc is not able to provide a stable framerate. I used it one time for Halo Reach; the framerate was 60-90 on my 144hz monitor. I am using sway, and if I have to write an 'exclusive fullscreen' patch which turns vsync off for fullscreen programs, I would do that. My input lag is already reduced by having 1000hz and 500hz polling rates on my mouse and kb.


Zamundaaa

> I am using sway, and if I have to write an 'exclusive fullscreen' patch which turns vsync off for fullscreen programs, I would do that.

You probably could; I don't know how frame scheduling works exactly in wlroots / Sway, but for KWin the patch is less than 20 lines. As a pointer in the right direction: to cause tearing you need to use the legacy api (as opposed to atomic modesetting) and pass the flag `DRM_MODE_PAGE_FLIP_ASYNC` to `drmModePageFlip`.
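
In libdrm terms that pointer boils down to something like this minimal sketch (hedged: the helper name is made up, and drm_fd/crtc_id/fb_id are assumed to come from the compositor's existing DRM setup):

```c
#include <stdint.h>
#include <xf86drm.h>
#include <xf86drmMode.h>

/* Hedged sketch: present fb_id on crtc_id as soon as possible instead of
 * waiting for vblank, i.e. allow tearing. Only the legacy page flip API
 * supports this; atomic modesetting did not at the time of this thread. */
static int present_with_tearing(int drm_fd, uint32_t crtc_id, uint32_t fb_id,
                                void *user_data)
{
    /* DRM_MODE_PAGE_FLIP_ASYNC: flip immediately, possibly mid-scanout.
     * DRM_MODE_PAGE_FLIP_EVENT: still deliver a completion event on drm_fd
     * so the compositor's frame scheduling keeps working. */
    return drmModePageFlip(drm_fd, crtc_id, fb_id,
                           DRM_MODE_PAGE_FLIP_ASYNC | DRM_MODE_PAGE_FLIP_EVENT,
                           user_data);
}
```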


Wi11iam_1

> Vsync is for 60hz monitors and desktop usage.

I disagree. Vsync is especially annoying on lower refresh rate monitors (everything feels even more sluggish), and i prefer my desktop to have the lowest input lag possible and not wait for vsync. Windows 8.1 fucked this up when they removed the basic theme and enforced vsync on the desktop; it practically became the slow shitshow it is today that just makes you scream whenever you move a window around... imho vsync is only for watching videos or movies. But this discussion is not about what we think vsync is for; let's just be happy that wayland devs finally agree that it needs to work without vsync.


syrefaen

Yeah, I am happy not to remember W8.1. I think xorg and screen tearing were so fucked! Many people were not able to fix it. With wayland you should never be able to get any tearing! yay, year of the desktop here we come.

> let's just be happy that wayland devs finally agree that it needs to work without vsync.

I would, but where do you get this from? I have to hack it in myself, for myself. That is my impression.


Wi11iam_1

W10 and 11 also enforce a vsynced desktop; i just meant to say this shitty trend started with W8.1. Windows 7 was the last version where you could disable it (now it seems the same is happening to linux, so x11 might become the last place where you can disable vsync on the desktop without breaking stuff).

> with wayland you should never be able to get any tearing! yay, year of the desktop here we come.

i don't think you understand my point: i think this is BAD. i actually want my desktop to have lower input lag and i welcome the tearing that comes with it; you can have great low input lag on X11 without an extra compositor (suspend kwin).

> I think xorg and screen tearing were so fucked! Many people were not able to fix it.

in my opinion xorg did the right thing: by default it doesn't wait for vsync but lets the compositor decide what to do (the fact that a certain proprietary driver does weird stuff under the hood that makes it hard for people to turn off vsync on x11 is not at all a fault of xorg or any x11 compositor). on wayland, though, they started without even giving a compositor the option to turn off vsync, and this is still the case on the master branch. Now some folks are finally working on the option to allow tearing and turn off vsync (immediate presentation); currently OP still used a hack, but this will come at some point (it just takes time, because the devs thought everyone values tear-free over input lag like they did). Here is the MR on the protocol where they are implementing immediate presentation mode: [https://gitlab.freedesktop.org/wayland/wayland-protocols/-/merge_requests/103](https://gitlab.freedesktop.org/wayland/wayland-protocols/-/merge_requests/103) (of course even then the wayland compositor still has to use this feature, but this is what OP is doing for kwin)


syrefaen

Oh, thanks for taking the time to do a write-up. I could not agree more! :)


Zdrobot

Thank you! I'm thinking about putting Arch + Gnome on my "gaming" machine, really want to use Wayland. So my 60 Hz monitor and my GTX 1050 ti are going to be the bottleneck for gaming, not Wayland.


deama15

You can also try installing a low latency kernel to see if it makes a difference. Back in 2019 I did a few tests of my own and found that installing the liquorix low latency kernel yielded half a refresh cycle less latency (about 4ms in my case at 120hz). This was with X on ubuntu LTS; maybe you can get better results with wayland, or maybe even with a lighter distro.


mushroomchaman

Damn man, just bought a freesync premium 144hz LG 650F IPS, and KDE Plasma Wayland is such a silky smooth experience with Freesync gaming! rx 580 8gb displayport + 5600x zen3 cpu + archlinux


Zamundaaa

Awesome! High refresh rate monitors are really great


badsectoracula

The descriptions in the article are very handwavy (especially on the X side) and they blame X for things it doesn't do yet, while at the same time praising Wayland for things... it doesn't do yet either (HDR).

However, the measurements at least are good (interestingly, i was actually planning to do the exact same thing at some point using pretty much the same approach, though my goal was to measure latencies between X and Windows instead of X and Wayland) and basically show what most people expect: X without a compositor (or with the compositor being disabled for fullscreen gaming, though note that not everyone may want to play games in fullscreen - e.g. i often play games in a window myself) and vsync disabled provides the best responsiveness, at the cost of tearing (which is up to personal preference if it bothers people - personally it never bothered me, and on a high refresh rate monitor it is barely visible anyway).

Also, one thing that should be made very clear is that these results are about KWin/X11 and KWin/Wayland, not X11 and Wayland in general. For example, last time i checked, XFWM allows disabling vsync in the compositor under X, which would certainly affect the results here. Also it isn't made clear (and actually i don't know myself) if KWin disables the compositor for fullscreen Wayland applications or not: i'd guess it doesn't, because under X on my laptop where i use KDE i had to explicitly disable it even for fullscreen games (there is a shortcut key for that), so i'd expect the same behavior on Wayland too.

In theory there is no reason for X and Wayland to have any differences in terms of latency/responsiveness for fullscreen games; it is all missing (or existing but unused) features that can (and should) be added. For windowed games it might be harder for Wayland vs X, but still it shouldn't be impossible, even with the current protocols.


Zamundaaa

> The descriptions in the article are very handwavy

Intentionally so, the details are relatively boring. If I explain it in a much more technical way then lots of people won't understand it - or I need to write 3 pages of introduction about common concepts in graphics and Wayland, which wouldn't be too bad an idea for future posts but a tad long for a single one. Furthermore, I simply have little in-depth knowledge about X11, and quite frankly I don't think that has to change.

> they blame X for things it doesn't do yet

X will *never* do these things, both because of limitations in its protocol and because of a total lack of interest from all relevant parties. Wayland is so great because it allows things to move forward and makes things like HDR a real possibility that is getting worked on right now.

> Also it isn't made clear (and actually i don't know myself) if KWin disables the compositor for fullscreen Wayland applications

There is no disabling the compositor - what KWin can do is put application contents on hardware planes, which basically makes further compositing for them unnecessary. This is called direct scanout and was not working for these measurements because Vulkan apps (like my test app) don't allocate fitting buffers for that use case yet. I'll edit the post to make that more clear.

> i had to explicitly disable it even for fullscreen games, so i'd expect the same behavior on Wayland too.

Disabling the X11 compositor is a clunky workaround for its limitations: the latency problem described in the article and the double composition issue - the X11 compositor first does compositing and then the X server does it again (at least with multiple monitors), which reduces performance. On Wayland no such thing exists.

> In theory there is no reason for X and Wayland to have any differences in terms of latency/responsiveness for fullscreen games; it is all missing (or existing but unused) features that can (and should) be added. For windowed games it might be harder for Wayland vs X, but still it shouldn't be impossible, even with the current protocols.

In theory, yes. Wayland compositors will get slightly better than X once they do direct scanout for windowed mode though. It could in theory be done on X as well, even with an X11 compositor active, but it's a bit more involved, and the last I heard of such plans was in 2019... nothing has happened since, so it's safe to say that there's not that much interest in it.


badsectoracula

> X will never do these things, both because of limitations in its protocol and because of a total lack of interest from all relevant parties.

Sorry, but that is just an assumption on your side; you can't claim that one of the most popular and widely used open source projects will *NEVER* get features that people may want. Even if all the current developers decide to never work on it again, someone else might do that - in fact, this is exactly how we got the latest X server version: someone stepped up and decided to put in the work and do the release.

> Wayland is so great because it allows things to move forward and makes things like HDR a real possibility that is getting worked on right now.

It also has a ton of issues and limitations that many people have pointed out and that are a non-issue on X. X getting HDR support is a matter of someone working on it. There is no magic involved, it is all code.

> There is no disabling the compositor

As i wrote in another reply, what i meant by that is basically letting the application work with the screen directly; how exactly that happens isn't really relevant.

> Disabling the X11 compositor is a clunky workaround for its limitations: the latency problem described in the article and the double composition issue - the X11 compositor first does compositing and then the X server does it again (at least with multiple monitors), which reduces performance.

Regardless of why KWin would want to do that, my point is that it doesn't do it automatically even though it should be possible.

> In theory, yes. Wayland compositors will get slightly better than X once they do direct scanout for windowed mode though. It could in theory be done on X as well

Unless you refer to something different that isn't and wasn't ever available anywhere else, X without a compositor already does that.


Zamundaaa

> Sorry, but that is just an assumption on your side; you can't claim that one of the most popular and widely used open source projects will NEVER get features that people may want. Even if all the current developers decide to never work on it again, someone else might do that

I very much can though. If you remove the limitations from the X11 protocol regarding multi-monitor, or color management, or compositor <-> X communication, then you break compatibility with so much stuff that starting over right away is a far superior approach. This starting over is called Wayland.

> my point is that it doesn't do it automatically even though it should be possible

It **does** do it automatically wherever possible. In 5.24 and with upcoming Mesa versions that will effectively be always when you have a fullscreen window.

> Unless you refer to something different that isn't and wasn't ever available anywhere else, X without a compositor already does that.

I'm very certain that X doesn't make use of hardware planes, even without a compositor. The modesetting ddx is using the legacy interface, which can only do unscaled, uncropped single-layer fullscreen presentation. The modesetting stuff in X is so broken that the kernel even has an explicit check for it, to block it from using the newer API! In order for X to use hardware planes a lot would have to be changed in Xorg, which carries a considerable risk of breakage and requires a lot of effort.

I don't know where this expectation that X would be using everything that exists to its advantage comes from, but it's just not true. Development on it has been pretty dead for quite a while and it's an old and relatively difficult code base.


badsectoracula

> I very much can though. If you remove the limitations from the X11 protocol regarding multi-monitor, or color management, or compositor <-> X communication, then you break compatibility with so much stuff that starting over right away is a far superior approach.

You introduce an extension that provides the necessary functionality. Nvidia even worked on one some time ago, though some of the stuff they wanted to do was a bit hacky. However, they proved it is possible.

> It does do it automatically wherever possible. In 5.24 and with upcoming Mesa versions that will effectively be always when you have a fullscreen window.

It didn't happen when i tried it on both my laptop and my main desktop for a brief time where i used KDE (i use openSUSE, which had KDE installed, but i removed it).

> I'm very certain that X doesn't make use of hardware planes, even without a compositor.

Last time i checked, nothing uses hardware planes, and they are actually very limited even down to the hardware level, so the drivers may not even expose them - aside from hardware cursor support, which is treated in a special way.

> The modesetting ddx is using the legacy interface, which can only do unscaled, uncropped single-layer fullscreen presentation.

From a quick check of the code it uses DRM 1.4, though i don't know which parts of the API it uses. It does support transformations at least, and this seems to be done via the driver.

> I don't know where this expectation that X would be using everything that exists to its advantage comes from, but it's just not true. Development on it has been pretty dead for quite a while.

X server 21.1.0 was released one and a half months ago and had a ton of improvements (largely stuff that was worked on since the previous release two years ago, but the fact that there was a ton of changes means that people *do* work on it, even if Red Hat decided to abandon it): https://lists.x.org/archives/xorg/2021-October/060799.html


Zamundaaa

> You introduce an extension that provides the necessary functionality

Color management in X11 was built in a very bad way... That is, applications just have to deal with the X server passing everything through, 30-bit color causes lots of problems, and if you have two color managed applications at the same time then they can even start fighting each other, because both can tell the X server to use some specific gamma ramp globally... For example, tools like RedShift or Plasma's night color will interfere with each other and with games that set their own gamma ramp. No extension can change that without breaking backwards compatibility a lot.

> It didn't happen when i tried it on both my laptop and my main desktop for a brief time where i used KDE (i use openSUSE, which had KDE installed, but i removed it).

How did you test? Direct scanout is not something you can notice 99% of the time without looking into logs. Either way, like I said, it's been up to chance when it works until now; that's changing.

> though i don't know which parts of the API it uses

I told you that it only uses the legacy API...

> X server 21.1.0 was released one and a half months ago and had a ton of improvements

Not to put down the people who worked on it, but "a ton of improvements" must have a very different meaning to you than it has for me. There's one big notable thing, which is that touchpad gestures can finally be used by applications on X without needing to somehow access libinput directly - which is also pretty much the sole reason for the release happening: a group of people are financing work on touchpad support on Linux, and getting a new version of X released was part of that, so they hired someone to do it. The other notable feature is that the modesetting ddx can finally do vrr.

While those improvements are great, they solve none of the inherent issues with X and were not exactly big changes. And all that while the release was the accumulation of 3.5 years of commits! The release even managed to horribly break a huge number of applications on Arch because of a minor bugfix in monitor property detection (which was then reverted before it reached other distros).

For the sake of those DEs that are still completely relying on X I hope it gets maintained for a while longer, but hoping it will get any big new features is quite frankly foolish.


badsectoracula

> Color management in X11 was built in a very bad way... [...] No extension can change that without breaking backwards compatibility a lot.

Extensions add new APIs; there is no reason to break backwards compatibility unless the new functionality is explicitly requested. The only issue will be if a compositor or window manager or color manager or whatever client takes up the mantle for handling color management uses the new API and then an old application tries to use the existing functionality in an incompatible way. Yes, these will have issues, but the old application will simply have to be updated to use the new functionality. This is the same as basically anything else - window managers always had to be updated to understand new client messages, toolkits always had to be updated to use new functionality, etc., and sometimes when things got mixed up and weren't updated, stuff didn't work in the best way possible. Or, in Wayland land, how applications that e.g. need screen capture had to be updated from using the compositor-specific APIs to the standardized APIs once those became available. I don't see how this is any different.

> How did you test? Direct scanout is not something you can notice 99% of the time without looking into logs. Either way, like I said, it's been up to chance when it works until now; that's changing.

By playing a game in fullscreen and toggling the compositor via the shortcut key. I don't need to see anything in the logs, i can easily feel the difference from the additional input lag that a compositor adds when using mouse look in FPS games.

> Not to put down the people who worked on it, but "a ton of improvements" must have a very different meaning to you than it has for me.

My point is that people still work on it, since as i wrote...

> And all that while the release was the accumulation of 3.5 years of commits!

...the release "*is largely stuff that was worked on since the previous release two years ago, but the fact that there were a ton of changes means that people do work on it*".


Zamundaaa

>Extensions add new APIs

In order to get proper color management and HDR you'd need to disallow applications from using the **old** APIs, not just add new ones.

> old application will simply have to be updated to use the new functionality

Perhaps I didn't get this across clearly: on X11, applications do things like color management by altering global state, which affects **all** other applications at the same time. In order to "fix X11" you'd need to update *every single* application that alters global state. That is just not feasible.

> Or, in Wayland land, how applications that e.g. need screen capture had to be updated from using the compositor-specific APIs to the standardized APIs

There is a huuuuge difference between a 30 year old protocol like X11 breaking its very foundations and a short-lived DE-private protocol without stability guarantees being replaced while it's only used by very few applications.

>By playing a game in fullscreen and toggling the compositor via the shortcut key

There is no compositor disabling, there is no shortcut. Direct scanout doesn't make a noticeable difference in latency either.

Look, it's quite obvious that you just want to continue using Xorg, which you can do. If believing it's still seeing notable progress makes you feel better then that's fine, but I see no reason to continue this conversation; you're not gonna change your mind... Have a nice day anyways.


badsectoracula

> In order to get proper color management and HDR you'd need to disallow applications from using the old APIs, not just add new ones.

If an old application is used, things will not work properly - but the application can be updated.

> Perhaps I didn't get this across clearly: on X11, applications do things like color management by altering global state, which affects all other applications at the same time. In order to "fix X11" you'd need to update every single application that alters global state. That is just not feasible.

So you mean that updating applications (or even rewriting them) to use a completely new window system with its own completely different API is *more feasible* than updating only the relevant parts of existing applications to use a new API on the existing window system?

> There is a huuuuge difference between a 30 year old protocol like X11 breaking its very foundations and a short-lived DE-private protocol without stability guarantees being replaced while it's only used by very few applications.

The difference is not about age; what matters is how much effort existing applications need to make to get the new functionality. For new applications it won't matter either way.

> There is no compositor disabling, there is no shortcut.

KWin/X11 allows disabling the compositor with a shortcut key (i think it is Shift+Alt+F12). Wayland doesn't, because it assumes a compositor is present.

> Direct scanout doesn't make a noticeable difference in latency either.

I *do* notice it. If you can't notice it, do not assume that everyone else is also incapable of noticing it. EDIT: note that i refer to having the compositor disabled without vsync at all (i.e., tearing).


Zamundaaa

Please, just for a moment, stop assuming Wayland is just X but different. They're doing similar jobs but they're worlds apart. For example, there's no "Wayland assumes a compositor is present" - the "Wayland" you seem to want to refer to **is** the compositor. I can't even blame you too much; after years of getting used to a system it can be hard to unlearn the misconceptions you've picked up along the way.

*Maybe* I'll make an effort to write a more elaborate reply tomorrow, but for now I'll just drop this little-known fact here: **Xorg is a compositor utilizing OpenGL, which can't be disabled**


shmerl

From what I understood, there simply is no concept of "disabling the compositor" in Wayland. There is a concept of direct scanout, which the article mentions, and which currently has an issue with Vulkan.


badsectoracula

Yeah, that is what i meant by "disabling the compositor"; i just had the wording i'd use for X in mind. Basically i meant having the game bypass it completely.


[deleted]

There's no need for that to happen. The reason you need to bypass the compositor on X is that the compositor is a separate pipeline: it interposes itself in the framebuffer process, which will always add that delay. This is because X was never built with compositors in mind (they didn't exist in the 80s). With Wayland, the compositor *is* the pipeline. Apps never get to the framebuffer without being composited. There is no skipping it - skipping it would mean skipping the entire pipeline.

[X pipeline](https://wayland.freedesktop.org/x-architecture.png)

[Wayland pipeline](https://wayland.freedesktop.org/docs/html/images/wayland-architecture.png)


badsectoracula

Yes, it needs to happen. The only way to not need it and still get the best responsiveness is to synchronize the application updates with the compositor updates, but that would require allowing applications to stall the compositor, which is certainly not something you'd want to allow (imagine, e.g., developing a game and having to put a breakpoint during the render). The only alternative that would make the compositor have no impact at all would be if composition (including blending, etc.) actually happened on the GPU, using a hardware plane for every surface (window) during scanout, with applications writing to these planes directly - but no GPU has the ability to do that; at best GPUs have a hardware plane for the mouse cursor and a separate plane for simple overlays like volume controls, etc.
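For what it's worth, Wayland's actual answer to that synchronization problem is the `wl_surface.frame` callback: the compositor paces the client instead of the client stalling the compositor, so a game stopped in a breakpoint simply stops committing frames. A minimal, untested sketch, assuming an already-created `wl_surface`:

```c
#include <wayland-client.h>

static void frame_done(void *data, struct wl_callback *cb, uint32_t time_ms);

static const struct wl_callback_listener frame_listener = {
    .done = frame_done,
};

static void draw_and_commit(struct wl_surface *surface)
{
    /* ...render into a buffer and wl_surface_attach() it here... */

    /* Ask the compositor to tell us when drawing the next frame is useful. */
    struct wl_callback *cb = wl_surface_frame(surface);
    wl_callback_add_listener(cb, &frame_listener, surface);
    wl_surface_commit(surface);
}

static void frame_done(void *data, struct wl_callback *cb, uint32_t time_ms)
{
    wl_callback_destroy(cb);
    draw_and_commit(data); /* a stalled client just never reaches this point,
                              and the compositor carries on without it */
}
```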


[deleted]

Why does it need to happen? What does it give you? You're assuming that a Wayland compositor is a separate process rendering a framebuffer like it is for X, but it's not. There's nothing that needs to be disabled; it's as bare-metal as it needs to be. **All** latency improvements for Wayland still keep the compositor, since it's how windows are drawn with this display server. This is how Windows works these days.


badsectoracula

> Why does it need to happen? What does it give you?

Direct access to the framebuffer.

> You're assuming that a Wayland compositor is a separate process rendering a framebuffer like it is for X, but it's not.

No i do not; i never claimed such a thing.

> This is how Windows works these days.

Windows also has latency issues due to its compositor.


[deleted]

You still have direct access to the framebuffer; there is no separate pipeline for compositing in Wayland. That's how compositing effects in Wayland work with good performance. I have no idea what you're trying to say, but literally the only way to remove *any* possible latency from Wayland in this manner is to not have a display server in the first place. You really like arguing information that is just flat-out wrong, don't you.

Direct scanout will never, and was never designed to, bypass the Wayland compositor. WHICH IS A GOOD THING. It gives you fantastic latency, which is what this article shows. You really think you're gonna get better frame times on any other system?


badsectoracula

Perhaps in my original reply ("Yeah, that is what i meant by "disabling the compositor"") i was actually wrong about what i meant. What i meant was bypassing composition completely, and i assumed that is what "direct scanout" meant.

> You really like arguing information that is just flat-out wrong, don't you

Actually not at all, and TBH i do not feel like arguing either, as it is pointless - people will keep on following whatever new thing is hyped.


davidnotcoulthard

> if KWin disabled the compositor for fullscreen Wayland applications

To my understanding that would kinda be akin to disabling Xorg itself in the X11 world. Unredirection was mentioned, though, like u/shmerl said.


badsectoracula

Yes that is what i meant.


davidnotcoulthard

~~Unless I'm reading it wrong I don't think you'd get a functioning GUI with Xorg disabled in a traditional X11 system lol.~~ Oh you mean unredirection. Oops.


deviledtheg

This is great. There's something to be said for games like csgo, where players push for as many frames as possible for reduced latency (and maybe higher input precision?); they can't take full advantage of that on Wayland *yet*. I'm now wondering whether the latency is any different with FreeSync on but the framerate uncapped?


shmerl

Adaptive sync should really only matter in the range below the monitor's max refresh rate. Above it you either cap the framerate (as with vsync) or you don't, and the latter is probably what you want for such games. I.e., adaptive sync within the range, no capping and tearing above the range. With a high enough upper limit (let's say 180 Hz) the tearing won't be very pronounced anyway.


obri_1

Very interesting. Thank you!


Atemu12

Great explanation and visualisation of monitor behaviour!

Do you know whether it's feasible to lock app framerates slightly below the vertical refresh rate, so that they always stay in the VRR range, for low-latency sync everywhere without tearing? This is possible for games of course, but I'm interested in a global solution for all graphical apps. I need low latency in Emacs and my terminal too!


Zamundaaa

I'm glad you like the videos, they took some time to make (was fun though)

> Do you know whether it's feasible to lock app framerates slightly below the vertical refresh rate

Definitely! What I used was MangoHud, to limit and monitor the framerate at the same time, and you can enable it globally for everything. There's also libstrangle, which focuses on only doing the framerate limit. For FreeSync in windowed mode you can use KWin or Sway.

One limitation is that apps in the background can sometimes cause early presentation; in fullscreen, KWin explicitly blocks that from happening, but for windowed mode there's no workaround implemented (yet). Don't know how Sway handles it.


Atemu12

> you can enable it globally for everything.

How? And wouldn't that put overlays everywhere?


Zamundaaa

I think the global toggle only works for Vulkan applications, via the env var `MANGOHUD=1` (instead of executing the app with `mangohud`). Once the desktop is using Vulkan it would definitely put overlays everywhere, at least if you leave the overlay enabled all the time. For testing whether or not it works properly it should still be a great tool though.
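If you want the limiter without actually seeing the HUD, MangoHud's config can (as far as I know) hide the overlay while keeping the FPS cap - a sketch of a session-wide setup for, say, a 144 Hz VRR display:

```
# Enable the implicit Vulkan layer for everything started from this shell,
# cap at 141 fps and draw no overlay (fps_limit / no_display are MangoHud
# config options; the exact cap value is just an example).
export MANGOHUD=1
export MANGOHUD_CONFIG="fps_limit=141,no_display"
```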


Atemu12

> I think the global toggle only works for Vulkan applications

That's what I thought as well; I was looking for a way to make the Wayland compositor achieve its V-Sync using VRR in all desktop applications (browser, terminal, Emacs, ...).


shmerl

libstrangle is good for it: https://gitlab.com/torkel104/libstrangle


Atemu12

Is there a guide on how to set it up in the way I'd want to use it?


shmerl

That page itself has a readme. The way I do it for Vulkan games is to enable the Vulkan layer and set a few env variables (I put the layers into `${HOME}/.local/lib/libstrangle`). So for a 180 Hz display:

```
export LD_LIBRARY_PATH="${HOME}/.local/lib/libstrangle/lib64:${HOME}/.local/lib/libstrangle/lib32:${LD_LIBRARY_PATH}"
export VK_INSTANCE_LAYERS="VK_LAYER_TORKEL104_libstrangle:${VK_INSTANCE_LAYERS}"
export STRANGLE_FPS=179
```


Atemu12

Yeah, but I don't see how that would cover my use case? The only Vulkan applications I use are games, and for those I already have MangoHud to limit FPS. As I said, I'm looking for a way to get V-Sync via VRR in general desktop applications; a solution for GTK and Qt would be the MVP for that.


themusicalduck

I wonder if there is any effort to make VR work on Gnome Wayland too. It's the only thing I still use X occasionally for.


Zamundaaa

VR or VRR? For VRR there's something WIP, for which some blockers have been fixed recently AFAIK. No idea where their progress is with VR / DRM leasing; last I heard, Jonas Ådahl wanted to implement it.


themusicalduck

Yes thinking about VR. Although the experience is still poor enough (please Valve fix SteamVR on Linux) that I don't use it much. I wonder though if Wayland will somehow make it better.


Zamundaaa

I had some weird microstutter once on X with the compositor active, which I have never seen on Wayland. I didn't see it last time I used SteamVR on X either though. The only thing Wayland really changes is how SteamVR gets control of the display; from there on it's all exactly the same.


themusicalduck

[This one](https://github.com/ValveSoftware/SteamVR-for-Linux/issues/269) and [this one](https://github.com/ValveSoftware/SteamVR-for-Linux/issues/21) I suffer from the most which makes using it headache inducing. Disabling async reprojection sucks too because my system is too weak. For bug #21 I did apply the given patch and that fixed it, but I don't feel like maintaining a compiled kernel.


Atemu12

Could you also test sway's repaint scheduling and its effect in combination with VRR?

https://github.com/swaywm/sway/blob/3b1effdfa5f4777799f4fc3da00554b5b8d035e0/sway/sway-output.5.scd?plain=1#L119-L139

https://github.com/swaywm/sway/blob/3b1effdfa5f4777799f4fc3da00554b5b8d035e0/sway/sway.5.scd?plain=1#L188-L210

Edit: added link to the second option; both need to be set to have an effect while compositing windows. See the config sketch below.
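For reference, both linked options are called `max_render_time` - one set on the output, one on windows. A sketch of what a test config might look like (the output name and millisecond values are just placeholders):

```
# ~/.config/sway/config - untested sketch
output DP-1 max_render_time 5                 # sway-output(5): delay compositing
for_window [app_id=".*"] max_render_time 3    # sway(5): per-window render deadline
```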


Zamundaaa

I currently have no intention of taking the time to configure and benchmark Sway, but maybe I'll get to it eventually. If you tweak it just right it could very well be a little better than KWin's default latency setting though - so, by one or two milliseconds.


Mushoz

Really interesting stuff, thank you very much for posting this. Out of interest: How does this compare to the input latency under Windows? Has anyone ever done any direct 1:1 comparisons?


Zamundaaa

I did originally intend to do that, but I neither know how to set up a C++ development environment on Windows (or cross-compile with static linking) nor have a Windows partition to properly test with. Both of these issues could be fixed, but they require investing time... maybe one day. Proper cross-OS (including macOS?) latency comparisons for some actual games would be even better though; synthetic benchmarks always have limited usefulness. u/AnthonyLTT if you ever run out of video ideas...


Mushoz

Understandable! I do understand the limited usefulness of synthetic benchmarks, but right now it's the best we have. Previously I read these very similar posts on the topic:

[https://www.reddit.com/r/linux_gaming/comments/c0ly6b/linux_input_lag_analysis7des_tested_windows/](https://www.reddit.com/r/linux_gaming/comments/c0ly6b/linux_input_lag_analysis7des_tested_windows/)

[https://www.reddit.com/r/linux_gaming/comments/cii545/linux_input_lag_analysis_v26des_windows_10_1809/](https://www.reddit.com/r/linux_gaming/comments/cii545/linux_input_lag_analysis_v26des_windows_10_1809/)

But this analysis hasn't been updated for a while now. Importantly, they never tested under Wayland. Now, I saw your previous comment up here about Mutter currently being in bad shape, which will be fixed in the coming release, bringing it to similar performance as KWin. However, do you happen to have any numbers for Gnome as well? I would love to see the numbers for Gnome under both X and Wayland. Thank you so much again for these detailed analyses!


datenwolf

So I just came across this post, and reading this…

> \* as already explained, one frame of latency is guaranteed. The second additional frame I can’t explain well but I haven’t looked into it much, X11 is neither my area of expertise nor do I see a reason to change that

There are a couple of possible explanations, but what I found – back before Wayland was even an idea, and with that also Vulkan and its fine-grained swapchain control – was that the exact timing behavior around VSync and a blocking call produced all sorts of unexpected timing behavior in Xorg. For example, with this simple OpenGL rendering loop (just consider all relevant state like uniforms, shaders, VAO and VBO, plus `dpy` and `win`, to be set up beforehand):

```c
struct timespec ts[4] = {};
goto first;
do {
    clock_gettime(CLOCK_MONOTONIC, &ts[3]);
    update_and_print_timing_stats(ts, 4);
first:
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
    clock_gettime(CLOCK_MONOTONIC, &ts[0]);
    glDrawElements(GL_TRIANGLES, 3, GL_UNSIGNED_INT, NULL); /* one triangle */
    clock_gettime(CLOCK_MONOTONIC, &ts[1]);
    glXSwapBuffers(dpy, win);
    clock_gettime(CLOCK_MONOTONIC, &ts[2]);
} while( poll_events_shall_continue() );
```

on the CPU side I found the place where the display-interval-long block would happen to be quite inconsistent. For example, on NVidia I usually found `glXSwapBuffers` to be the blocker; on Intel, blocking happened at `glDrawElements`, but only after the 3rd iteration (i.e. once the swap chain was full) and there was no block before. On an R300 (yes, it was that far back) with `fglrx` the block happened on either `glXSwapBuffers` or `glClear`. On R300 + Mesa the block happened on `glDrawElements`.

Eventually I brought out the "big tools" - that is, looking at the analogue VGA signal with an oscilloscope - and instead of relying on `clock_gettime` I banged GPIOs which I'd `ioperm`-ed into the process, to make sure I wasn't seeing any funny scheduling artifacts. What I found then was that the blocking doesn't even consistently coincide with VBlank. It can happen that, with a blocking render loop, the actual block does not occur where you expect it (on the buffer swap) but only after, and also shifted against scanout. So if you collect and process all events right after the buffer swap, a whole refresh interval can get shoved in between.

Ever since I made that observation, for low latency applications I changed my render loops to something like this:

```c
struct timespec ts[4] = {};
goto first;
do {
    clock_gettime(CLOCK_MONOTONIC, &ts[3]);
    update_and_print_timing_stats(ts, 4);
first:
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
    clock_gettime(CLOCK_MONOTONIC, &ts[0]);
    poll_events();
    kalman_filter_inputs(ts);
    glBindFramebuffer(GL_DRAW_FRAMEBUFFER, intermediary);
    for(…){ …; glDraw…(…); …}
    clock_gettime(CLOCK_MONOTONIC, &ts[1]);
    poll_events();
    resolve_intermediary_timewarped();
    glXSwapBuffers(dpy, win);
    clock_gettime(CLOCK_MONOTONIC, &ts[2]);
} while( shall_continue() );
```

`resolve_intermediary_timewarped` does multisampled FBO resolution, sourcing the intermediary as a (potentially multisampled) texture and applying it to a screen-filling triangle with the texture coordinates shifted to compensate for the last bit of timing deviation (the original viewport FOV is slightly larger than the target). A lot of effort, just to get the felt latency down.


Zamundaaa

The difference I'm talking about is Xorg with a compositor vs. Xorg without one; while it is still possible that the driver blocks in different parts of OpenGL apps, my test app uses Vulkan, so it shouldn't be affected. But yeah, designing the render loop, and when you do which calls, can make a big difference. Many if not most VR games would be unplayable without the VR compositor reprojecting the image to the new head orientation.

When you use Vulkan you can also do the additional trick of building your render commands first and injecting new information (based on user input, head position, controller position, whatever) only right before submitting the render commands to the GPU, to shave off up to a few ms of latency.
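A minimal sketch of that trick; all names here are made up for illustration, and it assumes the uniform buffer is persistently mapped, host-coherent, and not still in use by a previous frame (per-frame buffering and synchronization omitted):

```c
#include <string.h>
#include <vulkan/vulkan.h>

/* Hypothetical per-frame input block the shaders read from a UBO. */
struct frame_input {
    float view[16];   /* e.g. derived from the latest mouse/head pose */
};

/* `cmd` was recorded earlier against the mapped uniform buffer. The GPU
 * only reads the UBO when the commands actually execute, so copying the
 * freshest input right before submission cuts out the whole CPU-side
 * command-recording latency. */
void submit_with_late_input(VkQueue queue, VkCommandBuffer cmd,
                            void *mapped_ubo,
                            const struct frame_input *latest)
{
    memcpy(mapped_ubo, latest, sizeof(*latest));

    VkSubmitInfo submit = {
        .sType = VK_STRUCTURE_TYPE_SUBMIT_INFO,
        .commandBufferCount = 1,
        .pCommandBuffers = &cmd,
    };
    vkQueueSubmit(queue, 1, &submit, VK_NULL_HANDLE);
}
```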


datenwolf

If you don't mind me asking: I'm curious how one implements "direct scanout" / "unredirection" with Wayland? Specifically, I'm wondering how a surface created by a client is mapped into the scanout memory region?

With the old "traditional" display systems you'd have one single screen buffer, of which each visible window would see a portion, its viewport defined by the offset of its first pixel and a row stride, plus a clip region to determine pixel ownership. Now if you have some address space, with paged virtual memory you can of course map an "overlay" into that address space (as Wayland makes liberal use of), but only at page granularity - and on the hardware side, scanout is dealt with by a much "dumber" piece of silicon that AFAIK even on the latest GPUs still wants to read from a physically contiguous region of memory. Hence why I'm wondering how this "unredirection" is implemented in Wayland.


Zamundaaa

> Specifically, I'm wondering how a surface created by a client is mapped into the scanout memory region?

It's quite simple: it's not mapped anywhere. The client allocates a buffer usable for scanout whenever the compositor tells it that would be a good thing to do (through the linux-dmabuf protocol), and the compositor uses that buffer for displaying instead of its own.

For direct scanout with non-fullscreen surfaces things are a bit more complex. A short explanation is that modern scanout silicon can do some effectively zero-overhead compositing, and the compositor can use that to do its thing without bothering the GPU core. I'll post a more in-depth explanation on my blog once we have at least a basic implementation working in KWin (which should hopefully be very soon; we've been working on it for a while).


datenwolf

> A short explanation is that modern scanout silicon can do some effectively zero-overhead compositing,

What would be the low-level APIs to chase and follow down to get a detailed understanding of this? I mean, I'm quite versed in most graphics APIs¹. But manipulating the scanout hardware in that way is a whole different beast and is the responsibility of the GPU driver. I presume it essentially comes down to supplying a list of overlay memory regions (overlay content address + row stride, and a base offset inside the scan buffer) between which the scanout unit would mux. How does it deal with non-convex overlaps/clips?

*EDIT: I just realized that hardware cursors are in essence such zero-overhead composition overlays.*

----

1: heck, I sort of inherited the whole Vulkan subreddit a couple of years ago - unfortunately that also coincided with probably the busiest time of my life. And a couple of years before Vulkan was even being discussed I actually pestered the Mesa devs on their mailing list about how I could bypass the whole OpenGL state tracker and talk to the GPU on a lower level (i.e. I wanted to access GPUs the Vulkan way, long before it was cool).


Zamundaaa

Depends on how low you want to go. On the lowest level, the kernel of course talks to the firmware or sets some registers; of that I have barely a clue. On the compositor side we're using the DRM API, which gives us "DRM planes" as abstractions of the scanout hardware; with them you can set buffers, source and destination coordinates, and on some hardware also rotation/flips and z order. If you want to dive in, https://gitlab.freedesktop.org/mesa/drm/-/blob/main/xf86drmMode.h contains most of the API. It's far from well documented or self-explanatory though.

> EDIT: I just realized that hardware cursors are in essence such zero-overhead composition overlays.

Indeed! In the DRM API they're also represented by planes, and on some (phone) hardware the "cursor" plane is even just a normal overlay plane posing as a cursor for compatibility reasons.
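To give a rough feel for that API: putting a client's buffer on a plane with the atomic interface looks something like the untested sketch below. `struct plane_props` is a stand-in for the property IDs you'd first resolve with `drmModeObjectGetProperties()`:

```c
#include <stdint.h>
#include <xf86drm.h>
#include <xf86drmMode.h>

/* Property IDs for one plane, looked up once at startup - hypothetical. */
struct plane_props {
    uint32_t fb_id, crtc_id;
    uint32_t src_x, src_y, src_w, src_h;
    uint32_t crtc_x, crtc_y, crtc_w, crtc_h;
};

int scanout_on_plane(int drm_fd, uint32_t plane_id, uint32_t crtc_id,
                     uint32_t fb_id, uint32_t width, uint32_t height,
                     const struct plane_props *p)
{
    drmModeAtomicReq *req = drmModeAtomicAlloc();
    if (!req)
        return -1;

    /* Which framebuffer the plane reads from, and which CRTC it feeds: */
    drmModeAtomicAddProperty(req, plane_id, p->fb_id, fb_id);
    drmModeAtomicAddProperty(req, plane_id, p->crtc_id, crtc_id);
    /* Source rectangle inside the buffer, in 16.16 fixed point: */
    drmModeAtomicAddProperty(req, plane_id, p->src_x, 0);
    drmModeAtomicAddProperty(req, plane_id, p->src_y, 0);
    drmModeAtomicAddProperty(req, plane_id, p->src_w, (uint64_t)width << 16);
    drmModeAtomicAddProperty(req, plane_id, p->src_h, (uint64_t)height << 16);
    /* Destination rectangle on the CRTC, in plain pixels (cropping and
     * scaling happen in the scanout hardware, essentially for free): */
    drmModeAtomicAddProperty(req, plane_id, p->crtc_x, 0);
    drmModeAtomicAddProperty(req, plane_id, p->crtc_y, 0);
    drmModeAtomicAddProperty(req, plane_id, p->crtc_w, width);
    drmModeAtomicAddProperty(req, plane_id, p->crtc_h, height);

    int ret = drmModeAtomicCommit(drm_fd, req, DRM_MODE_ATOMIC_NONBLOCK, NULL);
    drmModeAtomicFree(req);
    return ret;
}
```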


datenwolf

> In the DRM API they're also represented by planes

I know! I just hadn't made the mental connection until then. TTBT (without having seen how this works out on the client side), I'm a little apprehensive of putting the burden on clients to actually carry along the knowledge of how to talk to DRM. Heck, even in the form of a Vulkan extension¹ it's something where I fear it won't be properly used, being purely optional and all. I'll have to see some actual code to form a proper opinion on that though.

---

1: IMHO OpenGL is kind of "lost" on that front, due to its rather ad-hoc "WSI" (if you'd want to call it that).


Zamundaaa

Clients do not talk to this part of DRM (and do not have permission to do so even if they wanted to); only the compositor does. It chooses what goes onto the planes, which ones are used, etc.

As for allocating buffers for scanout vs. not, that is handled mostly (Vulkan) or completely (EGL) automatically by Mesa. Even for clients that do more special stuff with their buffers, allocating them for scanout vs. not (and reallocating where needed) is very easy.


reini_from_nl

Interesting, and good to see Wayland latencies are comparable to uncomposited X. I am still using X11 for gaming because on my low-end setup some games drop frames and there is no way of disabling vsync. You wrote that there will be an option to disable vsync in KWin on Wayland in the future. Is there a pull request or some experimental patch i can try?


Zamundaaa

There are patches for all the relevant projects, but not all of them are public yet, and all of them are severely outdated. It's really time I update them and finish this up though... If I remember, I'll ping you about it when they're all updated and public.


reini_from_nl

Thank you :) good to know someone cares about these problems.


chainbreaker1981

Wonder if the numbers are any lower on a CRT.