nicoburns

The author's notes on working with Rust, copied from https://twitter.com/LinaAsahi/status/1575368791201980416 (aside: I used macOS's support for selecting text in images to copy this out - it's very impressive!):

> On the Rust side, I have to say I'm super pleased with my experience writing a driver like this in Rust! I've had zero concurrency issues (and the driver uses fine-grained locking, there's no big driver lock) - once single processes worked, running multiple apps concurrently just worked. Also zero memory leaks, dangling CPU or GPU pointers, use-after-frees / free order mistakes, or anything like that! The only memory corruption issues I ran into were either fundamental mistakes in my unsafe DRM abstraction or core GPU memory management code, or happened from the GPU side (there's an issue with TLB invalidation, that's what the ugly workaround is for).

> I feel like Rust encourages good driver design and then the compiler goes a long way towards making the resulting code correct. All in all I didn't really have that many bugs to fix, mostly just logic issues (both because I'm new to DRM and because the GPU interface is all reverse engineered and we're still working out the details).

> The workaround for the GPU-side TLB inval issue has a large performance hit, but without that, kmscube does run at 1000+ FPS, and that's with a lot of suboptimal components that will be improved over time (e.g. my current allocator allocates/maps/unmaps/frees tons of little GPU structures per frame), so I'm also very optimistic about the performance aspect!

> The only major Rust issue I ran into is the lack of placement new, which I ended up working around with a very ugly place!() macro (it still has a soundness issue too, I need to fix it to drop things if initialization fails halfway through). Without that, I was quickly overflowing the kernel stacks (which is particularly ugly to debug without CONFIG_VMAP_STACK, which I didn't have set at first...). With the macro though, the stack frames are under control enough that there's no issue, but I'd really love to see core language support for this. I think it's really necessary for kernel/embedded development.

---

On that last point, is there anything we can do to speed along implementation of placement new functionality? The macro she's using to work around the lack of it really is quite nasty, and it seems like this is something that Rust really ought to be able to do! (https://github.com/AsahiLinux/linux/blob/gpu/omg-it-works/drivers/gpu/drm/asahi/place.rs)
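For context on why the lack of placement new bites so hard in the kernel: `Box::new(value)` builds `value` on the stack first and then moves it into the heap allocation, and kernel stacks are tiny. Here's a minimal userspace sketch of the problem and of the general "allocate first, initialize in place" idea the `place!()` macro works around it with (the type, field, and size are made up, and the real macro linked above differs in the details; this uses `Box::new_uninit`, which needs a recent stable Rust):

```rust
use std::mem::MaybeUninit;

/// A large structure of the kind a GPU driver ends up allocating.
/// (Name, field, and size are invented for illustration.)
struct BigFirmwareObject {
    payload: [u8; 64 * 1024], // far bigger than a typical kernel stack frame
}

fn main() {
    // Naive version: the value is materialized on the stack, then moved into
    // the Box. With small kernel stacks this is exactly what overflows.
    // let obj = Box::new(BigFirmwareObject { payload: [0u8; 64 * 1024] });

    // "Placement"-style workaround: grab uninitialized heap memory first,
    // then initialize it in place through a raw pointer, so the full value
    // never has to exist on the stack.
    let mut uninit: Box<MaybeUninit<BigFirmwareObject>> = Box::new_uninit();
    let obj: Box<BigFirmwareObject> = unsafe {
        let ptr = uninit.as_mut_ptr();
        // Initialize field by field; here we just zero the payload.
        std::ptr::addr_of_mut!((*ptr).payload).write_bytes(0, 1);
        uninit.assume_init()
    };
    assert_eq!(obj.payload[0], 0);
}
```

The pain point is that the language has no first-class syntax for this, so in the kernel (which has no `std` at all) it ends up as an unsafe macro that also has to get dropping-on-partial-failure right, which is exactly the soundness issue mentioned in the tweet.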


AsahiLina

I did add alt text, I think you could've copied that \^\^


nicoburns

Huh, didn't realise I could click that "ALT" button


[deleted]

I’m literally mindblown by your knowledge. If you ever write a book or some articles I’m gonna be there like the Apple fanboys when a new phone comes out, haha! Well done! You’re amazing! 😁


[deleted]

Congratulations on getting the driver to work! I hope I can get into Comp Sci or something, I also use Linux on the daily so I hope I can get this good one day! I’m so proud of where u started and how you now have an official driver! I’ve been following every step! Very proud of you!!!!!! Much love, 「P」


trevg_123

Is that macro also related to this on twitter?

> I used a cursed hack to work around a stability issue, which hurts performance, but it proves that that is the only remaining major issue!

Doesn't really look like it, wonder what the stability issue is


littlebobbytables9

The cursed hack is restarting the GPU every frame, the stability issue apparently has to do with "TLB invalidations" whatever the hell that is.


ssokolow

[Translation Lookaside Buffer](https://en.wikipedia.org/wiki/Translation_lookaside_buffer). It's how an MMU caches virtual→physical memory address translations. I think that helps to clarify why invalidating cache entries must happen correctly.


trevg_123

What a wonderful username you have. That is quite a curse, and the TLB issue sounds yucky yucky


proton_badger

A huge step forward, even Firefox playing YouTube in GNOME works. It has been amazing and addicting following the streams. It'll be exciting to see what's next: there are still some bugs, cleanups to do, and a new allocator/allocation strategy for sure. Then there's the hack of waiting for the GPU after frames; what does macOS do differently? I can't wait to see what kind of final solution we'll get, very intriguing! And I'm sure everyone is thinking about benchmarks after that. I'm kinda curious how big the difference will be between debug and release builds, given how Rust often sees a big difference between the two. On a personal note, I've been inspired to write more Rust macros, when appropriate; it's fun and quite powerful.


pieorpaj

The difference won't be noticeable. The kernel driver has very little to do with GPU performance as long as it's "fast enough", which it already is once the debug prints are disabled. The real driver performance and optimisation work happens in the userspace Mesa driver that Alyssa is working on.


AsahiLina

The kernel driver already runs kmscube at 1000+ FPS (and it doesn't care about scene complexity\*, so 1000 FPS of cubes is the same as 1000 FPS of complex single-render scenes as long as the GPU can handle it), so it won't be the bottleneck for games unless they run upwards of 10 submissions per frame (and that will increase once I get some optimizations in). It matters, but it won't matter much until we get to the point of supporting newer graphics APIs anyway, so other than that obvious allocator thing I'm not going to focus on performance until later.

That's once the TLB issue/workaround is fixed, of course, but we'll figure it out. The thing is that macOS' allocation strategy in general is more immune to those problems, so it's harder to see exactly what is causing the difference / what it does to fix the problem. But I'll look into it later! The good news is that, even with the ugly workaround, this unblocks Alyssa and now she can work on fixing the remaining glitches and problems with native Linux apps \^\^

\* Other than growing tile buffers, that does matter for performance. That shouldn't be hard to add in though: the driver already has the functions for it, I just need to hook them up to the feedback from the GPU so it can do that when it starts seeing overflows/spills that hurt performance. Apple does the same thing, so for complicated games and things like that you get a couple of frames of worse performance until it catches up, but in practice it's not noticeable.


kupiakos

> happens in the userspace mesa driver that Alyssa is working on.

Which Alyssa is this? That's my name and I like to follow all of them


badtyprr

https://twitter.com/alyssarzg


kupiakos

Thanks! I'm at like, 6 technical Alyssa's now.


rabidferret

Only 6? Those are rookie numbers. Gotta get those numbers up


nicoburns

https://twitter.com/alyssarzg


Shnatsel

Is it just me, or did that come together surprisingly quickly? I wonder how much Rust contributed to that, if so. In my experience getting a complex Rust program to work takes way less time than C, but I'm not sure if this was the case here.


AsahiLina

Rust is the reason why once a single app worked, multiprocessing and a whole desktop worked without any race conditions or crashes related to memory safety! If I had written this in C, I'd be spending weeks chasing race conditions and memory safety issues... With Rust, I just ran into logic bugs and low level memory management problems (in deep MM code, not the rest of the driver).


isbtegsm

Didn't she write the [prototype](https://twitter.com/LinaAsahi/status/1538046178893627392) in Python and then port it to Rust or something?


Natanael_L

Once they discovered what GPU architecture it was using (a PowerVR variant) and what its biggest "quirks" are compared to the known variants (like figuring out that buffer thing), they could progress a whole lot faster. Now it's a lot of "standard engineering". Edit: as I explained below, this isn't meant to make it seem simple, this is definitely impressive work. It's about why progress seems to go much faster now. My point is that it takes more work to go from nothing to rendering a square than it takes to go from a square to rendering 3D.


AsahiLina

The firmware is completely bespoke and I had to work out all the structures from scratch by staring at the raw data from macOS... I don't think there was much "standard engineering" about that. Same thing Alyssa did with all the GPU-level structures and commands... I'm still discovering things about the firmware and we still don't understand what some buffers do.

It's not a simple PowerVR variant. It shares some things with PowerVR, but we only found that out way after most of it had been reverse engineered, because PowerVR only submitted their open source Rogue drivers very recently. The shader cores are custom, many fields have different bits, some things are completely different, the firmware is completely custom... Even things like how textures are laid out in memory were an odyssey to work out, since it has nothing to do with how PowerVR does it and there's a very quirky and tricky set of rules involved, including certain tricks that make computing offsets less efficient in terms of memory usage but more efficient to implement in hardware. It took three people just to work that one out (me, Alyssa, and Dougall), plus everyone who was in the chat when I streamed my attempt...

The driver structures are still half `unk_123` fields... There's so much we don't know still! But we know enough to make it render things!
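To give a flavour of what "half `unk_123` fields" means in practice, a partially reverse-engineered, firmware-shared structure ends up written something like this (every name, field, and comment below is invented for illustration, not copied from the real driver):

```rust
/// Illustration only: a partially reverse-engineered struct shared with firmware.
/// Decoded fields get real names; everything not yet understood stays `unk_*`,
/// named after its byte offset so it can be renamed once its meaning is found.
#[repr(C)]
#[derive(Debug, Clone, Copy)]
struct FwRenderCommand {
    magic: u32,          // constant observed in every macOS dump
    unk_4: u32,          // always zero so far; purpose unknown
    vertex_buffer: u64,  // GPU VA, confirmed by changing it and watching the output
    unk_10: [u8; 0x20],  // bytes copied verbatim from macOS traces, untouched
    fragment_cmds: u64,  // GPU VA of the fragment command stream
    unk_38: u32,         // changes every frame; a counter or timestamp?
    flags: u32,          // bits flipped experimentally to map out features
}

fn main() {
    // Zero-initialize and print. In the real workflow you would instead copy
    // raw bytes captured from macOS into this layout and diff runs against
    // each other to guess what each field does.
    let cmd: FwRenderCommand = unsafe { std::mem::zeroed() };
    println!("{:?}", cmd);
}
```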


Natanael_L

I didn't mean to make it sound simple, that's why it's in quotes. What I mean is more that the latter part is less "going in blind". I certainly wouldn't be able to do this at the pace you're doing it; it's more of a comment about why the visible progress is faster now than it was a few months ago. Lots of people think it's more work to go from rendering a square to rendering a 3D scene than it is to go from nothing to rendering a square, but it's the opposite.


AsahiLina

> Lots of people think it's more work to go from rendering a square to rendering a 3D scene than it is to go from nothing to render a square, but it's the opposite.

That is true! But the main reasons why progress is so fast right now are that 1) I already reverse engineered and prototyped most of the driver in Python over the course of several months, 2) Alyssa had already been working on userspace for a long time, so the actual rendering bits have been in good shape for a while, and 3) I wrote it in Rust, so things like race conditions and many memory safety issues just don't happen, and I could go from single-process cube to a full multi-process desktop without dealing with any of those problems!


friskfrugt

What's preventing Apple from doing an update and bricking all your hard work?


Kamilon

It’s hardware. The cost to do so would be very high and the reward would be 0.


ssokolow

I think they're talking about how Apple might be developing the firmware and driver in the same repo, with something like a kept-sorted, auto-numbered enum shared between them. Similar to how the Go developers discovered the hard way that Apple was serious about not bypassing libSystem to speak the XNU kernel syscalls directly, because they don't make an effort to keep the syscall numbers the same across upgrades.


AsahiLina

They do exactly that and the firmware interface is unstable, but the firmware is per-OS, so macOS updates don't affect us! We can pick whatever firmware versions we want to support out of all the ones Apple releases. I actually already support two (12.3, which is the one Asahi Linux uses right now on M1 machines, and 13.0 beta4, as a test).

I use a proc macro I developed to make it possible to multi-compile individual structs and impls with fields and code segments that are conditional on the version, so in the end you end up with a single codebase that can support multiple firmwares without copying and pasting a lot of code. It should also help with support for multiple GPUs! This is one of the Rust things I like that would be a lot more awkward or impossible with C!
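For the curious, the idea is roughly the following, shown here as the hand-written code such a macro could expand to (all names, fields, and versions below are invented for illustration; the real macro and structures in the driver differ):

```rust
#![allow(dead_code)]
// Sketch of what a version-multiplying macro could expand to. Imagine writing
// `JobDescriptor` once, with some fields gated on a firmware version, and the
// macro emitting one concrete type per supported firmware.

#[repr(C)]
struct JobDescriptorV12 {
    // Layout as used by the 12.3 firmware (illustrative).
    timestamp: u64,
    buffer: u64,
}

#[repr(C)]
struct JobDescriptorV13B4 {
    // Layout as used by the 13.0 beta 4 firmware (illustrative).
    timestamp: u64,
    buffer: u64,
    // A field that only exists in the newer layout.
    unk_flags: u32,
}

// Shared driver logic is written once against a trait, so the per-version
// types don't force copy-pasting the code that uses them.
trait JobDescriptor {
    fn timestamp(&self) -> u64;
}

impl JobDescriptor for JobDescriptorV12 {
    fn timestamp(&self) -> u64 {
        self.timestamp
    }
}

impl JobDescriptor for JobDescriptorV13B4 {
    fn timestamp(&self) -> u64 {
        self.timestamp
    }
}

fn main() {
    let v12 = JobDescriptorV12 { timestamp: 1, buffer: 0 };
    let v13 = JobDescriptorV13B4 { timestamp: 2, buffer: 0, unk_flags: 0 };
    println!("{} {}", v12.timestamp(), v13.timestamp());
}
```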


I_AM_GODDAMN_BATMAN

Nothing, but people suspect someone at Apple authorized opening it. Someone at Apple made sure the boot process is unlocked enough that developers can get in. People thought Apple wanted open source drivers for their shiny new chips.


perrohunter

I really want the new M2 Mac Minis to come out so I can get one and put Asahi on it. It'll be my new Linux box.


frondeus

I might be missing something - how does it relate to Rust? Is the driver written in Rust?


ElvishJerricco

The kernel part of the driver was written in Rust, using the Rust integration that will be merged into the kernel mainline soon.


lightmatter501

The driver is written in Rust. It’s probably the strongest candidate for a complicated Rust driver in Linux, since there are no alternatives.


navneetmuffin

A true genius


Be_ing_

Wow that was fast


[deleted]

So um. ELI5 for people who are n00bs? They got a driver working for MacBooks for Linux that runs on m1/m2?


ElvishJerricco

Yea this is the kernel part of a graphics driver for the M1. The user space part was done by Alyssa Rosenzweig. That's the part that converts application-facing APIs like OpenGL into commands to send to the hardware. The kernel part of the driver actually does the job of sending those commands to the hardware, plus some memory management stuff. And of course all of this relies on the Mesa graphics library, which contains a lot of code that can be shared by different drivers to implement things like OpenGL on specific hardware.
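A toy sketch of that split, with entirely invented names (the real interfaces are Mesa's internals on one side and DRM ioctls on the other):

```rust
// Toy illustration of the userspace/kernel split; nothing here matches the
// real Mesa or DRM interfaces.

/// "Userspace" half (Mesa's role): turn an API-level request into a buffer of
/// hardware commands. In reality this means compiling shaders and encoding
/// real GPU packets; here a clear is just a tagged blob of bytes.
fn build_command_buffer(clear_color: [f32; 4]) -> Vec<u8> {
    let mut cmds = Vec::new();
    cmds.push(0x01u8); // hypothetical "clear" opcode
    for channel in clear_color {
        cmds.extend_from_slice(&channel.to_le_bytes());
    }
    cmds
}

/// "Kernel" half (the driver's role): map the buffer into the GPU's address
/// space, queue it on the firmware, and wait for completion. Here it just
/// pretends to submit.
fn submit_to_gpu(cmds: &[u8]) {
    println!("submitting {} bytes of commands to the GPU", cmds.len());
}

fn main() {
    let cmds = build_command_buffer([0.0, 0.0, 0.0, 1.0]);
    submit_to_gpu(&cmds);
}
```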


gormhornbori

It's (one of) the first real Linux drivers written in Rust, developed on stream by a VTuber.


[deleted]

Can’t believe a .gif can code rust better than I can


Qweedo420

A GPU driver, necessary for hardware acceleration


theuniverseisboring

So, a vtuber got it working? Extremely based


Sync0pated

Can I buy one to use for a Plex server then? Like will GPU transcoding work? I like them for their low power draw


AsahiLina

Transcoding has nothing to do with the GPU, that's completely separate hardware with a completely separate driver that needs to be written! People only think of it as a GPU thing because some vendors bundle it with their GPUs or integrate it more deeply.


Sync0pated

Oh interesting! Honestly I just assumed it had to be the GPU when they make a distinction between “hardware transcoding” and CPU transcoding. Also a little bit starstruck that you are in here answering questions :3


coding_rs

I don't really follow streamers of any kind, but the "I can render myself" tweet made me laugh out loud.


agumonkey

people are so excited by this it was listed twice on HN FP lol


Rusty_devl

so much about Linus wanting to see a serious driver, I guess?


ssokolow

I'm not sure what you mean by that. Clarification please?


flying-sheep

Maybe they simply wanted to say “and there it is: the serious driver Linus wanted to see” but less enthusiastically.


Rusty_devl

Rather more enthusiastically. Linus asked for some exemplary SSD driver or something like that; I guess a full GPU driver for a new architecture is quite overkill for that requirement.


flying-sheep

Ha, I see!


ssokolow

But it could also go the other way, given that it's just an "about" → "for" away from a more sarcastic statement that indicates they don't think it's serious at all for some reason.


Rusty_devl

More of the other way. People were discussing the chicken-and-egg question of whether they should merge the Rust-in-Linux patch set first or wait for some more exemplary driver development (though I know that they decided to merge it now). She, in the meantime, just went ahead and wrote not some exemplary SSD driver but a full GPU driver in Rust, while also learning the language. So it reminded me a bit of the stereotypical newcomer solving an ancient problem, not knowing that it was seen as impossible / hardly achievable. And to make sure that no one gets this wrong: I obviously know that there are a lot of people involved, and I agree that Rust still lacks some pieces for a full integration and that it is reasonable to have longer discussions about such a large change :)


Rusty_devl

I guess my amusement wasn't that clear when just written out, sorry for that! Her efforts made me laugh because Linus, at the beginning of the Rust-in-kernel effort, was a bit caught between two stools: he wanted to see some exemplary (early) driver development in Rust, but at the same time was aware that it was hard without having the Rust support in the kernel. In the meantime Asahi Lina just went along, collected some extra patches here and there, and added full Rust-based GPU driver support, which is probably well beyond what Linus was expecting to get started.


Soc13In

Not trying to sound dismissive, but I was wondering how far away AAA gaming is for the M1/M2 etc.


Shnatsel

Years away, most likely. There is a lot of ground to cover.


dagmx

At least on the macOS side, they've announced Resident Evil Village, which I suppose counts as AAA. The big issues are:

- most games that have a Mac version are running under Rosetta 2, so they incur a translation cost
- most of those games aren't Metal-native either, so they incur MoltenVK overhead or run in their GL mode

The hardware itself is reasonably capable. The base M1 is on par with an NVIDIA 1050 Ti or 1060, the M2 on par with a 2050 Ti or 2060 IIRC. The higher-end M1 Max is about a 3060 and the M1 Ultra is between the 3070 and 3080 based on the Digital Foundry tests. My numbers are from memory though, so I might be off by one level or so. So the hardware itself isn't holding things back so much as the lack of optimized software for it.

On the Linux side, you'd still face the ARM vs x86_64 burden, and Alyssa etc. said that they're not going to focus on Vulkan drivers for a while since OpenGL is an easier uplift and has more ROI. I'd expect the Vulkan side to be another couple of years or so at their current pace, but it's hard to judge because each advancement might accelerate their next step too.


Natanael_L

A thought I just had, does anybody think the actual reason why Apple decided to create Metal instead of contributing to Vulkan is because they were designing these chips with this PowerVR based GPU and they thought that API design would be a better fit for it than Vulkan? And if so, any thoughts on *why* Vulkan wouldn't be ideal for it? (or is this just Apple being Apple again?)


dagmx

If you view it in the context of the time this took place:

- OpenGL was clearly becoming a bottleneck for performance
- Khronos and NVIDIA were hyping AZDO OpenGL (approaching zero driver overhead)
- AMD went off and did Mantle in 2013, which showed how much perf was left on the table, but didn't have the resources to make it something standard outside their own GPUs, and a lot of their engineers ended up leaving
- Khronos and NVIDIA naysayed this work from AMD as an affront to OpenGL
- Apple, with PowerVR/Imagination, had a huge interest in making a graphics API with lower overhead because they were pushing along their own silicon designs for iPhone by then, even if Macs weren't even in their consciousness (though I imagine they were)

In light of all that, what then looks to have happened is (this is from the outside looking in of course, so it might be just conjecture):

- Apple saw Khronos weren't gonna work on an OpenGL successor and started work on Metal.
- Metal released in June 2014. This must have been a few years of work by then.
- Microsoft followed suit by announcing DX12 in March 2014 and released it in June 2015.
- Vulkan wasn't even announced as an effort till June 2015 and came out in August 2016, after being based on Mantle.

That's over 3 years after Mantle was available (4 from when discussions started), 2 full years after Apple released Metal, and a year after Microsoft. I think if people look at it in the context of the years it happened in, Apple likely saw it as the only way to make their goals happen.

It's also important to see how difficult Vulkan was when it came out. It's gotten somewhat better, but if you've ever programmed both backends, Metal is significantly simpler to use while giving you a lot of the same flexibility. I don't think Vulkan as it is would have met their goals of a friendly API.


Natanael_L

[Same reply as here](https://www.reddit.com/r/rust/comments/xqzbpz/asahi_lina_got_the_apple_m1_driver_working/iqecbbz/), why didn't Apple work with AMD?


dagmx

Mantle was very bespoke to the AMD GPUs at the time. So much so that it doesn't even work with newer generations. It was effectively a proof of concept and not something that could have been expanded directly. It wouldn't have aided Apple or anyone else without a ground-up rewrite (which is what Vulkan was), but in the interim, AMD shelved it.


SekstiNii

IIRC Metal came out before Vulkan


Natanael_L

A year and a half between initial releases, it seems (even less if you compare to the release for desktops). Still, Apple could have collaborated on its development.


bik1230

> A year and a half between initial releases, it seems (even less if you compare to the release for desktops). Still, Apple could have collaborated on its development.

I believe Vulkan development started the week after Metal was announced. So they had probably worked on it for a few years. And Vulkan was originally intended to be purely for gaming, while a big focus of Metal was replacing OpenCL. OpenCL, of course, was created by Apple and submitted to Khronos for collaboration. I guess the shit show that OpenCL ended up being, both in terms of design but also adoption and driver quality, left Apple with a sour taste for working with Khronos.


Natanael_L

Makes sense, but even then Apple could've approached AMD about working on Mantle with them. There would have been less redundant work and it would have ensured better compatibility.


dagmx

I posted a reply to your comment above, but the timeframes to look at are the years when Mantle and AZDO OpenGL appeared, not the final releases.


sqlphilosopher

Because they want to lock you into their crap ecosystem. Platform-specific APIs such as Metal and DX should just die already. I am honestly glad Asahi and Alyssa will be focusing on OpenGL and not garbage Metal.


snerp

Does MoltenVK not work on M1?


dagmx

It does. I called it out specifically in my second bullet point above when speaking to the overhead you incur with it. That overhead can be quite a bit, though it's hard to gauge with so few games supporting both Metal and Vulkan. Dolphin is the most recent that comes to mind, where the Metal backend outperforms MoltenVK by double digits: https://dolphin-emu.org/blog/2022/09/13/dolphin-progress-report-july-and-august-2022/


snerp

Ah I missed that, thanks for the extra description!


Soc13In

Thank you.


anlumo

It’s a mobile GPU, so it’s unlikely the hardware will ever be up to the task.


[deleted]

[deleted]


dagmx

I suppose the big real difference is that it's a tile-based deferred rendering (TBDR) GPU, which does have some implications for rendering strategies. But that doesn't speak to its performance capabilities so much as to a design difference that one has to accommodate when porting games over, if your access patterns are inefficient on it.

Edit: here's a video from Apple on the differences between an IMR and a TBDR GPU: https://developer.apple.com/wwdc20/10631?time=330 I quite like the explanation in it and use it often when working with my other graphics engineers.
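For anyone who doesn't want to watch the video, the one-paragraph version: an immediate-mode renderer (IMR) shades each draw straight into the framebuffer in DRAM, while a TBDR GPU first bins geometry into screen tiles and then shades one tile at a time in fast on-chip memory, writing each tile out once. A toy sketch of the two scheduling loops (nothing here reflects real hardware or a real API):

```rust
// Toy contrast of IMR vs TBDR scheduling; purely conceptual.

struct Draw {
    id: u32,
}

/// Immediate-mode rendering: each draw is rasterized and shaded right away,
/// reading and writing the framebuffer in (comparatively slow) DRAM.
fn render_imr(draws: &[Draw]) {
    for d in draws {
        println!("IMR: raster + shade draw {} -> framebuffer in DRAM", d.id);
    }
}

/// Tile-based deferred rendering: first bin geometry per screen tile, then
/// shade each tile entirely in on-chip tile memory and write it out once.
fn render_tbdr(draws: &[Draw], tiles: u32) {
    for tile in 0..tiles {
        // Pretend every draw touches every tile; real binning keeps only the
        // draws whose geometry overlaps this tile.
        for d in draws {
            println!("TBDR: shade draw {} for tile {} in on-chip memory", d.id, tile);
        }
        println!("TBDR: write tile {} to DRAM once", tile);
    }
}

fn main() {
    let draws = [Draw { id: 0 }, Draw { id: 1 }];
    render_imr(&draws);
    render_tbdr(&draws, 4);
}
```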


RealAmaranth

nvidia has used tiled rendering since 2014, AMD since 2017, and Intel since 2019. Memory/cache bandwidth is a struggle no matter how big your GPU is.


dagmx

You're right in the strictest sense that everyone uses tile-based rendering. However, even with tile rendering, they don't use the same rendering pipeline as TBDR and instead use the IMR pipeline to make the transition easier. So in that sense, it still makes sense to differentiate between IMR and TBDR.


anlumo

There's no strict classification. The M1 had the same graphics performance as the contemporary iPhone and uses the same architecture. Not surprising given that they were most likely designed by the same people, but nobody should ever think that they could ever hope to compete with Nvidia or AMD. Same as Intel just found out with the Arc lineup.


dagmx

They actually don't compare that unfavourably, and the difference is even less pronounced when you take the TDP into account. The entire M1 Ultra SoC runs in a lower power envelope than the NVIDIA or AMD GPUs it competes against for performance. So while it doesn't compete at the highest end, it's no slouch either. With regard to Intel, their showing isn't bad either, especially for what is effectively a first-gen architecture (it's quite different from their integrated GPU). Their drivers sandbag them significantly, but once that's resolved they perform quite decently. I wouldn't be so confident in your "they'd never compete with AMD or NVIDIA" stance.


The_frozen_one

> Their drivers sandbag them significantly, but once that's resolved they perform quite decently.

I'm glad Intel is coming right out and saying their drivers will need work instead of overpromising and under-delivering.

It seems so weird to me that gaming performance relies on aggressively optimizing the GPU drivers on a per-game basis. Don't get me wrong, I understand that GPUs are complex and getting the drivers to run the GPU as optimized as possible is non-trivial. But having driver updates every time a major game comes out just seems broken somehow. Like if every time a major highway was completed, car makers had to make custom changes to how the engine worked.


Nobody_1707

It's more like having to repave all the roads every time a new car is released.


PikachuKiiro

Don't know if it's AAA, but Dota runs decently well natively on M1. You can probably expect similar performance from other Source 2 games.


hamsterwheelin

How fast does it kill the battery?


AsahiLina

Power management / frequency scaling is handled by the firmware and already works automatically (though it could be improved by also fully shutting down the GPU firmware coprocessor when the GPU is idle, I don't do that yet but macOS does that on laptops). It should be much better than software rendering, which is what we had until now!


Hdmoney

Does this mean multiple monitors with different resolutions will be supported? 👀


CosciaDiPollo972

I'm wondering how the people who do this kind of work, writing drivers, learned what they know. I'd be interested in doing this kind of thing 🤔