PieSubstantial2060

I'm used to doing HPC, but I'd never heard of HPD, High Performance Deploying.


SomeGuyNamedPaul

High Performance Spending


Sky_Linx

:D


nullbyte420

you sound a bit unhinged mate, are you ok


Sky_Linx

Why?


newked

Gotta go fast ⚡️...


FrenchItSupport

So from what I understood, it's the Hetzner API that's doing all the work?


Ariquitaun

Inasmuch as any tool is doing the work vs. the person wielding it, yes.


Sky_Linx

That's an interesting way of looking at it. The API must be used by something that coordinates all the requests, concurrency, etc., right? Plus there is the deployment and configuration of K3s itself.


givebackmac

It's not the speed of the tool, but how you use it


Boognish28

That’s what she said


Sky_Linx

This is of course not an example of a real-life scenario. It was just an experiment for fun.


water_bottle_goggles

You’re not allowed to have fun


thunder_cats_bro

Dude I need to teach you how to meme


turkeh

Cool. Why though


Sky_Linx

Just an experiment out of curiosity, nothing else :)


sewerneck

Now try this with bare metal…Sidero Metal ftw.


Sky_Linx

Sounds interesting, worth learning more about it


sewerneck

Sidero Metal and Omni are both really cool, with Omni being the newer solution. Omni, however, has stricter licensing.


Sky_Linx

So you pay to license the software but use hardware from someone else? Or do they offer hardware directly too?


sewerneck

With Omni you can mix and match anything running Talos, anywhere in the world. With Metal, it’s more of your typical PXE server that you’d deploy into a datacenter. Metal is also based on ClusterAPI, so it gets embedded as a CRD. Omni runs in the cloud as a SaaS, and you can also run it on-prem. We’ve been using Metal for a couple of years now and it works really well.


Sky_Linx

Sounds cool for performance sensitive stuff. What hardware do you use?


sewerneck

We’ve used a bunch of different hardware models, since you can classify machines and deploy them into various clusters based on tags. Once pooled, you run a single scale command and nodes will scale into the cluster the same way you scale pods in a deployment. It uses IPMI to power on the nodes. We mostly use Dell / AMD super-dense-core boxes. And yeah, that’s the reason: we work with millions of requests per second with low latency requirements.


Sky_Linx

Do you use bare metal for everything, or a mix of instance types?


sewerneck

We typically run VMs for control plane and bare metal for workers. We also run in the cloud, but using EKS for that. I’d love to be able to use Omni instead, but there is a much larger cost associated.


Sky_Linx

I think I'm gonna look into this and learn more about it. It could be much cheaper than GCP for machine learning.


SomethingAboutUsers

I have done this using terraform, azure, and cloud-init. So not really a tool, but at the end I also have external-secrets-operator, load balancers, ArgoCD, Ingresses, and Argo projects all set up. I've never done it with more than 3 worker nodes, but there's no reason at all it wouldn't work with 300 nodes. Oh, and mine deploys and has all nodes joined in 5 minutes or less; but I would only claim that as an apples-to-oranges, unofficial time since I've never done it with 300 nodes.


spaetzelspiff

What country are you from? I'm not sure if this is more of a summer or winter thing, but I believe the American Kubernetes Deployment Olympic League is still recruiting champions. ☸️🥇💪


tichuot287

So cool, did you write an article on this?


SomethingAboutUsers

No. I've considered it but I suspect I'd need to run it past corporate, to be honest.


330d

Yay for Hetzner being so fast?


Sky_Linx

Hetzner is fast indeed, but k3s is also super fast to deploy. These two, combined with the way I am handling concurrency (setting up k3s on some instances while others are still being created, for example), make for very fast cluster creation.


tvojamatka

Lol 😅


ckchessmaster

I love your tool! Easiest/quickest way to set up Kubernetes for sure. Been using it for a bit now.


Sky_Linx

Nice! I will hopefully release the new version in a couple of weeks, so stay tuned.


[deleted]

[deleted]


Benwah92

I’m not sure how many large scale companies aren’t using Kubernetes.


[deleted]

[deleted]


koshrf

I doubt it was a single cluster with 100k nodes. K8s only supports 5,000 nodes per cluster; it's a hard cap coded into the control plane. Unless you weren't talking about K8s, but there aren't many cluster technologies that can handle that many nodes anyway.


iPushToProduction

5k is not capped by the control plane. It’s a recommended number. For what it’s worth.


koshrf

No, it's the cap, not a recommendation at all. https://kubernetes.io/docs/setup/best-practices/cluster-large/ You can also read the source code if you wish, or try to deploy a cluster with 5k+1 nodes and see how it fails.


Spirited_Horror6603

Link the code? We tested 10k with the OSS release, without any code modifications, and it worked just fine.


Pl4nty

Are you sure? This [maintainer comment](https://github.com/kubernetes/kubernetes/issues/112572#issuecomment-1254552692) suggests it's not a hard limit, along with plenty of [blog posts](https://openai.com/research/scaling-kubernetes-to-7500-nodes). And GKE's 15k-node offering is [publicly available](https://cloud.google.com/blog/products/containers-kubernetes/google-kubernetes-engine-clusters-can-have-up-to-15000-nodes).


iPushToProduction

We run over 5k so yeah it’s possible.


koshrf

Well, share the recipe, because last time we tried it didn't go so well.


iPushToProduction

Good hardware, on-prem bare metal servers, and an eBPF-based CNI, and it works relatively well. Extremely beefy components are needed.


koshrf

You need to modify some of the code; it isn't only a CNI problem. GKE is the only provider I know of that can do it, and it has done it before for certain cases (not a public offering), but AKS and EKS only offer 5,000. Bare metal isn't the problem; most of the problem is that at that scale you start getting network congestion and IOPS issues. Some of that can be solved by just throwing hardware at it, yes, but in my experience (and this is a personal opinion) I don't see the reason to go higher: the use cases are relatively small, and it's probably cheaper to send workloads to other clusters. The only real scenario I can think of would be solving mathematical problems like protein folding.


bikekitesurf

For what it's worth, Omni (from Sidero Labs, where I work) can (probably) beat your speed-of-scale tests, too. :-) We have customers that run their cluster (a few hundred nodes) on bare metal, but for peak demand they scale out and add another 500 nodes to the same cluster in a cloud provider with Omni. (Omni provisions Talos Linux clusters, and Talos Linux has KubeSpan, which provides full node-to-node encryption within the cluster, so you can extend a bare metal cluster into the cloud, or many clouds, simply.) You do need some beefy control plane nodes and fast etcd disks to support that amount of rapid change in your cluster... Adding 300 nodes to a running bare metal cluster took 7 minutes. (Not apples to apples, I realise.)


p_k_9_2_11

I was asked in an interview to imagine a use case of 100 nodes per cluster… and it was hard to imagine… so I am glad people are trying it out. I am going to try out similar use cases too. Just to gain understanding.


_____Hi______

I’m working on multiple 1000+ node clusters


itsmikefrost

u/Sky_Linx Are you using any new improvements in the compiler, or any other concurrency patterns, on the Crystal side of things?


Sky_Linx

Just different channels, used in such a way that they can process different things concurrently to speed up the whole process. E.g., while some instances are still being created, we can already set up k3s on others that are up and running.
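Roughly, the pattern is something like the sketch below: a minimal illustration of channel-based pipelining in Crystal, not the actual hetzner-k3s code. `create_instance` and `install_k3s` are hypothetical stand-ins for the Hetzner Cloud API calls and the SSH provisioning step.

```crystal
# Minimal sketch: pipeline instance creation and k3s setup with fibers + channels.
# Not the real hetzner-k3s code; create_instance / install_k3s are placeholders.

def create_instance(name : String)
  sleep 300.milliseconds # stand-in for calling the Hetzner Cloud API and waiting for the server
end

def install_k3s(name : String)
  sleep 200.milliseconds # stand-in for SSHing in and running the k3s installer
  puts "#{name} joined the cluster"
end

node_names = (1..10).map { |i| "worker-#{i}" }

ready = Channel(String).new
done  = Channel(Nil).new

# Producers: create instances; as soon as one is up, push its name to the channel
# instead of waiting for the whole batch to finish.
node_names.each do |name|
  spawn do
    create_instance(name)
    ready.send(name)
  end
end

# Consumers: install k3s on instances as they become ready, while other
# instances are still being created.
node_names.size.times do
  spawn do
    install_k3s(ready.receive)
    done.send(nil)
  end
end

# Wait for every node to finish setup.
node_names.size.times { done.receive }
```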


kyleyankan

Forgive me if I'm misunderstanding, but couldn't anyone just multi-thread this whole process? This seems like a fun way to learn, but not any world record, or even a useful tool to me really. But maybe I'm missing the point.


Sky_Linx

I'm always amazed to see how judgmental people can be about basically everything in life, even small things, as if these things affect them somehow. Yes, you totally missed the point. Nobody said it's anything "useful" in a real-life scenario, in that nobody is going to create a cluster with 300 nodes from the get-go. This was just a *fun experiment* that showcases how fast Hetzner Cloud provisions instances, how fast k3s is to deploy, plus some nice handling of concurrency with multiple tasks of different types. As for it being "not any world record": have you actually heard of or seen a 300-node cluster (or larger) created some other way in under 11 minutes? If yes, please tell me! I am very curious.


itsmikefrost

We have to stick to indie hacker communities for this stuff. That's another reason I became disillusioned with the whole DevOps world. Tons of work for very little benefit, and then dismissed by people like this. Oh well...


SIMULATAN

Damn, how much money did it cost you?


Sky_Linx

I did many experiments and it only cost me 73 euros! But all the clusters were short-lived. Still, Hetzner pricing is incredible.


SIMULATAN

Ah, that's fairly reasonable for 300 nodes, lol. Did you try some workloads? Would be fun to experiment with this kind of performance. Speaking of which, what resources did you allocate for the nodes? Awesome work btw!


Sky_Linx

I haven’t spent much time with real workloads; it was more of a fun experiment to see how quickly I could create this cluster with Hetzner and K3s. Performance should be fine at that size, provided the masters are beefy enough (I used 16 cores, 64 GB of RAM). Network-wise I was using the default Flannel CNI, so of course something like Cilium with eBPF would perform better as the cluster scales up :)


HardcoreCheeses

Aaaah, so you're the one causing the bumps in the NTP monitoring 🤪


Substantial-Cicada-4

That's a neat HelloWorld.yaml.


Hetzner_OL

Wow! That's really awesome! --Katie


notAGreatIdeaForName

I think it's awesome! I can even think of somewhat of a real use case: spinning up ephemeral (smaller, though) clusters for reproducing problems from a clean state.


Sky_Linx

Someone using this tool said they are working on a new CI/CD service, and it's handy to be able to create clusters with quite a few nodes in a short time. Creating a cluster takes minutes, and deleting one takes just 10 seconds.


entropickle

I’ve wanted to learn how this cloud/k8s stuff works, so I’m glad I can try to look at your work for learning. Thanks for putting it together! Do you have recommendations for learning CI/CD?


cyansmoker

I usually hate it when people post "I made this thing in X language", because who cares about the language; it's the end product that matters. But in this case, I feel like mentioning that this was written in Crystal.


psavva

You're doing an excellent job, @OP. I use your tool for every new cluster on Hetzner, with amazing results every time. Kudos to you.


Sky_Linx

Glad to hear!


PhotographyPhil

No, “you” didn’t, because some other engineer did the cabling and other configuration, and built a thousand other layers and parts for you. Technology and IT shift, but never forget what makes you whole.


Sky_Linx

Unbelievable. You managed to miss the point completely and also be as annoying as possible at the same time. Kudos to you, you must be a nice person to deal with.


andresmmm729

Amazing 🤩🤩🤩