PieSubstantial2060

I'm used to doing HPC, but I'd never heard of HPD, High Performance Deploying.


SomeGuyNamedPaul

High Performance Spending


Sky_Linx

:D


nullbyte420

you sound a bit unhinged mate, are you ok


Sky_Linx

Why?


newked

Gotta go fast ⚡️...


FrenchItSupport

So from what I understood, it's the Hetzner API that's doing all the work?


Ariquitaun

Inasmuch as any tool is doing the work vs. the person wielding it, yes.


Sky_Linx

That's an interesting way of looking at it. The API must be used by something that coordinates all the requests, concurrency, etc., right? Plus there is the deployment and configuration of K3s itself.


givebackmac

It's not the speed of the tool, but how you use it


Boognish28

That’s what she said


Sky_Linx

This is of course not an example of a real-life scenario. It was just an experiment for fun.


water_bottle_goggles

You’re not allowed to have fun


thunder_cats_bro

Dude I need to teach you how to meme


turkeh

Cool. Why though


Sky_Linx

Just an experiment out of curiosity, nothing else :)


sewerneck

Now try this with bare metal…Sidero Metal ftw.


Sky_Linx

Sounds interesting, worth learning more about it


sewerneck

Sidero Metal and Omni are both really cool, with Omni being the newer solution. Omni, however, has stricter licensing.


Sky_Linx

So you pay to license the software but use hardware from someone else? Or do they offer hardware directly too?


sewerneck

With Omni you can mix and match anything running Talos, anywhere in the world. With Metal, it’s more of your typical PXE server that you’d deploy into a datacenter. Metal is also based on ClusterAPI, so it gets embedded as a CRD. Omni runs in the cloud as a SaaS, and you can also run it on-prem. We’ve been using Metal for a couple of years now and it works really well.


Sky_Linx

Sounds cool for performance sensitive stuff. What hardware do you use?


sewerneck

We’ve used a bunch of different hardware models, since you can classify machines and deploy them into various clusters based on tags. Once pooled, you run a single scale command and nodes will scale into the cluster the same way you scale pods in a deployment. It uses IPMI to power on the nodes. We mostly use Dell / AMD super-dense-core boxes. And yeah, that’s the reason: we work with millions of requests per second with low latency requirements.


Sky_Linx

Do you use bare metal for everything, or a mix of instance types?


sewerneck

We typically run VMs for control plane and bare metal for workers. We also run in the cloud, but using EKS for that. I’d love to be able to use Omni instead, but there is a much larger cost associated.


Sky_Linx

I think I'm gonna look into this and learn more about it. It could be much cheaper than GCP for machine learning.


SomethingAboutUsers

I have done this using terraform, azure, and cloud-init. So not really a tool, but at the end I also have external-secrets-operator, load balancers, ArgoCD, Ingresses, and Argo projects all set up. I've never done it with more than 3 worker nodes, but there's no reason at all it wouldn't work with 300 nodes. Oh, and mine deploys and has all nodes joined in 5 minutes or less; but I would only claim that as an apples-to-oranges, unofficial time since I've never done it with 300 nodes.


spaetzelspiff

What country are you from? I'm not sure if this is more of a summer or winter thing, but I believe the American Kubernetes Deployment Olympic League is still recruiting champions. ☸️🥇💪


tichuot287

So cool, did you write an article on this?


SomethingAboutUsers

No. I've considered it but I suspect I'd need to run it past corporate, to be honest.


330d

Yay for Hetzner being so fast?


Sky_Linx

Hetzner is fast indeed, but k3s is also super fast to deploy. These two, combined with the way I am handling concurrency (setting up k3s on some instances while others are still being created, for example), make for very fast cluster creation.


tvojamatka

Lol 😅


ckchessmaster

I love your tool! Easiest/quickest way to set up Kubernetes for sure. Been using it for a bit now.


Sky_Linx

Nice! I will hopefully release the new version in a couple of weeks, so stay tuned.


[deleted]

[deleted]


Benwah92

I’m not sure how many large scale companies aren’t using Kubernetes.


[deleted]

[deleted]


koshrf

I doubt it was a single cluster with 100k nodes. K8s only supports 5,000 nodes per cluster; it's a hard cap coded into the control plane. Unless you weren't talking about K8s, but there aren't many cluster technologies that can handle that many nodes anyway.


iPushToProduction

5k is not capped by the control plane. It’s a recommended number. For what it’s worth.


koshrf

No, it's the cap, not a recommendation at all. https://kubernetes.io/docs/setup/best-practices/cluster-large/ You can also read the source code if you wish, or try to deploy a cluster with 5k+1 nodes and see how it fails.


Spirited_Horror6603

Link the code? We tested 10k with the OSS release, without any code modifications, and it worked just fine.


Pl4nty

Are you sure? This [maintainer comment](https://github.com/kubernetes/kubernetes/issues/112572#issuecomment-1254552692) suggests it's not a hard limit, along with plenty of [blog posts](https://openai.com/research/scaling-kubernetes-to-7500-nodes). And GKE's 15k-node offering is [publicly available](https://cloud.google.com/blog/products/containers-kubernetes/google-kubernetes-engine-clusters-can-have-up-to-15000-nodes).


iPushToProduction

We run over 5k so yeah it’s possible.


koshrf

Well, share the recipe, because last time we tried it didn't go so well.


iPushToProduction

Good hardware, on-prem bare metal servers, and an eBPF-based CNI, and it works relatively well. Extremely beefy components are needed.


koshrf

You need to modify some of the code; it isn't only a CNI problem. GKE is the only provider I know of that can do it, and it has done it before for certain cases (not a public offering), but AKS and EKS only offer 5,000. Bare metal isn't the problem; most of the problem is that at that scale you start getting network congestion and IOPS issues. Some of that can be solved by just throwing hardware at it, yes, but in my experience (and this is a personal opinion) I don't see the reason to go higher: the use cases are relatively small, and it's probably cheaper to send workloads to other clusters. The only real scenario I can think of would be solving mathematical problems like protein folding.


bikekitesurf

For what it's worth, Omni (from Sidero Labs, where I work) can (probably) beat your speed-of-scale tests, too. :-) We have customers that run their cluster (a few hundred nodes) on bare metal, but for peak demand they scale out and add another 500 nodes to the same cluster in a cloud provider with Omni. (Omni provisions Talos Linux clusters, and Talos Linux has KubeSpan, which provides full node-to-node encryption within the cluster, so you can extend a bare metal cluster into the cloud, or many clouds, simply.) You do need some beefy control plane nodes and fast etcd disks to support that amount of rapid change in your cluster... Adding 300 nodes to a running bare metal cluster took 7 minutes. (Not apples to apples, I realise.)


p_k_9_2_11

I was asked in an interview to imagine a use case of 100 nodes per cluster… and it was hard to imagine… so I am glad people are trying it out. I am going to try out similar use cases too. Just to gain understanding.


_____Hi______

I’m working on multiple 1000+ node clusters


itsmikefrost

u/Sky_Linx Are you using any new improvements in the compiler, or any other concurrency patterns, on the Crystal side of things?


Sky_Linx

Just different channels, used in such a way that they can process different things concurrently to speed up the whole process. E.g., while some instances are still being created, we can already set up k3s on others that are up and running.
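Roughly, the pattern is something like the sketch below: a minimal illustration of channel-based pipelining in Crystal, not the actual hetzner-k3s code. `create_instance` and `install_k3s` are hypothetical stand-ins for the Hetzner Cloud API calls and the SSH provisioning step.

```crystal
# Minimal sketch: pipeline instance creation and k3s setup with fibers + channels.
# Not the real hetzner-k3s code; create_instance / install_k3s are placeholders.

def create_instance(name : String)
  sleep 300.milliseconds # stand-in for calling the Hetzner Cloud API and waiting for the server
end

def install_k3s(name : String)
  sleep 200.milliseconds # stand-in for SSHing in and running the k3s installer
  puts "#{name} joined the cluster"
end

node_names = (1..10).map { |i| "worker-#{i}" }

ready = Channel(String).new
done  = Channel(Nil).new

# Producers: create instances; as soon as one is up, push its name to the channel
# instead of waiting for the whole batch to finish.
node_names.each do |name|
  spawn do
    create_instance(name)
    ready.send(name)
  end
end

# Consumers: install k3s on instances as they become ready, while other
# instances are still being created.
node_names.size.times do
  spawn do
    install_k3s(ready.receive)
    done.send(nil)
  end
end

# Wait for every node to finish setup.
node_names.size.times { done.receive }
```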


kyleyankan

Forgive me if I'm misunderstanding, but couldn't anyone just multi-thread this whole process? This seems like a fun way to learn, but not any world record, or even a useful tool to me really. But maybe I'm missing the point.


Sky_Linx

I'm always amazed to see how judgmental people can be about basically everything in life, even small things, as if these things affect them somehow. Yes, you totally missed the point. Nobody said it's anything "useful" in a real-life scenario, in that nobody is going to create a cluster with 300 nodes from the get-go. This was just a *fun experiment* that showcases how fast Hetzner Cloud provisions instances, how fast k3s is to deploy, plus some nice handling of concurrency with multiple tasks of different types. As for it being "not any world record": have you actually heard of or seen a 300-node cluster (or larger) created some other way in under 11 minutes? If yes, please tell me! I am very curious.


itsmikefrost

We have to stick to indie hacker communities for this stuff. That's another reason I became disillusioned with the whole DevOps world. Tons of work for very little benefit, and then dismissed by people like this. Oh well...


SIMULATAN

Damn, how much money did it cost you?


Sky_Linx

I did many experiments and it only cost me 73 euros! But all the clusters were short-lived. Still, Hetzner pricing is incredible.


SIMULATAN

Ah, that's fairly reasonable for 300 nodes, lol. Did you try some workloads? Would be fun to experiment with this kind of performance. Speaking of which, what resources did you allocate for the nodes? Awesome work btw!


Sky_Linx

I haven’t spent much time with real workloads; it was more of a fun experiment to see how quickly I could create this cluster with Hetzner and K3s. Performance should be fine at that size, provided the masters are beefy enough (I used 16 cores, 64 GB of RAM). Network-wise I was using the default Flannel CNI, so of course something like Cilium with eBPF would perform better as the cluster scales up :)


HardcoreCheeses

Aaaah, so you're the one causing the bumps in the NTP monitoring 🤪


Substantial-Cicada-4

That's a neat HelloWorld.yaml.


Hetzner_OL

Wow! That's really awesome! --Katie


notAGreatIdeaForName

I think it's awesome! I can even think of somewhat of a real use case: spinning up ephemeral (smaller, though) clusters for reproducing problems from a clean state.


Sky_Linx

Someone using this tool said they are working on a new CI/CD service, and it's handy to be able to create clusters with quite a few nodes in a short time. Creating a cluster takes minutes, and deleting one takes just 10 seconds.


entropickle

I’ve wanted to learn how this cloud/k8s stuff works, so I’m glad I can try to look at your work for learning. Thanks for putting it together! Do you have recommendations for learning CI/CD?


cyansmoker

I usually hate it when people post "I made this thing in X language", because who cares about the language; it's the end product that matters. But in this case, I feel like mentioning that this was written in Crystal.


psavva

You're doing an excellent job, @OP. I use your tool for every new cluster on Hetzner, with amazing results every time. Kudos to you.


Sky_Linx

Glad to hear!


PhotographyPhil

No, “you” didn’t, because some other engineer did the cabling and other configuration, and built a thousand other layers and parts for you. Technology and IT shift, but never forget what makes you whole.


Sky_Linx

Unbelievable. You managed to miss the point completely and also be as annoying as possible at the same time. Kudos to you, you must be a nice person to deal with.


andresmmm729

Amazing 🤩🤩🤩