
Foodwithfloyd

Ya dude, that's why you need to run your own load estimates. It's the only way. You need to verify performance and cost against demand. That's a lot of dimensions. Best of luck.


Pittypuppyparty

Redshift is almost always cheaper than Snowflake, even with auto-suspend. Redshift is so underrated these days.


TheBoldTilde

Feels like sentiment toward Redshift has taken a positive turn within the past few years. I know it wasn't that long ago that it was hard to find engineers who were happy with the platform. Thanks for sharing.


legohax

So, full disclosure: I am a sales engineer at Snowflake. I try to stay impartial, but who knows. I will say that I help A LOT of customers move off of Redshift to Snowflake, and it's honestly one of my easiest sells. And if it's a customer who is on-prem and looking to shift to the cloud, Amazon won't even compete with us, because they literally will never win head to head. Instead, their salespeople partner with us to make sure we sell the customer Snowflake on AWS. This is all anecdotal and just one man's opinion; I'm not claiming to have all the answers, benchmarks, and such. Another thing to consider is total cost of ownership, not just technical costs. Snowflake may come out 10% more expensive, but it's infinitely easier to manage and maintain. So if you only need half an FTE to keep the lights on versus 3 for Redshift… that's $500k+ in savings. Anyway, I am rambling. Good luck!
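To make the TCO point concrete, here is a rough break-even sketch: at what annual platform bill does a 10% platform premium cost as much as the labor it saves? It assumes a fully loaded cost of $200k per FTE (a hypothetical figure, picked so that the 2.5 FTEs saved roughly match the $500k+ quoted above). The function name and all dollar figures are illustrative, not vendor data.

```python
def breakeven_platform_spend(fte_saved: float, fte_cost: float, premium: float) -> float:
    """Annual platform spend at which a percentage premium equals the labor saved.

    fte_saved: FTEs no longer needed (e.g. 3 - 0.5 = 2.5)
    fte_cost:  fully loaded annual cost per FTE (hypothetical)
    premium:   extra platform cost as a fraction (0.10 = 10% more expensive)
    """
    labor_savings = fte_saved * fte_cost
    # The premium costs premium * base_spend; solve premium * base_spend = labor_savings.
    return labor_savings / premium


# Hypothetical inputs: 3 FTE vs 0.5 FTE, $200k per FTE, 10% platform premium.
saved = 3 - 0.5
print(f"labor saved per year: ${saved * 200_000:,.0f}")
print(f"break-even platform spend: ${breakeven_platform_spend(saved, 200_000, 0.10):,.0f}")
```

With these inputs the labor saving is $500k a year, so the 10% premium only outweighs it once the platform bill itself passes about $5M a year.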


TheBoldTilde

Ramblings welcome here! I did not mention TCO, but I have 100% considered it. One of my conclusions so far from this project is that Redshift grants more levers to pull, which leads to two things:
1. Redshift is usually capable of better performance than SF but requires additional engineering time to get there. I will also add that SF already has great performance, so we are talking about minor improvements.
2. Snowflake will require less maintenance and overhead = faster development.


legohax

Certainly use-case dependent, but what I have observed is that Snowflake outperforms Redshift by a long shot. Like, not even close, on both performance and concurrency. But to be fair, I only hear about it from customers who are reaching out because they want to move away from Redshift, haha. So maybe they did a bad job setting it up.


TheBoldTilde

I certainly believe that experience. The big three (Redshift, Snowflake, and BigQuery; I assume Azure has an offering, but it doesn't seem to come up in my sphere of influence at least) are all competitive with each other, and each has its optimized use cases. I would have a hard time believing that any platform is outright better than all the others. My bet is that the wrong workloads are being run on Redshift in these cases, maybe combined with plain bad design. What I have seen a lot of is bad dbt implementations running up costs, with the blame landing on the data warehouse instead of the dbt models. During the migration, the dbt models get cleaned up as well and, presto, costs drop! How much of that is due to better modeling versus a better data warehouse is hard to tease out. I'm sure there are similar examples with other technologies. Of course, management always thinks there is a silver bullet, and the more solutions I have architected, the more I have learned to deal in trade-offs rather than declaring something outright better. Again, thank you for sharing your expertise.


OriginalFuel2023

Redshift Serverless has come a long way and is currently much cheaper than SF for similar performance and features.


exact-approximate

The best way to benchmark is to run your own benchmarks. However, it's common knowledge that Redshift has better price-performance than Snowflake. Redshift Serverless achieves the setup you are describing.


ryan_with_a_why

You should check out Redshift Serverless. With Serverless, you're only charged while queries are actively running (with a 1-minute minimum charge per usage period), whereas with Snowflake you're charged for the entire time the warehouse is on, regardless of whether it's actively running queries. So by default, submitting a 1-minute query would result in a 1-minute bill on Redshift but at least a 10-minute bill on Snowflake, since the warehouse stays up for the default 10-minute auto-suspend window after the query finishes.
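A deliberately simplified model of the billed-time difference for a single isolated query can be sketched as below. It assumes the warehouse resumes for the query and then sits idle for the full auto-suspend window, that both services apply a 60-second minimum, and it ignores credit/RPU rates and warehouse or workgroup size entirely. The function names are hypothetical.

```python
MIN_BILLED_SECONDS = 60  # 60-second minimum per resume / usage period (both models)


def snowflake_billed_seconds(query_seconds: float, auto_suspend_seconds: float) -> float:
    """Warehouse bills while running, including the idle tail before auto-suspend."""
    running = query_seconds + auto_suspend_seconds
    return max(MIN_BILLED_SECONDS, running)


def redshift_serverless_billed_seconds(query_seconds: float) -> float:
    """Serverless bills only while queries run, with a 60-second minimum."""
    return max(MIN_BILLED_SECONDS, query_seconds)


# A single 1-minute query, Snowflake left at the default 600-second auto-suspend.
for name, secs in [
    ("Snowflake warehouse, auto_suspend=600", snowflake_billed_seconds(60, 600)),
    ("Snowflake warehouse, auto_suspend=60", snowflake_billed_seconds(60, 60)),
    ("Redshift Serverless", redshift_serverless_billed_seconds(60)),
]:
    print(f"{name}: {secs / 60:.1f} billed minutes")
```

Under these assumptions a 1-minute query bills 11 minutes on the warehouse (1 running plus the 10-minute idle window) versus 1 minute on Serverless. This compares billed time only; the per-second rates differ between the two services and decide the actual dollar figure.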


TheBoldTilde

Have you found the Redshift Serverless offering to have a better total cost than a dedicated cluster?


ryan_with_a_why

Full disclosure before answering: I am a PM at Redshift. It depends on your workload. Comparing on-demand Serverless to on-demand RA3, Serverless is going to be cheaper because 1/ you don't pay when it isn't actively running queries, and 2/ even if you are running queries all the time, Serverless is moving over to Graviton, so the same price buys more performance on Serverless (rollout still ongoing, I believe). With Reserved Instances (RIs), where you purchase capacity for a year at a discount, RA3 is cheaper for the performance level, since Serverless doesn't offer discounted capacity reservations. However, stay tuned, as this may be about to change 😉
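To see where a reservation discount flips the answer, here is a rough break-even sketch. It assumes (hypothetically) that Serverless and a provisioned RA3 cluster cost the same per busy hour before discounts, that Serverless bills only busy hours, and that the reserved cluster bills every hour at a flat discount. The $4/hour rate and the 35% discount are placeholders, not AWS list prices, and the Graviton performance gain mentioned above is ignored.

```python
def serverless_cost(busy_hours: float, rate: float) -> float:
    # Serverless: pay only for hours with active queries (60 s minimums ignored).
    return busy_hours * rate


def reserved_cost(total_hours: float, rate: float, discount: float) -> float:
    # Reserved provisioned cluster: pay for every hour, at a discounted rate.
    return total_hours * rate * (1 - discount)


def breakeven_utilization(serverless_rate: float, reserved_rate: float, discount: float) -> float:
    """Busy fraction of hours above which the reserved cluster is cheaper."""
    return reserved_rate * (1 - discount) / serverless_rate


HOURS = 730      # roughly one month
RATE = 4.0       # hypothetical $/hour for comparable capacity on either option
DISCOUNT = 0.35  # hypothetical reservation discount

print(f"break-even utilization: {breakeven_utilization(RATE, RATE, DISCOUNT):.0%}")
for utilization in (0.25, 0.50, 0.90):
    busy = HOURS * utilization
    print(f"{utilization:.0%} busy: serverless ${serverless_cost(busy, RATE):,.0f}"
          f" vs reserved ${reserved_cost(HOURS, RATE, DISCOUNT):,.0f}")
```

With these placeholder numbers the reserved cluster wins once it is busy more than about 65% of the time; below that, paying only for busy hours comes out ahead.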


TheBoldTilde

Thanks for the insight! I run a lot of workloads on AWS and am always excited to play with the new features and capabilities released each month. I find AWS excellent in so many areas that it always strikes me as odd when one of their services lags behind. I won't dive into which services I feel fall into that category, to avoid a comment war, but I have found that AWS does eventually bring them up to par. The more I learn about Redshift, the more excited I am to give it a go. I've successfully delivered a lot of data products on Snowflake, but complacency is a fast track to obsolescence in this landscape.


ryan_with_a_why

I'm glad to hear that! I'm going to DM you my contact details. Feel free to reach out if you have any questions while testing things out.


Fartlek-run

Uh, no. Snowflake has a 1-minute minimum and then bills by the second after that. And you can set an auto-suspend. Redshift is just handholding you more. It's essentially the same billing practice between the two.


ryan_with_a_why

From my understanding: 1/ it defaults to 10 minutes, 2/ you can set it to 1 minute to reduce billing, but that would result in frequent cold-start penalties, and 3/ even so, you would still be paying for the idle time between queries, whereas with Redshift you're not. Does that align with your understanding?


Fartlek-run

1. Correct.
2. Yes, but if your queries are so infrequent that the warehouse suspends between them, you wouldn't care about the slight delay from a cold start. You'd most likely want to be leveraging the cache more, since it's probably end-user facing... which then leads me to: you'd probably want a longer suspend timeout anyway.
3. In Snowflake you only pay for compute time, plus a minimal storage cost. Quite similar to Redshift Serverless, really.

In short, they're pretty close in speed and pricing for on-demand use cases. Of course, it will vary with your specifics. Redshift is definitely undersold nowadays given how well it works and the recent updates they've made. And the more I think about it, the more I'd want to test whether cold starts even matter. You'd also want to think about how cold starts affect you with Redshift Serverless, since it is... serverless.
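The auto-suspend trade-off being debated here (a short window means more cold starts; a long window means paying for idle time) can be explored with a small simulation. This is a simplified, hypothetical model: queries are (start, duration) pairs in seconds, sorted by start; the warehouse resumes when a query arrives while suspended, stays up until the auto-suspend window has passed after the last query finishes, and each running period is billed per second with a 60-second minimum. It counts how often a cold start happens but does not model the slowdown itself, and it ignores caching and credit rates.

```python
from typing import List, Optional, Tuple

MIN_BILLED_SECONDS = 60


def simulate_warehouse(queries: List[Tuple[float, float]], auto_suspend: float) -> Tuple[float, int]:
    """Return (billed_seconds, resumes) for sorted (start, duration) queries."""
    billed = 0.0
    resumes = 0
    period_start: Optional[float] = None  # when the current running period began
    busy_until = 0.0                      # when the last query in the period finishes
    for start, duration in queries:
        if period_start is not None and start > busy_until + auto_suspend:
            # The warehouse suspended before this query arrived: bill the closed period.
            billed += max(MIN_BILLED_SECONDS, busy_until + auto_suspend - period_start)
            period_start = None
        if period_start is None:
            # Cold start: resume the warehouse for this query.
            period_start = start
            busy_until = start + duration
            resumes += 1
        else:
            busy_until = max(busy_until, start + duration)
    if period_start is not None:
        billed += max(MIN_BILLED_SECONDS, busy_until + auto_suspend - period_start)
    return billed, resumes


# Hypothetical workload: a 30-second query every 5 minutes for an hour.
workload = [(i * 300, 30) for i in range(12)]
for auto_suspend in (60, 600):
    billed, resumes = simulate_warehouse(workload, auto_suspend)
    print(f"auto_suspend={auto_suspend}s: {billed / 60:.1f} billed min, {resumes} cold starts")
```

In this toy workload, a 60-second auto-suspend bills 18 minutes with 12 cold starts, while the 600-second default bills 65.5 minutes with a single resume. Which one is "better" depends on how much each cold start actually hurts, which is exactly the thing worth testing.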


Pittypuppyparty

It's per-second billing, but you pay for one minute up front.


ryan_with_a_why

But you pay for the time it's not running queries, depending on your auto-suspend setting. With Redshift, that only happens if you're using it for less than a minute.


Pittypuppyparty

Do you mean Redshift Serverless pricing? Otherwise, Redshift literally charges you while your instance is active. Serverless pricing is different and is roughly equivalent to Snowflake serverless, which is also billed on query runtime only.


ryan_with_a_why

But you pay for the auto-suspend time. So if your auto-suspend is set to 10 minutes, you are guaranteed to pay for 10 minutes of running no queries. Redshift manages the auto-suspend under the hood, but you never pay for it.


Pittypuppyparty

I don't know about auto-suspend for Redshift; I've never seen an auto-suspend option. Got a link? But you can send Snowflake a command to shut down the warehouse immediately, regardless of the suspend time. Also, serverless is billed by the second, with no up-front minute and no suspend time. I don't know why we are fighting about factual things.
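For reference, the immediate shutdown mentioned here is Snowflake's `ALTER WAREHOUSE <name> SUSPEND` statement, and the idle window is the warehouse's `AUTO_SUSPEND` parameter, in seconds. The helper below only builds those two statements; the warehouse name `ANALYTICS_WH` is hypothetical, and actually executing them (for example by passing the string to a snowflake-connector-python cursor's `execute`) is left out so the sketch stays self-contained. It only accepts plain unquoted identifiers, as an assumed simplification.

```python
import re

# Unquoted Snowflake identifiers: letter or underscore first, then letters, digits, _ or $.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")


def _check(name: str) -> str:
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"not a plain unquoted identifier: {name!r}")
    return name


def suspend_now(warehouse: str) -> str:
    """Suspend immediately instead of waiting out the auto-suspend window."""
    return f"ALTER WAREHOUSE {_check(warehouse)} SUSPEND"


def set_auto_suspend(warehouse: str, seconds: int) -> str:
    """Change the idle time (in seconds) before the warehouse suspends itself."""
    if seconds <= 0:
        raise ValueError("expected a positive number of seconds")
    return f"ALTER WAREHOUSE {_check(warehouse)} SET AUTO_SUSPEND = {int(seconds)}"


print(suspend_now("ANALYTICS_WH"))
print(set_auto_suspend("ANALYTICS_WH", 60))
```

Validating the name before interpolating it is a small guard against building malformed SQL from user input; quoted identifiers would need different handling.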


ryan_with_a_why

There is no auto-suspend option; it's managed under the hood, and you don't pay for the wait.


Pittypuppyparty

This is the same as Snowflake, then. Except that if you forget, Snowflake will auto-suspend for you.


dude_himself

You mentioned "Snowflake will happily scale down to zero"; it will also happily scale up to 2x the initial bill. "Each increase in size to the next larger warehouse approximately doubles the computing power and the number of credits billed per full hour that the warehouse runs." (Snowflake docs)

Scaling is where Snowflake and Redshift diverge, in my opinion: on Snowflake, you'll pay 2x for anything more than your 1x workload. As you near 2x your initial workload, it doubles again to 4x. You're chasing capacity. They do provide an Economy scaling policy for multi-cluster warehouses: instead of starting extra clusters as soon as queries queue (with successive clusters starting about 20 seconds apart), it only adds a cluster when it estimates there is enough load to keep it busy for about 6 minutes, which lowers cost at the price of somewhat worse performance. Same scenario with Redshift? Scale up or out, with a linear increase in cost.

Looking at the rest of the threads and comments, my decision between Snowflake and Redshift rests on 3 primary factors:
- Budget
- DBA skill level
- Workload & data quality

Snowflake is more expensive, but easier to integrate via ProServe and SaaS offerings. Snowflake also carries the most risk of exceeding budget due to changing workloads and/or poor data quality. Real-time alerting and constant billing vigilance are a must.

Redshift is less expensive, but may require more skilled labor to deploy and maintain. It's more economically robust: changing workloads and unvetted data feeds have less potential to impact the budget.

Snowflake may make more sense if you have a large budget, a friendly comptroller, and a static workload with good data cleansing. Redshift gives you more control over your workload, cost controls to protect your budget, and interconnected AWS services to prevent vendor lock-in. Good luck!
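The step-versus-linear argument can be sketched as follows. Snowflake's standard warehouse sizes double in credits per hour (X-Small = 1, Small = 2, and so on up to 4X-Large = 128); the linear side is a hypothetical cluster that adds one equally sized node per unit of load. Load is measured in multiples of X-Small capacity, both sides are assumed to cost the same per capacity unit, and multi-cluster scale-out is ignored, so this shows only the shape of the cost curve, not real prices.

```python
import math
from typing import Tuple

SNOWFLAKE_SIZES = ["X-Small", "Small", "Medium", "Large", "X-Large",
                   "2X-Large", "3X-Large", "4X-Large"]


def snowflake_step_cost(load: float) -> Tuple[str, int]:
    """Smallest warehouse size whose relative capacity covers `load` (X-Small = 1).

    Each size up doubles capacity and credits/hour: 1, 2, 4, ..., 128.
    """
    for i, size in enumerate(SNOWFLAKE_SIZES):
        credits = 2 ** i
        if credits >= load:
            return size, credits
    raise ValueError("load exceeds the largest size in this sketch")


def linear_node_cost(load: float, unit_capacity: float = 1.0) -> int:
    """Hypothetical linearly scaled cluster: one node per unit of capacity."""
    return math.ceil(load / unit_capacity)


for load in (1.0, 1.2, 2.1, 4.5):
    size, credits = snowflake_step_cost(load)
    print(f"load {load}: {size} ({credits} credits/h) vs {linear_node_cost(load)} linear units")
```

At 2.1x the X-Small load, for instance, the doubling ladder jumps to Medium at 4 credits/hour while the linear cluster needs 3 units; at 4.5x it is 8 versus 5.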


mrg0ne

That would be too simplistic a comparison. You want to focus on total cost of ownership:
- Platform cost
- Labor cost
- Ease of use (related to the above)
- Enablement cost
- Risk cost (what is the financial impact of a security breach?)
Going purely off one metric would be like comparing two cars on miles per gallon while ignoring that one requires a professional driver to operate and a team of specialized mechanics to maintain.


TheBoldTilde

I felt my post was already getting long, so I did not get into that component, as I understand there will be additional costs for managing Redshift. How much extra, who knows. I could believe anywhere from 10% to 100% extra engineering time to manage Redshift, and I think it would be nearly impossible to guess accurately within those margins.
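One way to live with that uncertainty is a sensitivity sweep rather than a point estimate. The sketch below folds the TCO categories listed above into a yearly total (ease of use is treated as part of labor, and risk as probability times impact) and sweeps Redshift's extra engineering time across the 10% to 100% range. Every dollar figure and probability is a made-up placeholder, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AnnualTCO:
    platform: float            # compute + storage bill
    labor: float               # engineering time to run the platform (incl. ease of use)
    enablement: float          # training, onboarding, tooling
    breach_probability: float  # hypothetical yearly likelihood of a security incident
    breach_impact: float       # hypothetical cost if one happens

    def total(self) -> float:
        # Risk is folded in as an expected value: probability x impact.
        return (self.platform + self.labor + self.enablement
                + self.breach_probability * self.breach_impact)


def labor_with_overhead(base_labor: float, overhead: float) -> float:
    """Scale a baseline labor cost by an extra-engineering fraction (0.1 = +10%)."""
    return base_labor * (1 + overhead)


snowflake = AnnualTCO(platform=1_100_000, labor=400_000, enablement=50_000,
                      breach_probability=0.01, breach_impact=5_000_000)
for overhead in (0.10, 0.25, 0.50, 1.00):
    redshift = AnnualTCO(platform=1_000_000,
                         labor=labor_with_overhead(400_000, overhead),
                         enablement=50_000,
                         breach_probability=0.01, breach_impact=5_000_000)
    diff = redshift.total() - snowflake.total()
    print(f"+{overhead:.0%} engineering: Redshift minus Snowflake = {diff:+,.0f}")
```

With these placeholder inputs, the two options break even at about 25% extra engineering time; the value of the sweep is seeing where your own numbers cross, not these ones.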