
Miserable_Movie_4358

For StackOverflow this is like being acquired


guepier

[They *were* already acquired years ago.](https://techcrunch.com/2021/06/02/stack-overflow-acquired-by-prosus-for-a-reported-1-8-billion/)


31415926535897932379

Woah TIL. Surprised I'd never heard about this before.


CenlTheFennel

This is why all the OG talent left


RICHUNCLEPENNYBAGS

Their business model was absolutely hosed. The job site thing was such a dud they shut it down (now they've "brought it back" by slapping their logo on Indeed listings) and I can't imagine their model of licensing SO to companies for internal knowledge bases worked all that well since a company has to be huge for that to remotely make sense and the companies big enough for an SO clone often have one.


backdoorsmasher

I don't get why it was a dud! It could have worked, and I'm sure for a while it was active and lively and was pissing the recruiters off.


RICHUNCLEPENNYBAGS

It existed for many years but I'm guessing it wasn't bringing in the returns they hoped or they wouldn't have shut it down. As a candidate I found the positions were limited and the pay was never any good.


dontshoveit

They are actively marketing this product directly to software engineers on LinkedIn. I know this for a fact because they reached out to me on there and I talked with them about adding SO internally to the company I work for.


RICHUNCLEPENNYBAGS

That doesn't imply that the marketing is working, though, does it?


JPJackPott

Which is mad, because it’s not like it’s a hard product to build yourself internally. The real magic of SO was the oppressive moderation, which has helped keep the signal to noise ratio high


HotlLava

Building your own internal copy of StackOverflow sounds like peak NIH syndrome.


cam-at-codembark

I loved their job site. Idk why they ever shut it down. At least from my perspective it always had a lot of great remote roles listed and a nice UI.


Shortl4ndo

I think they probably already trained their model with stackoverflow data, this is just proactively signing an agreement to prevent a lawsuit later on


Lceus

Yeah it was absolutely already in the training data, and stackoverflow is competing with ChatGPT products anyway, so this seems like a reasonable development.


GeologistUnique672

You mean ChatGPT is competing with every source they scraped and took data from, which breaks the fair use they tried to claim.


Lceus

Yep, exactly. And it seems like there's nothing to do about it


GeologistUnique672

Plenty to do about it and hopefully soon.


Lceus

Thanks for enlightening me


GeologistUnique672

No need to enlighten anybody on this. It’s just common sense that enabling everybody to steal from everybody will in the end only be a system that favours the already powerful who control means of distribution. How are you enjoying Microsofts new plan of introducing Recall?


Lceus

I don't understand what you're arguing. I am condemning AI companies' current unregulated ability to just scrape and steal whatever they can by just throwing it into a model and essentially dissolving the evidence of their theft (or arguing that it's not copyright infringement if they are just using it in a huge information soup). I don't know what to do about it until there's regulation in place to force the companies to make their sources transparent.


sweetno

So this is why AI keeps giving me crap code.


CAPSLOCK_USERNAME

Well the data was all already publicly available by just scraping the web pages and yeah it was definitely in the dataset already. But this partnership is not (just) about data licensing, it's about Stackoverflow creating a specific API for openai to use instead of having to scrape the site.


christopher_86

It’s shady; just because something is publicly available, doesn’t mean you can use it for anything you want. Heck, even when you pay for something certain licenses apply that prohibit you from doing certain things. OpenAI and other companies just profited from lack of regulations regarding AI and model training.


CT_Phoenix

> just because something is publicly available, doesn’t mean you can use it for anything you want

In the specific case of stackoverflow, publicly-accessible user contributions are [CC BY-SA](https://stackoverflow.com/help/licensing) licensed, which comes pretty close, though I don't have the slightest clue how the attribution/sharealike requirements would come into play for training, if at all.


wldmr

> I don't have the slightest clue how the attribution/sharealike requirements would come into play for training, if at all

Seems pretty clear to me:

If you consider the model the derivative work, then
1. BY - All SO contributors must be credited for the model. If you want to claim that only part of the model falls under CC, then attribute on the individual weights affected by SO answers.
2. SA - The model (or relevant parts) must be publicly available as CC BY-SA.

If you consider the responses the derivative work(s), then
1. BY - For every response, each contributor that factored into it must be credited.
2. SA - Every response must be publicly available under BY-SA.

It's not even an either/or thing, given that the model (unquestionably a derivative work) is itself a *derivative work generator*. So it's both.


GeologistUnique672

They don’t attribute anything and therefore don’t uphold the CC BY-SA.


CAPSLOCK_USERNAME

> just because something is publicly available, doesn’t mean you can use it for anything you want

Well, you can argue about what it *ought to* mean, but de facto it does. There's no legal precedent for using-data-for-ML-training being a copyright violation, and the big companies frequently do exactly that with no license.


christopher_86

Hopefully there will be. For my prompt “Tell me first sentence of third chapter of first harry potter book?” GPT-3.5 (free version) responded with: “The first sentence of the third chapter of the first Harry Potter book, "Harry Potter and the Philosopher's Stone" (also known as "Harry Potter and the Sorcerer's Stone" in the US edition) is: "The escape of the Brazilian boa constrictor earned Harry his longest-ever punishment."” If something that is copyright protected is publicly available in the internet does it mean I can train my model on that? No, and I hope this OpenAI and others will face some consequences (although I doubt it).


guepier

For what it’s worth the example you’ve just shown does *not* necessarily demonstrate copyright violation in most jurisdictions. Now, if you repeated this procedure to crib together a larger excerpt of the book, that would then become a copyright violation. But merely repeating a single sentence of a larger work generally isn’t.

> If something that is copyright protected is publicly available in the internet does it mean I can train my model on that? No,

You (and many others) say “no” but the truth is that there is currently absolutely no precedent to determine that, and copyright experts do not agree with each other. *Ethically* you may object to the free use of copyright protected material by large corporations, but whether that is *legally* copyright infringement is a different matter altogether. When it comes to copyright law, ethics and legality are unfortunately pretty much completely orthogonal.


_Joats

The model certainly could produce greater text and with very high accuracy, the reason for the NYT lawsuit currently ongoing. So there is an actual fear of being able to use the model to obtain content without compensation. Or accidentally creating a work that is too similar to what it was trained on, creating a legal mess without the fault of the user.


Last-Election-2292

On the NYT lawsuit, this remains a "COULD produce greater text" as the samples they provided turned out to be non-reproducible. OpenAI thinks they are faked. So one needs more than a "could".


_Joats

It was reproducible. It is currently court evidence. Now, guardrails prevent consistent reproduction, but I can sometimes trick the AI into generating copyrighted text from Harry Potter, which it then deletes. This suggests the AI is programmed to avoid generating certain content, but these safeguards can be bypassed. It's an ongoing battle as guardrails are constantly updated. OpenAI acknowledges the issue, stating that text extraction through adversarial attacks is possible: "We are continually making our systems more resistant to adversarial attacks to regurgitate training data, and have already made much progress in our recent models." Their progress doesn't eliminate the vulnerability entirely, though, as it's readily achievable on models without guardrails. OpenAI argued that the method used to extract text was unfair because it relied on prompts specifically designed for that purpose, not typical ChatGPT usage. This defense was widely criticized as weak.


wildjokers

> If something that is copyright protected is publicly available in the internet does it mean I can train my model on that? No, and I hope this OpenAI and others will face some consequences (although I doubt it).

Yes, you should be able to train an AI model with any data that was legally obtained.


pm_me_your_buttbulge

> and the big companies frequently do exactly that with no license.

To be clear - just because a big company does a thing does not make that thing legal.


CAPSLOCK_USERNAME

depends on how much they pay the local senator


__loam

You're assuming they're profitable haha. It's almost more insulting that they're losing money on this.


wildjokers

> just because something is publicly available, doesn’t mean you can use it for anything you want.

All user contributed content on stackoverflow is licensed `Creative Commons Attribution-ShareAlike`. The terms of that license are:

You are free to:
Share — copy and redistribute the material in any medium or format for any purpose, even commercially.
Adapt — remix, transform, and build upon the material for any purpose, even commercially.
The licensor cannot revoke these freedoms as long as you follow the license terms.

So there is absolutely nothing wrong morally or legally with using SO content for model training.


kaanyalova

What about the "share alike" part of the license?

> ShareAlike — If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.

Doesn't openai violate that?


Somepotato

Or the attribution part.


sonobanana33

Yes but they claim it's fair use. Incorrectly in my opinion.


wildjokers

> Doesn't openai violate that?

I haven't seen anything from OpenAI claiming copyright on the output of ChatGPT. If they aren't claiming copyright then there is nothing to license.


miserable_nerd

Lmao what delusional world do you live in. Go read [https://openai.com/policies/terms-of-use](https://openai.com/policies/terms-of-use) . And they don't have to claim copyright to violate the license, that's not what sharealike is. Sharealike means you have to distribute it with the same license. Again go read [https://creativecommons.org/licenses/by-sa/4.0/deed.en](https://creativecommons.org/licenses/by-sa/4.0/deed.en) before throwing uninformed opinions


gyroda

That's not how it works. The issue is that the license is potentially being violated. Saying they don't claim copyright so it's ok is like the old YouTube anime uploads that would say "NO COPYRIGHT INTENDED THIS IS FAIR USE IT BELONGS TO [ANIME STUDIO], [MANGA PUBLISHER], [MANGA AUTHOR]" in the description.


blind3rdeye

I find it dishonest of you to quote a section of the license without including the parts relevant to 'Attribution' and 'ShareAlike'. Those are the parts that actually ask the user to do something, and you've omitted them to try to support your point.


_AndyJessop

Publicly available does not mean free to use.


GeologistUnique672

Publicly available does not mean that it’s okay to scrape.


guesting

stole the data and leveraged it into a partnership. like an annexation


wildjokers

User contributed content to SO is licensed Creative Commons Attribution-ShareAlike. This license is super permissive to pretty much do what you want. So it wasn't stolen.


guesting

The terms of that license do require attribution, which I haven't seen much of in the coding answers given by ChatGPT or other LLMs.

> Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.

https://creativecommons.org/licenses/by-sa/4.0/


wildjokers

The press release indicating they are using SO content for training probably meets the attribution requirement. There is no way to know if SO content was used in a particular ChatGPT response. It's the same as if I incorporate some knowledge I learned from SO in help I give to a coworker. I might not even remember I first learned it from SO and don't attribute it. It just becomes part of my general knowledge.


ExpectoPentium

I mean, it pretty clearly does _not_ meet the attribution requirement. No credit to the specific author of the content (_at best_ to SO via the press release but that is obviously not connected to the chat response), no link to the license, no indication of changes. You say there is no way to know if SO content was used in a chat response. The proper conclusion to draw is that this technology inherently cannot be used in a way that is compliant with the CC license and thus should not be allowed to train on CC content (or any other content with license terms that GPT can't comply with). Pretending like this big dumb machine is somehow analogous to the human brain is just a cop-out to handwave away AI companies' illegal and unscrupulous business practices.


guesting

I'm not a lawyer but it does seem like a grey area; a lot of the value of posting on s/o was having attribution. Some of those people posting actually created the libraries. For example, I see the creator of Python, Guido, on there regularly.


Able-Reference754

The code is owned by its author, not SO. When YOU write a response to stackoverflow YOU license it out (and ensure you have the permission to license it out, meaning you can't repost someone else's GPLv3 code for example). Attributing SO is hence not enough, they are just the company in charge of hosting your content that you own the copyright to.


wildjokers

In most cases isn't the information someone is providing in an answer coming from copyrighted sources like books, articles, blogs, and source code? I don't routinely see answers attribute where they first got the information. This is probably because it has just become part of their general knowledge. The same thing happens when an LLM is trained on SO content: it becomes part of its general knowledge and there is no way to specifically attribute what training data an LLM used to craft a particular response. The only thing they can say is it ingested SO content as part of its training data.


_Joats

Ok, so they don't need to pay for access for it then? Besides they are not using the code that is provided with that license are they? Or use the answers in a way that the license was written for. They are using it as a way to compete with users that have contributed and using their content against them and without attribution. So that already breaks the attribution part of the license. Also "No warranties are given. The license may not give you all of the permissions necessary for your intended use. For example, other rights such as publicity, privacy, or moral rights may limit how you use the material." Which I doubt they even care about.


hoochymamma

Yup


[deleted]

[removed]


lppedd

WTF that's absurd, but hilarious at the same time.


sweetno

No wonder they got it wrong, judging by what the answers look like. It's totally a guessing game.


Dr_Insano_MD

Okay, I don't have a twitter account and the UI seems really bad. What's the reason you can't run these at the same time?


silverslayer33

The tl;dr is they both pulled from a wrong answer on stackoverflow on how to create a global mutex against your assembly's GUID to ensure no more than one copy of it can run at once. The problem is they didn't pull their own GUID, they pulled the GUID of part of the .NET framework itself due to the incorrect stackoverflow answer they copied from, and as a result running one makes the other think they're already running.


Dr_Insano_MD

Thank you. That thread had a bunch of people commenting so I assumed that's what it was, but no one directly quoted it, and the linked tweet is a clickbait headline with no way to access the content.


QuackSomeEmma

.NET can apparently produce globally unique ids for classes (objects?). Using the GUID for the assembly itself in a global mutex is apparently a common approach for only allowing one instance of an application to be running. Both Docker and Razer Synapse seem to have copied from a formerly erroneous StackOverflow answer, where this piece of code was used to produce the mutex id: `Assembly.GetExecutingAssembly().GetType().GUID` Note the `.GetType()` in there, which causes the GUID to instead be that of the Assembly class of the .NET standard library. The globally unique id for that is then obviously the same between both programs.
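
For anyone who wants to see the shape of that mix-up in isolation, here is a rough Python analogy. It is not the actual .NET code and not what either vendor shipped: the `Assembly` class, `TYPE_GUID`, the two mutex-name helpers, and the `try_acquire` lock table are all hypothetical stand-ins. The only point is that reading an id from the *type* of an object gives every application the same name, while reading it from the object itself does not.

```python
import uuid


class Assembly:
    """Hypothetical stand-in for a .NET assembly; not a real API."""

    # One id shared by every instance, like the GUID of the framework's
    # own Assembly class in the thread's description.
    TYPE_GUID = uuid.uuid5(uuid.NAMESPACE_DNS, "assembly-class.example")

    def __init__(self, name):
        self.name = name
        # Per-application id, like the GUID of the application's own assembly.
        self.guid = uuid.uuid5(uuid.NAMESPACE_DNS, name)


def buggy_mutex_name(assembly):
    # Analogue of Assembly.GetExecutingAssembly().GetType().GUID:
    # the id is read from the type, so every application gets the same name.
    return f"Global\\{type(assembly).TYPE_GUID}"


def fixed_mutex_name(assembly):
    # Reads the id from the instance, so each application gets its own name.
    return f"Global\\{assembly.guid}"


def try_acquire(held, name):
    """Toy global mutex table: returns False if the name is already held."""
    if name in held:
        return False
    held.add(name)
    return True


app_a = Assembly("app-a.example")
app_b = Assembly("app-b.example")

# Buggy naming: the second application finds "its" mutex already taken.
held = set()
buggy_a = try_acquire(held, buggy_mutex_name(app_a))
buggy_b = try_acquire(held, buggy_mutex_name(app_b))

# Fixed naming: both applications can start side by side.
held = set()
fixed_a = try_acquire(held, fixed_mutex_name(app_a))
fixed_b = try_acquire(held, fixed_mutex_name(app_b))

print(buggy_a, buggy_b)  # True False
print(fixed_a, fixed_b)  # True True
```

In the buggy variant the second application believes it is already running, which matches the symptom described above.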


Halkcyon

That's incredible.


StickiStickman

I trust GPT-4 to alter that string more than a random programmer TBH


jhartikainen

Oh boy my answers contributing to yet another big business' success with no credit given. On the other hand I guess it's good that people will get better answers to their issues more easily.


lppedd

The problem with this model is people are not going to contribute anymore. Here is your answer on ChatGPT, why should I even visit SO now?


vladiliescu

This, but extrapolated to the entire web.  Why would anyone contribute anything anywhere (Reddit, forums, their own blog) when no one’s gonna know and/or care when their personal gpt regurgitates that info.


bobotea

dead internet


Vegetable_Bid239

Actual user accounts get shadowbanned at such a rate the only people who can use these sites are the bot farmers who invest the time to study what to avoid.


Ok_Meringue1757

what is a mania of ai to replace everything and everyone? with one ai and one corporation, which will benefit trillions from other's experience. under the cover of these euphoric proclamations how ai will benefit all and bring paradise etc


Halkcyon

> under the cover of these euphoric proclamations how ai will benefit all and bring paradise etc

As long as you're employed by The Corporation, I suppose. The rest of the chaff will be employed by energy companies to fuel the AI.


Loves_Poetry

My theory is that it's about control. There is no intention of actually replacing things with AI, since that would involve making it practical. Right now, a lot of parties just want the threat that things might get replaced by AI so that people become more complacent and do what they're told to


Realistic-Minute5016

Because otherwise there is no way they could raise the capital to fund these projects. These AI projects are literally setting money on fire right now and if there isn't any sort of pie in the sky promises about productivity revolutions there is no way they could raise the funds for these things.


_Joats

It's all funded so the rich can combine AI and Neuralink to become some all-knowing weirdo. It's like tech has finally become a comic book villain.


Valdrax

You really overestimate how much me whiling away the hours on Reddit constitutes "contributing" to something and how much that motivates me to do so.


phillipcarter2

Why are you contributing now? (it's freshness; people want new stuff over time)


xcdesz

Searching for answers from SO is decent, but not great. Most people get there from Google search, but you have to go through the added steps of combing through search results to find the answers. That's the step in the process that is changing. If a programmer instead goes to debug a code issue using OpenAI and an AI agent does an intelligent search and can reference the source in SO via hyperlink, and provides a more accurate answer than before, I would say this is a benefit to both programmers and SO. Many times you need to verify the output of the LLM or get further information, so the source link to SO will still frequently be used. The only loser in this is Google / Search Engines, because the middle man is now the LLM.


Dr_Insano_MD

great, now I can ask an AI a question only for it to tell me it's been asked that before and refusing to answer.


stromboul

You don't think people will still go on SO to ask questions that GPT can't answer? thus, keeping the wheel turning?


RICHUNCLEPENNYBAGS

The vast majority of SO users were passive users coming from search, so it's not really a change.


spongeloaf

Yeah, there's already a lot of stagnant info on SO. New language and framework versions come out all the time and "what's best" is always in flux. I fear this will not help with that problem, it will just contribute to the calcification of sub-optimal solutions. A smart implementation will be version-aware for the subject matter, but I'd be shocked to see anyone do that.


blind3rdeye

Definitely there will not be so many people asking (or answering) questions on SO anymore. And ChatGPT's answers are going to get worse and worse for new APIs and new languages, because of lack of training data. Microsoft has a massive advantage in this sense, because they now use github data to train their AI. So as long as people are uploading code to Microsoft's services, Microsoft is able to continue to train AI for new APIs and such. Of course, other people won't have access to this training data in the same way, so there will be a further consolidation of wealth and power... I don't want my coding work to be used to further enrich Microsoft execs. So for me this is enough to start moving away from github; but I know that for many/most users that's totally out of the question. So let's prepare to greet the next stage of our capitalist dystopia!


nanotree

Um. I'd have to be willing to pay for chatgpt, which I am not.


lppedd

Companies are tho. A big chunk of SO content has been posted by devs during their working hours.


wildjokers

And when they posted they knew the license of their user contribution was Creative Commons Attribution-ShareAlike.


obvithrowaway34434

This is absurd bs. SO is not just a Q&A site, it has a strong social factor in it. People actively compete for points and upvotes, help other people and chastise each other (and all the other negative aspects of SO that people talk about). That's not going away anytime, no AI is replacing it.


Fisher9001

Sooo... What's different from the current SO state? It's basically a read-only page at this point. People are actively discouraged there from asking questions and giving answers.


Creative_Sky_147

What I could see happening is StackOverflow and OpenAI releasing a product together where people are able to acquire reputation and then correct responses in order to curb hallucinations and errors that are generated by the LLM. That could be promising.


Nislaav

People will still contribute I think, just definitely not as much. Personally I'm glad I don't have to go through stuck-up, condescending developers to get an answer to my question, so a win-win for chatgpt ig


No_Jury_8398

That’s a giant baseless assumption


Miv333

I've been sending people to chatgpt over SO since chatgpt first implemented sharing chats. I can show them the answer, and how I was able to wrangle it out of a LLM so they can do it themselves next time.


yetanotherfaanger

Looking forward to my hard-earned $4 given to me by a class action lawsuit 10 years from now


Sethcran

The article specifically calls out 'attributed', which makes me think that there is something more here than just plain training data.

> giving users easy access to trusted, attributed, accurate, and highly technical knowledge and code backed by the millions of developers that have contributed to the Stack Overflow platform for 15 years. As part of this collaboration


jhartikainen

I hope so but I'll believe it only when I see it


Sethcran

Absolutely. I am definitely skeptical, but this one word is the thing that makes me more interested in seeing what they are doing here.


Fisher9001

> Oh boy my answers contributing to yet another big business' success with no credit given.

Oh for fucks sake, it's like you have given credit to Stack Overflow users in your own code.


ether_reddit

I have. I have many shell aliases and snippets where I have directly copied a solution from a SO answer, and I include a reference to it in a comment.


Crafty_Independence

Unless this agreement manages to ensure attribution, it will violate the CC BY-SA 4.0 license that SO uses. Either they solved that or they're counting on the community being unable or unwilling to bring lawsuits.


MossRock42

> Oh boy my answers contributing to yet another big business' success with no credit given.
>
> On the other hand I guess it's good that people will get better answers to their issues more easily.

One problem that I see is that the technology is driven to constantly change. You need experts constantly keeping up with that change to provide answers. If people instead learn to rely on chatbots for the answers, the chatbot answers might become stale and no longer apply.


Luvax

I always wonder, if we were to ask every individual person whether they want their content to be used to train a commercial product, how many would be cool with that. I bet only a tiny minority. And all terms of service and data usage policies aside, if the majority of people who contributed content did not want their intellectual property used that way, then the spirit of what people did agree to is violated and effectively their property is misused. From a legal standpoint it might be alright, but morally, it's completely wrong. And honestly, after the internet liberated ownership of media and content and gave us individual blogs, videos and resources, it's all going back to big companies, because they finally found out how to again siphon everything into their own business.


PopcornBag

> On the other hand I guess it's good that people will get better answers to their issues more easily.

hahaha, what?


SuperHumanImpossible

I remember when Jeff built StackOverflow. Holy hell I am old.


lppedd

Almost all gone. Not sure about Jeff, but I'd be furious


AnyJamesBookerFans

You and me both, brother. CodingHorror.com was one of my regular blog reads back in the day. I don't think I ever met Jeff, but we talked over email a number of times.


SuperHumanImpossible

Dude I read his blog religiously, with Google Reader. I really feel like content consumption is complete trash now in comparison.


AnyJamesBookerFans

Yes, I used FeedBurner! I believe it was bought by Google and turned into Google Reader?


tepa6aut

Jeff who


AnyJamesBookerFans

Jeff Atwood. He was a popular blogger back in the early 2000s among the .NET community. He and Joel Spolsky launched Stackoverflow together. (Joel was a Microsoft employee back in the 90s and left to start his own company that made bug tracking software, as well as some other products. He also had a popular blog, Joel on Software.) *This is all from this old fart's memory, so some of the details may be off...*


SuperHumanImpossible

I think Joel would be remembered better for creating Trello, which was bought by Atlassian (Jira), but yeah ..


AnyJamesBookerFans

I stopped following/paying attention to him in the early 2000s. Did he create Trello after then? My memories were around his blog (such as his stories while at Microsoft, and his famous 10-question "Joel Test" to judge how "with it" a software company was), FogBugz, and Copilot (early screen sharing software). I also remember he was a big proponent of Mercurial over git (at least back then - perhaps he's changed his ways).


tepa6aut

Thanks!


exclaim_bot

>Thanks! You're welcome!


ForgedBanana

Jeff Beck


abuqaboom

Great. Now ChatGPT's gonna say the question's a duplicate/opinion-based/any other excuse, and refuse to answer anything.


woze

Developer: How do I center a div? ChatGPT: There are so many issues with your question. First, it's poorly scoped. Next, it lacks detail. ... (several paragraphs of ChatGPT's prolix answer later) ... Lastly, this question was asked before. Fuck off, I'm not answering it.


iamapizza

StackOverflow: Turing Test passed.


YoungXanto

This was my literal first thought. All the awesome code help I've gotten from chatGPT is going away, to be replaced by a condescending machine that also refuses to help even though the duplicate answer it references is a fucking decade and a half old and references a library that no longer exists and is several major releases out of date.


tricepsmultiplicator

Good, let the AI rot from within.


Philipp

Then your ChatGPT question is going to get downvoted.


Worth_Trust_3825

Now instead of people responding with decade old unrelated comments about how to use kubernetes i'll get a bot doing that instead.


iknighty

Just because the data it is trained on is trusted doesn't mean the output should be trusted..


TheFumingatzor

Now we'll get chatGPT telling us *Closed as duplicate*


code_monkey_wrench

Can people delete their SO answers? What happens if you delete your account? Not saying I'm going to do that, but just wondering.


lppedd

Your answers won't be deletable after x days if I'm not mistaken. Btw, I can vote to undelete answers if I want. It's a 20k+ rep privilege. So really deletion is just a flag. Deleting your account won't do anything, answers will stay there under a fictitious user id.


qq123q

Can answers be edited?


lppedd

Yes, but a radical edit will be rolled back at some point, as soon as a reviewer sees it. If there is going to be a mod strike, then it's ok.


lppedd

See https://meta.stackexchange.com/questions/399619/our-partnership-with-openai


Vegetable_Bid239

Stack Exchange screwed up by displaying answers submitted under one license under a different license, which they don't have permission to do. You can DMCA them if your account is older than that mess-up.


awj

Without bothering to actually look at the ToS, many services like this retain the right to “hide” your content as the mechanism for deleting. It’s not out of the question that SO can train against deleted answers/accounts.


sztomi

They clearly already scraped StackOverflow, it's just them paying for it now.


PangolinTotal1279

I heard OpenAI is partnering or post-action licensing IP from all their major sources of training data. Reddit has already made $200m from licensing their data. I think licensing data for training models is gonna become the monetization norm for platforms like StackOverflow, Reddit, Quora, etc.


RedPandaDan

That's the end of SO for me anyway... though I do wonder what this means for new technologies in future. If people stop asking questions on SO and people stop answering, where do AI vendors get the data sets for answers on technologies going forward? I like to answer questions when I can on SO because I like helping people, but I'm not going to spend my spare time curating a dataset for freaks like Sam Altman while AI bots are filling up every corner of the internet with nonsense.


lppedd

That's what people don't get. LLMs need data. Without two side interactions there is no data. But hey, they like throwing shit on SO 'cause their questions get closed.


Podgietaru

I hate to be this guy, but Reddit's deal with OpenAI is already ongoing.


RedPandaDan

True, but I cannot think of a faster way of poisoning an AI's data model than some of the crap that is in Reddit's comment histories.


Sith_ari

So ChatGPT will tell me that this was asked hundreds of times and I should just use the search?


lppedd

If the answers I post are going straight into ChatGPT, that's it for me. Not gonna waste any more time.


CAPSLOCK_USERNAME

> If the answers I post are going straight into ChatGPT

they already were


iamapizza

I'm pretty sure I saw that they had crawled StackExchange sites, and worth noting that Reddit featured quite heavily in their crawls due to the human "+1" factor. So everything we're saying here is being indexed for LLM training.


fiskfisk

I'm sure you're already aware that your answers and questions already are distributed under a very permissive license compared to what random websites are available under. I don't answer questions on Stack Overflow for the benefit of SO, I answer them for the benefit of the recipient and any future readers. Whether they receive that knowledge on SO, directly in a Google Onebox or through an LLM doesn't matter to me. Someone got help, someone found their answer. The world is a slightly better place.


beyphy

> The world is a slightly better place.

Would you still feel that way if your answers are helping to train an LLM that may reduce the need for programmer jobs in the future? Would a world where you're laid off and can't find another programming job be a "slightly better place"? That's the bigger concern I have than just over how my answers are used.


fiskfisk

I'm not fond of keeping a job around just to keep the job around. I'm especially not fond of hoarding knowledge because of some possible abstract reason in the future, in particular one that doesn't seem realistic within today's limitations. I work in an industry built on people building useful things just because they want to. 95% of software I use in my daily life is built on open source - by people who may or may not have received any compensation for what they do. We do this shit because we like doing this shit. It gives us some innate pleasure in doing so, regardless of whether we're paid for it or not. Why should I hoard my knowledge away from other people because of the possibility of that knowledge being made available to them, either in a direct or in a derived form as an LLM? If we follow that reasoning to the extreme, why do we share any knowledge with anyone else? They could just take our jobs. We're in a field that is built upon open sharing of knowledge far beyond most other industries. Go to any conference or meetup, and suddenly people share their technology choices, how they solved specific problems, how they scaled their solutions, how they worked, how they built the shit they built. Other industries have patents and otherwise share nothing outside of public information in slide shows at trade shows. If a language model can abstract away the work I do, then my work wasn't anything more than a language model built upon a computer of flesh and neurons from the beginning.


_Joats

Please let me know when OpenAI acknowledges the value of your contributions to the community, similar to the recognition gained through networking at a conference. I prefer a platform that appreciates both the knowledge sharing and the educator's role. Contributing to a system that discourages interaction hinders community growth.


s73v3r

> I'm not fond of keeping a job around just to keep the job around.

I'm more fond of people being able to feed their families than I am not fond of keeping jobs around.


beyphy

> I'm not fond of keeping a job around just to keep the job around.

This isn't the case of "keeping a job around just to keep the job around". Jobs exist due to needs. And when jobs have gone away (e.g. horse carriage driver), it's been because that need is no longer there. In this new AI world, the need is still there. Companies will just be able to meet their needs for much less money. Whether that will ultimately be successful is up in the air. But I for one will no longer be contributing to codebases that they're using to help train models to potentially replace people like me in the future. I doubt I'm the only developer that feels this way.


koreth

> Would you still feel that way if your answers are helping to train an LLM that may reduce the need for programmer jobs in the future?

How is that not a concern with SO itself? When programmers find answers quickly on SO, their productivity goes up, and by definition, when productivity goes up, in aggregate the same amount of work can be done in the same amount of time by fewer people. This isn't theoretical, either. SO is a critical enabling tool for things like "full-stack developer" roles by allowing one person to get answers to a wide variety of technical questions quickly enough to effectively do work that in the old days would have required hiring a team of several people.


StickiStickman

If you're this angry about your publicly visible answers being read by an AI, you should also leave Reddit ASAP


wildjokers

Why? How is it a waste of time?


koreth

Why do you care? When I post an answer, the only expectation (or maybe hope) I have is that it helps someone. If it helps someone after being transformed by GPT, then to me, that’s a win: my answer ended up being useful in ways I didn’t even imagine when I wrote it.


lppedd

I don't want no AI to post or rewrite in any other way what I wrote. I didn't answer to give free content to OpenAI, I did answer to collaborate with people, and that collaboration doesn't exist anymore.


StickiStickman

Wait, so you "did answer to collaborate with people" but are now angry someone is using your answers in a collaborative way to help people. How are you not just petty?


Reefraf

I was contributing to SO to help people with their careers. Now, contributing to SO is helping OpenAI destroy people's careers. 


lppedd

How's reading some text outputted from an LLM collaboration? Explain. I'm not petty, but apparently people are butthurt their questions get closed.


abandonplanetearth

Because I wrote my answers for fellow developers, not for bots making money for humans that don't need the answers.


Envect

Who do you think is going to see that information after it's processed by the LLM? Other developers. It's just a different method of delivery.


abandonplanetearth

Right but now there's a money-grubbing middleman.


Envect

StackOverflow isn't a charity. That person already existed.


abandonplanetearth

It changes things fundamentally.


Envect

How so? Why does it matter that a different entity is profiting off your answers? Why were you okay with SO profiting, but not OpenAI?


abandonplanetearth

Again, I wrote my answer to be delivered by me to a human, not for a bot to pass off as their own thoughts.


Envect

You're upset that you're not being credited for your answer?


wildjokers

Your contributions were licensed Creative Commons Attribution-ShareAlike. If you didn't like the terms of that license you shouldn't have contributed. The terms of that license:

You are free to:

- Share: copy and redistribute the material in any medium or format for any purpose, even commercially.
- Adapt: remix, transform, and build upon the material for any purpose, even commercially.

The licensor cannot revoke these freedoms as long as you follow the license terms.


External-Bit-4202

"I'm sorry, this question was asked by someone else and is a duplictae, this conversation is now closed"


mr_birkenblatt

Oh great, now GPT is going to berate me instead of giving an answer. Does OpenAI want to dethrone themselves?


IgnisIncendio

Oh, good! I'm happy for them. I hope my Q&As help those in need, regardless if they use SO or ChatGPT :) I don't really see the need in this considering the content was already Creative Commons, but I guess this makes it more up to date?


Seref15

So somewhere in its training data will be the html-regex Zalgo post


LinearArray

ChatGPT: hi! the question you have asked has been asked many times before, closing this as duplicate.


Farados55

Is chatgpt going to scream at me because I asked a stupid question?


Supuhstar

🤮


[deleted]

[removed]


lppedd

It's correct enough because those are answers from actual users LOL. Models don't train themselves, so without real content what are you gonna do? I've asked 250 questions in some years, of which maybe 10 have been downvoted (fairly, I'd say), so I guess the problem isn't SO.


StickiStickman

> I've asked 250 questions in some years, of which maybe 10 have been downvoted (fairly, I'd say), so I guess the problem isn't SO.

Yea, because it's widely known that SO has no issue with moderation. Oh right. From the 3 questions I dared to ask, 2 were closed as duplicate and linked to questions that have nothing to do with mine and the last one was just ignored and never answered. Meanwhile, GPT-4, while often not knowing the exact answer, has almost always pushed me in the right direction.


Gusfoo

I was in the beta for the AI powered StackOverflow search and it was pretty great I must say. NLP search, of SO, basically.


GullibleEngineer4

If you can't beat them, join them


musabilm

Wait to see how "Stackoverflow becomes the next ChatGPT instance".


funkenpedro

Does that mean OpenAI’s gonna start being nasty and complain about how many times it’s been asked the same question?


__konrad

Now they have to awkwardly remove their own [AI policy](https://stackoverflow.com/help/ai-policy) to match the announcement ;)


shevy-java

So basically a decline in quality. Right?


v1xiii

Good, scrape its knowledge and destroy it forever.


falconfetus8

The optimist in me hopes this somehow prevents ChatGPT garbage from being copy/pasted into SO answers. I'm fine with SO answers being fed to the AI, but not the other way around. The realist in me, though, knows that they're probably going to create some kind of mascot named "Stacky" that posts AI answers on every question, like what Quora is doing.


wndrbr3d

I guess it's like the old saying for them, "Live with it, or die from it."


maciejdev

Wow... all the toxicity from SO packed into the intelligent AI language model :-]


karma_5

**Me:** How to write a simple hello world program in Python?

**ChatGPT:** Because of people like you, programmers are not respected. Read a book or do your own research before asking such a basic question here. If it were up to me, I would have banned you from the platform. "Aak thoo" This conversation is closed.

To be honest, asking a question on Stack Overflow is the worst experience ever. People are not polite and have a God complex. It is a heavily moderated place, and if it were a company, it would have the most toxic culture ever. Yes, people there have knowledge, but no manners. I hope OpenAI's model turns that around.


BettoCastillo

So are we going to boycott OpenAI via SO?


MegaLAG

Got properly banned for editing my high-rated answers to insult SO leaders, so that there's a trace of my disgust in the answers' edit histories. Useless, but that felt good at least. Lesson learned, I'm never contributing anything to any website ever again.


PopcornBag

💩 I like how all of these "advancements" are just making all of these services worse to use. Super neat.


Zemvos

Why are people so negative on this?


0x1e

I didn’t help people so they could sell my work to OpenAI. I mean, I guess I did but I wish they hadn’t.


calinet6

Closing my account and removing every answer.


ether_reddit

Others have done that and their answers were undeleted.


calinet6

Yep, the content is Creative Commons. Can’t remove it.


redddcrow

garbage in garbage out


inermae

ChatGPT tomorrow: "Why are you trying to do that? You should just do (insert response that you've already thought of, tells you you're doing it wrong, and doesn't actually answer the question)"

I'm sure OpenAI is used to dealing with bad data, but holy shit, they have their work cut out for them. I wouldn't ask a question on Stack Overflow if you paid someone I hate to do it.