I edited it to make it more readable - not changed its meaning.
In terms of the substance - bottom line they have taken something without permission and sold it on. Sure - they have added value in the process - but if I steal your car and mash it up with another one before selling it on - it's still theft.
The original post was implying there was no harm as a result because it's just copying - the original owner was not deprived of anything.
My point was they are potentially being deprived of a living - and that's through stuff being taken without permission - not through fair competition.
What matters here is not the semantics of theft or copyright or whatever - what matters is fairness - and I accept that's a judgement.
I don't see a problem with these companies having to either pay to incorporate material into their models and/or the authors having the right to refuse to license.
Note - that's not to stand in the way of the development of these tools, but to ensure that the effort that went into creating them (which includes the generation of the source material) is properly rewarded.
If OpenAI et al. think creation is a trivial part and it doesn't need rewarding - they are free to bootstrap their models by creating all the inputs from scratch.
Perhaps you think it's fine if I took a copy of the ChatGPT model without permission and started a competing service - which was cheaper because I didn't have to pay for the training costs?
They haven't lost anything - just took a copy.....
Are they going to stand in the way of me making the output of ChatGPT more widely available through my cheaper pricing?????
And note I'm selling access to the output - which is different every time (I use a different random number seed from them) - so I'm not selling a copy of the model per se...... perfectly fair use....
> Perhaps you think it's fine if I took a copy of the ChatGPT model without permission [...]
There are laws about copyright and trade secret.
> They haven't lost anything - just took a copy.....
Correct. This is why it's a copyright violation and not a theft.
> [...] fair use....
Fair use is also a legal term, and it has some (reasonably) specific meanings. It's noteworthy that the large copyright protection industries don't respect those terms and have automated DMCA takedowns to abuse people for things which are legal:
"the fair use of a copyrighted work, including such use by reproduction in copies or phonorecords or by any other means specified by that section, for purposes such as criticism, comment, news reporting, teaching (including multiple copies for classroom use), scholarship, or research, is not an infringement of copyright."
> This is why it's a copyright violation and not a theft.
But is it? Copying the OpenAI model is only potentially a copyright violation - as you have to prove it's not exempt via fair use etc. Note I'm not selling it on - I'm just selling the output - which isn't solely determined by the model - it's determined by the model plus random numbers plus context - what I'm selling is only partly determined by the source model I copied.
Now if I copied it and used it to undercut your original business - then clearly that's not fair use - but that's rather my point no?
These companies have clearly copied source material without permission on a huge scale - but because it's copying and the people haven't lost the original - there is in effect another test - do the original people lose out as a result etc.
It's quite clear - say in the news industry which might be supported by advertising - that copying content and then presenting a summary version so that people never visit the source material is clearly damaging the underlying copyright holders.
> I edited it to make it more readable - not changed its meaning.
I think you might have added the copying bit, and that changed its meaning if not the whole topic. Then you claim I'm confusing competition with theft instead of addressing the "right to a living" part. That's kind of insincere and dishonest on your part, but fine, this other topic is interesting to me too.
I have no idea whether OpenAI, Google, Meta, Anthropic, or any other company got valid licenses for all of the books they trained on. If they didn't, they likely broke some specific laws. Go after them for that if you want. This is copyright violation, not theft.
But if I legally obtain a hundred books and pay a really smart kid to read them all to learn their style, then I pay that kid to write a new book using the style he's learned, that all seems fair and legal to me. It's the way things have been for a very long time.
For any argument you're going to make about this, please imagine a really smart kid doing it instead of a computer. And if you think there should be different laws for computers vs really smart kids, go get it into legislature.
> That's kind of insincere and dishonest on your part,
Ad hominem attack - great.
> This is copyright violation, not theft.
I'm arguing it's copyright violation because it's theft of revenue. If it didn't result in any loss of revenue then it would be hard to argue it wasn't fair use.
Note I'm not using theft in any special legal sense - just in the common sense English sense.
If 'learning their style' included incorporating large recognisable chunks - that smart kid would fail his English degree on plagiarism grounds.....
The point about LLMs is they can do everything - from an unrecognisable 'original' mashup - to what are quite clearly regurgitations of the input. Note also that the kid didn't steal the books he learnt from.....
The question is whether what's happening is fair and good for society, not what is convenient for some very well funded companies in a hurry who see existing laws as annoying things getting in their way, rather than something to respect.
No, it was not. You moved the goal posts from artists deserving a livelihood to copyright issues, and I called you out for that.
> Note also that the kid didn't steal the books he learnt from.....
I should note this? I explicitly stated it as part of the hypothetical.
As I said above, there are already laws about copying. If you're sure they broke those laws, maybe you should criticize the powers that be for not enforcing them.
> The question is whether what's happening is fair and good for society
I think this is a good question.
Personally, I think the benefit from having automated tutors that are attentive and patient and can answer questions about almost any topic known to man dwarfs the benefit from defending intellectual property. I hope they get cheap enough to be accessible to every person who can't afford a traditional education and accurate enough that we trust them more than typical teachers (not a high bar, unfortunately).
I donated to Wikipedia for years specifically because of its educational value while being freely available. Watching people I know learn from LLMs, and do useful interesting things with what they learned, I think the potential is much higher.
>Personally, I think the benefit from having automated tutors that are attentive and patient and can answer questions about almost any topic known to man dwarfs the benefit from defending intellectual property.
Why is it one or the other? Your argument is like saying we shouldn't pay nurses a fair wage because it gets in the way of great care for everyone.
It's not an either/or situation - it's how you allocate rewards for the different contributions to the new tech. Currently tech companies are saying there is zero value in that training data - that's clearly not the case.
I think I finally understand your misunderstanding - I'm not arguing AI should be banned as it destroys a musicians job because in the future all music will be AI generated. That's not my concern - I'm not saying anyone deserves a job in perpetuity.
My point is simply that in building the models they have to respect the current laws - and that means respecting the content owners' rights and either paying what they ask or not using it.
> Your argument is like saying we shouldn't pay nurses a fair wage because it gets in the way of great care for everyone.
This argument strategy, where you make a strained analogy/metaphor, and then apply it back to the original topic - it's fragile and depends on how comparable the two ideas are. If you're just interested in winning discussions, it's a bad tactic because it opens up a whole new avenue for your opponent to attack.
Can I COPY nurses into equally valuable robots? Because if I can, then yeah - the world would be a MUCH better place with abundant and affordable nurse robots, and the human nurses can go find other jobs. I have some friends who are nurses, and after watching them fight with the medical system for their own health issues, I'm pretty sure they'd agree.
Picking at tangential points while avoiding the main argument hmm....
Admit it - you misunderstood my original point and accused me of then changing the argument.
Bottom line - the original poster was implying there was no harm because simple copying doesn't create a loss. I was pointing out that a key test (in considering copyright issues) is whether such an action causes harm - and in this case there are many very good cases to be made about resulting loss of revenue.
Let's be clear, I think LLMs etc are a huge technical advance - I just think it's wrong to try and ignore the law because it gets in the way of large companies' attempts to make money.
> Picking at tangential points while avoiding the main argument hmm....
I've tried (and occasionally failed) to avoid the parts of what you wrote which were just the typical flame war bait. And of course I'm guilty of trying to antagonize you in a few places. The topic is interesting, but our conversation about it was not.
I appreciate the link to the UK law, but the rest of this comment thread is mostly two people talking past each other.
Because that's how it works in reality. Once the copyright holders get their teeth in something, it gets paywalled. For instance, poor people don't have (free/legal/easy) access to lots of research papers/articles which were paid for with government grants. And copyright industry associations (MPAA, RIAA, CCC, AAP, ...) lobby to extend the laws so that creative works take lifetimes to enter the public domain.
You think you're arguing in favor of the little guy who made a series of blog posts or digital art? That's naive.
> My point is simply that in building the models they have to respect the current laws
So go enforce those laws. The rent seekers will thank you.
> So go enforce those laws. The rent seekers will thank you.
Seems you have bought into the idea that companies like Google, Facebook and Microsoft are the poor little guys. Wow.
What we are talking about here is certain companies trying to gain a de facto monopoly on the sum of human knowledge - without paying any of those people who built it in the first place.
This is the real story.
Now it may well be their moat isn't as big as they thought it was and the greedy investors trying to do this heist will fail - but that's what they are attempting - and you are cheerleading for it.
> No, it was not. You moved the goal posts from artists deserving a livelihood to copyright issues, and I called you out for that.
Nope. I never said artists deserve a living - I said that people deserve protection from their living being stolen via copyright violation. You are confusing what I said with what you mistakenly understood. I don't understand why your original misunderstanding is somehow a character flaw of mine.