No, because training an AI is not “pirating.”
Copyrighted material can be used or reproduced only with a license that allows for it. If the license forbids you from using the copyrighted material for business purposes, and you do it anyway, then it’s pirating.
If they are training the AI with copyrighted data that they aren’t paying for, then yes, they are doing the same thing as traditional media piracy. While I think piracy laws have been grossly blown out of proportion by entities such as the RIAA and MPAA, these AI companies shouldn’t get a pass for doing what Joe Schmoe would get fined thousands of dollars for on a smaller scale.
In fact, given the way organizations like the RIAA and MPAA like to calculate damages based on lost potential sales they pull out of thin air, training an AI that might make up entire songs that compete with their existing catalog should be even worse. (Not that I want to encourage more of that kind of bullshit potential-sales argument.)
The act of copying the data without paying for it (assuming it’s something you need to pay for to get a copy of) is piracy, yes. But the training of an AI is not piracy because no copying takes place.
A lot of people have a very vague, nebulous concept of what copyright is all about. It isn’t a generalized “you should be able to get money whenever anyone does anything with something you thought of” law. It’s all about making and distributing copies of the data.
One of the first steps of training is to copy the data into the training data set.
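For what it’s worth, that first step can be made concrete with a toy sketch (hypothetical names, not any real pipeline): building a training corpus materializes a copy of each source document before any “learning” happens.

```python
# Hypothetical, minimal sketch of a dataset-ingestion step.
# The point: before training touches the data, the pipeline
# duplicates each source document into the training corpus.
source_documents = {
    "book.txt": "It was a dark and stormy night.",
}

training_set = []
for name, text in source_documents.items():
    # This append copies the text into the corpus; that copy
    # exists whether or not the trained model retains any of it.
    training_set.append({"source": name, "text": text})

assert training_set[0]["text"] == source_documents["book.txt"]
```

Whether that intermediate copy is itself infringing, or excused as fair use, is exactly the open legal question.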
Where the training data comes from seems like the main issue, rather than the training itself. Copying has to take place somewhere for that data to exist. I’m no fan of the current IP regime, but it seems like an obvious problem if you get caught making money with terabytes of content you don’t have a license for.
The slippery slope here is that you, as an artist, hear music on the radio, in movies and TV, in commercials. All this listening is training your brain. If an AI company just plugged in an FM radio and learned from that music, I’m sure a lawsuit could follow that would effectively mean no one could listen to anyone’s music without being tainted.
That feels categorically different unless AI has legal standing as a person. We’re talking about training LLMs; nothing more is going on here than people using computers.
So then anyone who uses a computer to make music would be in violation?
Or is it some amount of computer-generated content? How many notes? If it’s not a sample of a song, how does one determine how many of those notes should be attributed to which artist being stolen from?
What if I have someone else listen to a song and they generate a few bars of a song for me? Is it different that a computer listened and then generated output?
To me it sounds like artists were open to some types of violations but not others. If an AI model listened to the radio, most of these issues go away, unless we’re saying that humans who listen to music and write similar songs are OK, but people who write music using computers that calculate the statistically most common song are breaking the law.
Potentially yes: if you use existing IP to make music, doing it with a computer isn’t going to change anything about how the law works. It does get super complicated, and there’s ambiguity depending on the specifics, but mostly, if you do it in a non-obvious way and no one knows how you did it, you’re going to be fine. Anything other than that and you will potentially get sued, even if whatever you did was a legally permissible use of the IP. Rightsholders generally hate when anyone who isn’t them tries to make money off their IP, regardless of how they try to do it or whether they have a right to do it, unless they paid for a license.
That sounds like a setup to only go after those you can make money from, not to actually protect IP.
By definition, if your song is a hit, it is heard by everyone. How do we show that my new song is a direct consequence of hearing song X while your new song isn’t a consequence of you hearing song X?
I can see an easy lawsuit: put out a song, then claim that anyone who heard it “learned” how to play their new album this way. The fact that AI can output something that sounds different from any individual song it learned from means we can claim nearly all works are derivative.
A lot of the griping about AI training involves data that’s been freely published. Stable Diffusion, for example, trained on public images available on the internet for anyone to view, but led to all manner of ill-informed public outrage. LLMs train on public forums and news sites. But people have this notion that copyright gives them some kind of absolute control over the stuff they “own” and they suddenly see a way to demand a pound of flesh for what they previously posted in public. It’s just not so.
I have the right to analyze what I see. I strongly oppose any move to restrict that right.
Publicly available ≠ freely published.
Many images are made and published with anti-AI licenses, or are otherwise licensed in a way that requires attribution for derivative works.
The problem with those things is that the viewer doesn’t need that license in order to analyze them. They can just refuse the license. Licenses don’t automatically apply, you have to accept them. And since they’re contracts they need to offer consideration, not just place restrictions.
An AI model is not a derivative work, it doesn’t include any identifiable pieces of the training data.
It does. For example, Harry Potter books can be easily identified.
It’s also pretty clear they used a lot of books and other material they didn’t pay for, obtained via illegal downloads. A practice I’m fine with, to be clear; I just want it legalised for everyone.
I’m wondering: when I go to the library and read a book, does this mean I can never become an author because I’m tainted? Or am I only tainted if I stole the book?
To me this is only a theft case.
That’s the whole problem with AI and artists complaining about theft: you can’t draw a meaningful distinction between what people do and what the AI is doing.
And what of the massive amount of paywalled content that AI was still trained on?
If it’s paywalled how did they access it?
By piracy.
https://arstechnica.com/tech-policy/2025/02/meta-defends-its-vast-book-torrenting-were-just-a-leech-no-proof-of-seeding/
You are dull. Very dull. There is no shortage of ways to pirate content on the internet, including torrents. And they wasted no time doing so.
This isn’t quite correct either.
The reality is that there’s a bunch of court cases and laws still up in the air about what AI training counts as, and until those are resolved the most we can make is conjecture and vague moral posturing.
The closest we have is likely the court decisions on music sampling, and so far those haven’t been consistent; they’ve mostly hinged on “intent” and “effect on original copy sales.” So based on that logic, whether or not AI training counts as copyright infringement is likely going to come down to whether shit like “ghibli filters” actually provably (at least as far as a judge is concerned) fucks with Ghibli’s sales.
Grand Upright Music, Ltd. v. Warner Bros. Records Inc. (1991) - Rapper Biz Markie sampled Gilbert O’Sullivan’s “Alone Again (Naturally)” without permission; the court held the unlicensed sample was infringement.
Bridgeport Music, Inc. v. Dimension Films (2005) - any unauthorized sampling, no matter how minimal, is infringement.
VMG Salsoul v. Ciccone (2016) - to determine whether use was de minimis it must be considered whether an average audience would recognize appropriation from the original work as present in the accused work.
Campbell v. Acuff-Rose Music, Inc. (1994) - This case established that the fact that money is made by a work does not make it impossible for fair use to apply; it is merely one of the components of a fair use analysis.
That case is about fair use for parody.
Case law suggests using AI for parody is legal.
It’s not quite cut and dried, as there are also recent decisions by the Supreme Court:
Andy Warhol Foundation for the Visual Arts, Inc. v. Goldsmith (2023) - “At issue was the Prince Series created by Andy Warhol based on a photograph of the musician Prince by Lynn Goldsmith. It held Warhol’s changes were insufficiently transformative to fall within fair use for commercial purposes, resolving an issue arising from a split between the Second and Ninth circuits among others.”
Jack Daniel’s Properties, Inc. v. VIP Products LLC (also 2023) - “The case deals with a dog toy shaped similar to a Jack Daniel’s whiskey bottle and label, but with parody elements, which Jack Daniel’s asserts violates their trademark. The Court unambiguously ruled in favor of Jack Daniel’s as the toy company used its parody as its trademark, and leaving the Rogers test on parody intact.”
The aforementioned Rogers test was quoted in both decisions but with pretty different interpretations of the coverage of “parody.”
One thing seems to be key: intent. As long as an AI isn’t purposefully trained to mimic a style, it’s probably safe, but things like style LoRAs and style CLIP encodings are likely gonna be decided on whether the Supreme Court decided to have lunch that day.
Note that both of those rulings are for the original rights holders (and therefore against AI tech).
What’s interesting to me is that we now have a Goliath vs. Goliath fight, with AI tech in one corner and the MPAA and RIAA (plus a lot of case history) in the other.
Either way, I can’t see David (us) coming out on top.
So streaming is fine, but copying is not?
Streaming involves distributing copies, so I don’t see why it would be fine. The law has been well tested in this area.
Well how does the AI company consume the content?
Which company is “the AI company”?
It’s exploiting copyrighted content without a licence, so, in short, it’s pirating.
“Exploiting copyrighted content” is an incredibly vague concept that is not illegal. Copyright is about distributing copies of copyrighted content.
If I am given a copyrighted book, there are plenty of ways that I can exploit that book that are not against copyright. I could make paper airplanes out of its pages. I could burn it for heat. I could even read it and learn from its contents. The one thing I can’t do is distribute copies of it.
It’s not only about copying or distribution, but also use and reproduction. I can buy a legit DVD and play it in my own home and all is fine. Then I play it on my bar’s tv, in front of 100 people, and now it’s illegal. I can listen to a song however many times I want, but I can’t use it for anything other than private listening. In theory you should pay even if you want to make a video montage to show at your wedding.
Right now, most licenses for copyrighted material specify that you may use said material only for personal consumption. To use it for profit, you need a special license.
It’s about making copies, not just distributing them; otherwise I couldn’t be bound by a software EULA, because I wouldn’t need a license to copy the content into my computer’s RAM to run it.
The enforceability of EULAs varies with jurisdiction and with the actual contents of the EULA. It’s by no means a universally accepted thing.
It’s funny how suddenly large chunks of the Internet are cheering on EULAs and copyright enforcement by giant megacorporations because they’ve become convinced that AI is Satan.
I’m absolutely against the idea of EULAs, but the fact remains they are only enforceable because copying is the reserved right, not distribution. If it were distribution, then second-hand sales would be prohibitable (though thanks to going digital-only, that loophole is slowly but surely being pulled shut).
Again, they are not universally enforceable. There are plenty of jurisdictions where they are not.