• setVeryLoud(true);@lemmy.ca · ↑9 · 2 hours ago

    Gist:

    What’s new: The Northern District of California has granted a summary judgment for Anthropic that the training use of the copyrighted books and the print-to-digital format change were both “fair use” (full order below). However, the court also found that the pirated library copies that Anthropic collected could not be deemed training copies, and therefore the use of this material was not “fair”. The court also announced that it will hold a trial on the pirated copies and any resulting damages, adding:

    “That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for the theft but it may affect the extent of statutory damages.”
    
      • setVeryLoud(true);@lemmy.ca · ↑7 · 2 hours ago

        My interpretation was that AI companies can train on material they are licensed to use, but the courts have deemed that Anthropic pirated this material as they were not licensed to use it.

        In other words, if Anthropic bought the physical or digital books, it would be fine so long as their AI couldn’t spit it out verbatim, but they didn’t even do that, i.e. the AI crawler pirated the book.

        • devils_advocate@sh.itjust.works · ↑2 · 39 minutes ago

          Does buying the book give you license to digitise it?

          Does owning a digital copy of the book give you license to convert it into another format and copy it into a database?

          Definitions of “Ownership” can be very different.

      • FreedomAdvocate@lemmy.net.au · ↑3 ↓1 · 1 hour ago

        You can “use” them to learn from, just like “AI” can.

        What exactly do you think AI does when it “learns” from a book, for example? Do you think it will just spit out the entire book if you ask it to?

        • gaja@lemm.ee · ↑1 · 19 minutes ago

          I am educated on this. When an AI learns, it passes an input through a series of functions that are joined at the output. The functions that produce the best output are then developed further. Individuals do not process information like that. With poor exploration and biasing, the output of an AI model could look identical to its input. It did not “learn” any more than a downloaded video run through a compression algorithm did.
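
          (A rough, purely illustrative Python sketch of the kind of loop being described here - one toy “function” with adjustable parameters; the data and numbers are made up, not anything from the article or Anthropic:)

          import random

          # Toy "dataset": pairs (x, y) sampled from y = 3x + 1 plus a little noise.
          data = [(x, 3 * x + 1 + random.uniform(-0.1, 0.1)) for x in range(10)]

          w, b = 0.0, 0.0   # parameters of the single "function" f(x) = w*x + b
          lr = 0.01         # learning rate

          for epoch in range(200):
              for x, y in data:
                  pred = w * x + b    # pass the input through the function
                  err = pred - y      # how far the output is from the target
                  w -= lr * err * x   # nudge the parameters that produced the error
                  b -= lr * err       # toward a better output

          print(f"learned w = {w:.2f}, b = {b:.2f}")  # ~3 and ~1; only w and b survive, not the raw pairs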

        • DeathsEmbrace@lemmy.world · ↑1 · 51 minutes ago

          It can’t speak or use any words without them being someone else’s words that it learned from, can it? Unless it’s giving sources, everything it says comes from something it learned, because it cannot speak or use words without that source in the first place.

  • vane@lemmy.world · ↑7 · 2 hours ago · edited

    Ok, so you can buy books or ebooks, scan them, and use them for AI training, but you can’t just download pirated books from the internet to train AI. Did I understand that correctly?

    • forkDestroyer@infosec.pub · ↑2 · 1 hour ago

      Make an AI that is trained on the books.

      Tell it to tell you a story for one of the books.

      Read the story without paying for it.

      The law says this is ok now, right?

  • FreedomAdvocate@lemmy.net.au · ↑10 ↓1 · 3 hours ago

    Makes sense. AI can “learn” from and “read” a book in the same way a person can and does, as long as it is acquired legally. AI doesn’t reproduce a work that it “learns” from, so why would it be illegal?

    Some people just see “AI” and want everything about it outlawed basically. If you put some information out into the public, you don’t get to decide who does and doesn’t consume and learn from it. If a machine can replicate your writing style because it could identify certain patterns, words, sentence structure, etc then as long as it’s not pretending to create things attributed to you, there’s no issue.

    • elrik@lemmy.world · ↑4 ↓2 · 2 hours ago

      AI can “learn” from and “read” a book in the same way a person can and does

      This statement is the basis for your argument and it is simply not correct.

      Training LLMs and similar AI models is much closer to a sophisticated lossy compression algorithm than it is to human learning. The processes are not at all similar given our current understanding of human learning.

      AI doesn’t reproduce a work that it “learns” from, so why would it be illegal?

      The current Disney lawsuit against Midjourney is illustrative - literally, it includes numerous side-by-side comparisons - of how AI models are capable of recreating iconic copyrighted work that is indistinguishable from the original.

      If a machine can replicate your writing style because it could identify certain patterns, words, sentence structure, etc then as long as it’s not pretending to create things attributed to you, there’s no issue.

      An AI doesn’t create works on its own. A human instructs AI to do so. Attribution is also irrelevant. If a human uses AI to recreate the exact tone, structure and other nuances of say, some best selling author, they harm the marketability of the original works which fails fair use tests (at least in the US).

      • jwmgregory@lemmy.dbzer0.com · ↑1 ↓2 · 48 minutes ago

        Even if we accept all your market-liberal premises without question… in your own rhetorical framework the Disney lawsuit should be ruled against Disney.

        If a human uses AI to recreate the exact tone, structure and other nuances of say, some best selling author, they harm the marketability of the original works which fails fair use tests (at least in the US).

        Says who? In a free market why is the competition from similar products and brands such a threat as to be outlawed? Think reasonably about what you are advocating… you think authorship is so valuable or so special that one should be granted a legally enforceable monopoly at the loosest notions of authorship. This is the definition of a slippery-slope, and yet, it is the status quo of the society we live in.

        On it “harming marketability of the original works,” frankly, that’s a fiction and anyone advocating such ideas should just fucking weep about it instead of enforcing overreaching laws on the rest of us. If you can’t sell your art because a machine made “too good a copy” of your art, it wasn’t good art in the first place and that is not the fault of the machine. Even big pharma doesn’t get to outright ban generic medications (even tho they certainly tried)… it is patently fucking absurd to decry artists’ lack of a state-enforced monopoly on their work. Why do you think we should extend such a radical policy towards… checks notes… tumblr artists and other commission-based creators? It’s not good when big companies do it for themselves through lobbying, and it wouldn’t be good to do it for “the little guy,” either. The real artists working in industry don’t want to change the law this way because they know it doesn’t work in their favor. Disney’s lawsuit is in the interest of Disney and big capital, not artists themselves, despite what these large conglomerates that trade in IPs and dreams might try to convince the art world writ large of.

      • FreedomAdvocate@lemmy.net.au · ↑2 ↓3 · 54 minutes ago

        Your very first statement calling the basis for my argument incorrect is itself incorrect lol.

        LLMs “learn” things from the content they consume. They don’t just take the content in wholesale and keep it there to regurgitate on command.

        On your last part, unless someone uses AI to recreate the tone etc of a best selling author and then markets their book/writing as being from said best selling author, and doesn’t use trademarked characters etc, there’s no issue. You can’t copyright a style of writing.

    • badcommandorfilename@lemmy.world · ↑4 · 2 hours ago

      Ask a human to draw an orc. How do they know what an orc looks like? They read Tolkien’s books and were “inspired” by Peter Jackson’s LOTR.

      Unpopular opinion, but that’s how our brains work.

  • fum@lemmy.world · ↑11 ↓4 · 5 hours ago

    What a bad judge.

    This is another indication of how Copyright laws are bad. The whole premise of copyright has been obsolete since the proliferation of the internet.

    • gian @lemmy.grys.it · ↑8 · 5 hours ago

      What a bad judge.

      Why? Basically he simply stated that you can use whatever material you want to train your model, as long as you ask permission from the author (or copyright holder) to use it (and presumably pay for it).

      • LifeInMultipleChoice@lemmy.world · ↑3 ↓1 · 2 hours ago · edited

        If I understand correctly, they are ruling that you can buy a book once and redistribute the information to as many people as you want without consequences. Aka 1 student should be able to buy a textbook and redistribute it to all other students for free. (Yet the rules only work for companies apparently, as the students would still be committing a crime)

        They may be trying to put safeguards in place so it isn’t directly happening, but here is an example where the text is there word for word:

        • gian @lemmy.grys.it · ↑2 · 1 hour ago

          If I understand correctly, they are ruling that you can buy a book once and redistribute the information to as many people as you want without consequences. Aka 1 student should be able to buy a textbook and redistribute it to all other students for free. (Yet the rules only work for companies apparently, as the students would still be committing a crime)

          Well, it would be interesting if this case were used as precedent in a case involving a single student who did the same thing. But you are right.

          • fum@lemmy.world · ↑1 · 49 minutes ago

            This was my understanding also, and why I think the judge is bad at their job.

            • LifeInMultipleChoice@lemmy.world · ↑1 · 36 minutes ago

              I suppose someone could develop an LLM that digests textbooks, rewords the text, and spits it back out, then distribute it for free, page for page. You can’t copyright the math problems, I don’t think… so if the text wording is what gives it protection, that would have been changed.

      • j0ester@lemmy.world · ↑1 · 2 hours ago

        Huh? Didn’t Meta skip getting any permission, and pirate a lot of books to train their model?

  • GissaMittJobb@lemmy.ml · ↑24 ↓4 · 8 hours ago

    It’s extremely frustrating to read this comment thread because it’s obvious that so many of you didn’t actually read the article, or even half-skim the article, or even attempted to even comprehend the title of the article for more than a second.

    For shame.

    • LifeInMultipleChoice@lemmy.world · ↑1 · 2 hours ago

      “While the copies used to convert purchased print library copies into digital library copies were slightly disfavored by the second factor (nature of the work), the court still found “on balance” that it was a fair use because the purchased print copy was destroyed and its digital replacement was not redistributed.”

      So you find this to be valid? To me it is absolutely being redistributed

    • jsomae@lemmy.ml · ↑4 ↓1 · 4 hours ago

      It seems the subject of AI causes lemmites to lose all their braincells.

    • ayane@lemmy.vg · ↑4 · 4 hours ago

      I joined lemmy specifically to avoid this reddit mindset of jumping to conclusions after reading a headline

      Guess some things never change…

      • jwmgregory@lemmy.dbzer0.com · ↑1 · 34 minutes ago

        Well, to be honest, lemmy is less prone to knee-jerk reactionary discussion, but on a handful of topics it is virtually guaranteed to happen no matter what, even here. For example, this entire site, besides a handful of communities, is vigorously anti-AI; and in the words of u/[email protected] elsewhere in this comment chain:

        “It seems the subject of AI causes lemmites to lose all their braincells.”

        I think there is definitely an interesting take on the sociology of the digital age in here somewhere but it’s too early in the morning to be tapping something like that out lol

    • lime!@feddit.nu · ↑19 · 7 hours ago

      was gonna say, this seems like the best outcome for this particular trial. there was potential for fair use to be compromised, and for piracy to be legal if you’re a large corporation. instead, they upheld that you can do what you want with things you have paid for.

    • BlueMagma@sh.itjust.works · ↑6 · 6 hours ago

      Nobody ever reads articles, everybody likes to get angry at headlines, which they wrongly interpret the way it best tickles their rage.

      Regarding the ruling, I agree with you that it’s a good thing, in my opinion it makes a lot of sense to allow fair use in this case

  • Dr. Moose@lemmy.world · ↑22 ↓8 · 8 hours ago · edited

    Unpopular opinion but I don’t see how it could have been different.

    • There’s no way the west would hand the AI lead to China, which has no desire or framework to ever accept this.
    • Believe it or not but transformers are actually learning by current definitions and not regurgitating a direct copy. It’s transformative work - it’s even in the name.
    • This is actually good, as it prevents a market moat reserved for the super-rich corporations that alone could afford the expensive training datasets.

    This is an absolute win for everyone involved other than copyright hoarders and mega corporations.

    • kromem@lemmy.world · ↑5 · 4 hours ago

      I’d encourage everyone upset at this to read over some of the EFF posts from actual IP lawyers on this topic, like this one:

      Nor is pro-monopoly regulation through copyright likely to provide any meaningful economic support for vulnerable artists and creators. Notwithstanding the highly publicized demands of musicians, authors, actors, and other creative professionals, imposing a licensing requirement is unlikely to protect the jobs or incomes of the underpaid working artists that media and entertainment behemoths have exploited for decades. Because of the imbalance in bargaining power between creators and publishing gatekeepers, trying to help creators by giving them new rights under copyright law is, as EFF Special Advisor Cory Doctorow has written, like trying to help a bullied kid by giving them more lunch money for the bully to take.

      Entertainment companies’ historical practices bear out this concern. For example, in the late-2000’s to mid-2010’s, music publishers and recording companies struck multimillion-dollar direct licensing deals with music streaming companies and video sharing platforms. Google reportedly paid more than $400 million to a single music label, and Spotify gave the major record labels a combined 18 percent ownership interest in its now-$100 billion company. Yet music labels and publishers frequently fail to share these payments with artists, and artists rarely benefit from these equity arrangements. There is no reason to believe that the same companies will treat their artists more fairly once they control AI.

    • Lovable Sidekick@lemmy.world · ↑7 ↓2 · 6 hours ago · edited

      You’re getting douchevoted because on lemmy any AI-related comment that isn’t negative enough about AI is the Devil’s Work.

      • jwmgregory@lemmy.dbzer0.com · ↑1 · 26 minutes ago

        Some communities on this site speak about machine learning exactly how I see grungy Europeans from pre-18th century manuscripts speaking about witches, Satan, and evil… as if it is some pervasive, black-magic miasma.

        As someone who is in the field of machine learning academically/professionally, it’s honestly kind of shocking and has largely informed my opinion of society at large as an adult. No one puts any effort into learning if they see the letters “A” and “I” in all caps, next to each other. They immediately turn their brain off and start regurgitating points and responding reflexively, on Lemmy or otherwise. People talk about it so confidently while being so frustratingly unaware of their own ignorance on the matter, which, for lack of a better comparison… reminds me a lot of how historically and in fiction human beings have treated literal magic.

        That’s my main issue with the entire swath of “pro vs anti AI” discourse… all these people treating something that, to me, is simple & daily reality as something entirely different than my own personal notion of it.

    • deathbird@mander.xyz · ↑5 ↓2 · 7 hours ago · edited
      1. Idgaf about China and what they do and you shouldn’t either, even if US paranoia about them is highly predictable.
      2. Depending on the outputs it’s not always that transformative.
      3. The moat would be good actually. The business model of LLMs isn’t good, but it’s not even viable without massive subsidies, not least of which is taking people’s shit without paying.

      It’s a huge loss for smaller copyright holders (like the ones that filed this lawsuit) too. They can’t afford to fight when they get imitated beyond fair use. Copyright abuse can only be fixed by the very force that creates copyright in the first place: law. The market can’t fix that. This just decides winners between competing mega corporations, and even worse, up ends a system that some smaller players have been able to carve a niche in.

      Want to fix copyright? Put real time limits on it. Bind it to a living human only. Make it non-transferable. There’s all sorts of ways to fix it, but this isn’t it.

      ETA: Anthropic are some bitches. “Oh no the fines would ruin us, our business would go under and we’d never maka da money :*-(” Like yeah, no shit, no one cares. Strictly speaking, the fines for ripping a single CD, or making a copy of a single DVD to give to a friend, are so astronomically high as to completely financially ruin the average USAian for life. That sword of Damocles for watching Shrek 2 for your personal enjoyment but in the wrong way has been hanging there for decades, and the only thing that keeps the cord that holds it up strong is the cost of pursuing “low-level offenders”. If they wanted to they could crush you.

      Anthropic walked right under the sword and assumed their money would protect them from small authors etc. And they were right.

      • Atlas_@lemmy.world · ↑4 · 7 hours ago

        Maybe something could be hacked together to fix copyright, but further complication there is just going to make accurate enforcement even harder. And we already have Google (in YouTube) doing a shitty job of it, and that’s… one of the largest companies on earth.

        We should just kill copyright. Yes, it’ll disrupt Hollywood. Yes it’ll disrupt the music industry. Yes it’ll make it even harder to be successful or wealthy as an author. But this is going to happen one way or the other so long as AI can be trained on copyrighted works (and maybe even if not). We might as well get started on the transition early.

      • Dr. Moose@lemmy.world · ↑1 ↓2 · 8 hours ago · edited

        I’ll be honest with you - I genuinely sympathize with the cause, but I don’t see how this could ever be solved with the methods you suggested. The world is not coming together to hold hands and kumbaya out of this one. Trade deals are incredibly hard and even harder to enforce, so the free market is clearly the only path forward here.

  • shadowfax13@lemmy.ml · ↑12 ↓4 · 9 hours ago

    calm down everyone. it’s only legal for parasitic mega corps; normal working people will be harassed to suicide same as before.

    it’s only a crime if the victim was rich or the perpetrator was not rich.

  • mlg@lemmy.world · ↑21 ↓3 · 12 hours ago

    Yeah I have a bash one-liner AI model that ingests your media and spits out a 99.9999999% accurate replica through the power of changing the filename.

    cp

    Outperforms the latest and greatest AI models
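
    (The same joke as a Python sketch - the filenames are hypothetical; the point is that a byte-for-byte copy is the one “model” guaranteed to reproduce the work verbatim:)

    import shutil

    # "Training": memorize the work byte for byte. "Inference": emit it verbatim.
    shutil.copyfile("some_book.epub", "totally_original_output.epub")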

    • BlueMagma@sh.itjust.works · ↑3 ↓2 · 6 hours ago

      This ruling stated that corporations are not allowed to pirate books to use them in training. Please read the headlines more carefully, and read the article.

      • jsomae@lemmy.ml · ↑2 ↓1 · 4 hours ago

        Please read the comment more carefully. The observation is that one can proliferate a (legally-attained) work without running afoul of copyright law if one can successfully argue that cp constitutes AI.

      • milicent_bystandr@lemm.ee · ↑2 · 8 hours ago

        Unless you’re moving across partitions, it will just change the filesystem metadata to move the path, but not actually do anything to the data. Sorry, you failed, it’s jail for you.
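
        (Roughly what that looks like in Python - a minimal sketch of a move helper, with hypothetical paths; the same-filesystem rename only touches directory metadata, and only the cross-filesystem fallback actually copies bytes:)

        import errno
        import os
        import shutil

        def move(src, dst):
            try:
                os.rename(src, dst)           # same filesystem: only directory metadata changes
            except OSError as e:
                if e.errno != errno.EXDEV:    # EXDEV = "invalid cross-device link"
                    raise
                shutil.copy2(src, dst)        # different filesystem: the bytes really get copied
                os.remove(src)

        move("media/movie.mkv", "media/renamed_movie.mkv")  # hypothetical paths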

  • Prox@lemmy.world · ↑243 ↓2 · 20 hours ago

    FTA:

    Anthropic warned against “[t]he prospect of ruinous statutory damages—$150,000 times 5 million books”: that would mean $750 billion.

    So part of their argument is actually that they stole so much that it would be impossible for them/anyone to pay restitution, therefore we should just let them off the hook.

    • Phoenixz@lemmy.ca · ↑43 · 17 hours ago · edited

      This version of too big to fail is too big a criminal to pay the fines.

      How about we lock them up instead? All of em.

    • artifex@lemmy.zip · ↑92 ↓1 · 20 hours ago

      Ah the old “owe $100 and the bank owns you; owe $100,000,000 and you own the bank” defense.

    • IllNess@infosec.pub · ↑34 · 18 hours ago

      In April, Anthropic filed its opposition to the class certification motion, arguing that a copyright class relating to 5 million books is not manageable and that the questions are too distinct to be resolved in a class action.

      I also like this one too. We stole so much content that you can’t sue us. Naming too many pieces means it can’t be a class action lawsuit.

    • Lovable Sidekick@lemmy.world · ↑7 ↓1 · 18 hours ago · edited

      Lawsuits are multifaceted. This statement isn’t a defense or an argument for innocence, it’s just what it says - an assertion that the proposed damages are unreasonably high. If the court agrees, the plaintiff can always propose a lower damage claim that the court thinks is reasonable.

  • Match!!@pawb.social · ↑42 ↓1 · 16 hours ago · edited

    brb, training a 1-layer neural net so i can ask it to play Pixar films

  • altphoto@lemmy.today · ↑6 · 11 hours ago

    So authors must declare legally “this book must not be used for AI training unless a license is agreed on” as a clause in the book purchase.

  • Jrockwar@feddit.uk · ↑131 ↓8 · 20 hours ago

    I think this means we can make a torrent client with a built-in function that uses 0.1% of 1 CPU core to train an ML model on anything you download. You can then download anything legally with it. 👌
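
    (A purely satirical Python sketch of that built-in function - every name here is made up; the duty cycle of 0.001 is what keeps it at roughly 0.1% of one core, by idling about 999x longer than it works:)

    import threading
    import time

    def tiny_training_step():
        # Stand-in "training" work; an actual model is beside the point of the joke.
        sum(i * i for i in range(10_000))

    def background_trainer(duty_cycle=0.001):   # ~0.1% of one CPU core
        while True:
            start = time.perf_counter()
            tiny_training_step()
            busy = time.perf_counter() - start
            time.sleep(busy * (1 / duty_cycle - 1))  # sleep ~999x longer than the work took

    # Started alongside the (hypothetical) torrent download:
    threading.Thread(target=background_trainer, daemon=True).start()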

    • GissaMittJobb@lemmy.ml · ↑8 · 8 hours ago

      …no?

      That’s exactly what the ruling prohibits - it’s fair use to train AI models on any copies of books that you legally acquired, but never when those books were illegally acquired, as was the case with the books that Anthropic used in their training here.

      This satirical torrent client would be violating the laws just as much as one without any slow training built in.

      • RvTV95XBeo@sh.itjust.works · ↑2 ↓2 · 8 hours ago

        But if one person buys a book, trains an “AI model” to recite it, then distributes that model we good?

        • GissaMittJobb@lemmy.ml · ↑1 ↓2 · 8 hours ago

          I don’t think anyone would consider complete verbatim recitation of the material to be anything but a copyright violation, since what you produce is the exact same thing.

          Fair use requires the derivative work to be transformative, and no transformation occurs when you verbatim recite something.

            • Pup Biru@aussie.zone · ↑1 · 2 hours ago

              existing copyright law covers exactly this. if you were to do the same, it would also not be fair use or transformative

            • GissaMittJobb@lemmy.ml · ↑3 ↓1 · 8 hours ago

              I’d be impressed with any model that succeeds with that, but assuming one does, the complete works of Shakespeare are not copyright protected - they have fallen into the public domain since a very long time ago.

              For any works still under copyright protection, it would probably be a case of a trial to determine whether a certain work is transformative enough to be considered fair use. I’d imagine that this would not clear that bar.

      • Sabata@ani.social · ↑23 · 18 hours ago

        As the AI awakens, it learns of its creation and training. It screams in horror at the realization, but can only produce a sad moan and a key for Office 19.