A lawsuit claims Google took people’s data without their knowledge or consent to train its AI products, including chatbot Bard.

  • plebeian_@lemmy.ml · +22/−5 · 1 year ago

    Search engines point to your site, though; you're getting something back. An LLM won't give a reference. It's something else altogether.

    And there is no "robots.txt" to block LLM training scrapers.

    Just because you publish something doesn’t imply you forfeit copyright.

    • Touching_Grass@lemmy.world · +15/−6 · edited · 1 year ago

      Their work isn't being reproduced and sold. Seems like fair use. I hate to say it, but I'm with Google on this. Things would get much worse if these lawsuits succeeded.

      • AbidanYre@lemmy.world · +11/−4 · 1 year ago

        No, but it is being used commercially for a profit.

        This seems like a situation copyright law never saw coming.

        • fubo@lemmy.world · +8/−2 · 1 year ago

          If I read a bunch of copyrighted books, and answer questions based on the knowledge I have acquired from them, I do not owe the authors anything.

          • diggit@sh.itjust.works · +2/−1 · 1 year ago

            TLDR: maybe it’s like a library? Libraries pay for books, even digital copies.

            Presumably somebody bought a copy of the book, even if you found it on the coffee table.

            This seems more like going through the trash for anything legible, reading billboards, and taking free newspapers. It just happens that a lot of the stuff put out at the curb was copyrighted material. In fact, almost every website has © in the footer, so clearly the sentiment is "don't copy my original content", especially without credit. But if the AI is not reproducing the copyrighted material, in whole or in part, then it does seem a bit late to try to claw back value just because someone else found a way to monetize what you put out on the open web. I think that's what's going to have to be proven, one way or another.

            Maybe another way to look at an LLM is as an enormous library, but instead of borrowing books and periodicals, as a user you are borrowing the pre-digested knowledge directly. Libraries have complex agreements in place with publishers so that rights holders are compensated. Say what you will about those contracts, but they are a precedent. What is perhaps without precedent is how to handle the rest of the trash this library is indiscriminately gathering up.

        • Touching_Grass@lemmy.world · +5/−3 · 1 year ago

          And that's fair use. But I'm more on the side that a lot more things should count as fair use, and that modern content creators have ruined the internet. I would much prefer it if the Sarah Silvermans and others realized they can't both use the internet for free promotion and prevent specific people from consuming that content. I'd love it if they all left. I don't need these people as much as they need us.

    • fubo@lemmy.world · +11/−3 · edited · 1 year ago

      See my edit. You own the copyright to your work, but you do not own ① facts about your work or ② facts contained in your work. The creators of reference works, for example, cannot charge a royalty to people who learn information from those works.

      > And there is no "robots.txt" to block LLM training scrapers.

      robots.txt is consulted by all manner of automated tools, not just search-engine indexers.
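
      For what it's worth, here's a minimal sketch of how any well-behaved automated fetcher, whether a search indexer or an LLM training scraper, can consult robots.txt before downloading a page, using Python's standard urllib.robotparser. The "ExampleTrainingBot" user-agent token and the example.com URLs are made up for illustration, and nothing forces a scraper to actually perform this check.

      ```python
      import urllib.robotparser

      # Fetch and parse the site's robots.txt.
      rp = urllib.robotparser.RobotFileParser()
      rp.set_url("https://example.com/robots.txt")
      rp.read()

      # Ask whether this (hypothetical) user-agent may fetch a given page.
      url = "https://example.com/articles/some-post"
      if rp.can_fetch("ExampleTrainingBot", url):
          print("robots.txt allows fetching", url)
      else:
          print("robots.txt disallows fetching", url)
      ```

      Whether a given training scraper runs a check like this is entirely up to its operator; robots.txt is a convention, not an enforcement mechanism.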