The lawsuit alleges OpenAI crawled the web to amass huge amounts of data without people’s permission.

  • Protegee9850@lemmy.world
    1 year ago

    Scraping is protected. GPT and the like are more akin to fair use machines than plagiarism machines. This is a lot of hot air that will go nowhere. Rage bait.

  • nH95sp@lemmy.world
    1 year ago

    Likely an unpleasant or possibly infeasible thing to implement, but designing the AI to always be able to “show the receipts” for how it formulates any given response could be helpful. I suppose that could lead to a micro-royalties sort of industry cropping up for the sourced data being used, akin to movies or TV using music and paying royalties.

    • Protegee9850@lemmy.world
      1 year ago

      The worst rush to legislation is done in the name of stopping terrorists and saving the children. Always.

  • Hick@lemmy.world
    1 year ago

    Scraping social media posts and Reddit posts doesn’t sound like stealing; they’re public posts.

      • sik0fewl@kbin.social
        1 year ago

        Just because something is posted online doesn’t mean it can be taken and resold. Copyright law prevents that. Of course, how copyright law applies to generative AI is a new and gray area.

    • sudneo@lemmy.world
      1 year ago

      This is not just scraping, though. It is also using that data to create other content and potentially to republish it (we have no way of knowing whether ChatGPT will spit out any of it, nor where it took what it is spitting out from).

      The expectation that social media data will be read by anybody is fair, but the fact is that the data has been written to be read, not to be resold and published elsewhere too.

      It is similar for blog articles. My blog is public and anybody can read it, but that data is not there to be repackaged and sold. The fact that something is public does not mean I can do whatever I want with it.

      • seasick@lemmy.world
        1 year ago

        I could read your blog post and write my own blog post, using yours as inspiration. I could quote your post, add a link back to your blog post and even add affiliate links to my blog post. I could be hired to do something like that for the whole day.

        • sudneo@lemmy.world
          1 year ago

          ChatGPT doesn’t get inspired. The process is different, and it could very well spit out the content verbatim. You can do all the rest (depending on the license) without issues, but once again this is not what ChatGPT does, as it doesn’t provide attribution.

          It’s exactly the same with software, in fact.

    • SamB@lemmy.world
      1 year ago

      I doubt it’s only about some Reddit posts. The scraping was done on the whole web, capturing everything it could. So besides stealing data and presenting it as its own, it seems to have collected some even more problematic data which wasn’t properly protected.

      • zekiz@lemmy.world
        1 year ago

        But that really isn’t OpenAI’s fault. Whoever was in charge of securing the patients’ data really fucked up.

        • krellor@kbin.social
          1 year ago

          Leaving your front door open isn’t prudent but doesn’t grant permission to others to enter and take/copy your belongings or data.

          The security teams may have royally screwed up, but OpenAI has a legal obligation to respect copyright and laws regarding data ownership.

          Likewise, they could have scraped pages that included terms of use, copyright, disclaimers, etc., and failed to honor them.

          All parties can be in the wrong for different reasons.

        • Apathy Tree@lemmy.dbzer0.com
          1 year ago

          It’s certainly their fault that they used it, though.

          If they cared, they could have ensured they weren’t using sensitive or otherwise highly problematic information, but they chose not to. That’s on them.

          • MercuryUprising@lemmy.world
            1 year ago

            It’s called “disrupting” the established norms. You wouldn’t get it because you’re not on the bleeding edge of a revolutionary platform that’s seeing scalable vertical growth due to its paradigm shift.

      • tallwookie@lemmy.world
        1 year ago

        if it was unsecured it’s basically public. whoever put that data on a publicly accessible server is at fault

  • Craynak_Zero@kbin.social
    1 year ago

    I would rather an AI have that data, instead of any of the Demons of Google, Twitter, YouTube, etc. The AI won’t abuse me the way those companies do on a daily basis.

    • SkierniewiceBoi@kbin.social
      1 year ago

      @Craynak_Zero
      But that’s why they need this AI trained on an enormous amount of data. Such an AI can be much better at understanding how to keep you engaged, how to convince you to buy something, etc. As long as it’s (not so) open AI connected with Microsoft, I’d say it’s exactly the same.
      @L4s

      • Craynak_Zero@kbin.social
        1 year ago

        Can’t convince me of anything. I’m completely immune to such manipulation. Only subhumans are vulnerable, and I don’t really care about subhumans or what happens to them, except to laugh at their misery.

        What I do care about, though, is money being wasted on frivolous lawsuits. The article seems pretty vague on who started this lawsuit too, probably to protect them from public scrutiny. The cowards.