• corstian@lemmy.world
    1 year ago

    Am I the only one thinking these trust tokens are not going to prevent bots from scraping websites?

    Eventually, somewhere, someone will just develop the infrastructure to work their way around this, right?

    • diyrebel@lemmy.dbzer0.com
      1 year ago

      It would stop beneficial bots like the ones I create① as a small-time hobbyist, because the little guy does not have the resources for this arms race. You may be right when it comes to large-scale scraping ops run by a business (e.g. scraping Ryanair or Southwest Airlines so an airfare consolidation site can show more fares).

      ① e.g. I wrote a bot that scraped the real estate market sites, scraped the public transport sites, and found me a house with the shortest public transport commute.
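
      A minimal sketch of the selection step of such a bot, in Python; this is not the original code. The scraping itself is left out, and the `Listing` type, the `shortest_commute` helper, the addresses, the URLs, and the commute times are all hypothetical stand-ins for whatever the scrapers would return.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Listing:
    """One scraped real estate listing (hypothetical shape)."""
    address: str
    url: str


def shortest_commute(
    listings: list[Listing],
    commute_minutes: dict[str, int],
) -> Optional[Listing]:
    """Return the listing with the shortest known commute, or None.

    Listings with no commute time (e.g. the transport lookup failed)
    are skipped rather than guessed at.
    """
    known = [item for item in listings if item.address in commute_minutes]
    return min(known, key=lambda item: commute_minutes[item.address], default=None)


# Hypothetical data standing in for the output of the two scrapers.
listings = [
    Listing("Street A 1", "https://example.org/a"),
    Listing("Street B 2", "https://example.org/b"),
    Listing("Street C 3", "https://example.org/c"),
    Listing("Street D 4", "https://example.org/d"),  # no commute data
]
commute_minutes = {"Street A 1": 42, "Street B 2": 25, "Street C 3": 58}

best = shortest_commute(listings, commute_minutes)
print(best)
```

      Keeping the ranking separate from the scraping means the selection logic keeps working even when a target site changes its pages and a scraper has to be rewritten.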

        • diyrebel@lemmy.dbzer0.com
          1 year ago

          It was coded 8 years ago in Tcl① for a one-off project in Belgium. Would you really be interested?

          The APIs will have changed dramatically by now & some of the real estate sites no longer exist. Some of the sites have brought in CAPTCHAs. It was coded to use Tor, and the public transport site is now Tor-hostile and has also changed its API. It’s also very user-unfriendly… a collection of scripts & a variety of hacks.

          I didn’t publish the code at the time because I worried that it would trigger the target sites to become bot-hostile.

          ① Also note that I use Tcl for personal use, but I resist publishing any Tcl code because I would rather not promote the Tcl language. Why? Because the Tcl project has jailed a large portion of its docs in Cloudflare’s walled garden. I believe programming language docs should be openly public.

        • diyrebel@lemmy.dbzer0.com
          1 year ago

          Perhaps not at all.

          But the limitation of using #Selenium is a big one. Being forced to work in Java (or another language with Selenium bindings), forced to use the resource hog of a modern GUI browser, forced to reveal more browser-fingerprint info, being browser-dependent, etc. Selenium is my last choice, for when desperation is sufficiently high.