AI crawlers now solve the Anubis challenges crawling Codeberg
(social.anoxinon.de)
22 points by moelf 38 minutes ago
Last time I checked, Anubis used SHA-256 for PoW. SHA-256 is very GPU/ASIC friendly, so there's a big disparity between the compute available to a legit browser and to a datacentre-scale scraping operation. A more memory-hard "mining" algorithm could help.
by Retr0id 1 minute ago
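For illustration, a memory-hard challenge along those lines might look like the sketch below. This is my own sketch, not Anubis's scheme: it assumes the argon2-cffi package, and the difficulty and Argon2 parameters are illustrative.

    # Sketch: a memory-hard PoW using Argon2id instead of SHA-256.
    # Assumes the argon2-cffi package; difficulty and parameters are
    # illustrative, not anything Anubis actually ships.
    from argon2.low_level import hash_secret_raw, Type

    def solve(challenge: bytes, difficulty_bits: int = 8) -> int:
        """Find a nonce whose Argon2id digest has difficulty_bits leading zero bits."""
        target = 1 << (256 - difficulty_bits)
        nonce = 0
        while True:
            digest = hash_secret_raw(
                secret=nonce.to_bytes(8, "big"),
                salt=challenge,            # server-supplied, 16+ bytes
                time_cost=2,
                memory_cost=64 * 1024,     # 64 MiB per hash: hostile to GPUs/ASICs
                parallelism=1,
                hash_len=32,
                type=Type.ID,
            )
            if int.from_bytes(digest, "big") < target:
                return nonce
            nonce += 1

    print(solve(b"server-challenge-bytes"))

The 64 MiB working set is the point: a GPU with 24 GB of memory can run only a few hundred of these in parallel, versus millions of SHA-256 pipelines.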
Crazy thought, but what if you made the work required to access the site equal the work required to host the site? Host the public part of the database on something like BitTorrent, render the website from the db locally. Not easy, but maybe possible?
by hyghjiyhu 1 minute ago
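The "render locally" half of that idea is the easy part; here's a toy sketch, assuming a SQLite snapshot fetched out of band (e.g. over BitTorrent) with a hypothetical pages(slug, title, body) table:

    # Toy sketch: serve a site from a local db snapshot.
    # "site-snapshot.db" and the pages(slug, title, body) schema are
    # hypothetical; the snapshot would arrive via BitTorrent or similar.
    import sqlite3
    from http.server import BaseHTTPRequestHandler, HTTPServer

    db = sqlite3.connect("site-snapshot.db", check_same_thread=False)

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            row = db.execute(
                "SELECT title, body FROM pages WHERE slug = ?",
                (self.path.lstrip("/"),),
            ).fetchone()
            if row is None:
                self.send_error(404)
                return
            page = f"<h1>{row[0]}</h1>\n{row[1]}".encode()
            self.send_response(200)
            self.send_header("Content-Type", "text/html; charset=utf-8")
            self.end_headers()
            self.wfile.write(page)

    HTTPServer(("127.0.0.1", 8080), Handler).serve_forever()

The hard parts this skips are distribution, freshness, and anything dynamic.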
I'm calling it now: this is the beginning of all of the remaining non-commercial properties on the web either going away or getting hidden inside some trusted overlay network. Unless the "AI" race slows down or changes, or some other act of god happens, the incentives are aligned such that I foresee wide swaths of the net getting flogged to death.
by rpcope1 2 minutes ago
This was beyond predictable. The monetary cost of proof of work is several orders of magnitude too small to deter scraping (let alone higher-yield abuse), and passing the challenges requires no technical finesse, basically by construction.
by jsnell 5 minutes ago
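Back-of-envelope numbers for that claim (my own assumptions, not from the comment): with the default difficulty of 5 leading zero hex digits, a challenge takes about 16^5 ≈ 1M SHA-256 hashes on average, and a single commodity GPU manages on the order of 10^9 hashes per second.

    # Rough cost of one default-difficulty Anubis challenge for a scraper.
    # Assumptions (mine): ~1e9 SHA-256/s on one GPU, ~$1/hour of GPU time.
    expected_hashes = 16 ** 5                 # ~1.05M hashes on average
    gpu_hash_rate = 1e9                       # hashes per second
    seconds = expected_hashes / gpu_hash_rate # ~1 millisecond
    dollars = 1.0 * seconds / 3600            # at $1 per GPU-hour
    print(f"{seconds * 1000:.2f} ms, ${dollars:.1e} per challenge")
    # ~1 ms and ~3e-7 dollars: over 3 million pages per dollar.

Under those assumptions the PoW adds roughly thirty cents per million pages scraped.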
We need to revive 402 Payment Required, clearly. If we lived in a world where we could easily set up a small trusted online balance for microtransactions that's interoperable with everyone, and where giving others a literal penny for their thoughts would cost a person nothing yet run up a significant bill for abusers at scale, I'd gladly play along.
by zahlman 0 minutes ago
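Mechanically, the HTTP side of this has been reserved since HTTP/1.1; what's missing is the payment rail. A sketch of the gate, where the X-Payment-Token header and its validation are entirely hypothetical:

    # Sketch: gating a resource behind HTTP 402. The "X-Payment-Token"
    # header and token_is_paid() are hypothetical stand-ins; there is no
    # interoperable micropayment scheme to plug in here today.
    from http.server import BaseHTTPRequestHandler, HTTPServer

    def token_is_paid(token):
        # Placeholder: would verify a balance or receipt with a provider.
        return token == "paid-demo-token"

    class PaywallHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if not token_is_paid(self.headers.get("X-Payment-Token")):
                self.send_response(402)  # Payment Required
                self.end_headers()
                self.wfile.write(b"Payment Required: send X-Payment-Token\n")
                return
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"A penny's worth of thoughts.\n")

    HTTPServer(("127.0.0.1", 8080), PaywallHandler).serve_forever()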
> Anubis sits in the background and weighs the risk of incoming requests. If it asks a client to complete a challenge, no user interaction is required.
> Anubis uses a proof-of-work challenge to ensure that clients are using a modern browser and are able to calculate SHA-256 checksums. Anubis has a customizable difficulty for this proof-of-work challenge, but defaults to 5 leading zeroes.
When I go to Codeberg or any other site using it, I'm never asked to perform any kind of in-browser task. It just has my browser run some JavaScript to do that calculation, or uses a signed JWT so the result is cached.
Why shouldn't an automated agent be able to deal with that just as easily, by feeding that JavaScript to its own interpreter?
by zahlman 4 minutes ago
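For concreteness, the agent doesn't even need a JS interpreter; the challenge as quoted reduces to a nonce search anyone can reimplement. A sketch (the exact input encoding Anubis hashes may differ; 5 leading zero hex digits per the quoted default):

    # Sketch: solving an Anubis-style PoW with no browser at all. The
    # real input encoding may differ from challenge+nonce concatenation;
    # the point is that no JavaScript engine is required.
    import hashlib
    from itertools import count

    def solve(challenge: str, difficulty: int = 5) -> int:
        prefix = "0" * difficulty
        for nonce in count():
            digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
            if digest.startswith(prefix):
                return nonce  # submitted back in exchange for the signed JWT

    print(solve("example-challenge-string"))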