
In addition, he noted, “these instances from China, they could have an effect on [corporate] sites that use WordPress, or sites that are badly maintained. These businesses could find that they have intellectual property or previously hidden intranet messages now being used in AI language models.”
Google is alone in mass scraping of training data for AI, he added; other AI businesses are taking a different approach to acquiring it. “Anthropic and OpenAI used to do a lot of scraping,” he said, “but that has changed in the last year. ChatGPT still relies heavily on scraping, but is now reducing it. And we’re seeing a massive reduction in Anthropic’s use; it’s not absolutely clear what Claude is doing, but it looks like they’re not scraping whole websites, but selecting individual pages.”
Nevertheless, IP lawyer Sigmon noted that it’s not yet possible to say what’s going to happen in the court case. “Big picture, despite the internet being around for quite some time, there’s a bit of a dearth of good case law on web scraping, especially in the manner it’s conducted today,” he said. “SerpApi’s argument might help the court begin to chew on some of those nuances, but I wouldn’t necessarily characterize it as an easy win.”
