Threat Spotlight: The good, the bad, and the ‘gray bots’


This edition of the Threat Spotlight focuses on gray bots. Bots are automated software programs designed to carry out online activities at scale. There are good bots (such as search engine crawlers, SEO bots, and customer service bots) and bad bots, which are designed for malicious or harmful activities such as breaching accounts to steal personal data or commit fraud.

In the space between them are what Barracuda calls “gray bots.” Generative artificial intelligence (Gen AI) scraper bots are gray bots designed to extract, or scrape, large volumes of data from websites, often to train generative AI models. Other examples of gray bots include web scraper bots and automated content aggregators that collect web content such as news, reviews, and travel offers.

Gray bots are blurring the boundaries of legitimate activity. They are not overtly malicious, but their approach can be questionable. Some are highly aggressive.

We recently reported on how organizations can better protect their web applications, including websites, against Gen AI scraper bots. In this report, we examine what the data tells us about Gen AI gray bot activity facing organizations today.

Gray bots are hungry


Barracuda detection data shows that:

  • Between December 2024 and the end of February 2025, web applications received millions of requests from Gen AI bots, including ClaudeBot and TikTok’s Bytespider bot.
  • One tracked web application received 9.7 million Gen AI scraper bot requests over a period of 30 days.
  • Another tracked web application received over half a million Gen AI scraper bot requests in a single day.
  • Analysis of the gray bot traffic targeting a third tracked web application found that requests remained relatively consistent over 24 hours, averaging around 17,000 requests an hour.

This consistency was unexpected. It is generally assumed, and often the case, that gray bot traffic comes in waves, hitting a website for anything from a few minutes to an hour or so before dropping off. Both patterns, constant bombardment and unpredictable ad hoc surges, present challenges for web applications.
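The figures above are quoted over different periods. As a rough sketch, this Python snippet puts them on a common footing. It treats “over half a million” as exactly 500,000, so that figure is a lower bound. It also assumes each total was spread evenly across its period, which the data does not claim.

```python
HOURS_PER_DAY = 24


def hourly_rate(total_requests: float, days: float) -> float:
    """Average requests per hour, assuming an even spread over `days` days."""
    return total_requests / (days * HOURS_PER_DAY)


def daily_total(requests_per_hour: float) -> float:
    """Requests per day implied by a steady hourly rate."""
    return requests_per_hour * HOURS_PER_DAY


# Figures quoted in the bullets above.
thirty_day_app = hourly_rate(9_700_000, days=30)  # 9.7M requests in 30 days
single_day_app = hourly_rate(500_000, days=1)     # "over half a million" in one day (lower bound)
steady_app_daily = daily_total(17_000)            # ~17,000 requests an hour for 24 hours

print(f"30-day app:     ~{thirty_day_app:,.0f} requests/hour")
print(f"single-day app: >= ~{single_day_app:,.0f} requests/hour")
print(f"steady app:     ~{steady_app_daily:,.0f} requests/day")
```

On these assumptions, the 30-day total works out to roughly 13,500 requests an hour. The single-day figure comes to at least about 20,800 an hour. The steady 17,000-an-hour stream adds up to roughly 408,000 requests a day.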

Business impact

Gray bots can be aggressive when collecting data and may take information without permission. Their activity can overwhelm web applications, disrupt operations, and harvest vast volumes of proprietary creative or commercial data.

Scraping copyright-protected data and then using it to train AI models may violate the legal rights of the data’s owners.

Frequent scraping increases server load, which can degrade the performance of web applications and the user experience.

They can also increase application hosting costs due to increased cloud CPU use and bandwidth consumption.

Further, AI scraper bots can distort website analytics, making it hard for organizations to track genuine user behavior and make informed business decisions. Many web apps rely on tracking user behavior and popular workflows to make data-driven decisions. Generative AI bot traffic skews these metrics, leading to misleading insights and poor decision-making.
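One partial mitigation for skewed analytics is to exclude self-identified AI crawlers before computing metrics. The following Python sketch assumes a simple log of (path, user-agent) pairs. Its token list and sample user-agent strings are illustrative only. It catches only bots that announce themselves honestly, so it is no substitute for actual bot detection.

```python
from collections import Counter

# User-agent tokens for some of the Gen AI crawlers named in this article.
# Illustrative, not exhaustive: unlisted or renamed bots will slip through.
GENAI_BOT_TOKENS = ("ClaudeBot", "GPTBot", "Bytespider", "PerplexityBot")


def is_genai_bot(user_agent: str) -> bool:
    """Case-insensitive substring match against the known tokens."""
    ua = user_agent.lower()
    return any(token.lower() in ua for token in GENAI_BOT_TOKENS)


def page_views(log_entries):
    """Count views per path, skipping requests from self-identified Gen AI bots.

    `log_entries` is an iterable of (path, user_agent) pairs; this log
    format is an assumption made for the example.
    """
    return Counter(path for path, ua in log_entries if not is_genai_bot(ua))


# Hypothetical, simplified log entries.
logs = [
    ("/pricing", "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"),
    ("/pricing", "Mozilla/5.0 (compatible; ClaudeBot/1.0)"),
    ("/blog", "Mozilla/5.0 (compatible; Bytespider)"),
]
print(page_views(logs))  # Counter({'/pricing': 1})
```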

There are also data privacy risks. Organizations in some industries, such as healthcare and finance, may face compliance issues if their proprietary or customer data is scraped.

Last but not least, users and customers may lose trust in a platform if AI-generated content floods it or if their data is used without consent.

Shades of gray

The most prolific Gen AI gray bots detected in early 2025 include ClaudeBot and TikTok’s bot (Bytespider).

ClaudeBot

ClaudeBot is the most active Gen AI gray bot in our dataset by a considerable margin. It collects data to train Claude, a generative AI tool intended for widespread everyday use.

Its relentless requests are likely to affect many of the web applications it targets. Anthropic, the company behind Claude, publishes information on its website explaining how ClaudeBot behaves and how to block its scraping activity.

Similar information appears on the websites of the companies behind some of the other gray bots spotted by Barracuda’s detection systems, including OpenAI (GPTBot) and Google (Google-Extended).

TikTok

TikTok is a short-form video hosting service with just over two billion users worldwide. It is owned by Chinese internet company ByteDance, which uses an AI scraper bot called Bytespider to train generative AI models. The data provides TikTok with insight into the latest user preferences and trends, helping to improve TikTok’s content recommendation engine and other AI-driven features, such as keyword searches for advertising. Bytespider has been reported as particularly aggressive and unscrupulous.

Two other generative AI scraper bots detected by Barracuda systems in late 2024/early 2025 were PerplexityBot and DeepSeekBot.

Keeping the gray bots out

The data suggests that gray bots, such as Gen AI bots, are now an everyday component of online bot traffic and are here to stay. It’s time for organizations to factor them into security strategies.

There are guidelines for both websites and the companies behind generative AI bots. For example, websites can publish a robots.txt file. This is a plain-text file at the root of the site that tells crawlers which parts of the site they may or may not access. Its rules can target individual crawlers by name.

Robots.txt is not legally binding, and compliance is voluntary. In addition, blocking a specific scraper without also blocking useful crawlers, such as search engines, requires listing that scraper’s user-agent name. This leaves room for less scrupulous gray bots to ignore robots.txt altogether, to keep their scraper’s name confidential, or to change it regularly.
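As a sketch of how this works in practice, the robots.txt below asks several Gen AI crawlers named in this article to stay away while allowing everyone else. Python’s standard `urllib.robotparser` module then checks how a compliant crawler would interpret those rules. The example.com site and the “RenamedScraper” agent are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Contents that would be served at https://www.example.com/robots.txt
# (hypothetical site). Each named crawler is disallowed from the whole site;
# the final catch-all group allows every other crawler.
ROBOTS_TXT = """\
User-agent: ClaudeBot
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://www.example.com/articles/"
for agent in ("ClaudeBot", "GPTBot", "Bytespider", "Googlebot", "RenamedScraper"):
    verdict = "allowed" if parser.can_fetch(agent, url) else "blocked"
    print(f"{agent}: {verdict}")
```

ClaudeBot, GPTBot, and Bytespider are blocked. Googlebot, which is not listed, and the hypothetical RenamedScraper fall through to the catch-all rule and are allowed. That last case is exactly the weakness described above. A scraper that changes its name is no longer covered by its rule, and one that ignores robots.txt is not stopped by it at all.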

To ensure your web applications are protected against the impact of gray bots, consider implementing bot protection that is capable of detecting and blocking generative AI scraper bot activity.

For example, Barracuda Advanced Bot Protection leverages cutting-edge AI and machine learning technologies to address the unique threats posed by gray bots, with behavior-based detection, adaptive machine learning, comprehensive fingerprinting, and real-time blocking.
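Behavior-based detection does not depend on what a bot calls itself. The hypothetical Python class below is a loose illustration of just one such signal. It is not a description of how Barracuda Advanced Bot Protection works. The class flags any client that exceeds a request budget within a sliding time window. The client identifiers, threshold, and window length are assumptions chosen for the example.

```python
from collections import defaultdict, deque


class SlidingWindowDetector:
    """Flag clients that send more than `max_requests` within `window_seconds`.

    Hypothetical sketch: real bot protection combines many more signals.
    """

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Client identifier (e.g. an IP address) -> timestamps of recent requests.
        self._history = defaultdict(deque)

    def observe(self, client_id: str, timestamp: float) -> bool:
        """Record one request and return True if the client is over budget."""
        times = self._history[client_id]
        times.append(timestamp)
        # Discard timestamps that have slid out of the window.
        while times and times[0] <= timestamp - self.window_seconds:
            times.popleft()
        return len(times) > self.max_requests


# Illustrative budget: at most 100 requests per 60 seconds per client.
detector = SlidingWindowDetector(max_requests=100, window_seconds=60)

# A fast client sending 5 requests per second for 30 seconds.
fast_flags = [detector.observe("203.0.113.7", i * 0.2) for i in range(150)]
# A slow client sending 1 request per second for 200 seconds.
slow_flags = [detector.observe("198.51.100.4", float(i)) for i in range(200)]

print("fast client first flagged at request", fast_flags.index(True) + 1)
print("slow client ever flagged:", any(slow_flags))
```

Here the fast client is flagged at its 101st request. The slow one never exceeds 60 requests in any 60-second window. A rate threshold alone can be evaded by spreading requests across many IP addresses. That is why dedicated bot protection typically layers it with signals such as fingerprinting and learned behavior models.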

Generative AI bots are not just a passing trend: as our data shows, they are now mainstream and persistent. The ethical, legal, and commercial debates around gray bots will continue for some time. In the meantime, with the right security tools in place, you have the reassurance of knowing that your data remains yours.

This article was originally published at Barracuda Blog. Learn more about current threat trends by reviewing past Threat Spotlight articles.


This post originally appeared on Smarter MSP.
