Google has quietly updated its list of user-triggered fetchers with new documentation for Google NotebookLM. This seemingly minor change is significant because it makes clear that Google NotebookLM will not obey robots.txt.
Google NotebookLM
NotebookLM is an AI analysis and writing tool that lets users add a web page URL. It processes the content and then lets them ask a range of questions and generate summaries based on that content.
Google's tool can automatically create an interactive mind map that organizes topics from a website and extracts takeaways from it.
User-Triggered Fetchers Ignore Robots.txt
Google's user-triggered fetchers are web agents that are triggered by users and, by default, ignore the robots.txt protocol.
According to Google's User-Triggered Fetchers documentation:
“Because the fetch was requested by a user, these fetchers generally ignore robots.txt rules.”
Google-NotebookLM Ignores Robots.txt
The purpose of robots.txt is to give publishers control over bots that index web pages. But agents like the Google-NotebookLM fetcher aren't indexing web content; they act on behalf of users who are interacting with the website's content through Google's NotebookLM.
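The difference can be sketched in Python with the standard library's urllib.robotparser. This is an illustrative model, not Google's code: the robots.txt content and the should_fetch helper are hypothetical. A crawler that honors robots.txt would be refused by a Disallow group for Google-NotebookLM, while a user-triggered fetch, per Google's documentation, generally skips that check.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a publisher might serve to opt out of NotebookLM.
ROBOTS_TXT = """\
User-agent: Google-NotebookLM
Disallow: /
"""


def should_fetch(url: str, user_agent: str, user_triggered: bool) -> bool:
    """Illustrative decision logic only; not Google's implementation.

    An automated crawler consults robots.txt before fetching a URL.
    A user-triggered fetcher generally does not consult it at all.
    """
    if user_triggered:
        # The fetch was requested by a user, so robots.txt is not checked.
        return True
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# A crawler identifying as Google-NotebookLM would be refused here,
# but the same agent acting on a user's request is not stopped.
print(should_fetch("https://example.com/article", "Google-NotebookLM", user_triggered=False))
print(should_fetch("https://example.com/article", "Google-NotebookLM", user_triggered=True))
```

In other words, a Disallow rule only works when the agent chooses to read it, which is why blocking NotebookLM has to happen at the server instead.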
How To Block NotebookLM
Google uses the Google-NotebookLM user agent when extracting website content, so publishers who want to block users from accessing their content through NotebookLM can create rules that automatically block that user agent. For example, a simple solution for WordPress publishers is to use Wordfence to create a custom rule that blocks all site visitors using the Google-NotebookLM user agent.
Another way to do it is with .htaccess, using the following rule:
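# Return 403 Forbidden to any request whose User-Agent contains "Google-NotebookLM" ([NC] = case-insensitive)
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} Google-NotebookLM [NC]
RewriteRule .* - [F,L]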
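For sites that are not served by Apache, the same idea can be sketched as WSGI middleware in Python. This is a minimal sketch, assuming a WSGI-based Python application; the BlockUserAgentMiddleware class name is hypothetical. Like the [NC] rewrite condition, it matches "Google-NotebookLM" case-insensitively anywhere in the User-Agent header, and like the [F] flag, it answers with 403 Forbidden.

```python
import re
from typing import Callable, Iterable

# Same pattern as the RewriteCond line; re.IGNORECASE plays the role of [NC].
BLOCKED_UA = re.compile(r"Google-NotebookLM", re.IGNORECASE)


class BlockUserAgentMiddleware:
    """Hypothetical WSGI middleware that refuses requests from a given user agent."""

    def __init__(self, app: Callable) -> None:
        self.app = app

    def __call__(self, environ: dict, start_response: Callable) -> Iterable[bytes]:
        user_agent = environ.get("HTTP_USER_AGENT", "")
        if BLOCKED_UA.search(user_agent):
            # Equivalent of the [F] flag: respond 403 Forbidden and stop here.
            body = b"Forbidden"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Any other visitor is passed through to the wrapped application.
        return self.app(environ, start_response)
```

Wrap the site's existing WSGI application, for example `app = BlockUserAgentMiddleware(app)`. Because the check relies on the User-Agent header, it only blocks requests that identify themselves as Google-NotebookLM.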

