Esentialist
Learning to Learn One Mistake at a Time

JSitemap Professional is a Joomla! extension for generating sitemaps and managing SEO.

The robots.txt file

Pointing search engines to your sitemaps and preventing your website from being overloaded with crawling requests

Before crawling a site, Google's crawlers download and parse the site's robots.txt file to extract information about which parts of the site may be crawled and where your sitemaps are stored.
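As a rough sketch of that step, Python's standard urllib.robotparser module downloads and interprets a robots.txt file much like a well-behaved crawler would; the domain below is this site's and the paths are only illustrative:

    from urllib.robotparser import RobotFileParser

    # Download and parse the live robots.txt, as a crawler would.
    robots = RobotFileParser()
    robots.set_url("https://esentialist.com/robots.txt")
    robots.read()

    # Ask whether a given user agent may crawl a given URL
    # (expect False for /administrator/ once the rules below are in place).
    print(robots.can_fetch("Googlebot", "https://esentialist.com/administrator/"))
    print(robots.can_fetch("Googlebot", "https://esentialist.com/"))

    # Any Sitemap: lines declared in the file (None if there are none).
    print(robots.site_maps())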

Setting up your file

  1. Upload the file robots.txt.dist that ships with Joomla! to the root folder of your website
  2. Rename the file to robots.txt
  3. In the JSitemap control panel on the backend of your website, select the Robots.txt Editor functionality
  4. Copy and paste the following text and hit "Save robots.txt":
    User-agent: *
    User-agent: AdsBot-Google
    Disallow: /administrator/
    Disallow: /api/
    Disallow: /bin/
    Disallow: /cache/
    Disallow: /cli/
    Disallow: /components/
    Disallow: /includes/
    Disallow: /installation/
    Disallow: /language/
    Disallow: /layouts/
    Disallow: /libraries/
    Disallow: /logs/
    Disallow: /modules/
    Disallow: /plugins/
    Disallow: /tmp/

Resources

  1. Introduction to robots.txt
  2. Create a robots.txt file
  3. How Google interprets the robots.txt specification
  • A robots.txt file lives at the root of your site.

  • It is mainly used to avoid overloading your site with requests; it is not a mechanism for keeping a web page out of Google. To keep a web page out of Google, block indexing with noindex or password-protect the page. To keep information secure from web crawlers, use other blocking methods such as password-protecting private files on your server.
  • A robots.txt file consists of one or more groups:
    • Each group consists of multiple rules or directives, one directive per line. These are case-sensitive.
    • A group gives the following information:
      • "User-agent" : Who the group applies to. Note that you can group together rules that apply to multiple user agents by repeating user-agent lines for each crawler.
      • "Disallow" : Which directories or files that agent cannot access.
      • "Allow" : Which directories or files that agent can access. This is the default assumption for all user agents, so you only use it to override a "Disallow" directive and allow crawling of a subdirectory or page in an otherwise disallowed directory.
      • "Sitemap" : The location of a sitemap for the website. Unlike the other directives, the value must be a fully qualified URL, including the scheme and host (e.g. beginning with https://esentialist.com/); a worked example follows this list.
    • Each group begins with a User-agent line that specifies which crawlers the group applies to:
      • To apply the rules to all crawlers, use the syntax "User-agent: *". This syntax excludes the AdsBot crawlers, which must be named explicitly with the syntax "User-agent: AdsBot-Google".
      • To target one specific crawler, such as Google's main crawler, use the syntax "User-agent: Googlebot".
  • The "#" character marks the beginning of a comment.
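To see how these pieces fit together, here is a small hypothetical ruleset fed straight into Python's urllib.robotparser; the paths and the sitemap URL are invented for illustration and are not the file recommended above:

    from urllib.robotparser import RobotFileParser

    sample = """
    # Comments start with "#" and are ignored.
    User-agent: *
    Allow: /tmp/public/
    Disallow: /tmp/

    Sitemap: https://esentialist.com/sitemap.xml
    """

    robots = RobotFileParser()
    robots.modified()                 # mark the rules as loaded so can_fetch() will evaluate them
    robots.parse(sample.splitlines())

    print(robots.can_fetch("Googlebot", "/tmp/"))         # False: matched by "Disallow: /tmp/"
    print(robots.can_fetch("Googlebot", "/tmp/public/"))  # True: the Allow rule carves out this subdirectory
    print(robots.site_maps())                             # ['https://esentialist.com/sitemap.xml']

The Allow line is placed before the broader Disallow because Python's parser applies rules in file order; Google instead picks the most specific matching rule, so for Google the order would not matter.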

Test your file

To test whether your newly uploaded robots.txt file is publicly accessible:

  1. Open a private browsing window (or equivalent) in your browser and navigate to the location of the robots.txt file, for example https://example.com/robots.txt. If you see the contents of your robots.txt file, you're ready to test the markup.
  2. Use Google's robots.txt Tester, or script the check as sketched below.
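If you prefer to script that first check, the same request can be made with Python's standard library; the domain is only an example and should be replaced with your own:

    import urllib.request

    # Request the file exactly as a browser would and show what the public sees.
    url = "https://esentialist.com/robots.txt"
    with urllib.request.urlopen(url) as response:
        print(response.status)                  # expect 200 if the file is publicly accessible
        print(response.read().decode("utf-8"))  # should match the rules you saved in JSitemap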

Excluding specific images and videos folders from sitemaps

JSitemap Pro offers an advanced filtering system to include or exclude images and videos for each single data source, based on string fragments or paths that you specify as a comma-separated list; a rough sketch of how such filters behave follows the steps below:

  1. Global Configuration -> Sitemaps settings
  2. Exclude filters for Images sitemap: e.g. favicons
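As a rough sketch of how such comma-separated filters behave (this is only the idea of matching string fragments against image URLs, not JSitemap's actual code; the filter values and file names are hypothetical):

    # Hypothetical filter string, as you would type it into the JSitemap settings field.
    exclude_filters = "favicon,thumbnails"
    fragments = [part.strip() for part in exclude_filters.split(",") if part.strip()]

    image_urls = [
        "https://esentialist.com/images/header.jpg",
        "https://esentialist.com/images/favicon.png",        # contains "favicon"    -> excluded
        "https://esentialist.com/images/thumbnails/a.jpg",    # contains "thumbnails" -> excluded
    ]

    # An image stays in the sitemap only if none of the fragments appears in its URL.
    kept = [url for url in image_urls if not any(fragment in url for fragment in fragments)]
    print(kept)  # ['https://esentialist.com/images/header.jpg']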