## Install @astrojs/sitemap
First, install the official Astro sitemap integration:
```bash
# Using NPM
npx astro add sitemap

# Using Yarn
yarn astro add sitemap

# Using PNPM
pnpm astro add sitemap
```
Press y and Enter; Astro will automatically modify your configuration file astro.config.mjs to enable sitemap generation at build time.
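If you want to confirm that the package was actually installed (assuming NPM; use the equivalent command for Yarn or PNPM):

```bash
# Should print the installed version of @astrojs/sitemap
npm ls @astrojs/sitemap
```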
It is recommended to double-check your configuration file. If the following code is present, the integration was configured successfully; if not, add it yourself. Note that @astrojs/sitemap also needs the top-level `site` option to be set, otherwise no sitemap is generated at build time:

```js
import { defineConfig } from 'astro/config';
import sitemap from '@astrojs/sitemap';

export default defineConfig({
  // ...
  site: 'https://cirry.cn', // the integration needs `site` to build absolute URLs; use your own domain
  integrations: [sitemap()],
});
```
After the configuration is complete, run a build. The integration generates sitemap-index.xml and sitemap-0.xml in the root of the build output (dist/ by default), so they are served from the root of your deployed website. These are your site's sitemap files; submit them to search engines for indexing.
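A quick way to build and confirm the files exist (assuming the default dist/ output directory):

```bash
# Build the site; the sitemap files are written during the build
npx astro build

# Both files should be listed in the build output
ls dist/sitemap-index.xml dist/sitemap-0.xml
```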
## Add Baidu Verification
Note: site verification is required before a sitemap can be submitted. Add your website to the Baidu Search Resource Platform and follow the verification process described in this detailed guide on site verification.
1. Select the appropriate site properties.
2. Choose file verification, download the verification file, and place it in the root directory of your website.
3. After building and deploying, open the verification file's URL to make sure it is accessible.
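You can also check accessibility from the command line; the file name below is a placeholder for the one Baidu generates for you:

```bash
# Expect an HTTP 200 status line; replace the file name with your own
curl -I https://cirry.cn/baidu_verify_XXXX.html
```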
## Add Baidu Indexing
In the Baidu Search Resource Platform, under Normal Indexing, fill in your website's sitemap address (for this site, https://cirry.cn/sitemap-0.xml).
## Add Google Verification
Refer to the documentation on Google Search Central for more information.
To submit a sitemap, send a GET request to the following address from your browser or the command line, where FULL_URL_OF_SITEMAP is the complete, publicly accessible URL of your sitemap file:

```
https://www.google.com/ping?sitemap=FULL_URL_OF_SITEMAP
```

For example, enter the following link directly in your browser:

```
https://www.google.com/ping?sitemap=https://cirry.cn/sitemap-0.xml
```
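The equivalent check with curl (a minimal sketch; it prints only the HTTP status code, and 200 means the ping was received):

```bash
# Ping Google with the sitemap URL and print the response status code
curl -sS -o /dev/null -w '%{http_code}\n' \
  'https://www.google.com/ping?sitemap=https://cirry.cn/sitemap-0.xml'
```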
The returned page will confirm that the sitemap notification was received.
Click the link http://www.google.com/webmasters/tools/ on that page to go to Google Search Console. Add your website there; you will then be prompted to add a verification method.
One option is to download the HTML verification file and add it to the root directory of your website.
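As with Baidu, you can confirm the file is reachable after deploying. The file name below is a placeholder for the google*.html file that Search Console gives you:

```bash
# Expect an HTTP 200 status line; replace the file name with your own
curl -I https://cirry.cn/googleXXXXXXXX.html
```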
Alternatively, you can add the verification meta tag to the <head> of your site's pages, a tag of the form `<meta name="google-site-verification" content="...">`, where Search Console supplies the content value.
After adding it, re-verify in Google Search Console; a success message means the setup is complete.
## Add Google Indexing
In Google Search Console, under the Sitemaps report, fill in the sitemap address of your website.
## Issues Encountered
After completing the steps above, check whether your website's Robots and Crawl Diagnostics tools work normally in the Baidu Search Resource Platform.
I added my website's sitemap under Normal Indexing on Baidu, but it failed to be indexed. In Crawl Diagnostics I got a diagnostic error indicating robots.txt blocking, so I tested my website on xml-sitemaps to see whether it could be crawled. That site couldn't reach my pages either, which confirmed that the crawler protocol was the restriction. So I changed my robots.txt to the following:
```
User-agent: *
Allow: /
Sitemap: https://cirry.cn/sitemap-0.xml
```
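After deploying the change, it is worth confirming that the live file matches (substitute your own domain):

```bash
# Print the robots.txt that is actually being served
curl https://cirry.cn/robots.txt
```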
After the modification, remember to click the error in the diagnostic report and report it to Baidu. A few minutes later, xml-sitemaps crawled my website successfully, and re-submitting the sitemap on Baidu resulted in normal indexing.
Note: Baidu does not accept index-type sitemaps. Therefore, do not reference sitemap-index.xml, which is generated by @astrojs/sitemap, in the robots.txt file; point the Sitemap line at sitemap-0.xml instead, as above. Otherwise Baidu will keep reporting Robots blocking and fail to crawl your pages.
If you encounter other issues, you can refer to Analysis of Common Error Types in Crawl Diagnostics.
You can use the following command to check whether the site is blocked by robots or has an IP resolution error. If it returns HTTP 200, everything is normal; otherwise something is wrong. Remember to replace the final URL with your own website:

```bash
# Fetch the homepage headers while identifying as Baiduspider
curl --head --user-agent 'Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)' 'https://cirry.cn'
```