Migrating to a new domain? A script to verify URLs are redirecting correctly

When migrating a website to a new domain, it’s important to check that the URLs in the old domain’s sitemap redirect 1:1 to the corresponding pages on the new domain. This helps ensure that search engines continue to crawl and index the site’s pages after the domain has changed, and that visitors still reach the right page on the new domain.

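To make this process easier, I’ve created a script that automatically checks the URLs from a sitemap, given either as a file path or as a URL. For each URL in the old sitemap it swaps in the new domain, keeps the path, requests the resulting URL (following any redirects), and reports every URL that does not end in a 200 response. The script takes two parameters: the file path or URL of the old sitemap, and the new domain (including the scheme, with no trailing slash). The new domain is optional; if it is omitted, the URLs are requested exactly as they appear in the sitemap.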

verify_sitemap_urls {path_or_url_to_old_sitemap_file} {new_domain}

Here is the script:

function verify_sitemap_urls() {
  if [ -z "$1" ]; then
    echo "Usage: verify_sitemap_urls [file_path/url] (new_domain)"
    echo "If no new domain is passed, the URLs from the sitemap are checked as-is; if a domain is passed, it replaces the domain in each sitemap URL and the path is kept."
    return 1
  fi
  if [[ "$1" == http://* || "$1" == https://* ]]; then
    TEMPFILE=$(mktemp)
    # Download the sitemap file from the given URL.
    curl -s -o "$TEMPFILE" "$1"
  else
    # Convert the relative path to an absolute path, based on where the function is called.
    TEMPFILE="$(cd "$(dirname "$1")" && pwd)/$(basename "$1")"
    # Check that the file exists.
    if [ ! -f "$TEMPFILE" ]; then
      echo "File not found: $TEMPFILE"
      return 1
    fi
  fi
  # Parse the sitemap file and extract all URLs into URLS_IN_SITEMAP.
  # ggrep is GNU grep under the name Homebrew gives it on macOS; on Linux, plain grep is usually GNU grep.
  # [^<]+ instead of .* keeps each match inside a single <loc> element.
  URLS_IN_SITEMAP=($(ggrep -oP '(?<=<loc>)[^<]+(?=</loc>)' "$TEMPFILE"))
  # If a URL was passed, remove the temp file.
  if [[ "$1" == http://* || "$1" == https://* ]]; then
    rm "$TEMPFILE"
  fi
  # Send a request to each URL and check the HTTP status code.
  for URL_ENTRY in "${URLS_IN_SITEMAP[@]}"
  do
    # If $2 is empty, use the same domain as the sitemap.
    if [ -z "$2" ]; then
      NEW_DOMAIN_URL=$URL_ENTRY
    else
      # Strip the scheme and host, then the query string, leaving only the path.
      URL_PATH=$(echo "$URL_ENTRY" | sed -e 's|^[^:]*://[^/]*||;s|[?].*$||')
      NEW_DOMAIN_URL="$2$URL_PATH"
    fi
    # -L follows redirects, so this is the status of the final response.
    http_status=$(curl -s -o /dev/null -w "%{http_code}" -L "$NEW_DOMAIN_URL")
    if [ "$http_status" -eq 200 ]
    then
      echo -n "."
      #echo "OK: $NEW_DOMAIN_URL"
    else
      echo -e "\nError: $NEW_DOMAIN_URL returned $http_status"
    fi
  done
}
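
The script relies on ggrep with the -P option (Perl-style lookarounds), which is not available everywhere. As a rough alternative, here is a minimal sketch that extracts the loc values using only grep -oE and sed. The function name extract_sitemap_urls is hypothetical and not part of the script above. It assumes each loc element sits on a single line and is not wrapped in CDATA, and it additionally decodes &amp; and trims surrounding whitespace, which the original script does not do.

```shell
# Hypothetical helper (not part of the original script): print every <loc>
# value found in the sitemap file "$1", one URL per line.
# Assumes each <loc>...</loc> element is on a single line and is not CDATA.
extract_sitemap_urls() {
  grep -oE '<loc>[^<]+</loc>' "$1" \
    | sed -e 's|^<loc>||' \
          -e 's|</loc>$||' \
          -e 's/^[[:space:]]*//' \
          -e 's/[[:space:]]*$//' \
          -e 's/&amp;/\&/g'   # sitemaps escape & as &amp;
}
```

Something like URLS_IN_SITEMAP=($(extract_sitemap_urls "$TEMPFILE")) could then stand in for the ggrep line, under the same assumptions.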

As a side effect, this also warms up the cache, since every URL in the sitemap gets requested.
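
Note that the script requests each path on the new domain and follows redirects with -L; it does not request the old URL itself. To also confirm that the old domain answers with a permanent redirect to the matching page, something like the following sketch could be used. The function names expected_new_url and check_redirect are hypothetical. The sketch assumes the new domain is passed with its scheme and no trailing slash, only looks at the first hop, and drops the query string the same way the script does. It reads the status and target from curl's http_code and redirect_url write-out variables.

```shell
# Hypothetical helper: map a sitemap URL onto the new domain, keeping the path.
# Assumes new_domain includes the scheme and has no trailing slash.
expected_new_url() {
  local old_url="$1" new_domain="$2"
  local rest="${old_url#*://}"   # drop the scheme
  local path=""
  case "$rest" in
    */*) path="/${rest#*/}" ;;   # everything after the host
  esac
  path="${path%%[?]*}"           # drop the query string, as the script does
  printf '%s%s\n' "$new_domain" "$path"
}

# Hypothetical helper: request the old URL without following redirects and
# compare the first hop with the expected URL on the new domain.
check_redirect() {
  local old_url="$1" new_domain="$2"
  local expected result status target
  expected=$(expected_new_url "$old_url" "$new_domain")
  # No -L here: %{redirect_url} is the URL curl would have been sent to.
  result=$(curl -s -o /dev/null -w '%{http_code} %{redirect_url}' "$old_url")
  status="${result%% *}"
  target="${result#* }"
  if { [ "$status" = "301" ] || [ "$status" = "308" ]; } && [ "$target" = "$expected" ]; then
    echo "OK: $old_url -> $target"
  else
    echo "Error: $old_url returned $status -> ${target:-no redirect} (expected $expected)"
    return 1
  fi
}
```

For example, check_redirect https://old.example.com/about https://new.example.com (placeholder domains) would print OK only if the old URL answers with a 301 or 308 pointing at https://new.example.com/about. Whether a 302 should also count is a judgment call this sketch leaves out.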

