How to optimize your sitemap.xml.gz for better SEO

Search engines are watching your website, though they can be a little slow at times. If you want them to stay up to date with what you post, XML sitemaps are the best way to do it. Since the controversial Panda update, an up-to-date sitemap is more important than ever, because it helps you prove that you are the original publisher of a disputed piece of content.

In theory, these files act as a direct communication channel, ensuring that crawlers are notified faster and more accurately about every content change you make.

In practice, if your XML files are not generated correctly or are hard to crawl, your SEO can easily suffer.

How so?

  • Large websites with more than 1,000 dynamic URLs can give a crawler a really hard time;
  • If the crawler times out, it may simply give up on crawling your website;
  • When XML files are not compressed, the larger the sitemap, the longer the server takes to deliver it, and the slower the entire website.

In a nutshell:

Generating compressed .xml.gz sitemaps is essential for your website’s performance and increases the chances that all your URLs will be properly indexed.

Today’s blog post therefore details how to create .xml.gz sitemaps using PHP and submit them to Google Search Console.

We start from the premise that a website with a large and growing number of links will need several dynamic sitemaps, since the sitemap protocol limits each sitemap file to 50,000 URLs and 50 MB (uncompressed).

Here are the steps to follow:

1. Create the dynamic .XML sitemaps

Let’s assume we need to work with two dynamic sitemaps. We generate them in PHP and name them sitemap.php and sitemap_second.php, respectively.
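As a minimal sketch, sitemap.php could look like the script below. It assumes the URLs come from a hypothetical get_sitemap_urls() function (in a real site, most likely a database query), follows the sitemaps.org 0.9 protocol, and escapes each URL so characters like & stay valid inside XML. sitemap_second.php would follow the same pattern with its own data source.

```php
<?php
// Hypothetical data source: replace with a query over your own content.
function get_sitemap_urls(): array
{
    return [
        ['loc' => 'https://www.example.com/', 'lastmod' => '2024-01-15'],
        ['loc' => 'https://www.example.com/blog/first-post', 'lastmod' => '2024-01-10'],
    ];
}

// Build a <urlset> document following the sitemaps.org 0.9 protocol.
function build_sitemap(array $entries): string
{
    $xml = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
    $xml .= '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
    foreach ($entries as $entry) {
        // Escape &, <, > and quotes so the URL is valid XML text.
        $loc = htmlspecialchars($entry['loc'], ENT_XML1 | ENT_QUOTES, 'UTF-8');
        $xml .= "  <url>\n    <loc>{$loc}</loc>\n";
        if (!empty($entry['lastmod'])) {
            $xml .= "    <lastmod>{$entry['lastmod']}</lastmod>\n";
        }
        $xml .= "  </url>\n";
    }
    return $xml . "</urlset>\n";
}

// Serve the sitemap as XML (header() is ignored when run from the CLI).
header('Content-Type: application/xml; charset=UTF-8');
echo build_sitemap(get_sitemap_urls());
```

Keeping the XML generation in a function makes it easy to reuse the same builder for both sitemap files.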

2. Compress the .XML sitemaps

To make sure new content regularly makes it into our archived sitemaps, we compress them into .gz files through a cron job that runs once a day. The script fetches each dynamic sitemap over HTTP and saves a gzip-compressed copy in the sitemap_gz folder:

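<?php
// Base URL where sitemap.php and sitemap_second.php are served (replace with your own domain)
$URL = 'https://www.example.com/';

// Fetch a page over HTTP and return its body, or false on failure
function download_page($path)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $path);
    curl_setopt($ch, CURLOPT_FAILONERROR, 1);    // treat HTTP status >= 400 as a failure
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); // follow redirects
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_TIMEOUT, 15);       // give up after 15 seconds
    $retValue = curl_exec($ch);
    curl_close($ch);
    return $retValue;
}

$files_array = array('sitemap', 'sitemap_second');
// Resolve the folder relative to this script, because cron runs from a different working directory
$folder = __DIR__ . '/sitemap_gz';

// Create the output folder on the first run
if (!file_exists($folder)) {
    mkdir($folder, 0755, true);
}

foreach ($files_array as $file_name) {
    $file_output = $folder . '/' . $file_name . '.xml.gz';
    $data = download_page($URL . $file_name . '.php');
    if ($data === false) {
        continue; // keep the previous archive instead of overwriting it with an empty one
    }
    // Compress at the highest level (9) and replace any existing archive
    file_put_contents($file_output, gzencode($data, 9));
}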
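Once the compression script is scheduled (for example with a crontab entry such as 0 3 * * * php /path/to/compress_sitemaps.php, where the script name and path are hypothetical), it can be worth checking that each archive really decompresses to a well-formed sitemap. A minimal sketch, assuming the archives sit in the sitemap_gz folder next to this script; is_valid_gz_sitemap() is a hypothetical helper, not part of any library:

```php
<?php
// Return true if a .xml.gz file decompresses to a well-formed <urlset> document.
function is_valid_gz_sitemap(string $path): bool
{
    if (!is_readable($path)) {
        return false;
    }
    // @ suppresses the "data error" warning gzdecode() emits for non-gzip input.
    $xml = @gzdecode(file_get_contents($path));
    if ($xml === false || $xml === '') {
        return false;
    }
    libxml_use_internal_errors(true); // collect XML parse errors instead of printing them
    $doc = simplexml_load_string($xml);
    libxml_clear_errors();
    return $doc !== false && $doc->getName() === 'urlset';
}

foreach (['sitemap', 'sitemap_second'] as $name) {
    $path = __DIR__ . '/sitemap_gz/' . $name . '.xml.gz';
    echo $name . ': ' . (is_valid_gz_sitemap($path) ? 'OK' : 'INVALID') . PHP_EOL;
}
```

Running this right after the cron job catches an empty or truncated archive before a crawler does.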


3. Do NOT submit the archived links to Google Search Console

You might be tempted to rush in and submit the links to these archives in Google Search Console (formerly Google Webmaster Tools). We tried exactly that, and it didn’t work as expected: the sitemaps simply weren’t accepted as valid. So the correct next step is to:

4. Create a sitemap index from the compressed GZ XML sitemaps

Instead of submitting the links to the compressed XML sitemaps, submit only the link to a sitemap index file that lists them.
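A minimal sketch of generating that index in PHP, following the sitemaps.org 0.9 &lt;sitemapindex&gt; format. It assumes the script sits in the web root next to the sitemap_gz folder from step 2, that the folder is publicly reachable at the hypothetical URL https://www.example.com/sitemap_gz/, and that the resulting index will be available at https://www.example.com/sitemap_index.xml:

```php
<?php
// Build a <sitemapindex> document (sitemaps.org 0.9 protocol) from entries
// that each hold a 'loc' URL and a 'lastmod' date.
function build_sitemap_index(array $entries): string
{
    $xml = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
    $xml .= '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
    foreach ($entries as $entry) {
        $loc = htmlspecialchars($entry['loc'], ENT_XML1 | ENT_QUOTES, 'UTF-8');
        $xml .= "  <sitemap>\n    <loc>{$loc}</loc>\n";
        $xml .= "    <lastmod>{$entry['lastmod']}</lastmod>\n  </sitemap>\n";
    }
    return $xml . "</sitemapindex>\n";
}

// Hypothetical public URL of the folder holding the .xml.gz archives.
$base_url = 'https://www.example.com/sitemap_gz/';
$entries = [];
foreach (['sitemap', 'sitemap_second'] as $name) {
    $local = __DIR__ . '/sitemap_gz/' . $name . '.xml.gz';
    $entries[] = [
        'loc' => $base_url . $name . '.xml.gz',
        // Use the archive's modification time if it exists, otherwise today's date.
        'lastmod' => date('Y-m-d', file_exists($local) ? filemtime($local) : time()),
    ];
}

// Write the index next to the script; this is the only URL to submit.
file_put_contents(__DIR__ . '/sitemap_index.xml', build_sitemap_index($entries));
```

Running this at the end of the daily cron job keeps each lastmod in step with the archives it points to.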

5. List the sitemap index file in robots.txt

That’s the last thing you need to do if you want crawlers to find and index your new content efficiently.
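The robots.txt entry itself is a single line: Sitemap: followed by the absolute URL of the index, for example Sitemap: https://www.example.com/sitemap_index.xml (hypothetical domain). If you manage robots.txt from PHP, a minimal sketch that appends the line only when it is missing could look like this; ensure_sitemap_directive() is a hypothetical helper, and the path assumes the script lives in the web root:

```php
<?php
// Append a "Sitemap:" directive to robots.txt unless the exact line is already there.
// Returns true if the file was changed.
function ensure_sitemap_directive(string $robots_path, string $index_url): bool
{
    $directive = 'Sitemap: ' . $index_url;
    $current = file_exists($robots_path) ? file_get_contents($robots_path) : '';
    // Compare line by line so the directive is never added twice.
    foreach (preg_split('/\r\n|\r|\n/', $current) as $line) {
        if (trim($line) === $directive) {
            return false;
        }
    }
    // Start the directive on its own line if the file doesn't end with a newline.
    $prefix = ($current !== '' && substr($current, -1) !== "\n") ? "\n" : '';
    file_put_contents($robots_path, $current . $prefix . $directive . "\n");
    return true;
}

ensure_sitemap_directive(__DIR__ . '/robots.txt', 'https://www.example.com/sitemap_index.xml');
```

Because the check is idempotent, the call can safely run on every deployment.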

Still confused? We’re eager to help.

Send us a message; let us run a proper SEO audit and we’ll make your life a lot easier!

Bogdan Rusu

With more than 10 years of experience in development, Bogdan Rusu is the CTO of the Design19 web agency. Highly skilled in PHP development and in working with platforms such as WordPress, Magento and Zend Framework, as well as custom-built PHP platforms, Bogdan has driven the team to a higher level of proficiency in development.