One of the best ways to tell a search engine about your website's content is to submit a sitemap. A sitemap lists the URLs on your site that are available for crawling and indexing, which helps search engines discover all of your pages and content. Sitemaps come in various forms, but the main ones considered by search engines such as Google are the XML, RSS and Atom formats. The main difference between them is that a standard XML sitemap provides a full set of URLs for a site, whereas an RSS / Atom feed describes recent changes and updates to a site.
So which one should you be using? Google says both.
Google recently published a post on their Webmaster Central Blog explaining that it’s a good idea to use both XML and RSS / Atom sitemaps.
“For optimal crawling, we recommend using both XML sitemaps and RSS/Atom feeds. XML sitemaps will give Google information about all of the pages on your site. RSS/Atom feeds will provide all updates on your site, helping Google to keep your content fresher in its index. Note that submitting sitemaps or feeds does not guarantee the indexing of those URLs.”
The post also lists some excellent ‘best practices’ to help you make the most of your sitemap. The main piece of advice is that every URL listed in an XML sitemap or RSS / Atom feed should include a ‘last modification’ date. This tells the search engine when the last meaningful change was made to that page, allowing it to crawl your site more efficiently.
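As a rough illustration of the ‘last modification’ advice, here is a minimal Python sketch that builds an XML sitemap with a lastmod entry for every URL. The page addresses and dates are made up for the example, and build_sitemap is a hypothetical helper name rather than part of any library. The urlset, url, loc and lastmod element names come from the sitemaps.org protocol.

```python
from datetime import date
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML as a string.

    pages is an iterable of (url, last_modified) tuples,
    where last_modified is a datetime.date.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, last_modified in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        # lastmod uses the W3C date format, e.g. 2014-10-01
        ET.SubElement(entry, "lastmod").text = last_modified.isoformat()
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages, used only for illustration.
pages = [
    ("http://www.example.com/", date(2014, 10, 1)),
    ("http://www.example.com/about", date(2014, 9, 15)),
]

sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```

In practice the dates would come from your CMS or file system, and should only change when the page content changes in a meaningful way.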
Another important point concerns canonical URLs. A common mistake is to duplicate URLs in a sitemap by including both the www. and non-www. versions of a page address. An easy way to counter this is to use a server-side 301 redirect so that users always reach the exact address you want them to. Alternatively, you can set a preferred domain in your site settings in Webmaster Tools.
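To catch www. and non-www. duplicates before they reach your sitemap, you could normalise every URL to your preferred domain first. The Python sketch below is one possible approach, not an official method. It assumes www.example.com is the preferred host, and canonicalise and dedupe are hypothetical helper names.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: the www. version is the preferred domain.
PREFERRED_HOST = "www.example.com"


def canonicalise(url, preferred_host=PREFERRED_HOST):
    """Rewrite the www. / non-www. variants of a URL to the preferred host."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # Work out the bare domain so both variants can be matched.
    if preferred_host.startswith("www."):
        bare = preferred_host[4:]
    else:
        bare = preferred_host
    if host in (bare, "www." + bare):
        host = preferred_host
    # Drop any fragment; it has no place in a sitemap entry.
    return urlunsplit((parts.scheme, host, parts.path or "/", parts.query, ""))


def dedupe(urls):
    """Return canonical URLs in their original order, without duplicates."""
    seen = set()
    result = []
    for url in urls:
        canonical = canonicalise(url)
        if canonical not in seen:
            seen.add(canonical)
            result.append(canonical)
    return result


# Hypothetical input containing a www. / non-www. duplicate.
urls = [
    "http://example.com/about",
    "http://www.example.com/about",
    "http://www.example.com/",
]
clean_urls = dedupe(urls)
print(clean_urls)
```

This only tidies the sitemap itself. The 301 redirect or preferred domain setting is still needed so that visitors and crawlers end up on the same address.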
To read the full article, head over to the Webmaster Central Blog. It is also worth reading up on canonical URLs.