It is coming to that time of year again for the Google Summer of Code, which gives students real experience in a living, breathing software project. PHP will be involved again, and a list of ideas is being gathered.
Often I read about the evils of duplicate content on a site, and the wrath Google will supposedly bring down on you if you don’t go to great lengths to prevent it. So it was interesting to read a post by Vanessa Fox on the subject at the Google Webmaster Central blog.
To summarise the three main points:
- Google will work out which content on a site is unique
- Having duplicate content on a site does not incur a penalty, but one version will be considered the primary
- You will not automatically be banished to supplemental hell
Duplicate content is a natural occurrence with blog systems such as WordPress, and I think Google is smart enough to understand this.
Now, if you steal all your content from other sites, you will be in trouble.
It’s not often that Google, Yahoo and MSN agree on something, but it has happened, and it should make a webmaster’s life a little easier. For a while now Google has had Sitemaps, where you give them the location of an XML file that allows them to better spider your site. Well, Yahoo and MSN have joined in, and will work off the same file format.
See the sitemaps.org site for more details of the file structure and an FAQ. I assume a number of the smaller search sites will jump onto this pretty quickly.
Google and Yahoo already accept the files, while MSN support will go public sometime in 2007.
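For a rough idea of what one of these files looks like, here is a minimal sketch of a sitemap using the sitemaps.org XML schema. The URL and date are placeholders, and only `loc` is actually required; `lastmod`, `changefreq` and `priority` are optional hints to the crawlers:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <!-- loc is the only required element per URL -->
    <loc>http://www.example.com/</loc>
    <!-- optional hints for the search engines -->
    <lastmod>2006-11-18</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
```

You would typically generate a file like this automatically from your blog or CMS and tell each search engine where to find it.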