
15 December 2011


How Search Engines Work


In an ideal world, web pages would be named and organised in a logical way. Of course, they aren’t (silly humans), which is why we owe a lot of thanks to search engines for allowing us to find the information we’re looking for within a very short space of time (we’re talking tenths of seconds). Despite their unassuming appearance, though, search engines are very complex pieces of virtual and physical engineering.

Search engines such as Google, Yahoo!, Bing, Ask, Baidu, and AOL, among many others, work by carrying out three fundamental operations. Let’s take a look.

Crawling

It’s important to remember that web pages, after all, are just files that are stored on the internet. In order for them to be found, programs known as spiders or crawlers have to go through all of the files (pages) on the internet. When they do this, they read the words on the page (known as ‘parsing’) and record what the words are and where they are positioned in the file. If there are any links on the page, they will follow each link to the page it points to and continue to parse that information. As you can imagine, this can go on FOREVER…
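To make this concrete, here is a deliberately simplified sketch of the parsing step in Python, using only the standard library. The names (Spider, crawl) are our own invention, and a real crawler would also respect robots.txt, de-duplicate URLs, and throttle its requests, among many other things:

    from html.parser import HTMLParser
    from urllib.parse import urljoin
    from urllib.request import urlopen

    class Spider(HTMLParser):
        """Records each word and its position, plus any links found."""
        def __init__(self):
            super().__init__()
            self.words, self.links, self.position = [], [], 0

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        self.links.append(value)

        def handle_data(self, data):
            for word in data.split():
                self.words.append((word.lower(), self.position))
                self.position += 1

    def crawl(url):
        page = urlopen(url).read().decode("utf-8", errors="ignore")
        spider = Spider()
        spider.feed(page)
        # The links found become the next pages to visit... and so on, FOREVER.
        return spider.words, [urljoin(url, link) for link in spider.links]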

Apart from simply recording the words on the page and their position, though, web crawlers also assign value/weight to the way the words appear in the title tag, H1 tag, main copy, and meta tags. This helps the engines to work out how relevant a particular page is to a subject. Once this information is gathered, it then has to be stored.
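Nobody outside the search engines knows the real weights, but the idea can be sketched in a few lines of Python. The values below are entirely made up for illustration:

    # Hypothetical weights; each engine keeps its real values secret.
    TAG_WEIGHTS = {"title": 10, "h1": 5, "meta": 2, "body": 1}

    def word_weight(appearances):
        """appearances: the tags a word was found in on a single page."""
        return sum(TAG_WEIGHTS.get(tag, 1) for tag in appearances)

    word_weight(["title", "h1", "body", "body"])  # -> 17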

Indexation

The information collected by the web crawlers has to be stored in a logical way, otherwise there would be chaos. This is called indexation, and each search engine has a different method of indexing content from the internet. What they do have in common, however, is a process called hashing.

This is a way of encoding the information about the words and assigning each one a numerical value (a hash value) in order to make it easier to find when somebody searches for a term. There are many reasons why hashing is used. Primarily, however, it comes down to the even distribution of numbers compared with letters.

For example, there are many times more words beginning with the letter ‘S’ than there are beginning with the letter ‘X’. If the words were indexed according to alphabetical order, it would take much longer for the search engine to go through all the indexes relating to words beginning with ‘S’. With numbers, however, a 10-digit code would have an even distribution of numbers, and therefore all stored words should, theoretically, take the same amount of time to be retrieved from the index. What’s more, because the number is stored in a hash table along with a pointer to the actual information, the information recorded by the web crawlers can be stored in a very efficient way whilst remaining separate.
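As a rough illustration, here is how a word might be reduced to a fixed-width numeric key and stored alongside a pointer to the full record. The use of MD5 and a 10-digit key here is our own choice for the sketch, not a description of any particular engine:

    import hashlib

    def hash_key(word, digits=10):
        # Turn the word into an evenly distributed 10-digit number.
        digest = hashlib.md5(word.encode("utf-8")).hexdigest()
        return int(digest, 16) % (10 ** digits)

    records = []      # the full information gathered by the crawlers
    hash_table = {}   # hash value -> pointers into `records`

    def index_word(word, page, position):
        records.append((word, page, position))
        hash_table.setdefault(hash_key(word), []).append(len(records) - 1)

    index_word("search", "example.com/page1", 42)
    hash_table[hash_key("search")]  # -> [0], a pointer to the stored record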

The job of professionals working in search engine optimisation is to try to understand, through observation and testing, how each search engine values and assigns weight to all of the variables on a web page, and then to optimise pages accordingly so that they are indexed as effectively and accurately as possible. If you ever hear someone refer to The Algorithm with worshipful reverence, they’re referring to Google’s infamously top-secret search calculations.

Returning Results

When a user searches for a term, the search engine will go through its indexes and look for the most appropriate results. For instance, if I type “how do search engines work” into Google, the search engine will look for all the results which contain this sequence of words, either together or in close proximity to each other, e.g.

How AND do AND search AND engines AND work
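In Python, that naive AND query amounts to intersecting the sets of pages each word appears on. The toy inverted index below is invented data:

    # A toy inverted index: word -> set of page ids.
    index = {
        "how": {1, 2, 3, 5}, "do": {1, 3, 4}, "search": {1, 2, 3},
        "engines": {1, 3}, "work": {1, 3, 5},
    }

    def boolean_and(query):
        terms = query.lower().split()
        results = index.get(terms[0], set())
        for term in terms[1:]:
            results &= index.get(term, set())
        return results

    boolean_and("how do search engines work")  # -> {1, 3}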

As you can imagine, on their own these words do not carry much meaning, and using Boolean operators such as “AND” to build a longer search query is a very simplistic way of finding information on the internet, which is why search engines are now moving towards more concept-based ways of indexing information. This means that search engines are able to use geographical information, natural language recognition, and personal information to try to understand the meaning and context behind a particular search. This helps when there is ambiguity. For instance, if you search for “sun bed”, it is unlikely you are looking for somewhere for our nearest star to sleep.

The order in which results are returned depends on several factors:

  • How the query is structured
  • How well a webpage is optimised
  • Your location
  • The quality of the links pointing to a webpage
  • The relevancy of the anchor text of the links pointing to the page
  • A lot more besides that keeps SEOs on their toes (a toy scoring sketch follows below)
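As promised, here is that toy scoring sketch. The factor names and weights are pure invention; real ranking functions combine hundreds of signals and are closely guarded secrets:

    # Entirely invented weights and signals, for illustration only.
    def rank_score(page):
        return (3.0 * page["on_page_relevance"]
                + 2.0 * page["link_quality"]
                + 1.5 * page["anchor_relevance"]
                + 0.5 * page["location_match"])

    pages = [
        {"url": "a.com", "on_page_relevance": 0.9, "link_quality": 0.4,
         "anchor_relevance": 0.7, "location_match": 1.0},
        {"url": "b.com", "on_page_relevance": 0.6, "link_quality": 0.9,
         "anchor_relevance": 0.8, "location_match": 0.0},
    ]
    sorted(pages, key=rank_score, reverse=True)  # best match first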


The Future of Search

With the advent of HTML5, we expect to see an increasing relationship between social networking profiles and the way content on the web is prioritised. In particular, this will be helped along by the advent of the rel=author and rel=publisher tags, which allow content creators to indicate to search engines the origins of a page’s content. Theoretically, if you are an influential social media user and your author details are attributed to a particular page, it could well help that page rank highly. Social influence monitoring sites like Klout, therefore, will undoubtedly play a more prominent role in the way we approach search engine marketing and social media over the next year.

It seems impossible to talk about search engines without delving into the realm of search marketing. If you have any thoughts on the future of search, further information to add to this post, or just want to give us a HIGH FIVE, leave a comment or chat with me on Twitter.

By Adam Cowlishaw

