HOW SEARCH ENGINES WORK

Search engines use software robots to survey the Web and build their databases.  Web documents are retrieved and indexed.  When you enter a query at a search engine website, your input is checked against the search engine’s keyword indices.  The best matches are then returned to you as hits. 

There are two primary methods of text searching: keyword and concept. 

KEYWORD SEARCHING 

This is the most common form of text search on the Web.  Most search engines do their text query and retrieval using keywords.  Unless the author of the Web document specifies the keywords for the document, it’s up to the search engine to determine them.  (Authors can now specify keywords by using meta tags in the latest version of HTML.  Meta tags are keywords and descriptions placed inside the code of a web page; they are not visible on the page itself.)  Essentially, this means that search engines pull out and index the words they believe to be significant.  Words that appear towards the top of a document and words that are repeated several times throughout the document are more likely to be deemed important.  Some search engines index every word on every page.  Others index only part of the document.  For example:    

Lycos indexes the title, headings, subheadings and the hyperlinks to other sites, along with the first 20 lines of text and the 100 words that occur most often.   

Infoseek uses a full-text indexing system, picking up every word in the text except commonly occurring stop words such as “a,” “an,” “the,” “is,” “and,” “or,” and “www.”

Hotbot also ignores stop words. 

Alta Vista claims to index all words, even the articles, “a,” “an,” and “the.”            

Some of the search engines discriminate upper case from lower case; others store all words without reference to capitalization. 
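
The indexing choices described above (skipping stop words, and storing words with or without regard to capitalization) can be sketched in a few lines of Python.  This is only an illustration and not the code of any search engine named here: the function names and the two options are assumptions made for the example, and the stop-word list is the one quoted for Infoseek.

```python
import re
from collections import defaultdict

# Stop words quoted in the Infoseek example above.
STOP_WORDS = {"a", "an", "the", "is", "and", "or", "www"}


def tokenize(text, fold_case=True):
    """Split text into words, optionally ignoring capitalization."""
    if fold_case:
        text = text.lower()
    return re.findall(r"[A-Za-z0-9]+", text)


def build_index(documents, skip_stop_words=True, fold_case=True):
    """Map each indexed word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text, fold_case):
            if skip_stop_words and word.lower() in STOP_WORDS:
                continue  # stop words are never stored
            index[word].add(doc_id)
    return index


def search(index, query, skip_stop_words=True, fold_case=True):
    """Return the ids of documents that contain every (non-stop) query word."""
    words = [
        w for w in tokenize(query, fold_case)
        if not (skip_stop_words and w.lower() in STOP_WORDS)
    ]
    if not words:
        return set()
    return set.intersection(*(index.get(w, set()) for w in words))
```

With `fold_case=False` the index keeps “Heart” and “heart” as different words, so a lower-case query will miss a capitalized word.  With `skip_stop_words=False` every word is stored, in the way Alta Vista claims to index.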

THE PROBLEM WITH KEYWORD SEARCHING 

Keyword searches have a tough time distinguishing between words that are spelled the same way but mean something different (e.g. hard cider, a hard stone, a hard exam, and the hard drive on your computer).  This often results in hits that are completely irrelevant to your query.  Some search engines also have trouble with so-called stemming, i.e. if you enter the word “big,” should they return a hit on the word “bigger”?  What about singular and plural words?   What about verb forms that differ from the word you entered by only an “s” or an “ed”?  Search engines also cannot return hits on words that mean the same as your keywords but do not actually appear in your query.  For example, a query on heart disease would not return a document that used the word “cardiac” instead of “heart.”   
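
The stemming and synonym problems above can be seen in a small Python sketch.  Everything in it is an assumption made for illustration: the suffix list is deliberately crude (it is not a real stemming algorithm and will mangle many words), and the synonym table is a hypothetical hand-made one.

```python
# Crude suffix list for illustration only; checked in this order.
SUFFIXES = ("ing", "ed", "er", "es", "s")

# Hypothetical synonym table; a real one would be far larger.
SYNONYMS = {"heart": {"cardiac", "coronary"}}


def naive_stem(word):
    """Strip one common suffix so that related word forms compare equal."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[: -len(suffix)]
            # Undo a doubled final consonant: "bigg" becomes "big".
            if len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] not in "aeiou":
                stem = stem[:-1]
            return stem
    return word


def exact_match(query, document):
    """Plain keyword matching: a query word must appear exactly as typed."""
    return bool(set(query.lower().split()) & set(document.lower().split()))


def expand_query(query):
    """Return the stems of the query words plus the stems of their synonyms."""
    terms = set()
    for word in query.lower().split():
        terms.add(naive_stem(word))
        for synonym in SYNONYMS.get(word, ()):
            terms.add(naive_stem(synonym))
    return terms


def loose_match(query, document):
    """Match if any stemmed or synonym-expanded query term is in the document."""
    document_stems = {naive_stem(w) for w in document.lower().split()}
    return bool(expand_query(query) & document_stems)
```

Here `exact_match("big", "a bigger house")` fails while `loose_match` succeeds, and the same holds for the query “heart disease” against the text “cardiac care unit”.  The sketch splits on spaces only, so punctuation attached to a word would defeat both functions.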

CONCEPT-BASED SEARCHING

Unlike keyword search systems, concept-based search systems try to determine what you mean, not just what you say.  In the best circumstances, a concept-based search returns hits on documents that are “about” the subject / theme you’re exploring, even if the words in the document don’t precisely match the words you enter into the query.

Excite is currently the best-known general-purpose search engine site on the Web that relies on concept-based searching. 

Concept-based searching is also known as clustering, which essentially means that words are examined in relation to other words found nearby.  Excite sticks to a numerical approach.  Excite’s software determines meaning by calculating the frequency with which certain important words appear.  When several words or phrases that are tagged to signal a particular concept appear close to each other in a text, the search engine concludes, by statistical analysis, that the piece is “about” a certain subject. 

For example, the word heart, when used in the medical/health context, would be likely to appear with such words as coronary, artery, lung, stroke, cholesterol, pump, blood, attack and arteriosclerosis.  If the word heart appears in a document with other words such as flowers, candy, love, passion and valentine, a very different context is established, and the search engine returns hits on the subject of romance. 
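
The heart example can be sketched as a simple count of concept words that appear near the target word.  This is not Excite’s actual method, which is not described in detail here; the concept names, the window size and the function names are assumptions made for illustration, and the word lists are the ones given in the paragraph above.

```python
import re

# Word lists taken from the example above; the concept names are assumptions.
CONCEPTS = {
    "medicine": {"coronary", "artery", "lung", "stroke", "cholesterol",
                 "pump", "blood", "attack", "arteriosclerosis"},
    "romance": {"flowers", "candy", "love", "passion", "valentine"},
}


def concept_scores(text, target="heart", window=10):
    """Count concept words found within `window` words of each target word."""
    words = re.findall(r"[a-z]+", text.lower())
    scores = {name: 0 for name in CONCEPTS}
    for position, word in enumerate(words):
        if word != target:
            continue
        nearby = words[max(0, position - window): position + window + 1]
        for name, vocabulary in CONCEPTS.items():
            scores[name] += sum(1 for w in nearby if w in vocabulary)
    return scores


def guess_concept(text, target="heart", window=10):
    """Return the highest-scoring concept, or None if no concept word is near."""
    scores = concept_scores(text, target, window)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

A text that mentions cholesterol, an artery and a stroke near the word heart scores as “medicine”, while one that mentions flowers, candy and a valentine scores as “romance”.  A text with none of the listed words nearby gets no concept at all, which is one reason short queries and short documents give weak results.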

WARNING:   This often works better in theory than in practice.  Concept-based indexing is a good idea, but it’s far from perfect.  The results are best when you enter a lot of words, all of which roughly refer to the concept you’re seeking information about. 

Now that we have determined the type of search done by the software of the different search engines, we adjust and add the necessary information within the HTML code to give your site the best chance of success. 

At The Global Trade Centre we use specialised software to register each individual website with all the main and general search engines and link directories.   

We will do everything possible to make sure that your site places well in the search engines. The time required to be listed in each search engine varies. Some take as long as 8 weeks, and the typical time at present is 6 – 8 weeks.  However, due to the constantly changing algorithms of search engines, we cannot guarantee any specific results. It is understood that we have no control whatsoever over the acceptance policies of any search engine or directory. Some, such as Yahoo, are quite restrictive.   Other problems like server downtime, software problems or Internet communication problems may also cause the initial submission to search engines to fail. 

Keep in mind that results will be influenced by the size of the market you are competing in. If you are in a very competitive market, e.g. the real estate business, it is very difficult to get a listing within the first 3 to 5 pages of results.   
