HOW SEARCH ENGINES WORK

Search engines use software robots to survey the Web and build their databases. Web
documents are retrieved and indexed. When you enter a query at a search engine website,
your input is checked against the search engine's keyword indices. The best matches are
then returned to you as hits.

There are two primary methods of text searching: keyword and concept.

KEYWORD SEARCHING

This is the most common form of text search on the Web. Most search engines do their text
query and retrieval using keywords. Unless the author of the Web document specifies the
keywords for their document (this is possible using meta tags, which are keywords and
descriptions placed inside the code of a web page and not visible on the page itself),
it's up to the search engine to determine them. Essentially, this means that search engines
pull out and index words that are believed to be significant. Words that are
mentioned towards the top of a document and words that are repeated several times
throughout the document are more likely to be deemed important. Some sites index
every word on every page. Others index only part of the document. For example:
Lycos indexes the title, headings, subheadings and the hyperlinks to other sites, along
with the first 20 lines of text and the 100 words that occur most often.
Infoseek uses a full-text indexing system, picking up every word in the text except
commonly occurring stop words such as "a", "an", "the", "is", "and", "or" and "www".
Hotbot also ignores stop words.
Alta Vista claims to index all words, even the articles "a", "an" and "the".

Some of the search engines discriminate upper case from lower case; others store all
words without reference to capitalization.

THE PROBLEM WITH KEYWORD SEARCHING

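To make the problems discussed in this section concrete, here is a minimal sketch of a keyword index of the general kind described above. It is not the code of any real search engine: the stop-word list, the sample documents and the function names (tokenize, build_index, search) are all assumptions made up for illustration. It stores words without reference to capitalization, skips stop words, and matches query words exactly.

```python
import re
from collections import defaultdict

# Assumed stop-word list for illustration; each real engine used its own.
STOP_WORDS = {"a", "an", "the", "is", "and", "or", "www"}


def tokenize(text):
    """Lowercase the text, split it into alphabetic words, drop stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def build_index(documents):
    """Map each keyword to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every query keyword (exact match)."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical sample documents.
DOCUMENTS = {
    "cider": "A recipe for hard cider",
    "drive": "How to replace the hard drive on your computer",
    "cardiac": "Cardiac disease and its risk factors",
    "bigger": "Why bigger is not always better",
}

INDEX = build_index(DOCUMENTS)
```

With this sketch, search(INDEX, "hard") returns both the cider page and the hard drive page, although they have nothing to do with each other. search(INDEX, "heart disease") returns nothing, because the only relevant page says "cardiac" rather than "heart", and search(INDEX, "big") misses the page that only contains "bigger".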
Keyword searches have a tough time distinguishing between words that are spelled the same
way but mean something different (e.g. hard cider, a hard stone, a hard exam, and the hard
drive on your computer). This often results in hits that are completely irrelevant to your
query. Some search engines also have trouble with so-called stemming: if you enter the
word "big", should they return a hit on the word "bigger"? What about singular and plural
words? What about verb tenses that differ from the word you entered by only an "s" or an
"ed"? Search engines also cannot return hits on keywords that mean the same as the words
in your query but are not actually entered in it. For example, a query on "heart disease"
would not return a document that used the word "cardiac" instead of "heart".

CONCEPT-BASED SEARCHING

Unlike keyword
search systems, concept-based search systems try to determine what you mean, not just what
you say. In the best circumstances, a concept-based search returns hits on
documents that are about the subject or theme you're exploring, even if
the words in the document don't precisely match the words you enter into the query.

Excite is currently the best-known general-purpose search engine site on the Web that
relies on concept-based searching. Concept-based searching is also known as clustering,
which essentially means that words are examined in relation to other words found nearby.

Excite sticks to a numerical approach. Excite's software determines meaning by calculating
the frequency with which certain important words appear. When several words or phrases
that are tagged to signal a particular concept appear close to each other in a text, the
search engine concludes, by statistical analysis, that the piece is about a certain
subject. For example, the word "heart", when used in the medical/health context, would be
likely to appear with such words as coronary, artery, lung, stroke, cholesterol, pump,
blood, attack and arteriosclerosis. If the word "heart" appears in a document with other
words such as flowers, candy, love, passion and valentine, a very different context is
established, and the search engine returns hits on the subject of romance.

WARNING: This
often works better in theory than in practice. Concept-based indexing is a good
idea, but it's far from perfect. The results are best when you enter a lot of
words, all of which roughly refer to the concept you're seeking information
about.

Now that we have determined the exact nature of the type of search done by the software
of the different search engines, we adjust and add the necessary information within the
HTML code to help ensure success.

At The Global Trade
Centre we use specialised software to register each individual website with all the main
and general search engines and link directories. We will do everything
possible to make sure that your site places well in the search engines. The time required
to be listed in each search engine varies. Some take as long as 8 weeks, and the average
time at present is 6-8 weeks. However, due to the constantly changing
algorithms of search engines, we cannot guarantee any specific results. It is understood
that we have no control whatsoever over the acceptance policies of any search engine or
directory. Some, Yahoo for example, are quite restrictive. Other problems such as
server downtime, software problems or Internet communication problems may also cause the
initial submission to search engines to fail.

Keep in mind that results will be influenced by the size of the market you are competing
in. If you are in a very competitive market, e.g. the real estate business, it is very
difficult to get a listing in the top 3-5 pages.