Excluding content from the index
Exclusion of words
All three indexers allow you to exclude certain words from the index by means of so-called stop word files (stop.db).
There is a global stop word file (delivered with the phpCMS package and located in the /parser/include directory) that contains, for example, articles, conjunctions, and other filler words that are insignificant for searching, in German and English. It can be extended by the user. It applies to all projects (web sites) handled by that parser.
In addition, you can create a local stop word file, which must also be called stop.db, and put it in the search directory (/suche or /search) under your project home directory. It applies only to that project. When an index is built, the stop words from this file are ignored in addition to those from the global stop word file.
(The search directory is defined in the search form of a content page with a search function by the tag '<input type="hidden" name="datadir" value="$home/SearchDir" />'.)
Furthermore, there is a so-called "nono file" (also located in the /parser/include directory and thus valid globally; it is to be filled in by the user) containing all the words that you want to ban from the search, even if they appear in the pages and are included in the index.
The stop word files and the nono file are simple text files and can be edited with any text editor. All stop words and nono words have to be entered in lowercase, one word per line. The words (lines) should be sorted alphabetically.
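The way the global and local stop word files combine can be sketched as follows. This is only an illustration in Python, not the phpCMS indexer code: the function names load_word_file, build_stop_words and filter_words are hypothetical, and UTF-8 is an assumed file encoding.

```python
from pathlib import Path


def load_word_file(path):
    """Read one word per line into a set; a missing file yields an empty set.

    Hypothetical helper. The encoding is an assumption, not taken from phpCMS.
    """
    p = Path(path)
    if not p.is_file():
        return set()
    lines = p.read_text(encoding="utf-8").splitlines()
    # Blank lines are skipped; words are normalised to lowercase.
    return {line.strip().lower() for line in lines if line.strip()}


def build_stop_words(global_file, local_file):
    """Union of the global stop words and the project's local stop words."""
    return load_word_file(global_file) | load_word_file(local_file)


def filter_words(words, stop_words):
    """Keep only the words that are not stop words (case-insensitive)."""
    return [w for w in words if w.lower() not in stop_words]
```

A local stop.db therefore only ever adds to the global list in this sketch; it cannot re-enable a word that the global file excludes.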
Exclusion of page areas
(since phpCMS 1.2.0)
If you want to exclude text areas in your templates or content pages from indexing, you can do so by enclosing these areas (there can be more than one per page) in <phpcms:noindex>...</phpcms:noindex> exclusion tags.
Neither the file indexer nor the HTTP-indexer will index the content between these tags. The HTTP-indexer will, however, still follow links inside these tags. This can be used, for example, to keep the text of a page's menu out of the index while the indexer still follows the links in that menu.
If you want to prevent the HTTP-indexer from following a link inside a page, you can enclose it in <phpcms:nofollow>....</phpcms:nofollow> tags. Links inside a block surrounded by these tags won't be followed by the indexer, but the text inside the block will still be indexed for the fulltext search.
Therefore, if you want a part of a page to be neither indexed nor searched for links by the HTTP-indexer, you have to surround this block with <phpcms:nofollow><phpcms:noindex>....</phpcms:noindex></phpcms:nofollow>.
As the HTTP-indexer only sees the final output of phpCMS and not the source code of the content files and templates, the phpCMS parser replaces the noindex/nofollow tags with HTML comments (<!-- PHPCMS_NOINDEX -->...<!-- /PHPCMS_NOINDEX --> or <!-- PHPCMS_NOFOLLOW -->...<!-- /PHPCMS_NOFOLLOW -->). The HTTP-indexer thus still knows which parts to ignore, while the output remains valid HTML.
Of course, if the HTTP-indexer is used to index static HTML pages, these HTML comments can be inserted into the files manually.
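The effect of the two comment pairs on a rendered page can be illustrated with a small sketch. It is written in Python and is not the actual HTTP-indexer. The function names indexable_text and followable_links are hypothetical, the regular expressions assume the comments appear exactly as spelled above, and the handling of HTML is deliberately simplistic.

```python
import re

# Regions delimited by the HTML comments that the phpCMS parser emits.
NOINDEX = re.compile(
    r"<!-- PHPCMS_NOINDEX -->.*?<!-- /PHPCMS_NOINDEX -->", re.DOTALL)
NOFOLLOW = re.compile(
    r"<!-- PHPCMS_NOFOLLOW -->.*?<!-- /PHPCMS_NOFOLLOW -->", re.DOTALL)
# Simplified patterns: double-quoted href attributes, any tag or comment.
HREF = re.compile(r'<a\s[^>]*href="([^"]*)"', re.IGNORECASE)
TAG = re.compile(r"<[^>]+>")


def indexable_text(html):
    """Text for the fulltext index: everything outside NOINDEX regions."""
    without_noindex = NOINDEX.sub(" ", html)
    return " ".join(TAG.sub(" ", without_noindex).split())


def followable_links(html):
    """Links to crawl: every href outside NOFOLLOW regions."""
    return HREF.findall(NOFOLLOW.sub(" ", html))


page = (
    '<!-- PHPCMS_NOINDEX --><a href="/a.html">Menu A</a><!-- /PHPCMS_NOINDEX -->'
    '<p>Body text</p>'
    '<!-- PHPCMS_NOFOLLOW --><a href="/b.html">Link B</a><!-- /PHPCMS_NOFOLLOW -->'
    '<!-- PHPCMS_NOFOLLOW --><!-- PHPCMS_NOINDEX -->'
    '<a href="/c.html">Hidden C</a>'
    '<!-- /PHPCMS_NOINDEX --><!-- /PHPCMS_NOFOLLOW -->'
)
```

For this sample page, indexable_text(page) keeps "Body text" and "Link B" but drops the menu text, while followable_links(page) returns only /a.html: the menu link is followed although its text is not indexed, and the doubly wrapped block contributes neither text nor links.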
The opening and closing tags (e.g. <phpcms:nofollow> and </phpcms:nofollow>) do not have to be in the same (sub-)template file. So you could place the <phpcms:nofollow> tag in a header subtemplate file and the </phpcms:nofollow> tag in a footer subtemplate file.
Be careful: phpCMS itself does not check whether all opened nofollow/noindex tags are closed properly, and the indexer will ignore every noindex/nofollow tag that does not have a corresponding opening or closing tag.
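Since phpCMS does not check this for you, a small script run against the rendered output of a page can catch unmatched tags. The following Python sketch is not part of phpCMS, and the function name unmatched_markers is hypothetical. It assumes that regions of the same kind are not nested inside each other, and it checks the generated HTML comments rather than the templates, because the opening and closing tags may live in different subtemplate files.

```python
import re


def unmatched_markers(html, name):
    """Count unmatched markers of one kind in rendered output.

    `name` is "NOINDEX" or "NOFOLLOW". Returns a tuple
    (openings_without_close, closings_without_open).
    """
    tokens = re.findall(r"<!-- (/?)PHPCMS_%s -->" % re.escape(name), html)
    open_count = 0
    stray_close = 0
    for slash in tokens:
        if slash:
            # A closing marker either closes a pending opening or is stray.
            if open_count > 0:
                open_count -= 1
            else:
                stray_close += 1
        else:
            open_count += 1
    return open_count, stray_close
```

A result of (0, 0) for both NOINDEX and NOFOLLOW means every marker in the page has a partner; any other result points to a tag that the indexer would ignore.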

