One of the strengths of Verity is its ability to perform full-text searches on documents of many formats. However, you often want to restrict a search to certain portions of a document to improve search relevance. If a Verity collection contains some documents about baseball and other documents about caves, a search for the word bat might retrieve several irrelevant results.
If your documents are structured, you can take advantage of Verity's ability to search zones and fields. The following are some examples of structured documents:
You can perform zone searches on markup language documents. The Verity zone filter includes built-in support for HTML and several other file formats; for a list of supported file formats, see Building a Search Interface. Verity searches XML files by treating the XML tags as zones. When you use the zone filter, the Verity engine builds zone information into the collection's full-word index. This index, enhanced with zone information, permits quick and efficient searches over zones.
The zone filter can define zones automatically, or you can define them yourself in the style.zon file. Use zone searching to limit your search to a particular zone. This can produce more accurate results than searching an entire file, although not necessarily faster ones.
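As a sketch of how such a collection might be built, the following CFML creates a collection and indexes a directory of XML files with cfindex type="path". The directory C:\bands and the collection path are hypothetical, and the sketch assumes, as described above, that the zone filter records the XML tags as zones at indexing time. The collection name my_collection matches the examples that follow.

```cfml
<!--- Hypothetical directory that holds the band XML files --->
<cfset bandDir = "C:\bands">

<!--- Create the collection; the path value is a hypothetical Verity collections directory --->
<cfcollection action="create" collection="my_collection"
  path="C:\CFusionMX7\verity\collections">

<!--- Index every .xml file in bandDir; with type="path", key names the directory to index --->
<cfindex action="update" collection="my_collection" type="path"
  key="#bandDir#" extensions=".xml" recurse="no">
```

Run the cfcollection step only once; creating a collection that already exists typically raises an error.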
The following examples perform zone searches on XML files. Suppose you keep a collection of XML files that describe rock bands, with tags for each instrument and for comments. In the following file, the word Pete appears only in a comment:
<band.xml>
  <Lead_Guitar>Dan</Lead_Guitar>
  <Rhythm_Guitar>Jake</Rhythm_Guitar>
  <Bass_Guitar>Mike</Bass_Guitar>
  <Drums>Chris</Drums>
  <COMMENT_A>Dan plays guitar, better than Pete.</COMMENT_A>
  <COMMENT_B>Jake plays rhythm guitar.</COMMENT_B>
</band.xml>
The following CFML code shows a search for the word Pete:
<cfsearch name = "band_search" collection="my_collection" type = "simple" criteria="Pete">
A simple search for Pete returns this file because Pete appears in its COMMENT_A zone; a simple search matches the word anywhere in the document. In contrast, Pete is the lead guitarist in the following XML file:
<band.xml>
  <Lead_Guitar>Pete</Lead_Guitar>
  <Rhythm_Guitar>Roger</Rhythm_Guitar>
  <Bass_Guitar>John</Bass_Guitar>
  <Drums>Kenny</Drums>
  <COMMENT_A>Who knows who's better than this band?</COMMENT_A>
  <COMMENT_B>Ticket prices correlated with decibels.</COMMENT_B>
</band.xml>
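To confirm which files the simple search returns, you can list the KEY and SCORE columns of the query that cfsearch creates. This is a minimal sketch; for a collection indexed with type="path", KEY holds each file's path. Given the two sample files, both should be listed, because Pete appears in each of them.

```cfml
<cfsearch name="band_search" collection="my_collection" type="simple" criteria="Pete">

<!--- A simple search matches the word anywhere in a file, so every file that mentions Pete is listed --->
<cfoutput>
  <p>#band_search.recordCount# file(s) matched.</p>
</cfoutput>
<cfoutput query="band_search">
  #band_search.key# (score: #band_search.score#)<br>
</cfoutput>
```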
To retrieve only the files in which Pete is the lead guitarist, perform a zone search using the IN operator according to the following syntax:
(query) <IN> (zone1, zone2, ...)
Thus, the following explicit search retrieves files in which Pete is the lead guitarist:
(Pete) <in> Lead_Guitar
This is expressed in CFML as follows:
<cfsearch name = "band_search" collection="my_collection" type = "explicit" criteria="(Pete) <in> Lead_Guitar">
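When the name comes from user input, you can build the same zone search dynamically. In the following sketch, the url.name parameter is hypothetical, and the character filter is a conservative assumption rather than a complete escaping scheme for Verity query syntax: it keeps only letters, digits, and spaces before inserting the value into the criteria.

```cfml
<!--- Hypothetical URL parameter that holds the name to look for --->
<cfparam name="url.name" default="Pete">

<!--- Keep only letters, digits, and spaces so the input cannot add
      angle-bracket operators, parentheses, or commas to the query --->
<cfset safeName = trim(reReplace(url.name, "[^A-Za-z0-9 ]", "", "all"))>

<!--- Skip the search when nothing usable is left, to avoid an empty "()" in the criteria --->
<cfif len(safeName)>
  <cfsearch name="band_search" collection="my_collection" type="explicit"
    criteria="(#safeName#) <in> Lead_Guitar">
</cfif>
```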
To retrieve files in which Pete plays either lead or rhythm guitar, use the following explicit search:
(Pete) <in> (Lead_Guitar,Rhythm_Guitar)
This is expressed in CFML as follows:
<cfsearch name = "band_search" collection="my_collection" type = "explicit" criteria="(Pete) <in> (Lead_Guitar,Rhythm_Guitar)">
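The IN operator works the same way for any zone. As an illustration that uses only the syntax shown above, the following sketch finds files that mention Pete in either comment zone; with the two sample files, only the first (where Pete appears in COMMENT_A) should match. The query name comment_search is hypothetical.

```cfml
<!--- Restrict the match to the two comment zones, ignoring the instrument zones --->
<cfsearch name="comment_search" collection="my_collection" type="explicit"
  criteria="(Pete) <in> (COMMENT_A,COMMENT_B)">

<cfoutput query="comment_search">
  #comment_search.key#<br>
</cfoutput>
```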
Fields are extracted from the document and stored in the collection for retrieval and searching, and can be returned on a results list. Zones, on the other hand, are merely the definitions of "regions" of a document for searching purposes, and are not physically extracted from the document in the same way that fields are extracted.
A region of text must be defined as a zone before it can be a field. Therefore, a region can be a zone only, or both a zone and a field. Whether you define a region as a zone only or as both a zone and a field depends on your requirements.
You must define a field in the style file, style.ufl, before you create the collection. To map zones to fields (so that field data can be displayed), you must define these additional fields in style.ufl.
You can specify the values of the cfindex attributes TITLE, KEY, and URL as document fields for use with relational operators in the criteria attribute. (The SCORE and SUMMARY result columns are returned automatically by cfsearch; their values differ for each record and change as the search criteria change.) Text comparison operators can reference the following document fields:
Text comparison operators can also reference the following automatically populated document fields:
To explore how to use document fields to refine a search, consider the following database table, named Calls, which has four columns and three records:
call_ID | Problem_Description | Short_Description | Product
---|---|---|---
1 | Can't bold text properly under certain conditions | Bold Problem | HomeSite+
2 | Certain optional attributes are acting as required attributes | Attributes Problem | ColdFusion
3 | Can't do a File/Open in certain cases | File Open Problem | HomeSite+
A Verity search for the word certain returns all three records. However, you can use document fields to restrict the search, for example, to retrieve only HomeSite+ problems that contain the word certain in the problem description.
These are the requirements to run this procedure:
The following table shows how each database column maps to a cfindex attribute:
Database column | cfindex attribute | Comment
---|---|---
call_ID | key | The primary key of a database table is often a key attribute.
Problem_Description | body | This column is the information to be indexed.
Short_Description | title | A short description is conceptually equivalent to a title, as in a running title of a journal article.
Product | custom1 | This field refines the search.
Begin by selecting all data from the Calls table in a query:
<cfquery name = "Calls" datasource = "MyDSN">
  SELECT * FROM Calls
</cfquery>
The following code shows the cfindex tag for indexing the collection (the type attribute is set to custom for tabular data):
<cfindex query = "Calls"
  collection = "training"
  action = "UPDATE"
  type = "CUSTOM"
  title = "Short_Description"
  key = "Call_ID"
  body = "Problem_Description"
  custom1 = "Product">
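The cfindex tag above assumes that the training collection already exists. A sketch of creating it on first use, to run before cfindex, might look like the following; the collection path is hypothetical and depends on your installation.

```cfml
<!--- Retrieve the existing collections as a query with a NAME column --->
<cfcollection action="list" name="existing">

<!--- Create the training collection only if it is not already registered --->
<cfif NOT listFindNoCase(valueList(existing.name), "training")>
  <cfcollection action="create" collection="training"
    path="C:\CFusionMX7\verity\collections">
</cfif>
```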
To perform the refined search for HomeSite+ problems with the word certain in the problem description, the cfsearch tag uses the CONTAINS operator in its criteria attribute:
<cfsearch collection = "training" name = "search_calls" criteria = "certain and CF_CUSTOM1 <CONTAINS> HomeSite">
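For comparison, the following sketch runs the same search without the field test and prints both record counts. The query name all_calls is hypothetical. With the three sample records, the unrefined search should find all three, while the refined search should find only the two HomeSite+ records (call_ID 1 and 3).

```cfml
<!--- Baseline: no field test, so every record whose indexed body contains "certain" matches --->
<cfsearch collection="training" name="all_calls" criteria="certain">

<cfoutput>
  <p>Unrefined: #all_calls.recordCount# record(s); refined: #search_calls.recordCount# record(s)</p>
</cfoutput>
```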
The following code displays the results of the refined search:
<table border="1" cellspacing="5">
  <tr>
    <th align="LEFT">KEY</th>
    <th align="LEFT">TITLE</th>
    <th align="LEFT">CUSTOM1</th>
  </tr>
  <cfoutput query = "search_calls">
    <tr>
      <td>#KEY#</td>
      <td>#TITLE#</td>
      <td>#CUSTOM1#</td>
    </tr>
  </cfoutput>
</table>
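The display code above prints an empty table when nothing matches. A small variation, sketched here with message text of our own, checks the recordCount of search_calls first and escapes the displayed values with htmlEditFormat.

```cfml
<cfif search_calls.recordCount EQ 0>
  <p>No matching calls were found.</p>
<cfelse>
  <ul>
  <!--- One list item per matching record: the title plus the product stored in custom1 --->
  <cfoutput query="search_calls">
    <li>#htmlEditFormat(search_calls.title)# (#htmlEditFormat(search_calls.custom1)#)</li>
  </cfoutput>
  </ul>
</cfif>
```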