LuceneContribQuery.dtd

Contrib Lucene

This DTD builds on the core Lucene XML syntax and adds support for features found in the "contrib" section of the Lucene project.

CorePlusExtensionsParser.java is the Java class that encapsulates this parser behaviour.

The features added are:



<BooleanQuery> Child of BoostQuery, Clause, CachedFilter, Query

A BooleanQuery implements Boolean logic that controls how multiple Clauses are interpreted. Some clauses may represent optional Query criteria while others represent mandatory criteria.

Example: Find articles about banks, preferably talking about mergers but nothing to do with "sumitomo"

	          
            <BooleanQuery fieldName="contents">
	             <Clause occurs="should">
		              <TermQuery>merger</TermQuery>
	             </Clause>
	             <Clause occurs="mustnot">
		              <TermQuery>sumitomo</TermQuery>
	             </Clause>
	             <Clause occurs="must">
		              <TermQuery>bank</TermQuery>
	             </Clause>
            </BooleanQuery>

	         

<BooleanQuery>'s children
Name    Cardinality
Clause  At least one
<BooleanQuery>'s attributes
Name                      Values       Default
boost                                  1.0
disableCoord              true, false  false
fieldName
minimumNumberShouldMatch               0
Element's model:

(Clause)+


@boost Attribute of BooleanQuery

Optional boost for matches on this query. Values > 1 increase the weight of matches.

Default value: 1.0


@fieldName Attribute of BooleanQuery

fieldName can optionally be defined here as a default attribute used by all child elements


@disableCoord Attribute of BooleanQuery

The "Coordination factor" rewards documents that contain more of the optional clauses in this list. This flag can be used to turn off this factor.

Possible values: true, false - Default value: false


@minimumNumberShouldMatch Attribute of BooleanQuery

The minimum number of optional clauses that should be present in any one document before it is considered to be a match.

Default value: 0
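
Example: A sketch of minimumNumberShouldMatch, assuming the same "contents" field as the example above (the extra terms are illustrative only). A document must contain "bank" and at least two of the three optional terms

            <BooleanQuery fieldName="contents" minimumNumberShouldMatch="2">
               <Clause occurs="must">
                  <TermQuery>bank</TermQuery>
               </Clause>
               <Clause occurs="should">
                  <TermQuery>merger</TermQuery>
               </Clause>
               <Clause occurs="should">
                  <TermQuery>takeover</TermQuery>
               </Clause>
               <Clause occurs="should">
                  <TermQuery>acquisition</TermQuery>
               </Clause>
            </BooleanQuery>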


<Clause> Child of BooleanFilter, BooleanQuery

NOTE: The "Clause" tag has two modes of use. Inside a <BooleanQuery>, only "query" types can be child elements; inside a <BooleanFilter>, only "filter" types can be contained.

<Clause>'s children
Name                Cardinality
BooleanFilter       One or none
BooleanQuery        One or none
BoostingQuery       One or none
BoostingTermQuery   One or none
CachedFilter        One or none
ConstantScoreQuery  One or none
DuplicateFilter     One or none
FilteredQuery       One or none
FuzzyLikeThisQuery  One or none
LikeThisQuery       One or none
MatchAllDocsQuery   One or none
NumericRangeFilter  One or none
NumericRangeQuery   One or none
RangeFilter         One or none
SpanFirst           One or none
SpanNear            One or none
SpanNot             One or none
SpanOr              One or none
SpanOrTerms         One or none
SpanTerm            One or none
TermQuery           One or none
TermsFilter         One or none
TermsQuery          One or none
UserQuery           One or none
<Clause>'s attributes
Name    Values                 Default
occurs  should, must, mustnot  should
Element's model:

(BooleanQuery | UserQuery | FilteredQuery | TermQuery | TermsQuery | MatchAllDocsQuery | ConstantScoreQuery | BoostingTermQuery | NumericRangeQuery | SpanOr | SpanNear | SpanOrTerms | SpanFirst | SpanNot | SpanTerm | BoostingTermQuery | LikeThisQuery | BoostingQuery | FuzzyLikeThisQuery | RangeFilter | NumericRangeFilter | CachedFilter | TermsFilter | BooleanFilter | DuplicateFilter)


@occurs Attribute of Clause

Controls whether the clause is optional (should), mandatory (must) or unacceptable (mustnot)

Possible values: should, must, mustnot - Default value: should


<CachedFilter> Child of Clause, Filter, ConstantScoreQuery

Caches any nested query or filter in an LRU (least recently used) cache. Cached queries, like filters, are turned into BitSets at a cost of 1 bit per document in the index, so the memory cost of a cached query/filter is numberOfDocsInIndex/8 bytes (for example, about 1 MB for an index of 8 million documents). Queries that are cached as filters retain none of the scoring information associated with results; they retain just a Boolean yes/no record of which documents matched.

Example: Search for documents about banks from the last 10 years, caching the commonly-used "last 10 years" filter as a BitSet in RAM to eliminate the cost of building this filter from disk for every query

	          
            <FilteredQuery>
               <Query>
                  <UserQuery>bank</UserQuery>
               </Query>	
               <Filter>
                  <CachedFilter>
                     <RangeFilter fieldName="date" lowerTerm="19970101" upperTerm="20070101"/>
                  </CachedFilter>
               </Filter>	
            </FilteredQuery>
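
Example: A sketch of caching a query rather than a filter, assuming a hypothetical "status" field that holds the value "published". The cached TermQuery keeps only its yes/no matches, not its scores

            <FilteredQuery>
               <Query>
                  <UserQuery>bank</UserQuery>
               </Query>
               <Filter>
                  <CachedFilter>
                     <TermQuery fieldName="status">published</TermQuery>
                  </CachedFilter>
               </Filter>
            </FilteredQuery>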
	         

<CachedFilter>'s children
Name                Cardinality
BooleanFilter       One or none
BooleanQuery        One or none
BoostingQuery       One or none
BoostingTermQuery   One or none
CachedFilter        One or none
ConstantScoreQuery  One or none
DuplicateFilter     One or none
FilteredQuery       One or none
FuzzyLikeThisQuery  One or none
LikeThisQuery       One or none
MatchAllDocsQuery   One or none
NumericRangeFilter  One or none
NumericRangeQuery   One or none
RangeFilter         One or none
SpanFirst           One or none
SpanNear            One or none
SpanNot             One or none
SpanOr              One or none
SpanOrTerms         One or none
SpanTerm            One or none
TermQuery           One or none
TermsFilter         One or none
TermsQuery          One or none
UserQuery           One or none
Element's model:

(BooleanQuery | UserQuery | FilteredQuery | TermQuery | TermsQuery | MatchAllDocsQuery | ConstantScoreQuery | BoostingTermQuery | NumericRangeQuery | SpanOr | SpanNear | SpanOrTerms | SpanFirst | SpanNot | SpanTerm | BoostingTermQuery | LikeThisQuery | BoostingQuery | FuzzyLikeThisQuery | RangeFilter | NumericRangeFilter | CachedFilter | TermsFilter | BooleanFilter | DuplicateFilter)


<UserQuery> Child of BoostQuery, Clause, CachedFilter, Query

Passes content directly through to the standard Lucene query parser; see "Lucene Query Syntax"

Example: Search for documents about John Smith or John Doe using standard Lucene query syntax

	          
               <UserQuery>"John Smith" OR "John Doe"</UserQuery>
	         

<UserQuery>'s attributes
Name       Values  Default
boost              1.0
fieldName

@boost Attribute of UserQuery

Optional boost for matches on this query. Values > 1 increase the weight of matches.

Default value: 1.0


@fieldName Attribute of UserQuery

fieldName can optionally be defined here to change the default field used in the QueryParser


<MatchAllDocsQuery/> Child of BoostQuery, Clause, CachedFilter, Query

A query that matches all documents. This has a couple of uses:

  1. as a Clause in a BooleanQuery whose only other clause is a "mustnot" match (Lucene requires at least one positive clause), and
  2. in a FilteredQuery where a Filter tag is effectively being used to select content rather than its usual role of filtering the results of a query.

Example: Effectively use a Filter as a query

	          
               <FilteredQuery>
                 <Query>
                    <MatchAllDocsQuery/>
                 </Query>
                 <Filter>
                     <RangeFilter fieldName="date" lowerTerm="19870409" upperTerm="19870412"/>
                 </Filter>	
               </FilteredQuery>	         
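
Example: A sketch of the first use, a BooleanQuery whose only other clause is a "mustnot" match (the "status" field and the "draft" value are illustrative assumptions, not part of the DTD)

               <BooleanQuery>
                  <Clause occurs="must">
                     <MatchAllDocsQuery/>
                  </Clause>
                  <Clause occurs="mustnot">
                     <TermQuery fieldName="status">draft</TermQuery>
                  </Clause>
               </BooleanQuery>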
	       

This element is always empty.


<TermQuery> Child of BoostQuery, Clause, CachedFilter, Query

A single-term query. No analysis is performed on the child text

Example: Match on a primary key

	          
               <TermQuery fieldName="primaryKey">13424</TermQuery>
	       

<TermQuery>'s attributes
Name       Values  Default
boost              1.0
fieldName

@boost Attribute of TermQuery

Optional boost for matches on this query. Values > 1 increase the weight of matches.

Default value: 1.0


@fieldName Attribute of TermQuery

fieldName must be defined here or is taken from the most immediate parent XML element that defines a "fieldName" attribute


<BoostingTermQuery> Child of BoostQuery, Clause, SpanFirst, Include, CachedFilter, SpanOr, SpanNear, Exclude, Query

A boosted term query - no analysis is done of the child text. Also a span member.

(Text below is copied from the javadocs of BoostingTermQuery)

The BoostingTermQuery is very similar to the {


@boost Attribute of TermQuery

Optional boost for matches on this query. Values > 1 increase the weight of matches.

Default value: 1.0


@fieldName Attribute of TermQuery

fieldName must be defined here or is taken from the most immediate parent XML element that defines a "fieldName" attribute


<TermsQuery> Child of BoostQuery, Clause, CachedFilter, Query

The equivalent of a BooleanQuery with multiple optional TermQuery clauses. Child text is analyzed using a field-specific choice of Analyzer to produce a set of terms that are ORed together in Boolean logic. Unlike the UserQuery element, this does not parse any special characters to control fuzzy/phrase/boolean logic and so cannot produce a Query parse error, whatever the user input

Example: Match on text from a database description (which may contain characters that are illegal in the standard Lucene Query syntax used in the UserQuery tag)

	          
               <TermsQuery fieldName="description">Smith &amp; Sons (Ltd) : incorporated 1982</TermsQuery>
	       

<TermsQuery>'s attributes
Name                      Values       Default
boost                                  1.0
disableCoord              true, false  false
fieldName
minimumNumberShouldMatch               0

@boost Attribute of TermsQuery

Optional boost for matches on this query. Values > 1 increase the weight of matches.

Default value: 1.0


@fieldName Attribute of TermsQuery

fieldName must be defined here or is taken from the most immediate parent XML element that defines a "fieldName" attribute


@disableCoord Attribute of TermsQuery

The "Coordination factor" rewards documents that contain more of the terms in this list. This flag can be used to turn off this factor.

Possible values: true, false - Default value: false


@minimumNumberShouldMatch Attribute of TermsQuery

The minimum number of terms that should be present in any one document before it is considered to be a match.

Default value: 0
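
Example: A sketch of minimumNumberShouldMatch on a TermsQuery, assuming the "description" field used above. At least three of the analyzed terms must be present; how many terms the text yields depends on the Analyzer configured for the field

               <TermsQuery fieldName="description" minimumNumberShouldMatch="3">Smith &amp; Sons (Ltd) : incorporated 1982</TermsQuery>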


<FilteredQuery> Child of BoostQuery, Clause, CachedFilter, Query

Runs a Query and filters results to only those query matches that also match the Filter element.

Example: Find all documents about Lucene that have a status of "published"

	          
               <FilteredQuery>
                 <Query>
                    <UserQuery>Lucene</UserQuery>
                 </Query>
                 <Filter>
                     <TermsFilter fieldName="status">published</TermsFilter>
                 </Filter>	
               </FilteredQuery>	         
	       

<FilteredQuery>'s children
Name    Cardinality
Filter  Only one
Query   Only one
<FilteredQuery>'s attributes
Name   Values  Default
boost          1.0
Element's model:

(Query, Filter)


@boost Attribute of FilteredQuery

Optional boost for matches on this query. Values > 1 increase the weight of matches.

Default value: 1.0


<Query> Child of FilteredQuery, BoostingQuery

Used to identify a nested Query element inside another container element. This is NOT a top-level query tag

<Query>'s children
Name                Cardinality
BooleanQuery        One or none
BoostingQuery       One or none
BoostingTermQuery   One or none
ConstantScoreQuery  One or none
FilteredQuery       One or none
FuzzyLikeThisQuery  One or none
LikeThisQuery       One or none
MatchAllDocsQuery   One or none
NumericRangeQuery   One or none
SpanFirst           One or none
SpanNear            One or none
SpanNot             One or none
SpanOr              One or none
SpanOrTerms         One or none
SpanTerm            One or none
TermQuery           One or none
TermsQuery          One or none
UserQuery           One or none
Element's model:

(BooleanQuery | UserQuery | FilteredQuery | TermQuery | TermsQuery | MatchAllDocsQuery | ConstantScoreQuery | BoostingTermQuery | NumericRangeQuery | SpanOr | SpanNear | SpanOrTerms | SpanFirst | SpanNot | SpanTerm | BoostingTermQuery | LikeThisQuery | BoostingQuery | FuzzyLikeThisQuery)


<Filter> Child of FilteredQuery

The choice of Filter that MUST also be matched

<Filter>'s children
Name                Cardinality
BooleanFilter       One or none
CachedFilter        One or none
DuplicateFilter     One or none
NumericRangeFilter  One or none
RangeFilter         One or none
TermsFilter         One or none
Element's model:

(RangeFilter | NumericRangeFilter | CachedFilter | TermsFilter | BooleanFilter | DuplicateFilter)


<RangeFilter/> Child of Clause, Filter, CachedFilter, ConstantScoreQuery

Filter used to limit query results to documents matching a range of field values

Example: Search for documents about banks from the last 10 years

	          
            <FilteredQuery>
               <Query>
                  <UserQuery>bank</UserQuery>
               </Query>	
               <Filter>
                     <RangeFilter fieldName="date" lowerTerm="19970101" upperTerm="20070101"/>
               </Filter>	
            </FilteredQuery>
	         

<RangeFilter>'s attributes
Name          Values       Default
fieldName
includeLower  true, false  true
includeUpper  true, false  true
lowerTerm
upperTerm

This element is always empty.


@fieldName Attribute of RangeFilter

fieldName must be defined here or is taken from the most immediate parent XML element that defines a "fieldName" attribute


@lowerTerm Attribute of RangeFilter

The lower-most term value for this field (must be <= upperTerm)

Required


@upperTerm Attribute of RangeFilter

The upper-most term value for this field (must be >= lowerTerm)

Required


@includeLower Attribute of RangeFilter

Controls if the lowerTerm in the range is part of the allowed set of values

Possible values: true, false - Default value: true


@includeUpper Attribute of RangeFilter

Controls if the upperTerm in the range is part of the allowed set of values

Possible values: true, false - Default value: true
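
Example: A sketch of a half-open range covering the year 2006, assuming dates are indexed as yyyyMMdd strings as in the example above. Setting includeUpper="false" excludes the upperTerm itself

            <RangeFilter fieldName="date" lowerTerm="20060101" upperTerm="20070101" includeUpper="false"/>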


<NumericRangeQuery/> Child of BoostQuery, Clause, CachedFilter, Query

A Query that matches numeric values within a specified range.

Example: Search for documents about people who are aged 20-25

	          
            <NumericRangeQuery fieldName="age" lowerTerm="20" upperTerm="25" />
	         

<NumericRangeQuery>'s attributes
Name           Values                    Default
fieldName
includeLower   true, false               true
includeUpper   true, false               true
lowerTerm
precisionStep                            4
type           int, long, float, double  int
upperTerm

This element is always empty.


@fieldName Attribute of NumericRangeQuery

fieldName must be defined here or is taken from the most immediate parent XML element that defines a "fieldName" attribute


@lowerTerm Attribute of NumericRangeQuery

The lower-most term value for this field (must be <= upperTerm and a valid native Java numeric type)

Required


@upperTerm Attribute of NumericRangeQuery

The upper-most term value for this field (must be >= lowerTerm and a valid native Java numeric type)

Required


@type Attribute of NumericRangeQuery

The numeric type of this field

Possible values: int, long, float, double - Default value: int


@includeLower Attribute of NumericRangeQuery

Controls if the lowerTerm in the range is part of the allowed set of values

Possible values: true, false - Default value: true


@includeUpper Attribute of NumericRangeQuery

Controls if the upperTerm in the range is part of the allowed set of values

Possible values: true, false - Default value: true


@precisionStep Attribute of NumericRangeQuery

Lower step values produce more precision levels per value and therefore more terms in the index (so the index gets larger). This value must be an integer

Default value: 4
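
Example: A sketch that sets the optional attributes explicitly, assuming a hypothetical "price" field indexed as a double with a precisionStep of 8 (neither the field nor the step value comes from the DTD)

            <NumericRangeQuery fieldName="price" type="double" lowerTerm="9.99" upperTerm="49.99" includeUpper="false" precisionStep="8"/>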


<NumericRangeFilter/> Child of Clause, Filter, CachedFilter, ConstantScoreQuery

A Filter that only accepts numeric values within a specified range

Example: Search for documents about people who are aged 20-25

	          
            <FilteredQuery>
               <Query>
                  <UserQuery>person</UserQuery>
               </Query>	
               <Filter>
                     <NumericRangeFilter fieldName="age" lowerTerm="20" upperTerm="25"/>
               </Filter>	
            </FilteredQuery>
	         

<NumericRangeFilter>'s attributes
Name           Values                    Default
fieldName
includeLower   true, false               true
includeUpper   true, false               true
lowerTerm
precisionStep                            4
type           int, long, float, double  int
upperTerm

This element is always empty.


@fieldName Attribute of NumericRangeFilter

fieldName must be defined here or is taken from the most immediate parent XML element that defines a "fieldName" attribute


@lowerTerm Attribute of NumericRangeFilter

The lower-most term value for this field (must be <= upperTerm and a valid native Java numeric type)

Required


@upperTerm Attribute of NumericRangeFilter

The upper-most term value for this field (must be >= lowerTerm and a valid native Java numeric type)

Required


@type Attribute of NumericRangeFilter

The numeric type of this field

Possible values: int, long, float, double - Default value: int


@includeLower Attribute of NumericRangeFilter

Controls if the lowerTerm in the range is part of the allowed set of values

Possible values: true, false - Default value: true


@includeUpper Attribute of NumericRangeFilter

Controls if the upperTerm in the range is part of the allowed set of values

Possible values: true, false - Default value: true


@precisionStep Attribute of NumericRangeFilter

Lower step values produce more precision levels per value and therefore more terms in the index (so the index gets larger). This value must be an integer

Default value: 4


<SpanTerm> Child of BoostQuery, Clause, SpanFirst, Include, CachedFilter, SpanOr, SpanNear, Exclude, Query

A single term used in a SpanQuery. These clauses are the building blocks for more complex "span" queries which test word proximity

Example: Find documents using terms close to each other about mining and accidents

	      <SpanNear slop="8" inOrder="false" fieldName="text">		
			<SpanOr>
				<SpanTerm>killed</SpanTerm>
				<SpanTerm>died</SpanTerm>
				<SpanTerm>dead</SpanTerm>
			</SpanOr>
			<SpanOr>
				<SpanTerm>miner</SpanTerm>
				<SpanTerm>mining</SpanTerm>
				<SpanTerm>miners</SpanTerm>
			</SpanOr>
	      </SpanNear>
	      

<SpanTerm>'s attributes
Name       Values  Default
fieldName

@fieldName Attribute of SpanTerm

fieldName must be defined here or is taken from the most immediate parent XML element that defines a "fieldName" attribute

Required


<SpanOrTerms> Child of BoostQuery, Clause, SpanFirst, Include, CachedFilter, SpanOr, SpanNear, Exclude, Query

A field-specific analyzer is used here to parse the child text provided in this tag. The SpanTerms produced are ORed in terms of Boolean logic

Example: Use SpanOrTerms as a more convenient/succinct way of expressing multiple choices of SpanTerms. This example looks for reports using words describing a fatality near to references to miners

	      <SpanNear slop="8" inOrder="false" fieldName="text">		
			<SpanOrTerms>killed died death dead deaths</SpanOrTerms>
			<SpanOrTerms>miner mining miners</SpanOrTerms>
	      </SpanNear>
	      

<SpanOrTerms>'s attributes
Name       Values  Default
fieldName

@fieldName Attribute of SpanOrTerms

fieldName must be defined here or is taken from the most immediate parent XML element that defines a "fieldName" attribute

Required


<SpanOr> Child of BoostQuery, Clause, SpanFirst, Include, CachedFilter, SpanNear, Exclude, Query

Takes any number of child queries from the Span family

Example: Find documents using terms close to each other about mining and accidents

	      <SpanNear slop="8" inOrder="false" fieldName="text">		
			<SpanOr>
				<SpanTerm>killed</SpanTerm>
				<SpanTerm>died</SpanTerm>
				<SpanTerm>dead</SpanTerm>
			</SpanOr>
			<SpanOr>
				<SpanTerm>miner</SpanTerm>
				<SpanTerm>mining</SpanTerm>
				<SpanTerm>miners</SpanTerm>
			</SpanOr>
	      </SpanNear>
	      

<SpanOr>'s children
Name               Cardinality
BoostingTermQuery  Any number
SpanFirst          Any number
SpanNear           Any number
SpanNot            Any number
SpanOr             Any number
SpanOrTerms        Any number
SpanTerm           Any number
Element's model:

(SpanOr | SpanNear | SpanOrTerms | SpanFirst | SpanNot | SpanTerm | BoostingTermQuery)*


<SpanNear> Child of BoostQuery, Clause, SpanFirst, Include, CachedFilter, SpanOr, Exclude, Query

Takes any number of child queries from the Span family and tests for proximity

<SpanNear>'s children
Name               Cardinality
BoostingTermQuery  Any number
SpanFirst          Any number
SpanNear           Any number
SpanNot            Any number
SpanOr             Any number
SpanOrTerms        Any number
SpanTerm           Any number
<SpanNear>'s attributes
Name     Values       Default
inOrder  true, false  true
slop
Element's model:

(SpanOr | SpanNear | SpanOrTerms | SpanFirst | SpanNot | SpanTerm | BoostingTermQuery)*


@slop Attribute of SpanNear

Defines the maximum distance between Span elements, where distance is expressed as a number of words, not a byte offset

Example: Find documents using terms within 8 words of each other talking about mining and accidents

	      <SpanNear slop="8" inOrder="false" fieldName="text">		
			<SpanOr>
				<SpanTerm>killed</SpanTerm>
				<SpanTerm>died</SpanTerm>
				<SpanTerm>dead</SpanTerm>
			</SpanOr>
			<SpanOr>
				<SpanTerm>miner</SpanTerm>
				<SpanTerm>mining</SpanTerm>
				<SpanTerm>miners</SpanTerm>
			</SpanOr>
	      </SpanNear>
	      

Required


@inOrder Attribute of SpanNear

Controls if matching terms have to appear in the order listed or can be reversed

Possible values: true, false - Default value: true
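
Example: A sketch of an ordered match, assuming the same "text" field as the examples above. With inOrder="true", "social" has to appear before "services", within a slop of 2

	      <SpanNear slop="2" inOrder="true" fieldName="text">
			<SpanTerm>social</SpanTerm>
			<SpanTerm>services</SpanTerm>
	      </SpanNear>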


<SpanFirst> Child of BoostQuery, Clause, Include, CachedFilter, SpanOr, SpanNear, Exclude, Query

Looks for a SpanQuery match occurring near the beginning of a document

Example: Find letters where the first 50 words talk about a resignation:

	          
	         <SpanFirst end="50">
	               <SpanOrTerms fieldName="text">resigning resign leave</SpanOrTerms>
	         </SpanFirst>
	         

<SpanFirst>'s children
Name               Cardinality
BoostingTermQuery  One or none
SpanFirst          One or none
SpanNear           One or none
SpanNot            One or none
SpanOr             One or none
SpanOrTerms        One or none
SpanTerm           One or none
<SpanFirst>'s attributes
Name   Values  Default
boost          1.0
end
Element's model:

(SpanOr | SpanNear | SpanOrTerms | SpanFirst | SpanNot | SpanTerm | BoostingTermQuery)


@end Attribute of SpanFirst

Controls the end of the region considered in a document's field (expressed in word number, not byte offset)

Required


@boost Attribute of SpanFirst

Optional boost for matches on this query. Values > 1 increase the weight of matches.

Default value: 1.0


<SpanNot> Child of BoostQuery, Clause, SpanFirst, Include, CachedFilter, SpanOr, SpanNear, Exclude, Query

Finds documents that match one SpanQuery but excludes matches that also match another SpanQuery

Example: Find documents talking about social services but not containing the word "public"

          <SpanNot fieldName="text">
             <Include>
                <SpanNear slop="2" inOrder="true">		
                     <SpanTerm>social</SpanTerm>
                     <SpanTerm>services</SpanTerm>
                </SpanNear>				
             </Include>
             <Exclude>
                <SpanTerm>public</SpanTerm>
             </Exclude>
          </SpanNot>
	      

<SpanNot>'s children
Name     Cardinality
Exclude  Only one
Include  Only one
Element's model:

(Include, Exclude)


<Include> Child of SpanNot

The SpanQuery to find

<Include>'s children
Name               Cardinality
BoostingTermQuery  One or none
SpanFirst          One or none
SpanNear           One or none
SpanNot            One or none
SpanOr             One or none
SpanOrTerms        One or none
SpanTerm           One or none
Element's model:

(SpanOr | SpanNear | SpanOrTerms | SpanFirst | SpanNot | SpanTerm | BoostingTermQuery)


<Exclude> Child of SpanNot

The SpanQuery to be avoided

<Exclude>'s children
Name               Cardinality
BoostingTermQuery  One or none
SpanFirst          One or none
SpanNear           One or none
SpanNot            One or none
SpanOr             One or none
SpanOrTerms        One or none
SpanTerm           One or none
Element's model:

(SpanOr | SpanNear | SpanOrTerms | SpanFirst | SpanNot | SpanTerm | BoostingTermQuery)


<ConstantScoreQuery> Child of BoostQuery, Clause, CachedFilter, Query

A utility tag that wraps any filter as a query

Example: Find all documents from the last 10 years

     <ConstantScoreQuery>
           <RangeFilter fieldName="date" lowerTerm="19970101" upperTerm="20070101"/>
     </ConstantScoreQuery>	
	

<ConstantScoreQuery>'s children
Name                Cardinality
BooleanFilter       Any number
CachedFilter        Any number
DuplicateFilter     Any number
NumericRangeFilter  Any number
RangeFilter         Any number
TermsFilter         Any number
<ConstantScoreQuery>'s attributes
Name   Values  Default
boost          1.0
Element's model:

(RangeFilter | NumericRangeFilter | CachedFilter | TermsFilter | BooleanFilter | DuplicateFilter)*


@boost Attribute of ConstantScoreQuery

Optional boost for matches on this query. Values > 1 increase the weight of matches.

Default value: 1.0
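
Example: A sketch of a wrapped filter used as an optional, boosted clause, assuming the "contents" and "date" fields from the earlier examples. Documents about banks are required, and those from the last 10 years are preferred

     <BooleanQuery fieldName="contents">
           <Clause occurs="must">
                 <TermQuery>bank</TermQuery>
           </Clause>
           <Clause occurs="should">
                 <ConstantScoreQuery boost="2.0">
                       <RangeFilter fieldName="date" lowerTerm="19970101" upperTerm="20070101"/>
                 </ConstantScoreQuery>
           </Clause>
     </BooleanQuery>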


<FuzzyLikeThisQuery> Child of BoostQuery, Clause, CachedFilter, Query

Performs fuzzy matching on "significant" terms in fields. Improves on "LikeThisQuery" by allowing for fuzzy variations of supplied fields. Improves on FuzzyQuery by rewarding all fuzzy variants of a term with the same IDF rather than default fuzzy behaviour which ranks rarer variants (typically misspellings) more highly. This can be a useful default search mode for processing user input where the end user is not expected to know about the standard query operators for fuzzy, boolean or phrase logic found in UserQuery

Example: Search for information about the Sumitomo bank, where the end user has mis-spelt the name

	          
            <FuzzyLikeThisQuery>
                <Field fieldName="contents">
		             Sumitimo bank
	            </Field>
            </FuzzyLikeThisQuery>
	         

<FuzzyLikeThisQuery>'s children
Name   Cardinality
Field  Any number
<FuzzyLikeThisQuery>'s attributes
Name         Values       Default
boost                     1.0
ignoreTF     true, false  false
maxNumTerms               50
Element's model:

(Field)*


@boost Attribute of FuzzyLikeThisQuery

Optional boost for matches on this query. Values > 1 increase the weight of matches.

Default value: 1.0


@maxNumTerms Attribute of FuzzyLikeThisQuery

Limits the total number of terms selected from the provided text plus the selected "fuzzy" variants

Default value: 50


@ignoreTF Attribute of FuzzyLikeThisQuery

Ignore "Term Frequency" - a boost factor which rewards multiple occurrences of the same term in a document

Possible values: true, false - Default value: false


<Field> Child of FuzzyLikeThisQuery

A field used in a FuzzyLikeThisQuery

<Field>'s attributes
Name           Values  Default
fieldName
minSimilarity          0.5
prefixLength           1

@minSimilarity Attribute of Field

Controls the level of similarity required for fuzzy variants, where 1 means identical and 0.5 means that the variant contains half of the original's characters in the same order. Lower values produce more results but may take longer to execute due to the additional IO required to read matching document ids

Default value: 0.5


@prefixLength Attribute of Field

Controls the minimum number of characters at the start of fuzzy variant words that must exactly match the original. A value of zero will require no minimum and the search software will effectively scan ALL terms from a to z looking for variations. This can incur high CPU overhead and a prefix length of just "1" will reduce this overhead to 1/26th of the original cost (assuming an even distribution of letters used from the alphabet).

Default value: 1


@fieldName Attribute of Field

fieldName must be defined here or is taken from the most immediate parent XML element that defines a "fieldName" attribute
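
Example: A sketch that tightens the defaults, assuming the "contents" field from the example above (the attribute values are illustrative, not recommendations). Fuzzy variants must share the first two characters and reach a similarity of 0.7, and at most 25 terms are selected

            <FuzzyLikeThisQuery maxNumTerms="25" ignoreTF="true">
                <Field fieldName="contents" minSimilarity="0.7" prefixLength="2">
                     Sumitimo bank
                </Field>
            </FuzzyLikeThisQuery>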


<LikeThisQuery> Child of BoostQuery, Clause, CachedFilter, Query

Cherry-picks "significant" terms from the example child text and queries using these words. By only using significant (read: rare) terms the performance cost of the query is substantially reduced and large bodies of text can be used as example content.

Example: Use a block of text as an example of the type of content to be found, ignoring the "Reuters" word which appears commonly in the index.

            <LikeThisQuery percentTermsToMatch="5" stopWords="Reuters">
                IRAQI TROOPS REPORTED PUSHING BACK IRANIANS Iraq said today its troops were pushing Iranian forces out of 
                positions they had initially occupied when they launched a new offensive near the southern port of 
                Basra early yesterday.     A High Command communique said Iraqi troops had won a significant victory 
                and were continuing to advance.     Iraq said it had foiled a three-pronged thrust some 10 km 
                (six miles) from Basra, but admitted the Iranians had occupied ground held by the Mohammed al-Qassem 
                unit, one of three divisions attacked.     The communique said Iranian Revolutionary Guards were under 
                assault from warplanes, helicopter gunships, heavy artillery and tanks.     "Our forces are continuing 
                their advance until they purge the last foothold" occupied by the Iranians, it said.     
                (Iran said its troops had killed or wounded more than 4,000 Iraqis and were stabilising their new positions.)     
                The Baghdad communique said Iraqi planes also destroyed oil installations at Iran's southwestern Ahvaz field 
                during a raid today. It denied an Iranian report that an Iraqi jet was shot down.     
                Iraq also reported a naval battle at the northern tip of the Gulf. Iraqi naval units and forces defending an 
                offshore terminal sank six Iranian out of 28 Iranian boats attempting to attack an offshore terminal, 
                the communique said.      Reuters
            </LikeThisQuery>	         
	        

<LikeThisQuery>'s attributes
Name                 Values  Default
boost                        1.0
fieldNames
maxQueryTerms                20
minTermFrequency             1
percentTermsToMatch          30
stopWords

@boost Attribute of LikeThisQuery

Optional boost for matches on this query. Values > 1 increase the weight of matches.

Default value: 1.0


@fieldNames Attribute of LikeThisQuery

Comma delimited list of field names


@stopWords Attribute of LikeThisQuery

A list of stop words, analyzed to produce stop terms


@maxQueryTerms Attribute of LikeThisQuery

Controls the maximum number of words shortlisted for the query. The higher the number, the slower the response, because more disk reads are required

Default value: 20


@minTermFrequency Attribute of LikeThisQuery

Controls how many times a term must appear in the example text before it is shortlisted for use in the query

Default value: 1


@percentTermsToMatch Attribute of LikeThisQuery

A quality control that can be used to limit the number of results to those documents matching a certain percentage of the shortlisted query terms. Values must be between 1 and 100

Default value: 30
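
Example: A sketch that sets the remaining attributes, assuming hypothetical "title" and "contents" fields. Only terms that appear at least twice in the example text are shortlisted, at most 10 of them are used, and half of those must match

            <LikeThisQuery fieldNames="title,contents" maxQueryTerms="10" minTermFrequency="2" percentTermsToMatch="50">
                Iraq said today its troops were pushing Iranian forces out of positions they had initially occupied
                when they launched a new offensive near the southern port of Basra early yesterday.
                Iraq said it had foiled a three-pronged thrust some 10 km (six miles) from Basra.
            </LikeThisQuery>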


<BoostingQuery> Child of BoostQuery, Clause, CachedFilter, Query

Requires matches on the "Query" element and optionally boosts by any matches on the "BoostQuery". Unlike a regular BooleanQuery the boost can be less than 1 to produce a subtractive rather than additive result on the match score.

Example: Find documents about banks, preferably related to mergers, and preferably not about "World bank"

	<BoostingQuery>
      <Query>
         <BooleanQuery fieldName="contents">
           <Clause occurs="should">
              <TermQuery>merger</TermQuery>
           </Clause>
           <Clause occurs="must">
              <TermQuery>bank</TermQuery>
           </Clause>
         </BooleanQuery>	
      </Query>
      <BoostQuery boost="0.01">
         <UserQuery>"world bank"</UserQuery>
      </BoostQuery>
    </BoostingQuery>
	

<BoostingQuery>'s children
Name        Cardinality
BoostQuery  Only one
Query       Only one
<BoostingQuery>'s attributes
Name   Values  Default
boost          1.0
Element's model:

(Query, BoostQuery)


@boost Attribute of BoostingQuery

Optional boost for matches on this query. Values > 1 increase the weight of matches.

Default value: 1.0


<BoostQuery> Child of BoostingQuery

Child element of BoostingQuery used to contain the choice of Query which is used for boosting purposes

<BoostQuery>'s children
Name                Cardinality
BooleanQuery        One or none
BoostingQuery       One or none
BoostingTermQuery   One or none
ConstantScoreQuery  One or none
FilteredQuery       One or none
FuzzyLikeThisQuery  One or none
LikeThisQuery       One or none
MatchAllDocsQuery   One or none
NumericRangeQuery   One or none
SpanFirst           One or none
SpanNear            One or none
SpanNot             One or none
SpanOr              One or none
SpanOrTerms         One or none
SpanTerm            One or none
TermQuery           One or none
TermsQuery          One or none
UserQuery           One or none
<BoostQuery>'s attributes
Name   Values  Default
boost          1.0
Element's model:

(BooleanQuery | UserQuery | FilteredQuery | TermQuery | TermsQuery | MatchAllDocsQuery | ConstantScoreQuery | BoostingTermQuery | NumericRangeQuery | SpanOr | SpanNear | SpanOrTerms | SpanFirst | SpanNot | SpanTerm | BoostingTermQuery | LikeThisQuery | BoostingQuery | FuzzyLikeThisQuery)


@boost Attribute of BoostQuery

Optional boost for matches on this query. A boost of >0 but <1 effectively demotes results from Query that match this BoostQuery.

Default value: 1.0


<DuplicateFilter/> Child of Clause, Filter, CachedFilter, ConstantScoreQuery

Removes duplicated documents from results where "duplicate" means documents share a value for a particular field such as a primary key

Example: Find the latest version of each web page that mentions "Lucene"

    <FilteredQuery>
      <Query>
         <TermQuery fieldName="text">lucene</TermQuery>
      </Query>
	  <Filter>
		<DuplicateFilter fieldName="url" keepMode="last"/>
	  </Filter>	
    </FilteredQuery>	
	

<DuplicateFilter>'s attributes
Name            Values       Default
fieldName
keepMode        first, last  first
processingMode  full, fast   full

This element is always empty.


@fieldName Attribute of DuplicateFilter

fieldName must be defined here or is taken from the most immediate parent XML element that defines a "fieldName" attribute


@keepMode Attribute of DuplicateFilter

Determines whether the first or last document occurrence is the one to return when presented with duplicated field values

Possible values: first, last - Default value: first


@processingMode Attribute of DuplicateFilter

Controls the choice of process used to produce the filter - "full" mode identifies only non-duplicate documents with the chosen field while "fast" mode may perform faster but will also mark documents without the field as valid. The former approach starts by assuming every document is a duplicate then finds the "master" documents to keep while the latter approach assumes all documents are unique and unmarks those documents that are a copy.

Possible values: full, fast - Default value: full
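
Example: A sketch of the "fast" processing mode, assuming the "url" field from the example above. The first document for each url is kept, and documents that have no "url" field are also marked as valid

    <FilteredQuery>
      <Query>
         <TermQuery fieldName="text">lucene</TermQuery>
      </Query>
      <Filter>
         <DuplicateFilter fieldName="url" keepMode="first" processingMode="fast"/>
      </Filter>
    </FilteredQuery>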


<TermsFilter> Child of Clause, Filter, CachedFilter, ConstantScoreQuery

Processes child text using a field-specific choice of Analyzer to produce a set of terms that are then used as a filter.

Example: Find documents talking about Lucene written on a Monday or a Friday

    <FilteredQuery>
      <Query>
         <TermQuery fieldName="text">lucene</TermQuery>
      </Query>
	<Filter>
		<TermsFilter fieldName="dayOfWeek">monday friday</TermsFilter> 
	</Filter>	
    </FilteredQuery>	
	

<TermsFilter>'s attributes
Name       Values  Default
fieldName

@fieldName Attribute of TermsFilter

fieldName must be defined here or is taken from the most immediate parent XML element that defines a "fieldName" attribute


<BooleanFilter> Child of Clause, Filter, CachedFilter, ConstantScoreQuery

A Filter equivalent to BooleanQuery that applies Boolean logic to Clauses containing Filters. Unlike BooleanQuery, a BooleanFilter can consist of a single "mustnot" clause.

Example: Find documents from the first quarter of this year or last year that are not in "draft" status

     <FilteredQuery>
       <Query>
           <MatchAllDocsQuery/>
       </Query>
       <Filter>
        <BooleanFilter>
          <Clause occurs="should">
             <RangeFilter fieldName="date" lowerTerm="20070101" upperTerm="20070401"/>
          </Clause>
          <Clause occurs="should">
             <RangeFilter fieldName="date" lowerTerm="20060101" upperTerm="20060401"/>
          </Clause>
          <Clause occurs="mustnot">
             <TermsFilter fieldName="status">draft</TermsFilter> 
          </Clause>
        </BooleanFilter>
       </Filter>
    </FilteredQuery>
	

<BooleanFilter>'s children
Name    Cardinality
Clause  At least one
Element's model:

(Clause)+
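
Example: A sketch of a BooleanFilter made up of a single "mustnot" clause, assuming the "status" field from the example above. It selects every document that is not in "draft" status

     <FilteredQuery>
       <Query>
           <MatchAllDocsQuery/>
       </Query>
       <Filter>
        <BooleanFilter>
          <Clause occurs="mustnot">
             <TermsFilter fieldName="status">draft</TermsFilter>
          </Clause>
        </BooleanFilter>
       </Filter>
    </FilteredQuery>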