X-Git-Url: https://git.mdrn.pl/pylucene.git/blobdiff_plain/a2e61f0c04805cfcb8706176758d1283c7e3a55c..aaeed5504b982cf3545252ab528713250aa33eed:/lucene-java-3.5.0/lucene/src/java/org/apache/lucene/search/function/package.html diff --git a/lucene-java-3.5.0/lucene/src/java/org/apache/lucene/search/function/package.html b/lucene-java-3.5.0/lucene/src/java/org/apache/lucene/search/function/package.html new file mode 100755 index 0000000..5da5fe8 --- /dev/null +++ b/lucene-java-3.5.0/lucene/src/java/org/apache/lucene/search/function/package.html @@ -0,0 +1,191 @@ + + + + org.apache.lucene.search.function + + +
+ Programmatic control over document scores. +
+
+ The function package provides tight control over document scores. +
+
+@lucene.experimental +
+
+ Two types of queries are available in this package: +
+
+
    +
  1. + Custom Score queries - set the score of a matching document to a mathematical expression over the scores that document receives from the contained (sub) queries. +
  2. + Field score queries - base the score of a document on numeric values of indexed fields. +
+
+
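The two query types can be pictured with a small plain-Java model. This is only a sketch of the idea and does not use the Lucene API: `ScoreSketch`, `ScoreFunction`, `fieldScore` and `customScore` are hypothetical names introduced here for illustration.

```java
public class ScoreSketch {
    /** Hypothetical stand-in for a custom scoring rule; not a Lucene type. */
    interface ScoreFunction {
        float apply(float subQueryScore, float fieldValue);
    }

    /** Field score idea: the indexed numeric value itself becomes the score. */
    static float fieldScore(String indexedValue) {
        return Float.parseFloat(indexedValue);
    }

    /** Custom score idea: an expression over the sub-query score and a field-driven value. */
    static float customScore(ScoreFunction f, float subQueryScore, float fieldValue) {
        return f.apply(subQueryScore, fieldValue);
    }

    public static void main(String[] args) {
        // A document indexed with the value "7" in its scoring field.
        float fromField = fieldScore("7");

        // One possible expression: multiply the sub-query score by the field value.
        ScoreFunction product = (score, value) -> score * value;

        System.out.println(customScore(product, 0.5f, fromField)); // prints 3.5
    }
}
```

Any other expression over the two inputs can be swapped in for `product`; the samples at the end of this page show how such an expression is attached to a real query.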
 
+
+ Some possible uses of these queries: +
+
+
    +
  1. + Normalizing the document scores by values indexed in a special field, for instance to experiment with a different doc length normalization. +
  2. + Introducing a static scoring element into the score of a document, for instance one based on a topological attribute of the links to/from the document. +
  3. + Computing the score of a matching document as an arbitrary (even unusual) function of its score by a certain query. +
+
+
+ Performance and Quality Considerations: +
+
+
    +
  1. + When scoring by values of indexed fields, these values are loaded into memory. Unlike regular scoring, where the required information is read from disk as necessary, field values are loaded once and cached by Lucene in memory, anticipating reuse by later queries. Although this caching is designed with performance in mind, it is recommended to use these features only when the default Lucene scoring does not match your "special" application needs. +
  2. + Use only with carefully selected fields, because in most cases search quality with regular Lucene scoring would outperform that of scoring by field values. +
  3. + Values of fields used for scoring should be consistent with one another. Do not apply to a field containing arbitrary (long) text. Do not mix values in the same field if that field is used for scoring. +
  4. + Smaller (shorter) field tokens mean less RAM (something always desired). When using FieldScoreQuery, select the shortest FieldScoreQuery.Type that is sufficient for the used field values. +
  5. + Reusing IndexReaders/IndexSearchers is essential, because the caching of field tokens is based on an IndexReader. Whenever a new IndexReader is used, values currently in the cache cannot be used and new values must be loaded from disk. So replace/refresh readers/searchers in a controlled manner. +
+
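The point about reusing readers can be illustrated with a simplified model of a cache keyed by reader. This is a sketch under stated assumptions, not Lucene's actual cache: `ReaderCacheSketch`, `Reader`, `valuesFor` and the `loads` counter are hypothetical, and the "disk load" is simulated by returning a fixed array.

```java
import java.util.Map;
import java.util.WeakHashMap;

public class ReaderCacheSketch {
    /** Hypothetical stand-in for an index reader; only its identity matters here. */
    static final class Reader {}

    /** Counts simulated loads from disk. */
    static int loads = 0;

    /** Field values cached per reader instance (assumption: one entry per reader). */
    private static final Map<Reader, float[]> CACHE = new WeakHashMap<>();

    static float[] valuesFor(Reader reader) {
        return CACHE.computeIfAbsent(reader, r -> {
            loads++; // a cache miss stands in for an expensive load from disk
            return new float[] {7f, 3f, 5f};
        });
    }

    public static void main(String[] args) {
        Reader first = new Reader();
        valuesFor(first);    // miss: values are loaded
        valuesFor(first);    // hit: the same reader reuses the cached values

        Reader reopened = new Reader();
        valuesFor(reopened); // miss again: a new reader cannot use the old entry

        System.out.println(loads); // prints 2
    }
}
```

In this model every new reader instance pays the load cost once, which is why readers and searchers are best replaced deliberately rather than per query.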
+
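To get a feel for why the shortest sufficient type is worth choosing, the following back-of-the-envelope estimate may help. It assumes, for illustration only, that the cached values occupy one primitive array slot per document and it ignores array headers and any other overhead; `FieldTypeMemorySketch` and `approxBytes` are hypothetical names.

```java
public class FieldTypeMemorySketch {
    /** Rough payload size in bytes of a primitive array with one slot per document. */
    static long approxBytes(int bytesPerValue, int maxDoc) {
        return (long) bytesPerValue * maxDoc;
    }

    public static void main(String[] args) {
        int maxDoc = 10_000_000; // hypothetical index size

        // Widths of the Java primitives matching the BYTE, SHORT, INT and FLOAT types.
        System.out.println("BYTE:  " + approxBytes(Byte.BYTES, maxDoc));
        System.out.println("SHORT: " + approxBytes(Short.BYTES, maxDoc));
        System.out.println("INT:   " + approxBytes(Integer.BYTES, maxDoc));
        System.out.println("FLOAT: " + approxBytes(Float.BYTES, maxDoc));
    }
}
```

Under these assumptions a byte-valued field needs roughly a quarter of the memory of an int or float field over the same index.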
+ History and Credits: + +
+
+ Code sample: +

+ Note: the code snippets here should work, but they were never actually compiled, so the test sources TestCustomScoreQuery, TestFieldScoreQuery and TestOrdValues may also be useful. +

    +
  1. + Using field (byte) values as scores: +

    + Indexing: +

    +      Field f = new Field("score", "7", Field.Store.NO, Field.Index.NOT_ANALYZED);
    +      f.setOmitNorms(true);
    +      d1.add(f);
    +    
    +

    + Search: +

    +      Query q = new FieldScoreQuery("score", FieldScoreQuery.Type.BYTE);
    +    
    + Document d1 above would get a score of 7. +
  2. + Manipulating scores +

    + Dividing the original score of each document by the square root of its docid (just to demonstrate what it takes to manipulate scores this way) +

    +      Query q = queryParser.parse("my query text");
    +      CustomScoreQuery customQ = new CustomScoreQuery(q) {
    +        public float customScore(int doc, float subQueryScore, float valSrcScore) {
    +          // doc is the docid; Math.sqrt returns a double, so cast back to float
    +          return (float) (subQueryScore / Math.sqrt(doc));
    +        }
    +      };
    +    
    +

    + For more informative debug info on the custom query, also override the name() method: +

    +      CustomScoreQuery customQ = new CustomScoreQuery(q) {
    +        public float customScore(int doc, float subQueryScore, float valSrcScore) {
    +          return (float) (subQueryScore / Math.sqrt(doc));
    +        }
    +        public String name() {
    +          return "1/sqrt(docid)";
    +        }
    +      };
    +    
    +

    + Taking the square root of the original score and multiplying it by a "short field driven score", i.e., the short value that was indexed for the scored doc in a certain field: +

    +      Query q = queryParser.parse("my query text");
    +      FieldScoreQuery qf = new FieldScoreQuery("shortScore", FieldScoreQuery.Type.SHORT);
    +      CustomScoreQuery customQ = new CustomScoreQuery(q,qf) {
    +        public float customScore(int doc, float subQueryScore, float valSrcScore) {
    +          return (float) Math.sqrt(subQueryScore) * valSrcScore;
    +        }
    +        public String name() {
    +          return "shortVal*sqrt(score)";
    +        }
    +      };
    +    
    + +
+
+ + \ No newline at end of file