org.apache.lucene.search.function

+ Programmatic control over document scores. +

+ The function package provides tight control over document scores. +

+@lucene.experimental +

+ Two types of queries are available in this package: +

+ Custom Score queries - let you set the score of a matching document as a mathematical expression over the scores assigned to that document by contained (sub) queries. +
+ Field score queries - let you base the score of a document on numeric values of indexed fields. +

+ Some possible uses of these queries: +

+ Normalizing the document scores by values indexed in a special field - for instance, experimenting with a different doc length normalization. +
+ Introducing some static scoring element to the score of a document - for instance, using some topological attribute of the links to/from a document. +
+ Computing the score of a matching document as an arbitrary odd function of its score by a certain query. +

+ Performance and Quality Considerations: +

+ When scoring by values of indexed fields, these values are loaded into memory. Unlike regular scoring, where the required information is read from disk as necessary, field values are loaded once and cached in memory by Lucene, anticipating reuse by later queries. Although this caching is designed with performance in mind, it is recommended to use these features only when the default Lucene scoring does not match your "special" application needs. +
+ Use only with carefully selected fields, because in most cases search quality with regular Lucene scoring would outperform that of scoring by field values. +
+ Values of fields used for scoring should match the type used for scoring. Do not apply on a field containing arbitrary (long) text. Do not mix different kinds of values in the same field if that field is used for scoring. +
+ Smaller (shorter) field tokens mean less RAM (something always desired). When using FieldScoreQuery, select the shortest FieldScoreQuery.Type that is sufficient for the field values used. +
+ Reusing IndexReaders/IndexSearchers is essential, because the caching of field tokens is based on an IndexReader. Whenever a new IndexReader is used, values currently in the cache cannot be used and new values must be loaded from disk. So replace/refresh readers/searchers in a controlled manner. +
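
+ The reader-based caching described in the last point can be pictured with a small plain-Java sketch. It is only an illustration under stated assumptions: ToyReader and ToyFieldValueCache are hypothetical names invented here, not Lucene classes, and the "load from disk" is simulated by copying an array. +

```java
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical stand-in for an index reader; NOT a Lucene class.
final class ToyReader {
    final float[] storedValues; // pretend these values live on disk

    ToyReader(float[] storedValues) {
        this.storedValues = storedValues;
    }
}

// Hypothetical cache of field values, keyed by reader identity.
final class ToyFieldValueCache {
    private final Map<ToyReader, float[]> cache = new WeakHashMap<>();
    private int loads = 0;

    // Returns the cached values for this reader, loading them on first use.
    synchronized float[] valuesFor(ToyReader reader) {
        float[] values = cache.get(reader);
        if (values == null) {
            values = reader.storedValues.clone(); // simulates the costly load
            cache.put(reader, values);
            loads++;
        }
        return values;
    }

    // Number of simulated loads performed so far.
    synchronized int loadCount() {
        return loads;
    }
}
```

+ In this sketch, asking twice with the same ToyReader triggers one load, while switching to a new ToyReader over the same data triggers another, which is why readers should be replaced in a controlled manner. +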

+ History and Credits: +

+ A large part of the code of this package originated from Yonik's FunctionQuery code that was imported from Solr (see LUCENE-446). +
+ The idea behind CustomScoreQuery is borrowed from the "Easily create queries that transform sub-query scores arbitrarily" contribution by Mike Klaas (see LUCENE-850), though the implementation and API here are different. +

+ Code sample: +

+ Note: code snippets here should work, but they were never really compiled... so the test sources under TestCustomScoreQuery, TestFieldScoreQuery and TestOrdValues may also be useful. +

+ Using field (byte) values as scores: +

+ Indexing: +

+      f = new Field("score", "7", Field.Store.NO, Field.Index.UN_TOKENIZED);
+      f.setOmitNorms(true);
+      d1.add(f);
+

+ Search: +

+      Query q = new FieldScoreQuery("score", FieldScoreQuery.Type.BYTE);
+

+ Document d1 above would get a score of 7. +
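
+ A small plain-Java sketch of why the indexed text "7" can act as a score of 7, and why the values must fit the selected type. It assumes the field text is interpreted as a decimal number of that type; the class ByteFieldScore and the helper scoreFromByteField are hypothetical and not part of Lucene. +

```java
// Plain-Java illustration only; no Lucene classes are involved.
final class ByteFieldScore {
    // Hypothetical helper: interprets the indexed text as a byte
    // and widens it to a float score.
    static float scoreFromByteField(String indexedText) {
        // Byte.parseByte throws NumberFormatException for text that is
        // not a number or lies outside the byte range -128..127.
        return Byte.parseByte(indexedText);
    }
}
```

+ Under that assumption, "7" yields 7.0, while a value such as "300" does not fit a byte and would call for a wider type such as SHORT. +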

+ Manipulating scores +

+ Dividing the original score of each document by the square root of its docid (just to demonstrate what it takes to manipulate scores this way): +

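+      Query q = queryParser.parse("my query text");
+      CustomScoreQuery customQ = new CustomScoreQuery(q) {
+        public float customScore(int doc, float subQueryScore, float valSrcScore) {
+          // Math.sqrt returns a double, so cast the result back to float.
+          // Note: doc 0 divides by zero and yields an infinite score.
+          return (float) (subQueryScore / Math.sqrt(doc));
+        }
+      };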
+

+ For more informative debug info on the custom query, also override the name() method: +

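+      CustomScoreQuery customQ = new CustomScoreQuery(q) {
+        public float customScore(int doc, float subQueryScore, float valSrcScore) {
+          // Same computation as above; the parameter is named doc.
+          return (float) (subQueryScore / Math.sqrt(doc));
+        }
+        // Label reported for this query in debug output.
+        public String name() {
+          return "1/sqrt(docid)";
+        }
+      };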
+

+ Taking the square root of the original score and multiplying it by a "short field driven score", i.e., the short value that was indexed for the scored doc in a certain field: +

+      Query q = queryParser.parse("my query text");
+      // Scores each document by the short value indexed in its "shortScore" field.
+      FieldScoreQuery qf = new FieldScoreQuery("shortScore", FieldScoreQuery.Type.SHORT);
+      CustomScoreQuery customQ = new CustomScoreQuery(q,qf) {
+        public float customScore(int doc, float subQueryScore, float valSrcScore) {
+          // Math.sqrt returns a double, so cast the result back to float.
+          return (float) (Math.sqrt(subQueryScore) * valSrcScore);
+        }
+        public String name() {
+          return "shortVal*sqrt(score)";
+        }
+      };
+
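
+ To make the arithmetic of the last sample concrete, here is a plain-Java sketch of the same formula outside Lucene. The class ScoreMath and the method combine are hypothetical names used only for illustration. +

```java
// Plain-Java illustration of the scoring arithmetic only; no Lucene classes involved.
final class ScoreMath {
    // Same formula as the customScore body in the sample above:
    // square root of the sub-query score, times the field-driven score.
    static float combine(float subQueryScore, float valSrcScore) {
        return (float) (Math.sqrt(subQueryScore) * valSrcScore);
    }
}
```

+ For example, a sub-query score of 4.0 combined with an indexed short value of 3 gives sqrt(4.0) * 3 = 6.0. +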
