--- /dev/null
+<html>
+<body>
+
+<p>This module enables search result grouping with Lucene, where hits
+with the same value in the specified single-valued group field are
+grouped together. For example, if you group by the <code>author</code>
+field, then all documents with the same value in the <code>author</code>
+field fall into a single group.</p>
+
+<p>Grouping requires a number of inputs:</p>
+
+ <ul>
+ <li> <code>groupField</code>: this is the field used for grouping.
+ For example, if you use the <code>author</code> field then each
+ group has all books by the same author. Documents that don't
+ have this field are grouped under a single group with
+ a <code>null</code> group value.
+
+ <li> <code>groupSort</code>: how the groups are sorted. For sorting
+ purposes, each group is "represented" by the highest-sorted
+ document according to the <code>groupSort</code> within it. For
+ example, if you specify "price" (ascending) then the first group
+ is the one with the lowest price book within it. Or if you
+ specify relevance group sort, then the first group is the one
+ containing the highest scoring book.
+
+ <li> <code>topNGroups</code>: how many top groups to keep. For
+ example, 10 means the top 10 groups are computed.
+
+ <li> <code>groupOffset</code>: which "slice" of top groups you want to
+ retrieve. For example, 3 means you'll get 7 groups back
+ (assuming <code>topNGroups</code> is 10). This is useful for
+ paging, where you might show 5 groups per page.
+
+ <li> <code>withinGroupSort</code>: how the documents within each group
+ are sorted. This can be different from the group sort.
+
+ <li> <code>maxDocsPerGroup</code>: how many top documents within each
+ group to keep.
+
+ <li> <code>withinGroupOffset</code>: which "slice" of top
+ documents you want to retrieve from each group.
+
+ </ul>
+
+<p>The implementation is two-pass: the first pass ({@link
+ org.apache.lucene.search.grouping.TermFirstPassGroupingCollector})
+ gathers the top groups, and the second pass ({@link
+ org.apache.lucene.search.grouping.TermSecondPassGroupingCollector})
+ gathers documents within those groups. If the search is costly to
+ run you may want to use the {@link
+ org.apache.lucene.search.CachingCollector} class, which
+ caches hits and can (quickly) replay them for the second pass. This
+ way you only run the query once, but you pay a RAM cost to (briefly)
+ hold all hits. Results are returned as a {@link
+ org.apache.lucene.search.grouping.TopGroups} instance.</p>
+
+<p>
+ This module abstracts away what defines group and how it is collected. All grouping collectors
+ are abstract and have currently term based implementations. One can implement
+ collectors that for example group on multiple fields.
+</p>
+
+<p>Known limitations:</p>
+<ul>
+ <li> For the two-pass grouping collector, the group field must be a
+ single-valued indexed field.
+ {@link org.apache.lucene.search.FieldCache} is used to load the {@link org.apache.lucene.search.FieldCache.StringIndex} for this field.
+ <li> Although Solr support grouping by function and this module has abstraction of what a group is, there are currently only
+ implementations for grouping based on terms.
+ <li> Sharding is not directly supported, though is not too
+ difficult, if you can merge the top groups and top documents per
+ group yourself.
+</ul>
+
+<p>Typical usage for the generic two-pass collector looks like this
+ (using the {@link org.apache.lucene.search.CachingCollector}):</p>
+
+<pre class="prettyprint">
+ TermFirstPassGroupingCollector c1 = new TermFirstPassGroupingCollector("author", groupSort, groupOffset+topNGroups);
+
+ boolean cacheScores = true;
+ double maxCacheRAMMB = 4.0;
+ CachingCollector cachedCollector = CachingCollector.create(c1, cacheScores, maxCacheRAMMB);
+ s.search(new TermQuery(new Term("content", searchTerm)), cachedCollector);
+
+ Collection<SearchGroup<BytesRef>> topGroups = c1.getTopGroups(groupOffset, fillFields);
+
+ if (topGroups == null) {
+ // No groups matched
+ return;
+ }
+
+ boolean getScores = true;
+ boolean getMaxScores = true;
+ boolean fillFields = true;
+ TermSecondPassGroupingCollector c2 = new TermSecondPassGroupingCollector("author", topGroups, groupSort, docSort, docOffset+docsPerGroup, getScores, getMaxScores, fillFields);
+
+ //Optionally compute total group count
+ TermAllGroupsCollector allGroupsCollector = null;
+ if (requiredTotalGroupCount) {
+ allGroupsCollector = new TermAllGroupsCollector("author");
+ c2 = MultiCollector.wrap(c2, allGroupsCollector);
+ }
+
+ if (cachedCollector.isCached()) {
+ // Cache fit within maxCacheRAMMB, so we can replay it:
+ cachedCollector.replay(c2);
+ } else {
+ // Cache was too large; must re-execute query:
+ s.search(new TermQuery(new Term("content", searchTerm)), c2);
+ }
+
+ TopGroups<BytesRef> groupsResult = c2.getTopGroups(docOffset);
+ if (requiredTotalGroupCount) {
+ groupsResult = new TopGroups<BytesRef>(groupsResult, allGroupsCollector.getGroupCount());
+ }
+
+ // Render groupsResult...
+</pre>
+
+<p>To use the single-pass <code>BlockGroupingCollector</code>,
+ first, at indexing time, you must ensure all docs in each group
+ are added as a block, and you have some way to find the last
+ document of each group. One simple way to do this is to add a
+ marker binary field:</p>
+
+<pre class="prettyprint">
+ // Create Documents from your source:
+ List<Document> oneGroup = ...;
+
+ Field groupEndField = new Field("groupEnd", "x", Field.Store.NO, Field.Index.NOT_ANALYZED);
+ groupEndField.setOmitTermFreqAndPositions(true);
+ groupEndField.setOmitNorms(true);
+ oneGroup.get(oneGroup.size()-1).add(groupEndField);
+
+ // You can also use writer.updateDocuments(); just be sure you
+ // replace an entire previous doc block with this new one. For
+ // example, each group could have a "groupID" field, with the same
+ // value for all docs in this group:
+ writer.addDocuments(oneGroup);
+</pre>
+
+Then, at search time, do this up front:
+
+<pre class="prettyprint">
+ // Set this once in your app & save away for reusing across all queries:
+ Filter groupEndDocs = new CachingWrapperFilter(new QueryWrapperFilter(new TermQuery(new Term("groupEnd", "x"))));
+</pre>
+
+Finally, do this per search:
+
+<pre class="prettyprint">
+ // Per search:
+ BlockGroupingCollector c = new BlockGroupingCollector(groupSort, groupOffset+topNGroups, needsScores, groupEndDocs);
+ s.search(new TermQuery(new Term("content", searchTerm)), c);
+ TopGroups groupsResult = c.getTopGroups(withinGroupSort, groupOffset, docOffset, docOffset+docsPerGroup, fillFields);
+
+ // Render groupsResult...
+</pre>
+
+Note that the <code>groupValue</code> of each <code>GroupDocs</code>
+will be <code>null</code>, so if you need to present this value you'll
+have to separately retrieve it (for example using stored
+fields, <code>FieldCache</code>, etc.).
+
+<p>Another collector is the <code>TermAllGroupHeadsCollector</code> that can be used to retrieve all most relevant
+ documents per group. Also known as group heads. This can be useful in situations when one wants to compute grouping
+ based facets / statistics on the complete query result. The collector can be executed during the first or second
+ phase.</p>
+
+<pre class="prettyprint">
+ AbstractAllGroupHeadsCollector c = TermAllGroupHeadsCollector.create(groupField, sortWithinGroup);
+ s.search(new TermQuery(new Term("content", searchTerm)), c);
+ // Return all group heads as int array
+ int[] groupHeadsArray = c.retrieveGroupHeads()
+ // Return all group heads as FixedBitSet.
+ int maxDoc = s.maxDoc();
+ FixedBitSet groupHeadsBitSet = c.retrieveGroupHeads(maxDoc)
+</pre>
+
+</body>
+</html>