<p>This module enables search result grouping with Lucene, where hits
with the same value in the specified single-valued group field are
grouped together. For example, if you group by the <code>author</code>
field, then all documents with the same value in the <code>author</code>
field fall into a single group.</p>
<p>Grouping requires a number of inputs:</p>

<ul>

  <li> <code>groupField</code>: the field used for grouping.
    For example, if you use the <code>author</code> field then each
    group has all books by the same author. Documents that don't
    have this field are grouped under a single group with
    a <code>null</code> group value.

  <li> <code>groupSort</code>: how the groups are sorted. For sorting
    purposes, each group is "represented" by the highest-sorted
    document according to the <code>groupSort</code> within it. For
    example, if you specify "price" (ascending) then the first group
    is the one with the lowest-priced book within it. Or if you
    specify relevance group sort, then the first group is the one
    containing the highest-scoring book.

  <li> <code>topNGroups</code>: how many top groups to keep. For
    example, 10 means the top 10 groups are computed.

  <li> <code>groupOffset</code>: which "slice" of top groups you want to
    retrieve. For example, 3 means you'll get 7 groups back
    (assuming <code>topNGroups</code> is 10). This is useful for
    paging, where you might show 5 groups per page.

  <li> <code>withinGroupSort</code>: how the documents within each group
    are sorted. This can be different from the group sort.

  <li> <code>maxDocsPerGroup</code>: how many top documents within each
    group are kept.

  <li> <code>withinGroupOffset</code>: which "slice" of top
    documents you want to retrieve from each group.

</ul>
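<p>The offset inputs combine with the "top N" inputs by simple addition: a
collector must gather <code>offset + topN</code> entries so the requested slice
can be cut out afterwards. A self-contained sketch of that arithmetic (the
helper names are illustrative only, not part of this module's API):</p>

```java
// Sketch of the paging arithmetic behind the grouping inputs above.
// Helper names are illustrative, not part of the Lucene API.
public class GroupPaging {

  /** Number of groups the first pass must collect to serve this slice. */
  static int firstPassTopN(int groupOffset, int topNGroups) {
    return groupOffset + topNGroups;
  }

  /** Number of documents per group the second pass must collect. */
  static int secondPassTopN(int withinGroupOffset, int maxDocsPerGroup) {
    return withinGroupOffset + maxDocsPerGroup;
  }

  public static void main(String[] args) {
    // Page 3 (0-based) showing 5 groups per page: skip 15 groups, keep 5 more.
    int groupOffset = 3 * 5;
    System.out.println(firstPassTopN(groupOffset, 5));  // 20
    System.out.println(secondPassTopN(2, 10));          // 12
  }
}
```

These are exactly the values passed to the collector constructors in the usage
examples below, e.g. <code>groupOffset+topNGroups</code> for the first pass.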
<p>The implementation is two-pass: the first pass ({@link
org.apache.lucene.search.grouping.TermFirstPassGroupingCollector})
gathers the top groups, and the second pass ({@link
org.apache.lucene.search.grouping.TermSecondPassGroupingCollector})
gathers documents within those groups. If the search is costly to
run you may want to use the {@link
org.apache.lucene.search.CachingCollector} class, which
caches hits and can (quickly) replay them for the second pass. This
way you only run the query once, but you pay a RAM cost to (briefly)
hold all hits. Results are returned as a {@link
org.apache.lucene.search.grouping.TopGroups} instance.</p>
<p>This module abstracts away what defines a group and how it is
collected. All grouping collectors are abstract and currently have
term-based implementations. One can implement collectors that, for
example, group on multiple fields.</p>
<p>Known limitations:</p>

<ul>

  <li> For the two-pass grouping collector, the group field must be a
    single-valued indexed field.
    {@link org.apache.lucene.search.FieldCache} is used to load the
    {@link org.apache.lucene.search.FieldCache.StringIndex} for this field.

  <li> Although Solr supports grouping by function, and this module
    abstracts what a group is, there are currently only
    implementations for grouping based on terms.

  <li> Sharding is not directly supported, though it is not too
    difficult if you can merge the top groups and top documents per
    shard yourself.

</ul>
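<p>A minimal, self-contained sketch of the per-shard merge the last limitation
alludes to: each shard reports its top groups with a sort value (here, the best
score per group), and the merger keeps the global top N. The class and method
names are illustrative, not part of this module:</p>

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MergeShardGroups {

  /** Merge per-shard (groupKey -> bestScore) maps, keeping the top N keys
      by their best score across all shards. Illustrative only. */
  static List<String> mergeTopGroups(List<Map<String, Double>> shards, int topN) {
    Map<String, Double> best = new HashMap<>();
    for (Map<String, Double> shard : shards) {
      for (Map.Entry<String, Double> e : shard.entrySet()) {
        // A group seen on several shards keeps its highest score:
        best.merge(e.getKey(), e.getValue(), Math::max);
      }
    }
    List<String> keys = new ArrayList<>(best.keySet());
    keys.sort((a, b) -> Double.compare(best.get(b), best.get(a)));
    return keys.subList(0, Math.min(topN, keys.size()));
  }

  public static void main(String[] args) {
    List<Map<String, Double>> shards = List.of(
        Map.of("king", 0.9, "tolkien", 0.4),
        Map.of("tolkien", 0.7, "rowling", 0.6));
    System.out.println(mergeTopGroups(shards, 2)); // [king, tolkien]
  }
}
```

A real sharded setup would also merge the top documents within each surviving
group, using the same offset/topN arithmetic as a single index.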
<p>Typical usage for the generic two-pass collector looks like this
(using the {@link org.apache.lucene.search.CachingCollector}):</p>

<pre class="prettyprint">
  TermFirstPassGroupingCollector c1 = new TermFirstPassGroupingCollector("author", groupSort, groupOffset+topNGroups);

  boolean cacheScores = true;
  double maxCacheRAMMB = 4.0;
  CachingCollector cachedCollector = CachingCollector.create(c1, cacheScores, maxCacheRAMMB);
  s.search(new TermQuery(new Term("content", searchTerm)), cachedCollector);

  boolean fillFields = true;
  Collection<SearchGroup<BytesRef>> topGroups = c1.getTopGroups(groupOffset, fillFields);

  if (topGroups == null) {
    // No groups matched the search
    return;
  }

  boolean getScores = true;
  boolean getMaxScores = true;
  TermSecondPassGroupingCollector c2 = new TermSecondPassGroupingCollector("author", topGroups, groupSort, docSort, docOffset+docsPerGroup, getScores, getMaxScores, fillFields);

  // Optionally compute the total group count. MultiCollector.wrap returns a
  // plain Collector, so keep c2 in its own variable for getTopGroups below:
  Collector secondPassCollector = c2;
  TermAllGroupsCollector allGroupsCollector = null;
  if (requiredTotalGroupCount) {
    allGroupsCollector = new TermAllGroupsCollector("author");
    secondPassCollector = MultiCollector.wrap(c2, allGroupsCollector);
  }

  if (cachedCollector.isCached()) {
    // Cache fit within maxCacheRAMMB, so we can replay it:
    cachedCollector.replay(secondPassCollector);
  } else {
    // Cache was too large; must re-execute query:
    s.search(new TermQuery(new Term("content", searchTerm)), secondPassCollector);
  }

  TopGroups<BytesRef> groupsResult = c2.getTopGroups(docOffset);
  if (requiredTotalGroupCount) {
    groupsResult = new TopGroups<BytesRef>(groupsResult, allGroupsCollector.getGroupCount());
  }

  // Render groupsResult...
</pre>
<p>To use the single-pass <code>BlockGroupingCollector</code>,
first, at indexing time, you must ensure all docs in each group
are added as a block, and you have some way to find the last
document of each group. One simple way to do this is to add a
marker binary field:</p>
<pre class="prettyprint">
  // Create Documents from your source:
  List<Document> oneGroup = ...;

  Field groupEndField = new Field("groupEnd", "x", Field.Store.NO, Field.Index.NOT_ANALYZED);
  groupEndField.setOmitTermFreqAndPositions(true);
  groupEndField.setOmitNorms(true);
  oneGroup.get(oneGroup.size()-1).add(groupEndField);

  // You can also use writer.updateDocuments(); just be sure you
  // replace an entire previous doc block with this new one. For
  // example, each group could have a "groupID" field, with the same
  // value for all docs in this group:
  writer.addDocuments(oneGroup);
</pre>
<p>Then, at search time, do this up front:</p>
<pre class="prettyprint">
  // Set this once in your app and save it away for reuse across all queries:
  Filter groupEndDocs = new CachingWrapperFilter(new QueryWrapperFilter(new TermQuery(new Term("groupEnd", "x"))));
</pre>
<p>Finally, do this per search:</p>
<pre class="prettyprint">
  BlockGroupingCollector c = new BlockGroupingCollector(groupSort, groupOffset+topNGroups, needsScores, groupEndDocs);
  s.search(new TermQuery(new Term("content", searchTerm)), c);
  TopGroups groupsResult = c.getTopGroups(withinGroupSort, groupOffset, docOffset, docOffset+docsPerGroup, fillFields);

  // Render groupsResult...
</pre>
<p>Note that the <code>groupValue</code> of each <code>GroupDocs</code>
will be <code>null</code>, so if you need to present this value you'll
have to separately retrieve it (for example using stored
fields, <code>FieldCache</code>, etc.).</p>
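<p>That separate retrieval boils down to a per-document lookup keyed on the top
doc of each group. The sketch below simulates it with a plain array standing in
for a <code>FieldCache</code> or stored-field lookup; the names and data are
illustrative, not the Lucene API:</p>

```java
import java.util.ArrayList;
import java.util.List;

public class ResolveGroupValues {

  /** Look up the group value for each group's top doc. The array stands in
      for a FieldCache / stored-field lookup (illustrative only). */
  static List<String> resolve(String[] valueByDoc, int[] topDocPerGroup) {
    List<String> values = new ArrayList<>();
    for (int doc : topDocPerGroup) {
      values.add(valueByDoc[doc]);
    }
    return values;
  }

  public static void main(String[] args) {
    // Pretend per-doc author values, as FieldCache would expose them:
    String[] authorByDoc = {"king", "king", "rowling", "tolkien", "tolkien"};
    // Pretend the top doc ID of each returned GroupDocs:
    int[] topDocPerGroup = {0, 2, 3};
    System.out.println(resolve(authorByDoc, topDocPerGroup)); // [king, rowling, tolkien]
  }
}
```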
<p>Another collector is the <code>TermAllGroupHeadsCollector</code>, which can be used to retrieve the most
relevant document of each group, also known as the group heads. This can be useful when you want to compute
grouping-based facets or statistics on the complete query result. The collector can be executed during the
first or second phase:</p>
<pre class="prettyprint">
  AbstractAllGroupHeadsCollector c = TermAllGroupHeadsCollector.create(groupField, sortWithinGroup);
  s.search(new TermQuery(new Term("content", searchTerm)), c);
  // Return all group heads as an int array:
  int[] groupHeadsArray = c.retrieveGroupHeads();
  // Return all group heads as a FixedBitSet:
  int maxDoc = s.maxDoc();
  FixedBitSet groupHeadsBitSet = c.retrieveGroupHeads(maxDoc);
</pre>
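<p>The bit set form is convenient when a later pass needs cheap membership
tests, for example counting a facet only over group heads. A self-contained
sketch, with <code>java.util.BitSet</code> standing in for Lucene's
<code>FixedBitSet</code> (the helper is illustrative, not part of this module):</p>

```java
import java.util.BitSet;

public class GroupHeadFacets {

  /** Count how many of the given doc IDs are group heads (illustrative only). */
  static int countHeads(BitSet groupHeads, int[] docs) {
    int count = 0;
    for (int doc : docs) {
      if (groupHeads.get(doc)) {
        count++;
      }
    }
    return count;
  }

  public static void main(String[] args) {
    // Pretend group heads, as retrieveGroupHeads(maxDoc) would return them:
    BitSet groupHeads = new BitSet(8);
    groupHeads.set(0);
    groupHeads.set(3);
    groupHeads.set(6);
    // Of the docs matching some facet value, how many are group heads?
    System.out.println(countHeads(groupHeads, new int[]{0, 1, 3, 5})); // 2
  }
}
```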