An encoder encodes the integers passed to {@link org.apache.lucene.util.encoding.IntEncoder#encode(int) encode} into the output stream that was set on it (see {@link org.apache.lucene.util.encoding.IntEncoder#reInit(OutputStream) reInit}). Always call {@link org.apache.lucene.util.encoding.IntEncoder#close() close} once all integers have been encoded, so that the encoder can finish properly. Some encoders buffer values in memory and encode them in batches to optimize the encoding; not closing such an encoder may result in loss of information or a corrupt stream.
A proper and typical usage of an encoder looks like this:

    int[] data = <the values to encode>
    IntEncoder encoder = new VInt8IntEncoder();
    // Keep the concrete stream type so that toByteArray() is available below.
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    encoder.reInit(out);
    for (int val : data) {
      encoder.encode(val);
    }
    encoder.close();

    // Print the bytes in binary
    byte[] bytes = out.toByteArray();
    for (byte b : bytes) {
      // Mask to 8 bits so that negative bytes are not sign-extended to 32 bits.
      System.out.println(Integer.toBinaryString(b & 0xFF));
    }

Each encoder also implements {@link org.apache.lucene.util.encoding.IntEncoder#createMatchingDecoder() createMatchingDecoder}, which returns the matching decoder for this encoder. As mentioned above, not all encoders have a matching decoder (like some encoder filters, which are explained next), but every encoder should still return a decoder when that method is called. To complete the example above, one can easily iterate over the decoded values like this:

    IntDecoder d = encoder.createMatchingDecoder();
    d.reInit(new ByteArrayInputStream(bytes));
    long val;
    while ((val = d.decode()) != IntDecoder.EOS) {
      System.out.println(val);
    }
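To make the encode/decode round trip concrete without depending on the library, here is a minimal stand-alone sketch of a generic variable-length integer codec. It is not the implementation of VInt8IntEncoder, whose byte layout may differ; the class name VarIntSketch, its methods and its EOS value are hypothetical and chosen for illustration only. Each byte carries 7 bits of the value, and the high bit signals that more bytes follow. As in the loop above, decode returns a long so that the end-of-stream marker can lie outside the int range.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.stream.IntStream;

/** Stand-alone sketch of a variable-length int codec (hypothetical, not Lucene code). */
final class VarIntSketch {

  /** Hypothetical end-of-stream marker; it lies outside the int range on purpose. */
  static final long EOS = 1L << 32;

  /** Writes the value 7 bits at a time, low-order bits first. */
  static void encode(int value, ByteArrayOutputStream out) {
    while ((value & ~0x7F) != 0) {
      out.write((value & 0x7F) | 0x80); // high bit set: more bytes follow
      value >>>= 7;
    }
    out.write(value); // last byte, high bit clear
  }

  /** Reads one value, or returns EOS when the stream is exhausted. */
  static long decode(ByteArrayInputStream in) {
    int b = in.read();
    if (b < 0) {
      return EOS;
    }
    int value = b & 0x7F;
    int shift = 7;
    while ((b & 0x80) != 0) {
      b = in.read();
      if (b < 0) {
        throw new IllegalStateException("stream ended in the middle of a value");
      }
      value |= (b & 0x7F) << shift;
      shift += 7;
    }
    return value;
  }

  /** Encodes all values into a fresh byte array. */
  static byte[] encodeAll(int[] values) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (int v : values) {
      encode(v, out);
    }
    return out.toByteArray();
  }

  /** Decodes values until EOS is returned. */
  static int[] decodeAll(byte[] bytes) {
    ByteArrayInputStream in = new ByteArrayInputStream(bytes);
    IntStream.Builder result = IntStream.builder();
    long val;
    while ((val = decode(in)) != EOS) {
      result.add((int) val);
    }
    return result.build().toArray();
  }
}
```

In this sketch, values below 128 take one byte and larger values take more, which is why small inputs are cheaper to store.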
Some encoders don't perform any encoding at all, or do not include any encoding logic of their own. Those are called {@link org.apache.lucene.util.encoding.IntEncoderFilter}s. A filter is an encoder which delegates the encoding task to a given encoder, but performs additional logic before the values are sent for encoding. An example is {@link org.apache.lucene.util.encoding.DGapIntEncoder}, which encodes the gaps between values rather than the values themselves. Another example is {@link org.apache.lucene.util.encoding.SortingIntEncoder}, which sorts all the values in ascending order before they are sent for encoding. This encoder aggregates the values in its {@link org.apache.lucene.util.encoding.IntEncoder#encode(int) encode} implementation, and the actual encoding only happens upon calling {@link org.apache.lucene.util.encoding.IntEncoder#close() close}.
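The effect of chaining a sorting step and a gap step can be sketched with plain arrays. The class below is a hypothetical illustration (GapFilterSketch and its method names are not part of the library) and works on whole arrays rather than on a stream. It also shows why a sorting filter has to hold on to all values until close: the smallest value may arrive last.

```java
import java.util.Arrays;

/** Stand-alone sketch of what sorting and d-gap filters do to the values (hypothetical). */
final class GapFilterSketch {

  /** Returns a sorted copy; a sorting filter must see every value before it can do this. */
  static int[] sortAscending(int[] values) {
    int[] sorted = Arrays.copyOf(values, values.length);
    Arrays.sort(sorted);
    return sorted;
  }

  /** Replaces each value by its distance from the previous one; the first value is kept. */
  static int[] toGaps(int[] sorted) {
    int[] gaps = new int[sorted.length];
    int previous = 0;
    for (int i = 0; i < sorted.length; i++) {
      gaps[i] = sorted[i] - previous;
      previous = sorted[i];
    }
    return gaps;
  }

  /** Inverse of toGaps: a running sum restores the original sorted values. */
  static int[] fromGaps(int[] gaps) {
    int[] values = new int[gaps.length];
    int sum = 0;
    for (int i = 0; i < gaps.length; i++) {
      sum += gaps[i];
      values[i] = sum;
    }
    return values;
  }
}
```

For example, the input {300, 5, 128, 7} sorts to {5, 7, 128, 300} and becomes the gaps {5, 2, 121, 172}. Smaller numbers generally need fewer bytes under a variable-length encoding, which is the motivation for placing such filters in front of the actual encoder.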
The following sample filter tags the stream with the ID of the encoder it wraps:

    public class TaggingIntEncoder extends IntEncoderFilter {

      public TaggingIntEncoder(IntEncoder encoder) {
        super(encoder);
      }

      @Override
      public void encode(int value) throws IOException {
        encoder.encode(value);
      }

      @Override
      public IntDecoder createMatchingDecoder() {
        return new TaggingIntDecoder();
      }

      @Override
      public void reInit(OutputStream out) {
        super.reInit(out);
        // Assumes the application has a static EncodersMap class which is able to
        // return a unique ID for a given encoder.
        int encoderID = EncodersMap.getID(encoder);
        this.out.write(encoderID);
      }

      @Override
      public String toString() {
        return "Tagging (" + encoder.toString() + ")";
      }
    }

And the matching decoder:

    public class TaggingIntDecoder extends IntDecoder {

      // Will be initialized upon calling reInit.
      private IntDecoder decoder;

      @Override
      public void reInit(InputStream in) {
        super.reInit(in);
        // Read the ID of the encoder that tagged this stream.
        int encoderID = in.read();
        // Assumes EncodersMap can return the proper IntEncoder given the ID.
        decoder = EncodersMap.getEncoder(encoderID).createMatchingDecoder();
      }

      @Override
      public long decode() throws IOException {
        return decoder.decode();
      }

      @Override
      public String toString() {
        // Parenthesize the conditional; otherwise '+' binds first and the null check is lost.
        return "Tagging (" + (decoder == null ? "none" : decoder.toString()) + ")";
      }
    }

The example implements TaggingIntEncoder as a filter over another encoder. Even though it does not do any filtering on the actual values, it feels right to present it as a filter. This is just sample code, and one can implement it in whatever way makes sense to the application. For simplicity, error checking was omitted from the sample code.
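The tagging idea itself, one leading byte that tells the reader how to interpret the rest of the stream, can be sketched without the library. Everything in the block below is hypothetical: TaggedStreamSketch, the two codec IDs and the widthOf lookup merely stand in for the EncodersMap that the sample assumes, and the two "codecs" are deliberately trivial fixed-width layouts.

```java
import java.nio.ByteBuffer;

/** Stand-alone sketch of a self-describing stream: one tag byte, then the payload. */
final class TaggedStreamSketch {

  // Hypothetical codec IDs; a real application would keep these in its own registry.
  static final int RAW_INT = 1;  // four bytes per value, big-endian
  static final int ONE_BYTE = 2; // one byte per value, for values 0..255 only

  /** Plays the role of the registry lookup: maps an ID to how values are laid out. */
  static int widthOf(int codecId) {
    switch (codecId) {
      case RAW_INT:
        return 4;
      case ONE_BYTE:
        return 1;
      default:
        throw new IllegalArgumentException("unknown codec ID: " + codecId);
    }
  }

  /** Writes the tag first, then every value in the layout the tag announces. */
  static byte[] write(int codecId, int[] values) {
    int width = widthOf(codecId);
    ByteBuffer buf = ByteBuffer.allocate(1 + width * values.length);
    buf.put((byte) codecId);
    for (int v : values) {
      if (width == 4) {
        buf.putInt(v);
      } else {
        buf.put((byte) v);
      }
    }
    return buf.array();
  }

  /** Reads the tag, then decodes the payload accordingly. */
  static int[] read(byte[] data) {
    ByteBuffer buf = ByteBuffer.wrap(data);
    int codecId = buf.get() & 0xFF;
    int width = widthOf(codecId);
    int[] values = new int[buf.remaining() / width];
    for (int i = 0; i < values.length; i++) {
      values[i] = (width == 4) ? buf.getInt() : (buf.get() & 0xFF);
    }
    return values;
  }
}
```

The reader needs no outside knowledge of how the stream was produced, since read(write(ONE_BYTE, values)) and read(write(RAW_INT, values)) both recover the values from the bytes alone. The cost is one extra byte per stream and a registry that writer and reader must agree on.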