Encoding

X-Git-Url: https://git.mdrn.pl/pylucene.git/blobdiff_plain/a2e61f0c04805cfcb8706176758d1283c7e3a55c..aaeed5504b982cf3545252ab528713250aa33eed:/lucene-java-3.4.0/lucene/contrib/facet/src/java/org/apache/lucene/util/encoding/package.html?ds=sidebyside

diff --git a/lucene-java-3.4.0/lucene/contrib/facet/src/java/org/apache/lucene/util/encoding/package.html b/lucene-java-3.4.0/lucene/contrib/facet/src/java/org/apache/lucene/util/encoding/package.html
deleted file mode 100644
index 941aefa..0000000
--- a/lucene-java-3.4.0/lucene/contrib/facet/src/java/org/apache/lucene/util/encoding/package.html
+++ /dev/null
@@ -1,150 +0,0 @@
-
-
-Encoding
-
-
-Offers various encoders and decoders for integers, as well as the
-mechanisms to create new ones. The super class for all encoders is
-{@link org.apache.lucene.util.encoding.IntEncoder} and for most of the
-encoders there is a matching {@link
-org.apache.lucene.util.encoding.IntDecoder} implementation (not all
-encoders need a decoder).
-

-An encoder encodes the integers that are passed to {@link
-org.apache.lucene.util.encoding.IntEncoder#encode(int) encode} into the
-output stream set by {@link
-org.apache.lucene.util.encoding.IntEncoder#reInit(OutputStream)
-reInit}. One should always call {@link
-org.apache.lucene.util.encoding.IntEncoder#close() close} when all
-integers have been encoded, so that the encoder can finish properly. Some
-encoders buffer values in memory and encode them in batches to
-optimize the encoding, and not closing them may result in lost
-values or a corrupt stream.
-
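
To make the point about closing concrete, here is a minimal standalone sketch of a batching encoder. It does not use the Lucene classes: `BatchingSketch`, its batch size of 4 and its one-byte-per-value output are all assumptions made for illustration, not how any real `IntEncoder` works.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical stand-in for an encoder that buffers values and writes them in batches.
final class BatchingSketch {
  private final ByteArrayOutputStream out = new ByteArrayOutputStream();
  private final int[] batch = new int[4]; // batch size chosen arbitrarily for the sketch
  private int count = 0;

  // Buffers the value; output is only produced once a full batch has accumulated.
  void encode(int value) {
    batch[count++] = value;
    if (count == batch.length) {
      flush();
    }
  }

  // Writes the buffered values, one byte each (values are assumed to fit in 0..255).
  private void flush() {
    for (int i = 0; i < count; i++) {
      out.write(batch[i]);
    }
    count = 0;
  }

  // Pushes out whatever is still sitting in the buffer.
  void close() {
    flush();
  }

  int bytesWritten() {
    return out.size();
  }

  public static void main(String[] args) {
    BatchingSketch enc = new BatchingSketch();
    for (int v = 1; v <= 6; v++) {
      enc.encode(v);
    }
    System.out.println("before close: " + enc.bytesWritten()); // prints "before close: 4"
    enc.close();
    System.out.println("after close: " + enc.bytesWritten());  // prints "after close: 6"
  }
}
```

Until `close()` is called, the last two values exist only in the buffer, which is the kind of loss the paragraph above warns about.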

-Typical usage of an encoder looks like this:
-


-int[] data = <the values to encode>
-IntEncoder encoder = new VInt8IntEncoder();
-ByteArrayOutputStream out = new ByteArrayOutputStream();
-encoder.reInit(out);
-for (int val : data) {
-  encoder.encode(val);
-}
-encoder.close();
-
-// Print the bytes in binary
-byte[] bytes = out.toByteArray();
-for (byte b : bytes) {
-  // Mask with 0xFF so that negative bytes print as 8 bits rather than 32.
-  System.out.println(Integer.toBinaryString(b & 0xFF));
-}
-

-Each encoder also implements {@link
-org.apache.lucene.util.encoding.IntEncoder#createMatchingDecoder()
-createMatchingDecoder}, which returns the matching decoder for this encoder.
-As mentioned above, not all encoders have a matching decoder (like some
-encoder filters, which are explained next); however, every encoder should
-return a decoder following a call to that method. To complete the
-example above, one can iterate over the decoded values like this:
-


-IntDecoder d = encoder.createMatchingDecoder();
-d.reInit(new ByteArrayInputStream(bytes));
-long val;
-while ((val = d.decode()) != IntDecoder.EOS) {
-  System.out.println(val);
-}
-
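
For readers without the Lucene jars at hand, the encode/decode round trip can be sketched in plain Java. Everything here is an assumption made for illustration: the class `VarIntRoundTrip`, the byte layout (7-bit groups, most significant first, high bit as a continuation flag) and the value of `EOS` are not claimed to match `VInt8IntEncoder` or `IntDecoder.EOS`. The sketch does show why a `decode` method that returns `long` can use an end-of-stream marker that no `int` value collides with.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical variable-length integer codec, independent of the Lucene classes.
final class VarIntRoundTrip {

  // Assumed end-of-stream marker: it lies outside the int range,
  // so it cannot be confused with a decoded value.
  static final long EOS = 0x100000000L;

  // Writes a non-negative int as 7-bit groups, most significant group first.
  // Every byte except the last has its high bit set.
  static void encode(int value, OutputStream out) throws IOException {
    if (value < 0) {
      throw new IllegalArgumentException("this sketch handles non-negative values only");
    }
    int shift = 28;
    while (shift > 0 && (value >>> shift) == 0) {
      shift -= 7; // skip leading groups that are all zero
    }
    for (; shift > 0; shift -= 7) {
      out.write(0x80 | ((value >>> shift) & 0x7F));
    }
    out.write(value & 0x7F);
  }

  // Reads one value, or returns EOS when the stream is exhausted.
  static long decode(InputStream in) throws IOException {
    int b = in.read();
    if (b < 0) {
      return EOS;
    }
    int value = 0;
    while ((b & 0x80) != 0) {
      value = (value << 7) | (b & 0x7F);
      b = in.read();
      if (b < 0) {
        throw new IOException("stream ended in the middle of a value");
      }
    }
    return (value << 7) | b;
  }

  static byte[] encodeAll(int... values) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try {
      for (int v : values) {
        encode(v, out);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return out.toByteArray();
  }

  static List<Long> decodeAll(byte[] bytes) {
    List<Long> result = new ArrayList<>();
    InputStream in = new ByteArrayInputStream(bytes);
    try {
      long val;
      while ((val = decode(in)) != EOS) {
        result.add(val);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return result;
  }

  public static void main(String[] args) {
    byte[] bytes = encodeAll(5, 300, 70000);
    System.out.println(bytes.length + " bytes"); // prints "6 bytes"
    System.out.println(decodeAll(bytes));        // prints "[5, 300, 70000]"
  }
}
```

Small values take one byte and larger ones take more, which is the usual motivation for this family of encodings.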

-Some encoders don't perform any encoding at all, or do not include any
-encoding logic of their own. Those are called {@link
-org.apache.lucene.util.encoding.IntEncoderFilter}s. A filter is an
-encoder which delegates the encoding task to a given encoder, but
-performs additional logic before the values are sent for encoding. An
-example is {@link org.apache.lucene.util.encoding.DGapIntEncoder},
-which encodes the gaps between values rather than the values themselves.
-Another example is {@link
-org.apache.lucene.util.encoding.SortingIntEncoder}, which sorts all the
-values in ascending order before they are sent for encoding. This
-encoder aggregates the values in its {@link
-org.apache.lucene.util.encoding.IntEncoder#encode(int) encode} implementation,
-and the actual encoding only happens upon calling {@link
-org.apache.lucene.util.encoding.IntEncoder#close() close}.
-
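
The combined effect of sorting and gap encoding can be sketched on plain arrays. This is not the Lucene implementation: `DGapSketch` and its `toGaps`/`fromGaps` helpers are hypothetical names, and the sketch works on a whole array at once, which mirrors why a sorting filter has to collect all values before anything can be written.

```java
import java.util.Arrays;

// Hypothetical illustration of "sort, then store gaps"; independent of the Lucene classes.
final class DGapSketch {

  // Sorts a copy of the input and replaces each value with its distance
  // from the previous one (the first value is kept as a gap from zero).
  static int[] toGaps(int[] values) {
    int[] sorted = values.clone();
    Arrays.sort(sorted);
    int[] gaps = new int[sorted.length];
    int prev = 0;
    for (int i = 0; i < sorted.length; i++) {
      gaps[i] = sorted[i] - prev;
      prev = sorted[i];
    }
    return gaps;
  }

  // Rebuilds the sorted values by accumulating the gaps.
  static int[] fromGaps(int[] gaps) {
    int[] values = new int[gaps.length];
    int prev = 0;
    for (int i = 0; i < gaps.length; i++) {
      prev += gaps[i];
      values[i] = prev;
    }
    return values;
  }

  public static void main(String[] args) {
    int[] gaps = toGaps(new int[] {1000, 1003, 1001, 1010});
    System.out.println(Arrays.toString(gaps));           // prints "[1000, 1, 2, 7]"
    System.out.println(Arrays.toString(fromGaps(gaps))); // prints "[1000, 1001, 1003, 1010]"
  }
}
```

After the first entry the gaps are small numbers, so a variable-length encoder applied afterwards would presumably need fewer bytes than it would for the raw values.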

Extending IntEncoder

-Extending {@link org.apache.lucene.util.encoding.IntEncoder} is
-straightforward. One only needs to implement {@link
-org.apache.lucene.util.encoding.IntEncoder#encode(int) encode} and
-{@link org.apache.lucene.util.encoding.IntEncoder#createMatchingDecoder()
-createMatchingDecoder}, as the base implementation takes care of
-re-initializing the output stream and closing it. The following example
-illustrates how one can write an encoder (and a matching decoder) which
-'tags' the stream with the type/ID of the encoder. Such tagging is important
-in scenarios where an application uses different encoders for different
-streams and wants to maintain a mapping from an encoder ID
-to an IntEncoder/IntDecoder implementation, so that the proper decoder can be
-initialized on the fly:
-


-public class TaggingIntEncoder extends IntEncoderFilter {
-  
-  public TaggingIntEncoder(IntEncoder encoder) {
-    super(encoder);
-  }
-  
-  @Override
-  public void encode(int value) throws IOException {
-    encoder.encode(value);
-  }
-
-  @Override
-  public IntDecoder createMatchingDecoder() {
-    return new TaggingIntDecoder();
-  }
-	
-  @Override
-  public void reInit(OutputStream out) {
-    super.reInit(out);
-    // Assumes the application has a static EncodersMap class which is able to 
-    // return a unique ID for a given encoder.
-    int encoderID = EncodersMap.getID(encoder);
-    this.out.write(encoderID);
-  }
-
-  @Override
-  public String toString() {
-    return "Tagging (" + encoder.toString() + ")";
-  }
-
-}
-

-And the matching decoder: -


-public class TaggingIntDecoder extends IntDecoder {
-  
-  // Will be initialized upon calling reInit.
-  private IntDecoder decoder;
-  
-  @Override
-  public void reInit(InputStream in) {
-    super.reInit(in);
-    
-    // Read the ID of the encoder that tagged this stream.
-    int encoderID = in.read();
-    
-    // Assumes EncodersMap can return the proper IntEncoder given the ID.
-    decoder = EncodersMap.getEncoder(encoderID).createMatchingDecoder();
-  }
-	
-  @Override
-  public long decode() throws IOException {
-    return decoder.decode();
-  }
-
-  @Override
-  public String toString() {
-    // Parenthesize the conditional: '+' binds tighter than '==' and '?:'.
-    return "Tagging (" + (decoder == null ? "none" : decoder.toString()) + ")";
-  }
-
-}
-

-The example implements TaggingIntEncoder as a filter over another
-encoder. Even though it does not filter the actual values, it is
-natural to present it as a filter. This is only example code, and one
-can implement it however makes sense for the application. For
-simplicity, error checking was omitted from the sample code.
-
-
\ No newline at end of file