pylucene 3.5.0-3
diff --git a/lucene-java-3.4.0/lucene/src/site/src/documentation/content/xdocs/fileformats.xml b/lucene-java-3.4.0/lucene/src/site/src/documentation/content/xdocs/fileformats.xml
deleted file mode 100644 (file)
index 02a66d2..0000000
+++ /dev/null
@@ -1,1936 +0,0 @@
-<?xml version="1.0"?>
-
-<document>
-    <header>
-        <title>
-            Apache Lucene - Index File Formats
-        </title>
-    </header>
-
-    <body>
-        <section id="Index File Formats"><title>Index File Formats</title>
-
-            <p>
-                This document defines the index file formats used
-                in this version of Lucene. If you are using a different
-                version of Lucene, please consult the copy of
-                <code>docs/fileformats.html</code>
-                that was distributed
-                with the version you are using.
-            </p>
-
-            <p>
-                Apache Lucene is written in Java, but several
-                efforts are underway to write
-                <a href="http://wiki.apache.org/lucene-java/LuceneImplementations">versions
-                    of Lucene in other programming
-                languages</a>.  If these versions are to remain compatible with Apache
-                Lucene, then a language-independent definition of the Lucene index
-                format is required.  This document thus attempts to provide a
-                complete and independent definition of the Apache Lucene file
-                formats.
-            </p>
-
-            <p>
-                As Lucene evolves, this document should evolve.
-                Versions of Lucene in different programming languages should endeavor
-                to agree on file formats, and generate new versions of this document.
-            </p>
-
-            <p>
-                Compatibility notes are provided in this document,
-                describing how file formats have changed from prior versions.
-            </p>
-
-            <p>
-                In version 2.1, the file format was changed to allow
-                lock-less commits (i.e., no more commit lock). The
-                change is fully backwards compatible: you can open a
-                pre-2.1 index for searching or adding/deleting of
-                docs. When the new segments file is saved
-                (committed), it will be written in the new file format
-                (meaning no specific "upgrade" process is needed).
-                But note that once a commit has occurred, pre-2.1
-                Lucene will not be able to read the index.
-            </p>
-
-            <p>
-                In version 2.3, the file format was changed to allow
-               segments to share a single set of doc store (vectors &amp;
-               stored fields) files.  This allows for faster indexing
-               in certain cases.  The change is fully backwards
-               compatible (in the same way as the lock-less commits
-               change in 2.1).
-            </p>
-
-            <p>
-               In version 2.4, Strings are now written as true UTF-8
-               byte sequences, not Java's modified UTF-8.  See issue
-               LUCENE-510 for details.
-            </p>
-
-           <p>
-               In version 2.9, an optional opaque Map&lt;String,String&gt;
-               CommitUserData may be passed to IndexWriter's commit
-               methods (and later retrieved), which is recorded in
-               the segments_N file.  See issue LUCENE-1382 for
-               details.  Also, diagnostics were added to each segment
-               written recording details about why it was written
-               (due to flush, merge; which OS/JRE was used; etc.).
-               See issue LUCENE-1654 for details.
-            </p>
-           
-           <p>
-               In version 3.0, compressed fields are no longer
-               written to the index (they can still be read, but on
-               merge the new segment will write them,
-               uncompressed). See issue LUCENE-1960 for details.
-            </p>
-
-        <p>
-            In version 3.1, segments record the code version
-            that created them. See LUCENE-2720 for details.
-            
-            Additionally, segments explicitly track whether or
-            not they have term vectors. See LUCENE-2811 for details.
-           </p>
-        <p>
-            In version 3.2, numeric fields are written natively
-            to the stored fields file; previously they were stored in
-            text format only.
-           </p>
-        <p>
-            In version 3.4, fields can omit position data while
-            still indexing term frequencies.
-        </p>
-        </section>
-
-        <section id="Definitions"><title>Definitions</title>
-
-            <p>
-                The fundamental concepts in Lucene are index,
-                document, field and term.
-            </p>
-
-
-            <p>
-                An index contains a sequence of documents.
-            </p>
-
-            <ul>
-                <li>
-                    <p>
-                        A document is a sequence of fields.
-                    </p>
-                </li>
-
-                <li>
-                    <p>
-                        A field is a named sequence of terms.
-                    </p>
-                </li>
-
-                <li>
-                    A term is a string.
-                </li>
-            </ul>
-
-            <p>
-                The same string in two different fields is
-                considered a different term.  Thus terms are represented as a pair of
-                strings, the first naming the field, and the second naming text
-                within the field.
-            </p>
-
-            <section id="Inverted Indexing"><title>Inverted Indexing</title>
-
-                <p>
-                    The index stores statistics about terms in order
-                    to make term-based search more efficient.  Lucene's
-                    index falls into the family of indexes known as <i>inverted
-                        indexes</i>. This is because it can list, for a term, the documents that contain
-                    it.  This is the inverse of the natural relationship, in which
-                    documents list terms.
-                </p>
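The inversion described above can be illustrated with a minimal sketch. It is not Lucene's on-disk structure; the function name `build_inverted_index` and the sample documents are hypothetical. Terms are modelled as (field name, text) pairs, as in the definitions above, and each term maps to the ascending list of document numbers that contain it.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each (field, text) term to the document numbers that contain it.

    `documents` is a list of dicts from field name to a list of tokens;
    a document's number is its position in the list (an assumption made
    for this illustration only).
    """
    postings = defaultdict(list)
    for doc_number, fields in enumerate(documents):
        for field_name, tokens in fields.items():
            # Deduplicate tokens so each document is listed once per term.
            for text in sorted(set(tokens)):
                postings[(field_name, text)].append(doc_number)
    return dict(postings)


documents = [
    {"title": ["lucene"], "body": ["index", "file", "formats"]},
    {"title": ["index"], "body": ["lucene", "index"]},
]
index = build_inverted_index(documents)
```

Here `index[("body", "index")]` lists both documents, while `("title", "lucene")` and `("body", "lucene")` are distinct terms with separate document lists.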
-            </section>
-            <section id="Types of Fields">
-                <title>Types of Fields</title>
-                <p>
-                    In Lucene, fields may be <i>stored</i>, in which
-                    case their text is stored in the index literally, in a non-inverted
-                    manner.  Fields that are inverted are called <i>indexed</i>. A field
-                    may be both stored and indexed.</p>
-
-                <p>The text of a field may be <i>tokenized</i> into terms to be
-                    indexed, or the text of a field may be used literally as a term to be indexed.
-                    Most fields are
-                    tokenized, but sometimes it is useful for certain identifier fields
-                    to be indexed literally.
-                </p>
-                <p>See the <a href="api/core/org/apache/lucene/document/Field.html">Field</a> Javadocs for more information on Fields.</p>
-            </section>
-
-            <section id="Segments"><title>Segments</title>
-
-                <p>
-                    Lucene indexes may be composed of multiple sub-indexes, or
-                    <i>segments</i>. Each segment is a fully independent index, which could be searched
-                    separately. Indexes evolve by:
-                </p>
-
-                <ol>
-                    <li>
-                        <p>Creating new segments for newly added documents.</p>
-                    </li>
-                    <li>
-                        <p>Merging existing segments.</p>
-                    </li>
-                </ol>
-
-                <p>
-                    Searches may involve multiple segments and/or multiple indexes, each
-                    index potentially composed of a set of segments.
-                </p>
-            </section>
-
-            <section id="Document Numbers"><title>Document Numbers</title>
-
-                <p>
-                    Internally, Lucene refers to documents by an integer <i>document
-                        number</i>. The first document added to an index is numbered zero, and each
-                    subsequent document added gets a number one greater than the previous.
-                </p>
-
-                <p>
-                    <br/>
-                </p>
-
-                <p>
-                    Note that a document's number may change, so caution should be taken
-                    when storing these numbers outside of Lucene. In particular, numbers may
-                    change in the following situations:
-                </p>
-
-
-                <ul>
-                    <li>
-                        <p>
-                            The
-                            numbers stored in each segment are unique only within the segment,
-                            and must be converted before they can be used in a larger context.
-                            The standard technique is to allocate each segment a range of
-                            values, based on the range of numbers used in that segment.  To
-                            convert a document number from a segment to an external value, the
-                            segment's <i>base</i> document
-                            number is added.  To convert an external value back to a
-                            segment-specific value, the  segment is identified by the range that
-                            the external value is in, and the segment's base value is
-                            subtracted.  For example, two five-document segments might be
-                            combined, so that the first segment has a base value of zero, and
-                            the second of five.  Document three from the second segment would
-                            have an external value of eight.
-                        </p>
-                    </li>
-                    <li>
-                        <p>
-                            When documents are deleted, gaps are created
-                            in the numbering. These are eventually removed as the index evolves
-                            through merging. Deleted documents are dropped when segments are
-                            merged. A freshly-merged segment thus has no gaps in its numbering.
-                        </p>
-                    </li>
-                </ul>
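The base-value conversion in the first item above can be sketched as follows. This is an illustration under stated assumptions, not Lucene's implementation: the helper names `segment_bases`, `to_external` and `to_segment` are hypothetical, and segments are assumed to be given simply as a list of document counts.

```python
from bisect import bisect_right
from itertools import accumulate


def segment_bases(segment_sizes):
    """Base document number of each segment: the total size of all earlier segments."""
    return [0] + list(accumulate(segment_sizes))[:-1]


def to_external(bases, segment, local_doc):
    """Segment-specific number to external number: add the segment's base."""
    return bases[segment] + local_doc


def to_segment(bases, external_doc):
    """External number to (segment, local number): find the range, subtract the base."""
    segment = bisect_right(bases, external_doc) - 1
    return segment, external_doc - bases[segment]


# Two five-document segments, as in the example above.
bases = segment_bases([5, 5])
external = to_external(bases, 1, 3)
```

With these inputs the bases are 0 and 5, and document three of the second segment gets the external value eight, matching the example in the text.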
-
-            </section>
-
-        </section>
-
-        <section id="Overview"><title>Overview</title>
-
-            <p>
-                Each segment index maintains the following:
-            </p>
-            <ul>
-                <li>
-                    <p>Field names. This
-                        contains the set of field names used in the index.
-
-                    </p>
-                </li>
-                <li>
-                    <p>Stored Field
-                        values. This contains, for each document, a list of attribute-value
-                        pairs, where the attributes are field names. These are used to
-                        store auxiliary information about the document, such as its title,
-                        url, or an identifier to access a
-                        database. The set of stored fields is what is returned for each hit
-                        when searching. This is keyed by document number.
-                    </p>
-                </li>
-                <li>
-                    <p>Term dictionary.
-                        A dictionary containing all of the terms used in all of the indexed
-                        fields of all of the documents. The dictionary also contains the
-                        number of documents which contain the term, and pointers to the
-                        term's frequency and proximity data.
-                    </p>
-                </li>
-
-                <li>
-                    <p>Term Frequency
-                        data. For each term in the dictionary, the numbers of all the
-                        documents that contain that term, and the frequency of the term in
-                        that document, unless frequencies are omitted (IndexOptions.DOCS_ONLY).
-                    </p>
-                </li>
-
-                <li>
-                    <p>Term Proximity
-                        data. For each term in the dictionary, the positions at which the term
-                        occurs in each document.  Note that this will
-                        not exist if all fields in all documents omit position data.
-                    </p>
-                </li>
-
-                <li>
-                    <p>Normalization
-                        factors. For each field in each document, a value is stored that is
-                        multiplied into the score for hits on that field.
-                    </p>
-                </li>
-                <li>
-                    <p>Term Vectors. For each field in each document, the term vector
-                        (sometimes called document vector) may be stored. A term vector consists
-                        of term text and term frequency. To add Term Vectors to your index, see the
-                        <a href="api/core/org/apache/lucene/document/Field.html">Field</a>
-                        constructors.
-                    </p>
-                </li>
-                <li>
-                    <p>Deleted documents.
-                        An optional file indicating which documents are deleted.
-                    </p>
-                </li>
-            </ul>
-
-            <p>Details on each of these are provided in subsequent sections.
-            </p>
-        </section>
-
-        <section id="File Naming"><title>File Naming</title>
-
-            <p>
-                All files belonging to a segment have the same name with varying
-                extensions. The extensions correspond to the different file formats
-                described below. When using the Compound File format (default in 1.4 and greater) these files are
-                collapsed into a single .cfs file (see below for details).
-            </p>
-
-            <p>
-                Typically, all segments
-                in an index are stored in a single directory, although this is not
-                required.
-            </p>
-
-            <p>
-                As of version 2.1 (lock-less commits), file names are
-                never re-used (there is one exception, "segments.gen",
-                see below). That is, when any file is saved to the
-                Directory it is given a never-before-used file name.
-                This is achieved using a simple generations approach.
-                For example, the first segments file is segments_1,
-                then segments_2, etc. The generation is a sequential
-                long integer represented in alpha-numeric (base 36)
-                form.
-            </p>
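As a rough sketch of the generation naming described above: the helpers `to_base36` and `segments_file_name` are hypothetical, and the use of lowercase letters for digits above nine is an assumption (the text only says "alpha-numeric (base 36)").

```python
import string

# Assumed digit alphabet: 0-9 followed by lowercase a-z, 36 symbols in total.
DIGITS = string.digits + string.ascii_lowercase


def to_base36(number):
    """Render a non-negative integer in base 36 using the assumed alphabet."""
    if number == 0:
        return "0"
    digits = []
    while number > 0:
        number, remainder = divmod(number, 36)
        digits.append(DIGITS[remainder])
    return "".join(reversed(digits))


def segments_file_name(generation):
    """Name of the segments file for a given generation."""
    return "segments_" + to_base36(generation)


first = segments_file_name(1)
rollover = segments_file_name(36)
```

Under these assumptions, generation 35 is the last single-digit name and generation 36 rolls over to two digits; Python's built-in `int(text, 36)` parses the suffix back into the generation number.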
-
-        </section>
-      <section id="file-names"><title>Summary of File Extensions</title>
-        <p>The following table summarizes the names and extensions of the files in Lucene:
-          <table>
-            <tr>
-              <th>Name</th>
-              <th>Extension</th>
-              <th>Brief Description</th>
-            </tr>
-            <tr>
-              <td><a href="#Segments File">Segments File</a></td>
-              <td>segments.gen, segments_N</td>
-              <td>Stores information about segments</td>
-            </tr>
-            <tr>
-              <td><a href="#Lock File">Lock File</a></td>
-              <td>write.lock</td>
-              <td>The Write lock prevents multiple IndexWriters from writing to the same file.</td>
-            </tr>
-            <tr>
-              <td><a href="#Compound Files">Compound File</a></td>
-              <td>.cfs</td>
-              <td>An optional "virtual" file consisting of all the other index files for systems
-              that frequently run out of file handles.</td>
-            </tr>
-              <tr>
-              <td><a href="#Compound File">Compound File Entry table</a></td>
-              <td>.cfe</td>
-              <td>The "virtual" compound file's entry table holding all entries in the corresponding .cfs file (Since 3.4)</td>
-            </tr>
-            <tr>
-              <td><a href="#Fields">Fields</a></td>
-              <td>.fnm</td>
-              <td>Stores information about the fields</td>
-            </tr>
-            <tr>
-              <td><a href="#field_index">Field Index</a></td>
-              <td>.fdx</td>
-              <td>Contains pointers to field data</td>
-            </tr>
-            <tr>
-              <td><a href="#field_data">Field Data</a></td>
-              <td>.fdt</td>
-              <td>The stored fields for documents</td>
-            </tr>
-            <tr>
-              <td><a href="#tis">Term Infos</a></td>
-              <td>.tis</td>
-              <td>Part of the term dictionary, stores term info</td>
-            </tr>
-            <tr>
-              <td><a href="#tii">Term Info Index</a></td>
-              <td>.tii</td>
-              <td>The index into the Term Infos file</td>
-            </tr>
-            <tr>
-              <td><a href="#Frequencies">Frequencies</a></td>
-              <td>.frq</td>
-              <td>Contains the list of docs which contain each term along with frequency</td>
-            </tr>
-            <tr>
-              <td><a href="#Positions">Positions</a></td>
-              <td>.prx</td>
-              <td>Stores position information about where a term occurs in the index</td>
-            </tr>
-            <tr>
-              <td><a href="#Normalization Factors">Norms</a></td>
-              <td>.nrm</td>
-              <td>Encodes length and boost factors for docs and fields</td>
-            </tr>
-            <tr>
-              <td><a href="#tvx">Term Vector Index</a></td>
-              <td>.tvx</td>
-              <td>Stores offset into the document data file</td>
-            </tr>
-            <tr>
-              <td><a href="#tvd">Term Vector Documents</a></td>
-              <td>.tvd</td>
-              <td>Contains information about each document that has term vectors</td>
-            </tr>
-            <tr>
-              <td><a href="#tvf">Term Vector Fields</a></td>
-              <td>.tvf</td>
-              <td>The field level info about term vectors</td>
-            </tr>
-            <tr>
-              <td><a href="#Deleted Documents">Deleted Documents</a></td>
-              <td>.del</td>
-              <td>Info about what documents are deleted</td>
-            </tr>
-          </table>
-
-        </p>
-      </section>
-
-        <section id="Primitive Types"><title>Primitive Types</title>
-
-            <section id="Byte"><title>Byte</title>
-
-                <p>
-                    The most primitive type
-                    is an eight-bit byte. Files are accessed as sequences of bytes. All
-                    other data types are defined as sequences
-                    of bytes, so file formats are byte-order independent.
-                </p>
-
-            </section>
-
-            <section id="UInt32"><title>UInt32</title>
-
-                <p>
-                    32-bit unsigned integers are written as four
-                    bytes, high-order bytes first.
-                </p>
-                <p>
-                    UInt32    --&gt; &lt;Byte&gt;<sup>4</sup>
-                </p>
-
-            </section>
-
-            <section id="Uint64"><title>UInt64</title>
-
-                <p>
-                    64-bit unsigned integers are written as eight
-                    bytes, high-order bytes first.
-                </p>
-
-                <p>UInt64    --&gt; &lt;Byte&gt;<sup>8</sup>
-                </p>
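Both fixed-width types, UInt32 and UInt64, are written high-order byte first (big-endian). A minimal sketch using Python's standard `struct` module follows; the function names are hypothetical and chosen only for this illustration.

```python
import struct


def write_uint32(value):
    """Four bytes, high-order byte first."""
    return struct.pack(">I", value)


def read_uint32(data):
    """Decode the first four bytes of `data` as a UInt32."""
    return struct.unpack(">I", data[:4])[0]


def write_uint64(value):
    """Eight bytes, high-order byte first."""
    return struct.pack(">Q", value)


def read_uint64(data):
    """Decode the first eight bytes of `data` as a UInt64."""
    return struct.unpack(">Q", data[:8])[0]


encoded32 = write_uint32(0x01020304)
encoded64 = write_uint64(258)
```

The `>` prefix in the format strings selects big-endian order regardless of the host machine, which is what makes the format byte-order independent.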
-
-            </section>
-
-            <section id="VInt"><title>VInt</title>
-
-                <p>
-                    A variable-length format for non-negative integers is
-                    defined where the high-order bit of each byte indicates whether more
-                    bytes remain to be read. The low-order seven bits are appended as
-                    increasingly more significant bits in the resulting integer value.
-                    Thus values from zero to 127 may be stored in a single byte, values
-                    from 128 to 16,383 may be stored in two bytes, and so on.
-                </p>
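The encoding described above can be sketched as follows. The names `write_vint` and `read_vint` are hypothetical, and the code uses division and remainder by 0x80 (128), which is equivalent to shifting by seven bits and masking the low seven bits; it is an illustration of the rule in the text, not Lucene's implementation.

```python
def write_vint(value):
    """Encode a non-negative integer, low-order seven bits first."""
    if 0 > value:
        raise ValueError("VInt can only hold non-negative integers")
    out = bytearray()
    while value >= 0x80:
        low_seven, value = value % 0x80, value // 0x80
        # High-order bit set: more bytes remain to be read.
        out.append(0x80 + low_seven)
    # Final byte: high-order bit clear.
    out.append(value)
    return bytes(out)


def read_vint(data, offset=0):
    """Decode one VInt from `data`; return (value, offset after the VInt)."""
    result = 0
    scale = 1
    while True:
        byte = data[offset]
        offset += 1
        # Each byte contributes seven increasingly significant bits.
        result += (byte % 0x80) * scale
        if byte >= 0x80:
            scale *= 0x80
            continue
        return result, offset


two_bytes = write_vint(128)
largest_two_bytes = write_vint(16383)
```

Encoding 128 yields the bytes 10000000 00000001 and encoding 16,383 yields 11111111 01111111, while 16,384 is the first value that needs a third byte.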
-
-                <p>
-                    <b>VInt Encoding Example</b>
-                </p>
-
-                <table width="100%" border="0" cellpadding="4" cellspacing="0">
-                    <col width="64*"/>
-                    <col width="64*"/>
-                    <col width="64*"/>
-                    <col width="64*"/>
-                    <tr valign="TOP">
-                        <td width="25%">
-                            <p align="RIGHT">
-                                <b>Value</b>
-                            </p>
-                        </td>
-                        <td width="25%">
-                            <p align="RIGHT">
-                                <b>First byte</b>
-                            </p>
-                        </td>
-                        <td width="25%">
-                            <p align="RIGHT">
-                                <b>Second byte</b>
-                            </p>
-                        </td>
-                        <td width="25%">
-                            <p align="RIGHT">
-                                <b>Third byte</b>
-                            </p>
-                        </td>
-                    </tr>
-                    <tr valign="BOTTOM">
-                        <td width="25%" sdval="0" sdnum="1033;0;#,##0">
-                            <p align="RIGHT">0
-                            </p>
-                        </td>
-                        <td width="25%" sdval="0" sdnum="1033;0;00000000">
-                            <p class="western" align="RIGHT" style="margin-left: 0.11cm;
-                               margin-right: 0.01cm">
-                                00000000
-                            </p>
-                        </td>
-                        <td width="25%" sdnum="1033;0;00000000">
-                            <p align="RIGHT" style="margin-left: -0.07cm; margin-right:
-                               0.01cm">
-                                <br/>
-
-                            </p>
-                        </td>
-                        <td width="25%" sdnum="1033;0;00000000">
-                            <p align="RIGHT" style="margin-left: -0.47cm; margin-right:
-                               0.01cm">
-                                <br/>
-
-                            </p>
-                        </td>
-                    </tr>
-                    <tr valign="BOTTOM">
-                        <td width="25%" sdval="1" sdnum="1033;0;#,##0">
-                            <p align="RIGHT">1
-                            </p>
-                        </td>
-                        <td width="25%" sdval="1" sdnum="1033;0;00000000">
-                            <p class="western" align="RIGHT" style="margin-left: 0.11cm;
-                               margin-right: 0.01cm">
-                                00000001
-                            </p>
-                        </td>
-                        <td width="25%" sdnum="1033;0;00000000">
-                            <p align="RIGHT" style="margin-left: -0.07cm; margin-right:
-                               0.01cm">
-                                <br/>
-
-                            </p>
-                        </td>
-                        <td width="25%" sdnum="1033;0;00000000">
-                            <p align="RIGHT" style="margin-left: -0.47cm; margin-right:
-                               0.01cm">
-                                <br/>
-
-                            </p>
-                        </td>
-                    </tr>
-                    <tr valign="BOTTOM">
-                        <td width="25%" sdval="2" sdnum="1033;0;#,##0">
-                            <p align="RIGHT">2
-                            </p>
-                        </td>
-                        <td width="25%" sdval="10" sdnum="1033;0;00000000">
-                            <p class="western" align="RIGHT" style="margin-left: 0.11cm;
-                               margin-right: 0.01cm">
-                                00000010
-                            </p>
-                        </td>
-                        <td width="25%" sdnum="1033;0;00000000">
-                            <p align="RIGHT" style="margin-left: -0.07cm; margin-right:
-                               0.01cm">
-                                <br/>
-
-                            </p>
-                        </td>
-                        <td width="25%" sdnum="1033;0;00000000">
-                            <p align="RIGHT" style="margin-left: -0.47cm; margin-right:
-                               0.01cm">
-                                <br/>
-
-                            </p>
-                        </td>
-                    </tr>
-                    <tr>
-                        <td width="25%" valign="TOP">
-                            <p align="RIGHT">...
-                            </p>
-                        </td>
-                        <td width="25%" valign="BOTTOM" sdnum="1033;0;00000000">
-                            <p align="RIGHT" style="margin-left: 0.11cm; margin-right:
-                               0.01cm">
-                                <br/>
-
-                            </p>
-                        </td>
-                        <td width="25%" valign="BOTTOM" sdnum="1033;0;00000000">
-                            <p align="RIGHT" style="margin-left: -0.07cm; margin-right:
-                               0.01cm">
-                                <br/>
-
-                            </p>
-                        </td>
-                        <td width="25%" valign="BOTTOM" sdnum="1033;0;00000000">
-                            <p align="RIGHT" style="margin-left: -0.47cm; margin-right:
-                               0.01cm">
-                                <br/>
-
-                            </p>
-                        </td>
-                    </tr>
-                    <tr valign="BOTTOM">
-                        <td width="25%" sdval="127" sdnum="1033;0;#,##0">
-                            <p align="RIGHT">127
-                            </p>
-                        </td>
-                        <td width="25%" sdval="1111111" sdnum="1033;0;00000000">
-                            <p class="western" align="RIGHT" style="margin-left: 0.11cm;
-                               margin-right: 0.01cm">
-                                01111111
-                            </p>
-                        </td>
-                        <td width="25%" sdnum="1033;0;00000000">
-                            <p align="RIGHT" style="margin-left: -0.07cm; margin-right:
-                               0.01cm">
-                                <br/>
-
-                            </p>
-                        </td>
-                        <td width="25%" sdnum="1033;0;00000000">
-                            <p align="RIGHT" style="margin-left: -0.47cm; margin-right:
-                               0.01cm">
-                                <br/>
-
-                            </p>
-                        </td>
-                    </tr>
-                    <tr valign="BOTTOM">
-                        <td width="25%" sdval="128" sdnum="1033;0;#,##0">
-                            <p align="RIGHT">128
-                            </p>
-                        </td>
-                        <td width="25%" sdval="10000000" sdnum="1033;0;00000000">
-                            <p class="western" align="RIGHT" style="margin-left: 0.11cm;
-                               margin-right: 0.01cm">
-                                10000000
-                            </p>
-                        </td>
-                        <td width="25%" sdval="1" sdnum="1033;0;00000000">
-                            <p class="western" align="RIGHT" style="margin-left: -0.07cm;
-                               margin-right: 0.01cm">
-                                00000001
-                            </p>
-                        </td>
-                        <td width="25%" sdnum="1033;0;00000000">
-                            <p align="RIGHT" style="margin-left: -0.47cm; margin-right:
-                               0.01cm">
-                                <br/>
-
-                            </p>
-                        </td>
-                    </tr>
-                    <tr valign="BOTTOM">
-                        <td width="25%" sdval="129" sdnum="1033;0;#,##0">
-                            <p align="RIGHT">129
-                            </p>
-                        </td>
-                        <td width="25%" sdval="10000001" sdnum="1033;0;00000000">
-                            <p class="western" align="RIGHT" style="margin-left: 0.11cm;
-                               margin-right: 0.01cm">
-                                10000001
-                            </p>
-                        </td>
-                        <td width="25%" sdval="1" sdnum="1033;0;00000000">
-                            <p class="western" align="RIGHT" style="margin-left: -0.07cm;
-                               margin-right: 0.01cm">
-                                00000001
-                            </p>
-                        </td>
-                        <td width="25%" sdnum="1033;0;00000000">
-                            <p align="RIGHT" style="margin-left: -0.47cm; margin-right:
-                               0.01cm">
-                                <br/>
-
-                            </p>
-                        </td>
-                    </tr>
-                    <tr valign="BOTTOM">
-                        <td width="25%" sdval="130" sdnum="1033;0;#,##0">
-                            <p align="RIGHT">130
-                            </p>
-                        </td>
-                        <td width="25%" sdval="10000010" sdnum="1033;0;00000000">
-                            <p class="western" align="RIGHT" style="margin-left: 0.11cm;
-                               margin-right: 0.01cm">
-                                10000010
-                            </p>
-                        </td>
-                        <td width="25%" sdval="1" sdnum="1033;0;00000000">
-                            <p class="western" align="RIGHT" style="margin-left: -0.07cm;
-                               margin-right: 0.01cm">
-                                00000001
-                            </p>
-                        </td>
-                        <td width="25%" sdnum="1033;0;00000000">
-                            <p align="RIGHT" style="margin-left: -0.47cm; margin-right:
-                               0.01cm">
-                                <br/>
-
-                            </p>
-                        </td>
-                    </tr>
-                    <tr>
-                        <td width="25%" valign="TOP">
-                            <p align="RIGHT">...
-                            </p>
-                        </td>
-                        <td width="25%" valign="BOTTOM" sdnum="1033;0;00000000">
-                            <p align="RIGHT" style="margin-left: 0.11cm; margin-right:
-                               0.01cm">
-                                <br/>
-
-                            </p>
-                        </td>
-                        <td width="25%" valign="BOTTOM" sdnum="1033;0;00000000">
-                            <p align="RIGHT" style="margin-left: -0.07cm; margin-right:
-                               0.01cm">
-                                <br/>
-
-                            </p>
-                        </td>
-                        <td width="25%" valign="BOTTOM" sdnum="1033;0;00000000">
-                            <p align="RIGHT" style="margin-left: -0.47cm; margin-right:
-                               0.01cm">
-                                <br/>
-
-                            </p>
-                        </td>
-                    </tr>
-                    <tr valign="BOTTOM">
-                        <td width="25%" sdval="16383" sdnum="1033;0;#,##0">
-                            <p align="RIGHT">16,383
-                            </p>
-                        </td>
-                        <td width="25%" sdval="11111111" sdnum="1033;0;00000000">
-                            <p class="western" align="RIGHT" style="margin-left: 0.11cm;
-                               margin-right: 0.01cm">
-                                11111111
-                            </p>
-                        </td>
-                        <td width="25%" sdval="1111111" sdnum="1033;0;00000000">
-                            <p class="western" align="RIGHT" style="margin-left: -0.07cm;
-                               margin-right: 0.01cm">
-                                01111111
-                            </p>
-                        </td>
-                        <td width="25%" sdnum="1033;0;00000000">
-                            <p align="RIGHT" style="margin-left: -0.47cm; margin-right:
-                               0.01cm">
-                                <br/>
-
-                            </p>
-                        </td>
-                    </tr>
-                    <tr valign="BOTTOM">
-                        <td width="25%" sdval="16384" sdnum="1033;0;#,##0">
-                            <p align="RIGHT">16,384
-                            </p>
-                        </td>
-                        <td width="25%" sdval="10000000" sdnum="1033;0;00000000">
-                            <p class="western" align="RIGHT" style="margin-left: 0.11cm;
-                               margin-right: 0.01cm">
-                                10000000
-                            </p>
-                        </td>
-                        <td width="25%" sdval="10000000" sdnum="1033;0;00000000">
-                            <p class="western" align="RIGHT" style="margin-left: -0.07cm;
-                               margin-right: 0.01cm">
-                                10000000
-                            </p>
-                        </td>
-                        <td width="25%" sdval="1" sdnum="1033;0;00000000">
-                            <p class="western" align="RIGHT" style="margin-left: -0.47cm;
-                               margin-right: 0.01cm">
-                                00000001
-                            </p>
-                        </td>
-                    </tr>
-                    <tr valign="BOTTOM">
-                        <td width="25%" sdval="16385" sdnum="1033;0;#,##0">
-                            <p align="RIGHT">16,385
-                            </p>
-                        </td>
-                        <td width="25%" sdval="10000001" sdnum="1033;0;00000000">
-                            <p class="western" align="RIGHT" style="margin-left: 0.11cm;
-                               margin-right: 0.01cm">
-                                10000001
-                            </p>
-                        </td>
-                        <td width="25%" sdval="10000000" sdnum="1033;0;00000000">
-                            <p class="western" align="RIGHT" style="margin-left: -0.07cm;
-                               margin-right: 0.01cm">
-                                10000000
-                            </p>
-                        </td>
-                        <td width="25%" sdval="1" sdnum="1033;0;00000000">
-                            <p class="western" align="RIGHT" style="margin-left: -0.47cm;
-                               margin-right: 0.01cm">
-                                00000001
-                            </p>
-                        </td>
-                    </tr>
-                    <tr>
-                        <td width="25%" valign="TOP">
-                            <p align="RIGHT">...
-                            </p>
-                        </td>
-                        <td width="25%" valign="BOTTOM" sdnum="1033;0;00000000">
-                            <p class="western" align="RIGHT" style="margin-left: 0.11cm;
-                               margin-right: 0.01cm">
-                                <br/>
-
-                            </p>
-                        </td>
-                        <td width="25%" valign="BOTTOM" sdnum="1033;0;00000000">
-                            <p class="western" align="RIGHT" style="margin-left: -0.07cm;
-                               margin-right: 0.01cm">
-                                <br/>
-
-                            </p>
-                        </td>
-                        <td width="25%" valign="BOTTOM" sdnum="1033;0;00000000">
-                            <p class="western" align="RIGHT" style="margin-left: -0.47cm;
-                               margin-right: 0.01cm">
-                                <br/>
-
-                            </p>
-                        </td>
-                    </tr>
-                </table>
-
-                <p>
-                    This provides compression while still being
-                    efficient to decode.
-                </p>
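-                <p>
-                    As a minimal sketch of the encoding tabulated above, written in
-                    Python: each byte carries seven bits of the value, low-order group
-                    first, and a set high-order bit means that more bytes follow. The
-                    helper names encode_vint and decode_vint are illustrative and are
-                    not part of Lucene; plain arithmetic is used in place of bit masks.
-                </p>

```python
def encode_vint(value):
    """Encode a non-negative int as VInt bytes, low-order seven bits first."""
    out = bytearray()
    while value >= 0x80:
        # Keep the low seven bits and set the high bit: more bytes follow.
        out.append(value % 0x80 + 0x80)
        value //= 0x80
    out.append(value)  # final byte has the high bit clear
    return bytes(out)


def decode_vint(data, pos=0):
    """Decode one VInt starting at data[pos]; return (value, next_pos)."""
    value = 0
    scale = 1
    while True:
        byte = data[pos]
        pos += 1
        value += (byte % 0x80) * scale  # low seven bits of this byte
        if byte >= 0x80:
            scale *= 0x80  # high bit set: another byte follows
        else:
            return value, pos


# The 16,384 row of the table: three bytes, 10000000 10000000 00000001.
encoded = encode_vint(16384)
```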
-
-            </section>
-
-            <section id="Chars"><title>Chars</title>
-
-                <p>
-                    Lucene writes Unicode
-                    character sequences as UTF-8 encoded bytes.
-                </p>
-
-
-            </section>
-
-            <section id="String"><title>String</title>
-
-                <p>
-                   Lucene writes strings as UTF-8 encoded bytes.
-                    First the length, in bytes, is written as a VInt,
-                    followed by the bytes.
-                </p>
-
-                <p>
-                    String --&gt; VInt, Chars
-                </p>
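-                <p>
-                    The String production can be sketched in Python as follows. The
-                    function names are hypothetical, and the two VInt helpers are
-                    repeated only so that the example stands on its own.
-                </p>

```python
def encode_vint(value):
    """VInt: seven bits per byte, low-order group first."""
    out = bytearray()
    while value >= 0x80:
        out.append(value % 0x80 + 0x80)  # continuation bit set
        value //= 0x80
    out.append(value)
    return bytes(out)


def decode_vint(data, pos):
    """Return (value, next_pos) for the VInt starting at data[pos]."""
    value, scale = 0, 1
    while True:
        byte = data[pos]
        pos += 1
        value += (byte % 0x80) * scale
        if byte >= 0x80:
            scale *= 0x80
        else:
            return value, pos


def write_string(text):
    """String --> VInt (length in bytes), then the UTF-8 bytes."""
    payload = text.encode("utf-8")
    return encode_vint(len(payload)) + payload


def read_string(data, pos=0):
    """Return (text, next_pos) for the String starting at data[pos]."""
    length, pos = decode_vint(data, pos)
    end = pos + length
    return data[pos:end].decode("utf-8"), end
```

-                <p>
-                    Note that the length prefix counts bytes, not characters, so a
-                    five-character string containing one two-byte character is
-                    written with a length of 6.
-                </p>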
-
-            </section>
-        </section>
-
-        <section id="Compound Types"><title>Compound Types</title>
-            <section id="MapStringString"><title>Map&lt;String,String&gt;</title>
-
-                <p>
-                    In a couple of places Lucene stores a Map
-                    from String to String.
-                </p>
-
-                <p>
-                   Map&lt;String,String&gt; --&gt; Count, &lt;String,String&gt;<sup>Count</sup>
-                </p>
-
-            </section>
-
-        </section>
-
-        <section id="Per-Index Files"><title>Per-Index Files</title>
-
-            <p>
-                The files in this section exist one-per-index.
-            </p>
-
-            <section id="Segments File"><title>Segments File</title>
-
-                <p>
-                    The active segments in the index are stored in the
-                    segment info file,
-                    <tt>segments_N</tt>.
-                    There may
-                    be one or more
-                    <tt>segments_N</tt>
-                    files in the
-                    index; however, the one with the largest
-                    generation is the active one (older
-                    segments_N files are present only when they
-                    temporarily cannot be deleted, a writer is in
-                    the process of committing, or a custom
-                    <a href="api/core/org/apache/lucene/index/IndexDeletionPolicy.html">IndexDeletionPolicy</a>
-                   is in use). This file lists each
-                    segment by name, has details about the separate
-                    norms and deletion files, and also contains the
-                    size of each segment.
-                </p>
-
-                <p>
-                    As of 2.1, there is also a file
-                    <tt>segments.gen</tt>.
-                    This file contains the
-                    current generation (the
-                    <tt>_N</tt>
-                    in
-                    <tt>segments_N</tt>)
-                    of the index. This is
-                    used only as a fallback in case the current
-                    generation cannot be accurately determined by
-                    directory listing alone (as is the case for some
-                    NFS clients with time-based directory cache
-                    expiration). This file simply contains an Int32
-                    version header (SegmentInfos.FORMAT_LOCKLESS =
-                    -2), followed by the generation recorded as Int64,
-                    written twice.
-                </p>
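-                <p>
-                    A minimal Python sketch of the segments.gen layout described
-                    above. It assumes big-endian byte order for Int32 and Int64, and
-                    it treats a file whose two generation copies disagree as unusable;
-                    both points, like the function names, are assumptions of this
-                    example rather than statements from this section.
-                </p>

```python
import struct

FORMAT_LOCKLESS = -2
# Assumed layout: big-endian Int32 header, then the Int64 generation twice.
GEN_LAYOUT = struct.Struct(">iqq")


def write_segments_gen(generation):
    """Serialize the version header and the generation, written twice."""
    return GEN_LAYOUT.pack(FORMAT_LOCKLESS, generation, generation)


def read_segments_gen(data):
    """Return the recorded generation, or None if the bytes look unusable."""
    if len(data) != GEN_LAYOUT.size:
        return None
    version, first, second = GEN_LAYOUT.unpack(data)
    if version != FORMAT_LOCKLESS or first != second:
        return None  # unknown header or a torn write
    return first
```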
-                <p>
-                    <b>3.1</b>
-                    Segments --&gt; Format, Version, NameCounter, SegCount, &lt;SegVersion, SegName, SegSize, DelGen, DocStoreOffset, [DocStoreSegment, DocStoreIsCompoundFile], HasSingleNormFile, NumField,
-                    NormGen<sup>NumField</sup>,
-                    IsCompoundFile, DeletionCount, HasProx, Diagnostics, HasVectors&gt;<sup>SegCount</sup>, CommitUserData, Checksum
-                </p>
-
-                <p>
-                    Format, NameCounter, SegCount, SegSize, NumField,
-                    DocStoreOffset, DeletionCount --&gt; Int32
-                </p>
-
-               <p>
-                    Version, DelGen, NormGen, Checksum --&gt; Int64
-                </p>
-
-                <p>
-                   SegVersion, SegName, DocStoreSegment --&gt; String
-                </p>
-
-               <p>
-                  Diagnostics --&gt; Map&lt;String,String&gt;
-               </p>
-
-                <p>
-                    IsCompoundFile, HasSingleNormFile,
-                    DocStoreIsCompoundFile, HasProx, HasVectors --&gt; Int8
-                </p>
-
-               <p>
-                   CommitUserData --&gt; Map&lt;String,String&gt;
-                </p>
-
-                <p>
-                    Format is -9 (SegmentInfos.FORMAT_DIAGNOSTICS).
-                </p>
-
-                <p>
-                    Version counts how often the index has been
-                    changed by adding or deleting documents.
-                </p>
-
-                <p>
-                    NameCounter is used to generate names for new segment files.
-                </p>
-
-                <p>
-                    SegVersion is the code version that created the segment.
-                </p>
-
-                <p>
-                    SegName is the name of the segment, and is used as the file name prefix
-                    for all of the files that compose the segment's index.
-                </p>
-
-                <p>
-                    SegSize is the number of documents contained in the segment index.
-                </p>
-
-                <p>
-                    DelGen is the generation count of the separate
-                    deletes file. If this is -1, there are no
-                    separate deletes. If it is 0, this is a pre-2.1
-                    segment and you must check the filesystem for the
-                    existence of _X.del. Anything above zero means
-                    there are separate deletes (_X_N.del).
-                </p>
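-                <p>
-                    The three DelGen cases can be sketched in Python as below. The
-                    function name and the "prefix" argument (the _X part of the file
-                    name) are hypothetical, and rendering the generation N in base 36
-                    is an assumption that this section does not state.
-                </p>

```python
import os

DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def to_base36(number):
    """Render a non-negative int in base 36 (assumed file-name radix)."""
    text = ""
    while True:
        number, remainder = divmod(number, 36)
        text = DIGITS[remainder] + text
        if number == 0:
            return text


def deletes_file_name(prefix, del_gen, exists=os.path.exists):
    """Return the deletes file for a segment, or None if it has none."""
    if del_gen == -1:
        return None  # no separate deletes
    if del_gen == 0:
        # Pre-2.1 segment: only the filesystem can tell.
        legacy = prefix + ".del"
        return legacy if exists(legacy) else None
    return "{}_{}.del".format(prefix, to_base36(del_gen))
```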
-
-                <p>
-                    NumField is the size of the array for NormGen, or
-                    -1 if there are no NormGens stored.
-                </p>
-
-                <p>
-                    NormGen records the generation of the separate
-                    norms files. If NumField is -1, there are no
-                    normGens stored and they are all assumed to be 0
-                    when the segment file was written pre-2.1 and all
-                    assumed to be -1 when the segments file is 2.1 or
-                    above. The generation then has the same meaning
-                    as delGen (above).
-                </p>
-
-                <p>
-                    IsCompoundFile records whether the segment is
-                    written as a compound file or not. If this is -1,
-                    the segment is not a compound file. If it is 1,
-                    the segment is a compound file. Otherwise it is 0,
-                    which means the filesystem must be checked to see
-                    whether _X.cfs exists.
-                </p>
-
-                <p>
-                    If HasSingleNormFile is 1, then the field norms are
-                    written as a single joined file (with extension
-                    <tt>.nrm</tt>); if it is 0 then each field's norms
-                    are stored as separate <tt>.fN</tt> files.  See
-                    "Normalization Factors" below for details.
-                </p>
-
-                <p>
-                   DocStoreOffset, DocStoreSegment,
-                    DocStoreIsCompoundFile: If DocStoreOffset is -1,
-                    this segment has its own doc store (stored fields
-                    values and term vectors) files and DocStoreSegment
-                    and DocStoreIsCompoundFile are not stored.  In
-                    this case all files for stored field values
-                    (<tt>*.fdt</tt> and <tt>*.fdx</tt>) and term
-                    vectors (<tt>*.tvf</tt>, <tt>*.tvd</tt> and
-                    <tt>*.tvx</tt>) will be stored with this segment.
-                    Otherwise, DocStoreSegment is the name of the
-                    segment that has the shared doc store files;
-                    DocStoreIsCompoundFile is 1 if that segment is
-                    stored in compound file format (as a <tt>.cfx</tt>
-                    file); and DocStoreOffset is the starting document
-                    in the shared doc store files where this segment's
-                    documents begin.  In this case, this segment does
-                    not store its own doc store files but instead
-                    shares a single set of these files with other
-                    segments.
-                </p>
-
-                <p>
-                   Checksum contains the CRC32 checksum of all bytes
-                   in the segments_N file up until the checksum.
-                   This is used to verify integrity of the file on
-                   opening the index.
-               </p>
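-               <p>
-                   As a sketch of the integrity check, assuming the Checksum is the
-                   final big-endian Int64 of the file and that the CRC32 variant is
-                   the one computed by zlib (the function name is illustrative):
-               </p>

```python
import struct
import zlib


def checksum_matches(segments_bytes):
    """Compare the trailing Int64 with the CRC32 of everything before it."""
    body = segments_bytes[:-8]
    trailer = segments_bytes[-8:]
    if len(trailer) != 8:
        return False  # too short to hold a checksum
    (stored,) = struct.unpack(">q", trailer)
    return zlib.crc32(body) == stored
```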
-
-               <p>
-                   DeletionCount records the number of deleted
-                   documents in this segment.
-               </p>
-
-               <p>
-                   HasProx is 1 if any fields in this segment have
-                   position data (IndexOptions.DOCS_AND_FREQS_AND_POSITIONS); else, it's 0.
-               </p>
-
-               <p>
-                   CommitUserData stores an optional user-supplied
-                   opaque Map&lt;String,String&gt; that was passed to
-                   IndexWriter's commit or prepareCommit, or
-                   IndexReader's flush methods.
-                </p>
-               <p>
-                   The Diagnostics Map is privately written by
-                   IndexWriter, as a debugging aid, for each segment
-                   it creates.  It includes metadata like the current
-                   Lucene version, OS, Java version, why the segment
-                   was created (merge, flush, addIndexes), etc.
-                </p>
-
-               <p>
-                   HasVectors is 1 if this segment stores term vectors,
-                   else it's 0.
-                </p>
-
-            </section>
-
-            <section id="Lock File"><title>Lock File</title>
-
-                <p>
-                    The write lock, which is stored in the index
-                    directory by default, is named "write.lock".  If
-                    the lock directory is different from the index
-                    directory then the write lock will be named
-                    "XXXX-write.lock" where XXXX is a unique prefix
-                    derived from the full path to the index directory.
-                    When this file is present, a writer is currently
-                    modifying the index (adding or removing
-                    documents).  This lock file ensures that only one
-                    writer is modifying the index at a time.
-                </p>
-            </section>
-
-            <section id="Deletable File"><title>Deletable File</title>
-
-                <p>
-                    No deletable file is written. Instead, a writer
-                    dynamically computes which files are
-                    deletable.
-                </p>
-
-            </section>
-
-            <section id="Compound Files"><title>Compound Files</title>
-
-                <p>Starting with Lucene 1.4 the compound file format became the default. This
-                    is simply a container for all files described in the next section
-                    (except for the .del file).</p>
-                <p>Compound Entry Table (.cfe) --&gt; Version, FileCount, &lt;FileName, DataOffset, DataLength&gt;
-                    <sup>FileCount</sup>
-                </p>
-
-                <p>Compound (.cfs) --&gt; FileData <sup>FileCount</sup>
-                </p>
-                
-                <p>Version --&gt; Int</p>
-
-                <p>FileCount --&gt; VInt</p>
-
-                <p>DataOffset --&gt; Long</p>
-                
-                <p>DataLength --&gt; Long</p>
-
-                <p>FileName --&gt; String</p>
-
-                <p>FileData --&gt; raw file data</p>
-                <p>The raw file data is the data from the individual files named above.</p>
-
-               <p>Starting with Lucene 2.3, doc store files (stored
-               field values and term vectors) can be shared in a
-               single set of files for more than one segment.  When
-               the compound file format is enabled, these shared files will be
-               added into a single compound file (same format as
-               above) but with the extension <tt>.cfx</tt>.
-               </p>
-
-            </section>
-
-        </section>
-
-        <section id="Per-Segment Files"><title>Per-Segment Files</title>
-
-            <p>
-                The remaining files are all per-segment, and are
-                thus defined by suffix.
-            </p>
-            <section id="Fields"><title>Fields</title>
-                <p>
-                    <br/>
-                    <b>Field Info</b>
-                    <br/>
-                </p>
-
-                <p>
-                    Field names are
-                    stored in the field info file, with suffix .fnm.
-                </p>
-                <p>
-                    FieldInfos
-                    (.fnm) --&gt; FNMVersion, FieldsCount, &lt;FieldName,
-                    FieldBits&gt;
-                    <sup>FieldsCount</sup>
-                </p>
-
-                <p>
-                    FNMVersion, FieldsCount --&gt; VInt
-                </p>
-
-                <p>
-                    FieldName --&gt; String
-                </p>
-
-                <p>
-                    FieldBits --&gt; Byte
-                </p>
-
-                <p>
-                    <ul>
-                        <li>
-                            The low-order bit is one for
-                            indexed fields, and zero for non-indexed fields.
-                        </li>
-                        <li>
-                            The second lowest-order
-                            bit is one for fields that have term vectors stored, and zero for fields
-                            without term vectors.
-                        </li>
-                        <li>If the third lowest-order bit is set (0x04), term positions are stored with the term vectors.</li>
-                        <li>If the fourth lowest-order bit is set (0x08), term offsets are stored with the term vectors.</li>
-                        <li>If the fifth lowest-order bit is set (0x10), norms are omitted for the indexed field.</li>
-                        <li>If the sixth lowest-order bit is set (0x20), payloads are stored for the indexed field.</li>
-                        <li>If the seventh lowest-order bit is set (0x40), term frequencies and positions are omitted for the indexed field.</li>
-                        <li>If the eighth lowest-order bit is set (0x80), positions are omitted for the indexed field.</li>
-                    </ul>
-                </p>
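-                <p>
-                    The FieldBits byte can be decoded with a small flag type, sketched
-                    here in Python. The member names are illustrative labels for the
-                    masks listed above, not identifiers quoted from the Lucene source.
-                </p>

```python
from enum import IntFlag


class FieldBits(IntFlag):
    """One member per mask in the FieldBits list (names are illustrative)."""
    INDEXED = 0x01
    STORE_TERM_VECTOR = 0x02
    TERM_VECTOR_POSITIONS = 0x04
    TERM_VECTOR_OFFSETS = 0x08
    OMIT_NORMS = 0x10
    STORE_PAYLOADS = 0x20
    OMIT_TERM_FREQ_AND_POSITIONS = 0x40
    OMIT_POSITIONS = 0x80


def describe(bits):
    """Return the names of all flags set in a FieldBits byte."""
    flags = FieldBits(bits)
    return [flag.name for flag in FieldBits if flag in flags]


# 0x13: indexed, term vectors stored, norms omitted.
names = describe(0x13)
```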
-
-               <p>
-                  FNMVersion (added in 2.9) is -2 for indexes from 2.9 through 3.3. It is -3 for indexes in Lucene 3.4+.
-               </p>
-
-                <p>
-                    Fields are numbered by their order in this file. Thus field zero is
-                    the
-                    first field in the file, field one the next, and so on. Note that,
-                    like document numbers, field numbers are segment relative.
-                </p>
-
-
-
-                <p>
-                    <br/>
-                    <b>Stored Fields</b>
-                    <br/>
-                </p>
-
-                <p>
-                    Stored fields are represented by two files:
-                </p>
-
-                <ol>
-                    <li><a name="field_index"/>
-                        <p>
-                            The field index, or .fdx file.
-                        </p>
-
-                        <p>
-                            This contains, for each document, a pointer to
-                            its field data, as follows:
-                        </p>
-
-                        <p>
-                            FieldIndex
-                            (.fdx) --&gt;
-                            &lt;FieldValuesPosition&gt;
-                            <sup>SegSize</sup>
-                        </p>
-                        <p>FieldValuesPosition
-                            --&gt; Uint64
-                        </p>
-                        <p>This
-                            is used to find the location within the field data file of the
-                            fields of a particular document. Because it contains fixed-length
-                            data, this file may be easily randomly accessed. The position of
-                            document <i>n</i>'s field data is the Uint64 at
-                            <i>n*8</i>
-                            in
-                            this file.
-                        </p>
-                    </li>
-                    <li>
-                        <p><a name="field_data"/>
-                            The field data, or .fdt file.
-
-                        </p>
-
-                        <p>
-                            This contains the stored fields of each document,
-                            as follows:
-                        </p>
-
-                        <p>
-                            FieldData (.fdt) --&gt;
-                            &lt;DocFieldData&gt;
-                            <sup>SegSize</sup>
-                        </p>
-                        <p>DocFieldData --&gt;
-                            FieldCount, &lt;FieldNum, Bits, Value&gt;
-                            <sup>FieldCount</sup>
-                        </p>
-                        <p>FieldCount --&gt;
-                            VInt
-                        </p>
-                        <p>FieldNum --&gt;
-                            VInt
-                        </p>
-                        <p>Bits --&gt;
-                            Byte
-                        </p>
-                        <p>
-                            <ul>
-                                <li>low order bit is one for tokenized fields</li>
-                                <li>second bit is one for fields containing binary data</li>
-                                <li>third bit is one for fields with compression option enabled
-                                    (if compression is enabled, the algorithm used is ZLIB),
-                                    only available for indexes until Lucene version 2.9.x</li>
-                                <li>4th to 6th bit (mask: 0x7&lt;&lt;3) define the type of a
-                                numeric field: <ul>
-                                  <li>all bits in mask are cleared if no numeric field at all</li>
-                                  <li>1&lt;&lt;3: Value is Int</li>
-                                  <li>2&lt;&lt;3: Value is Long</li>
-                                  <li>3&lt;&lt;3: Value is Int as Float (as of Float.intBitsToFloat)</li>
-                                  <li>4&lt;&lt;3: Value is Long as Double (as of Double.longBitsToDouble)</li>
-                                </ul></li>
-                            </ul>
-                        </p>
-                        <p>Value --&gt;
-                            String | BinaryValue | Int | Long (depending on Bits)
-                        </p>
-                        <p>BinaryValue --&gt;
-                            ValueSize, &lt;Byte&gt;<sup>ValueSize</sup>
-                        </p>
-                        <p>ValueSize --&gt;
-                            VInt
-                        </p>
-
-                    </li>
-                </ol>
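-                <p>
-                    Because the field index holds fixed-length entries, looking up a
-                    document is a single read. The Python sketch below takes the
-                    description above at face value: it assumes big-endian Uint64
-                    entries with nothing preceding them in the file, and the function
-                    name is hypothetical.
-                </p>

```python
import struct


def field_data_position(fdx_bytes, doc):
    """Return the .fdt position of document `doc` (entry at offset doc*8)."""
    (position,) = struct.unpack_from(">Q", fdx_bytes, doc * 8)
    return position


# Three documents whose field data start at bytes 0, 17 and 42 of the .fdt file.
sample_fdx = struct.pack(">3Q", 0, 17, 42)
```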
-
-            </section>
-            <section id="Term Dictionary"><title>Term Dictionary</title>
-
-                <p>
-                    The term dictionary is represented as two files:
-                </p>
-                <ol>
-                    <li><a name="tis"/>
-                        <p>
-                            The term infos, or .tis file.
-                        </p>
-
-                        <p>
-                            TermInfoFile (.tis)--&gt;
-                            TIVersion, TermCount, IndexInterval, SkipInterval, MaxSkipLevels, TermInfos
-                        </p>
-                        <p>TIVersion --&gt;
-                            UInt32
-                        </p>
-                        <p>TermCount --&gt;
-                            UInt64
-                        </p>
-                        <p>IndexInterval --&gt;
-                            UInt32
-                        </p>
-                        <p>SkipInterval --&gt;
-                            UInt32
-                        </p>
-                        <p>MaxSkipLevels --&gt;
-                            UInt32
-                        </p>
-                        <p>TermInfos --&gt;
-                            &lt;TermInfo&gt;
-                            <sup>TermCount</sup>
-                        </p>
-                        <p>TermInfo --&gt;
-                            &lt;Term, DocFreq, FreqDelta, ProxDelta, SkipDelta&gt;
-                        </p>
-                        <p>Term --&gt;
-                            &lt;PrefixLength, Suffix, FieldNum&gt;
-                        </p>
-                        <p>Suffix --&gt;
-                            String
-                        </p>
-                        <p>PrefixLength,
-                            DocFreq, FreqDelta, ProxDelta, SkipDelta
-                            <br/>
-                            --&gt; VInt
-                        </p>
-                        <p>
-                           This file is sorted by Term. Terms are
-                            ordered first lexicographically (by UTF16
-                            character code) by the term's field name,
-                            and within that lexicographically (by
-                            UTF16 character code) by the term's text.
-                        </p>
-                        <p>TIVersion names the version of the format
-                            of this file and is equal to TermInfosWriter.FORMAT_CURRENT.
-                        </p>
-                        <p>Term
-                            text prefixes are shared. The PrefixLength is the number of initial
-                            characters from the previous term which must be prepended to a
-                            term's suffix in order to form the term's text. Thus, if the
-                            previous term's text was "bone" and the term is "boy",
-                            the PrefixLength is two and the suffix is "y".
-                        </p>
-                        <p>FieldNum
-                            determines the term's field, whose name is stored in the .fnm file.
-                        </p>
-                        <p>DocFreq
-                            is the count of documents which contain the term.
-                        </p>
-                        <p>FreqDelta
-                            determines the position of this term's TermFreqs within the .frq
-                            file. In particular, it is the difference between the position of
-                            this term's data in that file and the position of the previous
-                            term's data (or zero, for the first term in the file).
-                        </p>
-                        <p>ProxDelta
-                            determines the position of this term's TermPositions within the .prx
-                            file. In particular, it is the difference between the position of
-                            this term's data in that file and the position of the previous
-                            term's data (or zero, for the first term in the file).  For fields
-                            that omit position data, this will be 0 since
-                            prox information is not stored.
-                        </p>
-                        <p>SkipDelta determines the position of this
-                            term's SkipData within the .frq file. In
-                            particular, it is the number of bytes
-                            after TermFreqs that the SkipData starts.
-                            In other words, it is the length of the
-                            TermFreqs data. SkipDelta is only stored
-                            if DocFreq is not smaller than SkipInterval.
-                        </p>
-                    </li>
-                    <li>
-                        <p><a name="tii"/>
-                            The term info index, or .tii file.
-                        </p>
-
-                        <p>
-                            This contains every IndexInterval
-                            <sup>th</sup>
-                            entry from the .tis
-                            file, along with its location in the .tis file. This is
-                            designed to be read entirely into memory and used to provide random
-                            access to the .tis file.
-                        </p>
-
-                        <p>
-                            The structure of this file is very similar to the
-                            .tis file, with the addition of one item per record, the IndexDelta.
-                        </p>
-
-                        <p>
-                            TermInfoIndex (.tii)--&gt;
-                            TIVersion, IndexTermCount, IndexInterval, SkipInterval, MaxSkipLevels, TermIndices
-                        </p>
-                        <p>TIVersion --&gt;
-                            UInt32
-                        </p>
-                        <p>IndexTermCount --&gt;
-                            UInt64
-                        </p>
-                        <p>IndexInterval --&gt;
-                            UInt32
-                        </p>
-                        <p>SkipInterval --&gt;
-                            UInt32
-                        </p>
-                        <p>MaxSkipLevels --&gt;
-                            UInt32
-                        </p>
-                        <p>TermIndices --&gt;
-                            &lt;TermInfo, IndexDelta&gt;
-                            <sup>IndexTermCount</sup>
-                        </p>
-                        <p>IndexDelta --&gt;
-                            VLong
-                        </p>
-                        <p>IndexDelta
-                            determines the position of this term's TermInfo within the .tis file. In
-                            particular, it is the difference between the position of this term's
-                            entry in that file and the position of the previous term's entry.
-                        </p>
-                        <p>SkipInterval is the fraction of TermDocs stored in skip tables. It is used to accelerate TermDocs.skipTo(int).
-                            Larger values result in smaller indexes, greater acceleration, but fewer accelerable cases, while
-                            smaller values result in bigger indexes, less acceleration (in case of a small value for MaxSkipLevels) and more
-                            accelerable cases.</p>
-                        <p>MaxSkipLevels is the maximum number of skip levels stored for each term in the .frq file. A low value results in
-                           smaller indexes but less acceleration; a larger value results in slightly larger indexes but greater acceleration.
-                           See the format of the .frq file for more information about skip levels.</p>
-                    </li>
-                </ol>
-            </section>
-
-            <section id="Frequencies"><title>Frequencies</title>
-
-                <p>
-                    The .frq file contains the lists of documents
-                    which contain each term, along with the frequency of the term in that
-                    document (except when frequencies are omitted: IndexOptions.DOCS_ONLY).
-                </p>
-                <p>FreqFile (.frq) --&gt;
-                    &lt;TermFreqs, SkipData&gt;
-                    <sup>TermCount</sup>
-                </p>
-                <p>TermFreqs --&gt;
-                    &lt;TermFreq&gt;
-                    <sup>DocFreq</sup>
-                </p>
-                <p>TermFreq --&gt;
-                    DocDelta[, Freq?]
-                </p>
-                <p>SkipData --&gt;
-                    &lt;&lt;SkipLevelLength, SkipLevel&gt;
-                    <sup>NumSkipLevels-1</sup>, SkipLevel&gt;
-                    &lt;SkipDatum&gt;
-                </p>
-                <p>SkipLevel --&gt;
-                    &lt;SkipDatum&gt;
-                    <sup>DocFreq/(SkipInterval^(Level + 1))</sup>
-                </p>
-                <p>SkipDatum --&gt;
-                    DocSkip,PayloadLength?,FreqSkip,ProxSkip,SkipChildLevelPointer?
-                </p>
-                <p>DocDelta,Freq,DocSkip,PayloadLength,FreqSkip,ProxSkip --&gt;
-                    VInt
-                </p>
-                <p>SkipChildLevelPointer --&gt;
-                    VLong
-                </p>
-                <p>TermFreqs
-                    are ordered by term (the term is implicit, from the .tis file).
-                </p>
-                <p>TermFreq
-                    entries are ordered by increasing document number.
-                </p>
-                <p>DocDelta: if frequencies are indexed, this determines both
-                    the document number and the frequency. In
-                    particular, DocDelta/2 is the difference between
-                    this document number and the previous document
-                    number (taken as zero when this is the first document in
-                    a TermFreqs). When DocDelta is odd, the frequency
-                    is one. When DocDelta is even, the frequency is
-                    read as another VInt.  If frequencies are omitted, DocDelta
-                    contains the gap (not multiplied by 2) between
-                    document numbers and no frequency information is
-                    stored.
-                </p>
-                <p>For example, the TermFreqs for a term which occurs
-                    once in document seven and three times in document
-                    eleven, with frequencies indexed, would be the following
-                    sequence of VInts:
-                </p>
-                <p>15, 8, 3
-                </p>
-               <p> If frequencies were omitted (IndexOptions.DOCS_ONLY) it would be this sequence
-               of VInts instead:
-                 </p>
-                <p>
-                  7,4
-                 </p>
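-                <p>The DocDelta rule and the two examples above can be sketched in Python.
-                    This is an illustration only, not Lucene's implementation: the function
-                    name encode_term_freqs and its omit_freqs parameter are hypothetical, and
-                    the result is the list of integer values that would each be written as a
-                    VInt, not the encoded bytes.
-                </p>

```python
def encode_term_freqs(postings, omit_freqs=False):
    """Return the VInt values for one term's TermFreqs.

    postings is a list of (doc_number, freq) pairs in increasing
    document order. Both names are illustrative assumptions.
    """
    values = []
    prev_doc = 0  # the first gap is measured from zero
    for doc, freq in postings:
        gap = doc - prev_doc
        prev_doc = doc
        if omit_freqs:
            values.append(gap)          # plain gap, not multiplied by 2
        elif freq == 1:
            values.append(gap * 2 + 1)  # odd DocDelta: frequency is one
        else:
            values.append(gap * 2)      # even DocDelta: Freq follows
            values.append(freq)
    return values


# Once in document seven, three times in document eleven.
print(encode_term_freqs([(7, 1), (11, 3)]))
print(encode_term_freqs([(7, 1), (11, 3)], omit_freqs=True))
```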
-                <p>DocSkip records the document number before every
-                    SkipInterval
-                    <sup>th</sup>
-                    document in TermFreqs.
-                    If payloads are disabled for the term's field,
-                    then DocSkip represents the difference from the
-                    previous value in the sequence.
-                    If payloads are enabled for the term's field, 
-                    then DocSkip/2 represents the difference from the
-                    previous value in the sequence. If payloads are enabled
-                    and DocSkip is odd,
-                    then PayloadLength is stored indicating the length 
-                    of the last payload before the SkipInterval<sup>th</sup>
-                    document in TermPositions.
-                    FreqSkip and ProxSkip record the position of every
-                    SkipInterval
-                    <sup>th</sup>
-                    entry in FreqFile and
-                    ProxFile, respectively. For the first SkipDatum, file positions are
-                    relative to the start of TermFreqs and Positions; for later entries,
-                    they are relative to the previous SkipDatum in the sequence.
-                </p>
-                <p>For example, if DocFreq=35 and SkipInterval=16,
-                    then there are two SkipData entries, containing
-                    the 15
-                    <sup>th</sup>
-                    and 31
-                    <sup>st</sup>
-                    document
-                    numbers in TermFreqs. The first FreqSkip names
-                    the number of bytes after the beginning of
-                    TermFreqs that the 16
-                    <sup>th</sup>
-                    SkipDatum
-                    starts, and the second the number of bytes after
-                    that that the 32
-                    <sup>nd</sup>
-                    starts. The first
-                    ProxSkip names the number of bytes after the
-                    beginning of Positions that the 16
-                    <sup>th</sup>
-                    SkipDatum starts, and the second the number of
-                    bytes after that that the 32
-                    <sup>nd</sup>
-                    starts.
-                </p>
-                <p>Each term can have multiple skip levels.
-                   The number of skip levels for a term is NumSkipLevels = Min(MaxSkipLevels, floor(log(DocFreq)/log(SkipInterval))).
-                   The number of SkipData entries for a skip level is DocFreq/(SkipInterval^(Level + 1)), where the lowest skip
-                   level is Level=0. <br></br>
-                   Example: SkipInterval = 4, MaxSkipLevels = 2, DocFreq = 35. Then skip level 0 has 8 SkipData entries,
-                   containing the 3<sup>rd</sup>, 7<sup>th</sup>, 11<sup>th</sup>, 15<sup>th</sup>, 19<sup>th</sup>, 23<sup>rd</sup>,
-                   27<sup>th</sup>, and 31<sup>st</sup> document numbers in TermFreqs. Skip level 1 has 2 SkipData entries, containing the
-                   15<sup>th</sup> and 31<sup>st</sup> document numbers in TermFreqs. <br></br>
-                   The SkipData entries on all upper levels &gt; 0 contain a SkipChildLevelPointer referencing the corresponding SkipData
-                   entry in level-1. In the example, entry 15 on level 1 has a pointer to entry 15 on level 0, and entry 31 on level 1
-                   has a pointer to entry 31 on level 0.
-                </p>
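-                <p>The skip level arithmetic can be checked with a short Python sketch. The
-                    function names are hypothetical. Repeated integer division stands in for
-                    floor(log(DocFreq)/log(SkipInterval)) to avoid floating point rounding; that
-                    substitution is an assumption of this sketch, as is a SkipInterval of at
-                    least 2.
-                </p>

```python
def num_skip_levels(doc_freq, skip_interval, max_skip_levels):
    """Min(MaxSkipLevels, floor(log(DocFreq) / log(SkipInterval)))."""
    levels = 0
    remaining = doc_freq
    # Each whole division by skip_interval accounts for one more level.
    while remaining >= skip_interval:
        remaining //= skip_interval
        levels += 1
    return min(max_skip_levels, levels)


def skip_entries_per_level(doc_freq, skip_interval, max_skip_levels):
    """Number of skip entries on each level, lowest level (0) first."""
    levels = num_skip_levels(doc_freq, skip_interval, max_skip_levels)
    return [doc_freq // (skip_interval ** (level + 1)) for level in range(levels)]


# SkipInterval = 4, MaxSkipLevels = 2, DocFreq = 35, as in the example.
print(num_skip_levels(35, 4, 2))
print(skip_entries_per_level(35, 4, 2))
```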
-
-            </section>
-            <section id="Positions"><title>Positions</title>
-
-                <p>
-                    The .prx file contains the lists of positions that
-                    each term occurs at within documents.  Note that
-                    fields omitting positional data do not store
-                    anything into this file, and if all fields in the
-                    index omit positional data then the .prx file will not
-                    exist.
-                </p>
-                <p>ProxFile (.prx) --&gt;
-                    &lt;TermPositions&gt;
-                    <sup>TermCount</sup>
-                </p>
-                <p>TermPositions --&gt;
-                    &lt;Positions&gt;
-                    <sup>DocFreq</sup>
-                </p>
-                <p>Positions --&gt;
-                    &lt;PositionDelta,Payload?&gt;
-                    <sup>Freq</sup>
-                </p>
-                <p>Payload --&gt;
-                    &lt;PayloadLength?,PayloadData&gt;
-                </p>
-                <p>PositionDelta --&gt;
-                    VInt
-                </p>
-                <p>PayloadLength --&gt;
-                    VInt
-                </p>
-                <p>PayloadData --&gt;
-                    byte<sup>PayloadLength</sup>
-                </p>
-                <p>TermPositions
-                    are ordered by term (the term is implicit, from the .tis file).
-                </p>
-                <p>Positions
-                    entries are ordered by increasing document number (the document
-                    number is implicit from the .frq file).
-                </p>
-                <p>PositionDelta
-                    is, if payloads are disabled for the term's field, the difference 
-                    between the position of the current occurrence in
-                    the document and the previous occurrence (or zero, if this is the
-                    first occurrence in this document).
-                    If payloads are enabled for the term's field, then PositionDelta/2
-                    is the difference between the current and the previous position. If
-                    payloads are enabled and PositionDelta is odd, then PayloadLength is 
-                    stored, indicating the length of the payload at the current term position.
-                </p>
-                <p>
-                    For example, the TermPositions for a
-                    term which occurs as the fourth term in one document, and as the
-                    fifth and ninth term in a subsequent document, would be the following
-                    sequence of VInts (payloads disabled):
-                </p>
-                <p>4,
-                    5, 4
-                </p>
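-                <p>For the case with payloads disabled, the PositionDelta sequence above can be
-                    reproduced with the following Python sketch. The function name
-                    encode_positions is hypothetical, and the result is the list of values that
-                    would each be written as a VInt.
-                </p>

```python
def encode_positions(positions_per_doc):
    """Return PositionDelta values for one term, payloads disabled.

    positions_per_doc holds one list of increasing positions per
    document, in document order (an illustrative input shape).
    """
    values = []
    for positions in positions_per_doc:
        previous = 0  # deltas restart from zero in every document
        for position in positions:
            values.append(position - previous)
            previous = position
    return values


# Fourth term in one document; fifth and ninth term in the next.
print(encode_positions([[4], [5, 9]]))
```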
-                <p>PayloadData
-                    is metadata associated with the current term position. If PayloadLength
-                    is stored at the current position, then it indicates the length of this 
-                    Payload. If PayloadLength is not stored, then this Payload has the same
-                    length as the Payload at the previous position.
-                </p>
-            </section>
-            <section id="Normalization Factors"><title>Normalization Factors</title>
-
-                <p>There is a single .nrm file containing all norms:
-                </p>
-                <p>AllNorms
-                    (.nrm) --&gt; NormsHeader,&lt;Norms&gt;
-                    <sup>NumFieldsWithNorms</sup>
-                </p>
-                <p>Norms
-                    --&gt; &lt;Byte&gt;
-                    <sup>SegSize</sup>
-                </p>
-                <p>NormsHeader
-                    --&gt; 'N','R','M',Version
-                </p>
-                <p>Version
-                    --&gt; Byte
-                </p>
-                <p>NormsHeader
-                    has 4 bytes, the last of which is the format version for this file, currently -1.
-                </p>
-                <p>Each
-                    byte encodes a floating point value. Bits 0-2 contain the 3-bit
-                    mantissa, and bits 3-7 contain the 5-bit exponent.
-                </p>
-                <p>These
-                    are converted to an IEEE single float value as follows:
-                </p>
-                <ol>
-                    <li>
-                        <p>If
-                            the byte is zero, use a zero float.
-                        </p>
-                    </li>
-                    <li>
-                        <p>Otherwise,
-                            set the sign bit of the float to zero;
-                        </p>
-                    </li>
-                    <li>
-                        <p>add
-                            48 to the exponent and use this as the float's exponent;
-                        </p>
-                    </li>
-                    <li>
-                        <p>map
-                            the mantissa to the high-order 3 bits of the float's mantissa; and
-
-                        </p>
-                    </li>
-                    <li>
-                        <p>set
-                            the low-order 21 bits of the float's mantissa to zero.
-                        </p>
-                    </li>
-                </ol>
-                <p>A separate norm file is created when the norm values of an existing segment are modified.
-                    When field <em>N</em> is modified, a separate norm file <em>.sN</em>
-                    is created, to maintain the norm values for that field.
-                </p>
-                <p>Separate norm files are created (when appropriate) for both compound and non-compound segments.
-                </p>
-
-            </section>
-            <section id="Term Vectors"><title>Term Vectors</title>
-                <p>
-                 Term vector support is optional on a field-by-field
-                  basis. It consists of three files.
-                </p>
-                <ol>
-                    <li><a name="tvx"/>
-                        <p>The Document Index or .tvx file.</p>
-                        <p>For each document, this stores the offset
-                           into the document data (.tvd) and field
-                           data (.tvf) files.
-                        </p>
-                        <p>DocumentIndex (.tvx) --&gt; TVXVersion&lt;DocumentPosition,FieldPosition&gt;
-                            <sup>NumDocs</sup>
-                        </p>
-                        <p>TVXVersion --&gt; Int (TermVectorsReader.FORMAT_CURRENT)</p>
-                        <p>DocumentPosition --&gt; UInt64 (offset in
-                        the .tvd file)</p>
-                        <p>FieldPosition --&gt; UInt64 (offset in the
-                        .tvf file)</p>
-                    </li>
-                    <li><a name="tvd"/>
-                        <p>The Document or .tvd file.</p>
-                        <p>This contains, for each document, the number of fields, a list of the fields with
-                            term vector info and finally a list of pointers to the field information in the .tvf
-                            (Term Vector Fields) file.</p>
-                        <p>
-                            Document (.tvd) --&gt; TVDVersion&lt;NumFields, FieldNums, FieldPositions&gt;
-                            <sup>NumDocs</sup>
-                        </p>
-                        <p>TVDVersion --&gt; Int (TermVectorsReader.FORMAT_CURRENT)</p>
-                        <p>NumFields --&gt; VInt</p>
-                        <p>FieldNums --&gt; &lt;FieldNumDelta&gt;
-                            <sup>NumFields</sup>
-                        </p>
-                        <p>FieldNumDelta --&gt; VInt</p>
-                        <p>FieldPositions --&gt; &lt;FieldPositionDelta&gt;
-                            <sup>NumFields-1</sup>
-                        </p>
-                        <p>FieldPositionDelta --&gt; VLong</p>
-                        <p>The .tvd file is used to map out the fields that have term vectors stored and
-                            where the field information is in the .tvf file.</p>
-                    </li>
-                    <li><a name="tvf"/>
-                        <p>The Field or .tvf file.</p>
-                        <p>This file contains, for each field that has a term vector stored, a list of
-                            the terms, their frequencies and, optionally, position and offset information.</p>
-                        <p>Field (.tvf) --&gt; TVFVersion&lt;NumTerms, Position/Offset, TermFreqs&gt;
-                            <sup>NumFields</sup>
-                        </p>
-                        <p>TVFVersion --&gt; Int (TermVectorsReader.FORMAT_CURRENT)</p>
-                        <p>NumTerms --&gt; VInt</p>
-                        <p>Position/Offset --&gt; Byte</p>
-                        <p>TermFreqs --&gt; &lt;TermText, TermFreq, Positions?, Offsets?&gt;
-                            <sup>NumTerms</sup>
-                        </p>
-                        <p>TermText --&gt; &lt;PrefixLength, Suffix&gt;</p>
-                        <p>PrefixLength --&gt; VInt</p>
-                        <p>Suffix --&gt; String</p>
-                        <p>TermFreq --&gt; VInt</p>
-                        <p>Positions --&gt; &lt;VInt&gt;<sup>TermFreq</sup></p>
-                        <p>Offsets --&gt; &lt;VInt, VInt&gt;<sup>TermFreq</sup></p>
-                        <br/>
-                        <p>Notes:</p>
-                        <ul>
-                            <li>Position/Offset byte stores whether this term vector has position or offset information stored.</li>
-                            <li>Term
-                                text prefixes are shared. The PrefixLength is the number of initial
-                                characters from the previous term which must be prepended to a
-                                term's suffix in order to form the term's text. Thus, if the
-                                previous term's text was "bone" and the term is "boy",
-                                the PrefixLength is two and the suffix is "y".
-                            </li>
-                            <li>Positions are stored as delta-encoded VInts: only the difference between the current position and the previous position is stored.</li>
-                            <li>Offsets are stored as delta-encoded VInts. The first VInt is the startOffset, the second is the endOffset.</li>
-                        </ul>
-
-
-                    </li>
-                </ol>
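-                <p>The shared-prefix encoding of TermText described in the notes for the .tvf
-                    file can be sketched in Python as follows. The function names are
-                    hypothetical, and the sketch counts Python string characters; how a real
-                    writer counts characters and serializes the suffix String is outside its scope.
-                </p>

```python
def encode_shared_prefixes(terms):
    """Turn a sorted list of terms into (PrefixLength, Suffix) pairs."""
    encoded = []
    previous = ""
    for term in terms:
        limit = min(len(previous), len(term))
        prefix_length = 0
        # Count the leading characters shared with the previous term.
        while prefix_length < limit and previous[prefix_length] == term[prefix_length]:
            prefix_length += 1
        encoded.append((prefix_length, term[prefix_length:]))
        previous = term
    return encoded


def decode_shared_prefixes(encoded):
    """Rebuild the full term texts from (PrefixLength, Suffix) pairs."""
    terms = []
    previous = ""
    for prefix_length, suffix in encoded:
        previous = previous[:prefix_length] + suffix
        terms.append(previous)
    return terms


pairs = encode_shared_prefixes(["bone", "boy"])
print(pairs)
print(decode_shared_prefixes(pairs))
```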
-            </section>
-
-            <section id="Deleted Documents"><title>Deleted Documents</title>
-
-                <p>The .del file is
-                    optional, and only exists when a segment contains deletions.
-                </p>
-
-                <p>Although per-segment, this file is maintained exterior to compound segment files.
-                </p>
-                <p>
-                Deletions
-                    (.del) --&gt; [Format],ByteCount,BitCount, Bits | DGaps (depending on Format)
-                </p>
-
-                <p>Format,ByteCount,BitCount --&gt;
-                    UInt32
-                </p>
-
-                <p>Bits --&gt;
-                    &lt;Byte&gt;
-                    <sup>ByteCount</sup>
-                </p>
-
-                               <p>DGaps --&gt;
-                    &lt;DGap,NonzeroByte&gt;
-                    <sup>NonzeroBytesCount</sup>
-                </p>
-
-                <p>DGap --&gt;
-                    VInt
-                </p>
-
-                <p>NonzeroByte --&gt;
-                    Byte
-                </p>
-                               
-                <p>Format
-                    is optional. A value of -1 indicates DGaps. A non-negative value indicates Bits, and that Format is excluded.
-                </p>
-
-                <p>ByteCount
-                    indicates the number of bytes in Bits. It is typically
-                    (SegSize/8)+1.
-                </p>
-
-                <p>
-                    BitCount
-                    indicates the number of bits that are currently set in Bits.
-                </p>
-
-                <p>Bits
-                    contains one bit for each document indexed. When the bit
-                    corresponding to a document number is set, that document is marked as
-                    deleted. Bit ordering is from least to most significant. Thus, if
-                    Bits contains two bytes, 0x00 and 0x02, then document 9 is marked as
-                    deleted.
-                </p>
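-                <p>The bit ordering can be illustrated with a small Python sketch. The function
-                    name is_deleted is hypothetical, and the sketch works on the Bits bytes only,
-                    without the header fields.
-                </p>

```python
def is_deleted(bits, doc):
    """Report whether the bit for document number doc is set in Bits."""
    byte_index, bit_index = divmod(doc, 8)
    # Within each byte, bit 0 is the least significant bit.
    return ((bits[byte_index] >> bit_index) & 1) == 1


bits = bytes([0x00, 0x02])
print([doc for doc in range(16) if is_deleted(bits, doc)])
```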
-
-                <p>DGaps
-                    represents sparse bit-vectors more efficiently than Bits.
-                    It consists of the gaps (DGap) between the indexes of successive nonzero bytes in Bits
-                    (the first gap is measured from index zero), together with the nonzero bytes themselves.
-                    The number of nonzero bytes in Bits (NonzeroBytesCount) is not stored.
-                </p>
-                <p>For example,
-                    if there are 8000 bits and only bits 10,12,32 are set,
-                    DGaps would be used:
-                </p>
-                <p>
-                    (VInt) 1, (Byte) 20, (VInt) 3, (Byte) 1
-                </p>
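-                <p>The DGaps example can be reproduced with the Python sketch below. The helper
-                    names bits_from_docs and encode_dgaps are hypothetical, the byte count follows
-                    the typical (SegSize/8)+1 sizing mentioned above, and the output is a list of
-                    (DGap, NonzeroByte) pairs rather than the serialized bytes.
-                </p>

```python
def bits_from_docs(num_bits, docs):
    """Build the Bits byte array with the given bit numbers set."""
    bits = bytearray(num_bits // 8 + 1)
    for doc in docs:
        byte_index, bit_index = divmod(doc, 8)
        bits[byte_index] |= 1 << bit_index
    return bits


def encode_dgaps(bits):
    """Return (DGap, NonzeroByte) pairs for the nonzero bytes of Bits."""
    pairs = []
    last_index = 0  # the first gap is measured from index zero
    for index, value in enumerate(bits):
        if value != 0:
            pairs.append((index - last_index, value))
            last_index = index
    return pairs


# 8000 bits with only bits 10, 12 and 32 set.
print(encode_dgaps(bits_from_docs(8000, [10, 12, 32])))
```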
-            </section>
-        </section>
-
-        <section id="Limitations"><title>Limitations</title>
-
-            <p>
-             When referring to term numbers, Lucene's current
-             implementation uses a Java <code>int</code> to hold the
-             term index, which means the maximum number of unique
-             terms in any single index segment is ~2.1 billion times
-             the term index interval (default 128) = ~274 billion.
-             This is technically not a limitation of the index file
-             format, just of Lucene's current implementation.
-           </p>
-           <p>
-             Similarly, Lucene uses a Java <code>int</code> to refer
-             to document numbers, and the index file format uses an
-             <code>Int32</code> on-disk to store document numbers.
-             This is a limitation of both the index file format and
-             the current implementation.  Eventually these should be
-             replaced with either <code>UInt64</code> values, or
-             better yet, <code>VInt</code> values which have no
-             limit.
-            </p>
-
-        </section>
-
-    </body>
-
-</document>