<!doctype html public "-//w3c//dtd html 4.0 transitional//en">
<!--
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements.  See the NOTICE file distributed with
 this work for additional information regarding copyright ownership.
 The ASF licenses this file to You under the Apache License, Version 2.0
 (the "License"); you may not use this file except in compliance with
 the License.  You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
-->
<TITLE>Benchmarking Lucene By Tasks</TITLE>
Benchmarking Lucene By Tasks.
This package provides "task based" performance benchmarking of Lucene.
One can use the predefined benchmarks, or create new ones.
<table border=1 cellpadding=4>
 <tr>
   <td><b>Package</b></td>
   <td><b>Description</b></td>
 </tr>
 <tr>
   <td><a href="stats/package-summary.html">stats</a></td>
   <td>Statistics maintained when running benchmark tasks.</td>
 </tr>
 <tr>
   <td><a href="tasks/package-summary.html">tasks</a></td>
   <td>Benchmark tasks.</td>
 </tr>
 <tr>
   <td><a href="feeds/package-summary.html">feeds</a></td>
   <td>Sources for benchmark inputs: documents and queries.</td>
 </tr>
 <tr>
   <td><a href="utils/package-summary.html">utils</a></td>
   <td>Utilities used for the benchmark, and for the reports.</td>
 </tr>
 <tr>
   <td><a href="programmatic/package-summary.html">programmatic</a></td>
   <td>Sample performance test written programmatically.</td>
 </tr>
</table>
<h2>Table Of Contents</h2>
<ol>
  <li><a href="#concept">Benchmarking By Tasks</a></li>
  <li><a href="#usage">How to use</a></li>
  <li><a href="#algorithm">Benchmark "algorithm"</a></li>
  <li><a href="#tasks">Supported tasks/commands</a></li>
  <li><a href="#properties">Benchmark properties</a></li>
  <li><a href="#example">Example input algorithm and the result benchmark
      report</a></li>
  <li><a href="#recsCounting">Results record counting clarified</a></li>
</ol>
<a name="concept"></a>
<h2>Benchmarking By Tasks</h2>
Benchmark Lucene using task primitives.
A benchmark is composed of some predefined tasks, allowing for creating an
index, adding documents, optimizing, searching, generating reports, and more.
A benchmark run takes an "algorithm" file
that contains a description of the sequence of tasks making up the run, and some
properties defining a few additional characteristics of the benchmark run.
<a name="usage"></a>
<h2>How to use</h2>
The easiest way to run a benchmark is with the predefined ant task:
<ul>
  <li>ant run-task
    <br>- would run the <code>micro-standard.alg</code> "algorithm".
  </li>
  <li>ant run-task -Dtask.alg=conf/compound-penalty.alg
    <br>- would run the <code>compound-penalty.alg</code> "algorithm".
  </li>
  <li>ant run-task -Dtask.alg=[full-path-to-your-alg-file]
    <br>- would run <code>your perf test</code> "algorithm".
  </li>
  <li>java org.apache.lucene.benchmark.byTask.programmatic.Sample
    <br>- would run a performance test programmatically - without using an alg
    file. This is less readable, and less convenient, but possible.
  </li>
</ul>
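<p>
For the programmatic route, the same task tree that an .alg file describes can
be built directly in Java. The following is a minimal sketch in the spirit of
the Sample class (illustrative only - constructor signatures may differ
slightly between versions):
</p>
<pre>
import java.util.Properties;

import org.apache.lucene.benchmark.byTask.PerfRunData;
import org.apache.lucene.benchmark.byTask.tasks.AddDocTask;
import org.apache.lucene.benchmark.byTask.tasks.CloseIndexTask;
import org.apache.lucene.benchmark.byTask.tasks.CreateIndexTask;
import org.apache.lucene.benchmark.byTask.tasks.RepSumByNameTask;
import org.apache.lucene.benchmark.byTask.tasks.TaskSequence;
import org.apache.lucene.benchmark.byTask.utils.Config;

public class MiniBenchmark {
  public static void main(String[] args) throws Exception {
    // properties that would normally sit in the .alg file header
    Properties p = new Properties();
    p.setProperty("directory", "RAMDirectory");
    Config config = new Config(p);
    PerfRunData runData = new PerfRunData(config);

    // top-level serial sequence, equivalent to the outermost { } in an .alg file
    TaskSequence top = new TaskSequence(runData, null, null, false);
    top.addTask(new CreateIndexTask(runData));

    // { AddDoc } : 500
    TaskSequence adds = new TaskSequence(runData, "AddDocs", top, false);
    top.addTask(adds);                 // order matters: add a sequence to its parent first
    adds.addTask(new AddDocTask(runData));
    adds.setRepetitions(500);

    top.addTask(new CloseIndexTask(runData));
    top.addTask(new RepSumByNameTask(runData)); // print a summary report

    System.out.println(top);  // print the algorithm, as the .alg runner does
    top.doLogic();            // execute it
  }
}
</pre>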
You may find existing tasks sufficient for defining the benchmark <i>you</i>
need; otherwise, you can extend the framework to meet your needs, as explained
below.
Each benchmark run has a DocMaker and a QueryMaker. These two should usually
match, so that "meaningful" queries are used for a certain collection.
Properties set at the header of the alg file define which "makers" should be
used. You can also specify your own makers, extending DocMaker and implementing
QueryMaker.
<br><br>
<b>Note:</b> since 2.9, DocMaker is a concrete class which accepts a
ContentSource. In most cases, you can use the DocMaker class to create
Documents, while providing your own ContentSource implementation. For
example, the current Benchmark package includes ContentSource
implementations for the TREC, Enwiki and Reuters collections, as well as
others like LineDocSource, which reads a 'line' file produced by
WriteLineDocTask.
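<p>
As a rough sketch of such a ContentSource (against the 2.9-era API; the names
and details below are illustrative and worth checking against your version), a
source that feeds synthetic documents might look like:
</p>
<pre>
import java.io.IOException;

import org.apache.lucene.benchmark.byTask.feeds.ContentSource;
import org.apache.lucene.benchmark.byTask.feeds.DocData;
import org.apache.lucene.benchmark.byTask.feeds.NoMoreDataException;

/** Illustrative ContentSource that produces simple synthetic documents. */
public class SyntheticContentSource extends ContentSource {

  private int docID = 0;
  private static final int NUM_DOCS = 1000; // arbitrary limit for the example

  @Override
  public synchronized DocData getNextDocData(DocData docData)
      throws NoMoreDataException, IOException {
    if (docID >= NUM_DOCS) {
      // signals "exhausted" to exhaustive loops such as { AddDoc } : *
      throw new NoMoreDataException();
    }
    docData.clear();
    docData.setName("doc" + docID);
    docData.setTitle("Synthetic document " + docID);
    docData.setBody("Body text of synthetic document " + docID);
    docID++;
    return docData;
  }

  @Override
  public void close() throws IOException {
    // nothing to release for an in-memory source
  }
}
</pre>
<p>
The source would then be hooked up in the .alg header, e.g. with
<font color="#FF0066">content.source=SyntheticContentSource</font> (using the
fully qualified class name).
</p>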
A benchmark .alg file contains the benchmark "algorithm". The syntax is described
below. Within the algorithm, you can specify groups of commands, assign them
names, specify commands that should be repeated, run commands serially or in
parallel, and control the rate at which the commands are "fired".
This allows, for instance, specifying that an index should be opened for
update; that documents should be added to it one by one, but no faster than
20 docs a minute; and that, in parallel with this, some N queries should be
run against that index, at no more than 2 queries a second.
You can have the searches all share an index reader,
or have each of them open its own reader and close it afterwards.
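<br>For instance, using the syntax described below, such a scenario could be
expressed roughly as
<font color="#FF0066">[ { AddDoc } : 5000 : 20/min { Search } : 100 : 2 ]</font>
- two serial sequences running in parallel: 5000 adds at no more than 20
docs/min, and 100 searches at no more than 2 queries/sec (an illustrative
snippet, assuming suitable doc and query makers are configured).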
If the commands available for use in the algorithm do not meet your needs,
you can add commands by adding a new task under
org.apache.lucene.benchmark.byTask.tasks -
you should extend the PerfTask abstract class.
Make sure that your new task class name is suffixed by Task.
Assume you added the class "WonderfulTask"; doing so also enables the
command "Wonderful" to be used in the algorithm, as sketched below.
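<p>
A minimal sketch of such a task (the body here is illustrative; only the
extends/constructor/doLogic skeleton is prescribed by the framework):
</p>
<pre>
import org.apache.lucene.benchmark.byTask.PerfRunData;
import org.apache.lucene.benchmark.byTask.tasks.PerfTask;

/** Enables the "Wonderful" command in .alg files. */
public class WonderfulTask extends PerfTask {

  public WonderfulTask(PerfRunData runData) {
    super(runData);
  }

  @Override
  public int doLogic() throws Exception {
    // the actual work of the task goes here (this body is just an example)
    System.out.println("doing something wonderful...");
    return 1; // number of work "records" done, used for recs/s statistics
  }
}
</pre>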
<u>External classes</u>: It is sometimes useful to invoke the benchmark
package with your external alg file that configures the use of your own
doc/query maker and/or html parser. You can do this without
modifying the benchmark package code, by passing your class path
with the benchmark.ext.classpath property:
<ul>
  <li>ant run-task -Dtask.alg=[full-path-to-your-alg-file]
      <font color="#FF0000">-Dbenchmark.ext.classpath=/mydir/classes
      </font> -Dtask.mem=512M</li>
</ul>
<u>External tasks</u>: When writing your own tasks under a package other than
<b>org.apache.lucene.benchmark.byTask.tasks</b>, specify that package through the
<font color="#FF0066">alt.tasks.packages</font> property.
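<br>Example (with a hypothetical package name) - add
<font color="#FF0066">alt.tasks.packages = com.mycompany.mytasks</font>
to the .alg file header, and task classes from that package can then be used
as commands just like the built-in ones.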
<a name="algorithm"></a>
<h2>Benchmark "algorithm"</h2>
The following is an informal description of the supported syntax.
<b>Measuring</b>: When a command is executed, statistics for the elapsed
execution time and memory consumption are collected.
At any time, those statistics can be printed, using one of the
available ReportTasks.
<b>Comments</b> start with '<font color="#FF0066">#</font>'.
<b>Serial</b> sequences are enclosed within '<font color="#FF0066">{ }</font>'.
<b>Parallel</b> sequences are enclosed within
'<font color="#FF0066">[ ]</font>'.
<b>Sequence naming:</b> To name a sequence, put
'<font color="#FF0066">"name"</font>' just after
'<font color="#FF0066">{</font>' or '<font color="#FF0066">[</font>'.
<br>Example - <font color="#FF0066">{ "ManyAdds" AddDoc } : 1000000</font> -
would name the sequence of 1M add docs "ManyAdds", and this name would later
appear in statistic reports.
If you don't specify a name for a sequence, it is given one: you can see it as
the algorithm is printed just before benchmark execution starts.
To repeat sequence tasks N times, add '<font color="#FF0066">: N</font>' just
after the sequence closing tag - '<font color="#FF0066">}</font>' or
'<font color="#FF0066">]</font>' or '<font color="#FF0066">></font>'.
<br>Example - <font color="#FF0066">[ AddDoc ] : 4</font> - would do 4 addDoc
in parallel, spawning 4 threads at once.
<br>Example - <font color="#FF0066">[ AddDoc AddDoc ] : 4</font> - would do
8 addDoc in parallel, spawning 8 threads at once.
<br>Example - <font color="#FF0066">{ AddDoc } : 30</font> - would do addDoc
30 times in a row.
<br>Example - <font color="#FF0066">{ AddDoc AddDoc } : 30</font> - would do
addDoc 60 times in a row.
<br><b>Exhaustive repeating</b>: use <font color="#FF0066">*</font> instead of
a number to repeat exhaustively.
This is sometimes useful, for adding as many files as a doc maker can create,
without iterating over the same file again, especially when the exact
number of documents is not known in advance - for instance, TREC files extracted
from a zip file. Note: when using this, you must also set
<font color="#FF0066">doc.maker.forever</font> to false.
<br>Example - <font color="#FF0066">{ AddDoc } : *</font> - would add docs
until the doc maker is "exhausted".
<b>Command parameter</b>: a command can optionally take a single parameter.
If a certain command does not support a parameter, or if the parameter is of
the wrong type, reading the algorithm will fail with an exception and the test
will not start. Currently the following tasks take optional parameters:
<ul>
  <li><b>AddDoc</b> takes a numeric parameter, indicating the required size of
      the added document. Note: if the DocMaker implementation used in the test
      does not support makeDoc(size), an exception will be thrown and the test
      will not start.
  </li>
  <li><b>DeleteDoc</b> takes a numeric parameter, indicating the docid to be
      deleted. The latter is not very useful for loops, since the docid is
      fixed, so for deletion in loops it is better to use the
      <code>doc.delete.step</code> property.
  </li>
  <li><b>SetProp</b> takes a mandatory <code>name,value</code> parameter,
      with ',' as the separator.
  </li>
  <li><b>SearchTravRetTask</b> and <b>SearchTravTask</b> take a numeric
      parameter, indicating the required traversal size.
  </li>
  <li><b>SearchTravRetLoadFieldSelectorTask</b> takes a string
      parameter: a comma separated list of Fields to load.
  </li>
  <li><b>SearchTravRetHighlighterTask</b> takes a string
      parameter: a comma separated list of parameters to define highlighting.
      See that task's javadocs for more information.
  </li>
</ul>
<br>Example - <font color="#FF0066">AddDoc(2000)</font> - would add a document
of size 2000 (~bytes).
<br>See conf/task-sample.alg for how this can be used, for instance, to check
which is faster: adding many smaller documents, or fewer larger documents.
Next candidates for supporting a parameter may be the Search tasks,
for controlling the query size.
<b>Statistic recording elimination</b>: a sequence can also end with
'<font color="#FF0066">></font>',
in which case child tasks would not store their statistics.
This can be useful to avoid exploding stats data, when adding say 1M docs.
<br>Example - <font color="#FF0066">{ "ManyAdds" AddDoc > : 1000000</font> -
would add a million docs, measure that total, but not save stats for each addDoc.
<br>Notice that the granularity of System.currentTimeMillis() (which is used
here) is system dependent,
and on some systems an operation that takes 5 ms to complete may show 0 ms
latency time in performance measurements.
Therefore it is sometimes more accurate to look at the elapsed time of a larger
sequence, as demonstrated here.
To set a rate (ops/sec or ops/min) for a sequence, add
'<font color="#FF0066">: N : R</font>' just after the sequence closing tag.
This specifies a repetition of N with a rate of R operations/sec.
Use '<font color="#FF0066">R/sec</font>' or
'<font color="#FF0066">R/min</font>'
to explicitly specify whether the rate is per second or per minute.
The default is per second.
<br>Example - <font color="#FF0066">[ AddDoc ] : 400 : 3</font> - would do 400
addDoc in parallel, starting up to 3 threads per second.
<br>Example - <font color="#FF0066">{ AddDoc } : 100 : 200/min</font> - would
do 100 addDoc serially, waiting before starting the next add if the rate
would otherwise exceed 200 adds/min.
<b>Disable Counting</b>: Each task executed contributes to the records count.
This count is reflected in reports under recs/s and under recsPerRun.
Most tasks count 1, some count 0, and some count more.
(See <a href="#recsCounting">Results record counting clarified</a> for more details.)
It is possible to disable counting for a task by preceding it with <font color="#FF0066">-</font>.
<br>Example - <font color="#FF0066"> -CreateIndex </font> - would count 0, while
the default behavior of CreateIndex is to count 1.
<b>Command names</b>: Each class "AnyNameTask" in the
package org.apache.lucene.benchmark.byTask.tasks
that extends PerfTask is supported as a command "AnyName" that can be
used in the benchmark "algorithm" description.
This allows adding new commands simply by adding such classes.
<a name="tasks"></a>
<h2>Supported tasks/commands</h2>
Existing tasks can be divided into a few groups:
regular index/search work tasks, report tasks, and control tasks.
<b>Report tasks</b>: There are a few Report commands for generating reports.
Only task runs that were completed are reported.
(The 'Report tasks' themselves are not measured and not reported.)
<ul>
  <li><font color="#FF0066">RepAll</font> - all (completed) task runs.</li>
  <li><font color="#FF0066">RepSumByName</font> - all statistics,
      aggregated by name. So, if AddDoc was executed 2000 times,
      only 1 report line would be created for it, aggregating all those
      2000 statistic records.</li>
  <li><font color="#FF0066">RepSelectByPref prefixWord</font> - all
      records for tasks whose name starts with
      <font color="#FF0066">prefixWord</font>.</li>
  <li><font color="#FF0066">RepSumByPref prefixWord</font> - all
      records for tasks whose name starts with
      <font color="#FF0066">prefixWord</font>,
      aggregated by their full task name.</li>
  <li><font color="#FF0066">RepSumByNameRound</font> - all statistics,
      aggregated by name and by <font color="#FF0066">Round</font>.
      So, if AddDoc was executed 2000 times in each of 3
      <font color="#FF0066">rounds</font>, 3 report lines would be created,
      aggregating all those 2000 statistic records in each round.
      See more about rounds in the <font color="#FF0066">NewRound</font>
      command description below.</li>
  <li><font color="#FF0066">RepSumByPrefRound prefixWord</font> -
      similar to <font color="#FF0066">RepSumByNameRound</font>,
      except that only tasks whose name starts with
      <font color="#FF0066">prefixWord</font> are included.</li>
</ul>
If needed, additional reports can be added by extending the abstract class
ReportTask, and by manipulating the statistics data in Points and TaskStats.
<b>Control tasks</b>: A few of the tasks control the benchmark algorithm
overall:
<ul>
  <li><font color="#FF0066">ClearStats</font> - clears the entire statistics.
      Further reports would only include task runs that start after this
      call.</li>
  <li><font color="#FF0066">NewRound</font> - virtually start a new round of
      the performance test.
      Although this command can be placed anywhere, it mostly makes sense at
      the end of an outermost sequence.
      <br>This increments a global "round counter". All task runs that start
      from this point on record the new, updated round counter as their round
      number, and this appears in reports.
      In particular, see <font color="#FF0066">RepSumByNameRound</font> above.
      <br>An additional effect of NewRound is that numeric and boolean
      properties defined (at the head of the .alg file) as a sequence of
      values, e.g. <font color="#FF0066">merge.factor=mrg:10:100:10:100</font>,
      would increment (cyclically) to the next value.
      Note: this is also reflected in the reports, in this case under a
      column named "mrg".</li>
  <li><font color="#FF0066">ResetInputs</font> - DocMaker and the various
      QueryMakers would reset their counters to the start.
      The way these Maker interfaces work, each call to makeDocument()
      or makeQuery() creates the next document or query
      that it "knows" to create.
      If that pool is "exhausted", the "maker" starts over again.
      The ResetInputs command therefore allows making the rounds comparable.
      It is hence useful to invoke ResetInputs together with NewRound.</li>
  <li><font color="#FF0066">ResetSystemErase</font> - reset all index
      and input data and call gc.
      Does NOT reset statistics. This includes ResetInputs.
      All writers/readers are nullified, deleted, closed.
      Index and directory are erased.
      You would have to call CreateIndex once this was called...</li>
  <li><font color="#FF0066">ResetSystemSoft</font> - reset all
      index and input data and call gc.
      Does NOT reset statistics. This includes ResetInputs.
      All writers/readers are nullified, closed.
      Index is NOT erased.
      Directory is NOT erased.
      This is useful for testing performance on an existing index,
      for instance if the construction of a large index
      took a very long time and now you would like to test
      its search or update performance.</li>
</ul>
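<br>For instance, a typical round structure combining these (an illustrative
snippet, assuming a suitably configured doc maker) -
<font color="#FF0066">{ "Rounds" { AddDoc > : 1000 NewRound ResetInputs } : 4</font>
- would run 4 comparable rounds, each adding 1000 docs, advancing the round
counter and resetting the inputs at the end of each round.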
Other existing tasks are quite straightforward and are only
briefly described here:
<ul>
  <li><font color="#FF0066">CreateIndex</font> and
      <font color="#FF0066">OpenIndex</font> both leave the
      index open for later update operations;
      <font color="#FF0066">CloseIndex</font> closes it.</li>
  <li><font color="#FF0066">OpenReader</font>, similarly, would
      leave an index reader open for later search operations.
      But this has further semantics:
      if a Read operation is performed while an open reader exists,
      that reader is used.
      Otherwise, the read operation opens its own reader
      and closes it when the read operation is done.
      This allows testing various scenarios - sharing a reader,
      searching with a "cold" reader, with a "warmed" reader, etc.
      The read operations affected by this are:
      <font color="#FF0066">Warm</font>,
      <font color="#FF0066">Search</font>,
      <font color="#FF0066">SearchTrav</font> (search and traverse),
      and <font color="#FF0066">SearchTravRet</font> (search
      and traverse and retrieve).
      Notice that each of the 3 search task types maintains
      its own QueryMaker instance.</li>
  <li><font color="#FF0066">CommitIndex</font> and
      <font color="#FF0066">Optimize</font> can be used to commit
      changes to the index and/or optimize the index created thus far.</li>
  <li><font color="#FF0066">WriteLineDoc</font> prepares a 'line'
      file where each line holds a document with <i>title</i>,
      <i>date</i> and <i>body</i> elements, separated by [TAB].
      A line file is useful if one wants to measure pure indexing
      performance, without the overhead of parsing the data.<br>
      You can use LineDocSource as a ContentSource over a 'line'
      file.</li>
  <li><font color="#FF0066">ConsumeContentSource</font> consumes
      a ContentSource. Useful e.g. for testing a ContentSource's
      performance, without the overhead of preparing a Document
      out of it.</li>
</ul>
<a name="properties"></a>
<h2>Benchmark properties</h2>
Properties are read from the header of the .alg file, and
define several parameters of the performance test.
As mentioned above for the <font color="#FF0066">NewRound</font> task,
numeric and boolean properties that are defined as a sequence
of values, e.g. <font color="#FF0066">merge.factor=mrg:10:100:10:100</font>,
would increment (cyclically) to the next value
when NewRound is called, and would also
appear as a named column in the reports (the column
name would be "mrg" in this example).
Some of the currently defined properties are:
<ul>
  <li><font color="#FF0066">analyzer</font> - full
      class name of the analyzer to use.
      The same analyzer would be used in the entire test.</li>
  <li><font color="#FF0066">directory</font> - tells which directory
      implementation to use for the performance test; valid values
      include FSDirectory and RAMDirectory.</li>
</ul>
<b>Index work parameters</b>:
Multi int/boolean values are iterated over by calls to NewRound.
They are also added as columns in the reports; the first string in the
sequence is the column name.
(Make sure it is no shorter than any value in the sequence.)
<ul>
  <li><font color="#FF0066">max.buffered</font>
      <br>Example: max.buffered=buf:10:10:100:100 -
      this would define using maxBufferedDocs of 10 in iterations 0 and 1,
      and 100 in iterations 2 and 3.</li>
  <li><font color="#FF0066">merge.factor</font> - the
      merge factor to use while indexing.</li>
  <li><font color="#FF0066">compound</font> - whether the index
      uses the compound format or not. Valid values are "true" and "false".</li>
</ul>
Here is a list of currently defined properties:
<ol>
  <li><b>Root directory for data and indexes:</b>
    <ul><li>work.dir (default is System property "benchmark.work.dir" or "work".)
    </li></ul>
  </li>
  <li><b>Docs and queries creation:</b>
    <ul><li>doc.maker.forever
    </li><li>doc.tokenized
    </li><li>doc.term.vector
    </li><li>doc.term.vector.positions
    </li><li>doc.term.vector.offsets
    </li><li>doc.store.body.bytes
    </li><li>file.query.maker.file
    </li><li>file.query.maker.default.field
    </li><li>search.num.hits
    </li></ul>
  </li>
  <li><b>Logging</b>:
    <ul><li>log.step.[class name]Task, e.g. log.step.DeleteDoc (or log.step.Wonderful for the WonderfulTask example above).
    </li><li>task.max.depth.log
    </li></ul>
  </li>
  <li><b>Index writing</b>:
    <ul><li>merge.factor
    </li><li>max.buffered
    </li><li>ram.flush.mb
    </li></ul>
  </li>
  <li><b>Doc deletion</b>:
    <ul><li>doc.delete.step
    </li></ul>
  </li>
  <li><b>Task alternative packages</b>:
    <ul><li>alt.tasks.packages
      - comma separated list of additional packages where task classes will be looked for
      when not found in the default package (that of PerfTask). If the same task class
      appears in more than one package, the package indicated first in this list will be used.
    </li></ul>
  </li>
</ol>
For sample use of these properties see the *.alg files under conf.
<a name="example"></a>
<h2>Example input algorithm and the result benchmark report</h2>
The following example is in conf/sample.alg:
<pre>
<font color="#003333"># --------------------------------------------------------
# Sample: what is the effect of doc size on indexing time?
#
# There are two parts in this test:
# - PopulateShort adds 2N documents of length  L
# - PopulateLong  adds  N documents of length 2L
# Which one would be faster?
# The comparison is done twice.
#
# --------------------------------------------------------</font>

<font color="#990066"># -------------------------------------------------------------------------------------
# multi val params are iterated by NewRound's, added to reports, start with column name.
merge.factor=mrg:10:20
max.buffered=buf:100:1000

analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
directory=FSDirectory

doc.term.vector=false

doc.maker=org.apache.lucene.benchmark.byTask.feeds.SimpleDocMaker
query.maker=org.apache.lucene.benchmark.byTask.feeds.SimpleQueryMaker

# task at this depth or less would print when they start
# -------------------------------------------------------------------------------------</font>

<font color="#3300FF">{

    { AddDoc(4000) > : 20000

    { AddDoc(8000) > : 10000

    RepSelectByPref Populate</font>
</pre>
The command line for running this sample:
<br><code>ant run-task -Dtask.alg=conf/sample.alg</code>
The output report from running this test contains the following:
<pre>
Operation     round mrg  buf  runCnt  recsPerRun      rec/s  elapsedSec  avgUsedMem  avgTotalMem
PopulateShort     0  10  100       1       20003      119.6      167.26  12,959,120   14,241,792
PopulateLong -  - 0  10  100 -  -  1 -  -  10003 -  -  74.3 -  - 134.57  17,085,208   20,635,648
PopulateShort     1  20 1000       1       20003      143.5      139.39  63,982,040   94,756,864
PopulateLong -  - 1  20 1000 -  -  1 -  -  10003 -  -  77.0 -  - 129.92  87,309,608  100,831,232
</pre>
<a name="recsCounting"></a>
<h2>Results record counting clarified</h2>
Two columns in the results table indicate records counts: records-per-run and
records-per-second. What do they mean?
<p>
Almost every task gets 1 in this count just for being executed.
Task sequences aggregate the counts of their child tasks,
plus their own count of 1.
So, a task sequence containing 5 other task sequences, each running a single
other task 10 times, would have a count of 1 + 5 * (1 + 10) = 56.
The traverse and retrieve tasks "count" more: a traverse task
adds 1 for each traversed result (hit), and a retrieve task
additionally adds 1 for each retrieved doc. So, a regular Search would
count 1, a SearchTrav that traverses 10 hits would count 11, and a
SearchTravRet task that retrieves (and traverses) 10 would count 21.
Confusing? This might help: always examine the <code>elapsedSec</code> column,
and always compare "apples to apples", i.e., it is interesting to check how the
<code>rec/s</code> changed for the same task (or sequence) between two
different runs, but it is not very useful to know how the <code>rec/s</code>
differs between the <code>Search</code> and <code>SearchTrav</code> tasks. For
the latter, <code>elapsedSec</code> would bring more insight.