Misc Tools

The misc package contains various tools for splitting and merging indices, changing norms, finding high-frequency terms, and more.

DirectIOLinuxDirectory

NOTE: This uses C++ sources (accessible via JNI), which you'll have to compile on your platform. Further, this is a very platform-specific extension (it runs only on Linux, and likely only on 2.6.x kernels).

DirectIOLinuxDirectory is a Directory implementation that bypasses the OS's buffer cache for any IndexInput and IndexOutput opened through it (using the Linux-specific O_DIRECT flag).

Note that doing so typically results in a significant performance loss! You should not use this for searching, but rather for indexing (or maybe just for merging during indexing), to avoid evicting useful pages from the buffer cache. See here for details. Steps to build:

To use this, you'll likely want to make a custom subclass of FSDirectory that only opens direct IndexInput/IndexOutput for merging. One hackish way to do this is to check whether the current thread's name starts with "Lucene Merge Thread". Alternatively, you could use this Directory as is for all indexing operations, but not for searching.
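The thread-name check described above can be sketched roughly as follows. This is a minimal, self-contained illustration rather than actual Lucene code: the class MergeAwareOpener, the Mode enum, and the helper chooseModeOn are hypothetical names introduced here, and only the "Lucene Merge Thread" prefix comes from the text. In a real FSDirectory subclass, the same decision would select between a direct and a regular IndexInput/IndexOutput.

```java
import java.util.concurrent.atomic.AtomicReference;

public class MergeAwareOpener {
    // Thread-name prefix mentioned in the text; everything else here is hypothetical.
    public static final String MERGE_THREAD_PREFIX = "Lucene Merge Thread";

    /** Hypothetical labels standing in for "open with O_DIRECT" vs. "open normally". */
    public enum Mode { DIRECT, BUFFERED }

    /** Returns true if the given thread looks like a merge thread, judged by name only. */
    public static boolean isMergeThread(Thread t) {
        return t.getName().startsWith(MERGE_THREAD_PREFIX);
    }

    /** Picks the access mode for whichever thread is calling. */
    public static Mode chooseMode() {
        return isMergeThread(Thread.currentThread()) ? Mode.DIRECT : Mode.BUFFERED;
    }

    /** Demo helper: runs chooseMode() on a new thread with the given name. */
    public static Mode chooseModeOn(String threadName) {
        AtomicReference<Mode> result = new AtomicReference<>();
        Thread t = new Thread(() -> result.set(chooseMode()), threadName);
        t.start();
        try {
            t.join(); // join() makes the worker's write visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for demo thread", e);
        }
        return result.get();
    }

    public static void main(String[] args) {
        System.out.println(chooseModeOn("Lucene Merge Thread #0")); // prints DIRECT
        System.out.println(chooseModeOn("main-indexer"));           // prints BUFFERED
    }
}
```

Because the decision depends only on the thread's name, it breaks silently if the merge threads are ever named differently, which is why the text calls this approach hackish.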

NativePosixUtil.cpp/java also expose access to the posix_madvise, madvise, and posix_fadvise functions, which are somewhat more cross-platform than O_DIRECT. However, in testing (see above link), these APIs did not seem to help prevent buffer cache eviction.