Knowledge Base: Brief Design

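1. Overview
The knowledge base system provides category directory maintenance, knowledge entry, retrieval, and permission management, so that knowledge can be shared, accumulated, stored by category, and reused. Users create documents, or collect and organize existing ones, place them in the knowledge base, and manage them through categories and permission assignment.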

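2. Category Management
In the knowledge base system, category management covers creating, modifying, and deleting category directories. Sorting each kind of knowledge into its own category makes it easier to manage and look up.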
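As an illustration of the create, modify, and delete operations on a category directory, here is a minimal sketch in plain Java. The Category class and all of its method names are hypothetical; the design note does not specify a data model, so this is only one possible shape under that assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical node in a category directory tree (not from the design note). */
class Category {
    private String name;
    private final Category parent;
    private final List<Category> children = new ArrayList<>();

    Category(String name, Category parent) {
        this.name = name;
        this.parent = parent;
    }

    /** Creates a child category under this one and returns it. */
    Category create(String childName) {
        Category child = new Category(childName, this);
        children.add(child);
        return child;
    }

    /** Renames this category (the "modify" operation). */
    void rename(String newName) {
        this.name = newName;
    }

    /** Removes a direct child; returns false if it was not a child. */
    boolean delete(Category child) {
        return children.remove(child);
    }

    /** Full path from the root, with names joined by "/". */
    String path() {
        return parent == null ? name : parent.path() + "/" + name;
    }

    List<Category> children() {
        return children;
    }
}
```

Because path() is computed from the parent chain, renaming a category is immediately reflected in the paths of everything below it.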

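3. Knowledge Management
Knowledge management covers entering, viewing, modifying, and deleting knowledge. Entry and modification use a WYSIWYG rich-text editor (eWebEditor). When entering a document, the user selects the category directory it belongs to; once entered, the document can be found through full-text search. When maintaining a document, the user can also choose whether to share it. A shared document is exempt from permission restrictions, so every user can view it.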

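4. Full-Text Search
Full-text search is a tool for searching the whole system: enter a keyword and it returns the knowledge documents that contain it. Search results still obey document permissions. For a given keyword, a user sees only the documents that user has permission to view, plus documents that have been shared.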
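The rule that search results respect permissions, with shared documents exempt, can be sketched as a filter applied to the raw hits. This is a minimal sketch in plain Java, assuming permissions are granted per category; KnowledgeDoc, PermissionFilter, and their fields are hypothetical names, not part of the design.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical knowledge document: a title, its category, and a shared flag. */
class KnowledgeDoc {
    final String title;
    final String category;
    final boolean shared;

    KnowledgeDoc(String title, String category, boolean shared) {
        this.title = title;
        this.category = category;
        this.shared = shared;
    }
}

class PermissionFilter {
    /**
     * Keeps only the hits the user may see: shared documents, plus documents
     * in categories the user has been granted. Hit order is preserved.
     */
    static List<String> visibleTitles(List<KnowledgeDoc> hits, Set<String> allowedCategories) {
        List<String> visible = new ArrayList<>();
        for (KnowledgeDoc doc : hits) {
            if (doc.shared || allowedCategories.contains(doc.category)) {
                visible.add(doc.title);
            }
        }
        return visible;
    }
}
```

Filtering after the search keeps the index itself independent of users, at the cost of fetching hits that are later discarded.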

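5. Permission Management
Permission management sets each user's browsing rights, that is, which category directories and knowledge documents the user may browse. The knowledge base exists as a common component that is plugged into whichever application needs it, so it exposes a user interface that receives user information from the host application for permission setup.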

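Lucene Example

2007-09-01 15:42:28 | Category: Business
Lucene is distributed as a JAR file. The main Java packages in that JAR are described below to give the reader an initial picture of Lucene.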

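Package: org.apache.lucene.document

This package provides the classes needed to wrap a document for indexing, such as Document and Field. Every document is ultimately wrapped in a Document object.

Package: org.apache.lucene.analysis

The main job of this package is tokenizing documents. A document must be tokenized before it is indexed, so this package does the preparatory work for building the index.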
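To make the tokenization step concrete, here is a minimal sketch in plain Java. It is not Lucene's Analyzer: SimpleTokenizer and its rule (split on anything that is not a letter or digit, then lowercase) are assumptions chosen for illustration, and real analyzers do considerably more.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class SimpleTokenizer {
    /** Splits text on non-letter, non-digit characters and lowercases each token. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}\\p{Nd}]+")) {
            // split() yields an empty first element when the text starts with a separator
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }
}
```

For example, tokenize("Hello, Lucene World!") yields the tokens hello, lucene, and world; it is tokens like these, not the raw text, that end up in the index.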

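Package: org.apache.lucene.index

This package provides classes that help create an index and update an existing one. It contains two basic classes, IndexWriter and IndexReader. IndexWriter creates an index and adds documents to it; IndexReader deletes documents from an index.

Package: org.apache.lucene.search

This package provides the classes needed to search an existing index, such as IndexSearcher and Hits. IndexSearcher defines the methods for searching a given index; Hits holds the search results.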

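A Simple Search Application

Suppose a directory on your computer holds many text documents and you need to find which of them contain a given keyword. To do this, first use Lucene to index the documents in the directory, then search the resulting index for the matching documents. This example should give the reader a clear idea of how to build a search application with Lucene.

Building the Index

To index documents, Lucene provides five basic classes: Document, Field, IndexWriter, Analyzer, and Directory. Their roles are described below.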

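Document

Document describes a document, which here may be an HTML page, an e-mail message, or a text file. A Document object is made up of several Field objects. Think of a Document as a record in a database, with each Field as one column of that record.

Field

A Field object describes one attribute of a document. For example, the subject and the body of an e-mail can be described by two separate Field objects.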

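Analyzer

Before a document is indexed, its content must be tokenized; this is the Analyzer's job. Analyzer is an abstract class with several implementations, so choose the one that suits the language and the application. The Analyzer hands the tokenized content to IndexWriter, which builds the index.

IndexWriter

IndexWriter is the core class Lucene uses to create an index. Its job is to add Document objects to the index one by one.

Directory

This class represents where a Lucene index is stored. It is an abstract class with two implementations at present: FSDirectory, for an index stored in the file system, and RAMDirectory, for an index held in memory.

With these classes in mind, we can index the text files in a directory. Listing 1 gives the source code.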

Listing 1. Indexing text files

package TestLucene;

import java.io.File;
import java.io.FileReader;
import java.io.Reader;
import java.util.Date;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

/**
 * This class demonstrates the process of creating an index with Lucene
 * for text files.
 */
public class TxtFileIndexer {

    public static void main(String[] args) throws Exception {
        // indexDir is the directory that stores the index files
        File indexDir = new File("D:\\luceneIndex");
        // dataDir is the directory holding the files to be indexed
        File dataDir = new File("D:\\luceneData");
        Analyzer luceneAnalyzer = new StandardAnalyzer();
        File[] dataFiles = dataDir.listFiles();
        // true: create a new index rather than append to an existing one
        IndexWriter indexWriter = new IndexWriter(indexDir, luceneAnalyzer, true);
        long startTime = new Date().getTime();
        for (int i = 0; i < dataFiles.length; i++) {
            if (dataFiles[i].isFile() && dataFiles[i].getName().endsWith(".txt")) {
                System.out.println("Indexing file " + dataFiles[i].getCanonicalPath());
                Document document = new Document();
                Reader txtReader = new FileReader(dataFiles[i]);
                document.add(Field.Text("path", dataFiles[i].getCanonicalPath()));
                document.add(Field.Text("contents", txtReader));
                indexWriter.addDocument(document);
            }
        }
        indexWriter.optimize();
        indexWriter.close();
        long endTime = new Date().getTime();

        System.out.println("It takes " + (endTime - startTime)
            + " milliseconds to create index for the files in directory "
            + dataDir.getPath());
    }
}

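In Listing 1, note that the IndexWriter constructor takes three arguments.

The first specifies where the index is stored. It can be a File object, an FSDirectory object, or a RAMDirectory object.

The second specifies an implementation of the Analyzer class, that is, the tokenizer the index uses to split document content into terms.

The third is a boolean. If it is true, a new index is created; if it is false, the writer operates on the existing index.

The program then walks the text documents in the directory and creates a Document object for each one.

It puts the document's two attributes, path and contents, into two Field objects and adds both Field objects to the Document.

Finally, it adds the Document to the index with IndexWriter's addDocument method.

That completes index creation.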
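Conceptually, the indexing steps above amount to building an inverted index: a map from each token to the documents that contain it. The following is a minimal sketch in plain Java of that idea. TinyIndex is hypothetical, is not part of Lucene, and leaves out nearly everything IndexWriter actually does (on-disk segments, positions, scoring).

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy in-memory inverted index: token -> paths of the documents containing it. */
class TinyIndex {
    private final Map<String, Set<String>> postings = new HashMap<>();

    /** Tokenizes the contents and records the path under every token. */
    void addDocument(String path, String contents) {
        String lower = contents.toLowerCase(Locale.ROOT);
        for (String token : lower.split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(path);
            }
        }
    }

    /** Returns the paths whose contents contain exactly this token. */
    Set<String> search(String token) {
        return postings.getOrDefault(token, Collections.<String>emptySet());
    }
}
```

Looking a token up is a single map access, which is why searching an index is so much faster than scanning every file for the keyword.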

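Next we turn to searching the index we have built.

Searching Documents

Searching with Lucene is as convenient as indexing. The previous section built an index over the text documents in a directory; now we search that index for documents containing a keyword or phrase. Lucene provides several basic classes for this: IndexSearcher, Term, Query, TermQuery, and Hits. Their roles are described below.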

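Query

This is an abstract class with several implementations, such as TermQuery, BooleanQuery, and PrefixQuery. Its purpose is to wrap the user's query string into a Query that Lucene can understand.

Term

Term is the basic unit of search. A Term object consists of two String fields and can be created with one statement: Term term = new Term("fieldName", "queryWord"); The first argument names the document Field to search in, and the second is the keyword to look for.

TermQuery

TermQuery is a subclass of the abstract class Query and is the most basic query type Lucene supports. It can be created with one statement: TermQuery termQuery = new TermQuery(new Term("fieldName", "queryWord")); Its constructor takes a single argument, a Term object.

IndexSearcher

IndexSearcher searches an existing index. It opens the index read-only, so several IndexSearcher instances can operate on the same index.

Hits

Hits holds the search results.

Having covered the classes needed for searching, we can now search the index built earlier. Listing 2 gives the code.

Listing 2. Searching the index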

package TestLucene;

import java.io.File;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.FSDirectory;

/**
 * This class demonstrates the process of searching
 * an existing Lucene index.
 */
public class TxtFileSearcher {

    public static void main(String[] args) throws Exception {
        String queryStr = "lucene";
        // This is the directory that hosts the Lucene index
        File indexDir = new File("D:\\luceneIndex");
        // Check before opening the index, so a missing directory is reported cleanly
        if (!indexDir.exists()) {
            System.out.println("The Lucene index does not exist");
            return;
        }
        FSDirectory directory = FSDirectory.getDirectory(indexDir, false);
        IndexSearcher searcher = new IndexSearcher(directory);
        // TermQuery is not analyzed, so lowercase the keyword to match the indexed terms
        Term term = new Term("contents", queryStr.toLowerCase());
        TermQuery luceneQuery = new TermQuery(term);
        Hits hits = searcher.search(luceneQuery);
        for (int i = 0; i < hits.length(); i++) {
            Document document = hits.doc(i);
            System.out.println("File: " + document.get("path"));
        }
        searcher.close();
    }
}

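In Listing 2, the IndexSearcher constructor takes a Directory object. Directory is an abstract class with two subclasses at present, FSDirectory and RAMDirectory; the program passes in an FSDirectory, which represents an index stored on disk. Once the constructor returns, IndexSearcher has opened the index read-only. The program then builds a Term object that says to search the document contents for the keyword "lucene", uses that Term to construct a TermQuery, and passes the TermQuery to IndexSearcher's search method. The results come back in a Hits object, and a final loop prints the path of each matching document. That completes the search application. As you can see, building one with Lucene is quite simple.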
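One detail of Listing 2 deserves a note: the keyword is lowercased before the Term is built. StandardAnalyzer lowercases tokens at indexing time, while a TermQuery is matched as-is without analysis, so the query term has to be lowercased by hand. The sketch below imitates that behaviour in plain Java; CaseDemo and its methods are hypothetical stand-ins, not Lucene classes.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class CaseDemo {
    /** Simulates index-time analysis: split on whitespace and lowercase. */
    static Set<String> indexTerms(String contents) {
        Set<String> terms = new HashSet<>();
        for (String word : contents.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                terms.add(word.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    /** Simulates an exact term lookup: the query term is used unchanged. */
    static boolean termMatches(Set<String> terms, String queryTerm) {
        return terms.contains(queryTerm);
    }
}
```

With terms built from "Apache Lucene rocks", the lookup for "Lucene" finds nothing, while the lookup for "lucene" succeeds.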

[Java] Solving Word document retrieval: Lucene, made for search

Original post, Software Technology

Posted by Xing Hongrui, 2005-11-20 13:08:37

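About the name: Lucene is Doug's wife's middle name; it's also her maternal grandmother's first name.

Che Dong's blog describes a parser for MS Word documents. Unlike RTF, Word documents are not ASCII-based, so parsing them normally goes through COM objects. In fact, Apache POI alone is enough to parse MS Word documents.

I have modified an example as a starting point for discussion; criticism is welcome.

Lucene does not prescribe the format of the data source. It only provides a generic structure (the Document object) that accepts the input to be indexed, and that input has to be text.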

package org.tatan.framework;

import java.io.PrintStream;
import java.io.PrintWriter;

public class DocumentHandlerException extends Exception {

    private Throwable cause;

    /** Default constructor. */
    public DocumentHandlerException() {
        super();
    }

    /** Constructs with message. */
    public DocumentHandlerException(String message) {
        super(message);
    }

    /** Constructs with chained exception. */
    public DocumentHandlerException(Throwable cause) {
        super(cause.toString());
        this.cause = cause;
    }

    /** Constructs with message and exception. */
    public DocumentHandlerException(String message, Throwable cause) {
        super(message);
        // Keep the cause in our own field so getException() can return it
        this.cause = cause;
    }

    /** Retrieves nested exception. */
    public Throwable getException() {
        return cause;
    }

    public void printStackTrace() {
        printStackTrace(System.err);
    }

    public void printStackTrace(PrintStream ps) {
        synchronized (ps) {
            super.printStackTrace(ps);
            if (cause != null) {
                ps.println("--- Nested Exception ---");
                cause.printStackTrace(ps);
            }
        }
    }

    public void printStackTrace(PrintWriter pw) {
        synchronized (pw) {
            super.printStackTrace(pw);
            if (cause != null) {
                pw.println("--- Nested Exception ---");
                cause.printStackTrace(pw);
            }
        }
    }
}

The class that parses MS Word documents

package org.tatan.framework;

import org.apache.poi.hdf.extractor.WordDocument;

import java.io.InputStream;
import java.io.PrintWriter;
import java.io.StringWriter;

public class POIWordDocHandler {

    /** Returns the text of a Word document, or null if it contains no text. */
    public String getDocument(InputStream is)
        throws DocumentHandlerException {
        String bodyText = null;
        try {
            WordDocument wd = new WordDocument(is);
            StringWriter docTextWriter = new StringWriter();
            wd.writeAllText(new PrintWriter(docTextWriter));
            docTextWriter.close();
            bodyText = docTextWriter.toString();
        } catch (Exception e) {
            throw new DocumentHandlerException(
                "Cannot extract text from a Word document", e);
        }
        if ((bodyText != null) && (bodyText.trim().length() > 0)) {
            return bodyText;
        }
        return null;
    }
}
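Since Lucene only needs text, a handler such as POIWordDocHandler is one instance of a more general pattern: pick a text extractor by file type and feed its output to the index. The sketch below shows one way this could be organized in plain Java. TextExtractor and ExtractorRegistry are hypothetical names that appear nowhere in the original post, and the sketch takes raw bytes rather than an InputStream to stay short.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical interface: turns the raw bytes of a file into plain text. */
interface TextExtractor {
    String extract(byte[] raw);
}

class ExtractorRegistry {
    private final Map<String, TextExtractor> byExtension = new HashMap<>();

    /** Registers an extractor for a file extension given without the dot. */
    void register(String extension, TextExtractor extractor) {
        byExtension.put(extension.toLowerCase(Locale.ROOT), extractor);
    }

    /** Looks up the extractor for a file name, or returns null if none is registered. */
    TextExtractor forFile(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return null;
        }
        return byExtension.get(fileName.substring(dot + 1).toLowerCase(Locale.ROOT));
    }

    /** Extractor for UTF-8 text files; a "doc" extractor would wrap a Word parser instead. */
    static TextExtractor plainText() {
        return raw -> new String(raw, StandardCharsets.UTF_8);
    }
}
```

With such a registry, supporting another format means registering one more extractor; the indexing loop itself does not change.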

The class that builds the index

package org.tatan.framework;

import org.apache.lucene.analysis.cjk.CJKAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Date;

public class Indexer {

    public static void main(String[] args) throws Exception {
        File indexDir = new File("d:/testdoc/index");
        File dataDir = new File("d:/testdoc/msword");
        long start = new Date().getTime();
        int numIndexed = index(indexDir, dataDir);
        long end = new Date().getTime();
        System.out.println("Indexing " + numIndexed + " files took "
            + (end - start) + " milliseconds");
    }

    public static int index(File indexDir, File dataDir)
        throws Exception {
        if (!dataDir.exists() || !dataDir.isDirectory()) {
            throw new IOException(dataDir
                + " does not exist or is not a directory");
        }
        // CJKAnalyzer tokenizes Chinese text; true creates a new index
        IndexWriter writer = new IndexWriter(indexDir,
            new CJKAnalyzer(), true);
        writer.setUseCompoundFile(false);
        indexDirectory(writer, dataDir);
        int numIndexed = writer.docCount();
        writer.optimize();
        writer.close();
        return numIndexed;
    }

    private static void indexDirectory(IndexWriter writer, File dir)
        throws Exception {
        File[] files = dir.listFiles();
        for (int i = 0; i < files.length; i++) {
            File f = files[i];
            if (f.isDirectory()) {
                indexDirectory(writer, f); // recurse
            } else if (f.getName().endsWith(".doc")) {
                indexFile(writer, f);
            }
        }
    }

    private static void indexFile(IndexWriter writer, File f)
        throws Exception {
        if (f.isHidden() || !f.exists() || !f.canRead()) {
            return;
        }
        System.out.println("Indexing " + f.getCanonicalPath());
        String body;
        InputStream in = new FileInputStream(f);
        try {
            body = new POIWordDocHandler().getDocument(in);
        } finally {
            in.close();
        }
        if (body == null) {
            return; // no extractable text, nothing to index
        }
        Document doc = new Document();
        doc.add(Field.UnStored("body", body));
        doc.add(Field.Keyword("filename", f.getCanonicalPath()));
        writer.addDocument(doc);
    }
}

Note: Field.UnStored indexes the text for full-text search but does not store it in the index.
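The difference between the Field factory methods used in these listings comes down to three flags: whether the value is stored, whether it is indexed, and whether it is tokenized. The enum below summarizes the Lucene 1.x factories for String values. FieldKind is a hypothetical name used only to tabulate these flags; it is not a Lucene type.

```java
/** The Lucene 1.x Field factory methods for String values, as three flags each. */
enum FieldKind {
    KEYWORD(true, true, false),    // Field.Keyword: stored, indexed as a single term
    UNINDEXED(true, false, false), // Field.UnIndexed: stored only, not searchable
    UNSTORED(false, true, true),   // Field.UnStored: indexed and tokenized, not stored
    TEXT(true, true, true);        // Field.Text(String, String): stored, indexed, tokenized

    final boolean stored;
    final boolean indexed;
    final boolean tokenized;

    FieldKind(boolean stored, boolean indexed, boolean tokenized) {
        this.stored = stored;
        this.indexed = indexed;
        this.tokenized = tokenized;
    }

    /** A field can be searched only if it was indexed. */
    boolean searchable() {
        return indexed;
    }

    /** A field value can be read back from a hit only if it was stored. */
    boolean retrievable() {
        return stored;
    }
}
```

This is why the Indexer stores the file name as a Keyword field but the body as UnStored: hits can be matched on the body text, yet only the file name can be read back from a matching document.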

The search class

package org.tatan.framework;

import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.cjk.CJKAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class Searcher {

    public static void main(String[] args) throws Exception {
        Directory fsDir = FSDirectory.getDirectory("D:\\testdoc\\index", false);
        IndexSearcher is = new IndexSearcher(fsDir);

        // AnalyzerUtils is not imported in the original listing; it is assumed
        // to be a helper class available in this package.
        Token[] tokens = AnalyzerUtils.tokensFromAnalysis(new CJKAnalyzer(), "情");
        for (int i = 0; i < tokens.length; i++) {
            // Search the "body" field for each token produced by the analyzer
            Query query = QueryParser.parse(tokens[i].termText(), "body", new CJKAnalyzer());
            Hits hits = is.search(query);
            for (int j = 0; j < hits.length(); j++) {
                Document doc = hits.doc(j);
                System.out.println(doc.get("filename"));
            }
        }
        is.close();
    }
}

Note: a plain TermQuery fails to find Chinese text, because Chinese word segmentation is not well supported at present.
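The reason the Searcher runs the keyword through CJKAnalyzer instead of building a TermQuery directly is that CJKAnalyzer indexes Chinese text as overlapping two-character tokens, so a query term has to be cut the same way to match. The sketch below imitates that bigram splitting in plain Java. BigramSketch is hypothetical, handles only a run of CJK characters, and keeping a lone character as its own token is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

class BigramSketch {
    /** Emits every pair of adjacent characters; a lone character is kept as-is. */
    static List<String> bigrams(String text) {
        List<String> out = new ArrayList<>();
        if (text.length() == 1) {
            out.add(text);
        }
        for (int i = 0; i + 1 < text.length(); i++) {
            out.add(text.substring(i, i + 2));
        }
        return out;
    }
}
```

For the four-character string 全文检索 this produces 全文, 文检, and 检索. A TermQuery for the whole string would therefore match nothing, while queries for the individual bigrams do.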
