Minidx Support Forum


 
Subject: Upper limit for MAX_EXTRACT_TEXT_SIZE?
heroyo
Newbie
Post at 19-8-2008 21:37

Dear All~

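I have a question about the MAX_EXTRACT_TEXT_SIZE value declared in the Extract Text demo.
It caps the extracted body text at 64 MB. Does that figure have any special significance?
For example, was 64 MB arrived at through testing, as the upper limit that performs best?
Was it set after testing extraction on some 200 file formats?
Or does it have no particular meaning?
Thanks!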

' no more than 64MB of raw text for a resume!
Private Const MAX_EXTRACT_TEXT_SIZE As Integer = 64 * 1024 * 1024

Best Regards,
Hiro
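
For readers wondering how a cap like this is usually applied, here is a minimal sketch. It is not the demo's or ExtractText.dll's actual code: ReadCapped is a hypothetical helper, and it counts characters rather than bytes, which is an assumption made to keep the example short. It only shows that 64 * 1024 * 1024 (67,108,864) fits in a 32-bit Integer and that reading stops once the cap is reached.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module ExtractCapSketch
    ' Same value as in the demo: 64 * 1024 * 1024 = 67,108,864,
    ' which is well below the Integer maximum of 2,147,483,647.
    Private Const MAX_EXTRACT_TEXT_SIZE As Integer = 64 * 1024 * 1024

    ' Hypothetical helper (not part of ExtractText.dll): returns at most
    ' maxChars characters from reader. The cap is counted in characters.
    Function ReadCapped(ByVal reader As TextReader, ByVal maxChars As Integer) As String
        Dim buffer(4095) As Char            ' 4096-element scratch buffer
        Dim result As New StringBuilder()
        While result.Length < maxChars
            ' Never ask for more than the room left under the cap.
            Dim want As Integer = Math.Min(buffer.Length, maxChars - result.Length)
            Dim got As Integer = reader.Read(buffer, 0, want)
            If got <= 0 Then Exit While     ' end of input
            result.Append(buffer, 0, got)
        End While
        Return result.ToString()
    End Function

    Sub Main()
        Using reader As New StringReader("The quick brown fox")
            Console.WriteLine(ReadCapped(reader, 9))    ' prints "The quick"
        End Using
        Console.WriteLine(MAX_EXTRACT_TEXT_SIZE)        ' prints 67108864
    End Sub
End Module
```

In real use the reader would wrap whatever stream the extractor produces, and MAX_EXTRACT_TEXT_SIZE would be passed as maxChars.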
dingzhigang
Administrator
Post at 20-8-2008 10:47


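Hello heroyo,

The limit is set to 64 MB because of a restriction in Windows itself. In practice, there is little value in indexing text beyond that size.
You can of course change this number. Open the registry and go to
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\ContentIndex
The value is MaxTextFilterBytes; its default is 25,000,000.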
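
If you would rather check the current setting from code before editing anything, a minimal sketch along these lines might do. It assumes a Windows machine and only reads the value; the key path and value name are the ones given above, and whether the value exists on a particular system is not guaranteed.

```vbnet
Imports System
Imports Microsoft.Win32

Module ReadMaxTextFilterBytes
    Sub Main()
        Const keyName As String = _
            "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\ContentIndex"

        ' Registry.GetValue returns Nothing when the key does not exist, and
        ' the supplied default (also Nothing here) when the value is missing.
        Dim raw As Object = Registry.GetValue(keyName, "MaxTextFilterBytes", Nothing)

        If raw Is Nothing Then
            Console.WriteLine("MaxTextFilterBytes is not set on this machine.")
        Else
            Console.WriteLine("MaxTextFilterBytes = " & Convert.ToInt64(raw).ToString())
        End If
    End Sub
End Module
```

Reading the value does not need elevated rights in most setups; changing it under HKEY_LOCAL_MACHINE normally does.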
heroyo
Newbie
Post at 20-8-2008 12:15
Thank you!!

One more question:
How was the ExtractText.dll component produced?
Is it a component released by Microsoft?
Thanks!

Best Regards,
Hiro
dingzhigang
Administrator
Post at 20-8-2008 12:18


This is explained at the link below:
http://blog.minidx.com/2007/12/31/334.html
The explanation at the beginning of that post covers it.
heroyo
Newbie
Post at 20-8-2008 12:54
Dear dingzhigang,

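Thank you for your reply.
To implement Lucene full-text search in one of my company's products,
I referenced the ExtractText.dll component (the product is developed in VB.NET 2005).
I have a first working version that extracts the body text of txt, doc, ppt, xls, docx, pptx, and xlsx attachments,
but for security reasons my managers would prefer not to reference a component of unknown origin.
Is ExtractText.dll open source?
And which programming language was it developed in?
Thank you!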

Best Regards,
Hiro
dingzhigang
Administrator
Post at 20-8-2008 14:36
Sorry, there are currently no plans to open-source this component.
It is implemented in C++.
heroyo
Newbie
Post at 20-8-2008 15:09
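Understood. I will look into IFilter in more depth myself,
and I hope to write the attachment text extraction in VB.NET.
Thank you very much all the same!!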

All times are GMT+8.