Developing a Web Data Extraction Tool in Java


Document ID: LW22798

Word count: 9,776; pages: 33

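Abstract (translated from the Chinese)

This thesis describes how to develop a Web data extraction tool in Java. The work covers three topics: implementing a spider (discovering and collecting web pages requires a high-performance "web spider" program that searches the Internet automatically); parsing HTML (information on the Web is built on HTML, so the first problem a web robot faces when retrieving pages is how to parse HTML); and improving program performance (using Java multithreading to build an efficient spider for the enormous number of pages on the Internet). The tool is developed in Eclipse and uses core spider techniques to traverse URLs and download an entire web site. I met these requirements by designing and calling a set of Java classes. The program is, in essence, a Web spider. Compared with other download tools, its main advantages are that it can fill in forms automatically (for example, to log in automatically) and use cookies to handle sessions. It also supports flexible download rules (for example, based on a page's URL, size, or MIME type) to restrict what is downloaded. In test runs the program worked well.
Keywords: data extraction, Java classes, Web spider, Java multithreading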
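
The abstract names HTML parsing and URL traversal as the first problems a spider has to solve. The following is a minimal sketch of that step, not the thesis's own code: the class name `LinkExtractor` and the method `extractLinks` are hypothetical, and a regular expression stands in for a real HTML parser, so it only handles quoted `href` attributes on `<a>` tags.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not from the thesis): pulls links out of a page so a
// spider can queue them for later visits.
class LinkExtractor {
    // Assumption: a quoted href inside an <a> tag is enough for a sketch.
    // A real spider would use a proper HTML parser instead of a regex.
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the distinct http/https links found in html, resolved against
    // baseUrl, in the order they first appear.
    public static List<String> extractLinks(String baseUrl, String html) {
        URI base = URI.create(baseUrl);
        Set<String> links = new LinkedHashSet<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String href = m.group(2).trim();
            // Drop the fragment: "#top" points inside a page, not to a new page.
            int hash = href.indexOf('#');
            if (hash >= 0) {
                href = href.substring(0, hash);
            }
            if (href.isEmpty()) {
                continue;
            }
            try {
                URI resolved = base.resolve(href);
                String scheme = resolved.getScheme();
                // Keep only web pages; this skips mailto:, javascript:, and so on.
                if ("http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme)) {
                    links.add(resolved.toString());
                }
            } catch (IllegalArgumentException e) {
                // Malformed href: ignore it and continue with the next link.
            }
        }
        return new ArrayList<>(links);
    }

    public static void main(String[] args) {
        String html = "<a href=\"page2.html\">Next</a> <a href='/about.html#team'>About</a>";
        System.out.println(extractLinks("http://example.com/docs/index.html", html));
    }
}
```

A spider would feed each returned URL back into its queue of pages to visit, skipping those it has already downloaded.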

Abstract

This thesis describes how to develop a Web data extraction tool in Java. The main work is to implement a spider (finding and collecting web page information requires a high-performance "Web spider" that searches the Internet on its own), to parse HTML (information on the Web is built on HTML, so the first problem a web robot faces when crawling pages is how to parse HTML), and to improve program performance (using Java multithreading to build an efficient spider for the large number of pages on the Internet). Developed in Eclipse, the tool applies core spider techniques to crawl URLs and download a whole web site. I meet these requirements by designing and using a number of Java classes. In essence the program is a Web spider. Its main advantage over other download tools is that it can fill in forms automatically (for example, to log in) and use cookies to handle sessions. It also has flexible download rules (for example, by URL, page size, or MIME type) to limit downloads. Testing shows that it works well.
Keywords: data extraction, Java classes, Web spider, Java multithreading
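
The download rules mentioned in the abstract (limits by URL, size, and MIME type) could be modeled roughly as below. This is an illustrative sketch under stated assumptions rather than the tool's actual design: the class `DownloadRule`, its fields, and the `accepts` method are hypothetical names, and the rule is assumed to be a URL prefix, a maximum size in bytes, and a set of allowed MIME types.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical rule object (not from the thesis): decides whether a resource
// should be downloaded, based on its URL, size, and MIME type.
class DownloadRule {
    private final String urlPrefix;             // only URLs starting with this are allowed
    private final long maxBytes;                // largest acceptable size in bytes
    private final Set<String> allowedMimeTypes; // lower-case types such as "text/html"

    public DownloadRule(String urlPrefix, long maxBytes, Set<String> allowedMimeTypes) {
        this.urlPrefix = urlPrefix;
        this.maxBytes = maxBytes;
        this.allowedMimeTypes = allowedMimeTypes;
    }

    // contentLength and contentType are assumed to come from the HTTP response
    // headers (Content-Length and Content-Type) before the body is fetched.
    public boolean accepts(String url, long contentLength, String contentType) {
        if (url == null || !url.startsWith(urlPrefix)) {
            return false; // outside the site being mirrored
        }
        if (contentLength > maxBytes) {
            return false; // too large
        }
        if (contentType == null) {
            return false; // unknown type: reject rather than guess
        }
        // Strip parameters such as "; charset=UTF-8" before comparing.
        String mime = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return allowedMimeTypes.contains(mime);
    }

    public static void main(String[] args) {
        DownloadRule rule = new DownloadRule(
                "http://example.com/", 1_000_000L, Set.of("text/html", "image/png"));
        System.out.println(rule.accepts(
                "http://example.com/index.html", 2048L, "text/html; charset=UTF-8"));
    }
}
```

Keeping the rule in one small object means the spider can check it before downloading a response body, which avoids fetching large or unwanted files.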

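Abstract (Chinese)
Abstract (English)
Chapter 1 Introduction
1.1 Background
1.2 Design goals and implementation approach
1.3 Current state in China and abroad
Chapter 2 Overview of related technologies and technical background
2.1 The Eclipse development tool
2.1.1 Introduction to Eclipse
2.1.2 The Eclipse workbench
2.1.3 Developing Java programs in Eclipse
2.1.4 Debugging Java programs in Eclipse
2.2 Core technology: Spider
2.2.1 How it works
2.2.2 Search strategies
2.2.3 Trends in search strategies
2.3 Spider design
2.3.1 Spider collection
2.3.2 Implementing the socket connection
2.3.3 Spider program structure
2.3.4 Spider architecture
2.4 Techniques used in the spider to improve program performance
Chapter 3 Overall design
3.1 Design principles
3.2 Functional goals
3.3 Design description
3.4 Implementing the design
3.4.1 Creating the Java classes
3.4.2 Calling and modifying the Java classes
3.5 Description
3.5.1 Main window
3.5.2 Function windows
Chapter 4 Running and testing
Chapter 5 Conclusion
Acknowledgments
References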
