On Information Management of Digital Archives Based on Data Mining Analysis

Paper Keywords: management, archives, digitization, library

Abstract: With the development of socialist modernization and advances in computer technology, information technology plays a very important role in all areas of social development. Informatization has become a primary element of economic and social development, and networked file management covers document management, text translation and conversion, pictures, audio and video materials, multimedia teleconferencing and more. University archives in particular are oriented toward teaching and research, and for them networked file management is an inevitable trend.

In today's information technology environment, libraries, and university libraries in particular, must not only digitize and manage information in the simple sense, but also archive and manage emerging kinds of networked material, including documents, text translation and conversion, pictures, audiovisual materials and multimedia teleconferencing. Networked file management and library management have therefore become an inevitable trend, and the technical and legal issues related to file management need to be further elaborated and discussed.

Data mining is the process of extracting implicit, previously unknown but potentially useful information and knowledge from large volumes of incomplete, noisy, fuzzy and random data. The data may be structured, such as relational data in a database; semi-structured, such as text, graphics and images; or even heterogeneous data distributed across a network. The methods used to discover knowledge may be mathematical or non-mathematical, and deductive or inductive. The discovered knowledge can be used for information management, query optimization, decision support and process control, and also for maintaining the data itself. Data mining builds its theoretical system on many years of research in mathematical statistics, artificial intelligence, knowledge engineering and other fields. It is an interdisciplinary subject involving databases, artificial intelligence, statistics, machine learning, artificial neural networks, visualization and parallel computing, and it is at the forefront of international research on databases and decision support.

First, the functions of data mining

Data mining predicts future trends and behaviors so that decisions can be made proactively and on the basis of knowledge. Its goal is to discover hidden, meaningful knowledge in a database, and by function it can be divided into the following categories.

1, Association analysis. Association analysis finds relationships among large amounts of data in a database. Commonly used techniques are association rules and sequential patterns. Association rules discover how one thing is connected to, or depends on, other things.

2, Clustering. The input data carry no class labels. Clustering divides the data into meaningful groups according to certain rules, that is, it groups objects into clusters so that objects in the same cluster are highly similar to one another while objects in different clusters differ widely. Clustering enhances people's understanding of objective reality and is a prerequisite for concept description and deviation analysis. Clustering techniques include traditional pattern recognition methods and mathematical taxonomy.

3, Automatic prediction of trends and behavior. Data mining automatically performs classification and prediction in large databases, searching for predictive information and automatically producing models that describe important data classes or predict future data trends. Problems that previously required extensive manual analysis can now be answered quickly and directly from the data.

4, Concept description. For the data in a database, people expect a concise form that describes the aggregated data set. Concept description states the meaning of a class of objects and summarizes its relevant characteristics. It is divided into characteristic description and discriminant description: the former describes the common features of a class of objects, and the latter describes the differences between classes. Generating a characteristic description of a class involves only the commonalities of all objects in that class. There are many ways to generate discriminant descriptions, such as decision tree methods and genetic algorithms.

5, Deviation detection. Records in a database often contain anomalies, and detecting these deviations is meaningful. Deviations carry much potential knowledge, such as abnormal instances in a classification, special cases that do not satisfy a rule, differences between observed results and model predictions, and changes in a quantity over time. The basic method of deviation detection is to look for significant differences between observed results and a reference value. It is commonly used to detect financial and banking fraud, for market analysis, and to analyze the consumption habits of particular consumers.
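
As a minimal sketch of the deviation detection idea described above, the following flags observations that differ markedly from a reference value. The function name, the sample counts and the threshold of two standard deviations are illustrative assumptions and do not come from the original text; the mean is used here as the reference value.

```python
from statistics import mean, stdev

def find_deviations(values, threshold=2.0):
    """Return (index, value) pairs whose distance from the mean exceeds
    `threshold` sample standard deviations.

    The mean serves as the reference value; the threshold is an
    assumed, tunable parameter."""
    if len(values) < 2:
        return []
    reference = mean(values)
    spread = stdev(values)
    if spread == 0:
        return []
    return [(i, v) for i, v in enumerate(values)
            if abs(v - reference) / spread > threshold]

# Hypothetical daily counts of file requests; the spike is made up.
daily_requests = [20, 22, 19, 21, 23, 20, 95, 22, 21, 20]
print(find_deviations(daily_requests))
```

In practice the reference value could equally be a model prediction or a historical baseline, which matches the other kinds of deviation listed in the text.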

Links to free download http://eng.hi138.com

Second, applications of data mining in the construction of modern university archives

1, Resource data include the electronic documents generated while processing digitized archives, the electronic files of all kinds stored in the archive center, the information collected by archive software, and the information on the construction and maintenance of the archive information network. Starting from research on the information needs of university users, data mining gives university archives a method for accurately understanding and fully grasping those needs.

(1) Web access mining techniques are used to discover association patterns, sequential patterns and Web access trends, and to build a multi-dimensional view of the user interest model. From this, the popularity of archive information or services can be determined, and users' access patterns and demand tendencies can be found. Studying users' information needs from different aspects provides a scientific basis for optimizing the information resources of the archives.

(2) Raw data are collected from the web servers of the university archives network, including user registration information, access records and information about users' interactions with the system. After cleaning, enrichment and transformation, the raw data form data sets for statistical analysis, such as the user access database, the logging database, the customized information database and the user feedback information.
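
The Web access mining described above can be sketched roughly as follows. The sketch assumes that the cleaned log has already been reduced to hypothetical records of the form (user, timestamp, file category); the record layout, category names and function name are assumptions made for illustration. It counts how popular each category is and how often one category is accessed directly after another by the same user, which is a very simple form of sequential pattern.

```python
from collections import Counter, defaultdict

# Hypothetical cleaned log records: (user_id, timestamp_seconds, file_category).
access_log = [
    ("u1", 100, "enrollment"), ("u1", 160, "transcripts"),
    ("u2", 120, "enrollment"), ("u2", 200, "transcripts"),
    ("u2", 260, "research"),
    ("u3", 300, "research"), ("u3", 340, "enrollment"),
    ("u3", 400, "transcripts"),
]

def mine_access_log(records):
    """Return (popularity, sequential_pairs) counted from the log.

    popularity counts how often each category is accessed;
    sequential_pairs counts how often category A is followed directly
    by category B within one user's time-ordered access sequence."""
    popularity = Counter(category for _, _, category in records)
    by_user = defaultdict(list)
    for user, _, category in sorted(records, key=lambda r: (r[0], r[1])):
        by_user[user].append(category)
    pairs = Counter()
    for sequence in by_user.values():
        pairs.update(zip(sequence, sequence[1:]))
    return popularity, pairs

popularity, pairs = mine_access_log(access_log)
print(popularity.most_common())
print(pairs.most_common())
```

Counts of this kind are one possible input to a user interest model; a real system would also need session splitting and the cleaning steps mentioned in the text.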

2, Starting from the construction of information resources in university archives, data mining provides a scientific basis for choosing the main direction of development.

(1) Mining the access information recorded by archive management software and the archive network shows how archive resources are being used, so that traditional-carrier archives with high usage and high demand can be digitized first. For example, data on failed user requests can be extracted from the archive's retrieval request records and analyzed. Based on statistics on the classes of files involved and on the set of frequently refused requests, combined with a clustering algorithm, the resources missing from the collection can be identified, and the archival information resources can be supplemented and enriched in a targeted way.

(2) In the management of university archive collections, text mining applies association, classification, clustering and other methods to mine, classify, process, sort and reorganize the mass of archive information by topic, forming thematic archive information databases on various topics.
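
To illustrate how archive records might be grouped by topic, here is a minimal sketch that compares file titles by cosine similarity of their word counts and places similar titles in the same group. The single-pass grouping rule, the similarity threshold and the sample titles are assumptions chosen for brevity; the source does not name a specific algorithm, and a production system would more likely use a standard clustering method.

```python
import math
from collections import Counter

def vectorize(text):
    """Turn a text into a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def group_by_topic(titles, threshold=0.3):
    """Single-pass grouping: each title joins the first group whose
    first member is similar enough, otherwise it starts a new group."""
    groups = []  # each group is a list of titles
    for title in titles:
        vec = vectorize(title)
        for group in groups:
            if cosine(vec, vectorize(group[0])) >= threshold:
                group.append(title)
                break
        else:
            groups.append([title])
    return groups

# Hypothetical file titles, made up for this example.
titles = [
    "student enrollment records 2005",
    "student enrollment records 2006",
    "laboratory construction contract",
    "library construction contract",
]
for group in group_by_topic(titles):
    print(group)
```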

3, From the perspective of sound information management in university archives, data mining plays an important role in optimizing the collection and in forecasting future work.

(1) In the service of providing access, association analysis is carried out on the information each user borrows each time. It discovers association rules or proportional relationships between the various types of archival information, so that the holdings can be further optimized.

(2) For the holdings of the university archives, text feature construction, feature extraction, feature matching, feature set reduction and model evaluation are carried out. This makes it possible to summarize, classify and cluster the contents of a large collection of documents and to perform association analysis and distribution analysis on them. The knowledge found in the files through generalization and summarization can then be used to predict trends in future work.
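
The association analysis of borrowing records mentioned in this section can be sketched with the two usual measures, support and confidence. The example below only considers pairs of archive categories; the category names, the thresholds and the function name are assumptions for illustration, not values from the original text.

```python
from collections import Counter
from itertools import combinations

def pair_rules(transactions, min_support=0.4, min_confidence=0.6):
    """Return rules (antecedent, consequent, support, confidence) for
    pairs of items that are borrowed together often enough.

    support    = share of transactions containing both items
    confidence = share of transactions with the antecedent that also
                 contain the consequent"""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return rules

# Hypothetical borrowing records: categories requested in one visit.
borrowings = [
    {"personnel", "payroll"},
    {"personnel", "payroll", "research"},
    {"personnel", "research"},
    {"payroll", "personnel"},
    {"construction"},
]
for x, y, support, confidence in pair_rules(borrowings):
    print(f"{x} -> {y}: support={support:.2f}, confidence={confidence:.2f}")
```

A rule with high confidence suggests that the two categories are often needed together, which is the kind of relationship the text proposes to use when optimizing the holdings.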

Third, the application of data mining to management data

The management data of university archives come from intelligent monitoring systems, fire protection systems, temperature and humidity control systems, intelligent shelving, data management systems and similar systems, whose daily operation produces a large amount of management data. With data mining tools, we can extract valuable knowledge from this seemingly useless data and apply it to the work of university archives, contributing to their modernization.

The work of university archives centers on serving teachers and students. How to use advanced tools to improve the quality of service is a problem that has long troubled us. Data mining offers an effective way to make archival services intelligent, personalized and high in quality. An intelligent retrieval system calls the user interest model, automatically corrects the search strategy, quickly clusters and classifies the retrieved results according to the user's interests, and presents them in a well-ordered form. For research-oriented users such as research institutes and academies of social sciences, data mining can be used for targeted mining and study of archive information, and the summarized results can be delivered to the user as a report. This not only achieves secondary development of the university archives' holdings but also gives users a pleasant surprise.
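
One simple way a retrieval system could make use of a user interest model is to blend each result's base relevance with the user's interest in that result's category and then sort by the blended score. The sketch below assumes a hypothetical interest model stored as a mapping from category to a score between 0 and 1; the blending weight, the names and the sample data are all illustrative assumptions.

```python
def rerank(results, interest_model, weight=0.5):
    """Sort search results by base relevance blended with the user's
    interest in each result's category.

    results: list of (title, category, relevance), relevance in [0, 1].
    interest_model: mapping of category -> interest score in [0, 1].
    weight: share of the final score taken from the interest model."""
    def score(result):
        _, category, relevance = result
        interest = interest_model.get(category, 0.0)
        return (1 - weight) * relevance + weight * interest
    return sorted(results, key=score, reverse=True)

# Hypothetical interest model and search results.
interest_model = {"research": 0.9, "personnel": 0.1}
results = [
    ("Staff roster 1998", "personnel", 0.8),
    ("Grant report 2003", "research", 0.6),
    ("Campus map", "construction", 0.7),
]
for title, category, _ in rerank(results, interest_model):
    print(title, category)
```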

The network was originally just a tool for scientists and researchers to exchange files, and the Internet could be used for education and research with government subsidies. In China, universities have funding to support a university library, and the archives network of a digital library is not profitable; its output is long-term social benefit in teaching and research. Today the Internet is increasingly commercialized, and networking has become a technology investment with great potential in the digital economy. University digital libraries can also consider establishing network repositories for profit, using Internet business models such as online advertising, banner advertising, sponsorship advertising, subscriptions and B2C. The income can be used for the rolling development of the university digital library's network archives. At present these emerging economic models are still poorly understood. The main body that makes public policy for managing the network is government. Implementing e-government, developing network resources and promoting the shift from print publication to network publication are important current tasks for the relevant government departments. The university's policies, attitudes and practices are critical to the development of digital libraries. Market instruments and policies should be considered that balance the construction of network archives, their operation, and the delivery and preservation of online content.



