Graduation Project (Thesis): Translation of Foreign Literature

Major: Computer Science and Technology
Student name: Li Lei
Class and student ID:
Supervisor:
School of Information Engineering

Applications of web mining for marketing of online bookstores

Abstract: The purpose of this study is to identify potential customers of online bookstores through web content mining, without customers' transaction records or demographic information. Our study first creates a list of scholars whose research field is information technology and a list of categories of IT expertise. We then use a search engine to count the numbers of web pages related to scholars and expertise. These data are pre-processed in three key steps before being used: filtering abnormal data, normalizing data, and generating binary data. Association analysis and hierarchical cluster analysis are employed to generate the clusters of scholars and the clusters of expertise. To test the accuracy of using web mining to predict the booklists that interest clients, our study evaluates
the accuracy of prediction through a survey. The results show that the accuracy rate of the recommended booklists targeted at potential customers (scholars) is statistically significant.

Keywords: Web mining; Association analysis; Hierarchical cluster analysis; Marketing; Online bookstore

1. Introduction

The exponential growth of the Internet and the evolution of multimedia technology have driven the growth of electronic commerce (E-commerce) and offered a new business model for industries that rely on physical distribution systems. Because the Internet imposes no limitations of time and space, customers can browse among products and order easily. In the midst of an information explosion, demand from customers hungry for knowledge is increasing. Under these circumstances, with the aid of the Internet, the online bookstore not only provides convenience for reading, but also customizes
individual service; the two combine to satisfy readers' demand for knowledge. At present, one of the marketing approaches used by online bookstores is to email booklists to all customers in the database, expecting to increase the response and purchase rates through mass contact. There is no pre-classification or screening in this "one-to-all" approach. An improved approach is to email one of several pre-designed booklists to each customer based on their past transaction records and interests. However, the ideal method is "one-to-one" marketing, an approach that concentrates on providing services or products to one customer at a time by identifying and then meeting their individual needs.

Data mining is the technique of sorting through large amounts of data and picking out relevant information (Han & Kamber, 2001), and it can be used to achieve one-to-one marketing (Berry & Linoff, 1997). Through sophisticated algorithms that identify trends within data beyond what simple analysis reveals, users can identify key attributes of business processes and target opportunities.

Web mining, in general, is the application of data mining techniques to discover patterns from the web (Baraglia and Silvestri, 2007, Chakrabarti, 2002, Cooley et al., 1999, Eirinaki and
Vazirgiannis, 2003, Liu, 2007, Mobasher et al., 2000 and Olson and Shi, 2007). For example, association analysis can be applied to users' usage data, which record users' behavior as they browse or make transactions on a web site, and the results can be used to fit the content of the web site to users' needs. Unlike data mining, web mining has no existing data set to start from. Web miners can use names or terminology as search terms to collect data. There is a great deal of valuable information on the web, but it is not easy to find. Search engines provide the initial step needed to conduct more complex forms of web mining.

The research objectives of this article are to capitalize on search engines to collect data, to identify the characteristics of potential customers through association analysis and cluster analysis, and to provide recommended booklists to the targeted potential customers, thereby improving the marketing mode currently employed by the majority of online bookstores: targeting customers blindly. Online bookstores use two modes of marketing: finding customers for each book and finding books for each customer. These two modes, basically, can be connected to
commercial value through the concept of communities of practice.

The concept of a community of practice was introduced by Lave and Wenger (1991), who defined it as the process of social learning that occurs when people who have a common interest in some subject or problem collaborate over an extended period to share ideas, find solutions, and build innovations. It refers as well to the stable group that is formed from such regular interactions. More recently, communities of practice have become associated with knowledge management, as people have begun to see them as ways of developing social capital, nurturing new knowledge, stimulating innovation, or sharing existing tacit knowledge within an organization (Wenger, McDermott, & Snyder, 2002). A community as used in knowledge management resembles a cluster as used in marketing. Both communities and clusters show high internal homogeneity; e.g., people within a cluster have similar purchase preferences or a common interest in a product. There are two types of communities of practice: face-to-face communities and web-based communities (or virtual communities). In short, from the perspective of Internet marketing, our study employs web mining to search for and collect customers' data, identifies two kinds of clusters (communities) from the data by association analysis and cluster analysis, and ultimately aims at increasing customer satisfaction and the success of marketing.

The major strength of our research is that it finds potential customers without existing customers' background information or transaction records. In the past, most online bookstores marketed products or services based on existing customer databases and could not actively contact potential customers, that is, customers who have never transacted with online bookstores or whose background information is not available. Therefore, using web mining to search for potential customers may increase online bookstores' sales and profits.

2. Literature review

2.1. Web mining

Web mining is the integration of information gathered by traditional data mining methodologies and techniques with information gathered over the World Wide Web. The web presents new challenges to traditional data mining algorithms, which work on flat data. Kosala and Blockeel (2000) surveyed the research in the area of web mining, suggested three web mining categories (web content mining, web structure mining, and web usage mining), and then situated some of the research with respect to these categories. The three types of web mining are defined as follows (Pierrakos et al., 2003 and Wikipedia, 2008):

Web usage mining is the application of data mining to analyze and discover interesting patterns in users' usage data on the web. The usage data record users' behavior as they browse or make transactions on the web site.

Web structure mining is the process of using graph theory to analyze the node and connection structure of a web site.

Web content mining is the automatic process of discovering useful information from the content of a web page. Web content may consist of text, image, audio, or video data. Web content mining is sometimes called web text mining, because text content is the most widely researched area. There are two groups of web content mining strategies: those that directly mine the content of documents and those that improve on the content search of other tools such as search engines (Galeas, 2008).

The information gathered through web mining is evaluated (sometimes with the aid of software graphing applications) by using traditional data mining techniques, such as clustering and classification, association, and examination of sequential patterns.

2.2. Association analysis

Association analysis is defined as the discovery of association rules showing attribute-value conditions that occur frequently together in a
given set of data (Han & Kamber, 2001). Association rules are widely used for market basket analysis. An example of such a rule is estimating, when a customer purchases a computer, the probability that software will be purchased together with it. The results can be used to develop marketing or advertising strategies, and items that are frequently purchased together can be placed in close proximity in order to promote their joint sale (Berry and Linoff, 1997 and Han and Kamber, 2001).

Following the original definition by Agrawal, Imielinski, and Swami (1993), association mining is
defined as follows. Let $I = \{i_1, i_2, \ldots, i_n\}$ be a set of $n$ binary attributes called items. Let $D = \{t_1, t_2, \ldots, t_m\}$ be a set of transactions called the database. Each transaction $t$ has a unique transaction ID and contains a subset of the items in $I$. A rule is defined as an implication of the form $X \Rightarrow Y$, where $X, Y \subseteq I$ and $X \cap Y = \emptyset$. The sets of items $X$ and $Y$ are called the antecedent (left-hand side or LHS) and the consequent (right-hand side or RHS) of the rule.

There are two important parameters used in association analysis to measure the accuracy and utility of an association rule: support and confidence (Han and Kamber, 2001 and Paolo, 2003).
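
As a minimal sketch of the definition above, the Python below models the database D as a list of item sets and evaluates one candidate rule X ⇒ Y. The toy transactions, the function names `support` and `confidence`, and the computer/software rule are assumptions made for illustration, not data or code from the study. The two measures are computed in their standard form: support as the fraction of transactions containing an item set, and confidence as support(X ∪ Y) divided by support(X).

```python
from typing import FrozenSet, Iterable, List

Transaction = FrozenSet[str]


def support(itemset: Iterable[str], database: List[Transaction]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    if not database:
        raise ValueError("database must contain at least one transaction")
    items = frozenset(itemset)
    hits = sum(1 for transaction in database if items <= transaction)
    return hits / len(database)


def confidence(x: Iterable[str], y: Iterable[str],
               database: List[Transaction]) -> float:
    """Confidence of the rule X => Y: support(X ∪ Y) / support(X)."""
    antecedent, consequent = frozenset(x), frozenset(y)
    if antecedent & consequent:
        raise ValueError("X and Y must be disjoint")
    support_x = support(antecedent, database)
    if support_x == 0:
        raise ValueError("antecedent never occurs in the database")
    return support(antecedent | consequent, database) / support_x


# Hypothetical toy database of four transactions (illustrative only).
D: List[Transaction] = [
    frozenset({"computer", "software"}),
    frozenset({"computer", "software", "printer"}),
    frozenset({"computer"}),
    frozenset({"printer"}),
]

if __name__ == "__main__":
    print("support(computer) =", support({"computer"}, D))
    print("confidence(computer => software) =",
          confidence({"computer"}, {"software"}, D))
```

A rule would typically be kept only if both values exceed thresholds chosen by the analyst; the sketch leaves those thresholds out because the text does not fix them here.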

The support of an association pattern refers to the percentage of transactions in the transaction database that contain the relevant items. For example, for item A, the support is defined as:

$$\mathrm{Support}(A) = \frac{\text{number of transactions containing } A}{\text{total number of transactions}}$$

Let A and B be sets of items. The confidence is used to measure how often items in B appear in transactions that contain A. Given a set of transaction data, the confidence of $A \Rightarrow B$ is defined as:

$$\mathrm{Confidence}(A \Rightarrow B) = \frac{\mathrm{Support}(A \cup B)}{\mathrm{Support}(A)} \qquad (1)$$

High support means that the item set in the association rule appears very often. High confidence means that the inference of the association rule is reliable. To assure the quality of association rules, we need to set a minimum support threshold, so that the item set in the rule appears often, and a minimum confidence threshold, so that the inference of the rule is reliable.

However, requiring both high support and high confidence also leads to fewer interesting rules, which is a problem for small data sets. Therefore, it is better to refer to users' needs when balancing high parameter values against the number of interesting rules.

A thorough discussion of algorithms in association analysis
is beyond the scope of this paper. The basic algorithm has been applied widely (Berry and Linoff, 1997, Han and Kamber, 2001, Olson and Shi, 2007 and Paolo, 2003).

2.3. Cluster analysis

The primary objective of cluster analysis is to classify objects (respondents, products, events, etc.) so that each object is very similar to others in the cluster (Hair, Anderson, Tatham, & Black, 1998). The resulting clusters of objects should show high internal (within-cluster) homogeneity and high external (between-cluster) heterogeneity (Hair et al., 1998). Cluster analysis may reveal associations and structure in data which, though not previously evident, nevertheless are sensible and useful once found (Clustan, 2008).

Hierarchical cluster analysis (or hierarchical clustering) is a general approach to cluster analysis. A key component of the analysis is the repeated calculation of distance measures between objects, and between clusters once objects begin to be grouped into clusters. The outcome is represented graphically as a dendrogram or tree graph.

The initial data for the hierarchical cluster analysis of $N$ objects is a set of $N(N-1)/2$ object-to-object distances and a linkage function for computation of the cluster-to-cluster distances. The most common algorithms for hierarchical clustering are single linkage, complete linkage, average linkage, the centroid method, and Ward's method. These algorithms differ mainly in how the distance between clusters is calculated (Hair et al., 1998 and Statistics, 2008).

3. Methodology

3.1. Data collection and processing

In Taiwan, a variety of books and textbooks are sold through online bookstores, and university professors are important online shoppers. Moreover, they are the key persons who decide which textbooks are used in class. Most online bookstores promote
books and communicate with customers (e.g., university professors) through email. Because of email spam, the abuse of electronic messaging systems to indiscriminately send unsolicited bulk messages, most customers tend to delete advertising mails directly upon receiving them. Therefore, it is indeed a challenge to attract receivers' interest and encourage them to read the mail.

An effective electronic mail has two requirements. First, the keynote of the mail should entice potential clients. Second, the content of the mail should be advantageous to them. To achieve these two objectives, we have to know clients' interests and expertise in advance. Usually, it is very difficult to acquire customers' individual information. To protect their privacy, most respondents are very sensitive about answering questions relating to their privacy and are unwilling to offer their profiles to companies. Unlike online bookstores, which have well-established customer databases or transaction records, our study has to find an alternative approach to obtain data on customers' interests and expertise.

With the evolution
of information technology, our lives are increasingly inseparable from the Internet. Most of the data on the Internet are public and accessible. Using the Internet is an effective and convenient approach to gaining customers' data. The data collection and data processing are described as follows (the framework is shown in Fig. 1).

Fig. 1. Data collection, processing, and data mining.

3.1.1. Data collection

1. Form a list of scholars and types of expertise

Our study first used the database of the National Science Council, Taiwan (www.nsc.gov.tw), to randomly obtain a list of scholars (university professors) in Taiwan whose research interests are in information technology, and to form a list of types of expertise in information technology. After screening and deleting invalid data, we obtained a list of 200 in both scholars and expertise in the information technology field.

2. Count the numbers of web pages

We used search engines t
