Data Warehousing and Data Mining for Telecommunications. Artech House Computer Science Library. Subjects: database management; data mining; decision support systems; telecommunications; digital communications. Illustrations by Brigitte Kilger-Mattison. Printed and bound in the United States of America. No part of this book may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage and retrieval system, without permission in writing from the publisher.
All terms mentioned in this book that are known to be trademarks or service marks have been appropriately capitalized. Artech House cannot attest to the accuracy of this information. Use of a term in this book should not be regarded as affecting the validity of any trademark or service mark.
This growth has affected every line of business in telecommunications, including the long distance business, the emerging wireless industry, and the changing wireline industry. Competition has entered every one of these businesses as expansion has shifted into high gear. And competition is a key attribute of these businesses in the future. Who will survive and even what the future telecommunications enterprise will look like are the basic questions to be answered over the next several years.
During this same time period the information industry has been undergoing its own explosive growth. Chip speed and storage technology have brought the capability to have multi-terabyte sized databases, even to small- and medium-sized companies. Our understanding regarding the value of and best design for data warehouses has evolved from earlier concepts of Executive Information Systems (EIS).
USCC has grown approximately tenfold in less than five years with over 1. The cellular business has been a duopoly up to now, but new competition from PCS (personal communications services) carriers is entering our markets in an attempt to leapfrog the cellular carriers with new technology and marketing force.
The stage is set then for a new round of growth for the wireless business, but it is also a much more competitive round where timely actionable information and analysis will be increasingly critical.
USCC introduced its first-generation data warehouse just a few years ago. It served primarily as a gathering place for data from various diverse legacy systems. Our access and analysis tools were limited to reporting tools, and the amount of data was limited by our understanding of requirements and potential uses.
The design of our next-generation data warehouse takes advantage of improved hardware capabilities so that terabytes of data will not be far off for us. But more importantly, the very objectives of the data warehouse have evolved. Now we are looking at a full-fledged marketing and customer service tool which will incorporate external data as well as internal business data. More sophisticated tools will provide not only reporting but analysis that will help us to see correlations and ultimately business opportunities we had not previously identified.
The improved technology also helps to make this information available on a scale and at a price that will allow us to extend access to the entire company. By extending this access we can bring the value of this new technology to our customers. We can find opportunities to help our existing customers more advantageously use their service. We can better understand what our customers want by more clearly understanding their usage behavior.
The data warehouse provides us an opportunity to improve our overall service to our customers and thus improve our business. This book provides new concepts in identifying and providing business value through the use of the data warehouse tool.
The focus is not on technology but on what business value can be brought to the enterprise with that technology. James D.

I would like to give credit where credit is due and acknowledge those involved. First and foremost, at least half of the credit for this book needs to go to my wife, Brigitte Kilger-Mattison.
Brigitte was responsible for editing all the material, creating all the graphics, and coordinating all the efforts of everyone else involved in this project.
This book could not have been completed without her painstaking attention to detail, her dedication, and her loyalty. I would also like to thank the two people who reviewed this work as it was in progress, and who provided me with valuable feedback.
Thanks guys. The latter part of this book, which holds the information about data mining tools, could not have been completed without the support of the software vendors themselves.
I would also like to thank my coworkers and associates at Sequent Computer Systems for putting up with me while I pounded through this material.
Thanks for your tolerance through it all.

It is almost inconceivable that it was only years ago when Alexander Graham Bell invented the first functional telephone, a device that allowed two people to talk to each other across a stretch of wire only several yards long.
The impact of telecommunications on our lives cannot be emphasized enough. And yet, there is so much of it, and it is coming so fast, and its impact is so insidious, that we really cannot successfully appreciate exactly how far reaching it is. No one can deny that the business of telecommunications is a big business. It is clearly a modern miracle. Equally clear is the fact that working within a telecommunications firm can be exciting, challenging, and more than a little bit frustrating.
The purpose of this book, in the broadest sense, is to talk about that process, to develop a better understanding of it, and to try to identify those situations where the application of a different kind of technology (computer technology) most effectively supports that bigger mission.
We want to talk about ways to harness the power of computer technologies (specifically data warehousing and data mining) and see how they can supplement the process of turning the technological telecommunications potentials into economic and practical reality for people, businesses, and governments all over the world.
We want to talk about how we can greatly enhance the process of turning ideas into actions and turning those actions into the results that we have all come to expect from telecommunications firms.
The following list provides us with a reference to some of the more prominent segments of the telecommunications industry today. While the humble process of making it possible for Aunt Mabel to talk with Uncle Billy from opposite sides of the world is nice, it just does not seem like such a strong and driving force as to make the telecommunications industry as big, powerful, and important as it is.
Clearly, there is something else involved. That something else is business. While telecommunications has made personal communication available, it is for the most part a wonderful convenience, not a necessity. But when you look at telecommunications and the role it has played in the development of business, and the incredible role it is getting ready to play in the business of the future, you begin to get a much better idea of what the telecommunications and information revolutions are all about.
Think about it for a minute. How has business changed over the past few decades, and what has the role of telecommunications been in making that change possible? Businesses are bigger, more efficient, more global in their perspective and more dominant in all aspects of our lives.
But how did businesses get that way? The stark reality is that telecommunications, in combination with computer innovation, is making it possible for individuals and companies to function at levels never before possible. How much time is spent on the telephone? How dependent is the business on the fax machine, electronic mail, videoconferencing, and conferenced phone calls? What business person can survive without a pager, a cellular phone, and voice mail?
Taken from the big picture, what does this dependence on telecommunications represent? In the simplest terms, it provides efficiency, incredible efficiency never before imagined.
Telecommunications capabilities allow companies to coordinate the activities of thousands, even millions of people located anywhere throughout the world. It makes it possible for the best minds to be applied to the most critical problems without concern for where the person happens to be physically located. It enables people to work at levels of efficiency that are staggering by the work standards of only a decade ago, and it makes it possible for everyone to concentrate on the most critical issues that drive the business: efficiency, speed, accuracy, responsiveness, and completeness.
No corporation or government agency of import is ignoring the role that telecommunications plays. The Internet, interactive video, shop-at-home networks, home shopping services, and telemarketing and direct marketing initiatives all derive from the telecommunications space. Telecommunications is making more products available to more people than ever before. Look at the pervasive nature of credit cards! What about automated teller machines (ATMs) and instant cash through banking cards?
Electronic data interchange (EDI) and other related technologies are making it possible for companies to work together and to coordinate their activities through the simple addition of a telephone line to the equation.
Banking, medicine, government, and almost every other industry are being affected by the telecommunications industry in similar ways. No one anywhere is untouched.
We are no longer limited by space, time, and distance the way we used to be. We are no longer crippled by miscommunication, ignorance, and inefficiency in the ways we used to be.
The new operational paradigm, the one created in no small part by the telecommunications revolution, has been to shift our concerns from the management of things to the management of knowledge itself.
Raymond W. Unlike physical inputs, knowledge is a resource you cannot use up. The more you dispense in your organization, the more you generate. Just like the biblical loaves and fishes—no diminishing returns, only expanding ones.
But we have a long road before we get there. There have been many bumps along the road. Besides the obvious and ever present demands to mix a competitive environment that drives change and a monopolistic environment that promotes low cost and consistency, we also have the problems of defining what the new products and services will be and how they will work. Some are still being tried. Some have resulted in big success. Almost all of them involve the merging, partnering, or cooperation of many large firms.
Will this prove to be a prudent investment? The recent announcement of British Telecom and its assumption of full ownership of MCI creates some interesting competitive possibilities on an incredibly grand scale.
Bell South has recently experienced less than overwhelming results in its attempt to expand fiber networks into the homes of Hawthorne, Florida. Clearly, if you are in the telecommunications industry today, then mergers, cooperatives, and other types of deals are a big part of a successful long-term strategy.
A version of this service is already being offered in the United States, where an unanswered phone call to your home will switch to your cellular phone, and then perhaps to your pager or voice-mail system.
From the very outset, telecommunications firms had a desperate need for hard data and the ability to interpret it. This was a simple wooden board with rows of holes in it. In the earlier days of telecommunications, operators connected all calls. The folk tales state that an undertaker by the name of Almon B. Strowger, who was, coincidentally enough, from Kansas City, became convinced that the local exchange operators were purposely routing all calls for undertaking services to his competitor down the street.
He swore to develop a mechanism that would make these operators obsolete and eventually invented the Strowger switch, a device that allowed for the first automated connecting of two telephones by mechanical means.
What is clear, however, is that these firms are learning quickly and are in many ways surpassing their brothers and sisters in other industries in their ability to figure out how to make the knowledge they possess more available to more people. Of course, if we want to talk about a better way for telecommunications firms to meet their strategic objectives, then we must start with an understanding of what those objectives are.
At the highest level, there are only three strategies open to any business that wants to achieve dominance in its marketplace.
Obviously, the strategy that a particular firm is pursuing will dictate how data warehousing and data mining can help them achieve these ends, but in all cases these technologies represent one of the best ways to accomplish any of them.
In the government-sponsored monopoly days, efficiency, size of infrastructure, and efficiency of operations were the keys to success. With deregulation, however, comes a change in emphasis from a network-is-king philosophy to one that emphasizes the role of the customer.
And for these companies, data warehousing and data mining provide them with the ideal means to exploit it effectively. One tremendous advantage that any telecommunications firm has over its competition and over other industries is the fact that the telecommunications firm knows more about its customers than anyone else.
They know who they are, where they live, where they go, and what they do. Keeping track of customer activities is a byproduct of the service they provide, and figuring out how to capitalize on that knowledge is what data warehousing and data mining are all about.
Marketing databases, customer information systems, enhanced customer service capabilities, predictive behavior models, and integrated marketing strategies are just a few of the tools that the telecommunications firm can use to gain a dominating control over its relationship with customers.
And this kind of customer loyalty is not easily stolen by competitors. The key data stores to support these kinds of activities are customer and transaction based. And the key activities that make exploiting it possible are all marketing based.
We will spend a significant amount of time throughout the rest of this book talking about these activities, and we will show you how they can be exploited quickly and economically. Many of the infrastructural and organizational issues that we will be considering focus on how to put these kinds of operational monitoring and control systems together to the benefit of the entire firm. A company that wants to stay on the leading edge of technology needs to be aware of what all of the pieces of the organization are and what they are doing.
Value-chain-driven warehouses meet these needs. At the same time, technical proficiency requires the optimization of several distinct operational areas. In these cases, analytical warehouses and advanced mining tools can provide engineers and managers with the information necessary to drive their business ahead. This chapter has provided us with a better understanding of the scope, nature, and major issues that drive the telecommunications firm today, as well as with some ideas about where they will be headed in the future.
In the following chapters we will be considering the telecommunications company itself in far greater detail. We will examine ways that the company, the information systems, and the infrastructures can be better utilized to improve profitability and competitiveness to even greater levels of accomplishment. In order to do that, we need to begin by answering some fundamental questions.
If the answers do not make sense, then data warehousing and data mining may not be the kinds of technological solutions you should be considering. Data warehousing has become the hot, new, latest and greatest technology to come along since the invention of the database.
But what is a data warehouse, really? If you have spent any time at all talking with hardware or software vendors or reading magazine articles, you will have discovered pretty quickly that, while everyone thinks that these are great things to do, no one can agree on exactly what they mean or how they should be done. This, of course, presents us with an immediate problem. It is an approach that involves no specific discipline, no science, no clear rules or guidelines, and no solid set of tangibles by which you can describe it.
It is more like a philosophy or a way of looking at things than it is a true data processing enterprise. Yet, there have been dozens of books and hundreds of magazine articles written about it, and millions of dollars spent on promoting it, by hardware vendors, software vendors, conferences, trade shows, and all sorts of dignitaries and gurus.
It is a phenomenon that has become all too common in the data processing industry. Ultimately, the roots of data warehousing can be found in the disciplines of database and data management. For years, in fact since the early s, organizations and theorists have realized that managing information, and the data that makes that information usable, was the driving force behind modern large-scale corporate business enterprises.
Unfortunately, data and information are not easy things to manage. In the s, the first major attempt to provide for the wholesale management of data was made. Specialized software, called database management systems, was created to help businesses get control over the vast landscape of data that they were being forced to manage.
Some databases were as large as several hundred megabytes in size. Then, in the mids came the invention of the concept of a relational database. Relational databases were different from these earlier versions because they allowed people to gain more access in more ways with much less dependence on programmers. These relational databases took the industry by storm, to the point where almost no other types of databases even exist anymore.
While everyone was still reeling from the shock waves that the relational database revolution created for the developers and users of business systems, we found ourselves in the throes of yet another revolution.
Suddenly, people found themselves with powerful personal computers on their desktops. These computers had more power in them than the older mainframe systems used to have back in the early days of data processing. Not only were they powerful, but they allowed the business person to become his or her own programmer and database administrator (DBA).
The result was a major shift in our comprehension of how information could be managed differently. One good thing that came out of the client-server revolution was an understanding of what the true potential of the combination of dynamic networks, low-cost UNIX database servers, and intelligently loaded personal computers might be in order to change the shape of the corporate working environment.
The bad thing was that we discovered, once again, that the management of the data that drove these processes became the weakest link and the most limiting factor, keeping us from realizing the full potential that was being offered. It is upon this scene that the data warehousing revolution presented itself.
The old way of viewing systems as online transaction processing systems, or reporting systems, or consolidation systems, all failed to let us organize things in a way that really put together all of the pieces that the last several generations of innovation had been delivering to us. This is why, when you look at it on the surface, data warehousing seems to be such an amorphous, undefined, intangible concept.
It seems that way because it has to be. It seems that way because it is really an approach that tries to take all of these capabilities, learn from all of the failures and problems we have experienced in the past, and put them together in a way that makes sense out of it all.
Every theory of data management before data warehousing was driven by several fundamental principles. For over 40 years the industry has been a slave to these principles.
What is truly revolutionary about warehousing is the way that it discards or drastically alters the attention we give to these principles.

The elimination of data redundancy and the minimization of disk storage space: Everybody who was anybody agreed that the primary goal of any data management exercise was to try to figure out how to minimize the amount of data that was being stored. It was never, or hardly ever, permissible to store the same data element (name, address, sales code, etc.) more than once.
A successful activity was one where hours, days, and sometimes even months were spent trying to figure out all of the ways that people might want to use data, or all of the theoretically proper ways to store it, and then go ahead and store it that way once and for all.

The use of entity relationship and normalization modeling techniques: These two techniques became the hallmark of the database design process, and both techniques demand that a theoretical discipline (not a business-oriented one) determine what goes into the database.

The dependence on the systems development life cycle and JAD (joint application development) sessions as the means of designing systems: These techniques represent the sum total of several decades of experience in the building of computer systems.
Unfortunately, like the database design disciplines mentioned above, they too assume that you are building a type of system that is not part of the new organizational paradigm that data warehousing systems require.
Where the old paradigm made the elimination of data redundancy and the optimization of disk storage space the key to design, data warehousing says that the duplication of data is okay, and in many cases a good thing to do.
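As a rough illustration of why duplicated data can be acceptable, the following minimal sketch builds a tiny star-schema layout (one fact table surrounded by denormalized dimension tables) using Python's built-in `sqlite3` module. The table names, columns, and sample rows are all hypothetical and are not taken from the book; the region name is deliberately repeated on every customer row instead of being normalized into its own table.

```python
import sqlite3

# Hypothetical star schema: one fact table of calls plus two
# denormalized dimension tables. All names and rows are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY,
    name   TEXT,
    city   TEXT,   -- repeated for every customer in the same city
    region TEXT    -- duplicated on purpose rather than normalized away
);
CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY,
    month TEXT,
    year  INTEGER
);
CREATE TABLE fact_call (
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    date_key     INTEGER REFERENCES dim_date(date_key),
    minutes REAL,
    revenue REAL
);
""")
conn.executemany("INSERT INTO dim_customer VALUES (?, ?, ?, ?)", [
    (1, "A. Jones", "Chicago", "Midwest"),
    (2, "B. Smith", "Chicago", "Midwest"),
    (3, "C. Lee", "Tulsa", "South"),
])
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)", [
    (1, "Jan", 1997),
    (2, "Feb", 1997),
])
conn.executemany("INSERT INTO fact_call VALUES (?, ?, ?, ?)", [
    (1, 1, 30.0, 6.0), (2, 1, 10.0, 2.0), (3, 1, 20.0, 4.0),
    (1, 2, 15.0, 3.0), (3, 2, 5.0, 1.0),
])


def revenue_by_region(connection):
    """Sum revenue per region and month with one join per dimension."""
    return connection.execute("""
        SELECT c.region, d.month, SUM(f.revenue)
        FROM fact_call f
        JOIN dim_customer c ON c.customer_key = f.customer_key
        JOIN dim_date d ON d.date_key = f.date_key
        GROUP BY c.region, d.month
        ORDER BY c.region, d.month
    """).fetchall()


print(revenue_by_region(conn))
```

The trade-off in this sketch is the one the text describes: some disk space is spent on repeated values so that a business question needs only a single, shallow join per dimension.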
The Star Schema design paradigm replaces the older approaches.

Subject-oriented: Provides a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process.
Integrated: Constructed by integrating multiple, heterogeneous data sources. Integration tasks handle naming conventions and the physical attributes of data.
Time-variant: Data are only accurate and valid at some point in time or over some time interval. The time horizon for the data warehouse is significantly longer than that of operational systems. An operational database provides current-value data, whereas data warehouse data provide information from a historical perspective.
Nonvolatile: The data warehouse is relatively static in nature. It is not updated in real time; data in the data warehouse are loaded and refreshed from operational systems and are not updated by end users.

Data warehousing helps business managers to:
Extract data from various source systems on different platforms.
Transform huge data volumes into meaningful information.
Analyze integrated data across multiple business dimensions.
Provide access to the analyzed information to business users anytime, anywhere.
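The extract, transform, and analyze steps listed above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the two source extracts, their field names, and their units are invented here to show how integration reconciles naming conventions and physical attributes before the data are analyzed along a business dimension.

```python
from collections import defaultdict

# Two hypothetical source extracts with different naming conventions
# and units (cents versus dollars). None of this comes from a real system.
billing_rows = [
    {"cust_id": "C1", "region": "midwest", "amount_cents": 1250},
    {"cust_id": "C2", "region": "South", "amount_cents": 800},
]
switch_rows = [
    {"CustomerNo": "C1", "Region": "MIDWEST", "charge_usd": 4.50},
    {"CustomerNo": "C3", "Region": "south", "charge_usd": 2.00},
]


def transform_billing(row):
    """Map a billing record onto the shared warehouse layout."""
    return {
        "customer": row["cust_id"],
        "region": row["region"].title(),
        "revenue": row["amount_cents"] / 100,  # cents -> dollars
    }


def transform_switch(row):
    """Map a switch record onto the same shared layout."""
    return {
        "customer": row["CustomerNo"],
        "region": row["Region"].title(),
        "revenue": row["charge_usd"],
    }


def load(billing, switch):
    """Integrate both sources into one list of uniform records."""
    return ([transform_billing(r) for r in billing]
            + [transform_switch(r) for r in switch])


def revenue_by(rows, dimension):
    """Total revenue along one dimension, e.g. 'region' or 'customer'."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[dimension]] += row["revenue"]
    return dict(totals)


warehouse = load(billing_rows, switch_rows)
print(revenue_by(warehouse, "region"))
print(revenue_by(warehouse, "customer"))
```

Once the records share one layout, the same `revenue_by` helper answers questions along any dimension without caring which source system a record came from.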
OLTP vs. Data Warehouse
Online transaction processing (OLTP) systems are tuned for known transactions and workloads, while the workload of a data warehouse is not known a priori.
OLTP applications normally automate the clerical data processing tasks of an organization, such as data entry and enquiry and transaction handling.
Data warehouse: query throughput is the performance metric; hundreds of users; managed by subsets.
Why Data Mining?
Because it can improve customer service, better target marketing campaigns, identify high-risk clients, and improve production processes. In short, because it can help you or your company make or save money.

Data mining has been used to:
Identify unexpected shopping patterns in supermarkets.
Optimize website profitability by making appropriate offers to each visitor.
Predict customer response rates in marketing campaigns.
Define new customer groups for marketing purposes.
Predict customer defections: which customers are likely to switch to an alternative supplier in the near future.
Distinguish between profitable and unprofitable customers.
Identify suspicious (unusual) behavior, as part of a fraud detection process.

Data analysis and decision support
Market analysis and management: target marketing, customer relationship management (CRM), market basket analysis, cross-selling, market segmentation.
Risk analysis and management: forecasting, customer retention, improved underwriting, quality control, competitive analysis.
Fraud detection and detection of unusual patterns (outliers).
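Market basket analysis, mentioned in the list above, can be illustrated with a small support and confidence calculation. The baskets below are made-up sample data, and the helper names `pair_support` and `confidence` are hypothetical; a real study would use a dedicated association-rule algorithm over far larger transaction sets.

```python
from collections import Counter
from itertools import combinations

# Made-up supermarket baskets, one set of items per transaction.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]


def pair_support(transactions):
    """Fraction of transactions that contain each pair of items."""
    counts = Counter()
    for basket in transactions:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n / len(transactions) for pair, n in counts.items()}


def confidence(transactions, antecedent, consequent):
    """Share of baskets containing `antecedent` that also hold `consequent`."""
    with_antecedent = [b for b in transactions if antecedent in b]
    if not with_antecedent:
        return 0.0
    hits = sum(1 for b in with_antecedent if consequent in b)
    return hits / len(with_antecedent)


support = pair_support(baskets)
print(support[("beer", "diapers")])
print(confidence(baskets, "diapers", "beer"))
```

Pairs with high support and high confidence are the candidates a marketer would examine for cross-selling.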
Other Applications
Text mining (news groups, email, documents) and Web mining.
Stream data mining.
Bioinformatics and bio-data analysis.

Market Analysis and Management
Where does the data come from? Credit card transactions, loyalty cards, discount coupons, customer complaint calls, plus public lifestyle studies.
Target marketing: Find clusters of model customers who share the same characteristics, such as interest, income level, and spending habits.
Determine customer purchasing patterns over time.
Identify the best products for different groups of customers.
Predict what factors will attract new customers.
Provision of summary information: multidimensional summary reports; statistical summary information (data central tendency and variation).

Finance planning and asset evaluation: cash flow analysis and prediction; contingent claim analysis to evaluate assets.
Resource planning: summarize and compare the resources and spending.
Competition: monitor competitors and market directions; group customers into classes and set a class-based pricing procedure; set pricing strategy in a highly competitive market.
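Finding clusters of customers who share characteristics, as described under target marketing, is commonly approached with a clustering algorithm. The sketch below uses a plain k-means loop as one possible technique; the notes do not name a specific method. The customer data (income in thousands, monthly spend) and the choice of starting centroids are assumptions made purely for illustration, and real data would normally be rescaled so that no single attribute dominates the distance.

```python
def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    labels = []
    for p in points:
        distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
        labels.append(distances.index(min(distances)))
    return labels


def update(points, labels, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    moved = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if not members:
            moved.append(old)  # keep an empty cluster where it was
        else:
            moved.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
    return moved


def kmeans(points, initial_centroids, max_iter=20):
    """Alternate assignment and update steps until nothing changes."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = assign(points, centroids)
    for _ in range(max_iter):
        new_centroids = update(points, labels, centroids)
        new_labels = assign(points, new_centroids)
        if new_labels == labels and new_centroids == centroids:
            break
        centroids, labels = new_centroids, new_labels
    return labels, centroids


# Hypothetical customers: (income in thousands, monthly spend).
customers = [(30, 20), (32, 25), (35, 22), (80, 90), (85, 95), (78, 88)]
labels, centroids = kmeans(customers, [customers[0], customers[3]])
print(labels)
print(centroids)
```

Each resulting cluster can then be treated as a candidate customer group for a separate offer or pricing class.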
Auto insurance: ring of collisions.
Money laundering: suspicious monetary transactions.
Medical insurance: professional patients, ring of doctors, and ring of references; unnecessary or correlated screening tests.
Telecommunications: phone-call fraud. Phone call model: destination of the call, duration, time of day or week. Analyze patterns that deviate from an expected norm.

Data Selection
Once you have formulated your informational requirements, the next logical step is to collect and select the data you need.
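Returning to the phone-call fraud model above (destination, duration, time of day), one simple way to "analyze patterns that deviate from an expected norm" is to build a per-customer profile from past calls and flag new calls that fall outside it. The sketch below is an assumption-laden illustration: the call history, the three-standard-deviation threshold, and the helper names are all hypothetical, and production fraud systems use considerably richer models.

```python
from statistics import mean, stdev

# Hypothetical call history for one customer:
# (destination prefix, duration in minutes, hour of day).
history = [
    ("312", 4.0, 10), ("312", 6.0, 11), ("773", 5.0, 14),
    ("312", 3.0, 9), ("773", 7.0, 16), ("312", 5.0, 13),
]


def build_profile(calls):
    """Summarize the customer's normal calling behavior."""
    durations = [duration for _, duration, _ in calls]
    hours = [hour for _, _, hour in calls]
    return {
        "destinations": {dest for dest, _, _ in calls},
        "mean_duration": mean(durations),
        "stdev_duration": stdev(durations),
        "hours": (min(hours), max(hours)),
    }


def deviations(profile, call, z_threshold=3.0):
    """Return the reasons, if any, why a call departs from the profile."""
    dest, duration, hour = call
    reasons = []
    if dest not in profile["destinations"]:
        reasons.append("new destination")
    spread = profile["stdev_duration"]
    if spread > 0 and abs(duration - profile["mean_duration"]) / spread > z_threshold:
        reasons.append("unusual duration")
    low, high = profile["hours"]
    if not low <= hour <= high:
        reasons.append("unusual hour")
    return reasons


profile = build_profile(history)
print(deviations(profile, ("011", 55.0, 3)))
print(deviations(profile, ("312", 5.5, 12)))
```

A call that trips several checks at once is a stronger candidate for review than one that trips a single check.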
Setting up a KDD activity is also a long-term investment. A data environment will need to be refreshed from operational data on a regular basis; therefore, investing in a data warehouse is an important aspect of the whole process.

Cleaning
Almost all databases in large organizations are polluted, and when we start to look at the data from a data mining perspective, ideas concerning consistency of data change. Therefore, before we start the data mining process, we have to clean up the data as much as possible, and this can be done automatically in many cases.
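A small example of the kind of automatic cleaning described above is normalizing and de-duplicating customer records. The records, field names, and matching rule (same normalized name and phone number) in this sketch are assumptions chosen for illustration; real cleaning rules depend heavily on the data at hand.

```python
import re

# Hypothetical polluted customer records with inconsistent formatting.
raw_records = [
    {"name": "  John  Smith ", "phone": "(312) 555-0101", "city": "Chicago"},
    {"name": "john smith", "phone": "312-555-0101", "city": "chicago"},
    {"name": "Mary Jones", "phone": "773.555.0199", "city": ""},
    {"name": "Mary Jones", "phone": "7735550199", "city": "Evanston"},
]


def normalize(record):
    """Standardize spacing, capitalization, and phone formatting."""
    name = " ".join(record["name"].split()).title()
    phone = re.sub(r"\D", "", record["phone"])  # keep digits only
    city = record["city"].strip().title() or None  # empty string -> None
    return {"name": name, "phone": phone, "city": city}


def clean(records):
    """Merge records that share a normalized name and phone number."""
    merged = {}
    for record in map(normalize, records):
        key = (record["name"], record["phone"])
        if key not in merged:
            merged[key] = record
            continue
        # Fill any field the first copy was missing from the duplicate.
        for field, value in record.items():
            if merged[key][field] is None:
                merged[key][field] = value
    return list(merged.values())


cleaned = clean(raw_records)
print(cleaned)
```

Here four raw rows collapse into two customers, and the missing city for the second customer is recovered from its duplicate.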
Enrichment
Matching the information from bought-in databases with your own databases can be difficult. A well-known problem is the reconstruction of family relationships in databases. In a relational environment, we can simply join this information with our original data.

Data mining refers to extracting or mining knowledge from large amounts of data. Many people treat data mining as a synonym for another popularly used term, Knowledge Discovery in Databases, or KDD.
Alternatively, others view data mining as simply an essential step in the process of knowledge discovery.

Reporting
Reporting involves two functions: (1) analysis of the results and (2) application of the results. Visualization and knowledge representation techniques are used to present the mined knowledge to the user.

Operational Data Sources
These may include network databases, private workstations and servers, and external systems (the Internet, commercially available databases).

Automated prediction of trends and behaviours: Data mining automates the process of finding predictive information in large databases.
For example, consider a marketing company. Here, data mining uses past promotional mailings to identify the targets most likely to maximize the return.
Automated discovery of previously unknown patterns: Data mining sweeps through the database and identifies previously hidden patterns.
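The promotional-mailing example can be made concrete with a very small calculation: estimate each segment's response rate from a past mailing, then mail only the segments whose expected return exceeds the cost of a mail piece. The segments, outcomes, cost, and value figures below are invented for illustration, and a real predictive model would use many more attributes than a single segment label.

```python
from collections import defaultdict

# Hypothetical results of a past mailing: (customer segment, responded?).
past_mailing = [
    ("young_urban", True), ("young_urban", False),
    ("young_urban", True), ("young_urban", True),
    ("retired", False), ("retired", False),
    ("retired", True), ("retired", False),
    ("family", True), ("family", False),
]


def response_rates(mailing):
    """Observed response rate per segment in the past mailing."""
    sent = defaultdict(int)
    responded = defaultdict(int)
    for segment, did_respond in mailing:
        sent[segment] += 1
        if did_respond:
            responded[segment] += 1
    return {segment: responded[segment] / sent[segment] for segment in sent}


def pick_targets(rates, cost_per_piece, value_per_response):
    """Keep segments whose expected value per piece exceeds its cost."""
    return sorted(
        segment for segment, rate in rates.items()
        if rate * value_per_response > cost_per_piece
    )


rates = response_rates(past_mailing)
targets = pick_targets(rates, cost_per_piece=1.0, value_per_response=3.0)
print(rates)
print(targets)
```

With these made-up numbers, the retired segment is dropped because its expected return per piece (0.75) is below the mailing cost (1.0).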