Data Mining with Microsoft SQL Server 2008 Review Chapter 12
Neural networks continue to fascinate people because of the long history of trying to model the brain. That history ties to the Artificial Intelligence community, where people continue serious work on mimicking intelligence in machines. Some say the effort has died, and perhaps modeling a machine to one user is not the end goal for everyone. Rather, I believe the Internet itself is a form of collective intelligence, and therefore the ideas may transcend a single processor, user, or application.
Jeff Hawkins, developer of the Palm Pilot, has been personally interested in modeling the brain, as he describes in his book On Intelligence. In this book, Hawkins describes the brain as a six-layered network of neurons, and he provides some speculation on how the layers interact with one another. I use the word speculation intentionally because the scientific method requires science fiction. Speculations and hypotheses are essentially science fiction, and researchers test their fictional but plausible ideas in an attempt to learn new scientific facts. Even peer-reviewed journals generally contain a mixture of science fact and science fiction, hopefully more of the former than the latter. Because the scientific community rewards the first recorded discoverer, there is an economic incentive (if only the economy of fame) to be first, and hence a market for science fiction outside the novels and movies that people know and love. Modeling the brain continues to be such an exercise, likely prompted by the known capacities of the mind and by the known energy requirements and physical processing efficiency of the brain. I happen to be among those who believe in a mind which transcends the brain, a conclusion supported by both science fact and science fiction, and one I believe is conclusively determined through logic. When smart people compete, therefore, I do not believe they are competing with their brains but with their minds.
The brain inside Microsoft Neural Networks does not have the six-layer capacity of the human brain; it has only three layers (input, hidden, and output). Statisticians widely believe that the output from neural networks is hard to interpret. The technique is generally robust (as the authors claim on page 377) for situations where there are fewer cases but many input variables. This algorithm should be used for text analysis (at least as a competitor), and the book uses a text mining example to illustrate it through DMX.
As with previous chapters, we can add DMX code to make the book’s example read data from our SQL Server database:
// Create a data source using the utility stored procedure
// provided with the book
CALL ASSprocs.CreateDataSource('Chapter 12', 'Provider=SQLNCLI10.1;Data Source=localhost;Integrated Security=SSPI;Initial Catalog=DM2008_Chapter12', 'ImpersonateCurrentUser', '', '')
GO
The DMX code eventually includes a system call to obtain results. The following chart shows those results in SQL Server Management Studio when connected to the Analysis Services database:
Though I generally avoid large graphics, I like this one since it, in one command, displays the results of three different data mining models using three different algorithms: logistic regression (which is a special case of neural networks, and therefore included in this same chapter, but still considered a different algorithm), decision trees, and naive Bayes. The text mining task focuses on predicting the political party based on words and phrases from a speech. In this example, we did not go through Integration Services to determine these phrases, but we could have. The authors provided the terms already made (you can see http://sqlserverdatamining.com for a tutorial on text mining). This example focuses on the modeling part after we have the terms.
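To make the "special case" remark concrete, here is a minimal Python sketch (not DMX, and not Microsoft's implementation). It assumes a plain feed-forward network with sigmoid units throughout; the function names, weights, and input vector are hypothetical and chosen only for illustration. With no hidden layer, the network's forward pass reduces to the logistic regression formula.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logistic_regression(x, weights, bias):
    """Classic logistic regression: sigmoid of a weighted sum of the inputs."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def neural_network(x, hidden_layers, output_weights, output_bias):
    """Feed-forward pass. hidden_layers is a list of (weight_rows, biases);
    an empty list means the inputs connect directly to the output unit."""
    activations = list(x)
    for weight_rows, biases in hidden_layers:
        activations = [
            sigmoid(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weight_rows, biases)
        ]
    total = sum(w * a for w, a in zip(output_weights, activations))
    return sigmoid(total + output_bias)

# Hypothetical inputs and weights, for illustration only.
x = [1.0, 0.0, 1.0]
weights = [0.8, -0.4, 0.3]
bias = -0.5

p_lr = logistic_regression(x, weights, bias)
p_nn = neural_network(x, [], weights, bias)  # no hidden layer
print(p_lr, p_nn)  # the two probabilities are identical
```

Adding even one hidden layer breaks this equivalence, which is why the two are still treated as different algorithms.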
Note in the graph that the pass and fail rates differ depending on the algorithm. Because you can make models run simultaneously in the same structure, it is a data mining best practice to test models against one another. That concept makes sense at this point in the book, which earlier discussed decision trees and naive Bayes in separate chapters. The focus, again, for data mining is gaining insight toward actionable results. Data mining does not make decisions for us, but informs us on how to make better decisions. Based on the results above, we might conclude that the input data does not have strong predictive value for political parties. The authors do compare the success rate among algorithms (page 377), though I believe more people would be concerned about the overall success rate. And, for people who like politics, I have to concede that this is not intended as the most robust example. However, it does run quickly, and with more data I would NOT shy away from tackling larger problems.
In this case, each input variable maps to a unique word or phrase from the speeches. A more involved process would study which words or phrases had more predictive value for political parties and perhaps filter out the others. Note that the input covers American politics over a 200-year period, and therefore we have the issue of parties changing ideas over time. In current American politics we could run a similar example across a presidential election year, since politicians speak many words during a campaign, certainly enough to distinguish not just among parties, but perhaps among politicians. Some might use this ability to cluster politicians together based on topics of interest.
The number of words I am using to describe this problem may seem like a lot, but consider that text mining values words and phrases as the currency of expression. Beyond political speeches, here are some other ideas for text mining:
- legal documents
- executive mandates and communications
- judicial opinions
- resumes (vitae)
- news bulletins
- scientific documentation
- tweets (Twitter postings)
- blog postings
- social network profiles
As with the text mining example, we also need one or more demographic variables representing the source to be the target variable (or attribute).
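To picture what the modeling step receives, here is a rough Python sketch (not DMX, and not the Integration Services term extraction). It turns each document into one case: a target attribute plus a true/false flag per term. The vocabulary, party labels, and speech text are hypothetical, invented only for illustration.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def term_flags(text, vocabulary):
    """Return {term: True/False} marking which terms occur in the text.
    Padding with spaces keeps 'tax' from matching inside 'taxes'."""
    padded = " " + " ".join(tokenize(text)) + " "
    return {term: f" {term} " in padded for term in vocabulary}

# Hypothetical terms and speeches, invented for illustration only.
vocabulary = ["taxes", "free trade", "public schools"]
speeches = [
    {"party": "Party A", "text": "We will lower taxes and defend free trade."},
    {"party": "Party B", "text": "Our public schools deserve better funding."},
]

# One case per speech: the target attribute plus one input flag per term.
cases = [
    {"party": s["party"], **term_flags(s["text"], vocabulary)}
    for s in speeches
]

for case in cases:
    print(case)
```

For the other document types in the list, the target would be whatever attribute describes the source, such as the author, court, or publication.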
The second DMX example shows the use of a nested table for input. These algorithms (neural network and logistic regression) cannot predict a nested value (as the association algorithm can), but the ability to accept nested input variables increases the possibilities. I have said before that an advantage of the SQL Server Data Mining solution is its support for nested input, unlike many competing products which require a single flattened table. For this second example, the last DMX statement filters for cases where the predicted value does not match the actual value, and also returns the predicted probability. I believe adding this final clause to the last DMX statement helps the output:
ORDER BY PredictProbability([Home Ownership], T.[Home Ownership]) DESC
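For readers less familiar with DMX, the logic of that statement can be sketched in Python: keep only the cases the model got wrong, then sort them by the probability the model assigned to the actual value, highest first. The rows and probabilities below are hypothetical stand-ins, not the book's output.

```python
# Hypothetical scored cases: the actual value plus the model's probability
# for each possible state of Home Ownership. Invented for illustration only.
rows = [
    {"id": 1, "actual": "Yes", "probs": {"Yes": 0.45, "No": 0.55}},
    {"id": 2, "actual": "No",  "probs": {"Yes": 0.20, "No": 0.80}},
    {"id": 3, "actual": "No",  "probs": {"Yes": 0.90, "No": 0.10}},
    {"id": 4, "actual": "Yes", "probs": {"Yes": 0.30, "No": 0.70}},
]

def predicted(row):
    """The predicted state is the one with the highest probability."""
    return max(row["probs"], key=row["probs"].get)

# Filter: keep cases where the prediction does not match the actual value.
missed = [r for r in rows if predicted(r) != r["actual"]]

# Order by the probability assigned to the actual value, descending,
# so the near misses come first and the confident errors come last.
missed.sort(key=lambda r: r["probs"][r["actual"]], reverse=True)

for r in missed:
    print(r["id"], r["actual"], predicted(r), r["probs"][r["actual"]])
```

Sorting this way puts the borderline errors at the top, which is often where a small change in the inputs would flip the prediction.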
The output starts with the following observations:
The authors also include a solution that uses the sample data to create a single neural network data mining model. The data is the same customer data, and the goal is predicting home ownership. I have shown only a few BIDS screens during this review, and this time I am going to share two of the views that surface for Neural Network modeling. The first screen is the Microsoft Neural Network Viewer, and the second one is the Microsoft Generic Content Tree Viewer (I would have shortened the name of the latter to “Microsoft Generic Viewer” or “Microsoft Tree Viewer” or even “Microsoft Node Viewer”). The generic viewer shows a numerical breakdown of the model structure, and that view might appeal to people wanting to quickly see what numbers these data mining models produce. The MSDN documentation elaborates on specific output, but sometimes seeing the actual results creates a comfort level, especially for applied statisticians and data analysts.
For a final illustration, this chapter returns to the linear separability data from chapter 6. This data is not linearly separable, and thus many of the data mining algorithms have problems separating groups which we can visually separate. This data includes 40 observations, and includes both the numeric (x,y) coordinates, and the discretized (categorized) equivalents. The goal is to predict the output of the shape, either a PLUS or a SQUARE shape. The following figure illustrates the source data.
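As a rough illustration of what "not linearly separable" means, the Python sketch below uses a four-point XOR-style toy set with PLUS and SQUARE labels. This is an assumption for illustration, not the book's 40-observation data. It searches a grid of single straight-line boundaries and then evaluates a tiny network with one hidden layer whose weights are set by hand rather than trained.

```python
from itertools import product

# Toy XOR-style layout (hypothetical, not the book's data):
# opposite corners share a shape, so no single line separates them.
points = [((0, 0), "SQUARE"), ((0, 1), "PLUS"), ((1, 0), "PLUS"), ((1, 1), "SQUARE")]

def step(z):
    return 1 if z > 0 else 0

def linear_predict(x, y, w1, w2, b):
    """One straight-line boundary: PLUS on one side, SQUARE on the other."""
    return "PLUS" if step(w1 * x + w2 * y + b) else "SQUARE"

def hidden_layer_predict(x, y):
    """Two hidden units (OR and AND) feeding one output unit."""
    h_or = step(x + y - 0.5)
    h_and = step(x + y - 1.5)
    return "PLUS" if step(h_or - h_and - 0.5) else "SQUARE"

def count_correct(predict):
    return sum(predict(x, y) == shape for (x, y), shape in points)

# Try every line with weights and bias on a grid from -3 to 3 in steps of 0.5.
grid = [i * 0.5 for i in range(-6, 7)]
best_linear = max(
    count_correct(lambda x, y: linear_predict(x, y, w1, w2, b))
    for w1, w2, b in product(grid, repeat=3)
)

print("best single-line boundary:", best_linear, "of", len(points))
print("hidden-layer network:", count_correct(hidden_layer_predict), "of", len(points))
```

No line on the grid classifies all four points, while the hidden layer combines two lines into a boundary that does.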
Using the viewers, it’s difficult to interpret the result when I run several models. You can try the problem yourself and see what the default viewers show (I will not include those images in this blog post). As a general rule, I believe people doing serious production data mining should learn to use DMX to get whatever numbers they require. I like and appreciate the included viewers, which are great for making PowerPoint and Visio presentations. I return to the authors’ recommended call to SystemGetAccuracyResults. In my run I used four data mining models:
- Decision Trees with categorized input
- Naive Bayes with categorized input
- Neural Networks with categorized input
- Neural Networks with numeric input
Using categorized input simplifies the task, but only the Neural Network models provided anything close to reasonable predictions, as illustrated in the following table.
I hesitate when, in the machine learning world, people talk about the “best classification algorithm”. I often hear people dismissing neural networks as lacking predictive value. In the business world, the mistaken impressions of some create competitive advantages for others. Never leave any tool in the toolbox — but if you do, have an empirical reason for doing so.
The MSDN Documentation provides excellent information online about the Neural Network algorithm:
- Mining Model Content for Neural Network Models
- Microsoft Neural Network Algorithm
- Microsoft Neural Network Algorithm Technical Reference
- Viewing a Mining Model with the Microsoft Neural Network Viewer
- Querying a Neural Network Model
The MSDN Documentation provides excellent information online about the Logistic Regression algorithm:
- Mining Model Content for Logistic Regression Models
- Microsoft Logistic Regression Algorithm
- Microsoft Logistic Regression Algorithm Technical Reference
- Querying a Logistic Regression Model
MacLennan, J., Tang, Z., & Crivat, B. (2009). Data Mining with Microsoft SQL Server 2008. Indianapolis, IN: Wiley Publishing Inc.