Data Mining with Microsoft SQL Server 2008 Review Chapter 17
This chapter covers extending data mining. Specifically, it deals not with user interfaces directly, but with developer interfaces and machine learning algorithms. Perhaps the previous programming chapter was intended as the user interface extension. Consider the common Model-View-Controller paradigm:
- Model — the machine learning algorithm (this chapter)
- View — the user interfaces (supposedly the last chapter)
- Controller — the means of achieving a goal (which at least will bring Analysis Services and perhaps SQL Server to the table)
In this three-part division, the chapter says little about the view as such, though it does talk about viewers. It does not have any specific exercises or user code, since a lot of information and resources are online. I believe that the material in this chapter could fill another book. The sections below describe the main online resources for extending this technology. I have reordered the links and topics from the chapter into an order that makes more sense to me.
SQL Server Data Mining (Team Website)
Some helpful resources exist on the team website. See especially the links “Tutorials” and “Whitepapers & Articles”. See http://www.sqlserverdatamining.com/ssdm/
About PMML (Predictive Model Markup Language) version 2.1
From the website:
The Predictive Model Markup Language (PMML) is an XML-based language which provides a way for applications to define statistical and data mining models and to share models between PMML compliant applications.
PMML provides applications a vendor-independent method of defining models so that proprietary issues and incompatibilities are no longer a barrier to the exchange of models between applications. It allows users to develop models within one vendor’s application, and use other vendors’ applications to visualize, analyze, evaluate or otherwise use the models. Previously, this was very difficult, but with PMML, the exchange of models between compliant applications is now straightforward.
Analysis Services supports version 2.1 of PMML. Click here for my extended analysis of PMML support by SQL Server 2008 R2.
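To make the "XML-based" description concrete, here is a minimal Python sketch that reads a PMML-style document and lists its data fields and models. The XML fragment is hand-written and heavily simplified for illustration; it is not output from Analysis Services, and the names BuyerTree, Age, Income, and Buyer, as well as the function summarize_pmml, are hypothetical. A real PMML 2.1 file carries far more detail.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified PMML-style fragment (an assumption for
# illustration only; not produced by Analysis Services).
PMML_TEXT = """
<PMML version="2.1">
  <Header copyright="example"/>
  <DataDictionary numberOfFields="3">
    <DataField name="Age" optype="continuous"/>
    <DataField name="Income" optype="continuous"/>
    <DataField name="Buyer" optype="categorical"/>
  </DataDictionary>
  <TreeModel modelName="BuyerTree" functionName="classification">
    <MiningSchema>
      <MiningField name="Age"/>
      <MiningField name="Income"/>
      <MiningField name="Buyer" usageType="predicted"/>
    </MiningSchema>
  </TreeModel>
</PMML>
"""


def summarize_pmml(text):
    """Return the PMML version, the declared data fields, and the models."""
    root = ET.fromstring(text.strip())

    # Map each declared field name to its operational type.
    fields = {f.get("name"): f.get("optype") for f in root.iter("DataField")}

    # Treat any top-level element whose tag ends in "Model" as a model.
    models = []
    for child in root:
        if child.tag.endswith("Model"):
            predicted = [
                m.get("name")
                for m in child.iter("MiningField")
                if m.get("usageType") == "predicted"
            ]
            models.append(
                {
                    "type": child.tag,
                    "name": child.get("modelName"),
                    "function": child.get("functionName"),
                    "predicted": predicted,
                }
            )

    return {"version": root.get("version"), "fields": fields, "models": models}


if __name__ == "__main__":
    print(summarize_pmml(PMML_TEXT))
```

Because the model is plain XML, any tool that can parse XML can inspect it, which is the point of the vendor-independent exchange described above.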
SQL Server Books Online 2008 R2
From Microsoft: “This set of documentation helps you understand SQL Server, and how to implement data management and business intelligence projects. SQL Server includes several data management and analysis technologies.” In this context, SQL Server refers not only to the transactional database, but also to the associated business intelligence technologies: Integration Services, Analysis Services, and Reporting Services. The data mining technology is part of Analysis Services. See http://msdn.microsoft.com/en-us/library/ms130214.aspx
Other online articles include:
- Add Custom Data Mining Algorithms to SQL Server 2005
- SQL Server Data Mining: Plug-In Algorithms
- A Tutorial for Constructing a Plug-in Algorithm
- A Tutorial for Constructing a Plug-in Viewer
Each version of SQL Server has been accompanied by a free downloadable “Feature Pack” (sometimes updated during the product's support lifetime). I do not know how well the 2005 tools work with SQL Server 2008 or 2008 R2, though I would have enough confidence to try to make them work. The tools for 2005 include:
- Data Mining Managed Plug-in Algorithm API for SQL Server 2005
- Microsoft SQL Server 2005 Datamining Viewer Controls
The tools for 2008 include:
- Microsoft SQL Server 2008 Data Mining Add-ins for Microsoft Office 2007
- Microsoft SQL Server 2008 Datamining Viewer Controls
- Microsoft Windows PowerShell Extensions for SQL Server
The tools for 2008 R2 include:
- Microsoft® SQL Server® PowerPivot for Microsoft® Excel
- Microsoft® Windows PowerShell Extensions for SQL Server® 2008 R2
- Microsoft® SQL Server® 2008 Data Mining Add-ins for Microsoft® Office 2007
- Microsoft® Datamining Viewer Controls For Microsoft® SQL Server® 2008
Extending Visual Studio
Business Intelligence Development Studio (BIDS) is an extension for Visual Studio. BIDS can be extended, and at minimum you should be using the free BIDS Helper. Visual Studio is a product which lives within a shell which anyone could freely use (under license terms) to fill with other content or types of interfaces. Two ways to extend the product are writing macros and authoring add-ins. See http://msdn.microsoft.com/en-us/vstudio/vextend.aspx
Extending Microsoft Office
There are two existing add-ins for Microsoft Office, one being for Visio and the other for Excel. I believe that any of the Microsoft Office products could be extended (through the ribbon) to provide management of data mining algorithms, and specific functions related to that product. The current add-in for Excel 2007 focuses on data analysis, and the Visio 2007 add-in focuses on displaying models. What could you build for Microsoft Word or Outlook or Access or Project or Publisher or PowerPoint?
As a general idea, I believe anyone building add-ins for any Microsoft Office product could find ways to improve their functions by adding a connection to Analysis Services and SQL Server Data Mining. PowerPivot has an add-in for Microsoft Excel which we could expect developers to leverage in the future. For general information on extending Microsoft Office see Office Development with Visual Studio.
Whitepaper: Adding a C# Library to SQL Server Data Mining
A company called Visual Numerics produces a C# library with many numerical functions. There are a number of whitepapers on the IMSL Family. Someone might ask why not simply program data mining directly in C# (or any other managed .NET language), and that option is always available. There are convenience factors in using the SQL Server Data Mining technology, though I know that strong developers would want to evaluate all their options. I therefore provided this link because it shows a solution matching an existing C# library with SQL Server Data Mining.
One add-in is already available on CodePlex, the open-source sharing portal for Microsoft products. This plug-in provides a support vector machine algorithm, and I would recommend downloading it. So far, I have not been able to make it work as it should. You can register your own comments on the page, and you might want to join this project as a volunteer developer. I did post a comment on x64 installation, since gacutil.exe is in another location for VS2010 (as on my machine), though the project needs to be recompiled to work on x64. Since I do not know all the possible options (such as 32 or 64 bit, and different VS versions), I would not be the person to solve the installation for all the possible combinations.
Microsoft Developers’ Network (MSDN)
Some people know MSDN as an annual subscription to Microsoft products for development, and that answer is correct. However, MSDN generally is what I consider a social network of developers, which starts with the MSDN online website. A closely related website is the TechNet website, and I have been seeing content move between the msdn.microsoft.com and technet.microsoft.com URLs. I do not know the history of the two areas, but I still classify MSDN as a social community contrasted with TechNet as a resource portal. The actual content overlaps, and TechNet has its own subscription program.
The SQL Server Data Mining solution is technically a service and not an application. I have sometimes talked with audiences about whether new generations of “developers” might shed that title and prefer other terms like “service architect”. I believe that this technology leads the field in providing programming options which allow people to either replace the Microsoft data mining algorithms or extend this technology into new algorithms. Unlike an application, a service waits on a server, ready to serve data miners eager to build structures and models, and make actionable decisions. In this blog post, I provided the key links shared by the author, and in the spirit of the chapter, extended their information to some other resources I believe are valuable.
MacLennan, J., Tang, Z., & Crivat, B. (2009). Data Mining with Microsoft SQL Server 2008. Indianapolis, IN: Wiley Publishing Inc.