The March 28, 2011 issue of Forbes magazine featured an interview with billionaire Yuri Milner. I start with a quotation from the article (which I link at the end of this blog post):
Milner loses all reticence when he talks about the future of social media. He says we live in the “age of the mathematician,” in which inordinate power and riches will go to the people who create the algorithms that end up dictating who and what we know.
“The Billionaire who Friended the Web”, Forbes Magazine, March 28, 2011, page 79
The quote speaks to data mining, and this blog post discusses Milner’s perspective. I argue that the “age of the mathematician” is made possible not by algorithms alone (many of which have existed for decades) but by the combination of large datasets and the software to process them. I even believe the software is more important than the hardware, which, again, has largely existed in a similar form for many years. Milner backs up his assertion about power and riches with the strategic investments that his companies have made.
Continue reading “Yuri Milner and DST Global” »
Data Mining with Microsoft SQL Server 2008 Review Chapter 18
I have started a new refrain in my presentations: “SQL Server Data Mining is not an application, it is a service”. Possible application interfaces for this technology include:
The book (in its entirety) covers four of the five interfaces. In this chapter, the authors provide “code snippets” in C# (intended for an ASP.NET application) and also DMX code. For the purpose of this review, I will provide the PowerShell translation of the C# code (since PowerShell is not covered in the book). DMX could be sent from any of the five options above, but I will be discussing SSMS.
Both the “code snippets” and the DMX were provided to model the Movie Click dataset. The goal of this chapter is …
Continue reading “Implementing a Web Cross-Selling Application” »
Data Mining with Microsoft SQL Server 2008 Review Chapter 8
I have commented several times that time series was an entire class when I was in graduate school. It was an appropriate topic for that stage (graduate school, or late in an undergraduate program) because calculus is required to communicate the mathematics. If I had to bet on a single data mining algorithm being used across all situations, companies, countries, and industries, this one would be it. For the 2008 version, Microsoft made good improvements to this algorithm, allowing analysts to tune parameters to suit the situation. Among all the available Microsoft data mining algorithms, I believe parameter choices affect the results of this one the most, which may justify building multiple models for comparison (since only empirical results can show which settings work best).
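To make the idea of comparing multiple models concrete, here is a minimal sketch in Python. It is not the Microsoft Time Series algorithm and not DMX: it assumes a plain autoregressive model fitted by least squares, a synthetic series, and hypothetical helper names (`fit_ar`, `one_step_rmse`). It only illustrates fitting the same data under several parameter choices (here, the number of lags) and judging them on held-out data.

```python
import numpy as np


def fit_ar(series, order):
    """Least-squares fit of an AR(order) model with an intercept.

    Returns [intercept, coef_lag1, coef_lag2, ...].
    """
    # Each row holds the `order` previous values, most recent first.
    rows = [series[t - order:t][::-1] for t in range(order, len(series))]
    X = np.column_stack([np.ones(len(rows)), np.array(rows)])
    y = series[order:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def one_step_rmse(series, coef, order, start):
    """RMSE of one-step-ahead forecasts for series[start:]."""
    errors = []
    for t in range(start, len(series)):
        lags = series[t - order:t][::-1]
        prediction = coef[0] + coef[1:] @ lags
        errors.append(series[t] - prediction)
    return float(np.sqrt(np.mean(np.square(errors))))


# Synthetic data (an assumption for illustration): an AR(2) process.
rng = np.random.default_rng(0)
n = 400
series = np.zeros(n)
for t in range(2, n):
    series[t] = 0.6 * series[t - 1] - 0.3 * series[t - 2] + rng.normal()

# Fit on the first 300 points, score on the remaining 100.
split = 300
results = {}
for order in (1, 2, 4, 8):
    coef = fit_ar(series[:split], order)
    results[order] = one_step_rmse(series, coef, order, split)

for order, rmse in results.items():
    print(f"lags={order}: holdout RMSE={rmse:.3f}")
```

The same pattern applies whatever the tool: hold the data fixed, vary one parameter at a time, and let the holdout error rather than intuition pick the model.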
Time series was a big topic for W. Edwards Deming. He used it to demonstrate what variance is and how to tell whether a system is in control.
Continue reading “Microsoft Time Series Algorithm” »