This blog post summarizes key points from a presentation of the same name by Dr. Eugene Agichtein. At the end of the post I link to the PowerPoint and the webcast, both recorded in 2007. Dr. Agichtein also has a book on a similar topic due to be published later this year (as he confirmed with me), so I will share that link too.
On this blog, I posted on the general topic of web content mining. As I presented the topic, I provided evidence for why information extraction is an area of active research, since the problem of determining synonyms is a logical challenge.
In his presentation, Dr. Agichtein lays out key ideas about information extraction with the goal of showing how the task might scale up to large datasets. I will share Dr. Agichtein’s links (from 2007) and comment on how Microsoft technology (SharePoint with FAST Search Server) can provide efficiencies at scale for these tasks.
What do academics know about information extraction?
Dr. Agichtein mentions:
- the academic area is mature
- work still continues
- the area combines Computational Linguistics, Machine Learning, Data mining, Databases, and Information Retrieval
- the traditional focus has been on accuracy of extraction
He then mentions a Microsoft example, where Bill Gates is CEO.
What are Dr. Agichtein’s three information tasks to focus on?
- Entity tagging
- Relation extraction
- Event extraction
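To make the first two tasks concrete, here is a minimal sketch built on the "Bill Gates is CEO" example. It is not Dr. Agichtein's method: the single regular expression, the `extract_relations` helper, and the sample sentence are all assumptions made for illustration, and real systems rely on far more robust linguistic and machine-learning techniques.

```python
import re

# Hypothetical toy pattern: "<Person> is <TITLE> of <Organization>".
# The named groups play the role of entity tags; the whole match is a relation.
PATTERN = re.compile(
    r"(?P<person>[A-Z][a-z]+(?: [A-Z][a-z]+)+)"  # two or more capitalized words
    r" is (?P<title>[A-Z]{2,})"                  # an all-caps title such as CEO
    r" of (?P<org>[A-Z][A-Za-z]+)"               # a single capitalized word
)


def extract_relations(text):
    """Return (person, title, organization) tuples found in the text."""
    return [
        (m.group("person"), m.group("title"), m.group("org"))
        for m in PATTERN.finditer(text)
    ]


# Illustrative sentence, assumed for this sketch.
relations = extract_relations("Bill Gates is CEO of Microsoft.")
```

A hand-written pattern like this only catches one phrasing, which is part of why scaling extraction to web-sized collections is treated as a research problem.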
Continue reading “Towards Web-Scale Information Extraction” »
Industry analysis organization Gartner announced four major trends for the next few years. This blog post projects implications for data mining (in general).
Prediction #1: “By 2013, 33 percent of BI functionality will be consumed via handheld devices.”
Their text includes the tablet, which you may remember from previously released laptops with flippable screens (turning them into a tablet with a stylus). The obvious technology driving this topic is the Apple iPad, but others are on the way, and vendors are trying to cash in on the wave of renewed interest in tablet devices. I recently purchased a small Asus computer, larger than a netbook, which lets me blog from just about anywhere. Asus also made a tablet announcement at the Consumer Electronics Show recently.
Continue reading “Data Mining Implications of Gartner’s 2011 Projections” »
Rob Collie formerly worked for Microsoft, and now blogs at PowerPivotPro.com. His blog is included in my Recommended Blogs for 2011. Late last year, he posted his comments on Vertipaq and Analysis Services. I would expect my regular reading audience to be used to models, so let’s start with Rob’s graphic from http://powerpivotpro.com/2010/11/12/five-observations-from-sql-pass/:
In the blog post, Rob tells the story that we all know by now:
- PowerPivot is an expression of the Vertipaq technology as an Excel add-in
- Vertipaq will be moved into Analysis Services along with the DAX language
- The rapid adoption of PowerPivot so far provides confidence that improvements in this area will continue
Continue reading “Vertipaq, Analysis Services, and Data Mining” »
Earlier I blogged on this topic during an event wrap-up (see http://www.marktab.net/datamining/2010/09/20/sql-saturday-46-raleigh-nc-post-event-wrapup/).
The question is what counts as setup and what counts as customization. Today I will talk about this in the context of SharePoint.
The technology application is important for SharePoint, which I will note has a variety of websites dedicated to both topics:
I like the analogy of home improvement stores, where someone is deciding on a major home upgrade (pick any). The choices are more than just do-it-yourself or have-someone-else-do-it. Why?
- Even if you do-it-yourself, someone else will end up being involved at some point in the process (since you or I or any individual did not create SharePoint)
- Even if you have-someone-else-do-it, they will have you involved to some degree in the decision-making process
SharePoint requires server hosting, and the do-it-yourself approach means at minimum setting up a server. Yet the point of SharePoint is NOT doing it yourself, but we-do-it-together. So again, even the do-it-yourself approach leans toward working with other people. No one escapes.
Continue reading “Setup versus Customization with SharePoint” »
SQL Rally is a new conference scheduled for May 11-13, 2011 at the Marriott World Center in Orlando, FL. The conference will have pre-conference seminars intended to appeal to a broad range of attendees and also perceived to be worthy of additional expense (meaning above the conference fees, see http://www.sqlandy.com/wp-content/uploads/2010/09/SQLRally-PreCon-Application-Final.pdf). In brief, the pre-conference seminar topics should relate to the median core attendee who wants to spend even more time and money on additional learning.
I decided to submit a data mining seminar idea for this conference. The conference team graciously passed over my seminar idea, with good reasons. I have decided to blog about this topic because I believe I can provide insight into the Microsoft Data Mining community, and talk about where this technology is perceived to fit in the Microsoft world. I believe my seminar has a conference home, and now that I have this written outline I can seek more feedback.
My blog post outline:
- My seminar proposal
- Gracious Response from SQL Rally
- My response to SQL Rally, and commentary
Continue reading “Data Mining Seminar Passed Over for SQL Rally 2011” »
Data Mining with Microsoft SQL Server 2008 Review Chapter 5
This chapter provides an excellent tutorial on how to pragmatically use data mining with Excel 2007. The chapter’s title says “Office 2007” and within the chapter only two applications within Office are mentioned:
- Excel 2007 — which has a free 32-bit client to perform data mining (as of this blog post, Microsoft produces two add-ins, one for SQL Server 2008, including R2, and one for SQL Server 2005)
- Visio 2007 — which has a visual display feature integrated with SQL Server Data Mining
I have produced a blog post which connects the newly-termed academic field of Visual Analytics with this technology. I like simple phrases, and Visual Analytics aptly describes an active research field where people are trying to study how to best make use of visual interfaces and automated algorithms. Certainly, Microsoft Office remains one of the strongest visual interfaces for decision making and is ubiquitous in both non-profit and for-profit organizations. If Microsoft Office were all about Microsoft Word, we might argue against its visual component. However, the lavish use of icons through the Microsoft Office ribbon (in version 2007 and higher) shows that visual tools and layout are part of the Microsoft Office design interface. Microsoft SharePoint 2010 is the current web application framework (my terminology) for extending the reach of Microsoft Office through work groups and across the enterprise. Microsoft acquired two important visual-related technologies, ProClarity (now PerformancePoint) and visual tools from Dundas, both of which demonstrate its commitment to visual technologies which integrate with business intelligence and support analytic decision makers.
Continue reading “Implementing a Data Mining Process using Office 2007” »
The Association for Computing Machinery produces a regular journal called SIGKDD Explorations, where SIGKDD is an acronym for Special Interest Group on Knowledge Discovery and Data Mining. I would classify the journal as academic, even though private-sector consultants or companies may be coauthoring articles.
In a recent issue, there is an article titled “Visual Analytics: How much visualization and how much analytics?”. The article makes the following claims:
- “Visual Analytics is the science of analytical reasoning supported by interactive visual interfaces.” (page 5)
- “The term Visual Analytics has been around for about five years now.” (page 5)
- “The core of our view on Visual Analytics is the new enabling and accessible analytic reasoning interactions supported by the combination of automated and visual analytics.” (page 5)
Altogether, these statements mean that Visual Analytics is a relatively new academic buzzword for a specific field of research, namely the combination of automated analysis and visual representation. Someone might ask how much that description looks like what people do with Excel. My first-pass answer is that Excel 2010 has exceptional graphic and visualization capabilities, but it does not inherently provide automated data analysis. However, SQL Server Data Mining adds the automated portion of this equation.
Continue reading “Visual Analytics and SQL Server Data Mining” »