I’m writing today to encourage participation in the 2013 Rexer Analytics Data Miner Survey before it closes on April 22, 2013. (If you are reading this post after that date, worry not: you will likely have a chance at a future survey; the next one is scheduled for early 2015.) I participated myself, and will explain what the survey is about, why I recommend participating, and how to go about it. I also provide a link to the survey and some results from the 2011 survey.
The survey took me about 10 minutes to complete. It consists mostly of closed-ended questions, with some open-ended ones, on data mining and analytics. The questions cover what industry you work in, what software you are using, and what machine learning algorithms you apply in your work.
The survey asks for your email address, and as Karl Rexer explained to me: “we only use that information to send people two things: 1) a free copy of the 40 page Survey Summary Report, and 2) an invitation to participate in next year’s survey. We do not use the email list for any other purpose, and we do not share the list with anyone.” My main advice on how to participate is to 1) only answer questions you want to share (and are able to answer, since many of the questions are detailed), and 2) only provide your email if you want your responses tied to you. If you have any questions about how the data is used, please contact Karl Rexer directly (through the Rexer Analytics website), and he can elaborate.
So why participate? There are only a few surveys of data mining software use, as you can verify with a quick web search. Better yet, look at kdnuggets.com, which reliably promotes both surveys and their results. I have been using a number of data mining software tools with my clients, including Microsoft SQL Server Analysis Services, SAS Enterprise Miner, SPSS, and R.
I believe it would be hard to draw a truly random sample for data mining, since the population of people using data mining software is not well defined. Some might identify themselves as “data scientists” or by other obvious terms, but others might simply be industry professionals with strong analytics abilities. I continue to believe that data mining should naturally follow an introduction to statistics, and that both could (and, in my recommendation, should) happen in high school (or middle school for some). I also anticipate that the line will blur between outright data mining software activity (such as the professional software mentioned in this survey) and regular software use, since I expect machine learning to be behind almost all software in the coming decade. I sometimes remind people that one of Microsoft’s top-selling implementations of machine learning is the Xbox Kinect sensor.
I like all of the questions on this survey, but would have asked others as well. If you are working professionally in analytics, then I believe you would benefit from at least knowing what questions are on this survey (which you could subsequently use as working questions on your own projects). If you have ideas for new questions, please send them to me through the contact form. As some know from my slide decks and public presentations, I have asked audiences other questions that are not on this survey (especially about enterprise production use). Still, this survey has value in drawing a lot of participation, even if it is only a convenience sample.
To take the survey, click this link.
Results from the 2011 Survey
HIGHLIGHTS from the 5th Annual Data Miner Survey (2011):
• SURVEY & PARTICIPANTS: 52-item survey of data miners, conducted on-line in 2011. Participants: 1,319 data miners from over 60 countries.
• FIELDS & GOALS: Data miners work in a diverse set of fields. CRM/Marketing has been the #1 field for the past five years. Fittingly, “improving the understanding of customers”, “retaining customers” and other CRM goals continue to be the goals identified by the most data miners.
• ALGORITHMS: Decision trees, regression, and cluster analysis continue to form a triad of core algorithms for most data miners. However, a wide variety of algorithms are being used. A third of data miners currently use text mining and another third plan to do so in the future.
• TOOLS: R continued its rise this year and is now being used by close to half of all data miners (47%). R users report preferring it for being free, open source, and having a wide variety of algorithms. Many people also cited R’s flexibility and the strength of the user community. STATISTICA is selected as the primary data mining tool by the most respondents (17%). Data miners report using an average of 4 software tools. STATISTICA, KNIME, Rapid Miner and Salford Systems received the strongest satisfaction ratings in 2011.
• ANALYTIC CAPABILITY AND SUCCESS MEASUREMENT: Only 12% of corporate respondents rate their company as having very high analytic sophistication. However, companies with better analytic capabilities are outperforming their peers. Respondents report analyzing analytic success via Return on Investment (ROI) and analyzing the predictive validity or accuracy of their models. Challenges to measuring success include client or user cooperation and data availability/quality.
• SHARED INSIGHTS: In the 2010 Survey data miners shared best practices in overcoming the key challenges data miners face (verbatims: www.RexerAnalytics.com/Overcoming_Challenges.html). In the 2011 Survey data miners shared their best practices for measuring analytic success (verbatims: www.RexerAnalytics.com/DMSurvey2011_MeasuringSuccess.html) and examples of the positive impact that data mining can have to benefit society, health, and the world (verbatims: www.RexerAnalytics.com/DMSurvey2011_PositiveImpact.html). Additionally, 225 R users shared information about how and why they are using R (verbatims: www.RexerAnalytics.com/DMSurvey2011_R-Comments.html).