This blog post summarizes key points from a presentation of the same name, “Towards Web-Scale Information Extraction,” given by Dr. Eugene Agichtein. At the end of this post I link to the PowerPoint slides and the webcast, both recorded in 2007. Dr. Agichtein also has a book on a similar topic due out later this year (he confirmed this with me), and I will share that link as well.
I have previously written on this blog about the general topic of web content mining. In that post, I gave evidence that information extraction is still an area of active research, in part because determining synonyms is a genuinely hard problem.
In his presentation, Dr. Agichtein pairs an overview of the key ideas in information extraction with the goal of showing how the task might scale up to processing large datasets. I will share Dr. Agichtein’s links (from 2007) and comment on how Microsoft technology (SharePoint with FAST Search Server) can make these tasks more efficient at scale.
What do academics know about information extraction?
Dr. Agichtein notes that:
- the academic field is mature
- research is still ongoing
- the field combines Computational Linguistics, Machine Learning, Data Mining, Databases, and Information Retrieval
- the traditional focus has been on the accuracy of extraction
He then gives an example involving Microsoft, in which Bill Gates is the CEO.
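To make the Microsoft example a little more concrete, here is a minimal sketch of one classic approach to relation extraction: a hand-written surface pattern that turns a sentence into a (person, relation, organization) triple. The pattern, the `CEO_of` label, and the function name `extract_ceo_relations` are my own illustrative assumptions, not something taken from Dr. Agichtein’s slides.

```python
import re

# Hypothetical hand-written pattern: "<Name> is (the) CEO of <Name>".
# Names are approximated as runs of capitalized words; a real system would
# use an entity tagger rather than this crude heuristic.
NAME = r"[A-Z][a-z]+(?: [A-Z][a-z]+)*"
CEO_PATTERN = re.compile(rf"({NAME}) is (?:the )?CEO of ({NAME})")


def extract_ceo_relations(text):
    """Return (person, 'CEO_of', organization) triples matched in text."""
    return [(m.group(1), "CEO_of", m.group(2)) for m in CEO_PATTERN.finditer(text)]


print(extract_ceo_relations("In the example, Bill Gates is CEO of Microsoft."))
# → [('Bill Gates', 'CEO_of', 'Microsoft')]
```

Hand-written patterns like this are precise but brittle: they miss paraphrases such as “Microsoft’s CEO, Bill Gates,” which is one reason the field leans on machine learning, as the list above notes.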
Which three information extraction tasks does Dr. Agichtein focus on?
- Entity tagging
- Relation extraction
- Event extraction
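As a rough illustration of the first task, entity tagging, here is a minimal dictionary-based (gazetteer) tagger. The gazetteer entries, type labels, and the function name `tag_entities` are hypothetical examples of mine; research systems typically use machine-learned sequence taggers instead, so treat this only as a sketch of what the task produces.

```python
import re

# Hypothetical gazetteer mapping surface strings to entity types.
GAZETTEER = {
    "Bill Gates": "PERSON",
    "Microsoft": "ORGANIZATION",
    "SharePoint": "PRODUCT",
}


def tag_entities(text, gazetteer=GAZETTEER):
    """Return sorted (start, end, surface, type) spans found in text.

    Longer entries are tried first, and a span is skipped if it overlaps
    one already tagged, so the longest match wins.
    """
    spans = []
    taken = set()
    for surface in sorted(gazetteer, key=len, reverse=True):
        # \b keeps us from matching inside a longer word.
        for m in re.finditer(rf"\b{re.escape(surface)}\b", text):
            positions = set(range(m.start(), m.end()))
            if positions & taken:
                continue
            taken |= positions
            spans.append((m.start(), m.end(), surface, gazetteer[surface]))
    return sorted(spans)


print(tag_entities("Bill Gates is CEO of Microsoft."))
# → [(0, 10, 'Bill Gates', 'PERSON'), (21, 30, 'Microsoft', 'ORGANIZATION')]
```

The tagged spans are exactly what relation and event extraction build on: once “Bill Gates” and “Microsoft” are labeled, a later step only has to decide how the two are related.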
Continue reading “Towards Web-Scale Information Extraction” »
