Web-Harvest Project Home Page

Web Data Harvesting: Web Scraping Software
Web scraping software makes gathering large amounts of information relatively easy. It is useful for anyone who needs to collect comparable information from various sources and put it into a usable context, and it is a cost-effective way to find extensive information in a short period of time. Applications are used every day in business, medicine, meteorology, government, and law enforcement. The software is user-friendly and can be operated by anyone from non-technical data collectors to experienced Web designers. Programs are available for purchase in stores or online. A user starts by programming an “agent”, the tool that retrieves the information. Web scraping software can provide customer, marketing, and competitor information. The practice has had legal ramifications, as some have complained about intrusion and copyright infringement.
Screen Scraper

Weka 3 - Data Mining with Open Source Machine Learning Software in Java
Weka is a collection of machine learning algorithms for data mining tasks. It contains tools for data preparation, classification, regression, clustering, association rules mining, and visualization. Found only on the islands of New Zealand, the weka is a flightless bird with an inquisitive nature. Weka is open source software issued under the GNU General Public License. We have put together several free online courses that teach machine learning and data mining using Weka. Weka also supports deep learning.

Web Data Mining - An Introduction

Open Source Data Mining Tools | Elastic Web Mining | Bixo Labs
Below is a report on the open source data mining tools session at the ACM data mining unconference this past Sunday (01 Nov 2009). It covers only tools that the panelists had used, so it is not a survey of the available tools. See Jeff Dalton’s blog post on Java Open Source NLP and Text Mining tools for a more complete list of a closely related group of tools.

Weka
Paul O’Rorke talked about Weka, a collection of machine learning algorithms for data mining tasks. An attendee mentioned MOA.

R Language
David Smith talked about R. An attendee asked how Matlab and R compare with respect to viability in a production environment. Another attendee said many people use R for prototyping and generating models, but use something else in production. Paul mentioned that R provides a very compact representation of data mining tasks.

KNIME
Nicolas Cebron talked about KNIME (pronounced “naim”), a modular data exploration platform. An attendee asked about the long-term viability of KNIME.

Mahout, Hadoop, Bixo

XELOPES - prudsys
The prudsys XELOPES (eXtEnded Library fOr Prudsys Embedded Solutions) is a platform- and data-source-independent business intelligence library that unites classical data mining methods and new real-time analytics. The library can be used as standalone software, offering pre-fabricated solutions to fundamental analytics problems. It can also be integrated into other software products as an embedded analytical tool. For new and complex problems in particular, the numerous algorithms of the prudsys XELOPES, which can be combined in modules, allow for the development of adequate solutions.

Data mining standards
The prudsys XELOPES supports essential BI standards.

Stream access
Since classical data mining processes must generally handle extremely large data matrices, the prudsys XELOPES implements a streaming concept for data access.

Analytical functions
The prudsys XELOPES combines a number of classical data mining models.

Glean Comparison Search: An Educational Research and Search Tool

Carrot2 - Open Source Search Results Clustering Engine

The R Project for Statistical Computing

About Futurity
Futurity features the latest discoveries by scientists at top research universities in the US, UK, Canada, and Australia. The nonprofit site, which launched in 2009, is supported solely by its university partners in an effort to share research news directly with the public.

Contacts
editor@futurity.org
615 Hylan Hall, University of Rochester, Rochester, NY 14627
Jenny Leonard, editor, editor@futurity.org, (585) 275-6076
Katie George, assistant editor, kgeorge@admin.rochester.edu, (585) 276-4508
Liz Goodfellow, assistant editor, egoodfel@admin.rochester.edu, (585) 276-6186
Monique Patenaude, assistant editor, m.patenaude@rochester.edu, (585) 275-6725

Governing Board
