
Apache OpenNLP - Welcome to Apache OpenNLP

MALLET homepage MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text. MALLET includes sophisticated tools for document classification: efficient routines for converting text to "features", a wide variety of algorithms (including Naïve Bayes, Maximum Entropy, and Decision Trees), and code for evaluating classifier performance using several commonly used metrics. Quick Start / Developer's Guide In addition to classification, MALLET includes tools for sequence tagging, for applications such as named-entity extraction from text; algorithms include Hidden Markov Models, Maximum Entropy Markov Models, and Conditional Random Fields. Topic models are useful for analyzing large collections of unlabeled text, and many of the algorithms in MALLET depend on numerical optimization. The toolkit is Open Source Software, released under the Apache 2.0 License.
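The "converting text to features" step mentioned above is the starting point for all of MALLET's classifiers. As a rough illustration of that idea (a minimal sketch in Python, not MALLET's actual API — the function names here are invented for this example), a bag-of-words feature extractor turns each document into sparse term counts:

```python
from collections import Counter
import re

def tokenize(text):
    """Lowercase the text and keep only runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())

def bag_of_words(text):
    """Convert one document into a sparse feature vector of term counts."""
    return Counter(tokenize(text))

# Each distinct term becomes a feature; its count is the feature value.
features = bag_of_words("Text to features: text, features, TEXT")
```

Classifiers such as Naïve Bayes or Maximum Entropy then operate on these count vectors rather than on the raw strings.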

Welcome to Apache Stanbol! - Apache Stanbol JGibbLDA: A Java Implementation of Latent Dirichlet Allocation (LDA) using Gibbs Sampling for Parameter Estimation and Inference Metaweb video From Freebase On July 16th, 2010, when Metaweb announced their acquisition by Google, they also launched a video explaining what Metaweb/Freebase does, what entities are, and so on. Excerpts from the video: "You know what drives me crazy about words? Like, check this out: someone says, 'I love Boston.' But I guess there's really no way of knowing. So how come, on the web, so many sites still try to organise stuff with words? But what if there was a better way? Welcome to Metaweb. OK, well, let's compare that to text. But that's just the beginning. So, Metaweb's been in the process of identifying millions of these entities and mapping out how they're related, and what words other sites use to refer to them. So, how is this going to help you? Or, say you're that product guy at the music site. Are you kidding me? And it's not just movies and bands. Metaweb makes your site smarter."
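JGibbLDA estimates LDA's parameters by collapsed Gibbs sampling. To show schematically how such a sampler works (a minimal sketch in Python under textbook assumptions, not JGibbLDA's code — all names here are invented for illustration), each token's topic is repeatedly resampled from its full conditional given every other assignment:

```python
import random

def gibbs_lda(docs, num_topics, vocab_size, iters=50, alpha=0.1, beta=0.01):
    """Collapsed Gibbs sampling for LDA. docs is a list of documents,
    each a list of integer word ids. Returns (doc-topic, topic-word) counts."""
    ndk = [[0] * num_topics for _ in docs]               # doc-topic counts
    nkw = [[0] * vocab_size for _ in range(num_topics)]  # topic-word counts
    nk = [0] * num_topics                                # tokens per topic
    z = []                                               # topic of each token
    for d, doc in enumerate(docs):                       # random initialization
        zd = []
        for w in doc:
            k = random.randrange(num_topics)
            zd.append(k)
            ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
        z.append(zd)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]                              # remove this token's count
                ndk[d][k] -= 1; nkw[k][w] -= 1; nk[k] -= 1
                # full conditional: p(z=t | rest) is proportional to
                # (n_dt + alpha) * (n_tw + beta) / (n_t + V*beta)
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta)
                           / (nk[t] + vocab_size * beta)
                           for t in range(num_topics)]
                r = random.random() * sum(weights)       # sample a new topic
                for t, wt in enumerate(weights):
                    r -= wt
                    if r <= 0:
                        k = t
                        break
                z[d][i] = k                              # restore the counts
                ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
    return ndk, nkw

random.seed(0)
ndk, nkw = gibbs_lda([[0, 1, 0, 1], [2, 3, 3, 2]], num_topics=2, vocab_size=4)
```

After enough sweeps, the doc-topic counts ndk and topic-word counts nkw (smoothed by alpha and beta) give the estimated topic mixtures and topic-word distributions.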

News -- Deploy your own "cloud" with Debian "Wheezy" April 25th, 2012 The Debian Project produces an entirely Free operating system that empowers its users to be in control of the software running on their computers. These days, more and more computing is being moved away from users' computers into the so-called cloud, a vague term often used to refer to Software as a Service (SaaS) offerings. We encourage Debian users to prefer cloud offerings where the SaaS infrastructure is made entirely of Free Software and can be run under their control. To help our users with these tasks, we are proud to announce the availability of several new technologies that ease the deployment of Debian-based clouds. The work to finalize Debian 7.0 "Wheezy" is still ongoing, but packages for these technologies are already available as part of our testing release. Preserving user freedoms in the cloud is a tricky business, and one of the major challenges ahead for Free Software.

Topic Modeling Toolbox The first step in using the Topic Modeling Toolbox on a data file (CSV or TSV, e.g. as exported by Excel) is to tell the toolbox where to find the text in the file. This section describes how the toolbox converts a column of text from a file into a sequence of words. The process of extracting and preparing text from a CSV file can be thought of as a pipeline, where a raw CSV file goes through a series of stages that ultimately result in something that can be used to train the topic model.

val source = CSVFile("pubmed-oa-subset.csv") ~> IDColumn(1);

val tokenizer = {
  SimpleEnglishTokenizer() ~>            // tokenize on space and punctuation
  CaseFolder() ~>                        // lowercase everything
  WordsAndNumbersOnlyFilter() ~>         // ignore non-words and non-numbers
  MinimumLengthFilter(3)                 // take terms with >= 3 characters
}

val text = {
  source ~>                              // read from the source file
  Column(4) ~>                           // select the column containing the text
  TokenizeWith(tokenizer) ~>             // tokenize with the tokenizer above
  TermCounter() ~>                       // collect counts (needed below)
  TermMinimumDocumentCountFilter(4) ~>   // filter terms in fewer than 4 documents
  TermDynamicStopListFilter(30) ~>       // filter out the 30 most common terms
  DocumentMinimumLengthFilter(5)         // take only documents with >= 5 terms
}

The input data file (in the source variable) is a pointer to the CSV file you downloaded earlier, which we pass through a series of stages that each transform, filter, or otherwise interact with the data. If the first row in your CSV file contains the column names, you can remove that row using the Drop stage, e.g. source ~> Drop(1). Tokenizing Finding meaningful words
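To make the shape of such a pipeline concrete outside the toolbox's Scala DSL, here is a rough Python analogue of the same steps (drop the header row, pick the text column, tokenize, case-fold, keep words and numbers of a minimum length). This is only an illustrative sketch under those assumptions, not the toolbox's implementation, and the function name and column indices are invented for the example:

```python
import re

def prepare(rows, id_col=0, text_col=3, drop_header=True, min_len=3):
    """Turn CSV-like rows into (id, tokens) pairs: a rough analogue of the
    tokenize / case-fold / length-filter stages of a preprocessing pipeline."""
    if drop_header:
        rows = rows[1:]                      # analogue of a Drop(1) stage
    out = []
    for row in rows:
        tokens = [t for t in re.findall(r"[a-z0-9]+", row[text_col].lower())
                  if len(t) >= min_len]      # words/numbers only, length >= 3
        out.append((row[id_col], tokens))
    return out

rows = [
    ["id", "c1", "c2", "abstract"],
    ["17", "x", "y", "Topic models analyze large TEXT collections."],
]
docs = prepare(rows)
```

Term-frequency filters such as the stop-list and minimum-document-count stages would then operate on counts gathered over all of these token lists.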

BBC Internet Blog: BBC World Cup 2010 dynamic semantic publishing
