
Natural language processing (NLP) is a field of computer science, artificial intelligence, and linguistics concerned with the interactions between computers and human (natural) languages. As such, NLP is related to the area of human–computer interaction. Many challenges in NLP involve natural language understanding, that is, enabling computers to derive meaning from human or natural language input; others involve natural language generation. The history of NLP generally starts in the 1950s, although work can be found from earlier periods. In 1950, Alan Turing published an article titled "Computing Machinery and Intelligence" which proposed what is now called the Turing test as a criterion of intelligence. The Georgetown experiment in 1954 involved fully automatic translation of more than sixty Russian sentences into English. Up to the 1980s, most NLP systems were based on complex sets of hand-written rules; since then, NLP using machine learning has become the norm. Major tasks in NLP include parsing.
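Two of the tasks named above, tokenization and part-of-speech tagging, are easy to demonstrate with an off-the-shelf toolkit. The excerpt names no specific tool, so the sketch below uses NLTK as an assumption, with the required models downloaded at runtime.

```python
# A minimal sketch of two classic NLP tasks: tokenization and
# part-of-speech tagging. Uses NLTK (an assumption, not named in the
# excerpt); the download calls fetch the models NLTK needs.
import nltk

nltk.download("punkt", quiet=True)                       # tokenizer models
nltk.download("averaged_perceptron_tagger", quiet=True)  # POS tagger model

text = "The Georgetown experiment translated Russian sentences into English."
tokens = nltk.word_tokenize(text)   # split raw text into word tokens
tagged = nltk.pos_tag(tokens)       # assign a part-of-speech tag to each token
print(tagged)                       # e.g. [('The', 'DT'), ('Georgetown', 'NNP'), ...]
```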

Semantic Web The Semantic Web (in French, Web sémantique or toile sémantique[1]) is an extension of the Web standardized by the World Wide Web Consortium (W3C)[2]. These standards encourage the use of normalized data formats and exchange protocols on the Web, building on the Resource Description Framework (RDF) model. The Semantic Web is described by some as Web 3.0. While its detractors have questioned its feasibility, its promoters argue that applications built by researchers in industry, biology, and the humanities have already proven the validity of this new concept[5]. Tim Berners-Lee originally expressed the vision of the Semantic Web as follows: "I have a dream for the Web [in which computers] become capable of analyzing all the data on the Web — the content, links, and transactions between people and computers." — Tim Berners-Lee, Weaving the Web[13]
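The RDF model the excerpt mentions represents data as subject-predicate-object triples. A minimal sketch, assuming the rdflib library is installed; the namespace and the example triple are illustrative, not taken from any W3C document.

```python
# A minimal sketch of RDF: data as subject-predicate-object triples.
# Uses rdflib (assumed installed); the URIs and the triple itself are
# illustrative placeholders only.
from rdflib import Graph, URIRef, Literal, Namespace
from rdflib.namespace import FOAF

g = Graph()
ex = Namespace("http://example.org/")  # hypothetical namespace

# One triple: <person> <name> "Tim Berners-Lee"
g.add((URIRef(ex["timbl"]), FOAF.name, Literal("Tim Berners-Lee")))

# Serialize the graph as RDF/XML, one of the standard exchange formats
print(g.serialize(format="xml"))
```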

ECML/PKDD'02 Tutorial on Text Mining and Internet Content Filtering José María Gómez Hidalgo Departamento de Inteligencia Artificial Universidad Europea de Madrid In recent years, we have witnessed an impressive growth in the availability of information in electronic format, mostly in the form of text, due to the Internet and the increasing number and size of digital and corporate libraries. Text mining (TM) is an emerging research and development field that addresses the information overload problem, borrowing techniques from data mining, machine learning, information retrieval, natural-language understanding, case-based reasoning, statistics, and knowledge management to help people gain rapid insight into large quantities of semi-structured or unstructured text. A prototypical application of TM techniques is Internet information filtering. Outline The goal of this tutorial is to familiarize the audience with the emerging area of text mining, in a practical way. The tutorial is divided into two main parts. In particular, the tutorial will cover the following topics:
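The filtering application the tutorial names is usually cast as text classification: bag-of-words features plus a learned classifier. A minimal sketch, assuming scikit-learn is available; the tiny training set is invented purely for illustration.

```python
# A minimal sketch of content filtering as text classification:
# bag-of-words features plus a Naive Bayes classifier. Uses scikit-learn
# (assumed installed); the training documents are invented examples.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_docs = [
    "win money now, click here",         # unwanted
    "limited offer, free prize inside",  # unwanted
    "meeting notes from the seminar",    # acceptable
    "draft of the tutorial outline",     # acceptable
]
labels = ["block", "block", "allow", "allow"]

# Vectorize text into word counts, then fit the classifier
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_docs, labels)

print(model.predict(["free money prize"]))  # likely ['block']
```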

montylingua :: a free, commonsense-enriched natural language understander Recent bugfixes: Version 2.1 (6 Aug 2004) includes the new MontyNLGenerator component, which generates sentences and summaries. Version 2.0.1 fixes an API bug in version 2.0 which prevented the Java API from being callable. What is MontyLingua? MontyLingua is a free*, commonsense-enriched, end-to-end natural language understander for English. Version 2.0 is substantially FASTER, MORE ACCURATE, and MORE RELIABLE than version 1.3.1. MontyLingua differs from other natural language processing tools because: MontyLingua performs the following tasks over text: MontyTokenizer tokenizes raw English text (sensitive to abbreviations) and resolves contractions. * Free for non-commercial use; please see the MontyLingua Version 2.0 License Terms of Use. Author: Hugo Liu <hugo@media.mit.edu> Project Page: < Documentation: new in version 2.0 (29 Jul 2004). Download MontyLingua: READ THIS if you are running ML on Mac OS X or Unix.
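MontyTokenizer's abbreviation-sensitive tokenization and contraction resolution can be pictured in a few lines. The sketch below is a generic illustration of the technique, not MontyLingua's actual implementation or API; the abbreviation and contraction tables are invented.

```python
# A generic sketch of abbreviation-sensitive tokenization with
# contraction resolution, the kind of work MontyTokenizer does.
# NOT MontyLingua's code or API; the tables below are illustrative.
import re

ABBREVIATIONS = {"e.g.", "i.e.", "Dr.", "Mr.", "etc."}
CONTRACTIONS = {"don't": "do not", "can't": "can not", "it's": "it is"}

def tokenize(text):
    tokens = []
    for word in text.split():
        if word in CONTRACTIONS:            # expand known contractions
            tokens.extend(CONTRACTIONS[word].split())
        elif word in ABBREVIATIONS:         # keep abbreviations intact
            tokens.append(word)
        else:                               # split off trailing punctuation
            tokens.extend(re.findall(r"\w+|[^\w\s]", word))
    return tokens

print(tokenize("Dr. Smith don't like it."))
# ['Dr.', 'Smith', 'do', 'not', 'like', 'it', '.']
```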

Free and linked data: RDF export of the data Keywords: Data, Linked Data, RDF, Export, XML Open Food Facts data was already open and free (open data, as they say), and now it is also linked. Yes, free and linked at the same time! Free because the open licence allows the data to be used by anyone and for any purpose, and linked because the data is now connected not only internally but also to other datasets, through the DBPedia knowledge base. In plain terms: there is now one large file containing the Open Food Facts data on products, their ingredients, and their nutritional composition. Thanks to this file, OFF data is now part of what is called the "Web of Data". Soon, Open Food Facts data cross-referenced with many other datasets? The technical details: the RDF export is here: (in XML/RDF)
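Consuming an RDF/XML export like this one typically means loading the dump into a triple store and querying it. A minimal sketch with rdflib (assumed installed); the file name is a hypothetical local copy, since the post's actual export URL is not given here.

```python
# A minimal sketch of consuming a Linked Data export: parse an RDF/XML
# dump and query it with SPARQL. Uses rdflib (assumed installed); the
# file name is a hypothetical placeholder for the downloaded export.
from rdflib import Graph

g = Graph()
g.parse("off_export.rdf", format="xml")  # hypothetical local copy of the dump

# Count distinct subjects (roughly: distinct described resources)
query = """
    SELECT (COUNT(DISTINCT ?s) AS ?n)
    WHERE { ?s ?p ?o }
"""
for row in g.query(query):
    print("resources:", row.n)
```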

ConceptNet What is ConceptNet? ConceptNet is a freely available commonsense knowledgebase and natural-language-processing toolkit which supports many practical textual-reasoning tasks over real-world documents right out of the box (without additional statistical training), including topic-gisting (e.g. a news article containing the concepts "gun," "convenience store," "demand money" and "make getaway" might suggest the topics "robbery" and "crime"), affect-sensing (e.g. this email is sad and angry), analogy-making (e.g. "scissors," "razor," "nail clipper," and "sword" are perhaps like a "knife" because they are all "sharp," and can be used to "cut something"), text summarization, contextual expansion, causal projection, cold document classification, and other context-oriented inferences. The ConceptNet knowledgebase is a semantic network presently available in two versions: concise (200,000 assertions) and full (1.6 million assertions).
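The semantic-network structure behind these inferences is just concepts linked by labeled assertions. A toy sketch of the idea, with a mini knowledge base invented for illustration; this is not ConceptNet's actual data or API.

```python
# A toy semantic network in the style of ConceptNet: concepts joined by
# labeled assertions, supporting a simple analogy-flavored lookup.
# The edges below are invented; this is not ConceptNet's data or API.
EDGES = [
    ("knife",       "IsA",     "sharp thing"),
    ("scissors",    "IsA",     "sharp thing"),
    ("razor",       "IsA",     "sharp thing"),
    ("sharp thing", "UsedFor", "cut something"),
]

def related(concept):
    """Collect (relation, concept) pairs one hop away from `concept`."""
    out = set()
    for s, rel, o in EDGES:
        if s == concept:
            out.add((rel, o))
        elif o == concept:
            out.add((rel + "~", s))  # ~ marks an inverse edge
    return out

# What links 'scissors' and 'knife'? Shared one-hop neighbours.
print(related("scissors") & related("knife"))
# {('IsA', 'sharp thing')}
```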

The DISC model 27 January 2006 How can we come to an arrangement with individuals so as to work under the best possible conditions, even and especially when we are different? William Marston believed that human beings behave along two axes: whether they tend to be "active" or "passive", and whether they perceive a human or factual environment as hostile or favourable. Behind each of these styles, the theory claims, lie different fundamental needs and a different value system. The dominant type feels the need to make decisions and to reach their goals; their main drivers are performance and responsibility. At the opposite end, the steady type above all wants to be appreciated and accepted; teamwork and dialogue are what get them moving. The influential type feels an imperative need to be recognized and praised. The DISC reading grid. Using the DISC model together with the situational management model

Brill POS Tagger for Win32 Paul Maddox The `Bow' Toolkit Bow (or libbow) is a library of C code useful for writing statistical text analysis, language modeling and information retrieval programs. The current distribution includes the library, as well as front-ends for document classification (rainbow), document retrieval (arrow) and document clustering (crossbow). The library and its front-ends were designed and written by Andrew McCallum, with some contributions from several graduate and undergraduate students. The name of the library rhymes with `low', not `cow'. About the library The library provides facilities for: recursively descending directories, finding text files. The library does not have English parsing or part-of-speech tagging facilities. It is known to compile on most UNIX systems, including Linux, Solaris, SunOS, Irix and HP-UX. The code conforms to the GNU coding standards. Citation: McCallum, Andrew Kachites. Here is a BibTeX entry: Obtaining the source Source code for the library can be downloaded from this directory.
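The first facility named, recursively descending directories to find text files, is the usual front door of a corpus indexer like rainbow. A minimal Python sketch of the same idea, as an illustration only, not libbow's C API; the corpus directory name is hypothetical.

```python
# A minimal sketch of corpus discovery: recursively descend a directory
# tree and collect text files, as an indexing front-end would before
# building a model. Plain Python illustration, not libbow's C API.
import os

def find_text_files(root, exts=(".txt",)):
    """Yield paths of text files found under `root`, recursively."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(exts):
                yield os.path.join(dirpath, name)

for path in find_text_files("corpus"):  # hypothetical corpus directory
    print(path)
```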

LDC - Linguistic Data Consortium - Current Projects Maximum Entropy Modeling Using SharpEntropy Overview This article presents a maximum entropy modeling library called SharpEntropy, and discusses its usage, first by way of a simple example of predicting outcomes, and secondly by presenting a way of splitting English sentences into constituent tokens (useful for natural language processing). Please note that because most of the code is a conversion based on original Java libraries published under the LGPL license, the source code available for download with this article is also released under the LGPL license. A second article, Statistical parsing of English sentences, shows how SharpEntropy can be used to perform sophisticated natural language processing tasks. Introduction SharpEntropy is a C# port of the MaxEnt toolkit available from SourceForge. Maximum entropy modeling is a general-purpose machine learning technique originally developed for statistical physics, but it has been employed in a wide variety of fields, including computer vision and natural language processing.
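The "predicting outcomes" use case is easy to picture because a maximum entropy classifier over indicator features is equivalent to multinomial logistic regression. A minimal sketch with scikit-learn standing in for the technique; this illustrates maxent modeling in general, not SharpEntropy's C# API, and the weather-style events are invented.

```python
# A minimal sketch of maximum entropy modeling for outcome prediction.
# A maxent classifier over indicator features is equivalent to
# multinomial logistic regression, so LogisticRegression stands in here.
# Not SharpEntropy's API; the contexts and outcomes are invented.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

events = [
    ({"outlook": "sunny",  "wind": "weak"},   "play"),
    ({"outlook": "sunny",  "wind": "strong"}, "stay"),
    ({"outlook": "rainy",  "wind": "weak"},   "stay"),
    ({"outlook": "cloudy", "wind": "weak"},   "play"),
]
contexts = [ctx for ctx, _ in events]
outcomes = [out for _, out in events]

# One-hot encode the string-valued features, then fit the model
model = make_pipeline(DictVectorizer(), LogisticRegression())
model.fit(contexts, outcomes)

print(model.predict([{"outlook": "cloudy", "wind": "strong"}]))
```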
