Top 10 data mining algorithms in plain English
Today, I’m going to explain in plain English the top 10 most influential data mining algorithms, as voted on by 3 separate panels in this survey paper. Once you know what they are, how they work, what they do and where you can find them, I hope this blog post will serve as a springboard for learning even more about data mining. What are we waiting for? Let’s get started!
Course materials
This page lists the materials I use to teach Machine Learning, Data Mining and Data Science in the Département Informatique et Statistique (DIS, Computer Science and Statistics Department) at Université Lyon 2, mainly in the Master 2 Statistique et Informatique pour la Science des donnéEs (SISE), a data science program, covering the statistical processing of data and the exploitation of big data. I pay close attention to the strong synergy between computer science and statistics in this degree: they are the essential pillars of the data scientist's profession. Note that most of these documents are slides printed to PDF, so they are only lightly formalized; above all, they highlight the main thread of each topic and list its key points.
Eye tracking blog: web usability, user testing
Eye tracking as a tool for aviation safety
The Belgian magazine L’Echo published an article today on Pertech’s eye-tracking technology. The article presents the Pertech eye tracker as a tool for aviation safety. It can be downloaded as a PDF: Eye tracking as a tool for aviation safety.
Scheduling in Hadoop
Hadoop is a general-purpose system that enables high-performance processing of data over a set of distributed nodes. Implicit in this definition is the fact that Hadoop is a multi-tasking system: it can process multiple data sets for multiple jobs for multiple users at the same time. Because it runs many jobs concurrently, Hadoop has the opportunity to map jobs to resources in a way that makes better use of those resources.
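To make the idea of mapping several jobs onto shared resources concrete, here is a minimal sketch of max-min fair sharing, similar in spirit to the fair-sharing policy behind Hadoop's Fair Scheduler (every job gets, on average, an equal share of the cluster). It is not Hadoop's implementation or API: the function `max_min_fair_share`, the job names and the slot counts are hypothetical, and a real scheduler also deals with pools, weights, preemption and data locality.

```python
from typing import Dict


def max_min_fair_share(capacity: int, demands: Dict[str, int]) -> Dict[str, int]:
    """Split `capacity` task slots among jobs using max-min fairness (hypothetical helper).

    Jobs asking for less than an equal share get exactly what they ask for;
    slots they leave unused are redistributed equally among the other jobs.
    """
    allocation = {job: 0 for job in demands}
    remaining = capacity
    # Jobs that still want slots, smallest demand first.
    pending = sorted((job for job in demands if demands[job] > 0),
                     key=lambda job: demands[job])
    while pending and remaining > 0:
        share = remaining // len(pending)
        if share == 0:
            # Fewer free slots than waiting jobs: give one slot each to the
            # jobs with the smallest demands until the slots run out.
            for job in pending[:remaining]:
                allocation[job] += 1
            remaining = 0
            break
        still_waiting = []
        for job in pending:
            # Never grant more than the job still needs.
            grant = min(demands[job] - allocation[job], share)
            allocation[job] += grant
            remaining -= grant
            if allocation[job] < demands[job]:
                still_waiting.append(job)
        pending = still_waiting
    return allocation


if __name__ == "__main__":
    # Hypothetical cluster with 10 slots shared by three jobs.
    print(max_min_fair_share(10, {"etl": 2, "report": 4, "ml": 8}))
    # prints {'etl': 2, 'report': 4, 'ml': 4}
```

With 10 slots and demands of 2, 4 and 8, the small job keeps its 2 slots and the two larger jobs split the remaining 8 evenly. Settling the smallest demands first is what keeps one large job from starving the others.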
Sipina add-in for Excel 2007 and 2010 - Sipina - Decision Trees
Friday, 27 August 2010, 13:33
The sipina.xla add-in plays a major role in the distribution of the Sipina software. In an environment they already know, the spreadsheet, users can manipulate, transform and recode the data as they see fit before sending it to the specialized data mining software. No more file-format incompatibilities, temperamental decimal separators, and so on. Users simply select the data and then click a new menu built into Excel. We have described how to install and use the add-in in Office 2000 (the procedure applies up to Office 2003).
4 Promising Curation Tools That Help Make Sense of the Web
Steven Rosenbaum is a curator, author, filmmaker and entrepreneur. He is the CEO of Magnify.net, a real-time video curation engine for publishers, brands, and websites. His book Curation Nation is slated to be published this spring by McGraw-Hill Business. As the volume of content swirling around the web continues to grow, we're finding ourselves drowning in a deluge of data.
Observations About Streaming Data Analytics for Science
I recently had the pleasure of attending two excellent workshops on streaming data analytics and science. One goal of the workshops was to understand the state of the art of “big data” streaming applications in scientific research and, where possible, to identify common themes and challenges. Called Stream2015 and Stream2016, these meetings were organized by Geoffrey Fox, Lavanya Ramakrishnan and Shantenu Jha.
Tanagra Tutorials for Data Mining
The Best Data Visualization Projects of 2011
I almost didn't make a best-of list this year, but as I clicked through the year's posts, it was hard not to. If last year (and maybe the year before) was the year of the gigantic graphic, this was the year of big data. Or maybe we've gotten better at filtering to the good stuff. (Fancy that.) In any case, data graphics continue to thrive, and designers are putting more thought into what the data are about, which is a very good thing. So here are my favorites from 2011, ordered by preference.
MongoDB Performance Tuning: Everything You Need to Know
MongoDB is one of the most popular document databases. It’s the M in the MEAN stack (MongoDB, Express, Angular, and Node.js). Unlike relational databases such as MySQL or PostgreSQL, MongoDB stores data as JSON-like documents. MongoDB is free, open-source, and incredibly performant. However, as with any other database, certain issues can cost MongoDB its edge and drag down its performance. In this article, we’ll look at a few key metrics and what they mean for MongoDB performance.
Sipina - Decision Trees
50 Great Examples of Data Visualization
Wrapping your brain around data online can be challenging, especially when dealing with huge volumes of information. Finding related content can also be difficult, depending on what data you’re looking for. Data visualizations can make all of that much easier, letting you see the concepts you’re learning about in a more interesting and often more useful way. Below are 50 of the best data visualizations and tools for creating your own visualizations, covering everything from Digg activity to network connectivity to what’s currently happening on Twitter.