Data Mining Blogs - Data Mining Research (www.dataminingblog.com)

I posted an earlier version of this data mining blog list previously on DMR. Here is an updated version (blogs recently added to the list carry the "new" logo). I will keep this version up-to-date.

- Abbott Analytics: both industry- and research-oriented posts covering any topic related to data mining (Will Dwinnell and Dean Abbott)
- A Blog by Tim Manns: as its subtitle puts it, this blog deals with "data mining, analysing terabyte data warehouses, using SPSS Clementine, telecommunications, and other stuff" (Tim Manns)
- AI, Data mining, Machine learning and other things: Markus writes about machine learning with a focus on statistics, security and AI (Markus Breitenbach)
- anuradha@NumbersSpeak: a blog on analytics applications, statistics and data mining (Anuradha Sharma)
- Blog by bruno: this blog covers a very large number of topics, including web data analysis and data visualization
Ryan Rosario
PigTools - Apache Pig

UDF Collections:
- DataFu: LinkedIn's collection of Pig UDFs, which has become an Apache Incubator project.
- Elephant-Bird: Twitter's library of LZO and/or Protocol Buffer-related Hadoop InputFormats, OutputFormats, Writables, Pig LoadFuncs, HBase miscellanea, etc. RPM and Debian packages for Elephant Bird can be found at
- Pygmalion: a project to facilitate using Pig with Apache Cassandra.

Tools that help run Pig workflows:
- Amazon Elastic MapReduce: makes it easy to launch Pig in interactive or batch mode on AWS.
- hamake: a utility that allows you to automate incremental processing of datasets stored on HDFS using Hadoop tasks written in Java or using PigLatin scripts.
- Mortar Data
- Mortar Framework
- Piglet
- PigPy
- Eclipse
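To make the catalog concrete, here is a minimal sketch of how a UDF collection such as DataFu is pulled into a Pig Latin script. The jar file name, input path, and schema below are assumptions; datafu.pig.stats.Quantile is one of DataFu's statistics UDFs, and it expects its input bag to be sorted:

    -- Register the DataFu jar (version/path assumed; adjust to your install).
    REGISTER datafu-1.2.0.jar;

    -- Quartiles over a numeric column; the quantile points are constructor arguments.
    DEFINE Quartiles datafu.pig.stats.Quantile('0.0', '0.25', '0.5', '0.75', '1.0');

    temps   = LOAD 'input/temperatures.tsv' AS (city:chararray, temp:double);
    grouped = GROUP temps BY city;

    -- Quantile requires a sorted bag, hence the nested ORDER BY.
    stats = FOREACH grouped {
        sorted = ORDER temps BY temp;
        GENERATE group AS city, Quartiles(sorted.temp) AS quartiles;
    };

    DUMP stats;

The same pattern (REGISTER the jar, DEFINE an alias, call the UDF) applies to the other collections listed above, such as Elephant-Bird's LoadFuncs or Pygmalion's Cassandra helpers.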
HADOOP, HIVE, Map Reduce with PHP: part 1

Whenever the debate turns to "big data", the conversation always ends up being about storage. By virtue of its architecture and the way it operates, Hadoop imposes no technical constraints on how data is stored. Natively built around the Map & Reduce concept, Hadoop is a serious candidate for the massive storage and extraction needs that big data imposes.

Hadoop technical architecture

The diagram above describes the technical architecture of an e-commerce company selling pet food. This series walks through:
- installing Hadoop,
- discovering and manipulating HDFS,
- writing Map and Reduce jobs in PHP with Hadoop streaming (see the sketch after this section),
- a first look at HIVE.

Installing the HADOOP framework

Apache Hadoop is a framework written in JAVA that makes it possible, among other things, to distribute Map Reduce tasks across a cluster and to store the final result there. A large number of open-source projects built on top of the framework have emerged:

SSHd service
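As a preview of the "Map and Reduce in PHP" step, here is a minimal word-count sketch of a streaming mapper and reducer. The file names and the sample invocation are assumptions (the streaming jar path in particular varies by Hadoop version); Hadoop streaming itself only requires that each script read lines from STDIN and write tab-separated key/value pairs to STDOUT.

    #!/usr/bin/php
    <?php
    // mapper.php (hypothetical example): emit one "word<TAB>1" pair per word.
    while (($line = fgets(STDIN)) !== false) {
        foreach (preg_split('/\s+/', trim($line)) as $word) {
            if ($word !== '') {
                echo $word, "\t1\n";
            }
        }
    }

    #!/usr/bin/php
    <?php
    // reducer.php (hypothetical example): input arrives sorted by key,
    // so all counts for one word are contiguous and can be summed in one pass.
    $current = null;
    $count   = 0;
    while (($line = fgets(STDIN)) !== false) {
        list($word, $n) = explode("\t", trim($line));
        if ($word === $current) {
            $count += (int) $n;
        } else {
            if ($current !== null) {
                echo $current, "\t", $count, "\n";
            }
            $current = $word;
            $count   = (int) $n;
        }
    }
    if ($current !== null) {
        echo $current, "\t", $count, "\n";
    }

A typical launch then hands both scripts to the streaming jar (paths assumed; adjust to your installation):

    hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar \
        -input /data/in -output /data/out \
        -mapper mapper.php -reducer reducer.php \
        -file mapper.php -file reducer.php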