
Machine Learning Repository

Related: Big data

Datasets for Data Mining and Data Science. See also: Data repositories; AssetMacro, historical data of macroeconomic indicators and market data; Awesome Public Datasets on GitHub, curated by caesar0301.

mldata :: Welcome

An Overview of Web Archiving. D-Lib Magazine, March/April 2012, Volume 18, Number 3/4. Jinfang Niu, University of South Florida, jinfang@usf.edu. doi:10.1045/march2012-niu1. Abstract: This overview is a study of the methods used at a variety of universities and international government libraries and archives to select, acquire, describe and access web resources for their archives. Keywords: web archive, web archive methods, web resources. Introduction: Web archiving is the process of gathering data that has been recorded on the World Wide Web, storing it, ensuring the data is preserved in an archive, and making the collected data available for future research. Library and information schools need to prepare students for these challenges. Like the management of many other kinds of information resources, the web archiving workflow includes appraisal and selection, acquisition, organization and storage, description and access. Appraisal and Selection

Large Network Dataset Collection. Categories: Social networks; Networks with ground-truth communities; Communication networks; Citation networks; Collaboration networks; Web graphs; Product co-purchasing networks; Internet peer-to-peer networks; Road networks; Autonomous systems graphs; Signed networks; Location-based online social networks; Wikipedia networks, articles, and metadata; Temporal networks; User actions; Memetracker and Twitter; Online communities; Online reviews; Face-to-face communication networks; Graph classification datasets. Network types: Directed (directed network); Undirected (undirected network); Bipartite (bipartite network); Multigraph (the network has multiple edges between a pair of nodes); Temporal (for each node/edge we know the time when it appeared in the network); Labeled (the network contains labels, such as weights or attributes, on nodes and/or edges). Network statistics. Citing SNAP: We encourage you to cite our datasets if you have used them in your work.
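
SNAP distributes many of these networks as plain-text edge lists. As a minimal sketch, assuming the common layout of one whitespace-separated `FromNodeId ToNodeId` pair per line with `#`-prefixed comment lines (check each dataset's own header), the Python below loads such a file as a simple directed or undirected graph and reports its node count, edge count and maximum degree. The function names `load_edges` and `network_stats` are hypothetical and are not part of the SNAP library. Because edges are stored in a set, parallel edges collapse, so a multigraph is treated as a simple graph; timestamps and labels are ignored, and nodes with no edges are not counted.

```python
from collections import Counter


def load_edges(lines, directed=True):
    """Parse edge-list lines of the form 'FromNodeId ToNodeId', skipping
    blank lines and '#' comments. Returns a set of (src, dst) tuples."""
    edges = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        src, dst = line.split()[:2]
        # For undirected graphs, store each pair in a canonical order so that
        # "1 2" and "2 1" count as the same edge.
        edges.add((src, dst) if directed else tuple(sorted((src, dst))))
    return edges


def network_stats(edges):
    """Node count, edge count and maximum (in + out) degree of a simple graph."""
    degree = Counter(node for edge in edges for node in edge)
    return {
        "nodes": len(degree),
        "edges": len(edges),
        "max_degree": max(degree.values(), default=0),
    }


sample = """# Directed graph: toy example
# FromNodeId ToNodeId
1 2
2 3
3 1
1 2
""".splitlines()
print(network_stats(load_edges(sample, directed=True)))
# {'nodes': 3, 'edges': 3, 'max_degree': 2}
```

For a real download you would pass an open file handle instead of `sample`, for example one opened with `gzip.open(path, "rt")` for a `.txt.gz` file; the path and compression are assumptions about your local copy.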

Welcome to INFOMINE: Scholarly Internet Resource Collections. INFOMINE is a unique web resource that provides well-organized access to important university-level research and educational tools on the Internet. A virtual library, INFOMINE is notable for its collection of annotated and indexed links. Information in INFOMINE is easy to find because of the many access points (ways of finding the information it contains) that it provides. INFOMINE contains over 100,000 links (26,000 librarian-created links and 75,000-plus robot/crawler-created links). INFOMINE began in January 1994 as a project of the Library of the University of California, Riverside. Its access points are organized into three modes: BROWSE (What's New; Title; Table of Contents; Subject by LCSH; Subject by LCC; Search by Research Discipline; Key Word; Megatopics, keyword in context; Title; Author; hyperlinked indexing), SEARCH (Title; Subject by LCSH; Key Word; Author; Description; Full-text), and LIMIT search (Resource Type; Resource Origin and Access).

50 Resources for Getting the Most Out of Google Analytics. Google Analytics is a very useful free tool for tracking site statistics. For most users, however, it never becomes more than a pretty interface with interesting graphs. The resources below will help anyone, from beginners to long-time users, learn how to get the most out of this great tool. For Beginners: The following links will help you get started with Google Analytics, from setup to understanding the data it presents. How to Use Google Analytics for Beginners: Mahalo’s how-to guide for beginners. Tips & Tricks: If you’re already fairly familiar with Google Analytics and ready to dig deeper into how to use the information it makes available, this list of tips and tricks is for you. Plugins, Hacks & Additions: Want to get even more out of Google Analytics by extending it with third-party plugins, additions and hacks?

Common Google Universal Analytics Mistakes that kill your Analysis & Conversions. I have audited hundreds of web analytics accounts and profiles, and each account/view had at least one or two issues that seriously stood in the way of getting optimum results from my analysis. I have grouped all of these issues into five broad categories: directional issues, data collection issues, data integration issues, data interpretation issues and data reporting issues. These are the most common mistakes that kill your analysis, reporting and conversions. To get optimum results from your analysis of Universal Analytics reports, you must aim to find and fix as many of these issues as possible; failing to do so will almost always result in inaccurate analysis, interpretation and reporting. 1. Directional issues: these are not associated with Google Universal Analytics or any other analytics software you use, but are commonly found in analysts themselves, and are reflected in the way they set up Google Analytics accounts, advanced segments, conversion segments, filters and custom reports.

Using the New Cohort Analysis in Google Analytics. The cohort was the basic tactical unit of the Roman legions following the reforms of Gaius Marius in 107 BC. Initially, a Roman legion consisted of ten cohorts, each of 480 men. Today we use the term cohort to distinguish between groups of consumers to help us make them spend more money on things they probably don’t need. Progress? And now Google Analytics has a fancy new Cohort Analysis Report that lets us analyze the death rates from the Second Punic War… Er… no… it helps us analyze the consumer/shoe thing. OK, so what are cohorts? For our purposes, cohorts are a way of grouping together people (or content), usually based on date; here, that means grouping people by their first session on a website. You can be part of more than one cohort in this manner, but it’s all still based on the date you were first acquired. What is cohort analysis? Cohort analysis means looking at these groups of people over time and seeing how their behavior differs. The New Cohort Analysis Report
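
To make the definition concrete, here is a minimal Python sketch of acquisition-date cohort analysis, assuming a hypothetical input of `(user_id, session_date)` pairs rather than anything exported from Google Analytics. Each user is assigned to the cohort of their first session date, and for each cohort the function reports the share of its users who were active N days later. It is a simplified daily-cohort illustration, not how the Google Analytics Cohort Analysis Report is implemented; the function name `cohort_retention` is hypothetical.

```python
from collections import defaultdict
from datetime import date


def cohort_retention(sessions):
    """Group users by the date of their first session (their cohort) and
    return {cohort_date: {days_since_first_session: share_of_cohort_active}}.

    `sessions` is a hypothetical iterable of (user_id, session_date) pairs.
    """
    sessions = list(sessions)  # the input is scanned twice, so materialize it

    # 1. Find each user's first session date: this defines their cohort.
    first_seen = {}
    for user, day in sessions:
        if user not in first_seen or day < first_seen[user]:
            first_seen[user] = day

    # 2. Count how many users were acquired on each date.
    cohort_size = defaultdict(int)
    for first_day in first_seen.values():
        cohort_size[first_day] += 1

    # 3. Record which users of each cohort were active N days after acquisition.
    active = defaultdict(set)
    for user, day in sessions:
        cohort = first_seen[user]
        active[(cohort, (day - cohort).days)].add(user)

    # 4. Turn active-user counts into retention rates, ordered by cohort and day.
    table = defaultdict(dict)
    for (cohort, offset), users in sorted(active.items()):
        table[cohort][offset] = len(users) / cohort_size[cohort]
    return dict(table)


sessions = [
    ("a", date(2015, 3, 1)), ("b", date(2015, 3, 1)),
    ("a", date(2015, 3, 2)), ("c", date(2015, 3, 2)),
    ("c", date(2015, 3, 4)),
]
for cohort, row in cohort_retention(sessions).items():
    print(cohort.isoformat(), row)
# 2015-03-01 {0: 1.0, 1: 0.5}
# 2015-03-02 {0: 1.0, 2: 1.0}
```

Reading across a row shows how one cohort's activity decays over time; reading down a column compares cohorts at the same age, which is the comparison the post describes.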

Advanced Content Analysis in Google Analytics. The author's posts are entirely his or her own (excluding the unlikely event of hypnosis) and may not always reflect the views of Moz. We analyze the performance of our content every day. Sometimes it's subconscious, like when we check the number of tweets a new blog post gets. Other times, we make more conscious efforts, like reviewing performance metrics in Google Analytics. This feedback, both formal and anecdotal, informs what we do next: it influences future blog posts and validates our strategies. Paying attention to which of your content efforts are working well is the cornerstone of data-driven marketing. These articles show how taking a data-driven approach to producing content can produce great results. I don't know about you, but exponential traffic sounds pretty great to me! But we will never get there without taking a methodical and data-driven approach to our efforts. It's time to take things to the next level! In the past, I had to do this manually.

Learn Big Data Analytics using Top YouTube Tutorial Videos & TED Talks. Introduction: There has been a lot of investment in Big Data by various companies in the last few years. This rise in the use of big data analytics has created high demand for skilled big data professionals. While there has been a lot of debate over the usefulness of this spending, there is a clear increase in Big Data jobs. Given the sharp increase in demand, big data has become a lucrative area in which to upskill. There are a lot of technologies and terminologies associated with Big Data, which can act as an additional roadblock to getting started. Disclaimer: We DO NOT intend to promote any brand or service through this article. Who is expected to benefit most from watching these videos? I have written this article with beginners in Big Data in mind. The article is structured to give a complete overview of the various technologies used in Big Data Analytics. TED Talks on Big Data: 1. Duration: 11:30 mins. 2. Duration: 22:00 mins.

18 New Must-Read Books for Data Scientists on R and Python. Introduction: “It’s called reading. It’s how people install new software into their brain.” Personally, I haven’t learnt as much from videos and online tutorials as I have from books. Right now, my tiny wooden shelf holds enough books to keep me busy this winter. Understanding machine learning and data science is easy; the confidence to question the logic comes from reading books. Here is a list of books on doing machine learning / data science in R and Python that I’ve come across in the last year. Disclosure: The Amazon links in this article are affiliate links. R for Data Science: Hands-on Programming with R, by Garrett Grolemund; R for Everyone: Advanced Analytics and Graphics, by Jared P.; R Cookbook, by Paul Teetor; R Graphics Cookbook, by Winston Chang; Applied Predictive Modeling, by Max Kuhn and Kjell Johnson; Introduction to Statistical Learning.

Data Science Cheat Sheets – Python / R / MySQL & SQL / Spark / Hadoop & Hive / Machine Learning / Django – AITS – Data Mining Club. Get up to speed and keep data science and data mining concepts and commands handy with these cheat sheets covering R, Python, Django, MySQL, SQL, Hadoop, Apache Spark and machine learning algorithms. There are thousands of packages and hundreds of functions out there in the data science world, and an aspiring data enthusiast need not know them all. The most important ones have been brainstormed and captured here in a few compact pages. Mastering data science involves understanding statistics and mathematics, programming knowledge (especially in R, Python and SQL), and then combining all of these with business understanding and human instinct to derive the insights that drive decisions. Here are the cheat sheets by category. Cheat sheets for Python: Python is a popular choice for beginners, yet still powerful enough to back some of the world’s most popular products and applications. Cheat sheets for R: R's ecosystem has been expanding so much that a lot of referencing is needed.
