NoSQL Databases and Polyglot Persistence: A Curated Guide

The Net Takeaway: SQL and Hadoop (11/20/2008, Database Analysis)
I don’t know why there is so much confusion over the role of MapReduce-oriented databases like Hadoop vs. SQL-oriented databases. It’s actually pretty simple. There are two things people want to do with databases: Select and Aggregate/Report, aka Process. The Select portion is filtering: finding specific data points based on attributes like time, category, etc. So, how do we tell databases to do these two things? While some programmers immediately get what SQL can do, others find it to be “YAL”, “Yet Another Language”. MapReduce is a programming concept that has been around for a while in the functional programming world, but it has recently become more popular as scripting languages rise and as processors become more parallel. Therefore, if you think about it, both Hadoop and SQL databases are doing the same thing: selecting some data (the Map phase) and processing it (the Reduce phase). So, why the sturm und drang? But we aren’t there yet. SQL vs.
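To make the select/process split concrete, here is a minimal sketch that runs the same report two ways: once as a declarative SQL query in an in-memory SQLite database, and once as a hand-written map phase (filter and emit key/value pairs), a shuffle (group by key), and a reduce phase (aggregate). The sales table, its columns, and the sample rows are hypothetical. The in-memory "shuffle" is only an illustration and is not Hadoop's API.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (timestamp, category, amount)
ROWS = [
    ("2008-11-01", "books", 12.0),
    ("2008-11-02", "music", 8.0),
    ("2008-11-15", "books", 20.0),
    ("2008-12-01", "books", 5.0),
    ("2008-11-20", "music", 4.0),
]

def sql_report(rows):
    """Select + aggregate expressed declaratively in SQL (in-memory SQLite)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (ts TEXT, category TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    result = conn.execute(
        "SELECT category, SUM(amount) FROM sales "
        "WHERE ts LIKE '2008-11-%' GROUP BY category ORDER BY category"
    ).fetchall()
    conn.close()
    return dict(result)

def map_phase(row):
    """Map: filter rows (the 'select') and emit (key, value) pairs."""
    ts, category, amount = row
    if ts.startswith("2008-11-"):
        yield category, amount

def reduce_phase(key, values):
    """Reduce: aggregate all values that share a key (the 'process')."""
    return key, sum(values)

def mapreduce_report(rows):
    groups = defaultdict(list)  # the shuffle: group emitted values by key
    for row in rows:
        for key, value in map_phase(row):
            groups[key].append(value)
    return dict(reduce_phase(k, v) for k, v in sorted(groups.items()))

print(sql_report(ROWS))        # {'books': 32.0, 'music': 12.0}
print(mapreduce_report(ROWS))  # {'books': 32.0, 'music': 12.0}
```

Both paths filter out the December row and sum the November amounts per category. The difference is who writes the plan: the SQL engine derives it from the query, while in MapReduce the programmer writes the map and reduce functions directly.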

Getting Started with NoSQL (myNoSQL)
A couple of weeks ago, I had the pleasure of sitting down with Mathias Meyer, Chief Visionary at Scalarium, a Berlin startup, to discuss NoSQL adoption. Like me, Mathias is really excited about NoSQL, and he uses every opportunity to introduce more people to the NoSQL space. Recently he gave quite a few presentations around Europe about NoSQL databases. The discussion focused on how someone would start learning and using NoSQL databases and the path to follow in this new ecosystem.
Alex: How does one get started with NoSQL?
Mathias: Well, that’s a question I get quite a lot, but it is not that easy to answer. From a business perspective, you are probably going to find some use cases where storing your data in a relational database doesn’t make too much sense, and you’ll start looking for ways to get it out of the database.
Alex: So, as a developer you should just give yourself a chance to play with the new shiny toys.
Mathias: Indeed. You can’t really give a universal answer here.

How Digg is Built? Using a Bunch of NoSQL technologies (High Scalability)
The original post includes a diagram of Digg’s polyglot persistence approach, along with a description of the data stores in use:
Digg stores data in multiple types of systems depending on the type of data and the access patterns, and also for historical reasons in some cases :)
Cassandra: The primary store for “Object-like” access patterns for such things as Items (stories), Users, Diggs and the indexes that surround them.
I know this will sound strange, but isn’t it too much in there? (@antirez)
Original title and link: How Digg is Built? by Alex Popescu & Ana-Maria Bacalu
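The Cassandra entry mentions objects such as Items, Users, and Diggs "and the indexes that surround them". A common way to support such access patterns in a wide-column store is to maintain the indexes yourself at write time. The sketch below is a hypothetical, in-memory illustration of that pattern. Plain dicts stand in for column families. This is not Digg's actual schema and not Cassandra's API, and all class and method names are made up for the example.

```python
from collections import defaultdict

class DiggStoreSketch:
    """Hypothetical in-memory stand-in for a wide-column store.

    Each dict plays the role of a column family: objects are stored by key,
    and every index the read path needs is written alongside the object.
    """

    def __init__(self):
        self.items = {}                          # item_id -> story fields
        self.users = {}                          # user_id -> user fields
        self.diggs_by_item = defaultdict(dict)   # item_id -> {user_id: timestamp}
        self.diggs_by_user = defaultdict(dict)   # user_id -> {item_id: timestamp}

    def add_item(self, item_id, title):
        self.items[item_id] = {"title": title}

    def add_user(self, user_id, name):
        self.users[user_id] = {"name": name}

    def digg(self, user_id, item_id, ts):
        # Denormalized write: one logical "digg" updates both index rows,
        # so each read below is a single key lookup with no join.
        self.diggs_by_item[item_id][user_id] = ts
        self.diggs_by_user[user_id][item_id] = ts

    def digg_count(self, item_id):
        return len(self.diggs_by_item.get(item_id, {}))

    def items_dugg_by(self, user_id):
        return sorted(self.diggs_by_user.get(user_id, {}))

store = DiggStoreSketch()
store.add_item("s1", "How Digg is Built?")
store.add_item("s2", "Getting Started with NoSQL")
store.add_user("u1", "alice")
store.add_user("u2", "bob")
store.digg("u1", "s1", 100)
store.digg("u2", "s1", 101)
store.digg("u1", "s2", 102)
print(store.digg_count("s1"))     # 2
print(store.items_dugg_by("u1"))  # ['s1', 's2']
```

The trade-off is extra writes and duplicated data in exchange for reads that never need a join. That is the kind of access-pattern-driven choice that leads a site toward several specialized stores.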

CouchDB: Technical Overview
A Database for the Web
CouchDB is a database that completely embraces the web. Store your data with JSON documents. Access your documents with your web browser, via HTTP. CouchDB comes with a suite of features, such as on-the-fly document transformation and real-time change notifications, that make web app development a breeze. See the introduction, technical overview, or one of the guides for more information.
Want to Contribute?
CouchDB is an open source project. One of the first things you should do is actually use CouchDB, and get to know it, read about it, evangelise it, and engage with the wider community. Why don’t you check out JIRA and help us triage some of those issues? Do you want to contribute code?
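CouchDB's document API is plain HTTP:

- `PUT /{db}/{docid}` with a JSON body stores a document.
- `GET /{db}/{docid}` returns it.
- `GET /{db}/_changes?feed=longpoll&since=...` waits for changes.

The sketch below uses only the Python standard library to build those requests without sending them. It assumes a local CouchDB at `http://localhost:5984` and an existing database. Both are assumptions about your setup, as is the hypothetical `stories` database in the usage example.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a local CouchDB listening here, with the target database already created.
COUCH = "http://localhost:5984"

def _doc_url(db, doc_id):
    """URL of a single document, with both path segments percent-encoded."""
    return f"{COUCH}/{urllib.parse.quote(db, safe='')}/{urllib.parse.quote(doc_id, safe='')}"

def put_document(db, doc_id, doc):
    """Build a PUT /{db}/{docid} request that stores `doc` as a JSON document."""
    return urllib.request.Request(
        _doc_url(db, doc_id),
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

def get_document(db, doc_id):
    """Build a GET /{db}/{docid} request; a browser can fetch the same URL."""
    return urllib.request.Request(_doc_url(db, doc_id), method="GET")

def changes_feed(db, since="now"):
    """Build a long-poll request on the _changes feed for change notifications."""
    query = urllib.parse.urlencode({"feed": "longpoll", "since": since})
    return urllib.request.Request(f"{COUCH}/{urllib.parse.quote(db, safe='')}/_changes?{query}")
```

Against a running server, `urllib.request.urlopen(put_document("stories", "doc-1", {"title": "Hello"}))` should return a small JSON body with `ok`, `id`, and `rev`. To update the document later, include its current `_rev` in the body, otherwise CouchDB rejects the write as a conflict. Depending on how the server is configured, you may also need to supply credentials.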

nodechat.js – Using node.js, backbone.js, socket.io, and redis to make a real time chat app
Geek fun: take node.js and a NoSQL database (usually it is MongoDB, CouchDB, or Redis, but adventurous types could even try Riak, HBase, or Cassandra) and create a “real-time” chat or collaborative editor:
nodechat.js is a simple, realtime chat app that leverages node.js, backbone.js, socket.IO, and redis. I wrote it as an exercise and I am sharing it because there are relatively few working examples using all these pieces together.
The outcome?
Update: A node.js, socket.io, and CouchDB post.
Original title and link: nodechat.js – Using node.js, backbone.js, socket.io, and redis to make a real time chat app (NoSQL databases © myNoSQL)
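The excerpt does not say exactly how nodechat.js uses redis. One pattern real-time chat apps often rely on combines two pieces:

- Channel pub/sub, which fans each message out to every connected client.
- A capped list of recent messages, which new clients can replay when they join.

The Python sketch below illustrates that pattern with in-memory stand-ins. The class and method names are hypothetical. In redis-py the real counterparts would be `Redis.publish`, `Redis.pubsub()` with `subscribe`, and `Redis.lpush` / `Redis.ltrim` / `Redis.lrange`.

```python
from collections import defaultdict, deque

class ChatBrokerSketch:
    """Hypothetical in-memory stand-in for Redis pub/sub plus a capped history list."""

    def __init__(self, history_size=3):
        self.history_size = history_size
        self.subscribers = defaultdict(list)  # channel -> list of inbox deques
        self.history = defaultdict(list)      # channel -> messages, newest first

    def subscribe(self, channel):
        inbox = deque()
        self.subscribers[channel].append(inbox)
        return inbox

    def publish(self, channel, message):
        # Like LPUSH followed by LTRIM 0 N-1: keep only the newest N messages.
        self.history[channel].insert(0, message)
        del self.history[channel][self.history_size:]
        # Like PUBLISH: deliver to every current subscriber and
        # return how many received the message.
        for inbox in self.subscribers[channel]:
            inbox.append(message)
        return len(self.subscribers[channel])

    def recent(self, channel):
        """Replay history for a newly connected client, oldest first."""
        return list(reversed(self.history[channel]))

broker = ChatBrokerSketch(history_size=2)
alice = broker.subscribe("room")
bob = broker.subscribe("room")
broker.publish("room", "hi")
broker.publish("room", "hello")
broker.publish("room", "bye")
print(broker.recent("room"))  # ['hello', 'bye']
print(list(alice))            # ['hi', 'hello', 'bye']
```

In a node.js app, the socket.io server would sit where the `subscribe` inboxes are, pushing each delivered message down the client's socket.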

Building a Better Submission Form
If you participated in our invitation to photograph a “Moment in Time” earlier in May, you used our new photo submission software, which we call Stuffy. Built to enable users to upload media files, and to allow our producers to review uploaded files quickly, Stuffy uses a “NoSQL” storage engine to make customized forms simple.
Submission Form
The original photo uploader, called Puffy (the Photo Upload Form For You), had hundreds of lines of code for a single custom form. Over time, that single form turned into multiple forms as we met internal demands for the tool. On the back end, the original application used a MySQL database, requiring somewhat complex SQL to generate each form and its submissions. In the application, a form serves two purposes: to tell the application what fields to display in the form, and to collect submissions for the application.
A Different Approach
Displaying a photo submission form now requires a single lookup.
Beyond Uploading
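The excerpt does not name Stuffy's storage engine or show its schema. This sketch only illustrates the general idea behind "a single lookup": a form and the submissions it collects live together in one schemaless document. Rendering the form is then a single key lookup, and a new form is a new document rather than new SQL. The `FORMS` store, the `moment-in-time` key, and all field names are hypothetical.

```python
# Hypothetical in-memory document store: form_id -> one document holding
# both the form definition and the submissions it has collected.
FORMS = {
    "moment-in-time": {
        "title": "A Moment in Time",
        "fields": [
            {"name": "name", "label": "Your name", "type": "text", "required": True},
            {"name": "location", "label": "Where was this taken?", "type": "text", "required": False},
            {"name": "photo", "label": "Photo", "type": "file", "required": True},
        ],
        "submissions": [],
    }
}

def render_form(form_id):
    """One lookup returns everything needed to display the form."""
    form = FORMS[form_id]
    return [(f["label"], f["type"]) for f in form["fields"]]

def submit(form_id, values):
    """Validate against the form's own field list, then store the submission."""
    form = FORMS[form_id]
    missing = [f["name"] for f in form["fields"] if f["required"] and not values.get(f["name"])]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    known = {f["name"] for f in form["fields"]}
    # Keep only fields the form defines; unknown keys are dropped.
    form["submissions"].append({k: v for k, v in values.items() if k in known})
    return len(form["submissions"])

print(render_form("moment-in-time"))
print(submit("moment-in-time", {"name": "Ana", "photo": "img.jpg"}))  # 1
```

Embedding submissions in the form document keeps the example short. A production system might store them as separate documents keyed by form, so that a popular form does not grow into one very large document.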

Cloudera's Hadoop Demo VM (Cloudera Support)
To make it easy for you to get started with Apache Hadoop, we created a set of virtual machines with everything you need. Versions:
- Cloudera's Hadoop Demo VM CDH3u3 (last updated March 2012; CDH version CDH3u3). Our VM runs CentOS 5....
- Cloudera's Hadoop Demo VM - CDH3u1 (last updated August 2011; CDH version CDH3u1)
- Cloudera's Hadoop Demo VM for CDH4 (last updated October 2012; CDH version CDH4.1.1)
- Cloudera's Hadoop Demo VM for CDH3u4
- Cloudera's Hadoop Demo VM for CDH4.0.1

An example of using F# and C# (.net/mono) with Amazon’s Elastic Mapreduce (Hadoop) (Feb 07)
This posting gives an example of how F# and C# can potentially scale to thousands of machines with MapReduce in order to efficiently process terabyte (TB) and petabyte (PB) amounts of data. It shows a C# (c sharp) mapper function and an F# (f sharp) reducer function, with a description of how to deploy the job on Amazon’s Elastic MapReduce using a bootstrap action (it was tested with an Elastic MapReduce cluster of 10 machines). The .net environment used is mono 2.8 and FSharp 2.0.
The post covers the MapReduce code (the C# mapper and how to compile it, and the F# reducer and how to compile it), followed by deployment on Amazon’s Elastic MapReduce. To run mono/.net code on Elastic MapReduce (Debian) Linux nodes, you need to install mono on each node; this can be done with a bootstrap action shell script. The post includes the bootstrap action shell script for installing mono and a Python script that deploys the MapReduce job and checks its status until it is done.
Best regards, Amund Tveit
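The post's C# mapper and F# reducer are not reproduced in this excerpt, and it does not say what they compute. Running mono executables on Elastic MapReduce presumably relies on the Hadoop Streaming contract that any executable can follow:

- The mapper reads input lines on stdin and writes tab-separated key/value lines on stdout.
- Hadoop sorts the mapper output by key.
- The reducer reads the sorted lines on stdin.

The Python sketch below shows that contract. Word count is a stand-in task, not the post's actual job.

```python
import sys
from itertools import groupby

def mapper(lines):
    """Map stage: emit one 'word<TAB>1' line per word (word count is a stand-in task)."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"

def reducer(lines):
    """Reduce stage: sum the counts for each key.

    Relies on Hadoop having sorted the mapper output by key, so all lines
    for one key arrive consecutively and groupby can collect them.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

if __name__ == "__main__" and sys.argv[1:2] in (["map"], ["reduce"]):
    # Hadoop Streaming feeds input on stdin and collects output from stdout.
    stage = mapper if sys.argv[1] == "map" else reducer
    for out in stage(sys.stdin):
        print(out)
```

You can test the job locally before deploying it. Save the script under a name of your choice, for example the hypothetical `wordcount.py`, and run `cat input.txt | python wordcount.py map | sort | python wordcount.py reduce`. In that pipeline, `sort` plays the role of Hadoop's shuffle.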