Neo4j -- or why graph dbs kick ass
Neo4j - a Graph Database that Kicks Buttox

Update: Social networks in the database: using a graph database. A nice post on representing, traversing, and performing other common social network operations using a graph database.

If you are Digg or LinkedIn you can build your own speedy graph database to represent your complex social network relationships. For those of more modest means, Neo4j, a graph database, is a good alternative. A graph is a collection of nodes (things) and edges (relationships) that connect pairs of nodes. A graph looks something like: [Figure: example graph] For more lovely examples take a look at the Graph Image Gallery.

Here's a good summary by Emil Eifrem, founder of Neo4j, making the case for why graph databases rule: Most applications today handle data that is deeply associative, i.e. structured as graphs (networks). So relational databases can't handle complex relationships.

Neo4j's Key Characteristics
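The definition of a graph given in the Neo4j excerpt (nodes connected in pairs by edges) can be sketched in a few lines of Python. This is only an in-memory illustration, not Neo4j's API or storage model; the `Graph` class, its method names, and the sample people are all hypothetical.

```python
from collections import defaultdict


class Graph:
    """A tiny undirected graph: nodes (things) joined by edges (relationships)."""

    def __init__(self):
        # Maps each node to the set of nodes it shares an edge with.
        self._adjacent = defaultdict(set)

    def add_edge(self, a, b):
        """Record a relationship between two nodes, in both directions."""
        self._adjacent[a].add(b)
        self._adjacent[b].add(a)

    def neighbors(self, node):
        """Return a copy of the set of nodes directly connected to `node`."""
        return set(self._adjacent.get(node, set()))

    def friends_of_friends(self, node):
        """Nodes exactly two hops away: a common social network query."""
        direct = self.neighbors(node)
        two_hops = set()
        for friend in direct:
            two_hops |= self.neighbors(friend)
        # Exclude the starting node and anyone already directly connected.
        return two_hops - direct - {node}


# Hypothetical sample data: alice - bob - carol - dave
graph = Graph()
graph.add_edge("alice", "bob")
graph.add_edge("bob", "carol")
graph.add_edge("carol", "dave")

print(graph.friends_of_friends("alice"))
```

The point of the sketch is that a traversal follows edges outward from a starting node, which is the kind of operation a graph database is built around.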
How Twitter Uses NoSQL

InfoQ has released a video of Twitter's Kevin Weil speaking at Strange Loop earlier this year on how the company uses NoSQL. Weil is quick to point out that Twitter is heavily dependent on MySQL. However, Twitter does employ NoSQL solutions for many purposes for which MySQL isn't ideal.

Scribe
Syslog stopped scaling for Twitter after a while, so instead it uses Scribe, a log collection framework created and open-sourced by Facebook. Twitter uses Scribe to write logs to Hadoop.

Hadoop
Twitter needs to store more data per day than it can reliably write to a single hard drive, so it needs to store data on clusters. Weil says MySQL isn't efficient at doing analytics at the scale Twitter needs.

Pig
The example Pig script finds the top five pages of your site visited by people aged 18 to 25. Weil says the most natural way to "talk to" Hadoop is through Java.

HBase
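The Pig script itself is not reproduced in the excerpt, so the following is not Pig and not Twitter's code. It is a plain-Python sketch of the query the excerpt describes (the top five pages visited by people aged 18 to 25), assuming two hypothetical inputs: a list of `(name, age)` users and a list of `(name, url)` page visits.

```python
from collections import Counter


def top_pages(users, visits, min_age=18, max_age=25, limit=5):
    """Return the `limit` most visited URLs among users in the age range."""
    # Filter: keep only users whose age falls in the requested range.
    eligible = {name for name, age in users if min_age <= age <= max_age}
    # Join and group: count visits per URL, but only for eligible users.
    counts = Counter(url for name, url in visits if name in eligible)
    # Order and limit: highest counts first, truncated to `limit` entries.
    return counts.most_common(limit)


# Hypothetical sample data, for illustration only.
users = [("ann", 20), ("bob", 30), ("cat", 18)]
visits = [
    ("ann", "/home"),
    ("bob", "/home"),
    ("cat", "/home"),
    ("ann", "/about"),
    ("bob", "/about"),
]

result = top_pages(users, visits)
print(result)
```

The steps here (filter, join, group, count, order, limit) are written as ordinary in-memory operations. A tool like Pig expresses similar steps at a higher level and runs them over data stored in Hadoop.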
Why Use a Graph-Oriented Database? | YarcData

Suppose you worked for a business analysis software company, and your CEO wanted you to look into the possibility of developing a product that would help investment banks detect insider trading. Further suppose that the CEO wanted you to brief her on your proposed technical approach to insider trading detection, and you’re standing in front of a whiteboard with a marker in your hand (you know that she likes hand-sketched diagrams), and you’ve decided to use a fictionalized version of this story you read in Bloomberg as an example. What would you draw on the whiteboard? I’m thinking it might look something like this: [Figure: whiteboard diagram]

Let’s look at the last three arrows in the diagram. In this story, these represent a chain of causality: this thing happened, which caused this other thing to happen, and so on. There are other situations when it would be natural to draw this sort of diagram. Let’s consider a different example. Is there anything inherently graph-oriented about this information?
Hadoop - NoSQL - NoSQL and Cloud Databases

What is Hadoop?
Hadoop isn't a simple database; it's a bunch of different technologies built on top of the Hadoop common utilities, MapReduce, and HDFS (Hadoop Distributed File System). Each of these products serves a simple purpose: HDFS handles storage, MapReduce is a parallel job distribution system, and HBase is a distributed database with support for structured tables.

How Do You Install It?
Installing Hadoop is not quite as easy as installing Cassandra.

Cloudera Flavor
If you're running Linux (which is the easiest way to do this), just follow the instructions to use Cloudera's Hadoop repositories. If you don't have a Linux distribution handy, you can download a VM from Cloudera (yeah, it's that easy). If you really want to run Cloudera's Hadoop on Windows, you will need to install Cygwin and create a Linux-like environment.

Apache Flavor

Which Login and Security Model(s) Does Hadoop Support?
Good question! N.B.

When Does It Make Sense To Use Hadoop Instead of SQL Server, Oracle, or DB2?
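The Hadoop excerpt describes MapReduce only as a parallel job distribution system. As a rough illustration of the programming model behind it, here is a single-process word count in plain Python. It does not use Hadoop's API and runs nothing in parallel; the function names `map_phase`, `shuffle`, and `reduce_phase` are hypothetical labels for the three conceptual steps.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three steps in sequence over an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


# Hypothetical input, for illustration only.
counts = word_count(["hadoop stores data", "hadoop processes data"])
print(counts)
```

In a real cluster the map and reduce steps would be distributed across many machines, each working on its own portion of the data. The sketch only shows how the work is divided into independent pieces.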
5 Graph Databases to Consider

Of the major categories of NoSQL databases (document-oriented databases, key-value stores and graph databases), we've given the least attention to graph databases on this blog. That's a shame, because, as many have pointed out, it may become the most significant category. Graph databases apply graph theory to the storage of information about the relationships between entries. The relationships between people in social networks are the most obvious example. Google has its own graph computing system called Pregel (you can find the paper on the subject here), but there are several commercial and open source graph databases available.

Neo4j
This is one of the most popular databases in the category, and one of the few open source options. Neo Technology cites several customers, though none of them are household names. Here's a fun illustration of how relationship data in graph databases works, from an InfoQ article by Neo Technology COO Peter Neubauer: [Figure: illustration of relationship data in a graph database]

FlockDB

AllegroGraph

GraphDB

InfiniteGraph