
HBase – Apache HBase Home
Welcome to Apache Pig!

Research Publication: Sawzall
Interpreting the Data: Parallel Analysis with Sawzall
Rob Pike, Sean Dorward, Robert Griesemer, Sean Quinlan

Abstract: Very large data sets often have a flat but regular structure and span multiple disks and machines. Examples include telephone call records, network logs, and web document repositories. These large data sets are not amenable to study using traditional database techniques, if only because they can be too large to fit in a single relational database. We present a system for automating such analyses.

Published in: Scientific Programming Journal, Special Issue on Grids and Worldwide Computing Programming Models and Infrastructure, 13:4, pp. 227-298.
Download: PDF Version
URL (Final): Journal link
Animation: The paper references this movie showing how the distribution of requests to google.com around the world changed through the day on August 14, 2003.

Impala

Cloudera Impala is the industry’s leading massively parallel processing (MPP) SQL query engine that runs natively in Apache Hadoop. The Apache-licensed, open source Impala project combines modern, scalable parallel database technology with the power of Hadoop, enabling users to directly query data stored in HDFS and Apache HBase without requiring data movement or transformation. Impala is designed from the ground up as part of the Hadoop ecosystem and shares the same flexible file and data formats, metadata, security and resource management frameworks used by MapReduce, Apache Hive, Apache Pig and other components of the Hadoop stack.

Now You Have a Choice

Before Impala, if your relational database was at capacity, you may have had no choice but to expand that system to maintain your expectations of performance. Now you have a choice. Impala delivers:

Performance equivalent to leading MPP databases, and 10-100x faster than Apache Hive/Stinger.

Key Features of Impala

Welcome to Apache™ Hadoop®!
Tez -
Groovy - Home
The Julia Language
Pangool - Hadoop API made easy
Apache ZooKeeper - Home
Apache Tez – Welcome to Apache Tez

Apache Hive: a data warehouse system for Hadoop that offers a SQL-like query language to facilitate easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop-compatible file systems.
by sergeykucherov, Jul 15