Welcome to Hive!

Apache Thrift

Welcome to Apache Pig!

Big Data 2011 by GigaOM - Infrastructure - Web - Eventbrite

Apache ZooKeeper - Home

Research Publication: Sawzall. Interpreting the Data: Parallel Analysis with Sawzall. Rob Pike, Sean Dorward, Robert Griesemer, Sean Quinlan. Abstract: Very large data sets often have a flat but regular structure and span multiple disks and machines. Examples include telephone call records, network logs, and web document repositories. We present a system for automating such analyses. Published in: Scientific Programming Journal, Special Issue on Grids and Worldwide Computing Programming Models and Infrastructure, 13:4, pp. 227-298. Animation: The paper references a movie showing how the distribution of requests to google.com around the world changed through the day on August 14, 2003.

[repost] How Rackspace Now Uses MapReduce and Hadoop to Query Terabytes of Data « New IT Farmer

How do you query hundreds of gigabytes of new data each day streaming in from over 600 hyperactive servers? If you think this sounds like the perfect battleground for a head-to-head skirmish in the great MapReduce Versus Database War, you would be correct. Bill Boebel, CTO of Mailtrust (Rackspace’s mail division), has generously provided a fascinating account of how they evolved their log processing system from an early amoeba’ic approach of text files stored on each machine, to a Neandertholic relational database solution that just couldn’t compete, and finally to a Homo sapien’ic Hadoop-based solution that works wisely for them and has virtually unlimited scalability potential. Rackspace faced a now-familiar problem. Facing exponential growth, they spent about 3 months building a new log processing system using Hadoop (an open-source implementation of Google File System and MapReduce), Lucene and Solr.

Apache Kafka

The Julia Language

About | Elastic Web Mining | Bixo Labs

Scale Unlimited is based in Nevada City, California and provides consulting and training services for big data analytics, search, and web mining. The company was founded in 2008 by Stefan Groschupf, Chris Wensel, and Ken Krugler, three of the world’s leading experts in scalable, reliable data analytics, workflow design and web mining. All are well-known community members and contributors to key open source projects, including Hadoop, Bixo, Cascading, Solr, Lucene, Katta and Tika. Solutions from Scale Unlimited are built using these and other widely used and well supported open source packages, providing maximum flexibility with no commercial lock-in.

Inspiration: Scale Unlimited solves three major problems that the founders experienced first-hand at previous startups and consulting projects. First, processing big data requires a workflow system that is efficient, reliable and scalable. With Scale Unlimited, solutions are built using Hadoop and Cascading-based workflows.

HBase - Apache HBase™ Home

For fast, interactive Hadoop queries, Drill may be the answer - Cloud Computing News

Hive is a data warehouse system for Hadoop that offers a SQL-like query language to facilitate easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop-compatible file systems. (by sergeykucherov, Jul 15)
