Because Hadoop isn’t perfect: 8 ways to replace HDFS

Hadoop is on its way to becoming the de facto platform for the next generation of data-based applications, but it’s not without flaws. Ironically, one of Hadoop’s biggest current shortcomings is also one of its biggest strengths going forward: the Hadoop Distributed File System. Within the Apache Software Foundation, HDFS is steadily improving in performance and availability. But if the growing number of options for replacing HDFS signifies anything, it’s that HDFS isn’t quite where it needs to be.

Cassandra (DataStax): not a file system at all but an open source, NoSQL key-value store, Cassandra has become a viable alternative to HDFS for web applications that rely on fast data access (a minimal access sketch follows this list).
Ceph: an open source, multi-pronged storage system that was recently commercialized by a startup called Inktank.
Dispersed Storage Network (Cleversafe)
Isilon (EMC)
Lustre
MapR File System
NetApp Open Solution for Hadoop
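Because Cassandra exposes data through CQL rather than a file-system interface, a tiny key-value round trip is the easiest way to see what "fast data access" looks like from application code. The sketch below is a hypothetical example assuming the older DataStax Java driver 3.x against a local single-node cluster; the contact point, keyspace, and table names are made up for illustration.

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

// Minimal key-value round trip against a hypothetical local Cassandra node.
public class CassandraKvSketch {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {
            // Keyspace and table names are invented for this illustration.
            session.execute("CREATE KEYSPACE IF NOT EXISTS demo WITH replication = "
                    + "{'class': 'SimpleStrategy', 'replication_factor': 1}");
            session.execute("CREATE TABLE IF NOT EXISTS demo.kv (k text PRIMARY KEY, v text)");

            // Write one value, then read it back by key.
            session.execute("INSERT INTO demo.kv (k, v) VALUES (?, ?)", "user42", "profile-blob");
            ResultSet rs = session.execute("SELECT v FROM demo.kv WHERE k = ?", "user42");
            Row row = rs.one();
            System.out.println(row == null ? "miss" : row.getString("v"));
        }
    }
}
```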
Btrfs

Btrfs (B-tree file system, variously pronounced "Butter F S", "Butterface",[7] "Better F S",[5] "B-tree F S",[8] or simply by spelling it out) is a GPL-licensed copy-on-write file system for Linux. Development began at Oracle Corporation in 2007. As of August 2014, the file system's on-disk format has been marked as stable.[9]

History

The core data structure of Btrfs, the copy-on-write B-tree, was originally proposed by IBM researcher Ohad Rodeh at a presentation at USENIX 2007. In 2008, the principal developer of the ext3 and ext4 file systems, Theodore Ts'o, stated that although ext4 has improved features, it is not a major advance; it uses old technology and is a stop-gap. In 2011, de-fragmentation features were announced for version 3.0 of the Linux kernel.[21] Besides Mason at Oracle, Miao Xie at Fujitsu contributed performance improvements.[22] In June 2012, Chris Mason left Oracle, but still continues to work on Btrfs.
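The copy-on-write idea behind Btrfs (never update a node in place; copy the path from the root down to the changed node and share everything else) can be illustrated with a deliberately tiny persistent binary tree. This is a conceptual sketch of the technique only, not Btrfs's actual B-tree code, and all names in it are invented.

```java
// Conceptual illustration of copy-on-write via path copying in a small
// persistent binary search tree. Btrfs applies the same idea to on-disk
// B-trees; this is NOT Btrfs code, just a sketch of the technique.
final class CowTree {
    static final class Node {
        final int key;
        final Node left, right;
        Node(int key, Node left, Node right) {
            this.key = key;
            this.left = left;
            this.right = right;
        }
    }

    // Insert returns a NEW root; nodes off the modified path are shared, not copied.
    static Node insert(Node root, int key) {
        if (root == null) return new Node(key, null, null);
        if (key < root.key) return new Node(root.key, insert(root.left, key), root.right);
        if (key > root.key) return new Node(root.key, root.left, insert(root.right, key));
        return root; // key already present: reuse the old version unchanged
    }

    public static void main(String[] args) {
        Node v1 = insert(insert(insert(null, 5), 3), 8); // "snapshot" v1
        Node v2 = insert(v1, 4);                         // v2 shares untouched subtrees with v1
        System.out.println(v1 != v2 && v1.right == v2.right); // true: right subtree is shared
    }
}
```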
DRBD: What is DRBD

Coda Documentation

Coda File System (Illustration by Gaich Muramatsu)

Coda Documentation, old and new (May 2000). This is a collection of documents, some of which are in progress.

1. The following will be your primary resource for getting up and running:
The original design and studies that were done can be found at Scientific papers from the Coda project. We have a few manuals; these have details for programmers and advanced system administration, but are less well organised than the coda-howto. Papers are being written about the implementation:
2. The easiest way to get manuals and internals documentation is to download the whole package from our documentation ftp site.
3. For more information, send us mail.
HDFS Architecture Guide

Introduction

The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It has many similarities with existing distributed file systems. However, the differences from other distributed file systems are significant.

Assumptions and Goals

Hardware Failure: Hardware failure is the norm rather than the exception.
Streaming Data Access: Applications that run on HDFS need streaming access to their data sets.
Large Data Sets: Applications that run on HDFS have large data sets.
Simple Coherency Model: HDFS applications need a write-once-read-many access model for files.
“Moving Computation is Cheaper than Moving Data”: A computation requested by an application is much more efficient if it is executed near the data it operates on.
Portability Across Heterogeneous Hardware and Software Platforms: HDFS has been designed to be easily portable from one platform to another.

NameNode and DataNodes

HDFS has a master/slave architecture.
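To make the master/slave split concrete: a client asks the NameNode for metadata and streams block data to and from DataNodes, but all of that is hidden behind the org.apache.hadoop.fs.FileSystem API. The sketch below is a minimal write-then-read round trip; the NameNode URI and file path are made-up placeholders, not values from this guide.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Minimal HDFS round trip: create a file, write to it, read it back.
// The NameNode URI and path below are hypothetical placeholders.
public class HdfsRoundTrip {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020"); // hypothetical cluster

        try (FileSystem fs = FileSystem.get(conf)) {
            Path path = new Path("/tmp/hdfs-demo.txt");

            // Write once (HDFS favours a write-once-read-many access model).
            try (FSDataOutputStream out = fs.create(path, true /* overwrite */)) {
                out.writeUTF("hello from HDFS");
            }

            // Stream the data back.
            try (FSDataInputStream in = fs.open(path)) {
                System.out.println(in.readUTF());
            }
        }
    }
}
```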
Google File System

(Figure: schematic overview of Google File System)

Google File System (GFS) is a proprietary distributed file system. It is developed by Google for its own applications. It does not appear to be publicly available, and it is built on GPL code (ext3 and Linux).

Design

GFS was designed to meet the data-storage needs of Google's applications, in particular everything related to its web-search activities. It is optimized for managing large files (up to several gigabytes) and for the operations most common in Google's applications: files are very rarely deleted or rewritten, and most accesses cover large regions and consist mainly of reads, or of appends at the end of a file (record append). GFS was therefore designed to speed up these operations.
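The access pattern described above (few deletes or rewrites, large sequential reads, and record appends at the end of files) can be mimicked with a plain append-only log. The sketch below is a generic illustration of that workload under those assumptions, not the GFS client API, and every name in it is invented.

```java
import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Generic append-only record log, mimicking the "record append" workload that
// GFS is optimized for. Illustration only; not the GFS API.
public class RecordAppendLog {
    private final DataOutputStream out;

    public RecordAppendLog(String path) throws IOException {
        // Open in append mode: existing data is never rewritten, only extended.
        this.out = new DataOutputStream(new FileOutputStream(path, /* append = */ true));
    }

    // Each record is written as a length prefix followed by its bytes,
    // so a reader can scan the file sequentially record by record.
    public void append(String record) throws IOException {
        byte[] bytes = record.getBytes(StandardCharsets.UTF_8);
        out.writeInt(bytes.length);
        out.write(bytes);
        out.flush();
    }

    public void close() throws IOException {
        out.close();
    }

    public static void main(String[] args) throws IOException {
        RecordAppendLog log = new RecordAppendLog("crawl.log"); // made-up file name
        log.append("fetched http://example.com at t=0");
        log.append("fetched http://example.org at t=1");
        log.close();
    }
}
```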
DLFP: MooseFS, a fault-tolerant distributed file system

MooseFS is a little-known distributed file system with plenty of good qualities. Among them:
The code is distributed under the GPLv3;
It uses FUSE and runs in user space;
It has an automatic trash bin whose retention period can be tuned at will;
It is very simple to deploy and administer: count on about an hour, reading the documentation included, to get a master server and four data servers up and running;
It is POSIX-compliant, so programs need no modification to access it;
Adding machines to grow the available space is child's play;
You choose the number of replicas you want for fault tolerance, per file or per directory, with a single command, all while the system is running.

Development of MooseFS began in 2005, and it was released as free software on 30 May 2008. A bit of history. It's stable! Of course, not everything is perfect.
Unison Wiki - Main - UnisonFAQ

General

What are the differences between Unison and rsync?
Rsync is a mirroring tool; Unison is a synchronizer. That is, rsync needs to be told "this replica contains the true versions of all the files; please make the other replica look exactly the same." Unison is capable of recognizing updates in both replicas and deciding which way they should be propagated. Both Unison and rsync use the so-called "rsync algorithm," by Andrew Tridgell and Paul Mackerras, for performing updates (a rolling-checksum sketch follows below).

What are the differences between Unison and CVS, Subversion, etc.?
Both CVS and Unison can be used to keep a remote replica of a directory structure up to date with a central repository. Unison's main advantage is being somewhat more automatic and easier to use, especially on large groups of files. CVS, on the other hand, is a full-blown version control system, and it has lots of other features (version history, multiple branches, etc.) that Unison (which is just a file synchronizer) doesn't have.
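The "rsync algorithm" mentioned above relies on a cheap rolling checksum computed over fixed-size blocks, so the sender can slide a window through a file one byte at a time and find blocks the receiver already has. The sketch below shows only that rolling-checksum idea (an Adler-32-style weak sum); it is a simplified illustration, not Unison's or rsync's actual implementation, and the constants and names are chosen for clarity.

```java
// Simplified Adler-32-style rolling checksum, the "weak" hash the rsync
// algorithm uses to slide a window over a file one byte at a time.
// Illustration only; not rsync's or Unison's actual code.
public class RollingChecksum {
    private static final int MOD = 1 << 16;

    private int a, b;          // the two running sums
    private final int window;  // block size in bytes

    public RollingChecksum(byte[] block) {
        this.window = block.length;
        for (int i = 0; i < block.length; i++) {
            int x = block[i] & 0xff;
            a = (a + x) % MOD;
            b = (b + (block.length - i) * x) % MOD;
        }
    }

    // Slide the window one byte: remove `outgoing`, add `incoming`.
    // This update is O(1), which is the whole point of a rolling checksum.
    public void roll(byte outgoing, byte incoming) {
        int out = outgoing & 0xff, in = incoming & 0xff;
        a = (a - out + in) % MOD;
        if (a < 0) a += MOD;
        b = (b - window * out + a) % MOD; // uses the freshly updated `a`
        if (b < 0) b += MOD;
    }

    public int value() {
        return (b << 16) | a;
    }

    public static void main(String[] args) {
        byte[] data = "the quick brown fox".getBytes();
        int k = 4; // window size
        RollingChecksum rc = new RollingChecksum(java.util.Arrays.copyOfRange(data, 0, k));
        for (int i = k; i < data.length; i++) {
            rc.roll(data[i - k], data[i]); // O(1) slide per byte
        }
        // Recomputing from scratch over the final window gives the same value.
        RollingChecksum fresh = new RollingChecksum(
                java.util.Arrays.copyOfRange(data, data.length - k, data.length));
        System.out.println(rc.value() == fresh.value()); // true
    }
}
```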
PASTIS

Pastis is a multi-writer distributed file system. Using a completely decentralized peer-to-peer (P2P) architecture, it aims to make it possible to exploit the aggregate storage capacity of hundreds of thousands of computers connected to the Internet. Data replication ensures persistence despite the highly dynamic nature of the network, while cryptographic techniques guarantee data authenticity and integrity. In Pastis, routing and data storage are handled by the Pastry routing protocol and the PAST distributed hash table (DHT). We have developed a Pastis prototype, written in Java 1.5 and using FreePastry 1.4.1, the open source implementation of Pastry/PAST.
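In a DHT such as PAST, each stored block gets a key (typically a hash of its content) and is placed on the node whose identifier is numerically closest to that key, with replicas on neighbouring nodes. The sketch below illustrates only that placement rule on a circular ID space; it is a simplified illustration, not Pastry/FreePastry code (which uses 128-bit IDs and prefix routing), and all identifiers are made up.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Simplified DHT placement: a key is stored on the node whose ID is closest
// on a circular ID space, plus the next (replicas - 1) nodes around the ring.
public class DhtPlacement {
    private final List<Long> ring = new ArrayList<>(); // sorted node IDs
    private final long idSpace;                        // size of the circular ID space

    public DhtPlacement(List<Long> nodeIds, long idSpace) {
        this.ring.addAll(nodeIds);
        Collections.sort(this.ring);
        this.idSpace = idSpace;
    }

    // Circular distance between two identifiers.
    private long distance(long a, long b) {
        long d = Math.abs(a - b);
        return Math.min(d, idSpace - d);
    }

    // Return the IDs of the nodes responsible for `key` (root node plus replicas).
    public List<Long> nodesFor(long key, int replicas) {
        int root = 0;
        for (int i = 1; i < ring.size(); i++) {
            if (distance(ring.get(i), key) < distance(ring.get(root), key)) root = i;
        }
        List<Long> result = new ArrayList<>();
        for (int i = 0; i < Math.min(replicas, ring.size()); i++) {
            result.add(ring.get((root + i) % ring.size())); // neighbours hold the replicas
        }
        return result;
    }

    public static void main(String[] args) {
        DhtPlacement dht = new DhtPlacement(List.of(10L, 42L, 77L, 120L, 200L), 256L);
        // A block whose key hashes to 70 lands on node 77, replicated on the next two nodes.
        System.out.println(dht.nodesFor(70L, 3)); // [77, 120, 200]
    }
}
```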