Getafix: How Facebook tools learn to fix bugs automatically - Facebook Code
Facebook has built a tool called Getafix that automatically finds fixes for bugs and offers them to engineers to approve. This allows engineers to work more effectively, and it promotes better overall code quality. We believe Getafix is the first tool of its kind to be deployed to production at Facebook scale, contributing to the stability and performance of apps that billions of people use. Getafix powers Sapfix, which suggests fixes for bugs that our Sapienz testing tool finds. Getafix also provides fixes for bugs found by Infer, our static analysis tool. Because Getafix learns from engineers’ past code fixes, its recommendations are intuitive for engineers to review. Getafix improves on previous auto-fix technology by using more powerful techniques for learning fix patterns from past code changes: it uses a more powerful clustering algorithm, and it analyzes the context around the particular lines of problematic code to find more appropriate fixes.
A new way of mining fix patterns
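To make the idea of a learned fix pattern concrete, here is a minimal, hypothetical illustration of the kind of change a tool like Getafix suggests for a null-dereference warning reported by Infer. The class and method names are invented for this sketch; only the shape of the fix (guard the dereference with a null check, as engineers have done in past fixes) reflects the technique described above.

```java
// Hypothetical example of a learned fix pattern of the form
// "if a value may be null, guard the dereference before using it".
import java.util.Map;

public class ProfileRenderer {

    // Before: an Infer-style static analysis would flag a possible
    // null dereference, because get() may return null.
    static String titleBefore(Map<String, String> profile) {
        return profile.get("title").trim();
    }

    // After: the learned pattern inserts a null check around the
    // dereference, mirroring how engineers fixed similar bugs.
    static String titleAfter(Map<String, String> profile) {
        String title = profile.get("title");
        if (title == null) {
            return "";
        }
        return title.trim();
    }

    public static void main(String[] args) {
        System.out.println(titleAfter(Map.of("title", "  Engineer  ")));
        System.out.println(titleAfter(Map.of())); // no "title" key: guarded path
    }
}
```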
Facebook's Realtime Analytics System
Recently, I was reading Todd Hoff's write-up on Facebook's real-time analytics system. As usual, Todd did an excellent job of summarizing this video from Alex Himel, Engineering Manager at Facebook. In this first post, I’d like to summarize the case study and consider some things that weren't mentioned in the summaries. This will lead to an architecture for building your own Real Time Analytics system for Big Data that might be easier to implement, using Facebook's experience as a starting point and guide, as well as the experience gathered through recent work with a few GigaSpaces customers. The second post provides a summary of that new approach, as well as a pattern and a demo for building your own Real Time Analytics system.
The Business Drive for real time analytics: Time is money
The Three Ages of Google - Batch, Warehouse, Instant
The world has changed. And some things that should not have been forgotten, were lost. I found these words from the Lord of the Rings echoing in my head as I listened to a fascinating presentation by Luiz André Barroso, Distinguished Engineer at Google, concerning Google's legendary past, golden present, and apocryphal future. His talk, Warehouse-Scale Computing: Entering the Teenage Decade, was given at the Federated Computing Research Conference. Luiz clearly knows his stuff and was early at Google, so he has a deep and penetrating perspective on the technology.
Scaling to 12 Million Concurrent Connections: How MigratoryData Did It
Massive scalability is the biggest challenge we undertake at MigratoryData, a provider of an enterprise publish-subscribe messaging system for building very scalable real-time web and mobile applications. We recently published a blog post demonstrating 12 million concurrent connections with MigratoryData WebSocket Server running on a single 1U server. I am going to share some lessons learned while pushing the boundaries of scalability with MigratoryData WebSocket Server.
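Handling millions of connections on one box starts from a non-blocking I/O model: one selector thread multiplexes many sockets instead of dedicating a thread per connection. The sketch below is a generic Java NIO echo server illustrating that model; it is not MigratoryData's implementation, and the port and buffer sizes are arbitrary. A real deployment at this scale also depends on OS and JVM tuning (file-descriptor limits, socket buffer sizes, garbage-collection behavior) that no short code sample can capture.

```java
// Minimal single-threaded, non-blocking echo server: the building block
// behind servers that hold very large numbers of idle-ish connections.
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

public class SelectorEchoServer {
    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(9000));
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);

        ByteBuffer buffer = ByteBuffer.allocate(4096);
        while (true) {
            selector.select();                          // wait until some sockets are ready
            Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
            while (keys.hasNext()) {
                SelectionKey key = keys.next();
                keys.remove();
                if (key.isAcceptable()) {               // new connection: register it, non-blocking
                    SocketChannel client = server.accept();
                    client.configureBlocking(false);
                    client.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {          // data available: echo it back
                    SocketChannel client = (SocketChannel) key.channel();
                    buffer.clear();
                    int n = client.read(buffer);
                    if (n < 0) {
                        client.close();
                    } else {
                        buffer.flip();
                        client.write(buffer);
                    }
                }
            }
        }
    }
}
```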
The LMAX Architecture
LMAX is a new retail financial trading platform. As a result it has to process many trades with low latency. The system is built on the JVM platform and centers on a Business Logic Processor that can handle 6 million orders per second on a single thread (a minimal sketch of that single-threaded model appears after the next excerpt).
How to create sustainable open data projects with purpose
There has been much hand-wringing of late about whether the explosion of government-run app contests over the last couple of years has generated any real value for the public. With only one of the Apps for Democracy projects still running, it’s easy to see the entire movement being written off as an overly optimistic fad. The organisation that I’m lucky enough to lead, mySociety, didn’t come from the world of app contests, but it does build the kind of open-source, open-data-grounded civic apps that such contests are supposed to produce. I believe that mySociety’s story shows that it’s possible to build meaningful, impactful civic and democratic web apps, to grow them to a scale where they’re unambiguously a good use of time and money, and then to sustain them for years at a time.
You have to be just as focused on user needs as any company (and perhaps more so)
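Returning to the LMAX excerpt above: the core idea is that all orders funnel through one thread, so the business logic itself needs no locks. The toy sketch below uses invented names and a plain JDK queue purely to illustrate that single-writer structure; LMAX's actual system feeds the processor from its Disruptor ring buffer, which is what makes the 6-million-orders-per-second figure achievable.

```java
// Toy sketch (not LMAX code) of a single-threaded Business Logic Processor:
// producers only enqueue; one consumer thread owns all mutable state.
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BusinessLogicProcessor {
    record Order(String symbol, long quantity, long priceCents) {}

    private final BlockingQueue<Order> inbound = new ArrayBlockingQueue<>(1 << 16);
    private long ordersProcessed;

    // Called from network threads; they never touch processor state directly.
    public void submit(Order order) throws InterruptedException {
        inbound.put(order);
    }

    // The single consumer thread runs this loop and owns all state,
    // so matching and bookkeeping need no synchronization.
    public void run() throws InterruptedException {
        while (true) {
            Order order = inbound.take();
            ordersProcessed++;            // matching, journaling, etc. would happen here
        }
    }

    public static void main(String[] args) throws Exception {
        BusinessLogicProcessor processor = new BusinessLogicProcessor();
        new Thread(() -> {
            try { processor.run(); } catch (InterruptedException ignored) {}
        }).start();
        processor.submit(new Order("ACME", 100, 1999));
    }
}
```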
Six Things You Need to Know Before Going Serverless
While the benefits of going serverless seem boundless, you need to keep in mind that it may not be the right fit for your project. Just consider the following limitations (a sketch of working around the first one follows the next excerpt):
A) Lambda’s maximum execution time is 15 minutes (900 seconds). Any resource-heavy invocation that requires more than 15 minutes will be automatically terminated before it has had a chance to complete its task.
B) No server management means less control!
Digital Humanities Spotlight: 7 Important Digitization Projects
by Maria Popova
From Darwin’s marginalia to Voltaire’s correspondence, or what Dalí’s controversial World’s Fair pavilion has to do with digital myopia. Despite our remarkable technological progress in the past century and the growth of digital culture in the past decade, a large portion of humanity’s richest cultural heritage remains buried in analog archives. Bridging the disconnect is a fledgling discipline known as the Digital Humanities, bringing online historical materials and using technologies like infrared scans, geolocation mapping, and optical character recognition to enrich these resources with related information or make entirely new discoveries about them.
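Because the 15-minute ceiling mentioned in the serverless excerpt is a hard limit, long jobs on Lambda are usually split into chunks, with the handler checking how much time remains before starting the next chunk. The sketch below uses `getRemainingTimeInMillis()` from the AWS Lambda Java runtime interface (`aws-lambda-java-core`); the item loop and the `requeueRemaining` helper are hypothetical application code added for illustration.

```java
// Hedged sketch: chunking a long job so it stops safely before Lambda's
// 15-minute cutoff instead of being terminated mid-task.
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import java.util.List;

public class ChunkedJobHandler implements RequestHandler<List<String>, String> {
    private static final long SAFETY_MARGIN_MS = 30_000;  // stop well before the hard timeout

    @Override
    public String handleRequest(List<String> items, Context context) {
        for (int i = 0; i < items.size(); i++) {
            if (context.getRemainingTimeInMillis() < SAFETY_MARGIN_MS) {
                requeueRemaining(items.subList(i, items.size()));  // hand the rest to a new invocation
                return "paused at item " + i;
            }
            processItem(items.get(i));
        }
        return "done";
    }

    private void processItem(String item) { /* application-specific work */ }

    private void requeueRemaining(List<String> rest) { /* e.g. send to a queue to trigger another run */ }
}
```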
Understanding When to use RabbitMQ or Apache Kafka
How do humans make decisions? In everyday life, emotion is often the circuit-breaking factor in pulling the trigger on a complex or overwhelming decision. But for experts making complex decisions that have long term consequences, it can’t be pure impulse. High performers typically use the circuit breaker of “instinct”, “gut feel” or other emotions only once their expert, unconscious mind has absorbed all the facts required to make a decision.
Amazon's Dynamo
In two weeks we’ll present a paper on the Dynamo technology at SOSP, the prestigious biennial Operating Systems conference. Dynamo is internal technology developed at Amazon to address the need for an incrementally scalable, highly available key-value storage system. The technology is designed to give its users the ability to trade off cost, consistency, durability, and performance, while maintaining high availability. Let me emphasize the internal technology part before it gets misunderstood: Dynamo is not directly exposed externally as a web service; however, Dynamo and similar Amazon technologies are used to power parts of our Amazon Web Services, such as S3. (A small sketch of the quorum-style trade-off this enables appears after the next excerpt.)
Visualization deconstructed: Why animated geospatial data works
In this, my first Visualization Deconstructed post, I’m expanding the scope to examine one of the most popular contemporary visualization techniques: animation of geospatial data over time.
The beauty of photo versus the wonder of film
In a previous post, Sebastien Pierre provided some excellent analysis about the illuminating visualization produced by Paul Butler, which examined the relationships between Facebook users around the world. Here, we saw the intricate beauty that comes from a designer who finds the sweet spot of insightful effectiveness and aesthetic elegance.
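The knob behind "trade off consistency, durability and performance" in Dynamo-style stores is usually expressed through the replica count N and the read and write quorum sizes R and W: when R + W > N, every read quorum overlaps the latest successful write quorum, while smaller quorums trade consistency for latency and availability. The sketch below only works through that arithmetic; the configurations are illustrative examples, not the settings Amazon uses for any particular service, and it ignores Dynamo's sloppy quorums and hinted handoff.

```java
// Minimal sketch of the N/R/W quorum arithmetic that Dynamo-style
// key-value stores expose to let operators tune consistency vs. latency.
public class QuorumTradeoff {
    record Config(int n, int r, int w) {
        // R + W > N guarantees that a read quorum overlaps the most
        // recent write quorum, so reads see the latest acknowledged write.
        boolean readsSeeLatestWrite() { return r + w > n; }
    }

    public static void main(String[] args) {
        Config balanced   = new Config(3, 2, 2);  // overlapping quorums
        Config fastReads  = new Config(3, 1, 3);  // cheap reads, expensive writes
        Config fastWrites = new Config(3, 1, 1);  // lowest latency, may read stale data

        for (Config c : new Config[] {balanced, fastReads, fastWrites}) {
            System.out.printf("N=%d R=%d W=%d -> overlapping quorums: %b%n",
                    c.n(), c.r(), c.w(), c.readsSeeLatestWrite());
        }
    }
}
```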