The essence of any nonfiction writing.
The basic problem in writing mathematics is the same as in writing biology, writing a novel, or writing directions for assembling a harpsichord: the problem is to communicate an idea. To do so, and to do it clearly, you must - have something to say, - have someone to say it to, - organize what you want to say, - arrange it in the order you it said in, - write it, rewrite it, and re-rewrite it several times, - be willing to think hard about and work hard on mechanical details such as diction, notation, and punctuation. That’s all there is to it.
(Source: stat.rice.edu)
older GCCs produced faster code than what newer versions produced, under the same options. Usually with very huge differences, over 20% between the two GCCs for instance, while elsewhere people celebrate 2% improvements.
One must be vigilant in our world of leaky abstractions.
(Source: lists.w3.org)
Bryan A. Garner recently posted “A Bizspeak Blacklist”:
It’s mission-critical to be plain-spoken, whether you’re trying to be best-of-breed at outside-the-box thinking or simply incentivizing colleagues to achieve a paradigm shift in core-performance value-adds. Leading-edge leveraging of your plain-English skill set will ensure that your actionable items synergize future-proof assets with your global-knowledge repository.
The article contains a pretty nice database of “biz-speak” words. I often find myself on the receiving end of these but, to my dismay, many of the words in Bryan’s article have crept into my own use. Time to break the habit.
GNU diction, an implementation of a Unix classic, allows users to specify their own bad usage databases. I created one based on the above article. Nobiz is a simple shell script that highlights bad business terms in your prose:
% lib/nobiz < /tmp/xxx
(stdin):1: It's [mission-critical] to be plain-spoken,
whether you're trying to be best-of-breed at
outside-the-box thinking or simply incentivizing
colleagues to achieve a [paradigm shift] in
core-performance value-adds.
(stdin):4: Leading-edge leveraging of your plain-English
skill set will ensure that your [actionable]
items [synergize] future-proof assets with your
global-knowledge repository.
4 phrases in 2 sentences found.
GNU diction is required to run it (brew install diction if you are on OSX)
Concurrent programming is an increasingly important topic in the context of building modern distributed systems. Most such systems have a large amount of inherent concurrency. For example: To accommodate for large corpus sizes, a search engine splits its index into many small pieces. In order to efficiently satisfy queries, it must issue requests to each of these shards concurrently.
Threads (or processes) are a common abstraction for programming concurrent systems: there are multiple threads of execution, each of which has its own stack. The system manages how these threads are mapped onto physical hardware, and how they are preempted and interleaved. This allows the operating system to suspend execution of a thread while it is waiting for I/O operations, allowing other threads to proceed. Traditionally threads have had both high fixed costs (stacks must be allocated in increments of page sizes) as well high context switching costs. The emergence of event based programming addressed these issues: there is only one thread, with a run-loop, and I/O events are explicitly dispatched (eg. data is ready to be read, a write completed, etc.). While this reduces the cost of threads (you have only one stack, you don’t need the OS to interleave your executions), the programming model is awkward: the programmer has to split the program up into pieces delimited by I/O operations. In C, in particular, you have to declare separate functions for each of these pieces. The introduction of the libevent book has more details. This is also the model of node.js though, because javascript has first-class closures, the problem is ameliorated somewhat: by nesting callbacks, you can easily share context. This improves brevity but not modularity: each sequence of actions is fixed in place, errors must be handled very carefully. Nontrivial applications become difficult to reason about very quickly.
The assumptions upon which event-based programming is based have largely withered: context switches aren’t very expensive anymore, and memory has become plentiful. However, the degree to which our systems need to be concurrent has also increased. It is not uncommon to handle 100s of thousands, if not millions, of operations simultaneously. The archetypical example of this sort of application are so-called “long-polling” web servers.
Languages like Haskell and Go implement lightweight threads in their runtimes, allowing for cheaply maintaining millions of threads of execution, with the runtime itself multiplexing I/O operations. Go manages this by using segmented stacks: it allocates stack space as needed. This requires both the complicity of the compiler and a different ABI, but that’s the beauty of being able to start over.
So, clearly, “events vs. threads” is a false dichotomy; the statement doesn’t even make much sense, and conflates two separate concerns: They denote two concrete programming models, differing in (1) how context is encoded (heap vs. stack), and (2) where multiplexing is done (in a library, or runtime/OS).
In Finagle and elsewhere, we use composable futures as the basis of our concurrent programming model (these are quite different than the venerable java.util.Future and its brethren). Futures, it is often argued, make up for the deficiencies of traditional “event programming”: callbacks are tedious, compromise modularity, and make for spaghetti-like code that is difficult to understand. Futures correct this by introducing constructs that make callbacks manageable.
This misses the point.
Futures offer a concurrent programming model that translates naturally to the types of concerns that arise when building distributed systems. Take for instance remote procedure calls; RPCs are inherently asynchronous. This means that each component operates at their own speed, and the components fail independently of each other. There is no shared clock, and no shared bus. RPCs usually have dramatically different latency characteristics than a typical local operation. In fact, this holds for any sort of I/O, not just RPCs.
Futures model the real world truthfully: A Future[T] represents a T-typed result, which may or may not be delivered at some time in the future, or which may fail altogether. Like real-world asynchronous operations, a future is always in one of 3 states: incomplete, completed successfully, or completed with a failure.
Futures provide a firewall between synchronous and asynchronous operations: Because futures are typed differently than regular values (a value furnished by a future has type Future[T], not T), you can easily, at a glance, discern which operations imply asynchronous semantics. This allows you to reason more easily about the runtime semantics of your code: not only are such operations slower, but they have very different failure semantics. Furthermore, futures are “tainting”: if you want to incorporate the result of a future, you have to “lift” other values in to them. That is, if you are computing anything that requires an asynchronous, the operation itself becomes asynchronous - its type must be Future[T]. The upshot is that semantics are witnessed by types; we can use the type system to model and enforce behavior. Put another way: we can use the compiler to guarantee certain behavior.
Futures compose well: Futures, like physically asynchronous operations, are closed under composition. This is important: the composition of two asynchronous operations can in turn be described as an asynchronous operation. Futures work the same way: I might compose two Futures sequentially (eg. because there is a data dependency) or concurrently (because there is not), both of which produce a new future. When composing Futures, failure handling is baked in: When composed sequentially, the sequence of operations short circuit at the first failure; when composed in a concurrent configuration, any underlying failure also fails the composed operation.
Futures are persistent: A Future[T] always means the same thing, refers to the same underlying operation, no matter how it is later used. To string operations together (concurrently or sequentially), new futures are produced, with new semantics. This makes reasoning about Futures very simple; you needn’t worry about how they might be used or modified elsewhere.
Futures liberate semantics from mechanics: By stringing futures together via the various combinators, the programmer is expressing the semantics of an operation, not its execution mechanics. In this example, adapted from the Finagle User’s Guide, we fetch all images in a web page:
val allImages: Future[Seq[Image]] =
fetchUrl(url) flatMap { bytes =>
val fetches = findImageUrls(bytes) map { url =>
fetchUrl(url)
}
Future.collect(fetches)
}
This can be read thus:
allImagesis the operation that first fetchesurl, then for each image URL in the body of the page that was fetched, the result of fetching all of those URLs as images in turn.
The code is fairly close to the English description of what we want to do. We have described the operation, but not how to perform it. In the description, there is inherent concurrency: the subsequent image URLs may all be fetched at the same time; this translates naturally into the concurrent combinators used with Futures, in this case Future.collect. Failure handling is omitted from the description, and is also omitted from the code. In this case, it is clear that we cannot proceed if any operation fails, and those are also the semantics of the composite operation (allImages). The data dependencies, inherent to the problem being solved, are all-revealing.
This separation enhances the ease of reasoning: the meaning of the operation isn’t intertwined instructions for how to go about executing it. No threads are spun up, no errors explicitly handled, no explicit coordination mechanism is required to gather the result of the image fetches. Instead, there are only data dependencies — to fetch all of the images, we need to fetch their URLs; to get the URLs we need to fetch the web page.
This property also allows us to separately consider the runtime behavior of such operations. The Future implementation might choose to limit the number of concurrent operations, for example, or to exploit inherent locality (like biasing connection pools so that a given connection is always handled by one underlying thread). A library might thread through tracing data to diagnose operations. This is not hypothetical: Finagle does all of this, and it’s done entirely as a library without any special runtime support.
Futures present a concurrent programming model that is appealing on its own. They should not be seen as a poor-man’s version of threads.
Ultimately such abstractions exist in the service of writing robust, safe, performant, and modular software. Futures have served us very well in this regard at Twitter, and I think they’ll have a long shelf-life as a go-to abstraction for constructing concurrent systems. We’re in the Cambrian period of concurrency, and I predict futures will emerge as one of the survivors.
This isn’t to say that other models aren’t also good. I’m personally a huge fan of Go/Haskell’s cheap threads model: because they are built into the language and runtime, their usage is enforced. This is perhaps the main sticking point of futures. Because they are implemented as 3rd-party libraries, their usage can be spotty, and you may find yourself always writing little bridges or adapters. At Twitter, we’re lucky in that we have a common system upon which our systems are built, so this is less of a problem. But this isn’t inherent to the model: C# and F# has it into the language and runtime. Scala’s SIP-14 also promises to unify that ecosystem.
As with all large-scale deployments, we have encountered failures of every type that we expected, and some we didn’t.
- It is vital to keep checksums of all crucial files (for example machine-type manifests) since they will become corrupted. Checksums also allow the detection of hand-edited configurations that were temporarily changed, for example as part of a debugging investigation. Without such automatic detection it is very hard to prevent gradual configuration drift in a large system.
- TCP/IP checksums are weak, and messages will be silently corrupted unless they are protected by additional application- level checksums.
- Networking hardware will malfunction and start flipping large numbers of bits; this both causes a storm of retries, and makes it inevitable that some errors will remain undetected by TCP.
- Computers will spontaneously start running very slowly, but keep making progress, so systems need to tolerate and detect this as well as fail-stop errors.
- Throttling and load shedding are crucial in all aspects of an automated system. Failure detectors must be able to distinguish between the symptoms of failure and overloading, otherwise overloaded computers may be marked as failed and removed from service, amplifying the problem and triggering a cascade of failures that disables the entire application
For various reasons, including performance and cost, Twitter has poured significant engineering effort into breaking down the site backend into smaller JVM based services. As a nice side effect we’ve been able to open source several of the libraries and other useful tools that came out of this…
Heather Miller recently added a new container type called Try to the Scala standard library. This is based on com.twitter.util.Try which we’ve made much productive use of at Twitter. After it was introduced, there was a minor uproar in parts of the Scala community, and discussion turned into trolling — not productive. I wanted to write about Try, its motivations, and why I (still) believe it is both useful and well-designed.
The basic idea of Try is to represent a computation that may fail, allowing the programmer to divorce exception handling from the stack. It provides a number of useful combinators (map, flatMap, rescue, handle, etc.) that enhance composition. For example, I can write a function that knows how to recover a certain kind of error, and apply it to any Try value. Just as with exceptions, if a Try fails but isn’t recovered (eg. with rescue), the computation is terminated. This allows me to write code that chains computations together, and since we use the standard naming scheme for these combinators (map, flatMap, filter) we can use Scala’s comprehension syntax:
for { res1 <- op1() res2 <- op2(res1) } yield res1 + res2
Moreover, since Try encodes fallible computations, we can use these combinators independently of where the values are computed. This allows us to compose such computations independently of the stack (traditional exception handlers need to be installed, roughly, on a stack frame).
It is natural, then, to implement futures on top of this as they have similar semantics of composition. The main difference is that a future’s value is deferred. A future’s state, thus, is either Pending, or Complete[Try[A]]. This was our first use case for Try, representing a Future’s internal state, but it has found much productive use elsewhere. For example, we have thread pools that return Try values. They’re also used to compose IO operations, and to implement simple parsers. We use it to make decisions about fallible computations (eg. is this retriable?). It has become a useful and versatile widget.
Since Try encodes computation failures, we decided early on that their combinators should themselves catch exceptions, converting them to encoded exceptions. That is, no operation on a Try value should itself throw an exception — in this sense, Try is safe. If you say
val result = Try { .. }
you are guaranteed that no exceptions will be thrown; any exception is encoded into the try value itself, no matter where it came from. This is a very nice guarantee to have indeed, and the programmer is free to make stronger assumptions about her code. And I’d furthermore argue that this is the intuitive behavior of such a construct: It mirrors, lexically, the behavior of try { ... } catch ....
The reason for the negative reactions, so far as I can tell, is that catching exceptions within the combinators violates some laws for Functors for which the name map is well-defined. See this issue, filed by Dan Rosen, for more information (It’s quite informative, and highlights the issue nicely). This violation was also pointed out by Tony Morris in simpler, more general terms:
forall x f. (x map f).isSuccess == x.isSuccess // false
These are good points, and I would like to have a nice discussion about it, without resorting to trolling. I think this represents a prime example of making certain tradeoffs — rigor for humanness — that is often left by the wayside the closer one moves towards “pure” functional programming.
I’m going to try to explain our motivations for this design, and why I believe it’s O.K.; better, even, than the more “algebraically correct” version.
As I alluded to above, the biggest reason for catching exceptions in combinators is that it provides a very strong guarantee: Try or its operations will never themselves throw exceptions; they are always encoded. This matches expectations by the programmer (the semantics of Try matches those of try), and provides strong guarantees. It provides a “safe box”, syntactically concise, for fallible computations.
If this weren’t the case, we’d effectively be introducing two categories of exceptions: Raw (on the stack) and encoded (by Try). That would certainly be more complex, and would almost certainly break the expectation of most programmers. It would result in more fragile code. Less rigor, but more humanity. And humanness is an extremely important design criteria for APIs. Ultimately, programmers are also humans. They rely on all sorts of assumptions, they are fallible. It is our job to make their jobs simple, it’s our job to ensure that code written with our APIs is likely to be correct, likely to be more robust.
Despite this alleged lack of rigor, I don’t believe we’ve experienced a single error in production due to Try, but it probably has prevented us from showing errors millions of times.
Maybe we shouldn’t have named it Try.map.
Thanks to John Owens for feedback.
Edits: I’ve removed some of the original language around trolling. I believe it was too strongly worded and did not come out right. I want to have a real discussion about this. Thanks so much to Daniel Spiewak for his level headed and illuminating comments below.
loading…