Planet Haskell without Tom Moertel's posts
Sat, 23 Aug 2014 05:33:14 +0000

Informatics Independence Referendum Debate
http://wadler.blogspot.com/2014/08/informatics-independence-referendum.html
School of Informatics, University of Edinburgh
Independence Referendum Debate

4.00-5.30pm, Monday 25 August
Appleton Tower, Lecture Room 2

For the NAYs: Prof. Alan Bundy
For the AYEs: Prof. Philip Wadler
Moderator: Prof. Mike Fourman

All members of the School of Informatics and the wider university community welcome.

(This is a debate among colleagues and not a formal University event. All views expressed are those of the individuals who express them, and not of the University of Edinburgh.)

Philip Wadler
Fri, 22 Aug 2014 10:42:00 +0000

Research funding in an independent Scotland
http://wadler.blogspot.com/2014/08/research-funding-in-independent-scotland.html
A useful summary, written by a colleague. In summary:

- The difference between the Scottish tax contribution and RCUK spending in Scotland is small compared to savings that will be made in other areas such as defence.
- Funding levels per institution are actually similar in Scotland to those in the rest of the UK; it's just that there are more institutions here.
- Because of its relative importance, any independent Scottish government would prioritise research.
- If rUK rejects a common research area, it would lose the benefits of its previous investments and of the Scottish research capacity, which is supported by the Scottish government and the excellence of our universities.
- There are significant disadvantages with a No vote, resulting from UK immigration policy and the possibility of exiting the EU.

Philip Wadler
Fri, 22 Aug 2014 10:32:00 +0000

Scotland can't save England
http://wadler.blogspot.com/2014/08/scotland-cant-save-england.html
Salmond concluded his debate with Darling by observing that for half his lifetime Scotland had been ruled by governments that Scotland had not elected. Many take this the other way, and fret that if Scotland leaves the UK, then Labour would never win an election. Wings Over Scotland reviews the figures. While Scotland has an effect on the size of the majority, elections would yield the same ruling party with or without Scotland in 65 of the last 67 years. To a first approximation, Scotland's impact on the rest of the UK is nil, while the rest of the UK overwhelms Scotland's choice half the time.

1945 Labour govt (Attlee): Labour majority 146; without Scottish MPs 143. NO CHANGE.
1950 Labour govt (Attlee): Labour majority 5; without Scottish MPs 2. NO CHANGE.
1951 Conservative govt (Churchill/Eden): Conservative majority 17; without Scottish MPs 16. NO CHANGE.
1955 Conservative govt (Eden/Macmillan): Conservative majority 60; without Scottish MPs 61. NO CHANGE.
1959 Conservative govt (Macmillan/Douglas-Home): Conservative majority 100; without Scottish MPs 109. NO CHANGE.
1964 Labour govt (Wilson): Labour majority 4; without Scottish MPs -11. CHANGE: Labour majority to Conservative majority of 1 (Con 280, Lab 274, Lib 5).
1966 Labour govt (Wilson): Labour majority 98; without Scottish MPs 77. NO CHANGE.
1970 Conservative govt (Heath): Conservative majority 30; without Scottish MPs 55. NO CHANGE.
1974 Minority Labour govt (Wilson): Labour majority -33; without Scottish MPs -42. POSSIBLE CHANGE: Labour minority to Conservative minority (without Scots: Con 276, Lab 261, Lib 11, Others 16).
1974b Labour govt (Wilson/Callaghan): Labour majority 3; without Scottish MPs -8. CHANGE: Labour majority to Labour minority (Lab 278, Con 261, Lib 10, Others 15).
1979 Conservative govt (Thatcher): Conservative majority 43; without Scottish MPs 70. NO CHANGE.
1983 Conservative govt (Thatcher): Conservative majority 144; without Scottish MPs 174. NO CHANGE.
1987 Conservative govt (Thatcher/Major): Conservative majority 102; without Scottish MPs 154. NO CHANGE.
1992 Conservative govt (Major): Conservative majority 21; without Scottish MPs 71. NO CHANGE.
1997 Labour govt (Blair): Labour majority 179; without Scottish MPs 139. NO CHANGE.
2001 Labour govt (Blair): Labour majority 167; without Scottish MPs 129. NO CHANGE.
2005 Labour govt (Blair/Brown): Labour majority 66; without Scottish MPs 43. NO CHANGE.
2010 Coalition govt (Cameron): Conservative majority -38; without Scottish MPs 19. CHANGE: Con-Lib coalition to Conservative majority.

Philip Wadler
Thu, 21 Aug 2014 22:05:00 +0000

How Scotland will be robbed
http://wadler.blogspot.com/2014/08/how-scotland-will-be-robbed.html
Thanks to the Barnett Formula, the UK government provides more funding per head in Scotland than in the rest of the UK. Better Together touts this as an extra £1400 in each person's pocket that will be lost if Scotland votes 'Aye' (famously illustrated with Lego). Put to one side the argument as to whether the extra £1400 is a fair reflection of the extra Scotland contributes to the UK economy, through oil and other means. The Barnett Formula is up for renegotiation. Will it be maintained if Scotland votes 'Nay'?

Wings Over Scotland lays out the argument that if Scotland opts to stick with Westminster, then Westminster will stick it to Scotland:

  The Barnett Formula is the system used to decide the size of the "block grant" sent every year from London to the Scottish Government to run devolved services. ...

  Until now, however, it's been politically impossible to abolish the Formula, as such a manifestly unfair move would lead to an upsurge in support for independence. In the wake of a No vote in the referendum, that obstacle would be removed; Scots will have nothing left with which to threaten Westminster.

  It would still be an unwise move for the UK governing party to be seen to simply and obviously "punish" Scotland after a No vote. But the pledge of all three Unionist parties to give Holyrood "more powers" provides the smokescreen under which the abolition of Barnett can be executed and the English electorate placated.

  The block grant is a distribution of tax revenue. The "increased devolution" plans of the UK parties will instead make the Scottish Government responsible for collecting its own income taxes. The Office of Budget Responsibility has explained in detail how "the block grant from the UK government to Scotland will then be reduced to reflect the fiscal impact of the devolution of these tax-raising powers" (page 4).

  But if Holyrood sets Scottish income tax at the same level as the UK, that'll mean the per-person receipts are also the same, which means that there won't be the money to pay for the "extra" £1400 of spending currently returned as part-compensation for Scottish oil revenues, because the oil revenues will be staying at Westminster. ...

  We've explained the political motivations behind the move at length before. The above is simply the mechanical explanation of how it will happen if Scotland votes No. The "if" is not in question; all the UK parties are united behind the plan.

  A gigantic act of theft will be disguised as a gift. The victories of devolution will be lost, because there'll no longer be the money to pay for them. Tuition fees and prescription charges will return. Labour's "One Nation" will manifest itself, with the ideologically troublesome differences between Scotland and the rest of the UK eliminated.

  And what's more, it'll all have been done fairly and above-board, because the Unionist parties have all laid out their intentions in black and white. They'll be able to say, with justification, "Look, you can't complain, this is exactly what we TOLD you we'd do".

This analysis looks persuasive to me, and I've not seen it put so clearly elsewhere. Please comment below if you know sources for similar arguments.

Philip Wadler
Thu, 21 Aug 2014 21:36:00 +0000

Transgressing the limits
http://theorylunch.wordpress.com/2014/01/14/transgressing-the-limits/
Today, the 14th of January 2014, we had a special session of our Theory Lunch. I spoke about ultrafilters and how they allow us to generalize the notion of limit.
Consider the space $\ell^\infty$ of bounded sequences of real numbers, together with the supremum norm. We would like to define a notion of limit $\operatorname{Lim} x$ which is defined for every $x \in \ell^\infty$ and satisfies the well known properties of the standard limit:

- Linearity: $\operatorname{Lim}(x + y) = \operatorname{Lim} x + \operatorname{Lim} y$.
- Homogeneity: $\operatorname{Lim}(\lambda x) = \lambda \operatorname{Lim} x$.
- Monotonicity: if $x_n \leq y_n$ for every $n$, then $\operatorname{Lim} x \leq \operatorname{Lim} y$.
- Nontriviality: if $x_n = c$ for every $n$, then $\operatorname{Lim} x = c$.
- Consistency: if the limit exists in the classical sense, then the two notions coincide.

The consistency condition is reasonable also because it avoids trivial cases: if we fix $k$ and we define the limit of the sequence $x$ as the value $x_k$, then the first four properties are satisfied.
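The trivial example can be checked explicitly. Writing $\operatorname{Lim}$ for the candidate limit functional and taking $\operatorname{Lim} x = x_k$ for a fixed index $k$ (a verification added here for completeness):

```latex
% For the functional Lim x = x_k with k fixed:
\begin{align*}
  \operatorname{Lim}(x + y)     &= x_k + y_k = \operatorname{Lim} x + \operatorname{Lim} y, \\
  \operatorname{Lim}(\lambda x) &= \lambda x_k = \lambda \operatorname{Lim} x,
\end{align*}
% and monotonicity and nontriviality are just as immediate. Consistency
% fails, however: x_n = 1/n converges to 0 in the classical sense,
% while \operatorname{Lim} x = 1/k > 0.
```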
Let us recall the classical definition of limit: we say that $x$ converges to $L$ if and only if, for every $\varepsilon > 0$, the set of values $n$ such that $|x_n - L| < \varepsilon$ is cofinite, i.e., has a finite complement: the inequality $|x_n - L| \geq \varepsilon$ can be satisfied at most for finitely many values of $n$. The family of cofinite subsets of $\mathbb{N}$ (in fact, of any set $X$) has the following properties:

- Upper closure: if $A \in \mathcal{F}$ and $A \subseteq B \subseteq X$, then $B \in \mathcal{F}$.
- Meet stability: if $A, B \in \mathcal{F}$, then $A \cap B \in \mathcal{F}$.

A family $\mathcal{F}$ of subsets of $X$ with the two properties above is called a filter on $X$. An immediate example is the trivial filter $\{X\}$; another example is the improper filter $\mathcal{P}(X)$, the family of all subsets of $X$. The family of cofinite subsets of $X$ is called the Fréchet filter on $X$. The Fréchet filter is not the improper one if and only if $X$ is infinite.
An ultrafilter on $X$ is a filter $\mathcal{U}$ on $X$ satisfying the following additional conditions:

- Properness: $\emptyset \notin \mathcal{U}$.
- Maximality: for every $A \subseteq X$, either $A \in \mathcal{U}$ or $X \setminus A \in \mathcal{U}$.
For example, if $x \in X$, then $\mathcal{U}_x = \{ A \subseteq X \mid x \in A \}$ is an ultrafilter on $X$, called the principal ultrafilter generated by $x$. Observe that $\bigcap_{A \in \mathcal{U}_x} A = \{x\} \neq \emptyset$: if instead $\bigcap_{A \in \mathcal{U}} A = \emptyset$, we say that $\mathcal{U}$ is free. These are, in fact, the only two options.
Lemma 1. For a proper filter $\mathcal{U}$ on $X$ to be an ultrafilter, it is necessary and sufficient that it satisfies the following condition: for every $n \geq 1$ and nonempty $A_1, \ldots, A_n \subseteq X$, if $A_1 \cup \cdots \cup A_n \in \mathcal{U}$, then $A_i \in \mathcal{U}$ for at least one $i$.

Proof: It is sufficient to prove the thesis with $n = 2$. If $A \cup B \in \mathcal{U}$ with $A, B \notin \mathcal{U}$, then $\{ C \subseteq X \mid A \cup C \in \mathcal{U} \}$ is a proper filter that properly contains $\mathcal{U}$, which is impossible for an ultrafilter. Conversely, if the condition is satisfied, then for every $A$ which is neither $\emptyset$ nor $X$ we have $A \cup (X \setminus A) = X \in \mathcal{U}$, thus either $A \in \mathcal{U}$ or $X \setminus A \in \mathcal{U}$.
Theorem 1. Every nonprincipal ultrafilter is free. In addition, an ultrafilter is free if and only if it extends the Fréchet filter. In particular, every ultrafilter over a finite set is principal.
Proof: Let $\mathcal{U}$ be a nonprincipal ultrafilter and let $x \in X$: then $\mathcal{U} \neq \mathcal{U}_x$, so either there exists $A$ such that $A \in \mathcal{U}$ and $x \notin A$, or there exists $A$ such that $x \in A$ and $A \notin \mathcal{U}$. In the first case, $x \notin \bigcap_{B \in \mathcal{U}} B$; in the second case, we consider $X \setminus A$, which belongs to $\mathcal{U}$ by maximality and does not contain $x$, and reduce to the first case. As $x$ is arbitrary, $\bigcap_{B \in \mathcal{U}} B = \emptyset$, that is, $\mathcal{U}$ is free.

Now, for every $x \in X$ the set $X \setminus \{x\}$ belongs to the Fréchet filter but not to $\mathcal{U}_x$: therefore, no principal ultrafilter extends the Fréchet filter. On the other hand, if $\mathcal{U}$ is an ultrafilter, $A$ is finite, and $X \setminus A \notin \mathcal{U}$, then $A \in \mathcal{U}$ by maximality, hence $\{x\} \in \mathcal{U}$ for some $x \in A$ because of Lemma 1, thus $\mathcal{U} = \mathcal{U}_x$ cannot be a free ultrafilter.
So it seems that free ultrafilters are the right thing to consider when trying to expand the concept of limit. There is an issue, though: we have not seen a single example of a free ultrafilter; in fact, we do not even (yet) know whether free ultrafilters exist at all! The answer to this problem comes, in a shamelessly nonconstructive way, from the following
Ultrafilter lemma. Every proper filter can be extended to an ultrafilter.
The ultrafilter lemma, together with Theorem 1, implies the existence of free ultrafilters on every infinite set, and in particular on $\mathbb{N}$. On the other hand, to prove the ultrafilter lemma the Axiom of Choice is required, in the form of Zorn's lemma. Before giving such a proof, we recall that a family of sets has the finite intersection property if every finite subfamily has a nonempty intersection: every proper filter has the finite intersection property.
Proof of the ultrafilter lemma. Let $\mathcal{F}$ be a proper filter on $X$ and let $\mathbf{P}$ be the family of the collections of subsets of $X$ that extend $\mathcal{F}$ and have the finite intersection property, ordered by inclusion. Let $\mathbf{C}$ be a totally ordered subfamily of $\mathbf{P}$: then $\mathcal{G} = \bigcup \mathbf{C}$ extends $\mathcal{F}$ and has the finite intersection property, because for every finitely many $A_1, \ldots, A_n \in \mathcal{G}$ there exists by construction $\mathcal{H} \in \mathbf{C}$ such that $A_1, \ldots, A_n \in \mathcal{H}$.

By Zorn's lemma, $\mathbf{P}$ has a maximal element $\mathcal{U}$, which surely satisfies $\mathcal{F} \subseteq \mathcal{U}$ and $\emptyset \notin \mathcal{U}$. If $A \in \mathcal{U}$ and $A \subseteq B \subseteq X$, then $\mathcal{U} \cup \{B\}$ still has the finite intersection property, therefore $B \in \mathcal{U}$ by maximality. If $A, B \in \mathcal{U}$, then $\mathcal{U} \cup \{A \cap B\}$ still has the finite intersection property, therefore $A \cap B \in \mathcal{U}$, again by maximality.

Suppose, for the sake of contradiction, that there exists $A \subseteq X$ such that $A \notin \mathcal{U}$ and $X \setminus A \notin \mathcal{U}$: then neither $\mathcal{U} \cup \{A\}$ nor $\mathcal{U} \cup \{X \setminus A\}$ has the finite intersection property, hence there exist $U_1, \ldots, U_m, V_1, \ldots, V_n \in \mathcal{U}$ such that $A \cap U_1 \cap \cdots \cap U_m = \emptyset$ and $(X \setminus A) \cap V_1 \cap \cdots \cap V_n = \emptyset$. But the former means $U_1 \cap \cdots \cap U_m \subseteq X \setminus A$, and the latter means $V_1 \cap \cdots \cap V_n \subseteq A$: therefore $U_1 \cap \cdots \cap U_m \cap V_1 \cap \cdots \cap V_n \subseteq (X \setminus A) \cap A = \emptyset$, against $\mathcal{U}$ having the finite intersection property.
We are now ready to expand the idea of limit. Let $(M, d)$ be a metric space and let $\mathcal{U}$ be an ultrafilter on $\mathbb{N}$: we say that $L \in M$ is the ultralimit of the sequence $x = (x_n)_{n \in \mathbb{N}}$ along $\mathcal{U}$ if for every $\varepsilon > 0$ the set $\{\, n \in \mathbb{N} \mid d(x_n, L) < \varepsilon \,\}$ belongs to $\mathcal{U}$. (Observe how, in the standard definition of limit, the above set is required to belong to the Fréchet filter.) If this is the case, we write $L = \lim_{n \to \mathcal{U}} x_n$.
Ultralimits, if they exist, are unique and satisfy our first four conditions. Moreover, the choice of a principal ultrafilter $\mathcal{U}_k$ corresponds to the trivial definition $\lim_{n \to \mathcal{U}_k} x_n = x_k$. So, what about free ultrafilters?
Theorem 2. Every bounded sequence of real numbers has an ultralimit along every free ultrafilter on $\mathbb{N}$.
Proof: It is not restrictive to suppose $x_n \in [0, 1]$ for every $n$. Let $\mathcal{U}$ be an arbitrary, but fixed, free ultrafilter on $\mathbb{N}$. We will construct a sequence of closed intervals $I_k$, $k \geq 0$, such that $I_{k+1} \subseteq I_k$, the length of $I_k$ is $2^{-k}$, and $\{\, n \mid x_n \in I_k \,\} \in \mathcal{U}$ for every $k$. By the Cantor intersection theorem it will be $\bigcap_{k \geq 0} I_k = \{L\}$ for some $L$: we will then show that $L = \lim_{n \to \mathcal{U}} x_n$.

Let $I_0 = [0, 1]$. Let $I_1$ be either $[0, 1/2]$ or $[1/2, 1]$, chosen according to the following criterion: $\{\, n \mid x_n \in I_1 \,\} \in \mathcal{U}$; such a choice is possible by Lemma 1, because $\{\, n \mid x_n \in I_0 \,\} = \mathbb{N} \in \mathcal{U}$. If both halves satisfy the criterion, then we just choose one once and for all. We iterate the procedure by always choosing as $I_{k+1}$ one of the two halves of $I_k$ such that $\{\, n \mid x_n \in I_{k+1} \,\} \in \mathcal{U}$.

Let $\varepsilon > 0$, let $L$ be the single point of $\bigcap_{k \geq 0} I_k$, and let $k$ be so large that $2^{-k} < \varepsilon$: then $I_k \subseteq (L - \varepsilon, L + \varepsilon)$, thus $\{\, n \mid x_n \in I_k \,\} \subseteq \{\, n \mid |x_n - L| < \varepsilon \,\}$. As the smaller set belongs to $\mathcal{U}$, so does the larger one.
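A standard concrete example (added here, not from the talk) of how the ultralimit depends on the chosen free ultrafilter $\mathcal{U}$:

```latex
% The sequence x_n = (-1)^n has no classical limit. Let
% E = { n : n even }. By maximality, either E or its complement
% (the odd numbers) belongs to U. If E belongs to U, then for
% every eps > 0:
\[
  \{\, n : |x_n - 1| < \varepsilon \,\} \supseteq E \in \mathcal{U}
  \quad\Longrightarrow\quad \lim_{n \to \mathcal{U}} x_n = 1,
\]
% and symmetrically, if the odd numbers belong to U, then the
% ultralimit along U is -1. Different free ultrafilters can thus
% assign different limits to the same bounded sequence.
```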
We have thus almost achieved our original target: a notion of limit which applies to every bounded sequence of real numbers. Such a notion will depend on the specific free ultrafilter we choose: but it is already very reassuring that such a notion exists at all! To complete our job we need one more check: we have to be sure that the definition is consistent with the classical one. And this is indeed what happens!
Theorem 3. Let $x = (x_n)_{n \in \mathbb{N}}$ be a sequence of real numbers and let $L \in \mathbb{R}$. Then $\lim_{n \to \infty} x_n = L$ in the classical sense if and only if $\lim_{n \to \mathcal{U}} x_n = L$ for every free ultrafilter $\mathcal{U}$ on $\mathbb{N}$.
To prove Theorem 3 we make use of an auxiliary result, which is of interest by itself.
Lemma 2. Let $\mathbf{P}$ be the family of collections of subsets of $X$ that have the finite intersection property. The maximal elements of $\mathbf{P}$ are precisely the ultrafilters on $X$.

Proof: Every ultrafilter is clearly maximal in $\mathbf{P}$. If $\mathcal{U}$ is maximal in $\mathbf{P}$, then it is clearly proper and upper closed, and we can reason as in the proof of the ultrafilter lemma to show that it is actually an ultrafilter.
Proof of Theorem 3: Suppose $x$ does not converge to $L$ in the classical sense. Fix $\varepsilon > 0$ such that the set $A = \{\, n \mid |x_n - L| \geq \varepsilon \,\}$ is infinite. Then the family consisting of $A$ together with the cofinite subsets of $\mathbb{N}$ has the finite intersection property: an ultrafilter $\mathcal{U}$ that extends it must be free, since it extends the Fréchet filter. Then $\{\, n \mid |x_n - L| < \varepsilon \,\} \notin \mathcal{U}$, and $x$ does not have $L$ as its ultralimit along $\mathcal{U}$.
The converse implication follows from the classical definition of limit, together with the very notion of free ultrafilter.
Theorem 3 does hold for sequences of real numbers, but does not extend to arbitrary metric spaces. In fact, the following holds, which we state without proof.
Theorem 4. Let $(M, d)$ be a metric space. The following are equivalent:

1. For some free ultrafilter $\mathcal{U}$ on $\mathbb{N}$, every sequence in $M$ has an ultralimit along $\mathcal{U}$.
2. For every free ultrafilter $\mathcal{U}$ on $\mathbb{N}$, every sequence in $M$ has an ultralimit along $\mathcal{U}$.
3. $M$ is compact.
Ultrafilters are useful in many other contexts. For instance, they are used to construct hyperreal numbers, which in turn allow a rigorous definition of infinitesimals and a foundation of calculus over them. But this might be the topic for another Theory Lunch talk.

Silvio Capobianco
Tue, 14 Jan 2014 16:04:03 +0000

The fundamental problem of programming language package management
http://feedproxy.google.com/~r/ezyang/~3/VUSeqjV5Fs8/
Why are there so many goddamn package managers? They sprawl across both operating systems (apt, yum, pacman, Homebrew) and programming languages (Bundler, Cabal, Composer, CPAN, CRAN, CTAN, EasyInstall, Go Get, Maven, npm, NuGet, OPAM, PEAR, pip, RubyGems, etc., etc.). "It is a truth universally acknowledged that a programming language must be in want of a package manager." What is the fatal attraction of package management that makes programming language after programming language jump off this cliff? Why can't we just, you know, reuse an existing package manager?
You can probably think of a few reasons why trying to use apt to manage your Ruby gems would end in tears. "System and language package managers are completely different! Distributions are vetted, but that's completely unreasonable for most libraries tossed up on GitHub. Distributions move too slowly. Every programming language is different. The different communities don't talk to each other. Distributions install packages globally. I want control over what libraries are used." These reasons are all right, but they are missing the essence of the problem.
The fundamental problem is that programming language package management is decentralized.
This decentralization starts with the central premise of a package manager: that is, to install software and libraries that would otherwise not be locally available. Even with an idealized, centralized distribution curating the packages, there are still two parties involved: the distribution and the programmer who is building applications locally on top of these libraries. In real life, however, the library ecosystem is further fragmented, composed of packages provided by a huge variety of developers. Sure, the packages may all be uploaded and indexed in one place, but that doesn't mean that any given author knows about any other given package. And then there's what the Perl world calls DarkPAN: the uncountable lines of code which probably exist, but which we have no insight into because they are locked away on proprietary servers and source code repositories. Decentralization can only be avoided when you control absolutely all of the lines of code in your application... but in that case, you hardly need a package manager, do you? (By the way, my industry friends tell me this is basically mandatory for software projects beyond a certain size, like the Windows operating system or the Google Chrome browser.)
Decentralized systems are hard. Really, really hard. Unless you design your package manager accordingly, your developers will fall into dependency hell. Nor is there one "right" way to solve this problem: I can identify at least three distinct approaches to it among the emerging generation of package managers, each of which has its benefits and downsides.
Pinned versions. Perhaps the most popular school of thought is that developers should aggressively pin package versions; this approach is advocated by Ruby's Bundler, PHP's Composer, Python's virtualenv and pip, and generally any package manager which describes itself as inspired by the Ruby/node.js communities (e.g. Java's Gradle, Rust's Cargo). Reproducibility of builds is king: these package managers solve the decentralization problem by simply pretending the ecosystem doesn't exist once you have pinned the versions. The primary benefit of this approach is that you are always in control of the code you are running. Of course, the downside of this approach is that you are always in control of the code you are running. An all-too-common occurrence is for dependencies to be pinned, and then forgotten about, even if there are important security updates to the libraries involved. Keeping bundled dependencies up-to-date requires developer cycles--cycles that more often than not are spent on other things (like new features).
A stable distribution. If bundling requires every individual application developer to spend effort keeping dependencies up-to-date and testing if they keep working with their application, we might wonder if there is a way to centralize this effort. This leads to the second school of thought: to centralize the package repository, creating a blessed distribution of packages which are known to play well together, and which will receive bug fixes and security fixes while maintaining backwards compatibility. In programming languages, this is much less common: the two I am aware of are Anaconda for Python and Stackage for Haskell. But if we look closely, this model is exactly the same as the model of most operating system distributions. As a system administrator, I often recommend my users use libraries that are provided by the operating system as much as possible. They won't take backwards incompatible changes until we do a release upgrade, and at the same time you'll still get bugfixes and security updates for your code. (You won't get the new hotness, but that's essentially contradictory with stability!)
Embracing decentralization. Up until now, both of these approaches have thrown out decentralization, requiring a central authority, either the application developer or the distribution manager, for updates. Is this throwing out the baby with the bathwater? The primary downside of centralization is the huge amount of work it takes to maintain a stable distribution or keep an individual application up-to-date. Furthermore, one might not expect the entirety of the universe to be compatible with one another, but this doesn't stop subsets of packages from being useful together. An ideal decentralized ecosystem distributes the problem of identifying which subsets of packages work together across everyone participating in the system. Which brings us to the fundamental, unanswered question of programming language package management:
How can we create a decentralized package ecosystem that works?
Here are a few things that can help:
Stronger encapsulation for dependencies. One of the reasons why dependency hell is so insidious is that the dependency of a package is often an inextricable part of its outward-facing API: thus, the choice of a dependency is not a local choice, but rather a global choice which affects the entire application. Of course, if a library uses another library internally, but this choice is entirely an implementation detail, it shouldn't result in any sort of global constraint. Node.js's npm takes this choice to its logical extreme: by default, it doesn't deduplicate dependencies at all, giving each library its own copy of each of its dependencies. While I'm a little dubious about duplicating everything (it certainly occurs in the Java/Maven ecosystem), I certainly agree that keeping dependency constraints local improves composability.
Advancing semantic versioning. In a decentralized system, it's especially important that library writers give accurate information, so that tools and users can make informed decisions. Wishful, invented version ranges and artistic version number bumps simply exacerbate an already hard problem (as I mentioned in my previous post). If you can enforce semantic versioning, or better yet, ditch semantic versions and record the true, type-level dependency on interfaces, our tools can make better choices. The gold standard of information in a decentralized system is, "Is package A compatible with package B", and this information is often difficult (or impossible, for dynamically typed systems) to calculate.
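As a toy illustration of the kind of judgment a tool could make if version numbers were trustworthy, here is a sketch (my own, not taken from any real package manager) of the caret-style compatibility rule that semantic versioning promises:

```haskell
-- A semantic version: MAJOR.MINOR.PATCH. The derived Ord is
-- lexicographic on the fields, which matches version ordering.
data Version = Version { vMajor :: Int, vMinor :: Int, vPatch :: Int }
  deriving (Eq, Ord, Show)

-- The promise semantic versioning makes: any release with the same
-- major version, at or above the required one, is compatible.
compatibleWith :: Version -> Version -> Bool
compatibleWith required actual =
  vMajor actual == vMajor required && actual >= required
```

For example, `compatibleWith (Version 1 4 0) (Version 1 5 2)` holds, while a 2.0.0 release is not assumed compatible. The post's point stands: this judgment is only as good as the honesty of the version bumps behind it.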
Centralization as a special-case. The point of a decentralized system is that every participant can make policy choices which are appropriate for them. This includes maintaining their own central authority, or deferring to someone else's central authority: centralization is a special-case. If we suspect users are going to attempt to create their own, operating system style stable distributions, we need to give them the tools to do so... and make them easy to use!
For a long time, the source control management ecosystem was completely focused on centralized systems. Distributed version control systems such as Git fundamentally changed the landscape: although Git may be more difficult to use than Subversion for a non-technical user, the benefits of decentralization are diverse. The Git of package management doesn't exist yet. If someone tells you that package management is solved, just reimplement Bundler, I entreat you: think about decentralization as well!

Edward Z. Yang
Thu, 21 Aug 2014 13:02:53 +0000

SmartChecking Matt Might's Red-Black Trees
http://leepike.wordpress.com/2014/08/20/smartchecking-matt-mights-red-black-trees/
Matt Might gave a nice intro to QuickCheck via testing red-black trees recently. Of course, QuickCheck has been around for over a decade now, but it’s still useful (if underused–why aren’t you QuickChecking your programs!?).
In a couple of weeks, I’m presenting a paper on an alternative to QuickCheck called SmartCheck at the Haskell Symposium.
SmartCheck focuses on efficiently shrinking and generalizing large counterexamples. I thought it’d be fun to try some of Matt’s examples with SmartCheck.
The kinds of properties Matt checked really aren't in SmartCheck's sweet spot, since the counterexamples are so small (Matt didn't even have to define shrink instances!).
Still, let’s see what it looks like. (The code can be found here.)
SmartCheck is only interesting for failed properties, so let’s look at an early example in Matt’s blog post where something goes wrong. A lot of the blog post focuses on generating sufficiently constrained arbitrary red-black trees. In the section entitled, “A property for balanced black depth”, a property is given to check that the path from the root of a tree to every leaf passes through the same number of black nodes. An early generator for trees fails to satisfy the property.
To get the code to work with SmartCheck, we derive Typeable and Generic instances for the datatypes, and use GHC Generics to automatically derive instances for SmartCheck's typeclass. The only other main issue is that SmartCheck doesn't support a `forAll` function like QuickCheck's. So instead of a call to QuickCheck such as

> quickCheck (forAll nrrTree prop_BlackBalanced)

we change the Arbitrary instance to use the nrrTree generator.
Because it is so easy to find a small counterexample, SmartCheck’s reduction algorithm does a little bit of automatic shrinking, but not too much. For example, a typical minimal counterexample returned by SmartCheck looks like
T R E 2 (T B E 5 E)
which is about as small as possible. Now onto generalization!
There are three generalization phases in SmartCheck, but we’ll look at just one, in which a formula is returned that is universally quantified if every test case fails. For the test case above, SmartCheck returns the following formula:
forall values x0 x1:
T R E 2 (T B x1 5 x0)
Intuitively, this means that for any well-typed trees chosen that could replace the variables x0 and x1, the resulting formula is still a counterexample.
The benefit to developers is seeing instantly that those subterms in the counterexample probably don’t matter. The real issue is that E on the left is unbalanced with (T B E 5 E) on the right.
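To see concretely why this tree is a counterexample, we can run the property by hand. The following self-contained sketch uses the datatype from Matt's post; the helper `blackDepths` is my own, and the QuickCheck/SmartCheck setup is omitted:

```haskell
import Data.List (nub)

-- Red-black tree skeleton, as in Matt Might's post.
data Color = R | B deriving Show
data RBSet a = E | T Color (RBSet a) a (RBSet a) deriving Show

-- The number of black nodes crossed on each root-to-leaf path.
blackDepths :: RBSet a -> [Int]
blackDepths E = [0]
blackDepths (T c l _ r) = map (+ inc) (blackDepths l ++ blackDepths r)
  where inc = case c of B -> 1; R -> 0

-- The black-balance property: every path sees the same black count.
prop_BlackBalanced :: RBSet a -> Bool
prop_BlackBalanced t = length (nub (blackDepths t)) <= 1

-- The counterexample from the text.
counterexample :: RBSet Int
counterexample = T R E 2 (T B E 5 E)
```

Here `blackDepths counterexample` evaluates to `[0,1,1]`: the path ending at the left E crosses no black nodes while the right paths cross one, so the property fails, exactly as SmartCheck's generalized counterexample suggests.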
One of the early design decisions in SmartCheck was to focus on structurally shrinking data types and essentially ignore "base types" like Int, Char, etc. The motivation was to improve efficiency when shrinking large counterexamples.
But for a case like this, generalizing base types would be interesting. We’d hypothetically get something like
forall values (x0, x1 :: RBSet Int) (x2, x3 :: Int):
T R E x2 (T B x1 x3 x0)
further generalizing the counterexample. It may be worth adding this behavior to SmartCheck.
SmartCheck’s generalization begins to bridge the gap from specific counterexamples to formulas characterizing counterexamples. The idea is related to QuickSpec, another cool tool developed by Claessen and Hughes (and SmallBone). Moreover, it’s a bridge between testing and verification, or as Matt puts it, from the 80% to the 20%.Lee Pikehttp://leepike.wordpress.com/?p=713Thu, 21 Aug 2014 04:53:30 +0000IAP: Speeding up conduit
https://www.fpcomplete.com/blog/2014/08/iap-speeding-up-conduit
This post contains fragments of active Haskell code, best viewed and executed at
https://www.fpcomplete.com/blog/2014/08/iap-speeding-up-conduit
As most of us know, performance isn't a one-dimensional spectrum. There are in fact multiple different ways to judge the performance of a program. A commonly recognized tradeoff is the one between CPU and memory usage: often, a program can be sped up by caching more data, for example.

conduit is a streaming data library. In that sense, it has two very specific performance criteria it aims for:

- Constant memory usage.
- Efficient usage of scarce resources, such as closing file descriptors as early as possible.

While CPU performance is always a nice goal, it has never been my top priority in the library's design, especially given that in the main use case for conduit (streaming data in an I/O context), the I/O cost almost always far outweighs any CPU overhead from conduit.

However, for our upcoming Integrated Analysis Platform (IAP) release, this is no longer the case. conduit will be used in tight loops, where we do need to optimize for the lowest CPU overhead possible.

This blog post covers the first set of optimizations I've applied to conduit. There is still more work to be done, and throughout this blog post I'll be describing some of the upcoming changes I am attempting. I'll give a brief summary up front:

- Applying the codensity transform results in much better complexity of monadic bind.
- We're also less reliant on rewrite rules firing, which has always been unreliable (and now I know why).
- This change does represent a breaking API change. However, it only affects users of the Data.Conduit.Internal module. If you've just been using the public API, your code will be unaffected, besides getting an automatic speedup.
- These changes will soon be released as conduit 1.2.0, after a period for community feedback.

Note that this blog post follows the actual steps I went through (more or less)
in identifying the performance issues I wanted to solve. If you want to skip
ahead to the solution itself, you may want to skip to the discussion on
difference lists, or even straight to continuation passing
style, church-encoding,
codensity.By the way, after I originally wrote this blog post, I continued working on the
optimizations I describe as possible future enhancements. Those are actually
working out far better than I expected, and it looks like conduit 1.2.0 will be
able to ship with them. I'll be writing a separate blog post detailing those
changes. A bit of a teaser is: for vector-equivalent code, conduit now
generates identical core as vector itself.The benchmarksBefore embarking on any kind of serious optimizations, it's important to have
some benchmarks. I defined three benchmarks for the work I was going to be
doing:A simple sum: adding up the numbers from 1 to 10000. This is to get a baseline of the overhead coming from conduit.A monte carlo analysis: This was based on a previous IAP blog post. I noticed when working on that benchmark that, while the conduit solution was highly memory efficient, there was still room to speed up the benchmark.Sliding vectors: Naren Sundar recently sent a sliding windows pull requests, which allow us to get a view of a fixed size of a stream of values. This feature is very useful for a number of financial analyses, especially regarding time series.Naren's pull request was based on immutable data structures, and for those cases it is highly efficient. However, it's possible to be far more memory efficient by writing to a mutable vector instead, and then taking immutable slices of that vector. Mihaly Barasz sent a pull request for this feature, and much to our disappointment, for small window sizes, it performed worse than sliding windows. We want to understand why.You can see the benchmark code, which stays mostly unchanged for the rest of this blog post (a few new cases are added to demonstrate extra points). The benchmarks always contain a low-level base case representing the optimal performance we can expect from hand-written Haskell (without resorting to any kind of FFI tricks or the like).You can see the first run
results
which reflect conduit 1.1.7, plus inlining of a few functions. Some initial analysis:

- Control.Monad.foldM is surprisingly slow.
- Data.Conduit.List.foldM has a rather steep performance hit versus Data.Conduit.List.fold.
- There's a very high overhead in the monte carlo analysis.
- For sliding vector, the conduit overhead is more pronounced at smaller window sizes.
- But even with large window sizes, mutable vector conduits still have a large overhead. The sliding window/immutable approach, however, shows almost no overhead.

That hopefully sets the scene enough for us to begin to dive in.

Rewrite rules: lift

GHC offers a very powerful optimization technique: rewrite rules. This allows
you to tell the compiler that a certain expression can be rewritten to a more
efficient one. A common example of a rewrite rule would be to state that map f . map g is the same as map (f . g). This can be expressed as:

{-# RULES "map f . map g" forall f g. map f . map g = map (f . g) #-}

Note that GHC's list rewrite rules are actually more complicated than this, and revolve around a concept called build/foldr fusion.

Let's look at the implementation of the yield function in conduit (with some newtypes stripped away):

yield :: Monad m => o -> ConduitM i o m ()
yield o = HaveOutput (Done ()) (return ()) o
{-# INLINE [1] yield #-}
{-# RULES
"yield o >> p" forall o (p :: ConduitM i o m r).
yield o >> p = HaveOutput p (return ()) o
#-}

The core datatype of conduit is recursive. The HaveOutput constructor contains a field for "what to do next." In the case of yield, there isn't anything to do next, so we fill that with Done (). However, creating that Done () value just to throw it away after a monadic bind is wasteful. So we have a rewrite rule to fuse those two steps together.

But no such rewrite rule exists for lift! My first step was to add such a
rule, and check the
results.
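For reference, since lift m is (with the newtypes stripped away) just PipeM (liftM Done m), binding it with f lets the intermediate Done be fused away. The rule looks roughly like this; this is a sketch, and the exact rule in the linked commit may differ:

```haskell
{-# RULES "lift m >>= f" forall m f.
        lift m >>= f = PipeM (liftM f m)
  #-}
```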
Unfortunately, the rule didn't have any real impact, because it wasn't firing.
Let's put that issue to the side; we'll come back to it later.

Cleanup, inlining

One of the nice features introduced in (I believe) GHC 7.8 is that the compiler
will now warn you when a rewrite rule may not fire. When compiling conduit, I
saw messages like:

Data/Conduit/List.hs:274:11: Warning:
Rule "source/map fusion $=" may never fire
because ‘$=’ might inline first
Probable fix: add an INLINE[n] or NOINLINE[n] pragma on ‘$=’
Data/Conduit/List.hs:275:11: Warning:
Rule "source/map fusion =$=" may never fire
because ‘=$=’ might inline first
Probable fix: add an INLINE[n] or NOINLINE[n] pragma on ‘=$=’
Data/Conduit/List.hs:542:11: Warning:
Rule "source/filter fusion $=" may never fire
because ‘$=’ might inline first
Probable fix: add an INLINE[n] or NOINLINE[n] pragma on ‘$=’
Data/Conduit/List.hs:543:11: Warning:
Rule "source/filter fusion =$=" may never fire
because ‘=$=’ might inline first
Probable fix: add an INLINE[n] or NOINLINE[n] pragma on ‘=$=’
Data/Conduit/List.hs:552:11: Warning:
Rule "connect to sinkNull" may never fire
because ‘$$’ might inline first
Probable fix: add an INLINE[n] or NOINLINE[n] pragma on ‘$$’

This demonstrates an important interaction between inlining and rewrite rules. We need to make sure that expressions that need to be rewritten are not inlined first. If they are inlined first, then GHC won't be able to rewrite them to our more optimized version.

A common approach to this is to delay inlining of functions until a later simplification phase. The GHC simplification process runs in multiple steps, and we can state that rules and inlining should only happen before or after a certain phase. The phases count down from 2 to 0, so we commonly want to delay inlining of functions until phase 0 if they may be subject to rewriting.

Conversely, some functions need to be inlined before a rewrite rule can fire. In stream fusion, for example, the fusion framework depends on the following sequencing to get good performance:

map f . map g
-- inline map
unstream . mapS f . stream . unstream . mapS g . stream
-- rewrite stream . unstream
unstream . mapS f . mapS g . stream
-- rewrite mapS . mapS
unstream . mapS (f . g) . stream

In conduit, we need to make sure that all of this is happening in the correct order. There was one particular complexity that made it difficult to ensure this happened. conduit in fact has two core datatypes: Pipe and ConduitM, with the latter being a more friendly newtype wrapper around the former. Up until this point, the code for the two was jumbled into a single internal module, making it difficult to track which things were written against which version of the API.

My next step was to split things into .Pipe and .Conduit internal modules, and then clean up GHC's warnings to get rules to fire more reliably. This gave a modest performance boost to the sliding vector benchmarks, but not much else. But it does pave the way for future improvements.

Getting serious about sum, by cheating

The results so far have been uninspiring. We've identified a core problem (too
many of those Done data constructors being used), and noticed that the
rewrite rules that should fix that don't seem to be doing their job. Now
let's take our first stab at really improving performance: with aggressive
rewrite rules.

Our sum benchmark is really simple: use enumFromTo to create a stream of values, and fold (or foldM) to consume it. The thing that slows us down is that, in between these two simple functions, we end up allocating a bunch of temporary data structures. Let's get rid of them with rewrite rules!

This certainly did the
trick.
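What the rule effectively does is collapse the whole pipeline into a tight accumulator loop. A standalone sketch of that kind of loop (illustrative only, not conduit's actual rewritten core):

```haskell
-- The hand-written loop that "enumFromTo connected to fold" collapses to:
-- no intermediate conduit constructors are allocated at all.
sumTo :: Int -> Int -> Int
sumTo lo hi = go lo 0
  where
    go i acc
      | i > hi    = acc
      | otherwise = go (i + 1) (acc + i)

main :: IO ()
main = print (sumTo 1 10000)  -- prints 50005000
```

This is also the shape of the low-level base case in the benchmarks; the rewrite rule's job is to get idiomatic conduit code to compile down to something equivalent.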
The conduit implementation jumped from 185us to just 8.63us. For comparison,
the low level approach (or vector's stream fusion) clocks in at 5.77us, whereas
foldl' on a list is 80.6us. This is a huge win!

But it's also misleading. All we've done here is sneakily rewritten our conduit
algorithm into a low-level format. This solves the specific problem on the
table (connecting enumFromTo with fold), but won't fully generalize to other
cases. A more representative demonstration of this improvement is the speedup
for foldM, which went from 1180us to 81us. The reason this is more realistic
is that the rewrite rule is not specialized to enumFromTo, but rather works
on any Source.

I took a big detour at this point, and ended up writing an initial
implementation of stream fusion in
conduit.
Unfortunately, I ran into a dead end on that branch, and had to put that work
to the side temporarily. However, the improvements discussed in the rest of
this blog post will hopefully reopen the door to stream fusion, which I hope to
investigate next.

Monte carlo, and associativity

Now that I'd made the results of the sum benchmark thoroughly useless, I
decided to focus on the results of monte carlo, where the low level
implementation still won by a considerable margin (3.42ms vs 10.6ms). The
question was: why was this happening? To understand, let's start by looking at the code:

analysis = do
successes <- sourceRandomN count
$$ CL.fold (\t (x, y) ->
if (x*x + y*(y :: Double) < 1)
then t + 1
else t)
(0 :: Int)
return $ fromIntegral successes / fromIntegral count * 4
sourceRandomN :: (MWC.Variate a, MonadIO m) => Int -> Source m a
sourceRandomN cnt0 = do
gen <- liftIO MWC.createSystemRandom
let loop 0 = return ()
loop cnt = do
liftIO (MWC.uniform gen) >>= yield >> loop (cnt - 1)
loop cnt0

The analysis function is not very interesting: it simply connects sourceRandomN with a fold. Given that we now have a well-behaved and consistently firing rewrite rule for connecting to folds, it's safe to say that was not the source of our slowdown. So our slowdown must be coming from:

liftIO (MWC.uniform gen) >>= yield >> loop (cnt - 1)

This should in theory generate really efficient code. yield >> loop (cnt - 1) should be rewritten to \x -> HaveOutput (loop (cnt - 1)) (return ()) x, and then liftIO should get rewritten to generate:

PipeM $ do
x <- MWC.uniform gen
return $ HaveOutput (loop $ cnt - 1) (return ()) x

I added another commit to include a few more versions of the monte carlo benchmark (results here). The two most interesting are:

Explicit usage of the Pipe constructors:

sourceRandomNConstr :: (MWC.Variate a, MonadIO m) => Int -> Source m a
sourceRandomNConstr cnt0 = ConduitM $ PipeM $ do
gen <- liftIO MWC.createSystemRandom
let loop 0 = return $ Done ()
loop cnt = do
x <- liftIO (MWC.uniform gen)
return $ HaveOutput (PipeM $ loop (cnt - 1)) (return ()) x
loop cnt0

This version ran in 4.84ms, vs the original conduit version which ran in 15.8ms. So this is definitely the problem!

Explicitly force right-associated binding order:

sourceRandomNBind :: (MWC.Variate a, MonadIO m) => Int -> Source m a
sourceRandomNBind cnt0 = lift (liftIO MWC.createSystemRandom) >>= \gen ->
let loop 0 = return ()
loop cnt = do
lift (liftIO $ MWC.uniform gen) >>= (\o -> yield o >> loop (cnt - 1))
in loop cnt0

Or to zoom in on the important bit:

lift (liftIO $ MWC.uniform gen) >>= (\o -> yield o >> loop (cnt - 1))

By the monad laws, this code is identical to the original. However, instead of standard left-associativity, we have right-associativity of monadic bind. This code ran in 5.19ms, an approximate threefold speedup vs the left-associated code!

This issue of associativity was something Roman Cheplyaka told me about back
in
April,
so I wasn't surprised to see it here. Back then, I'd looked into using
Codensity together with ConduitM, but didn't get immediate results, and
therefore postponed further research until I had more time.

OK, so why exactly does left-associativity hurt us so much? There are actually two reasons:

- Generally speaking, many monads perform better when they are right associated. This is especially true for free monads, of which conduit is just a special case. Janis Voigtländer's paper Asymptotic Improvement of Computations over Free Monads and Edward Kmett's blog post series free monads for less do a far better job of explaining the issue than I could.
- In the case of conduit, left associativity prevented the lift and yield rewrite rules from firing, which introduced extra, unnecessary monadic bind operations. Forcing right associativity allows these rules to fire, avoiding a lot of unnecessary data constructor allocation and analysis.

At this point, it became obvious that the main slowdown I was seeing was driven by this problem. The question is: how should we solve it?

Difference lists

To pave the way for the next step, I want to take a quick detour and talk about
something simpler: difference lists. Consider the following code:

(((w ++ x) ++ y) ++ z)

Most experienced Haskellers will cringe upon reading that. The append operation
for a list needs to traverse every cons cell in its left value. When we
left-associate append operations like this, we will need to traverse every cell
in w, then every cell in w ++ x, then every cell in w ++ x ++ y. This is highly
inefficient, and would clearly be better done in a right-associated style
(sound familiar?).But forcing programmers to ensure that their code is always right-associated
isn't always practical. So instead, we have two common alternatives. The first
is: use a better data structure. In particular, Data.Sequence has far cheaper
append operations than lists.

The other approach is to use difference lists. Difference lists are
functions instead of actual list values. They are instructions for adding
values to the beginning of the list. In order to append, you use normal
function composition. And to convert them to a list, you apply the resulting
function to an empty list. As an example:

type DList a = [a] -> [a]
dlist1 :: DList Int
dlist1 rest = 1 : 2 : rest
dlist2 :: DList Int
dlist2 rest = 3 : 4 : rest
final :: [Int]
final = dlist1 . dlist2 $ []
main :: IO ()
main = print final

Both difference lists and sequences have advantages. Probably the simplest summary is:

- Difference lists have smaller constant factors for appending.
- Sequences allow you to analyze them directly, without having to convert them to a different data type first.

That second point is important. If you need to regularly analyze your list and then continue to append, the performance of a difference list will be abysmal. You will constantly be swapping representations, and converting from a list to a difference list is an O(n) operation. But if you will simply be constructing a list once without any analysis, odds are difference lists will be faster.

This situation is almost identical to our problems with conduit. Our monadic
composition operator, like list's append operator, needs to traverse the entire
left hand side. This connection is more clearly spelled out in Reflection
without Remorse by Atze van
der Ploeg and Oleg Kiselyov (and for me, care of Roman).

Alright, with that out of the way, let's finally fix conduit!

Continuation passing style, church-encoding, codensity

There are essentially two things we need to do with conduits:

- Monadically compose them, to sequence two streams into a larger stream.
- Categorically compose them, to connect one stream to the next in a pipeline.

The latter requires that we be able to case analyze our datatypes, while
theoretically the former does not: something like difference lists for simple
appending would be ideal. In the past, I've tried out a number of different
alternative implementations of conduit, none of which worked well enough. The
problem I always ran into was that either monadic bind became too expensive, or
categorical composition became too expensive.

Roman, Mihaly, Edward and I discussed these issues a bit on Github, and based on Roman's advice, I went ahead with writing a benchmark of different conduit implementations. I currently have four implementations in this benchmark (and hope to add more):

- Standard, which looks very much like conduit 1.1, just a bit simplified (no rewrite rules, no finalizers, no leftovers).
- Free, which is conduit rewritten to explicitly use the free monad transformer.
- Church, which modifies Free to instead use the Church-encoded free monad transformer.
- Codensity, which is a Codensity-transform-inspired version of conduit.

You can see the benchmark results,
which clearly show the codensity version to be the winner. Though it would be
interesting, I think I'll avoid going into depth on the other three
implementations for now (this blog post is long enough already).

What is Codensity?

Implementing Codensity in conduit just means changing the ConduitM newtype wrapper to look like this:

newtype ConduitM i o m r = ConduitM
{ unConduitM :: forall b.
(r -> Pipe i i o () m b) -> Pipe i i o () m b
}

What this says is "I'm going to provide an r value. If you give me a function that needs an r value, I'll give it that r value and then continue with the resulting Pipe." Notice how similar this looks to the type signature of monadic bind itself:

(>>=) :: Pipe i i o () m r
-> (r -> Pipe i i o () m b)
    -> Pipe i i o () m b

This isn't by chance, it's by construction. More information is available in the Haddocks of kan-extensions, or in the above-linked paper and blog posts by Janis and Edward. To see why this change is important, let's look at the new implementations of some of the core conduit functions and type classes:

yield o = ConduitM $ \rest -> HaveOutput (rest ()) (return ()) o
await = ConduitM $ \f -> NeedInput (f . Just) (const $ f Nothing)
instance Monad (ConduitM i o m) where
return x = ConduitM ($ x)
ConduitM f >>= g = ConduitM $ \h -> f $ \a -> unConduitM (g a) h
instance MonadTrans (ConduitM i o) where
  lift mr = ConduitM $ \rest -> PipeM (liftM rest mr)

Instead of having explicit Done constructors in yield, await, and lift,
we use the continuation rest. This is the exact same transformation we were
previously relying on rewrite rules to provide. However, our rewrite rules
couldn't fire properly in a left-associated monadic binding. Now we've avoided
the whole problem!

Our Monad instance also became much smaller. Notice that in order to monadically compose, there is no longer any need to case-analyze the left hand side, which avoids the high penalty of left association.

Another interesting quirk is that our Monad instance on ConduitM no longer requires that the base m type constructor itself be a Monad. This is a nice feature of Codensity.

So that's half the story. What about categorical composition? That certainly
does require analyzing both the left and right hand structures. So don't we
lose all of our speed gains of Codensity with this? Actually, I think not.
Let's look at the code for categorical composition:

ConduitM left0 =$= ConduitM right0 = ConduitM $ \rest ->
let goRight final left right =
case right of
HaveOutput p c o -> HaveOutput (recurse p) (c >> final) o
NeedInput rp rc -> goLeft rp rc final left
Done r2 -> PipeM (final >> return (rest r2))
PipeM mp -> PipeM (liftM recurse mp)
Leftover right' i -> goRight final (HaveOutput left final i) right'
where
recurse = goRight final left
goLeft rp rc final left =
case left of
HaveOutput left' final' o -> goRight final' left' (rp o)
NeedInput left' lc -> NeedInput (recurse . left') (recurse . lc)
Done r1 -> goRight (return ()) (Done r1) (rc r1)
PipeM mp -> PipeM (liftM recurse mp)
Leftover left' i -> Leftover (recurse left') i
where
recurse = goLeft rp rc final
     in goRight (return ()) (left0 Done) (right0 Done)

In the last line, we apply left0 and right0 to Done, which is how we
convert our Codensity version into something we can actually analyze. (This is
equivalent to applying a difference list to an empty list.) We then traverse
these values in the same way that we did in conduit 1.1 and earlier.

The important difference is how we ultimately finish. The code in question is the Done clause of goRight's case analysis, namely:

Done r2 -> PipeM (final >> return (rest r2))

Notice the usage of rest, instead of what we would have previously done: used
the Done constructor. By doing this, we're immediately recreating a Codensity
version of our resulting Pipe, which allows us to only traverse our incoming
Pipe values once each, and not need to retraverse the outgoing Pipe for
future monadic binding.

This trick doesn't just work for composition. There are a large number of functions in conduit that need to analyze a Pipe, such as addCleanup and catchC. All of them are now implemented in this same style.

After implementing this change, the resulting benchmarks look much
better.
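To see the mechanism in miniature, here is the same trick applied to a toy free monad (illustrative types and names only, not conduit's own): binds over Free must retraverse the structure, while the Codensity version merely composes continuations and rebuilds the structure once at the end.

```haskell
{-# LANGUAGE RankNTypes #-}

import Control.Monad (ap)

-- A toy free monad: each bind must walk the existing structure.
data Free f a = Pure a | Wrap (f (Free f a))

instance Functor f => Functor (Free f) where
  fmap f (Pure a) = Pure (f a)
  fmap f (Wrap m) = Wrap (fmap (fmap f) m)

instance Functor f => Applicative (Free f) where
  pure  = Pure
  (<*>) = ap

instance Functor f => Monad (Free f) where
  return       = pure
  Pure a >>= k = k a
  Wrap m >>= k = Wrap (fmap (>>= k) m)

-- The Codensity transform: binds become plain function composition.
newtype Codensity m a = Codensity
  { runCodensity :: forall b. (a -> m b) -> m b }

instance Functor (Codensity m) where
  fmap f (Codensity g) = Codensity (\k -> g (k . f))

instance Applicative (Codensity m) where
  pure a = Codensity ($ a)
  (<*>)  = ap

instance Monad (Codensity m) where
  return = pure
  Codensity g >>= k = Codensity (\h -> g (\a -> runCodensity (k a) h))

rep :: Monad m => m a -> Codensity m a
rep m = Codensity (m >>=)

improve :: Codensity (Free f) a -> Free f a
improve c = runCodensity c Pure

-- A concrete instruction functor so we can observe results.
data Say next = Say String next

instance Functor Say where
  fmap f (Say s n) = Say s (f n)

say :: String -> Free Say ()
say s = Wrap (Say s (Pure ()))

run :: Free Say a -> ([String], a)
run (Pure a)         = ([], a)
run (Wrap (Say s n)) = let (ss, a) = run n in (s : ss, a)

main :: IO ()
main =
  -- Left-associated binds, routed through Codensity: same result as the
  -- direct Free version, but each bind is O(1).
  print (run (improve ((rep (say "a") >> rep (say "b")) >> rep (say "c"))))
```

In conduit the same move is made with Pipe in place of Free: ConduitM becomes the Codensity-style wrapper, and applying it to Done plays the role of improve here.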
The naive implementation of monte carlo is now quite close to the low-level
version (5.28ms vs 3.44ms, as opposed to the original 15ms). Sliding vector is
also much better: the unboxed, 1000-size window benchmark went from 7.96ms to
4.05ms, vs a low-level implementation at 1.87ms.

Type-indexed sequences

One approach that I haven't tried yet is the type-indexed sequence approach
from Reflection without Remorse. I still intend to add it to my conduit
benchmark, but I'm not optimistic about it beating out Codensity. My guess is
that a sequence data type will have a higher constant factor overhead, and
based on the way composition is implemented in conduit, we won't get any
benefit from avoiding the need to transition between two representations.Edward said he's hoping to get an implementation of such a data structure into
the free package, at which point I'll update my benchmark to see how it
performs.

To pursue next: streamProducer, streamConsumer, and more

While this round of benchmarking produced some very nice results, we're clearly
not yet at the same level as low-level code. My goal is to focus on that next.
I have some experiments going already relating to getting conduit to expose
stream fusion rules. In simple cases, I've generated a conduit-compatible API
with the same performance as vector.

The sticking point is getting something which is efficient not just for
functions explicitly written in stream style, but also provides decent
performance when composed with the await/yield approach. While the latter
approach will almost certainly be slower than stream fusion, I'm hoping we can
get it to degrade to current-conduit performance levels, and allow stream
fusion to provide a significant speedup when categorically composing two
Conduits written in that style.

The code discussed in this post is now available on the next-cps branch of
conduit. conduit-extra, conduit-combinators, and a number of other packages
either compile out-of-the-box with these changes, or require minor tweaks
(already implemented), so I'm hoping that this API change does not affect too
many people.

As I mentioned initially, I'd like to have some time for community discussion on this before I make this next release.

https://www.fpcomplete.com/blog/2014/08/iap-speeding-up-conduit
Thu, 21 Aug 2014 00:00:00 +0000

Continuations and Exceptions
http://neilmitchell.blogspot.com/2014/08/continuations-and-exceptions.html
Neil Mitchell
Wed, 20 Aug 2014 17:39:00 +0000

Maniac week postmortem
http://byorgey.wordpress.com/2014/08/19/maniac-week-postmortem/
My maniac week was a great success! First things first: here's a time-lapse video[1] (I recommend watching it at the full size, 1280×720).
Some statistics[2]:
Total hours of productive work: 55.5 (74 pings)
Average hours of work per day[3]: 11
Average hours of sleep per night: 7.8 (52 pings over 5 nights)[4]
Total hours not working or sleeping: 27.25 (37 pings)
Average hours not working per day: 5.5
Pages of dissertation written: 24 (157 to 181)
[I was planning to also make a visualization of my TagTime data showing when I was sleeping, working, or not-working, but putting together the video and this blog post has taken long enough already! Perhaps I’ll get around to it later.]
Overall, I would call the experiment a huge success—although as you can see, I was a full 2.5 hours per day off my target of 13.5 hours of productive work each day. What with eating, showering, making lunch, getting dinner, taking breaks (both intentional breaks as well as slacking off), and a few miscellaneous things I had to take care of like taking the car to get the tire pressure adjusted… it all adds up surprisingly fast. I think this was one of the biggest revelations for me; going into it I thought 3 hours of not-work per day was extremely generous. I now think three hours of not-work per day is probably within reach for me but would be extremely difficult, and would probably require things like planning out meals ahead of time. In any case, 55 hours of actual, focused work is still fantastic.
Some random observations/thoughts:
Having multiple projects to work on was really valuable; when I got tired of working on one thing I could often just switch to something else instead of taking an actual break. I can imagine this might be different if I were working on a big coding project (as most of the other maniac weeks have been). The big project would itself provide multiple different subtasks to work on, but more importantly, coding provides immediate feedback that is really addictive. Code a new feature, and you can actually run the new code! And it does something cool! That it didn’t do before! In contrast, when I write another page of my dissertation I just have… another page of my dissertation. I am, in fact, relatively excited about my dissertation, but it can’t provide that same sort of immediate reinforcing feedback, and it was difficult to keep going at times.
I found that having music playing really helped me get into a state of “flow”. The first few days I would play some album and then it would stop and I wouldn’t think to put on more. Later in the week I would just queue up many hours of music at a time and that worked great.
I was definitely feeling worn out by the end of the week—the last two days in particular, it felt a lot harder to get into a flow. I think I felt so good the first few days that I became overconfident—which is good to keep in mind if I do this again. The evening of 12 August was particularly bad; I just couldn’t focus. It might have been better in the long run to just go home and read a book or something; I’m just not sure how to tell in the moment when I should push through and when it’s better to cut my losses.
Blocking Facebook, turning off email notifications, etc. was really helpful. I did end up allowing myself to check email using my phone (I edited the rules a few hours before I started) and I think it was a good idea—I ended up still needing to communicate with some people, so it was very convenient and not too distracting.
Note there are two places on Tuesday afternoon where you can see the clock jump ahead by an hour or so; of course those are times when I turned off the recording. One corresponded to a time when I needed to read and write some sensitive emails; during the other, I was putting student pictures into an anki deck, and turned off the recording to avoid running afoul of FERPA.
That’s all I can think of for now; questions or comments, of course, are welcome.
[1] Some technical notes (don't try this at home; see http://expost.padm.us/maniactech for some recommendations on making your own timelapse). To record and create the video I used a homegrown concoction of scrot, streamer, ImageMagick, and ffmpeg, with some zsh and Haskell scripts to tie it all together, and diagrams to generate the clock and tag displays. I took about 3GB worth of raw screenshots, and it takes probably about a half hour to process all of it into a video.↩
[2] These statistics are according to TagTime, i.e. gathered via random sampling, so there is a bit of inherent uncertainty. I leave it as an exercise for the reader to calculate the proper error bars on these times (given that I use a standard ping interval of 45 minutes).↩
[3] Computed as 74/(171 – 9) pings multiplied by 24 hours; 9 pings occurred on Sunday morning which I did not count as part of the maniac week.↩
[4] This is somewhat inflated by Saturday night/Sunday morning, when I both slept in and got a higher-than-average number of pings; the average excluding that night is 6.75 hours, which sounds about right.↩
Brent
Tue, 19 Aug 2014 16:02:04 +0000

Bluetooth on Haskell
http://ku-fpg.blogspot.com/2014/08/bluetooth-on-haskell.html
Ryan Scott
Mon, 18 Aug 2014 21:04:00 +0000

Fame at last!
http://jpmoresmau.blogspot.com/2014/08/fame-at-last.html
I was reading the book "Haskell Data Analysis Cookbook" when suddenly, my name pops up! Funny to see a link to a 7-year-old blog entry; who knew I would go down in history for a few lines of code for a perceptron? It's deep in Chapter 7, for those interested. Maybe this is a sign that I should abandon everything else and spend my time on AI, since it's obviously where fame and riches abound! Right...
JP Moresmau
Mon, 18 Aug 2014 15:10:00 +0000

Algebraic Effect Handlers, Twice
http://tomschrijvers.blogspot.com/2014/08/algebraic-effect-handlers-twice.html
I have two new papers on algebraic effect handlers:

Effect Handlers in Scope
Nicolas Wu, Tom Schrijvers, Ralf Hinze.
To appear at the Haskell Symposium 2014.

Algebraic effect handlers are a powerful means for describing effectful computations. They provide a lightweight and orthogonal technique to define and compose the syntax and semantics of different effects. The semantics is captured by handlers, which are functions that transform syntax trees.

Unfortunately, the approach does not support syntax for scoping constructs, which arise in a number of scenarios. While handlers can be used to provide a limited form of scope, we demonstrate that this approach constrains the possible interactions of effects and rules out some desired semantics.

This paper presents two different ways to capture scoped constructs in syntax, and shows how to achieve different semantics by reordering handlers. The first approach expresses scopes using the existing algebraic handlers framework, but has some limitations. The problem is fully solved in the second approach, where we introduce higher-order syntax.

Heuristics Entwined with Handlers Combined
Tom Schrijvers, Nicolas Wu, Benoit Desouter, Bart Demoen.
To appear at the PPDP Symposium 2014.

A long-standing problem in logic programming is how to cleanly separate logic and control. While solutions exist, they fall short in one of two ways: some are too intrusive, because they require significant changes to Prolog's underlying implementation; others are lacking a clean semantic grounding. We resolve both of these issues in this paper.

We derive a solution that is both lightweight and principled. We do so by starting from a functional specification of Prolog based on monads, and extend this with the effect handlers approach to capture the dynamic search tree as syntax. Effect handlers then express heuristics in terms of tree transformations. Moreover, we can declaratively express many heuristics as trees themselves that are combined with search problems using a generic entwining handler. Our solution is not restricted to a functional model: we show how to implement this technique as a library in Prolog by means of delimited continuations.
Tom Schrijvers
Mon, 18 Aug 2014 08:05:00 +0000

Haskell development job at Standard Chartered
http://donsbot.wordpress.com/2014/08/17/haskell-development-job-at-standard-chartered/
The Strats team at Standard Chartered is hiring expert typed FP developers for Haskell dev roles in London.
This is a “front office” finance role – meaning you will work directly with traders building software to automate and improve their efficiency. The role is very development focused, and you will use Haskell for almost all tasks: data analysis, DSLs, market data publishing, databases, web services, desktop GUIs, large parallel tasks, quantitative models, solvers, everything. There may be a small amount of C++ or F# on occasion. This is a fast paced role – code you write today will be deployed within hours to hundreds of users and has to work.
You will join an expert team in London, and significant, demonstrable experience in typed FP (Haskell, OCaml, F# etc) is strongly preferred. We have more than 2 million lines of Haskell, and our own compiler. In this context we look for skill and taste in typed API design to capture and abstract over complex, messy systems.
Experience writing typed APIs to external systems such as databases, web services, pub/sub platforms is very desirable. We like working code, so if you have Hackage or github libraries, we want to see them. We also like StackOverflow answers, blog posts, academic papers, or other arenas where you can show broad FP ability.
The role requires physical presence on the trading floor in London. Remote work isn’t an option. Ideally you have some project and client management skills — you will talk to users, understand their problems, and then implement and deliver what they really need. No financial background is required.
More info about our development process is in the 2012 PADL keynote, and a 2013 HaskellCast interview.
If this sounds exciting to you, please send CVs to me – dons00 at gmail.com
Tagged: community, jobs, london
Don Stewart
Sun, 17 Aug 2014 12:21:00 +0000

xml-to-json – new version released, constant memory usage
http://noamlewis.wordpress.com/2014/08/16/xml-to-json-new-version-released-constant-memory-usage/
I’ve released a new version (1.0.0) of xml-to-json, which aims to solve memory issues encountered when converting large XML files. The new version includes two executables: the regular (aka “classic”) version, xml-to-json, which includes the various features, and the newly added executable xml-to-json-fast, which runs with constant memory usage and can process files of arbitrary size. It does this by not validating the input xml, and by basically streaming json output as it encounters xml elements (tags) in the input. The implementation takes advantage of the cool tagsoup library for processing XML.
Check the README.md for more details. Hackage is updated. Tagged: Haskellsinelawhttp://noamlewis.wordpress.com/2014/08/16/xml-to-json-new-version-released-constant-memory-usage/Sat, 16 Aug 2014 19:24:24 +0000Citation, Recitation, and Resuscitation
http://winterkoninkje.dreamwidth.org/98668.html
http://winterkoninkje.dreamwidth.org/98668.htmlSat, 16 Aug 2014 00:26:08 +0000[leuzkdqp] Units
http://kenta.blogspot.com/2014/08/leuzkdqp-units.html
Some notes on dimensional quantities and type systems:

- Addition, subtraction, assignment, and comparison should fail if the units are incompatible.
- Multiplication, division, and exponentiation by a rational dimensionless power always work. These operations assume commutativity.
- Distinguishing addition from multiplication vaguely reminds me of the difference between floating point and fixed point.
- Unit conversion: a quantity can be read in one set of units then shown in another set. Abstractly it does not exist as a real number in either.
- Converting between different families of units requires exact linear algebra on rational numbers.
- In some functions, units pass through just fine. Others, e.g., trigonometric, require dimensionless numbers.
- Not all dimensionless numbers are the same unit: adding an angle to the fine structure constant seems as meaningless as adding a foot to a volt. But multiplying them could be reasonable.
- One can take any compound type with a dimensionless internal type and turn it into a new compound type with that internal type having units. But should this be considered a "new" type? Of course, this is useless unless the internal type defines arithmetic operations: "True" miles per hour seems meaningless.
- Creating such compound types is analogous to the "function" idea above by viewing a compound type as a data constructor function of a base type. Constructors do not do operations which can fail, like addition, so the function always succeeds.
- Creating a list populated by successive time derivatives of position seems like a useful thing to be able to do. But every element of the list will have different dimensions, which violates the naive idea of a list being items all of the same type.
- We would like to catch all dimensionality errors at compile time, but this may not be possible. The extreme example would be implementing the "units" program. Is that an anomaly?
- It is OK to add a vector to a coordinate (which has an origin) but not a coordinate to a coordinate. There seems to be a concept of units and "delta" units.
- It is OK to subtract coordinates to get a delta.
- Maybe multiplying coordinates is also illegal.
- Coordinates versus vectors, units versus delta units, seems like an orthogonal problem to "regular" units. Separate them in software so one can use either concept independently, for example, distinguishing dimensionless from delta dimensionless.
- Go further than just "delta" to distinguish first, second, etc., differences.
- An X component and a Y component of a vector might have the same units, say, length, but one wants to avoid adding them, as this is typically a typo. But sometimes, for rotations, one does add them.
- A Haskell Wiki page: http://www.haskell.org/haskellwiki/Physical_units. The units package seems promising.

Ken tag:blogger.com,1999:blog-6757805.post-1823126804177339731 Fri, 15 Aug 2014 18:39:00 +0000

Announcing Stackage Server
https://www.fpcomplete.com/blog/2014/08/announcing-stackage-server
A New ServiceA couple months ago I made a post explaining
Stackage server,
its motivations and use-cases, and that it would be available in the
coming months. It's now officially available in beta: Stackage server. As a quick recap: the essence of Stackage is that rather than
publishing at the granularity of packages, like Hackage, it
publishes at the granularity of package sets: Either everything
builds together, or nothing is published until it does. We call these
published things “snapshots.”Note: A snapshot is an exact image that can be reproduced at any time. With
the snapshot's digest hash, you can download and install the same
index and install packages based on that index all the
time. Subsequently generated snapshots have no effect on previous
ones.I've been using it for a couple months for every project I've worked
on, private and public. It's perfect for application developers and
teams who want to share a common always-building package set. Provided
you're using one of
the 500+ packages we're publishing in the snapshots,
there will always be a build plan for the package you want to use in
your project.And if your library is in Stackage, as explained in
the previous post,
you will get a heads up on Github if your updates or other people's
updates cause a build failure related to your library.

How it Works

Snapshots are built every couple of days. It takes about 16 hours to
complete a build. You can view the build progress at
jenkins.stackage.org.There are two types of snapshots published by FP Complete:Exclusive: this excludes packages not specified in
the Stackage configuration. This
means anything that you try to install from this snapshot will have
a build plan.Inclusive: this includes Hackage packages not known to build. If
you try to install a package not tracked by Stackage, it may or may
not build.You can use whichever suites your needs. If you want everything to
always build, the former is an attractive choice. If you need to use a
package not currently on Stackage, the latter choice makes sense.Try it Right NowChoose a snapshot. Each snapshot applies to a specific GHC
version. For example,
the latest (as of writing) GHC 7.8 build. You'll
see something like this:To use, copy the following to your ~/.cabal/config:remote-repo: stackage:http://www.stackage.org/stackage/604a3649795771f6dd8b80bfd4eeb748e1d97599Note: Remove or comment out any existing remote-repo line.Run the following to update your packages:$ cabal updateIf you already have installed some packages, it's better to clear out
your package set. See
this page in the FAQ
for how to do that.SandboxesHow does this interact with sandboxes? Good question. Here's the
rundown:hsenv: Yes, works fine. Edit the .hsenv/cabal/config file and off
you go.cabal sandbox: Not yet! There is
an open issue about this. But
I have tried cabal sandboxes inside hsenv, which worked.We've added this to the
FAQ on
the wiki. Contributions to this wiki page are welcome!FeedbackPersonally, I'm very satisfied with this service so far. I just use my
existing tools with a different remote-repo.Others familiar with Nix have asked how they
compare. They are very similar solutions in terms of versioning and
curation (although Stackage has full-time maintenance); the main
advantage to Stackage is that it just uses existing tools, so you
don't have to learn a new tool and way of working to have a better
user experience.We'd like feedback on a few points:Is the inclusive/exclusive separation useful?Is the process of using Stackage in an existing system (clearing the
package set and starting fresh) easy?Should we establish a convention for storing Stackage snapshot
hashes in projects' source-tracking repositories? And any other feedback you come up with while using it.

Stackage for businesses

As part of my last announcement for Stackage I mentioned there will
also be custom services for businesses looking to build their
development platform on Stackage.These commercial services include:Premium support - FP Complete will quickly respond and make
improvements or fixes to the public Stackage server as they need to
happen.Private snapshots with premium support - very helpful for commercial
users looking to add proprietary or custom libraries.Validated pre-compiled build images based on public or private
snapshots. These can be used on developer systems or automated
build systems.Packaged Jenkins server using the pre-compiled build images.All these additional commercial services are meant to be helpful
add-ons and we look forward to hearing more about what features you
think would be beneficial to you. For more information email us at:
sales@fpcomplete.comhttps://www.fpcomplete.com/blog/2014/08/announcing-stackage-serverThu, 14 Aug 2014 00:00:00 +0000Safe library - generalising functions
http://neilmitchell.blogspot.com/2014/08/safe-library-generalising-functions.html
Neil Mitchelltag:blogger.com,1999:blog-7094652.post-1488894573406021838Tue, 12 Aug 2014 21:08:00 +0000Similarities: Monoid, MonadPlus, Category
http://unknownparallel.wordpress.com/2014/08/11/similarities-monoid-monadplus-category/
Dan Burtonhttp://unknownparallel.wordpress.com/?p=163Mon, 11 Aug 2014 21:41:28 +0000Big Data Engineer / Data Scientist at Recruit IT (Full-time)
http://functionaljobs.com/jobs/8731-big-data-engineer-data-scientist-at-recruit-it
urn:uuid:f343ec12-9504-c16b-65ad-273b9f3e0891Mon, 11 Aug 2014 00:32:50 +0000Equational reasoning at scale
http://www.haskellforall.com/2014/07/equational-reasoning-at-scale.html
Haskell programmers care about the correctness of their software, and they specify correctness conditions in the form of equations that their code must satisfy. They can then verify these equations using equational reasoning to prove that the abstractions they build are sound. To an outsider this might seem like a futile, academic exercise: proving the correctness of small abstractions is difficult, so what hope do we have to prove larger abstractions correct? This post explains how to do precisely that: scale proofs to large and complex abstractions.

Purely functional programming uses composition to scale programs, meaning that:

- We build small components that we can verify correct in isolation
- We compose smaller components into larger components

If you saw "components" and thought "functions", think again! We can compose things that do not even remotely resemble functions, such as proofs! In fact, Haskell programmers prove large-scale properties exactly the same way we build large-scale programs:

- We build small proofs that we can verify correct in isolation
- We compose smaller proofs into larger proofs

The following sections illustrate in detail how this works in practice, using Monoids as the running example. We will prove the Monoid laws for simple types and work our way up to proving the Monoid laws for much more complex types. Along the way we'll learn how to keep the proof complexity flat as the types grow in size.

Monoids

Haskell's Prelude provides the following Monoid type class:

class Monoid m where
    mempty  :: m
    mappend :: m -> m -> m

-- An infix operator equivalent to `mappend`
(<>) :: Monoid m => m -> m -> m
x <> y = mappend x y

... and all Monoid instances must obey the following three laws:

mempty <> x = x                -- Left identity
x <> mempty = x                -- Right identity
(x <> y) <> z = x <> (y <> z)  -- Associativity

For example, Ints form a Monoid:

-- See "Appendix A" for some caveats
instance Monoid Int where
    mempty  = 0
    mappend = (+)

... 
and the Monoid laws for Ints are just the laws of addition:

0 + x = x
x + 0 = x
(x + y) + z = x + (y + z)

Now we can use (<>) and mempty instead of (+) and 0:

>>> 4 <> 2
6
>>> 5 <> mempty <> 5
10

This appears useless at first glance. We already have (+) and 0, so why are we using the Monoid operations?

Extending Monoids

Well, what if I want to combine things other than Ints, like pairs of Ints? I want to be able to write code like this:

>>> (1, 2) <> (3, 4)
(4, 6)

Well, that seems mildly interesting. Let's try to define a Monoid instance for pairs of Ints:

instance Monoid (Int, Int) where
    mempty = (0, 0)
    mappend (x1, y1) (x2, y2) = (x1 + x2, y1 + y2)

Now my wish is true and I can "add" binary tuples together using (<>) and mempty:

>>> (1, 2) <> (3, 4)
(4, 6)
>>> (1, 2) <> (3, mempty) <> (mempty, 4)
(4, 6)
>>> (1, 2) <> mempty <> (3, 4)
(4, 6)

However, I still haven't proven that this new Monoid instance obeys the Monoid laws. Fortunately, this is a very simple proof.

I'll begin with the first Monoid law, which requires that:

mempty <> x = x

We will begin from the left-hand side of the equation and try to arrive at the right-hand side by substituting equals-for-equals (a.k.a.
"equational reasoning"):-- Left-hand side of the equationmempty <> x-- x <> y = mappend x y= mappend mempty x-- `mempty = (0, 0)`= mappend (0, 0) x-- Define: x = (xL, xR), since `x` is a tuple= mappend (0, 0) (xL, xR)-- mappend (x1, y1) (x2, y2) = (x1 + x2, y1 + y2)= (0 + xL, 0 + xR)-- 0 + x = x= (xL, xR)-- x = (xL, xR)= xThe proof for the second Monoid law is symmetric-- Left-hand side of the equation= x <> mempty-- x <> y = mappend x y= mappend x mempty-- mempty = (0, 0)= mappend x (0, 0)-- Define: x = (xL, xR), since `x` is a tuple= mappend (xL, xR) (0, 0)-- mappend (x1, y1) (x2, y2) = (x1 + x2, y1 + y2)= (xL + 0, xR + 0)-- x + 0 = x= (xL, xR)-- x = (xL, xR)= xThe third Monoid law requires that (<>) is associative:(x <> y) <> z = x <> (y <> z)Again I'll begin from the left side of the equation:-- Left-hand side(x <> y) <> z-- x <> y = mappend x y= mappend (mappend x y) z-- x = (xL, xR)-- y = (yL, yR)-- z = (zL, zR)= mappend (mappend (xL, xR) (yL, yR)) (zL, zR)-- mappend (x1, y1) (x2 , y2) = (x1 + x2, y1 + y2)= mappend (xL + yL, xR + yR) (zL, zR)-- mappend (x1, y1) (x2 , y2) = (x1 + x2, y1 + y2)= mappend ((xL + yL) + zL, (xR + yR) + zR)-- (x + y) + z = x + (y + z)= mappend (xL + (yL + zL), xR + (yR + zR))-- mappend (x1, y1) (x2, y2) = (x1 + x2, y1 + y2)= mappend (xL, xR) (yL + zL, yR + zR)-- mappend (x1, y1) (x2, y2) = (x1 + x2, y1 + y2)= mappend (xL, xR) (mappend (yL, yR) (zL, zR))-- x = (xL, xR)-- y = (yL, yR)-- z = (zL, zR)= mappend x (mappend y z)-- x <> y = mappend x y= x <> (y <> z)That completes the proof of the three Monoid laws, but I'm not satisfied with these proofs.Generalizing proofsI don't like the above proofs because they are disposable, meaning that I cannot reuse them to prove other properties of interest. I'm a programmer, so I loathe busy work and unnecessary repetition, both for code and proofs. 
I would like to find a way to generalize the above proofs so that I can use them in more places.

We improve proof reuse in the same way that we improve code reuse. To see why, consider the following sort function:

sort :: [Int] -> [Int]

This sort function is disposable because it only works on Ints. For example, I cannot use the above function to sort a list of Doubles.

Fortunately, programming languages with generics let us generalize sort by parametrizing sort on the element type of the list:

sort :: Ord a => [a] -> [a]

That type says that we can call sort on any list of as, so long as the type a implements the Ord type class (a comparison interface). This works because sort doesn't really care whether or not the elements are Ints; sort only cares if they are comparable.

Similarly, we can make the proof more "generic". If we inspect the proof closely, we will notice that we don't really care whether or not the tuple contains Ints. The only Int-specific properties we use in our proof are:

0 + x = x
x + 0 = x
(x + y) + z = x + (y + z)

However, these properties hold true for all Monoids, not just Ints. Therefore, we can generalize our Monoid instance for tuples by parametrizing it on the type of each field of the tuple:

instance (Monoid a, Monoid b) => Monoid (a, b) where
    mempty = (mempty, mempty)
    mappend (x1, y1) (x2, y2) = (mappend x1 x2, mappend y1 y2)

The above Monoid instance says that we can combine tuples so long as we can combine their individual fields. Our original Monoid instance was just a special case of this instance where both the a and b types are Ints.

Note: The mempty and mappend on the left-hand side of each equation are for tuples. The memptys and mappends on the right-hand side of each equation are for the types a and b.
Haskell overloads type class methods like mempty and mappend to work on any type that implements the Monoid type class, and the compiler distinguishes them by their inferred types.

We can similarly generalize our original proofs, too, by just replacing the Int-specific parts with their more general Monoid counterparts.

Here is the generalized proof of the left identity law:

-- Left-hand side of the equation
mempty <> x
-- x <> y = mappend x y
= mappend mempty x
-- mempty = (mempty, mempty)
= mappend (mempty, mempty) x
-- Define: x = (xL, xR), since `x` is a tuple
= mappend (mempty, mempty) (xL, xR)
-- mappend (x1, y1) (x2, y2) = (mappend x1 x2, mappend y1 y2)
= (mappend mempty xL, mappend mempty xR)
-- Monoid law: mappend mempty x = x
= (xL, xR)
-- x = (xL, xR)
= x

... the right identity law:

-- Left-hand side of the equation
x <> mempty
-- x <> y = mappend x y
= mappend x mempty
-- mempty = (mempty, mempty)
= mappend x (mempty, mempty)
-- Define: x = (xL, xR), since `x` is a tuple
= mappend (xL, xR) (mempty, mempty)
-- mappend (x1, y1) (x2, y2) = (mappend x1 x2, mappend y1 y2)
= (mappend xL mempty, mappend xR mempty)
-- Monoid law: mappend x mempty = x
= (xL, xR)
-- x = (xL, xR)
= x

... 
and the associativity law:

-- Left-hand side
(x <> y) <> z
-- x <> y = mappend x y
= mappend (mappend x y) z
-- x = (xL, xR)
-- y = (yL, yR)
-- z = (zL, zR)
= mappend (mappend (xL, xR) (yL, yR)) (zL, zR)
-- mappend (x1, y1) (x2, y2) = (mappend x1 x2, mappend y1 y2)
= mappend (mappend xL yL, mappend xR yR) (zL, zR)
-- mappend (x1, y1) (x2, y2) = (mappend x1 x2, mappend y1 y2)
= (mappend (mappend xL yL) zL, mappend (mappend xR yR) zR)
-- Monoid law: mappend (mappend x y) z = mappend x (mappend y z)
= (mappend xL (mappend yL zL), mappend xR (mappend yR zR))
-- mappend (x1, y1) (x2, y2) = (mappend x1 x2, mappend y1 y2)
= mappend (xL, xR) (mappend yL zL, mappend yR zR)
-- mappend (x1, y1) (x2, y2) = (mappend x1 x2, mappend y1 y2)
= mappend (xL, xR) (mappend (yL, yR) (zL, zR))
-- x = (xL, xR)
-- y = (yL, yR)
-- z = (zL, zR)
= mappend x (mappend y z)
-- x <> y = mappend x y
= x <> (y <> z)

This more general Monoid instance lets us stick any Monoids inside the tuple fields and we can still combine the tuples. For example, lists form a Monoid:

-- Exercise: Prove the monoid laws for lists
instance Monoid [a] where
    mempty  = []
    mappend = (++)

... so we can stick lists inside the right field of each tuple and still combine them:

>>> (1, [2, 3]) <> (4, [5, 6])
(5, [2, 3, 5, 6])
>>> (1, [2, 3]) <> (4, mempty) <> (mempty, [5, 6])
(5, [2, 3, 5, 6])
>>> (1, [2, 3]) <> mempty <> (4, [5, 6])
(5, [2, 3, 5, 6])

Why, we can even stick yet another tuple inside the right field and still combine them:

>>> (1, (2, 3)) <> (4, (5, 6))
(5, (7, 9))

We can try even more exotic permutations and everything still "just works":

>>> ((1, [2, 3]), ([4, 5], 6)) <> ((7, [8, 9]), ([10, 11], 12))
((8, [2, 3, 8, 9]), ([4, 5, 10, 11], 18))

This is our first example of a "scalable proof". We began from three primitive building blocks:

- Int is a Monoid
- [a] is a Monoid
- (a, b) is a Monoid if a is a Monoid and b is a Monoid

... and we connected those three building blocks to assemble a variety of new Monoid instances.
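As Appendix A notes, GHC's base library ships no Monoid instance for bare Int, so to actually compile the tuple examples above you can wrap the numeric fields in Sum from Data.Monoid, whose mappend is also addition. A minimal runnable version of the last example:

```haskell
-- Runnable version of the nested-tuple example, with Sum wrapping the
-- Int fields (base provides no Monoid instance for bare Int; see Appendix A).
import Data.Monoid (Sum(..))

example :: ((Sum Int, [Int]), ([Int], Sum Int))
example = ((Sum 1, [2, 3]), ([4, 5], Sum 6))
          `mappend`
          ((7, [8, 9]), ([10, 11], 12))

main :: IO ()
main = print example
-- the sums combine to 8 and 18 and the lists concatenate,
-- matching the result shown above
```

Note that the second operand's numeric literals need no explicit Sum wrapper because Sum Int has a Num instance, so 7 and 12 default to Sum values from the type signature.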
No matter how many tuples we nest, the result is still a Monoid and obeys the Monoid laws. We don't need to re-prove the Monoid laws every time we assemble a new permutation of these building blocks.

However, these building blocks are still pretty limited. What other useful things can we combine to build new Monoids?

IO

We're so used to thinking of Monoids as data, so let's define a new Monoid instance for something entirely un-data-like:

-- See "Appendix A" for some caveats
instance Monoid b => Monoid (IO b) where
    mempty = return mempty
    mappend io1 io2 = do
        a1 <- io1
        a2 <- io2
        return (mappend a1 a2)

The above instance says: "If b is a Monoid, then an IO action that returns a b is also a Monoid". Let's test this using the getLine function from the Prelude:

-- Read one line of input from stdin
getLine :: IO String

String is a Monoid, since a String is just a list of characters, so we should be able to mappend multiple getLine statements together. Let's see what happens:

>>> getLine -- Reads one line of input
Hello
"Hello"
>>> getLine <> getLine
ABC
DEF
"ABCDEF"
>>> getLine <> getLine <> getLine
1
23
456
"123456"

Neat! When we combine multiple commands we combine their effects and their results.

Of course, we don't have to limit ourselves to reading strings.
We can use readLn from the Prelude to read in anything that implements the Read type class:

-- Parse a `Read`able value from one line of stdin
readLn :: Read a => IO a

All we have to do is tell the compiler which type a we intend to Read by providing a type signature:

>>> readLn :: IO (Int, Int)
(1, 2)
(1,2)
>>> readLn <> readLn :: IO (Int, Int)
(1,2)
(3,4)
(4,6)
>>> readLn <> readLn <> readLn :: IO (Int, Int)
(1,2)
(3,4)
(5,6)
(9,12)

This works because:

- Int is a Monoid
- Therefore, (Int, Int) is a Monoid
- Therefore, IO (Int, Int) is a Monoid

Or let's flip things around and nest IO actions inside of a tuple:

>>> let ios = (getLine, readLn) :: (IO String, IO (Int, Int))
>>> let (getLines, readLns) = ios <> ios <> ios
>>> getLines
1
23
456
"123456"
>>> readLns
(1,2)
(3,4)
(5,6)
(9,12)

We can very easily reason that the type (IO String, IO (Int, Int)) obeys the Monoid laws because:

- String is a Monoid
- If String is a Monoid, then IO String is also a Monoid
- Int is a Monoid
- If Int is a Monoid, then (Int, Int) is also a Monoid
- If (Int, Int) is a Monoid, then IO (Int, Int) is also a Monoid
- If IO String is a Monoid and IO (Int, Int) is a Monoid, then (IO String, IO (Int, Int)) is also a Monoid

However, we don't really have to reason about this at all. The compiler will automatically assemble the correct Monoid instance for us. The only thing we need to verify is that the primitive Monoid instances obey the Monoid laws, and then we can trust that any larger Monoid instance the compiler derives will also obey the Monoid laws.

The Unit Monoid

The Haskell Prelude also provides the putStrLn function, which echoes a String to standard output with a newline:

putStrLn :: String -> IO ()

Is putStrLn combinable? There's only one way to find out!

>>> putStrLn "Hello" <> putStrLn "World"
Hello
World

Interesting, but why does that work?
Well, let's look at the types of the commands we are combining:putStrLn "Hello" :: IO ()putStrLn "World" :: IO ()Well, we said that IO b is a Monoid if b is a Monoid, and b in this case is () (pronounced "unit"), which you can think of as an "empty tuple". Therefore, () must form a Monoid of some sort, and if we dig into Data.Monoid, we will discover the following Monoid instance:-- Exercise: Prove the monoid laws for `()`instance Monoid () where mempty = () mappend () () = ()This says that empty tuples form a trivial Monoid, since there's only one possible value (ignoring bottom) for an empty tuple: (). Therefore, we can derive that IO () is a Monoid because () is a Monoid.FunctionsAlright, so we can combine putStrLn "Hello" with putStrLn "World", but can we combine naked putStrLn functions?>>> (putStrLn <> putStrLn) "Hello"HelloHelloWoah, how does that work?We never wrote a Monoid instance for the type String -> IO (), yet somehow the compiler magically accepted the above code and produced a sensible result.This works because of the following Monoid instance for functions:instance Monoid b => Monoid (a -> b) where mempty = \_ -> mempty mappend f g = \a -> mappend (f a) (g a)This says: "If b is a Monoid, then any function that returns a b is also a Monoid".The compiler then deduced that:() is a MonoidIf () is a Monoid, then IO () is also a MonoidIf IO () is a Monoid then String -> IO () is also a MonoidThe compiler is a trusted friend, deducing Monoid instances we never knew existed.Monoid pluginsNow we have enough building blocks to assemble a non-trivial example. 
Let's build a key logger with a Monoid-based plugin system.The central scaffold of our program is a simple main loop that echoes characters from standard input to standard output:main = do hSetEcho stdin False forever $ do c <- getChar putChar cHowever, we would like to intercept key strokes for nefarious purposes, so we will slightly modify this program to install a handler at the beginning of the program that we will invoke on every incoming character:install :: IO (Char -> IO ())install = ???main = do hSetEcho stdin False handleChar <- install forever $ do c <- getChar handleChar c putChar cNotice that the type of install is exactly the correct type to be a Monoid:() is a MonoidTherefore, IO () is also a MonoidTherefore Char -> IO () is also a MonoidTherefore IO (Char -> IO ()) is also a MonoidTherefore, we can combine key logging plugins together using Monoid operations. Here is one such example:type Plugin = IO (Char -> IO ())logTo :: FilePath -> PluginlogTo filePath = do handle <- openFile filePath WriteMode hSetBuffering handle NoBuffering return (hPutChar handle)main = do hSetEcho stdin False handleChar <- logTo "file1.txt" <> logTo "file2.txt" forever $ do c <- getChar handleChar c putChar cNow, every key stroke will be recorded to both file1.txt and file2.txt. Let's confirm that this works as expected:$ ./loggerTest ABC 42 $ cat file1.txtTestABC42$ cat file2.txtTestABC42Try writing your own Plugins and mixing them in with (<>) to see what happens. "Appendix C" contains the complete code for this section so you can experiment with your own Plugins.ApplicativesNotice that I never actually proved the Monoid laws for the following two Monoid instances:instance Monoid b => Monoid (a -> b) where mempty = \_ -> mempty mappend f g = \a -> mappend (f a) (g a)instance Monoid a => Monoid (IO a) where mempty = return mempty mappend io1 io2 = do a1 <- io1 a2 <- io2 return (mappend a1 a2)The reason why is that they are both special cases of a more general pattern. 
We can detect the pattern if we rewrite both of them to use the pure and liftA2 functions from Control.Applicative:

import Control.Applicative (pure, liftA2)

instance Monoid b => Monoid (a -> b) where
    mempty  = pure mempty
    mappend = liftA2 mappend

instance Monoid b => Monoid (IO b) where
    mempty  = pure mempty
    mappend = liftA2 mappend

This works because both IO and functions implement the following Applicative interface:

class Functor f => Applicative f where
    pure  :: a -> f a
    (<*>) :: f (a -> b) -> f a -> f b

-- Lift a binary function over the functor `f`
liftA2 :: Applicative f => (a -> b -> c) -> f a -> f b -> f c
liftA2 f x y = (pure f <*> x) <*> y

... and all Applicative instances must obey several Applicative laws:

pure id <*> v = v
((pure (.) <*> u) <*> v) <*> w = u <*> (v <*> w)
pure f <*> pure x = pure (f x)
u <*> pure y = pure (\f -> f y) <*> u

These laws may seem a bit ad hoc, but this paper explains that you can reorganize the Applicative class to this equivalent type class:

class Functor f => Monoidal f where
    unit :: f ()
    (#)  :: f a -> f b -> f (a, b)

Then the corresponding laws become much more symmetric:

fmap snd (unit # x) = x                      -- Left identity

fmap fst (x # unit) = x                      -- Right identity

fmap assoc ((x # y) # z) = x # (y # z)       -- Associativity
  where
    assoc ((a, b), c) = (a, (b, c))

fmap (f *** g) (x # y) = fmap f x # fmap g y -- Naturality
  where
    (f *** g) (a, b) = (f a, g b)

I personally prefer the Monoidal formulation, but you go to war with the army you have, so we will use the Applicative type class for this post.

All Applicatives possess a very powerful property: they can all automatically lift Monoid operations using the following instance:

instance (Applicative f, Monoid b) => Monoid (f b) where
    mempty  = pure mempty
    mappend = liftA2 mappend

This says: "If f is an Applicative and b is a Monoid, then f b is also a Monoid."
In other words, we can automatically extend any existing Monoid with some new feature f and get back a new Monoid.

Note: The above instance is bad Haskell because it overlaps with other type class instances. In practice we have to duplicate the above code once for each Applicative. Also, for some Applicatives we may want a different Monoid instance.

We can prove that the above instance obeys the Monoid laws without knowing anything about f and b, other than the fact that f obeys the Applicative laws and b obeys the Monoid laws. These proofs are a little long, so I've included them in Appendix B.

Both IO and functions implement the Applicative type class:

instance Applicative IO where
    pure = return
    iof <*> iox = do
        f <- iof
        x <- iox
        return (f x)

instance Applicative ((->) a) where
    pure x = \_ -> x
    kf <*> kx = \a ->
        let f = kf a
            x = kx a
        in  f x

This means that we can kill two birds with one stone. Every time we prove the Applicative laws for some functor F:

instance Applicative F where ...

... 
we automatically prove that the following Monoid instance is correct for free:instance Monoid b => Monoid (F b) where mempty = pure mempty mappend = liftA2 mappendIn the interest of brevity, I will skip the proofs of the Applicative laws, but I may cover them in a subsequent post.The beauty of Applicative Functors is that every new Applicative instance we discover adds a new building block to our Monoid toolbox, and Haskell programmers have already discovered lots of Applicative Functors.Revisiting tuplesOne of the very first Monoid instances we wrote was:instance (Monoid a, Monoid b) => Monoid (a, b) where mempty = (mempty, mempty) mappend (x1, y1) (x2, y2) = (mappend x1 x2, mappend y1 y2)Check this out:instance (Monoid a, Monoid b) => Monoid (a, b) where mempty = pure mempty mappend = liftA2 mappendThis Monoid instance is yet another special case of the Applicative pattern we just covered!This works because of the following Applicative instance in Control.Applicative:instance Monoid a => Applicative ((,) a) where pure b = (mempty, b) (a1, f) <*> (a2, x) = (mappend a1 a2, f x)This instance obeys the Applicative laws (proof omitted), so our Monoid instance for tuples is automatically correct, too.Composing applicativesIn the very first section I wrote:Haskell programmers prove large-scale properties exactly the same way we build large-scale programs:We build small proofs that we can verify correct in isolationWe compose smaller proofs into larger proofsI don't like to use the word compose lightly. In the context of category theory, compose has a very rigorous meaning, indicating composition of morphisms in some category. 
This final section will show that we can actually compose Monoid proofs in a very rigorous sense of the word.We can define a category of Monoid proofs:Objects are types and their associated Monoid proofsMorphisms are Applicative FunctorsThe identity morphism is the Identity applicativeThe composition operation is composition of Applicative FunctorsThe category laws are isomorphisms instead of equalitiesSo in our Plugin example, we began on the proof that () was a Monoid and then composed three Applicative morphisms to prove that Plugin was a Monoid. I will use the following diagram to illustrate this:+-----------------------+| || Legend: * = Object || || v || | = Morphism || v || |+-----------------------+* `()` is a `Monoid`v| IOv* `IO ()` is a `Monoid`v| ((->) String)v* `String -> IO ()` is a `Monoid`v| IOv* `IO (String -> IO ())` (i.e. `Plugin`) is a `Monoid`Therefore, we were literally composing proofs together.ConclusionYou can equationally reason at scale by decomposing larger proofs into smaller reusable proofs, the same way we decompose programs into smaller and more reusable components. There is no limit to how many proofs you can compose together, and therefore there is no limit to how complex of a program you can tame using equational reasoning.This post only gave one example of composing proofs within Haskell. The more you learn the language, the more examples of composable proofs you will encounter. Another common example is automatically deriving Monad proofs by composing monad transformers.As you learn Haskell, you will discover that the hard part is not proving things. Rather, the challenge is learning how to decompose proofs into smaller proofs and you can cultivate this skill by studying category theory and abstract algebra. 
These mathematical disciplines teach you how to extract common and reusable proofs and patterns from what appears to be disposable and idiosyncratic code.

Appendix A - Missing Monoid instances

These Monoid instances from this post do not actually appear in the Haskell standard library:

instance Monoid b => Monoid (IO b)
instance Monoid Int

The first instance was recently proposed here on the Glasgow Haskell Users mailing list. However, in the short term you can work around it by writing your own Monoid instances by hand, just by inserting a sufficient number of pures and liftA2s.

For example, suppose we wanted to provide a Monoid instance for Plugin. We would just newtype Plugin and write:

newtype Plugin = Plugin { install :: IO (String -> IO ()) }

instance Monoid Plugin where
    mempty = Plugin (pure (pure (pure mempty)))
    mappend (Plugin p1) (Plugin p2) =
        Plugin (liftA2 (liftA2 (liftA2 mappend)) p1 p2)

This is what the compiler would have derived for us, written out by hand.

Alternatively, you could define an orphan Monoid instance for IO, but this is generally frowned upon.

There is no default Monoid instance for Int because there are actually two possible instances to choose from:

-- Alternative #1
instance Monoid Int where
    mempty  = 0
    mappend = (+)

-- Alternative #2
instance Monoid Int where
    mempty  = 1
    mappend = (*)

So instead, Data.Monoid sidesteps the issue by providing two newtypes to distinguish which instance we prefer:

newtype Sum a = Sum { getSum :: a }
instance Num a => Monoid (Sum a)

newtype Product a = Product { getProduct :: a }
instance Num a => Monoid (Product a)

An even better solution is to use a semiring, which allows two Monoid instances to coexist for the same type. You can think of Haskell's Num class as an approximation of the semiring class:

class Num a where
    fromInteger :: Integer -> a
    (+) :: a -> a -> a
    (*) :: a -> a -> a
    -- ... and other operations unrelated to semirings

Note that we can also lift the Num class over the Applicative class, exactly the same way we lifted the Monoid class. 
Here's the code:instance (Applicative f, Num a) => Num (f a) where fromInteger n = pure (fromInteger n) (+) = liftA2 (+) (*) = liftA2 (*) (-) = liftA2 (-) negate = fmap negate abs = fmap abs signum = fmap signumThis lifting guarantees that if a obeys the semiring laws then so will f a. Of course, you will have to specialize the above instance to every concrete Applicative because otherwise you will get overlapping instances.Appendix BThese are the proofs to establish that the following Monoid instance obeys the Monoid laws:instance (Applicative f, Monoid b) => Monoid (f b) where mempty = pure mempty mappend = liftA2 mappend... meaning that if f obeys the Applicative laws and b obeys the Monoid laws, then f b also obeys the Monoid laws.Proof of the left identity law:mempty <> x-- x <> y = mappend x y= mappend mempty x-- mappend = liftA2 mappend= liftA2 mappend mempty x-- mempty = pure mempty= liftA2 mappend (pure mempty) x-- liftA2 f x y = (pure f <*> x) <*> y= (pure mappend <*> pure mempty) <*> x-- Applicative law: pure f <*> pure x = pure (f x)= pure (mappend mempty) <*> x-- Eta conversion= pure (\a -> mappend mempty a) <*> x-- mappend mempty x = x= pure (\a -> a) <*> x-- id = \x -> x= pure id <*> x-- Applicative law: pure id <*> v = v= xProof of the right identity law:x <> mempty = x-- x <> y = mappend x y= mappend x mempty-- mappend = liftA2 mappend= liftA2 mappend x mempty-- mempty = pure mempty= liftA2 mappend x (pure mempty)-- liftA2 f x y = (pure f <*> x) <*> y= (pure mappend <*> x) <*> pure mempty-- Applicative law: u <*> pure y = pure (\f -> f y) <*> u= pure (\f -> f mempty) <*> (pure mappend <*> x)-- Applicative law: ((pure (.) <*> u) <*> v) <*> w = u <*> (v <*> w)= ((pure (.) <*> pure (\f -> f mempty)) <*> pure mappend) <*> x-- Applicative law: pure f <*> pure x = pure (f x)= (pure ((.) (\f -> f mempty)) <*> pure mappend) <*> x-- Applicative law : pure f <*> pure x = pure (f x)= pure ((.) (\f -> f mempty) mappend) <*> x-- `(.) 
f g` is just prefix notation for `f . g`= pure ((\f -> f mempty) . mappend) <*> x-- f . g = \x -> f (g x)= pure (\x -> (\f -> f mempty) (mappend x)) <*> x-- Apply the lambda= pure (\x -> mappend x mempty) <*> x-- Monoid law: mappend x mempty = x= pure (\x -> x) <*> x-- id = \x -> x= pure id <*> x-- Applicative law: pure id <*> v = v= xProof of the associativity law:(x <> y) <> z-- x <> y = mappend x y= mappend (mappend x y) z-- mappend = liftA2 mappend= liftA2 mappend (liftA2 mappend x y) z-- liftA2 f x y = (pure f <*> x) <*> y= (pure mappend <*> ((pure mappend <*> x) <*> y)) <*> z-- Applicative law: ((pure (.) <*> u) <*> v) <*> w = u <*> (v <*> w)= (((pure (.) <*> pure mappend) <*> (pure mappend <*> x)) <*> y) <*> z-- Applicative law: pure f <*> pure x = pure (f x)= ((pure f <*> (pure mappend <*> x)) <*> y) <*> z where f = (.) mappend-- Applicative law: ((pure (.) <*> u) <*> v) <*> w = u <*> (v <*> w)= ((((pure (.) <*> pure f) <*> pure mappend) <*> x) <*> y) <*> z where f = (.) mappend-- Applicative law: pure f <*> pure x = pure (f x)= (((pure f <*> pure mappend) <*> x) <*> y) <*> z where f = (.) ((.) mappend)-- Applicative law: pure f <*> pure x = pure (f x)= ((pure f <*> x) <*> y) <*> z where f = (.) ((.) mappend) mappend-- (.) f g = f . g= ((pure f <*> x) <*> y) <*> z where f = ((.) mappend) . mappend-- Eta conversion= ((pure f <*> x) <*> y) <*> z where f x = (((.) mappend) . mappend) x-- (f . g) x = f (g x)= ((pure f <*> x) <*> y) <*> z where f x = (.) mappend (mappend x)-- (.) f g = f . g= ((pure f <*> x) <*> y) <*> z where f x = mappend . (mappend x)-- Eta conversion= ((pure f <*> x) <*> y) <*> z where f x y = (mappend . (mappend x)) y-- (f . g) x = f (g x)= ((pure f <*> x) <*> y) <*> z where f x y = mappend (mappend x y)-- Eta conversion= ((pure f <*> x) <*> y) <*> z where f x y z = mappend (mappend x y) z-- Monoid law: mappend (mappend x y) z = mappend x (mappend y z)= ((pure f <*> x) <*> y) <*> z where f x y z = mappend x (mappend y z)-- (f . 
g) x = f (g x)= ((pure f <*> x) <*> y) <*> z where f x y z = (mappend x . mappend y) z-- Eta conversion= ((pure f <*> x) <*> y) <*> z where f x y = mappend x . mappend y-- (.) f g = f . g= ((pure f <*> x) <*> y) <*> z where f x y = (.) (mappend x) (mappend y)-- (f . g) x = f= ((pure f <*> x) <*> y) <*> z where f x y = (((.) . mappend) x) (mappend y)-- (f . g) x = f (g x)= ((pure f <*> x) <*> y) <*> z where f x y = ((((.) . mappend) x) . mappend) y-- Eta conversion= ((pure f <*> x) <*> y) <*> z where f x = (((.) . mappend) x) . mappend-- (.) f g = f . g= ((pure f <*> x) <*> y) <*> z where f x = (.) (((.) . mappend) x) mappend-- Lambda abstraction= ((pure f <*> x) <*> y) <*> z where f x = (\k -> k mappend) ((.) (((.) . mappend) x))-- (f . g) x = f (g x)= ((pure f <*> x) <*> y) <*> z where f x = (\k -> k mappend) (((.) . ((.) . mappend)) x)-- Eta conversion= ((pure f <*> x) <*> y) <*> z where f = (\k -> k mappend) . ((.) . ((.) . mappend))-- (.) f g = f . g= ((pure f <*> x) <*> y) <*> z where f = (.) (\k -> k mappend) ((.) . ((.) . mappend))-- Applicative law: pure f <*> pure x = pure (f x)= (((pure g <*> pure f) <*> x) <*> y) <*> z where g = (.) (\k -> k mappend) f = (.) . ((.) . mappend)-- Applicative law: pure f <*> pure x = pure (f x)= ((((pure (.) <*> pure (\k -> k mappend)) <*> pure f) <*> x) <*> y) <*> z where f = (.) . ((.) . mappend)-- Applicative law: ((pure (.) <*> u) <*> v) <*> w = u <*> (v <*> w)= ((pure (\k -> k mappend) <*> (pure f <*> x)) <*> y) <*> z where f = (.) . ((.) . mappend)-- u <*> pure y = pure (\k -> k y) <*> u= (((pure f <*> x) <*> pure mappend) <*> y) <*> z where f = (.) . ((.) . mappend)-- (.) f g = f . g= (((pure f <*> x) <*> pure mappend) <*> y) <*> z where f = (.) (.) ((.) . mappend)-- Applicative law: pure f <*> pure x = pure (f x)= ((((pure g <*> pure f) <*> x) <*> pure mappend) <*> y) <*> z where g = (.) (.) f = (.) . mappend-- Applicative law: pure f <*> pure x = pure (f x)= (((((pure (.) 
<*> pure (.)) <*> pure f) <*> x) <*> pure mappend) <*> y) <*> z where f = (.) . mappend-- Applicative law: ((pure (.) <*> u) <*> v) <*> w = u <*> (v <*> w)= (((pure (.) <*> (pure f <*> x)) <*> pure mappend) <*> y) <*> z where f = (.) . mappend-- Applicative law: ((pure (.) <*> u) <*> v) <*> w = u <*> (v <*> w)= ((pure f <*> x) <*> (pure mappend <*> y)) <*> z where f = (.) . mappend-- (.) f g = f . g= ((pure f <*> x) <*> (pure mappend <*> y)) <*> z where f = (.) (.) mappend-- Applicative law: pure f <*> pure x = pure (f x)= (((pure f <*> pure mappend) <*> x) <*> (pure mappend <*> y)) <*> z where f = (.) (.)-- Applicative law: pure f <*> pure x = pure (f x)= ((((pure (.) <*> pure (.)) <*> pure mappend) <*> x) <*> (pure mappend <*> y)) <*> z-- Applicative law: ((pure (.) <*> u) <*> v) <*> w = u <*> (v <*> w)= ((pure (.) <*> (pure mappend <*> x)) <*> (pure mappend <*> y)) <*> z-- Applicative law: ((pure (.) <*> u) <*> v) <*> w = u <*> (v <*> w)= (pure mappend <*> x) <*> ((pure mappend <*> y) <*> z)-- liftA2 f x y = (pure f <*> x) <*> y= liftA2 mappend x (liftA2 mappend y z)-- mappend = liftA2 mappend= mappend x (mappend y z)-- x <> y = mappend x y= x <> (y <> z)Appendix C: Monoid key loggingHere is the complete program for a key logger with a Monoid-based plugin system:import Control.Applicative (pure, liftA2)import Control.Monad (forever)import Data.Monoidimport System.IOinstance Monoid b => Monoid (IO b) where mempty = pure mempty mappend = liftA2 mappendtype Plugin = IO (Char -> IO ())logTo :: FilePath -> PluginlogTo filePath = do handle <- openFile filePath WriteMode hSetBuffering handle NoBuffering return (hPutChar handle)main = do hSetEcho stdin False handleChar <- logTo "file1.txt" <> logTo "file2.txt" forever $ do c <- getChar handleChar c putChar cGabriel Gonzaleztag:blogger.com,1999:blog-1777990983847811806.post-971828486155429031Sun, 20 Jul 2014 15:27:00 +0000managed-1.0.0: A monad for managed resources
http://www.haskellforall.com/2014/08/managed-100-monad-for-managed-resources.html
I'm splitting off the Managed type from the mvc library into its own stand-alone library. I've wanted to use this type outside of mvc for some time now, because it's an incredibly useful Applicative that I find myself reaching for in my own code whenever I need to acquire resources.

If you're not familiar with the Managed type, it's simple:

-- The real implementation uses smart constructors
newtype Managed a = Managed { with :: forall r . (a -> IO r) -> IO r }

-- It's a `Functor`/`Applicative`/`Monad`
instance Functor Managed where ...
instance Applicative Managed where ...
instance Monad Managed where ...

-- ... and also implements `MonadIO`
instance MonadIO Managed where ...

Here's an example of mixing the Managed monad with pipes to copy one file to another:

import Control.Monad.Managed
import System.IO
import Pipes
import qualified Pipes.Prelude as Pipes

main = runManaged $ do
    hIn  <- managed (withFile "in.txt" ReadMode)
    hOut <- managed (withFile "out.txt" WriteMode)
    liftIO $ runEffect $ Pipes.fromHandle hIn >-> Pipes.toHandle hOut

However, this is not much more concise than the equivalent callback-based version. The real value of the Managed type is its Applicative instance, which you can use to lift operations from values that it wraps.

Equational reasoning

My previous post on equational reasoning at scale describes how you can use Applicatives to automatically extend Monoids while preserving the Monoid operations. The Managed Applicative is no different and provides the following type class instance that automatically lifts Monoid operations:

instance Monoid a => Monoid (Managed a)

Therefore, you can treat the Managed Applicative as yet another useful building block in your Monoid tool box.

However, Applicatives can do more than extend Monoids; they can extend Categorys, too. Given any Category, if you extend it with an Applicative you can automatically derive a new Category.
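The elided instances are short enough to sketch here. This is just the obvious continuation-passing implementation and only an approximation of the real Control.Monad.Managed (which, as noted above, uses smart constructors):

```haskell
{-# LANGUAGE RankNTypes #-}

newtype Managed a = Managed { with :: forall r . (a -> IO r) -> IO r }

instance Functor Managed where
    fmap f (Managed k) = Managed (\ret -> k (ret . f))

instance Applicative Managed where
    pure a = Managed (\ret -> ret a)
    Managed kf <*> Managed ka =
        Managed (\ret -> kf (\f -> ka (\a -> ret (f a))))

instance Monad Managed where
    return = pure
    Managed k >>= f = Managed (\ret -> k (\a -> with (f a) ret))
```

Each layer just nests one callback inside another, which is why `with (managed (withFile ...))` scopes a handle's lifetime to the callback. With that picture in mind, back to extending a Category with an Applicative: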
Here's the general solution:

import Control.Applicative
import Control.Category
import Prelude hiding ((.), id)

newtype Extend f c a b = Extend (f (c a b))

instance (Applicative f, Category c) => Category (Extend f c) where
    id = Extend (pure id)
    Extend f . Extend g = Extend (liftA2 (.) f g)

So let's take advantage of this fact to extend one of the pipes categories with simple resource management. All we have to do is wrap the pull-based pipes category in a bona-fide Category instance:

import Pipes

newtype Pull m a b = Pull (Pipe a b m ())

instance Monad m => Category (Pull m) where
    id = Pull cat
    Pull p1 . Pull p2 = Pull (p1 <-< p2)

Now we can automatically define resource-managed pipes by Extending them with the Managed Applicative:

import Control.Monad.Managed
import qualified Pipes.Prelude as Pipes
import System.IO

fromFile :: FilePath -> Extend Managed (Pull IO) () String
fromFile filePath = Extend $ do
    handle <- managed (withFile filePath ReadMode)
    return (Pull (Pipes.fromHandle handle))

toFile :: FilePath -> Extend Managed (Pull IO) String X
toFile filePath = Extend $ do
    handle <- managed (withFile filePath WriteMode)
    return (Pull (Pipes.toHandle handle))

All we need is a way to run Extended pipes and then we're good to go:

runPipeline :: Extend Managed (Pull IO) () X -> IO ()
runPipeline (Extend mp) = runManaged $ do
    Pull p <- mp
    liftIO $ runEffect (return () >~ p)

If we compose and run these Extended pipes they just "do the right thing":

main :: IO ()
main = runPipeline (fromFile "in.txt" >>> toFile "out.txt")

Let's check it out:

$ cat in.txt
1
2
3
$ ./example
$ cat out.txt
1
2
3

We can even reuse existing pipes, too:

reuse :: Monad m => Pipe a b m () -> Extend Managed (Pull m) a b
reuse = Extend . pure . Pull

main = runPipeline $ fromFile "in.txt" >>> reuse (Pipes.take 2) >>> toFile "out.txt"

... and reuse does the right thing:

$ ./example
$ cat out.txt
1
2

What does it mean for reuse to "do the right thing"?
Well, we can specify the correctness conditions for reuse as the following functor laws:

reuse (p1 >-> p2) = reuse p1 >>> reuse p2

reuse cat = id

These two laws enforce that reuse is "well-behaved" in a rigorous sense.

This is just one example of how you can use the Managed type to extend an existing Category. As an exercise, try to take other categories and extend them this way and see what surprising new connectable components you can create.

Conclusion

Experts will recognize that Managed is a special case of Codensity or ContT. The reasons for defining a separate type are:

simpler inferred types,
additional type class instances, and
a more beginner-friendly name.

Managed is closely related in spirit to the Resource monad, which is now part of resourcet. The main differences between the two are:

Resource preserves the open and close operations
Managed works for arbitrary callbacks, even ones unrelated to resources

This is why I view them as complementary Monads.

Like all Applicatives, the Managed type is deceptively simple. This type does not do much in isolation, but it grows in power the more you compose it with other Applicatives to generate new Applicatives.

Gabriel Gonzaleztag:blogger.com,1999:blog-1777990983847811806.post-550543572771615741Sun, 10 Aug 2014 23:47:00 +0000Field Accessors Considered Harmful
http://feedproxy.google.com/~r/SoftwareSimply/~3/sZAuwIDkcvY/field-accessors-considered-harmful.html
It's pretty well known these days that Haskell's field accessors are rather cumbersome syntactically and not composable. The lens abstraction that has gotten much more popular recently (thanks in part to Edward Kmett's lens package) solves these problems. But I recently ran into a bigger problem with field accessors that I had not thought about before. Consider the following scenario. You have a package with code something like this:
data Config = Config { configFieldA :: [Text] }
So your Config data structure gives your users getters and setters for field A (and any other fields you might have). Your users are happy and life is good. Then one day you decide to add a new feature and that feature requires expanding and restructuring Config. Now you have this:
data MConfig = MConfig { mconfigFieldA :: [Text] }
data Config = Config { configMC :: MConfig
, configFieldX :: Text
, configFieldY :: Bool }
This is a nice solution: your users keep the functionality over the portion of the Config that they are used to, and they still get the new functionality. But now there's a problem. You're still breaking your users, because configFieldA changed names to mconfigFieldA and now refers to the MConfig structure instead of Config. If configFieldA were an ordinary function, you could preserve backwards compatibility by defining a new one:
configFieldA = mconfigFieldA . configMC
But alas, that won't work here, because configFieldA is not a normal function. It is a special field accessor generated by GHC, and you know that your users are using it as a setter. It seems to me that we are at an impasse: it is completely impossible to deliver the new feature without breaking backwards compatibility somehow, and no amount of deprecation cycles can ease the transition. The sad thing is that it seems like it should have been totally doable. Obviously there are some kinds of changes that understandably break backwards compatibility, but this doesn't seem like one of them, since it is an additive change. Yes, yes, I know... it's impossible to make this change without changing the type of the Config constructor, so at least that function will break. But we should be able to keep the breakage from spreading to the field accessor functions, and field accessors prevent us from doing that.
However, we could have avoided this problem. If we had a bit more foresight, we could have done this.
module Foo (mkConfig, configFieldA) where
data Config = Config { _configFieldA :: [Text] }
mkConfig :: [Text] -> Config
mkConfig = Config
configFieldA = lens _configFieldA (\c a -> c { _configFieldA = a })
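(If you don't have the lens package handy, the three combinators used here can be sketched in a dozen lines of plain Haskell. This is a minimal van Laarhoven encoding for illustration only, with String swapped in for Text to keep it self-contained; the real Control.Lens is far more general.)

```haskell
{-# LANGUAGE RankNTypes #-}

import Data.Functor.Const    (Const(..))
import Data.Functor.Identity (Identity(..))

-- A lens focuses on one field `a` inside a structure `s`.
type Lens' s a = forall f . Functor f => (a -> f a) -> s -> f s

lens :: (s -> a) -> (s -> a -> s) -> Lens' s a
lens get upd k s = upd s <$> k (get s)

view :: Lens' s a -> s -> a
view l = getConst . l Const

set :: Lens' s a -> a -> s -> s
set l a = runIdentity . l (\_ -> Identity a)

-- The Config from above, with the underscored field:
data Config = Config { _configFieldA :: [String] } deriving (Eq, Show)

configFieldA :: Lens' Config [String]
configFieldA = lens _configFieldA (\c a -> c { _configFieldA = a })
```

Because such lenses are ordinary functions, they compose with plain (.), which is exactly what makes the `configFieldA = configMC . mconfigFieldA` trick below possible.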
This would allow us to avoid breaking backwards compatibility by continuing to export appropriate versions of these symbols. It would look something like this.
module Foo
( MConfig
, mkMConfig
, mconfigFieldA
, Config
, mkConfig
, configFieldA
, configMC
-- ...
) where
data MConfig = MConfig { _mconfigFieldA :: [Text] }
data Config = Config { _configMC :: MConfig
, _configFieldX :: Text
, _configFieldY :: Bool }
mkMConfig = MConfig
mkConfig a = Config (mkMConfig a) "" False
mconfigFieldA = lens _mconfigFieldA (\c a -> c { _mconfigFieldA = a })
configMC = lens _configMC (\c mc -> c { _configMC = mc })
-- The rest of the field lenses here
configFieldA = configMC . mconfigFieldA
Note that the type signatures for mkConfig and configFieldA stay exactly the same. We weren't able to do this with field accessors because they are not composable. Lenses solve this problem for us because they are composable and we have complete control over their definition.
For quite some time now I have thought that I understood the advantage that lenses give you over field accessors. Discovering this added ability of lenses in helping us preserve backwards compatibility came as a pleasant surprise. I'll refrain from opining on how this should affect your development practices, but I think it makes the case for using lenses in your code a bit stronger than it was before.mightybytetag:blogger.com,1999:blog-8768401356830813531.post-2057597500022751501Sun, 10 Aug 2014 21:33:00 +0000Readers wanted!
http://byorgey.wordpress.com/2014/08/10/readers-wanted/
tl;dr: Read a draft of my thesis and send me your feedback by September 9!
Over the past year I’ve had several people say things along the lines of, “let me know if you want me to read through your thesis”. I never took them all that seriously (it’s easy to say you are willing to read a 200-page document…), but it never hurts to ask, right?
My thesis defense is scheduled for October 14, and I’m currently undertaking a massive writing/editing push to try to get as much of it wrapped up as I can before classes start on September 4. So, if there’s anyone out there actually interested in reading a draft and giving feedback, now is your chance!
The basic idea of my dissertation is to put combinatorial species and related variants (including a port of the theory to HoTT) in a common categorical framework, and then be able to use them for working with/talking about data types. If you’re brave enough to read it, you’ll find lots of category theory and type theory, and very little code—but I can promise lots of examples and pretty pictures. I’ve tried to make it somewhat self-contained, so it may be a good way to learn a bit of category theory or homotopy type theory, if you’ve been curious to learn more about those topics.
You can find the latest draft here (auto-updated every time I commit); more generally, you can find the git repo here. If you notice any typos or grammatical errors, feel free to open a pull request. For anything more substantial—thoughts on the organization, notes or questions about things you found confusing, suggestions for improvement, pointers to other references—please send me an email (first initial last name at gmail). And finally, please send me any feedback by September 9 at the latest (but the earlier the better). I need to have a final version to my committee by September 23.
Last but not least, if you’re interested to read it but don’t have the time or inclination to provide feedback on a draft, never fear—I’ll post an announcement when the final version is ready for your perusal!Brenthttp://byorgey.wordpress.com/?p=1395Sun, 10 Aug 2014 19:42:02 +0000What is currying (in Swift)?
http://justtesting.org/post/94325843216
http://justtesting.org/post/94325843216Sun, 10 Aug 2014 09:19:23 +0000Scribal traditions of "ancient" Hebrew scrolls
http://izbicki.me/blog/blog/ancient-hebrew-torah-scrolls.html
http://izbicki.me/blog/blog/ancient-hebrew-torah-scrolls.htmlSun, 10 Aug 2014 00:00:00 +0000What’s a module system good for anyway?
http://feedproxy.google.com/~r/ezyang/~3/JgcsQmUUj3k/
This summer, I've been working at Microsoft Research implementing Backpack, a module system for Haskell. Interestingly, Backpack is not really a single monolithic feature but, rather, an agglomeration of small, infrastructural changes which combine together in an interesting way. In this series of blog posts, I want to talk about what these individual features are, as well as how the whole is greater than the sum of the parts.
But first, there's an important question that I need to answer: what's a module system good for anyway? Why should you, an average Haskell programmer, care about such nebulous things as module systems and modularity? At the end of the day, you want your tools to solve specific problems you have, and it is sometimes difficult to understand what problem a module system like Backpack solves. As tomejaguar puts it: "Can someone explain clearly the precise problem that Backpack addresses? I've read the paper and I know the problem is 'modularity' but I fear I am lacking the imagination to really grasp what the issue is."
Look no further. In this blog post, I want to talk concretely about problems Haskellers have today, explain what the underlying causes of these problems are, and say why a module system could help you out.
The String, Text, ByteString problem
As experienced Haskellers are well aware, there is a multitude of string types in Haskell: String, ByteString (both lazy and strict), and Text (also both lazy and strict). To make matters worse, there is no one "correct" choice of string type: different types are appropriate in different cases. String is convenient and native to Haskell'98, but very slow; ByteString is fast, but is simply an array of bytes; Text is slower, but Unicode-aware.
In an ideal world, a programmer might choose the string representation most appropriate for their application, and write all their code accordingly. However, this is little solace for library writers, who don't know what string type their users are using! What's a library writer to do? There are only a few choices:
They "commit" to one particular string representation, leaving users to manually convert from one representation to another when there is a mismatch. Or, more likely, the library writer used the default because it was easy. Examples: base (uses Strings because it completely predates the other representations), diagrams (uses Strings because it doesn't really do heavy string manipulation).
They can provide separate functions for each variant, perhaps identically named but placed in separate modules. This pattern is frequently employed to support the strict and lazy variants of Text and ByteString. Examples: aeson (providing decode/decodeStrict for lazy/strict ByteString), attoparsec (providing Data.Attoparsec.ByteString/Data.Attoparsec.ByteString.Lazy), lens (providing Data.ByteString.Lazy.Lens/Data.ByteString.Strict.Lens).
They can use type classes to overload functions to work with multiple representations. The particular type class used varies hugely: there is ListLike, which is used by a handful of packages, but a large portion of packages simply roll their own. Examples: SqlValue in HDBC, an internal StringLike in tagsoup, and yet another internal StringLike in web-encodings.
The last two methods have different trade-offs. Defining separate functions as in (2) is a straightforward and easy-to-understand approach, but you are still saying no to modularity: the ability to support multiple string representations. Despite providing implementations for each representation, the user still has to commit to a particular representation when they do an import. If they want to change their string representation, they have to go through all of their modules and rename their imports; and if they want to support multiple representations, they'll still have to write separate modules for each of them.
Using type classes (3) to regain modularity may seem like an attractive approach. But this approach has both practical and theoretical problems. First and foremost, how do you choose which methods go into the type class? Ideally, you'd pick a minimal set, from which all other operations could be derived. However, many operations are most efficient when directly implemented, which leads to a bloated type class, and a rough time for other people who have their own string types and need to write their own instances. Second, type classes make your type signatures uglier (String -> String becomes StringLike s => s -> s) and can make type inference more difficult (for example, by introducing ambiguity). Finally, a type class like StringLike has a very different character from a type class like Monad, which has a minimal set of operations and laws governing their operation. It is difficult (or impossible) to characterize what the laws of an interface like this should be. All in all, it's much less pleasant to program against type classes than concrete implementations.
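To make the trade-off concrete, here is a deliberately toy StringLike class (hypothetical: not the API of ListLike or any of the packages named above). Even with three primitives, the pressure to add efficiency-motivated methods is already visible:

```haskell
{-# LANGUAGE FlexibleInstances #-}

-- A hypothetical minimal interface over string representations.
class StringLike s where
    empty  :: s
    append :: s -> s -> s
    uncons :: s -> Maybe (Char, s)

    -- Derivable from uncons, but every real representation would want
    -- to override it with a faster native version, so the class keeps
    -- growing...
    slength :: s -> Int
    slength = go 0
      where
        go n s = case uncons s of
            Nothing      -> n
            Just (_, s') -> go (n + 1) s'

instance StringLike [Char] where
    empty  = []
    append = (++)
    uncons []       = Nothing
    uncons (c : cs) = Just (c, cs)
```

Instances for strict and lazy Text and ByteString would follow the same shape, each duplicating the fast paths.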
Wouldn't it be nice if I could import String, giving me the type String and operations on it, but then later decide which concrete implementation I want to instantiate it with? This is something a module system can do for you! This Reddit thread describes a number of other situations where an ML-style module would come in handy.
(PS: Why can't you just write a pile of preprocessor macros to swap in the implementation you want? The answer is, "Yes, you can; but how are you going to type check the thing, without trying it against every single implementation?")
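Schematically (and this is only pseudocode; Backpack's concrete syntax was still in flux at the time of writing), the wish is to type check library code against an interface and plug in the representation later:

```
-- A signature: types and operations, but no implementation.
signature Str where
  data Str
  empty  :: Str
  append :: Str -> Str -> Str

-- Library code type checks against the signature alone ...
module MyLib where
  import Str
  twice :: Str -> Str
  twice s = append s s

-- ... and a downstream package later fills the hole by providing a
-- module matching the signature, backed by String, Text, or ByteString.
```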
Destructive package reinstalls
Have you ever gotten this error message when attempting to install a
new package?
$ cabal install hakyll
cabal: The following packages are likely to be broken by the reinstalls:
pandoc-1.9.4.5
Graphalyze-0.14.0.0
Use --force-reinstalls if you want to install anyway.
Somehow, Cabal has concluded that the only way to install hakyll is to reinstall some dependency. Here's one way a situation like this could come about:
pandoc and Graphalyze are compiled against the latest unordered-containers-0.2.5.0, which itself was compiled against the latest hashable-1.2.2.0.
hakyll also has a dependency on unordered-containers and hashable, but it has an upper bound restriction on hashable which excludes the latest hashable version. Cabal decides we need to install an old version of hashable, say hashable-0.1.4.5.
If hashable-0.1.4.5 is installed, we also need to build unordered-containers against this older version for Hakyll to see consistent types. However, the resulting version is the same as the preexisting version: thus, reinstall!
The root cause of this error is an invariant Cabal currently enforces on a package database: there can be only one instance of a package for any given package name and version. In particular, this means that it is not possible to install a package multiple times, compiled against different dependencies. This is a bit troublesome, because sometimes you really do want the same package installed multiple times with different dependencies: as seen above, it may be the only way to fulfill the version bounds of all packages involved.
Currently, the only way to work around this problem is to use a Cabal sandbox (or blow away your package database and reinstall everything, which is basically the same thing).
You might be wondering, however, how could a module system possibly help with this? It doesn't... at least, not directly. Rather, nondestructive reinstalls of a package are a critical feature for implementing a module system like Backpack (a package may be installed multiple times with different concrete implementations of modules). Implementing Backpack necessitates fixing this problem, moving Haskell's package management a lot closer to that of Nix or NPM.
Version bounds and the neglected PVP
While we're on the subject of cabal-install giving errors, have you ever gotten this error attempting to install a new package?
$ cabal install hledger-0.18
Resolving dependencies...
cabal: Could not resolve dependencies:
# pile of output
There are a number of possible reasons why this could occur, but usually it's because some of the packages involved have over-constrained version bounds (especially upper bounds), resulting in an unsatisfiable set of constraints. To add insult to injury, often these bounds have no grounding in reality (the package author simply guessed the range) and removing them would result in a working compilation. This situation is so common that Cabal has a flag --allow-newer which lets you override the upper bounds of packages. The annoyance of managing bounds has led to the development of tools like cabal-bounds, which try to make it less tedious to keep upper bounds up-to-date.
But as much as we like to rag on them, version bounds have a very important function: they prevent you from attempting to compile packages against dependencies which don't work at all! An under-constrained set of version bounds can easily result in compiling against a version of a dependency which doesn't type check.
How can a module system help? At the end of the day, version numbers are trying to capture something about the API exported by a package, described by the package versioning policy. But the current state-of-the-art requires a user to manually translate changes to the API into version numbers: an error prone process, even when assisted by various tools. A module system, on the other hand, turns the API into a first-class entity understood by the compiler itself: a module signature. Wouldn't it be great if packages depended upon signatures rather than versions: then you would never have to worry about version numbers being inaccurate with respect to type checking. (Of course, versions would still be useful for recording changes to semantics not seen in the types, but their role here would be secondary in importance.) Some full disclosure is warranted here: I am not going to have this implemented by the end of my internship, but I'm hoping to make some good infrastructural contributions toward it.
Conclusion
If you skimmed the introduction to the Backpack paper, you might have come away with the impression that Backpack is something about random number generators, recursive linking and applicative semantics. While these are all true "facts" about Backpack, they understate the impact a good module system can have on the day-to-day problems of a working programmer. In this post, I hope I've elucidated some of these problems, even if I haven't convinced you that a module system like Backpack actually goes about solving these problems: that's for the next series of posts. Stay tuned!Edward Z. Yanghttp://blog.ezyang.com/?p=9016Sat, 09 Aug 2014 23:21:19 +00007 Startups - part 5 - the XMPP backend
http://lpuppet.banquise.net/blog/2014/08/09/7-startups-part-5-the-xmpp-backend/
Note: I ran out of time weeks ago. I could never finish this series as I envisioned, and I don't see much free time on the horizon. Instead of letting this linger forever, here is a truncated conclusion. The previous episodes were:
Part 1 : probably the best episode, about the basic game types.
Part 2 : definition of the game rules in an unspecified monad.
Part 3 : writing an interpreter for the rules.
Part 4 : stumbling and failure in writing a clean backend system.
In the previous episode I added a ton of STM code and helper functions in several 15-minute sessions. The result was not pretty, and left me dissatisfied.
For this episode, I decided to relax my constraints. For now, I am only going to support the following:
The backend list will not be dynamic: a bunch of backends are going to be registered once, and it will not be possible to remove an existing backend or add a new one once this is done.
The backends will be text-line based (XMPP and IRC are good protocols for this). This will unfortunately make it harder to write a nice web interface for the
game too, but given how much time I can devote to this side-project this doesn’t matter much …
The MVC paradigm
A great man once said that "if you have category theory, everything looks like a pipe. Or a monad. Or a traversal. Or perhaps it's a cosomething". With the previously mentioned restrictions, I was able to shoehorn my problem into the shape of the mvc package, which I had wanted to try for a while. It might be a bit different from what people usually expect when talking about the model - view - controller pattern, and is basically:
Some kind of pollable input (the controllers),
a pure stream-based computation (the model), sporting an internal state and transforming the data coming from the inputs into something that is passed to …
… IO functions that run the actual effects (the views).
Each of these components can be reasoned about separately, and combined together in various ways.
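In spirit (not using the mvc package itself), the split can be sketched with nothing but base; all the names and the counter state here are invented for illustration:

```haskell
-- A dependency-free sketch of the controller/model/view split described
-- above. The model is a pure, stateful stream function sandwiched
-- between effectful ends; every name here is made up for illustration.
import Data.List (mapAccumL)

-- The "model": pure, keeps an internal state (a message counter), and
-- turns controller inputs into view outputs.
model :: Int -> String -> (Int, String)
model n input = (n + 1, show (n + 1) ++ "> " ++ input)

main :: IO ()
main = do
  let inputs = ["join alice", "start game"]  -- stands in for the controllers
      (_, outputs) = mapAccumL model 0 inputs
  mapM_ putStrLn outputs                     -- stands in for the views
```

Because the model is a pure function of its state and input, it can be tested without any of the IO machinery at either end.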
There is however one obvious problem with this pattern, due to the way the game is modeled. Currently, the game is supposed to be able to receive data from the
players, and to send data to them. It would need to live entirely in the model for this to work as expected, but the way it is currently written doesn’t make
that straightforward.
It might be possible to make the game explicitly CPS-style, so that the pure part would run the game until communication with the players is required, which would
translate nicely into an output that could be consumed by a view.
This would however require some refactoring and a lot of thinking, which I currently don’t have time for, so here is instead how the information flows :
Here PInput and GInput are the types of the inputs (respectively from players and games). The blue boxes are two models that will be combined together. The
pink ones are the type of outputs emitted from the models. The backends serve as drivers for player communication. The games run in their respective
threads, and the game manager spawns and manages the game threads.
Comparison with the “bunch of STM functions” model
I originally started with a global TVar containing the state information of each player (for example whether they are part of a game, still joining, due to answer
a game query, etc.). There were a bunch of “helper functions” that would manipulate the global state in a way that would ensure its consistency. The catch is
that the backends were responsible for calling these helper functions at appropriate times and for not messing with the global state.
The MVC pattern forces the structure of your program. In my particular case, it means a trick is necessary to integrate it with the current game logic (which will
be explained later). The “boSf” (bunch of STM functions) pattern is more flexible, but carries a higher cognitive cost.
With the “boSf” pattern, responses to player inputs could be:
Messages to players, which fit well with the model, as they happened over STM channels, so the whole processing / state manipulation / player output could
be of type Input -> STM ().
Spawning a game. This time we need forkIO and state manipulation. This means a type like c :: Input -> STM (IO ()), with a call like join (atomically (c input)).
Now there are helper functions that return an IO action, and some that don’t. When some functionality is added, some functions need to start returning IO
actions. This is ugly and makes it harder to extend.
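A minimal sketch of that last shape, Input -> STM (IO ()); only the type and the join (atomically …) call come from the text above, while PlayerInput and handleInput are invented for illustration:

```haskell
{-# LANGUAGE LambdaCase #-}
-- Sketch of the "bunch of STM functions" pattern: the handler does its
-- state manipulation in STM and returns the IO effect to run afterwards.
-- PlayerInput and handleInput are invented for illustration.
import Control.Concurrent.STM
import Control.Monad (join)

data PlayerInput = Join String | Say String String

handleInput :: TVar [String] -> PlayerInput -> STM (IO ())
handleInput players = \case
  Join name -> do
    modifyTVar' players (name :)           -- state manipulation, atomic
    return (putStrLn (name ++ " joined"))  -- effect, to be run later
  Say name msg ->
    return (putStrLn (name ++ ": " ++ msg))

main :: IO ()
main = do
  players <- newTVarIO []
  -- join (atomically (c input)): commit the transaction, then run the
  -- IO action it returned.
  join (atomically (handleInput players (Join "alice")))
  join (atomically (handleInput players (Say "alice" "hi")))
  readTVarIO players >>= print
```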
Conclusion of the series
Unfortunately I ran out of time for working on this series a few weeks ago. The code is out, the game works and it’s fun.
My original motivation for writing this post was as an exposition of basic type-directed design for my non-Haskeller
friends, but I think it’s not approachable to non-Haskellers, so I never showed it to them.
The main takeaways are:
Game rules
The game rules were first written against an unspecified monad that exposed several functions required for user
interaction. That’s the reason I started by defining a typeclass: that way I wouldn’t have to worry about implementing
the “hard” part and could concentrate on writing the rules instead. For me, this was the fun part, and it was also the
quickest.
As for the implementation of the aforementioned functions, I then used the operational package, which let me
write interpreters for my game rules. One of them is pure, and used in tests. There are two other interpreters, one
of them for the console version of the game, the other for the multi-backend system.
Backend system
The backends are, I think, easy to expand. Building the core of the multi-game logic with the mvc package was very
straightforward. It would be easy to add an IRC backend next to the XMPP one, if there weren’t so many IRC packages to
choose from on hackage …
A web backend doesn’t seem terribly complicated to write, until you take into account some common web
application constraints, such as having several redundant servers. In order to do so, the game interpreter would have to be
turned into an explicit continuation-like system (with the twist that it only returns on blocking calls) and
the game state serialized in a shared storage system.
Bugs
My main motivation was to show that it was possible to eliminate entire classes of bugs by encoding the invariants in the
type system. I would say this was a success.
The area where I expected to have a ton of problems was the card list. It’s a tedious manual process, but some tests weeded out most of the errors (it helps that there are some
properties that can be verified on the deck). The other one was the XMPP message processing, in all its XML horror. It looks terrible.
The area where I wanted this process to work well was indeed a success. I wrote the game rules in one go, without any feedback. Once they were completed, I wrote the backends and tested
the game. It turned out there were very few bugs, especially considering the fact that the game is a moderately complicated board game:
One of the special capabilities was replaced with another, and handled at the wrong moment in the game. This was
quickly debugged.
I used traverse instead of both for tuples. I expected them to have the same result, and it “typechecked” because my tuple was of type (a,a), but the Applicative instance
for tuples made it obvious this wasn’t the case. That took a bit longer to find out, as it impacted half of the military victory points, which are distributed only three times
per game.
I didn’t listen to my own advice, and didn’t take the time to properly encode that some functions only worked with nonempty lists as arguments. This was also quickly found out,
using quickcheck.
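The traverse-versus-both slip is easy to reproduce with nothing but base: the Traversable instance for pairs only visits the second component, so on a tuple of type (a, a) it typechecks but silently skips the first element, unlike lens’s both:

```haskell
-- traverse on a pair only touches the second component, so it is not
-- interchangeable with lens's `both`, even when the tuple is (a, a).
import Data.Traversable (traverse)

main :: IO ()
main = do
  -- traverse visits only the second component of the pair:
  print (traverse (\x -> [x + 1]) (1 :: Int, 2 :: Int))
  -- what `both` would do, written out by hand:
  print [ (x', y') | let (x, y) = (1 :: Int, 2 :: Int)
                   , x' <- [x + 1], y' <- [y + 1] ]
```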
The game seems to run fine now. There are minor rule bugs identified (the interaction between card-recycling abilities and the last turn, for example), but I don’t have time to fix
them.
There might be some interest in the types of the Hub, as they also encode a lot of invariants.
Also off-topic, but I really like using the lens vocabulary to encode the relationships between types these days. A trivial example can be found here.
The game
That might be the most important part. I played a score of games, and it was a lot of fun. The game is playable, and
just requires a valid account on an XMPP server. Have fun !http://lpuppet.banquise.net/blog/2014/08/09/7-startups-part-5-the-xmpp-backendSat, 09 Aug 2014 08:48:00 +0000Deprecating yesod-platform
http://www.yesodweb.com/blog/2014/08/deprecating-yesod-platform
I want to deprecate the yesod-platform, and instead switch to Stackage server
as the recommended installation method for Yesod for end users. To explain why,
let me explain the purpose of yesod-platform, the problems I've encountered
maintaining it, and how Stackage
Server can
fit in. I'll also explain some unfortunate complications with Stackage Server.Why yesod-platform existsImagine a simpler Yesod installation path:cabal install yesod-bin, which provides the yesod executable.yesod init to create a scaffolding.cabal install inside that directory, which downloads and installs all of the necessary dependencies.This in fact used to be the installation procedure, more or less. However, this
led to a number of user problems:Back in the earlier days of cabal-install, it was difficult for the dependency solver to find a build plan in this situation. Fortunately, cabal-install has improved drastically since then.This does still happen occasionally, especially with packages with restrictive upper bounds. Using --max-backjumps=-1 usually fixes that.It sometimes happens that an upstream package from Yesod breaks Yesod, either by changing an API accidentally, or by introducing a runtime bug.This is where yesod-platform comes into play. Instead of leaving it up to
cabal-install to track down a consistent build plan, it specifies exact
versions of all dependencies to ensure a consistent build plan.Conflicts with GHC deps/Haskell PlatformYesod depends on aeson. So logically, yesod-platform should have a strict
dependency on aeson. We try to always use the newest versions of dependencies,
so today, that would be aeson == 0.8.0.0. In turn, this demands text >=
1.1.1.0. However, if you look at the Haskell Platform
changelog, there's no version
of the platform that provides a new enough version of text to support that
constraint.yesod-platform could instead specify an older version of aeson, but that would
unnecessarily constrain users who aren't sticking to the Haskell Platform
versions (which, in my experience, is the majority of users). This would also
cause more dependency headaches down the road, as you'd now also need to force
older versions of packages like criterion.To avoid this conflict, yesod-platform has taken the approach of simply
omitting constraints on any packages in the platform, as well as any packages
with strict bounds on those packages. And if you look at yesod-platform today,
you'll see that there is no mention of aeson or text.A similar issue pops up for packages that are a dependency of the GHC package
(a.k.a., GHC-the-library). The primary problem there is the binary package. In
this case, the allowed version of the package depends on which version of GHC
is being used, not the presence or absence of the Haskell Platform.This results in two problems:It's very difficult to maintain this list of excluded packages correctly. I
get a large number of bug reports about these kinds of build plan problems.We're giving up quite a bit of the guaranteed buildability that
yesod-platform was supposed to provide. If aeson 0.7.0.4 (as an example)
doesn't work with yesod-form, yesod-platform won't be able to prevent such a
build plan from happening.There's also an issue with the inability to specify dependencies on
executable-only packages, like alex, happy, and yesod-bin.Stackage ServerStackage Server solves exactly the same problem. It provides a consistent set
of packages that can be installed together. Unlike yesod-platform, it can be
distinguished based on GHC version. And it's far simpler to maintain. Firstly,
I'm already maintaining Stackage Server full time. And secondly, all of the
testing work is handled by a very automated process.So here's what I'm proposing: I'll deprecate the yesod-platform package, and change the Yesod quickstart guide to have the following instructions:Choose an appropriate Stackage snapshot from stackage.orgModify your cabal config file appropriatelycabal install yesod-bin alex happyUse yesod init to set up a scaffoldingcabal install --enable-tests in the new directoryFor users wishing to live on more of a bleeding edge, the option is always
available to simply not use Stackage. Such a usage will give more control over
package versions, but will also lack some stability.The problemsThere are a few issues that need to be ironed out.cabal sandbox does not allow changing the remote-repo. Fortunately, Luite seems to have this solved, so hopefully this won't be a problem for long. Until then, you can either use a single Stackage snapshot for all your development, or use a separate sandboxing technology like hsenv.Haskell Platform conflicts still exist. The problem I mentioned above with aeson and text is a real problem. The theoretically correct solution is to create a Stackage snapshot for GHC 7.8 + Haskell Platform. And if there's demand for that, I'll bite the bullet and do it, but it's not an easy bullet to bite. But frankly, I'm not hearing a lot of users saying that they want to peg Haskell Platform versions specifically.In fact, the only users who really seem to want to stick to Haskell Platform versions are Windows users, and the main reason for this is the complexity in installing the network package on Windows. I think there are three possible solutions to this issue, without forcing Windows users onto old versions of packages:Modify the network package to be easier to install on Windows. I really hope this has some progress. If this is too unstable to be included in the official Hackage release, we could instead have an experimental Stackage snapshot for Windows with that modification applied.Tell Windows users to simply bypass Stackage and yesod-platform, with the possibility of more build problems on that platform.We could similarly recommend Windows users develop in a Linux virtual machine/Docker image.Provide a Windows distribution of GHC + cabal-install + network. With the newly split network/network-uri, this is a serious possibility.Despite these issues, I think Stackage Server is a definite improvement on
yesod-platform on Linux and Mac, and will likely still improve the situation on
Windows, once we figure out the Haskell Platform problems.I'm not making any immediate changes. I'd very much like to hear people using
Yesod on various operating systems to see how these changes will affect them.http://www.yesodweb.com/blog/2014/08/deprecating-yesod-platformSat, 09 Aug 2014 05:10:00 +0000Maniac week
http://byorgey.wordpress.com/2014/08/04/maniac-week/
Inspired by Bethany Soule (and indirectly by Nick Winter, and also by the fact that my dissertation defense and the start of the semester are looming), I am planning a “maniac week” while Joyia and Noah will be at the beach with my family (I will join them just for the weekend). The idea is to eliminate as many distractions as possible and to do a ton of focused work. Publicly committing (like this) to a time frame, ground rules, and to putting up a time-lapse video of it afterwards are what actually make it work—if I don’t succeed I’ll have to admit it here on my blog; if I waste time on Facebook the whole internet will see it in the video; etc. (There’s actually no danger of wasting time on Facebook in particular since I have it blocked, but you get the idea.)
Here are the rules:
I will start at 6pm (or thereabouts) on Friday, August 8.
I will continue until 10pm on Wednesday, August 13, with the exception of the morning of Sunday, August 10 (until 2pm).
I will get at least 7.5 hours of sleep each night.
I will not eat cereal for any meal other than breakfast.
I will reserve 3 hours per day for things like showering, eating, and just plain resting. Such things will be tracked by the TagTime tag “notwork”.
I will spend the remaining 13.5 hours per day working productively. Things that will count as productive work:
Working on my dissertation
Course prep for CS 354 (lecture and assignment planning, etc.) and CS 134 (reading through the textbook); making anki decks with names and faces for both courses
Updating my academic website (finish converting to Hakyll 4; add potential research and independent study topics for undergraduates)
Processing FogBugz tickets
I may work on other research or coding projects (e.g. diagrams) each day, but only after spending at least 7 hours on my dissertation.
I will not go on IRC at all during the week. I will disable email notifications on my phone (but keep the phone around for TagTime), and close and block gmail in my browser. I will also disable the program I use to check my UPenn email account.
For FogBugz tickets which require responding to emails, I will simply write the email in a text file and send it later.
I may read incoming email and write short replies on my phone, but will keep it to a bare minimum.
I will not read any RSS feeds during the week. I will block feedly in my browser.
On August 18 I will post a time-lapse video of August 8-13. I’ll probably also write a post-mortem blog post, if I feel like I have anything interesting to say.
I reserve the right to tweak these rules (by editing this post) up until August 8 at 6pm. After that point it’s shut up and work time, and I cannot change the rules any more.
And no, I’m not crazy. You (yes, you) could do this too.Brenthttp://byorgey.wordpress.com/?p=1370Mon, 04 Aug 2014 20:46:57 +0000criterion 1.0
http://www.serpentine.com/blog/2014/08/08/criterion-1-0/
Almost five years after I initially released criterion, I'm delighted to announce a major release with a large number of appealing new features.
As always, you can install the latest goodness using cabal install criterion, or fetch the source from github.
Please let me know if you find criterion useful!
New documentation
I built both a home page and a thorough tutorial for criterion. I've also extended the inline documentation and added a number of new examples.
All of the documentation lives in the github repo, so if you'd like to see something improved, please send a bug report or pull request.
New execution engine
Criterion's model of execution has evolved, becoming vastly more reliable and accurate. It can now measure events that take just a few hundred picoseconds.
benchmarking return ()
time 512.9 ps (512.8 ps .. 513.1 ps)
While almost all of the core types have changed, criterion should remain API-compatible with the vast majority of your benchmarking code.
New metrics
In addition to wall-clock time, criterion can now measure and regress on the following metrics:
CPU time
CPU cycles
bytes allocated
number of garbage collections
number of bytes copied during GC
wall-clock time spent in mutator threads
CPU time spent running mutator threads
wall-clock time spent doing GC
CPU time spent doing GC
Linear regression
Criterion now supports linear regression of a number of metrics.
Here's a regression conducted using --regress cycles:iters:
cycles: 1.000 R² (1.000 R² .. 1.000 R²)
iters 47.718 (47.657 .. 47.805)
The first line of the output is the R² goodness-of-fit measure for this regression, and the second is the number of CPU cycles (measured using the rdtsc instruction) to execute the operation in question (integer division).
This next regression uses --regress allocated:iters to measure the number of bytes allocated while constructing an IntMap of 40,000 values.
allocated: 1.000 R² (1.000 R² .. 1.000 R²)
iters 4.382e7 (4.379e7 .. 4.384e7)
(That's a little under 42 megabytes.)
New outputs
While its support for active HTML has improved, criterion can also now output JSON and JUnit XML files.
New internals
Criterion has received its first spring cleaning, and is much easier to understand as a result.
Acknowledgments
I was inspired to do some of this work by the efforts of the authors of the OCaml Core_bench package.
Bryan O'Sullivanhttp://www.serpentine.com/blog/?p=1089Fri, 08 Aug 2014 10:02:48 +0000On being the "same" or "different": Introduction to Apartness
http://winterkoninkje.dreamwidth.org/98453.html
http://winterkoninkje.dreamwidth.org/98453.htmlThu, 07 Aug 2014 09:51:21 +0000Working with postgresql-simple with generics-sop
http://ocharles.org.uk/blog/posts/2014-08-07-postgresql-simple-generic-sop.html
http://ocharles.org.uk/blog/posts/2014-08-07-postgresql-simple-generic-sop.htmlThu, 07 Aug 2014 00:00:00 +0000An Open Letter to Alex Salmond
http://wadler.blogspot.com/2014/08/an-open-letter-to-alex-salmond.html
Dear Alex,It's not working.There may be reasons to stick to the story that all will be smooth sailing in the event that Scotland becomes independent, but everyone knows it is not true.Last night's debate makes it clear that refusal to discuss Plan B for currency hurts more than it helps. The same could be said for how negotiations will proceed over Trident, the EU, Nato, pensions, education, and the economy.You took too long pressing Darling to admit that Scotland could succeed as an independent country, which you did to show that he too has issues with the truth. Last night's audience failed to get a straight answer from either side, and that led to their verdict: the debate was a disappointment.It's time for an about face. Tell the truth. Independence will be stormy. The transition will not be easy. We cannot be certain of the outcome, save that it will put Scotland in a position to make its own decisions.The future is indefinite, whether we opt for independence or no. The probable outcome if Scotland remains in the UK is growing social injustice, austerity for the poor and more for the 1%, bedroom tax, referenda on whether to stay in the EU. You made these points last night, but they were overshadowed by your inability to admit that independence has risks too.Truth will be refreshing. It will knock all loose and reinvigorate the debate. It may restore part of Scottish people's faith in the political process, something we sorely need regardless of which side wins in September.If you stick to your current strategy, polls make it clear that No will win. It's time for Plan B.Yours aye, -- PPhilip Wadlertag:blogger.com,1999:blog-9757377.post-6816976985263166616Wed, 06 Aug 2014 09:58:00 +0000Announcing auto-update
http://www.yesodweb.com/blog/2014/08/announcing-auto-update
Kazu and I are happy to announce the first release of
auto-update, a
library to run update actions on a given schedule. To make it more concrete,
let's start with a motivating example.Suppose you're writing a web service which will return the current time. This
is simple enough with WAI and Warp, e.g.:
{-# LANGUAGE OverloadedStrings #-}
import Data.ByteString.Lazy.Char8 (pack)
import Data.Time (formatTime, getCurrentTime)
import Network.HTTP.Types (status200)
import Network.Wai (responseLBS)
import Network.Wai.Handler.Warp (run)
import System.Locale (defaultTimeLocale)
main :: IO ()
main =
    run 3000 app
  where
    app _ respond = do
        now <- getCurrentTime
        respond $ responseLBS status200 [("Content-Type", "text/plain")]
            $ pack $ formatTime defaultTimeLocale "%c" now
This is all well and good, but it's a bit inefficient. Imagine if you have a
thousand requests per second (some people really like to know what time it
is). We will end up recalculating the string representation of the time 999
more times than necessary! To work around this, we have a simple solution:
spawn a worker thread to calculate the time once per second. (Note: it will
actually calculate it slightly less than once per second due to the way
threadDelay works; we're assuming we have a little bit of latitude in
returning a value that's a few milliseconds off.)
{-# LANGUAGE OverloadedStrings #-}
import Control.Concurrent (forkIO, threadDelay)
import Control.Monad (forever)
import Data.ByteString.Lazy.Char8 (ByteString, pack)
import Data.IORef (newIORef, readIORef, writeIORef)
import Data.Time (formatTime, getCurrentTime)
import Network.HTTP.Types (status200)
import Network.Wai (responseLBS)
import Network.Wai.Handler.Warp (run)
import System.Locale (defaultTimeLocale)
getCurrentTimeString :: IO ByteString
getCurrentTimeString = do
    now <- getCurrentTime
    return $ pack $ formatTime defaultTimeLocale "%c" now
main :: IO ()
main = do
    timeRef <- getCurrentTimeString >>= newIORef
    _ <- forkIO $ forever $ do
        threadDelay 1000000
        getCurrentTimeString >>= writeIORef timeRef
    run 3000 (app timeRef)
  where
    app timeRef _ respond = do
        time <- readIORef timeRef
        respond $ responseLBS status200 [("Content-Type", "text/plain")] time
Now we will calculate the current time once per second, which is far more
efficient... right? Well, it depends on server load. Previously, we talked
about a server getting a thousand requests per second. Let's instead reverse
it: a server that gets one request every thousand seconds. In that case, our
optimization turns into a pessimization.This problem doesn't just affect getting the current time. Another example is
flushing logs. A hot web server could be crippled by flushing logs to disk on
every request, whereas flushing once a second on a less popular server simply
keeps the process running for no reason. One option is to put the power in the
hands of users of a library to decide how often to flush. But often times, we
won't know until runtime how frequently a service will be requested. Or even
more complicated: traffic will come in spikes, with both busy and idle times.(Note that I've only given examples of running web servers, though I'm certain
there are plenty of other examples out there to draw from.)This is the problem that auto-update comes to solve. With auto-update, you
declare an update function, a frequency with which it should run, and a
threshold at which it should "daemonize". The first few times you request a
value, it's calculated in the main thread. Once you cross the daemonize
threshold, a dedicated worker thread is spawned to recalculate the value. If
the value is not requested during an update period, the worker thread is shut
down, and we go back to the beginning.
Let's see how our running example works out with this:
{-# LANGUAGE OverloadedStrings #-}
import Control.AutoUpdate (defaultUpdateSettings,
mkAutoUpdate, updateAction)
import Data.ByteString.Lazy.Char8 (ByteString, pack)
import Data.Time (formatTime, getCurrentTime)
import Network.HTTP.Types (status200)
import Network.Wai (responseLBS)
import Network.Wai.Handler.Warp (run)
import System.Locale (defaultTimeLocale)
getCurrentTimeString :: IO ByteString
getCurrentTimeString = do
    now <- getCurrentTime
    return $ pack $ formatTime defaultTimeLocale "%c" now
main :: IO ()
main = do
    getTime <- mkAutoUpdate defaultUpdateSettings
        { updateAction = getCurrentTimeString
        }
    run 3000 (app getTime)
  where
    app getTime _ respond = do
        time <- getTime
        respond $ responseLBS status200 [("Content-Type", "text/plain")] time
If you want to see the impact of this change, add a putStrLn call to
getCurrentTimeString and make a bunch of requests to the service. You should
see just one request per second, once you get past that initial threshold
period (default of 3).Kazu and I have started using this library in a few places:fast-logger no longer requires explicit flushing; it's handled for you automatically.wai-logger and wai-extra's request logger, by extension, inherit this functionality.Warp no longer has a dedicated thread for getting the current time.The Yesod scaffolding was able to get rid of an annoying bit of commentary.Hopefully others will enjoy and use this library as well.Control.ReaperThe second module in auto-update is Control.Reaper. This provides something
similar, but slightly different, from Control.AutoUpdate. The goal is to
spawn reaper/cleanup threads on demand. These threads can handle such things
as:Recycling resources in a resource pool.Closing out unused connections in a connection pool.Terminating threads that have overstayed a timeout.This module is currently being used in Warp for slowloris timeouts and file
descriptor cache management, though I will likely use it in http-client in the
near future as well for its connection manager management.http://www.yesodweb.com/blog/2014/08/announcing-auto-updateWed, 06 Aug 2014 07:10:00 +0000Fun with (Kalman) Filters Part II
http://idontgetoutmuch.wordpress.com/2014/08/06/fun-with-kalman-filters-part-ii/
Introduction
Suppose we have a particle moving at constant velocity in 1 dimension, where the velocity is sampled from a distribution. We can observe the position of the particle at fixed intervals and we wish to estimate its initial velocity. For generality, let us assume that the positions and the velocities can be perturbed at each interval and that our measurements are noisy.
A point of Haskell interest: using type level literals caught a bug in the mathematical description (one of the dimensions of a matrix was incorrect). Of course, this would have become apparent at run-time but proof checking of this nature is surely the future for mathematicians. One could conceive of writing an implementation of an algorithm or proof, compiling it but never actually running it purely to check that some aspects of the algorithm or proof are correct.
The Mathematical Model
We take the position as $x_i$ and the velocity as $v_i$:

$$x_{i+1} = x_i + \Delta T v_i + \psi^{(x)}_i, \qquad v_{i+1} = v_i + \psi^{(v)}_i, \qquad y_{i+1} = a x_{i+1} + \upsilon_{i+1}$$

where $\psi^{(x)}_i$, $\psi^{(v)}_i$ and $\upsilon_i$ are all IID normal with means of 0 and variances of $\sigma^2_x$, $\sigma^2_v$ and $\sigma^2_y$.

We can re-write this as

$$\mathbf{x}_{i+1} = A \mathbf{x}_i + \boldsymbol{\psi}_i, \qquad y_{i+1} = H \mathbf{x}_{i+1} + \upsilon_{i+1}$$

where

$$\mathbf{x}_i = \begin{pmatrix} x_i \\ v_i \end{pmatrix}, \quad A = \begin{pmatrix} 1 & \Delta T \\ 0 & 1 \end{pmatrix}, \quad H = \begin{pmatrix} a & 0 \end{pmatrix}, \quad \boldsymbol{\psi}_i \sim \mathcal{N}(0, \Sigma_x), \quad \upsilon_i \sim \mathcal{N}(0, \Sigma_y)$$

Let us denote the mean and variance of $\mathbf{x}_{i+1} \mid y_1, \ldots, y_i$ as $\hat{\mathbf{x}}^\flat_{i+1}$ and $\hat{\Sigma}^\flat_{i+1}$ respectively and note that

$$\begin{pmatrix} \mathbf{x}_{i+1} \\ y_{i+1} \end{pmatrix} \Bigm| y_1, \ldots, y_i \sim \mathcal{N}\left( \begin{pmatrix} \hat{\mathbf{x}}^\flat_{i+1} \\ H \hat{\mathbf{x}}^\flat_{i+1} \end{pmatrix}, \begin{pmatrix} \hat{\Sigma}^\flat_{i+1} & \hat{\Sigma}^\flat_{i+1} H^\top \\ H \hat{\Sigma}^\flat_{i+1} & H \hat{\Sigma}^\flat_{i+1} H^\top + \Sigma_y \end{pmatrix} \right)$$

Since $\mathbf{x}_{i+1}$ and $y_{i+1}$ are jointly Gaussian, and recalling that $(\hat{\Sigma}^\flat_{i+1} H^\top)^\top = H \hat{\Sigma}^\flat_{i+1}$ as covariance matrices are symmetric, we can use the standard formulæ which say that if

$$\begin{pmatrix} \mathbf{u} \\ \mathbf{w} \end{pmatrix} \sim \mathcal{N}\left( \begin{pmatrix} \boldsymbol{\mu}_u \\ \boldsymbol{\mu}_w \end{pmatrix}, \begin{pmatrix} \Sigma_{uu} & \Sigma_{uw} \\ \Sigma_{uw}^\top & \Sigma_{ww} \end{pmatrix} \right)$$

then

$$\mathbf{u} \mid \mathbf{w} \sim \mathcal{N}\left( \boldsymbol{\mu}_u + \Sigma_{uw} \Sigma_{ww}^{-1} (\mathbf{w} - \boldsymbol{\mu}_w), \; \Sigma_{uu} - \Sigma_{uw} \Sigma_{ww}^{-1} \Sigma_{uw}^\top \right)$$

and apply this to the joint distribution above. This is called the measurement update; more explicitly

$$\hat{\mathbf{x}}_{i+1} = \hat{\mathbf{x}}^\flat_{i+1} + \hat{\Sigma}^\flat_{i+1} H^\top \left( H \hat{\Sigma}^\flat_{i+1} H^\top + \Sigma_y \right)^{-1} \left( y_{i+1} - H \hat{\mathbf{x}}^\flat_{i+1} \right)$$

$$\hat{\Sigma}_{i+1} = \hat{\Sigma}^\flat_{i+1} - \hat{\Sigma}^\flat_{i+1} H^\top \left( H \hat{\Sigma}^\flat_{i+1} H^\top + \Sigma_y \right)^{-1} H \hat{\Sigma}^\flat_{i+1}$$

Sometimes the measurement residual $\mathbf{v}_{i+1} = y_{i+1} - H \hat{\mathbf{x}}^\flat_{i+1}$, the measurement prediction covariance $S_{i+1} = H \hat{\Sigma}^\flat_{i+1} H^\top + \Sigma_y$ and the filter gain $K_{i+1} = \hat{\Sigma}^\flat_{i+1} H^\top S_{i+1}^{-1}$ are defined and the measurement update is written as

$$\hat{\mathbf{x}}_{i+1} = \hat{\mathbf{x}}^\flat_{i+1} + K_{i+1} \mathbf{v}_{i+1}, \qquad \hat{\Sigma}_{i+1} = \hat{\Sigma}^\flat_{i+1} - K_{i+1} S_{i+1} K_{i+1}^\top$$

We further have that

$$\mathbf{x}_{i+2} = A \mathbf{x}_{i+1} + \boldsymbol{\psi}_{i+1}$$

We thus obtain the Kalman filter prediction step:

$$\hat{\mathbf{x}}^\flat_{i+2} = A \hat{\mathbf{x}}_{i+1}, \qquad \hat{\Sigma}^\flat_{i+2} = A \hat{\Sigma}_{i+1} A^\top + \Sigma_x$$
Further information can be found in (Boyd 2008), (Kleeman 1996) and (Särkkä 2013).
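As a sanity check on the update and prediction steps, here they are in one dimension, where every matrix collapses to a number; the constants and the "track a constant signal" setup are invented for illustration and are not the post's two-dimensional model:

```haskell
-- One-dimensional Kalman step: measurement update followed by
-- prediction, mirroring the structure of the hmatrix code below.
-- The constants (bigA = bigH = 1, variances) are illustrative only.
kalmanStep :: (Double, Double) -> Double -> (Double, Double)
kalmanStep (xHatFlat, pFlat) y =
  let bigA = 1.0; bigH = 1.0; sigX = 1.0e-6; sigY = 1.0
      v    = y - bigH * xHatFlat         -- measurement residual
      s    = bigH * pFlat * bigH + sigY  -- measurement prediction covariance
      k    = pFlat * bigH / s            -- filter gain
      xHat = xHatFlat + k * v            -- measurement update
      p    = pFlat - k * s * k
  in (bigA * xHat, bigA * p * bigA + sigX)  -- prediction step

main :: IO ()
main = do
  let obs = [4.8, 5.3, 4.9, 5.1, 5.0]       -- noisy readings of a constant 5
      (est, _) = foldl kalmanStep (0.0, 10.0) obs
  print (abs (est - 5.0) < 0.5)             -- the estimate settles near 5
```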
A Haskell Implementation
The hmatrix package now uses type level literals via the DataKinds extension in GHC to enforce compatibility of matrix and vector operations at the type level. See here for more details. Sadly a bug in the hmatrix implementation means we can’t currently use this excellent feature, so we content ourselves with comments describing what the types would be were it possible to use it.
> {-# OPTIONS_GHC -Wall #-}
> {-# OPTIONS_GHC -fno-warn-name-shadowing #-}
> {-# OPTIONS_GHC -fno-warn-type-defaults #-}
> {-# OPTIONS_GHC -fno-warn-unused-do-bind #-}
> {-# OPTIONS_GHC -fno-warn-missing-methods #-}
> {-# OPTIONS_GHC -fno-warn-orphans #-}
> {-# LANGUAGE DataKinds #-}
> {-# LANGUAGE ScopedTypeVariables #-}
> {-# LANGUAGE RankNTypes #-}
> module FunWithKalmanPart1a where
> import Numeric.LinearAlgebra.HMatrix hiding ( outer )
> import Data.Random.Source.PureMT
> import Data.Random hiding ( gamma )
> import Control.Monad.State
> import qualified Control.Monad.Writer as W
> import Control.Monad.Loops
Let us make our model almost deterministic but with noisy observations.
> stateVariance :: Double
> stateVariance = 1e-6
> obsVariance :: Double
> obsVariance = 1.0
And let us start with a prior normal distribution with a mean position and velocity of 0 with moderate variances and no correlation.
> -- muPrior :: R 2
> muPrior :: Vector Double
> muPrior = vector [0.0, 0.0]
> -- sigmaPrior :: Sq 2
> sigmaPrior :: Matrix Double
> sigmaPrior = (2 >< 2) [ 1e1, 0.0
> , 0.0, 1e1
> ]
We now set up the parameters for our model as outlined in the preceding section.
> deltaT :: Double
> deltaT = 0.001
> -- bigA :: Sq 2
> bigA :: Matrix Double
> bigA = (2 >< 2) [ 1, deltaT
> , 0, 1
> ]
> a :: Double
> a = 1.0
> -- bigH :: L 1 2
> bigH :: Matrix Double
> bigH = (1 >< 2) [ a, 0
> ]
> -- bigSigmaY :: Sq 1
> bigSigmaY :: Matrix Double
> bigSigmaY = (1 >< 1) [ obsVariance ]
> -- bigSigmaX :: Sq 2
> bigSigmaX :: Matrix Double
> bigSigmaX = (2 >< 2) [ stateVariance, 0.0
> , 0.0, stateVariance
> ]
The implementation of the Kalman filter using the hmatrix package is straightforward.
> -- outer :: forall m n . (KnownNat m, KnownNat n) =>
> --          R n -> Sq n -> L m n -> Sq m -> Sq n -> Sq n -> [R m] -> [(R n, Sq n)]
> outer :: Vector Double
>       -> Matrix Double
>       -> Matrix Double
>       -> Matrix Double
>       -> Matrix Double
>       -> Matrix Double
>       -> [Vector Double]
>       -> [(Vector Double, Matrix Double)]
> outer muPrior sigmaPrior bigH bigSigmaY bigA bigSigmaX ys = result
>   where
>     result = scanl update (muPrior, sigmaPrior) ys
>
>     -- update :: (R n, Sq n) -> R m -> (R n, Sq n)
>     update (xHatFlat, bigSigmaHatFlat) y =
>       (xHatFlatNew, bigSigmaHatFlatNew)
>       where
>         -- v :: R m
>         v = y - bigH #> xHatFlat
>         -- bigS :: Sq m
>         bigS = bigH <> bigSigmaHatFlat <> (tr bigH) + bigSigmaY
>         -- bigK :: L n m
>         bigK = bigSigmaHatFlat <> (tr bigH) <> (inv bigS)
>         -- xHat :: R n
>         xHat = xHatFlat + bigK #> v
>         -- bigSigmaHat :: Sq n
>         bigSigmaHat = bigSigmaHatFlat - bigK <> bigS <> (tr bigK)
>         -- xHatFlatNew :: R n
>         xHatFlatNew = bigA #> xHat
>         -- bigSigmaHatFlatNew :: Sq n
>         bigSigmaHatFlatNew = bigA <> bigSigmaHat <> (tr bigA) + bigSigmaX
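To see the shape of this recursion without the matrix machinery, here is a minimal one-dimensional sketch using only the Prelude. The name `scalarKalman` and its parameters are my own, not from the post; it assumes identity dynamics ($A = 1$) and direct observation ($H = 1$), so every matrix above collapses to a scalar.

```haskell
-- A hedged 1-D analogue of 'outer' above: identity dynamics and direct
-- observation, so every matrix collapses to a scalar.  Names are mine.
scalarKalman :: Double              -- observation noise variance (Sigma_Y)
             -> Double              -- state noise variance (Sigma_X)
             -> (Double, Double)    -- prior (mean, variance)
             -> [Double]            -- observations
             -> [(Double, Double)]  -- filtered (mean, variance) at each step
scalarKalman r q prior = scanl step prior
  where
    step (m, p) y =
      let s  = p + r            -- innovation variance: H Sigma H' + Sigma_Y
          k  = p / s            -- Kalman gain: Sigma H' S^-1
          m' = m + k * (y - m)  -- corrected mean
          p' = p - k * p        -- corrected variance
      in (m', p' + q)           -- predict: A = 1, so just add process noise

main :: IO ()
main = do
  -- With a vague prior and constant observations, the mean homes in on 5.
  let (m, p) = last (scalarKalman 1.0 1e-6 (0.0, 10.0) (replicate 100 5.0))
  print (m, p)
```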
We create some random data using our model parameters.
> singleSample :: (Double, Double)
>              -> RVarT (W.Writer [(Double, (Double, Double))]) (Double, Double)
> singleSample (xPrev, vPrev) = do
>   psiX <- rvarT (Normal 0.0 stateVariance)
>   let xNew = xPrev + deltaT * vPrev + psiX
>   psiV <- rvarT (Normal 0.0 stateVariance)
>   let vNew = vPrev + psiV
>   upsilon <- rvarT (Normal 0.0 obsVariance)
>   let y = a * xNew + upsilon
>   lift $ W.tell [(y, (xNew, vNew))]
>   return (xNew, vNew)
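In equations, one simulation step as coded above is (my transcription of the code, with $\psi$ the state noise and $\upsilon$ the observation noise):

```latex
\begin{aligned}
x_{k+1} &= x_k + \Delta t\, v_k + \psi^x_k, & \psi^x_k &\sim N(0, \sigma^2_x) \\
v_{k+1} &= v_k + \psi^v_k,                  & \psi^v_k &\sim N(0, \sigma^2_x) \\
y_{k+1} &= a\, x_{k+1} + \upsilon_{k+1},    & \upsilon_{k+1} &\sim N(0, \sigma^2_y)
\end{aligned}
```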
> streamSample :: RVarT (W.Writer [(Double, (Double, Double))]) (Double, Double)
> streamSample = iterateM_ singleSample (1.0, 1.0)
> samples :: ((Double, Double), [(Double, (Double, Double))])
> samples = W.runWriter (evalStateT (sample streamSample) (pureMT 2))
Here are the actual values of the randomly generated positions.
> actualXs :: [Double]
> actualXs = map (fst . snd) $ take nObs $ snd samples
> test :: [(Vector Double, Matrix Double)]
> test = outer muPrior sigmaPrior bigH bigSigmaY bigA bigSigmaX
>        (map (\x -> vector [x]) $ map fst $ snd samples)
And using the Kalman filter we can estimate the positions.
> estXs :: [Double]
> estXs = map (!!0) $ map toList $ map fst $ take nObs test
> nObs :: Int
> nObs = 1000
And we can see that the estimates track the actual positions quite nicely.
Of course we really wanted to estimate the velocity.
> actualVs :: [Double]
> actualVs = map (snd . snd) $ take nObs $ snd samples
> estVs :: [Double]
> estVs = map (!!1) $ map toList $ map fst $ take nObs test
Bibliography
Boyd, Stephen. 2008. “EE363 Linear Dynamical Systems.” http://stanford.edu/class/ee363.
Kleeman, Lindsay. 1996. “Understanding and Applying Kalman Filtering.” In Proceedings of the Second Workshop on Perceptive Systems, Curtin University of Technology, Perth Western Australia (25-26 January 1996).
Särkkä, Simo. 2013. Bayesian Filtering and Smoothing. Vol. 3. Cambridge University Press.Dominic Steinitzhttp://idontgetoutmuch.wordpress.com/?p=718Wed, 06 Aug 2014 06:34:26 +0000Equality is Hard
http://jozefg.bitbucket.org/posts/2014-08-06-equality.html
http://jozefg.bitbucket.org/posts/2014-08-06-equality.htmlWed, 06 Aug 2014 00:00:00 +0000Imagine that this is not an academic debate
http://winterkoninkje.dreamwidth.org/98119.html
http://winterkoninkje.dreamwidth.org/98119.htmlTue, 05 Aug 2014 16:01:36 +0000Are you serious? "What degree?"
http://logicaltypes.blogspot.com/2014/07/are-you-serious-what-degree.html
So, serious question.An aside (get used to them, by the way), people are never sure if I'm serious.I'm, like, seriously? C'mon!I am always silly (well, oftentimes silly, compared to my dour workmates), but I am always serious in my silliness.Chesterton said it well: the opposite of funny is not serious: the opposite of funny is not funny.Okay, that aside is done. Now onto the topic at hand.Hands up, those who have ever used your education in your jobs. Hands up, those who needed proof of your degree to prove competency as a prerequisite of employment.(sound of crickets.)Thought so.Actually, more of you might raise your hands to me than to most of my colleagues, because why? Because I have close ties to academe, that's why. So there are more than a few of you who are required to have your Master's degree or, those of you who are post-docs, to have your Ph.D. to get that research position or grant that you're working on.The rest of the world?No.Education, in the real world, is a detriment to doing your job.Across the board.A.Cross.The.Board.Do you know how many Ph.D.s we fire? Do you know how many Ph.D.s and matriculated students we turn away, because of their education and their lack of real-world experience?We did a survey: a sure bell-weather for a prospective employee? The amount of their education: the more they have, the more likely they are to be useless on the job.It's the Ph.D-disease: 'Ph.D: "piled high and deep."' People get ed-ju-ma-kated and then they think because they have a sheepskin or that they have a certain GPA at a certain school and know a certain ... 
educated way of doing things, that they know how to do this-and-that other thing, totally unrelated on the job."I've studied the Martin-Löf's intuitionistic type theory.""Great, how do you connect to our database.""Uh ....""Next."You'll bridle at this, but you know who agrees most strongly with me?Ph.D.sI went to the ICFP2006 specifically looking to roll dependent types into the programming language I was using, in industry, and I could not get the time of day from a single implementer of the type-theory in Twelf, and you know why?"Why would you be interested in this? This is purely theoretical."Uh, huh. As in "not applicable to industry."I reread the paper again, later: "Dependent types make programs easier to understand and more reliable."And why would I want that in my programs in industry, where it mattered.I spent four years at the United States Coast Guard Academy: 10,000 students apply, only 300 are allowed in, only 150 graduate, each year. It cost, at the time, $250,000 to graduate a student from the Academy, making it the most expensive school in the nation.Thank you, taxpayers, for my education.I graduated with a dual major: mathematics and computer science.How much of my education did I use on the job to save 150 lives from a capsized Arctic research exploration vessel (boy, they surely used their education, didn't they! ... to capsize their ship), act as the translator when we boarded Japanese trawlers, provide civil rights education and mediate EEO complaints and ...None. Zip. Zilch.After my stint, how much of my college education did I use on the job.My job was encoding matrices in FORTRAN. How much FORTRAN did I study in college?Zip. Zilch. Nada.How much of the Advanced Calculus II did I use on the job.People, it was frikken matrix manipulation! Something you can (now) look up on wikipedia and pick up in, oh, two hours if you're really slow.Java. C#. Swift. Visual whatever. Spring. SQL. HBase. Angular. 
JavaScript.All these things (Python, Ruby on Rails) can be taught in college, but they can be taught in high school, and I learned them in school, did I?No, I did not. I learned them on my own, thank you very much.Design patterns, frameworks, data structures. Do educated people know these things?Some of them do. Most people with a 'computer science' degree DO NOT, people.They do not. They woefully do not, as comp sci teachers lament over and over again, and as I, the hiring manager, scratch my head wondering what, precisely, did these kids learn in school, because, insofar as I see, they did not learn abstraction, polymorphism, typing, or data structures.They learned the if-statement.They learned the if-statement that n-dispatches within a for-loop from 1 to n off an array. That's the data structure they know: the array.Maps? Sets?The array.Polymorphism?We got you covered: the if-statement.Functional decomposition?Well, there's always main(String[] args), with a big-ole if-statement.The word on the street is education is a canard at best and a detriment at most, and at worst, it's a project-sinker.That's a shame, because there are educated people who are productive and smart and effective in their field, and can help.How to Solve It: a Modern Approach claims that one billion dollars is wasted on software because it's written in the absence of very simple techniques, such as linear programming.One.Billion.Dollars.Our authors, Michalewicz and Fogel, are off. Way off. I know. By a factor of at least one-hundred.We wasted a billion dollars on a file-indexing system for the FBI. Oh, it had email, too. Government project. Never used because it was never delivered in a useable state. Do you know how many go through that cycle?I don't. I do know know I've seen project after project just ...God. The waste.And I have friends. And they tell me stories.But, you get MITRE in there, or you get a Ph.D. 
or two in there, and what happens?They study the issue.They study it for six-plus months, and then they write you a nice little white paper that states the obvious search criteria that you knew from day one, but what do you have to say? Where is your ROC-analyses? So your bayesian system that was cranking out results month after month was killed by the bean-counting pointy-heads and they submit a ... working solution that could go into production? Oh, no. They submit a white paper calling for a research grant to allow for a year of surveying and further study of the issue.God.Then they get fired or they move on to more interesting research areas leaving behind us to clean up the mess and get a working system out the door in some serviceable shape that used zero percent of their research.Zero percent.You see, they modeled the situation, but the model doesn't fit the data, which is raw and dirty, so their solution solved the model, not your problem, not even close.Your degree.How much have you used your degree on your job?If you're a researcher, you probably use quite a bit of what you've studied in your research, and you are contributing more to your field of study.If you're not, then you're told this, day one, on your job: "Friend, put those books away, you're never going to use them again."I mean, seriously: did you really bring your college books to your job thinking you'd use them?NEEEERRRRRRD!This, here, is the real world. The ivory tower is for the academics. In the real world, you roll up your sleeves, get to work, and get some results; because if you don't, the door is right over there.You were expecting to use your degree on your job? This is America, people. We don' need no edjumakashun.Now, if this were Soviet Russia, your degree uses you.-----So, silliness, and serious silliness aside.Seriously.You were expecting to use your degree on your job?English major. 
This is America, we don't talk English, we talk American, so, that's nice that you have that degree.Mathematics major. This is America, we don't do 'maths,' nor trig, nor geometric algebras, nor category theory, how would you use any of that on your job?I was seriously asked that on my interview for a job overseeing a 1.7 petabyte-sized database.I said: 'uh, map-reduce are straight from category theory.'"Yes, but how do you use that on your job?"We both blinked at each other dumbly.The gulf.You don't go to school to get trained to do a job well, ladies and gentlemen.I mean, too many of you do that, and too many others go do school to party some, to sex some, to blaze some, and then get to work after your folks financed your four-plus year bacchanal.College is not a technical training-institute and has nothing to do with acquiring skills or proficiency on repetitive stress disorder, oh, I meant: 'your job.' Your job, almost without exception, can be just as proficiently performed by nearly anyone they drag off the street and put in your chair for eight hours a day. They sit in your chair for a few days, and everyone else won't even know you're gone.Most jobs.My wife beat the pants off her entire payroll division with an excel spreadsheet because they didn't have simple accounting principles and deductive reasoning. Why? Because they were well-regulated at their jobs, proficient at it, in fact, and their job was to make continuous clerical errors because they had absolutely no rigor. Why would they? They weren't paid for rigor. They were paid for doing their jobs, which was: don't make waves.I regularly go into situations where other software engineers (a misnomer, they are more like computer programmers, not engineers) say such-and-so cannot be done in programming language X.Then, I implement a little bit of category theory, in programming language X, do some simple mappings and natural transformations, and, voilà! 
those 50,000 lines of code that didn't solve the problem but only made things worse? I replace all that with 500 lines of code that actually delivers the solution.Unit tested: all the edge cases.And meeting their requirements, because I've translated the requirements into a declarative DSL on top of their programming language X.Of course they couldn't solve the insurmountable problem in programming language X, not because they were using programming language X (although it helped with the colossal fail being object-disoriented and improvably/mutatatively impure), but because they couldn't think outside the box that 'you can only do this and that' as a software engineer. They were caught in their own domain and can't even see that they had boxed themselves in.Because they were educated that way. Comp Sci 101: this is how you write a program. This is the 'if'-statement. This is the for-loop. If that doesn't work, add more if-statements wrapped by more for-loops, and this statement is perfectly acceptable:x = x + 1Go to town.That's what their education gave them: they went to school to acquire a trade and a proficiency at the if-statement, and gave up their ability to see and to think.And some, many, academics are the most bigoted, most blundering blinders-on fools out there, because they see it their way, and they see their way as the only way, which requires a six-month research grant and further study after that.With co-authorship on the American Mathematical Society journal article.And the uneducated are the worst, most pigheaded fools out there, so sure that the educated have nothing to offer, that they have no dirt under their perfectly manicured fingernails attached silky smooth hands that have never seen an honest-day's work nor, God forbid! 
a callous, so what do they know, these blowhards, so the uneducated ignore the advances of research into type-theory, category theory, object theory (polymorphism does help at times), any theory, and just code and code and code until they have something that 'looks good.'How to solve this?Start with you.Not with your education, that is: not with your education that tells you who you are.Start with how you can help, and then help.Project one: I saw how fractal dimensionality would solve a spectrum analysis problem. Did I say the words 'fractal' or 'dimensions'? No. I was working with real-programmers. If I asked if I could try this, do you know what they would say? Pfft. Yeah, right. Get back to work, geophf!But, instead, I implemented the algorithm. I sat with a user who had been working on those signals and knew what he needed, iterated through the result a week.Just a week. While I did my job-job full time. I did the fractal spectrum analysis on my own time.My 'thing' floored the software management team. They had seen straight-line approximations before. They thought I was doing actual signal analysis. I mean: with actual signals.They showed my 'thing' to the prospective customer. And got funded.Another project: data transformation and storage, built a system that encompassed six-hundred data elements using a monadic framework to handle the semideterminism. That was an unsolvable problem in Java. I used Java.Java with my monadic framework, yes, but Java, to solve the problem.Third project: calculating a 'sunset date' over a data vector of dimension five over a time continuum. Hm: continuum.Unsolvable problem. Three teams of software developers tackled it over six months. Nobody could get close to the solution.Continuum.I used a comonadic framework.Took me, along with a tester who was the SME on the problem, and a front-end developer to get the presentation layer just right, about a month, and we solved that baby and put it to bed.Unit tested. 
All edge cases.Did I tell them I used a comonadic framework?Nah, they tripped over themselves when they saw the word 'tuple.'No joke, my functional programming language friends: they, 'software engineers,' were afraid of the word 'tuple.'So I explained as much as anyone wanted to know when anyone asked. I wrote design documents, showing unit test case results, and they left me alone. They knew I knew what I was doing, and I got them their results. That's what they needed.They didn't need my degree.They didn't need to know I used predicate logic to optimize SQL queries that took four hours to run to a query that took forty-five seconds.They didn't need to know I refactored using type theory, that A + B are disjoint types and A * B are type instances and A ^ B are function applications so I could look at a program, construct a mathematical model of it and get rid of 90% of it because it was all redundantly-duplicated code inside if-clauses, so I simply extracted (2A + 2B ... ad nauseam) to 2(A + B ...) and then used a continuation, for God's sake, with 'in the middle of the procedure' code, or, heaven help me, parameterization over a simple functional decomposition exercise to reduce a nightmare of copy-and-paste to something that had a story to tell that made sense.How do you connect to a database?Do you need a college degree for that?Kids with college degrees don't know the answer to that simple interview question.And they don't know the Spring framework, making 'how to connect to a database' a stupid-superfluous question.They don't know what unit tests give them. They don't know what unit tests don't give them. 
Because they, college kids and crusty old 'software engineers,' don't write them, so they have no consistency nor security in their code: they can't change anything here because it might break something over there, and they have no unit tests as a safety net to provide that feedback to them, and since they are programming in language X, a nice, strict, object-oriented programming language, they have no programs-as-proofs to know that what they are writing is at all good or right or anything.A college degree gives you not that. A not college degree gives you not that.A college degree is supposed to what, then?It's suppose to open your mind to the possibility of a larger world, and it's supposed to give you the tools to think, and to inquire, so that you can discern."This, not that. That, and then this. This causes that. That is a consequence of this. I choose this for these reasons. These reasons are sound because of these premises. This reason here. Hm. I wonder about that one. It seems unsound. No: unfamiliar. Is it sound or unsound? Let me find out and know why."English. Mathematics. Art. Literature. Music. Philosophy. All of these things are the humanities. The sciences and the above. Law. Physics. All these lead one to the tools of inquiry.In school, you are supposed to have been given tools to reason.Then, you're sent back out into the world.And then you are supposed to reason.And with your reason, you make the world a better place, or a worse place.These things at school, these are the humanities, and they are there to make you human.Not good at your job, not so you can 'use' your degree as a skill at work, but to make you human.And, as human, are you good at your job?Yes.And, as human, do you make your world a place such that others are good and happy at their jobs?Yes.The end of being human is not to be skilled, nor proficient ... 
'good' at your job.But it's an accident of it, a happy accident.The 'end' of being human?Well: that's your inquiry.That's what school, that's what everything, is for: for you to answer the unanswered question.Your way.And, if you accept that, and are fully realized as a human being, then your way is the best way in the world, and your way has the ability to change lives. First, your own, then others. Perhaps your coworkers.Perhaps hundreds of others.Perhaps thousands.Perhaps you will change the entire world.But you won't know that until you take that first step of inquiry.Then the next.Then the next.And you look back, and you see how far you've come, and ... wow.Just wow.That's what school is for. Not for your job.For you.geophftag:blogger.com,1999:blog-4650294074444534066.post-5671015007577180231Tue, 29 Jul 2014 03:50:00 +0000On my pulling away from Haskell communities
http://winterkoninkje.dreamwidth.org/97625.html
http://winterkoninkje.dreamwidth.org/97625.htmlMon, 04 Aug 2014 02:30:37 +0000Code reuse considered harmful
http://feedproxy.google.com/~r/blogspot/hSASX/~3/YE6QnAqLNKA/code-reuse-considered-harmful.html
The title is intended to grab attention. This post is about one perspective on software development in light of my own experience in the area; it won't contain anything really revealing and is not to be taken as an absolute truth. It's a rant. I hope you have a good time reading it; feel free to leave me any kind of feedback.

I see a bunch of people praising reuse as the prime mark of good software development, and few talking about replaceability. There seems to be a constant drive to avoid writing code that is used only once, as if that were a really bad thing. Then we end up with software made of conceptual factories that create factories that create the things the software really needs; yes, there are two levels of factories, or more. Is this really necessary? How much time do we save by this extreme push for reusing code?

First, let me ask and answer a simple question: why is duplicated code annoying? Well, duplicated code makes it harder to change things. When you have the same piece of code written multiple times in a code base and you find that it needs a change, e.g. a bug fix or a new feature, you will need to change it in all places. Things get worse if you don't know all the places where the code is duplicated, so you may forget to change one of those spots. The result is that duplicated code is a sign of harder maintenance and fertile ground for further bugs to spawn. That's why we learned to hate it. We started fighting this anti-pattern with all the strength we had.

Code reuse is the perfect counter to code duplication, right? Sure it is: if we reuse a piece of code in two places, we have no duplication between those places. So, we did it! We found the Holy Grail of code quality, no more duplicated code, yay! But something unintended happened. Remember the old saying: with great power comes great responsibility. People started to be obsessed with it.
As soon as they learned to use the hammer of code reuse, everything turned into a nail; when it didn't work out on the first hit, they adjusted the size of the hammer and hit again with more effort. This quest for code reuse led us to a plethora of abstractions that seem to handle every problem by reusing some code. Don't get me wrong, lots of them are useful; these are the ones that were created from observation. The problem is the ones created from "it's cool to abstract", or some other random reason that is not true observation.

We see framework after framework that tries to fit every problem in the world into a single model. Developers learn to use these frameworks, suddenly decide that the framework creator got it wrong, and create yet another abstraction over it, or yet another framework that tries a different model to solve the world. What happens when we have a bug in one of these abstractions, or we need to enhance it? Silence, for a while; then the sky turns black, you take a break, go for a walk, come back to your computer, and start blaming the other developer who created the bug or who "got the abstraction wrong", because your vision was the right one.

What happened? We reused code to avoid code duplication, but we still have the same problems: code that is hard to maintain and evolve. My guess? We misidentified the enemy. Code duplication is not our enemy; hard maintenance and rigidity of code are. My tip? Put more focus on replaceability of code instead of reuse in your talks, code, classes, etc. Create the right abstraction to fix the problem at hand in a way that makes it easy to replace the underlying code when needed. Some time in the future, you will need to change it anyway. That's what agile methodologies try to teach us: embrace change. Planning for a design to be reused says: "my design will be so awesome that I will reuse it everywhere."
That's what agile says: "your design will need to change sometime, because the requirements will change, plan for the replaceability of it." People are doing things like service oriented architecture in the wrong way because they are looking for reuse of services and not for replaceability of services, they end up with a Big Web of Mud. That's all folks. Thanks for your time.Thiago Negritag:blogger.com,1999:blog-5832470180761099563.post-1631004021798057258Sun, 03 Aug 2014 21:24:00 +0000Unity3D, Bejeweled and Domain-driven design
http://feedproxy.google.com/~r/blogspot/hSASX/~3/I5UC8tXaCfU/unity3d-bejeweled-and-domain-driven.html
I'm working on a new game like Bejeweled. I'm happy with the freedom of code organization that the Unity3D engine allows. During my first contacts with it, I thought that almost everything would have to be oriented around the MonoBehaviour class, but this turned out to be false. This class is necessary only as a glue point between any C# code and the objects of the engine. I'll report how I started coding this game and the changes I have made so far; you can watch the following video to see the current state of the game.

I started by creating a GameObject for every entity that I identified in the game mechanics: Board and Piece. The board contains every piece and manages them:

public class Board : MonoBehaviour {
    private GameObject[,] pieces;

    void Awake() {
        pieces = /* Initialize pieces */;
    }
}

The piece type is defined by a MonoBehaviour that exposes an enumeration:

public class Piece : MonoBehaviour {
    public PieceType Type;
}

public enum PieceType {
    Circle, Square, Star, Triangle, Hexagon, Polygon
}

After defining the entities participating in the game, I started to code the game's logic inside these classes. It worked for a while, but some problems appeared. The same classes had lots of different responsibilities (e.g. game rules, animations, input handling), and this made it hard to code some things, because I needed to maintain a mental map of all concerns to avoid breaking something. Also, during animations, the board in memory was in an inconsistent state, waiting for the end of the animation before continuing processing. Recently I've read some material about Domain-driven design (DDD) and decided to apply a bit of it in this game. My first step was to separate my core domain from the others; I selected the game's mechanics as my core domain: if this part misbehaves and is hard to maintain, I'll be in a bad spot. Then I went on to create the domain classes completely separated from the rest of the game; I ignored the existence of Unity3D at this point.
I saw only a single entity for this domain: the board. It makes no sense for a piece to exist on its own; everything that involves pieces always happens inside the board. I still have a class for the piece, but it is internal to the board. My design became this:

public class BoardPosition {
    public readonly int Row;
    public readonly int Column;

    public BoardPosition(int row, int column) {
        Row = row;
        Column = column;
    }
}

public class Board {
    private Piece[,] pieces;

    public Board() {
        pieces = /* Initialize pieces */;
    }

    #region Queries
    public Piece PieceAt(BoardPosition p) { /* ... */ }
    #endregion

    #region Events
    public delegate void PieceCreatedDelegate(BoardPosition position, Piece piece);
    public event PieceCreatedDelegate PieceCreated;

    public delegate void PieceDestroyedDelegate(BoardPosition position);
    public event PieceDestroyedDelegate PieceDestroyed;

    public delegate void PieceMovedDelegate(BoardPosition from, BoardPosition to);
    public event PieceMovedDelegate PieceMoved;

    public delegate void PiecesSwappedDelegate(BoardPosition a, BoardPosition b);
    public event PiecesSwappedDelegate PiecesSwapped;
    #endregion

    #region Commands
    public void SwapPieces(BoardPosition a, BoardPosition b) {
        ...; // Swap pieces
        PiecesSwapped(a, b);
    }

    public void StepGameState() {
        ...; // Destroy pieces
        ...; // Move pieces
        ...; // Create pieces
        for (...) { PieceDestroyed(...); }
        for (...) { PieceMoved(...); }
        for (...) { PieceCreated(...); }
    }
    #endregion
}

This way, the view part of the game registers itself to handle the events generated by the board and updates the user interface as needed.
public class BoardView : MonoBehaviour {
    private Board board;
    private GameObject[,] pieces;

    void Awake() {
        board = new Board();
        board.PieceCreated += HandlePieceCreated;
        board.PieceDestroyed += HandlePieceDestroyed;
        board.PieceMoved += HandlePieceMoved;
        board.PiecesSwapped += HandlePiecesSwapped;
        pieces = /* Initialize pieces based on 'board' */;
    }

    public void HandlePieceCreated(BoardPosition position, Piece piece) { /* ... */ }
    public void HandlePieceDestroyed(BoardPosition position) { /* ... */ }
    public void HandlePieceMoved(BoardPosition from, BoardPosition to) { /* ... */ }
    public void HandlePiecesSwapped(BoardPosition a, BoardPosition b) { /* ... */ }

    void Update() {
        board.StepGameState();
        if (/* ... */) {
            board.SwapPieces(a, b);
        }
    }
}

This design made it hard to sync time between the model and the view. The model calls the methods of the view to notify it about changes; the view has little room left to decide when to handle each event. In my case, some events started animations that needed to hold other events from happening, i.e. there is a temporal sequencing between some events. I changed the model to return a list of the events that happened at each command, instead of calling the handlers directly:

#region Events
public interface BoardEvent {}
public class PieceCreated : BoardEvent { /* ... */ }
public class PieceDestroyed : BoardEvent { /* ... */ }
public class PieceMoved : BoardEvent { /* ... */ }
public class PiecesSwapped : BoardEvent { /* ... */ }
#endregion

#region Commands
public List<BoardEvent> SwapPieces(BoardPosition a, BoardPosition b) { /* ... */ }
public List<BoardEvent> StepGameState() { /* ... */ }
#endregion

Now the view needs to call the handlers itself, but it can decide when to handle each event:

public class BoardView : MonoBehaviour {
    private List<BoardEvent> events;

    void Update() {
        if (events.Count < 1) {
            events = board.StepGameState();
        }
        foreach (BoardEvent e in events) {
            if (CanHandleNow(e)) {
                Handle(e);
            }
        }
        // ...
        if (HandledEverything) {
            events.Clear();
        }
    }
}

After this, I still felt that the temporal sequencing was not clear; it was "floating in the air". I decided to put it into the model, as it's part of my domain: every event gets a temporal identifier:

public class Board {
    private int timeCounter;

    public List<BoardEvent> StepGameState() {
        ...; // Destroy pieces
        for (...) { events.Add(new PieceDestroyed(timeCounter, ...)); }
        if (eventHappened) { timeCounter++; }

        ...; // Move pieces
        for (...) { events.Add(new PieceMoved(timeCounter, ...)); }
        if (eventHappened) { timeCounter++; }

        ...; // Create pieces
        for (...) { events.Add(new PieceCreated(timeCounter, ...)); }
        if (eventHappened) { timeCounter++; }

        return events;
    }
}

public class BoardView : MonoBehaviour {
    private int timeCounter;
    private List<BoardEvent> events;

    void Update() {
        if (events.Count < 1) {
            events = board.StepGameState();
        }
        foreach (BoardEvent e in events) {
            if (e.When() == timeCounter) Handle(e);
            if (e.When() > timeCounter) {
                stillHasEventsToHandle = true;
                break;
            }
        }
        if (/* handled animation of all events of my timeCounter */) {
            // Advance time perception of view
            timeCounter++;
        }
        if (!stillHasEventsToHandle) {
            events.Clear(); // Will step game state at next frame
        }
    }
}

Both view and model have a temporal identifier, and the sync is more evident. The actual code looks very similar to what is listed here. The model is holding up well so far. I feel bad about one thing: the Step command of the model may leave the board in a "not-consolidated" state, as it makes a single iteration to check for matching groups to be removed from the board. The view then needs to call the Step command more than once between handling two inputs from the user. I didn't want to do many iterations in a single Step, to avoid putting lots of stuff in memory before anything is handled by the interface; that looks like a waste to me. I miss the lazy part of Haskell. I still have lots of stuff to add to the game's mechanics (my core domain).
I'll see how this design fares over the next few days and will post news of the next changes. Critiques and suggestions are welcome.
Thiago Negritag:blogger.com,1999:blog-5832470180761099563.post-6067453993179413182Sun, 03 Aug 2014 17:12:00 +0000ICFP 2014 Post-Mortem
http://r6.ca/blog/20140803T030905Z.html
I participated in the ICFP programming contest this year.
This year’s task was to write an AI for a simplified Pac-Man game called Lambda-Man.
You could write the AI in any language you wanted, as long as it compiles to a specific SECD machine architecture invented for the contest.
At the end of the lightning round, it was announced that the final task included writing an AI for the ghosts as well.
Again, the ghost AI could be written in any language, as long as it compiles to a separate 8-bit architecture invented for the contest.
I spent the first several hours implementing my own simulator of the arcade.
Eventually I realized that I would have to start working on the AI if I was going to have an entry for the 24-hour lightning division.
It was at that point I realized that the provided on-line simulator was plenty adequate for my needs and I never completed my simulator.
I have some previous experience writing assembler DSLs in Haskell to handle linking.
After the 2006 ICFP contest, our team wrote a fake UM-DOS shell so that we could submit our solution in UM format.
This led me to write an article in The Monad Reader about how to write an assembler using recursive do.
After that, I encountered a really elegant and simple formulation of an assembler monad on some paste site.
Unfortunately, I do not recall the author, but here is how the implementation looks.
newtype Label = Label { unLabel :: Int }

data ASMMonad w a = ASMMonad { runASM :: Label -> ([w], a) }

instance Monad (ASMMonad w) where
  return a = ASMMonad $ \_ -> ([], a)
  x >>= y  = ASMMonad $ \(Label i) ->
    let (o0, a) = runASM x (Label i)
        (o1, b) = runASM (y a) (Label (i + length o0))
    in (o0 ++ o1, b)

instance MonadFix (ASMMonad w) where
  mfix f = ASMMonad $ \i -> let (o0, a) = runASM (f a) i in (o0, a)

execASM :: ASMMonad w a -> [w]
execASM m = fst $ runASM m (Label 0)
Next, one adds two primitive operations.
The tell function is similar to the version for the writer monad.
The label function returns the current index of the output stream.
tell :: [w] -> ASMMonad w ()
tell l = ASMMonad $ \_ -> (l,())
label :: ASMMonad w Label
label = ASMMonad $ \i -> ([],i)
Lastly, one makes an ASMMonad value for each assembly instruction:
data ASM = LDC Int32  -- load constant
         | LD Int Int -- load variable
         | LDF Label  -- load function
         | ADD
         {- … -}
  deriving Show
ldc x = tell [LDC x]
ld x y = tell [LD x y]
ldf x = tell [LDF x]
add = tell [ADD]
{- … -}
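To see the assembler in action, here is a self-contained sketch of mine (not from the post): it repeats the definitions above, adds the Functor/Applicative instances modern GHC requires plus Eq/Show derivations for demonstration, and uses a tiny hypothetical instruction set to assemble a forward reference with mdo.

```haskell
{-# LANGUAGE RecursiveDo #-}
import Control.Monad.Fix

newtype Label = Label { unLabel :: Int } deriving (Show, Eq)

newtype ASMMonad w a = ASMMonad { runASM :: Label -> ([w], a) }

instance Functor (ASMMonad w) where
  fmap f m = ASMMonad $ \i -> let (o, a) = runASM m i in (o, f a)

instance Applicative (ASMMonad w) where
  pure a = ASMMonad $ \_ -> ([], a)
  mf <*> mx = ASMMonad $ \(Label i) ->
    let (o0, f) = runASM mf (Label i)
        (o1, x) = runASM mx (Label (i + length o0))
    in (o0 ++ o1, f x)

instance Monad (ASMMonad w) where
  x >>= y = ASMMonad $ \(Label i) ->
    let (o0, a) = runASM x (Label i)
        (o1, b) = runASM (y a) (Label (i + length o0))
    in (o0 ++ o1, b)

instance MonadFix (ASMMonad w) where
  mfix f = ASMMonad $ \i -> let (o0, a) = runASM (f a) i in (o0, a)

tell :: [w] -> ASMMonad w ()
tell l = ASMMonad $ \_ -> (l, ())

label :: ASMMonad w Label
label = ASMMonad $ \i -> ([], i)

execASM :: ASMMonad w a -> [w]
execASM m = fst (runASM m (Label 0))

-- Hypothetical miniature instruction set, just for the demo.
data ASM = LDC Int | LDF Label | ADD deriving (Show, Eq)

-- A forward reference: the LDF at address 0 names a label bound later.
example :: [ASM]
example = execASM $ mdo
  tell [LDF fn]      -- address 0, refers forward to fn
  tell [LDC 1]       -- address 1
  fn <- label        -- fn = Label 2
  tell [LDC 2, ADD]  -- addresses 2 and 3
```

Evaluating example yields [LDF (Label 2), LDC 1, LDC 2, ADD]: the forward reference resolves because tell never inspects its label argument, so laziness ties the knot.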
At the risk of jumping ahead too far, my compiler can produce linked assembly code very simply.
The clause below compiles a lambda abstraction to linked SECD assembly using recursive do.
compileH env (Abs vars body) = mdo
  jmp end
  begin <- label
  compileH (update env vars) body
  rtn
  end <- label
  ldf begin
Thanks to recursive do, the first line, jmp end, refers to the end label, which is bound on the second-to-last line.
With a DSL assembler written in Haskell, I turned to creating another DSL in Haskell that compiles to this assembly language.
The SECD machine is designed for Lisp compilers, so I created a little Lisp language.
data Binding a = a := Lisp a

data Lisp a = Var a
            | Const Int32
            | Cons (Lisp a) (Lisp a)
            | Abs [a] (Lisp a)
            | Rec [Binding a] (Lisp a)
            {- … -}
The Abs constructor builds an n-ary lambda function.
The Rec constructor plays the role of letrec to build mutually recursive references.
With some abuse of the Num class and OverloadedStrings, this Lisp DSL is barely tolerable to program with directly in Haskell.
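The "abuse" presumably looks something like the following sketch (my guess at the instances, not the author's code): OverloadedStrings turns string literals into Var nodes, and Num turns integer literals into Const nodes; the App node and @@ operator here are hypothetical stand-ins for the post's application syntax.

```haskell
{-# LANGUAGE OverloadedStrings, FlexibleInstances #-}
import Data.Int (Int32)
import Data.String (IsString (..))

data Lisp a = Var a
            | Const Int32
            | Cons (Lisp a) (Lisp a)
            | App (Lisp a) [Lisp a]  -- hypothetical application node behind @@
            deriving (Show, Eq)

-- String literals denote variables...
instance IsString (Lisp String) where
  fromString = Var

-- ...and integer literals denote constants; the other Num methods are unused.
instance Num (Lisp a) where
  fromInteger = Const . fromInteger
  (+)    = error "Num (Lisp a): only literals are supported"
  (*)    = error "Num (Lisp a): only literals are supported"
  abs    = error "Num (Lisp a): only literals are supported"
  signum = error "Num (Lisp a): only literals are supported"
  negate = error "Num (Lisp a): only literals are supported"

-- Hypothetical infix application, as in the post's @@ operator.
(@@) :: Lisp a -> [Lisp a] -> Lisp a
(@@) = App

expr :: Lisp String
expr = Cons "cmp" ("insert" @@ ["h", 0])
</imports>
```

With these instances, expr elaborates to Cons (Var "cmp") (App (Var "insert") [Var "h", Const 0]) without writing a single constructor by hand.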
Rec [ {- … -}
,"heapNew" := ["cmp"]! (Cons "cmp" 0) -- heap layout 0 = leaf | (Cons (Cons /heap is full/ /value/) (Cons /left tree/ /right tree/))
-- "cmp" @@ ["x","y"] returns true when "x" < "y"
,"heapIsFull" := ["h"]! If (Atom "h") 1 (caar "h")
,"heapInsert" := ["cmpHeap", "v"]! Rec ["cmp" := (car "cmpHeap")
,"insert" := ["heap", "v"]! -- returns (Cons /new heap is full/ /new heap/)
If (Atom "heap") (Cons (Cons 1 "v") (Cons 0 0))
(Rec ["root" := cdar "heap"
,"left" := cadr "heap"
,"right" := cddr "heap"
] $
Rec ["swap" := "cmp" @@ ["v", "root"]] $
Rec ["newRoot" := If "swap" "v" "root"
,"newV" := If "swap" "root" "v"
] $
If (caar "heap" `ou` Not ("heapIsFull" @@ ["left"]))
(Rec ["rec" := "insert" @@ ["left", "newV"]] $
Cons (Cons 0 "newRoot") (Cons "rec" "right"))
(Rec ["rec" := "insert" @@ ["right", "newV"]] $
Cons (Cons ("heapIsFull" @@ ["rec"]) "newRoot") (Cons "left" "rec")))
]
(Cons "cmp" ("insert" @@ [cdr "cmpHeap","v"]))
{- … -}
The @@ operator is infix application for the Lisp language and the ! operator is infix lambda abstraction for the Lisp language.
This Lisp language compiles to the SECD assembly and the assembly is printed out.
The compiler is very simple.
It does not even implement tail call optimization.
There is a bit of an annoying problem with the compiler; the assembly code is structured in exactly the same way that the original Lisp is structured.
In particular, lambda abstractions are compiled directly in place, and since lambda expressions are typically not executed in the location they are declared,
I have to jump over the compiled code.
You can see this happening in the snippet of my compiler above.
I would have preferred to write
compileH env (Abs vars body) = do
  fun <- proc (compileH (update env vars) body)
  ldf fun
where proc is some function that takes an ASMMonad value and sticks the assembly code “at the end” and returns a label
holding the location where the assembly code got stashed.
However, I could not figure out a clever and elegant way of modifying the assembly monad to support this new primitive.
This is something for you to ponder.
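For what it's worth, here is one speculative answer (my sketch, not a solution from the post): thread a second, "deferred" output stream through the monad, have exec append it after the main stream, and let lazy evaluation resolve the deferred addresses. All names here are my own invention.

```haskell
newtype Label = Label Int deriving (Show, Eq)

-- Inputs: current offset in the main stream, and the absolute offset at
-- which this computation's deferred code will eventually be placed.
newtype ASM2 w a = ASM2 { runASM2 :: (Int, Int) -> ([w], [w], a) }

instance Functor (ASM2 w) where
  fmap f m = ASM2 $ \pd -> let (o, d, a) = runASM2 m pd in (o, d, f a)

instance Applicative (ASM2 w) where
  pure a = ASM2 $ \_ -> ([], [], a)
  mf <*> mx = ASM2 $ \(p, d) ->
    let (o0, d0, f) = runASM2 mf (p, d)
        (o1, d1, x) = runASM2 mx (p + length o0, d + length d0)
    in (o0 ++ o1, d0 ++ d1, f x)

instance Monad (ASM2 w) where
  m >>= f = ASM2 $ \(p, d) ->
    let (o0, d0, a) = runASM2 m (p, d)
        (o1, d1, b) = runASM2 (f a) (p + length o0, d + length d0)
    in (o0 ++ o1, d0 ++ d1, b)

tell2 :: [w] -> ASM2 w ()
tell2 l = ASM2 $ \_ -> (l, [], ())

-- Stash the body's code in the deferred stream and return its address.
proc :: ASM2 w a -> ASM2 w Label
proc m = ASM2 $ \(_, d) ->
  let (bo, bd, _) = runASM2 m (d, d + length bo)
  in ([], bo ++ bd, Label d)

-- The deferred stream goes after the main stream; the knot is tied lazily,
-- since output spines never depend on the position arguments.
execASM2 :: ASM2 w a -> [w]
execASM2 m = o ++ d
  where (o, d, _) = runASM2 m (0, length o)

data ASM = LDC Int | LDF Label | RTN deriving (Show, Eq)

-- The "lambda body" is emitted at the end; no jump over it is needed.
demo :: [ASM]
demo = execASM2 $ do
  fn <- proc (tell2 [LDC 42] >> tell2 [RTN])
  tell2 [LDF fn]
-- i.e. [LDF (Label 1), LDC 42, RTN]
```

Whether this counts as clever and elegant is debatable, and nested procs make the offset bookkeeping delicate, but it shows the primitive is at least expressible in the same knot-tying style.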
My Lambda-Man AI, written in my Lisp variant, is fairly simple and similar to other entries.
Lambda-Man searches out the maze for the nearest edible object.
It searches down each path until it hits a junction and inserts the location of the junction into a binary heap.
It also inserts the junction into a binary tree of encountered junctions.
If the junction is already in the binary tree, it does not insert the junction into the heap because it has already considered it.
The closest junction is popped off the heap, and the search is resumed.
There is at least one bit of surprising behaviour.
If there is more than one path from one junction to another, sometimes Lambda-Man ends up taking the longer path.
This behaviour did not seem to be bothersome enough to warrant fixing.
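The search strategy above can be sketched in Haskell (my reconstruction on a toy maze, not the contest code; a real Lambda-Man would enqueue junctions rather than individual cells, and use an actual binary heap instead of the ordered list used here as a stand-in):

```haskell
import qualified Data.Set as Set
import Data.List (insertBy)
import Data.Ord (comparing)

type Pos = (Int, Int)

-- A toy maze: '#' wall, '.' edible pellet, ' ' open floor.
maze :: [String]
maze = [ "#####"
       , "#  .#"
       , "# ###"
       , "#   #"
       , "#####" ]

at :: Pos -> Char
at (r, c) = maze !! r !! c

neighbours :: Pos -> [Pos]
neighbours (r, c) =
  [ p | p <- [(r-1,c), (r+1,c), (r,c-1), (r,c+1)], at p /= '#' ]

-- Dijkstra-style search: pop the closest frontier cell from the priority
-- queue, skip cells already in the visited set (the post's binary tree of
-- encountered junctions), and stop at the first edible cell found.
nearestEdible :: Pos -> Maybe (Int, Pos)
nearestEdible start = go Set.empty [(0, start)]
  where
    go _ [] = Nothing
    go seen ((dist, p) : rest)
      | p `Set.member` seen = go seen rest
      | at p == '.'         = Just (dist, p)
      | otherwise =
          let seen'    = Set.insert p seen
              frontier = foldr (insertBy (comparing fst)) rest
                               [ (dist + 1, q) | q <- neighbours p ]
          in go seen' frontier
```

Starting from the bottom corridor at (3,1), the search finds the pellet at (1,3) four steps away. Because visited cells are skipped rather than re-keyed, a duplicate route can be popped with a longer distance first, which mirrors the "sometimes takes the longer path" behaviour the post describes.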
This programming task has renewed my appreciation for typed languages.
The Lisp language I developed is untyped, and I made several type errors programming in it.
Although it is true that I did detect (all?) my errors at run-time, they were still frustrating to debug.
In a typed language, when an invariant enforced by the type system is violated, you get a compile time error that, more or less, points to the code location where the invariant is violated.
In an untyped language, when an invariant is violated, you get a run-time error that, more or less, points to some point in the code where the missing invariant has caused a problem.
While this often is enough to determine what invariant was violated, I had little idea where the code breaking the invariant was located.
With some effort I probably could have used GADTs to bring Haskell’s type checker to the Lisp DSL, but I was not confident enough I could pull that off in time.
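The kind of GADT alluded to might look like this minimal sketch (my illustration, not the author's code): the Haskell type parameter tracks the embedded expression's type, so an ill-typed term such as adding a pair to an integer is rejected at compile time.

```haskell
{-# LANGUAGE GADTs #-}

-- A typed expression language: the index t is the type of the value the
-- expression denotes, enforced by each constructor's signature.
data Expr t where
  ConstE :: Int -> Expr Int
  PairE  :: Expr a -> Expr b -> Expr (a, b)
  FstE   :: Expr (a, b) -> Expr a
  AddE   :: Expr Int -> Expr Int -> Expr Int

-- The evaluator needs no run-time type checks: pattern matching on a GADT
-- constructor refines t, so each clause is statically well-typed.
eval :: Expr t -> t
eval (ConstE n)  = n
eval (PairE a b) = (eval a, eval b)
eval (FstE p)    = fst (eval p)
eval (AddE a b)  = eval a + eval b
```

For example, AddE (ConstE 1) (FstE (PairE (ConstE 2) (ConstE 3))) evaluates to 3, while AddE (ConstE 1) (PairE (ConstE 2) (ConstE 3)) simply does not compile.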
I also needed to write some ghost AIs.
The 8-bit machine that the ghosts run on is so constrained (256 bytes of data memory, 256 code locations, 8 registers) that it seemed to make sense to write the code in raw assembly.
The first thing I tried was to make the ghosts move randomly.
This meant I needed to write my own pseudo-random number generator.
Wikipedia led me to a paper on how to write long-period xorshift random number generators.
The examples in that paper are all for 32-bit or 64-bit machines, but I had an 8-bit architecture.
I wrote a little Haskell program to find analogous random number generators for 8-bit machines.
It found 6 possibilities for a random number generator with 32 bits of state, composed of four 8-bit words, that satisfied the xorshift constraints described in the paper.
Here is the assembly code for producing a 2-bit pseudo-random value.
mov a,[0]   ; a := s0
div a,2     ; a := s0 >> 1
xor [0],a   ; s0 := s0 xor (s0 >> 1)   (call this u)
mov a,[0]   ; a := u
mul a,2     ; a := u << 1
xor a,[0]   ; a := u xor (u << 1)      (the xorshift "t")
mov [0],[1] ; rotate the state words down:
mov [1],[2] ;   s0 := s1, s1 := s2, s2 := s3
mov [2],[3]
mul [3],8   ; s3 := s3 << 3
xor [3],[2] ; s3 := (s3 << 3) xor s3
xor [3],a   ; s3 := s3 xor t
; get 2 bits
mov a,[3]
div a,64    ; a := s3 >> 6, the top two bits
The random seed is held in memory locations [0] through [3].
After advancing to the successor state, this code takes 2 pseudo-random bits from memory location [3] and puts them into register a.
I did not check the quality of this random number generator beyond constructing it so that it has a period of 2^32 - 1.
I expect the bit stream to appear to be quite random.
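Transcribed into Haskell (my reading of the assembly above, not the author's search program), the step function looks like this; it is handy for experimenting with the bit stream off the 8-bit machine:

```haskell
import Data.Bits (shiftL, shiftR, xor)
import Data.Word (Word8)

-- The four state bytes held in memory locations [0]..[3].
type Seed = (Word8, Word8, Word8, Word8)

-- One generator step: the new state plus a 2-bit pseudo-random value.
-- Word8 arithmetic wraps, matching the 8-bit machine's mul instruction.
step :: Seed -> (Seed, Word8)
step (s0, s1, s2, s3) =
  let u   = s0 `xor` (s0 `shiftR` 1)           -- div a,2 ; xor [0],a
      t   = u `xor` (u `shiftL` 1)             -- mul a,2 ; xor a,[0]
      s3' = ((s3 `shiftL` 3) `xor` s3) `xor` t -- mul [3],8 ; xor ; xor
  in ((s1, s2, s3, s3'), s3' `shiftR` 6)       -- top two bits of [3]

-- Infinite stream of 2-bit values from a seed.
stream :: Seed -> [Word8]
stream s = let (s', r) = step s in r : stream s'
```

Since the transition is linear over GF(2), the all-zero seed is a fixed point and must be avoided, exactly as with the 32-bit xorshift generators in the paper.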
My Lambda-Man performed reasonably well against my random ghosts, so I put some effort into making my random ghosts a little smarter.
I wrote a ghost AI that tried to get above Lambda-man and attack him from above.
Then I made each other ghost try to attack Lambda-man from the other three directions in the same manner.
The idea is to try to trap Lambda-man between two ghosts.
These smarter ghosts were quite a bit more successful against my simple Lambda-man AI.
At this point I was out of contest time, so that was it for my 2014 ICFP contest submission.
Thanks to the organizers for a terrific contest problem.
I am looking forward to seeing the final rankings.
http://r6.ca/blog/20140803T030905Z.htmlSun, 03 Aug 2014 03:09:05 +0000HsQML 0.3.1.1 released: One Thousand Downloads
http://blog.gekkou.co.uk/2014/08/hsqml-0311-released.html
A few days ago I released HsQML 0.3.1.1, a bug-fix release of my Haskell binding to the Qt Quick GUI library. You can download the latest release from Hackage as usual.

The primary purpose of this release was to fix issue 20. HsQML has code which monitors variant values created using the Qt library in order to prevent objects which are referenced by a variant from being garbage collected. A flaw in this code caused it to examine the data held inside variants even when it wasn't valid, causing a crash in certain circumstances.

release-0.3.1.1 - 2014.07.31
  * Fixed crash when storing Haskell objects in QML variants.
  * Fixed corrupted logging output caused by threading.

In related news, HsQML has now reached over 1000 downloads from Hackage since Hackage 2 started collecting download statistics late last year. See the bar chart below. The spike in May was due to the transition to Qt 5 brought about by the release of 0.3.0.0. Hopefully, the graph will climb to new heights with the release of more features in the future!

My target for the next release is to support rendering OpenGL graphics directly from Haskell code into the QML scene, to better support applications with sophisticated requirements for custom drawing. This is tracked by issue 10.
Robin KAYtag:blogger.com,1999:blog-4001434382262009587.post-5222699513589427989Sat, 02 Aug 2014 11:59:00 +00001HaskellADay July 2014 problems and solutions
http://logicaltypes.blogspot.com/2014/08/1haskelladay-july-2014-problems-and.html
July 1st, 2014: (text:) "Hi, all! @1HaskellADay problems are back! #YAY First (renewed) problem: a verbose way of saying, 'Give me a deque!' http://lpaste.net/106748" Deque, last, and all that (verbose version with hints) (solution: Deque the halls (with my solution): Data.Deque)July 2nd, 2014: (text:) "Today's Haskell exercise: Vectors, length in constant time, and (bonus) reverse return in constant time. http://lpaste.net/106843" Vector (solution: Vector: Magnitude, and Direction, OH, YEAH! Data.Vector)July 4th, 2014: (text:) "Today's exercise(s). pack/unpack. encode/decode. Cheer up, sweet B'Genes! http://lpaste.net/106912" Cheer up, Sweet B'Genes (solution: GATTACA)July 7th, 2014: (text:) "#haskell daily exercise: ROLL with it, Baby! http://lpaste.net/107047 ('cause I'm feeling a little #forth'y')" Roll (solution: Rollin' on the (finite) river)Bonus problem: July 7th, 2014: (text:) "For those who found the 'roll'-exercise trivial; here's (a more than) a bit more of a challenge for you to play with. http://lpaste.net/107023" Acid rules! (solution: "A solution set to today's challenge exercise: BASIC ... http://lpaste.net/107069 ... and Acidic http://lpaste.net/107071 ... WHEW! That was fun!" BASIC ... and Acitic)July 8th, 2014: (text:) "Today's #Haskell exercise: LOTTO! Powerball! Mega-whatever! Who's the big winner? http://lpaste.net/107104" Lotto (solution: "And the big winner today is ... solution-set to today's #Haskell lotto exercise http://lpaste.net/107130" ... and the winner is ...)Bonus problem: July 8th, 2014: (text:) "#bonus #haskell exercise: Well, that was RND... Randomness, and such (or 'as such') http://lpaste.net/107105" Well, that was RND (solution: For YESTERDAY's bonus question of roll-your-own-rnd-nbr-generator, here's one as comonadic cellular automata (*WHEW*) http://lpaste.net/107205: Data.Random)July 9th, 2014: (text:) "Okay, ... WHERE did yesterday and today GO? :/ #haskell exercise today: "Hey, Buddy!" 
http://lpaste.net/107181 I will post solution in 4 hours" Hey, Buddy! Distinct sets-of-an-original-set. (solution: "Here's a story ..." A(n inefficient) solution to bunches and cliques. http://lpaste.net/107273" Brady Bunch)July 10th, 2014: (text:) "Today's #haskell list-exercise: "Get out of the pool!" http://lpaste.net/107286 Will post a solution at 9 pm EDT (which is what time CET? ;)" (solution: "She's a supa-freak! She's supa-freaky! (Bass riff) A solution to today's #haskell exercise about list-length-ordering http://lpaste.net/107308")July 11th, 2014: (text:) ""It's Friday, Friday!" So, does that mean Rebecca Black wants to code #haskell, too? Today is a Calendar #exercise http://lpaste.net/107328" (solution: ""In a New York Minute": a solution to today's #haskell exercise that took WAAAY more than a minute to complete! #WHEW http://lpaste.net/107370")July 14th, 2014: (text:) "Today's #haskell exercise: isPrime with some numbers to test against. They aren't even really Mp-hard. ;) http://lpaste.net/107463" First stab at primality-test (solution: "A simple, straightforward stab at the test for primality. #haskell #exercise http://lpaste.net/107483" The start of a primal inquiryJuly 15th, 2014: (text:) "Primes and ... 'not-primes.' For a prime, p, a run of p-consecutive 'plain' numbers is today's #haskell exercise: http://lpaste.net/107536" (solution: "So ya gotta ask yerself da question: are ya feelin' lucky, punk? Run of p non-primes in linear time http://lpaste.net/107549 #haskell exercise." Alternate solution by Gautier: http://lpaste.net/107549)July 16th, 2014: (text:) "Difference lists? We no need no steenkin' Difference lists!" http://lpaste.net/107593 DList in #haskell for today's exercise. (solution: "DLists? We got'cher DList right here! A solution to today's #haskell exercise is posted at http://lpaste.net/107607")July 17th, 2014 (text:) "http://lpaste.net/107652 Prélude à l'après-midi d'un Stream ... I thought that last word was ... 
something else...? #haskell exercise today." Comonads for lists and Id. (solution: "Control.Comonad: That was easy! http://lpaste.net/107661 #haskell exercise #solution" Learn you a Comonad for Greater Good. Funny story, bro'! id is not necessarily Id. (I knew that.) http://lpaste.net/107662 #haskell solution")Bonus exercise: July 17th, 2014 (text:) "Streams are natural, streams are fun, streams are best when ... THEY'RE BONUS QUESTIONS! #bonus #haskell exercise http://lpaste.net/107655" LET'S GET THIS PARTY STARTED! (solution: "Take this Stream and ... it! #solution to today's #haskell #bonus exercises http://lpaste.net/107665")July 18th, 2014: (text: "Today's #haskell exercise: Frère Mersenne would like a prime, please. http://lpaste.net/107700") (see solution next bullet)Bonus exercise: July 18th, 2014 (text: "#bonus prime-time! Frère Mersenne would be pleased with a partial proof of a prime ... in good time. http://lpaste.net/107701") (solution: "A #haskell #solution for (monadic?) primes and the #bonus interruptible primes. http://lpaste.net/107708") Primary primes.Bonus-bonus exercise: July 18th, 2014 (text: "Ooh! π-charts! No. Wait. #bonus-bonus #haskell exercise. http://lpaste.net/107702") (solution: "#bonus-bonus: a #solution http://lpaste.net/107718")July 21st, 2014: (text: "Demonstrating coprimality of two integers with examples #haskell exercise http://lpaste.net/107819") (solution: "A coprimes solution #haskell problem is at http://lpaste.net/107843")July 22nd, 2014: (text: "The prime factors of a number (and grouping thereof) as today's #haskell exercise. http://lpaste.net/107878") (solution: "OKAY, THEN! Some prime factors for ya, ... after much iteration (torquing) over this #haskell exercise solution. 
http://lpaste.net/107939")Bonus exercise: July 22nd, 2014: ("For today's #bonus #haskell exercise you'll find a Bag 'o gold at the end of the rainbow http://lpaste.net/107881") (solution: "Second things first: a definition for the Bag data type as today's #bonus #haskell exercise. http://lpaste.net/107815")July 23rd, 2014: (text: "Today's #haskell exercise, two variations of Euler's totient function http://lpaste.net/107955") (solution: "And, for a very small φ ... http://lpaste.net/107972 is a solution-set to today's #haskell exercise.")July 24th, 2014: (text: "WEAKSAUCE! Goldbach's conjecture irreverently presented as a #haskell exercise. http://lpaste.net/108019") (solution: "That solution to today's #haskell exercise will cost you one Goldbach (*groan!*) http://lpaste.net/108059")July 25th, 2014: LOGIC! Peano series: it's as easy as p1, p2, p3 ... http://lpaste.net/108099 ... in today's #haskell exercise. "Excuse me, Miss, where do we put this Grande Peano?" A solution to today's #Haskell exercise in the HA!-DSL http://lpaste.net/108140Bonus: July 25th, 2014: http://lpaste.net/108108 Bonus #haskell problem for today. But not as easy as λa, λb, λc ... Church numerals and booleans. Correction: Ooh! forall! Church encodings and Haskell have a funny-kind of relationship. Updated the #bonus #haskell exercise with rank-types and forall. Solution: "Gimme that olde-time Church encoding ... it's good enough for me!" A solution to today's #bonus #haskell exercise http://lpaste.net/108114July 28th, 2014: George Boole, I presume? Today's #haskell exercise: http://lpaste.net/108272. Solution: This and-or That ... a NAND-implementation of today's #haskell exercise at http://lpaste.net/108295July 29th, 2014: Readin' 'Ritin' 'Rithmetic: today's #haskell exercise http://lpaste.net/108358 Solution: That's alotta NANDs! A solution to today's exercise at http://lpaste.net/108387July 30th, 2014: ACHTUNG! BlinkenLights! Today's #haskell exercise http://lpaste.net/108420. 
Solution: Let it Snow! Let it Snow! Let it (binary) Snow! A solution to today's exercise is at http://lpaste.net/108429

July 31st, 2014: π-time! http://lpaste.net/108485 Today's #haskell exercise. BLARG! UPDATE! Please read the update attached to the problem statement, simplifying the calculation quite a bit: http://lpaste.net/108485 Solution: Apple or coconut π? A solution to today's problem http://lpaste.net/108494

Notes on the problems

July 9th, 2014. I didn't quite know how to go about this, so I made several attempts with the State pattern. But how to describe it? It's the base pool from which you draw, and each (sub-)choice-point affects it; what's that type? I spent way too much time trying to discern the type, and failing. But then a much simpler approach suggested itself to me (after experiencing the New York Minute exercise): this is simply a permutation of the list, and then that permutation is partitioned by the sizes of the groups! Implementing permute-then-partition is a much simpler approach than tracking some monster monadic state transformer. No, that didn't work, either. A permutation will give you [[1,2], ...] and [[2,1], ...], that is, all solutions, even the redundant ones. So I reworked the problem, simply following the data. With takeout feeding the iterative-deepening function, I finally got a guarded state-like thingie working fast and correctly. The new solution is on the same page as the old one.

July 11th, 2014. The New York Minute problem demonstrates the implementation of a rule-based classifier. It takes unclassified numeric inputs and, based on cues from the rules, either types each number as day, month, year, hour, or minute, or rejects the input data as unclassifiable. I was pleased to have implemented this entire system in less than two days of work! Sweet!

July 22nd, 2014.
Ever since I gave the solution to the question of demonstrating the Prime Number Theorem, I've been running up against the double-exponential cost of computing a stream of primes. So now I have to tackle bringing that extraordinary, or unreasonable, cost down to something usable, and along those lines (of feasibility), I'm thinking that, instead of regenerating and re-searching the primes-ish stream, we keep some kind of State-like thing of ([already generated primes], indexed-primesish) ... something like that. Solution: "General improvement of problem-solving modules in anticipation of solving today's #haskell exercise, including primes: http://lpaste.net/107480"
geophftag:blogger.com,1999:blog-4650294074444534066.post-6974373927895013304Fri, 01 Aug 2014 21:57:00 +0000Letter to a Young Haskell Enthusiast
http://comonad.com/reader/2014/letter-to-a-young-haskell-enthusiast/
The following letter is not about what "old hands" know and newcomers do not. Instead, it is about lessons that we all need to learn more than once, and remind ourselves of. It is about tendencies that are common, and understandable, and come with the flush of excitement of learning any new thing that we understand is important, and about the difficulty, always, in trying to decide how best to convey that excitement and sense of importance to others, in a way that they will listen. It is written more specifically, but only because I have found that if we don't talk specifics as well as generalities, the generalities make no sense. This holds for algebraic structures, and it holds for other, vaguer concepts no less. It is a letter full of things I want to remember, as well as of advice I want to share. I expect I will want to remind myself of it when I encounter somebody who is wrong on the internet, which, I understand, may occur on rare occasion.
You’ve recently entered the world of strongly typed functional programming, and you’ve decided it is great. You’ve written a program or two or a library or two, and you’re getting the hang of it. You hop on IRC and hear new words and ideas every day. There are always new concepts to learn, new libraries to explore, new ways to refactor your code, new typeclasses to make instances of.
Now, you’re a social person, and you want to go forth and share all the great things you’ve learned. And you have learned enough to distinguish some true statements from some false statements, and you want to go and slay all the false statements in the world.
Is this really what you want to do? Do you want to help people, do you want to teach people new wonderful things? Do you want to share the things that excite you? Or do you want to feel better about yourself, confirm that you are programming better, confirm that you are smarter and know more, reassure yourself that your adherence to a niche language is ok by striking out against the mainstream? Of course, you want to do the former. But a part of you probably secretly wants to do the latter, because in my experience that part is in all of us. It is our ego, and it drives us to great things, but it also can hold us back, make us act like jerks, and, worst of all, stand in the way of communicating with others about what we truly care about.
Haskell wasn’t built on great ideas, although it has those. It was built on a culture of how ideas are treated. It was not built on slaying others’ dragons, but on finding our own way; not tearing down rotten ideas (no matter how rotten) but showing by example how we didn’t need those ideas after all.
In functional programming, our proofs are not by contradiction, but by construction. If you want to teach functional programming, or preach functional programming, or just to even have productive discussions as we all build libraries and projects together, it will serve you well to learn that ethic.
You know better than the next developer, or so you think. This is because of something you have learned. So how do you help them want to learn it too? You do not tell them this is a language for smart people. You do not tell them you are smart because you use this language. You tell them that types are for fallible people, like we all are. They help us reason and catch our mistakes, because while software has grown more complex, we’re still stuck with the same old brains. If they tell you they don’t need types to catch errors, tell them that they must be much smarter than you, because you sure do. But even more, tell them that all the brainpower they use to not need types could turn into even greater, bigger, and more creative ideas if they let the compiler help them.
This is not a language for clever people, although there are clever things that can be done in this language. It is a language for simple things and clever things alike, and sometimes we want to be simple, and sometimes we want to be clever. But we don’t give bonus points for being clever. Sometimes, it’s just fun, like solving a crossword puzzle or playing a tricky Bach prelude, or learning a tango. We want to keep simple things simple so that tricky things are possible.
It is not a language that is “more mathematical” or “for math” or “about math”. Yes, in a deep formal sense, programming is math. But when someone objects to this, this is not because they are a dumb person, a bad person, or a malicious person. They object because they have had a bad notion of math foisted on them. “Math” is the thing that people wield over them to tell them they are not good enough, that they cannot learn things, that they don’t have the mindset for it. That’s a dirty lie. Math is not calculation — that’s what computers are for. Nor is math just abstract symbols. Nor is math a prerequisite for Haskell. If anything, Haskell might be what makes somebody find math interesting at all. Our equation should not be that math is hard, and so programming is hard. Rather, it should be that programming can be fun, and this means that math can be fun too. Some may object that programming is not only math, because it is engineering as well, and creativity, and practical tradeoffs. But, surprisingly, these are also elements of the practice of math, if not the textbooks we are given.
I have known great Haskell programmers, and even great computer scientists who know only a little linear algebra maybe, or never bothered to pick up category theory. You don’t need that stuff to be a great Haskell programmer. It might be one way. The only thing you need category theory for is to take great categorical and mathematical concepts from the world and import them back to programming, and translate them along the way so that others don’t need to make the same journey you did. And you don’t even need to do that, if you have patience, because somebody else will come along and do it for you, eventually.
The most important thing, though not the hardest, about teaching and spreading knowledge is to emphasize that this is for everyone. Nobody is too young, too inexperienced, too old, too set in their ways, too excitable, insufficiently mathematical, etc. Believe in everyone, attack nobody, even the trolliest.* Attacking somebody builds a culture of sniping and argumentativeness. It spreads to the second trolliest, and so forth, and then eventually to an innocent bystander who just says the wrong thing to spark bad memories of the last big argument.
The hardest thing, and the second most important, is to put aside your pride. If you want to teach people, you have to empathize with how they think, and also with how they feel. If your primary goal is to spread knowledge, then you must be relentlessly self-critical of anything you do or say that gets in the way of that. And you don’t get to judge that — others do. And you must just believe them. I told you this was hard. So if somebody finds you offputting, that’s your fault. If you say something and somebody is hurt or takes offense, it is not their fault for being upset, or feeling bad. This is not about what is abstractly hurtful in a cosmic sense; it is about the fact that you have failed, concretely, to communicate as you desired. So accept the criticism, apologize for giving offense (not just for having upset someone but also for what you did to hurt them), and attempt to learn why they feel how they feel, for next time.
Note that if you have made somebody feel crummy, they may not be in a mood to explain why or how, because their opinion of you has already plummeted. So don’t declare that they must or should explain themselves to you, although you may politely ask. Remember that knowledge does not stand above human behavior. Often, you don't need to know exactly why a person feels the way they do, only that they do, so you can respect that. If you find yourself demanding explanations, ask yourself, if you knew this thing, would that change your behavior? How? If not, then learn to let it go.
Remember also that they were put off by your actions, not by your existence. It is easy to miss this distinction and react defensively. "Fight-or-flight" stands in the way of clear thinking and your ability to empathize; try taking a breath and maybe a walk until the adrenaline isn't derailing your true intentions.
Will this leave you satisfied? That depends. If your goal is to understand everything and have everybody agree with regards to everything that is in some sense objectively true, it will not. If your goal is to have the widest, nicest, most diverse, and most fun Haskell community possible, and to interact in an atmosphere of mutual respect and consideration, then it is the only thing that will leave you satisfied.
If you make even the most modest (to your mind) mistake, be it in social interaction or technical detail, be quick to apologize and retract, and do so freely. What is there to lose? Only your pride. Who keeps track? Only you. What is there to gain? Integrity, and ultimately that integrity will feel far more fulfilling than the cheap passing thrills of cutting somebody else down or deflecting their concerns.
Sometimes it may be, for whatever reason, that somebody doesn’t want to talk to you, because at some point your conversation turned into an argument. Maybe they did it, maybe you did it, maybe you did it together. It doesn’t matter, learn to walk away. Learn from the experience how to communicate better, how to avoid that pattern, how to always be the more positive, more friendly, more forward-looking. Take satisfaction in the effort in that. Don’t talk about them behind their back, because that will only fuel your own bad impulses. Instead, think about how you can change.
Your self-esteem doesn’t need your help. You may feel you need to prove yourself, but you don't. Other people, in general, have better things to do with their time than judge you, even when you may sometimes feel otherwise. You know you’re talented, that you have learned things, and built things, and that this will be recognized in time. Nobody else wants to hear it from you, and the more they hear it, the less they will believe it, and the more it will distract from what you really want, which is not to feed your ego, not to be great, but to accomplish something great, or even just to find others to share something great with. In fact, if anyone's self-esteem should be cared for, it is that of the people you are talking to. The more confident they are in their capacity and their worth, the more willing they will be to learn new things, and to acknowledge that their knowledge, like all of ours, is limited and partial. You must believe in yourself to be willing to learn new things, and if you want to cultivate more learners, you must cultivate that self-belief in others.
Knowledge is not imposing. Knowledge is fun. Anyone, given time and inclination, can acquire it. Don’t only lecture, but continue to learn, because there is always much more than you know. (And if there wasn’t, wow, that would be depressing, because what would there be to learn next?) Learn to value all opinions, because they all come from experiences, and all those experiences have something to teach us. Dynamic typing advocates have brought us great leaps in JIT techniques. If you’re interested in certain numerical optimizations, you need to turn to work pioneered in C++ or Fortran. Like you, I would rather write in Haskell. But it is not just the tools that matter but the ideas, and you will find they come from everywhere.
In fact, we have so much to learn that we direct our learning by setting up barriers — declaring certain tools, fields, languages, or communities not worth our time. This isn’t because they have nothing to offer, but it is a crutch for us to shortcut evaluating too many options all at once. It is fine, and in fact necessary, to narrow the scope of your knowledge to increase its depth. But be glad that others are charting other paths! Who knows what they will bring back from those explorations.
If somebody is chatting about programming on the internet, they’re already ahead of the pack, already interested in craft and knowledge. You may not share their opinions, but you have things to learn from one another, always. Maybe the time and place aren’t right to share ideas and go over disputes. That’s ok. There will be another time and place, or maybe there won’t be. There is a big internet full of people, and you don’t need to be everybody’s friend or everybody’s mentor. You should just avoid being anybody’s enemy, because your time and theirs is too precious to waste it on hard feelings instead of learning new cool stuff.
This advice is not a one-time proposition. Every time we learn something new and want to share it, we face these issues all over again -- the desire to proclaim, to overturn received wisdom all at once -- and the worse the received wisdom, the more vehemently we want to strike out. But if we are generous listeners and attentive teachers, we not only teach better and spread more knowledge, but also learn more, and enjoy ourselves more in the process. To paraphrase Rilke’s “Letter to a Young Poet”: Knowledge is good if it has sprung from necessity. In this nature of its origin lies the judgement of it: there is no other.
Thanks to the various folks in and around the Haskell world who have helped me refine this article. I don't name you only because I don't want to imply your endorsement, or give what is still, at base, a very personal take, any particular sort of imprimatur of a broader group of people, all of whom I suspect will disagree among themselves and with me about various specifics.
*: It has been pointed out to me that this advice is not universal. Clearly there are some things that deserve more pointed responses: bigotry, outright harassment, poisonous behavior, and the like. So please read this paragraph only as it applies to talking about technical issues, not as regards many other things, where there are people better equipped than I am to give advice.

Gershom Bazerman, http://comonad.com/reader/?p=933, Fri, 01 Aug 2014 06:13:58 +0000

A few new papers
http://existentialtype.wordpress.com/2014/07/21/a-few-new-papers/
I’ve just updated my web page with links to some new papers that are now available:
“Homotopical Patch Theory” by Carlo Angiuli, Ed Morehouse, Dan Licata, and Robert Harper. To appear, ICFP, Gothenburg, October 2014. We’re also preparing an expanded version with a new appendix containing material that didn’t make the cut for ICFP. (Why do we still have such rigid space limitations? And why do we have such restricted pre-publication deadlines as we go through the charade of there being a “printing” of the proceedings? One day soon CS will step into its own bright new future.) The point of the paper is to show how to apply basic methods of homotopy theory to various equational theories of patches for various sorts of data. One may see it as an application of functorial semantics in HoTT, in which theories are “implemented” by interpretation into a universe of sets. The patch laws are necessarily respected by any such interpretation, since they are just cells of higher dimension and functors must behave functorially at all dimensions.
“Cache Efficient Functional Algorithms” by Guy E. Blelloch and Robert Harper. To appear, Comm. ACM Research Highlight this fall. Rewritten version of POPL 2013 paper meant for a broad CS audience. Part of a larger effort to promote integration of combinatorial theory with logical and semantic theory, two theory communities that, in the U.S. at least, ignore each other completely. (Well, to be plain about it, it seems to me that the ignoring goes more in one direction than the other.) Cost semantics is one bridge between the two schools of thought, abandoning the age-old “reason about the compiled code” model used in algorithm analysis. Here we show that one can reason about spatial locality at the abstract level, without having to drop down to the low level of how data structures are represented and allocated in memory.
“Refining Objects (Preliminary Summary)” by Robert Harper and Rowan Davies. To appear, Luca Cardelli 60th Birthday Celebration, Cambridge, October, 2014. A paper I’ve been meaning to write sometime over the last 15 years, and finally saw the right opportunity, with Luca’s symposium coming up and Rowan Davies visiting Carnegie Mellon this past spring. Plus it was a nice project to get me started working again after I was so rudely interrupted this past fall and winter. Provides a different take on typing for dynamic dispatch that avoids the ad hoc methods introduced for oop, and instead deploys standard structural and behavioral typing techniques to do more with less. This paper is a first cut, as a proof of concept, but it is clear that much more can be said here, all within the framework of standard proof-theoretic and realizability-theoretic interpretations of types. It would help to have read the relevant parts of PFPL, particularly the under-development second edition, which provides the background elided in the paper.
“Correctness of Compiling Polymorphism to Dynamic Typing” by Kuen-Bang Hou (Favonia), Nick Benton, and Robert Harper, draft (summer 2014). Classically polymorphic type assignment starts with untyped λ-terms and assigns types to them as descriptions of their behavior. Viewed as a compilation strategy for a polymorphic language, type assignment is rather crude in that every expression is compiled in uni-typed form, complete with the overhead of run-time classification and class checking. A more subtle strategy is to maintain as much structural typing as possible, resorting to the use of dynamic typing (recursive types, naturally) only for variable types. The catch is that polymorphic instantiation requires computation to resolve the incompatibility between, say, a bare natural number, which you want to compute with, and its encoding as a value of the one true dynamic type, which you never want but are stuck with in dynamic languages. In this paper we work out an efficient compilation scheme that maximizes statically available information, and makes use of dynamic typing only insofar as the program demands we do so. Of course there are better ways to compile polymorphism, but this style is essentially forced on you by virtual machines such as the JVM, so it is worth studying the correctness properties of the translation, which we do here making use of a combination of structural and behavioral typing.
I hope to comment here more fully on these papers in the near future, but I also have a number of other essays queued up to go out as soon as I can find the time to write them. Meanwhile, other deadlines loom large.
[Update: added fourth item neglected in first draft. Revise formatting. Add links to people. Brief summary of patch theory paper. Minor typographical corrections.]
[Update: the promised expanded version of the forthcoming ICFP paper is now available.]

Filed under: Programming, Research. Tagged: behavioral typing, cache efficient algorithms, compilation, cost semantics, dynamic dispatch, homotopy type theory, ICFP, polymorphism, structural typing, type refinements

Robert Harper, http://existentialtype.wordpress.com/?p=558, Mon, 21 Jul 2014 17:11:44 +0000
http://www.well-typed.com/blog/96
Haskell programmers tend to spend far less time with debuggers than programmers in other languages. Partly this is because debuggers are of limited value for pure code anyway, and Haskell’s type system is so strong that a lot of bugs are caught at compile time rather than at runtime. Moreover, Haskell is a managed language – like Java, say – and errors are turned into exceptions. Unlike in unmanaged languages such as C, “true” runtime errors such as segmentation faults almost never happen.
I say “almost” because they can happen: either because of bugs in ghc or the Haskell runtime, or because we are doing low level stuff in our own Haskell code. When they do happen we have to drop down to a system debugger such as lldb or gdb, but debugging Haskell at that level can be difficult because Haskell’s execution model is so different from the execution model of imperative languages. In particular, compiled Haskell code barely makes any use of the system stack or function calls, and uses a continuation passing style instead (see my previous blog posts Understanding the Stack and Understanding the RealWorld). In this blog post I will explain a technique I sometimes use to help diagnose low-level problems.
Since I work on OSX I will be using lldb as my debugger. If you are using gdb you can probably use similar techniques; The LLDB Debugger shows how gdb and lldb commands correlate, and the ghc wiki also lists some tips. However, I have no experience with scripting gdb, so your mileage may vary.
Description of the problem
As our running example I will use a bug that I was tracking down in a client project. The details of the project don’t matter so much, except that this project happens to use the GHC API to compile Haskell code—at runtime—into bytecode and then run it; moreover, it also—dynamically, at runtime—loads C object files into memory.
In one example run it loads the (compiled) C code
#include <stdio.h>

void hello(void) {
    printf("hello\n");
}
and then compiles and runs this Haskell code:
{-# LANGUAGE ForeignFunctionInterface #-}
module Main where
foreign import ccall "hello" hello :: IO ()
main = hello
Sadly, however, this resulted in a total system crash.
Starting point
By attaching lldb to the running process we got a tiny bit more information about the crash:
* thread #4: tid = 0x3550aa, 0x000000010b3b8226, stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
frame #0: 0x000000010b3b8226
-> 0x10b3b8226: addb %al, (%rax)
0x10b3b8228: addb %al, (%rax)
0x10b3b822a: addb %al, (%rax)
0x10b3b822c: addb %al, (%rax)
It turns out we have a null-pointer dereference here. Anybody who has spent any time debugging Intel assembly code, however, will realize that this particular instruction
addb %al, (%rax)
is in fact the decoding of zero:
(lldb) memory read -c 8 0x10b3b8226
0x10b3b8226: 00 00 00 00 00 00 00 00 ........
In other words, chances are good we were never meant to execute this instruction at all. Unfortunately, asking lldb for a backtrace tells us absolutely nothing new:
(lldb) bt
* thread #4: tid = 0x3550aa, 0x000000010b3b8226, stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
* frame #0: 0x000000010b3b8226
Finding a call chain
The lack of a suitable backtrace in lldb is not surprising, since compiled Haskell code barely makes use of the system stack. Instead, the runtime maintains its own stack, and code is compiled into a continuation passing style. For example, if we have the Haskell code
functionA :: IO ()
functionA = do .. ; functionB ; ..
functionB :: IO ()
functionB = do .. ; functionC ; ..
functionC :: IO ()
functionC = .. crash ..
main :: IO ()
main = functionA
and we step through the execution of this program in lldb and ask for a backtrace when we start executing function A, all we get is
(lldb) bt
* thread #1: tid = 0x379731, 0x0000000100000a20 Main`A_functionA_info, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
* frame #0: 0x0000000100000a20 Main`A_functionA_info
with no mention of main. Similarly, the backtraces on entry to functions B and C are
* thread #1: tid = 0x379731, 0x0000000100000b90 Main`B_functionB_info, queue = 'com.apple.main-thread', stop reason = breakpoint 2.1
* frame #0: 0x0000000100000b90 Main`B_functionB_info
and
* thread #1: tid = 0x379731, 0x0000000100000c88 Main`C_functionC_info, queue = 'com.apple.main-thread', stop reason = breakpoint 3.1
* frame #0: 0x0000000100000c88 Main`C_functionC_info
none of which is particularly informative. However, stepping manually through the program we do first see function A on the (singleton) call stack, then function B, and finally function C. Thus, by the time we reach function C, we have discovered a call chain A, B, C—it’s just that it involves quite a bit of manual work.
Scripting lldb
Fortunately, lldb can be scripted (see Using Scripting and Python to Debug in LLDB and the LLDB Python Reference). What we want to do is keep stepping through the code, showing the top-level (and quite possibly only) function at the top of the call stack at each step, until we crash.
We can use the following Python script to do this:
import lldb

def step_func(debugger, command, result, internal_dict):
    thread = debugger.GetSelectedTarget().GetProcess().GetSelectedThread()
    while True:
        thread.StepOver()
        stream = lldb.SBStream()
        thread.GetStatus(stream)
        description = stream.GetData()
        print description
        if thread.GetStopReason() == lldb.eStopReasonException:
            break

def __lldb_init_module(debugger, dict):
    debugger.HandleCommand('command script add -f %s.step_func sf' % __name__)
For the above example, we might use this as follows: we load our application into lldb
# lldb Main
Current executable set to 'Main' (x86_64).
register our new command sf
(lldb) command script import mystep.py
set a breakpoint where we want to start stepping
(lldb) breakpoint set -n A_functionA_info
Breakpoint 1: where = Main`A_functionA_info, address = 0x0000000100000b90
run to the breakpoint:
(lldb) run
Process 54082 launched: 'Main' (x86_64)
Process 54082 stopped
* thread #1: tid = 0x384510, 0x0000000100000b90 Main`A_functionA_info, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
frame #0: 0x0000000100000b90 Main`A_functionA_info
and then use sf to find a call-trace until we crash:
(lldb) sf
...
* thread #1: tid = 0x384510, 0x0000000100000bf0 Main`B_functionB_info, queue = 'com.apple.main-thread', stop reason = instruction step over
frame #0: 0x0000000100000bf0 Main`B_functionB_info
Main`B_functionB_info:
...
* thread #1: tid = 0x384510, 0x0000000100000c78 Main`C_functionC_info, queue = 'com.apple.main-thread', stop reason = instruction step over
frame #0: 0x0000000100000c78 Main`C_functionC_info
Main`C_functionC_info:
...
* thread #1: tid = 0x384510, 0x0000000100000d20 Main`crash + 16 at crash.c:3, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
frame #0: 0x0000000100000d20 Main`crash + 16 at crash.c:3
Note that if you are using the threaded runtime, you may have to select which thread you want to step through before calling sf:
(lldb) thread select 4
(lldb) sf
Tweaking the script
You will probably want to tweak the above script in various ways. For instance, in the application I was debugging, I wanted to step into each assembly language instruction but over each C function call, mostly because lldb was getting confused with the call stack. I also added a maximum step count:
import shlex

def step_func(debugger, command, result, internal_dict):
    args = shlex.split(command)
    if len(args) > 0:
        maxCount = int(args[0])
    else:
        maxCount = 100
    thread = debugger.GetSelectedTarget().GetProcess().GetSelectedThread()
    i = 0
    while True:
        frame = thread.GetFrameAtIndex(0)
        file = frame.GetLineEntry().GetFileSpec().GetFilename()
        inC = type(file) is str and file.endswith(".c")
        if inC:
            thread.StepOver()
        else:
            thread.StepInstruction(False)
        stream = lldb.SBStream()
        thread.GetStatus(stream)
        description = stream.GetData()
        print i
        print description
        i += 1
        if thread.GetStopReason() == lldb.eStopReasonException or i > maxCount:
            break
You may want to tweak this step-into/step-over behaviour to suit your application; certainly you don’t want a call trace involving every step taken in the Haskell RTS or, worse, in the libraries it depends on.
Back to the example
Rather than printing every step along the way, it may also be useful to simply remember the step-before-last and show that on a crash; often it is sufficient to know what happened just before the crash. Indeed, in the application I was debugging the call stack just before the crash was:
2583
thread #3: tid = 0x35e743, 0x00000001099da56d libHSrts_thr_debug-ghc7.8.3.20140729.dylib`schedule(initialCapability=0x0000000109a2f840, task=0x00007fadda404550) + 1533 at Schedule.c:470, stop reason = step over
frame #0: 0x00000001099da56d libHSrts_thr_debug-ghc7.8.3.20140729.dylib`schedule(initialCapability=0x0000000109a2f840, task=0x00007fadda404550) + 1533 at Schedule.c:470
467 }
468
469 case ThreadInterpret:
-> 470 cap = interpretBCO(cap);
471 ret = cap->r.rRet;
472 break;
473
2584
thread #3: tid = 0x35e743, 0x0000000103106226, stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
frame #0: 0x0000000103106226
-> 0x103106226: addb %al, (%rax)
0x103106228: addb %al, (%rax)
0x10310622a: addb %al, (%rax)
0x10310622c: addb %al, (%rax)
Which is a lot more helpful than the backtrace, as we now have a starting point: something went wrong when running the bytecode interpreter (remember that the application was compiling and running some Haskell code at runtime).
To pinpoint the problem further, we can set a breakpoint in interpretBCO and run sf again (the way we defined sf it steps over any C function calls by default). This time we get to:
4272
thread #4: tid = 0x35f43a, 0x000000010e77c548 libHSrts_thr_debug-ghc7.8.3.20140729.dylib`interpretBCO(cap=0x000000010e7e7840) + 18584 at Interpreter.c:1463, stop reason = step over
frame #0: 0x000000010e77c548 libHSrts_thr_debug-ghc7.8.3.20140729.dylib`interpretBCO(cap=0x000000010e7e7840) + 18584 at Interpreter.c:1463
1460 tok = suspendThread(&cap->r, interruptible ? rtsTrue : rtsFalse);
1461
1462 // We already made a copy of the arguments above.
-> 1463 ffi_call(cif, fn, ret, argptrs);
1464
1465 // And restart the thread again, popping the stg_ret_p frame.
1466 cap = (Capability *)((void *)((unsigned char*)resumeThread(tok) - STG_FIELD_OFFSET(Capability,r)));
4273
thread #4: tid = 0x35f43a, 0x0000000107eba226, stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
frame #0: 0x0000000107eba226
-> 0x107eba226: addb %al, (%rax)
0x107eba228: addb %al, (%rax)
0x107eba22a: addb %al, (%rax)
0x107eba22c: addb %al, (%rax)
Ok, now we are really getting somewhere. Something is going wrong when we are doing a foreign function call. Let’s re-run the application once more, setting a breakpoint at ffi_call:
(lldb) breakpoint set -n ffi_call
Breakpoint 1: where = libffi.dylib`ffi_call + 29 at ffi64.c:421, address = 0x0000000108e098dd
(lldb) cont
Process 51476 resuming
Process 51476 stopped
* thread #4: tid = 0x360fd3, 0x0000000108e098dd libffi.dylib`ffi_call(cif=0x00007fb262f00000, fn=0x00000001024a4210, rvalue=0x000000010b786ac0, avalue=0x000000010b7868a0) + 29 at ffi64.c:421, stop reason = breakpoint 1.1
frame #0: 0x0000000108e098dd libffi.dylib`ffi_call(cif=0x00007fb262f00000, fn=0x00000001024a4210, rvalue=0x000000010b786ac0, avalue=0x000000010b7868a0) + 29 at ffi64.c:421
418 /* If the return value is a struct and we don't have a return value
419 address then we need to make one. Note the setting of flags to
420 VOID above in ffi_prep_cif_machdep. */
-> 421 ret_in_memory = (cif->rtype->type == FFI_TYPE_STRUCT
422 && (cif->flags & 0xff) == FFI_TYPE_VOID);
423 if (rvalue == NULL && ret_in_memory)
424 rvalue = alloca (cif->rtype->size);
and let’s take a look at the function we’re about to execute:
(lldb) disassemble -s fn
0x1024a4210: pushq %rbp
0x1024a4211: movq %rsp, %rbp
0x1024a4214: leaq (%rip), %rdi
0x1024a421b: popq %rbp
0x1024a421c: jmpq 0x1024a4221
0x1024a4221: pushq $0x6f6c6c65
0x1024a4226: addb %al, (%rax)
0x1024a4228: addb %al, (%rax)
0x1024a422a: addb %al, (%rax)
0x1024a422c: addb %al, (%rax)
0x1024a422e: addb %al, (%rax)
We were expecting to execute hello:
# otool -tV hello.o
hello.o:
(__TEXT,__text) section
_hello:
0000000000000000 pushq %rbp
0000000000000001 movq %rsp, %rbp
0000000000000004 leaq L_str(%rip), %rdi ## literal pool for: "hello"
000000000000000b popq %rbp
000000000000000c jmpq _puts
and if you compare this with the code loaded into memory it all becomes clear. The jump instruction in the object file
jmpq _puts
contains a symbolic reference to puts; but the jump in the code that we are about to execute in fact jumps to the next instruction in memory:
0x1024a421c: jmpq 0x1024a4221
0x1024a4221: ...
In other words, the loaded object file has not been properly relocated, and when we try to call puts we end up jumping into nowhere. At this point the bug was easily resolved.
Further Reading
We have barely scratched the surface here of what we can do with lldb or gdb. In particular, ghc maintains quite a bit of runtime information that we can inspect with the debugger. Tim Schröder has an excellent blog post about inspecting ghc’s runtime data structures with gdb, and Nathan Howell has written some extensive lldb scripts to do the same, although they may now be somewhat outdated. See also the reddit discussion about this blog post.

edsko, http://www.well-typed.com/blog/96, Fri, 01 Aug 2014 09:17:34 +0000

[zljfhron] Lagged Fibonacci digits
http://kenta.blogspot.com/2014/08/zljfhron-lagged-fibonacci-digits.html
Part 1: Two newest taps

The Fibonacci sequence modulo 10 starting with 0 and 1 repeats with a period of 60, the Babylonians' favorite number: 0 1 1 2 3 5 8 3 1 4 5 9 4 3 7 0 7 7 4 1 5 6 1 7 8 5 3 8 1 9 0 9 9 8 7 5 2 7 9 6 5 1 6 7 3 0 3 3 6 9 5 4 9 3 2 5 7 2 9 1. One could imagine a very confusing digital clock in which two digits slide by every minute (or second). Other starting pairs give other periods:
Period 20: 0 2 2 4 6 0 6 6 2 8 0 8 8 6 4 0 4 4 8 2.
Period 12: 2 1 3 4 7 1 8 9 7 6 3 9. (Also usable for a clock.)
Period 4: 4 2 6 8.
Period 3: 0 5 5.
Period 1: 0 0.
These periods sum to 100, which exhausts all possibilities of two starting digits.

Part 2: Lagged Fibonacci, newest and oldest taps

Start with the digits 0 0 0 0 0 0 1 at the top of a 7-column tableau. Fill in the following rows left-to-right by adding modulo 10 the digit to the left and the digit immediately above. For the leftmost digit on each row, the digit "to the left" is the last digit on the previous row. This is equivalent to the recurrence a[i] = (a[i-1] + a[i-7]) mod 10. The sequence repeats with an impressively long period of 2480437. The original motivation was a method for a person to produce by hand a page of seemingly random digits with little effort. The tableau begins

0 0 0 0 0 0 1
1 1 1 1 1 1 2
3 4 5 6 7 8 0
3 7 2 8 5 3 3
6 3 5 3 8 1 4

and ends 2480437 digits later:

5 1 7 4 0 4 8
3 4 1 5 5 9 7
0 4 5 0 5 4 1
1 5 0 0 5 9 0
1 6 6 6 1 0 0
1 7 3 9 0 0 0
1 8 1 0 0 0 0
1 9 0 0 0 0 0
1

Periods of other column widths (lag lengths) and their factorizations, starting with 0 0 ... 1:
1 4 = 2*2
2 60 = 2*2*3*5
3 217 = 7*31
4 1560 = 2*2*2*3*5*13
5 168 = 2*2*2*3*7
6 196812 = 2*2*3*3*7*11*71
7 2480437 = 127*19531
8 15624 = 2*2*2*3*3*7*31
9 28515260 = 2*2*5*73*19531
10 1736327236 = 2*2*7*19*31*127*829
11 249032784 = 2*2*2*2*3*7*11*13*71*73
12 203450520 = 2*2*2*3*5*7*13*31*601
13 482322341820 = 2*2*3*5*11*17*31*71*19531

The sequence of periods 4, 60, 217, 1560, 168 does not appear on OEIS. The first four terms I have confirmed are the longest possible periods among all start seeds, not just 0 0 0 ... 1. It is curious that the factor 19531 occurs multiple times.

Part 3: Two oldest taps

We consider the recurrence a[i] = (a[i-6] + a[i-7]) mod 10, that is, the two "oldest" values. To calculate the next digit on a seven-column tableau, add the number above to the number above and to the right (northeast). Or, this could also be done with a more compact six-column tableau, adding the number above and the number above and to the left (northwest). This recurrence repeats with a period of 661416, smaller than the corresponding lag-7 sequence in Part 2.

Periods for lag lengths are given below. The periods seem neither uniformly longer nor shorter than those of Part 2.

2 60 = 2*2*3*5
3 168 = 2*2*2*3*7
4 1560 = 2*2*2*3*5*13
5 16401 = 3*7*11*71
6 196812 = 2*2*3*3*7*11*71
7 661416 = 2*2*2*3*7*31*127
8 15624 = 2*2*2*3*3*7*31
9 8894028 = 2*2*3*11*13*71*73
10 1736327236 = 2*2*7*19*31*127*829
11 3712686852 = 2*2*3*7*31*73*19531
12 203450520 = 2*2*2*3*5*7*13*31*601
13 25732419240 = 2*2*2*3*5*11*17*31*71*521
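The periods in both tables can be reproduced with a short brute-force search. The sketch below is my own, not the post's Data.Sequence implementation: it slides a window of digits under each recurrence and records every state in a Data.Map until one repeats, which also copes with sequences (such as lag 1 in Part 2) whose cycle does not return to the seed.

```haskell
import qualified Data.Map.Strict as M

-- One step of each recurrence on a window [a(i-n), ..., a(i-1)]:
-- "newest and oldest" (Part 2): a(i) = (a(i-1) + a(i-n)) mod 10
-- "two oldest"        (Part 3): a(i) = (a(i-n) + a(i-n+1)) mod 10
stepNewestOldest, stepTwoOldest :: [Int] -> [Int]
stepNewestOldest xs = tail xs ++ [(last xs + head xs) `mod` 10]
stepTwoOldest    xs = tail xs ++ [(xs !! 0 + xs !! 1) `mod` 10]

-- Eventual period of the state sequence: iterate from the seed,
-- remember when each state was first seen, and report the gap
-- between the two visits of the first repeated state.
periodBy :: ([Int] -> [Int]) -> [Int] -> Int
periodBy step = go M.empty 0
  where
    go seen i xs = case M.lookup xs seen of
      Just j  -> i - j
      Nothing -> go (M.insert xs i seen) (i + 1) (step xs)

-- The tables' seed 0 0 ... 1, of width n.
seed :: Int -> [Int]
seed n = replicate (n - 1) 0 ++ [1]

main :: IO ()
main = do
  print (periodBy stepNewestOldest (seed 2))  -- 60, plain Fibonacci mod 10
  print (periodBy stepNewestOldest (seed 3))  -- the Part 2 table gives 217
  print (periodBy stepTwoOldest    (seed 3))  -- the Part 3 table gives 168
```

Larger lags work the same way but take correspondingly longer and hold correspondingly more states in the map.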
The sequence 60, 168, 1560 does not appear in OEIS. But the analogous sequence modulo 2 (instead of modulo 10) is A046932, and curiously, the analogous sequence of Part 2 modulo 2 seems to be the exact same sequence. Also, there's the factor 19531 again. Searching for the number on OEIS gives a hint of what is going on. The VIC cipher used this recurrence in base 10 with width 10.

Source code

Here is a Haskell implementation including Floyd's cycle-detection algorithm. It is only a partial implementation of cycle detection, because it does not go back to test that the found period is not a multiple of a shorter period; I'm hoping this second step wasn't necessary.

Because the generated sequences were implemented as giant lists, it was very easy to have catastrophic space leaks. I've left in old versions of the code demonstrating a few such failures. Even printing out a list followed by its length required some care. This aspect of Haskell, compared to an imperative programming language, I strongly dislike.

The code is far from optimized, most notably because it was an exercise in learning to use Data.Sequence to hold the state. Unboxed arrays would probably have been better.

Ken, tag:blogger.com,1999:blog-6757805.post-1412520959581531795, Fri, 01 Aug 2014 07:33:00 +0000

New skin
http://winterkoninkje.dreamwidth.org/97427.html
http://winterkoninkje.dreamwidth.org/97427.html, Thu, 31 Jul 2014 21:47:56 +0000

FP Haskell Center is Going Free
https://www.fpcomplete.com/blog/2014/07/fphc-open-publish-announcement
FP Haskell Center Open Publish Announcement

As you know, FP Complete’s mission is to drive the wide-scale adoption of Haskell functional programming. As our next step in this mission, I’m pleased to announce we are releasing our full FP Haskell Center (FPHC) free for developers of open projects. This model will be very familiar to Haskellers and GitHub users.

FPHC has been live online for a little less than a year and has been very successful. I’m proud of our hard-working team, who have achieved so much so quickly, using many important open-source components as well as some very good commercial services. Because of this progress and the support and activity from our users, we have reached the point where we can give even more back to the Haskell community and do more to promote the overall advancement of Haskell.

As of October 1, users of our free Community or “Open Publish” edition will have access to features previously offered only to FPHC Commercial customers. This includes complete remote git support, complete git command line integration for Emacs clients, as well as git “mega repos” and inter-project dependencies.

With these increased free features, the paid Personal edition is no longer needed. If you have a Personal license, we’ll stop charging you and will renew you to a free Open Publish license. Similarly, Academic accounts will also convert to Open Publish accounts. For more details see the schedule below and our Frequently Asked Questions.

Unlike Commercial licenses, Open Publish accounts will automatically publish all projects on the FPHC site with each commit. Open Publish accounts aren’t available on private servers, and won’t include a shared or private FP Application Server.
Of course we continue to offer the paid Commercial license for those who need it — but most people won’t.

We’re confident that access to more capable tools will inspire others to be more involved with pushing Haskell forward, and we believe this move will better suit open source projects and independent developers. Now that our corporate clients are expanding their work orders with us, we can offer even more for open source developers. This has also afforded us the opportunity to focus on our own innovative Haskell projects, which we will be rolling out over time. Our corporate clients are excited by what we’ve been able to deliver with our Haskell-built tools, and this is just the beginning. Of course we also continue to contribute to Yesod, Fay, GHC, Stackage, and other important Haskell community projects beyond FP Haskell Center itself.

Innovation is about motivation, and we hope our free tools for open source Haskell developers provide the resources and motivation to build more projects. What has already been amassed in our School of Haskell is proof that the Haskell community knows how to build great resources given the right tools and forums. Keep up the amazing work and let us know what else we can do to help.

Aaron Contorer, CEO, FP Complete

Planned Release Schedule:

As of July 31, we will no longer be offering the yearly Personal edition. The monthly edition will continue to be available, to allow Personal subscribers to continue using git while we prepare the new FPHC Open Publish license. At this time we will continue to offer FPHC Professional and Community licenses.

On August 30, we will stop accepting new monthly Personal edition subscriptions.

On October 1, we will release the new enhanced free Open Publish Community edition and discontinue online purchases and monthly subscriptions. At this time, all previous Community and Personal licenses will become FPHC Open Publish licenses.
New yearly paid FPHC commercial licenses may still be ordered by contacting FP Complete Sales.

https://www.fpcomplete.com/blog/2014/07/fphc-open-publish-announcement, Thu, 31 Jul 2014 20:11:00 +0000

Scalable program architectures
http://www.haskellforall.com/2014/04/scalable-program-architectures.html
Haskell design patterns differ from mainstream design patterns in one important way:

Conventional architecture: Combine several components together of type A to generate a "network" or "topology" of type B.

Haskell architecture: Combine several components together of type A to generate a new component of the same type A, indistinguishable in character from its substituent parts.

This distinction affects how the two architectural styles evolve as code bases grow.

The conventional architecture requires layering abstraction on top of abstraction: oh no, these Bs are not connectable, so let's make a network of Bs and call that a C. Well, I want to assemble several Cs, so let's make a network of Cs and call that a D... Wash, rinse, and repeat until you have an unmanageable tower of abstractions.

With a Haskell-style architecture, you don't need to keep layering on abstractions to preserve combinability. When you combine things together the result is still itself combinable. You don't distinguish between components and networks of components.

In fact, this principle should be familiar to anybody who knows basic arithmetic. When you combine a bunch of numbers together you get back a number:

3 + 4 + 9 = 16

Zero or more numbers go in and exactly one number comes out. The resulting number is itself combinable. You don't have to learn about "web"s of numbers or "web"s of "web"s of numbers.

If elementary school children can master this principle, then perhaps we can, too. How can we make programming more like addition?

Well, addition is simple because we have (+) and 0. (+) ensures that we can always convert more than one number into exactly one number:

(+) :: Int -> Int -> Int
and 0 ensures that we can always convert less than one number into exactly one number by providing a suitable default:

0 :: Int

This will look familiar to Haskell programmers: these type signatures resemble the methods of the Monoid type class:

class Monoid m where
    -- `mappend` is analogous to `(+)`
    mappend :: m -> m -> m

    -- `mempty` is analogous to `0`
    mempty :: m

In other words, the Monoid type class is the canonical example of this Haskell architectural style. We use mappend and mempty to combine 0 or more ms into exactly 1 m. The resulting m is still combinable.

Not every Haskell abstraction implements Monoid, nor does it have to, because category theory takes this basic Monoid idea and generalizes it to more powerful domains. Each generalization retains the same basic principle of preserving combinability.

For example, a Category is just a typed monoid, where not all combinations type-check:

class Category cat where
    -- `(.)` is analogous to `(+)`
    (.) :: cat b c -> cat a b -> cat a c

    -- `id` is analogous to `0`
    id :: cat a a

... a Monad is like a monoid where we combine functors "vertically":

-- Slightly modified from the original type class
class Functor m => Monad m where
    -- `join` is analogous to `(+)`
    join :: m (m a) -> m a

    -- `return` is analogous to `0`
    return :: a -> m a

... and an Applicative is like a monoid where we combine functors "horizontally":

-- Greatly modified, but equivalent to, the original type class
class Functor f => Applicative f where
    -- `mult` is analogous to `(+)`
    mult :: f a -> f b -> f (a, b)

    -- `unit` is analogous to `0`
    unit :: f ()

Category theory is full of generalized patterns like these, all of which try to preserve that basic intuition we had for addition. We convert more than one thing into exactly one thing using something that resembles addition, and we convert less than one thing into exactly one thing using something that resembles zero.
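To make the "combined things stay combinable" claim concrete, here is a small self-contained sketch using the stock Sum monoid from base (not code from the original post):

```haskell
import Data.Monoid (Sum (..), (<>))

-- Combining monoidal values yields another value of the same type,
-- so the result can always be combined further: no "network of Sums".
total :: Sum Int
total = Sum 3 <> Sum 4 <> Sum 9   -- mirrors 3 + 4 + 9

-- mempty plays the role of 0: combining with it changes nothing.
stillSixteen :: Sum Int
stillSixteen = total <> mempty

main :: IO ()
main = print (getSum stillSixteen)  -- prints 16
```

Any other Monoid instance (lists, functions to monoids, ordered maps with combining values) follows the same shape: zero or more values go in, exactly one combinable value comes out.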
Once you learn to think in terms of these patterns, programming becomes as simple as basic arithmetic: combinable components go in and exactly one combinable component comes out.

These abstractions scale limitlessly because they always preserve combinability; therefore, we never need to layer further abstractions on top. This is one reason why you should learn Haskell: you learn how to build flat architectures.

Gabriel Gonzaleztag:blogger.com,1999:blog-1777990983847811806.post-7399908678496201769Sat, 05 Apr 2014 03:33:00 +0000Pragmatic Haskell Developer at Anchor Systems (Full-time)
http://functionaljobs.com/jobs/8729-pragmatic-haskell-developer-at-anchor-systems
urn:uuid:ee53c746-4749-559c-8c53-495d43f8cca0Wed, 30 Jul 2014 05:51:33 +0000vectorBuilder: packed-representation yielding for conduit
https://www.fpcomplete.com/blog/2014/07/vectorbuilder-packed-conduit-yielding
This post contains fragments of active Haskell code, best viewed and executed at
https://www.fpcomplete.com/blog/2014/07/vectorbuilder-packed-conduit-yielding
Back in March, I mentioned that we'd be using conduit for high performance analyses. We've been busy working on various aspects of this behind the scenes. This is the first publicly available follow-up since then.

One issue with financial analyses is bridging the gap between in-memory representations and streaming data. The former allows for higher performance for many forms of analysis, while the latter allows us to deal with far larger data sets and to generate output more quickly for certain analyses.

We'll be using the vector package almost exclusively for efficient in-memory representations in IAP, and conduit for streaming data. The question comes: what happens when we need to transition from one to the other? There are already functions like yieldMany to yield values from a Vector as a Conduit, and conduitVector or sinkVector to consume values from a Conduit into a packed representation.

While these solutions are always sufficient, they aren't always optimal. When we're going for high speed analyses, we don't want to waste time boxing unpacked values or allocating extra constructors unnecessarily. This blog post introduces a new tool we're adding to the IAP toolchain (and the conduit toolchain in general): vectorBuilder.

The problem

Oftentimes when working with Haskell in a high performance context, the overhead introduced by a linked list representation can be too high. Having an extra constructor around each value, a constructor for each cons cell, and the indirection introduced by having to follow pointers can completely kill performance. The most common examples of this are the high speedup you can often achieve by replacing String with Text (or sometimes ByteString), or by using Vectors, especially unboxed or storable Vectors.

conduit has a similar representation of a stream as a list, including the constructor overheads just mentioned. It's not surprising, therefore, that in a situation where a list would be a poor representation, conduits will often suffer similar performance problems. Like lists, some of this overhead is mitigated by shortcut fusion (a.k.a. rewrite rules), but this isn't always the case.

conduit-combinators provides a helper function which allows us to take back performance by working with a packed representation instead of creating a bunch of cons cells. It does this by using the vector package's generic mutable interface under the surface, while at a user-facing level providing a simple yield-like function, avoiding the need to muck around with mutable buffers.

This article will cover how to use this function, some implementation details, and comparisons to other approaches.

NOTE: At the time of writing, the version of conduit-combinators provided on School of Haskell does not contain the vectorBuilder function, and therefore the active code below will not run.

Motivating use case

Let's start with a simple goal: we have chunks of bytes coming in, and we want to (1) duplicate each successive byte so that, e.g., [1, 2, 3] becomes [1, 1, 2, 2, 3, 3] and (2) rechunk the values into vectors of size 512. The original data could be chunked in any way, so we can't rely on any specific incoming chunk size (in this case, a known 256 chunk size would be convenient).

Likely the easiest approach is to convert our stream of chunked values (e.g., ByteString or Vector Word8) into a stream of elements (e.g., Word8), duplicate the individual values, then chunk those back up. Such a solution would look like:

rechunk1 = concatC
       =$= concatMapC (\x -> [x, x])
       =$= conduitVector 512

This uses the concatC combinator to "flatten out" the input stream, concatMapC to duplicate each individual Word8, and then conduitVector to create a stream of 512-sized Vectors. In my simple benchmark, this function took 13.06ms.

But as we can probably guess, this falls into the problem zone described in our introduction. So instead of dealing with things on the individual byte level, let's try to use some higher-level functions operating on Vectors of values instead. Our new approach will be to first mapC over the stream and use vector's concatMap to double each value, and then use takeCE and foldC to extract successive chunks of size 512. In code:

rechunk2 =
    mapC (concatMap $ replicate 2) =$= loop
  where
    loop = do
        x <- takeCE 512 =$= foldC
        unless (null x) $ yield x >> loop

In the same benchmark, this performed at 8.83ms, a 32% speedup. While respectable, we can do better.

Buffer copying

Our first approach is optimal in one way: it avoids needless buffer copying. Each Word8 is copied precisely once into an output Vector by conduitVector. Unfortunately, this advantage is killed by the overhead of boxing the Word8s and allocating constructors for conduit. Our second approach avoids the boxing and constructors by always operating on Vectors, but we end up copying buffers multiple times: from the original Vector to the doubled Vector, and then when folding together multiple Vectors into a single Vector of size 512.

What we want is to be able to yield a Word8 and have it fill up an output buffer, and once that buffer is filled, yield that buffer downstream and start working on a new one. We could do that by directly dealing with mutable Vectors, but that's error-prone and tedious. Instead, let's introduce our new combinator function: vectorBuilder (or its unqualified name, vectorBuilderC).

The idea is simple. vectorBuilder will allocate an output buffer for you. It provides you with a special yield-like function that fills up this buffer, and when it's full, yields the entire buffer downstream for you.

To use it, we're going to use one other combinator function: mapM_CE, which performs an action for every value in a chunked input stream (in our case, for each Word8 in our input Vector Word8s). Altogether, this looks like:

rechunk3 = vectorBuilderC 512 $ \yield' ->
    mapM_CE (\x -> yield' x >> yield' x)

We call yield' twice to double our bytes. vectorBuilder ensures that each output buffer is of size 512. mapM_CE efficiently traverses the incoming Vectors without creating intermediate data structures.

This version benchmarks at 401.12us, approximately 95% faster than our previous attempt!

Avoiding transformers

There's something tricky about the yield' function above. Notice how it's not being used in the Conduit monad transformer, but is instead living in the base monad (e.g., IO). This is not accidental. Not only does this allow us to use existing monadic combinators like mapM_CE, it also allows for far more efficient code. To demonstrate, let's look at two different ways of doing the same thing:

bgroup "transformers" $
    let src = return () in
    [ bench "single" $ nfIO $ do
        ref <- newIORef 0
        let incr = modifyIORef ref succ
        src $$ liftIO (replicateM_ 1000 incr)
    , bench "multi" $ nfIO $ do
        ref <- newIORef 0
        let incr = liftIO $ modifyIORef ref succ
        src $$ replicateM_ 1000 incr
    ]

Both of these benchmarks use no conduit features. They both create an IORef, then increment it 1000 times. The difference is that the first calls liftIO once, while the second calls liftIO 1000 times. Let's see the difference in benchmark results:

benchmarking transformers/single
mean: 4.292891 us, lb 4.285319 us, ub 4.303626 us, ci 0.950
std dev: 45.83832 ns, lb 35.04324 ns, ub 59.43617 ns, ci 0.950
benchmarking transformers/multi
mean: 93.10228 us, lb 92.95708 us, ub 93.30159 us, ci 0.950
std dev: 869.6636 ns, lb 673.8342 ns, ub 1.090044 us, ci 0.950

Avoiding extra liftIO calls has a profound performance impact. The reason for this is somewhat similar to what we've been discussing up until now about extra cons cells. In our case, it's extra PipeM constructors used by conduit's MonadIO instance. I don't want to dwell on those details too much right now, as that's a whole separate topic of analysis, involving looking at GHC core output. But let's take it as a given right now.

The question is: how does vectorBuilder allow you to live in the base monad, but still yield values downstream, which requires access to the Conduit transformer? There's a trick here using mutable variables. The implementation essentially works like this:

1. Allocate a new, empty mutable vector.
2. Allocate a mutable variable holding an empty list.
3. Start running the user-supplied Conduit function, providing it with a specialized yield function.
4. The specialized yield function, which lives in the base monad, will write values into the mutable vector. Once that mutable vector is filled, the vector is frozen and added to the end of the mutable variable's list, and a new mutable vector is allocated.
5. The next time the user's function awaits for values from upstream, we jump into action. Since we're already forced to be in the Conduit transformer at that point, this is our chance to yield. We grab all of the frozen vectors from the mutable variable and yield them downstream. Once that's done, we await for new data from upstream and provide it to the user's function.
6. When the user's function is finished, we freeze the last bit of data from the mutable vector and yield that downstream too.

The upsides of this approach are ease-of-use and performance. There is one downside you should be aware of: if you generate a large amount of output without awaiting for more data from upstream, you can begin to accumulate more memory.
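The buffering scheme in those steps can be sketched in miniature with nothing but base. This is a deliberately simplified toy, not the real implementation: actual vectorBuilder uses mutable vectors and flushes inside the Conduit transformer, while this sketch uses an IORef, a plain list buffer, and a flush callback; the names withBuilder and flush are made up for illustration:

```haskell
import Control.Monad (forM_, unless)
import Data.IORef

-- Toy analogue of vectorBuilder: the supplied yield-like action fills
-- an in-memory buffer; whenever the buffer reaches the requested size
-- it is flushed downstream (here: to a callback) and a fresh buffer
-- is started. Any final partial buffer is flushed at the end.
withBuilder :: Int -> ([Int] -> IO ()) -> ((Int -> IO ()) -> IO ()) -> IO ()
withBuilder size flush user = do
    buf <- newIORef []
    let yield' x = do
            xs <- readIORef buf
            let xs' = x : xs
            if length xs' == size
                then writeIORef buf [] >> flush (reverse xs')
                else writeIORef buf xs'
    user yield'
    rest <- readIORef buf
    unless (null rest) $ flush (reverse rest)

main :: IO ()
main = do
    out <- newIORef []
    withBuilder 3 (\chunk -> modifyIORef out (++ [chunk])) $ \yield' ->
        forM_ [1 .. 7] yield'
    readIORef out >>= print  -- prints [[1,2,3],[4,5,6],[7]]
```

The essential property carries over: the user only ever calls a simple yield function, and the builder decides when a full buffer is handed downstream.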
You can force the collection of frozen Vectors to be flushed using the following helper function:

forceFlush :: Monad m => ConduitM i o m ()
forceFlush = await >>= maybe (return ()) leftover

This simply awaits for a value, allowing vectorBuilder to clear its cache, and then gives the new value back as a leftover.

Overall, your goal should be to have a decent trade-off between memory and time efficiency. To demonstrate, try playing around with the functions f1, f2, and f3 in the following code snippet:

{-# LANGUAGE NoImplicitPrelude #-}
{-# LANGUAGE FlexibleContexts #-}
import ClassyPrelude.Conduit
forceFlush :: Monad m => ConduitM i o m ()
forceFlush = await >>= maybe (return ()) leftover
-- Memory inefficient, time efficient
f1 :: (Int -> IO ()) -> Sink () IO ()
f1 f = liftIO $ forM_ [1..1000000] f
-- Memory efficient, time inefficient
f2 :: (Int -> Sink () IO ()) -> Sink () IO ()
f2 f = forM_ [1..1000000] $ \i -> do
    f i
    forceFlush

-- Good trade-off
f3 f = forM_ (chunksOf 10000 [1..1000000]) $ \is -> do
    liftIO $ mapM_ f is
    forceFlush
  where
    chunksOf _ [] = []
    chunksOf i x =
        y : chunksOf i z
      where
        (y, z) = splitAt i x
main = vectorBuilderC 4096 f3
    $$ (sinkNull :: Sink (Vector Int) IO ())

ByteString and Vector

It may be surprising to have seen an entire article on packed representations of bytes, and not yet seen ByteString. As a matter of fact, the original use case I started working on this for had nothing to do with the vector package. However, I decided to focus on vector for two reasons:

1. Unlike bytestring, it provides a well developed mutable interface. Not only that, but the mutable interface is optimized for storable, unboxed, and generic vectors, plus existing helper packages like hybrid-vectors. In other words, this is a far more general-purpose solution.
2. It's trivial and highly efficient to convert a ByteString to and from a storable Vector.

To demonstrate that second point, let's try to read a file, duplicate all of its bytes as we did above, and write it back to a separate file. We'll use the toByteVector and fromByteVector functions, which I recently added to mono-traversable for just this purpose:

{-# LANGUAGE NoImplicitPrelude #-}
import ClassyPrelude.Conduit
import System.IO (IOMode (ReadMode, WriteMode),
withBinaryFile)
double :: (Word8 -> IO ()) -> Sink (SVector Word8) IO ()
double yield' = mapM_CE $ \w ->
    yield' w >> yield' w

main :: IO ()
main = withBinaryFile "input.txt" ReadMode $ \inH ->
       withBinaryFile "output.txt" WriteMode $ \outH ->
       sourceHandle inH
    $$ mapC toByteVector
    =$ vectorBuilderC 4096 double
    =$ mapC fromByteVector
    =$ sinkHandle outH

Comparison with blaze-builder

There's a strong overlap between what vectorBuilder does and how blaze-builder (and more recently, bytestring's Builder type) are intended to be used. I unfortunately can't give any conclusive comparisons between these two techniques right now. What I can say is that there are cases where using a Builder has proven to be inefficient, and vectorBuilder provides a large performance improvement. I can also say that vectorBuilder addresses many more use cases than Builder. For example, at FP Complete we're planning to use this in financial analyses for creating time series data.

On the other hand, blaze-builder and bytestring's Builder have both had far more real-world tuning than vectorBuilder. They also have support for things such as copying existing ByteStrings into the output stream, whereas vectorBuilder always works by copying a single element at a time.

So for now, if you have a use case and you're uncertain whether to use vectorBuilder or blaze-builder, I recommend either trying both approaches, or discussing it on one of the Haskell mailing lists to get more feedback.

Complete code

The code for most of the blog post above is below. Sorry that it's a bit messy:

{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE NoImplicitPrelude #-}
{-# LANGUAGE NoMonomorphismRestriction #-}
import ClassyPrelude.Conduit
import Control.Monad.Primitive (PrimMonad)
import Control.Monad.ST (runST)
import Criterion.Main (bench, bgroup, defaultMain, nfIO,
whnfIO)
import qualified Data.Vector.Generic as VG
import qualified System.Random.MWC as MWC
import Test.Hspec (hspec, shouldBe)
import Test.Hspec.QuickCheck (prop)
rechunk1 :: ( Monad m
, VG.Vector vector (Element input)
, PrimMonad base
, MonadBase base m
, MonoFoldable input
)
=> Conduit input m (vector (Element input))
rechunk1 = concatC =$= concatMapC (\x -> [x, x]) =$= conduitVector 512
{-# INLINE rechunk1 #-}
rechunk2 :: (Monad m, IsSequence a) => Conduit a m a
rechunk2 =
    mapC (concatMap $ replicate 2) =$= loop
  where
    loop = do
        x <- takeCE 512 =$= foldC
        unless (null x) $ yield x >> loop
{-# INLINE rechunk2 #-}
rechunk3 :: ( MonadBase base m
, PrimMonad base
, MonoFoldable input
, VG.Vector vector (Element input)
)
=> Conduit input m (vector (Element input))
rechunk3 = vectorBuilderC 512 $ \yield' ->
    mapM_CE (\x -> yield' x >> yield' x)
{-# INLINE rechunk3 #-}
main :: IO ()
main = do
    hspec $ prop "rechunking" $ \ws -> do
        let src = yield (pack ws :: UVector Word8)
            doubled = concatMap (\w -> [w, w]) ws
            res1 = runST $ src $$ rechunk1 =$ sinkList
            res2 = runST $ src $$ rechunk2 =$ sinkList
            res3 = runST $ src $$ rechunk3 =$ sinkList
        res1 `shouldBe` (res2 :: [UVector Word8])
        (res3 :: [UVector Word8]) `shouldBe` (res2 :: [UVector Word8])
        (unpack $ (mconcat res2 :: UVector Word8)) `shouldBe` (doubled :: [Word8])
        case reverse res1 :: [UVector Word8] of
            [] -> return ()
            x:xs -> do
                (length x <= 512) `shouldBe` True
                all ((== 512) . length) xs `shouldBe` True
    gen <- MWC.createSystemRandom
    bytes <- replicateM 20 $
        MWC.uniformR (12, 1024) gen >>= MWC.uniformVector gen
    defaultMain
        [ bgroup "copy bytes"
            [ bench "rechunk1" $ whnfIO
                $ yieldMany (bytes :: [UVector Word8])
                $$ (rechunk1 :: Conduit (UVector Word8) IO (UVector Word8))
                =$ sinkNull
            , bench "rechunk2" $ whnfIO
                $ yieldMany (bytes :: [UVector Word8])
                $$ (rechunk2 :: Conduit (UVector Word8) IO (UVector Word8))
                =$ sinkNull
            , bench "rechunk3" $ whnfIO
                $ yieldMany (bytes :: [UVector Word8])
                $$ (rechunk3 :: Conduit (UVector Word8) IO (UVector Word8))
                =$ sinkNull
            ]
        , bgroup "transformers" $
            let src = return () in
            [ bench "single" $ nfIO $ do
                ref <- newIORef (0 :: Int)
                let incr = modifyIORef ref succ
                src $$ liftIO (replicateM_ 1000 incr)
            , bench "multi" $ nfIO $ do
                ref <- newIORef (0 :: Int)
                let incr = liftIO $ modifyIORef ref succ
                src $$ replicateM_ 1000 incr
            ]
        ]

https://www.fpcomplete.com/blog/2014/07/vectorbuilder-packed-conduit-yieldingWed, 30 Jul 2014 04:00:00 +0000Many Shades of Halting Oracles
http://jozefg.bitbucket.org/posts/2014-07-30-many-shades-of-halting.html
http://jozefg.bitbucket.org/posts/2014-07-30-many-shades-of-halting.htmlWed, 30 Jul 2014 00:00:00 +0000Beware of bracket
http://feedproxy.google.com/~r/RomanCheplyaka/~3/_0O9V2wAWLQ/2014-07-30-bracket.html
http://ro-che.info//articles/2014-07-30-bracket.htmlTue, 29 Jul 2014 21:00:00 +0000Reading files from the proc filesystem
http://www.yesodweb.com/blog/2014/07/reading-files-proc-filesystem
I was stumped by this one myself for a bit today, so I thought writing it up in
a blog post would be a good way to make sure (1) I don't forget this little
fact, and (2) hopefully the next person doesn't need to puzzle over this as
long as I did. Let's say you want to read the contents of a file in the proc
filesystem, such as /proc/uptime. There are many ways to do that in Haskell.
Let's ignore any streaming data framework for the moment, and instead focus on
just the "string-like" types: String and strict/lazy ByteString/Text.
Here's a little program that tries all of them out:

import qualified Data.ByteString as S
import qualified Data.ByteString.Lazy as L
import qualified Data.Text.IO as T
import qualified Data.Text.Lazy.IO as TL
test :: Show a => String -> (FilePath -> IO a) -> IO ()
test name reader = do
    contents <- reader "/proc/uptime"
    putStrLn $ name ++ ": " ++ show contents

main :: IO ()
main = do
    test "String           " readFile
    test "strict ByteString" S.readFile
    test "lazy ByteString  " L.readFile
    test "strict Text      " T.readFile
    test "lazy Text        " TL.readFile

Given that the uptime file is just simple ASCII data, you'd probably assume (like I did) that all of these will produce the same result. In fact, that's not the case. On my system, the results are:

String           : "60740.70 136144.86\n"
strict ByteString: ""
lazy ByteString  : "60740.70 136144.86\n"
strict Text      : "60740.70 136144.86\n"
lazy Text        : "60740.70 136144.86\n"

Strict ByteString reading is returning an empty value! Why is this happening?
It's actually quite easy to see once you throw in two new pieces of
information. First, let's look at the implementation of
Data.ByteString.readFile:

readFile :: FilePath -> IO ByteString
readFile f = bracket (openBinaryFile f ReadMode) hClose
    (\h -> hFileSize h >>= hGet h . fromIntegral)

Notice how we allocate a buffer exactly the right size to read in the entire
contents of the file. We don't do this with any of the other file reading functions.
For the lazy variants, we don't want to read the entire file into memory at
once. And for strict Text, knowing the size of the file doesn't tell us the
size of the buffer we need to allocate, due to variable length encoding. So
this nifty optimization only applies to strict ByteStrings.

Now piece of data number two:

$ ls -l /proc/uptime
-r--r--r-- 1 root root 0 Jul 27 13:56 /proc/uptime

Huh, the file is empty! As is well
documented,
virtually every file in the proc filesystem is listed as empty, and the
contents are generated on demand by the kernel.

So how do you read the file contents into a strict ByteString? There are actually plenty of approaches that work. In my case, I ended up just writing a helper function using conduit:

localReadFile fp =
    IO.withBinaryFile fp IO.ReadMode $ \h ->
    sourceHandle h $$ foldC

But probably the simplest thing to do is to just convert a lazy ByteString
into a strict ByteString, e.g. fmap L.toStrict . L.readFile.http://www.yesodweb.com/blog/2014/07/reading-files-proc-filesystemSun, 27 Jul 2014 10:50:00 +0000Safety
http://feedproxy.google.com/~r/CartesianClosedComic/~3/bXO55sUNXNE/25
http://ro-che.info/ccc/25Sat, 26 Jul 2014 23:00:00 +0000Playing with Haskell
http://www.dialectical-computing.de/blog/blog/2014/07/27/playing-with-haskell/
After reading a few books on Haskell I had to face the reality:
learning by doing a project was necessary!
I chose a project which was easy enough to be finished in a few weeks
or months (but still slightly challenging) and had a practical
utility.
My project is a JSON command-line utility loosely inspired by
underscore-cli and Hawk. Arbitrary Lens expressions can be used to
filter or transform the input.
If you don’t know what Lens are, think of them as
getters/setters/filters/functions combinators similar to JQuery or CSS
selectors but for any type of data-structures. I’m still a beginner
regarding Lens.
The challenge for me was to learn how to dynamically evaluate Haskell
expressions. This is uncommon, since Haskell is statically typed. The
library I used to do that is naturally limited in its functionality in
comparison to a Lisp, but the final result is anything but disappointing.
For the purpose of my program I implemented a pretty printer for JSON
similar to aeson-pretty but with colors. Maybe I should package it for
Hackage?
Once I had hdevtools set up for cabal sandboxes, programming in Haskell
was enjoyable. Refactoring is easy thanks to the strong type system. I
was stuck once or twice with the type system, but the people on the
#haskell channel were helpful. The code has a certain esthetic,
even if I feel more knowledge would allow me to write cleaner code. For
example, I wonder if it is possible to avoid pattern matching on Left
and Right for multiple calls which return something like IO (Either
x y), since both IO and Either are monads.
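On that last question: the usual answer is ExceptT from the transformers package (shipped with GHC), which layers Either over IO so that any Left short-circuits the rest of the computation without explicit pattern matching. The stepA and stepB functions below are made-up stand-ins for such calls, not code from the jp project:

```haskell
import Control.Monad.Trans.Except (ExceptT (..), runExceptT)

-- Hypothetical actions of type IO (Either String Int).
stepA :: IO (Either String Int)
stepA = return (Right 2)

stepB :: Int -> IO (Either String Int)
stepB x = return (Right (x * 10))

-- Wrapping each call in ExceptT lets do-notation thread the Either:
-- a Left from either step aborts the pipeline automatically.
pipeline :: ExceptT String IO Int
pipeline = do
    a <- ExceptT stepA
    b <- ExceptT (stepB a)
    return (a + b)

main :: IO ()
main = runExceptT pipeline >>= print  -- prints Right 22
```

runExceptT unwraps the stack back to a plain IO (Either String Int) at the edge of the program, so only one pattern match is ever needed.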
You can have a look at the project here:
https://github.com/kototama/jphttp://www.dialectical-computing.de/blog/blog/2014/07/27/playing-with-haskellSat, 26 Jul 2014 22:00:00 +0000