Self-organization of iconic linguistic items

The Aargh Page
# Not surprisingly, "argh" is much more frequent than any of the alternatives, and the items with fewer 'a's or 'r's are more frequent than their longer neighbors.
# However, there are high-frequency islands, even way out in the long-word planes. For example, "a17r23gh" (17,23) occurs in 171 pages, even though if you change the number of 'a's or 'r's by one, it drops at least 20-fold. "A15r5gh" is almost 100 times more frequent than its neighbors.

Tracking the distribution of different spellings of 'aargh' on the web may seem like a completely pointless exercise (although a great proof of concept). The same could be said about the companion page regarding 'hmm'.

However, there could be an interesting lesson to be learned here. Particularly interesting is the concept of high-frequency islands (my emphasis). The spelling of both 'aargh' and 'hmm' exhibits iconicity in two ways. First, it is onomatopoeic in the sense that it tries to capture the sounds made in exasperation and puzzlement respectively (these sounds are conventionalized, i.e. they vary by language, and the need for special spelling is dictated by English orthography). Second, iconicity can also be employed to indicate the strength of emotion (this mirrors but does not replicate what happens in spoken English). Given all this, we can predict that the more emphasis the author wishes to convey, the longer the string. The exact number of 'a's or 'r's could either be governed by rules or be random. Of course, we know no such rules exist, so the next prediction is a random distribution. And that, by and large, is borne out, but then there are those pesky frequency islands.
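To make the notion of an island concrete, here is a minimal Python sketch. It assumes we already have a list of spellings collected from somewhere (the Aargh Page's actual method is not described here), and it treats a spelling as a point (number of 'a's, number of 'r's), as in the quoted "(17,23)". The function names, the `ratio` threshold of 20 (borrowed from the "20-fold" drop quoted above) and the `min_count` cut-off are my own illustrative choices, not anything taken from the page.

```python
import re
from collections import Counter

# One or more 'a's, then one or more 'r's, then "gh"; case is ignored.
SPELLING = re.compile(r"^(a+)(r+)gh$", re.IGNORECASE)


def parse(word):
    """Return (number of a's, number of r's), or None if not an 'argh' variant."""
    match = SPELLING.match(word)
    if match is None:
        return None
    return len(match.group(1)), len(match.group(2))


def count_grid(words):
    """Count how often each (a, r) combination occurs in a list of words."""
    grid = Counter()
    for word in words:
        cell = parse(word)
        if cell is not None:
            grid[cell] += 1
    return grid


def islands(grid, ratio=20, min_count=10):
    """Cells at least `ratio` times more frequent than every neighbour.

    A neighbour differs by exactly one 'a' or one 'r'. `min_count` is a
    hypothetical cut-off that keeps one-off spellings surrounded by
    empty cells from counting as islands.
    """
    found = []
    for (a, r), n in grid.items():
        if n < min_count:
            continue
        neighbours = [(a - 1, r), (a + 1, r), (a, r - 1), (a, r + 1)]
        # Every spelling needs at least one 'a' and one 'r'.
        neighbours = [c for c in neighbours if c[0] >= 1 and c[1] >= 1]
        if all(n >= ratio * grid.get(c, 0) for c in neighbours):
            found.append((a, r))
    return sorted(found)


# Invented toy counts, for illustration only (not data from the Aargh Page).
toy = Counter({(1, 1): 1000, (2, 1): 400, (1, 2): 300,
               (3, 3): 50, (3, 2): 2, (2, 3): 1, (4, 3): 2})
print(islands(toy))
```

With these made-up numbers, (3,3) is reported as an island, while plain "argh" at (1,1) is not, because its neighbours are also frequent. On real data the result will depend heavily on the two thresholds.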

Why are they there, and what role do they play? We could think of them as a kind of spandrel, i.e. there for purely structural reasons, such as certain combinations being more likely given keyboard design, human hand-to-eye coordination, etc. But given the nature of the islands, that seems unlikely.

A better way of looking at them (although one lacking the beauty of a causal explanation) would be to treat the islands as properties of a stochastic system, namely the typing of 'argh' by users of the internet. This is deeply unsatisfactory given our usual expectations of an explanation, but it may be the only honest way of looking at it.

Analogy: Now, this can serve as an analogy for language as a whole. What if much of what we have so far been capturing through the medium of the 'grammatical rule' is a combination of rules, spandrels and properties of language as a stochastic system? Our system of rules has mushroomed (Plato only had about two, which is hyperbolically reminiscent of Chomsky's minimalism), and many of the rules are there simply to plug holes created by the postulation of some previous rule. Formal linguistics (both in its generative and non-generative guises) is particularly susceptible to this. Most of the rules in generative grammar describe an independent system that is only partially isomorphic with natural language. However, there has never been a viable alternative. It may (and really just may) lie in this direction (some key ideas related to this are: neural theory of language, neural nets, bootstrapping, construction grammars, local grammars - how exactly it all fits together nobody knows, but a picture seems about to emerge).