Prejudice, shared meanings, local grammars and Google as a resource for research

The Prejudice Map: According to Google, people in the world are known for...

Here's an interesting 'mash up' of Google's API that supposedly answers the question above. It uses a very simple Google query as in http://www.google.com/search?q="germans+are+known+for+*" to create a graphical overview of the different 'prejudices' about various cultures in the world. It brings up several interesting questions. Let's deal with the boring ones first.
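The mechanics of the mash-up are easy to imagine: run the wildcard query, then scrape the words that follow the fixed phrase out of the returned snippets. A minimal sketch of that extraction step (the snippet list, the function name and the two-word cutoff are my assumptions, not details of the actual mash-up):

```python
import re
from collections import Counter

def extract_completions(snippets, nationality):
    """Collect what follows '<nationality> are known for' in a list of
    text snippets, mimicking the wildcard query's '*' slot.
    Captures at most two words, roughly like Google's '*' operator."""
    pattern = re.compile(
        rf"\b{re.escape(nationality)} are known for\s+(\w+(?:\s\w+)?)",
        re.IGNORECASE,
    )
    counts = Counter()
    for snippet in snippets:
        for match in pattern.findall(snippet):
            counts[match.lower()] += 1
    return counts

snippets = [
    "Germans are known for their punctuality and efficiency.",
    "It is said that Germans are known for their engineering.",
]
print(extract_completions(snippets, "Germans"))
```

Even this toy version shows where the noise comes from: the captured slot is just 'the next word or two', with no guarantee that it is the evaluative bit of the sentence.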

We can talk about the nature of prejudice and ask ourselves which of these statements constitute prejudice and how we determine their veracity or even appropriateness. In fact, the only thing the map outlines is how various people online have completed (without prompting) sentences like 'Germans are known for'. This can certainly be revealing, but it is nothing more than that.

This leads us to the question of shared meaning, or perhaps a conceptual code (in the sense of 'code switching' rather than cryptography). Here we have a large collection of texts that can (when queried appropriately) reveal something about some greater shared meaning of the community. But we know something about the nature of the community in question, namely that it isn't really the kind of community that can engage in the processes of codification (at least on some level). So we're back to the same problem we had with 'argh' and the 'islands of probability'. This in some way challenges our concept of shared meaning and shared code (or rather some of the assumed mechanics thereof). However, for the purposes of interpreting this map it is probably a nonsensical question. Each statement comes from an individual who in some (as yet undefined) way represents the views (or ways of talking) of his or her community. How often you need to hear 'X are brave' before you can say 'X are known for being brave' is an open question (I'd say once is often sufficient). No real puzzle there.

Now for the really interesting question, which has to do with linguistics. The software as it stands only works for nations with enough mentions: the query nets 18,700 results for Germans, 97 for Czechs and 17 for Albanians. This is partly because the query needs to be very simple, asking the engine to select one word (or so) of the subsequent text. As such the results (when collected automatically) are not very good. This shows the limits of 'dumb' text-retrieval algorithms. What would the results be if we used something akin to Sinclair et al.'s idea of a local grammar, which is already very good at locating definitions (I suspect that's what Google uses in its define: function) or evaluation? Then we could ask the seemingly interesting question: is there such a thing as a grammar of prejudice? To which the answer is again no, but there is probably something like a grammar of talking about nations. Such a grammar would be a very useful thing to have, along with other local grammars, because it would make data mining much easier. However, it depends on some way of tagging the corpus for parts of speech or other elements - either during compilation or on the fly.
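To make the idea concrete, here is a toy sketch of what a tiny local grammar of 'talking about nations' might look like. It is nothing like a real Sinclair-style local grammar (which would need POS tagging and a proper inventory of frames); the frame list, the naive capitalised-plural test for nation names, and all function names are my own illustrative assumptions:

```python
import re

# Toy local grammar: NATION + evaluative frame + EVALUATION.
# A real local grammar would enumerate many more frames and rely on
# a tagged corpus rather than surface regexes.
NATION = r"(?P<nation>[A-Z][a-z]+s)"  # naive: capitalised plural
FRAME = r"(?:are known for|are famous for|have a reputation for)"
EVALUATION = r"(?P<evaluation>[\w' ]+?)"
PATTERN = re.compile(rf"{NATION}\s+{FRAME}\s+{EVALUATION}[.,;]")

def parse_statements(text):
    """Return (nation, evaluation) pairs matched by the toy grammar."""
    return [(m.group("nation"), m.group("evaluation"))
            for m in PATTERN.finditer(text)]

text = ("Czechs are famous for their beer. "
        "Albanians have a reputation for hospitality.")
print(parse_statements(text))
```

The point of the sketch is the shape of the approach, not the coverage: once the frames are named explicitly, the extracted slot is the evaluation itself rather than whatever words happen to follow the query string, which is exactly what the wildcard query cannot guarantee.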

Interesting. I started with rather lofty questions, dismissed them as misleading and woolly-headed (even though they were mine) and ended up with a view on data-mining. That's how philosophy should be done!