This wonderful CIRCOS-style chart visualises migration data collected by the UN. Particularly noticeable are the flows from Mexico to the United States; and from India, Pakistan, and Bangladesh to the United Arab Emirates.
The chart is from Nikola Sander, Guy Abel, and Ramon Bauer. There is also an interactive visualisation of changes over time. Excellent data visualisation!
Data visualisation is one of my favourite topics. We can examine data visualisation in terms of three “lenses.” The first lens is that of mathematical mapping. Typically we have a visual element that represents one or more real-world variables, and we want a clear mapping from one to the other – that is, a one-to-one mapping without too much distortion.
For example, Fisher’s Iris flower data set gives 4 measurements for each of 150 flowers of three species of Iris. Multidimensional scaling can be used to map distances in the 4-dimensional data to distances in a 2-dimensional image (below). Flowers with similar measurements map to dots close together in the image, and flowers with wildly different measurements map to dots far apart. Some distortion is inevitable here, but the diagram clearly shows that Iris setosa flowers (se) are quite different from Iris virginica (vi) and Iris versicolor (ve), which sometimes have similar measurements:
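As a rough sketch of how such a mapping can be computed, the snippet below applies classical (metric) multidimensional scaling in plain Python. It is only one variant of MDS and not necessarily the one used for the diagram. The six rows in `SAMPLES` are illustrative measurements in the format of the Iris data (not the full 150-flower set), and the helper names `top_eigenpair` and `classical_mds` are made up for this example.

```python
import math

# Illustrative rows: (sepal length, sepal width, petal length, petal width) in cm.
# Assumption: a tiny hand-picked sample, not the full Iris data set.
SAMPLES = [
    ("se", (5.1, 3.5, 1.4, 0.2)),
    ("se", (4.9, 3.0, 1.4, 0.2)),
    ("ve", (7.0, 3.2, 4.7, 1.4)),
    ("ve", (6.4, 3.2, 4.5, 1.5)),
    ("vi", (6.3, 3.3, 6.0, 2.5)),
    ("vi", (5.8, 2.7, 5.1, 1.9)),
]


def distance(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def top_eigenpair(matrix, iterations=500):
    """Approximate the dominant eigenvalue/eigenvector by power iteration."""
    n = len(matrix)
    vec = [1.0 + 0.1 * i for i in range(n)]
    for _ in range(iterations):
        new = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in new))
        if norm == 0.0:
            break
        vec = [x / norm for x in new]
    # Rayleigh quotient gives the eigenvalue estimate for the current vector.
    norm_sq = sum(x * x for x in vec)
    value = sum(
        vec[i] * sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)
    ) / norm_sq
    return value, vec


def classical_mds(points, dims=2):
    """Embed points in `dims` dimensions so distances are roughly preserved."""
    n = len(points)
    sq = [[distance(p, q) ** 2 for q in points] for p in points]
    row_mean = [sum(row) / n for row in sq]
    total_mean = sum(row_mean) / n
    # Double-centre the squared distances to obtain an inner-product matrix.
    b = [
        [-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
         for j in range(n)]
        for i in range(n)
    ]
    coords = [[] for _ in range(n)]
    for _ in range(dims):
        value, vec = top_eigenpair(b)
        scale = math.sqrt(max(value, 0.0))
        for i in range(n):
            coords[i].append(scale * vec[i])
        # Deflate: remove the component just extracted before finding the next.
        b = [[b[i][j] - value * vec[i] * vec[j] for j in range(n)]
             for i in range(n)]
    return coords


coords = classical_mds([measurements for _, measurements in SAMPLES])
for (label, _), (x, y) in zip(SAMPLES, coords):
    print(f"{label}: ({x:+.2f}, {y:+.2f})")
```

With these sample rows, the two "se" points should land close together and far from the "vi" points, which is the behaviour described above. For real work, a library implementation would be a better choice than this hand-rolled power iteration.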
In many visualisations, the mathematical mapping is defined by a colour scale, as in this map of fine particulate air pollution (credit: Aaron van Donkelaar, Dalhousie University):
The second lens is that of cognitive psychology. We build visualisations for people, not Martians, and people see some things more easily than others. A great deal has been written on this topic. To pick just one example, rainbow scales like the one above have been strongly criticised. They fail when turned into grey-scale, they may over-emphasise the transition at whichever value is coded yellow, and they can also confuse people with colour-blindness. It must be said, however, that the rainbow scale above (an improved version) combines hue and brightness in a way that actually makes good sense to people with most forms of colour-blindness (see the simulation below of what a person with red/green colour-blindness would see). Consequently, the rainbow scale above probably works better for this dataset than many of the alternative colour scales would. On the other hand, the nonlinearity in the colour scale is rather confusing.
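The grey-scale problem can be illustrated with a small sketch. It assumes a naive rainbow that sweeps HSV hue from blue to red (not the improved scale used in the pollution map) and approximates grey-scale brightness with the Rec. 601 luma weights. The function names `rainbow` and `luma` are invented for this example.

```python
import colorsys


def rainbow(t):
    """Map t in [0, 1] to an RGB triple, sweeping hue from blue (0) to red (1)."""
    hue = (1.0 - t) * 2.0 / 3.0  # 2/3 is blue and 0 is red in HSV
    return colorsys.hsv_to_rgb(hue, 1.0, 1.0)


def luma(rgb):
    """Approximate grey-scale brightness using Rec. 601 luma weights."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


steps = [i / 8 for i in range(9)]
greys = [luma(rainbow(t)) for t in steps]

for t, grey in zip(steps, greys):
    print(f"value {t:.3f} -> grey level {grey:.3f}")
```

In the printed grey levels, brightness rises and falls more than once and peaks near yellow rather than at the top of the scale. Two quite different data values can therefore end up as the same shade of grey, and yellow stands out whether or not the data justify it.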
The final lens is that of design. Visualisations that are mathematically and cognitively satisfactory may still be boring or ugly. Although the design lens is a little more subjective, there are many examples of good visualisation design out there. The tide prediction infographic by Kelvin Tow below (submitted to the 2013 Information is Beautiful Awards) is just one. The classic books by Edward Tufte also have a lot of good advice in this area.
A fascinating recent paper on arXiv.org, entitled “Geo-located Twitter as the proxy for global mobility patterns” (also reported in the MIT Technology Review), uses Twitter to study human movement. The study is based on a dataset of almost a billion tweets. The CIRCOS image below shows the top 30 country-to-country visitor flows, as estimated by the authors. Ribbon colours indicate trip destination, so Mexico-based Twitterers visiting the US are a major category. While the US is the most common travel destination, Russia is the most common point of origin.
There’s lots more in the paper: it’s well worth a read. Twitterers may not be totally representative of the world population, but there are still many interesting conclusions to be drawn here, and an opportunity for even more interesting follow-up work.
Network diagram from Hawelka, Sitko, Beinat, Sobolevsky, Kazakopoulos, and Ratti: “Geo-located Twitter as the proxy for global mobility patterns”