Hi, I'm Gregor, welcome to my blog where I mostly write about data visualization, cartography, colors, data journalism and some of my open source software projects.

Map Symbol Clustering — k-Means vs. noverlap


While working on the soon-to-be-released map widget for Piwik (heck, it’s been over two years since the first sketches!) I implemented two map symbol clustering algorithms in Kartograph.js. Last year I wrote about why this is a good idea, and now I have turned that advice into reusable code. In this post I want to share my findings after experimenting with different clustering techniques.


Inspired by an example on the Polymaps website, the first thing I tried was k-Means clustering. The code provided with the demo worked really well and was easy to integrate with the symbol API in Kartograph. The only parameter k-Means needs is the desired number of clusters. Below you can see an interactive demo of k-Means clustering. You can change the number of clusters using the slider.

The main problem with k-Means is that it doesn’t fix the overlapping symbols. However, since it reduces the number of displayed symbols, it does improve the readability of the map. The tricky part is finding the ideal number of clusters: the fewer clusters we use, the more detail we lose in the less “populated” places. Instead of optimizing this, I decided to go back to my original idea, which is to simply cluster the overlapping symbols.
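For readers who haven't seen k-Means before, here is a minimal Python sketch of the basic idea (Lloyd's algorithm) applied to symbol positions. This is not the Polymaps or Kartograph.js code. The function name `kmeans`, the evenly spaced seeding and the sample points are assumptions made for illustration only.

```python
from math import hypot


def kmeans(points, k, iterations=20):
    """Group (x, y) points into k clusters using Lloyd's algorithm."""
    # Assumption: seed the centroids with evenly spaced input points so the
    # sketch is deterministic; real implementations often seed randomly.
    step = len(points) / k
    centroids = [points[int(i * step)] for i in range(k)]
    groups = []
    for _ in range(iterations):
        # Assignment step: every point joins its nearest centroid.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: hypot(p[0] - centroids[i][0], p[1] - centroids[i][1]),
            )
            groups[nearest].append(p)
        # Update step: move each centroid to the mean of its group.
        new_centroids = []
        for i, group in enumerate(groups):
            if group:
                cx = sum(x for x, _ in group) / len(group)
                cy = sum(y for _, y in group) / len(group)
                new_centroids.append((cx, cy))
            else:
                new_centroids.append(centroids[i])  # keep empty clusters in place
        if new_centroids == centroids:
            break  # converged, assignments no longer change
        centroids = new_centroids
    return centroids, groups


# Hypothetical symbol positions: two obvious groups of three.
points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
centroids, groups = kmeans(points, 2)
```

Note that nothing in this loop looks at symbol radii, which is why the merged symbols can still overlap each other afterwards.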


The technique is described in this post. It takes two parameters. The tolerance controls how much overlap is accepted: a value of 0.1 means we tolerate 10% overlap between adjacent symbols. The parameter maxRatio keeps similarly sized symbols from being grouped: a value of 0.8 means that no symbols are grouped if the radius of the smaller symbol is larger than 80% of the radius of the larger symbol.

The resulting clustering looks much better to me. What’s nice about it is that this algorithm doesn’t affect non-overlapping symbols, so we don’t lose details outside the big cities. The name of this technique is inspired by a Gephi plugin.

Update: I just wanted to note that for the noverlap clustering the maximum radius of the symbols is a crucial parameter as well. The larger the symbols are, the more of their neighborhood they “occupy”. Therefore I added a third slider in both demos.
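To make the two parameters more concrete, here is a rough Python sketch of how such an overlap-based clustering could work. It is my reading of the description above, not the Kartograph.js implementation. The names `overlaps`, `merge` and `noverlap`, measuring overlap relative to the smaller symbol's diameter, and adding up areas on merge are all assumptions.

```python
from math import hypot, sqrt


def overlaps(a, b, tolerance):
    """True if circles a and b overlap by more than the tolerated fraction."""
    d = hypot(a["x"] - b["x"], a["y"] - b["y"])
    depth = a["r"] + b["r"] - d  # how far the two circles reach into each other
    # Assumption: overlap is measured relative to the smaller symbol's diameter.
    return depth > tolerance * 2 * min(a["r"], b["r"])


def merge(a, b):
    """Combine two symbols (assumption: areas add up, position is area-weighted)."""
    wa, wb = a["r"] ** 2, b["r"] ** 2
    return {
        "x": (a["x"] * wa + b["x"] * wb) / (wa + wb),
        "y": (a["y"] * wa + b["y"] * wb) / (wa + wb),
        "r": sqrt(wa + wb),
    }


def noverlap(symbols, tolerance=0.1, max_ratio=0.8):
    """Repeatedly merge small symbols into larger ones they overlap too much."""
    clusters = [dict(s) for s in symbols]
    merged = True
    while merged:
        merged = False
        clusters.sort(key=lambda s: s["r"], reverse=True)  # largest first
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                big, small = clusters[i], clusters[j]
                if small["r"] > max_ratio * big["r"]:
                    continue  # similarly sized symbols are never grouped
                if overlaps(big, small, tolerance):
                    clusters[i] = merge(big, small)
                    del clusters[j]
                    merged = True
                    break
            if merged:
                break  # sizes changed, so start over with a fresh sort
    return clusters


# Hypothetical symbols: a big one swallowing a small neighbor, plus two
# equally sized symbols that overlap but are protected by max_ratio.
symbols = [
    {"x": 0, "y": 0, "r": 10},
    {"x": 8, "y": 0, "r": 4},
    {"x": 100, "y": 0, "r": 5},
    {"x": 106, "y": 0, "r": 5},
]
clusters = noverlap(symbols)
```

In this example only the first pair is merged. The two equally sized symbols stay as they are, and a symbol that doesn't overlap anything is returned unchanged, which matches the behavior described above.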

How to use it

You can see a larger side-by-side comparison of both techniques in this example. There you will also find instructions on how to use the clustering in your own maps.

