Announcements
The new Eurosurveillance website is almost here.

Eurosurveillance is on the updated list of the Directory of Open Access Journals and in the SHERPA/RoMEO database. Read more here.

On 6 June 2017, the World Health Organization (WHO) published updates to its ‘Essential Medicines List’ (EML). Read more here.

In this issue

 Home Eurosurveillance Edition  2017: Volume 22/ Issue 26 Article 4
 Tweet

Eurosurveillance, Volume 22, Issue 26, 29 June 2017
Research article
Soetens, Hahné, and Wallinga: Dot map cartograms for detection of infectious disease outbreaks: an application to Q fever, the Netherlands and pertussis, Germany

+ Author affiliations

Citation style for this article: Soetens L, Hahné S, Wallinga J. Dot map cartograms for detection of infectious disease outbreaks: an application to Q fever, the Netherlands and pertussis, Germany. Euro Surveill. 2017;22(26):pii=30562. DOI: http://dx.doi.org/10.2807/1560-7917.ES.2017.22.26.30562

Received:13 July 2016; Accepted:02 December 2016

## Introduction

Here we propose a method of mapping infectious disease data, the dot map cartogram, which displays the geographical locations of reported cases from routine surveillance or outbreak investigations, such that public health experts can visualise both absolute numbers and spatial trends in incidence of infection. The method is developed to address two major causes of misinterpretation in commonly used maps: masking patterns of disease by not taking into account the population density distribution, and masking patterns by categorisation of information (across space and incidence of disease). With the dot map cartogram, we address the problem, raised by a recent systematic review [1], that spatial methods are underutilised and used in only ca 0.4% of all published outbreak investigations.

The two most frequently used map types for spatial outbreak data are the dot map and choropleth incidence map [1]. In a dot map, every case is represented by a point on the map, showing absolute numbers of cases. Illustrating absolute numbers of cases reveals the size of the outbreak and identifies areas without cases. Dot maps do not account for the underlying geographical distribution of the population. As populations are usually heterogeneously distributed, important variations in incidence of infection can be masked. Another drawback of dot maps is that they may reveal too much information about the location of specific cases, by which the privacy of a case might be violated.

In a choropleth map, a quantitative attribute is displayed per spatial unit. For example, ordinal classes of incidence may be displayed by municipality, with the areas shaded according to their incidence value and a range of shading classes [2]. Choropleth maps are mostly used to display surveillance data. They are less suitable to display outbreak data when the interest is in exact locations of cases. Although this type of map gives a quick overview of where there are more cases than expected based on the background population, it has a major limitation: The appearance of this map is heavily influenced by arbitrary choices with regard to the classification system for the ordinal classes and the spatial unit used (the latter is also referred to as the Modifiable Areal Unit Problem, MAUP [3]). Because the data are categorised both across the quantitative attribute and across the geographical units, this can lead to a loss of information by masking important internal patterns and variations. Therefore, this type of map can be misleading. This is especially problematic when mapping rare events, which is often the case with infectious diseases.

Here we combine the advantages of both maps into one map type which we call dot map cartogram. This map type uses a diffusion-based algorithm [4], which creates contiguous or value-by-area cartograms. In these value-by-area cartograms, regions are enlarged or reduced relative to the number of individuals they contain. We apply this principle to traditional dot maps: we deform the map contours based on population size and simultaneously deform the dot pattern. In this way, a dense point pattern in a big city will become more dispersed, whereas a dense point pattern in a rural area will become even more dense. A similar technique called density-equalising map projection (DEMP) has been pioneered before to describe the spatial distribution of cryptosporidiosis among AIDS patients in San Francisco [5]. This earlier study used a different density-equalising algorithm, and application was limited to local outbreak data. This method has the advantages that the density of the dots reflects the incidence, the number of dots represent the absolute number of cases, and arbitrary choices for scale of spatial unit and classification system are avoided.

In this study, we assessed whether the advantages of the dot map cartogram outweigh the potential disadvantages. One potential drawback is that the dot map cartogram may reveal too much information with regard to the privacy of the cases. Another potential drawback is that the spatial distortion makes it hard to recognise or localise the map topology. We created dot map cartograms using real life data and compared them to the traditional choropleth and dot maps regarding four criteria: (i) ability to show both absolute numbers and incidence of the disease, (ii) sensitivity to choices regarding spatial scale and classification system, (iii) ability to ensure the privacy of individual cases, and (iv) ability to recognise locations. The comparison was applied to the mapping of a point-source outbreak and to the mapping of the occurrence of a human-to-human transmissible disease.

## Methods

### Data

As an example of outbreak mapping to locate a source, we used data from the Q fever outbreak in the Netherlands in 2009. Q fever case reports in the Netherlands communicable disease notification system (Osiris) include the 4-digit postal code of residence. Cases with date of onset of illness in 2009 (n = 1,740) were extracted for this analysis. Population data and shapefiles (a data format which stores geometrical locations and metadata such as population size per geometrical location) on different spatial levels (4-digit postal code and municipality level) were accessed via Statistics Netherlands [5].

As an example of a human-to-human transmissible disease, we used pertussis notifications in Germany in 2015. Data on pertussis cases was made available by the Robert Koch Institute through the SurvStat@RKI 2.0 web portal [6], in which the district of residence is registered for every case. We extracted data on laboratory-confirmed pertussis cases with a date of diagnosis in 2015 (n = 9,017). The most recent population data and shapefiles for district and federal state level boundaries (2013) were accessed through the open data portal of the Federal Agency for Cartography and Geodesy Germany [7]. We assumed that the population size per district or federal state in 2015 was sufficiently similar to that in 2013.

### Geographical representation

#### Dot map

In a dot map, every case is represented by a dot on their location of residence. As this would reveal the exact locations of the cases and harm their privacy, we chose to use a proxy location by assigning a random point location in the spatial unit of residence (4-digit postal code for the Netherlands and district for Germany) to every case. The 4-digit postal code areas in the Netherlands have a mean population of 4,119 persons (mean surface: 8.6 km2), and the districts in Germany have a mean population of 200,914 persons (mean surface: 888.7 km2). For the map of the Netherlands we have used the national RD (‘rijksdriehoeksstelsel’)-based projection and for Germany we used the Universal Transverse Mercator (UTM) 32 projection. Both are conformal map projections in which local angles are preserved and shapes are represented accurately and without distortion for small areas.

#### Choropleth incidence map

In a choropleth incidence map, disease incidence is displayed per spatial unit, using ordinal classes. We chose two spatial unit levels for each country: one at the same level as the patient data (spatial unit of residence as described above) and the other one level higher (municipality in the Netherlands or federal state in Germany). In addition, we categorised incidence into five ordinal classes, using two separate classification systems: the Jenks’ natural breaks algorithm [8] which seeks to reduce the variance within classes and maximise the variance between classes, and the quantile method in which equal numbers of spatial units are placed into each class [8]. The colour schemes for the incidence classes were based on previously established map colour palettes [9].

#### Basic principle

We created cartograms by reshaping the spatial units such that their area was proportional to their population by applying the Gastner-Newman diffusion-based algorithm [4] without changing the underlying map topology [10]. The principle is that once the areas have been scaled to be proportional to their population, then population density is by definition the same for each area of visually the same size on the cartogram. To create dot map cartograms, the point patterns of cases (as described in the dot map section) were simultaneously transformed in the reshape process of the spatial units so that the number of cases per unit area reflected the incidence. In addition, to provide points of reference to interpret the dot map cartograms, capital municipality, provincial or state boundaries were simultaneously transformed, and an inset was added with the undistorted map. We used ScapeToad 1.2 software [11] to reshape the spatial units and the point patterns. From this programme, the transformed layers were exported as shapefiles. The exported files were imported into R statistical software to create the maps. The main R package used to create the maps was ggplot2. The R code for the construction of the dot map cartogram is provided elsewhere (https://github.com/lsoetens/DotMapCartogram).

#### Cartogram quality

As we used spatial units with a population size at a certain spatial level to deform the original map, the MAUP problem was inherited. However, the consequences of this problem can be reduced by relying on an objective measure to assess which spatial level we could best use for our transformations. We assessed the cartogram quality by the objective measures, i.e. the average and maximum normalised cartographic error [10]. The cartographic error is the most commonly used measure for distortion in the value-by-area realisation. We assumed an input map M that is partitioned into n regions with polygonal boundaries. For each region v, a(v) denotes the area of v in M and the weight w(v) is the desired area for the region based on the background population. The diffusion-based algorithm constructed the cartogram M0, that is a deformation of M, in which o(v) is the observed area of region v. The average normalised cartographic error e was calculated as:

### Formula 1

$e = 1 n ∑ v ∈ V | o ( v ) − w ( v ) | max { o ( v ) , w ( v ) }$with a range of [0,1]

and the maximum normalised cartographic error x was calculated as:

### Formula 2

$x = max ( | o ( v ) − w ( v ) | max { o ( v ) , w ( v ) } )$with a range of [0,1]

No guidelines exist as to what constitutes acceptable cartogram quality; to be consistent with the values measured in a previous publication [10], we aimed for e < 0.1 and x < 0.4 in our cartograms. The performance of the algorithm, and subsequently the cartogram quality, depends on the distribution of the population density in the input file and the grid size on which the computation of the population density is based. Large discrepancies in the population density between regions impair the algorithm performance, while a finer grid will produce a higher quality cartogram, but may introduce distortion. We started with an input map at the same spatial unit of residence as the case data (i.e. 4-digit postal code for the Netherlands and district level for Germany) and assessed the average and maximum error of the cartograms. For the Netherlands, cartogram error was not satisfactory and we changed to the municipality level and accepted a maximum normalised cartogram error of x > 0.4; this was due to the very small population size of the municipalities on the West Frisian Barrier Islands. These areas should be smaller than presented in the created maps, but would no longer be visible. We used a 512 × 512 grid, which gave sufficient results; a higher-resolution grid of 1,024 × 1,024 is possible, but requires substantially greater computation times.

### User requirements

To create the dot map cartograms, no specific geographical information system skills are needed. The only requirement is some basic knowledge of the R statistical software to be able to run the provided code (https://github.com/lsoetens/DotMapCartogram). Required software includes R studio and ScapeToad [11]. Both programmes are open source programmes, which can be downloaded for free. A detailed manual for creating the dot map cartograms is provided along with the code (https://github.com/lsoetens/DotMapCartogram).

## Results

### Geographical representation of point-source outbreaks: Q fever in the Netherlands

The maps of Q fever cases are compared in Figure 1. The dot map cartogram (Figure 1A) is based on the population size per municipality in the Netherlands in 2009 (e = 0.08, sd = 0.09; x = 0.78). It is compared with the dot map (Figure 1B) and the choropleth incidence maps (Figures 1C-E).

#### Ability to show both absolute numbers and incidence

The dot map cartogram is able to show both absolute numbers and incidence. Incidence is readily inferred from the choropleth incidence maps, but the exact size of the outbreak in absolute numbers is not apparent. The dot map showed a clear clustering pattern with several hotspots, but whether this is related to the population density (incidence) cannot be inferred from the dot map.

#### Sensitivity to choices regarding spatial scale and classification system

The dot map cartogram and dot map do not suffer from arbitrary classification issues. The choropleth incidence maps are sensitive to this; different spatial scale and classification systems result in highly variable maps. When only one of those maps is used, the geography of the outbreak could be misinterpreted. One would conclude that the outbreak is more widespread (Figure 1C) or more intense (Figure 1E) than is illustrated in Figure 1D.

#### Ability to ensure the privacy of cases

The dot map compromises case privacy by revealing exact locations of cases, while the choropleth incidence map protects privacy by aggregating case information. The dot map cartogram meets this criterion partially by deforming the underlying region.

#### Ability to recognise locations

The dot map cartogram is harder to read, and specific locations are difficult to recognise. This can be improved by providing locations of major towns or district boundaries. The choropleth incidence map and dot map outperform the dot map cartogram on this criterion.

### Geographical representation of a human-to-human transmissible disease: pertussis in Germany

The maps for pertussis cases are compared in Figure 2. Figure 2A depicts the dot map cartogram based on the population size by district in Germany in 2013 (e = 0.07, sd = 0.06; x = 0.29); this map is contrasted with the corresponding dot map (Figure 2B) and choropleth incidence maps (Figures 2C-E).

#### Ability to show both absolute numbers and incidence

A much larger number of cases are illustrated in this example than for the Q fever data. This is immediately apparent from the dot map and the dot map cartogram, but less so from the choropleth incidence map. The clustering patterns on the dot map in several large cities and for example in the Ruhr district (in the west of Germany) disappear into an almost random pattern in the dot map cartogram, indicating that the clustering patterns on the dot map can be explained by the underlying population density in those areas. In contrast, in the south-east of Germany near Munich, a clustering pattern is present in both the dot map and the dot map cartogram, indicating that this cluster is not attributable to population density but reflects a higher incidence of pertussis notifications. Therefore, in this situation, both incidence and absolute number of cases are important in interpreting the map: they reveal the size of the problem and provide clustering of cases not attributable to the population density.

#### Sensitivity to choices regarding spatial scale and classification system

The choropleth incidence maps showed that the choice of classification system or spatial level results in highly variable maps. In this comparison, Figure 2D, based on the Jenks’ natural breaks classification system and the district level, is comparable to the dot map cartogram, showing the same areas with high incidence; however, there is no way to determine a priori which classification system and spatial level would result in the ‘right’ map.

#### Ability to secure the privacy of cases

All maps meet this criterion. In this dataset, the dot map does not compromise privacy because of the frequency of occurrence of the disease. As before, the dot map cartogram protects case privacy due to the deformation of the underlying regions, and the choropleth incidence map because information is aggregated.

#### Ability to recognise locations

The choropleth incidence map and dot map outperform the dot map cartogram on this criterion. Adding reference points in the dot map cartogram, such as the federal state and federal state municipality boundaries, can help in recognising locations.

## Discussion

We have proposed the dot map cartogram for displaying spatial infectious disease data and illustrating both incidence and absolute numbers of cases. A similar technique has been suggested before [12], but to our knowledge it was never used to study clustering of point patterns at a national level in infectious disease epidemiology. We compared the dot map cartogram to the frequently used choropleth incidence map and the dot map [1]. The main advantages of the dot map cartogram over the other two is that it is able to simultaneously reveal epidemic patterns adjusted for the population distribution, and to unmask patterns that are hidden by aggregation and categorisation of information. The visual distortion of the dot map cartogram is a barrier to pinpointing the location of a case: this is a benefit in the field of infectious diseases because case privacy should be ensured in presenting surveillance data. In addition, dot map cartograms illustrate areas without cases, which are harder to discern by the use of choropleth incidence maps.

We did not address map types other than dot maps and incidence maps for this comparison. Smooth incidence maps are an alternative to the choropleth incidence maps, in which the incidence is smoothed across the spatial units. This technique was applied to the data from a previous study for the Q fever outbreak in the Netherlands in 2009 [13]. In illustrating hotspots, the dot map cartogram is comparable to the smooth incidence map. The advantage of the smooth incidence map is that it permits identification of the exact location of hot spots, since the map is not deformed. However, smooth incidence maps do not reveal areas without cases and it is hard to discern the absolute number of cases and the true scope and size of the outbreak. Many more mapping techniques exist, such as map types based on other interpolation techniques (such as inverse distance weighting, kriging and trend surface fitting [2]), incidence based value-by-area cartograms [14,15], and many others. Assessment of all available methods was not within the scope of this paper.

We used the Gastner and Newman diffusion-based algorithm [4] to construct the cartogram. Alternative algorithms are available to construct contiguous cartograms [16-26]. We chose the Gastner and Newman algorithm because it performs well compared to the alternative algorithms in terms of quantitative measures such as the cartographic error, shape error and topology error [10], and qualitative measures such as subjective preferences and task performance [27]. Recent studies have proposed quantitative [10] and qualitative [27] measures for cartogram generation techniques. The development of standardised performance measures will allow objective ranking and selecting of cartogram algorithms.

With the demonstration of dot map cartograms, we provide public health professionals with an alternative spatial method for outbreak analysis. Firstly, we expect that dot map cartograms minimise misinterpretation of the data. Secondly, as demonstrated in this study, dot map cartograms protect the privacy of cases. Thirdly, dot map cartograms do not require expensive specialised GIS software, which facilitates use in settings with limited resources. The only requirements to construct dot map cartograms are free software, case reports with location data and access to population data as shapefiles. As there is a general trend for governments to make administrative population data publicly available, the latter is available for most countries. Finally, our method can easily be extended with information on covariates relevant for mapping: a relevant example in infectious disease surveillance or research is to colour the case dots for a specific attribute, such as age group, sex or vaccination status. Introducing other layers of detail or attributes further broadens the utility of this spatial method for use in infectious disease research and surveillance. However, as the technique is graphical rather than geographical, it does not replace a geographical information system explaining the impact of various geographical factors on the spread of a certain infectious disease.

### Acknowledgements

We thank Lisa Hansen for thoroughly editing the manuscript for English language.

None declared.

### Authors’ contributions

Authors LS, SH, and JW contributed to the design of the study. LS led on the data analysis and drafting of the manuscript supported by SH and JW. All authors commented on drafts of the manuscript and approved the final version.

## References

1. Smith CM, Le Comber SC, Fry H, Bull M, Leach S, Hayward AC. Spatial methods for infectious disease outbreak investigations: systematic literature review.Euro Surveill. 2015;20(39):30026.

2. Heywood I, Cornelius S, Carver S. An introduction to geographical information systems. New Jersey: Prentice-Hall; 2011. ISBN: 9780273722595.

3. Openshaw S. Ecological fallacies and the analysis of areal census data.Environ Plann A. 1984;16(1):17-31. DOI: 10.1068/a160017 PMID: 12265900

4. Gastner MT, Newman ME. From The Cover: Diffusion-based method for producing density-equalizing maps.Proc Natl Acad Sci USA. 2004;101(20):7499-504. DOI: 10.1073/pnas.0400280101 PMID: 15136719

5. Wijk- en buurtkaart 2008. [District and neighborhood maps 2008]. Den Haag: Statistics Netherlands. [Accessed: 20 Jan 2016]. Dutch. Available from: http://www.cbs.nl/nl-NL/menu/themas/dossiers/nederland-regionaal/publicaties/geografische-data/archief/2009/2010-wijk-en-buurtkaart-2008.htm

6. SurvStat@RKI 2.0. Berlin: Robert Koch Institute. [Accessed: 3 Feb 2016]. Available from: https://survstat.rki.de/

7. Open Data - Freie Daten und Dienste des BKG. [Open data – free data and services of the BKG]. Leipzig: Federal Agency for Cartography and Geodesy (BKG). [Accessed: 15 Nov 2015]. German. Available from: http://www.geodatenzentrum.de/geodaten/gdz_rahmen.gdz_div?gdz_spr=deu&gdz_akt_zeile=5&gdz_anz_zeile=0&gdz_user_id=0

8. Brewer CA, Pickle L. Evaluation of methods for classifying epidemiological data on choropleth maps in series.Ann Assoc Am Geogr. 2002;92(4):662-81. DOI: 10.1111/1467-8306.00310

9. Harrower M, Brewer CA. ColorBrewer.org: An online tool for selecting colour schemes for maps.Cartogr J. 2003;40(1):27-37. DOI: 10.1179/000870403235002042

10. Alam MJ, Kobourov SG, Veeramoni S. Quantitative Measures for Cartogram Generation Techniques.Comput Graph Forum. 2015;34(3):351-60. DOI: 10.1111/cgf.12647

11. Andrieu D, Kaiser C, Ourednik A. ScapeToad: not just one metric. [Accessed: 20 Sep 2015]. Lausanne: Choros Laboratory, Available from: http://scapetoad.choros.ch/

12. Khalakdina A, Selvin S, Merrill DW, Erdmann CA, Colford JM. Analysis of the spatial distribution of cryptosporidiosis in AIDS patients in San Francisco using density equalizing map projections (DEMP).Int J Hyg Environ Health. 2003;206(6):553-61. DOI: 10.1078/1438-4639-00245 PMID: 14626902

13. van der Hoek W, van de Kassteele J, Bom B, de Bruin A, Dijkstra F, Schimmer B,  et al.  Smooth incidence maps give valuable insight into Q fever outbreaks in The Netherlands. Geospat Health. 2012;7(1):127-34. DOI: 10.4081/gh.2012.111 PMID: 23242690

14. Dodd PJ, Sismanidis C, Seddon JA. Global burden of drug-resistant tuberculosis in children: a mathematical modelling study.Lancet Infect Dis. 2016;16(10):1193-201. DOI: 10.1016/S1473-3099(16)30132-3 PMID: 27342768

15. Dorling D, Barford A, Newman M. WORLDMAPPER: the world as you’ve never seen it before.IEEE Trans Vis Comput Graph. 2006;12(5):757-64. DOI: 10.1109/TVCG.2006.202 PMID: 17080797

16. Tobler WR. Democratic representation and apportionment: a continous transformation useful for districting.Ann N Y Acad Sci. 1973;219:215-20. DOI: 10.1111/j.1749-6632.1973.tb41401.x PMID: 4518429

17. Dougenik JA, Chrisman NR, Niemeyer DR. An algorithm to construct continuous area cartograms.Prof Geogr. 1985;37(1):75-81. DOI: 10.1111/j.0033-0124.1985.00075.x

18. Gusein-Zade SM, Tikunov VS. A new technique for constructing continuous cartograms.Cartogr Geogr Inf Syst. 1993;20(3):167-73. DOI: 10.1559/152304093782637424

19. Keim DA, North SC, Panse C. CartoDraw: a fast algorithm for generating contiguous cartograms.IEEE Trans Vis Comput Graph. 2004;10(1):95-110. DOI: 10.1109/TVCG.2004.1260761 PMID: 15382701

20. Keim DA, Panse C, North SC. Medial-axis-based cartograms.IEEE Comput Graph Appl. 2005;25(3):60-8. DOI: 10.1109/MCG.2005.64 PMID: 15943089

21. Kämper JH, Kobourov SG, Nöllenburg M. Circular-arc cartograms. 2013 IEEE Pacific Visualization Symposium (PacificVis). Sydney; 2013. http://dx.doi.org/10.1109/PacificVis.2013.6596121

22. Dorling D. Area cartograms: their use and creation. In: The map reader: theories of mapping practice and cartographic representation. Dodge M, Kitchin R, Perkins C, eds. Chichester: John Wiley & Sons; 2011. p. 252-60. http://dx.doi.org/10.1002/9780470979587.ch33

23. Edelsbrunner H, Waupotitsch R. A combinatorial approach to cartograms.J. Comput. Geom.1997;7(5-6):343-60.

24. House DH, Kocmoud CJ. Continuous cartogram construction. Proceedings of the IEEE Visualization. Research Triangle Park; 18-23 Oct 1998. p. 197-204. https://dx.doi.org/10.1109/VISUAL.1998.745303

25. Schulman J, Selvin S, Merrill DW. Density equalized map projections: a method for analysing clustering around a fixed point.Stat Med. 1988;7(4):491-505. DOI: 10.1002/sim.4780070406 PMID: 3368676

26. Selvin S, Merrill D, Schulman J, Sacks S, Bedell L, Wong L. Transformations of maps to investigate clusters of disease.Soc Sci Med. 1988;26(2):215-21.

27. Nusrat S, Alam MJ, Kobourov SG. Evaluating cartogram effectiveness. IEEE Trans Vis Comput Graph. 2016. https://dx.doi.org/10.1109/TVCG.2016.2642109

 Tweet

Disclaimer: The opinions expressed by authors contributing to Eurosurveillance do not necessarily reflect the opinions of the European Centre for Disease Prevention and Control (ECDC) or the editorial team or the institutions with which the authors are affiliated. Neither ECDC nor any person acting on behalf of ECDC is responsible for the use that might be made of the information in this journal. The information provided on the Eurosurveillance site is designed to support, not replace, the relationship that exists between a patient/site visitor and his/her physician. Our website does not host any form of commercial advertisement. Except where otherwise stated, all manuscripts published after 1 January 2016 will be published under the Creative Commons Attribution (CC BY) licence. You are free to share and adapt the material, but you must give appropriate credit, provide a link to the licence, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.