When I was first discovering data visualisation, I went to all the obvious places for inspiration: newspapers with great visual journalism, The Pudding, Behance, Tableau Public, and online portfolios of other information designers. They were all great resources for examples of what good could look like, but whenever I found an outstanding example of data viz online, I was always left with a strange feeling. I would describe it as a mix of awe and “but, how...?”. How did such a fantastic depiction of data come to be? What was the process behind it? Why did the designer decide the things they decided? And as I evolved my data skills and created more and more data viz for stakeholders and clients, the questions started getting a bit deeper, but the answers I either came across or came up with were primarily based on empirical experience rather than anything structured.
I learned data viz by doing - I didn’t have much in the way of formal training in the fields of Computer Science, Data Science or anything STEM related. I learned the basics as I went, based on the needs that arose from the roles I landed. Blog posts from fellow visualisers often focus on how to do things in a particular tool or how to create specific charts, but it is difficult to find references for the broader, more theoretical and conceptual side of data viz. Why do we make the micro-decisions we make as data viz designers? Why did I choose one chart over another? How do I know if my dataset influences my chart choices or how that influence even happens? How do I go about dealing with interactivity? We often have to explain these decisions to stakeholders and clients as well, but because it is so rare to find a framework that addresses them clearly, we often lack the vocabulary to clarify what our decisions might depend on.
If you’re looking for a theoretical framework that helps you understand why the standard answer to every question about data visualisation is “it depends”, then Visualization Analysis & Design by Tamara Munzner is for you. The author does a fantastic job of going deeper into the concepts, the jargon and the research of data viz, beyond technological tools and beyond an over-reliance on disciplines adjacent to data visualisation. It is a book about data visualisation for data visualisers, which treats the field as its own domain. It is a commendable effort: to treat data viz not just as part of design, not just as part of data science or statistics, but as its own field with its own set of defining terms and processes.
It reads like a textbook but it isn’t dull or hard to follow. The book is very well structured, making it easy to consult in case you need to refresh a concept - or to read the whole thing cover to cover if you really want to dig deeper into data viz theory. It also serves as an outstanding reference compilation for research papers, authors and other books that are relevant to the field. Despite its depth and content density, it is incredibly friendly to beginners looking for a more structured learning experience.
What, Why, How
Munzner defines a basic framework for data visualisation, which guides the book’s chapters and how the content is organised. She calls it the What-Why-How framework. One particularly great point the author makes is that the process should not be seen as a “straitjacket” but as a guide to get us started when doing data viz work. The framework is also iterative: there can be multiple instances of What-Why-How in one given analysis, depending on its complexity or context. In summary, the three stages are:
What is the data the end-user will see or that will feed into the visualisation produced? Here, Tamara Munzner discusses data types, dataset types, their availability, and the attribute types that describe them, such as categorical, ordered and quantitative, along with ordering directions such as sequential and diverging. This is a super important primer for those just starting out in the field, as it helps us organise our mental models regarding how we perceive data. Data is an abstraction of reality, recorded in particular ways. Understanding these definitions makes it easier for us to make sense of data and decode its meaning.
Why should we develop a visualisation for the end user? What are the intentions that lead to this need? During this stage of the framework, the book delves more into the reasons why visualisations matter from a practical standpoint. What actions do we perform that may benefit from visualisations? Is it to discover what the data tells us? To present our findings to others, or just for enjoyment? Do we need to identify, compare, annotate, summarise or aggregate the data in any way? Are we trying to deal with trends, identify outliers, similarities or correlations? A better understanding of how task abstractions are defined, and of why we need the data we think we need, helps us delineate which actions we should take from it - and this informs better design choices later on.
How should we build the visualisation and interactions? How will we define the design choices that will compose the final product? How will we encode, manipulate, facet or reduce our data to enable our visualisations? Will we use order, alignment, visualise change over time, compare dimensions and attributes, filter or aggregate our data? How will all of these big decisions inform the smaller ones we will make when building a chart, report, presentation or dashboard?
An instance of the What-Why-How process should satisfy one or a small group of analytical inquiries, but chances are that if this instance is successful, other instances will be chained from it. And each subsequent instance will be another case of What-Why-How.
This is brilliantly elegant because it describes the mental models we employ as visualisation designers. First, we define what data will feed into our analysis. Then we figure out why we need it to be a certain way: what’s to be gained from it? What is its purpose? And finally, we operationalise how we will depict the data, which encodings or methods we will employ, and what design choices we will make. And that can happen multiple times in a more complex piece of work.
Take a simple Sales Dashboard as an example. Suppose we have only the very basic metrics: volume of sales, revenue and profit.
What data will we need for the sales volume analysis? Perhaps the number of sales orders, or quantities sold of each product, for each region.
Why does it matter to our users to see this sales volume? Maybe they need to track how their sales teams are performing, or maybe they need to make sure that new products are selling at the expected levels. Each of these reasons may yield a different visualisation choice.
Once we know what the data is and why it matters, how will we visualise it? Will it be a number? A KPI with conditional formatting? Will it be a bullet chart against a target?
We then go into the next metric: What is the data I have for revenue? Is it the value of each sales order? Is it price multiplied by the quantity of each unit sold across multiple products?
Why should I look at revenue within the sales volume context? Perhaps it will give me a better understanding of where the sales team’s efforts are better spent: higher-value products that yield more revenue despite lower sales volumes. Perhaps I just want to make sure revenue targets are met.
Once I have this information, how can I visualise revenue alongside sales volumes? What would make sense for a more complete analysis within the context of the analytical questions raised?
We then go through the same basic framework for profit and any further visualisations or themes we should add.
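For readers who think in code, here is a minimal sketch of the walk-through above. It is not taken from the book: the sample orders, the field names and the Instance class are all hypothetical, made up purely to illustrate how each metric gets its own What, Why and How, and how revenue can be derived as price multiplied by quantity.

```python
from dataclasses import dataclass

# Hypothetical sample orders; fields and figures are invented for illustration.
ORDERS = [
    {"region": "North", "product": "A", "quantity": 10, "unit_price": 5.0},
    {"region": "North", "product": "B", "quantity": 2, "unit_price": 40.0},
    {"region": "South", "product": "A", "quantity": 7, "unit_price": 5.0},
]


@dataclass
class Instance:
    """One pass through What-Why-How for a single metric (hypothetical helper)."""
    metric: str
    what: str  # the data feeding the view
    why: str   # the task the user needs to perform
    how: str   # the design choice that follows


def volume_by_region(orders):
    """What (sales volume): quantities sold, summed per region."""
    totals = {}
    for order in orders:
        region = order["region"]
        totals[region] = totals.get(region, 0) + order["quantity"]
    return totals


def revenue_by_region(orders):
    """What (revenue): unit price multiplied by quantity, summed per region."""
    totals = {}
    for order in orders:
        region = order["region"]
        amount = order["unit_price"] * order["quantity"]
        totals[region] = totals.get(region, 0.0) + amount
    return totals


# Chained instances: a successful instance for one metric leads to the next.
DASHBOARD = [
    Instance(
        metric="sales volume",
        what="quantity sold per region",
        why="track how sales teams are performing",
        how="bullet chart against a target",
    ),
    Instance(
        metric="revenue",
        what="unit price x quantity per region",
        why="check that revenue targets are met",
        how="KPI with conditional formatting",
    ),
]

if __name__ == "__main__":
    print(volume_by_region(ORDERS))
    print(revenue_by_region(ORDERS))
    for inst in DASHBOARD:
        print(f"{inst.metric}: {inst.what} | {inst.why} | {inst.how}")
```

The point of writing it down this way is that the Why sits between the data and the chart: change the reason a user needs sales volume and the How entry may change with it, even though the underlying numbers stay the same.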
Visualization Analysis & Design then goes into the depths of each one of these steps and defines the language and the procedures for the What, the Why and the How of data viz. Each step is complemented by a neat visual at the start of its chapter that summarises the concepts covered. It is wonderfully structured, making it easy to consult whenever you need a refresh on a particular point.
To top it all off, the author has a YouTube Channel with a playlist containing complementary video explanations of each of the book’s chapters. If you’re on the fence about this book being a good addition to your data viz bookshelf, I strongly suggest you look at the videos first.
Should you read it?
The author begins the book by explaining that she wrote it because she found herself frustrated with the curriculum of academic data visualisation courses. She wanted a textbook like this to guide the lectures instead of getting students to read dozens of different research papers. Don’t be fooled: this is a classic textbook. It deals with broader themes, though, and will likely age gracefully, as the principles discussed here don’t rely heavily on any particular technology - they are transferable to data visualisation practice as a whole.
If you’re looking for a how-to book for specific techniques applicable to particular tools, this is not the book for you. But if you’re ready to explore more of the theory and abstractions behind your choices as a data visualisation designer, or if you’re looking for a more structured, generalist approach to the field, then Visualization Analysis & Design by Tamara Munzner is the perfect resource. It is also excellent reference material to get you into other resources from the field: additional research, articles, books and authors are heavily referenced in the text.
I highly recommend this book to everyone who’s serious about becoming a data viz developer and making a career out of it.
Always check your local library first to see if any of the books I recommend are available. If they’re not, consider donating a copy!
You can get more resources about the book, sample chapters and lecture videos here.