Avoiding Data Pitfalls or how a book can become your best friend in a journey full of traps.

Thabata Romanowski

Jan 20, 2023 · 8 min read

Book cover of Avoiding Data Pitfalls by Ben Jones — From the Data Viz Bookshelf: Avoiding Data Pitfalls by Ben Jones.

If you are not from a STEM background (that is short for Science, Technology, Engineering, and Mathematics), chances are you either have struggled or are struggling to make sense of all the jargon around you when it comes to data. Learning the ropes of working with data in a business environment when you have multiple meetings, deadlines, priorities and tasks to juggle sounds daunting.

When we feel overwhelmed, we are more prone to commit mistakes, overlook gaps, ignore conflicting information or perpetuate biases by the sheer force of cognitive load. It’s unintentional, but often the simple thought of having to wrangle data for a report or presentation sends shivers down the spine: “what will I mess up now…?”

I have bitter-sweet news for you. The bitter part is that mistakes are inevitable on your way to learning more about data, and it’s a messy and complex process. The path will not be linear and will, instead, be filled with traps you’ll likely fall into. And you will fall into them. Multiple times even. Everyone falls into them, regardless of seniority or affinity with numbers and technology. I have. Ben Jones, the author of Avoiding Data Pitfalls, has too - and he’s very open about it.

The sweet part is that you don’t have to go alone. Nor do you have to beat yourself up when it happens. See every mistake as a free training opportunity: every time you fall into one of the pitfalls mentioned by the author, you’ll have learned a new way to avoid it.

The tortuous path of becoming a data person

When I started my career, I was an intern at a bank. I confess it wasn’t what I had envisioned myself doing. My day was filled with spreadsheets and customer calls. The spreadsheet part was particularly challenging for my young self. I had never interacted with data in an organised, business sense before. Sure, I had read about statistics and seen reports and metrics - but this was the first time I had to make things from data.

My boss at the time gave me a reconciliation report I had to run weekly. It consisted of merging two spreadsheets from different sources based on a common key. The term VLOOKUP may mean something now. But it didn’t mean anything to me back then. Everyone starts somewhere, and I started there, hurriedly looking up on Google what the hell VLOOKUP actually was and why I had to use it for my report.

Needless to say, mistakes were made. By me. Multiple times. First, I discovered the two spreadsheets had mismatches between them - a key present in one source but not in the other. Then, I found out that both spreadsheets had duplicated keys - each with slightly different details in the columns I needed to bring together. There was no documentation on what to do with these situations, the clock was ticking, and assumptions had to be made. The further I looked, the more I felt like I was digging a bottomless pit. In my inexperience, I put the report together the best way I could, which was obviously (in hindsight) not good enough.
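For the curious, here is a minimal sketch of the two checks I wish I had run before merging anything: keys that exist in only one source, and keys that appear more than once. It uses plain Python rather than Excel, and everything in it (the `check_keys` helper, the `account` column, the sample rows) is made up for illustration, not taken from the book or from my old report. The reason it matters: an exact-match VLOOKUP returns only the first match it finds, so duplicated keys are silently ignored, and keys missing from the lookup table come back as #N/A.

```python
from collections import Counter

def check_keys(left_rows, right_rows, key):
    """Report key problems that would distort a lookup-style merge."""
    left_counts = Counter(row[key] for row in left_rows)
    right_counts = Counter(row[key] for row in right_rows)
    return {
        # Keys present in one source but not in the other
        "only_in_left": sorted(set(left_counts) - set(right_counts)),
        "only_in_right": sorted(set(right_counts) - set(left_counts)),
        # Keys that appear more than once within a single source
        "duplicated_in_left": sorted(k for k, n in left_counts.items() if n > 1),
        "duplicated_in_right": sorted(k for k, n in right_counts.items() if n > 1),
    }

# Hypothetical rows standing in for the two spreadsheets
bank = [
    {"account": "A1", "balance": 100},
    {"account": "A2", "balance": 250},
    {"account": "A2", "balance": 255},  # duplicated key with a different detail
]
ledger = [
    {"account": "A1", "balance": 100},
    {"account": "A3", "balance": 75},   # key that the other source does not have
]

report = check_keys(bank, ledger, "account")
for problem, keys in report.items():
    print(problem, keys)
```

A check like this doesn’t tell you what to do with the mismatches and duplicates - that still takes a conversation with whoever owns the data - but it turns silent assumptions into a list you can ask questions about.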

It is easy to attribute our mistakes when working with data to a lack of technical knowledge or overly complex systems. Often our struggles take the shape of technical jargon or tools and technology, and we attribute our negative feelings to those things instead of being aware of what’s happening.

It is a tangible thing to blame: perhaps I made mistakes because I didn’t know Excel (or Tableau, or Power BI, or SQL, or Python), or I didn’t understand Statistics or Math, or because I had no idea of what a relational database was supposed to be, or why it had anything to do with the PowerPoint I had to present to my boss showing incorrect reconciliation figures.

The truth is, most of the time, the path to becoming more comfortable using data as part of our daily routines doesn’t depend as much on technology as we think. In fact, other, more human things get in the way.

Self-awareness is your data superpower.

Comic strip with 2 squares. Title: The path to becoming data-informed. First square: Straight line going upwards with marks along it: first mark: Facts; second mark: Data; third mark: Analysis; last mark: All the right answers. Second square has the drawing of a mountain range full of ups and downs, with several marks along the way: reality; incomplete data; lots of assumptions; uh-oh, incorrect assumptions; new assumptions, maybe this is it?; all the biases; let's test this hypothesis; oh no, not again; hum...; ok, it seems to make sense; phew; I still don't have answers but I now know multiple ways how not to get them. — The tortuous path to becoming a data person, created by Data Rocks

What is happening then? I hear you ask. Very likely, you and I are stumbling into something Ben Jones chose to call a Data Pitfall in his excellent book Avoiding Data Pitfalls: How to Steer Clear of Common Blunders when Working with Data and Presenting Analysis and Visualisations.

Ben Jones describes, in an approachable, easy-to-follow language, eight* pitfalls in his book. The author goes into great detail and gives excellent examples of each trap, how we fall into them and what to do to learn from them and become more aware of their treacherous ways.

As he advises multiple times across the text, anyone can become more aware of the existence of data pitfalls and how they present themselves, but nobody can entirely avoid them. The pitfalls themselves stay the same, but they keep showing up in novel ways.

The more you climb the DIKW (Data, Information, Knowledge, Wisdom) pyramid (or, the further you go in your analytics path), the sneakier the pitfalls become.

He breaks down his book into nine chapters. He brings up seven* pitfall groups, where he discusses common biases, mistakes, inconsistencies, fallacies and phenomena that affect how we interact with data in multiple ways:

The mistakes we make when we think about data, such as the gap between reality and what’s represented in the data we collect or the data entry mistakes we all have to deal with at some point;
The traps we fall into when we are processing data, like the example in my story - when you have mismatched joins, bad blends, or dirty and inconsistent data;
The errors we make when calculating data, such as aggregations, missing values and misleading percentages;
The dangers lurking in how we compare data, mainly when using averages, sample sizes, and understanding how populations affect our results;
The incorrect assumptions we can fall for when we analyse data, where the author brings up one of my favourite discussions: the false idea that data-informed decision-making has no place for intuition;
The risks of not paying attention to how we visualise data, where another one of my favourite subjects is brought up: the also false idea that there are rigid rules when it comes to data visualisation, such as the unjustified hatred for pie charts;
The dangers of misusing design to dress up our data, where the author makes a point of function over form, in a thoughtful and balanced manner, without being pedantic about specific chart elements.
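To make the calculation pitfall from the list above a bit more concrete, here is a small sketch of how missing values can quietly change an average. The numbers are invented for illustration and are not an example from the book; `None` stands for a month where nothing was recorded.

```python
from statistics import mean

# Hypothetical monthly figures; None means the value was never recorded
monthly_sales = [120, None, 80, None, 100]

# Option 1: average only the months that actually have a value
recorded = [v for v in monthly_sales if v is not None]
mean_skipping_missing = mean(recorded)

# Option 2: treat every missing month as zero
mean_missing_as_zero = mean(0 if v is None else v for v in monthly_sales)

print(mean_skipping_missing)  # 100
print(mean_missing_as_zero)   # 60
```

Neither number is wrong by itself. Whether a blank means "zero" or "we don’t know" is a question about reality, not about the tool - and making that choice without noticing is exactly how the pitfall catches you.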

Being capable of detecting a data pitfall when it’s approaching and before you fall into it is the primary skill you’ll take away from this book - and it is a unique skill that will enable you to advance much further, much quicker in your data journey. Awareness becomes your data superpower.

But the book also has another underlying lesson, as I see it. One about your data arch-nemesis.

Your fear of making mistakes is your data arch-nemesis.

One point the author makes clear he wants to drive home with this book is that, when working with data, complex systems, decision-making, or how reality is represented through metrics, making mistakes is part of the process.

Making mistakes now is what will help you build the muscles you need to avoid those same mistakes in the future. It is also what paves the way for those coming after you: if you figure out something has been terribly wrong for a long time and fix it, everyone after you will have the chance of no longer making the same mistakes again - they’ll have to come up with new ones!

It is a permanent exercise in acceptance and self-improvement. Mistakes are bound to happen, and the best we can do is to learn how to deal with them graciously when the time comes.

If awareness is the superpower that will guide you away from falling into dreadful data pitfalls, there is also a supervillain that has a lot of fun when you fall into one of these traps and can’t get out: your ego.

Dealing with how you feel when you discover that you may have fallen into one of the pitfalls is the biggest hurdle you’ll learn to overcome on your way to truly becoming a data person.

The book goes briefly into this and offers a practical framework to help you deal with the dreadful feeling that will inevitably be there when you find yourself struggling in your next data project:

First, fix your mistake (get out of the pitfall)
Then, take note of the pitfall you fell into - the author offers a handy checklist at the end of the book to help you keep track of where you fall.
And last, tell everyone about what happened - as difficult as it sounds, you’ll only overcome your fear by owning your mistake so it can’t haunt you anymore.

The 8th pitfall.

If you noticed the little asterisk* earlier in this post, here is the reason for it: I mentioned that the book talks about eight pitfalls, but only seven appear in the chapters.

When you make it to the end, you’ll notice Ben Jones adds an eighth pitfall to the book’s conclusion: The Pitfall of the Unheard Voice. If you are a woman in tech, you know what this pitfall means.

Whether by design or not, the author does exactly what he preached in his book:

He says he was aware he would fall into one or more data pitfalls while writing the book (a data superpower).
He then realises he made the mistake of only including quotes from men in the epigraph of each chapter and decides to talk about it (getting out of the pitfall).
He then acknowledges the mistake and expands on it as part of his closing comments (taking note of the pitfall he fell into).
Considering he wrote the book, he could’ve just sneaked the eighth pitfall into the final draft as if nothing had happened. But then, who would’ve learned from it? He shares his mistake in the hopes that we can all learn from it (telling everyone what happened).

As a woman whose first job in analytics was in an all-male team, I know this is a far too common pitfall, and I’m genuinely pleased to see it acknowledged.

Should you read it?

We can’t ignore the fact that data is all around us now. There’s no escape. Avoiding Data Pitfalls by Ben Jones serves as a guide on how to deal with the increasing complexity of the world around us, whether you’re directly involved in working with analytics or just dipping your toes into the world of data.

I recommend it to everyone who has any interest in becoming more aware of how data works, regardless of seniority level, familiarity with data, or even if you believe you have no interest in working with data at all - go read it. You will learn a thing or two, I am sure of it. It is an outstanding resource to have nearby.

It is not a technical book focused on any particular software, but it will warn you of the many dangers luring you into a data pitfall - such as using VLOOKUP when you have duplicates and mismatched keys, which is precisely what I needed those many years ago.

Always check your local library first to see if any of the books I recommend are available. If they’re not, consider donating a copy!

Get a copy at your local library | Amazon

If you subscribe to my monthly Newsletter, you’ll get a summary of all recommendations, plus more of my data viz musings.

You can also follow Data Rocks on LinkedIn