Data Disasters (work in progress)

Abstract

A real-world companion to statistics 101 exploring disasters in data management, analytical reasoning, and workflow through counterexample and data simulation.

Publication: Online Only

This book explores common patterns for eleven types of data disasters encountered by entry-level data analysts, including errors related to…

Data

  • Data Dalliances: Misinterpreting or misusing data based on how it was collected or what it represents
  • Computational Quandaries: Letting computers do what you said and not what you meant

Analysis

  • Egregious Aggregations: Losing critical information when data is condensed
  • Vexing Visualization: Confusing ourselves or others with plotting choices
  • Incredible Inferences: Drawing incorrect conclusions from analytical results
  • Cavalier Causality: Falling prey to spurious correlations masquerading as causality
  • Mindless Modeling: Failing to get the most value out of models by not tailoring their features, targets, and performance metrics to the problem
  • Alternative Algorithms: Lacking an understanding of alternative methods that may be better suited to the problem at hand

Workflow

  • Futile Findings: Asking and answering questions that aren’t useful
  • Complexifying Code: Making projects unwieldy or more difficult to understand than necessary
  • Rejecting Reproducibility: Working inefficiently instead of adopting an efficient, reproducible, and sharable workflow
