Posts

Launching the Data Disasters project

From academic training to real world data analysis

Workflows for querying databases via R

Simple, self-contained, reproducible examples are a common part of good software documentation. However, in the spirit of brevity, these examples often do not demonstrate the most sustainable or flexible workflows for integrating software tools into large projects.

Understanding the data (error) generating processes for data validation

A data consumer’s guide to validating data based on the failure modes data producers try to avoid

A Tale of Six States: Flexible data extraction with scraping and browser automation

Exploring how Playwright's headless browser automation (and its friends) can help unite the states’ data

Embedding column-name contracts in data pipelines with dbt

dbt supercharges SQL with Jinja templating, macros, and testing – all of which can be customized to enforce controlled vocabularies and their implied contracts on a data model

Causal design patterns for data analysts

An informal primer on causal analysis designs and data structures

Resource Round-Up: Causal Inference

Free books, lectures, blogs, papers, and more for a causal inference crash course

Building a team of internal R packages

On the jobs-to-be-done and design principles for internal tools

Generating SQL with {dbplyr} and sqlfluff

Using the tidyverse’s expressive data wrangling vocabulary as a preprocessor for elegant SQL scripts. (Image source: techdaily.ca)

Introducing the {convo} package

An R package for maintaining controlled vocabularies to encode contracts between data producers and consumers