United States’ Income Inequality

2 minute read

United States' Income Inequality

Follow along on Google Colab.

Open In Colab

License

The license for the code and the visual: CC BY-SA 4.0.

Introduction

This was inspired by the Switerland’s regional income (in-equality) thematic map by Timo Grossenbacher and Angelo Zehr. Grossenbacher and Zehr’s map was generated using ggplot2 and sf from R, and uses geographical features to enhance the visual. However, this map of the United States’ Income Inequality was generated using plotly on Python, and touched up with Adobe Illustrator. Furthermore, this does not consider geographical features, but it would be interesting to implement if the data are available.

This was originally posted on r/dataisbeautiful.

Import packages

# Data
from urllib.request import urlopen
import json
import pandas as pd

# Plotly
import plotly.express as px

Data

OUTPUT_PATH = Path.cwd().joinpath('output').resolve()

# create output path if it does not exist:
OUTPUT_PATH.mkdir(parents=True, exist_ok=True)
with urlopen('https://raw.githubusercontent.com/plotly/datasets/master/geojson-counties-fips.json') as response:
    counties = json.load(response)

The geojson file containing the geometries for the counties was posted by plotly on their Github.

df_gini = pd.read_csv(r'https://raw.githubusercontent.com/lokdoesdata/us-income-inequality/main/data/gini.csv')
df_income = pd.read_csv(r'https://raw.githubusercontent.com/lokdoesdata/us-income-inequality/main/data/income.csv')

df = df_gini.merge(df_income[['GEO_ID', 'Mean Income']], on = 'GEO_ID')
df['fips'] = df['GEO_ID'].str[-5:]

The data used were the Gini index of income inequality and the mean income by county. They were both obtained from the United States’ Census 2018 ACS 5-Year Estimates.

Calculation

gini_1 = df['Gini'].quantile(1/3)
gini_2 = df['Gini'].quantile(2/3)

df['gini quartile'] = df['Gini'].apply(
    lambda x: 3 if x >= gini_2 else 2 if x >= gini_1 else 1)

income_1 = df['Mean Income'].quantile(1/3)
income_2 = df['Mean Income'].quantile(2/3)

df['income quartile'] = df['Mean Income'].apply(
    lambda x: 3 if x >= income_2 else 2 if x >= income_1 else 1)

df['color scale'] = df['gini quartile'].astype(str) + ' - ' + df['income quartile'].astype(str)

Both Gini index and Mean income were divided into thirds based on the percentile. Combining the two creates a three-by-three bivariate color scale.

Visual

color_scale = {
    '3 - 3': '#3F2949', # high inequality, high income
    '2 - 3': '#435786',
    '1 - 3': '#4885C1', # low inequality, high income
    '3 - 2': '#77324C',
    '2 - 2': '#806A8A', # medium inequality, medium income
    '1 - 2': '#89A1C8',
    '3 - 1': '#AE3A4E', # high inequality, low income
    '2 - 1': '#BC7C8F',
    '1 - 1': '#CABED0', # low inequality, low income
}

fig = px.choropleth(df, 
                    geojson=counties, 
                    locations='fips', 
                    color='color scale',
                    color_discrete_map=color_scale,
                    scope='usa'
                   )

fig.update_layout(margin={'r':0, 't':0, 'l':0, 'b':0})
raw
Untouched, low quality, static export of United States’ Income Inequality (for demonstration only)

The visual was exported into a PDF and additional enhancements were done on Adobe Illustrator.