Exploring local revenue through edbuildr and the tidyverse (60 min)
Break (10 min)
Visualizing local revenue data with ggplot2 (30 min)
Homework assignment (5 min)
Homework Review
Solving a common problem with the homework
Great job, overall! One universal comment is that we want to avoid hard coding values.
# Several people took an extra step and hard coded in the valuemedian(sd_data$MHI, na.rm = T)mhi_median <-57199# You can streamline the code like this: mhi_median <-median(sd_data$MHI, na.rm = T)# We avoid hard coding variables because if you update the data set and the 'MHI' # variable updates, then 'mhi_median' will automatically update. # find the median value of median property value (MPV) and store the result in a variable called `mpv_median`mpv_median <-median(sd_data$MPV, na.rm = T)# find the median value of state revenue per pupil (SRPP) and store the result in a variable called `srpp_median`srpp_median <-median(sd_data$SRPP, na.rm = T)# find the median value of local revenue per pupil (LRPP) and store the result in a variable called `lrpp_median`lrpp_median <-median(sd_data$LRPP, na.rm = T)
How do local taxes affect school finance equity?
How Property Taxes Fund Schools
A large share of funding for public schools (44.9%) is generated by taxes at the local level. Property taxes are the most common source of local tax revenue for public schools in large part because they provide a stable revenue stream compared to taxes on sales or income. These taxes can be set by counties, towns, or school boards, depending on state policy.
Property taxes are typically collected based on two variables:
Assessed value of property
Property tax rate, sometimes called the “mill” rate.
How do property taxes produce education finance inequities?
School districts generate revenue from the local property tax associated with property located within districts’ geographic boundaries. When school districts have significantly different levels of property wealth producing those tax revenues compared to one another, a massive barrier to equitable educational finance results.
Defining key property tax terms
What does “assessed value of property” mean?
The assessed value of a property is the dollar value used by a local government to determine how much a particular property is worth. The assessed value of a property may not always match the market value. Some local governments use an assessment rate to translate the market value of a property into taxable value.
What is the “property tax rate” or “mill rate?
The mill rate is the number of dollars per $1,000 in assessed value that property owners will owe each year. The mill rate multiplied by assessed value equals local property tax bills.
In some instances, state policy directly shapes or limits property taxes. Some states include caps on how much mill rates can grow each year; other states limit the growth in the assessed values of properties over time.
Making Local School Funding More Equitable
The way school district boundaries are drawn and the way school funding formulas account for differences in tax capacity among districts can reflect, amplify, or mitigate property wealth inequality. State policymakers can also implement policies that directly address local funding inequity by constraining runaway local taxation by wealthier districts, compensating for local revenue differences with state funds, and/or redistributing some of those funds. Some examples include:
Capping local revenue: State policymakers can set a limit on the amount of per-pupil revenue local communities can generate for their public schools.
Supplementing local tax revenues in lower-wealth districts: States can provide additional funding to lower-wealth districts who choose to set a higher tax rate to ensure they get the same revenue for their tax effort that wealthier districts get.
Implementing redistributive mechanisms for exceeding local revenue caps: If communities wish to generate more revenue for their local school system above the cap, policymakers can set a rule that for every additional dollar generated above the cap, another dollar must be allocated to a state fund for schools in lower-wealth communities.
Enacting a statewide property tax: Moving from many locally collected taxes to a single, statewide property tax is the most direct way to address property wealth inequities among school districts. Vermont is the only state to implement a state-level property tax system for financing education.
Does your state set a local tax floor or ceiling?
State
Description
Alabama
Alabama sets a floor for local property tax rates, as well as a ceiling, above which voter approval is required.
Delaware
None.
Georgia
Georgia sets a floor for local property tax rates, as well as a level above which voter approval is required.
Michigan
Michigan sets a ceiling for local property tax rates.
Mississippi
Mississippi sets a floor and a ceiling for local property tax rates.
New York
None.
Pennsylvania
None.
Questions for advocates
How do property wealth per pupil and mill rates vary by school district in your state?
Do school districts in your state have non-property tax local revenue sources?
Does state policy work to compensate for differences in local revenue capacity among districts? If so, how?
What policies might help improve the equity of local funding in your state?
Introduction to the tidyverse
R is a functional programming language - most of what you will do in R is work with functions
A function:
Takes a specified input
Performs an operation
Returns an output
Functions are helpful tools to reduce repetition in typing and improves your code’s consistency, reliability and readability.
The tidyverse is a powerful collection of R packages that work well together
The most popular packages in the R community are past of what is called the “tidyverse,” which includes packages like ggplot2, tidyr, stringr, tibble, and purrr.
Tidyverse packages are built to work together. Every tidyverse package contains functions that can manipulate or visualize data that lives in data frames.
Most functions in the tidyverse require a data frame (R’s version of a table) as the first argument in each function. The functions can be “chained” together with other functions.
We will explore the tidyverse by looking at EdBuild’s FY2019 education data compiled from the F33 survey, SAIPE, EDGE, and CCD. The edbuildr package provides access to clean district data on funding, student demographics, and wealth.
To read EdBuild’s finance data into our working environment we will use the masterpull() function
This code will load the edbuildr package and then download a clean dataframe from EdBuild’s server and store it as a variable called edbuild_fin_fy19.
There are three options for the data_type argument:
“geo”: only includes districts with geographic boundaries (no charter/special school districts) that meet EdBuild’s criteria for fiscal analysis
“fin”: includes all districts that meet EdBuild’s criteria for fiscal analysis
“full”: includes all districts – TO BE USED WITH CAUTION
The dplyr() package provides many functions to manipulate your data frames’ columns and rows
The functions you’ll most frequently use from the dplyr packages are:
select(): names columns to keep from a data frame
rename(): name columns to keep from a data frame
filter(): remove rows that do not meet the condition in the logical statement from the output
mutate(): create a new column that will be added to the end of your data frame.
A special operator called a “pipe” will allow you to chain several functions together
In R, the pipe operator is a vertical bar and a greater-than sign: |>
Instead of “nesting” the results of one function inside of another, the pipe allows you to execute the same command in a more human-readable order.
# example of "nesting" the result of one function inside another functiondf1 <-select(filter(raw_df, enroll >10000), dist_id, dist_name, enroll, rev_pp)# example of using the "pipe" to produce the same result as above# read the pipe as "and then" to understand what's being done to your datadf2 <- raw_df |># start with raw_df AND THENfilter(enroll >10000) |># filter for enrollment > 10,000 AND THENselect(dist_id, dist_name, enroll, rev_pp) # select only these columns
Iterative exploratory analysis
LIVE CODING EXAMPLE: exploring_the_tidyverse.R
# load ----------options(scipen =999)library(tidyverse)library(edbuildr)# Krista likes to denote which spreadsheet is raw for transparency dist_fy19_raw <-masterpull(data_type ="geo")# filter Minnesota data and clean -----# filter, rename, and mutate data for Minnesota mn_ed_data <- dist_fy19_raw |>filter(State =="Minnesota") |>rename(district = NAME,county = County,enroll = ENROLL, total_local_rev = LR,total_state_rev = SR,total_state_local_rev = SLR, urbanicity = dUrbanicity,operational_schools = dOperational_schools, district_type = dType, white_enroll = dWhite, sped_enroll = dIEP, ell_enroll = dLEP, econ_dis_enroll = StPov,bipoc_pct = pctNonwhite, pov_pct = StPovRate,median_house_income = MHI, median_prop_value = MPV) |>mutate(bipoc_enroll = enroll - white_enroll,ell_pct = ell_enroll/enroll, sped_pct = sped_enroll/enroll, local_rev_pp = total_local_rev/enroll,state_rev_pp = total_state_rev/enroll,local_state_rev_pp = total_state_local_rev/enroll) |>select(district, county, enroll, local_rev_pp, state_rev_pp, local_state_rev_pp, total_local_rev, total_state_rev, total_state_local_rev, urbanicity, operational_schools, district_type, pov_pct, bipoc_pct, ell_pct, sped_pct)
Visual exploratory analysis
Summarizing columns can be helpful, but it can hide nuance in your data that can be better seen via plotting
The ggplot2 package is the most widely-used data visualization approach in the R ecosystem.
Plots in ggplot2 are created by starting with your data, then building up layers
Once your specify your data, you can add layers of “geoms” to create your plot
Today we will use geom_histogram() and geom_point()
Iterating your way to beauty with ggplot2 using Minnesota data
Step 1: Create a basic plot
# first minnesota plot ggplot(mn_ed_data, aes(x = pov_pct, y = local_rev_pp)) +geom_point()
The missing data makes sense. All eight of the rows don’t have any available data.
Step 3: Clean up formatting of chart elements (1/5)
Now that we understand our missing data, we can create an updated dataframe and re-plot our newly cleaned data.
# create tidy dfmn_ed_clean <- mn_ed_data |>filter(!is.na(local_rev_pp), !is.na(pov_pct))view(mn_ed_clean)# Drop the 2 districts that have no enrollmentmn_ed_clean <- mn_ed_clean |>filter(enroll >0)# first minnesota plot w/ clean dataggplot(mn_ed_clean, aes(x = pov_pct, y = local_rev_pp)) +geom_point()
Step 3: Clean up formatting of chart elements (1/5)
Step 3: Clean up formatting of chart elements (2/5)
We see some overlap in the points. Reducing the opacity of the points can be accomplished by setting the alpha parameter in geom_point() to a value less than 1. Setting it to .5 will make data points 50% translucent.
# reduce opacity of pointsggplot(mn_ed_clean, aes(x = pov_pct, y = local_rev_pp)) +geom_point(alpha = .5)
Step 3: Clean up formatting of chart elements (3/5)
Let’s take care of some formatting issues.
Our axes don’t look great - the decimals ought to be percentages and the vertical axis represents dollars. Here, the scales package provides some help.
# format axes library(scales)# format the x and y axesggplot(mn_ed_clean, aes(x = pov_pct, y = local_rev_pp)) +geom_point(alpha = .5) +# make sure you have the `scales` package loaded!scale_x_continuous(labels =label_percent()) +scale_y_continuous(labels =label_dollar())
Step 3: Clean up formatting of chart elements (3/5)
Step 3: Clean up formatting of chart elements (4/5)
Next, we should add some labels to our axes that make sense, along with a title for our plot and a caption that details our data sources.
# add data labels ggplot(mn_ed_clean, aes(x = pov_pct, y = local_rev_pp)) +geom_point(alpha = .5) +# make sure you have the `scales` package loaded!scale_x_continuous(labels =label_percent()) +scale_y_continuous(labels =label_dollar()) +labs(x ="Student Poverty Rate", y ="Local Per-Pupil Revenue",title ="Local Per-Pupil Revenue by Student Povert Rate in Minnesota School Districts",caption ="Source: Edbuild Data, 2019")
Step 3: Clean up formatting of chart elements (4/5)
Step 3: Clean up formatting of chart elements (5/5)
Themes can be used to change the appearance of elements in your plot. There are many stock options, but I prefer theme_bw() for its clean appearance and helpful and unobtrusive grid lines.
# change themeggplot(mn_ed_clean, aes(x = pov_pct, y = local_rev_pp)) +geom_point(alpha = .5) +scale_x_continuous(labels =label_percent()) +scale_y_continuous(labels =label_dollar()) +labs(x ="Student Poverty Rate", y ="Local Per-Pupil Revenue",title ="Local Per-Pupil Revenue by Student Povert Rate in Minnesota School Districts",caption ="Source: Edbuild Data, 2019") +theme_bw()
Step 3: Clean up formatting of chart elements (5/5)
Break
Step 4: Add a new layer of data
Now that we have a decent-looking graph, let’s add in a new data element to vary point size by enrollment.
# add size element ggplot(mn_ed_clean, aes(x = pov_pct, y = local_rev_pp, size = enroll)) +geom_point(alpha = .5) +scale_x_continuous(labels =label_percent()) +scale_y_continuous(labels =label_dollar()) +labs(x ="Student Poverty Rate", y ="Local Per-Pupil Revenue",title ="Local Per-Pupil Revenue by Student Povert Rate in Minnesota School Districts",caption ="Source: Edbuild Data, 2019") +theme_bw()
Step 4: Add a new layer of data
Step 5: Tidy up formatting (1/2)
Adding a new variable for size creates a legend. We need to tidy the legend’s labels and the title.
# clean up the legendggplot(mn_ed_clean, aes(x = pov_pct, y = local_rev_pp, size = enroll)) +geom_point(alpha = .5) +scale_x_continuous(labels =label_percent()) +scale_y_continuous(labels =label_dollar()) +# change legend label formattingscale_size(labels = comma) +labs(x ="Student Poverty Rate", y ="Local Per-Pupil Revenue",title ="Local Per-Pupil Revenue by Student Povert Rate in Minnesota School Districts",caption ="Source: Edbuild Data, 2019",# add nice label for size elementsize ="Enrollment") +theme_bw()
Step 5: Tidy up formatting (1/2)
Step 5: Tidy up formatting (2/2)
We can also adjust some paramenters to allow for more visual contrast in size. By default, ggplot2 will adjust points’ radii based on the size variable. Using area is a more visually honest way to represent the data, so let’s make that change.
# create more contrast in sizeggplot(mn_ed_clean, aes(x = pov_pct, y = local_rev_pp, size = enroll)) +geom_point(alpha = .5) +scale_x_continuous(labels =label_percent()) +scale_y_continuous(labels =label_dollar()) +# change size scaling to vary by area, not radius + change max sizescale_size_area(labels =label_comma(), max_size =10) +labs(x ="Student Poverty Rate", y ="Local Per-Pupil Revenue",title ="Local Per-Pupil Revenue by Student Povert Rate in Minnesota School Districts",caption ="Source: Edbuild Data, 2019",# add nice label for size elementsize ="Enrollment") +theme_bw()
Step 5: Tidy up formatting (2/2)
Step 6: Repeat steps 4-5 as needed (1/5)
Adding color can be helpful. Let’s add color based on urbanicity.
# add in color based on urbanicityggplot(mn_ed_clean, aes(x = pov_pct, y = local_rev_pp, size = enroll,color = urbanicity)) +geom_point(alpha = .5) +scale_x_continuous(labels =percent_format(accuracy =1)) +scale_size_area(labels = comma, max_size =10) +labs(x ="Student Poverty Rate", y ="Local Per-Pupil Revenue",title ="Local Per-Pupil Revenue by Student Povert Rate in Minnesota School Districts",caption ="Source: Edbuild Data, 2019",size ="Enrollment") +theme_bw()
Step 6: Repeat steps 4-5 as needed (1/5)
Step 6: Repeat steps 4-5 as needed (2/5)
That doesn’t look great. Let’s use some more functions from the tidyverse to clean up the data a little more to reduce 11 urbanicity levels to four, then tidy up the legend label.
# clean up the unicorn vomitmn_ed_clean <- mn_ed_clean |>mutate(urbanicity =fct_collapse(as.factor(urbanicity),City =c("11-City: Large", "12-City: Mid-size","13-City: Small"),Suburb =c("21-Suburb: Large","22-Suburb: Mid-size","23-Suburb: Small"),Town =c("31-Town: Fringe","32-Town: Distant","33-Town: Remote"),Rural =c("41-Rural: Fringe","42-Rural: Distant","43-Rural: Remote")))# better colors and tidy legend labelggplot(mn_ed_clean, aes(x = pov_pct, y = local_rev_pp, size = enroll,color = urbanicity)) +geom_point(alpha = .5) +scale_x_continuous(labels =label_percent()) +scale_y_continuous(labels =label_dollar()) +scale_size_area(labels =label_comma(), max_size =10) +labs(x ="Student Poverty Rate", y ="Local Per-Pupil Revenue",title ="Local Per-Pupil Revenue by Student Povert Rate in Minnesota School Districts",caption ="Source: Edbuild Data, 2019",size ="Enrollment") +theme_bw()
Step 6: Repeat steps 4-5 as needed (2/5)
Step 6: Repeat steps 4-5 as needed (3/5)
We can and should adjust the colors used. R recognizes some pretty funky color names, which can be found in this helpful cheat sheet.
# adjust colors manuallyggplot(mn_ed_clean, aes(x = pov_pct, y = local_rev_pp, size = enroll,color = urbanicity)) +geom_point(alpha = .5) +# create manual color palette# color names pulled from a pdf y'all should bookmark# http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdfscale_color_manual(values =c("tomato3", "steelblue2","seagreen3", "orchid1")) +scale_x_continuous(labels =label_percent()) +scale_y_continuous(labels =label_dollar()) +scale_size_area(labels =label_comma(), max_size =10) +labs(x ="Student Poverty Rate", y ="Local Per-Pupil Revenue",title ="Local Per-Pupil Revenue by Student Povert Rate in Minnesota School Districts",caption ="Source: Edbuild Data, 2019",size ="Enrollment") +theme_bw()
Step 6: Repeat steps 4-5 as needed (3/5)
Step 6: Repeat steps 4-5 as needed (4/5)
We should strive to make our analyses as accessible as possible. The viridis package includes some color palettes that are friendly for folks with color blindness, which affects 5-10 percent of the US population.
# use colors better for visual impairmentslibrary(viridis)# adjust colors manuallyggplot(mn_ed_clean, aes(x = pov_pct, y = local_rev_pp, size = enroll,color = urbanicity)) +geom_point(alpha = .5) +# use a colorblind-friendly palettescale_color_viridis_d() +scale_x_continuous(labels =label_percent()) +scale_y_continuous(labels =label_dollar()) +scale_size_area(labels =label_comma(), max_size =10) +labs(x ="Student Poverty Rate", y ="Local Per-Pupil Revenue",title ="Local Per-Pupil Revenue by Student Povert Rate in Minnesota School Districts",caption ="Source: Edbuild Data, 2019",size ="Enrollment") +theme_bw()
Step 6: Repeat steps 4-5 as needed (4/5)
Step 6: Repeat steps 4-5 as needed (5/5)
Let’s adjust the range of colors used to exclude that hard-to-see yellow.
# that yellow is hard to see - let's adjust the rangeggplot(mn_ed_clean, aes(x = pov_pct, y = local_rev_pp, size = enroll,color = urbanicity)) +geom_point(alpha = .5) +# adjust color rangescale_color_viridis_d(end = .8) +scale_x_continuous(labels =label_percent()) +scale_y_continuous(labels =label_dollar()) +scale_size_area(labels =label_comma(), max_size =10) +labs(x ="Student Poverty Rate", y ="Local Per-Pupil Revenue",title ="Local Per-Pupil Revenue by Student Povert Rate in Minnesota School Districts",caption ="Source: Edbuild Data, 2019",size ="Enrollment") +theme_bw()
Step 6: Repeat steps 4-5 as needed (5/5)
Step 7: Replace a layer of data
Sometimes color isn’t the answer - let’s try facets instead.
# use facets instead of colorggplot(mn_ed_clean, aes(x = pov_pct, y = local_rev_pp, size = enroll)) +geom_point(alpha = .5) +# add facet by urbanicityfacet_wrap(~urbanicity) +scale_x_continuous(labels =label_percent()) +scale_y_continuous(labels =label_dollar()) +scale_size_area(labels =label_comma(), max_size =10) +labs(x ="Student Poverty Rate", y ="Local Per-Pupil Revenue",title ="Local Per-Pupil Revenue by Student Povert Rate in Minnesota School Districts",caption ="Source: Edbuild Data, 2019",size ="Enrollment") +theme_bw()
Step 7: Replace a layer of data
Using histograms to compare state and local revenue
Use Edbuildr data to create state summaries
Summarize the state level data and calculate the median for enrollment, number of schools, median household income, median property value, and more.