When Congress passed the Coronavirus Aid, Relief, and Economic Security Act in March, it included $4 billion in homeless aid through the Emergency Solutions Grant program run by the Department of Housing and Urban Development. Since 1987, HUD had calculated the amount of homeless aid communities received based on a formula that was unrelated to homelessness. Because the pandemic grant program was so large, Congress told HUD to recalculate its formula to reflect an area’s homelessness. That “new” formula was used to allocate the second round of ESG funding, nearly $3 billion, in June 2020 and dramatically reshaped where dollars went nationwide.
I wanted to discover how much money each city, county and state recipient would have gotten to care for their homeless residents over the last decade had HUD used this better-targeted formula starting in 2011 instead of 2020.
The new formula considers four variables for each community: its total homeless population, its unsheltered homeless population, its number of very low-income renters and its number of very low-income renters at risk of unsheltered homelessness (those in overcrowded housing or housing lacking a full kitchen or plumbing).
The numbers of total homeless people and unsheltered homeless people come from the previous year’s Point-in-Time, or PIT, counts, conducted by local HUD-approved bodies that oversee homeless services for a given area. Because the geographic areas covered by these PIT counts often differ from the geographic areas that HUD used to allocate ESG funding, the department had to adjust for those boundary differences to calculate payouts under the new formula. HUD overlaid the ESG recipient boundaries with the homeless census’ geographic areas, creating a third, intermediate geography that enabled the department to assign homeless counts to the ESG funding geography in proportion to the ESG funding those intermediate areas received.
To see how much money communities would have received if HUD had used the homeless-centered CARES Act formula yearly, the Howard Center followed the department’s methodology for the pandemic payouts and applied it to the years 2011 to 2019. It did that with the help of a third HUD formula, which is based on the same Community Development Block Grant calculation that HUD uses in the ESG program to determine the amount of an area’s homeless assistance. Using this third formula, which HUD makes public, allowed the Howard Center to create a “crosswalk” for each of the past 10 years to apply actual homeless counts to the existing ESG grant recipient areas.
This process produced a nearly identical crosswalk to the one HUD used in calculating its 2020 pandemic payouts. The Howard Center checked its use of the third formula in assigning homeless counts to grant areas for 2020 with HUD’s actual crosswalk, which the department provided, and the average difference was just 0.006%.
Studying Alabama helps visualize this process. Homeless counts are seen in the blue map for the eight Continuums of Care, HUD-approved entities that administer homeless services. But HUD disburses homeless grants in its ESG program based on the five areas shown on the red map. The crosswalk necessary to calculate the 2020 CARES Act payout created a unique combination of the two geographies, shown in the yellow map.
For example, the Birmingham metro area counted 981 homeless people in 2019. The crosswalk allocates 57% of that total homeless count, or 561 people, to the Birmingham city ESG payment area; 23%, or 224 people, to the Jefferson County ESG payment area; and the remaining 22%, or 196 people, to the Alabama Balance of State ESG payment area within the Birmingham CoC. These calculations are necessary only in places where the CoC and ESG boundaries are in conflict. For example, the 390 homeless people in the Florence/Northwest Alabama CoC fall entirely within the ESG boundary for the Alabama Balance of State — a designation used for largely rural areas.
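The Birmingham crosswalk arithmetic can be sketched in Python (the share fractions below are back-derived from the counts in the text, not taken from HUD’s published crosswalk):

```python
# Apportion one CoC's homeless count across the ESG payment areas it overlaps.
# Shares are approximate, back-derived from the Birmingham example above.
birmingham_coc_total = 981

shares = {
    "Birmingham city ESG area": 0.572,
    "Jefferson County ESG area": 0.228,
    "Alabama Balance of State ESG area": 0.200,
}

allocated = {area: round(share * birmingham_coc_total) for area, share in shares.items()}
# Rounded counts match the text: 561, 224 and 196, summing back to 981.
```

The same multiplication is what the R code later applies at scale, one row per unique CoC-ESG combination.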
The remaining two variables of the CARES Act formula, measuring very low-income renters, come from a custom tabulation of American Community Survey data that HUD receives annually from the U.S. Census Bureau. The Comprehensive Housing Affordability Strategy, or CHAS, data demonstrates the extent of housing problems and housing needs, particularly for low-income households nationwide. HUD calculates funding allocations using census data at the place-remainder level (Summary Level 070). The Howard Center used geographic information system (GIS) analysis to create custom shapefiles corresponding to that place-remainder geography. It then overlaid them with each year’s ESG map, using areal weighting to apportion the very low-income renter counts wherever an ESG boundary split a place-remainder area. Areal weighting is one of the simplest ways to interpolate demographic data across geographic boundaries that do not correspond with U.S. Census Bureau boundaries. It does not provide perfect results because it assumes demography is distributed evenly across a geographic region. However, it needed to be used on only about 4% of the country in 2020 and, on average, less than 4% over the last decade.
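The areal-weighting step can be sketched with made-up numbers (the renter count, area names and fractions below are hypothetical, not from the analysis):

```python
# Areal weighting: apportion a demographic count in proportion to area overlap.
# All values here are hypothetical.
vli_renters = 1000  # very low-income renters counted in one place-remainder polygon

# Fraction of the polygon's land area falling inside each ESG funding area
area_props = {"ESG area A": 0.25, "ESG area B": 0.75}

weighted = {esg: vli_renters * prop for esg, prop in area_props.items()}
# The apportioned counts sum back to the original total.
```

This is also why the method is imperfect: it assumes the 1,000 renters are spread evenly across the polygon.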
Calculate ESG-CV2 allocations using the crosswalk constructed in 2020_01_simple_shares and data gathered in QGIS.
This code shows how I import the 2012-2016 ACS population data and combine the component parts making up two 2012-2016 CHAS variables used in the ESG-CV2 formula.
HUD combined 14 CHAS variables at the 070 level to calculate an area’s number of very low-income renters at risk for homelessness, and six of those variables to calculate its number of very low-income renters at risk for unsheltered homelessness; both combinations appear in the code below.
#Load the packages used throughout this analysis
library(tidyverse)
library(readxl)
library(janitor)

#Import 2012-2016 ACS population data downloaded from SocialExplorer.org
pop <- read_csv("data/inputs/070-pop-2016.csv") %>%
rename (name = Geo_NAME,
pop = ACS16_5yr_B01003001) %>%
dplyr::select (Geo_GEOID, name, pop)
#Import CHAS variables confirmed by HUD
chas <- read_csv ("data/inputs/2012thru2016-070-csv/Table3.csv") %>%
clean_names() %>%
dplyr::select (geoid, t3_est47, t3_est48, t3_est53, t3_est54, t3_est59, t3_est60, t3_est65, t3_est66, t3_est71, t3_est72, t3_est77, t3_est78, t3_est83, t3_est84) %>%
mutate (
vli_renters = (t3_est47 + t3_est48 + t3_est53 + t3_est54 + t3_est59 + t3_est60 + t3_est65 + t3_est66 + t3_est71 + t3_est72 + t3_est77 + t3_est78 + t3_est83 + t3_est84),
vli_crowd = (t3_est47 + t3_est48 + t3_est53 + t3_est54 + t3_est59 + t3_est60)) %>%
dplyr::select (geoid, vli_renters, vli_crowd) %>%
rename (Geo_GEOID = geoid)
#Join together
sl070_2020 <- pop %>%
full_join(chas, by="Geo_GEOID")
#Export to add into shapefile in QGIS
##write.csv (sl070_2020, "data/FY2020/sl070_2020.csv")
See the Google Sheet here for more details on the separate analysis I did for each year 2011 through 2020 in QGIS. In a nutshell, overlaying the 2016 TIGER/Line place and county subdivision shapefiles produces a shapefile that matches the places and place remainders of summary level 070.
After joining together the population and CHAS data above, I joined it to the shapefile using the 22-digit unique identifier Geo_GEOID. I created this field in QGIS using the following expression in the field calculator:
CASE
WHEN "STATEFP" IS NULL THEN
concat ('07000US', "STATEFP_2", "COUNTYFP", "COUSUBFP", '99999')
ELSE concat ('07000US', "STATEFP_2", "COUNTYFP", "COUSUBFP", "PLACEFP")
END
All county subdivisions in the overlaid file that don’t have a matching place when overlaid will have a null State FIPS code. These are the “Remainder of” areas, whose Geo_GEOIDs all end with “99999.” Otherwise, the Geo_GEOID is simply a concatenation of “07000US”, the state FIPS code, county FIPS code, county subdivision FIPS code and place FIPS code.
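The same identifier logic can be mirrored in Python for clarity (the function name and sample FIPS codes are illustrative; the arguments follow the shapefile fields in the expression above):

```python
# Build the 22-character Geo_GEOID for summary level 070.
# statefp comes from the place layer and is None for "Remainder of" areas.
def geo_geoid(statefp, statefp_2, countyfp, cousubfp, placefp):
    place_part = "99999" if statefp is None else placefp
    return "07000US" + statefp_2 + countyfp + cousubfp + place_part

# A "Remainder of" county subdivision (no overlapping place):
geo_geoid(None, "01", "073", "92106", None)  # ends in "99999"
```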
Import prepared data from QGIS, crosswalks and PIT count, and then clean up by removing unnecessary columns and slivers from QGIS data
#Import csv from Census summary level 070 data overlaid onto ESG boundaries in QGIS
merged_esgs_data <- read.csv("data/FY2020/final/merged_intersects_esg.csv", stringsAsFactors = FALSE)
#Importing CDBG crosswalk from HUD
crosswalk <- read_excel("data/inputs/hud_calculations/Disaggregated COC to ESG crosswalk.xlsx") %>%
clean_names() %>%
rename (UOGID = geocode,
coc_num = coc_id) %>%
dplyr::select (1,3,4,8,9)
#Importing ESG crosswalk from HUD
crosswalk_esg <- read_excel("data/inputs/hud_calculations/Aggregated COC to ESG crosswalk.xlsx") %>%
clean_names() %>%
dplyr::select (1,2,6) %>%
rename (coc_num = coc_id)
#Import 2019 PIT count to apply using HUD's crosswalk
pit_2019 <- read_excel("data/inputs/hud_calculations/2007-2019-PIT-Counts-by-CoC.xlsx",
sheet = "2019",
n_max = 397) %>%
clean_names() %>%
dplyr::select (1:2, overall_homeless_2019, unsheltered_homeless_2019) %>%
rename(coc_num = co_c_number,
coc_name = co_c_name,
coc_total_homeless = overall_homeless_2019,
coc_unsh_homeless = unsheltered_homeless_2019) %>%
mutate(coc_num = ifelse(coc_num == "MO-604a", "MO-604", coc_num))
#Change factors from shapefile into numeric to mutate later.
merged_esgs_data$sl070_pop <- as.numeric(merged_esgs_data$sl070_pop)
merged_esgs_data$sl070_vli_ <- as.numeric(merged_esgs_data$sl070_vli_)
merged_esgs_data$sl070_vl_1 <- as.numeric(merged_esgs_data$sl070_vl_1)
#Add padding to UOGID to make sure they join properly later.
#Add padding to State and County FIPS codes for joining FMRs later.
merged_esgs_data$UOGID <- str_pad(merged_esgs_data$UOGID, 6, pad = "0")
merged_esgs_data$STATEFP_2 <- str_pad(merged_esgs_data$STATEFP_2, 2, pad = "0")
merged_esgs_data$COUNTYFP <- str_pad(merged_esgs_data$COUNTYFP, 3, pad = "0")
merged_esgs_data <- merged_esgs_data %>%
# Add data for 2 Louisville districts that didn't join in QGIS
mutate (
sl070_pop = case_when (
#Louisville Central CCD
NAMELSAD_2 == "Louisville Central CCD" ~ 30297,
#Louisville West CCD
NAMELSAD_2 == "Louisville West CCD" ~ 63820,
TRUE ~ sl070_pop),
sl070_vli_ = case_when (
#Louisville Central CCD
NAMELSAD_2 == "Louisville Central CCD" ~ 8215,
#Louisville West CCD
NAMELSAD_2 == "Louisville West CCD" ~ 10870,
TRUE ~ sl070_vli_),
sl070_vl_1 = case_when (
#Louisville Central CCD
NAMELSAD_2 == "Louisville Central CCD" ~ 220,
#Louisville West CCD
NAMELSAD_2 == "Louisville West CCD" ~ 855,
TRUE ~ sl070_vl_1))
#remove unnecessary columns
merged_esgs_data <- merged_esgs_data %>%
dplyr::select (-c(fid, NAME, PLACENS, LSAD, CLASSFP, PCICBSA, PCINECTA, MTFCC, FUNCSTAT, ALAND, AWATER, INTPTLAT, INTPTLON, COUSUBNS, LSAD_2, CLASSFP_2, MTFCC_2, CNECTAFP, NECTAFP, NCTADVFP, ALAND_2, AWATER_2, INTPTLAT_2, INTPTLON_2, fid_2))
#remove QGIS overlay slivers that could create errors later
merged_esgs_data <- merged_esgs_data %>%
filter (area_prop > 0)
#Trying to get rid of edge errors, only slightly decreases final result error
# mutate (area_prop1 = case_when(
# area_prop > .95 ~ round(area_prop, digits = 1),
# area_prop < .01 ~ round(area_prop, digits = 1),
# TRUE ~ area_prop))
#merged_esgs_data <- merged_esgs_data %>%
# rename (area_prop_orig = area_prop,
# area_prop = area_prop1)
#looking for edge errors in QGIS data
merged_esgs_data %>%
filter (str_detect(sl070_name, "Union County") & STATEFP_2 == "34" & area_prop < 1) %>%
select (sl070_name, sl070_pop, area_prop, NAME_3)
merged_esgs_data %>%
filter (str_detect(sl070_name, "Portland") & STATEFP_2 == "41" & area_prop < 1) %>%
select (sl070_name, sl070_pop, area_prop, NAME_3, Geo_GEOID)
merged_esgs_data %>%
filter (str_detect(sl070_name, "Long Beach") & STATEFP_2 == "06" & area_prop < 1) %>%
select (sl070_name, sl070_pop, area_prop, NAME_3, Geo_GEOID)
merged_esgs_data %>%
filter (str_detect(sl070_name, "Franklin County") & STATEFP_2 == "39") %>%
select (sl070_name, sl070_pop, area_prop, NAME_3, Geo_GEOID)
merged_esgs_data %>%
group_by(area_prop) %>%
count()
Following this methodology available from Yale, I used areal weighting to adjust population data from the SL070 level into the overlaid ESG geography. It’s important to note that demography is not necessarily uniformly distributed across geographic entities, so this will not provide perfect results.
#How many polygons will areal weighting affect? (can only run once you've run subsequent chunks)
merged_esgs_data %>%
filter (!area_prop == 0) %>% #remove slivers caused by slight mismatches of SL070 & ESG shapefiles
filter (area_prop < 1) %>% #look for anywhere the overlaid geography isn't exactly the same as the SL070 geography
count()
6511 / 73426 #share of overlay polygons affected: about 8.9%
Multiply population and CHAS variables by their area proportion.
The area outlined in red shows a section of Daytona Beach, Florida, with its demographic data represented at the census place-remainder level. Areal weighting will distribute 18.6% of its very low-income renters into the Volusia County ESG funding area (shown in yellow) and 81.4% of its very low-income renters into the Florida non-entitlement ESG funding area (shown in blue). Non-entitlement areas receive federal homeless assistance through their state rather than directly from HUD.
fixed_esgs <- merged_esgs_data %>%
mutate (pop_wt = (sl070_pop * area_prop),
vlirent_wt = (sl070_vli_ * area_prop),
vliover_wt = (sl070_vl_1 * area_prop))
#Check places with area_prop of 1. sl070 variables should be same as pop_wt, so this command should return ~66900 places
fixed_esgs %>%
filter (sl070_pop == pop_wt) %>%
filter (area_prop == 1) %>%
dplyr::select (sl070_name, area_prop, sl070_pop, pop_wt )
Group data by ESG area (UOGID) and summarize weighted data for two CHAS variables.
esg_vli <- fixed_esgs %>%
group_by (UOGID) %>%
summarize (t_vli_renters = sum(vlirent_wt, na.rm=TRUE),
t_vli_rent_over = sum(vliover_wt, na.rm=TRUE))
For 2020, join HUD’s crosswalk to add COC match to this data. For previous years, use my calculated crosswalk created in 01_2020_shares.
Then join 2019 PIT count.
#joining crosswalk to grouped sl070 data
all_tracts_esg <- esg_vli %>%
full_join (crosswalk_esg, by = c("UOGID"="esg_id")) # HUD's crosswalk for 2020
# full_join (compare, by = c("UOGID"="esg_id")) # My crosswalk for earlier years
#join in PIT count
joined_pit <- all_tracts_esg %>%
left_join ( pit_2019, by = "coc_num")
#Any NAs? Will appear for COCs that have disbanded like AR-504
joined_pit %>%
filter (is.na (coc_total_homeless))
joined_pit %>%
filter (is.na (coc_unsh_homeless))
Apply share to PIT count to calculate the homeless count in each of the 654 unique ESG-COC areas.
joined_pit <- joined_pit %>%
mutate (share_homeless = coc_total_homeless * share_of_coc_in_this_part_of_esg_area,
share_unsh = coc_unsh_homeless * share_of_coc_in_this_part_of_esg_area)
#Checking homeless totals
#These will be slightly below actual count because they don't include counts from four insular areas (American Samoa, Guam, Northern Mariana Islands and Virgin Islands)
sum(joined_pit$share_homeless, na.rm=T)
sum(joined_pit$share_unsh, na.rm=T)
#Actual PIT count
sum(pit_2019$coc_total_homeless)
sum(pit_2019$coc_unsh_homeless)
#write.csv (joined_pit, "data/outputs/pit_joined_to_crosswalk.csv")
Group by ESG to summarize the unique COC-ESG areas into their proper ESG areas.
esg_homeless <- joined_pit %>%
group_by (UOGID) %>%
summarize (homeless = round(sum(share_homeless, na.rm=T)),
unsh = round(sum(share_unsh, na.rm=T)))
#Checking homeless totals
#These will be slightly below actual count because they don't include four insular areas
sum(esg_homeless$homeless)
sum(esg_homeless$unsh)
#Actual PIT count
sum(pit_2019$coc_total_homeless)
sum(pit_2019$coc_unsh_homeless)
#Any COCs in PIT count that aren't showing up? Need to add manually in 01_simple_shares (other than insular areas).
anti_join (pit_2019, joined_pit, by = "coc_num")
HUD adjusted each of the CARES Act formula variables to account for places with high housing costs. It calculated a weighted national average rent for a one-bedroom apartment, compared it with each location’s fair market rent, and gave communities proportionately weighted increases in the formula. For example, a city whose fair market rent was 10% above the national average would have its total homeless count increased by 10%. Non-entitlement areas were not adjusted, regardless of housing costs, and increases were capped at 20% to lessen the impact of high-cost outliers.
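The adjustment rule can be sketched in Python with hypothetical rents (the function name is illustrative; the actual computation appears in the R code further below):

```python
# Fair-market-rent adjustment: entitlement areas are scaled up by
# (local FMR / national weighted-average FMR), floored at 1.0 (no downward
# adjustment) and capped at 1.2 (the 20% maximum).
def fmr_multiplier(local_fmr, national_avg_fmr, entitlement=True):
    if not entitlement:
        return 1.0  # non-entitlement areas are never adjusted
    return min(max(local_fmr / national_avg_fmr, 1.0), 1.2)

# Hypothetical rents against a $1,000 national average:
fmr_multiplier(1100, 1000)  # 10% above average: counts scaled by 1.1
fmr_multiplier(1500, 1000)  # 50% above average: capped at 1.2
fmr_multiplier(800, 1000)   # below average: left at 1.0
```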
Fair market rent data for FY2020 became available Oct. 1, 2019, at https://www.huduser.gov/portal/datasets/fmr.html?WT.mc_id=Sept192019&WT.tsrc=Email#2020
This code imports the FMR data to prepare it for weighting our variables. We’ll match it to our ESG data by county FIPS code. Unfortunately, the six New England states, whose FMR records are keyed to county subdivisions rather than counties, won’t match properly, so this code pulls out Connecticut, Maine, Massachusetts, New Hampshire, Rhode Island and Vermont (conveniently, theirs are the only FMR FIPS codes that do not end with “99999”) and calculates the average of each FMR area’s rents to use instead.
The other states all just need to have the “99999” removed from their FIPS code to match up with the rest of our data.
#Import 2020 FMR data
fmr <- read_excel("data/inputs/fmr/FY20_4050_FMRs_rev.xlsx") %>%
dplyr::select (fips2010, fmr_1, areaname) %>%
rename (fmr1 = fmr_1)
#Calculate FMR for 6 New England states without counties to match by GEOID_2
new_england_counties <- fmr %>%
filter(!(str_detect(fips2010, "99999"))) %>%
#mutate (fips = substr(fips2010, 1, 5)) %>%
rename (GEOID_2 = fips2010) %>%
dplyr::select (GEOID_2, areaname, fmr1) %>%
# distinct(areaname, .keep_all = TRUE) %>%
group_by(areaname) %>%
mutate (fmr1 = mean(fmr1))#%>%
#rename (county_fips = fips)
#Prepare other states' data to be joined by county FIPS code by removing 9s
final_fmr <- fmr %>%
filter((str_detect(fips2010, "99999"))) %>%
mutate (fips = substr(fips2010, 1, 5)) %>%
rename(county_fips = fips) %>%
dplyr::select (county_fips, areaname, fmr1)
Create a county_fips code in our gathered data to pull in FMRs for non-New England states. The GEOID_2 column in this data already matches the FMR area code. This code joins the two dataframes into our full ESG dataframe.
#create county fips code in QGIS sl_070 data for matching FMRs
fixed_esgs <- fixed_esgs %>%
mutate (county_fips = paste0(STATEFP_2, COUNTYFP))
#pad GEOID_2 to ensure proper matching
fixed_esgs$GEOID_2 <- str_pad (fixed_esgs$GEOID_2, 10, pad ="0")
#join dataframes
esg_homeless_fmr <- fixed_esgs %>%
full_join (final_fmr, by="county_fips") %>%
full_join (new_england_counties, by="GEOID_2") %>%
#New England's FMRs will be in fmr1.y
#All other states' FMRs will be in fmr1.x.
#This next line combines them into one column.
mutate ( fmr1 = if_else (is.na(fmr1.y), fmr1.x, fmr1.y))
#Check for any missed FMRs. Should only be 6 areas over water ("County subdivisions not defined")
esg_homeless_fmr %>%
filter (is.na(fmr1))
#Check for FMRs that haven't matched to the ESG data. Should be Samoa, Guam, Mariana Islands and Virgin Islands. Also a St. Louis area county that doesn't actually exist (29056)
esg_homeless_fmr %>%
filter (is.na(UOGID))
This code groups by ESG area to calculate a mean FMR for each, weighted by the population of each small unit of area; a simple unweighted mean (commented out below) resulted in a bigger percent difference from HUD’s calculations at the end.
This also removes the rows without a matching ESG area found above, to prevent errors later.
#esg_homeless_fmr <- esg_homeless_fmr %>%
# group_by (UOGID) %>%
# summarize (esg_fmr = mean (fmr1, na.rm = T)) %>%
# esg_fmr_wt = weighted.mean(fmr1, pop_wt, na.rm=T))
# filter (!is.na(UOGID)) #breaks weighted mean otherwise
esg_homeless_fmr <- esg_homeless_fmr %>%
group_by (UOGID) %>%
summarize (#esg_fmr = mean (fmr1, na.rm = T),
esg_fmr = weighted.mean(fmr1, pop_wt, na.rm=T)) %>%
filter (!is.na(UOGID)) #breaks weighted mean otherwise
#Check for any missed. Should be 0.
esg_homeless_fmr %>%
filter (is.na(esg_fmr))
#esg_homeless_fmr %>%
# filter (is.na(esg_fmr_wt))
#esg_homeless_fmr %>%
# filter (esg_fmr != esg_fmr_wt) %>%
# mutate (diff = esg_fmr_wt - esg_fmr) %>%
# arrange (desc(diff))
It should result in a dataframe of 358 places: the 362 communities that got ESG funding in 2020, minus the four insular areas.
unadj_data2 <- full_join (esg_homeless, esg_vli, by="UOGID") %>%
full_join (esg_homeless_fmr, by ="UOGID")
Here’s where the weighting of variables by fair market rent comes into play. HUD calculated the national average of one-bedroom fair market rent (weighted on each of the four variables from the CARES Act formula) and then increased a community’s value by that percentage.
For example, a city whose fair market rent was 10% above the national average when weighted by total homeless count would have the value of its total homeless count increased by 10% before calculating its share of the national total. Non-entitlement areas were all calculated using unadjusted counts, and no data was adjusted downward. Increases were capped at 20% to lessen the impact of high-cost outliers.
This code adds the four weighted averages to our dataframe.
weighted_data1 <- unadj_data2 %>%
mutate (homeless_fmr_adj = weighted.mean(esg_fmr, homeless, na.rm=T),
unsh_fmr_adj = weighted.mean(esg_fmr, unsh, na.rm=T),
vli_fmr_adj = weighted.mean(esg_fmr, t_vli_renters, na.rm=T),
vli_over_fmr_adj = weighted.mean(esg_fmr, t_vli_rent_over, na.rm=T))
This code mimics the SPSS code HUD sent me showing how the department calculates the percentage adjustment for each variable: it simply divides the area’s FMR by the weighted national average.
weighted_data2 <- weighted_data1 %>%
mutate (homeless_adj = (esg_fmr / homeless_fmr_adj),
unsh_adj = (esg_fmr / unsh_fmr_adj),
vli_adj = (esg_fmr / vli_fmr_adj),
vli_over_adj = (esg_fmr / vli_over_fmr_adj))
This code selects entitlement areas only (those whose UOGIDs do not end with “9999”) and calculates a multiple for all areas’ variables. This allows for adjusting areas with higher-than-average FMRs upward by that multiple, capping adjustments at 20%.
weighted_data3 <- weighted_data2 %>%
mutate (adj_homeless = case_when(
(!str_detect(UOGID, "9999$")) & between(homeless_adj, 1, 1.20) ~ homeless_adj,
(!str_detect(UOGID, "9999$")) & between(homeless_adj, 0, 1) ~ 1,
(!str_detect(UOGID, "9999$")) & between(homeless_adj, 1.2, 100) ~ 1.2),
adj_homeless = replace_na(adj_homeless, 1)
) %>%
mutate (adj_unsh = case_when(
(!str_detect(UOGID, "9999$")) & between(unsh_adj, 1, 1.20) ~ unsh_adj,
(!str_detect(UOGID, "9999$")) & between(unsh_adj, 0, 1) ~ 1,
(!str_detect(UOGID, "9999$")) & between(unsh_adj, 1.2, 100) ~ 1.2),
adj_unsh = replace_na(adj_unsh, 1)
) %>%
mutate (adj_vli = case_when(
(!str_detect(UOGID, "9999$")) & between(vli_adj, 1, 1.20) ~ vli_adj,
(!str_detect(UOGID, "9999$")) & between(vli_adj, 0, 1) ~ 1,
(!str_detect(UOGID, "9999$")) & between(vli_adj, 1.2, 100) ~ 1.2),
adj_vli = replace_na(adj_vli, 1)
) %>%
mutate (adj_vli_over = case_when(
(!str_detect(UOGID, "9999$")) & between(vli_over_adj, 1, 1.20) ~ vli_over_adj,
(!str_detect(UOGID, "9999$")) & between(vli_over_adj, 0, 1) ~ 1,
(!str_detect(UOGID, "9999$")) & between(vli_over_adj, 1.2, 100) ~ 1.2),
adj_vli_over = replace_na(adj_vli_over, 1)
)
This code applies the multiple calculated in the previous step to all variables’ values and removes unnecessary columns.
weighted_data4 <- weighted_data3 %>%
mutate (final_homeless = homeless * adj_homeless,
final_unsh = unsh * adj_unsh,
final_vli = t_vli_renters * adj_vli,
final_vli_over = t_vli_rent_over * adj_vli_over
)
#check for places that got adjusted upward for homeless count. esg_fmr should be > homeless_fmr_adj
weighted_data4 %>%
filter (adj_homeless > 1) %>%
dplyr::select (1,2,6,7, 11,15, 19)
#reduce extra columns for final dataframe with all data, weighted by FMR, joined to ESG areas
weighted_data5 <- weighted_data4 %>%
dplyr::select (1, 19:22)
HUD summed each formula variable to get national totals, which were then used to calculate how much each of the 362 communities nationwide would get in emergency homeless aid. Because the U.S. territories of American Samoa, Guam, the Northern Mariana Islands and the Virgin Islands do not all have annual homeless counts or data on very low-income renter counts, HUD set aside 0.2% of the funding and allocated it to these four areas by population.
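The insular set-aside arithmetic can be sketched under the assumption of a round $2.96 billion ESG-CV2 total (consistent with the $2.954 billion figure in the formula below):

```python
# 0.2% of the ESG-CV2 appropriation is set aside for the four insular areas;
# the remainder is distributed through the formula.
total_esg_cv2 = 2_960_000_000        # assumed round appropriation total
insular_set_aside = total_esg_cv2 // 500  # 0.2% = 1/500
formula_pool = total_esg_cv2 - insular_set_aside
# formula_pool is the $2,954,080,000 allocation amount used in the formula below
```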
The following formula shows the final calculation to determine the amount each ESG area receives:
Allocation amount ($2.954 billion, after removing 0.2% for insular areas) ×
[ 0.5 × (area’s adjusted count of total homeless / national sum of adjusted total homeless counts) +
0.1 × (area’s adjusted count of unsheltered homeless / national sum of adjusted unsheltered homeless counts) +
0.15 × (area’s adjusted count of very low-income renters / national sum of adjusted very low-income renters) +
0.25 × (area’s adjusted count of very low-income renters living in overcrowded conditions or without full kitchen and plumbing / national sum of adjusted very low-income renters overcrowded or without full kitchen and plumbing) ]
Calculate national totals of each variable and set the allocation amount to match ESG-CV2’s funding minus insular areas. Then plug values into formula above to calculate shares for each ESG area.
nat_homeless <- sum (weighted_data5$final_homeless, na.rm=T)
nat_unsh <- sum (weighted_data5$final_unsh, na.rm=T)
nat_vli <- sum (weighted_data5$final_vli, na.rm=T)
nat_vli_over <- sum (weighted_data5$final_vli_over, na.rm=T)
amt <- 2954080000
shares3 <- weighted_data5 %>%
mutate ( final_vli = replace_na ( final_vli, 1),
final_vli_over = replace_na ( final_vli_over, 1),
share =
(amt * ((0.50 * final_homeless / nat_homeless) +
(0.10 * final_unsh / nat_unsh) +
(0.15 * final_vli / nat_vli) +
(0.25 * final_vli_over / nat_vli_over)
))
)
#Check that shares all add up to the allocation amount of 2954080000
sum(shares3$share, na.rm=T)
Import HUD’s actual ESG-CV2 allocations to compare how my formula performed.
#Import HUD ESG-CV for comparison (ESG-CV2)
hud_funding_cv2 <- read_excel("data/inputs/hud_allocations/esg-cv2-amounts.xlsx",
sheet = 1) %>% clean_names() %>%
dplyr::select (key, locale, esg_cv2) %>%
rename(uogid = key) %>%
filter(esg_cv2 != 0)
#Import HUD ESG-CV for comparison (ESG20)
compare_esg_cv2 <- shares3 %>%
full_join (hud_funding_cv2, by = c("UOGID" = "uogid")) %>%
mutate (formula_difference = (share - esg_cv2),
pct_change = ((formula_difference / esg_cv2)*100),
share = round(share)) %>%
rename (my_formula = share,
allocation = esg_cv2) %>%
dplyr::select (1,7,6,8:10)
mean(compare_esg_cv2$pct_change, na.rm=T) #1.53% when absolute value & -.086% otherwise
#write.csv (compare_esg_cv2, "data/outputs/compare_esg_cv2.csv")
#86% were within 2% of HUD's calculations
compare_esg_cv2 %>%
filter (pct_change < 2)
#309/358 = 86%
#6% were more than 5% off of HUD's calculations, with the worst being 9.2% off.
compare_esg_cv2 %>%
filter (pct_change > 5) %>%
arrange (desc(pct_change))
#22/358 = 6.1%
Using the same weighted data, apply the new allocation amount of $290 million minus the $580,000 (0.2%) set aside for insular areas to estimate what the ESG-CV2 formula would have allocated had it been in place for the regular FY2020 ESG cycle.
hud_funding_20 <- read_excel("data/inputs/hud_allocations/fy2020-formula-allocations-AllGrantees 042120.xlsx") %>%
clean_names() %>%
dplyr::select (key, name, esg20) %>%
rename(uogid = key) %>%
mutate (uogid = if_else (uogid == "119999", "110006", uogid)) %>%
filter(esg20 != 0)
amt <- 289420000 #$290M - $580K to insular areas (.2%)
cutoff_amt <- 290000000 * .0005
shares2020 <- weighted_data5 %>%
mutate ( final_vli = replace_na ( final_vli, 1),
final_vli_over = replace_na ( final_vli_over, 1),
share =
(amt * ((0.50 * final_homeless / nat_homeless) +
(0.10 * final_unsh / nat_unsh) +
(0.15 * final_vli / nat_vli) +
(0.25 * final_vli_over / nat_vli_over)
))
)
#Calculate which places fall below the 0.05% cutoff and roll up their shares into state non-entitlement regions
compare20 <- shares2020 %>%
full_join (hud_funding_20, by = c("UOGID" = "uogid")) %>%
mutate (formula_diff = (share - esg20),
pct_diff = ((formula_diff / esg20)*100),
state = str_sub(UOGID, 1, 2)) %>%
rename (my_formula = share,
allocation = esg20) %>%
dplyr::select (1,7,6,8:11) %>%
mutate (cutoff = if_else (my_formula < cutoff_amt, 0, my_formula)) %>%
#filter (!str_detect(UOGID, "9999$")) %>%
group_by (state) %>%
mutate (st_tot = sum (my_formula),
missing = sum (cutoff),
roll_up = st_tot - missing,
rolled_up = if_else (str_detect(UOGID, "9999$"), my_formula + roll_up, cutoff),
pct_diff_rollup = ((rolled_up - allocation) / allocation)*100) %>%
ungroup() %>%
dplyr::select (-c(7:11)) %>%
mutate (my_formula = round (my_formula),
formula_diff = round (formula_diff),
pct_diff = round (pct_diff, digits = 2),
rolled_up = round (rolled_up),
pct_diff_rollup = round (pct_diff_rollup, digits = 2))
#write.csv (compare20, "data/outputs/simpleshare_final/compare20_simpleshares.csv")
This process was repeated for the years 2011-2019, using the data sources HUD would have used in each of those years to make ESG allocations. See each year’s code in the R Markdown files in this project’s GitHub repo.
Sum up hypothetical amounts with ESG-CV2 formula for 2011-2020.
esg2011 <- read.csv ("data/outputs/simpleshare_final/compare11_simpleshares.csv") %>%
mutate (X = 2011)
esg2012 <- read.csv ("data/outputs/simpleshare_final/compare12_simpleshares.csv") %>%
mutate (X = 2012)
esg2013 <- read.csv ("data/outputs/simpleshare_final/compare13_simpleshares.csv") %>%
mutate (X = 2013)
esg2014 <- read.csv ("data/outputs/simpleshare_final/compare14_simpleshares.csv") %>%
mutate (X = 2014)
esg2015 <- read.csv ("data/outputs/simpleshare_final/compare15_simpleshares.csv") %>%
mutate (X = 2015)
esg2016 <- read.csv ("data/outputs/simpleshare_final/compare16_simpleshares.csv") %>%
mutate (X = 2016)
esg2017 <- read.csv ("data/outputs/simpleshare_final/compare17_simpleshares.csv") %>%
mutate (X = 2017)
esg2018 <- read.csv ("data/outputs/simpleshare_final/compare18_simpleshares.csv") %>%
mutate (X = 2018)
esg2019 <- read.csv ("data/outputs/simpleshare_final/compare19_simpleshares.csv") %>%
mutate (X = 2019)
esg2020 <- read.csv ("data/outputs/simpleshare_final/compare20_simpleshares.csv") %>%
mutate (X = 2020)
#bind data frames together
final <- esg2020 %>%
rbind (esg2019) %>%
rbind (esg2018) %>%
rbind (esg2017) %>%
rbind (esg2016) %>%
rbind (esg2015) %>%
rbind (esg2014) %>%
rbind (esg2013) %>%
rbind (esg2012) %>%
rbind (esg2011) %>%
rename (year = X) %>%
mutate (UOGID = as.character (UOGID),
UOGID = case_when (
name == "District Of Columbia" ~ "110006",
name == "WASHINGTON" ~ "110006",
TRUE ~ UOGID))
final$UOGID <- str_pad(final$UOGID, 6, pad = "0")
#sum up actual allocations, hypothetical allocations, and allocations after those places awarded less than 0.05% were rolled up to the state level.
summed <- final %>%
group_by (UOGID) %>%
mutate (t_actual = sum(allocation, na.rm=T),
t_mine = sum(my_formula, na.rm=T),
t_rollup = sum(rolled_up, na.rm=T),
count = n()) %>%
distinct(UOGID, .keep_all = TRUE)
#calculate difference between hypothetical and actual and percent differences for presentation.
final2 <- summed %>%
mutate (formula_diff = (t_mine - t_actual),
pct_diff = ((formula_diff / t_actual)*100),
rollup_diff = (t_rollup - t_actual),
pct_diff_rollup = (rollup_diff / t_actual)*100) %>%
dplyr::select (2,3,10:12, 6,7,14,9,1,13)
#write.csv (final2, "data/outputs/simpleshare_final/final_pprns.csv")
# How many places had money rolled up all 10 years? 81 (+4 insular areas)
# Which states had the most of these cities? Penn (15), NY (11), then MI, IL, IN & PR tied with 6 each
final2 %>%
filter (pct_diff_rollup == -100) %>%
mutate (state = substr(UOGID, 1, 2)) %>%
group_by (state) %>%
count (state) %>%
arrange (desc(n))
# How many places had money rolled up at least once in the decade? 128, an additional 47
final %>%
filter (rolled_up == 0) %>%
distinct (UOGID, .keep_all = TRUE)
How much would each state’s awards have changed with the new formula from 2011-2020?
library(tidycensus)
fips <- fips_codes %>%
distinct (state_code, .keep_all=T)
compare <- final2 %>%
mutate (state = substr(UOGID, 1,2))
compare2 <- compare %>%
group_by (state) %>%
summarize (new_formula_2011_2020 = sum(t_mine),
actual_allocation_2011_2020 = sum(t_actual)) %>%
mutate (diff = new_formula_2011_2020 - actual_allocation_2011_2020,
pct_change = diff / actual_allocation_2011_2020 *100) %>%
left_join (fips, by = c("state" = "state_code")) %>%
select(state_name, new_formula_2011_2020, actual_allocation_2011_2020, diff, pct_change) %>%
arrange (desc(diff))
#write.csv (compare2, "data/outputs/state_total_comparison.csv")
final3 <- compare %>%
left_join (fips, by=c("state" = "state_code")) %>%
select (-c(state, state.y, county_code, county))
#write.csv (final3, "data/outputs/simpleshare_final/final_pprns.csv")
These final calculations are presented in the following publications:
There may be minor differences between the GIS analysis or fair market rent adjustments the Howard Center used and HUD’s. However, the vast majority of the Howard Center’s calculations (86%) were less than 2 percentage points off HUD’s allocations for the second round of ESG funding in the CARES Act.
The Howard Center consulted a half-dozen formula funding experts at HUD and demographic researchers at Boston University and the University of Washington in conducting its analysis. Among the experts was Todd Richardson, an economist who heads the HUD research division that devised the new CARES Act homeless aid formula.
Responding to the Howard Center’s findings on how the CARES Act formula would have changed homeless funding amounts if applied over the last decade, Richardson said, “From a 10,000-foot level, these results are about what I would expect.”