Products

  1. ArcGIS interactive StoryMap visualizing the data analysis described below: an interactive map of the homeless assistance formula comparison
  2. “Calculating homeless aid has relied on a flawed formula” on The Associated Press
  3. “Calculating the impact of HUD’s homeless formulas” on ArizonaPBS/CronkiteNews
  4. “Caring for COVID’s invisible victims” - full package of investigative stories on ArizonaPBS/CronkiteNews

Investigation methodology

When Congress passed the Coronavirus Aid, Relief, and Economic Security Act in March 2020, it included $4 billion in homeless aid through the Emergency Solutions Grant program run by the Department of Housing and Urban Development. Since 1987, HUD had calculated the amount of homeless aid communities received using a formula that was unrelated to homelessness. Because the pandemic grant program was so large, Congress told HUD to recalculate its formula to reflect an area’s homelessness. That “new” formula was used to allocate the second round of ESG funding, nearly $3 billion, in June 2020, and it dramatically reshaped where dollars went nationwide.

I wanted to discover how much money each city, county and state recipient would have gotten to care for their homeless residents over the last decade had HUD used this better-targeted formula starting in 2011 instead of 2020.

The new formula considers four variables for each community (a sketch of how these weights combine follows the list):

  • Share of all homeless people from the previous year’s homeless census (50% weight)
  • Share of unsheltered homeless people from the previous year’s homeless census (10% weight)
  • Share of those at risk for homelessness, calculated using the total number of very low-income renter households for a community. HUD usually defines very low-income renters as those who earn less than 50% of the area’s median family income. (15% weight)
  • Share of those at risk for unsheltered homelessness, calculated using the total number of very low-income renters who are living in overcrowded situations (more than one person per room) or in housing without a complete kitchen or plumbing. (25% weight)
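
To make the weighting concrete, here is a minimal sketch in R of how the four weighted shares combine into a single formula share. The function and inputs are illustrative, not HUD’s code; the full calculation, including fair market rent adjustments, appears later in this document.

#Illustrative only: combine the four national shares using the CARES Act weights
formula_share <- function(homeless_share, unsh_share, vli_share, vli_over_share) {
  0.50 * homeless_share +
    0.10 * unsh_share +
    0.15 * vli_share +
    0.25 * vli_over_share
}

#A community holding 1% of each national total receives 1% of the pot
formula_share(0.01, 0.01, 0.01, 0.01) #0.01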

Adjust homeless census to ESG geography

The numbers of total homeless people and unsheltered homeless people come from the previous year’s Point-in-Time, or PIT, counts, conducted by local HUD-approved bodies that oversee homeless services for a given area. Because the geographic areas covered by these PIT counts often differ from the areas HUD uses to allocate ESG funding, the department had to adjust for those boundary differences to calculate payouts under the new formula. HUD overlaid the ESG recipient boundaries with the homeless census’s geographic areas, creating a third, intermediate geography that let it apportion homeless counts into the ESG funding geography according to the proportion of ESG funding each intermediate area received.

To see how much money communities would have received if HUD had used the homeless-centered CARES Act formula yearly, the Howard Center followed the department’s methodology for the pandemic payouts and applied it to the years 2011 to 2019. It did that with the help of a third HUD formula, which is based on the same Community Development Block Grant calculation that HUD uses in the ESG program to determine the amount of an area’s homeless assistance. Using this third formula, which HUD makes public, allowed the Howard Center to create a “crosswalk” for each of the past 10 years to apply actual homeless counts to the existing ESG grant recipient areas.

This process produced a nearly identical crosswalk to the one HUD used in calculating its 2020 pandemic payouts. The Howard Center checked its use of the third formula in assigning homeless counts to grant areas for 2020 with HUD’s actual crosswalk, which the department provided, and the average difference was just 0.006%.

Alabama helps visualize this process. The blue map shows homeless counts for the state’s eight Continuums of Care, the HUD-approved entities that administer homeless services. But HUD disburses homeless grants in its ESG program based on the five areas shown on the red map. The crosswalk necessary to calculate the 2020 CARES Act payout created a unique combination of the two geographies, shown in the yellow map.

Comparing Alabama’s CoC geography to its ESG grant recipient areas and the third, intermediary geography that HUD used to convert homeless counts from CoCs onto the ESG geography

For example, the Birmingham metro area counted 981 homeless people in 2019. The crosswalk allocates 57% of that total homeless count, or 561 people, to the Birmingham city ESG payment area; 23%, or 224 people, to the Jefferson County ESG payment area; and the remaining 20%, or 196 people, to the Alabama Balance of State ESG payment area within the Birmingham CoC. These calculations are necessary only in places where the CoC and ESG boundaries conflict. For example, the 390 homeless people in the Florence/Northwest Alabama CoC fall entirely within the ESG boundary for the Alabama Balance of State, a designation used for largely rural areas.
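
The same arithmetic in R, using the Birmingham figures above (the shares are expressed here as the proportions that reproduce the person counts in the text):

#Split the Birmingham CoC's 2019 PIT count across its three ESG payment areas
birmingham_pit <- 981
crosswalk_shares <- c(birmingham_city = 561 / 981,
                      jefferson_county = 224 / 981,
                      al_balance_of_state = 196 / 981)
round (birmingham_pit * crosswalk_shares)
#returns 561, 224 and 196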

The Code (2020_01_simple_shares.Rmd)

Goal

HUD used ESG funding amounts to divide PIT counts from the CoC geography into the ESG geography. I can calculate ESG funding amounts for the ~1,200 ESG- and CDBG-eligible places from the CDBG18 allocations in HUD’s workbook, but that leaves ~2,900 counties in balance of state areas without an ESG funding amount.

HUD’s PPRN (Preliminary Pro Rata Need) is another formula program derived directly from the CDBG formula; HUD publishes the calculations online each year for all ~4,100 unique cities and counties in the CDBG/ESG geographies.

This program uses the available PPRN figures and converts them to approximate ESG funding amounts for the nonentitlement areas, since both programs are based on the same underlying CDBG formula.

PPRN Formula Mechanics

After a 0.2% set-aside for insular areas, 75% of the total PPRN allocation goes to ESG-entitled places and 25% goes to places that do not get direct ESG funding (which may be CDBG-eligible cities, urban counties or non-entitled counties within a balance of state area). Source: Federal Register
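
As a quick sketch of that split (the pool total below is hypothetical; the real totals are computed from HUD’s files in the code that follows):

#Hypothetical PPRN pool to illustrate the split described above
pprn_pool <- 100e6 #made-up total for illustration
pprn_after_setaside <- pprn_pool * 0.998 #remove 0.2% insular set-aside
pprn_entitled_pool <- pprn_after_setaside * 0.75 #ESG-entitled places
pprn_nonentitled_pool <- pprn_after_setaside * 0.25 #balance-of-state places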

Sources

Ingest sources

I converted the 2018 Continuum of Care (CoC) Geographic Code Report to a CSV using Tabula. This code imports that file and also imports HUD’s 2020 crosswalk with its calculated ESG funding amounts based on the CDBG formula.

library (tidyverse) #these chunks assume the tidyverse, readxl and janitor are loaded
library (readxl)
library (janitor)

pprn_df <- read.csv("data/inputs/hud_allocations/FY-2019-Geographic-Codes-with-PPRN.csv") %>%
  clean_names() %>% 
  rename (UOGID = code,
          pprn_2 = pprn)

#Convert factor columns into character/numeric to be able to mutate later. Pad all HUD keys (UOGID) with leading zeros to ensure they are six digits long and match properly later.
pprn_df$UOGID <- str_pad (pprn_df$UOGID, 6, pad = "0")
pprn_df$pprn_2 <- as.character (pprn_df$pprn_2)
#Remove dollar signs and commas, then drop the extra pprn column
pprn_df$pprn <- as.numeric(gsub("[\\$,]", "", pprn_df$pprn_2))
pprn_df <- pprn_df %>%
  select (-pprn_2)

#Import disaggregated crosswalk of all 4141 jurisdictions
crosswalk <- read_excel("data/inputs/hud_calculations/Disaggregated COC to ESG crosswalk.xlsx") %>%
  clean_names() %>% 
  rename (UOGID = geocode,
          coc_num = coc_id)

#Check for missing from PPRN file and check it read right from PDF
anti_join (crosswalk, pprn_df, by = "UOGID") 

#Check for missing from crosswalk file -- these will be areas without CoCs (several places in AL, AR, FL & OK), insular areas and a few cities that have joint agreements with their county that HUD didn't include (Marco Island, Naples, Commerce City, Pinellas Park).
#These are handled with the hard-coded filters below
anti_join (pprn_df, crosswalk, by = "UOGID") %>%
  filter(!(UOGID %in% c("129003", "129029", "129125", "010072", "010882", "019015", "019019","019049","019055","050930","059033","059047","059083","059113","059127","059131", "409087","409149"))) %>% #filter out non-COC counties in AL, AR, FL, OK
  filter(!(state %in% c("AS","GU","MP","VI"))) #filter out insular areas 

#Import disaggregated crosswalk of all 4141 jurisdictions and correct with places found above
crosswalk <- read_excel("data/inputs/hud_calculations/Disaggregated COC to ESG crosswalk.xlsx") %>%
  clean_names() %>% 
  dplyr::select (1,2,4, 5, 8,9) %>% 
  rename (UOGID = geocode,
          coc_num = coc_id) 

#check for missing
crosswalk %>% 
  filter (is.na(UOGID))

Adjust funding amounts from PPRN amounts to ESG for CDBG/ESG-eligible areas

This code joins the crosswalk to the PPRN table, filtering out places not in the crosswalk and therefore not in a COC. It imports the 362 ESG allocations from 2019 and creates a list of these entitled places to filter out later. It also reimports a fresh version of the FY19 allocations and calculates the total ESG allocation minus insular areas.

#start with crosswalk, join PPRN formula amounts, remove extra columns
joined <- crosswalk %>%
  left_join (pprn_df, by = "UOGID") %>% #removes areas without CoCs
  select (-c(name, state ))

#save UOGIDs of the four insular areas to filter out, since they get a 0.2% set-aside of the total funding.
insular <- c("600001", "660001", "690001", "780001")

#import ESG19 allocations from HUD to make list of eligible UOGIDs
hud_funding_19 <- read_excel("data/inputs/hud_allocations/fy2019-formula-allocations-AllGrantees.xlsx", skip = 1) %>%
  dplyr::select (1, 4, 7) %>%
  clean_names() %>%
  rename(UOGID = key) %>%
  mutate (UOGID = if_else (sta == "DC", "110006", UOGID)) %>% #Correct DC key change
  filter (esg19 != 0) %>% #finds only ESG eligible
  filter (!(UOGID %in% (c("600001", "660001", "690001", "780001")))) %>%
  filter (!(UOGID != "119999" & str_detect (UOGID, "9999$"))) #removes states, which are nonentitlement areas, while keeping DC (119999) as an entitlement

esg_eligible18 <- as.list (hud_funding_19$UOGID)

#import ESG19 allocations from HUD to calculate total allocation
hud_funding_19 <- read_excel("data/inputs/hud_allocations/fy2019-formula-allocations-AllGrantees.xlsx", skip = 1) %>%
  dplyr::select (1, 7) %>%
  clean_names() %>%
  rename(UOGID = key)

#Calculate total ESG-entitled allocation
esg19_total <- sum (hud_funding_19$esg19) * .998

Calculating ESG/CDBG entitled areas from FY18 CDBG allocations

This imports the CDBG18 calculations worksheet from HUD and calculates a total allocation amount.

We need this because the ESG formula uses the proportion of an area’s CDBG funding to the whole to adjust into the smaller ESG allocation.

An area’s ESG funding = (its CDBG18 amount / total CDBG18 allocation) × (total 2019 ESG allocation − 0.2% insular set-aside)

This filters the CDBG calculation worksheet for all ESG-eligible areas using the list we made above. We can then sum these areas to see how much total CDBG funding HUD allocated to ESG-entitled areas and how much to the non-entitled areas.

cdbg18_allocation <- read_excel ("data/inputs/hud_calculations/CDBG FY19.xlsx", sheet = "CDBG18") %>% 
  rename (UOGID = HUDKEY,
          cdbg18 = "FY18 Formula") %>% 
  dplyr::select (UOGID, NAME, cdbg18) 

#calculate total 2018 CDBG formula allocations
cdbg18_total <- sum (cdbg18_allocation$cdbg18)

#filter out nonentitled areas to calculate total CDBG to ESG-entitled places and total CDBG to non-entitled places
cdbg18_entitled <- cdbg18_allocation %>% 
  filter (UOGID %in% esg_eligible18) %>%   
  filter (!(UOGID %in% insular)) #use %in%, not ==, so all four insular areas are dropped
#  rbind (c("369119", "Westchester County", 3544891)) #Westchester became CDBG/ESG-eligible in 2019; it's missing from the allocation sheet. This adds its 2019 amount, or else it would appear as NA.

#check DC is here
cdbg18_entitled %>% 
  filter (UOGID %in% c("110006", "119999"))

#check that insular areas are not in this list
cdbg18_entitled %>% 
  filter (UOGID %in% c("600001", "660001", "690001", "780001"))

cdbg18_nonentitled <- cdbg18_allocation %>% 
  filter (!(UOGID %in% esg_eligible18)) 

#cdbg18_entitled$cdbg18 <- as.numeric (cdbg18_entitled$cdbg18)
cdbg18_esg_amt <- sum(cdbg18_entitled$cdbg18)

cdbg18_non_amt <- sum(cdbg18_nonentitled$cdbg18)
cdbg18_non_amt2 <- (cdbg18_total) - cdbg18_esg_amt #take out insular adjustment because they're already not included

Now we can convert the CDBG allocations to the ESG amounts for CDBG-eligible and ESG-eligible places. This adds our PPRN data and crosswalk into the CDBG18 allocation data and filters out extra state nonentitlements that get pulled in from the right join.

I can also compare my calculations to HUD’s to check for accuracy. The mean difference from my calculations to HUD’s is 0.046%.

#calculate the proportion of each area's portion of the whole allocation.
#multiply that by the total ESG amount to adjust to the proper ESG amount
entitled_funding <- joined %>% 
  right_join (cdbg18_allocation, by="UOGID") %>% 
  select (-NAME) %>%
  filter (!(str_detect(UOGID,"9999$"))) #filters out state nonentitlements that get pulled in from right join

entitled_funding2 <- entitled_funding %>%
  mutate (pct = cdbg18 / cdbg18_total, #proportion of area to the whole allocation
          esg19 = round(pct * esg19_total), #adjusts into smaller ESG amount
          diff = esg19 - esg_amount, #how far off am I from HUD's calculations in the crosswalk?
          pct_diff = abs(diff / esg_amount )) 
          
#0.046% mean difference from HUD's numbers in the crosswalk to these.
mean(entitled_funding2$pct_diff, na.rm=T)

Adjust non-entitled areas from PPRN amounts

HUD’s calculations show the state nonentitlement areas as one lump sum, so we can’t simply convert them. This section translates the PPRN funding amount back into its approximate CDBG18 amount and then over to the ESG19 amount, to be used only for nonentitlement areas.

Calculate PPRN totals and save values

Sum up all PPRN formula amounts to get the total. Subtract the amounts for insular areas, since HUD sets aside 0.2% for them. Then set the allocation pools according to the PPRN formula: ESG-entitled areas get 75% and nonentitled areas get 25% of PPRN funding.

#from HUD's allocation files
pprn_total_with_insular <- sum(pprn_df$pprn)
#subtract setasides
pprn_total <- pprn_total_with_insular * .998

pprn_esgs <- pprn_total * .75
pprn_non <- pprn_total * .25

We need to filter our joined crosswalk/PPRN data for all non-entitled areas. These are counties in Balance of State areas whose ESG IDs end in “9999”. They also carry a geo_type marker.

This code calculates the proportion each county’s PPRN constitutes of all nonentitlement PPRN funding. We can then multiply that by the total CDBG funding allocated to nonentitlement areas to convert it into an approximate CDBG amount.

We then repeat the process above of calculating each area’s amount of the whole CDBG18 allocation and multiply that by the ESG19 total allocation to arrive at the area’s approximate ESG funding amount.

This is not a perfect estimate, likely due to reallocations after the formula is calculated, joint agreements between cities and counties, or some other reason known only to HUD. There is one large outlier: Bibb County, GA. When that county is factored out, the mean difference from HUD’s calculations to these is 7%. Not perfect, but far better than the 20% mean difference my QGIS analysis produced.

However, once these are all grouped by ESG/CoC, the difference drops significantly to 0.193%. I’m guessing most places with differences are in the same ESG area and get swept up together.

#NONENTITLED AREAS - should total 2942
nonentitled_funding <- joined %>%
  filter (str_detect (esg_id, "9999$")) %>%
  filter (geo_type == "PPRN Non-Entitlement Geographies with ESG bounced to State") %>% 
  rbind (c("369119", "PPRN Non-Entitlement Geographies with ESG bounced to State", "Westchester County", 529089, 369119, "NY-604", 1678341)) #Westchester became CDBG/ESG-eligible in 2019, so it's missing from allocation sheet and its esg_id is not the state level code 369999. This manually adds it back in.
nonentitled_funding$pprn <- as.numeric (nonentitled_funding$pprn)
nonentitled_funding$esg_amount <- as.numeric (nonentitled_funding$esg_amount)

nonentitled_funding_unadjusted <- nonentitled_funding %>%
  mutate (pct = pprn / pprn_non,
          cdbg18 = pct * (cdbg18_non_amt),
          esg19 = round((cdbg18 / cdbg18_total ) * esg19_total),
          diff = esg19 - esg_amount,
          pct_diff = (diff / esg_amount * 100)) 

#All percent differences vary from 4.7 - 5.3%.
adj <- mean(nonentitled_funding_unadjusted$pct_diff, na.rm=T)
pprn_adj <- (100-adj) / 100

nonentitled_funding2 <- nonentitled_funding %>%
  mutate (pct = pprn / pprn_non,
          cdbg18 = pct * (cdbg18_non_amt),
          esg19 = round(((cdbg18 / cdbg18_total ) * esg19_total) * pprn_adj),
          diff = (esg19 - esg_amount),
          pct_diff = abs(diff / esg_amount)) 

#0.192696% mean difference from HUD's numbers in the crosswalk to these.
mean(nonentitled_funding2$pct_diff, na.rm=T)
#str(nonentitled_funding3)

Bring it together

Join the ESG/CDBG-entitled and nonentitlement areas back together and check for errors.

crosswalk18 <- entitled_funding2 %>%
  rbind(nonentitled_funding2) %>%
  rename (x18_hud_calc = esg_amount)

#Any non matching amounts from the 2020 crosswalk's ESG amounts to my calculated amounts? It's the nonentitlement areas, as expected.
crosswalk18 %>%
  filter (x18_hud_calc != esg19)

#Anything extra that's gotten joined? UOGIDs of areas without CoCs + Hawaii counties, which are actually calculated under the UOGID from the crosswalk (159### rather than 15000#)
anti_join(crosswalk18, crosswalk, by="UOGID")

#0.13694% mean difference from HUD's numbers in the crosswalk to these.
mean(crosswalk18$pct_diff, na.rm=T)

#write.csv (crosswalk18, "data/outputs/disaggregated_crosswalk_check.csv")

Create crosswalk

Now that we’ve got the funding amounts for each little area, we can group into 654 unique ESG-CoC combinations and calculate how much ESG funding the portions of each CoC within a particular ESG’s boundaries receive.

#Areas without a CoC will aggregate into a new NA row in the final crosswalk, so we'll remove those.
#How many? 
crosswalk18 %>%
  filter (is.na (coc_num))

#10 rows that we'll filter out here, group by ESG and COC then summarize.
group1 <- crosswalk18 %>%
  filter (!is.na (coc_num)) %>%
  group_by (esg_id, coc_num) %>%
  summarize (esg_4_coc_n_esg = sum(esg19, na.rm=T)) %>%
  ungroup() 

#Import ESG crosswalk from HUD to compare
crosswalk_esg <- read_excel("data/inputs/hud_calculations/Aggregated COC to ESG crosswalk.xlsx") %>%
  clean_names() %>%
  rename (coc_num = coc_id)

#Check -- these amounts should match column D in aggregated crosswalk
check_group1 <- group1 %>%
  full_join (crosswalk_esg, by = c("esg_id", "coc_num")) %>% 
  dplyr::select (1,2,3,5) %>%
  rename (hud_amt = amount_of_esg_for_the_portion_of_coc_in_this_esg_grantee) %>%
  mutate (diff = esg_4_coc_n_esg - hud_amt,
          pct_diff = abs(diff / hud_amt * 100))

#Avg difference of 0.06%
mean(check_group1$pct_diff) 

Group by CoC number to calculate the total ESG funding for each CoC. Divide summed ESG funding for each portion of the CoC within an ESG’s boundaries by the CoC’s total ESG funding to calculate the share of each CoC’s homeless population that resides within this part of the ESG area.

group2 <- group1 %>% 
  group_by (coc_num) %>% 
  mutate (tot_esg_4_coc = sum(esg_4_coc_n_esg, na.rm=T), #calculates total ESG funding for each CoC
          share_coc_n_esg = esg_4_coc_n_esg / tot_esg_4_coc) #calculates share of each CoC within this part of the ESG area. 
  
#Check my calculated ESG funding adds up to approximately the FY2019 ESG allocation of $279,687,680. Will be different because of all the converting between geographies, and should be slightly less because areas without COCs are removed.
sum(group2$esg_4_coc_n_esg, na.rm=T) 

Comparison with HUD

Compare with HUD’s crosswalk to see how close I got.

#Import ESG crosswalk from HUD
crosswalk_esg <- read_excel("data/inputs/hud_calculations/Aggregated COC to ESG crosswalk.xlsx") %>%
  clean_names() %>%
  dplyr::select (1,2,6) %>%
  rename (coc_num = coc_id)

#Join my share calculations by both ESG and CoC to compare my shares based on population to HUD's share calculations
compare_shares <- crosswalk_esg %>%
  full_join (group2, by = c("esg_id", "coc_num")) %>%
  rename (hud_share = share_of_coc_in_this_part_of_esg_area,
          my_share = share_coc_n_esg) %>% # add my_share2010 column to compare to current HUD share to see which might be more wrong
  mutate (diff = abs(hud_share - my_share)) %>%
  dplyr::select(1,2,3,6,7) 

#How far off are my calculated shares from HUD's on average?
mean (compare_shares$diff, na.rm=T) #avg diff = 0.005998% from my calculations to HUD's

#Strip out HUD's share and difference column for previous year's crosswalks 
esg_coc_crosswalk <- compare_shares %>%
  dplyr::select (1, 2, 4) %>%
  rename (share_of_coc_in_this_part_of_esg_area = my_share)

#Export for use in formula program
#write.csv (compare_shares, "data/outputs/x20_simple_shares.csv")

Comparing my crosswalk to HUD’s crosswalk:

  • 247 ESG-CoC combinations match perfectly, leaving 407 calculations different from HUD’s
  • 633/654 = 96.8% were within 5 percentage points of HUD’s calculations
  • 581/654 = 88.8% were within 2 percentage points of HUD’s calculations
  • 540/654 = 82.6% were within 1 percentage point of HUD’s calculations
#247 were dead-on (perfect matches) leaving 407 calculations different from HUD's
compare_shares %>% 
  filter (diff == 0)  %>% count()

# 581/654 = 88.8% were within 2 percentage points of HUD's calculations
compare_shares %>% 
  filter (diff < .02) %>% count()

# 540/654 = 82.6% were within 1 percentage point of HUD's calculations
compare_shares %>% 
  filter (diff < .01) %>% count()

#633/654 = 96.8% were within 5 percentage points of HUD's calculations
compare_shares %>% 
  filter (diff < .05) %>% count()

#247 / 654 ESG-CoC combinations match perfectly ~ over one-third. 
#grouping them by ESG and counting gets 101/358 ESGs (minus insular areas) that match perfectly (28%)
compare_shares %>% 
  filter (diff == 0) %>% 
  group_by(esg_id) %>%
  count()

#all_tracts_pop6 %>% 
#  summarize (n(), n_distinct(cross_uogid))

#What CDBG-eligible cities/urban counties were off? 
#check_pop_8 %>%
#  filter (diff != 0) %>%
#  filter (!str_detect(esg_id, "9999$"))
#It's all the places that changed eligibility + NYC

#How many COCs perfectly match ESG boundaries?
#Group by ESG and count
#Group by COC and count
#Look for those places that only have a single row for each
matches <- compare_shares %>% 
#  filter (hud_share == 1) %>% 
  group_by (esg_id) %>% 
  mutate (n_esg = n()) %>%
  ungroup() %>%
  group_by(coc_num)%>% 
  mutate (n_coc = n()) %>%
  filter (n_esg == 1 & n_coc == 1)

pit_2019 <- read_excel("data/inputs/hud_calculations/2007-2019-PIT-Counts-by-CoC.xlsx", 
                       sheet = "2019", 
                       n_max = 397) %>% #leftover note from the csv version: stringsAsFactors = FALSE
  clean_names() %>%
  dplyr::select (1:2, overall_homeless_2019, unsheltered_homeless_2019) %>%
  rename(coc_num = co_c_number,
         coc_name = co_c_name,
         coc_total_homeless = overall_homeless_2019,
         coc_unsh_homeless = unsheltered_homeless_2019) %>%
  mutate(coc_num = ifelse(coc_num == "MO-604a", "MO-604", coc_num))

matches_final <- matches %>% 
  left_join (crosswalk, by = c("esg_id"="UOGID")) %>% 
  left_join (pit_2019, by = c("coc_num.x" ="coc_num") ) %>%
  select (coc_num.x, esg_id, geo_name, coc_name)

#write.csv (matches_final, "data/outputs/coc_esg_perfect_matches.csv")

Adjust very low-income renter counts to ESG geography

The remaining two variables of the CARES Act formula, measuring very low-income renters, come from a custom tabulation of American Community Survey data that HUD receives annually from the U.S. Census Bureau. The Comprehensive Housing Affordability Strategy data demonstrates the extent of housing problems and housing needs, particularly for low-income households nationwide. HUD calculates funding allocations using census data at the place-remainder level (Summary Level 070). The Howard Center used geographic information system (GIS) analysis to create custom shapefiles corresponding to that place-remainder geography, then overlaid them with each year’s ESG map, using areal weighting to apportion the very low-income renter counts wherever an ESG boundary split a place-remainder area. Areal weighting is one of the simplest ways to interpolate demographic data across geographic boundaries that do not correspond with U.S. Census Bureau boundaries. It does not provide perfect results because it assumes demography is distributed evenly across a geographic region. However, it was needed for only about 4% of the country in 2020 and, on average, less than 4% over the last decade.
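
A minimal sketch of areal weighting, using the Daytona Beach proportions cited later in this document and a made-up renter count:

#Areal weighting assumes renters are spread evenly across the polygon
split_area <- data.frame(
  esg_area    = c("Volusia County ESG", "Florida non-entitlement ESG"),
  area_prop   = c(0.186, 0.814), #share of the polygon's area in each ESG
  vli_renters = 1000             #hypothetical count for the whole polygon
)
split_area$vli_weighted <- split_area$vli_renters * split_area$area_prop
split_area #186 renters to Volusia County, 814 to the state non-entitlement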

The Code (2020_02_formula.Rmd)

Goal

Calculate ESG-CV2 allocations using crosswalk constructed in 2020_01_simple_shares and data gathered in QGIS.

Sources

Gather ACS and CHAS data

This code shows how I import the 2012-2016 ACS population data and combine the component parts making up two 2012-2016 CHAS variables used in the ESG-CV2 formula.

HUD combined the following 14 variables at the 070 level to calculate an area’s number of very low-income renters at risk for homelessness:

  • t3_est47 (Renter occupied, lacking complete plumbing or kitchen facilities, <=30% of HAMFI)
  • t3_est48 (Renter occupied, lacking complete plumbing or kitchen facilities, 30-50% of HAMFI)
  • t3_est53 (Renter occupied, with more than 1.5 persons per room, none of the needs above, <=30% of HAMFI)
  • t3_est54 (Renter occupied, with more than 1.5 persons per room, none of the needs above, 30-50% of HAMFI)
  • t3_est59 (Renter occupied, with 1-1.5 persons per room, none of the needs above, <=30% of HAMFI)
  • t3_est60 (Renter occupied, with 1-1.5 persons per room, none of the needs above, 30-50% of HAMFI)
  • t3_est65 (Renter occupied, with housing cost burden >50%, none of the needs above, <=30% of HAMFI)
  • t3_est66 (Renter occupied, with housing cost burden >50%, none of the needs above, 30-50% of HAMFI)
  • t3_est71 (Renter occupied, with housing cost burden 30%-50%, none of the needs above, <=30% of HAMFI)
  • t3_est72 (Renter occupied, with housing cost burden 30%-50%, none of the needs above, 30-50% of HAMFI)
  • t3_est77 (Renter occupied housing cost burden not computed, none of the needs above, <=30% of HAMFI)
  • t3_est78 (Renter occupied housing cost burden not computed, none of the needs above, 30-50% of HAMFI)
  • t3_est83 (Renter occupied, has none of the 4 housing problems, <=30% of HAMFI)
  • t3_est84 (Renter occupied, has none of the 4 housing problems, 30-50% of HAMFI)

HUD combined the following 6 variables at the 070 level to calculate an area’s number of very low-income renters at risk for unsheltered homelessness:

  • t3_est47 (Renter occupied, lacking complete plumbing or kitchen facilities, <=30% of HAMFI)
  • t3_est48 (Renter occupied, lacking complete plumbing or kitchen facilities, 30-50% of HAMFI)
  • t3_est53 (Renter occupied, with more than 1.5 persons per room, none of the needs above, <=30% of HAMFI)
  • t3_est54 (Renter occupied, with more than 1.5 persons per room, none of the needs above, 30-50% of HAMFI)
  • t3_est59 (Renter occupied, with 1-1.5 persons per room, none of the needs above, <=30% of HAMFI)
  • t3_est60 (Renter occupied, with 1-1.5 persons per room, none of the needs above, 30-50% of HAMFI)
#Import 2012-2016 ACS population data downloaded from SocialExplorer.org
pop <- read_csv("data/inputs/070-pop-2016.csv") %>%
#  mutate (Geo_PLACESE = as.character(Geo_PLACESE),
#          ACS09_5yr_B01003001 = as.character(ACS09_5yr_B01003001))%>%
  rename (GEOID = Geo_PLACESE,
          name = Geo_NAME,
          pop = ACS16_5yr_B01003001) %>%
  dplyr::select (Geo_GEOID, name, pop)
    
#Import CHAS variables confirmed by HUD (the 14 variables listed above)
chas <- read_csv ("data/inputs/2012thru2016-070-csv/Table3.csv") %>% 
  clean_names() %>%
  dplyr::select (geoid, t3_est47, t3_est48, t3_est53, t3_est54, t3_est59, t3_est60, t3_est65, t3_est66, t3_est71, t3_est72, t3_est77, t3_est78, t3_est83, t3_est84) %>%
  mutate (
    vli_renters = (t3_est47 + t3_est48 + t3_est53 + t3_est54 + t3_est59 + t3_est60 + t3_est65 + t3_est66 + t3_est71 + t3_est72 + t3_est77 + t3_est78 + t3_est83 + t3_est84),
    vli_crowd = (t3_est47 + t3_est48 + t3_est53 + t3_est54 + t3_est59 + t3_est60)) %>%
  dplyr::select (geoid, vli_renters, vli_crowd) %>% 
  rename (Geo_GEOID = geoid)

#Join together
sl070_2020 <- pop %>%
  full_join(chas, by="Geo_GEOID")

#Export to add into shapefile in QGIS
##write.csv (sl070_2020, "data/FY2020/sl070_2020.csv")

QGIS analysis

See the Google Sheet here for more details on the separate analysis I did for each year 2011 through 2020 in QGIS. In a nutshell, overlaying the 2016 TIGER/Line place and county subdivision shapefiles produces a shapefile that matches the places and place remainders of summary level 070.

After joining together the population and CHAS data above, I joined it to the shapefile using the 22-character unique identifier Geo_GEOID. I created this field in QGIS using the following expression in the field calculator:

CASE
WHEN "STATEFP" IS NULL THEN
  concat ('07000US', "STATEFP_2", "COUNTYFP", "COUSUBFP", '99999')
ELSE
  concat ('07000US', "STATEFP_2", "COUNTYFP", "COUSUBFP", "PLACEFP")
END

All county subdivisions in the overlaid file that don’t have a matching place when overlaid will have a null State FIPS code. These are the “Remainder of” areas, whose Geo_GEOIDs all end with “99999.” Otherwise, the Geo_GEOID is simply a concatenation of “07000US”, the state FIPS code, county FIPS code, county subdivision FIPS code and place FIPS code.
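
The same construction can be sketched in R (the FIPS codes below are illustrative):

#Build the 22-character Geo_GEOID; subdivisions with no matching place get "99999"
build_geo_geoid <- function(statefp, countyfp, cousubfp, placefp = NA) {
  paste0("07000US", statefp, countyfp, cousubfp,
         ifelse(is.na(placefp), "99999", placefp))
}
build_geo_geoid("12", "127", "91885") #a "Remainder of" area: "07000US121279188599999"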

Aggregate data into ESG areas

Import the prepared data from QGIS, the crosswalks and the PIT count, then clean up by removing unnecessary columns and slivers from the QGIS data.

#Import csv from Census summary level 070 data overlaid onto ESG boundaries in QGIS
merged_esgs_data <- read.csv("data/FY2020/final/merged_intersects_esg.csv", stringsAsFactors = FALSE)

#Importing CDBG crosswalk from HUD
crosswalk <- read_excel("data/inputs/hud_calculations/Disaggregated COC to ESG crosswalk.xlsx") %>%
  clean_names() %>% 
  rename (UOGID = geocode,
          coc_num = coc_id) %>% 
  dplyr::select (1,3,4,8,9)

#Importing ESG crosswalk from HUD
crosswalk_esg <- read_excel("data/inputs/hud_calculations/Aggregated COC to ESG crosswalk.xlsx") %>%
  clean_names() %>%
  dplyr::select (1,2,6) %>%
  rename (coc_num = coc_id)

#Import 2019 PIT count to apply using HUD's crosswalk
pit_2019 <- read_excel("data/inputs/hud_calculations/2007-2019-PIT-Counts-by-CoC.xlsx", 
                       sheet = "2019", 
                       n_max = 397) %>% #leftover note from the csv version: stringsAsFactors = FALSE
  clean_names() %>%
  dplyr::select (1:2, overall_homeless_2019, unsheltered_homeless_2019) %>%
  rename(coc_num = co_c_number,
         coc_name = co_c_name,
         coc_total_homeless = overall_homeless_2019,
         coc_unsh_homeless = unsheltered_homeless_2019) %>%
  mutate(coc_num = ifelse(coc_num == "MO-604a", "MO-604", coc_num))

#Change factors from shapefile into numeric to mutate later. 
merged_esgs_data$sl070_pop <- as.numeric(merged_esgs_data$sl070_pop)
merged_esgs_data$sl070_vli_ <- as.numeric(merged_esgs_data$sl070_vli_)
merged_esgs_data$sl070_vl_1 <- as.numeric(merged_esgs_data$sl070_vl_1)
#Add padding to UOGID to make sure they join properly later. 
#Add padding to State and County FIPS codes for joining FMRs later.
merged_esgs_data$UOGID <- str_pad(merged_esgs_data$UOGID, 6, pad = "0")
merged_esgs_data$STATEFP_2 <- str_pad(merged_esgs_data$STATEFP_2, 2, pad = "0")
merged_esgs_data$COUNTYFP <- str_pad(merged_esgs_data$COUNTYFP, 3, pad = "0")

merged_esgs_data <- merged_esgs_data %>%
#  Add data for 2 Louisville districts that didn't join in QGIS
  mutate (
    sl070_pop = case_when (
    #Louisville Central CCD
    NAMELSAD_2 == "Louisville Central CCD" ~ 30297,
    #Louisville West CCD
    NAMELSAD_2 == "Louisville West CCD" ~ 63820,
    TRUE ~ sl070_pop),
    sl070_vli_ = case_when (
    #Louisville Central CCD
    NAMELSAD_2 == "Louisville Central CCD" ~ 8215,
    #Louisville West CCD
    NAMELSAD_2 == "Louisville West CCD" ~ 10870,
    TRUE ~ sl070_vli_),
    sl070_vl_1 = case_when (
    #Louisville Central CCD
    NAMELSAD_2 == "Louisville Central CCD" ~ 220,
    #Louisville West CCD
    NAMELSAD_2 == "Louisville West CCD" ~ 855,
    TRUE ~ sl070_vl_1))
    
#remove unnecessary columns
merged_esgs_data <- merged_esgs_data %>%
  dplyr::select (-c(fid, NAME, PLACENS, LSAD, CLASSFP, PCICBSA, PCINECTA, MTFCC, FUNCSTAT, ALAND, AWATER, INTPTLAT, INTPTLON, COUSUBNS, LSAD_2, CLASSFP_2, MTFCC_2, CNECTAFP, NECTAFP, NCTADVFP, ALAND_2, AWATER_2, INTPTLAT_2, INTPTLON_2, fid_2))

#remove QGIS overlay slivers that could create errors later
merged_esgs_data <- merged_esgs_data %>%
  filter (area_prop > 0)
#Trying to get rid of edge errors, only slightly decreases final result error
#    mutate (area_prop1 = case_when(
#    area_prop > .95 ~ round(area_prop, digits = 1),
#    area_prop < .01 ~ round(area_prop, digits = 1),
#    TRUE ~ area_prop)) 
#merged_esgs_data <- merged_esgs_data %>%
#  rename (area_prop_orig = area_prop,
#          area_prop = area_prop1)

#looking for edge errors in QGIS data
merged_esgs_data %>% 
  filter (str_detect(sl070_name, "Union County") & STATEFP_2 == "34" & area_prop < 1) %>% 
  select (sl070_name, sl070_pop, area_prop, NAME_3)

merged_esgs_data %>% 
  filter (str_detect(sl070_name, "Portland") & STATEFP_2 == "41" & area_prop < 1) %>% 
  select (sl070_name, sl070_pop, area_prop, NAME_3, Geo_GEOID)

merged_esgs_data %>% 
  filter (str_detect(sl070_name, "Long Beach") & STATEFP_2 == "06" & area_prop < 1) %>% 
  select (sl070_name, sl070_pop, area_prop, NAME_3, Geo_GEOID)

merged_esgs_data %>% 
  filter (str_detect(sl070_name, "Franklin County") & STATEFP_2 == "39") %>% 
  select (sl070_name, sl070_pop, area_prop, NAME_3, Geo_GEOID)

merged_esgs_data %>% 
  group_by(area_prop) %>%
  count()

Areal interpolation of population data

Following this methodology available from Yale, I used areal weighting to adjust population data from the SL070 level into the overlaid ESG geography. It’s important to note that demography is not necessarily uniformly distributed across geographic entities, so this will not provide perfect results.

  1. I calculated the area of each SL070 polygon with the expression "$area" in the QGIS field calculator.
  2. I then overlaid that SL070 shapefile with the FY2020 ESG shapefile (362 jurisdictions). The resulting shapefile has 78,674 areas.
  3. I calculated the area of these new overlaid polygons in QGIS.
  4. I calculated the proportion of each overlaid polygon that falls within its original SL070 geography using this expression in QGIS: round ("overlay_ar" / "sl070_area", 5)
  5. 5,248 of these overlaid polygons have an area_prop of 0, meaning they’re slivers caused by slight mismatches in borders between the SL070 and ESG shapefiles. Removing these reduces the total number of polygons to 73,426.
  6. 6,511 of the remaining 73,426 polygons have a proportion less than 1, meaning areal weighting might cause slight inaccuracies in roughly 9% of polygons. The other 91% use the exact SL070 data without weighting.
#How many polygons will areal weighting affect? (can only run once you've run subsequent chunks)
merged_esgs_data %>% 
  filter (!area_prop == 0) %>% #remove slivers caused by slight mismatches of SL070 & ESG shapefiles
  filter (area_prop < 1) %>% #look for anywhere the overlaid geography isn't exactly the same as the SL070 geography
  count() 

6511 / 73426

Apply areal weighting to variables

Multiply population and CHAS variables by their area proportion.

Census map showing Daytona Beach, Florida
The area outlined in red shows a section of Daytona Beach, Florida, with its demographic data represented at the census place-remainder level. Areal weighting will distribute 18.6% of its very low-income renters into the Volusia County ESG funding area (shown in yellow) and 81.4% of its very low-income renters into the Florida non-entitlement ESG funding area (shown in blue). Non-entitlement areas receive federal homeless assistance through their state rather than directly from HUD.

fixed_esgs <- merged_esgs_data %>%
  mutate (pop_wt = (sl070_pop * area_prop),
          vlirent_wt = (sl070_vli_ * area_prop),
          vliover_wt = (sl070_vl_1 * area_prop))

#Check places with area_prop of 1. sl070 variables should be same as pop_wt, so this command should return ~66900 places
fixed_esgs %>%
  filter (sl070_pop == pop_wt) %>% 
  filter (area_prop == 1) %>%
  dplyr::select (sl070_name, area_prop, sl070_pop, pop_wt )

Adjust CHAS data to ESG geography

Group data by ESG area (UOGID) and summarize weighted data for two CHAS variables.

esg_vli <- fixed_esgs %>% 
  group_by (UOGID) %>% 
  summarize (t_vli_renters = sum(vlirent_wt, na.rm=TRUE), 
    t_vli_rent_over = sum(vliover_wt, na.rm=TRUE))  

For 2020, join HUD’s crosswalk to add the CoC match to this data. For previous years, use my calculated crosswalk created in 2020_01_simple_shares.

Then join 2019 PIT count.

#joining crosswalk to grouped sl070 data
all_tracts_esg <- esg_vli %>%
    full_join (crosswalk_esg, by = c("UOGID"="esg_id")) # HUD's crosswalk for 2020
   # full_join (compare, by = c("UOGID"="esg_id")) # My crosswalk for earlier years

#join in PIT count
joined_pit <- all_tracts_esg %>%
  left_join ( pit_2019, by = "coc_num") 

#Any NAs? Will appear for COCs that have disbanded like AR-504
joined_pit %>% 
  filter (is.na (coc_total_homeless))
joined_pit %>% 
  filter (is.na (coc_unsh_homeless))

Apply share to PIT count to calculate the homeless count in each of the 654 unique ESG-COC areas.

joined_pit <- joined_pit %>%
  mutate (share_homeless = coc_total_homeless * share_of_coc_in_this_part_of_esg_area,
          share_unsh = coc_unsh_homeless * share_of_coc_in_this_part_of_esg_area)

#Checking homeless totals
#These will be slightly below actual count because they don't include counts from four insular areas (American Samoa, Guam, Northern Mariana Islands and Virgin Islands)
sum(joined_pit$share_homeless, na.rm=T)
sum(joined_pit$share_unsh, na.rm=T)
#Actual PIT count
sum(pit_2019$coc_total_homeless)
sum(pit_2019$coc_unsh_homeless)

#write.csv (joined_pit, "data/outputs/pit_joined_to_crosswalk.csv")

Group by ESG to summarize the unique COC-ESG areas into their proper ESG areas.

esg_homeless <- joined_pit %>% 
  group_by (UOGID) %>%
  summarize (homeless = round(sum(share_homeless, na.rm=T)),
             unsh = round(sum(share_unsh, na.rm=T)))

#Checking homeless totals
#These will be slightly below actual count because they don't include four insular areas
sum(esg_homeless$homeless)
sum(esg_homeless$unsh)
#Actual PIT count
sum(pit_2019$coc_total_homeless)
sum(pit_2019$coc_unsh_homeless)

#Any COCs in PIT count that aren't showing up? Need to add manually in 01_simple_shares (other than insular areas). 
anti_join (pit_2019, joined_pit, by = "coc_num")

Adjust all data by local fair-market rents

HUD adjusted each of the CARES Act formula variables to account for places with high housing costs by calculating a weighted national average rent for a one-bedroom apartment. It then considered how much that average differed from each location’s fair market rent and gave communities proportionately weighted increases in the formula. For example, a city whose fair market rent was 10% above the national average would have its total homeless count increased by 10%. Non-entitlement areas were not adjusted, regardless of housing costs. Increases were capped at 20% to lessen the impact of high-cost outliers.
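
A minimal sketch of that adjustment logic (an illustrative function, not HUD’s code; the actual case_when implementation appears later in this document):

#Cap-and-floor FMR multiplier as described above
fmr_multiplier <- function(area_fmr, national_avg_fmr, entitled = TRUE) {
  if (!entitled) return (1) #non-entitlement areas are never adjusted
  min (max (area_fmr / national_avg_fmr, 1), 1.2) #no downward adjustment; 20% cap
}
fmr_multiplier(1100, 1000) #10% above average -> 1.1
fmr_multiplier(1500, 1000) #capped at 1.2
fmr_multiplier(900, 1000)  #below average -> 1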

Adjusting Fair Market Rent geographies into ESG areas

HUD adjusted each of these variables to take into account local housing and economic circumstances. Specifically, HUD adjusted variables upward to a maximum of 20% to account for places with high housing costs.

Fair market rent data for FY2020 became available Oct. 1, 2019, at https://www.huduser.gov/portal/datasets/fmr.html?WT.mc_id=Sept192019&WT.tsrc=Email#2020

This code imports the FMR data to prepare it for weighting our variables. We’ll match it to our ESG data by county FIPS code. Unfortunately, the six New England states, where HUD assigns fair market rents by town-based areas rather than counties, won’t match properly, so this code pulls out Connecticut, Maine, Massachusetts, New Hampshire, Rhode Island and Vermont (conveniently, the only records whose FMR FIPS codes do not end with “99999”) and calculates the average of each FMR area’s rents to use instead.

The other states all just need to have the “99999” removed from their FIPS code to match up with the rest of our data.

#Import 2020 FMR data
fmr <- read_excel("data/inputs/fmr/FY20_4050_FMRs_rev.xlsx") %>%
  dplyr::select (fips2010, fmr_1, areaname) %>%
  rename (fmr1 = fmr_1)

#Calculate average FMRs for the 6 New England states (town-based FMR areas) to match by GEOID_2
new_england_counties <- fmr %>%
  filter(!(str_detect(fips2010, "99999"))) %>%
  #mutate (fips = substr(fips2010, 1, 5)) %>%
  rename (GEOID_2 = fips2010) %>%
  dplyr::select (GEOID_2, areaname, fmr1) %>%
 # distinct(areaname, .keep_all = TRUE) %>%
  group_by(areaname) %>%
  mutate (fmr1 = mean(fmr1)) #%>%
  #rename (county_fips = fips)
  
#Prepare other states' data to be joined by county FIPS code by removing 9s
final_fmr <- fmr %>%
  filter((str_detect(fips2010, "99999"))) %>%
  mutate (fips = substr(fips2010, 1, 5)) %>%
  rename(county_fips = fips) %>%
  dplyr::select (county_fips, areaname, fmr1)

Create a county_fips code in our gathered data to pull in FMRs for the non-New England states. The GEOID_2 column in this data already matches the FMR area code for New England. This code joins the two dataframes into our full ESG dataframe.

#create county fips code in QGIS sl_070 data for matching FMRs
fixed_esgs <- fixed_esgs %>% 
  mutate (county_fips = paste0(STATEFP_2, COUNTYFP)) 

#pad GEOID_2 to ensure proper matching 
fixed_esgs$GEOID_2 <- str_pad (fixed_esgs$GEOID_2, 10, pad ="0")

#join dataframes
esg_homeless_fmr <- fixed_esgs %>%
  full_join (final_fmr, by="county_fips") %>%
  full_join (new_england_counties, by="GEOID_2") %>%
#New England's FMRs will be in fmr1.y
#All other states' FMRs will be in fmr1.x. 
#This next line combines them into one column.
  mutate ( fmr1 = if_else (is.na(fmr1.y), fmr1.x, fmr1.y))

#Check for any missed? Should only be 6 areas over water "County subdivisions not defined"
esg_homeless_fmr %>% 
  filter (is.na(fmr1))

#Check for FMRs that haven't matched to the ESG data. Should be Samoa, Guam, Mariana Islands and Virgin Islands. Also a St. Louis area county that doesn't actually exist (29056)
esg_homeless_fmr %>% 
  filter (is.na(UOGID))

This code groups by ESG to calculate an average FMR for each ESG area. I also tried a weighted mean, weighted by the population of each little unit of area, but that resulted in a bigger percent difference from HUD’s calculations at the end.

This also removes those areas without an FMR we found above to prevent errors later.

#esg_homeless_fmr <- esg_homeless_fmr %>%
#  group_by (UOGID) %>%
#  summarize (esg_fmr = mean (fmr1, na.rm = T)) %>%
        #     esg_fmr_wt = weighted.mean(fmr1, pop_wt, na.rm=T)) 
#  filter (!is.na(UOGID)) #breaks weighted mean otherwise

esg_homeless_fmr <- esg_homeless_fmr %>%
  group_by (UOGID) %>%
  summarize (#esg_fmr = mean (fmr1, na.rm = T),
             esg_fmr = weighted.mean(fmr1, pop_wt, na.rm=T)) %>% 
  filter (!is.na(UOGID)) #breaks weighted mean otherwise

#Check for any missed? Should only be 0.
esg_homeless_fmr %>% 
  filter (is.na(esg_fmr))

#esg_homeless_fmr %>% 
#  filter (is.na(esg_fmr_wt))

#esg_homeless_fmr %>% 
#  filter (esg_fmr != esg_fmr_wt) %>%
#  mutate (diff = esg_fmr_wt - esg_fmr) %>%
#  arrange (desc(diff))

Join homeless counts, CHAS data and FMR to ESG geography

It should result in a dataframe of the 358 places that got ESG funding in 2020: 362 recipients minus the four insular areas.

unadj_data2 <- full_join (esg_homeless, esg_vli, by="UOGID") %>%
  full_join (esg_homeless_fmr, by ="UOGID") 

Weighting variables based on FMR

Here’s where the weighting of variables by fair market rent comes into play. HUD calculated a national average one-bedroom fair market rent, weighted separately by each of the four CARES Act formula variables, and then increased a community’s value for each variable by the percentage its FMR exceeded that average.

For example, a city whose fair market rent was 10% above the national average when weighted by total homeless count would have the value of its total homeless count increased by 10% before calculating its share of the total homeless count overall. Nonentitlement areas were all calculated using unadjusted counts, and no data was adjusted downward. Increases were capped at 20% to lessen the impact of high-cost outliers.

This code adds the four weighted averages to our dataframe.

weighted_data1 <- unadj_data2 %>%  
  mutate (homeless_fmr_adj = weighted.mean(esg_fmr, homeless, na.rm=T),
          unsh_fmr_adj = weighted.mean(esg_fmr, unsh, na.rm=T),
          vli_fmr_adj = weighted.mean(esg_fmr, t_vli_renters, na.rm=T),
          vli_over_fmr_adj = weighted.mean(esg_fmr, t_vli_rent_over, na.rm=T),
    #      homeless_fmr_adj2 = weighted.mean(esg_fmr, homeless, na.rm=T),
     #     unsh_fmr_adj2 = weighted.mean(esg_fmr, unsh, na.rm=T),
      #    vli_fmr_adj2 = weighted.mean(esg_fmr, t_vli_renters, na.rm=T),
       #   vli_over_fmr_adj2 = weighted.mean(esg_fmr, t_vli_rent_over, na.rm=T)
          )

This code mimics the SPSS code HUD sent me showing how it calculates the percentage used to adjust each variable relative to its weighted national average: simply divide the area’s FMR by the national weighted average.

weighted_data2 <- weighted_data1 %>%  
  mutate (homeless_adj = (esg_fmr / homeless_fmr_adj),
          unsh_adj = (esg_fmr / unsh_fmr_adj),
          vli_adj = (esg_fmr / vli_fmr_adj),
          vli_over_adj = (esg_fmr / vli_over_fmr_adj))  

This code selects entitlement areas only (those whose UOGIDs do not end with “9999”) and calculates a multiplier for each variable, so areas with higher-than-average FMRs can be adjusted upward, with increases capped at 20%. All other areas, including nonentitlements, get a multiplier of 1.

  • Areas whose FMR is at or below the national weighted average get a multiplier of 1.
  • Areas whose FMR is more than 20% above the national weighted average get a multiplier of 1.2.
  • Areas whose FMR is 0-20% above the national weighted average get a multiplier equal to the ratio of their FMR to the average.
weighted_data3 <- weighted_data2 %>%
  mutate (adj_homeless = case_when(
    (!str_detect(UOGID, "9999$")) & between(homeless_adj, 1, 1.20) ~ homeless_adj,
    (!str_detect(UOGID, "9999$")) & between(homeless_adj, 0, 1) ~ 1,
    (!str_detect(UOGID, "9999$")) & between(homeless_adj, 1.2, 100) ~ 1.2),
          adj_homeless = replace_na(adj_homeless, 1)
    ) %>%
  mutate (adj_unsh = case_when(
    (!str_detect(UOGID, "9999$")) & between(unsh_adj, 1, 1.20) ~ unsh_adj,
    (!str_detect(UOGID, "9999$")) & between(unsh_adj, 0, 1) ~ 1,
    (!str_detect(UOGID, "9999$")) & between(unsh_adj, 1.2, 100) ~ 1.2),
          adj_unsh = replace_na(adj_unsh, 1)
    ) %>%  
  mutate (adj_vli = case_when(
    (!str_detect(UOGID, "9999$")) & between(vli_adj, 1, 1.20) ~ vli_adj,
    (!str_detect(UOGID, "9999$")) & between(vli_adj, 0, 1) ~ 1,
    (!str_detect(UOGID, "9999$")) & between(vli_adj, 1.2, 100) ~ 1.2),
          adj_vli = replace_na(adj_vli, 1)
    ) %>%  
  mutate (adj_vli_over = case_when(
    (!str_detect(UOGID, "9999$")) & between(vli_over_adj, 1, 1.20) ~ vli_over_adj,
    (!str_detect(UOGID, "9999$")) & between(vli_over_adj, 0, 1) ~ 1,
    (!str_detect(UOGID, "9999$")) & between(vli_over_adj, 1.2, 100) ~ 1.2),
          adj_vli_over = replace_na(adj_vli_over, 1)
    ) 

This code applies the multiple calculated in the previous step to all variables’ values and removes unnecessary columns.

weighted_data4 <- weighted_data3 %>%
  mutate (final_homeless = homeless * adj_homeless,
          final_unsh = unsh * adj_unsh,
          final_vli = t_vli_renters * adj_vli,
          final_vli_over = t_vli_rent_over * adj_vli_over
          ) 

#check for places that got adjusted upward for homeless count. esg_fmr should be > homeless_fmr_adj
weighted_data4 %>% 
  filter (adj_homeless > 1) %>% 
  dplyr::select (1,2,6,7, 11,15, 19)

#reduce extra columns for final dataframe with all data, weighted by FMR, joined to ESG areas
weighted_data5 <- weighted_data4 %>% 
  dplyr::select (1, 19:22)

Final calculation of new formula allocations

HUD summed each formula variable to get national totals, which were then used to calculate how much each of the 362 communities nationwide would get in emergency homeless aid. Because the U.S. territories of American Samoa, Guam, the Northern Mariana Islands and the Virgin Islands do not all have annual homeless counts or data on very low-income renter counts, HUD set aside 0.2% of the funding and allocated it to these four areas by population.

The following formula shows the final calculation to determine the amount each ESG area receives:

Allocation amount ($2.954 billion, after removing the 0.2% insular set-aside) ×
  [ 0.50 × (area’s adjusted count of total homeless / national sum of adjusted total homeless counts)
  + 0.10 × (area’s adjusted count of unsheltered homeless / national sum of adjusted unsheltered homeless counts)
  + 0.15 × (area’s adjusted count of very low-income renters / national sum of adjusted very low-income renters)
  + 0.25 × (area’s adjusted count of very low-income renters living in overcrowded conditions or without full kitchen and plumbing / national sum of adjusted very low-income renters overcrowded or without full kitchen and plumbing) ]

Calculate national totals of each variable and set the allocation amount to match ESG-CV2’s funding minus insular areas. Then plug the values into the formula above to calculate shares for each ESG area.

nat_homeless <- sum (weighted_data5$final_homeless, na.rm=T)
nat_unsh <- sum (weighted_data5$final_unsh, na.rm=T)
nat_vli <- sum (weighted_data5$final_vli, na.rm=T)
nat_vli_over <- sum (weighted_data5$final_vli_over, na.rm=T)
amt <- 2954080000

shares3 <- weighted_data5 %>%
  mutate ( final_vli = replace_na ( final_vli, 1),
           final_vli_over = replace_na ( final_vli_over, 1),
           share = 
             (amt * ((0.50 * final_homeless / nat_homeless) + 
                       (0.10 * final_unsh / nat_unsh) + 
                       (0.15 * final_vli / nat_vli) +
                       (0.25 * final_vli_over / nat_vli_over)
             ))
  )

#Check that shares all add up to the allocation amount of 2954080000
sum(shares3$share, na.rm=T)

Accuracy check

Import HUD’s actual ESG-CV2 allocations to compare how my formula has done.

  • 86% of my calculations were within 2% of HUD’s calculations
  • 6% were more than 5% off of HUD’s calculations, with the worst being 9.2% off.
#Import HUD ESG-CV for comparison (ESG-CV2)
hud_funding_cv2 <- read_excel("data/inputs/hud_allocations/esg-cv2-amounts.xlsx", 
                                     sheet = 1) %>% clean_names() %>%
  dplyr::select (key, locale, esg_cv2) %>% 
  rename(uogid = key) %>%
  filter(esg_cv2 != 0)

#Compare my calculated shares to HUD's actual ESG-CV2 allocations
compare_esg_cv2 <- shares3 %>%
  full_join (hud_funding_cv2, by = c("UOGID" = "uogid")) %>%
  mutate (formula_difference = (share - esg_cv2),
          pct_change = ((formula_difference / esg_cv2)*100),
          share = round(share)) %>%
  rename (my_formula = share,
          allocation = esg_cv2) %>%
  dplyr::select (1,7,6,8:10) 

mean(compare_esg_cv2$pct_change, na.rm=T) #1.53% when absolute value & -.086% otherwise
#write.csv (compare_esg_cv2, "data/outputs/compare_esg_cv2.csv")
#86% were within 2% of HUD's calculations
compare_esg_cv2 %>%
  filter (abs(pct_change) < 2) 
#309/358 = 86%

#6% were more than 5% off of HUD's calculations, with the worst being 9.2% off.
compare_esg_cv2 %>%
  filter (abs(pct_change) > 5) %>% 
  arrange (desc(abs(pct_change)))
#22/358 = 6.1%

Calculate hypothetical ESG20 allocation

Using the same weighted data, simply apply the new allocation amount, $290 million minus the $580,000 (0.2%) that went to insular areas, to estimate what the ESG-CV2 formula would have allocated had it been in place for the regular FY2020 ESG cycle.

hud_funding_20 <- read_excel("data/inputs/hud_allocations/fy2020-formula-allocations-AllGrantees 042120.xlsx") %>% 
  clean_names() %>%
  dplyr::select (key, name, esg20) %>% 
  rename(uogid = key) %>%
  mutate (uogid = if_else (uogid == "119999", "110006", uogid)) %>%
  filter(esg20 != 0)

amt <- 289420000 #$290M - $580K to insular areas (.2%)
cutoff_amt <- 290000000 * .0005

shares2020 <- weighted_data5 %>%
  mutate ( final_vli = replace_na ( final_vli, 1),
           final_vli_over = replace_na ( final_vli_over, 1),
           share = 
             (amt * ((0.50 * final_homeless / nat_homeless) + 
                       (0.10 * final_unsh / nat_unsh) + 
                       (0.15 * final_vli / nat_vli) +
                       (0.25 * final_vli_over / nat_vli_over)
             ))
  )

#Calculate those places below the 0.05% cutoff and roll up their shares into state non-entitlement regions
compare20 <- shares2020 %>%
  full_join (hud_funding_20, by = c("UOGID" = "uogid")) %>%
  mutate (formula_diff = (share - esg20),
          pct_diff = ((formula_diff / esg20)*100),
          state = str_sub(UOGID, 1, 2)) %>%
  rename (my_formula = share,
          allocation = esg20) %>%
  dplyr::select (1,7,6,8:11) %>%
  mutate (cutoff = if_else (my_formula < cutoff_amt, 0, my_formula)) %>%
  #filter (!str_detect(UOGID, "9999$")) %>%
  group_by (state) %>%
  mutate (st_tot = sum (my_formula),
          missing = sum (cutoff),
          roll_up = st_tot - missing,
          rolled_up = if_else (str_detect(UOGID, "9999$"), my_formula + roll_up, cutoff),
          pct_diff_rollup = ((rolled_up - allocation) / allocation)*100) %>%
  ungroup() %>%
  dplyr::select (-c(7:11)) %>%
  mutate (my_formula = round (my_formula),
          formula_diff = round (formula_diff),
          pct_diff = round (pct_diff, digits = 2),
          rolled_up = round (rolled_up),
          pct_diff_rollup = round (pct_diff_rollup, digits = 2))

#write.csv (compare20, "data/outputs/simpleshare_final/compare20_simpleshares.csv")

Combine a decade of data

This process was repeated for the years 2011-2019, using the data sources HUD would have used in each of those years to make ESG allocations. See each year’s code in the R Markdown files in this project’s GitHub repo.

The Code

Sum up hypothetical amounts with ESG-CV2 formula for 2011-2020.

esg2011 <- read.csv ("data/outputs/simpleshare_final/compare11_simpleshares.csv") %>% 
  mutate (X = 2011)
esg2012 <- read.csv ("data/outputs/simpleshare_final/compare12_simpleshares.csv") %>% 
  mutate (X = 2012)
esg2013 <- read.csv ("data/outputs/simpleshare_final/compare13_simpleshares.csv") %>% 
  mutate (X = 2013)
esg2014 <- read.csv ("data/outputs/simpleshare_final/compare14_simpleshares.csv") %>% 
  mutate (X = 2014)
esg2015 <- read.csv ("data/outputs/simpleshare_final/compare15_simpleshares.csv") %>% 
  mutate (X = 2015)
esg2016 <- read.csv ("data/outputs/simpleshare_final/compare16_simpleshares.csv") %>% 
  mutate (X = 2016)
esg2017 <- read.csv ("data/outputs/simpleshare_final/compare17_simpleshares.csv") %>% 
  mutate (X = 2017)
esg2018 <- read.csv ("data/outputs/simpleshare_final/compare18_simpleshares.csv") %>% 
  mutate (X = 2018)
esg2019 <- read.csv ("data/outputs/simpleshare_final/compare19_simpleshares.csv") %>% 
  mutate (X = 2019)
esg2020 <- read.csv ("data/outputs/simpleshare_final/compare20_simpleshares.csv") %>% 
  mutate (X = 2020)

#bind data frames together
final <- esg2020 %>% 
  rbind (esg2019) %>% 
  rbind (esg2018) %>% 
  rbind (esg2017) %>% 
  rbind (esg2016) %>% 
  rbind (esg2015) %>% 
  rbind (esg2014) %>% 
  rbind (esg2013) %>% 
  rbind (esg2012) %>% 
  rbind (esg2011) %>%
  rename (year = X) %>%
  mutate (UOGID = as.character (UOGID),
    UOGID = case_when (
    name == "District Of Columbia" ~ "110006", 
    name == "WASHINGTON" ~ "110006", 
    TRUE ~ UOGID))

final$UOGID <- str_pad(final$UOGID, 6, pad = "0")

#sum up actual allocations, hypothetical allocations, and allocations after those places awarded less than 0.05% were rolled up to the state level. 
summed <- final %>%
  group_by (UOGID) %>%
  mutate (t_actual = sum(allocation, na.rm=T),
          t_mine = sum(my_formula, na.rm=T),
          t_rollup = sum(rolled_up, na.rm=T),
          count = n()) %>%
  distinct(UOGID, .keep_all = TRUE)
  
#calculate difference between hypothetical and actual and percent differences for presentation.
final2 <- summed %>% 
  mutate (formula_diff = (t_mine - t_actual),
          pct_diff = ((formula_diff / t_actual)*100),
          rollup_diff = (t_rollup - t_actual),
          pct_diff_rollup = (rollup_diff / t_actual)*100) %>%
  dplyr::select (2,3,10:12, 6,7,14,9,1,13)

#write.csv (final2, "data/outputs/simpleshare_final/final_pprns.csv")

# How many places had money rolled up all 10 years? 81 (+4 insular areas)
# Which states had the most of these cities? Penn (15), NY (11), then MI, IL, IN & PR tied with 6 each
final2 %>% 
  filter (pct_diff_rollup == -100) %>%
  mutate (state = substr(UOGID, 1, 2)) %>%
  group_by (state) %>%
  count (state) %>% 
  arrange (desc(n))

# How many places had money rolled up at least once in the decade? 128, an additional 47
final %>% 
  filter (rolled_up == 0) %>% 
  distinct (UOGID, .keep_all = TRUE)
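
As an aside, the ten yearly files could also be read and tagged in one step, assuming the compareYY_simpleshares.csv naming pattern holds for every year:

#Read all ten files and stack them, tagging each with its year
library(dplyr)
library(purrr)
all_years <- map_dfr (2011:2020, function(yr) {
  read.csv (sprintf("data/outputs/simpleshare_final/compare%d_simpleshares.csv", yr %% 100)) %>%
    mutate (X = yr)
})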

Final results

How much would each state’s awards have changed with the new formula from 2011-2020?

library(tidycensus)
fips <- fips_codes %>% 
  distinct (state_code, .keep_all=T) 

compare <- final2 %>% 
  mutate (state = substr(UOGID, 1,2))

compare2 <- compare %>%
  group_by (state) %>%
  summarize (new_formula_2011_2020 = sum(t_mine),
             actual_allocation_2011_2020 = sum(t_actual)) %>%
  mutate (diff = new_formula_2011_2020 - actual_allocation_2011_2020,
          pct_change = diff / actual_allocation_2011_2020 *100) %>% 
  left_join (fips, by = c("state" = "state_code")) %>%
  select(state_name, new_formula_2011_2020, actual_allocation_2011_2020, diff, pct_change) %>% 
  arrange (desc(diff))

#write.csv (compare2, "data/outputs/state_total_comparison.csv")

final3 <- compare %>% 
  left_join (fips, by=c("state" = "state_code")) %>% 
  select (-c(state, state.y, county_code, county))

#write.csv (final3, "data/outputs/simpleshare_final/final_pprns.csv")

These final calculations are presented in the following publications:

  1. ArcGIS interactive StoryMap visualizing data analysis described above
  2. “Calculating the impact of HUD’s homeless formulas” on The Associated Press
  3. “Calculating the impact of HUD’s homeless formulas” on ArizonaPBS/CronkiteNews
  4. Full package of investigative stories: Caring for COVID’s invisible victims on ArizonaPBS/CronkiteNews

Limitations of this analysis

There may be minor differences between the GIS analysis or the fair market rent adjustments the Howard Center used and HUD’s. However, the vast majority of the Howard Center’s calculations (86%) were within 2% of HUD’s allocations for the second round of ESG funding under the CARES Act.

The Howard Center consulted a half-dozen formula funding experts at HUD and demographic researchers at Boston University and the University of Washington in conducting its analysis. Among the experts was Todd Richardson, an economist who heads the HUD research division that devised the new CARES Act homeless aid formula.

Responding to the Howard Center’s findings on how the CARES Act formula would have changed homeless funding amounts if applied over the last decade, Richardson said, “From a 10,000-foot level, these results are about what I would expect.”