# Estimates

**Product Description**

↓ Features

↓ Variables

↓ Geography

↓ Methodology

#### 2023 Basic Estimates and 2028 Projections

Estimates and Projections provide current-year estimates and 5-year projections down to the county, zip code, tract, and block group. This product is ideally suited for business users who need to know where to market, where to expand, and how to allocate resources. Estimates and Projections lets researchers identify changes that have occurred in communities since the 2010 Census and the recent American Community Survey (ACS), and forecast the demographic changes that are likely to occur in these communities in the future. We now also offer Expanded Estimates, with projections extending out 10 and 15 years: the same variables and the same geographies, just extending further into the future.

For example, if you want to open a grocery store, you would use Estimates and Projections to establish what an area looks like demographically, such as the number of families and household composition, as well as age and race breakouts. If you know the latitude and longitude of the location for your grocery store, you can quickly run a radius around the point to get a report for 1 mile, 2 miles, or any radius that you want.

Estimates and projections are based on complex modeling systems designed to forecast the current and future composition of the U.S. population from multiple inputs. Often the input data is available only for larger geographic areas (county or state) and therefore must be modeled to determine how to apply these changes across smaller geographies. The easiest method would be to assume that every subset of a county grows at the same rate as the county and spread everything equally. Though this would give some sense of how a number might change, it is not a very useful approach: intuitively we know that some neighborhoods are expanding while others are declining. The Estimates and Projections are available on DVD or through online access. The data is the same either way and shows how changes are occurring, so that researchers and business users can make the best possible decisions when allocating resources.

#### 2023 Estimates Professional and 2028 Projections

If you need more data than the basic current-year estimates and 5-year projections provide, then you need the Estimates Professional. This product includes all the variables in the standard Estimates and Projections product and expands on them with complete break-outs of sex by age by race, household types and household size, median income by race, and more, in both the current-year estimates and the 5-year projections. We now also offer Expanded Estimates, with projections extending out 10 and 15 years: the same variables and the same geographies, just extending further into the future.

This data set now has all new education variables. These new variables include school enrollment in public vs. private schools by the level of education and by sex in the population 3+, as well as educational attainment for population 25+ broken out by sex.

In addition, the Estimates Professional data set has consumer expenditures and demographic profiles. The Consumer Expenditures segment has current-year estimates and 5-year projections of household expenditures based on the Consumer Expenditure Survey (CEX). With consumer expenditures, you can look at household spending patterns across zip codes, tracts, or block groups. The consumer expenditures include categories of household spending such as food, beverages, housing, transportation, health insurance, and much more. The Demographic Profiles segment pulls together multiple variables and compares residents of an area against an index of national averages. The demographic profiles allow you to target the areas that meet your desired customer or client profile. The Consumer Expenditures are also included in our Expanded Estimates, with data projected 5, 10, and 15 years into the future.

The Estimates Professional is available in 5 geographies: states, counties, tracts, block groups, and zip codes. You can also run a radius around a latitude/longitude point.

#### 2023 Estimates Premium and 2028 Projections

Estimates Premium has all of the data from the Estimates/Projections and the Estimates Professional and also includes information about Poverty, Family Income, Employment, and additional Housing variables. For a complete list of variables select the “Variable” tab above. We now also offer an Expanded version of the 2023 Estimates Premium and 2028 Projections, which adds data for 2033 and 2038.

The Estimates Premium not only has many more variables than the other two Estimates products, it also has more geographies at which to see the data. It has the same 5 geographies (states, counties, tracts, block groups, and zip codes), and you can also run a radius around a latitude/longitude point. In addition, it includes Towns (MCDs), Cities (Places), and Cities including their Suburbs (MSAs).

**We offer an academic discount on this particular product. Please call 1-800-577-6717 for information.**

*If you already own the Estimates and want to upgrade to the Expanded Estimates, you can do that by calling GeoLytics at 800-577-6717.*

#### Comparison Table

| | Basic | Professional | Premium |
|---|---|---|---|
| State Single User | $349.00 | $750.00 | $1,095.00 |
| National Single User | $595.00 | $1,395.00 | $2,095.00 |
| Geographic Identifiers | ✪ 25 variables | ✪ 25 variables | ✪ 25 variables |
| Summaries | ✪ 81 variables | ✪ 81 variables | ✪ 81 variables |
| Population | | ✪ 313 variables | ✪ 313 variables |
| Households | | ✪ 36 variables | ✪ 36 variables |
| Housing | | ✪ 34 variables | ✪ 41 variables |
| Income | | ✪ 41 variables | ✪ 41 variables |
| Consumer Expenditures | | ✪ 99 variables | ✪ 99 variables |
| Profiles | | ✪ 45 variables | ✪ 45 variables |
| School Enrollment Public vs. Private for population 3+, broken out by sex and education | | ✪ 47 variables | ✪ 47 variables |
| Educational Attainment for population 25+, broken out by sex and by education level | | ✪ 35 variables | ✪ 35 variables |
| Family Income | | | ✪ 18 variables |
| Poverty | | | ✪ 6 variables |
| Labor | | | ✪ 15 variables |
| Occupation | | | ✪ 45 variables |

#### Geography

Each of the data sets for the **Basic and Professional versions** is available at any of the following 5 levels of geography:

- States
- Counties
- Tracts
- Block Groups
- Zip Codes

Each of the data sets for the **Premium version** is available at any of the following levels of geography:

- Nation
- States
- MSAs
- Counties
- Places (Cities)
- MCDs (Towns, Townships)
- Tracts
- Block Groups
- Zip Codes

*Zip code definitions come from at least two different federal agencies, and they don't always line up perfectly. The data in this product defines zip codes using the US Postal Service (USPS) definitions, but the geographic boundaries are derived from the Census Bureau's (USCB) TIGER files and are officially Zip Code Tabulation Areas (ZCTAs). Most of the time these two are relatively in accord, but there are some differences. For example, the USPS has zip codes for PO Boxes. The USCB does not: no one "lives there," though many people may receive mail there and it is an official mailing address. Also, the USCB seeks to cover the entire country, so, for example, it will assign New York City's Central Park or Yosemite to a zip code. The USPS may not include these areas because they are not official mailing addresses.*

#### Methodology

**Methodology - Population, Housing, and Income Estimates**

First a quick overview:

To build population estimates one needs several data sets. The population changes that occur in an area are the addition of births, the subtraction of deaths, and the addition or subtraction of those who move. The starting point is the 2010 Redistricting BLOCK level data set. This has the most detailed and comprehensive numbers about where the entire population of the US lives and their race. Unfortunately, the Redistricting data set does not have age breakouts, so for that we turned to our existing breakouts from the US Census Bureau's county level estimates, as this was the basis for all of our previously released estimate data sets.

The 2009 American Community Survey (ACS) is not in fact true 2009 data but rather a summary of surveys administered over 5 years (2005-2009), agglomerated into a single value when you run it at the Block Group level. So we cannot consider these numbers indicative of 2009; they are closer to midpoint (2007) findings. We can obtain only Block Group level age breakouts, so we use those to weight several variables that are not available in the Redistricting data, such as Income, Educational Attainment, etc. This data also has to be converted to the new 2010 boundaries, since it is in the 2000 boundaries.

To progress from the 2010 data to current year estimates, we use the US Census Bureau's (USCB) County and State level annual estimates to roll the numbers forward to the current year. But the USCB data is only available at the County and State level, so the next challenge is distributing the data down to the smaller geographies. To do this we use actuarial tables for births and deaths by age and race to create a model of the "likelihood" of dying or of having a child. This is the engine that drives the increase and decrease in population.

The third step is to look at immigration and emigration. Where are people moving "to" and where are they moving "from"? The US Postal Service keeps track of all moves as a "to" and "from" location.

Now the more detailed explanation:

1. Working with the Census Bureau "estimation base" county level numbers.

This data is processed to obtain "race distribution" coefficients. However, the Census Bureau estimation base data do not include an "other race" category. Also, the "two or more races" category is much smaller than it is in the Redistricting Census data. By comparing the estimation base to the Redistricting county level data, it is possible to obtain numeric ratios for how the "other race" and "two or more races" populations were distributed among the remaining races in the USCB's estimation base. These coefficients allow us to re-map the block level data and redistribute the "other race" population and part of the "two or more races" population among the 6 remaining mutually exclusive races.

2. The Redistricting block level data are processed with these new racial distribution coefficients. The resulting dataset is our estimation base. It includes 8 race/origin groups:

Code | Race/origin group
---|---
WA | White alone
BA | Black alone
NA | Native American alone
AA | Asian alone
PA | Pacific alone
R2 | Two or more races
HS | Hispanic
WN | White, not Hispanic

We do not have the "Other Race" category in the estimates even though Census 2010 does, because the USCB dropped the "Other Race" data from its estimates. They switched to 8 races in 2001 and we had to follow. It is worth mentioning that the USCB redistributed the "Other Race" counts completely and the "2 or more Races" counts partially among the rest of the races in their estimates. We did the same, and therefore the racial breakdown differs from Census 2010 but fits the 2001-2009 USCB estimates. We believe that the USCB made these changes because there are no actuarial tables for "other" or "2 or more" races, so they needed to redistribute those people into one of the race categories for which they could create estimates.

3. Having dealt with Race we then turn to Age. The USCB groups the population into 18 age groups. These range from age 0 (under 1) to age 108. The age groups are each 5 year intervals (0-4, 5-9, etc) except the ages 85 and up (85-108) are treated as a single group.

4. Now that we have the entire population broken down into age and race categories, we begin building the death-birth model. Using actuarial tables, we calculate the statistical likelihood for any given age/race group to die or to give birth. We then apply these coefficients to the 2010 data to create an estimation base for 2011; the coefficients are reapplied to create 2012, and so on until we get to the current year.

The model includes:

- transformation of age group distribution to "exact age" distribution. The resulting data set has population groups for each single year of age from 0 to 108.
- application of death probabilities for a specific age, sex and race group.
- application of birth rates for a specific age, sex and race group. The white population is treated as a mix of white not Hispanic and Hispanic population. The mix ratio is determined from the block data.
- 1 year shift.
- collecting the annual data into 5-year buckets.
- comparison of the results with Census Bureau estimates for this year.
- the results of the comparison are used to tweak birth rates and death probabilities so that the numbers of both newborns and deceased in the model exactly equal the Census Bureau numbers for each county. The racial distribution is also tweaked to reflect that of the Census Bureau data. This puts the annual estimates in sync with USCB data as much as possible.
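As a rough illustration of the steps listed above, the following Python sketch runs one annual cycle for a single sex/race cell. It is not GeoLytics' implementation: the even split of each 5-year group into single years, the toy age range, the ratio-based calibration, and every function name are assumptions made for the example. The model described here works with ages 0 to 108 and tunes its rates against the county figures.

```python
def split_to_single_years(group_counts, width=5):
    """Spread each age group evenly over its single years (assumed rule)."""
    ages = []
    for count in group_counts:
        ages.extend([count / width] * width)
    return ages


def births_from(pop_by_age, birth_rate):
    """Expected newborns: population at each age times its birth rate."""
    return sum(n * r for n, r in zip(pop_by_age, birth_rate))


def advance_one_year(pop_by_age, death_prob, births):
    """Apply death probabilities, shift everyone up one year, add newborns."""
    survivors = [n * (1.0 - q) for n, q in zip(pop_by_age, death_prob)]
    shifted = [births] + survivors[:-1]
    # The oldest age is open-ended: it keeps its own survivors as well.
    shifted[-1] += survivors[-1]
    return shifted


def collect_into_groups(pop_by_age, width=5):
    """Sum single years of age back into fixed-width buckets."""
    return [sum(pop_by_age[i:i + width]) for i in range(0, len(pop_by_age), width)]


def calibration_factor(census_total, model_total):
    """Multiplier that would bring a modeled total in line with a census total."""
    return census_total / model_total


if __name__ == "__main__":
    ages = split_to_single_years([50, 100])      # two toy 5-year groups
    rates = [0.0] * 5 + [0.07] * 5               # made-up birth rates
    deaths = [0.01] * 10                         # made-up death probabilities
    next_year = advance_one_year(ages, deaths, births_from(ages, rates))
    print(collect_into_groups(next_year))
```

In a full run, the calibration factor would be fed back into the birth rates and death probabilities before the next annual step, which is the "tweaking" the last bullet describes.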

5. The same model is applied to the results for the projections. This time, however, the "tweaking coefficients" are predicted, since there is no Census Bureau data to compare against. The prediction algorithm is based on linear regression (the coefficients fit a linear trend very closely).

**Methodology - Household Estimates**

The household estimates were calculated from:

- the ACS Census data with household counts at the Block Group level
- the estimated data on the households
- the Census data on the age-race-sex
- the estimated data on the age-race-sex.

GeoLytics calculates the ratios of Census household variables to Census age-race-sex data and Census housing data, and then applies these ratios to the estimated data of the same nature to get the estimated values. The underlying assumption is that the average family size by race will not have changed dramatically in the years since the 2009 ACS was compiled.

**Methodology - Housing Estimates**

The only way that the number of housing units (HU) changes is if new buildings are built or old ones torn down. Some houses can be built on empty lots, but if a lot of houses are built usually a whole new development gets put in. So the first thing that we do is to look at the TIGER/Line files. This is the USCB file that shows each and every street in the US and has the numbers of each housing unit. By looking at this dataset we can determine if new streets have been put in and by looking at the numbering we can determine about how many units are being built. We can also see if new numbers have been added to an existing street.

1. The TIGER/Line records for the years 2009 and 2010 were analyzed. For each block, the sum of associated address ranges was calculated. As a result, each block was assigned a Change Coefficient (CC), a number representing the change in the aggregate number of addresses within the block. The number is a fraction between -1 and +1. The number 0 represents a block that did not change within this time interval. The number +1 represents a block that had no addresses in 2009 and has some in 2010, and the number -1 represents a block that had some addresses in 2009 and has none in 2010. The block changes were later summarized to BG level.

2. The Census Bureau Housing Units Estimates (at the county level) for the years 2009 to 2010 were used to assess the number of HU per county for the year 2011 via a linear regression algorithm.

3. For each county, the Census Bureau HU growth/decline was distributed among BGs of this county so that:

- BGs with CC = 0 did not change any HU counts
- BGs with CC not equal to 0 received a share of the county change on a proportional basis, so that BGs with CC > 0 gained some HUs and BGs with CC < 0 lost some HUs. The results vary from small changes (a few percent is typical) to, rarely, some pretty dramatic changes of 3-5 times. These are where large housing complexes went in and dramatically changed the number of housing units in the block group.
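The text does not give a formula for the Change Coefficient from step 1. One definition that matches the stated properties (0 for no change, +1 for a block that gains its first addresses, -1 for a block that loses all of them) divides the change by the larger of the two counts. The Python sketch below uses that assumed formula. The function names, the sample block codes, and the choice to summarize to block groups by summing address counts are also assumptions, not the documented method. It relies on the fact that the first 12 characters of a 15-character Census block GEOID identify the block group.

```python
from collections import defaultdict


def change_coefficient(addresses_before, addresses_after):
    """Assumed CC formula: change divided by the larger count, in [-1, 1]."""
    largest = max(addresses_before, addresses_after)
    if largest == 0:
        return 0.0  # no addresses in either year: treat as unchanged
    return (addresses_after - addresses_before) / largest


def block_group_coefficients(block_counts):
    """Summarize block-level address counts to block groups, then compute CC.

    block_counts maps a 15-character block GEOID to a
    (addresses_2009, addresses_2010) pair.
    """
    totals = defaultdict(lambda: [0, 0])
    for block_id, (before, after) in block_counts.items():
        bg_id = block_id[:12]
        totals[bg_id][0] += before
        totals[bg_id][1] += after
    return {bg: change_coefficient(b, a) for bg, (b, a) in totals.items()}


if __name__ == "__main__":
    sample = {
        "060010001001000": (0, 10),   # new block
        "060010001001001": (30, 30),  # unchanged block
        "060010001002000": (20, 0),   # block that lost all addresses
    }
    print(block_group_coefficients(sample))
```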

Once we had the change in the number of Housing Units, we could then look at the other housing variables, such as number of rooms, vacancy status, tenure (own vs. rent) status, etc. People all live in either a household or a group quarter (military barracks, college dorms, nursing homes, prisons, mental institutions, half-way homes, etc.). The group quarters were left stable, so the changes in population were accounted for in the changes in Housing Units that had now been calculated. So, for example, if the housing units stayed the same but the population numbers dropped, then the vacancy rate would go up.

The sum of all changes for all BGs in a county is equal to the Census Bureau HU county growth estimates.

**Methodology - Income Estimates**

When calculating Income Estimates there are several components. First, we needed to calculate the changes in income from 2000 to 2009 so that we would have a basis for estimating forward. We also needed to account for the age changes (everyone has aged since April 2000 so all of the age categories needed to shift up).

1. The first step was to create an Income Growth by Race number for each Block Group. Luckily, the 2009 ACS data is in the 2000 block group boundaries, so we were able to compare data over the exact same geographies. To roll it forward to the 2011 estimates, however, we have to normalize the data to the new 2010 geographic boundaries. The 2010 Redistricting data does not include any income variables; otherwise we would have used that data set and avoided the normalization.

2. The BG-level racial growth data were applied to 2009 ACS Census data to obtain 2009 racial income growth coefficients for each BG area. First, the growth data for 2000-2009 were processed using a compound interest model. Second, the calculated "interest rates" were applied to 2000 racial income data to get the 2009 growth data.

The Income Growth data by Race were not available for many BGs for some races, because if there are very few households of a given race in a block group then the numbers were suppressed by the USCB. For these cases, we used the USCB Median Income Estimates for the years 2000-2009 to get 2011 state median income growth data using a linear regression algorithm, and then used these state growth data for the affected Block Groups and races.

3. The racial aggregate income data were processed in the same manner as racial median income data.

4. The Householder age distributions were estimated by using estimated Householder totals from our dataset and an age shift model. Namely, for each age group, a calculated number of householders was moved to the next age group. The first and last age groups were processed in a special way to take into account both new and dead householders. The sum of all householder age brackets is equal to our estimated HH total.

5. The area income range data were estimated using a distribution shift model. First, we assumed that the Census 2000 income brackets represent the "best fit curve" frequency distribution, and then applied a linear stretch transformation to the income scale. Finally, we calculated the new income bracket values produced by this linear stretching of the frequency distribution. The stretch coefficient was equal to the median income growth ratio for the area. This means that the income increase moves some households from their income bracket in 2009 to the next income bracket in 2011. The number of such households can be estimated mathematically if we know the exact number of households for each income value. This exact number can be estimated using the "best fit curve" model.

6. Finally, the BG data (both medians and aggregates) were tuned so that summary state median values were exactly equal to the state median data for 2009, as estimated from the Census Bureau's ACS data set. This was done by using a two-section linear mapping scheme. The scheme:

- moves the actual state median so it becomes equal to the target value;
- leaves state minimum and maximum median values for state BGs intact;
- is linear (of the form a*x + b) on each of two segments: (a) between the state minimum median value for all state BGs and the state median, and (b) between the state median and the state maximum median value for all state BGs, with different a and b on each segment.
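The three properties above pin down a piecewise-linear function, which can be sketched in Python as follows. This is an illustration only: the function name, argument names, and sample dollar values are assumptions, and the input is assumed to lie between the state minimum and maximum.

```python
def two_section_map(x, state_min, state_median, state_max, target_median):
    """Piecewise-linear map that fixes the minimum and maximum and moves
    the state median onto the target value."""
    if x <= state_median:
        if state_median == state_min:
            return target_median  # degenerate lower segment
        # Lower segment: state_min -> state_min, state_median -> target_median
        a = (target_median - state_min) / (state_median - state_min)
        return state_min + a * (x - state_min)
    # Upper segment: state_median -> target_median, state_max -> state_max
    a = (state_max - target_median) / (state_max - state_median)
    return target_median + a * (x - state_median)


if __name__ == "__main__":
    # Made-up values: BG medians range from 20,000 to 200,000, the current
    # state median is 50,000 and the target is 55,000.
    for bg_median in (20_000, 35_000, 50_000, 120_000, 200_000):
        print(bg_median, round(two_section_map(bg_median, 20_000, 50_000, 200_000, 55_000)))
```

Because the map is increasing on both segments, the ordering of block groups is preserved, so the block group that sat at the state median before tuning still sits there afterwards, now at the target value.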

**Methodology - Income Disparity**

For each area, we calculate 2 counts:

- FamLowInc = sum of family counts from FMUNDER10K to EFamIncEst.FM35_40K inclusive
- FamHighInc = sum of family counts from FM150_200K to FM200KPLUS inclusive.

Family disparity is 100 * log (FamLowInc / FamHighInc)

or (in equivalent form)

100 * (log (FamLowInc) - log (FamHighInc))

So it is essentially 100 times the log of the number of families who are poor (family income under $40K) divided by the number who are really rich (family income over $150K).
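A small Python sketch of this calculation follows. The base of the logarithm is not stated here, so the natural log is assumed. The function name, the list-based inputs, and the handling of areas with no families in one of the groups (returning `None`) are also assumptions made for the example.

```python
import math


def family_disparity(low_income_counts, high_income_counts):
    """100 * log(FamLowInc / FamHighInc), using the natural log (assumed).

    low_income_counts: family counts for the brackets under $40K.
    high_income_counts: family counts for the brackets of $150K and over.
    Returns None when either total is zero, since the log is undefined.
    """
    fam_low_inc = sum(low_income_counts)
    fam_high_inc = sum(high_income_counts)
    if fam_low_inc <= 0 or fam_high_inc <= 0:
        return None
    return 100 * math.log(fam_low_inc / fam_high_inc)


if __name__ == "__main__":
    # Made-up counts: 400 low-income families, 100 high-income families.
    print(family_disparity([100, 200, 100], [60, 40]))
```

A value of 0 means the two groups are the same size, positive values mean low-income families outnumber high-income families, and negative values mean the reverse.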