A few words on Census analogs: the Total Population count corresponds to Census table P001, count P0010001. The remaining counts correspond to the Census age-race-sex tables P012A through P012I, with P012F (the "Other Race" table) dropped. Our estimates have no "Other Race" category, even though Census 2000 does, because the USCB dropped the "Other Race" data from its own estimates. The USCB switched to 8 races in 2001 and we had to follow. It is worth mentioning that in its estimates the USCB redistributed the "Other Race" counts completely, and the "2 or more Races" counts partially, among the remaining races. We did the same, so our racial breakdown differs from Census 2000 but fits the 2001 USCB estimates. We believe the USCB made these changes because there are no actuarial tables for "other" or "2 or more" races, so those people had to be redistributed into race categories for which estimates could be created.
3. Having dealt with Race, we then turn to Age. The USCB groups the population into 18 age groups covering ages 0 (under 1) to 108. Each group is a 5-year interval (0-4, 5-9, etc.), except that ages 85 and up (85-108) are treated as a single group.
4. Now that the entire population is broken down into age and race categories, we begin building the death-birth model. Using actuarial tables, we calculate the statistical likelihood that a member of any given age/race group will die or give birth. We then apply these coefficients to the 2000 data to create an estimation base for 2001, reapply them to create 2002, and so on until we reach the current year.
The model includes:
- transformation of age group distribution to "exact age" distribution. The resulting data set has population groups for each single year of age from 0 to 108.
- application of death probabilities for a specific age, sex and race group.
- application of birth rates for a specific age, sex and race group. The white population is treated as a mix of white not Hispanic and Hispanic population. The mix ratio is determined from the block data.
- 1 year shift.
- collecting the annual data into 5-year buckets.
- comparison of the results with Census Bureau estimates for this year.
- use of the comparison results to tweak the birth rates and death probabilities so that the numbers of newborns and deaths in the model exactly equal the Census Bureau numbers for each county. The racial distribution is also tweaked to reflect that of the Census Bureau data. This keeps the annual estimates in sync with USCB data as much as possible.
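The age-related steps of the model above can be sketched for a single sex/race cell as follows. This is only an illustration under stated assumptions: the even spread of each age group over its single years, the treatment of survivors past age 108, and all function names are hypothetical, since the text does not describe how these steps were actually implemented.

```python
MAX_AGE = 108  # oldest single year of age tracked by the model


def split_to_single_years(groups):
    """Spread the 18 age-group counts over single ages 0..108.

    Assumption: people are spread evenly within each group.
    """
    single = []
    for i, count in enumerate(groups):
        width = 5 if i < 17 else MAX_AGE - 85 + 1  # 85-108 is one group
        single.extend([count / width] * width)
    return single


def total_births(women_by_age, birth_rate_by_age):
    """Expected newborns: women at each age times the birth rate at that age."""
    return sum(n * r for n, r in zip(women_by_age, birth_rate_by_age))


def advance_one_year(pop, death_prob, births):
    """Apply death probabilities, then shift everyone up one year of age.

    Assumption: survivors already at MAX_AGE leave the model.
    """
    survivors = [n * (1.0 - q) for n, q in zip(pop, death_prob)]
    return [births] + survivors[:-1]


def collect_to_groups(single):
    """Collapse single years of age back into the 18 age groups."""
    groups = [sum(single[a:a + 5]) for a in range(0, 85, 5)]
    groups.append(sum(single[85:]))
    return groups
```

Running `split_to_single_years`, `advance_one_year` and `collect_to_groups` in sequence corresponds to one model year; the comparison and tweaking steps would then adjust `death_prob` and the birth rates before the next year is run.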
5. The same model is applied to produce the results for 2008-2013. This time, however, there is nothing to compare against, so the "tweaking coefficients" are predicted from the tweaking coefficients for 2002 to 2007. The prediction algorithm is based on a linear regression approach (the coefficients actually fit a linear plot very nicely).
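As a rough illustration of the linear regression step, an ordinary least-squares line can be fitted to the 2002-2007 coefficients and extended to 2008-2013. The coefficient values below are made up for the example, and `fit_line` is a hypothetical helper, not part of any GeoLytics tool.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up tweaking coefficients for one county and one rate, 2002-2007.
years = [2002, 2003, 2004, 2005, 2006, 2007]
coeffs = [1.00, 1.01, 1.02, 1.03, 1.04, 1.05]

slope, intercept = fit_line(years, coeffs)

# Extend the fitted line to the years with no USCB data to compare against.
predicted = {year: slope * year + intercept for year in range(2008, 2014)}
```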
Methodology - Household Estimates
The household estimates were calculated from:
- the Census data on the households
- the estimated data on the households
- the Census data on the age-race-sex
- the estimated data on the age-race-sex.
GeoLytics calculated the ratios of Census household variables to Census age-race-sex data and Census housing data, and then applied these ratios to the estimated data of the same nature to get the estimated values. The underlying assumption is that the average family size by race has not changed dramatically in the years since the 2000 Census was compiled.
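In its simplest form, the ratio approach described here amounts to the calculation below. The function name, the zero-base fallback, and the numbers are illustrative assumptions only.

```python
def estimate_by_ratio(census_household_value, census_base, estimated_base):
    """Carry a Census 2000 ratio forward onto an estimated base count.

    census_household_value: a household variable from Census 2000
    census_base: the matching Census 2000 age-race-sex (or housing) count
    estimated_base: the same count taken from the current estimates
    """
    if census_base == 0:
        return 0.0  # assumption: no base population means no households
    ratio = census_household_value / census_base
    return ratio * estimated_base


# Example: 400 households per 1,000 people in 2000, 1,100 people estimated now.
households_now = estimate_by_ratio(400, 1000, 1100)
```

Holding the ratio fixed is exactly the stated assumption that average family size by race has not changed much since 2000.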
Methodology - Housing Estimates
The only way that the number of housing units (HU) changes is if new buildings are built or old ones are torn down. Some houses can be built on empty lots, but when a lot of houses are built, usually a whole new development gets put in. So the first thing that we did was to look at the TIGER/Line files. This is the USCB file that shows each and every street in the US together with the address numbers along it. By looking at this dataset we can determine if new streets have been put in, and by looking at the numbering we can determine about how many units are being built. We can also see if new numbers have been added to an existing street.
1. The TIGER/Line records for the years 2000 and 2007 were analyzed. For each block, the sum of the associated address ranges was calculated. As a result, each block was assigned a Change Coefficient (CC), a number representing the change in the aggregate number of addresses within the block. The CC is a fraction between -1 and +1: 0 represents a block that did not change over this interval, +1 a block that had no addresses in 2000 and has some in 2007, and -1 a block that had some addresses in 2000 and has none in 2007. The block changes were later summarized to the block group (BG) level.
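The text does not give the formula for the Change Coefficient. One formula that satisfies all three stated properties (0 for no change, +1 for a block that is new, -1 for a block that has disappeared) divides the change by the larger of the two counts. The sketch below uses that formula purely as an assumption.

```python
def change_coefficient(addresses_2000, addresses_2007):
    """A possible Change Coefficient in the range [-1, +1].

    Assumed formula (not stated in the source):
        (a2007 - a2000) / max(a2000, a2007)
    """
    largest = max(addresses_2000, addresses_2007)
    if largest == 0:
        return 0.0  # no addresses in either year: treat as unchanged
    return (addresses_2007 - addresses_2000) / largest
```

Under this assumed formula, a block that doubles from 50 to 100 addresses gets a CC of 0.5.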
2. The Census Bureau Housing Unit Estimates (at the county level) for the years 2000 to 2007 were used to estimate the number of HU per county for the year 2008 via a linear regression algorithm.
3. For each county, the Census Bureau HU growth/decline was distributed among BGs of this county so that:
- BGs with CC = 0 did not change any HU counts
- BGs with CC not equal to 0 received a share of the county growth on a proportional basis, so that BGs with CC > 0 gained some HUs and BGs with CC < 0 lost some HUs. The results vary from small changes (a few percent is typical) to, rarely, some pretty dramatic changes of 3-5 times. The dramatic cases are evidently where large housing complexes went in and sharply changed the number of housing units in the block group.
Once we had the change in the number of Housing Units, we could look at the other housing variables such as number of rooms, vacancy status, tenure (own vs. rent) status, etc. People all live in either a household or a group quarter (military barracks, college dorms, nursing homes, prisons, mental institutions, half-way homes, etc.). The group quarters were left stable, so the changes in population were accounted for in the changes in Housing Units that had now been calculated. For example, if the housing units stayed the same but the population numbers dropped, then the number of vacant units would go up.
The sum of all changes for all BGs in a county is equal to the Census Bureau HU county growth estimates.
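The distribution in step 3 can be sketched as below. The source says only that the county change was distributed "on proportional basis", so the weighting used here (CC multiplied by the 2000 HU count of the BG) is an assumption, as are the function name and the error handling.

```python
def distribute_county_change(county_change, cc, hu_2000):
    """Split a county-level HU change among its block groups.

    Assumed weighting (not stated in the source): weight = CC * HU in 2000.
    BGs with CC == 0 get no change, and the changes sum to county_change.
    """
    if county_change == 0:
        return [0.0] * len(cc)
    weights = [c * h for c, h in zip(cc, hu_2000)]
    total = sum(weights)
    # The scheme only keeps "CC > 0 gains, CC < 0 loses" when the net
    # weight has the same sign as the county change.
    if total == 0 or (total > 0) != (county_change > 0):
        raise ValueError("weights are inconsistent with the county change")
    return [county_change * w / total for w in weights]


# Three BGs in a county that gained 180 housing units.
deltas = distribute_county_change(180, [0.0, 0.5, -0.1], [100, 200, 100])
```

In this example the first BG is unchanged, the second gains 200 units and the third loses 20, for a net county change of 180.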
Methodology - Income Estimates
When calculating Income Estimates there are several components. First we needed to calculate the changes in income from 1990 to 2000 so that we would have a basis for estimating forward. This again required some racial break-out changes because in 1990 the Race grouping was "Asian and Pacific Islanders" whereas in 2000 they are two separate races. Additionally the age changes had to be accounted for (everyone has aged since April 2000 so all of the age categories needed to shift up).
1. The first step was to create an Income Growth by Race number for each Block Group. Luckily, we were able to use both the GeoLytics Census CD 2000 Long Form (SF3) and the CensusCD 1990 in 2000 boundaries Long Form data product for the 1990 data. Because this data set is normalized, the geographic boundary changes from 1990 to 2000 have already been dealt with, and we can look at just the differences in incomes.
2. The BG-level racial growth data were applied to 2000 Census data to obtain 2008 racial income growth coefficients for each BG area. First, the growth data for 1990-2000 were processed using a compound interest model. Second, the calculated "interest rates" were applied to 2000 racial income data to get the 2008 growth data.
The Income Growth data by Race were not available for many BGs for some races, because if there were very few households of a given race in a block group, then the numbers were suppressed by the USCB in 1990. For these cases, we used the USCB Median Income Estimates for the years 2000-2006 to get 2008 state median income growth data using a linear regression algorithm, and then used these state growth data for those Block Groups and races.
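The compound interest model in step 2 can be written out as follows: the 1990-2000 growth is converted to an equivalent annual rate, which is then compounded for the 8 years from 2000 to 2008. The function names and dollar figures are illustrative only.

```python
def annual_rate(value_1990, value_2000, years=10):
    """Annual "interest rate" implied by the growth from 1990 to 2000."""
    return (value_2000 / value_1990) ** (1.0 / years) - 1.0


def project(value_2000, rate, years=8):
    """Compound the annual rate forward from 2000 (8 years reaches 2008)."""
    return value_2000 * (1.0 + rate) ** years


# Made-up median incomes for one race in one block group.
rate = annual_rate(30000.0, 40000.0)
income_2008 = project(40000.0, rate)
```

Note that the 2008 value is 2000 income times the 1990-2000 growth ratio raised to the power 8/10, so a decade of growth is scaled down to eight years rather than applied in full.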
3. The racial aggregate income data were processed in the same manner as racial median income data.
4. The Householder age distributions were estimated by using estimated Householder totals from our dataset and an age shift model. Namely, for each age group, a calculated number of householders was moved to the next age group. The first and last age groups were processed in a special way to take into account both new and dead householders. The sum of all householder age brackets is equal to our estimated HH total for 2008.
5. The area income range data were estimated using a distribution shift model. First, we assumed that the Census 2000 income brackets represent a "best fit curve" frequency distribution, and then applied a linear stretch transformation to the income scale. Finally, we calculated the new income bracket values produced by this linear stretching of the frequency distribution. The stretch coefficient was equal to the median income growth ratio for the area. What it all means is that the income increase moves some households from their income bracket in 2000 to the next income bracket in 2008. The number of such households can be estimated mathematically if we know the exact number of households for each income value. This exact number can be estimated using the "best fit curve" model.
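A simplified sketch of the distribution shift in step 5 follows. It replaces the "best fit curve" with a much cruder assumption, namely that households are spread evenly within each bracket, and it treats the top bracket as closed; both simplifications and all names are ours, not the source's.

```python
def households_below(edges, counts, x):
    """Households with income below x, assuming an even spread per bracket."""
    total = 0.0
    for lo, hi, count in zip(edges, edges[1:], counts):
        if x >= hi:
            total += count
        elif x > lo:
            total += count * (x - lo) / (hi - lo)
    return total


def stretch_brackets(edges, counts, k):
    """Re-bin bracket counts after multiplying every income by k.

    A household ends up below edge e exactly when its old income
    was below e / k.
    """
    cumulative = [households_below(edges, counts, e / k) for e in edges]
    cumulative[0] = 0.0                  # nobody drops out at the bottom
    cumulative[-1] = float(sum(counts))  # nobody leaves the top bracket
    return [b - a for a, b in zip(cumulative, cumulative[1:])]


# Three brackets of 100 households each, incomes stretched by 25%.
new_counts = stretch_brackets([0, 10000, 20000, 30000], [100, 100, 100], 1.25)
```

In the example, 20 households move up out of the first bracket and 40 out of the second, giving roughly 80, 80 and 140 households while the total stays at 300.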
6. Finally, the BG data (both medians and aggregates) were tuned so that the summary state median values were exactly equal to the state median data for 2008, as estimated from Census Bureau publications for 2000-2006 (see item 2). This was done using a two-section linear mapping scheme. The scheme
- moves the actual state median so it becomes equal to the target value;
- leaves state minimum and maximum median values for state BGs intact;
- is linear (of the form a*x + b) on each of two segments, with different a and b for each segment: from the state minimum BG median value to the state median, and from the state median to the state maximum BG median value.
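The three properties listed above determine the mapping completely, so it can be sketched directly. The function and argument names are illustrative, and the sketch assumes state minimum < state median < state maximum.

```python
def two_section_map(x, state_min, state_median, state_max, target_median):
    """Piecewise-linear tuning of a BG median value x.

    Fixed points: state_min -> state_min, state_max -> state_max,
    and state_median -> target_median.
    """
    if x <= state_median:
        # Lower segment: from the state minimum up to the state median.
        a = (target_median - state_min) / (state_median - state_min)
        return state_min + a * (x - state_min)
    # Upper segment: from the state median up to the state maximum.
    a = (state_max - target_median) / (state_max - state_median)
    return target_median + a * (x - state_median)


# Made-up state: BG medians span 10,000-90,000, the modelled state median
# is 40,000 and the target from Census Bureau publications is 44,000.
tuned = two_section_map(25000.0, 10000.0, 40000.0, 90000.0, 44000.0)
```

Each segment has its own slope a (and implied intercept b), which is why the scheme needs two sections: a single straight line cannot move the median while holding both endpoints fixed.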