Working with United States Census Data
K. Mitchell, 7/23/2016 (no affiliation with U.S. Census Bureau)
Outline
- Types of Data Available
- Census Geographies & Timeframes
- Data Access on Census.gov website
- American FactFinder Data Sets
- Data topics in the ACS
- Using the American FactFinder
- Census Bureau APIs
- R packages to work with Census Data
- What's the deal with PUMS data?
- Geography: Public Use Microdata Areas (PUMAs)
- Integrated Public Use Microdata Series: University of Minnesota
- Questions?
Types of Data Available from Census.gov
- Decennial Complete Census
  - Short form, 100% of United States residents
- American Community Survey
  - Long form sample of residents
  - Many more questions, much smaller sample size
- Topologically Integrated Geographic Encoding and Referencing database (TIGER)
  - https://tigerweb.geo.census.gov/tigerwebmain/tigerweb_main.html
- Other data collected/reported by the Census Bureau
Census Geographies (partial list)
- Whole United States
- Region: Northeast, Midwest, South, West
- Division: New England, Middle Atlantic, East North Central, West North Central, South Atlantic, East South Central, West South Central, Mountain, Pacific
- State
- County
- County Subdivision
- ZIP code
- Census Block
- Public Use Microdata Areas (PUMA)
- Congressional District
- School District
- Metropolitan Statistical Area / Micropolitan Statistical Area
Census Timeframes (since 2000)
- Decennial Census
  - Once every 10 years, complete data
- American Community Survey
  - Sample taken every year, mandatory participation
  - Results available as 1-year, 3-year, and 5-year estimates
  - 1-year estimates are available only for larger geographical areas (populations of 65,000 or more) and represent a sample of about 1% of the U.S. population
  - 3-year and 5-year estimates are available for smaller geographical areas
  - 5-year estimates represent a sample of about 5% of the U.S. population
Data Access on the Census Bureau Website (census.gov)
- Already Processed Online: easiest to use
  - Visualizations
  - QuickFacts
  - Easy Stats
  - And more!
- Summary Datasets: medium difficulty
  - American FactFinder
    - Community Facts
    - Guided and Advanced Search
    - Download Center
- Individual Survey Records: most difficult to use
  - 5% Public Use Microdata Sample (PUMS)
  - Individual replies to census surveys
American FactFinder: Available Data Sets
- American Community Survey
- American Housing Survey
- Annual Economic Surveys
- Annual Surveys of Governments
- Census of Governments
- Decennial Census
- Economic Census
- Equal Employment Opportunity (EEO) Tabulation
- Population Estimates Program
- Puerto Rico Community Survey
American Community Survey Topics
- Demographic: Age and Sex; Group Quarters Population; Hispanic or Latino Origin; Race; Relationship; Total Population
- Housing: Computer Ownership & Internet Access; House Heating Fuel; Kitchen Facilities; Occupancy/Vacancy Status; Occupants per Room; Owner Monthly Costs; Plumbing Facilities; Rent Statistics; Rooms; Bedrooms; Telephone Service Available; Tenure; Units in Structure; Value of Home; Vehicles Available; Year Householder Moved Into Unit; Year Structure Built
- Economic: Class of Worker; Commuting to Work/Journey to Work; Employment Status; Food Stamps/Supplemental Nutrition Assistance Program (SNAP); Health Insurance Coverage; Income and Earnings; Industry and Occupation; Poverty; Work Status
- Social: Ancestry; Citizenship Status; Disability Status; Educational Attainment; Fertility; Field of Degree; Grandparents as Caregivers; Language; Marital History; Marital Status; Place of Birth; School Enrollment; Residence 1 Year Ago/Migration; Veterans; Year of Entry
American FactFinder: https://factfinder2.census.gov/
American FactFinder: Guided Search
American FactFinder: Search results
American FactFinder: View and Modify Table
American FactFinder: Download Table
ACS Measures of Statistical Variability
- Coefficient of Variation
  - The coefficient of variation (CV), also known as relative standard deviation (RSD), is a standardized measure of dispersion of a probability distribution or frequency distribution. It is often expressed as a percentage, and is defined as the ratio of the standard deviation σ to the mean µ (or its absolute value, |µ|).
- Margin of Error, % Margin of Error
  - The margin of error is the difference between an estimate and its upper or lower confidence bound. All ACS published margins of error are based on a 90 percent confidence level.
  - Standard Error = Margin of Error / 1.645
  - Lower Confidence Bound = Estimate - Margin of Error
  - Upper Confidence Bound = Estimate + Margin of Error
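The formulas on this slide can be tried out in a few lines of base R. This is a minimal sketch: the estimate and margin of error are made-up numbers, not real ACS values. It assumes the usual convention that the CV of a survey estimate is its standard error divided by the estimate.

```r
# Hypothetical ACS estimate and its published 90% margin of error (made-up values).
estimate <- 52000
moe_90   <- 3290

z_90      <- 1.645               # ACS margins of error use a 90 percent confidence level
std_error <- moe_90 / z_90       # Standard Error = Margin of Error / 1.645
lower     <- estimate - moe_90   # Lower Confidence Bound
upper     <- estimate + moe_90   # Upper Confidence Bound

# CV of the estimate, as a percentage: standard error relative to the estimate.
cv_pct <- 100 * std_error / abs(estimate)

print(round(c(se = std_error, lower = lower, upper = upper, cv_pct = cv_pct), 2))
```

A smaller CV indicates a more reliable estimate, which is worth checking before mapping values for small geographies.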
American FactFinder: https://factfinder2.census.gov/
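Download the ACS data profiles: DP02 (Selected Social Characteristics), DP03 (Selected Economic Characteristics), DP04 (Selected Housing Characteristics), DP05 (ACS Demographic and Housing Estimates)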
Census Data APIs
- The Census Bureau has an ongoing initiative to enable your applications to access data.
- Sign up for their newsletter: https://public.govdelivery.com/accounts/uscensus/subscriber/new?topic_id=uscensus_7480
- New API dataset discovery tool (beta): http://api.census.gov/data.html
- Request an API key at this link: http://api.census.gov/data/key_signup.html
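Once a key is in hand, a request is a URL with a few query parameters. The sketch below only assembles such a URL in base R and sends nothing. The helper `build_census_url` is hypothetical, and the dataset path `2014/acs/acs5` and the variable `B01001_001E` (total population) are assumptions to confirm in the dataset discovery tool before use.

```r
# Hypothetical helper: assemble a Census Data API query URL (no request is sent).
build_census_url <- function(dataset, variables, geography, key) {
  paste0(
    "https://api.census.gov/data/", dataset,
    "?get=", paste(variables, collapse = ","),
    "&for=", geography,
    "&key=", key
  )
}

url <- build_census_url(
  dataset   = "2014/acs/acs5",           # assumed dataset path; confirm in the discovery tool
  variables = c("NAME", "B01001_001E"),  # assumed variable: total population estimate
  geography = "state:*",                 # every state
  key       = "YOUR_API_KEY"             # placeholder for the key you requested
)
print(url)
```

Pasting the printed URL into a browser, with a real key substituted, is a quick way to check a query before writing any download code.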
R Packages to work with census data API
- acs: Download, Manipulate, and Present American Community Survey and Decennial Data from the US Census
  - Provides a general toolkit for downloading, managing, analyzing, and presenting data from the U.S. Census, including SF1 (Decennial short-form), SF3 (Decennial long-form), and the American Community Survey (ACS).
  - Confidence intervals provided with ACS data are converted to standard errors to be bundled with estimates in complex acs objects.
  - Package provides new methods to conduct standard operations on acs objects and present/plot data in statistically appropriate ways.
  - Current version is 2.0 +/-.033.
  - Requires API key
- choroplethr to map census data (internally calls acs package)
- choroplethrzip to map census data by zip code (accessible via github): https://github.com/arilamstein/choroplethrzip
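To see why it helps to keep standard errors bundled with estimates, here is the arithmetic behind adding several estimates together. This is a base R sketch, not the acs package's own code. The numbers are made up, and the root-sum-of-squares rule is the usual approximation that treats the estimates as independent.

```r
# Hypothetical estimates and 90% margins of error for three areas (made-up values).
estimates <- c(1200, 800, 500)
moes      <- c(120, 90, 80)

z_90       <- 1.645
std_errors <- moes / z_90              # convert each margin of error to a standard error

total_estimate <- sum(estimates)            # estimates add directly
total_se       <- sqrt(sum(std_errors^2))   # standard errors combine as root sum of squares
total_moe      <- z_90 * total_se           # back to a 90% margin of error

print(c(estimate = total_estimate, se = round(total_se, 2), moe = round(total_moe, 2)))
```

The combined margin of error (170) is smaller than the simple sum of the three margins (290), which is why adding margins directly overstates the uncertainty.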
Example: Choropleths of industry participation by zip code

    library(choroplethrzip); library(mapproj); library(ggplot2)

    # ACS 2014 5-year DP03 (economic characteristics) download, per the file name
    econ_zip <- read.csv("acs_14_5yr_dp03_with_annkm.csv")

    # Map 1: characters 7-11 of the geography label are taken as the 5-digit ZIP code;
    # column 154 of this particular file supplies the value to plot
    z <- data.frame(cbind(substr(as.character(econ_zip$geography), 7, 11), econ_zip[154]))
    names(z) <- c("region", "value")       # column names zip_choropleth expects
    z$region <- as.character(z$region)
    z$value <- as.numeric(as.character(z$value))
    zip_choropleth(z, state_zoom = "new jersey",
                   title = "% in Transportation and warehousing, and utilities") + coord_map()

    # Map 2: same steps, using column 158 of the file
    z <- data.frame(cbind(substr(as.character(econ_zip$geography), 7, 11), econ_zip[158]))
    names(z) <- c("region", "value")
    z$region <- as.character(z$region)
    z$value <- as.numeric(as.character(z$value))
    zip_choropleth(z, state_zoom = "new jersey", title = "% in Information") + coord_map()
What's the deal with PUMS data?
- American Community Survey (ACS) Public Use Microdata Sample (PUMS) files
  - The full range of population and housing unit responses collected on individual ACS questionnaires
  - Each record in the file represents a single person, or, in the household-level dataset, a single housing unit.
  - PUMS files for an individual year, such as 2014, contain data on approximately one percent of the United States population.
  - PUMS files covering a five-year period, such as 2010-2014, contain data on approximately five percent of the United States population.
- PUMS datasets on census.gov: https://www.census.gov/programs-surveys/acs/data/pums.html
- Integrated Public Use Microdata Series maintained by the University of Minnesota: https://usa.ipums.org/usa/
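Because each record stands in for many people (roughly 100 in a one percent sample), population figures from microdata come from summing a per-record weight rather than counting rows. The sketch below uses a made-up five-row data frame. The column names and weights are hypothetical; the weight variables in real PUMS files should be looked up in the data dictionary.

```r
# Made-up person-level records; real PUMS files have far more rows and columns.
pums <- data.frame(
  person_id = 1:5,
  age       = c(34, 8, 71, 45, 29),
  weight    = c(95, 110, 80, 105, 100)   # hypothetical person weights
)

sample_count        <- nrow(pums)                         # records in the sample
population_estimate <- sum(pums$weight)                   # people those records represent
adults_estimate     <- sum(pums$weight[pums$age >= 18])   # weighted count of adults

print(c(records = sample_count, population = population_estimate, adults = adults_estimate))
```

The same pattern (filter the records, then sum the weights) covers most simple tabulations from person-level files.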
Geography: Public Use Microdata Areas (PUMAs)
Public Use Microdata Areas (PUMAs) are statistical geographic areas defined for the dissemination of Public Use Microdata Sample (PUMS) data. They are also used for disseminating American Community Survey (ACS) and Puerto Rico Community Survey period estimates.
2010 PUMAs:
- Nest within states or equivalent entities
- Contain at least 100,000 people
- Cover the entirety of the United States, Puerto Rico, Guam, and the U.S. Virgin Islands
- Are built on census tracts and counties
- Should be geographically contiguous
Integrated Public Use Microdata Series: University of Minnesota
- Easy-to-use interface
  - A series of drop-downs allows the selection of specific variables instead of dealing with the entire dataset at once.
  - Once the data selection is made, an extract may be requested in the desired format.
- Harmonized variables with other data sets including
  - IPUMS USA: U.S. Census, ACS
  - IPUMS International: official censuses from other countries; at least one sample from 82 countries
  - IPUMS CPS: Current Population Survey; Bureau of Labor Statistics (BLS), primary source of labor force statistics for the United States population.
IPUMS USA: Select Variables via dropdown
IPUMS USA: Variables / data sets / codes
IPUMS USA: Select Samples
Examples: PUMS Data Usage Visualization of U.S. Household Configurations; Most Common Family Types in America, Nathan Yau, http://flowingdata.com/2016/07/20/modern-family-structure/
Questions??????