nccsdata Part 3: Geographic Filters




Introduction

In part 3 of this 4-part series on the nccsdata package, we cover how to filter downloaded NCCS data based on geography.

Legacy NCCS data consists of several geographic variables, such as:

  • STATE: 2 letter state abbreviation (all caps)
  • CITY: Name of the city associated with the address provided in ADDRESS (all caps)
  • FIPS: State + County FIPS codes (CBSA) as used by the US Census (five-digit integer)

The last variable, FIPS, can be used to match observations based on census units. This preserves the external validity of geographic units by aligning them with US census delineations.

In US census data, FIPS are also tied to Core Based Statistical Areas (CBSAs) that consist of mutually exclusive Metropolitan Statistical Areas (metros with populations above 50,000) and Micropolitan Statistical Areas (populations above 10,000 and below 50,000). Geographic filtering with US census units therefore requires crosswalking units across multiple levels, such as county and CBSA.

Further details and examples of CBSAs and Metropolitan/Micropolitan Statistical Areas are provided on the Census Crosswalks page of the Urban NCCS Site.

All Census-defined metro areas are comprised of counties. This vignette explores the process of filtering nonprofit data by geography using geo_preview() and map_countyfips() to identify the County FIPS associated with specific Metropolitan or Micropolitan Regions.

Exploring CBSA FIPS Codes

The geo_preview() function allows users to preview and retrieve CBSA FIPS codes and/or their associated metadata from a specific state.

geo_preview( 
  geo    = c("cbsa","cbsafips"), 
  within = "FL", 
  type   = "metro" )
#> 
#> 
#> |                                        cbsa| cbsafips|
#> |-------------------------------------------:|--------:|
#> |                             Gainesville, FL|    23540|
#> |                            Jacksonville, FL|    27260|
#> |           Panama City-Panama City Beach, FL|    37460|
#> |           Palm Bay-Melbourne-Titusville, FL|    37340|
#> |   Miami-Fort Lauderdale-West Palm Beach, FL|    33100|
#> |                             Punta Gorda, FL|    39460|
#> |                       Homosassa Springs, FL|    26140|
#> |                     Naples-Marco Island, FL|    34940|
#> |              Pensacola-Ferry Pass-Brent, FL|    37860|
#> |      Deltona-Daytona Beach-Ormond Beach, FL|    19660|
#> |                             Tallahassee, FL|    45220|
#> |         Tampa-St. Petersburg-Clearwater, FL|    45300|
#> |                                 Sebring, FL|    42700|
#> | Sebastian-Vero Beach-West Vero Corridor, FL|    42680|
#> |               Orlando-Kissimmee-Sanford, FL|    36740|
#> |                   Cape Coral-Fort Myers, FL|    15980|
#> |           North Port-Bradenton-Sarasota, FL|    35840|
#> |                                   Ocala, FL|    36100|
#> |                          Port St. Lucie, FL|    38940|
#> |      Crestview-Fort Walton Beach-Destin, FL|    18880|
#> |                   Lakeland-Winter Haven, FL|    29460|
#> |                   Wildwood-The Villages, FL|    48680|

Specifically, filtering on geography requires using GEOIDs (the FIPS codes associated with a specific level of geo aggregation). This is because (1) city and county names are non-unique, and (2) they are long and would be easy to misspell. The geo_preview() function offers a convenient way to identify all of the GEOIDs associated with your desired geography.

The code snippet above demonstrates that geo_preview() returns the names of all CBSAs and their associated GEOIDs. The within argument specifies the desired state, in abbreviated form, as input while the geo argument returns the specified columns. The type argument specified which type of geography is desired.

The geo argument allows uses to select a variety of columns from the geographic crosswalk file.

geo_preview(
  geo    = c("cbsa","county","cbsafips"), 
  within = "FL", 
  type   = "metro"  )
#> 
#> 
#> |                                        cbsa|                           county| cbsafips|
#> |-------------------------------------------:|--------------------------------:|--------:|
#> |                             Gainesville, FL|      Alachua County, FL, Central|    23540|
#> |                            Jacksonville, FL|       Baker County, FL, Outlying|    27260|
#> |           Panama City-Panama City Beach, FL|          Bay County, FL, Central|    37460|
#> |           Palm Bay-Melbourne-Titusville, FL|      Brevard County, FL, Central|    37340|
#> |   Miami-Fort Lauderdale-West Palm Beach, FL|      Broward County, FL, Central|    33100|
#> |                             Punta Gorda, FL|    Charlotte County, FL, Central|    39460|
#> |                       Homosassa Springs, FL|       Citrus County, FL, Central|    26140|
#> |                            Jacksonville, FL|         Clay County, FL, Central|    27260|
#> |                     Naples-Marco Island, FL|      Collier County, FL, Central|    34940|
#> |                            Jacksonville, FL|        Duval County, FL, Central|    27260|
#> |              Pensacola-Ferry Pass-Brent, FL|     Escambia County, FL, Central|    37860|
#> |      Deltona-Daytona Beach-Ormond Beach, FL|      Flagler County, FL, Central|    19660|
#> |                             Tallahassee, FL|     Gadsden County, FL, Outlying|    45220|
#> |                             Gainesville, FL|   Gilchrist County, FL, Outlying|    23540|
#> |         Tampa-St. Petersburg-Clearwater, FL|    Hernando County, FL, Outlying|    45300|
#> |                                 Sebring, FL|    Highlands County, FL, Central|    42700|
#> |         Tampa-St. Petersburg-Clearwater, FL| Hillsborough County, FL, Central|    45300|
#> | Sebastian-Vero Beach-West Vero Corridor, FL| Indian River County, FL, Central|    42680|
#> |                             Tallahassee, FL|   Jefferson County, FL, Outlying|    45220|
#> |               Orlando-Kissimmee-Sanford, FL|        Lake County, FL, Outlying|    36740|
#> |                   Cape Coral-Fort Myers, FL|          Lee County, FL, Central|    15980|
#> |                             Tallahassee, FL|         Leon County, FL, Central|    45220|
#> |                             Gainesville, FL|        Levy County, FL, Outlying|    23540|
#> |           North Port-Bradenton-Sarasota, FL|      Manatee County, FL, Central|    35840|
#> |                                   Ocala, FL|       Marion County, FL, Central|    36100|
#> |                          Port St. Lucie, FL|       Martin County, FL, Central|    38940|
#> |   Miami-Fort Lauderdale-West Palm Beach, FL|   Miami-Dade County, FL, Central|    33100|
#> |                            Jacksonville, FL|      Nassau County, FL, Outlying|    27260|
#> |      Crestview-Fort Walton Beach-Destin, FL|     Okaloosa County, FL, Central|    18880|
#> |               Orlando-Kissimmee-Sanford, FL|       Orange County, FL, Central|    36740|
#> |               Orlando-Kissimmee-Sanford, FL|     Osceola County, FL, Outlying|    36740|
#> |   Miami-Fort Lauderdale-West Palm Beach, FL|   Palm Beach County, FL, Central|    33100|
#> |         Tampa-St. Petersburg-Clearwater, FL|        Pasco County, FL, Central|    45300|
#> |         Tampa-St. Petersburg-Clearwater, FL|     Pinellas County, FL, Central|    45300|
#> |                   Lakeland-Winter Haven, FL|         Polk County, FL, Central|    29460|
#> |                            Jacksonville, FL|    St. Johns County, FL, Central|    27260|
#> |                          Port St. Lucie, FL|    St. Lucie County, FL, Central|    38940|
#> |              Pensacola-Ferry Pass-Brent, FL|   Santa Rosa County, FL, Central|    37860|
#> |           North Port-Bradenton-Sarasota, FL|     Sarasota County, FL, Central|    35840|
#> |               Orlando-Kissimmee-Sanford, FL|     Seminole County, FL, Central|    36740|
#> |                   Wildwood-The Villages, FL|       Sumter County, FL, Central|    48680|
#> |      Deltona-Daytona Beach-Ormond Beach, FL|      Volusia County, FL, Central|    19660|
#> |                             Tallahassee, FL|     Wakulla County, FL, Outlying|    45220|
#> |      Crestview-Fort Walton Beach-Destin, FL|       Walton County, FL, Central|    18880|
#> |           Panama City-Panama City Beach, FL|  Washington County, FL, Outlying|    37460|

Metropolitan and Micropolitan Data

Because CBSAs include combinations of metropolitan or micropolitan statistical areas, geo_preview() allows the user to select either unit using the type argument.

The following code snippet shows how to retrieve CBSA names and FIPS codes for all metropolitan statistical areas in Wyoming:

geo_preview(
  geo    = c("cbsa","cbsafips"), 
  within = "WY", 
  type   = "metro" )
#> 
#> 
#> |         cbsa| cbsafips|
#> |------------:|--------:|
#> | Cheyenne, WY|    16940|
#> |   Casper, WY|    16220|

In the above snippet, setting type to micro returns data for micropolitan statistical areas.

geo_preview( 
  geo    = c("cbsa","cbsafips"), 
  within = "WY", 
  type   = "micro" )
#> 
#> 
#> |             cbsa| cbsafips|
#> |----------------:|--------:|
#> |      Laramie, WY|    29660|
#> |     Gillette, WY|    23940|
#> |     Riverton, WY|    40180|
#> |         Cody, WY|    17650|
#> |     Sheridan, WY|    43260|
#> | Rock Springs, WY|    40540|
#> |   Jackson, WY-ID|    27220|
#> |  Evanston, WY-UT|    21740|

Exploring CSA FIPS

Core Based Statistical Areas (CBSAs) delineate the distinct metropolitan areas. The Combined Statistical Areas (CSA) geography aggregates the data further into metropolital regions where multiple cities function as a coherent entities, typically characterized by shared commercial and commuting zones.

Similar to how metro areas area comprised of a collection of counties, you can think of the Combined Statistical Areas as being comprised of collections of metropolitan and micropolitan areas.

939 Core-Based Statistical Areas =
    384 Metropolitan statistical areas +
    547 micropolitan statistical areas

175 Total Combined Statistical Areas:
    808 Metro + Micro Areas joined together to form CSAs
    123 Metro + Micro Areas are not part of any CSA

geo_preview() can retrieve metadata for Combined Statistical Areas (CSAs).

The code snippet below returns all CSA names and FIPS codes for metropolitan statistical areas in Virginia:

geo_preview(
  geo    = c("csa","csafips"), 
  within = "VA", 
  type   = "metro")
#> 
#> 
#> |                                            csa| csafips|
#> |----------------------------------------------:|-------:|
#> |                                               |      NA|
#> | Washington-Baltimore-Arlington, DC-MD-VA-WV-PA|     548|
#> |        Harrisonburg-Staunton-Stuarts Draft, VA|     277|
#> |               Virginia Beach-Chesapeake, VA-NC|     545|
#> |          Johnson City-Kingsport-Bristol, TN-VA|     304|

Filtering Legacy Data with County FIPS codes

After retrieving the desired CBSA/CSA FIPS codes, map_countyfips() can match them with county FIPS codes present in the legacy data, retrieved with get_data(). Downloaded data can then be filtered using these county FIPS codes, as shown below:

# Retrive CBSA FIPS from NY
cbsa_ny <- 
  geo_preview( geo = c("cbsa", "cbsafips"), 
               within = "NY" )
#> 
#> 
#> |                                  cbsa| cbsafips|
#> |-------------------------------------:|--------:|
#> |           Albany-Schenectady-Troy, NY|    10580|
#> |    New York-Newark-Jersey City, NY-NJ|    35620|
#> |                        Binghamton, NY|    13780|
#> |                             Olean, NY|    36460|
#> |                            Auburn, NY|    12180|
#> |                 Jamestown-Dunkirk, NY|    27460|
#> |                            Elmira, NY|    21300|
#> |                       Plattsburgh, NY|    38460|
#> |                            Hudson, NY|    26460|
#> |                          Cortland, NY|    18660|
#> | Kiryas Joel-Poughkeepsie-Newburgh, NY|    28880|
#> |               Buffalo-Cheektowaga, NY|    15380|
#> |                      Gloversville, NY|    24100|
#> |                           Batavia, NY|    12860|
#> |                        Utica-Rome, NY|    46540|
#> |               Watertown-Fort Drum, NY|    48060|
#> |                         Rochester, NY|    40380|
#> |                          Syracuse, NY|    45060|
#> |                         Amsterdam, NY|    11220|
#> |                           Oneonta, NY|    36580|
#> |                Massena-Ogdensburg, NY|    32390|
#> |                      Seneca Falls, NY|    42900|
#> |                           Corning, NY|    18500|
#> |                        Monticello, NY|    33910|
#> |                            Ithaca, NY|    27060|
#> |                          Kingston, NY|    28740|
#> |                       Glens Falls, NY|    24020|

# Map these to county FIPS codes
ny_countyfips <- 
  map_countyfips( geo.cbsafips = cbsa_ny$cbsafips )

# Pull core data for the year 2015
core_2015 <- 
  get_data( dsname         = "core",
            time           = "2015",
            scope.orgtype  = "NONPROFIT",
            scope.formtype = "PZ" )
#> Valid inputs detected. Retrieving data.
#> Downloading core data
#> Requested files have a total size of 115 MB. Proceed
#>                       with download? Enter Y/N (Yes/no/cancel)
#> Core data downloaded

# Filter with NY county FIPS
core_2015_nyfips <- 
  core_2015 %>% 
  dplyr::filter( FIPS %in% ny_countyfips )

print( as_tibble( core_2015_nyfips ))
#> # A tibble: 11,788 × 170
#>    NTEECC new.code   type.org broad.category major.group univ  hosp  two.digit
#>    <chr>  <chr>      <chr>    <chr>          <chr>       <lgl> <lgl> <chr>    
#>  1 N50    RG-HMS-N50 RG       HMS            N           FALSE FALSE 50       
#>  2 Y42    RG-MMB-Y42 RG       MMB            Y           FALSE FALSE 42       
#>  3 P012   <NA>       <NA>     <NA>           <NA>        NA    NA    <NA>     
#>  4 S47    RG-PSB-S47 RG       PSB            S           FALSE FALSE 47       
#>  5 N50    RG-HMS-N50 RG       HMS            N           FALSE FALSE 50       
#>  6 S41    RG-PSB-S41 RG       PSB            S           FALSE FALSE 41       
#>  7 J40    RG-HMS-J40 RG       HMS            J           FALSE FALSE 40       
#>  8 J40    RG-HMS-J40 RG       HMS            J           FALSE FALSE 40       
#>  9 B41    RG-UNI-B41 RG       UNI            B           TRUE  FALSE 41       
#> 10 M24    RG-HMS-M24 RG       HMS            M           FALSE FALSE 24       
#> # ℹ 11,778 more rows
#> # ℹ 162 more variables: further.category <int>, division.subdivision <chr>,
#> #   broad.category.description <chr>, major.group.description <chr>,
#> #   code.name <chr>, division.subdivision.description <chr>, keywords <chr>,
#> #   further.category.desciption <chr>, ntee2.code <chr>, EIN <int>,
#> #   ACCPER <chr>, ACTIV1 <chr>, ACTIV2 <chr>, ACTIV3 <chr>, ADDRESS <chr>,
#> #   AFCD <chr>, ASS_BOY <dbl>, ASS_EOY <int64>, BLOCK <chr>, BOND_BOY <dbl>, …

Conclusion

Using FIPS codes allows researchers working with NCCS data to standardize their operationalized geographic variables, resulting in greater external validity and reproducibility in their research.

More Stories
methods

nccsdata Part 4: Summary Tables

Part 4 of 4 data stories covering the nccsdata R package. This story focuses on summarising NCCS legacy data.

R packages
methods

nccsdata Part 2: NTEE Codes

Part 2 of 4 data stories covering the nccsdata R package. This story focuses on parsing NTEE codes.

R packages