USA-COVID-state-level-air-pollution-SARIMA-analysis

This repository contains the code used for the analysis in our work on short-term air pollution changes in the USA. The preprint can be found here.

The code uses bootstrapped seasonal autoregressive time series models to make counterfactual predictions for pollutant concentrations based on historical data. These predictions are compared to the actual pollutant concentrations to estimate the corresponding change during the pandemic.

File description:

confounders_all.csv contains the temperature, precipitation and relative humidity values for each state as a function of time.
data_alltimeno2.csv contains the measured NO2 concentrations for each state as a function of time.
data_alltimepm25.csv contains the measured PM2.5 concentrations for each state as a function of time.
df_regions.csv contains the region designations for each state.
pop_density_census2010.csv contains the population density (per square mile) of each state based on the 2010 census. Sourced from: https://web.archive.org/web/20111028061117/http://2010.census.gov/2010census/data/apportionment-dens-text.php
state_policy_changes_1.csv contains the dates of different covid-related interventions for each state.
df_change_pm25.csv and df_change_no2.csv are obtained as outputs of the SARIMA analysis performed using the codes Rcode_PM25_figure1.R and Rcode_NO2_figure1.R respectively. These files contain the estimated change in pollutant concentrations following the state of emergency declaration in each state, based on our analysis.
NEI_sector_report.zip needs to be unzipped before running Rcode_regression_NEI_WLS.R, as the data is too large to be stored uncompressed on GitHub. Running unzip NEI_sector_report.zip will unzip the file. The data is sourced from the EPA National Emissions Inventory report.
df_box_NO2_2019_2020_Jan_Apr.csv is obtained by merging the outputs of the SARIMA models run separately for 2019 (year of prediction) and 2020 (year of prediction). This file is required to create Figure 4, which is created by running Figure4_NO2.R.
df_box_PM25_2019_2020_Jan_Apr.csv is obtained by merging the outputs of the SARIMA models run separately for 2019 (year of prediction) and 2020 (year of prediction). This file is required to create Figure 5, which is created by running Figure5_PM2.5.R.

How to run the code:

Development Environment

We have included the file covid_sarima_env.yml, which can be used to create a conda environment with all of the packages required to run the code in this project. To create the environment, install conda if you don't have it already, then run:

conda env create -f covid_sarima_env.yml

from the project directory.
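
Once created, the environment has to be activated before Rscript can see the installed packages. This README does not state the environment's name; it is whatever the name: field of covid_sarima_env.yml declares. The sketch below reads it from the file instead of guessing. The helper name env_name_from_yml is hypothetical, and the sketch assumes the file has a top-level name: line.

```shell
#!/bin/sh
# Print the environment name declared in a conda environment file.
# Assumption: the file has a top-level "name: <env>" line.
# Prints nothing if no such line exists.
env_name_from_yml() {
    sed -n 's/^name:[[:space:]]*//p' "$1" | head -n 1
}
```

From the project directory, conda activate "$(env_name_from_yml covid_sarima_env.yml)" would then activate the environment, provided the assumption about the name: line holds.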

Running the Code

Prior to running the code, this repo should be cloned locally. The command to clone the repo is:

git clone git@github.com:poojatya/USA-COVID-state-level-air-pollution-SARIMA-analysis.git

We recommend executing the code files in this repo using the Rscript command.

Note: Currently the num_resamples value in all files that contain it is set to 10 for testing purposes. To reproduce the paper's results, it will need to be changed to 1000.
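
Changing the value by hand in several files is easy to get wrong, so the edit can be scripted. The following is a minimal sketch, not part of the repository: the function name set_num_resamples is hypothetical, and it assumes the assignment is written as num_resamples = <integer> or num_resamples <- <integer> in each script. It rewrites every match in the file and keeps a .bak copy of each original.

```shell
#!/bin/sh
# Rewrite the num_resamples assignment in the given R scripts.
# Assumption: the assignment looks like "num_resamples = 10" or
# "num_resamples <- 10". A .bak copy of each file is left beside it.
set_num_resamples() {
    value="$1"
    shift
    for f in "$@"; do
        sed -i.bak -E \
            "s/(num_resamples[[:space:]]*(=|<-)[[:space:]]*)[0-9]+/\1${value}/" \
            "$f"
    done
}
```

For example, set_num_resamples 1000 Rcode_NO2_figure1.R Rcode_PM25_figure1.R would switch both scripts to the full run. Running grep -l num_resamples *.R first lists which scripts actually contain the setting, and comparing each script against its .bak copy afterwards confirms that only the intended line changed.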

The code should be run in the following order:

For NO2,

Step 1. Run Rcode_NO2_figure1.R. Using num_resamples = 10 instead of num_resamples = 1000 reduces the number of bootstrap resamples and saves hours of run time.
Step 2. Save the dataframe df_change_no2 generated by this script as df_change_no2.csv. (You can also use the df_change_no2.csv file that is already in this repo.)

For PM2.5,

Step 3. Run Rcode_PM25_figure1.R. Using num_resamples = 10 instead of num_resamples = 1000 reduces the number of bootstrap resamples and saves hours of run time.
Step 4. Save the dataframe df_change generated by this script as df_change_pm25.csv. (You can also use the df_change_pm25.csv file that is already in this repo.)

Finally, for the regression,

Step 5. Run the regression analysis script Rcode_regression_NEI_WLS.R.
Note: This code performs regression analysis for both NO2 and PM2.5. If you wish to only run the code for one pollutant, you will need to comment out the analysis of the other pollutant to avoid getting an error.
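
The run order above can be sketched as a small shell function that stops at the first failing script. This is an illustration, not part of the repository: the function name run_pipeline and the RSCRIPT override are hypothetical. It assumes the conda environment is active, that NEI_sector_report.zip has already been unzipped, and that df_change_no2.csv and df_change_pm25.csv are present (either saved in steps 2 and 4 or taken from the repo).

```shell
#!/bin/sh
# Run the three analysis scripts in the documented order and stop at the
# first failure. RSCRIPT may be overridden (e.g. RSCRIPT=echo) to preview
# the commands without running R.
run_pipeline() {
    runner="${RSCRIPT:-Rscript}"
    for script in Rcode_NO2_figure1.R Rcode_PM25_figure1.R Rcode_regression_NEI_WLS.R; do
        echo "Running $script" >&2
        "$runner" "$script" || { echo "Failed: $script" >&2; return 1; }
    done
}
```

Calling run_pipeline from the project directory runs all three scripts; RSCRIPT=echo run_pipeline only prints the script names in order, which is a cheap way to check the sequence before committing to a long bootstrap run.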

