This study uses the China Health and Nutrition Survey (CHNS), an ongoing multi-wave longitudinal survey conducted jointly by the Carolina Population Center and China’s Ministry of Health. The CHNS dataset available to us consists of seven waves (1989, 1991, 1993, 1997, 2000, 2004, and 2006). It employs a multistage random cluster sampling process to draw households from urban and rural areas in nine geographically and economically different provinces: Guangxi, Guizhou, Heilongjiang, Henan, Hubei, Hunan, Jiangsu, Liaoning, and Shandong. The dataset provides detailed socio-economic, nutrition, and health information on households and individuals. In addition, it includes separate community questionnaires that collect information (e.g., accessibility) on the community built environment and resources, such as transit, recreation facilities, public spaces, schools/nurseries, hospitals, supermarkets, wet markets, and fast food restaurants, at the neighborhood level (i.e., the administrative unit of the residential committee, typically called “juweihui” in Chinese). Most built environment variables are available only for the last three waves. A comprehensive review of the CHNS dataset can be found in the description by Popkin et al.
The urban subsample of CHNS includes communities from urban districts of prefecture-level cities and towns that are county seats. A total of 9,543 individuals from 2,473 households and 86 neighborhoods are identified in this multi-wave subsample. In the urban subsample, slightly less than half of the observations (46% of individuals and 49% of neighborhoods) are from prefecture-level cities, with the rest from county-level cities. The neighborhoods vary in area, with a median of 1.1 km², a 25th percentile of 0.5 km², and a 75th percentile of 3 km². The population sizes of neighborhoods also vary, with a median of 2,400, a 25th percentile of 1,470, and a 75th percentile of 3,787.
This study uses longitudinal data on school-age children (between 6 and 18)ᵃ and their households and neighborhoods from the 2004 and 2006 waves, for which neighborhood food environment data are available. To rule out likely outliers or potentially misidentified urban neighborhoods, analyses are restricted to neighborhoods within 100 minutes by bicycle of the nearest major medical facility (>99% of the full urban household sample) and within 25 km of a park (96% of the full urban household sample).ᵇ The overall attrition rate of urban individuals is 17% from 2004 to 2006. Among the 997 urban children aged between 6 and 18 observed in the 2004 and 2006 waves, 373 were aged between 6 and 18 in 2004, of whom 303 were followed in 2006, indicating an attrition rate of about 19%. However, the dataset does suffer some loss from missing values. After dropping children who ever answered “don’t know” or “refused to answer” on any variable in our list, the longitudinal model is left with 185 children. Food consumption data were collected for three consecutive days, and each individual’s food consumption information was then used to calculate nutrient values (total caloric intake, total protein intake, total fat intake, and total carbohydrate intake) based on the 2004 Food Composition Table for China.
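The attrition figure for children quoted above can be checked directly from the reported counts. The short sketch below uses only the numbers given in the text; the retention share in the last step assumes that the 185 children in the longitudinal model are drawn from the 303 children followed in 2006, which the text implies but does not state explicitly.

```python
# Counts reported in the text for urban children aged 6-18.
children_2004 = 373   # observed in the 2004 wave
followed_2006 = 303   # of those, followed up in the 2006 wave
model_sample = 185    # children remaining after dropping incomplete answers

lost = children_2004 - followed_2006
attrition_rate = lost / children_2004

# Assumption: the 185 children are a subset of the 303 followed children.
retained_share = model_sample / followed_2006

print(f"lost to follow-up: {lost}")
print(f"attrition rate: {attrition_rate:.1%}")
print(f"share of followed children kept in the model: {retained_share:.1%}")
```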
Compared to the data used in the existing literature, the CHNS data offer two main advantages. First, they are longitudinal, providing an opportunity to identify causal effects. As previously mentioned, very few of the existing studies on the relationship between land use/built environment and behavior/health use randomized longitudinal datasets. Second, the data offer high-quality income information on each individual, representing a significant advance in the measurement of income in China. Questions on income and time allocation probe for any possible activity each person might have engaged in during the previous year, both in and out of the formal market. Information on state-subsidized housing is gathered from respondents to generate in-kind income, so that full income from market and non-market activities is imputed and adjusted by provincial consumer price indexes.
Model and key variables
The focus of this study is to quantify how the neighborhood environment affects the nutritional intake of Chinese urban children aged 6–18. The independent effect of a change in neighborhood environment (treatment, $T$) on individual $i$ is defined by $Y_{i,t_0,T} - Y_{i,t_0}$, which is counterfactual and unobservable. The difference-in-differences (DID) method is employed to estimate individual average treatment effects using the two-wave longitudinal data from CHNS.
The DID model can be summarized as $\Delta Y_{it} = \Delta X_{it}\beta + \Delta Z_{it}\gamma + \Delta u_{it}$, with $t = 1, 2$, where $u_{it}$ is a time-varying household/individual error. $Y_{it}$ comprises the nutritional intake variables of child $i$ at time $t$. $X_{it}$ is a vector of observable individual/household characteristics. $Z_{it}$ is a vector of food environment variables of the household’s neighborhood, including three variables indicating the neighborhood densities of supermarkets, wet markets, and fast food restaurants, as obtained from the questionnaire items in the CHNS community survey (“how many supermarkets/wet markets/fast food restaurants are there in your neighborhood?”). As access to vehicular transportation has been shown to be linked with access to vegetables and fruits, we include two transportation-related variables in our covariate ($X_{it}$) list: whether the neighborhood has any bus stop and whether the household owns a vehicle. In addition, since the choice of wet markets as a shopping destination is linked with affordability [7, 26], we interact the household income variable with the neighborhood wet market density to see whether household income affects the association between wet market access and the child’s nutritional intake. The other interaction term in the models is the interaction between being under the age of 12 and the fast food restaurant density, which accounts for the possible difference in fast food consumption between younger children and adolescents.
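As a brief sketch of where the differenced equation comes from, suppose the levels model contains a time-invariant individual effect $a_i$. The symbol $a_i$ is introduced here only for illustration and is not part of the notation above. Subtracting the first wave from the second removes $a_i$ and leaves the first-difference form:

```latex
\begin{align}
Y_{it} &= X_{it}\beta + Z_{it}\gamma + a_i + u_{it}, \qquad t = 1, 2, \\
\Delta Y_{it} &= Y_{i2} - Y_{i1}
  = (X_{i2} - X_{i1})\beta + (Z_{i2} - Z_{i1})\gamma + (a_i - a_i) + (u_{i2} - u_{i1}) \\
  &= \Delta X_{it}\beta + \Delta Z_{it}\gamma + \Delta u_{it}.
\end{align}
```

Under this assumption, any characteristic of the child or household that does not change between the two waves drops out of the estimating equation, whether or not it is observed.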
As a comparison, ordinary least squares (OLS) regressions on the 2004 and 2006 cross-sectional samples are conducted to illustrate how the results differ between exploiting the longitudinal dimension of the data (e.g., the within-individual differences) and running cross-sectional regressions. We add maternal education (in years) and the child’s gender to the OLS models, as these are important predictors of the child’s health behavior that are unlikely to vary between 2004 and 2006 (in the DID models, the individual fixed effects already account for such time-invariant variables). This comparison helps us identify whether unobserved heterogeneity exists and affects the results. Given the relatively low attrition rate (due to moving, death, etc.), selection bias is assumed to be a minor concern in this analysis.
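The contrast between the cross-sectional and the differenced estimates can be illustrated with a small simulation. The sketch below uses purely synthetic data, not CHNS data: the sample size, the coefficient `gamma_true`, and the strength of the link between the unobserved effect `a` and the regressor `z` are all hypothetical choices. An intercept is included in both regressions for convenience, although the differenced equation above is written without one.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000            # hypothetical number of children
gamma_true = 0.5    # hypothetical effect of the food-environment measure

# Unobserved, time-invariant child/household effect (synthetic).
a = rng.normal(size=n)
# Food-environment measure in two waves, correlated with the unobserved effect.
z = 0.8 * a[:, None] + rng.normal(size=(n, 2))
u = rng.normal(size=(n, 2))
y = gamma_true * z + a[:, None] + u

def ols_slope(x, y):
    """Slope from a least-squares regression of y on x with an intercept."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]

# Cross-sectional OLS on the first wave: the omitted effect biases the slope.
gamma_cross = ols_slope(z[:, 0], y[:, 0])
# First-difference regression: the time-invariant effect cancels out.
gamma_fd = ols_slope(z[:, 1] - z[:, 0], y[:, 1] - y[:, 0])

print(f"cross-sectional OLS slope: {gamma_cross:.3f}")
print(f"first-difference slope:    {gamma_fd:.3f}")
```

In this setup the cross-sectional slope lands well above `gamma_true`, while the differenced slope stays close to it. A gap of this kind between the two sets of estimates is what would signal unobserved heterogeneity.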
To better understand the possible difference between supermarkets and wet markets in terms of children’s nutritional intake, we also test the hypothesis that the number of supermarkets within five kilometers has a different effect from that of the number of wet markets within five kilometers. We do not test the difference between the fast food restaurant density and the other two food environment variables since we believe that restaurants belong to a qualitatively different food outlet category, not directly comparable to wet markets and supermarkets.
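One common way to test whether two coefficients differ is a Wald-type test on their difference. The text does not say which test statistic was used, so the following is only a sketch under stated assumptions: synthetic differenced data, classical (homoskedastic) standard errors, and a large-sample normal approximation for the p-value. The names `d_super`, `d_wet`, and `d_y` are hypothetical.

```python
import math
import numpy as np

rng = np.random.default_rng(1)
n = 500
d_super = rng.normal(size=n)   # change in supermarket count (synthetic)
d_wet = rng.normal(size=n)     # change in wet-market count (synthetic)
d_y = 0.6 * d_super + 0.1 * d_wet + rng.normal(size=n)

# Differenced regression without an intercept.
X = np.column_stack([d_super, d_wet])
beta, *_ = np.linalg.lstsq(X, d_y, rcond=None)

# Classical coefficient covariance matrix: sigma^2 (X'X)^{-1}.
resid = d_y - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])
cov = sigma2 * np.linalg.inv(X.T @ X)

# H0: beta_super - beta_wet = 0, expressed through the restriction vector r.
r = np.array([1.0, -1.0])
diff = r @ beta
se_diff = math.sqrt(r @ cov @ r)
z_stat = diff / se_diff
# Two-sided p-value under the normal approximation.
p_value = math.erfc(abs(z_stat) / math.sqrt(2))

print(f"difference = {diff:.3f}, z = {z_stat:.2f}, p = {p_value:.4f}")
```

The standard error of the difference includes the covariance between the two estimates, which matters whenever the supermarket and wet market measures are correlated across neighborhoods.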