Using ROPScore and CHOP ROP for early prediction of retinopathy of prematurity in a Chinese population

Purpose Retinopathy of prematurity (ROP) is a disease that causes vision impairment and blindness, most frequently manifesting among preterm infants. ROPScore and CHOP ROP (Children’s Hospital of Philadelphia ROP) are similar scoring models that predict ROP using risk factors such as postnatal weight gain, birth weight (BW), and gestational age (GA). The purpose of this study was to compare the accuracy of ROPScore and CHOP ROP for the early prediction of ROP. Methods A retrospective study was conducted from January 2009 to December 2019 in China. Patients eligible for enrollment included infants admitted to the NICU at ≤32 weeks GA or with ≤1500 g BW. The sensitivity and specificity of ROPScore and CHOP ROP were analyzed, as well as their suitability as independent predictors of ROP. Results Severe ROP was found in 5.0% of preterm infants. The sensitivity and specificity of ROPScore for any stage of ROP were 55.8% and 77.8%, respectively. For severe ROP, the sensitivity and specificity were 50.0% and 87.0%, respectively. The area under the receiver operating characteristic curve for ROPScore in predicting severe ROP was 0.76. This value was significantly higher than the values for birth weight (0.60), gestational age (0.73), and duration of ventilation (0.63) when each was assessed separately. The CHOP ROP model correctly predicted all infants who developed type 1 ROP (sensitivity, 100%; specificity, 21.4%). Conclusions The CHOP ROP model predicted infants who developed type 1 ROP with a sensitivity of 100%, whereas ROPScore had a sensitivity of 55.8%. Therefore, the CHOP ROP model is more suitable for Chinese populations than ROPScore. Clinical registration number and STROBE guidelines This article was a retrospective cohort study and reported the results of the ROPScore and CHOP ROP algorithms. No results pertaining to interventions on human participants were reported. Thus, registration was not required, and this study followed STROBE guidelines.


Background
Retinopathy of prematurity (ROP) is a disease that causes vision impairment and blindness, most frequently manifesting among infants with low birth weight (BW) and poor health status. The survival of preterm infants has increased in the last few decades owing to rapid improvements in neonatal intensive care. Consequently, the incidence of ROP has increased, particularly in newly industrialized countries, constituting a "third epidemic" [1]. The reported incidence of ROP that requires treatment varies from 0 to 34.8% [2][3][4][5], depending on the quality of local neonatal care and the characteristics of each individual patient.
The development of ROP is associated with multiple risk factors. Low gestational age (GA) and low BW are two of the most important. Other factors include blood transfusion, mechanical ventilation, anemia, respiratory distress, dyspnea, and poor health. Several ROP screening guidelines based on GA and BW have been introduced to help neonatologists identify preterm neonates at risk, namely those born at ≤32 weeks GA or with a BW ≤1500 g. An infant with a very unstable clinical course can also be identified as being at high risk of developing ROP, indicating a need for ophthalmology screening [6]. Challenges in identifying ROP in preterm neonates include compliance with screening guidelines, the expense of timely screenings, the potential neurologic and cardiopulmonary side effects of dilated fundus examinations, and the large workload imposed on health professionals. Therefore, a more feasible method is needed to identify infants who require ROP screening.
The ROPScore proposed by Eckert et al. is a scoring system that can be used to predict the severity of ROP [7]. This algorithm utilizes the following predictive variables: birth weight, gestational age, blood transfusion, mechanical ventilation, and proportional weight gain at the sixth week of life. The score is calculated in the sixth week of life using a spreadsheet. A high score indicates that the infant has a high risk of developing severe ROP [8]. The CHOP ROP (Children's Hospital of Philadelphia ROP) model uses postnatal weight gain, BW, and GA, and was built in a cohort of infants who met current ROP screening guidelines [9].
To the authors' knowledge, only a few studies have validated these screening tools [8][9][10][11]. These studies were retrospective analyses of the efficacy of the ROPScore in American, Canadian, Italian, and Brazilian populations. The purpose of this study was to evaluate the use of the ROPScore and CHOP ROP models to predict ROP in a Chinese population.

Patient population
Patients eligible for enrollment included infants admitted to the NICU at GA ≤32 weeks or BW ≤1500 g. Infants were excluded from this study if they had genetic metabolic diseases or major congenital abnormalities, or if they died before the sixth week after birth.

Weight measurements
Standard clinical procedures were followed for all infants, and weight was measured weekly from birth to discharge. These measurements were repeated at a corrected GA of 40 weeks [12].

ROPScore screening
ROPScore screening was conducted in the sixth week of life with a Microsoft Excel spreadsheet (Microsoft, Redmond, WA, USA), as suggested by Eckert et al. [7]. This algorithm utilizes the following predictive variables: birth weight, gestational age, blood transfusion, mechanical ventilation, and proportional weight gain at the sixth week of life [7]. The score is determined by linear regression, which takes into account the effect of each variable on the onset of ROP.
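The weight-gain input to the score can be illustrated with a short calculation. The sketch below is only an illustration of "proportional weight gain at the sixth week of life", read here as the weight gained by week 6 divided by BW. The function name and the example infant are hypothetical, and the regression coefficients of the ROPScore itself are not reproduced because they are not given in this text.

```python
def proportional_weight_gain(birth_weight_g: float, weight_week6_g: float) -> float:
    """Weight gained by the sixth week of life, as a fraction of birth weight."""
    if birth_weight_g <= 0:
        raise ValueError("birth weight must be positive")
    return (weight_week6_g - birth_weight_g) / birth_weight_g


# Hypothetical infant (not study data): 1000 g at birth, 1450 g at week 6.
gain = proportional_weight_gain(1000.0, 1450.0)
print(f"proportional weight gain: {gain:.2f}")
```

Expressing the gain relative to BW makes the value comparable between infants of different birth sizes.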

CHOP ROP screening
Binenbaum et al. developed a simpler logistic regression-based model named PINT ROP [13]. The PINT ROP cohort was at high risk for ROP. Therefore, the investigators applied the same modeling approach to a lower-risk cohort, which is more representative of the current US ROP screening criteria (BW < 1501 g), to develop an updated model called CHOP ROP [9]. Data were collected from medical records and entered into a web-based database. They comprised BW, GA, weight gain rate measurements, detailed demographics, and ophthalmologic and medical data. Data quality was ensured through data input verification rules, data review and discrepancy-checking algorithms, and investigation and analysis of all flagged values [11,14].

ROP screening and classification
ROP screening was performed for all extremely preterm infants by qualified ophthalmologists with expertise in ROP in accordance with the Chinese guidelines for the examination and treatment of ROP [15]. The choice to conduct additional ROP screening was determined according to the results of the initial screening. Termination of ROP screening was determined according to vascular development in the retina or up to 45 weeks of corrected GA [15]. ROP was subdivided into stages 1-5 based on the International Classification of ROP [16]. Mild ROP was defined as having stage 1 or stage 2 ROP in zone II or III without plus disease [12]. Type 1 ROP was defined as any stage ROP in zone I with plus disease; stage 3 ROP in zone I without plus disease; or stage 2 or 3 ROP in zone II with plus disease [17]. Type 2 ROP was defined as stage 1 or 2 ROP in zone I without plus disease; or stage 3 ROP in zone II without plus disease [17]. Severe ROP was defined as any prethreshold, any stage 3, or any threshold ROP [12].
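The type 1 and type 2 definitions above amount to a small decision rule on stage, zone, and plus disease. The following sketch encodes exactly the criteria as worded in this section [17]. The function name, the integer coding of zones (1 to 3), and the "other" label for findings that meet neither definition are assumptions made for illustration, and the code is not a clinical tool.

```python
def classify_rop(stage: int, zone: int, plus: bool) -> str:
    """Return 'type 1', 'type 2', or 'other' for an eye with ROP.

    stage: ROP stage 1-5; zone: 1, 2 or 3; plus: plus disease present.
    """
    if stage not in (1, 2, 3, 4, 5) or zone not in (1, 2, 3):
        raise ValueError("stage must be 1-5 and zone must be 1-3")

    is_type1 = (
        (zone == 1 and plus)                          # any stage, zone I, with plus
        or (zone == 1 and stage == 3 and not plus)    # stage 3, zone I, without plus
        or (zone == 2 and stage in (2, 3) and plus)   # stage 2 or 3, zone II, with plus
    )
    if is_type1:
        return "type 1"

    is_type2 = (
        (zone == 1 and stage in (1, 2) and not plus)  # stage 1 or 2, zone I, without plus
        or (zone == 2 and stage == 3 and not plus)    # stage 3, zone II, without plus
    )
    if is_type2:
        return "type 2"

    return "other"  # assumed label for ROP meeting neither definition


print(classify_rop(stage=3, zone=2, plus=True))
print(classify_rop(stage=3, zone=2, plus=False))
```

The type 1 check runs first so that an eye is never labelled type 2 when it also meets a treatment-requiring criterion.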

Clinical data collection
The following clinical data were collected: age, sex, gestational age, birth weight, number of blood transfusions, weekly weight measurements, days of mechanical ventilation and oxygen administration, ROP examination results, and the incidences of necrotizing enterocolitis (NEC), bronchopulmonary dysplasia (BPD), intraventricular hemorrhage (IVH), and sepsis. ROP was diagnosed by pediatric ophthalmologists. Retinal vascularization was judged as none, immature, or mature. Staging of disease was performed in accordance with the International Classification of ROP [18,19].

Statistical analysis
SPSS software version 19.0 (SPSS, Inc., Chicago, IL, USA) was used for statistical analysis and data management. Maternal and infant characteristics were analyzed using descriptive methods and compared using t-test or one-way ANOVA (> 2 levels) for continuous variables and the chi-squared test for categorical variables. Receiver operating characteristic (ROC) curves were used to assess the accuracy of the continuous values of the ROPScore and CHOP ROP model to predict severe ROP. ROPScore was used as a dependent variable in conducting multiple linear regressions. The independent variables used in multiple logistic regression analysis were based on significant correlations and significant non-parametric univariate analyses. For severe ROP, these variables were: BW, GA, duration of ventilation, sepsis, and weight gain at the sixth week of life. The statistical significance level was set at p < 0.05.

Baseline characteristics
In this study, 3624 children were screened for ROP and underwent weekly weight measurements. The ROPScore and CHOP ROP models were designed for infants with GA ≤32 weeks at birth or BW ≤1500 g. Thirty-seven infants were excluded because of incomplete weight data or pathological conditions. Thus, 3587 infants born at GA ≤32 weeks or with BW ≤1500 g were included in this study. The prevalence of ROP at any stage was 372/3587 infants (10.4%). Type 2 ROP developed in 192/3587 preterm infants (5.4%), and type 1 ROP that required treatment developed in 180/3587 (5.0%). The baseline demographics and clinical characteristics for this cohort are shown in Table 1. The weight gain rate was much lower in the type 1 and type 2 ROP groups than in the group with no ROP (p < 0.001 for each comparison).

ROPScore outcomes
The accuracy of ROPScore in predicting ROP in our participants was assessed with the ROC curve (Fig. 1). Sensitivity and specificity were obtained for continuous score values by using cut-off points. ROPScore values ranged from 7.2 to 19.6. The optimal cut-off point established for any stage of ROP was 12.3 (55.8% sensitivity and 77.8% specificity), whereas the optimal cut-off point for severe ROP was 13.3 (50.0% sensitivity and 87.0% specificity).
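To make the cut-off procedure concrete, the sketch below computes sensitivity and specificity at a given cut-off and then picks the cut-off that maximizes Youden's index (sensitivity + specificity - 1). The text does not state which criterion was used to define the "optimal" cut-off, so Youden's index is an assumption here, and the scores and outcomes are invented illustrative values, not study data.

```python
def sens_spec(scores, has_rop, cutoff):
    """Sensitivity and specificity when score >= cutoff is called high risk."""
    pairs = list(zip(scores, has_rop))
    tp = sum(1 for s, y in pairs if y and s >= cutoff)
    fn = sum(1 for s, y in pairs if y and s < cutoff)
    tn = sum(1 for s, y in pairs if not y and s < cutoff)
    fp = sum(1 for s, y in pairs if not y and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def best_cutoff(scores, has_rop):
    """Cut-off maximizing Youden's J (an assumed optimality criterion)."""
    def youden(cutoff):
        sens, spec = sens_spec(scores, has_rop, cutoff)
        return sens + spec - 1
    # Each observed score is tried as a candidate cut-off.
    return max(sorted(set(scores)), key=youden)


# Hypothetical scores and outcomes for ten infants (not study data).
scores = [8.0, 9.5, 10.1, 11.0, 12.5, 13.0, 13.4, 14.2, 15.0, 16.1]
has_rop = [False, False, False, True, False, False, True, False, True, True]

cutoff = best_cutoff(scores, has_rop)
sens, spec = sens_spec(scores, has_rop, cutoff)
print(f"cut-off {cutoff}: sensitivity {sens:.1%}, specificity {spec:.1%}")
```

Raising the cut-off trades sensitivity for specificity, which is the pattern seen in the reported values for any-stage ROP (12.3) and severe ROP (13.3).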
The areas under the ROC curve for ROPScore were 0.70 and 0.76 for predicting any stage of ROP and severe ROP, respectively. For severe ROP, the area under the curve for ROPScore was significantly higher than the areas for BW (0.60), GA (0.73), and duration of ventilation (0.63) when each was measured separately (Table 2).
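The area under the ROC curve has a direct interpretation: it is the probability that a randomly chosen infant with the outcome has a higher score than a randomly chosen infant without it, with ties counted as one half. The sketch below computes the area this way by pairwise comparison. This equivalence is a standard result and is not necessarily how the statistical software computed the reported values, and the data are hypothetical.

```python
def auc(scores, has_outcome):
    """Area under the ROC curve via pairwise comparison of cases and non-cases."""
    cases = [s for s, y in zip(scores, has_outcome) if y]
    controls = [s for s, y in zip(scores, has_outcome) if not y]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    # A case scoring higher than a non-case counts 1, a tie counts 0.5.
    wins = sum(
        1.0 if c > n else 0.5 if c == n else 0.0
        for c in cases
        for n in controls
    )
    return wins / (len(cases) * len(controls))


# Hypothetical scores for five infants (not study data).
scores = [9.0, 11.5, 12.0, 13.5, 14.0]
has_outcome = [False, True, False, False, True]
print(f"AUC = {auc(scores, has_outcome):.2f}")
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, which gives context to the reported areas of 0.60 to 0.76.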

ROPScore and infant characteristics
Multivariate logistic regression analysis showed that BW, GA, duration of ventilation, number of blood transfusions, and weight gain at the sixth week of life were risk factors for ROP. ROPScore was a weaker predictor of ROP. The unadjusted coefficient was 0.064, corresponding to an odds ratio of 1.07 (95% confidence interval [CI], 1.03 to 1.11). The adjusted coefficient was 1.088, corresponding to an odds ratio of 2.97 (95% CI, 0.84 to 10.45), an interval that includes 1 (Table 3).
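In logistic regression the odds ratio is the exponential of the coefficient, so the reported pairs can be checked with a few lines of arithmetic. The sketch below uses the coefficients and confidence limits quoted above. The helper names are illustrative, and the check on whether the interval excludes 1 is the usual reading of significance at the 5% level rather than a test reported by the authors.

```python
import math


def odds_ratio(coefficient: float) -> float:
    """Odds ratio implied by a logistic regression coefficient."""
    return math.exp(coefficient)


def ci_excludes_one(lower: float, upper: float) -> bool:
    """True when a confidence interval for an odds ratio does not contain 1."""
    return lower > 1.0 or upper < 1.0


# Coefficients and 95% CI limits as reported in the text (Table 3).
reported = {
    "unadjusted": (0.064, 1.03, 1.11),
    "adjusted": (1.088, 0.84, 10.45),
}

for label, (beta, lower, upper) in reported.items():
    print(
        f"{label}: OR = {odds_ratio(beta):.2f}, "
        f"CI excludes 1: {ci_excludes_one(lower, upper)}"
    )
```

Both reported odds ratios agree with their coefficients after rounding, and only the unadjusted interval excludes 1.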

CHOP ROP model outcomes
The CHOP ROP model correctly predicted all infants who developed type 1 ROP (sensitivity, 100%), but with low specificity: 21.4% from birth to six weeks of life, 41.2% in the third week, 36.9% in the fourth week, 32.6% in the fifth week, and 38.0% in the sixth week. These results are summarized in Table 4.

Eckert et al. developed a relatively uncomplicated model for predicting ROP in preterm infants, known as ROPScore [7]. The model is implemented in an Excel spreadsheet that applies a logistic regression equation to calculate risk. The model includes continuous rather than dichotomized terms for BW and GA, weight gain at a single time point (6 weeks postnatal age) as a proportion of BW, and dichotomous terms for blood transfusion and the use of oxygen in mechanical ventilation during the first 6 weeks of life. Assuming a specific cut-off level for low- or high-risk cases, ROPScore had a sensitivity of 98% and specificity of 56% for predicting ROP cases that required treatment in a cohort of 474 Brazilian infants [7]. In the present study, ROPScore had a sensitivity of 50% and specificity of 87% for predicting ROP cases that required treatment in a cohort of 3587 Chinese infants. These findings suggest that ROPScore should not be used to determine overall screening criteria. Instead, it should be used to reduce the frequency of exams in low-risk infants [7].
The poor performance of postnatal weight gain ROP models in countries with developing neonatal care systems may be related to differences in ROP pathophysiology, particularly in older GA infants. At older post-menstrual ages, endogenous production of insulin-like growth factor-1 (IGF-1) has already increased, such that low IGF-1 may play a smaller role in the pathogenesis of severe ROP [20]. In contrast, ROP in such infants might be driven primarily by high oxygen exposure, which has been shown to cause inhibition of vascular endothelial growth factor and retinal blood vessel destruction in oxygen-induced animal models of ROP.
Notably, other predictive models currently undergoing testing in ROP also have limitations. For example, WINROP [21] was proposed for use in European populations and has been validated by several studies [12,22,23,24], which have shown robust effectiveness in predicting ROP. However, some studies have shown that this score does not perform well in underdeveloped countries, in which moderate and late preterm infants can also develop ROP [25,26].
We validated the CHOP ROP model in a large cohort of Chinese infants. The size of the cohort, including 180 infants who developed severe ROP, allowed us to estimate the sensitivity of the model with a high degree of precision. This study showed that the CHOP ROP model can be applied clinically to reduce the number of infants requiring examinations by one-third. No infants with type 1 ROP were excluded (sensitivity, 100%) using this model, which showed higher sensitivity than in the evaluation of North American infants (sensitivity, 98.5%) [11]. Therefore, the CHOP ROP model could be used with confidence, ensuring that all infants with type 1 ROP are identified. The model can also be used to guide a modified screening schedule to reduce the number of examinations for lower-risk, older-GA infants.
In China, the prevalence of ROP varies according to the region, level of neonatal care, and access to ophthalmologic screening programs. Importantly, blindness caused by ROP can be prevented with timely screening [27]. The CHOP ROP and ROPScore models are useful for predicting ROP. Scoring systems have become widely used in neonatology, including neonatal intensive care, in order to aid in the detection of comorbidities. Predictive algorithms represent promising and appropriate tools that can be used to identify preterm infants at risk of developing severe ROP, as well as to reduce the excessive number of examinations performed for each preterm infant [28]. The CHOP ROP model was more sensitive than ROPScore for predicting type 1 ROP. The introduction of predictive algorithms remains in the preliminary phase and it should be emphasized that the goal is not to replace current screening guidelines. Rather, these tools can be used to help reduce the incidence of missed diagnoses of ROP [29,30].
Despite the positive aspects of these predictive algorithms, there are also limitations in clinical application. First, the ROPScore calculation uses the infant's weight only at the sixth week of life. Hence, this test may be unable to detect high-risk preterm infants in whom aggressive posterior ROP begins before the weight measurement and then evolves rapidly [30]. Moreover, early hospital discharge of preterm infants who show robust growth is another factor that contributes to failure in collecting

Conclusion
We demonstrated that the ROPScore and CHOP ROP models were effective, promising, and noninvasive screening tools for the prediction of ROP in a Chinese population of preterm infants. The results obtained by Eckert et al. [7] were compatible with the results obtained in the present cohort regarding high sensitivity. With regard to ROPScore cut-off points, we adjusted the values for use in a Chinese population (12.3 and 13.3 for any stage of ROP and severe ROP, respectively), similar to the cut-off points used in the original study [7]. This suggests that the cut-off points would have been sufficient to detect all preterm infants with severe ROP. However, the sensitivity was lower than that reported by Eckert et al. [7]. Thus, the ROPScore may need optimization for the Chinese population. The sensitivity of the CHOP ROP model was higher in our study than in the North American infants reported by Binenbaum et al. Therefore, the CHOP ROP model may be more appropriate for the Chinese population.