J Cancer 2021; 12(15):4561-4573. doi:10.7150/jca.58887

Research Paper

Rapid multi-dynamic algorithm for gray image analysis of the stroma percentage on colorectal cancer

Tengfei Li1,2#, Zekuan Yu3,4 Corresponding address#, Yan Yang2#, Zhongmao Fu1#, Ziang Chen5, Qi Li6, Kundong Zhang1, Zai Luo1, Zhengjun Qiu1, Chen Huang1 Corresponding address

1. Department of Gastrointestinal Surgery, Shanghai General Hospital, Shanghai Jiao Tong University, Shanghai 201600, China.
2. Graduate School of Bengbu Medical College, Bengbu 233000, China.
3. Academy for Engineering and Technology, Fudan University, Shanghai 200433, China.
4. Key Laboratory of Industrial Dust Prevention and Control & Occupational Health and Safety, Ministry of Education.
5. School of Medical Instrument and Food Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China.
6. Department of Medical Oncology, Shuguang Hospital, Shanghai University of Traditional Chinese Medicine, Shanghai 200021, China.
#These authors contributed equally to the work.

This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/). See http://ivyspring.com/terms for full terms and conditions.
Li T, Yu Z, Yang Y, Fu Z, Chen Z, Li Q, Zhang K, Luo Z, Qiu Z, Huang C. Rapid multi-dynamic algorithm for gray image analysis of the stroma percentage on colorectal cancer. J Cancer 2021; 12(15):4561-4573. doi:10.7150/jca.58887. Available from https://www.jcancer.org/v12p4561.htm

File import instruction


Background: Tumor stroma percentage (TSP), as an independent, low-cost prognostic factor, could complement current pathology and act as a more feasible risk factor for prognosis. However, TSP hadn't been applied into TNM staging. Here, the objective of our study was to investigate the prognostic significance of TSP in a robust rapid multi-dynamic approach with the application of MATLAB and threshold Algorithm for Gray Image analysis.

Methods: Using a retrospective collection of 1539 CRC patients comprising three independent cohorts; one SGH cohort (N=996) and two validation cohorts (N =106, N= 437) from 2 institutions. We investigated 996 CRC of no special type. According to our established thresholds, 357 cases (35.84%) were classified as TSP-high and 639 cases (64.16%) as TSP-low. We determined the gray image area as the stromal part of the WSI and calculated the stroma percentage with our proposed method on MATLAB software.

Results: In both TSP-cad(50%) and TSP-cad(median), multivariate analysis showed the TSP-cad was an independent prognostic factor for the vessel invasion and tumor location. For OS, TSP-manual HR=1.512 (95% CI 1.045-2.187); TSP-cad HR=1.443 (95% CI 0.993-2.097) and TSP-cad(median) HR=1.632 (95% CI 1.105-2.410). Fortunately, TSP-manual and TSP-cad were also found independent prognostic factor in all the cohorts. It was found that TSP-cad had slightly higher HR and wider CI than TSP-manual.

Conclusions: Our research showed that TSP was an independent prognostic factor in CRC. Moreover, threshold algorithm for the quantitation of TSP could be established. In conclusion, with this Rapid multi-dynamic threshold Algorithm for Gray Image counting of TSP, which showed a higher accuracy than manual evaluation by pathologists and could be a practical method for CRC to guide clinical decision making.

Keywords: Colorectal cancer, Tumor stroma percentage, Rapid multi-dynamic, Threshold algorithm, Gary image


Colorectal cancer (CRC) is the third most common malignancy worldwide and the third leading cause of cancer-related mortality. The latest cancer statistics of the United States in 2020 showed that the estimated new cases of CRC accounted for nine percent in male population with cancer and eight percent in female with cancer. Overall, for all the solid tumors, CRC was ranked the forth in the morbidity and the second in mortality [1], in which roughly one fourth of the patients were diagnosed with stage II cancer. According to the tumor-node-metastasis (TNM) system of the American Joint Committee on Cancer (AJCC) classification, histological subtype was commonly applied in the staging of CRC [2]. However, the TNM system was proved insufficient to predict the prognosis of patients with stage II CRC [3]. There were some limitations in this classification since prognosis of patients in the same stage varied greatly [4,5]. NIH guidelines recommend chemotherapy for potential high-risk stage II CRC, actually 5-year relapse-free survival (RFS) ranged from 44% to 83% in the stage III CRC [6-8]. Due to early diagnosis and treatment, the mortality rate of CRC was declining. Pathologists usually visually evaluated tumor stroma percentage (TSP) on hematoxylin-eosin (H&E) stained sections under a microscope [9]. There was still an increasing awareness of the clinical importance of evaluating TSP, also known as tumor stroma ratio (TSR) on postoperative H&E-stained sections [10]. It was crucial to optimize risk stratification through personalized treatment to prevent undertreatment and overtreatment. The computer-aided detection (CAD) systems had been developed to quantify stroma percentage. Similar methods using Deep Learning CAD in CRC have shown better predictive power than expert human visual assessment [11]. Therefore, there is an urgent need to apply semiautomatic software to rapidly analyze a large number of pathological sections and to complete current TNM staging in order to predict the prognosis of tumor patients.

Nowadays, it has been well acknowledged that the occurrence, growth and development of tumor is depended on the tumor microenvironment, and the tumor stromal is an indispensable part of the tumor microenvironment [12]. Recently, attention has been attached to TSP's promising potential role in the prognosis of various tumor types. Therefore, TSP has become an important prognostic indicator for different tumor types [13-16].

Based on tumor-associated stroma, a promising prognostic parameter is the TSP. Previous studies further confirmed that TSP, as an independent, low-cost prognostic factor, was capable of completing current pathology which was reported as a more feasible risk factor for prognosis. The TSP was a prognostic tool that stratified tumors into TSP-low and TSP-high based on the quantity of stroma percentage in H&E-stained sections of specimens and it was proved to be a strong and independent prognostic parameter [10,17]. It was shown that high stroma tumor was of poor prognosis in CRC as well as in other solid epithelial tumors [18-21], whereas low stroma tumor was predicted with a more favorable outcome. So far, manual visual assessment, including open source software semi-automatic assessment, was commonly used for evaluating the stroma percentage in clinic [11,22,23]. Previous studies shown that TSP now could be evaluated by manual visual and semi-automatic methods [11,14,24,25]. Tumor area was manually annotated in H&E-stained whole slide images (WSI) and thus a digital image feature of pathology was constructed. Quantitative features were extracted and reduced from the selected patches of tumor cell dense area [26]. Furthermore, a deep learning-based algorithm could perform automated TSP assessment of the CRC subclass of rectum adenocarcinomas by the developed CNN (Convolutional Neural Networks) [11,27].

Unfortunately, the quantification of stroma percentage through manual visual assessment was mainly depended on the intra- and inter-observer variability. Only a small portion of malignant tumors were evaluated, which weren't capable of representing all the malignant tumors. Their overall accuracy in judging the stroma percentage was often limited. Besides, assessment of TSP wasn't widely adopted as a prognostic variable in CRC due to its lack of standardization. Therefore, we need a reliable semi-automatic method to enhance the competence of current treatment strategies to predict TSP in CRC. Based on supervised machine learning and pixel classification, a rapid multi-dynamic threshold algorithm which included image-based semi-automatic approach with open source software MATLAB (R2018b (V9.5); https://www.mathworks.com/), was built, trained and validated by using H&E-stained sections of specimens recently. This method had obtained good results by using robust rapid multi-dynamic algorithm.

By applying our semi-automatic method to select areas from whole-slide scanning images, it is ultimately intended to investigate and quantify the rapid multi-dynamic algorithm approaches in the stroma percentage of CRC stratified by the TSP, which may have increased prognostic significance of TSP.

Materials and methods

Patients' materials

The Shanghai General Hospital (SGH) cohort (N=996) were identified from a retrospectively collected database of patients undergoing surgery for CRC at the Shanghai General Hospital between 2014 and 2018. None of the patients had received neoadjuvant chemotherapy or radiotherapy and no mortality within 30 days of surgery. Furthermore, without formal colonic or rectal resection, patients who received endoscopy treatment were excluded from the research. Patients were staged clinically according to the 8th edition of American Joint Committee (AJCC) TNM classification. Clinicopathological characteristics were diagnosed and confirmed by two independent pathologists according to the guidelines of the AJCC on Cancer, and were presented in Table 1. Written informed consent was obtained from each patient before enrolling in the study. The study was approved by the Ethical Committee for Clinical Research of Shanghai General Hospital. We used computer-generated random numbers to assign 703 patients to the training cohort and 293 patients to the testing cohort. Besides, for the internal validation Tissue Microarray (TMA) cohort, an additional 106 paired CRC were collected from patients diagnosed with CRC at the General Surgery Department of Shanghai General Hospital from 2013 to 2014. All specimens, to construct the TMA, were paraffin-embedded, validated by H&E staining, and finally examined by two independent pathologists. For the external validation, The Cancer Genome Atlas (TCGA) cohort, an additional 437 patients with CRC stage I-Ⅳ were selected from the TCGA dates (https://portal.gdc.cancer.gov/). Two independent validation cohort of 543 consecutive patients were included using the same criteria as those of the SGH cohort to validate the predictive performance of the cohort. Detailed characteristics of patients for TMA cohort and TCGA cohort could be found in Supplementary Table 1 and Supplementary Table 2, respectively.

H&E-stained sections scanning and imaging

H&E-stained sections were scanned using KF-PRO series automatic digital slice scanning system under 10× magnification. Each raw scanning whole-slide image (WSI) was annotated by a pathologist highlighting the region of interest (ROI) (Fig. 1WSI) using the digital slice reading software K-Viewer (Konfoong Biotech, NB, China,, which included the tumor and stroma. This software could be employed here to view the scanning images and to realize image zooming, annotation, tagging and other functions, which was compatible with local computers and networked mobile reading modes. From each whole-slide scanning image, three smaller images were extracted from the ROI. These annotations were adopted to calculate the map area for comparing the proportion of tumor and stroma in each section by using the open source image process software MATLAB. Therefore, three area representative of the tumor invasive fields of 1496 ×996 pixels (40× field) were selected (Fig. 1A, B and C). An area was examined under ×4 magnification to ensure that tumor cells was presented for an all-round view. For the multiple available parts, it was to give out the score of each part and to compute the average. A total of 2298 images were extracted from 996 WSIs. Each extracted image of the 40× fields was based on the annotations of an experienced pathologist on H&E-stained sections. Next, the selected imaging was analyzed by MATLAB. Since whole-slide scanning images contained various cells/tissues (e.g. fat, muscle, lymphocyte infiltrations, necrosis, healthy epithelium, stroma, erythrocytes, tumor, etc.), a semi-automatic learning algorithm was developed to segment the tissue (“Stroma”, “Tumor” and “Necrosis”) based on 40× fields of tumors originating from different patients. Therefore, the 2298 images were annotated using tumor, stroma and necrosis labels (Fig. 1D).

 Figure 1 

Example of a manual assessment at the tumor level. (WSI) Pathology signature construction in hematoxylin and eosin stained whole slide images. ROI, Area was annotated by a pathologist highlighting the region of interest. (A) Red annotation is the most invasive part. (B) Blue annotation. (C) Green annotation. (D) images were annotated using Black-tumor, Purple-stroma and Green-necrosis. A: red annotation, is the most invasive part; B: blue annotation; C: green annotation; Black: tumor; Purple: stroma; Green: necrosis.

J Cancer Image (Click on the image to enlarge.)
 Table 1 

Clinicopathological data for SGH cohort in relation to TSP

NTSP-low (%)TSP-high (%)p valueNTSP-low (%)TSP-high (%)P valueNTSP-low (%)TSP-high (%)P value
Age (years)0.8150.8980.340
TNM Stage0.024*0.025*0.004*
Depth of invasion0.3260.3970.299
LN metastasis0.035*0.035*0.005*
Nerve invasion0.5580.3820.277
Vessel invasion0.4080.2610.453
Tumor size0.4760.1741.000
<5 cm620403(63.1%)217(60.8%)620452(63.6%)168(58.9%)620310(62.2%)310(62.2%)
≥5 cm376236(63.9%)140(39.2%)376259(36.4%)117(41.1%)376188(37.8%)188(37.8%)
Tumor location0.7620.4810.418
Left and rectal670432(67.7%)238(66.7%)670483(67.9%)187(65.6%)670329(66.1%)341(68.5%)

Note. P-value is derived from the univariable association analyses between each of the clinicopathological variables and treatment response. The clinical characters were the data from the initial diagnosis.

Abbreviations: LN metastasis: Lymph node metastasis.

*P < 0.05.

Manual evaluation of the TSP

The primary tumors of both cohorts were scored for TSP on 5 μm H&E-stained tissue sections as described previously[10,19]. The tissue samples selected were those were defined as the most invasive part of the primary tumors as used by the pathologists to determine the T-status (Fig. 1A). To determine the TSP, the region with the highest stroma was selected using an 10× objective. A microscopy field was scored where tumor cells were presented at all borders of the image field (north, south, east, west) of the 4× objective. In case of tumor heterogeneity, each image field score was given by ten-fold of the scoring percentage in which the lowest one was selected as the final scoring percentage.

All slides were first scanned and digitized using the K-Viewer with the ×10 objective. Then, a representative area showing the most invasive part at low magnification (×4 objective) was selected. Subsequently, a single area, surrounded by tumor cells, in which both stroma and tumor existed at high magnification (×40 objective) was chosen. Despite some heterogeneity in the TSP among biopsy tissue blocks throughout the entire slide, the regions with the largest amount of stroma and the worst differentiation were selected as the representative object for analysis according to the previous study. Tissues that contained mucin or necrosis in the selected field were visually excluded. The interrater reliability was evaluated.

The TSP was visually calculated (per tenfold: 10%, 20%, 30% and so forth) per field. In this study, it was considered that the tumor and stroma proportions were complementary. For example, TSP 70% represented that stroma accounted for 70% of the entire tumor tissue, and tumor cells accounted for 30%. In this study, a TSP ≤50% was categorized as TSP-low (Fig. 2A), and a TSP >50% was regarded as TSP-high (Fig. 2B).

CAD evaluation of the TSP

Analysis of on postoperative H&E-stained images of CRC was performed by using the CAD systems and an open source software MATLAB was previously described in detail [11]. Examples of representative TSP images generated by the pixel classifier in MATLAB is presented. The TSP images in the CAD systems included evaluating the ratio of tumor, stroma and necrosis (Fig. 3). The percentage of stroma was evaluated and was expressed and converted into WSIs. Here, TSP was the percentage of stroma based on the proposed CAD system with MATLAB software. We determined the gray image area as the stromal part of the WSI and calculated the stroma percentage on MATLAB software (Supplementary Fig. 1). Following background correction using two negative control sections, the images were quantified for calculating the gray or optimizing threshold (Fig. 4). Digital pathological images were collected in the Original Red-Green-Blue (RGB) style. In the proposed algorithm, all the images were converted into the gray scale images at the first step. Median filter [28] were adopted to remove the noises and artefact of the images and histogram equalization was used to enhance the contrast of the Digital pathological images. Then, the images after denoising and contrast enhancement were shown in Fig. 4D. Operating and closing operations and Bilateral filter [29] were adopted in order to further restrain the noise, enhance the contrast of the different structures, which made images smoother and lighter. OTSU image segmentation method [30] were used in the proposed pipeline, the best threshold t of this pipeline was set to be 0.82 after many preliminary experiments, which could be accurately identified as cellular parenchyma and intercellular substance.

Statistical analyses and data analyses

We compared TSP-manual with TSP-cad as the prognostic factor in CRC in this study. Statistical analyses were performed using IBM SPSS Statistics software (version 24) and the figures were generated from GraphPad Prism 7 (version 7.02). Interobserver agreement on stroma percentage between the two pathologists and CAD system MATLAB software was analyzed by calculating intraclass correlation coefficients (ICC). Here, Pearson correlation analysis was adopted to compare the TSP evaluated by pathologists and the TSP from MATLAB of CAD system. We identified clinical factors associated with the TSP with binary logistic regression analysis. Univariable and multivariable Cox proportional hazard models were used to examine the significance of the clinical characteristics and stroma percentage as predictors of Overall Survival (OS). OS was defined as the time period between the date of primary surgery and the date of death from any cause or the date of last follow-up. Right sided tumors were defined as follows: coecum, colon ascendens, flexura hepatica, colon transversum and for left sided: flexura lienalis, colon descendens, colon sigmoideum, rectosigmoideum.

 Figure 2 

Manual visually assessment TSP. (A) TSP-manuala=20%≤50% was categorized as TSP-low with long OS. (B)TSP-manuala=70%>50% was regarded as TSP-high with short OS. Pathologista consensus.

J Cancer Image (Click on the image to enlarge.)
 Figure 3 

Representative matched H&E and segmented images of tumor (black), stroma (purple) and necrosis (green) using the pixel classifier algorithm in MATLAB. Top row: Stroma hot-spot rectangle, 100 ×150 μm across, selected by the observers for the assessment of TSP-manual and extracted with a diameter of x400 field for processing by the MATLAB Bottom row: The same regions in tissues was segmented by MATLAB to calculate TSP-cad. Pathologista consensus; Necrosisb includes classes: fat, muscle, lymphocyte infiltrations, healthy epithelium, erythrocytes.

J Cancer Image (Click on the image to enlarge.)
 Figure 4 

Schematic representation of the proposed algorithm.

J Cancer Image (Click on the image to enlarge.)

The two TSP-manual agreements between the two pathologists and CAD system MATLAB software were calculated using Cohen's Kappa (κ) on the dichotomized TSP values. Interobserver agreement was classified as “slight” (k =0.00-0.20), “fair” (k = 0.21-0.40), “moderate” (k = 0.41-0.60), “good” (k = 0.61-0.80), or “excellent” (k = 0.81-1.00). Paired and unpaired continuous variables were compared by Student's t test or the Mann-Whitney U test. Performance of the predictive TSP was evaluated from multiple dimensions. The linear trend χ2 score were used to assess discriminatory ability and monotonicity of each TSP. The likelihood-ratio (LR) χ2 test was used to assess homogeneity between TSP. Kaplan-Meier log rank analysis was used to examine the effect of TSP on OS. Univariate survival analysis for TSP-manual and TSP-cad used Cox proportional hazards regression to calculate 95% confidence intervals (95% CI) and hazard ratios (HRs). The linear trend ×2 test was used to analyze the correlation between TSP and clinicopathological features. P value <0.05 was considered statistically significant. All analyses were performed using IBM SPSS Statistics software (version 24).


Clinicopathological data

We then determined whether TSP acted as a valuable variate for the survival and prognosis of CRC patients. First, for SGH cohort, the univariate analysis indicated that patients with TSP-high (TSP>50%) had poorer OS (P=0.0023) compared with those with TSP-low (TSP≤50%) (Fig. 5A). The TNM stages, depth of invasion, lymph node metastasis, nerve invasion, vessel invasion, differentiation status, tumor location and TSP were significantly related to OS, based on the univariate analysis (Table 2 and Supplementary Fig. 2). We conclude that TSP is of prognostic value for patients with either left and rectum tumor or right sided tumor, although for patients with a right sided tumor this is more evident [10]. Second, according to the multivariate survival analysis based on the above factors in the Cox proportional hazards model, TSP-high (TSP>50%) was significantly related with OS (HR = 1.750; 95% CI, 1.214-2.523; P=0.003). These findings indicated that TSP-high (TSP>50%) was shown to have a poor prognosis in CRC.

Performance measures

A comparison between TSP drawn by a manual pathologist (TSP-manual) and those provided by the proposed method (TSP-cad) was carried out to assess the rapid multi-dynamic threshold algorithm performance in the segmentation of tumor stroma. True positive (TP) was the number of pixels existed both in TSP-manual and TSP-cad; false negative (FN) referred to all pixels which were failed to identified by the algorithm; false positive (FP) was all the pixels were identified automatically instead of manually. The segmentation performance which was evaluated by calculating the sensitivity, specificity, precision, defined as followed: We found that the overall accuracy was 90% (T Li,Y Yang), which showed improvement in researches by T Li et al. Values of per-class sensitivity, specificity, Precision are listed in Table 3. The Receiver Operating Characteristic Curve (ROC) [31] were shown in Fig. 6. The AUC (Area Under Curve) value was an important indicator for evaluating segmentation results because it was independent of the threshold. The AUC value of the TSP in the SGH cohort was 0.9 (Fig. 6), which had met the requirements of clinical screening.

Agreement of TSP-manual and TSP-cad

The ICC between the two pathologist for the TSP-manual was 0.545 (95% confidence interval (95% CI) 0.500-0.588). The picture depicted the co-occurrence of TSP scores evaluated by two pathologists (Fig. 7). The ICC's between TSP-cad and TSP-manual were 0.822 (95% CI 0.801-0.842). For the SGH cohort, according to the 50% cut-off value described previously, after dichotomizing the TSP visual object, a moderate agreement was found between the two pathologist (k=0.509). It was observed that a relatively high consistency only existed between TSP-manual and TSP-cad (k=0.813). When median was used as the cut-off value of TSP-cad, the consistency was slightly improved (k=0.552).The results were TSP-cad(median)≤41.30% (TSP-low), TSP-cad(median) >41.3%(TSP-high). The pathologist divided the patients into TSP-low group or TSP-high group. These results were shown in Table 4 and Table 5.

 Table 2 

Uni- and multivariate Cox regression analysis for overall survival in the SGH cohort

Univariate analysisMultivariate analysis
Age (years)
TNM Stage
Depth of invasion
LN metastasis
Nerve invasion
Vessel invasion
Differentiation status
Tumor size
<5 cm1
≥5 cm1.2870.889-1.862
Tumor location
Left and rectal0.5020.348-0.723*0.4980.342-0.725*0.4920.329-0.717*0.4860.334-0.707*

aPathology consensus;

bDuo to P-values is 0.051,which is near to 0.05, and shall be considered to siginificant results (P<0.05);

Abbreviations: HR: hazard ratio; CI: confidence interval; LN metastasis:Lymph node metastasis.

*P < 0.05.

 Figure 5 

Kaplan-Meier survival analysis for overall survival of TSP-low verse TSP-high. Results for SGH cohort (A), Training cohort (B), Testing cohort (C), TMA internal cohort (D), TCGA external cohort (E) according to the TSP classifier stratified by clinicopathological risk factors. P-values were calculated by log-rank test.

J Cancer Image (Click on the image to enlarge.)
 Figure 6 

ROC curve of the TSP-manualc in the SGH cohort.

J Cancer Image (Click on the image to enlarge.)
 Figure 7 

Scatter plot of TSP in 996 patients for Pathologist 1 and Pathologist 2. Overlapping parts indicated the consistent situation (321 in total) for the observers and the isolated one (24 in total) the inconsistent.

J Cancer Image (Click on the image to enlarge.)
 Table 3 

Comparison between TSP-manual and TSP-cad in the segmentation of tumor stroma

 Table 4 

Cross-tabulation of Pathologist 1 versus Pathologist 2 after dichotomisation in SGH cohort

Pathologist 1
Pathologist 2, K=0.509

Using the same cut-off value of TSP-cad, for the Training cohort, the kappa between the TSP-manual and TSP-cad(50%) was 0.827, the kappa between the TSP-manual and TSP-cad(median) was 0.543 (Supplementary Table 3). For the testing cohort, the kappa between the TSP-manual and TSP-cad(50%) was 0.776, the kappa between the TSP-manual and TSP-cad(median) was 0.575 (Supplementary Table 4).

 Table 5 

Cross-tabulation of TSP-manual versus TSP-cad after dichotomisation in the SGH cohort

TSP-cad (50%), K=0.813
TSP-cad (median), K=0.552

Prognostic analyses

The established cut-off values at TSP were TSP-low≤50% and TSP-high>50%. In the SGH cohort, according to these thresholds, 357 cases (35.84%) were classified as TSP-high, 639 cases (64.16%) were classified as TSP-low. In the SGH cohort, a TSP-high was associated with TNM stage and lymph node metastasis in comparison to TSP-low (p=0.024 and p=0.035, Table 1, Table 6). In the TMA cohort, a TSP-high was associated with TNM stage and lymph node metastasis in comparison to TSP-low (p<0.001 and p<0.001, Table S1, Table 6). In the TCGA cohort, a TSP-high was associated with TNM stage and lymph node metastasis in comparison to TSP-low (p=0.024 and p=0.028, Supplementary Table 1, Table 6).

A further Kaplan-Meier analysis in the overall patient population showed a significant adverse OS for patients with a TSP-high and TSP-low. In the SGH cohort, the 5-year survival rate was separately 84.37% in patients with TSP-low and 76.69% in patients with TSP-high. Thereby, For the Training cohort, in the overall population (n=703), 255 patients (36.27%) were assigned to TSP-high group, 448 patients (63.73%) to the TSP-low group. For the Testing cohort, in the overall population (n=293), 102 patients (34.81%) were assigned to TSP-high group, 191 patients (65.19%) to the TSP-low group. A TSP-high was associated with TNM stage and worse OS in comparison to the TSP-low group (p=0.03 and p=0.016, Table 6 and Fig. 5C). The performance of each TSP method was assessed and summarized in Supplementary Table 5. In SGH cohort, TSP-cad(median) showed higher Linear trend χ2 score (OS: 13.445), higher LR test (OS:13.673) compared to the TSP-manual Linear trend χ2 score (OS: 12.025) LR test (OS: 11.601) and TSP-cad(50%) Linear trend χ2 score (OS: 10.953) LR test (OS: 10.3).

 Table 6 

TSP-high was associated with TNM stage and lymph node metastasis in comparison to TSP-low in SGH, Training, Testing, TMA internal, TCGA external cohort

TSP-manualTSP-cad (50%)TSP-cad(median)
p value
SGH cohort0.024*0.025*0.004*
Training cohort0.4230.3230.052
TNM stage
Testing cohort0.003*0.008*0.023*
TMA cohort0.001*NA0.001*
TCGA cohort0.024*NA0.032*
SGH cohort0.035*0.035*0.005*
Training cohort0.3670.2580.063
Lymph node
Testing cohort0.012*0.026*0.023*
TMA cohort0.001*NA0.002*
TCGA cohort0.028*NA0.036*

Abbreviations: NA, not applicable. Due to we do not evaluate TSP using MATLAB of CAD system in the TMA and TCGA cohort, thus we lack of TSP-cad(50%)data in TMA and TCGA cohort.

*P < 0.05.

For TSP-manual, there was a significantly lower OS in the TSP-high compared to the TSP-low group in the SGH cohort (p=0.0023, Fig. 5A). The same outcomes were seen in the Training, Testing, TMA internal, TCGA external cohort (p=0.0349, p=0.0208, p<0.0001, p=0.0075, Fig. 5, respectively). Similarly, Due to we do not evaluate TSP using MATLAB of CAD system in the TMA and TCGA cohort, thus we lack of TSP-cad data in TMA and TCGA cohort. For TSP-cad(50%), the same outcome was seen in the SGH, Training, Testing cohort(p=0.0068, p=0.0305, p=0.0897, respectively), meanwhile, TSP-cad(median) had the same results ( p=0.002, p=0.0165, p=0.0469, respectively) (Supplementary Fig. 3).

In the univariate analysis, all methods for evaluating the TSP were found to be prognostic for OS in the all cohorts. For the SGH cohort: TSP-manual HR = 1.75 (95% CI 1.214-2.523); TSP-cad(50%) HR = 1.659 (95% CI 1.145-2.405); and TSP-cad(median) HR = 1.823 (95% CI 1.239-2.683) (Table 2). For the Training cohort: TSP-manual HR = 1.61 (95% CI 1.030-2.517); TSP-cad(50%) HR = 1.638 (95% CI 1.042-2.573) and TSP-cad(median) HR = 1.759 (95% CI 1.101-2.809) (Supplementary Table 6). For the Testing cohort: TSP-manual HR = 1.75 (95% CI 1.124-2.523); TSP-cad(50%) HR = 1.659 (95% CI 1.145-2.405); and TSP-cad(median) HR = 1.823 (95% CI 1.239-2.683) (Supplementary Table 7). Whether Training or Testing cohort, the result that TSP-cad(median) hazard ratio was slightly higher hazard ratio than TSP-manual, we found that we had the same results as others[27,32]. Thus, we concluded that the optimal cut-off value for TSP-cad(median) should be lower than previous internationally recognized cut-off of 50%[10].

For multivariate analysis, the TSP-cad whether TSP-cad(50%) or TSP-cad(median), was found independent prognostic of vessel invasion and tumor location. For OS, In SGH cohort, TSP-manual HR = 1.516 (95% CI 1.048-2.193); TSP-cad(50%) HR = 1.452 (95% CI 0.999-2.111) and TSP-cad(median) HR=1.608 (95% CI 1.089-2.375) (Table 2).


Accurate assessment was essential for choosing the appropriate treatment. In routine clinical practice, the TNM staging system was a key prognostic determinant in CRC for oncologists and patients. However, the patients with the same cancer stage had different clinical outcomes, which indicated that the current staging system was not sufficient to predict prognosis. In this study, the CRC patients with high TSP was of poor prognosis based on the H&E-stained sections. There are differences between the assessments from pathologists' visual evaluation, which was not quite feasible to adopt widely [21]. Thus we established a threshold algorithm for gray Image, which had acted as a powerful tool different from manual naked eye assessment. Our threshold algorithm could accurately quantify the percentage of tumor stroma area, which greatly improved its forecasting accuracy. In addition, in predicting the survival time, the kaplan-Meier log rank analysis suggested that the outcomes of our threshold algorithm were similar to those of current TNM stage. Although TNM staging was crucial to assess prognosis and establish a treatment strategy, the staging is performed mainly on the basis of anatomical information. In contrast, the TSP-cad threshold algorithm could improve the accuracy of prognosis. Application of TSP-manual with H&E-stained sections might be a straightforward and clinically applicable procedure. In our study, the combination of CAD systems and TSP assessment in different gray sections had a better prognostic value than TSP-manual assessment. These results suggested that the TSP-cad reinforced the prognostic competence of TNM stage, thereby adding more prognostic value to TNM staging. These results suggested that threshold algorithm of the TSP-cad might become a new constituent of the classification of CRC.

Although TSP assessment not a standard component of therapies for patients with CRC, previous reports indicated that TSP could improve their outcomes. The evaluation of the entire tissue sample was vital for CRC prognosis where the tumor stroma was determined as parts in between the TSP-low and the TSP-high. For practical reasons, pathologists typically examined representative regions in each slide before they came up with a prognostic decision (i.e., sampling). However, sampling might lead to prognostic errors, particularly in tumors with heterogeneity. Since the computer analyzed the ROI, it could potentially assist the pathologist in inconsistency in identifying TSP-low or TSP-high regions from pathological slides. This could have an important effect on the prognostic decision and hence could help to reducing the intra- and inter-reader variability.

Our independent tests showed that TSP-cad could be well evaluated by threshold algorithm for Gray Image. The kappa value of TSP-cad(median) (kappa=0.552) was slightly higher than TSP-manual (kappa=0.509). The TSP-cad(median) served as an independent prognostic factor for OS in the testing, training, TMA internal, TCGA external cohorts, respectively. The prognostic value of the TSP-cad(median) was compared to the assessed in consensus by two pathologists for OS both in the univariate and multivariate analysis.

Our study was to investigate and quantify the rapid multi-dynamic threshold algorithm approaches to possibly add prognostic significance to the TSP. Furthermore, a mild consistency was found in the assessment from two pathologists. On the whole, our TSP-manual value was lower than that was recognized by the TSP-cad. The main reason was that our threshold algorithm could well identify necrotic blank areas, which concluded that the mean percentage of necrotic areas was 12.025%. Fortunately, TSP-manual and TSP-cad(median) were also found as an independent prognostic factor in the SGH and Testing cohorts. We found TSP-cad(median) (HR=1.823 95%, CI 1.239-2.683) had slightly higher HR and wider CI than TSP-manual (HR=1.75,95% CI 1.214-2.523). As is shown in Supplementary Table 5, we observed that TSP-cad(median) showed higher Linear trend χ2 score compared to the TSP-manual and TSP-cad(50%), indicating that TSP-cad(median) had excellent discriminatory power in predicting prognosis of distinct cohort.

A similar recent study showed that stromal value cut-point of 33% is slightly less than the previous internationally recognized cut-point of 50% in Triple Negative and Luminal Breast Cancer [31]. The discrepancy between pathologists and CAD system was also described between the cut-point for CAD analysis TSP assessment 65.47% versus 50% pathologists visual assessment in rectal cancer [11]. The percentage difference in cut-point suggests there may be a common discrepancy between pathologists versus CAD when assessing a tumor pathology image. For the univariate and multivariate analysis, our research showed that the cut-off value for TSP-cad(median) in CRC was 41.3%. Interestingly our TSP cut-point is similar to cut-off value 48.8% of Zhao [27]. Our results are both less than cut-point of 50% in CRC. Due to tumor heterogeneity, TSP varies greatly among patients with different tumors, and obvious interstitial fibrosis is one of the important features of pancreatic ductal adenocarcinoma, so cut-off may be inconsistent among different tumor types. Thus, there are reasons to believe that the optimal cut-off is less than previous internationally recognized cut-off of 50% in CRC, this also proves that our rapid multi-dynamic algorithm is highly accurate and practical. In addition, we did not separate the left colon and rectum into different groups. To the best of our knowledge, our study showed for the first time that we put the left colon and rectum together by using the threshold algorithm. In order to swiftly identify the TSP of the patient's postoperative pathological section, the rapid multi-dynamic method was applied into the future assessment methods of CRC.

In this article, we came up with an adaptive algorithm for the automatic discrimination between tumor and stroma percentage in CRC. To the best of our knowledge, the proposed method was a rapid multi-dynamic algorithm for the segmentation of stroma percentage in H&E-stained images of colorectal tissue. Also, TSP was firstly evaluated to verify its relevance to clinical related endpoints, and thus was acknowledged as a future biomarker in this setting. The algorithm was tested on 996 H&E images with high variation of staining intensities (Fig. 2). High segmentation performances were obtained for each image of the data cohort. Being a rapid multi-dynamic threshold algorithm, this algorithm could be used in future studies as a starting point to realize reliable systems for the tumor microenvironment and diagnosis. Another possible application of the threshold algorithm was to evaluate tumor's response to neoadjuvant therapy by comparing the relative proportion of percentage of tumor and the percentage of tumor stroma in the tumor stroma before and after chemotherapy. We analyzed the TSP of primary tumors from diagnostic biopsies and surgical resections. In surgically resected primary tumors, TSP low was detected in 47% of patients, a proportion number which was similar to the results in the literature. Our results demonstrate that tumors with low TSP had a significantly higher risk of vascular and neural invasion, along with T and N tumors, compared with tumors with high TSP. These observations confirmed statements in previous literature. Our research group was currently working on an extension of this algorithm for the semiautomatic quantification of the biomarkers expressed by stroma ingredient (e.g. Tumor-infiltrating lymphocytes, Tumor-associated macrophages, Cancer-associated fibroblasts) near the tumor boundary. Synchronously, we were carrying out large-scale promotion and verification of CRC data in five domestic hospitals.

To sum up, pathological pictures were digitally analyzed by using readily, available and free image processing and analysis software MALTAB. Our idea was to recalibrate with the software after the ratio of TSP was initially determined by the pathologist. In order to reduce individual differences in TSP judgment. By better understanding the role of the tumor stroma in CRC, our results showed that the rapid multi-dynamic algorithm could be a good prognostic information judgment. However, the results of a large population-based cohort study had not been widely reported. Although TSP hadn't been applied into TNM staging, it was believed to turn into a precise and customized therapy in the future. In future outside generalization validations, we continued to optimize this threshold grayscale image thresholding algorithm. We have reasons to believe that our study will be used on a large scale clinically in the future.

In conclusion, we find that in both TSP-cad(50%) and TSP-cad(median), multivariate analysis showed the TSP-cad was an independent prognostic factor for the vessel invasion and tumor location. Fortunately, TSP-manual and TSP-cad were also found independent prognostic factor in all the cohorts. It was found that TSP-cad had slightly higher HR and wider CI than TSP-manual and TSP-cad(median) showed higher Linear trend χ2 score compared to the TSP-manual and TSP-cad(50%). The optimal cut-off is less than previous internationally recognized cut-off of 50% in CRC. Thus, the current study demonstrates that rapid multi-dynamic threshold Algorithm for Gray Image counting of TSP using our proposed method on MATLAB software showed a higher accuracy than manual evaluation by pathologists.

Supplementary Material


Supplementary figures and tables.


This work was supported by the National Natural Science Foundation of China (grants 81772526 to C. Huang, 82072662 to C. Huang), Shanghai Municipal Education Commission-Gaofeng Clinical Medicine (grants 20161425 C.Huang), Shanghai Jiaotong University Medical Cross Fund (grants YG2017MS28 to C. Huang), Three-year Action Plan for Clinical Skills and Clinical Innovation in Shanghai-level Hospitals (grants SHDC2020CR4022 to C. Huang), Fudan Medical Device project (grants 20275,DGF501021-01 to Z. Yu), Independent Research fund of Key Laboratory of Industrial Dust Prevention and Control & Occupational Health and Safety, Ministry of Education (Anhui University of Science and Technology) (grants EK20201003 to Z.Yu).

Author Contributions

Conceptualization: Tengfei Li, Zekuan yu, Yan Yang and Zhongmao Fu; Data curation: Tengfei Li, Zekuan yu, Yan Yang, Zhongmao Fu, Ziang Chen, Qi Li, Kundong Zhang and Zai Luo; Funding acquisition: Zekuan yu and Chen Huang; Investigation: Qi Li, Kundong Zhang, Zai Luo and Zhengjun Qiu; Methodology: Tengfei Li, Zekuan yu, Yan Yang and Zhongmao Fu. Project administration: Tengfei Li, Zekuan yu, Zhengjun Qiu and Chen Huang; Software: Tengfei Li, Zekuan yu, Yan Yang, Zhongmao Fu and Ziang Chen; Validation: Qi Li, Kundong Zhang and Zai Luo; Visualization: Zekuan yu; Writing-original draft: Tengfei Li, Zekuan yu and Chen Huang.

Competing Interests

The authors have declared that no competing interest exists.


1. Siegel R.L, K.D. Miller, and A. Jemal, Cancer statistics, 2020. CA: A Cancer Journal for Clinicians. 2020;70:7-30

2. Weiser MR. AJCC 8th Edition: Colorectal Cancer. Ann Surg Oncol. 2018;25(6):1454-1455

3. Schneider NI, Langner C. Prognostic stratification of colorectal cancer patients: current perspectives. Cancer Manag Res. 2014;6:291-300 Published 2014 Jul 2

4. Compton CC. Optimal pathologic staging: defining stage II disease. Clin Cancer Res. 2007;13(22 Pt 2):6862s-70s

5. Lyall MS, Dundas SR, Curran S, Murray GI. Profiling markers of prognosis in colorectal cancer.Clin Cancer Res. 2006;12(4):1184-1191

6. Petrelli F, Labianca R, Zaniboni A. et al. Assessment of Duration and Effects of 3 vs 6 Months of Adjuvant Chemotherapy in High-Risk Stage II Colorectal Cancer: A Subgroup Analysis of the TOSCA Randomized Clinical Trial. JAMA Oncol. 2020;6(4):547-551

7. O'Connell JB, Maggard MA, Ko CY. Colon cancer survival rates with the new American Joint Committee on Cancer sixth edition staging. J Natl Cancer Inst. 2004;96(19):1420-1425

8. Merkel S, Wein A, Günther K, Papadopoulos T, Hohenberger W, Hermanek P. High-risk groups of patients with Stage II colon carcinoma. Cancer. 2001;92(6):1435-1443

9. van Pelt GW, Kjær-Frifeldt S, van Krieken JHJM. et al. Scoring the tumor-stroma ratio in colon cancer: procedure and recommendations. Virchows Arch. 2018;473(4):405-412

10. Mesker WE, Junggeburt JM, Szuhai K. et al. The carcinoma-stromal ratio of colon carcinoma is an independent factor for survival compared to lymph node status and tumor stage. Cell Oncol. 2007;29(5):387-398

11. Geessink OGF, Baidoshvili A, Klaase JM. et al. Computer aided quantification of intratumoral stroma yields an independent prognosticator in rectal cancer. Cell Oncol (Dordr). 2019;42(3):331-341

12. De Wever O, Mareel M. Role of tissue stroma in cancer cell invasion. J Pathol. 2003;200(4):429-447

13. Downey CL, Simpkins SA, White J. et al. The prognostic significance of tumour-stroma ratio in oestrogen receptor-positive breast cancer. Br J Cancer. 2014;110(7):1744-1747

14. Fu M, Chen D, Luo F. et al. Association of the tumour stroma percentage in the preoperative biopsies with lymph node metastasis in colorectal cancer. Br J Cancer. 2020;122(3):388-396

15. Gujam FJ, Edwards J, Mohammed ZM, Going JJ, McMillan DC. The relationship between the tumour stroma percentage, clinicopathological characteristics and outcome in patients with operable ductal breast cancer. Br J Cancer. 2014;111(1):157-165

16. Peng C, Liu J, Yang G, Li Y. The tumor-stromal ratio as a strong prognosticator for advanced gastric cancer patients: proposal of a new TSNM staging system. J Gastroenterol. 2018;53(5):606-617

17. Mesker WE, Liefers GJ, Junggeburt JM. et al. Presence of a high amount of stroma and downregulation of SMAD4 predict for worse survival for stage I-II colon cancer patients. Cell Oncol. 2009;31(3):169-178

18. Wu J, Liang C, Chen M, Su W. Association between tumor-stroma ratio and prognosis in solid tumor patients: a systematic review and meta-analysis. Oncotarget. 2016;7(42):68954-68965

19. Huijbers A, Tollenaar RA, v Pelt GW. et al. The proportion of tumor-stroma as a strong prognosticator for stage II and III colon cancer patients: validation in the VICTOR trial. Ann Oncol. 2013;24(1):179-185

20. West NP, Dattani M, McShane P. et al. The proportion of tumour cells is an independent predictor for survival in colorectal cancer patients. Br J Cancer. 2010;102(10):1519-1523

21. van Pelt GW, Sandberg TP, Morreau H. et al. The tumour-stroma ratio in colon cancer: the biological role and its prognostic impact. Histopathology. 2018;73(2):197-206

22. Martin B, Banner BM, Schäfer EM. et al. Tumor proportion in colon cancer: results from a semiautomatic image analysis approach. Virchows Arch. 2020;477(2):185-193

23. Park JH, Richards CH, McMillan DC, Horgan PG, Roxburgh CSD. The relationship between tumour stroma percentage, the tumour microenvironment and survival in patients with primary operable colorectal cancer. Ann Oncol. 2014;25(3):644-651

24. Yamaguchi K, Hara Y, Kitano I. et al. Tumor-stromal ratio (TSR) of invasive breast cancer: correlation with multi-parametric breast MRI findings. Br J Radiol. 2019;92(1097):20181032

25. He R, Li D, Liu B. et al. The prognostic value of tumor-stromal ratio combined with TNM staging system in esophagus squamous cell carcinoma. J Cancer. 2021;12(4):1105-1114

26. Zhang F, Yao S, Li Z. et al. Predicting treatment response to neoadjuvant chemoradiotherapy in local advanced rectal cancer by biopsy digital pathology image features. Clin Transl Med. 2020;10(2):e110

27. Zhao K, Li Z, Yao S. et al. Artificial intelligence quantified tumour-stroma ratio is an independent predictor for overall survival in resectable colorectal cancer. EBioMedicine. 2020;61:103054

28. Brownrigg D R K. The weighted median filter[J]. Communications of the ACM. 1984;27(8):807-818

29. Zhang B, Allebach JP. Adaptive bilateral filter for sharpness enhancement and noise removal. IEEE Trans Image Process. 2008;17(5):664-678

30. Otsu N. A threshold selection method from gray-level histograms[J]. IEEE transactions on systems, man, and cybernetics. 1979;9(1):62-66

31. Cook N R. Use and misuse of the receiver operating characteristic curve in risk prediction[J]. Circulation. 2007;115(7):928-935

32. Millar EK, Browne LH, Beretov J. et al. Tumour Stroma Ratio Assessment Using Digital Image Analysis Predicts Survival in Triple Negative and Luminal Breast Cancer. Cancers (Basel). 2020;12(12):E3749

Author contact

Corresponding address Corresponding authors: E-mail: richard-hccom (C.Huang), yzkedu.cn (Z.Yu).

Received 2021-1-31
Accepted 2021-5-19
Published 2021-6-1