Detection of Pancreatic Ductal Adenocarcinoma by A qPCR-based Normalizer-free Circulating Extracellular Vesicles RNA Signature

Background: Pancreatic ductal adenocarcinoma (PDAC) is difficult to diagnose and many efforts have been made to evaluate EVs-derived RNAs as biomarkers to predict PDAC. However, lack of robust internal references largely limited their clinical application. Here we proposed an RNA ratio-based, normalizer-free algorithm to quantitate EVs-derived RNAs in PDAC. Methods: Differentially expressed RNAs in the training group were identified using “limma” package. The ratio of any two candidate RNAs in the same sample was calculated and used as a new biomarker. LASSO regression was performed to build prediction models based on those RNA ratios. RNA-seq data of 116 plasma samples and RT-qPCR data of 111 plasma samples were used for internal and external validation, separately. Three algorithms (lasso regression, logistic regression, and SVM) were compared to improve the performance of this RNA signature. Results: We developed an RNA-ratio based prediction model which comprised eight EVs-derived RNAs, including FBXO7, MORF4L1, DDX17, TALDO1, AHNAK, TUBA1B, CD44, and SETD3. This model could well differentiate PDAC patients with a minimal AUC of 0.86 in internal verification using testing group. External validation using RT-qPCR data also exhibited a good classifier ability with an AUC of 0.89 when distinguishing PDAC from healthy controls. Conclusion: We've developed a qPCR-based, normalizer-free circulating EVs RNA classifier, which could well distinguish PDAC patients from noncancerous controls.


Introduction
Pancreatic cancer (PC) is predicted to be the second most common cause of cancer-related death around the world by 2030 [1]. Pancreatic ductal adenocarcinoma (PDAC) is the most common subtype of PC. More than 85% of PC confirmed to be PDAC after histological confirmation [2]. PDAC has properties of early metastatic potential and resistance to existing treatment such as chemotherapy or radiotherapy, which leads to its poor prognosis and high mortality. The 5-year survival rate of PDAC is less than 8% [3] [4]. Thus, screening tools which could identify PDAC from general population in its resectable stage are crucial and urgently needed. Traditional biomarkers such as CA-199 and CEA have poor sensitivity and specificity, resulting in misdiagnosis. Emerging biomarkers such as plasma miRNAs, plasma lncRNAs, circulation tumor DNAs, and plasma extracellular vesicles (EVs) exhibited Ivyspring International Publisher better accuracy in detecting various types of cancer [4] [5].
EVs are membrane-encapsulated heterogenous vesicles delivering various macromolecules including proteins, miRNAs, lncRNAs, mRNAs, or lipids between different microenvironments, resulting in initiation of various biological processes due to their different cargoes [6]. Tumor cells produce a large number of EVs, and EVs cargoes play important roles in tumorigenesis, angiogenesis, metastasis, and other pathological processes. Tumor-specific expression patterns and nonnegligible biological functions make EVs one of the excellent biomarker sources. Our previous study has also proved that plasma EVs-derived RNAs provided higher classifier ability than RNAs directly isolated from plasma and are more suitable to be biomarkers [7].
There are several methods could be used to detect EVs-derived RNAs such as next generation sequencing (NGS), RNA chip, and RT-qPCR. NGS and RNA chip are relatively accurate but not suitable for large-scale clinical application because of their high-cost. Real-time quantitative PCR (RT-qPCR) is an alternative way to detect RNAs which is more cost-effective than omics tools. However, the absence of a well-established normalizer for plasma EVs derived RNA quantification largely hampered the development of qPCR-based assays, and resulted in unfavorable data consistency and reproducibility across different studies [8][9][10][11][12][13].
To solve this problem, here we proposed a ratio-based normalization for circulating EVs derived long RNA data. The ratio of any two candidate RNAs in the same sample was calculated and used as a new biomarker, and then the ratios were compared between different groups instead of a single RNA level. Considering the two RNAs in an RNA-pair are simultaneously measured under the same condition, their ratio could reflect the true difference among samples by canceling systematic biases. So instead of absolute quantification of a single RNA, we established a new scoring system based on RNA-pairs, which enables us to build a quantitative RNA classifier without an RNA internal reference.

Patients Cohorts
Circulating EVs-derived long RNA NGS data from GSE133684 [14] in GEO database were obtained for analysis. Clinical details of GSE133684 were shown in Table 1. GSE133684 was randomly divided into a training group and a testing group. Training group was used to establish RNA pair prediction model and testing group was used for internal validation. 111 receivers including PDAC patients, chronic pancreatitis patients (CP) and normal controls (NC) from Beijing Friendship Hospital were recruited for external validation of these prediction models. Inclusion criteria of patients in external validation group were (1) patients pathologically diagnosed PDAC or CP in Beijing Friendship Hospital during 2018-1-1 to 2020-1-1; (2) patients agreed to attend this research and assigned the informed consent forms; (3) blood samples were available before surgery or treatment. Exclusion criteria were (1) other kinds of tumors were diagnosed in the same patient; (2) patients with IgG4-related disease or any other autoimmune disease were not proper to attend this study; (3) patients with severe inflammation or septic shock were not able to attend this study; (4) patients with severe coagulation dysfunction; (5) patients refused to attend this research. Clinical details of these subjects were also shown in Table 1. All participants had signed the informed consent, and this study was approved by the ethics committee of Beijing Friendship Hospital. Flowchart of this study was shown in Figure 1.

Differentially expressed RNA analysis and GO analysis
For NGS data in training group, RNAs which TPM lower than 50 were excluded because of the difficulty in qPCR verification. Differentially expressed (DE) RNA analysis was performed to training group using "limma" package in R 3.5.2 [15]. To shrink rang of DE RNAs, cut-off levels of DE RNAs were identified as |Log fold change| > 3 and adjusted p-value < 0.0001. Volcano plot and heatmap were also performed to show the expression differences between PDAC patients and healthy controls. Moreover, to better understand the biological processes (BP), cellular components (CC) and molecular features (MF) of these DE RNAs, Gene Ontology (GO) analysis was also performed by using "clusterProfiler" package in R [16]. Establishment of a new scoring system based on RNA-pairs. Briefly, the GSE133684 dataset was randomly divided into a training group and a testing group. DEGs were identified and constructed RNA-pairs. PDAC prediction models were built based on 2-4 RNA-pairs, and verified by both internal NGS data and external RT-qPCR data.

RNA-pair matrix construction and prediction model construction
RNA pair ratio was calculated from any two DE RNAs expression values in the same sample. Univariate logistic regression was subsequently performed for each RNA pair to roll out pairs not significantly associated with PDAC occurrence and constructed an RNA pair matrix of training group. Next, we used lasso regression to select variables from RNA pair matrix under 7-fold cross-validation by "glmnet" package in R. Prediction models enrolled 2, 3, 4 RNA pairs were established by a step-wise variable selection process by controlling lambda values in lasso regression. Then, logistic prediction models, support vector machine (SVM) models and Lasso regression models for each of these RNA pair signatures were constructed by "glm" function, "e1071" package and "glmnet" package [17] in R, separately. Receiver operator characteristic (ROC) curves and area under the curve (AUC) of all these prediction models were calculated to evaluate their performance.

Internal verification by NGS data in testing group
Testing group of GSE133684 was used for internal verification of these models. ROC curves of these models were drawn by using "plotROC" package [18] and "ggplot2" package in R. AUC values of internal verification for each model were also calculated.

EVs isolation
This study isolated EVs of plasma samples of 111 PDAC patients, CP patients and normal controls. 3 mL plasma sample was collected from each subject. We performed ultracentrifugation (UC) method for EVs isolation. Firstly, we centrifugated plasma samples at 3000 × g for 15min after thawing at 37°C in order to remove cell debris. Then, we diluted supernatant of each sample by 7-fold volume of phosphate-buffered saline (PBS). Next we centrifuged the mix again at 13,000 × g for 30min and a 0.22μm filter was applied in order to filter large vesicles. Then, a P50AT2-986 rotor (CP100NX; Hitachi, Brea, CA, USA) was used at 150,000 × g, 4°C for 4h to pellet EVs. Pellet was resuspended in PBS and centrifuged again at 150,000 × g 4°C for 2h. Finally, the EVs enriched fraction pellet was re-suspended in 100µL PBS.

EV protein quantification
Pierce BCA Protein Assay Kit (Thermo Scientific, Product No. 23,225) were used to quantify the protein concentration of EVs-enriched fractions following the manufacturer's protocol. 10μL of each standard and samples were pipetted into 96-Well Plates. After mixture of 200μL of the WR to each well and shaking for 30 second on the shaker, plates were covered and incubated at 37°C for 30min. Then we measured their absorbance at 562nm on the plate reader. Finally, we build a standard curve and quantified EVs-enriched fraction samples.

Nanoparticle tracking analysis (NTA)
The ZetaView PMX 110 (Particle Metrix, Meerbusch, Germany) with 405 nm laser was applied to find out size and quantity of EVs in concentrations ranged 1x10 7 /mL to 1x10 9 /mL. We took a 60-second video with a frame rate of 30 frames/sec, and analyzed it by using NTA software (ZetaView 8.02.28).

Transmission electron microscopy (TEM)
We next placed 20µL EVs enriched solution on a copper mesh for 10min in room temperature. Then, we contrasted solution with uranyloxalate solution for 1min after washed by sterile distilled water. The solution was subsequently dried for 2min under incandescent light. The copper mesh was then photographed under a TEM (JEOL-JEM1400, Tokyo, Japan).

Western blot
Special markers for extracellular vesicles have been previously reported. In the present study, we used a combination of two positive markers (CD63 and TSG101) and one negative markers (calnexin) to characterize the exosomes we extracted. Rabbit polyclonal antibody CD63 (sc-5275, Santa Cruz, CA, USA), TSG101 (sc-13611, Santa Cruz, CA, USA) and calnexin (10427-2-AP, Promega, Madison, WI) were used in Western Blot procedures. The protein bands were detected using an enhanced chemiluminescence system (Bio-Rad, USA).

RNA isolation and RT-qPCR validation
Total RNA isolated was extracted and purified from plasma EVs using the miRNeasy Mini kit following the manufacturer's protocol (No. 217004, Qiagen, Hilden, Germany). A total RNA was reverse-transcribed and cDNA was synthesized using PrimeScript TM RT Master Mix (TAKARA RR036a). The amplification of cDNA was performed in 10-μL reaction system following the SYBRGREEN life assays manufacturer's instructions. Primer sequence was shown in Table S1. Expression ratios of any two of these RNAs were calculated separately and taken into prediction models. ROC curves were drawn by using "plotROC" package and "ggplot2" package. AUC values of external verification by RT-qPCR were also calculated.

Differentially expressed RNA identification
Samples in the GSE133684 dataset were randomly divided into two cohorts, and 287 low-abundant RNAs with an average TPM lower than 50 were excluded for the difficulty in qPCR quantification. With cut-off levels of |Log fold change| > 3 and adjusted p-value < 0.0001, 229 differentially expressed RNAs were identified in the training cohort (Table S2). Volcano plot and heatmap of these DE RNAs were shown in Figure 2A and 2B, separately.

GO analysis of differentially expressed RNAs
GO analysis was performed in these differentially expressed RNAs by "clusterProfiler" package in R. We find that DE RNAs were significantly enriched in neutrophil related biological processes (BP) such as neutrophil activation and neutrophil mediated immunity (P < 0.001, Figure 2C, Figure S1A, 1B). CD44, NCKAP1L, and DOCK2 play important roles in these biological processes ( Figure  S1C). As for molecular functions (MF) of RNAs, we found that they were significantly enriched in actin binding and cell adhesion molecule binding (P < 0.001, Figure 2D, Figure S1D, 1E). EEF2 and AHNAK participated in these GO terms ( Figure S1F). Furthermore, we explored cellular component (CC) information of DE RNAs and found that they were mostly enriched in focal adhesion and cell-substrate junction (P < 0.001, Figure S1G, 1H). CD44 and AHNAK played crucial roles in these GO terms ( Figure S1I).

RNA-pair matrix establishment and construction of prediction models
We paired any two of 229 candidate RNAs and conducted an RNA-pair matrix containing 26106 RNA-pairs. Among those RNA-pairs, 516 candidates achieved an AUC above 0.8 and were selected for the construction of prediction models. Lasso regression was applied in variable selection and three RNA-pair signatures were constructed by controlling lambda values in the regression ( Figure 3A, Table 2, Figure  S2A). Violin plot representing expression level between PDAC patients and healthy controls of these RNA-pair ratios were shown in Figure 3B-E. Ratios of these 4 RNA-pairs in healthy controls were significantly higher than those in PDAC patients (P < 0.001).

Internal validation of prediction models in the testing cohort
NGS data from 116 samples were used for internal validation, and we constructed Lasso regression models, logistic models, and SVM models for internal validation of these signatures.  Figure  4C).

Characterization of EVs isolated from plasma
In order to further evaluate the expression of EVs-derived RNAs in PDAC patients and non-cancerous individuals, we isolated EVs from plasma samples and characterized them according to the MISEV2018 guideline. We obtained images of oval or bowl-shaped microvesicles with diameters of 50-100 nm by TEM ( Figure 5A). NTA was also performed with the peak value of diameters in 100nm ( Figure  5B). We further verified distinctive markers for extracellular vesicles by Western Blot. Enrichment of two positive markers (TSG101 and CD63) were observed ( Figure 5C). On the contrary, no protein band was detected around molecular weight of calnexin which was considered as a negative marker ( Figure 5C).

External validation by RT-qPCR data in our own cohort
We further verified those signatures by RT-qPCR in an independent cohort of our own cohort including 111 samples (Clinical data was shown in Table 1). For 3 RNA pair model, SVM algorithm achieved the highest AUC of 0.71 in distinguishing PDAC from chronic pancreatitis patients and 0.78 in distinguishing PDAC from health individuals ( Figure  6A-B, Figure S2B). For 4 RNA pair model, the SVM algorithm achieved the highest AUC of 0.77 in distinguishing PDAC from chronic pancreatitis patients and 0.89 in distinguishing PDAC from health individuals ( Figure 6C-D, Figure S2C).

Discussion
Detection of PDAC at a resectable stage is challenging to gastroenterologists [14,19]. Many efforts have been made to evaluate EVs-derived RNAs as biomarkers in cancer diagnosis [7,20], especially in PDAC. Xian-yin Lai et al. [21] developed an exosomal miRNA signature to predict PDAC, and found this signature has better diagnostic value comparing to CA19-9 and GPC1. Tetsuya Takikawa et al. [22] focused on miRNAs in pancreatic stellate cell (PSC)-derived exosomes, and firstly clarified miRNA expression profile of PSC-related exosomes. They also confirmed the roles of PSC-exosome miRNAs in stimulating the proliferation or migration of PDAC cells. Shulin Yu et al. [23] conducted the largest NGS study of EV-derived RNAs in PDAC with 501 participants, developed a diagnosis signature based on 8 EV-derived RNAs and exhibited high accuracy no matter in internal or external validation. They firstly characterized the plasma EV-derived RNA profile in PDAC. Their creative work had laid the groundwork of PDAC biomarker field and provided convincing evidence for EV-derived RNAs to be good PDAC diagnostic biomarkers.
However, there are still many problems preventing these signatures from clinical application. Testing cost is the major problem bothering physicians. Many researches were based on NGS or RNA chip data, however, these methods are not suitable for large-scale clinical application because of their high-cost. As a cost-effective tool, RT-qPCR could be widely used in the evaluation of EVs-derived RNAs, but the controversial normalization of RT-qPCR data generated from plasma EVs largely reduced the reproducibility across different studies as well as our enthusiasm in the development of liquid-biopsy based PDAC diagnostic tools.
Three regular ways were often used to normalize plasma EVs-derived RNAs. Traditional reference genes were often used to normalize RT-qPCR data, such as GAPDH and β-actin, which were considered stably expressed in somatic cells. However, these genes were proved to be differential expressed in plasma EVs. Moreover, during the isolation of EVs, different expression manners between target RNAs and internal reference RNAs result in variance in their relative expression levels and bring poor reproducibility. Consequently, they are not the ideal tool for data normalization in plasma EVs-derived RNAs quantification. Some researchers chose to screen more specific reference RNAs by many algorithms such as Normfinder, Genorm and DataAssist [24][25][26][27][28][29]. Dozens of reference RNAs were found in different diseases and various experiment situations; however due to insufficient sample volume of these studies and poor external validations, few of these reference RNAs were widely accepted by the scientific community. Spike-in exogenous RNAs to EVs such like cel-miR-39 [30][31][32] as external references to partially eliminate deviations from experimental processes was also a choice. However, sample specific deviations couldn't be corrected by those exogenous calibrators.
Here we proposed an RNA-pair ratio-based algorithm to quantitate plasma EVs-derived RNAs. The ratio of any two candidate RNAs were calculated in the same sample, then ratios between two RNAs were treated as new variables to construct a ratio-based prediction system by three machine learning algorithms. Subsequently, we validated these prediction models internally and externally, and found they were capable to predict PDAC occurrence with high AUC levels no matter in NGS data or RT-qPCR data.
The innovation points of this study lie on several aspects. Our major breakthrough is making the clinical application of diagnosis biomarker much easier. By using RT-qPCR method other than NGS or RNA chip, we reduced the cost of cancer screening. By using this normalizer-free algorithm, our model achieved high accuracy no matter in the NGS data or RT-qPCR data, which means we avoided the unstable factors induced by controversial reference genes and made RT-qPCR assays stable and reproducible. Cost-effective and high accuracy made our algorithm easier for clinical application. Moreover, three machine learning methods including lasso regression, SVM and logistic regression were simultaneously applied to assess the robustness of prediction model, making the result more convincing. Cross-platform property also enhanced the reproducibility of this research. Last but not least, our innovation is localized in methodology level and could be helpful to researchers intending to discover reliable biomarkers for other cancer types. Mattia Boeri etc. [11] firstly used RNA ratio signatures to predict patients' prognosis with minimum AUC level of 0.85. However, due to its sufficient sample volume and no mathematically proofs were given, it was still hard for researchers to understand this method. Deng etc. [33] firstly mathematically verified RNA ratio methods. Their findings showed that RNA ratio could reflect true fold change between 2 RNAs by eliminating disturbance of various reference RNAs because expression levels of two RNAs were simultaneously measured under same conditions. Moreover, they also found that RNA ratios method was independent of spiked-in and internal controls, and was more robust than internal or external reference RNAs by comparing their diagnostic capacity, which gave this article a robust theoretical basis.
We identified 8 biomarkers to predict PDAC, i.e. FBXO7, MORF4L1, DDX17, TALDO1, AHNAK, TUBA1B, CD44, and SETD3. Among them, CD44 was proved to be involved in biological process as cell-cell interactions, cell adhesion and migration, which was also enriched in GO analysis of this study [34,35]. In recent years, researchers found CD44 was a biomarker of PDAC cancer stem cells [36], and could reflect initiation and metastasis of PDAC, which made CD44 a potential biomarker to predict PDAC. AHNAK is a large scaffold protein and participates in focal adhesion and cell-substrate junction [37,38], which is also enriched in GO analysis. AHNAK is proved to be associated with poor prognosis of PDAC patients through epithelial-mesenchymal transition, and could be a biomarker to predict PDAC patients' outcome [39].
FBXO7 is a component of ubiquitin-protein ligase complex and plays vital roles in mediating ubiquitination and proteasomal degradation of proteins [40][41][42]. However, its roles in predicting PDAC remains unknown. Most of these biomarkers were related to tumorigenesis or metastasis. Biological or pathological functions of them were still unclear, further experiments would be performed to explore their roles in EVs and their relationship with PDAC.

Conclusion
In conclusion, we firstly developed a qPCR-based, normalizer-free circulating EVs RNA classifier, which could well distinguish PDAC patients from noncancerous controls. After validation of both NGS data and RT-qPCR data of our independent cohort, we rigorously confirmed the robustness and cross-platform stability of this approach. The gene-pair focused methodology we established here would also be helpful to find more qPCR-based, normalizer-free models from the public available EVs RNA databases in the studies of other cancer types.