Profiling of polar urine metabolite extracts from Chinese colorectal cancer patients to screen for potential diagnostic and adverse-effect biomarkers

Background: Metabolomics has demonstrated its potential in the early diagnosis, drug safety evaluation and personalized toxicology research of various cancers. Objectives: We aim to screen for potential diagnostic and capecitabine-related adverse effect (CRAE) biomarkers from urinary endogenous metabolites in Chinese colorectal cancer (CRC) patients. Methods: The metabolic profiles of 139 CRC patients and 50 non-neoplastic controls were analyzed using ultra-high-performance liquid chromatography combined with quadrupole time-of-flight mass spectrometry. Results: There were 41 metabolites identified between the CRC patients and the non-neoplastic controls, and 19 metabolites were identified between CRC patients with and without CRAE. Based on these identified metabolites, bioinformatic analysis and prediction model construction were completed. Most of these differential metabolites have important roles in cell proliferation and differentiation and the immune system. Based on binary logistic regression, a CRC prediction model, composed of 3-methylhistidine, N-heptanoylglycine, N1,N12-diacetylspermine and hippurate, was established, with an area under curve (AUC) of 0.980 (95% CI: 0.953-1.000; sensitivity: 94.3%; specificity: 92.0%) in the training set, and an AUC of 0.968 (95% CI: 0.933-1.000; sensitivity: 89.9%; specificity: 92.0%) in the testing set. In addition, methionine and 4-pyridoxic acid can be combined to predict hand foot syndrome, with an AUC of 0.884; ubiquinone-1 and 4-pyridoxic acid can be combined to predict anemia, with an AUC of 0.889; and 5-acetamidovalerate and 3,4-methylenesebacic acid can be combined to predict neutropenia, with an AUC of 0.882. Conclusion: The profiling of urine polar metabolites has great potential in the early detection of CRC and the prediction of CRAE.


Introduction
Colorectal cancer (CRC) is one of the most common malignancies worldwide, with an estimated 1.4 million newly diagnosed cases and 693,900 deaths in 2012 [1]. Over the last few years, with changes in risk factors and the introduction of early screening, the incidence and death rates of CRC have declined in the United States [2]. However, the incidence and mortality rates are increasing rapidly in developing countries like China [3]. Although the 5-year survival rate of stage I patients can reach nearly 90%, the rate of stage IV patients is only 12% [4]. Thus, the early detection of CRC is of central importance to improve overall survival rates. Colonoscopy, which is currently the gold standard for CRC diagnosis, is invasive and uncomfortable [5]. Computed tomography colonography (CTC) is an accurate and reliable diagnostic technique, but its high cost has always been a problem. Fecal occult blood testing (FOBT), as well as other noninvasive and inexpensive plasma biomarkers, such as carcinoembryonic antigen (CEA), carbohydrate antigen 19-9 (CA19-9) and SEPT9 gene methylation, are the main screening methods. However, their sensitivity and specificity are relatively poor, and screening with these biomarkers can easily miss asymptomatic patients. Therefore, simple, noninvasive, highly sensitive and specific biomarkers are urgently required for the early diagnosis of CRC.
Metabolomics, which is the comprehensive study of low molecular weight metabolites and potentially offers phenotypic information not captured by genetic profiling, has become a focus of modern systems biology [6]. It has demonstrated its potential in the early diagnosis, drug safety evaluation and personalized toxicology research related to various cancers [6,7,8], including CRC [9]. To date, comparison of the metabolic profiles of blood, urine, stool and tissue samples from CRC patients and healthy counterparts has revealed significant variations and identified a number of candidate biomarkers [9]. However, none of these biomarkers has entered clinical practice.
There has also been research on the application of metabolomics for prediction of drug-induced adverse effects (AEs) [10,11,12,13,14]. Studies of metabolic biomarkers of oncology are relatively rare. According to the National Comprehensive Cancer Network (NCCN) guideline (2016), the first-line CAPEOX protocol, containing capecitabine and oxaliplatin, is usually used for both early postoperative adjuvant chemotherapy and advanced palliative chemotherapy. However, AEs remain the major limitation in treatment, especially for bone marrow suppression (BMS) and hand foot syndrome (HFS). Both BMS and HFS were selected as AEs for analyses in this study, since our previous clinical observation and literature research showed that these two AEs have the highest incidence rates [15].
In this article, a urinary metabolomics study was conducted on a cohort of CRC patients (n = 139) and non-neoplastic control subjects (n = 50) using ultra-high-performance liquid chromatography combined with quadrupole time-of-flight mass spectrometry (UHPLC-Q-TOF-MS). The purpose of this study is to screen endogenous metabolite biomarkers, and to establish prediction models for CRC diagnosis and capecitabine-related AEs (CRAEs).

Clinical samples
The 139 patients were 36-87 years old and diagnosed with CRC (72 colon cancers and 67 rectal cancers). They were selected from a registered ongoing clinical trial at Shanghai Changzheng Hospital (code at www.clinicaltrials.gov, NCT03030508) from June 2016 to June 2017. Ethical approval for the study was granted by the Shanghai Changzheng Hospital Biomedical Research Ethics Committee (approval number: 2016SL007). Recruited CRC patients were (1) over 18 years old and (2) diagnosed with CRC by biopsy examination. Patients who had received any preoperative anti-neoplastic medication were excluded. Clinical information was obtained from the hospital and is provided in Table S1. The 50 non-neoplastic controls were aged 47-89 years. They had no known inflammatory condition or gastrointestinal tract disorder, and were enrolled after a routine physical examination. The age and sex of the controls were equivalent to those of the CRC patients (Table S1). Prior to sample collection, written informed consent was obtained from each patient.
To ensure the effectiveness of the CRC diagnostic model, all samples were randomly divided into a training set and a test set with a ratio of 1:1 using Excel (Microsoft, USA). The two sets were well-matched between CRC patients and control groups in age and sex (Table S1). Among the 139 CRC patients, 43 had received capecitabine-based adjuvant chemotherapy. For these patients, HFS and BMS (including anemia, neutropenia and thrombocytopenia) were followed up and graded according to the Common Terminology Criteria for Adverse Events (Version 4.0) (Table S2) (2010). These patients were divided into AE and no-AE groups. Student's t test showed no significant difference in age and chemotherapy cycle between the two groups. Chi-square test showed no difference in sex, and Mann-Whitney test showed no difference in the pathological stage, between CRAE and no-CRAE groups (Table S3).
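The 1:1 random division described above was done in Excel. As an illustration only, a minimal Python sketch of an equivalent split is given here. The sample IDs, the seeds and the choice to split patients and controls separately (so that both sets keep the same case/control ratio) are assumptions and are not taken from the study; only the group sizes (139 and 50) come from the text.

```python
import random

def split_half(ids, seed=0):
    """Shuffle sample IDs and split them 1:1 into (training, test)."""
    rng = random.Random(seed)        # fixed seed so the split is reproducible
    shuffled = list(ids)
    rng.shuffle(shuffled)
    half = (len(shuffled) + 1) // 2  # with an odd count, the extra sample goes to training
    return shuffled[:half], shuffled[half:]

# Hypothetical sample IDs; only the group sizes come from the text.
crc_ids = [f"CRC{i:03d}" for i in range(1, 140)]
ctrl_ids = [f"CTRL{i:02d}" for i in range(1, 51)]

# Split each group separately so both sets keep the same case/control ratio.
crc_train, crc_test = split_half(crc_ids, seed=1)
ctrl_train, ctrl_test = split_half(ctrl_ids, seed=2)

train = crc_train + ctrl_train
test = crc_test + ctrl_test
print(len(crc_train), len(crc_test), len(ctrl_train), len(ctrl_test))
```

After such a split, age and sex would still need to be compared between the two sets, as was done in Table S1.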
All urine samples were collected from the Department of General Surgery in Shanghai Changzheng Hospital. A 12-mL urine sample was collected into a Falcon tube 1-3 days before surgery in the fasting state, followed by adding 1 mL of protease inhibitor mixture (0.4 mL of 100 mM NaN3, 0.6 mL of 10 mM phenylmethylsulfonyl fluoride and 50 μL of 1 mM leupeptin) [16]. Then, the samples were stored at −80°C.

Sample preparation
Since urinary metabolites are predominantly polar and include only a few non-polar compounds, we focused on polar metabolites in this study by separating the polar content and using a separation column specialized for polar metabolites. A volume of 10 μL of urine from each sample was pooled and used as the quality control (QC). The QC sample was used to test the instrument state, equilibrate the UHPLC-Q-TOF-MS system before sample injection and indicate system stability during the batch analyses [17]. Subsequently, the polar metabolites were extracted in a fume hood from 200 μL of urine sample or QC sample with 800 μL of chloroform/methanol (2:1, v/v) spiked with 0.2 μg/mL L-2-chlorophenylalanine as the internal standard. Then, the mixture was vortexed for 1 min and centrifuged at 15,000 × g for 10 min at 4°C to remove protein and separate the polar (supernatant layer) and non-polar metabolites (lower phase). An aliquot of 300 μL of the supernatant was transferred to a 1.5-mL EP tube, mixed with 900 μL of methanol and centrifuged at 12,000 × g for 10 min at 4°C. Next, 900 μL of the supernatant was lyophilized. The lyophilized sample was resuspended in 900 μL of acetonitrile and stored at −80°C. Stored samples were thawed at 4°C before analysis. Finally, 200 μL of the solution was transferred to a plastic insert within a sampler bottle for injection into the UHPLC-Q-TOF-MS system.

UHPLC-Q-TOF-MS analysis
Sample analysis was performed on an Agilent 1290 ultra-high-performance liquid chromatography system (Agilent Technologies, Santa Clara, CA, USA) coupled with an Agilent 6530 Accurate-Mass Q-TOF LC/MS system (Agilent Technologies) in positive Dual Agilent Jet Stream Electrospray Ionization (Dual AJS ESI) mode (Agilent Technologies). The mobile phases A and B were water with 0.1% v/v formic acid and acetonitrile with 0.1% v/v formic acid, respectively. The column was a 2.1 × 100 mm, 3.5 μm, HSS T3 column (Waters, Manchester, UK) and the temperature was kept at 30°C. The gradient started with 5% B, increased to 20% at 6 min, 50% at 9 min, 95% at 13 min, 100% at 15 min, followed by a post-run of 5 min. The flow rate was maintained at 0.4 mL/min. The injection volume was 3 μL. The capillary voltage was 3500 V, and the nozzle voltage was 500 V. The gas temperature was set at 300°C with a gas flow of 11 L/min and nebulizer pressure of 35 psi, and a sheath gas temperature of 300°C with a sheath gas flow of 11 L/min. For MS acquisition, centroid data were acquired from 100 to 1100 m/z at 0.5-s intervals. For MS/MS acquisition, data were acquired at 0.33-s intervals with collision energy 0, 10, 20 and 40 eV. A reference solution (m/z 121.0509 and m/z 922.0098) was used to correct small mass drifts during the acquisition [17]. The QC samples were injected at the beginning of the run and after every eight samples during sequence analysis to assess the analytical performance [18].
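To make the gradient programme easier to read, the following sketch tabulates the breakpoints stated above and interpolates the proportion of mobile phase B at any time. Linear ramps between breakpoints and a start time of 0 min are assumptions; the text gives only the breakpoints themselves.

```python
# (time in min, %B) breakpoints as stated in the text; t = 0 for the start is assumed.
GRADIENT = [(0.0, 5.0), (6.0, 20.0), (9.0, 50.0), (13.0, 95.0), (15.0, 100.0)]

def percent_b(t_min, table=GRADIENT):
    """Return %B at time t_min, assuming linear ramps between breakpoints."""
    if t_min <= table[0][0]:
        return table[0][1]
    for (t0, b0), (t1, b1) in zip(table, table[1:]):
        if t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    return table[-1][1]  # hold the final composition after the last breakpoint

for t in (0, 3, 6, 7.5, 11, 15):
    print(f"{t:>4} min: {percent_b(t):.1f}% B")
```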

Data analysis
The acquired MS data were analyzed using the Profinder program (Version b8.0, Agilent Technologies). After integration and alignment, a list of spectral features was obtained with the retention time (RT), m/z and spectral area by recursive feature extraction. The spectral features generated by the internal standard, noise and column bleed were removed from the dataset. Then, the integration results were manually checked before they were transferred to the Mass Profiler Professional program (Agilent Technologies) for subsequent analysis. The background and non-biologically relevant information were eliminated according to the 80% rule [19], which means only spectral features with a frequency ≥ 80% in the CRC patient or control groups were kept. Then, these spectral features were normalized using the sum intensity of each feature in each sample.
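The filtering and normalization steps above were performed in Mass Profiler Professional. The sketch below only illustrates the logic on toy data. It assumes that a missing peak is recorded as an area of 0 and that "sum intensity" normalization means dividing each area by the total area of its sample; the feature names and values are hypothetical.

```python
def passes_80_rule(values, groups, threshold=0.8):
    """True if the feature is detected in >= threshold of the samples of any one group."""
    for g in sorted(set(groups)):
        in_group = [v for v, grp in zip(values, groups) if grp == g]
        detected = sum(1 for v in in_group if v > 0)  # area 0 = not detected (assumption)
        if detected / len(in_group) >= threshold:
            return True
    return False

def sum_normalize(table):
    """Divide each peak area by the summed area of all features in the same sample."""
    names = list(table)
    n_samples = len(table[names[0]])
    totals = [sum(table[f][j] for f in names) for j in range(n_samples)]
    return {f: [table[f][j] / totals[j] for j in range(n_samples)] for f in names}

# Toy data: 5 CRC and 5 control samples, three hypothetical features.
groups = ["CRC"] * 5 + ["CTRL"] * 5
areas = {
    "f1": [10, 12, 11, 9, 13, 0, 0, 5, 0, 2],  # 5/5 in CRC: kept
    "f2": [0, 3, 0, 0, 4, 2, 0, 0, 0, 1],      # 2/5 in both groups: removed
    "f3": [5, 5, 0, 5, 5, 5, 5, 5, 5, 0],      # 4/5 in both groups: kept
}

kept = {f: v for f, v in areas.items() if passes_80_rule(v, groups)}
normalized = sum_normalize(kept)
print(sorted(kept))
```

Applying the filter before normalization, as here, means the per-sample totals are computed only over retained features; the order used in the original software workflow is not stated beyond the sequence given in the text.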
Soft Independent Modelling by Class Analogy 14.0 (SIMCA, Umetrics AB, Umeå, Sweden) and SPSS version 17.0 (SPSS Inc., Chicago, IL, USA) were used for further analyses. A P-value of less than 0.05 was considered significant. Principal Component Analysis (PCA) was applied to examine data distribution, and for a comprehensive understanding of the metabolic profile.
Orthogonal Partial Least Squares Discriminant Analysis (OPLS-DA) was carried out to focus on clustering information and visualize the metabolic alterations. Multivariate statistical analysis in SIMCA 14.0 was used to analyze the complex metabolomics data. Potential biomarkers were required to have a coefficient of variation (CV) < 30% in QC samples. The affected metabolic pathways were examined by Metabolite Set Enrichment Analysis (MSEA) in MetaboAnalyst 4.0. Student's t test was performed between the two groups to select biomarker candidates. Spectral features with a low P-value (< 0.05) and a high fold of change (FOC ≥ 2) in Student's t test, or with a variable importance in the projection (VIP) value of more than 1 in the OPLS-DA model, were added to the candidate list for further metabolite identification. These metabolites were identified by an integrated method which included comparison with commercially available standards and with web-based spectrum databases such as the Human Metabolome Database (http://www.hmdb.ca/) and METLIN (http://metlin.scripps.edu/) [21,22].
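The selection criteria above can be summarized in a few lines. This is a sketch rather than the authors' workflow: the P-values and VIP values are taken as inputs (they were produced by SPSS and SIMCA in the study), the fold of change is assumed to be symmetric (a twofold decrease counts the same as a twofold increase), and the QC CV threshold is assumed to apply to every candidate. All numbers are hypothetical.

```python
from statistics import mean, stdev

def cv_percent(values):
    """Coefficient of variation in percent (sample standard deviation / mean)."""
    return 100.0 * stdev(values) / mean(values)

def fold_change(case, control):
    """Symmetric fold of change between group means (always >= 1); an assumption."""
    ratio = mean(case) / mean(control)
    return ratio if ratio >= 1.0 else 1.0 / ratio

def is_candidate(p_value, foc, vip, qc_cv):
    """(P < 0.05 and FOC >= 2) or VIP > 1, restricted to features with QC CV < 30%."""
    if qc_cv >= 30.0:
        return False
    return (p_value < 0.05 and foc >= 2.0) or vip > 1.0

# Hypothetical feature: QC areas, and normalized areas in patients and controls.
qc_cv = cv_percent([100, 110, 95, 105])
foc = fold_change([2, 3, 4], [8, 10, 12])
print(round(qc_cv, 2), round(foc, 2), is_candidate(0.01, foc, 0.7, qc_cv))
```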
Then, binary logistic regression was applied to combine several variables into a multivariable, using a stepwise variable selection method. Receiver operating characteristic (ROC) curve analysis was performed to evaluate the predictive ability of each identified metabolite and the combinational multivariable.
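As a sketch of how a combined variable and its ROC performance relate, the code below evaluates a logistic model for given coefficients and computes the AUC from its rank interpretation (the probability that a randomly chosen patient scores higher than a randomly chosen control). The coefficients and scores are hypothetical; the fitted coefficients of the study's models are not reported in this text, and the stepwise selection performed in SPSS is not reproduced here.

```python
import math

def combined_probability(x, coefs, intercept):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    z = intercept + sum(b * v for b, v in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-z))

def auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical predicted probabilities for three patients and three controls.
patients = [0.9, 0.8, 0.4]
controls = [0.7, 0.3, 0.2]
print(auc(patients, controls))

# A linear predictor of zero corresponds to a probability of 0.5.
print(combined_probability([0.0, 0.0], coefs=[1.0, 1.0], intercept=0.0))
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a cohort of this size; a rank-based implementation would be preferable for larger data sets.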

Urinary metabolic profiling
Typical total ion current (TIC) chromatograms of the metabolic profiles are shown in Figure S1. All pooled QC samples were used to monitor the system stability and data reliability for peak intensity (<30% CV) and RT (<20% CV). After manual checking, the metabolomics data revealed 1114 peaks of polar compounds detected by Q-TOF LC/MS, of which 583 peaks were retained by the 80% rule.
A PCA model (two components, R²X(cum) = 0.366 and Q²(cum) = 0.338) with unit variance (UV) scaling and an OPLS-DA model (one predictive component and three orthogonal components, R²X(cum) = 0.306, R²Y(cum) = 0.963 and Q²(cum) = 0.89) based on Pareto variance (Par) scaling were established using the 570 spectral features. The QC samples showed tight clustering but separation between control and patients was not clear in the PCA (Figure S2). This separation was more obvious in the OPLS-DA model (Figure 1A). A 999-time permutation test was performed to evaluate the PLS-DA model. The R²Y- and Q²-intercepts were 0.692 and 0.412, respectively (Figure 1B). The validation plots from permutation tests strongly supported the validity of the established OPLS-DA model because all permuted R² and Q² values on the left were lower than the original point on the right, and the Q² regression line in blue had a negative intercept.
Differential metabolites between controls and CRC patients were selected using Student's t test (P ≤ 0.05 and FOC ≥ 2), or VIP ≥ 1 in the OPLS-DA model. A total of 281 compounds were screened and 41 metabolites were identified by comparison with metabolomic databases (Table 1, Table S4). Subsequently, 19 of the identified differential metabolites were found to be related to CRAE based on Mann-Whitney tests (Table 2).

Metabolic pathway analyses
In order to understand the significant differences in the metabolic networks between the CRC patients and the controls, the 41 CRC-related metabolites identified were submitted to the CPDB website (http://cpdb.molgen.mpg.de/) for metabolic pathway enrichment analysis. This analysis was also repeated for the 19 CRAE-related metabolites. The MSEA results are shown in Tables 3 and 4. There were 15 CRC-related and 10 CRAE-related metabolic pathways enriched. The majority of the CRC-related metabolic pathways are related to the synthesis and catabolism of basic metabolites such as carboxylic acids and amino acids. These metabolic pathways include glucose homeostasis, conjugation of carboxylic acids and amino acid conjugation. Some of these changes may be related to abnormal DNA synthesis, since one carbon metabolism and related pathways and folate metabolism were also enriched. Besides these, vitamin B12 metabolism was also found to be related to CRC, which may indicate that an abnormal inflammation response or immune system is also related to the susceptibility to CRC.
On the other hand, CRAE-related metabolic pathways are similar to the CRC-related metabolic pathways. Pathways including conjugation of carboxylic acids, amino acid conjugation and glucose homeostasis indicate altered fundamental synthesis and catabolism of some of the basic metabolites. Pathways such as vitamin B12 metabolism indicate that an abnormal inflammation response or immune system may also be related to the susceptibility to CRAE.

Construction and validation of a diagnostic biomarker metabolite system
Validation of the CRC diagnostic model was performed by randomly choosing 50% of the samples to create a training-test set. The two sets were well-matched between CRC patients and control groups in age and sex (Table S1). A training set was used to evaluate the validation and predictive ability of identified metabolites to construct a diagnosis marker system for potential clinical application. Based on their high FOC, AUC and VIP values, four metabolites were selected as a panel of candidate markers: methylhistidine, N-heptanoylglycine, N1,N12-diacetylspermine and hippurate. A binary logistic regression model was applied to combine the four variables into a multivariable model. The ROC curve showed that the training set had an AUC value of 0.980 (95% CI: 0.953-1.000; sensitivity: 94.3%; specificity: 92.0%), and the testing set had an AUC value of 0.968 (95% CI: 0.933-1.000; sensitivity: 89.9%; specificity: 92.0%) (Figure 2A). The relative concentrations of these metabolites in urine samples of CRC patients and non-neoplastic controls are shown in Figure 2B. Spearman's rank correlation coefficient test showed that the concentrations of these metabolites were not related to the pathological stages (P > 0.05) (Figure 3).
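For readers less familiar with the reported metrics, the sketch below shows how sensitivity and specificity follow from predicted probabilities at a cutoff. The labels, probabilities, counts and the 0.5 cutoff are hypothetical and chosen only for illustration; the confusion matrices and the cutoff used in the study are not reported in this text.

```python
def sensitivity_specificity(y_true, y_prob, cutoff=0.5):
    """Return (sensitivity, specificity) for binary labels (1 = CRC, 0 = control)."""
    tp = fn = tn = fp = 0
    for y, p in zip(y_true, y_prob):
        predicted = 1 if p >= cutoff else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical set: 70 patients (66 classified correctly), 25 controls (23 classified correctly).
y_true = [1] * 70 + [0] * 25
y_prob = [0.9] * 66 + [0.1] * 4 + [0.1] * 23 + [0.9] * 2

sens, spec = sensitivity_specificity(y_true, y_prob)
print(f"sensitivity = {100 * sens:.1f}%, specificity = {100 * spec:.1f}%")
```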

Biochemical functions of CRC-related differential metabolites
Like most other cancers, CRC is characterized by uncontrolled cell cycle progression, a rapid growth rate, loss of contact inhibition, increased glycolysis and a triggered host immunological response. As a result, CRC patients had a different urinary metabolic profile compared to the non-neoplastic controls. Therefore, metabolic analyses and metabolites can indicate potential diagnostic markers and help to reveal the underlying mechanisms of cancer development and drug metabolism [9,21,22,23,24,25,26,27,28]. Some of the differential compounds identified in the CRC patients here might result from the rapid metabolic rate and altered energy metabolism of cancer cells. The CRC patients showed abnormal glucose homeostasis. The significantly decreased urine glucose level (Table 1) is consistent with one previously published study [29]. This might indicate elevated glucose consumption in CRC patients compared to the controls.
N1,N12-diacetylspermine is a constituent polyamine in human urine [30]. Polyamines are indispensable in cell growth, gene expression and cell proliferation [31]. Rapidly growing cells, such as cancer cells, generally have increased intracellular polyamine levels and actively metabolize polyamines. An elevated N1,N12-diacetylspermine level may indicate rapid proliferation of cancer cells themselves [32], and has been reported as a more sensitive biomarker than CEA, CA19-9 or CA15-3 for CRC diagnosis at early stages [30,31,32,33].
Methionine is an essential amino acid, involved in the pathways for glucose homeostasis, vitamin B12 metabolism, amino acid metabolism, central carbon metabolism in cancer, one carbon metabolism, and folate metabolism. Methionine metabolism is relevant for cancer pathogenesis including methylation reactions, redox maintenance, polyamine synthesis and coupling to folate metabolism to coordinate nucleotide and redox status [34]. One carbon metabolism and Folate Metabolism are involved in regulation of the genetic process from DNA synthesis to cell migration, proliferation, differentiation and apoptosis [35,36]. Down-regulation of methionine may indicate increased protein biosynthesis in cancer cells. Since methionine also plays a role in DNA methylation by providing methyl groups, overconsumption of methionine for protein biosynthesis may cause overall DNA hypomethylation, which could reduce DNA stability and trigger CRC development [37,38]. Compared to the non-neoplastic controls, CRC patients normally have lower methionine levels both in serum [28] and urine [25], but higher levels in tissues [21]. High plasma concentration of methionine is a marker of low CRC risk [39]. Despite the role in protein biosynthesis, down-regulated methionine level in CRC may indicate a low level of auto-inflammation, which is closely related to antioxidant defenses in some organs [40,41,42].
Methylhistidine is a result of excessive protein catabolism, and down-regulated methylhistidine may also indicate overall increased protein biosynthesis in CRC patients. A study of 63 CRC patients' urine metabolites also supported down-regulated histamine metabolism [43]; however, one study showed that neither urinary 1-methylhistidine nor 3-methylhistidine was associated with colorectal adenoma in a single urine sample, although further investigation using multiple urine samples was considered worthwhile [44]. Similarly, 5-acetamidovalerate is a product of lysine catabolism [45]. Both alanylasparagine and glutamylproline are dipeptides. They are the products of incomplete catabolism of large proteins. All of these metabolites are down-regulated in CRC patients compared with controls. This indicates that protein synthesis was elevated in CRC.
Phenylacetylglutamine (PAG) is a common metabolite of fatty acids with low abundance. It is a colonic microbial metabolite from amino acid fermentation, generated by glutamine conjugation of phenylacetic acid, which is almost exclusively derived from the microbial conversion of phenylalanine. This constitutes phenylacetate metabolism, which provides a route that facilitates the excretion of nitrogen in patients with urea-cycle defects. Compared to the controls, CRC patients had lower PAG in this study, which may derive from down-regulated phenylalanine metabolism and glutamine metabolism related to gut flora metabolism [25].
Hippurate and its metabolite hydroxyhippurate are normal constituents of endogenous urinary metabolites, generated from microbial degradation of certain dietary components including phenylalanine. As a downstream product of phenylalanine, a decreased level of hippurate also indicates downregulated phenylalanine metabolism [25].
Other differential compounds identified might indicate the different inflammatory response and the degradation of fatty acids between CRC patients and non-neoplastic controls. N-Heptanoylglycine contains a C-7 fatty acid group as its acyl moiety, which is a minor metabolite of dietary fatty acid. Elevated levels of certain acylglycines in urine and blood may indicate patients with various fatty acid oxidation disorders.
The lower serum level of coenzyme Q (CoQ) was reported and speculated to be associated with CRC progression [28]. Ubiquinone-1 is an intermediate in CoQ synthesis and could act as an antioxidant. The CoQ can suppress fat-induced colon carcinogenesis as an antioxidant [46] and the level of CoQ was also reported to be negatively correlated with redox status [47].
As the main catabolic product of vitamin B6, urinary 4-pyridoxic acid level is significantly associated with the circulating level of vitamin B6 [48]. Vitamin B6 itself is only modestly associated with inflammation; however, the PAr ratio [4-pyridoxic acid/(pyridoxal + pyridoxal 5′-phosphate)] is an indicator of vitamin B6 catabolism during inflammation, which is also a risk factor for carcinogenesis [49,50].
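The PAr ratio defined above is a simple quotient; a minimal sketch of its calculation follows. The function name and the example concentrations are hypothetical, and the three inputs are assumed to be expressed in the same units. PAr was not measured in this study.

```python
def par_index(pyridoxic_acid, pyridoxal, plp):
    """PAr = 4-pyridoxic acid / (pyridoxal + pyridoxal 5'-phosphate); same units for all inputs."""
    denominator = pyridoxal + plp
    if denominator <= 0:
        raise ValueError("pyridoxal + pyridoxal 5'-phosphate must be positive")
    return pyridoxic_acid / denominator

# Hypothetical concentrations in identical units (for example nmol/L).
print(par_index(pyridoxic_acid=30.0, pyridoxal=10.0, plp=50.0))
```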
Here, the elevated inflammatory status in CRC patients is consistent with the changes of metabolites arising from bacterial protein catabolism, particularly the tryptophan metabolism [51,52,53]. Tryptophan and its bacterial metabolites play various roles in the balance between immune tolerance and gut microbiota maintenance. The relationship between bacterial tryptophan metabolism and immune response has been described in detail by a recent review [53]. Indole is formed in intestines from tryptophan, and then it is transferred into indoxyl in the liver [54,55]. The serum concentration of indoxyl has been found to decrease in azoxymethane/dextran sodium sulfate (AM/DSS)-induced colon cancer mice [56]. In line with this result, this decreased urinary level was observed in our CRC patients.
Tryptophan can be converted into indole pyruvic acid by aromatic amino acid aminotransferase, which can be further converted into indole acetaldehyde, and then into indole acetic acids (e.g. indole-3-acetic acid [IAA]). Its level in CRC tissues was found to be significantly decreased compared with normal tissues [57]. Consistently, the urinary level of tryptophan in our CRC patients was also decreased. Indole pyruvic acid can also be converted into indole acrylic acid, and then finally into indolylacryloylglycine (IAcrGly) through a few enzyme-controlled steps. IAcrGly is one of the physiological components of urine. It was hypothesized that abnormal gut flora could promote the conversion of tryptophan to indolyl propionic acid, which could cause an increased IAcrGly level in urine [58,59]. The reverse might be true for patients with bladder cancer or CRC. IAcrGly has been included in a model for distinguishing bladder cancer grades: with increasing pathological grade, its concentration was lower in high-grade than in low-grade bladder cancer [60]. Herein, a significant decrease in the urinary concentration of IAcrGly in CRC patients was also found. Decreased IAcrGly alone is also a sign of an elevated inflammation response, since it is closely associated with oxidative damage induced by adulterants, and elevated oxidative damage is one of the characteristics of CRC [61,62]. Taken together, the urinary levels of IAA, indoxyl, and IAcrGly were all down-regulated in our CRC patients, which suggests a suppressed production of indole pyruvic acid and its derivatives. Overexpressed indoleamine 2,3-dioxygenase, which depletes tryptophan in CRC, may also contribute to this [63].
Metabolites of indole pyruvic acids including IAA, indoxyl, and IAcrGly are ligands to aryl hydrocarbon receptor (AHR), a transcriptional regulator for intestinal innate immunity and inflammation in the colitis-associated tumorigenesis. These metabolites are beneficial for colon by suppressing inflammation and carcinogenesis [64,65]. The down-regulated IAA, indoxyl, and IAcrGly in our CRC patients compared with the controls may indicate the elevated inflammation response and induced carcinogenesis.
It is worth mentioning that another bacterial tryptophan metabolite N-acetyltryptophan (NAT) was found to be up-regulated in our CRC patients compared with the controls. Its upregulation was also reported in the case of compromised gut microbiota [66,67]. NAT can prevent protein molecules from oxidative degradation by scavenging oxygen [68]. In this study, the up-regulation of NAT further confirmed the development of imbalanced bacteria in CRC, but its physiological function in CRC still needs to be investigated.
In conclusion, based on the urinary metabolomic profile, the CRC patients showed elevated protein metabolism rate, induced inflammation response, and possibly increased energy consumption, compared with the controls.

Biochemical functions of CRAE-related differential metabolites
The pharmacological process of capecitabine has been fully reviewed in both in vivo and in vitro studies. DNA polymorphism [69,70,71], DNA methylation differences [72] and pharmacokinetic measurements [73] that could reflect the pharmacological process of capecitabine have been used to predict CRAE. Some of them have already been proved by prospective clinical research [71]. However, the pharmacological process only determines the local level of capecitabine-related cytotoxicity. In addition to this, how DNA replication, cellular proliferation, cellular apoptosis and immunology systems of normal tissue cells respond to the cytotoxicity may also contribute to the susceptibility to CRAE.
According to the literature [74,75] and our ongoing observational clinical trial [15], BMS and HFS are the two most frequent CRAEs, which severely limit the usage of capecitabine. BMS comprises three sub-types of AEs: anemia, thrombocytopenia and neutropenia. The direct cause of BMS is suppressed blood cell formation, which is a multistep process that starts from differentiation of hematopoietic stem cells and ends with the formation of the different types of blood cells [76,77]. It is tightly regulated by signaling mediators, growth factor receptors and transcriptional factors involved in cell proliferation and differentiation [78]. The direct cause of capecitabine-related HFS is a type of inflammation response mediated by COX-2 over-expression in the palms and soles [79]. Therefore, differential metabolites related to cell proliferation, differentiation and immunological response might be potential markers of CRAE.
To date, only a few published studies have applied metabolomics to investigate markers for CRAE. One previous study showed that higher levels of low-density lipoprotein prior to treatment could predict higher grade toxicity in advanced CRC patients who received single-agent capecitabine [80]. An abnormally high level of low-density lipoprotein alone is a risk factor for immunological response [81].
Consistent with our theory, levels of N1,N12-diacetylspermine were down-regulated in patients who developed HFS compared to those who had not. This may indicate faulty DNA synthesis. In addition, a number of indicators and mediators of inflammation response were consistently altered in patients with CRAEs. These included up-regulated 4-pyridoxic acid and down-regulated methionine and methylhistidine. Interestingly, the differential inflammation responses were also revealed by metabolites from bacterial tryptophan catabolism. We observed relatively lower levels of IAA and indoxyl in CRC patients susceptible to anemia, and lower levels of IAcrGly in CRC patients susceptible to HFS. Since both IAA and IAcrGly can activate AHR, which exerts protective effects on autoimmune inflammation [82,83,84,85], and their urinary levels are mediated by tryptophan metabolism and gut microbiota, we speculate that altered gut microbiota may also be an important factor in the susceptibility to CRAE. In summary, the urinary metabolome is affected by the health condition of the individual, including proliferation, differentiation, and inflammation. It suggests that CRC patients who are susceptible to CRAEs may have faulty proliferation, differentiation, and induced inflammation.

Summary and future directions
The main strength of this study is that it explored CRAE-related metabolites for the first time. A number of metabolites were identified and a potential CRAE predicting model was generated. We also identified CRC-related metabolites. Based on these metabolites, a diagnostic model was generated and verified.
However, several limitations of this study have to be mentioned and considered for future analysis. First, the sample population was small. Our patients were exclusively enrolled from one clinical center and the majority were from the south-east part of China. Second, although the patients and controls had equivalent age and sex, other intrinsic and environmental factors with possible influence were not assessed. Third, because of the small number of CRC patients who received capecitabine, internal replication was not used for the CRAE prediction models. The reliability of these models will need to be tested using a larger population. Fourth, only positive results were compared with other positive results from the literature. Ideally, we should have also compared our results with negative results; however, since not many studies report negative results, there is no symmetrical way to do this. Therefore, our results may be found to be negative by others. For example, in our study, methylhistidine was associated with CRC, unlike the report by Cross et al. [44].
In summary, comparing CRC patients and non-neoplastic controls, and CRC patients with and without CRAEs, differential metabolites revealed changes in cell differentiation and immune response. We speculate that induced proliferation of cancer cells and altered immune response were associated with the specialized metabolic profile of CRC patients. However, faulty cell proliferation, cell differentiation, potential metabolic pathways and excessive immune response may make the CRC patients more susceptible to CRAEs.

Conclusions
Based on urinary metabolic profiles, we identified a number of metabolic pathways associated with CRC and CRAE. Most of these differential metabolites have important roles in cell proliferation, differentiation and immune response. We also constructed a series of biomarker systems for CRC diagnosis and CRAE prediction.

Acknowledgments
We thank our colleagues from the Department of Pharmacy and the Department of Surgery of Changzheng Hospital for sharing their pearls of wisdom with us during the course of this research, and we thank application engineers from Agilent for their suggestions and comments on experimental methodology.

Data availability
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.

Author contribution
WSC, MML, and WC designed the study; HSY collected clinical samples; YD, HSY, and WC performed the experiments and analyzed the data; HW, FZ, JZ, HM and HSY contributed to the logistics and optimization of the untargeted metabolomics; XXL, YD, CJ, HM, FZ, and XT contributed to figure and table production; YD and WC drafted the manuscript. YD, HSY, MML, and WSC amended and finalized the manuscript. All authors read and approved the final article.

Ethical approval
All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards.

Informed consent
Informed consent was obtained from all individual participants included in the study.