Systematic analysis of genetic variants in cancer-testis genes identified two novel lung cancer susceptibility loci in Chinese population

Cancer-testis (CT) genes played important roles in the progression of malignant tumors and were recognized as promising therapeutic targets. However, the roles of genetic variants in CT genes in lung cancer susceptibility have not been well depicted. This study aimed to evaluate the associations between genetic variants in CT genes and lung cancer risk in Chinese population. A total of 22,556 qualified SNPs from 268 lung cancer associated CT genes were initially evaluated based on our previous lung cancer GWAS (Genome-wide association studies) with 2,331 cases and 3,077 controls. As a result, 17 candidate SNPs were further genotyped in 1,056 cases and 1,053 controls using Sequenom platform. Two variants (rs6941653, OPRM1, T > C, screening: OR = 1.24, 95%CI: 1.12-1.38, P = 2.40×10-5; validation: OR = 1.18, 95%CI: 1.01-1.37, P = 0.039 and rs402969, NLRP8, C > T, screening: OR = 1.15, 95%CI: 1.04-1.26, P = 0.006; validation: OR = 1.16, 95%CI: 1.02-1.33, P = 0.028) were identified as novel lung cancer susceptibility variants. Stratification analysis indicated that the effect of rs6941653 was stronger in lung squamous cell carcinoma (OR = 1.36) than that in lung adenocarcinoma (OR = 1.15, I2 = 77%, P = 0.04). Finally, functional annotations, differential gene expression analysis, pathway and gene ontology analyses were performed to suggest the potential functions of our identified variants and genes. In conclusion, this study identified two novel lung cancer risk variants in Chinese population and provided deeper insight into the roles of CT genes in lung tumorigenesis.


Introduction
Lung cancer has been the most frequently diagnosed caner type and the leading cause of cancer-related deaths for decades in China [1]. The tumorigenesis of lung cancer was a multiple-stage process, and both environmental and genetic factors were involved. It was estimated that the heritability of lung cancer was about 15.2% in Chinese population [2]. However, up till now, GWAS (Genome-Wide Association Study)-reported lung cancer associated single nucleotide polymorphisms (SNPs) could only Ivyspring International Publisher account for limited lung cancer heritability (less than 1%) [2,3]. Therefore, more effective strategies were wanted to identify novel lung cancer risk loci based on GWAS data.
Epigenetic alterations have been recognized as an important feature of tumorigenesis [4,5]. Notably, cancer-testis (CT) genes, which were restrictedly expressed in germ cells and malignant tumor cells, were usually activated through epigenetic mechanisms [6]. The activation of CT genes in cancer samples made them oncogene candidates and their remained high immunogenicity made them perfect immunotherapeutic targets for cancer treatment [7,8]. In addition, associations between genetic variants in CT genes and the susceptibility of cancers have been described in previous studies. For example, genetic variants in HORMAD2 and GPATCH2 were reported associated with lung cancer risk [9,10], and variants in CTNNA2, CCDC33 and SPAG17 showed significant association with the susceptibility of breast cancer [11,12]. All these studies suggested that genetic variants in CT genes could also contribute to the development of cancers.
Therefore, systematic analysis of the associations between genetic variants in CT genes and lung cancer risk could help identify more novel lung cancer susceptibility loci. In our previous study, we performed a systematic identification of CT genes in 19 cancer types based on multiple public-available databases [13]. As a result, 876 novel CT genes in 19 cancer types were recognized. In lung cancer, we identified 268 CT genes (including 61 known CT genes) that were activated in at least 2% of cancer samples [13]. This finding provided us an unprecedented opportunity to explore the associations between genetic variants in CT genes and the susceptibility of lung cancer.
In this study, a two-stage case control study was performed. The NJMU GWAS, which has been established in our previous study, was used to screen candidate lung cancer risk variants [9]. These promising variants were further validated in an independent Chinese population with a total of 1,056 lung cancer cases and 1,053 controls based on the Sequenom MassARRAY iPLEX platform. This study would identify novel lung cancer susceptibility loci in Chinese population and help reveal the roles of genetic variants in CT genes in the development of lung cancer.

Study subjects
Two independent datasets were used in this study. The NJMU GWAS contained 2,331 lung cancer cases and 3,077 controls, and was used as screening dataset. The detailed information about the study subjects in NJMU GWAS was described previously [9]. Genotype imputation for NJMU GWAS was performed using IMPUTE2 and Shapeit v2 with the 1000 Genomes Project (the Phase III integrated variant set release, across 2504 samples) as the reference [14].
The validation dataset consisted of 1,056 lung cancer cases and 1,053 controls. All the cases were histopathologically confirmed patients and were recruited from the First Affiliated Hospital of Nanjing Medical University and Jiangsu Cancer Hospital. Controls were obtained from a screening program for non-infectious diseases conducted in Jiangsu Province and matched to the cases for age and gender. All the participants have signed the informed consent acknowledgement and this study was approved by the ethical review board of Nanjing Medical University.

Screening for candidate risk variants in CT genes
As shown in Figure 1, genetic variants in 268 lung cancer associated CT genes (Supplementary Table 1, including the 10 kb upstream and downstream of these genes) were extracted based on NJMU GWAS. Initially, 71,008 SNPs were reserved. Further, SNPs satisfied any of the following criteria were excluded: (1) imputation quality score INFO < 0.8; (2) minor allele frequency (MAF) < 0.05; (3) hardy-weinberg equilibrium (HWE) < 0.05. This resulted in 22,556 qualified SNPs reserved for association analysis. Among them, 96 SNPs showed significant association (P < 0.01) with lung cancer risk in NJMU GWAS. Furthermore, SNPs that were located in previously reported regions or showed a high correlation (r 2 > 0.3) with each other (SNPs with the least P value were reserved) were excluded. Finally, 20 independent SNPs were retained as lung cancer risk variant candidates.

Variants validation using Sequenom platform
In this study, genotyping in the validation stage was performed using the Sequenom MassARRAY iPLEX platform (Sequenom Inc. San Diego, CA, USA) according to the protocol [15,16]. Primers for 17 out of 20 candidate SNPs were successfully constructed. In particular, rs144031443 and rs150492976 were replaced by rs75932085 (r 2 = 0.66, Chinese Han population) and rs4726004 (r 2 = 1), respectively. The genotyping experiment was performed by technicians who were blinded to the status of case or control. Cases and controls were mixed in each 384-well plate with two water samples as blank controls. In addition, 5% of samples were randomly selected for repeat genotyping, which yielded a concordance rate of 100%. Primer sequences used in this study were shown in Supplementary Table 2.

Statistical analysis
The chi-square test was used for the comparison of categorical variables between cases and controls, while the Student's t test was adopted for continuous variables. Associations between genetic variants and lung cancer risk were evaluated (Odd Ratios, OR and 95% confidence intervals, 95%CI) using logistical regression analysis. Age, sex, smoking status and principal components were adjusted. The differential gene expression analysis was conducted using RNA sequence data (107 paired lung cancer and adjacent normal tissues) from the Cancer Genome Atlas (TCGA) database, and was further validated in 60 pairs of lung tumor and normal tissues from GSE19804. The expression quantitative trait loci (eQTL) analysis was performed based on the GTEx v7 database (383 lung tissues). OPRM1 and NLRP8 co-expressed genes were screened based on lung cancer RNA-seq (TCGA) using Pearson correlation analysis. All the analyses were performed using R (3.5.2) or Plink 1.9.

Characteristics of study subjects
The characteristics of study subjects were shown in Table 1. In brief, a total of 3,387 lung cancer cases and 4,130 controls were included in this study. Lung adenocarcinoma (LUAD) was the most common histological type and accounted for more than 50% of all the lung cancer cases. Lung squamous cell carcinoma (SCC) accounted for 35.3% and 21.8% in NJMU GWAS and validation samples, respectively. The distribution of age, sex and smoking status was comparable between lung cancer cases and controls in validation samples (P > 0.05).

Two novel lung cancer susceptibility variants
As mentioned above, 17 of 20 lung cancer candidate SNPs were successfully genotyped in validation stage. Associations between 17 candidate variants and lung cancer risk were shown in Table 2.

Stratification analysis and interaction analysis
To explore the effects of our identified two SNPs in different subgroup populations, stratification analysis was further performed. Study subjects were divided into four subgroups according to age (young < 60 years old vs old ≥ 60 years old), sex (males vs females), smoking status (smokers vs nonsmokers) and histological subtypes (SCC vs LUAD). Notably, we observed a significantly stronger effect of rs6941653 in SCC (NJMU GWAS: OR = 1.39; Validation: OR = 1.29; Combined: OR = 1.36) than that in LUAD (NJMU GWAS: OR = 1.15; Validation: OR = 1.16; Combined: OR = 1.15, I 2 = 77%, P = 0.04, Figure  2A-2B, Figure 3A). In contrast to rs6941653, SNP rs402969 showed a similar effect on lung cancer risk in various subgroup populations (Figure 2C-D, Figure  3B). In addition, interactions between our identified lung cancer risk variants (rs6941653 and rs402969) and smoking were evaluated. However, no significant interaction was found (Supplementary Table 3, P for interaction > 0.05). Similarly, there was no significant interaction between variant rs6941653 and rs402969 (Supplementary Table 4, P for interaction > 0.05).

Functional annotations and eQTL analysis
In order to suggest the potential functions of our identified two novel variants and their related SNPs, functional annotations were performed. As shown in Table 3, rs6941653 was not located in regulatory element regions, such as promoter, enhancer, transcription factor (TF) binding sites or DNase peak. Strikingly, rs9397692 (r 2 = 0.70 with rs6941653) was an eQTL and could influence the binding of transcription factor NFATC2. What's more, rs9397692 had a Regulome DB score of 4, suggesting that it was located in TF binding site and DNase peak. All these results indicated that rs9397692 might be the functional variant, which could affect the binding of specific transcription factor NFATC2. In the second risk loci, rs805165 (r 2 = 0.90 with rs402969) was predicted to be an eQTL and could modify the affinity to TF BRCA1. Consistently, the Regulome DB score of rs805165 was 5 (TF binding or DNase peak). Taken together, rs805165 might be the functional variant in the second loci. Furthermore, eQTL analysis in lung tissues was conducted based on GTEx v7 database. Unfortunately, the expression of OPRM1 and NLRP8 in lung tissues was not sufficient for eQTL calculation. Therefore, associations between our identified variants and the expression of their host genes (OPRM1 and NLRP8) were not evaluated. Notably, rs805165 was found to be significantly associated with the expression of AC010525.7 and AC024580.1 (Supplementary Figure 1). However, the functions of AC010525.7 and AC024580.1 have not been reported in previous studies.

Differential gene expression, pathway and GO analyses
As shown in Figure 4, OPRM1 was not expressed in normal lung tissues (GTEx database), but showed abundant expression in testis and brain tissues ( Figure 4A). The consistent finding was observed in lung cancer adjacent normal tissues based TCGA database. However, the expression of OPRM1 was significantly elevated in lung tumor tissues compared to that in normal tissues (P = 7.83 × 10 -4 , Figure 4C). Similarly, NLRP8 expression was aberrantly upregulated in tumor tissues (Figure 4D, P = 0.015). These findings were further validated in 60 paired lung tumor tissues and adjacent normal tissues from GSE19804 (Supplementary Figure 2). Pathway enrichment analysis showed that OPRM1 co-expressed genes (Fold enrichment: 14.0, P = 2.10 × 10 -214 ) and NLRP8 co-expressed genes (Fold enrichment: 6.7, P = 3.10 × 10 -110 , Supplementary Table 5) were enriched in the olfactory transduction pathway. The GO analysis indicated that OPRM1 co-expressed genes might participate in various biological processes, such as immune response, signaling pathway and cell differentiation. The co-expressed genes with NLRP8 could take part in the sensory perception, keratinocyte differentiation and G-protein coupled receptor signaling pathway.

Discussion
Although lung cancer GWASs have identified dozens of lung cancer susceptibility loci, more lung cancer associated variants remained to be identified [3]. Cancer-testis genes were recognized as promising therapeutic targets for cancers due to their high immunogenicity in tumor tissues [22,23]. However, the roles of genetic variants in CT genes in the development of lung cancer have not been revealed in previous studies. In the current study, a two-stage case-control study was performed to systematically evaluate the associations between genetic variants in CT genes and the risk of lung cancer in the Chinese population. A total of 22,556 qualified SNPs from 268 lung cancer associated CT genes were initially analyzed and 17 candidate SNPs were genotyped using the Sequenom platform. Finally, two variants (rs6941653 in OPRM1 and rs402969 in NLRP8) were identified as novel susceptibility loci of lung cancer in Chinese population.
SNP rs6941653 was located in the intron of OPRM1 (opioid receptor mu 1, 6q25.2), which encoded one of the opioid receptors [24]. Previous studies revealed that genetic variants in OPRM1 could modulate the dependence to multiple drugs or chemical agents, including nicotine, cocaine, alcohol [25][26][27]. Due to these properties, OPRM1 has been reported associated with the progression of Alzheimer's disease, Parkinson's disease, Schizophrenia and so on [28][29][30]. In lung cancer, Lennon FE et al found that OPRM1 (also known as MOR) expression was elevated in several human non-small cell lung cancer cell lines. What's more, overexpression of OPRM1 significantly promoted the proliferation of lung cancer in vitro and in vivo [31,32].
Consistently, OPRM1 was recognized as a novel CT gene in lung cancer because of its aberrant expression in lung tumor tissues according to our previous study [13]. In the current study, rs6941653 in OPRM1 was further identified associated with lung cancer risk in Chinese population. Strikingly, stratification analysis indicated that rs6941653 showed a stronger effect in the SCC population. With regard to the potential mechanisms, we speculated that nicotine dependence induced by OPRM1 might partially account for this. Functional annotations for rs6941653 and their LD variants suggested that rs9397692, which was in the DNase peak and could affect the binding of TF NFATC2, might be the functional variant in these loci.
The rs402969 was located in the upstream of NLRP8, which belonged to the member of the nucleotide-binding oligomerization domain/ leucine rich repeat/ pyrin domain containing (NLRP) subfamily [33]. This gene family was involved in innate immunity, inflammasome formation and mammalian reproduction [34][35][36]. However, the functions and roles of NLRP8 in cancers have not been described in the previous study. In this study, NLRP8 expression was found significantly upregulated in lung cancer tissues. What's more, rs805165 (LD with rs402969) was predicted in the binding sites of TF BRCA1 or DNase peak, suggesting a potential role as a regulator of gene expression. As expected, rs805165 showed a significant association with the expression of AC010525.7 and AC024580.1. Nevertheless, the functions of them have not been elucidated. Unfortunately, the expression of NLRP8 in lung tissues was not sufficient for eQTL evaluation. Function studies were needed to reveal the role of NLRP8 in lung cancer and identify the potential functional variants in these loci.  In conclusion, we performed a systematic evaluation of the associations between genetic variants in CT genes and the risk of lung cancer. Two novel lung cancer susceptibility loci were successfully identified in Chinese population. This study could improve our understanding of the roles of CT genes in the tumorigenesis of lung cancer. However, functional studies were wanted to verify the functional variants and reveal the potential mechanisms.