Overview
Load libraries
Import ER/PR/HER2 status for all patient samples
Import tumor subtype and gene expression values for all patient samples, then compute DAB2IP quartiles
Combine all data, remove samples whose subtype is missing
Plot DAB2IP expression by subtype, boxplots (Fig 1A)
Import ROR-P scores for all samples
Plot boxplot of RORP by subtype for DAB2IP expression groups
Compute ROR-P significance between DAB2IP-high and DAB2IP-low groups for each subtype
Select DAB2IP-high and DAB2IP-low ER+ LumA tumors for differential expression analysis
Run DESeq2 comparing DAB2IP-high and DAB2IP-low ER+ LumA tumors
Read in DE results DAB2IP low/high ER+ LumA tumors
Create manual ordering of tumors by subtype then by DAB2IP expression (Fig 3A)
Explore pairwise correlation of tumors based on DEG expression profiles
Plot correlation heatmap (Fig 3B)
Plot Proliferation scores for each tumor based on 11-gene panel (Fig 4A)
Compute significance for Proliferation scores for each subtype
Plot heatmap of differentially expressed genes that represent putative NF-kB (RelA) targets (Fig 7A)
Plot four putative target genes expression for each subtype split by DAB2IP status (Fig 7B)
Plot heatmap of differentially expressed KEGG spliceosome genes (Fig 7D)
Save workspace
Get session info

Overview

The code presented here supports the Mukherjee et al publication entitled “DAB2IP-Low Luminal A Breast Cancer Patients Exhibit Gene Expression Profiles Overlapping with More Aggressive Cancer Partly Related to IKK/NF-kB Activation” (2023), under review. The steps outlined here recapitulate the analysis performed for preparation of the figures, however, the figures will be subtly different given slight changes in software versions that occurred during manuscript preparation.

The data utilized here were obtained from cBioPortal in February 2019 corresponding to the PanCan Atlas 2018 release and Firehouse Legacy data. The latter contains ER/PR/HER2 status for each patient sample.

Note that the GDC portal no longer includes HTseq gene quantifications but rather includes augmented_star_gene_counts, which should be similar, and RNA-seq data also now utilize RSEM whereas the data used here do not.

Load libraries

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.0     ✔ readr     2.1.2
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.4.0     ✔ tibble    3.1.8
## ✔ lubridate 1.9.3     ✔ tidyr     1.2.0
## ✔ purrr     0.3.5     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(ComplexHeatmap)

## Loading required package: grid
## ========================================
## ComplexHeatmap version 2.10.0
## Bioconductor page: http://bioconductor.org/packages/ComplexHeatmap/
## Github page: https://github.com/jokergoo/ComplexHeatmap
## Documentation: http://jokergoo.github.io/ComplexHeatmap-reference
## 
## If you use it in published research, please cite:
## Gu, Z. Complex heatmaps reveal patterns and correlations in multidimensional 
##   genomic data. Bioinformatics 2016.
## 
## The new InteractiveComplexHeatmap package can directly export static 
## complex heatmaps into an interactive Shiny app with zero effort. Have a try!
## 
## This message can be suppressed by:
##   suppressPackageStartupMessages(library(ComplexHeatmap))
## ========================================

library(circlize)

## ========================================
## circlize version 0.4.14
## CRAN page: https://cran.r-project.org/package=circlize
## Github page: https://github.com/jokergoo/circlize
## Documentation: https://jokergoo.github.io/circlize_book/book/
## 
## If you use it in published research, please cite:
## Gu, Z. circlize implements and enhances circular visualization
##   in R. Bioinformatics 2014.
## 
## This message can be suppressed by:
##   suppressPackageStartupMessages(library(circlize))
## ========================================

library(tidyHeatmap)

## ========================================
## tidyHeatmap version 1.8.1
## If you use tidyHeatmap in published research, please cite:
## 1) Mangiola et al. tidyHeatmap: an R package for modular heatmap production 
##   based on tidy principles. JOSS 2020.
## 2) Gu, Z. Complex heatmaps reveal patterns and correlations in multidimensional 
##   genomic data. Bioinformatics 2016.
## This message can be suppressed by:
##   suppressPackageStartupMessages(library(tidyHeatmap))
## ========================================
## 
## 
## Attaching package: 'tidyHeatmap'
## 
## The following object is masked from 'package:stats':
## 
##     heatmap

library(readxl)
library(DESeq2)

## Loading required package: S4Vectors
## Loading required package: stats4
## Loading required package: BiocGenerics
## 
## Attaching package: 'BiocGenerics'
## 
## The following objects are masked from 'package:lubridate':
## 
##     intersect, setdiff, union
## 
## The following objects are masked from 'package:dplyr':
## 
##     combine, intersect, setdiff, union
## 
## The following objects are masked from 'package:stats':
## 
##     IQR, mad, sd, var, xtabs
## 
## The following objects are masked from 'package:base':
## 
##     anyDuplicated, append, as.data.frame, basename, cbind, colnames,
##     dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
##     grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
##     order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
##     rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
##     union, unique, unsplit, which.max, which.min
## 
## 
## Attaching package: 'S4Vectors'
## 
## The following objects are masked from 'package:lubridate':
## 
##     second, second<-
## 
## The following objects are masked from 'package:dplyr':
## 
##     first, rename
## 
## The following object is masked from 'package:tidyr':
## 
##     expand
## 
## The following objects are masked from 'package:base':
## 
##     expand.grid, I, unname
## 
## Loading required package: IRanges
## 
## Attaching package: 'IRanges'
## 
## The following object is masked from 'package:lubridate':
## 
##     %within%
## 
## The following objects are masked from 'package:dplyr':
## 
##     collapse, desc, slice
## 
## The following object is masked from 'package:purrr':
## 
##     reduce
## 
## Loading required package: GenomicRanges
## Loading required package: GenomeInfoDb
## Loading required package: SummarizedExperiment
## Loading required package: MatrixGenerics
## Loading required package: matrixStats
## 
## Attaching package: 'matrixStats'
## 
## The following object is masked from 'package:dplyr':
## 
##     count
## 
## 
## Attaching package: 'MatrixGenerics'
## 
## The following objects are masked from 'package:matrixStats':
## 
##     colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
##     colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
##     colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
##     colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
##     colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
##     colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
##     colWeightedMeans, colWeightedMedians, colWeightedSds,
##     colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
##     rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
##     rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
##     rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
##     rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
##     rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
##     rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
##     rowWeightedSds, rowWeightedVars
## 
## Loading required package: Biobase
## Welcome to Bioconductor
## 
##     Vignettes contain introductory material; view with
##     'browseVignettes()'. To cite Bioconductor, see
##     'citation("Biobase")', and for packages 'citation("pkgname")'.
## 
## 
## Attaching package: 'Biobase'
## 
## The following object is masked from 'package:MatrixGenerics':
## 
##     rowMedians
## 
## The following objects are masked from 'package:matrixStats':
## 
##     anyMissing, rowMedians

library(ashr)
library(paletteer)
library(patchwork)

Import ER/PR/HER2 status for all patient samples

EPHstatus <- read_tsv("brca_tcga_clinical_data_ER_PR_HER2.tsv") %>%
    dplyr::select(c(`Sample ID`,`ER Status By IHC`,`IHC-HER2`,`PR status by ihc`)) %>%
    mutate(
        TNBC = case_when(
            `ER Status By IHC` == "Negative" & `IHC-HER2` == "Negative" & `PR status by ihc` == "Negative" ~ "TNBC", 
            T ~ "nonTNBC"
            )
        ) %>%
    dplyr::rename("SampleID" = `Sample ID`, 
        "ERstatus" = `ER Status By IHC`, 
        "HER2status" = `IHC-HER2`, 
        "PRstatus" = `PR status by ihc`) %>%
    mutate(SampleID = paste0(SampleID,"A"))

## New names:
## Rows: 1108 Columns: 139
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "\t" chr
## (88): Study ID, Patient ID, Sample ID, American Joint Committee on Cance... dbl
## (20): Diagnosis Age, Days to Sample Collection., Last Alive Less Initial... lgl
## (31): Neoplasm American Joint Committee on Cancer Clinical Group Stage, ...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## • `First Pathologic Diagnosis Biospecimen Acquisition Method Type` -> `First
##   Pathologic Diagnosis Biospecimen Acquisition Method Type...68`
## • `First Pathologic Diagnosis Biospecimen Acquisition Method Type` -> `First
##   Pathologic Diagnosis Biospecimen Acquisition Method Type...70`

Import tumor subtype and gene expression values for all patient samples, then compute DAB2IP quartiles

Subtype <- read_tsv("data_clinical_patient.txt",skip=4) %>%
    dplyr::rename("SampleID" = PATIENT_ID) %>%
    dplyr::select(c(SampleID,SUBTYPE)) %>%
    mutate(SampleID = paste0(SampleID,"-01A"))

## Rows: 1084 Columns: 38
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (29): PATIENT_ID, SUBTYPE, CANCER_TYPE_ACRONYM, OTHER_PATIENT_ID, SEX, A...
## dbl  (8): AGE, DAYS_LAST_FOLLOWUP, DAYS_TO_BIRTH, DAYS_TO_INITIAL_PATHOLOGIC...
## lgl  (1): WEIGHT
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

TCGAexpr <- read_tsv("data_RNA_Seq_v2_mRNA_median_all_sample_Zscores.txt") %>%
    dplyr::filter(!is.na(Hugo_Symbol)) %>%
    dplyr::select(!starts_with("Entrez")) %>%
    pivot_longer(cols=!starts_with("Hugo"),names_to="SampleID",values_to="Expression") %>%
    dplyr::rename("Gene" = Hugo_Symbol) %>%
    mutate(SampleID = paste0(SampleID,"A")) %>%
    dplyr::select(c(SampleID,Gene,Expression))

## Rows: 20531 Columns: 1084
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr    (1): Hugo_Symbol
## dbl (1083): Entrez_Gene_Id, TCGA-3C-AAAU-01, TCGA-3C-AALI-01, TCGA-3C-AALJ-0...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

DAB2IP <- TCGAexpr %>% 
    dplyr::filter(Gene=="DAB2IP") %>% 
    mutate(quartile = ntile(Expression, 4)) %>%
    mutate(
        DAB2IP = case_when(
            quartile == 1 ~ "DAB2IP_low",
            quartile == 4 ~ "DAB2IP_high",
            T ~ "DAB2IP_mid"
            )
        ) %>%
    dplyr::select(SampleID,DAB2IP,Expression)

Combine all data, remove samples whose subtype is missing

combined <- inner_join(inner_join(DAB2IP,EPHstatus),Subtype)

## Joining with `by = join_by(SampleID)`
## Joining with `by = join_by(SampleID)`

combined <- combined %>%
    drop_na(SUBTYPE)

Plot DAB2IP expression by subtype, boxplots (Fig 1A)

# Get min cutoffs for each DAB2IP expression quartile
DAB2IP.cuts <- TCGAexpr %>% 
    filter(Gene=="DAB2IP") %>% 
    mutate(quartile = ntile(Expression, 4)) %>% 
    group_by(quartile) %>% 
    summarize(max = max(Expression)) %>% 
    pull(max)
    
combined %>%
    select(SampleID,Expression,SUBTYPE) %>%
    group_by(SUBTYPE) %>%
    ggplot(aes(x=SUBTYPE,y=Expression,group=SUBTYPE,fill=SUBTYPE)) +
    geom_boxplot(outlier.shape=NA) +
    annotate("rect", xmin=5.5, xmax=Inf, ymin=-Inf, ymax=DAB2IP.cuts[1], alpha=0.4, fill="gray90")  +
    annotate("rect", xmin=5.5, xmax=Inf, ymin=DAB2IP.cuts[1], ymax=DAB2IP.cuts[2], alpha=0.4, fill = "gray50")  +
    annotate("rect", xmin=5.5, xmax=Inf, ymin=DAB2IP.cuts[2], ymax=DAB2IP.cuts[3], alpha=0.4, fill = "gray20")  +
    annotate("rect", xmin=5.5, xmax=Inf, ymin=DAB2IP.cuts[3], ymax=Inf, alpha=0.4, fill = "black")  +
    geom_jitter(width=0.25) +
    ylab("DAB2IP expression") +
    xlab("Subtype") +
    theme_classic() +
    geom_text(label="1",x=5.55,y=-2.5) +
    geom_text(label="2",x=5.55,y=-0.3) +
    geom_text(label="3",x=5.55,y=0.4) +
    geom_text(label="4",x=5.55,y=2)

## png 
##   2

Import ROR-P scores for all samples

RORP <- read_tsv("BRCA.1218_pam50scores.FINAL.txt") %>%
    dplyr::rename("SampleID" = bcr_patient_barcode, "RORPscores" = `ROR-P (Subtype + Proliferation)`) %>%
    dplyr::select(SampleID,RORPscores) %>%
    mutate(SampleID = paste0(SampleID,"-01A")) %>% 
    group_by(SampleID) %>% 
    summarize(RORPavg = mean(RORPscores))

## New names:
## Rows: 1222 Columns: 20
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "\t" chr
## (9): bcr_patient_barcode, Barcode, Run.Lane.barcode, Call, ROR-S Group ... dbl
## (11): Basal, Her2...5, LumA, LumB, Normal, Confidence, ROR-S (Subtype On...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## • `Her2` -> `Her2...5`
## • `Her2` -> `Her2...17`

Plot boxplot of RORP by subtype for DAB2IP expression groups

inner_join(TCGAexpr,inner_join(RORP,combined,by="SampleID"),by="SampleID") %>%
    dplyr::select(SampleID,Gene,Expression.x,DAB2IP,Expression.y,ERstatus,SUBTYPE,RORPavg) %>%
    dplyr::rename("Expression" = Expression.x,"DAB2IP_expression" = Expression.y) %>%
    dplyr::filter(Gene=="DAB2IP") %>%
    dplyr::filter(DAB2IP != "DAB2IP_mid") %>%
    ggplot(aes(x=fct_relevel(DAB2IP,c("DAB2IP_low","DAB2IP_high")),y=RORPavg,fill=SUBTYPE)) +
    geom_boxplot(outlier.shape=NA) +
    geom_jitter(width=0.25) +
    xlab("DAB2IP expression") +
    facet_wrap(~SUBTYPE,ncol=5) +
    theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))

## png 
##   2

Compute ROR-P significance between DAB2IP-high and DAB2IP-low groups for each subtype

rorp.d.low.lumA <- inner_join(TCGAexpr,inner_join(RORP,combined,by="SampleID"),by="SampleID") %>%
    dplyr::select(SampleID,Gene,Expression.x,DAB2IP,Expression.y,ERstatus,SUBTYPE,RORPavg) %>%
    dplyr::rename("Expression" = Expression.x,"DAB2IP_expression" = Expression.y) %>%
    dplyr::filter(Gene=="DAB2IP") %>%
    dplyr::filter(SUBTYPE=="BRCA_LumA") %>%
    dplyr::filter(DAB2IP=="DAB2IP_low") %>%
    pull(RORPavg)

rorp.d.high.lumA <- inner_join(TCGAexpr,inner_join(RORP,combined,by="SampleID"),by="SampleID") %>%
    dplyr::select(SampleID,Gene,Expression.x,DAB2IP,Expression.y,ERstatus,SUBTYPE,RORPavg) %>%
    dplyr::rename("Expression" = Expression.x,"DAB2IP_expression" = Expression.y) %>%
    dplyr::filter(Gene=="DAB2IP") %>%
    dplyr::filter(SUBTYPE=="BRCA_LumA") %>%
    dplyr::filter(DAB2IP=="DAB2IP_high") %>%
    pull(RORPavg)

t.test(rorp.d.high.lumA,rorp.d.low.lumA)

## 
##  Welch Two Sample t-test
## 
## data:  rorp.d.high.lumA and rorp.d.low.lumA
## t = -6.6269, df = 201.56, p-value = 3.064e-10
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -17.189772  -9.306095
## sample estimates:
## mean of x mean of y 
##  7.464012 20.711945

rorp.d.low.lumB <- inner_join(TCGAexpr,inner_join(RORP,combined,by="SampleID"),by="SampleID") %>%
    dplyr::select(SampleID,Gene,Expression.x,DAB2IP,Expression.y,ERstatus,SUBTYPE,RORPavg) %>%
    dplyr::rename("Expression" = Expression.x,"DAB2IP_expression" = Expression.y) %>%
    dplyr::filter(Gene=="DAB2IP") %>%
    dplyr::filter(SUBTYPE=="BRCA_LumB") %>%
    dplyr::filter(DAB2IP=="DAB2IP_low") %>%
    pull(RORPavg)

rorp.d.high.lumB <- inner_join(TCGAexpr,inner_join(RORP,combined,by="SampleID"),by="SampleID") %>%
    dplyr::select(SampleID,Gene,Expression.x,DAB2IP,Expression.y,ERstatus,SUBTYPE,RORPavg) %>%
 dplyr::    rename("Expression" = Expression.x,"DAB2IP_expression" = Expression.y) %>%
    dplyr::filter(Gene=="DAB2IP") %>%
    dplyr::filter(SUBTYPE=="BRCA_LumB") %>%
    dplyr::filter(DAB2IP=="DAB2IP_high") %>%
    pull(RORPavg)

t.test(rorp.d.high.lumB,rorp.d.low.lumB)

## 
##  Welch Two Sample t-test
## 
## data:  rorp.d.high.lumB and rorp.d.low.lumB
## t = 0.96542, df = 23.001, p-value = 0.3444
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.704416 10.187711
## sample estimates:
## mean of x mean of y 
##  49.27263  46.03098

rorp.d.low.Basal <- inner_join(TCGAexpr,inner_join(RORP,combined,by="SampleID"),by="SampleID") %>%
    dplyr::select(SampleID,Gene,Expression.x,DAB2IP,Expression.y,ERstatus,SUBTYPE,RORPavg) %>%
    dplyr::rename("Expression" = Expression.x,"DAB2IP_expression" = Expression.y) %>%
    dplyr::filter(Gene=="DAB2IP") %>%
    dplyr::filter(SUBTYPE=="BRCA_Basal") %>%
    dplyr::filter(DAB2IP=="DAB2IP_low") %>%
    pull(RORPavg)

rorp.d.high.Basal <- inner_join(TCGAexpr,inner_join(RORP,combined,by="SampleID"),by="SampleID") %>%
    dplyr::select(SampleID,Gene,Expression.x,DAB2IP,Expression.y,ERstatus,SUBTYPE,RORPavg) %>%
    dplyr::rename("Expression" = Expression.x,"DAB2IP_expression" = Expression.y) %>%
    dplyr::filter(Gene=="DAB2IP") %>%
    dplyr::filter(SUBTYPE=="BRCA_Basal") %>%
    dplyr::filter(DAB2IP=="DAB2IP_high") %>%
    pull(RORPavg)

t.test(rorp.d.high.Basal,rorp.d.low.Basal)

## 
##  Welch Two Sample t-test
## 
## data:  rorp.d.high.Basal and rorp.d.low.Basal
## t = -2.0816, df = 59.824, p-value = 0.04167
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -10.8012654  -0.2147586
## sample estimates:
## mean of x mean of y 
##  53.06848  58.57649

rorp.d.low.Her2 <- inner_join(TCGAexpr,inner_join(RORP,combined,by="SampleID"),by="SampleID") %>%
    dplyr::select(SampleID,Gene,Expression.x,DAB2IP,Expression.y,ERstatus,SUBTYPE,RORPavg) %>%
    dplyr::rename("Expression" = Expression.x,"DAB2IP_expression" = Expression.y) %>%
    dplyr::filter(Gene=="DAB2IP") %>%
    dplyr::filter(SUBTYPE=="BRCA_Her2") %>%
    dplyr::filter(DAB2IP=="DAB2IP_low") %>%
    pull(RORPavg)

rorp.d.high.Her2 <- inner_join(TCGAexpr,inner_join(RORP,combined,by="SampleID"),by="SampleID") %>%
    dplyr::select(SampleID,Gene,Expression.x,DAB2IP,Expression.y,ERstatus,SUBTYPE,RORPavg) %>%
    dplyr::rename("Expression" = Expression.x,"DAB2IP_expression" = Expression.y) %>%
    dplyr::filter(Gene=="DAB2IP") %>%
    dplyr::filter(SUBTYPE=="BRCA_Her2") %>%
    dplyr::filter(DAB2IP=="DAB2IP_high") %>%
    pull(RORPavg)

t.test(rorp.d.high.Her2,rorp.d.low.Her2)

## 
##  Welch Two Sample t-test
## 
## data:  rorp.d.high.Her2 and rorp.d.low.Her2
## t = -0.54965, df = 28.418, p-value = 0.5869
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -14.479670   8.349809
## sample estimates:
## mean of x mean of y 
##  43.70862  46.77355

Select DAB2IP-high and DAB2IP-low ER+ LumA tumors for differential expression analysis

tumorsDE <- combined %>%
    dplyr::select(SampleID,DAB2IP,ERstatus,SUBTYPE) %>%
    dplyr::filter(ERstatus=="Positive") %>%
    dplyr::filter(SUBTYPE=="BRCA_LumA") %>%
    dplyr::filter(DAB2IP!="DAB2IP_mid")
write_tsv(tumorsDE,"TCGA_BRCA_tumors_DAB2IP_low_high_ERpositiveOnly_LumA_010924.tsv")

tumorsDE

## # A tibble: 206 × 4
##    SampleID         DAB2IP      ERstatus SUBTYPE  
##    <chr>            <chr>       <chr>    <chr>    
##  1 TCGA-3C-AAAU-01A DAB2IP_high Positive BRCA_LumA
##  2 TCGA-4H-AAAK-01A DAB2IP_high Positive BRCA_LumA
##  3 TCGA-A1-A0SE-01A DAB2IP_high Positive BRCA_LumA
##  4 TCGA-A1-A0SQ-01A DAB2IP_high Positive BRCA_LumA
##  5 TCGA-A2-A04V-01A DAB2IP_high Positive BRCA_LumA
##  6 TCGA-A2-A0CK-01A DAB2IP_high Positive BRCA_LumA
##  7 TCGA-A2-A0CP-01A DAB2IP_high Positive BRCA_LumA
##  8 TCGA-A2-A0CR-01A DAB2IP_high Positive BRCA_LumA
##  9 TCGA-A2-A0CT-01A DAB2IP_low  Positive BRCA_LumA
## 10 TCGA-A2-A0CV-01A DAB2IP_high Positive BRCA_LumA
## # … with 196 more rows

Run DESeq2 comparing DAB2IP-high and DAB2IP-low ER+ LumA tumors

# Collapse all htseq output files from isoform->gene
tx2gene <- read_tsv("rawCounts_DE/gencode.gene.info.v22.lookup.tsv",col_names=c("tx","gene"))

sampleFiles <- grep("htseq.counts$",list.files("rawCounts_DE"),value=TRUE)
for (i in 1:length(sampleFiles)) {
    file <- sampleFiles[i]
    tsv <- read_tsv(paste0("rawCounts_DE/",file),col_names=c("tx","count"))
    joined <- inner_join(tsv,tx2gene) %>%
        group_by(gene) %>%
        summarize(sum = sum(count))
    newFile <- paste0("rawCounts_DE/",file,".030323.collapsed")
    write_tsv(joined,newFile,col_names=F)
}

# Join sample IDs with filenames of new collapsed counts files and compile DESeq2 metadata
sampleFiles <- grep("counts.030323.collapsed",list.files("rawCounts_DE"),value=TRUE)
sampleInfo <- read_tsv("TCGA_BRCA_tumors_DAB2IP_low_high_ERpositiveOnly_LumA_010924.tsv")

## Rows: 206 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (4): SampleID, DAB2IP, ERstatus, SUBTYPE
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

countFiles <- read_tsv("rawCounts_DE/gdc_sample_sheet.2020-11-25.tsv") %>%
    dplyr::select(`Sample ID`,`File Name`) %>%
    dplyr::rename("SampleID" = `Sample ID`,"Filename" = `File Name`)

## Rows: 1222 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (8): File ID, File Name, Data Category, Data Type, Project ID, Case ID, ...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

coldata <- inner_join(sampleInfo,countFiles) %>%
    dplyr::select(SampleID,Filename,DAB2IP) %>%
    mutate(Filename = str_replace_all(Filename,"counts.gz","counts.030323.collapsed")) %>%
    dplyr::filter(Filename != "0c7d119e-22c0-47ed-b736-3404ab2c7065.htseq.counts.030323.collapsed") %>%
    dplyr::filter(Filename != "6aa69f7f-28c6-4943-b907-62d8085e9c6c.htseq.counts.030323.collapsed")

## Joining with `by = join_by(SampleID)`

## Warning in inner_join(sampleInfo, countFiles): Each row in `x` is expected to match at most 1 row in `y`.
## ℹ Row 27 of `x` matches multiple rows.
## ℹ If multiple matches are expected, set `multiple = "all"` to silence this
##   warning.

        # These two samples have two counts files, removing the second instance of both

coldata <- as.data.frame(coldata)
coldata$DAB2IP <- factor(coldata$DAB2IP)


# Set up DESeq2 object
dds <- DESeqDataSetFromHTSeqCount(sampleTable = coldata, directory = "rawCounts_DE", design= ~DAB2IP)

# Run DE test
dds <- DESeq(dds,betaPrior=FALSE)

## estimating size factors
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## fitting model and testing
## -- replacing outliers and refitting for 4830 genes
## -- DESeq argument 'minReplicatesForReplace' = 7 
## -- original counts are preserved in counts(dds)
## estimating dispersions
## fitting model and testing

# Use shrunken log-fold-changes with `ashr`
res <- lfcShrink(dds, contrast=c("DAB2IP","DAB2IP_high","DAB2IP_low"), type="ashr")

## using 'ashr' for LFC shrinkage. If used in published research, please cite:
##     Stephens, M. (2016) False discovery rates: a new deal. Biostatistics, 18:2.
##     https://doi.org/10.1093/biostatistics/kxw041

# VST normalization for downstream plotting
vsd <- vst(dds)
DESeq2::plotPCA(vsd, intgroup="DAB2IP")

# Prepare output files
res <- res[order(res$padj), ]
resdata <- merge(as.data.frame(res,row.names=rownames(res)), as.data.frame(assay(vsd)), by="row.names", sort=FALSE)
rownames(resdata)=resdata$Row.names
resdata=resdata[,-1]

# Plot histogram of significant adjusted p-values
hist(res$padj, breaks=50, col="grey")

# Filter results
res_tbl <- rownames_to_column(as.data.frame(res), var = "Gene") %>% 
    as_tibble()

res_filt <- res_tbl %>%
    dplyr::select(Gene,log2FoldChange,padj) %>%
    dplyr::filter(padj < 1e-5)

# Write output
write_tsv(res_filt,"TCGA_BRCA_tumors_DAB2IP_low_high_ERpositiveOnly_LumA_DESeq2_results_adjP1e5_010924.txt")

Read in DE results DAB2IP low/high ER+ LumA tumors

DEG <- read_tsv("TCGA_BRCA_tumors_DAB2IP_low_high_ERpositiveOnly_LumA_DESeq2_results_adjP1e5_010924.txt")

## Rows: 2073 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (1): Gene
## dbl (2): log2FoldChange, padj
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

DEG_any <- DEG %>%
    pull(Gene)

# Subset TCGA gene expression data to just the differentially-expressed genes
### Include SFRS1 manually, since we know this to be differentially-expressed but uses an Entrez synonymous gene symbol for SRSF1
TCGAexpr_DEG <- TCGAexpr %>%
    dplyr::filter(Gene %in% DEG_any | Gene=="SFRS1")

Create manual ordering of tumors by subtype then by DAB2IP expression (Fig 3A)

DAB2IP_order <- inner_join(TCGAexpr_DEG,combined,by="SampleID") %>%
    dplyr::select(SampleID,Gene,Expression.x,DAB2IP,Expression.y,ERstatus,SUBTYPE) %>%
    dplyr::rename("Expression" = Expression.x,"DAB2IP_expression" = Expression.y) %>%
    dplyr::filter(!is.na(SUBTYPE)) %>%
    dplyr::filter(ERstatus=="Positive" | ERstatus=="Negative") %>%
    dplyr::filter(DAB2IP=="DAB2IP_high" | DAB2IP=="DAB2IP_low") %>%
#   dplyr::filter(!Gene %in% c("SNORD104","SNORD11","SNORD6","SNORD93")) %>%       # Remove these due to SD=NA
    group_by(SUBTYPE) %>%
    dplyr::arrange(DAB2IP_expression,.by_group = T) %>%
    dplyr::select(SampleID,DAB2IP_expression) %>%
    unique() %>%
    pull(SampleID)

## Adding missing grouping variables: `SUBTYPE`

# Now plot heatmap of DAB2IP DEGs for all DAB2IP high/low tumors
inner_join(TCGAexpr_DEG,combined,by="SampleID") %>%
    dplyr::select(SampleID,Gene,Expression.x,DAB2IP,Expression.y,ERstatus,SUBTYPE) %>%
    dplyr::rename("Expression" = Expression.x,"DAB2IP_expression" = Expression.y) %>%
    dplyr::filter(DAB2IP=="DAB2IP_high" | DAB2IP=="DAB2IP_low") %>%
    dplyr::filter(!is.na(SUBTYPE)) %>%
    dplyr::filter(ERstatus=="Positive" | ERstatus=="Negative") %>%
#  dplyr::filter(!Gene %in% c("SNORD104","SNORD11","SNORD6","SNORD93")) %>%       # Remove these due to SD=NA
    group_by(SUBTYPE) %>%
    tidyHeatmap::heatmap(Gene,
        SampleID,
        Expression,
        scale="none",
        palette_value=colorRamp2(c(-3, 0, 3), c("blue", "white", "red")),
        cluster_columns=F,
        column_order=DAB2IP_order,
        show_row_names = FALSE,
        show_column_names = FALSE) %>%
    add_tile(SUBTYPE) %>%
    add_tile(DAB2IP)

## tidyHeatmap says: (once per session) from release 1.7.0 the scaling is set to "none" by default. Please use scale = "row", "column" or "both" to apply scaling

## png 
##   2

Explore pairwise correlation of tumors based on DEG expression profiles

# Format sample metadata
coldata <- inner_join(TCGAexpr_DEG,combined,by="SampleID") %>%
    dplyr::select(SampleID,Gene,Expression.x,DAB2IP,Expression.y,ERstatus,SUBTYPE) %>%
    dplyr::rename("Expression" = Expression.x,"DAB2IP_expression" = Expression.y) %>%
    dplyr::filter(DAB2IP=="DAB2IP_high" | DAB2IP=="DAB2IP_low") %>%
    dplyr::filter(!is.na(SUBTYPE)) %>%
    dplyr::filter(ERstatus=="Positive" | ERstatus=="Negative") %>%
    dplyr::filter(Gene=="DAB2IP") %>%
    dplyr::select(-Gene,-DAB2IP_expression,-Expression) %>%
    mutate(SUBTYPE = str_replace_all(SUBTYPE,"BRCA_","")) %>%
    unite(Group,c(SUBTYPE,DAB2IP),sep="_")

coldata.df <- as.data.frame(column_to_rownames(coldata,var="SampleID"))

# Format expression data matrix
degmat <- inner_join(TCGAexpr_DEG,combined,by="SampleID") %>%
    dplyr::select(SampleID,Gene,Expression.x,DAB2IP,Expression.y,ERstatus,SUBTYPE) %>%
    dplyr::rename("Expression" = Expression.x,"DAB2IP_expression" = Expression.y) %>%
    dplyr::filter(DAB2IP=="DAB2IP_high" | DAB2IP=="DAB2IP_low") %>%
    dplyr::filter(!is.na(SUBTYPE)) %>%
    dplyr::filter(ERstatus=="Positive" | ERstatus=="Negative") %>%
#   dplyr::filter(!Gene %in% c("SNORD104","SNORD11","SNORD6","SNORD93")) %>%       # Remove these due to SD=NA
    dplyr::select(-DAB2IP,-DAB2IP_expression,-ERstatus,-SUBTYPE) %>%    
    pivot_wider(names_from=SampleID,values_from=Expression,values_fn = median)

degmat.df <- as.data.frame(column_to_rownames(degmat,var="Gene"))

# Compute Pearson correlations, Luminal A DAB2IP-low vs all 
cors <- cor(degmat.df[,coldata$SampleID[coldata$Group=="LumA_DAB2IP_low"]],degmat.df)

Plot correlation heatmap (Fig 3B)

# Set up sample groupings and colors
cols <- c(paletteer_d("nationalparkcolors::Redwoods")[1],       # Light green
      paletteer_d("nationalparkcolors::Everglades")[5],     # Dark green
      paletteer_d("nationalparkcolors::Everglades")[1],     # Light blue
      paletteer_d("nationalparkcolors::Everglades")[2],     # Dark blue
      paletteer_d("nationalparkcolors::Redwoods")[3],       # Light yellow
      paletteer_d("nationalparkcolors::BryceCanyon")[1],        # Dark yellow
      paletteer_d("nationalparkcolors::Badlands")[3],       # Light red
      paletteer_d("nationalparkcolors::DeathValley")[1],        # Dark red
      paletteer_d("nationalparkcolors::Acadia")[6]      # Dark gray
)

names(cols) <- levels(as.factor(coldata$Group))

ha1 <- HeatmapAnnotation(Group = as.factor(coldata$Group), col = list(Group = cols))
ha2 <- rowAnnotation(Group = as.factor(coldata$Group[coldata$Group %in% "LumA_DAB2IP_low"]), col = list(Group = cols), show_legend = F)

# Plot such that DAB2IP_low_LumA (rows) plotted in same order as the clustered columns
# Plot once to get column orders, then plot again with those specified to order the rows

h1 <- ComplexHeatmap::Heatmap(cors,
    show_row_names = F,
    show_column_names = F, 
    name = "Expression correlation",
    clustering_distance_rows=function(x) as.dist(1-cor(t(x))), 
    clustering_method_rows = "complete",
    clustering_distance_columns=function(x) as.dist(1-cor(t(x))), 
    clustering_method_columns = "complete",
    column_title = NULL,
    row_title = NULL,
    top_annotation = ha1,
    left_annotation = ha2,
    cluster_column_slices = F,
    column_split = as.factor(coldata$Group),
    border = T
)

# Draw unordered version to get order
ht <- draw(h1)

orders = column_order(ht)
row_order <- orders$LumA_DAB2IP_low

# Draw in sorted order
ComplexHeatmap::Heatmap(cors[colnames(cors)[row_order],],
    show_row_names = F,
    show_column_names = F,
    cluster_rows = F, 
    name = "Expression correlation",
    clustering_distance_columns=function(x) as.dist(1-cor(t(x))), 
    clustering_method_columns = "complete",
    column_title = NULL,
    row_title = NULL,
    top_annotation = ha1,
    left_annotation = ha2,
    cluster_column_slices = F,
    column_split = as.factor(coldata$Group),
    border = T
)

## png 
##   2

Plot Proliferation scores for each tumor based on 11-gene panel (Fig 4A)

PDscores <- read_tsv("TCGA_Dscore_Pscore.txt")

## Rows: 1198 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (1): SampleID
## dbl (2): D_Score, P_score_11genesMean
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

left_join(combined,PDscores) %>%
    dplyr::select(SampleID,ERstatus,SUBTYPE,D_Score,P_score_11genesMean,Expression,DAB2IP) %>%
    dplyr::filter(ERstatus=="Positive") %>%
    drop_na(SUBTYPE) %>%
    drop_na(P_score_11genesMean) %>%
    dplyr::filter(DAB2IP!="DAB2IP_mid") %>%
    mutate(DAB2IP = fct_relevel(DAB2IP,c("DAB2IP_low","DAB2IP_high"))) %>%
    ggplot(aes(x=DAB2IP,y=P_score_11genesMean,color=SUBTYPE)) +
    geom_boxplot() +
    geom_jitter() +
    facet_wrap(~SUBTYPE,ncol=5)

## Joining with `by = join_by(SampleID)`

## Joining with `by = join_by(SampleID)`

## png 
##   2

Compute significance for Proliferation scores for each subtype

# LumA
Pscore_LumA_low <- left_join(combined,PDscores) %>%
    dplyr::select(SampleID,ERstatus,SUBTYPE,D_Score,P_score_11genesMean,Expression,DAB2IP) %>%
    dplyr::filter(ERstatus=="Positive") %>%
    drop_na(SUBTYPE) %>%
    dplyr::filter(DAB2IP=="DAB2IP_low" & SUBTYPE=="BRCA_LumA") %>%
    pull(P_score_11genesMean)

## Joining with `by = join_by(SampleID)`

Pscore_LumA_high <- left_join(combined,PDscores) %>%
    dplyr::select(SampleID,ERstatus,SUBTYPE,D_Score,P_score_11genesMean,Expression,DAB2IP) %>%
    dplyr::filter(ERstatus=="Positive") %>%
    drop_na(SUBTYPE) %>%
    dplyr::filter(DAB2IP=="DAB2IP_high" & SUBTYPE=="BRCA_LumA") %>%
    pull(P_score_11genesMean)

## Joining with `by = join_by(SampleID)`

t.test(Pscore_LumA_low,Pscore_LumA_high)

## 
##  Welch Two Sample t-test
## 
## data:  Pscore_LumA_low and Pscore_LumA_high
## t = 7.1667, df = 194.46, p-value = 1.555e-11
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.5406667 0.9512256
## sample estimates:
## mean of x mean of y 
## -0.267900 -1.013846

# LumB
Pscore_LumB_low <- left_join(combined,PDscores) %>%
    dplyr::select(SampleID,ERstatus,SUBTYPE,D_Score,P_score_11genesMean,Expression,DAB2IP) %>%
    dplyr::filter(ERstatus=="Positive") %>%
    drop_na(SUBTYPE) %>%
    dplyr::filter(DAB2IP=="DAB2IP_low" & SUBTYPE=="BRCA_LumB") %>%
    pull(P_score_11genesMean)

## Joining with `by = join_by(SampleID)`

Pscore_LumB_high <- left_join(combined,PDscores) %>%
    dplyr::select(SampleID,ERstatus,SUBTYPE,D_Score,P_score_11genesMean,Expression,DAB2IP) %>%
    dplyr::filter(ERstatus=="Positive") %>%
    drop_na(SUBTYPE) %>%
    dplyr::filter(DAB2IP=="DAB2IP_high" & SUBTYPE=="BRCA_LumB") %>%
    pull(P_score_11genesMean)

## Joining with `by = join_by(SampleID)`

t.test(Pscore_LumB_low,Pscore_LumB_high)

## 
##  Welch Two Sample t-test
## 
## data:  Pscore_LumB_low and Pscore_LumB_high
## t = 0.017274, df = 15.495, p-value = 0.9864
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.2266552  0.2303695
## sample estimates:
## mean of x mean of y 
## 0.7344341 0.7325769

# Basal
Pscore_Basal_low <- left_join(combined,PDscores) %>%
    dplyr::select(SampleID,ERstatus,SUBTYPE,D_Score,P_score_11genesMean,Expression,DAB2IP) %>%
    dplyr::filter(ERstatus=="Positive") %>%
    drop_na(SUBTYPE) %>%
    dplyr::filter(DAB2IP=="DAB2IP_low" & SUBTYPE=="BRCA_Basal") %>%
    pull(P_score_11genesMean)

## Joining with `by = join_by(SampleID)`

Pscore_Basal_high <- left_join(combined,PDscores) %>%
    dplyr::select(SampleID,ERstatus,SUBTYPE,D_Score,P_score_11genesMean,Expression,DAB2IP) %>%
    dplyr::filter(ERstatus=="Positive") %>%
    drop_na(SUBTYPE) %>%
    dplyr::filter(DAB2IP=="DAB2IP_high" & SUBTYPE=="BRCA_Basal") %>%
    pull(P_score_11genesMean)

## Joining with `by = join_by(SampleID)`

t.test(Pscore_Basal_low,Pscore_Basal_high)

## 
##  Welch Two Sample t-test
## 
## data:  Pscore_Basal_low and Pscore_Basal_high
## t = 0.27631, df = 9, p-value = 0.7885
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1950053  0.2492719
## sample estimates:
## mean of x mean of y 
## 0.6153000 0.5881667

# Her2
Pscore_Her2_low <- left_join(combined,PDscores) %>%
    dplyr::select(SampleID,ERstatus,SUBTYPE,D_Score,P_score_11genesMean,Expression,DAB2IP) %>%
    dplyr::filter(ERstatus=="Positive") %>%
    drop_na(SUBTYPE) %>%
    dplyr::filter(DAB2IP=="DAB2IP_low" & SUBTYPE=="BRCA_Her2") %>%
    pull(P_score_11genesMean)

## Joining with `by = join_by(SampleID)`

Pscore_Her2_high <- left_join(combined,PDscores) %>%
    dplyr::select(SampleID,ERstatus,SUBTYPE,D_Score,P_score_11genesMean,Expression,DAB2IP) %>%
    dplyr::filter(ERstatus=="Positive") %>%
    drop_na(SUBTYPE) %>%
    dplyr::filter(DAB2IP=="DAB2IP_high" & SUBTYPE=="BRCA_Her2") %>%
    pull(P_score_11genesMean)

## Joining with `by = join_by(SampleID)`

t.test(Pscore_Her2_low,Pscore_Her2_high)

## 
##  Welch Two Sample t-test
## 
## data:  Pscore_Her2_low and Pscore_Her2_high
## t = -0.27922, df = 7.811, p-value = 0.7873
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.4086571  0.3207142
## sample estimates:
## mean of x mean of y 
## 0.3629286 0.4069000

Plot heatmap of differentially expressed genes that represent putative NF-kB (RelA) targets (Fig 7A)

These putative targets were identified using ChIP-Atlas. Signal within 5kb of the transcription start site (TSS) of target genes was averaged for HMEC, MCF7, and MDA-MB-231 cells (n=20 studies total), and filtered to retain promoters with signal greater than or equal to 10

rela_targets <- read_tsv("RELA.5.breast.gt10.tsv",col_names="Gene") %>%
    pull(Gene)

## Rows: 2871 Columns: 1
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (1): Gene
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

inner_join(TCGAexpr_DEG,combined,by="SampleID") %>%
    dplyr::select(SampleID,Gene,Expression.x,DAB2IP,Expression.y,ERstatus,SUBTYPE) %>%
    dplyr::rename("Expression" = Expression.x,"DAB2IP_expression" = Expression.y) %>%
    dplyr::filter(DAB2IP=="DAB2IP_high" | DAB2IP=="DAB2IP_low") %>%
    dplyr::filter(!is.na(SUBTYPE)) %>%
    dplyr::filter(ERstatus=="Positive" | ERstatus=="Negative") %>%
    dplyr::filter(Gene %in% rela_targets) %>%
#  dplyr::filter(!Gene %in% c("SNORD104","SNORD11","SNORD6","SNORD93")) %>%       # Remove these due to SD=NA
    group_by(SUBTYPE) %>%
    tidyHeatmap::heatmap(Gene,
        SampleID,
        Expression,
        scale="none",
        palette_value=colorRamp2(c(-3, 0, 3), c("blue", "white", "red")),
        cluster_columns=F,
        column_order=DAB2IP_order,
        show_row_names = FALSE,
        show_column_names = FALSE) %>%
    add_tile(SUBTYPE) %>%
    add_tile(DAB2IP)

## png 
##   2

Plot four putative target genes expression for each subtype split by DAB2IP status (Fig 7B)

goi <- c("BIRC5", "CDK5", "TBCA", "SFRS1")      # Note SFRS1 = SRSF1 as per Entrez gene ID

inner_join(TCGAexpr_DEG,combined,by="SampleID") %>%
    dplyr::filter(Gene %in% goi) %>%
    dplyr::select(SampleID,Gene,Expression.x,DAB2IP,Expression.y,ERstatus,SUBTYPE) %>%
    dplyr::rename("Expression" = Expression.x,"DAB2IP_expression" = Expression.y) %>%
    dplyr::filter(SUBTYPE != "BRCA_Normal") %>%
    dplyr::filter(DAB2IP != "DAB2IP_mid") %>%
    ggplot(aes(x=fct_relevel(DAB2IP,c("DAB2IP_low","DAB2IP_high")),y=Expression,fill=SUBTYPE)) +
    geom_boxplot(outlier.shape=NA) +
    geom_jitter(width=0.25) +
    xlab("Expression") +
    facet_grid(Gene~SUBTYPE) +
    theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))

## png 
##   2

Plot heatmap of differentially expressed KEGG spliceosome genes (Fig 7D)

kegg.splice.genes <- c("SNRPE","SFRS1","SNRPG","SFRS2","SNRPD1","SNRPF","BUD31","MAGOHB","HNRNPU","HNRNPC","SNRPB2","ALYREF","SNRPC","SF3B6","SNRNP27","SNRPA1","USP39","PPIL1","LSM3","SNRPB")

inner_join(TCGAexpr_DEG,combined,by="SampleID") %>%
    dplyr::select(SampleID,Gene,Expression.x,DAB2IP,Expression.y,ERstatus,SUBTYPE) %>%
    dplyr::filter(Gene %in% kegg.splice.genes) %>%
    dplyr::rename("Expression" = Expression.x,"DAB2IP_expression" = Expression.y) %>%
    dplyr::filter(DAB2IP=="DAB2IP_high" | DAB2IP=="DAB2IP_low") %>%
    dplyr::filter(!is.na(SUBTYPE)) %>%
    dplyr::filter(ERstatus=="Positive" | ERstatus=="Negative") %>%
    group_by(SUBTYPE) %>%
    tidyHeatmap::heatmap(Gene,
        SampleID,
        Expression,
        scale="none",
        palette_value=colorRamp2(c(-3, 0, 3), c("blue", "white", "red")),
        cluster_columns=F,
        column_order=DAB2IP_order,
        show_row_names = T,
        show_column_names = FALSE) %>%
    add_tile(SUBTYPE) %>%
    add_tile(DAB2IP)

## png 
##   2

Save workspace

save.image("TCGA-BRCA_DAB2IP_010924.RData")

Get session info

sessionInfo()

## R version 4.1.0 (2021-05-18)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Red Hat Enterprise Linux 8.8 (Ootpa)
## 
## Matrix products: default
## BLAS/LAPACK: /nas/longleaf/rhel8/apps/r/4.1.0/lib/libopenblas_haswellp-r0.3.5.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats4    grid      stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] patchwork_1.1.3             paletteer_1.4.0            
##  [3] ashr_2.2-54                 DESeq2_1.34.0              
##  [5] SummarizedExperiment_1.24.0 Biobase_2.54.0             
##  [7] MatrixGenerics_1.6.0        matrixStats_0.62.0         
##  [9] GenomicRanges_1.46.1        GenomeInfoDb_1.35.16       
## [11] IRanges_2.28.0              S4Vectors_0.32.4           
## [13] BiocGenerics_0.40.0         readxl_1.4.3               
## [15] tidyHeatmap_1.8.1           circlize_0.4.14            
## [17] ComplexHeatmap_2.10.0       lubridate_1.9.3            
## [19] forcats_1.0.0               stringr_1.5.1              
## [21] dplyr_1.1.0                 purrr_0.3.5                
## [23] readr_2.1.2                 tidyr_1.2.0                
## [25] tibble_3.1.8                ggplot2_3.4.0              
## [27] tidyverse_2.0.0             rmarkdown_2.12             
## [29] knitr_1.45                 
## 
## loaded via a namespace (and not attached):
##  [1] colorspace_2.0-3       rjson_0.2.21           ellipsis_0.3.2        
##  [4] XVector_0.34.0         GlobalOptions_0.1.2    clue_0.3-60           
##  [7] farver_2.1.1           bit64_4.0.5            AnnotationDbi_1.56.2  
## [10] fansi_1.0.3            codetools_0.2-19       splines_4.1.0         
## [13] doParallel_1.0.17      cachem_1.0.8           geneplotter_1.72.0    
## [16] jsonlite_1.8.0         annotate_1.72.0        cluster_2.1.4         
## [19] png_0.1-8              compiler_4.1.0         httr_1.4.7            
## [22] Matrix_1.6-3           fastmap_1.1.1          cli_3.6.0             
## [25] htmltools_0.5.7        tools_4.1.0            gtable_0.3.0          
## [28] glue_1.6.2             GenomeInfoDbData_1.2.7 Rcpp_1.0.8.3          
## [31] cellranger_1.1.0       jquerylib_0.1.4        vctrs_0.5.2           
## [34] Biostrings_2.62.0      iterators_1.0.14       xfun_0.41             
## [37] irlba_2.3.5.1          timechange_0.2.0       lifecycle_1.0.4       
## [40] XML_3.99-0.9           dendextend_1.17.1      zlibbioc_1.40.0       
## [43] scales_1.2.0           vroom_1.5.7            hms_1.1.3             
## [46] parallel_4.1.0         rematch2_2.1.2         RColorBrewer_1.1-3    
## [49] prismatic_1.1.0        yaml_2.3.5             memoise_2.0.1         
## [52] gridExtra_2.3          sass_0.4.7             stringi_1.7.6         
## [55] RSQLite_2.2.10         SQUAREM_2021.1         highr_0.10            
## [58] genefilter_1.76.0      foreach_1.5.2          BiocParallel_1.28.3   
## [61] truncnorm_1.0-9        shape_1.4.6            rlang_1.1.2           
## [64] pkgconfig_2.0.3        bitops_1.0-7           evaluate_0.15         
## [67] lattice_0.20-45        invgamma_1.1           labeling_0.4.3        
## [70] bit_4.0.5              tidyselect_1.2.0       magrittr_2.0.3        
## [73] R6_2.5.1               magick_2.7.4           generics_0.1.2        
## [76] DelayedArray_0.20.0    DBI_1.1.2              pillar_1.7.0          
## [79] withr_2.5.0            mixsqp_0.3-48          survival_3.5-7        
## [82] KEGGREST_1.34.0        RCurl_1.98-1.6         crayon_1.5.1          
## [85] utf8_1.2.2             tzdb_0.2.0             viridis_0.6.2         
## [88] GetoptLong_1.0.5       locfit_1.5-9.5         blob_1.2.4            
## [91] digest_0.6.29          xtable_1.8-4           munsell_0.5.0         
## [94] viridisLite_0.4.2      bslib_0.5.1

DAB2IP expression among TCGA-BRCA tumor samples

Jeremy M. Simon

01/09/2024