The scpdata package
scpdata disseminates mass spectrometry (MS)-based single-cell proteomics (SCP) data sets formatted using the scp data structure. The data structure is described in the scp vignette.
In this vignette, we describe how to access the SCP data sets. To start, we load the scpdata package.
ExperimentHub
The data is stored using the ExperimentHub infrastructure. We first create a connection with ExperimentHub.
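The loading and connection steps described so far can be sketched as follows. This is a minimal sketch: the object name `eh` is chosen to match the calls used later in this vignette, and ExperimentHub is attached explicitly in case it is not already on the search path.

```r
# Attach scpdata and the ExperimentHub infrastructure it relies on
library(scpdata)
library(ExperimentHub)

# Open a connection to ExperimentHub; on first use this downloads
# and caches the hub metadata, so internet access is needed
eh <- ExperimentHub()
```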
You can list all data sets available in scpdata using the query function.
query(eh, "scpdata")
#> ExperimentHub with 26 records
#> # snapshotDate(): 2024-10-24
#> # $dataprovider: MassIVE, PRIDE, SlavovLab website, Dataverse
#> # $species: Homo sapiens, Mus musculus, Rattus norvegicus, Gallus gallus
#> # $rdataclass: QFeatures
#> # additional mcols(): taxonomyid, genome, description,
#> # coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#> # rdatapath, sourceurl, sourcetype
#> # retrieve records with, e.g., 'object[["EH3899"]]'
#>
#> title
#> EH3899 | specht2019v2
#> EH3900 | specht2019v3
#> EH3901 | dou2019_lysates
#> EH3902 | dou2019_mouse
#> EH3903 | dou2019_boosting
#> ... ...
#> EH9450 | gregoire2023_mixCTRL
#> EH9477 | khan2023
#> EH9487 | guise2024
#> EH9497 | petrosius2023_mES
#> EH9498 | petrosius2023_AstralAML
Another way to get information about the available data sets is to call scpdata(). This retrieves all the available metadata. For example, we can retrieve the data set titles along with their descriptions to make an informed choice about which data set to use.
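A minimal sketch of this call is shown below. It assumes that scpdata() returns a data-frame-like table with `title` and `description` columns, as the rendered table suggests. The variable name `info` and the use of knitr::kable() for display are illustrative choices, not part of the scpdata interface.

```r
library(scpdata)

# Retrieve the metadata for all data sets distributed by scpdata
info <- scpdata()

# Keep only the title and description columns (assumed to exist)
# and render them as a table
knitr::kable(as.data.frame(info[, c("title", "description")]))
```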
| | title | description |
---|---|---|
EH3899 | specht2019v2 | SCP expression data for monocytes (U-937) and macrophages at PSM, peptide and protein level |
EH3900 | specht2019v3 | SCP expression data for more monocytes (U-937) and macrophages at PSM, peptide and protein level |
EH3901 | dou2019_lysates | SCP expression data for Hela digests (0.2 or 10 ng) at PSM and protein level |
EH3902 | dou2019_mouse | SCP expression data for C10, SVEC or Raw cells at PSM and protein level |
EH3903 | dou2019_boosting | SCP expression data for C10, SVEC or Raw cells and 3 boosters (0, 5 or 50 ng) at PSM and protein level |
EH3904 | zhu2018MCP | Near SCP expression data for micro-dissection rat brain samples (50, 100, or 200 µm width) at PSM level |
EH3905 | zhu2018NC_hela | Near SCP expression data for HeLa samples (approximately 12, 40, or 140 cells) at PSM level |
EH3906 | zhu2018NC_lysates | Near SCP expression data for HeLa lysates (10, 40 and 140 cell equivalent) at PSM level |
EH3907 | zhu2018NC_islets | Near SCP expression data for micro-dissected human pancreas samples (control patients or type 1 diabetes) at PSM level |
EH3908 | cong2020AC | SCP expression data for Hela cells at PSM, peptide and protein level |
EH3909 | zhu2019EL | SCP expression data for chicken utricle samples (1, 3, 5 or 20 cells) at PSM, peptide and protein level |
EH6011 | liang2020_hela | Expression data for HeLa cells (0, 1, 10, 150, 500 cells) at PSM, peptide and protein level |
EH7085 | schoof2021 | Single-cell proteomics data from OCI-AML8227 cell culture to reconstruct the cellular hierarchy. |
EH7295 | williams2020_lfq | Single-cell label free proteomics data from a MCF10A cell line culture. |
EH7296 | williams2020_tmt | Single-cell proteomics data from three acute myeloid leukemia cell line culture (MOLM-14, K562, CMK). |
EH7712 | derks2022 | Single-cell and bulk (100-cell) proteomics data of PDAC, melanoma cells and monocytes. |
EH7713 | brunner2022 | Single-cell proteomics data of cell cycle stages in HeLa. |
EH8301 | leduc2022_pSCoPE | Single-cell proteomics data of 878 melanoma cells and 877 monocytes (pSCoPE). |
EH8302 | leduc2022_plexDIA | Single-cell proteomics data of 126 melanoma cells (plexDIA). |
EH8303 | woo2022_macrophage | Single-cell proteomics data from LPS-treated macrophages. |
EH8304 | woo2022_lung | Single-cell proteomics data from primary human lung cells. |
EH9450 | gregoire2023_mixCTRL | Single-cell proteomics data from two monocyte cell lines |
EH9477 | khan2023 | Single-cell proteomics data of 421 MCF-10A cells undergoing EMT triggered by TGF-beta |
EH9487 | guise2024 | Single-cell proteomics data of 108 postmortem CTL or ALS spinal motor neurons |
EH9497 | petrosius2023_mES | Mouse embryonic stem cells across ground-state (m2i) and differentiation-permissive (m15) culture conditions. |
EH9498 | petrosius2023_AstralAML | Single-cell proteomics data of 4 cell types from the OCI-AML8227 model. |
To get one of the data sets (e.g. dou2019_lysates), you can either retrieve it from ExperimentHub using its record identifier:
scp <- eh[["EH3901"]]
#> see ?scpdata and browseVignettes('scpdata') for documentation
#> downloading 1 resources
#> retrieving 1 resource
#> loading from cache
scp
#> An instance of class QFeatures containing 4 assays:
#> [1] Hela_run_1: SingleCellExperiment with 24562 rows and 10 columns
#> [2] Hela_run_2: SingleCellExperiment with 24310 rows and 10 columns
#> [3] peptides: SingleCellExperiment with 13934 rows and 20 columns
#> [4] proteins: SingleCellExperiment with 1641 rows and 20 columns
or you can use the built-in function from scpdata:
scp <- dou2019_lysates()
#> see ?scpdata and browseVignettes('scpdata') for documentation
#> loading from cache
scp
#> An instance of class QFeatures containing 4 assays:
#> [1] Hela_run_1: SingleCellExperiment with 24562 rows and 10 columns
#> [2] Hela_run_2: SingleCellExperiment with 24310 rows and 10 columns
#> [3] peptides: SingleCellExperiment with 13934 rows and 20 columns
#> [4] proteins: SingleCellExperiment with 1641 rows and 20 columns
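Both routes return the same QFeatures object. As a hedged first look at its content, the sketch below uses standard accessors from QFeatures and the packages it builds on (`names()`, `[[`, `colData()`, `assay()`). The variable name `prot` is illustrative, and the assay name `"proteins"` is taken from the printed object above.

```r
library(scpdata)   # provides the dou2019_lysates() accessor
library(QFeatures) # QFeatures class and its accessors

scp <- dou2019_lysates()

# Names of the assays stored in the QFeatures object
names(scp)

# Extract a single assay (here the protein-level data); each assay
# is a SingleCellExperiment
prot <- scp[["proteins"]]
dim(prot)

# Sample annotations shared across assays
head(colData(scp))

# Top-left corner of the protein-level quantitative matrix
assay(prot)[1:5, 1:5]
```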
Each data set is extensively documented in a separate man page (e.g. ?dou2019_lysates). There you can find information about the data content, the acquisition protocol, the data collection procedure, as well as the data sources and reference.
For more information about manipulating the data sets, check the scp package. The scp vignette will guide you through a typical SCP data processing workflow. Once your data is loaded from scpdata, you can skip section 2, "Read in SCP data", of the scp vignette.
R version 4.4.1 (2024-06-14)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.1 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: Etc/UTC
tzcode source: system (glibc)
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] scpdata_1.13.1 ExperimentHub_2.13.1
[3] AnnotationHub_3.13.3 BiocFileCache_2.13.2
[5] dbplyr_2.5.0 QFeatures_1.15.3
[7] MultiAssayExperiment_1.31.5 SummarizedExperiment_1.35.5
[9] Biobase_2.65.1 GenomicRanges_1.57.2
[11] GenomeInfoDb_1.41.2 IRanges_2.39.2
[13] S4Vectors_0.43.2 BiocGenerics_0.51.3
[15] MatrixGenerics_1.17.1 matrixStats_1.4.1
[17] BiocStyle_2.33.1
loaded via a namespace (and not attached):
[1] tidyselect_1.2.1 dplyr_1.1.4
[3] blob_1.2.4 Biostrings_2.73.2
[5] filelock_1.0.3 SingleCellExperiment_1.27.2
[7] fastmap_1.2.0 lazyeval_0.2.2
[9] digest_0.6.37 mime_0.12
[11] lifecycle_1.0.4 cluster_2.1.6
[13] ProtGenerics_1.37.1 KEGGREST_1.45.1
[15] RSQLite_2.3.7 magrittr_2.0.3
[17] compiler_4.4.1 rlang_1.1.4
[19] sass_0.4.9 tools_4.4.1
[21] igraph_2.1.1 utf8_1.2.4
[23] yaml_2.3.10 knitr_1.48
[25] S4Arrays_1.5.11 bit_4.5.0
[27] curl_5.2.3 DelayedArray_0.31.14
[29] plyr_1.8.9 abind_1.4-8
[31] withr_3.0.2 purrr_1.0.2
[33] sys_3.4.3 grid_4.4.1
[35] fansi_1.0.6 MASS_7.3-61
[37] cli_3.6.3 rmarkdown_2.28
[39] crayon_1.5.3 generics_0.1.3
[41] httr_1.4.7 reshape2_1.4.4
[43] BiocBaseUtils_1.7.3 DBI_1.2.3
[45] cachem_1.1.0 stringr_1.5.1
[47] zlibbioc_1.51.2 AnnotationDbi_1.67.0
[49] AnnotationFilter_1.29.0 BiocManager_1.30.25
[51] XVector_0.45.0 vctrs_0.6.5
[53] Matrix_1.7-1 jsonlite_1.8.9
[55] bit64_4.5.2 clue_0.3-65
[57] maketools_1.3.1 tidyr_1.3.1
[59] jquerylib_0.1.4 glue_1.8.0
[61] stringi_1.8.4 BiocVersion_3.20.0
[63] UCSC.utils_1.1.0 tibble_3.2.1
[65] pillar_1.9.0 rappdirs_0.3.3
[67] htmltools_0.5.8.1 GenomeInfoDbData_1.2.13
[69] R6_2.5.1 evaluate_1.0.1
[71] lattice_0.22-6 png_0.1-8
[73] memoise_2.0.1 bslib_0.8.0
[75] Rcpp_1.0.13 SparseArray_1.5.45
[77] xfun_0.48 MsCoreUtils_1.17.3
[79] buildtools_1.0.0 pkgconfig_2.0.3
This vignette is distributed under a CC BY-SA license.