| Title: | Companion Package for WSBIM1207 Course |
|---|---|
| Description: | Companion package for the WSBIM1207 course, distributing data and general documentation, and making course administration easier. |
| Authors: | Laurent Gatto [aut, cre] (ORCID: <https://orcid.org/0000-0002-1520-2268>) |
| Maintainer: | Laurent Gatto <[email protected]> |
| License: | GPL-2 |
| Version: | 0.1.19 |
| Built: | 2026-05-07 06:48:00 UTC |
| Source: | https://github.com/UCLouvain-CBIO/rWSBIM1207 |
Apple mobility data, downloaded from https://www.apple.com/covid19/mobility on 18 August 2021.
The following description has been taken from the Apple Mobility Trends Reports page:
The CSV file and charts on this site show a relative volume of directions requests per country/region, sub-region or city compared to a baseline volume on January 13th, 2020. We define our day as midnight-to-midnight, Pacific time. Cities are defined as the greater metropolitan area and their geographic boundaries remain constant across the data set. In many countries/regions, sub-regions, and cities, relative volume has increased since January 13th, consistent with normal, seasonal usage of Apple Maps. Day of week effects are important to normalize as you use this data. Data that is sent from users’ devices to the Maps service is associated with random, rotating identifiers so Apple doesn’t have a profile of individual movements and searches. Apple Maps has no demographic information about our users, so we can’t make any statements about the representativeness of usage against the overall population.
apple_mobility.csv()
https://www.apple.com/covid19/mobility
apple_mobility.csv()
read.csv(apple_mobility.csv())
A small data frame describing the beer consumption and demographics of 48 people.
data("beers")
A data frame with 48 observations on the following 8 variables.
Record_ID: a numeric vector
Work: a factor with levels Employed Unemployed
Consumption: a numeric vector
Gender: a factor with levels Female Male
Age: a numeric vector
Day: a numeric vector
Month: a numeric vector
Year: a numeric vector
data(beers)
beers
str(beers)
f <- beers.csv()
basename(f)
beers2 <- read.csv(f, sep = ";")
beers2
identical(beers, beers2)
These data were downloaded from the Novel Coronavirus COVID-19 (2019-nCoV) Data Repository by Johns Hopkins CSSE (https://github.com/CSSEGISandData/COVID-19):
This is the data repository for the 2019 Novel Coronavirus Visual Dashboard operated by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE). Also, Supported by ESRI Living Atlas Team and the Johns Hopkins University Applied Physics Lab (JHU APL).
The 'covid19_cases.csv()', 'covid19_deaths.csv()' and 'covid19_recovered.csv()' functions each return the path to a comma-separated file containing confirmed cases, deaths and recovered cases over time for a number of countries/regions. See the example below for details.
covid19_cases.csv()
covid19_deaths.csv()
covid19_recovered.csv()
cv <- readr::read_csv(covid19_cases.csv())
## dates (format: month/day/year)
names(cv)[-(1:4)]
## Countries/Regions
unique(cv[[2]])
## Provinces/States
unique(cv[[1]])
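The date columns mentioned in the example above are stored as month/day/year strings. The following minimal sketch shows how such names could be converted to Date objects. The column names are hypothetical stand-ins (including the two-digit year), not values read from the package files.

```r
# Hypothetical column names mimicking the layout used in the example:
# four metadata columns followed by one column per day (month/day/year).
cols <- c("Province/State", "Country/Region", "Lat", "Long",
          "1/22/20", "1/23/20", "12/31/20")

# Drop the four metadata columns and parse the remaining names as dates.
# Assumption: years are encoded with two digits, hence "%y".
date_strings <- cols[-(1:4)]
dates <- as.Date(date_strings, format = "%m/%d/%y")

# ISO 8601 representation, convenient for sorting and plotting.
iso_dates <- format(dates, "%Y-%m-%d")
print(iso_dates)
```

If the real files use four-digit years, the format string would need "%Y" instead of "%y".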
These two files describe the 'Educational attainment of young people in English towns' from the UK Office for National Statistics. The data were explored in the July 2023 article "Why do children and young people in smaller towns do better academically than those in larger towns?".
Two files are available:
english_education.csv was prepared as part of the Tidy Tuesday series
on Educational attainment of young people in English towns
(2024-01-23). The page also describes the variables.
edu_income_eprivation_and_educational_attainment.csv was
downloaded from the UK Office for National Statistics and
converted from xls to csv. The variables in this table are also
described on the page linked above.
english_education_files()
character(2) with file names.
Laurent Gatto
Tidy Tuesday data from 2024-01-23: https://github.com/rfordatascience/tidytuesday/blob/master/data/2024/2024-01-23/readme.md
english_education_files()
This dataset covers the monthly evolution of bankruptcies by NACE - 15 days from 2005 to 2023, as distributed by Statbel, the Belgian statistical office.
faillites_be()
character(2) with file names.
The data were downloaded from https://statbel.fgov.be/fr/open-data/evolution-mensuelle-des-faillites-par-nace.
A subsample of 11544 observations was selected so as to keep
the large companies. See scripts/faillites.R.
TF_BANKRUPTCIES_subset.txt.zip: sample of the data in compressed
format, with values separated by the | character (same format as
the full data distributed by Statbel).
Method_BANKRUPTCIES.xlsx: metadata describing the variables, in
xlsx format, as distributed by Statbel.
faillites_be()
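Since the values in the subset file are separated by the | character, the default comma or tab readers will not parse it correctly. Here is a minimal sketch of reading such a file with base R. The records and column names are made up for illustration and are not taken from the Statbel data.

```r
# Made-up records in a pipe-separated layout (illustrative column names).
txt <- c("year|month|bankruptcies",
         "2005|1|10",
         "2005|2|12")

# Write them to a temporary file to stand in for the real text file.
path <- tempfile(fileext = ".txt")
writeLines(txt, path)

# read.delim() accepts any single-character separator through 'sep'.
x <- read.delim(path, sep = "|", header = TRUE)
str(x)
```

For a zipped file, base R's unz() opens a connection to one member of the archive, which can be passed to read.delim() in place of the path. The name of the member inside the archive would have to be checked first, for example with unzip(zipfile, list = TRUE).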
Dataset from "The effect of upper-respiratory infection on transcriptomic changes in the CNS" by Blackmore et al. (2017) (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5544260/), used as data for the Bioconductor intro and RNA-seq lessons and for the WSBIM1207 course. The two variables are of class SummarizedExperiment.
data("GSE96870_intro")
data("GSE96870_intro_ranges")
For details on how the data was prepared, see https://github.com/Bioconductor/bioconductor-teaching/tree/master/data/GSE96870.
## rowData() and rowRanges() are provided by SummarizedExperiment
library("SummarizedExperiment")
data("GSE96870_intro")
GSE96870_intro
rowData(GSE96870_intro)
data("GSE96870_intro_ranges")
GSE96870_intro_ranges
rowRanges(GSE96870_intro_ranges)
'interroA.csv' and 'interroB.csv' are two comma-separated spreadsheets that provide made-up data about student test results.
A data frame with 100 observations on the following 8 variables.
id: student identifier.
height: student heights (in cm).
gender: F or M.
X: a vector of random data drawn from N(0, 1).
interro1: a numeric vector with test scores.
interro2: a numeric vector with test scores.
interro3: a numeric vector with test scores.
interro4: a numeric vector with test scores.
The 'interroC' data contains results for 3 new tests for 15 students, including a subset of the students in 'interroA'.
'interroL' is a long format containing all results from 'interroA' and 'interroB'.
f <- interroA.csv()
interroA <- read.csv(f)
head(interroA)
f2 <- interro2.rds()
readRDS(f2)
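To make the difference between the wide layout of 'interroA' and a long layout such as 'interroL' concrete, here is a minimal base R sketch. The table is made up and much smaller than the real data, and the output column names 'interro' and 'score' are assumptions, not necessarily those used in 'interroL'.

```r
# Made-up wide table: one row per student, one column per test.
wide <- data.frame(
  id = c("s1", "s2", "s3"),
  interro1 = c(12, 15, 9),
  interro2 = c(14, 11, 17)
)

# Columns holding the test scores.
score_cols <- c("interro1", "interro2")

# Long table: one row per student and test.
# unlist() concatenates the score columns in order, so the ids are
# repeated once per test and each test name once per student.
long <- data.frame(
  id = rep(wide$id, times = length(score_cols)),
  interro = rep(score_cols, each = nrow(wide)),
  score = unlist(wide[score_cols], use.names = FALSE)
)
print(long)
```

In a tidyverse workflow, tidyr::pivot_longer() would be the usual tool for the same reshaping.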
A set of tibbles containing information about genes and their protein products, used to illustrate join operations.
data("jdf")
These data are based on feature variables from the
hyperLOPIT2015 data available in the pRolocdata
package. The script used to generate them, join.R, is available in the
scripts package directory.
data(jdf)
library("dplyr")
dplyr::full_join(jdf1, jdf2)
dplyr::left_join(jdf6, jdf7)
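The difference between a full join and a left join can be sketched without the package data. The example below uses base R's merge() as a stand-in for dplyr::full_join() and dplyr::left_join(). The two tables and their column names are made up and only loosely mimic the gene and protein tables.

```r
# Made-up tables sharing a 'uniprot' key; only P2 and P3 occur in both.
genes <- data.frame(
  gene = c("g1", "g2", "g3"),
  uniprot = c("P1", "P2", "P3")
)
proteins <- data.frame(
  uniprot = c("P2", "P3", "P4"),
  organelle = c("ER", "Golgi", "Nucleus")
)

# Full join: keep the keys found in either table, filling gaps with NA.
full <- merge(genes, proteins, by = "uniprot", all = TRUE)

# Left join: keep every row of 'genes', matched or not.
left <- merge(genes, proteins, by = "uniprot", all.x = TRUE)

print(full)
print(left)
```

Unlike the dplyr joins, merge() sorts the result by the key column by default, so row order may differ from what the dplyr functions return.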
These data are composed of three files: kem_counts.tsv and
kem_counts2.tsv, containing count data, and the annotation file
kem_annot.tsv, containing the annotations of the KEM samples. All
files are encoded as tab-separated sheets and can be found with the
kem.tsv() function.
Credit: the data have been generated by Mr Kevin Missault.
The RNA-Seq count data for 13 ENSEMBL transcripts (ref) and 16 KEM samples is encoded as:
ref: ENSEMBL transcript identifiers.
Expression counts for all genes in sample KEM182-01.
Expression counts for all genes in sample KEM182-02.
Expression counts for all genes in sample KEM182-03.
Expression counts for all genes in sample KEM182-04.
Expression counts for all genes in sample KEM182-05.
Expression counts for all genes in sample KEM182-06.
Expression counts for all genes in sample KEM182-07.
Expression counts for all genes in sample KEM182-08.
Expression counts for all genes in sample KEM182-09.
Expression counts for all genes in sample KEM182-10.
Expression counts for all genes in sample KEM182-11.
Expression counts for all genes in sample KEM182-12.
Expression counts for all genes in sample KEM182-13.
Expression counts for all genes in sample KEM182-14.
Expression counts for all genes in sample KEM182-15.
Expression counts for all genes in sample KEM182-16.
The kem_counts2.tsv file contains counts data for 4774 features.
The annotation contains the following variables for the 16 observations:
sample_id: Sample identifier.
jurkat: A character (yes or no) defining whether the cells are Jurkat cells.
cell_type: Jurkat cell type (A or B).
treatment: Treatment, either none or stimulated.
kem.tsv()
kem2.tsv()
kem3.tsv()
This function generates data based on a student number and loads it into the user's environment.
load_exam_data(noma)
noma: 'character(1)' with the student number. Must be coercible to a 'numeric'.
Invisibly returns 'TRUE'. Used for its side effect of loading an object of class 'MSnSet' into the user's global environment.
load_exam_data("0123")
x0123
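The pattern described above (a function called for its side effect that creates an object named after the student number and invisibly returns TRUE) could look like the following sketch. This is not the package's implementation: load_toy_data() is a hypothetical name, using the student number as a random seed is an assumption, and a plain numeric vector replaces the 'MSnSet' object.

```r
# Hypothetical stand-in for the pattern, not the real function.
load_toy_data <- function(noma, envir = globalenv()) {
  # The student number must be a single string coercible to a numeric.
  seed <- suppressWarnings(as.numeric(noma))
  if (length(noma) != 1L || is.na(seed))
    stop("'noma' must be a single string coercible to a numeric")
  # Assumption: the number seeds the generator, so that each student
  # reproducibly gets their own data.
  set.seed(seed)
  # Side effect: create an object called x<noma> in the target environment.
  assign(paste0("x", noma), rnorm(5), envir = envir)
  invisible(TRUE)
}

# Use a dedicated environment here to leave the global one untouched.
e <- new.env()
res <- load_toy_data("0123", envir = e)
ls(e)
```

Defaulting 'envir' to the global environment reproduces the documented behaviour, while still allowing the function to be tried out without cluttering the workspace.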
This is the "mulvey2015" data from Mulvey et al., _Dynamic proteomic profiling of extra-embryonic endoderm differentiation in mouse embryonic stem cells_, Stem Cells (PMID 26059426). See below for more details.
It was extracted from the pRolocdata package and converted into a
long format.
data("mvylng")
data(mvylng)
mvylng
A character vector containing peptide sequences.
data("peptides")
The peptides were extracted from the 'hyperLOPIT2015ms3r1psm' object in the 'pRolocdata' package using 'unique(as.character(fData(hyperLOPIT2015ms3r1psm)$Sequence))'.
data(peptides)
head(peptides)
A set of csv files providing the population sizes in different regions in Belgium from 1970, 1981 and 1991 to 2023.
population_be.csv()
population_be.csv()
read.csv(population_be.csv()[1])
Number of killed, seriously injured, slightly injured and uninjured victims of road accidents, by age group, type of user, sex and various characteristics of the accident in Belgium in 2022.
These are publicly available data downloaded on 16 June 2023 from the Belgian government open data page. For more details on these data, see https://statbel.fgov.be/fr/themes/mobilite/circulation/accidents-de-la-circulation
road_accidents_be_2022.rds()
road_accidents_be_meta.csv()
readRDS(road_accidents_be_2022.rds())
read.csv(road_accidents_be_meta.csv())
This package is used to distribute data and general documentation about the WSBIM1207 course, and to make course administration easier. For details about the course and the course material, see http://bit.ly/WSBIM1207.
rWSBIM1207version()
The following data sets are available. Consult the respective manual pages for further details.
TCGA clinical and RNA expression data: see ?tcga.
beer consumption data: see ?beers.
tables to illustrate joins: see ?jdf.
Maintainer: Laurent Gatto [email protected] (ORCID)
## check the package version that is currently installed
rWSBIM1207version()
The Cancer Genome Atlas (TCGA) is a collaboration between the National
Cancer Institute (NCI) and the National Human Genome Research Institute
(NHGRI) that has generated multi-omics analyses (genomic, transcriptomic,
proteomic and epigenetic) in 33 types of cancer.
RNAseq and clinical data analysed here come from LUAD (lung adenocarcinoma)
tumors and corresponding patients.
TCGA clinical and RNAseq expression data extracted from the
curatedTCGAData package. See inst/scripts/tcga.R for
details.
data("expression")
data("clinical1")
data("clinical2")
data("clinical_table_ex1")
expression: RNA expression data frame with 570 observations on
the following 8 variables.
sampleID: a factor
patient: a character vector
type: a character vector
A1BG: a numeric vector
A1CF: a numeric vector
A2BP1: a numeric vector
A2LD1: a numeric vector
A2ML1: a numeric vector
clinical1: clinical data for 516 observations on the following
15 variables.
patientID: a character vector
tumor_tissue_site: a character vector
gender: a character vector
age_at_diagnosis: a numeric vector
vital_status: a numeric vector
days_to_death: a numeric vector
days_to_last_followup: a numeric vector
pathologic_stage: a character vector
pathology_T_stage: a character vector
pathology_N_stage: a character vector
pathology_M_stage: a character vector
smoking_history: a character vector
number_pack_years_smoked: a numeric vector
year_of_tobacco_smoking_onset: a numeric vector
stopped_smoking_year: a numeric vector
clinical2: small clinical data with 516 observations on the
following 3 variables.
patientID: a character vector
gender: a character vector
years_at_diagnosis: a numeric vector
clinical_table_ex1: clinical summary data with 2 observations on the following 3 variables.
gender: a character vector
a numeric vector
a numeric vector
In addition, the clinical1.csv and expression.csv
functions return the paths to their respective comma-separated value
spreadsheets. The expressions.csv function returns the path to
the expression data split by gene.
data(expression)
data(clinical1)
data(clinical2)
data(clinical_table_ex1)
These data, originally organized by Suraj Das for a Kaggle dataset, have been cleaned and prepared for the TidyTuesday series.
The data contains three files:
historical_spending contains Valentine's spending from 2010 to 2022.
gifts_age contains information on gifts by age.
gifts_gender contains information on gifts by gender.
For a description of the variables in these tables, see the link above.
Note that in these historical surveys, gender was collected as only 'Men' and 'Women', which does not accurately include all genders.
valentine()
character(3) of urls to file names.
Laurent Gatto
valentine()