Abdulrahman Salhab 1 month ago • updated by DeepBlue Data Server 2 weeks ago 1

Hi Felipe,


I'm posting my question here again as you suggested.

I want to build a matrix summarizing methylation levels at CpG resolution from ENCODE RRBS data (tissues and primary cells only, hg19 genome), so that I get the methylation level of every CpG covered by at least one sample. I then want to download this matrix and save it to a file.


What is the easiest way to do this in DeepBlue? Doing it in the R environment would be best.

 

Best,

Abdul


Hello, thank you for the question:


"How to build a matrix summarizing the methylation level of every covered CpG, at CpG resolution, from ENCODE RRBS data (only tissues and primary cells, hg19 genome)."


First, list all RRBS experiments from ENCODE:

encode_rrbs <- deepblue_list_experiments(technique = "RRBS", project = "ENCODE")


Extract their names:

encode_rrbs_names <- deepblue_extract_names(encode_rrbs)

Select the columns where the aggregation will be performed:
exp_columns <- deepblue_select_column(encode_rrbs_names, "SCORE")


Find all CGs in the hg19 genome (here, we select only chromosome 1):
cgs_hg19 <- deepblue_find_motif(motif="CG", genome="hg19", chromosomes="chr1")

This command returns a query ID that “contains” all genomic coordinates where the pattern/motif “CG” appears.
Important remark: process each chromosome separately. This kind of operation can be slow: we are talking about more than 200 samples, with millions of regions retrieved from each one, so do not be greedy.


Build the score matrix (the call returns a request ID):
request_id <- deepblue_score_matrix(
    experiments_columns=exp_columns,
    aggregation_function="mean", aggregation_regions_id=cgs_hg19
)

Use the command deepblue_download_request_data to download the data:
score_matrix <- deepblue_download_request_data(request_id = request_id)


And the command deepblue_export_tab to save the data:
deepblue_export_tab(score_matrix, target.directory = "./", "cgs_methylation")


Please see http://bioconductor.org/packages/release/bioc/manuals/DeepBlueR/man/DeepBlueR.pdf for a complete reference of all commands provided by DeepBlueR.


For reference, here is the full code:

library(DeepBlueR)

# List all ENCODE RRBS experiments
encode_rrbs <- deepblue_list_experiments(technique = "RRBS", project = "ENCODE")

# Select the SCORE column of each experiment for aggregation
exp_columns <- deepblue_select_column(encode_rrbs, "SCORE")

# Find all CG positions on chromosome 1 of hg19
cgs_hg19 <- deepblue_find_motif(motif="CG", genome="hg19", chromosomes="chr1")

# Request the score matrix: mean SCORE per CG position and experiment
request_id <- deepblue_score_matrix(
   experiments_columns=exp_columns,
   aggregation_function="mean",
   aggregation_regions_id=cgs_hg19
)

# Download the result and save it to a file
score_matrix <- deepblue_download_request_data(request_id = request_id)
deepblue_export_tab(score_matrix, target.directory = "./", "cgs_methylation")
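
Regarding the remark about doing each chromosome separately: one way to do that is to wrap the motif search, score matrix, download and export steps in a function and call it once per chromosome. The following is only a sketch using the same DeepBlueR calls as above. The helper names (output_name, score_matrix_for_chromosome, run_all_chromosomes), the chromosome list and the per-chromosome file names are assumptions made for illustration, not part of DeepBlueR.

```r
# Sketch only: the helper names, chromosome list and file naming below are
# illustrative assumptions. DeepBlueR functions are called with the
# DeepBlueR:: prefix, so the package must be installed but not attached.

# Chromosomes to process (assumed choice: autosomes plus chrX).
chromosomes <- paste0("chr", c(1:22, "X"))

# Hypothetical helper: output file name for one chromosome.
output_name <- function(chrom) {
  paste0("cgs_methylation_", chrom)
}

# Hypothetical helper: build, download and save the matrix for one chromosome.
# exp_columns is the result of deepblue_select_column(), as in the code above.
score_matrix_for_chromosome <- function(chrom, exp_columns) {
  # All CG positions on this chromosome
  cgs <- DeepBlueR::deepblue_find_motif(
    motif = "CG", genome = "hg19", chromosomes = chrom
  )
  # Mean SCORE per CG position and experiment
  request_id <- DeepBlueR::deepblue_score_matrix(
    experiments_columns = exp_columns,
    aggregation_function = "mean",
    aggregation_regions_id = cgs
  )
  score_matrix <- DeepBlueR::deepblue_download_request_data(
    request_id = request_id
  )
  DeepBlueR::deepblue_export_tab(
    score_matrix, target.directory = "./", output_name(chrom)
  )
  invisible(score_matrix)
}

# Process the chromosomes one after another, so each request stays small.
run_all_chromosomes <- function(exp_columns, chroms = chromosomes) {
  for (chrom in chroms) {
    score_matrix_for_chromosome(chrom, exp_columns)
  }
}
```

Calling run_all_chromosomes(exp_columns) after the deepblue_select_column step would then write one file per chromosome. The per-chromosome files can be combined afterwards if a single matrix is needed.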