Query metadata based on ENCODE accession IDs (DeepBlueR)
Hi, I'm interested in using DeepBlue to fetch ENCODE metadata based on the accession IDs.
E.g. ENCSR000AEH, ENCSR000AEF, ENCSR000AED
This can be done with the ENCODExplorer package, but I could not find such a feature in DeepBlueR. https://www.bioconductor.org/packages/release/bioc/html/ENCODExplorer.html
Hello,
since DeepBlue is a multi-project data server (ENCODExplorer focuses only on ENCODE), we try to offer general solutions for listing and finding data across multiple projects.
Answering your question:
The ENCODE data imported into DeepBlue has the attribute 'accession' in the extra_metadata.
You can check it here (enter ENCODE in the "project" column):
http://deepblue.mpi-inf.mpg.de/dashboard.php#ajax/deepblue_view_experiments.php
What you can do:
Use the command [deepblue_]list_experiments, passing ENCODE as the project. This command returns a list of IDs and names.
For those IDs, execute the command [deepblue_]info(). This command returns the full metadata for the given IDs.
You can then filter the experiments by accession, using the 'accession' field in the extra_metadata of these experiments.
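The filtering step can be sketched in base R as below. This is only an illustration on mock data: the record layout (a list per experiment with `_id`, `name` and `extra_metadata$accession`) is an assumption about what info() returns, the IDs and the "PLACEHOLDER_ACCESSION" value are made up, and no call to the DeepBlue server is made.

```r
# Mock stand-in for the list returned by info(); the field layout is an
# assumption, and the IDs/names are hypothetical.
infos <- list(
  list(`_id` = "e1", name = "exp_a", extra_metadata = list(accession = "ENCSR000AEH")),
  list(`_id` = "e2", name = "exp_b", extra_metadata = list(accession = "PLACEHOLDER_ACCESSION")),
  list(`_id` = "e3", name = "exp_c", extra_metadata = list(accession = "ENCSR000AED")),
  list(`_id` = "e4", name = "exp_d", extra_metadata = list())  # no accession at all
)

# Accessions we are looking for (taken from the question).
wanted <- c("ENCSR000AEH", "ENCSR000AEF", "ENCSR000AED")

# Pull the accession out of one record; NA when the field is absent.
get_accession <- function(rec) {
  acc <- rec$extra_metadata$accession
  if (is.null(acc)) NA_character_ else as.character(acc)
}

accessions <- vapply(infos, get_accession, character(1))

# Keep only the records whose accession is in the wanted set.
matched <- infos[!is.na(accessions) & accessions %in% wanted]
matched_ids <- vapply(matched, function(rec) rec[["_id"]], character(1))
print(matched_ids)
```

With real data, `infos` would be replaced by the result of info() for the IDs returned by list_experiments; the field names should be checked against an actual record first.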
Let me know if this answers your question.
Thank you,
Felipe Albrecht
Hey Felipe,
Thanks for your response. I'm currently trying the following:
The deepblue_info() command either hangs or is taking a long time (been waiting 30 minutes or so).
Thanks,
Floris
Hello,
when you execute deepblue_list_experiments(), it returns the IDs and names of all available experiments (almost 40k at this time).
So info() returns a huge XML response that has to be parsed by R, which is quite slow.
I strongly suggest that you filter by the type of experiments that you want.
Examples:
DNA Methylation data
dna_methylation_exps = deepblue_list_experiments(project="ENCODE", epigenetic_mark="DNA Methylation")
H3K27ac peaks (bed files)
deepblue_list_experiments(project="ENCODE", epigenetic_mark="H3K27ac", type="peaks")
H3K27ac signal (signal files)
deepblue_list_experiments(project="ENCODE", epigenetic_mark="H3K27ac", type="signal")
I hope this helps you.
Thanks! This works well. Any suggestions on processing the resulting info?
The function given in the tutorial does not work for me:
This returns an error.
I've also tried many different tidyverse options to convert it to a manageable nested data frame structure (e.g. combinations of flatten and unnest), but I'm not having any luck.
Solved: using purrr does the trick for me.
http://r4ds.had.co.nz/lists.html#hierarchy
https://jennybc.github.io/purrr-tutorial/ls01_map-name-position-shortcuts.html
If this is helpful to anyone, I used the following code:
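The snippet originally posted here is not preserved in this thread. What follows is not the poster's code but a minimal sketch of the same idea, written with base R's vapply in place of purrr so that it runs without extra packages. The record layout and all values in `infos` are mock data and assumptions, and `pluck_chr` is a hypothetical helper introduced only for this example.

```r
# Mock stand-in for the nested list returned by deepblue_info();
# field names and values are assumptions for illustration.
infos <- list(
  list(`_id` = "e1", name = "exp_a", epigenetic_mark = "H3K27ac",
       extra_metadata = list(accession = "ENCSR000AEH")),
  list(`_id` = "e2", name = "exp_b", epigenetic_mark = "DNA Methylation",
       extra_metadata = list())  # accession missing on purpose
)

# Hypothetical helper: walk down the given keys in one record and
# return the value as character, or NA if any level is missing.
pluck_chr <- function(rec, ...) {
  value <- rec
  for (key in c(...)) {
    if (!is.list(value) || is.null(value[[key]])) return(NA_character_)
    value <- value[[key]]
  }
  as.character(value)
}

# One row per experiment, one column per extracted field.
meta <- data.frame(
  id              = vapply(infos, pluck_chr, character(1), "_id"),
  name            = vapply(infos, pluck_chr, character(1), "name"),
  epigenetic_mark = vapply(infos, pluck_chr, character(1), "epigenetic_mark"),
  accession       = vapply(infos, pluck_chr, character(1), "extra_metadata", "accession"),
  stringsAsFactors = FALSE
)
print(meta)
```

With purrr, a top-level field is commonly extracted the same way with `map_chr(infos, "name")`. Either approach gives a flat data frame that can then be filtered or joined like any other table.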
Glad that you found the answer and thanks for sharing it!