Answered

Query metadata based on ENCODE accession IDs (DeepBlueR)

8 years ago

Hi, I'm interested in using DeepBlue to fetch ENCODE metadata based on accession IDs.

E.g. ENCSR000AEH, ENCSR000AEF, ENCSR000AED

This can be done with the ENCODExplorer package, but I could not find such a feature in DeepBlueR. https://www.bioconductor.org/packages/release/bioc/html/ENCODExplorer.html

8 years ago

Hello,

As DeepBlue is a multi-project data server (ENCODExplorer focuses only on ENCODE), we try to provide general solutions for listing and finding data from multiple projects.

Answering your question:

The ENCODE data imported into DeepBlue has the attribute 'accession' in its extra_metadata.

You can check it here (use ENCODE in the "project" column):

http://deepblue.mpi-inf.mpg.de/dashboard.php#ajax/deepblue_view_experiments.php

What you can do:

Use the command [deepblue_]list_experiments, passing ENCODE as the project. This command returns a list of IDs and names.

For these IDs, execute the command [deepblue_]info(). This command returns the full metadata for the given IDs.
You can then filter the experiments by accession, using the 'accession' attribute in their extra_metadata.

Let me know if this answers your question.
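
A minimal sketch of that last filtering step, in base R. It does not call the server: the `info` list below is a hand-written stand-in for what [deepblue_]info() is assumed to return (one list per experiment, with the accession under extra_metadata), and the IDs, names, and the non-matching accession are made up for illustration. `get_accession` is a hypothetical helper, not part of DeepBlueR.

```r
# Mock of the assumed deepblue_info() result: one list per experiment.
# IDs, names and the second accession are invented for this example.
info <- list(
  list(`_id` = "e1", name = "exp_one",
       extra_metadata = list(accession = "ENCSR000AEH")),
  list(`_id` = "e2", name = "exp_two",
       extra_metadata = list(accession = "ENCSR000XYZ")),
  list(`_id` = "e3", name = "exp_three",
       extra_metadata = list())  # no accession recorded
)

# Accessions we are looking for (from the question).
wanted <- c("ENCSR000AEH", "ENCSR000AEF", "ENCSR000AED")

# Hypothetical helper: extract the accession, or NA if the field is missing.
get_accession <- function(x) {
  acc <- x$extra_metadata$accession
  if (is.null(acc)) NA_character_ else as.character(acc)
}
accessions <- vapply(info, get_accession, character(1))

# Keep only the experiments whose accession is in the wanted set.
matched <- info[accessions %in% wanted]
matched_ids <- vapply(matched, function(x) x[["_id"]], character(1))
```

With real data, `info` would be replaced by the result of [deepblue_]info() for the listed IDs. The attribute name may differ between imports, so it is worth inspecting one record first.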

Thank you,
Felipe Albrecht

8 years ago

Hey Felipe,

Thanks for your response. I'm currently trying the following:

> experiments = deepblue_list_experiments()
Called method: deepblue_list_experiments
Reported status was: okay
> experiment_meta = deepblue_info(id = experiments$id)

The deepblue_info() command either hangs or is taking a long time (been waiting 30 minutes or so).

Thanks,

Floris

8 years ago

Hello,

When you execute deepblue_list_experiments() without arguments, it returns the IDs and names of all available experiments (almost 40k at this time).

So info() will return a huge XML document, which then has to be parsed by R, and that is quite slow.

I strongly suggest that you filter for the type of experiments you want.

Examples:

DNA Methylation data

dna_methylation_exps = deepblue_list_experiments(project="ENCODE", epigenetic_mark="DNA Methylation")

H3K27ac peaks (bed files)

deepblue_list_experiments(project="ENCODE", epigenetic_mark="H3k27ac", type="peaks")

H3K27ac signal (signal files)

deepblue_list_experiments(project="ENCODE", epigenetic_mark="H3k27ac", type="signal")

I hope this helps you.

8 years ago

Thanks! This works well. Any suggestions on processing the resulting info?

The function given in the tutorial does not work for me:

# Obtain the information about the experiment_id
  info = deepblue_info(experiment_id)

  # Print the experiment name, project, biosource, and epigenetic mark.
  with(info, { data.frame(name = name, project = project,
    biosource = sample_info$biosource_name, epigenetic_mark = epigenetic_mark)
      })

This returns an error.

I've also tried many different tidyverse options to convert it to a manageable nested data frame structure (e.g. combinations of flatten and unnest), but I'm not having any luck.

8 years ago

Solved: using purrr does the trick for me. See:

http://r4ds.had.co.nz/lists.html#hierarchy

https://jennybc.github.io/purrr-tutorial/ls01_map-name-position-shortcuts.html

If this is helpful to anyone, I used the following code:

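> library(DeepBlueR)
> library(purrr)   ## map(), map_chr() and the %>% pipe
> library(tibble)  ## tibble()
> experiments = deepblue_list_experiments(project = "ENCODE")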
> ## Note the following step can take quite a while
> tmp = deepblue_info(id = experiments$id)
> meta = tibble(experiment_id = map_chr(tmp, "_id"),
+               file_accession = map(tmp, "extra_metadata") %>% map_chr("file_encode_accession", .null = NA), ## Ensures missing data does not throw error
+               sample_accession = map(tmp, "sample_info") %>% map_chr("accession", .null = NA),
+               genome = map_chr(tmp, "genome"),
+               epigenetic_mark = map_chr(tmp, "epigenetic_mark"),
+               description = map_chr(tmp, "description"),
+               project = map_chr(tmp, "project"),
+               technique = map_chr(tmp, "technique"),
+               file_type = map(tmp, "extra_metadata") %>% map_chr("file_type", .null = NA),
+               biosource_name = map(tmp, "sample_info") %>% map_chr("biosource_name", .null = NA),
+               biosource_type = map(tmp, "sample_info") %>% map_chr("biosample_type", .null = NA))
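
For anyone who wants to try the extraction pattern without waiting for the server, here is a small self-contained sketch on mock data. The `tmp` list is hand-written and only assumes the same nesting as above (the values are placeholders, not real query results). It passes `.default`, which is the argument name recent purrr releases use for what the code above passes as `.null`, and it uses `NA_character_` so the missing value is already a character.

```r
library(purrr)

# Mock records standing in for deepblue_info() output (values are placeholders).
tmp <- list(
  list(`_id` = "e1", genome = "hg19",
       extra_metadata = list(file_type = "bed"),
       sample_info = list(biosource_name = "K562")),
  list(`_id` = "e2", genome = "hg19",
       extra_metadata = list(),  # file_type missing on purpose
       sample_info = list(biosource_name = "HepG2"))
)

# One column per field; nested fields are pulled out in two steps,
# and missing entries become NA instead of raising an error.
meta <- tibble::tibble(
  experiment_id  = map_chr(tmp, "_id"),
  genome         = map_chr(tmp, "genome"),
  file_type      = map(tmp, "extra_metadata") %>%
    map_chr("file_type", .default = NA_character_),
  biosource_name = map(tmp, "sample_info") %>%
    map_chr("biosource_name", .default = NA_character_)
)
```

The result has one row per experiment, so it can be filtered or joined like any other data frame.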

8 years ago

Glad that you found the answer and thanks for sharing it!
