Felipe Albrecht (Pih): comments on the DeepBlue Epigenomic Data Server

Query metadata based on ENCODE accession IDs (DeepBlueR)

Glad that you found the answer and thanks for sharing it!

9 years ago

Query metadata based on ENCODE accession IDs (DeepBlueR)

Hello,

When you execute deepblue_list_experiments() without filters, it returns the IDs and names of all available experiments (almost 40k at the time of writing).

Calling info() on all of them therefore returns a huge XML response that R has to parse, which is quite slow.

I strongly suggest filtering for the type of experiments that you want.

Examples:

DNA Methylation data

dna_methylation_exps = deepblue_list_experiments(project="ENCODE", epigenetic_mark="DNA Methylation")

H3K27ac peaks (bed files)

deepblue_list_experiments(project="ENCODE", epigenetic_mark="H3k27ac", type="peaks")

H3K27ac signal (signal files)

deepblue_list_experiments(project="ENCODE", epigenetic_mark="H3k27ac", type="signal")
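After filtering on the server side, you can narrow the list further by experiment name before calling info(), so that the slow metadata request only covers the experiments you actually need. This is a minimal sketch, assuming deepblue_list_experiments() returns a data frame with `id` and `name` columns; `select_ids_by_name()` is a hypothetical helper (not part of DeepBlueR), and the rows in the offline example are made up.

```r
# Hypothetical helper (not part of DeepBlueR): keep only the IDs whose
# experiment name matches a regular expression.
# Assumption: `exps` is a data frame with `id` and `name` columns, as
# returned by deepblue_list_experiments().
select_ids_by_name <- function(exps, pattern) {
  keep <- grepl(pattern, exps$name, ignore.case = TRUE)
  as.character(exps$id[keep])
}

# Offline example: made-up rows standing in for a real listing.
exps <- data.frame(
  id   = c("e1", "e2", "e3"),
  name = c("sample_A_H3K27ac.bed", "sample_B_H3K27ac.wig", "sample_C_H3K27ac.bed"),
  stringsAsFactors = FALSE
)
select_ids_by_name(exps, "\\.bed$")

# With a live server this would become, for example:
# exps <- deepblue_list_experiments(project = "ENCODE", epigenetic_mark = "H3k27ac", type = "peaks")
# metadata <- deepblue_info(select_ids_by_name(exps, "\\.bed$"))
```

Only the IDs that survive both filters are sent to info().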

I hope this helps you.

9 years ago

How can I use DeepBlue to query ENCODE RNAseq at …

Hello Floris,

Currently, DeepBlue does not import matrices from ENCODE, only wiggle and BED files.

We want to import the ENCODE gene quantification data (we already have it for BLUEPRINT), but we can't give a precise date for when we will do it.

9 years ago

Query metadata based on ENCODE accession IDs (DeepBlueR)

Hello,

As DeepBlue is a multi-project data server (ENCODExplorer focuses only on ENCODE), we try to provide general solutions for listing and finding data across multiple projects.

Answering your question:

The ENCODE data imported into DeepBlue has the attribute 'accession' in its extra_metadata.

You can check it here (filter the "project" column by ENCODE):

http://deepblue.mpi-inf.mpg.de/dashboard.php#ajax/deepblue_view_experiments.php

What you can do:

Use the command [deepblue_]list_experiments, passing ENCODE as the project. This command returns a list of IDs and names.

For these IDs, execute the command [deepblue_]info(), which returns the full metadata for the given IDs.
You can then filter the experiments by the 'accession' field in their extra_metadata.
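The accession filter in the last step can be sketched in plain R. This assumes each metadata record returned by info() is a list with an `extra_metadata` element that may contain an `accession` string; `filter_by_accession()` is a hypothetical helper (not part of DeepBlueR), and the names and accessions in the offline example are placeholders, not real ENCODE IDs.

```r
# Hypothetical helper (not part of DeepBlueR): keep the metadata records
# whose extra_metadata$accession is one of the requested accessions.
# Assumption: each record is a list with an `extra_metadata` element that
# may contain an `accession` string.
filter_by_accession <- function(records, accessions) {
  matches <- function(rec) {
    acc <- rec[["extra_metadata"]][["accession"]]  # NULL if absent
    !is.null(acc) && any(acc %in% accessions)
  }
  Filter(matches, records)
}

# Offline example with placeholder names and accessions.
records <- list(
  list(name = "exp_a", extra_metadata = list(accession = "ACC_A")),
  list(name = "exp_b", extra_metadata = list(accession = "ACC_B")),
  list(name = "exp_c", extra_metadata = list())
)
hits <- filter_by_accession(records, c("ACC_B", "ACC_X"))
vapply(hits, function(rec) rec$name, character(1))

# With a live server (assuming deepblue_info() returns one record per ID;
# `my_accessions` is a hypothetical character vector you provide):
# exps <- deepblue_list_experiments(project = "ENCODE")
# hits <- filter_by_accession(deepblue_info(exps$id), my_accessions)
```

Records without an accession are simply skipped rather than raising an error.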

Let me know if this answers your question.

Thank you,
Felipe Albrecht

9 years ago

Missing Signal Values

Hello Richard,

SIGNAL_VALUE is a column defined in the 'bed' files.

As a rule of thumb, you can assume that all signal files have the format "CHROMOSOME,START,END,SIGNAL". We are working on giving the 'SIGNAL' column a more meaningful name, but this is still a work in progress.

You can obtain an experiment's columns using the info() command:

Example:

library(DeepBlueR)

# Obtain the ID from the experiment name:

id_name <- deepblue_name_to_id("E050_WGBS_ReadCoverage.bedgraph", collection="experiments")

# Obtain information about the experiment:

info <- deepblue_info(id_name$id)

# Print the experiment format:

info$format

# More detailed view of the experiment columns/format

info$columns
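To check which column names you can request in output_format (for example, whether SIGNAL_VALUE exists at all), you can split the format string into its column names. This is a minimal sketch, assuming info$format is a comma-separated string such as "CHROMOSOME,START,END,VALUE"; `format_columns()` is a hypothetical helper (not part of DeepBlueR), and the string in the example is illustrative.

```r
# Hypothetical helper (not part of DeepBlueR): split a comma-separated
# format string into a character vector of column names.
# Assumption: the format string looks like "CHROMOSOME,START,END,VALUE".
format_columns <- function(format) {
  trimws(strsplit(format, ",", fixed = TRUE)[[1]])
}

# Offline example with an illustrative format string.
cols <- format_columns("CHROMOSOME,START,END,VALUE")
cols
"SIGNAL_VALUE" %in% cols   # FALSE: this column cannot be requested

# With a live server:
# info <- deepblue_info(id_name$id)
# format_columns(info$format)
```

Requesting only names that appear in this vector avoids asking for a column the experiment does not have.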

I hope this is what you are looking for.

Please let me know if you have further questions!

9 years ago

Missing Signal Values

Hello Richard, thank you for the complete information.

You made a small mistake in the output format. You must use the column "VALUE" rather than "SIGNAL_VALUE".

The signal column in the wiggle files is simply called 'VALUE'.

The working example is here:

library(DeepBlueR)

id <- deepblue_select_experiments(experiment_name = "E050_WGBS_ReadCoverage.bedgraph", chromosome = "chr6", start=26330464, end=26330664)

request <- deepblue_get_regions(query_id = id, output_format = "CHROMOSOME,START,END,VALUE")

region <- deepblue_download_request_data(request_id = request)

region

Output:

> region
GRanges object with 18 ranges and 1 metadata column:
       seqnames               ranges strand |       VALUE
          <Rle>            <IRanges>  <Rle> | <character>
   [1]     chr6 [26330521, 26330522]      * |     22.0000
   [2]     chr6 [26330522, 26330523]      * |     22.0000
   [3]     chr6 [26330544, 26330545]      * |     35.0000
   [4]     chr6 [26330545, 26330546]      * |     35.0000
   [5]     chr6 [26330552, 26330553]      * |     35.0000
   ...      ...                  ...    ... .         ...
  [14]     chr6 [26330602, 26330603]      * |     46.0000
  [15]     chr6 [26330603, 26330604]      * |     44.0000
  [16]     chr6 [26330604, 26330605]      * |     44.0000
  [17]     chr6 [26330644, 26330645]      * |     28.0000
  [18]     chr6 [26330645, 26330646]      * |     28.0000
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
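In this output the VALUE column has type <character>, so it needs converting before you compute on it. A minimal sketch follows; `value_as_numeric()` is a hypothetical helper (not part of DeepBlueR), and the offline example uses a small data.frame built from the first rows of the output instead of a real GRanges.

```r
# Hypothetical helper (not part of DeepBlueR): convert the VALUE column
# from character to numeric. It works on any object that supports `$` and
# `$<-` for that column, e.g. a GRanges (metadata column) or a data.frame.
value_as_numeric <- function(x) {
  x$VALUE <- as.numeric(x$VALUE)
  x
}

# Offline stand-in for the GRanges result, using values from the output.
toy <- data.frame(
  start = c(26330521, 26330544),
  VALUE = c("22.0000", "35.0000"),
  stringsAsFactors = FALSE
)
toy <- value_as_numeric(toy)
mean(toy$VALUE)   # 28.5

# On the real result: region <- value_as_numeric(region)
```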

Please let me know if you have further questions, and thank you for posting this question here!

9 years ago

Methylation array data

Hello,

We do not have plans to include methylation array data.

The main reason is that DeepBlue stores genomic region data (BED and peak files), so array data must be converted to genomic regions before it can be included in DeepBlue.
This conversion is possible, but we do not have enough manpower to do it :-(
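Converting array data into genomic regions essentially means attaching coordinates to every probe. This is a minimal sketch, assuming a table of per-probe values and an annotation table mapping each probe ID to a chromosome and position; the helper name, table layouts, probe IDs and coordinates are all hypothetical, and the half-open BED-style coordinates are a design choice for illustration.

```r
# Hypothetical helper: turn per-probe array values into BED-like rows
# (CHROMOSOME, START, END, VALUE) using a probe-to-coordinate annotation.
# Assumptions: `values` has columns probe and value; `annotation` has
# columns probe, chrom and pos (1-based position of the measured CpG).
probes_to_regions <- function(values, annotation) {
  merged <- merge(values, annotation, by = "probe")  # drops unannotated probes
  regions <- data.frame(
    CHROMOSOME = merged$chrom,
    START      = merged$pos - 1L,  # BED-style 0-based start
    END        = merged$pos,
    VALUE      = merged$value,
    stringsAsFactors = FALSE
  )
  regions[order(regions$CHROMOSOME, regions$START), ]
}

# Offline example with made-up probes and coordinates.
values <- data.frame(probe = c("p1", "p2", "p3"), value = c(0.8, 0.1, 0.5),
                     stringsAsFactors = FALSE)
annotation <- data.frame(probe = c("p1", "p2"), chrom = c("chr1", "chr1"),
                         pos = c(2000L, 1000L), stringsAsFactors = FALSE)
probes_to_regions(values, annotation)
```

Probe p3 has no annotation, so it is dropped; the remaining rows come back sorted by position.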

We will gladly include this data if we receive substantial upvotes for this idea.

Thank you!
Felipe Albrecht

10 years ago

Beautiful Design.

Thank you!

10 years ago
