Database structure


1. master database

This database is constructed by extracting all the HCV/HBV/HEV entries from the latest release of the DDBJ (DNA Data Bank of Japan) database and arranging them by two aspects: genomic location and phylogenetic relation.

1.1. HCV master database

The HCV master database is constructed by the following process.
  1. All the HCV-related entries are retrieved by keyword from the DDBJ database.
  2. A reference sequence is chosen arbitrarily from the full-length genome entries and used as a template for locating all the other entries. We use a subtype 1b genome (accession No. D10750) for this purpose.
  3. The locations of the loci C, E1, NS1/E2, NS2, NS3, NS4, and NS5 are obtained from the annotation of the reference sequence and compiled as the reference map.
  4. Each entry retrieved in step 1 is aligned against the reference sequence using LALIGN (in the FASTA package). The results are compiled as the map information.
  5. A division (see below) is made for each locus (nucleic acid division). The corresponding amino acid division is then made by translating each nucleic acid division.
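
How the reference map (step 3) and the map information (step 4) fit together can be illustrated with a minimal sketch. The names `Locus`, `MapInfo`, `covered_loci`, and `overlapped_loci` are hypothetical, the coordinates are placeholders rather than the D10750 annotation, and treating "covers a locus" as "spans it from start to end" is an assumption, not the database's documented rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Locus:
    name: str
    start: int  # 1-based, inclusive position on the reference sequence
    end: int    # 1-based, inclusive position on the reference sequence


@dataclass(frozen=True)
class MapInfo:
    entry_id: str
    ref_start: int  # first reference position matched by the entry
    ref_end: int    # last reference position matched by the entry


# Hypothetical reference map: placeholder coordinates, NOT the D10750 annotation.
REFERENCE_MAP = [
    Locus("C", 1, 300),
    Locus("E1", 301, 700),
    Locus("NS1/E2", 701, 1500),
]


def covered_loci(info, reference_map):
    """Return the names of loci that the entry spans from start to end."""
    return [
        locus.name
        for locus in reference_map
        if info.ref_start <= locus.start and locus.end <= info.ref_end
    ]


def overlapped_loci(info, reference_map):
    """Return the names of loci that the entry overlaps at least partially."""
    return [
        locus.name
        for locus in reference_map
        if info.ref_start <= locus.end and locus.start <= info.ref_end
    ]


# An invented entry aligned to reference positions 250..1600.
entry = MapInfo("example_entry", 250, 1600)
print(covered_loci(entry, REFERENCE_MAP))
print(overlapped_loci(entry, REFERENCE_MAP))
```

In this sketch the invented entry fully spans E1 and NS1/E2 but only partially overlaps C, so it would contribute to two divisions under the full-span assumption.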

1.2. HBV master database

The HBV master database is constructed in almost the same manner as the HCV master database. In this database, a genotype G genome (accession No. AF160501) is chosen as the reference sequence, and a division is prepared for each of the seven loci: PreC, C, Pol, PreS1, PreS2, S, and X.

1.3. HEV master database

The HEV master database is constructed in almost the same manner as the HCV master database. In this database, M73218 is chosen as the reference sequence, and a division is prepared for each of the three loci: ORF1, ORF2, and ORF3.

2. division

To manage sequences, annotation, and the results of phylogenetic analyses, a data unit called a division is defined. Each division contains the data described in Table 1.
Each division in the master database is made by the following process.
  1. All the entries that cover a locus are extracted from the master database by referring to the map information and the reference map, and each sequence is clipped at the ends of the locus. The corresponding annotations are also extracted.
  2. The sequences are multiply aligned using CLUSTALW.
  3. Genetic distances are calculated by the 6-parameter method. For amino acid divisions, Kimura's method is used.
  4. A phylogenetic tree is calculated by the NJ (neighbor-joining) method.
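
The distance step for amino acid divisions can be sketched as follows. The text names only "Kimura's method"; the formula used here, d = -ln(1 - p - 0.2 p^2), is the empirical protein correction commonly attributed to Kimura, and the assumption that the database applies exactly this form is ours. The 6-parameter method for nucleotide divisions is not shown. The function names and the toy alignment are hypothetical.

```python
import math


def p_distance(a, b):
    """Fraction of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def kimura_protein_distance(p):
    """Assumed form of Kimura's correction: d = -ln(1 - p - 0.2 * p^2)."""
    inner = 1.0 - p - 0.2 * p * p
    if inner <= 0.0:
        raise ValueError("sequences too divergent for this correction")
    return -math.log(inner)


def distance_matrix(aligned):
    """Pairwise corrected distances for a dict of {entry ID: aligned sequence}."""
    names = list(aligned)
    return {
        (m, n): kimura_protein_distance(p_distance(aligned[m], aligned[n]))
        for m in names
        for n in names
    }


# Toy alignment with invented entry IDs and sequences.
toy = {"entryA": "MKV-L", "entryB": "MRVAL"}
print(distance_matrix(toy)[("entryA", "entryB")])
```

A matrix of this kind corresponds to the mtrx data type and would be the input to the tree-building step.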

type   description
seq    nucleotide or amino acid sequences in FASTA format
anno   annotation of entries (typically in DDBJ release format)
idx    table of entry ID, accession No., and sequence length
idxa   CDS location and product length (amino acid division only)
tag    additional data (optional)
aln    multiple alignment
mtrx   genetic distance matrix
tree   phylogenetic tree
btree  result of bootstrap re-sampling analysis

Table 1: Data types of which a division consists.
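
As a rough illustration of the data types listed in the table, a division could be modeled as a single record. This is a sketch only: the class `Division`, its field types, the extra `name`, `kind`, and `accession` fields, and the choice to derive idx from the sequences are assumptions, not the database's actual storage format.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Division:
    name: str                      # hypothetical label, e.g. the locus name
    kind: str                      # "nucleotide" or "amino acid"
    seq: dict[str, str] = field(default_factory=dict)        # entry ID -> sequence
    anno: dict[str, str] = field(default_factory=dict)       # entry ID -> annotation text
    accession: dict[str, str] = field(default_factory=dict)  # entry ID -> accession No.
    idxa: Optional[dict[str, tuple[int, int, int]]] = None   # amino acid divisions only
    tag: dict[str, str] = field(default_factory=dict)        # optional additional data
    aln: Optional[dict[str, str]] = None                     # entry ID -> aligned sequence
    mtrx: Optional[dict[tuple[str, str], float]] = None      # pairwise genetic distances
    tree: Optional[str] = None                               # tree text (format assumed)
    btree: Optional[str] = None                              # bootstrap result (format assumed)

    def idx(self):
        """Build the idx table: (entry ID, accession No., sequence length)."""
        return [
            (entry_id, self.accession.get(entry_id, ""), len(sequence))
            for entry_id, sequence in self.seq.items()
        ]


# Placeholder sequence; only the accession number is taken from the text.
division = Division(
    name="C",
    kind="nucleotide",
    seq={"entry1": "ACGT"},
    accession={"entry1": "D10750"},
)
print(division.idx())
```

The analysis fields default to empty because, in the process described above, alignment, distances, and trees are computed only after the sequences have been collected.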

2.1. private division

Users in account mode can make their own divisions, called private divisions, to analyze their own data. There are several ways to make a private division.
  1. By extracting partial data from the master database using Map Viewer.
  2. By extracting partial data from the master database using Tree Viewer.
  3. By uploading their own data file.
  4. By copying another division.
  5. By merging two or more divisions.
These methods can be combined. For example, users who want to analyze a data set that contains both their own data and public data can extract the public data by method 1, 2, or 4, upload their own data by method 3, and then merge the results by method 5. For more detail, please refer to the document on private divisions.
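
The extract, upload, and merge combination can be pictured at the level of sequence data with a short sketch. The server's actual merge behavior is not described here, so the functions `parse_fasta` and `merge_divisions`, the rule for conflicting entry IDs, and the example records are all hypothetical.

```python
def parse_fasta(text):
    """Parse FASTA text into {entry ID: sequence}, using the first header word as ID."""
    records = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            if not parts:
                raise ValueError("FASTA header without an entry ID")
            current = parts[0]
            records[current] = ""
        elif current is not None:
            records[current] += line
    return records


def merge_divisions(*divisions):
    """Merge sequence sets; identical duplicates are kept once, conflicts raise."""
    merged = {}
    for division in divisions:
        for entry_id, sequence in division.items():
            if entry_id in merged and merged[entry_id] != sequence:
                raise ValueError(f"conflicting sequences for entry {entry_id}")
            merged[entry_id] = sequence
    return merged


# Invented records standing in for extracted public data and an uploaded file.
public = parse_fasta(">D10750 extracted public entry\nACGT\nAC\n")
own = parse_fasta(">my_isolate_1\nACGA\n")
print(merge_divisions(public, own))
```

Rejecting conflicting IDs is one possible design choice; a real system might rename or overwrite entries instead.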
