The Atlas website structure
http://atlasgeneticsoncology.org

Philippe Dessen (Database Director)
Jean Loup Huret (Editor)
May 2017

 

 


I- Main page: 


II-1. Foreword 1: There are various types of items developed in the Atlas:
1- Genes (http://atlasgeneticsoncology.org/Genes/XXX )
1-1. Annotated genes (papers/cards written by authors) URLs: http://atlasgeneticsoncology.org/Genes/[name-of-gene]ID[number]ch[location].html (1 493 papers/cards, e.g. http://atlasgeneticsoncology.org/Genes/PGRID41700ch11q22.html); and
1-2. Automated cards on genes (more or less like GeneCards); URLs: http://atlasgeneticsoncology.org/Genes/GC_[name-of-gene].html (28 377 cards);
2- Leukemias (http://atlasgeneticsoncology.org/Anomalies/XXX ) (681 annotated papers/cards)
3- Solid tumors (http://atlasgeneticsoncology.org/Tumors/XXX ) (217 annotated papers/cards)
4- Cancer-prone diseases (http://atlasgeneticsoncology.org/Kprones/XXX ) (114 annotated papers/cards)
5- Case reports in hematology (http://atlasgeneticsoncology.org/Reports/XXX ) (88 papers/cards)
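
The two gene URL templates above can be read back mechanically. The following is a minimal Python sketch, not the Atlas's own code: the regular expressions and the function name parse_gene_filename are assumptions inferred from the templates, and "GC_PGR.html" is only an illustrative file name built from the GC_[name-of-gene].html template.

```python
import re

# Assumed patterns, inferred from the templates
#   [name-of-gene]ID[number]ch[location].html   (annotated genes)
#   GC_[name-of-gene].html                      (automated cards)
ANNOTATED_GENE = re.compile(r"^(?P<name>.+?)ID(?P<atlas_id>\d+)ch(?P<location>.+)\.html$")
AUTOMATED_GENE = re.compile(r"^GC_(?P<name>.+)\.html$")


def parse_gene_filename(filename):
    """Describe a gene card file name, or return None if it fits neither template."""
    match = AUTOMATED_GENE.match(filename)
    if match:
        return {"kind": "automated", "name": match.group("name")}
    match = ANNOTATED_GENE.match(filename)
    if match:
        return {
            "kind": "annotated",
            "name": match.group("name"),
            "atlas_id": int(match.group("atlas_id")),
            "location": match.group("location"),
        }
    return None


print(parse_gene_filename("PGRID41700ch11q22.html"))
print(parse_gene_filename("GC_PGR.html"))
```

The gene name is matched non-greedily so that the first "ID" followed by digits is taken as the start of the Atlas ID; a gene symbol that itself contains "ID" followed by digits and "ch" would need a stricter rule.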

All these cards are structured from templates (e.g. submission form for GENES: http://atlasgeneticsoncology.org/Forms/Gene_Form_for_submission.doc), with the addition of:
- a HEADER with tags (tracking devices) that allow the card to be indexed in different parts of the database (e.g. TRI_PAR_CHROMOSOME -> to which chromosome page? CATEGORY -> to which Cell Biology page?); see also: http://atlasgeneticsoncology.org/Collab/catalog ;
- and EXTERNAL LINKS (bottom of each paper/card).
There are also
- Deep Insights (traditional papers) http://atlasgeneticsoncology.org/Deep/XXX (113 Deep)
- Chromosome pages (http://atlasgeneticsoncology.org/Indexbychrom/idxa_[chromosome-number].html e.g. http://atlasgeneticsoncology.org/Indexbychrom/idxa_11.html ) and
- Chromosome band pages (http://atlasgeneticsoncology.org/Bands/[band].html e.g. http://atlasgeneticsoncology.org/Bands/19p13.html ),
- Cell biology pages (http://atlasgeneticsoncology.org/Categories/[category-name] e.g. http://atlasgeneticsoncology.org/Categories/Cell_cycle.html )
- ICD-O pages (International Classification of Diseases - Oncology WHO/OMS) e.g. http://atlasgeneticsoncology.org/Tumors/Solid_Nosology.html and http://atlasgeneticsoncology.org/ICD/icd_2016_topo.html
- Atlas status (thesaurus of the Atlas: http://atlasgeneticsoncology.org/Status/Status.html and sub-pages)
- and various other pages: Backpage (http://atlasgeneticsoncology.org/BackpageAbout.html ), Recent papers (http://atlasgeneticsoncology.org/Recent.html ), Educational items (http://atlasgeneticsoncology.org/GeneticEng.html ), Genes partners, International cancer programs, etc. (see Main page)

II-2. Foreword 2: Editorial process:
This is an important part, as the Editorial database processing must take it into account. See "Editorial workflow in the Atlas": http://chromosomesincancer.org/en/editorial-workflow.html .
In particular, and critically, tables are used (1) to identify each relevant item (Table 1 below) and (2) to correspond with authors (Table 2). Examples:

Table 1:
NAME | STATUS | AUTHORS | ID Atlas | ICD-O3_MORPH | ICD-O3_TOPO
05;00§ tri 5/NHL or chronic Lympho | FOR SALE | | | | C421,C424
05;00§ MDS with isolated del (5q) | DONE | XX | 1134 | 9986/3 | C421,C424
05;00§ del(5)(q32q33) TNIP1/PDGFRB | Reserved | XXXX | 1773 | | C421,C424
… about 1,000 items/lines | | | | |
99;99§ Extraosseous plasmacytoma | Reserved | XXXXXX | 1718 | 734/3 | C421,C424
99;99§ Florid follicular hyperplasia PTLD | Reserved | XXX | 1788 | | C421,C424


Table 2:
AUTHOR | e-mail | "Translocation" | DEADLINE | COMMENTS
XXX | XXX@xx | Florid follicular hyperplasia PTLD | DONE | 3rd paper (leuk.) + 1 paper (gene)
XXXX | XXXX@xxxx | del(5)(q32q33) TNIP1/PDGFRB | | Reminder 2017/06/26; 2017/03/21 "Yes, will have this to you shortly"; Reminder 2016/11/19; Reminder 2016/06/17; 2016/01/17 no deadline ("soon"); 2015/10/14: OK
XXXXX | XXXXX@xxxx | del(X)(p22p22) (P2RY8/CRLF2) | 16/03/2017 | Spontaneous proposal


Note: the tables used to identify each relevant item must be in a one-to-one (bijective) relation with the cards/papers; e.g. 05;00§ MDS with isolated del (5q) / ID 1134  <-->  http://atlasgeneticsoncology.org/Anomalies/del5qSoleID1134.html
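
One way to check this one-to-one relation is to compare the Atlas IDs listed in the table with the IDs embedded in the card file names. The sketch below is only an illustration under stated assumptions: it supposes that every card file name ends in ID[number].html, as in del5qSoleID1134.html, and the function name check_bijection is hypothetical. Which table rows are expected to have a card already (for instance only those with status DONE) is left to the caller.

```python
import re

# Assumption: the Atlas ID is the number just before ".html" in the card file name.
ID_IN_FILENAME = re.compile(r"ID(\d+)\.html$")  # e.g. del5qSoleID1134.html -> 1134


def check_bijection(table_ids, card_filenames):
    """Report every mismatch between table IDs and the IDs found in card file names."""
    card_ids = {}
    duplicated = set()
    for filename in card_filenames:
        match = ID_IN_FILENAME.search(filename)
        if match is None:
            continue  # not a card file name of the assumed form
        atlas_id = int(match.group(1))
        if atlas_id in card_ids:
            duplicated.add(atlas_id)  # two cards claim the same ID
        card_ids[atlas_id] = filename
    table_id_set = set(table_ids)
    card_id_set = set(card_ids)
    return {
        "rows_without_card": sorted(table_id_set - card_id_set),
        "cards_without_row": sorted(card_id_set - table_id_set),
        "ids_used_by_several_cards": sorted(duplicated),
    }


report = check_bijection([1134], ["del5qSoleID1134.html"])
print(report)
```

The relation is bijective for the rows supplied when all three lists in the report are empty.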

Finally, we also have to format the cards/papers into Word for the "scientific journal" version of the Atlas ("Export word" arrow); see http://documents.irevues.inist.fr/handle/2042/15655 (e.g. http://documents.irevues.inist.fr/bitstream/handle/2042/62324/10-2014-HSPD1ID40888ch2q33.pdf , the equivalent of http://atlasgeneticsoncology.org//Genes/HSPD1ID40888ch2q33.html ). This uses a database other than the http://atlasgeneticsoncology.org/ site described here: http://atlasonline.critt-informatique.fr/accueil.aspx , an almost fully operational database running in a Microsoft environment (IIS 7 on Windows Server 2008 R2, with SQL Server 2008 R2). However, this database is too rigid and leaves little room for biological or bioinformatics developments. It must either be modified or replaced by a new one, preferably open source.


III- Website Structure
III-1. Entities
The main goal at the origin of the project was to present several sets of monographs on Genes, Leukemias, Tumors and Cancer-prone diseases. Database management was not crucial at that time. That is why the Atlas is not a real database (e.g. MySQL etc.) but is organized around a set of structured Cards and numerous relations expressed through Indexes (generated with Perl scripts).

Cards
    Validation of txt files
    Preprocessing for Genes
    Processing of cards in hypertext
Indexes
    General indexation
    Generation of chromosome pages
    Generation of chromosomal bands
    Tables of status, categories, authors, ...
Interfaces with external data (Mitelman, COSMIC, Entrez Gene, HGNC, UCSC, ...)

III-2. Cards processing:
1. Authors send ".doc" files
2. Editing from ".doc" to a structured ".txt" file, with the addition of hyperlinks
3. Validation step
    • Editing of the bibliography in alphabetical order from PMIDs (with a search in PubMed)
    • Correction of special characters following a thesaurus of octal codes
    • Test of blocks and fields
    • Test of correct hyperlinks
4. Transformation into hypertext files
    Using specific scripts (gene2html.pl, anom2html.pl, tumors2html.pl, kprone2html.pl, ...)
5. For Annotated Genes: addition of specific external links
    (specific management in parallel for the list of genes)
5 bis. For other genes: automatic creation from updated data (genes_g[cn].txt)
6. Addition of internal hyperlinks 
    (specific management in parallel)

III-3. Organisation of directories
All data are organized in two main directories:
1. "cytatlas" (for management)
2. "chromcancer" (with internet access)

1. cytatlas
    ./Genes0 (for expert txt)
    ./Genes (after txt processing)
    ./Anomalies
    ./Tumors
    ./Kprones
    ./Reports
    ./Deep
    ./Educ
    Each directory has some other subdirectories for Images, xxLinks …
    ./Scripts (all bash and Perl scripts + reference data)

2. chromcancer
    ./Genes
    ./Anomalies
    ./Tumors
    ./Kprones
    ./Reports
    ./Deep
    ./Educ
(with subdirectories for Images ..)
    ./Categories
    ./Indexbychrom (Chromosomes pages)
    ./Indexbyalpha
    ./Bands
    ./Status
    ./ICD
    ./ISCN

III-4. Indexation of Cards 1: script indexation.sh (in cytatlas/Scripts)
Used for re-indexation after new files or new data are added:
1. Generation of all automatic genes
2. Generation of the main index file for all documents (ObjDB.txt)
3. Generation of a catalog (text file with the information from the HEADER, see: http://atlasgeneticsoncology.org/Collab/catalog)
4. Generation of some other indexes (ObjDBxx.txt)
5. Transformation of the catalog (and of the "for sale" - "to be written" files) into tables, concatenated in a catalog_full.txt file
6. Indexation of Genes (Geneliste.html), Leukemias (Anomliste.html), etc.
7. Indexation by chromosomes
8. Indexation by authors (different IndxAuthxx.txt / html files in Collab; IndxAuth3.txt is the main index for authors and affiliations)
9. Generation of Categories for Cell Biology items (several files are maintained in parallel beforehand)
10. Generation of status (Genes, Authors, etc.): http://atlasgeneticsoncology.org/Status/Status.html
11. Generation of Recent (documents of the last 2 years): http://atlasgeneticsoncology.org/Recent.html
12. Generation of COSMIC projects and TCGA/ICGC projects
13. Statistics (http://atlasgeneticsoncology.org/stat_atlas.html)
A MySQL indexation is possible for some items (query on the home page).
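
As an illustration of step 7 (indexation by chromosomes), annotated gene cards can be grouped by the chromosome encoded in their file names. This is a hedged sketch only: the actual indexation.sh and its Perl scripts are not shown in this document and may well rely on the TRI_PAR_CHROMOSOME header tag instead. The regular expression and the function names index_by_chromosome and chromosome_page are assumptions.

```python
import re
from collections import defaultdict

# Assumption: the chromosome is the number (or X/Y) that follows "ch" in
# file names of the form [name-of-gene]ID[number]ch[location].html.
CHROMOSOME = re.compile(r"ID\d+ch(\d{1,2}|[XY])")


def index_by_chromosome(filenames):
    """Group annotated gene card file names by chromosome."""
    index = defaultdict(list)
    for filename in filenames:
        match = CHROMOSOME.search(filename)
        if match:  # automated GC_ cards carry no location and are skipped
            index[match.group(1)].append(filename)
    return {chrom: sorted(files) for chrom, files in index.items()}


def chromosome_page(chrom):
    """File name of a chromosome page, following the idxa_[chromosome-number].html template."""
    return f"idxa_{chrom}.html"


cards = ["PGRID41700ch11q22.html", "HSPD1ID40888ch2q33.html", "GC_PGR.html"]
for chrom, files in index_by_chromosome(cards).items():
    print(chromosome_page(chrom), files)
```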

III-4 bis. Indexation of Cards 2: Generation of external links for all genes
1. Maintenance of 2 specific tables (genes_gc.txt and genes_gn.txt) for genes, holding more than 80 items of information per gene
All genes in the Atlas are extracted from Entrez Gene (NCBI) by ftp each week (60200 genes) and compared to UCSC genes (refGene.txt file for hg38). Only genes with a genomic location are kept (27580 at this time).
Potentially cancer-related genes are identified by the presence of any of the following terms in the description or GeneRIF:
"cancer","tumour","tumor","neoplasm","metastas","translocation","carcinogen","carcinom","lymphom","oncogen","repair","leukemia","transforming","melanoma","neuroblastoma","sarcoma","adenom","glioma","mitogen","fusion","proliferation","rearrangement","malignan"
External data (from HGNC, UniProt, UCSC, Ensembl, COSMIC, etc.) are processed semi-automatically (an important step that needs to be better formalized).
See: http://atlasgeneticsoncology.org/Collab/genes_gc.txt and http://atlasgeneticsoncology.org/Collab/genes_gn.txt 
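
The term-based selection of potentially cancer-related genes can be sketched as follows. The term list is copied from the text above. The case-insensitive substring matching, the function names, and the reading of genes_gc.txt as the "potentially cancer" table and genes_gn.txt as the other one are assumptions, not a description of the actual scripts.

```python
# Terms taken from the list above; several are deliberate stems
# ("metastas", "carcinom", "malignan") so that they match word variants.
CANCER_TERMS = (
    "cancer", "tumour", "tumor", "neoplasm", "metastas", "translocation",
    "carcinogen", "carcinom", "lymphom", "oncogen", "repair", "leukemia",
    "transforming", "melanoma", "neuroblastoma", "sarcoma", "adenom",
    "glioma", "mitogen", "fusion", "proliferation", "rearrangement", "malignan",
)


def is_potential_cancer_gene(description, generifs=()):
    """True if any term occurs in the gene description or in one of its GeneRIF texts."""
    text = " ".join([description, *generifs]).lower()
    return any(term in text for term in CANCER_TERMS)


def target_table(description, generifs=()):
    """Assumed routing: potentially cancer-related genes go to genes_gc.txt, others to genes_gn.txt."""
    return "genes_gc.txt" if is_potential_cancer_gene(description, generifs) else "genes_gn.txt"


print(target_table("progesterone receptor", ["Expression is altered in breast carcinoma"]))
print(target_table("olfactory receptor family 1 subfamily A member 1"))
```

Plain substring matching is permissive: a stem such as "repair" or "fusion" also matches in non-oncological contexts, which is consistent with calling these genes only "potentially" cancer-related.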

III-5. Generation of internal hyperlinks and Cards
script maj_full.sh (in cytatlas/Scripts)
1. Generation of internal hyperlinks
    In each card, hyperlinks are defined as a tag with the format: <CC: TXT: xxxxxxxxxxx  ID: yyy>
    TXT content corresponds to the visible text in the hypertext file;
    ID is the Atlas ID of the object;
    A complete file of hyperlinks is generated.
Map of one set towards another: injectivity/surjectivity:

Item | Internal hyperlinks toward
1 Gene | n1 Leukemias; n2 Solid tumors; n3 Cancer-prone
1 Leukemia | n4 Genes; n5 Cancer-prone
1 Solid tumor | n6 Genes; n7 Cancer-prone
1 Cancer-prone | n8 Genes; n9 Leukemias; n10 Solid tumors

Examples:

Item | Hyperlinks toward
Gene NUP214 | Leukemia t(6;9)(p23;q34) DEK/NUP214
 | Leukemia t(9;9)(q34;q34) SET/NUP214
 | Leukemia T cell ALL
 | Solid Tumor Lung Adenocar. t(9;9)(q34;q34) PRRC2B/NUP214
Gene KIT | Leukemia trisomy 4
 | Solid Tumor Melanoma
 | Cancer Prone Piebaldism
Leukemia t(6;9)(p23;q34) DEK/NUP214 | Gene NUP214
 | Gene DEK
Cancer Prone Tuberous sclerosis | Gene TSC1
 | Gene TSC2
 | Solid Tumor Renal carcinoma
 | Solid Tumor Ependymomas

2. For each card type (e.g. Genes), generation of the links coming from the other types (e.g. AnomLinks, TumorsLinks, etc.) in hypertext format (to be added when the hypertext cards are generated)
3. Generation of all cards
    ./gen_genes_gn.sh (for non-cancer genes)
    ./gen_genes_gc.sh (for potentially cancer-related genes)
    ./gen_gene2.sh (for expert-annotated genes)
    ./gen_genes_link.sh (expert-annotated genes are referenced either by file name or by the standard form GC_symbol.html)
    ./maj_prelim.sh (for Leukemias, Tumors or Kprones not yet written - "for sale")
    ./gen_anom.sh, ./gen_kprone.sh, ./gen_tumor.sh, ./gen_educ.sh, ./gen_deep.sh, ./gen_report.sh
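
The tag format from step 1 and the reverse links from step 2 can be sketched in a few lines. This is a minimal Python illustration, not the content of maj_full.sh: the regular expression, the function names and the url_by_id lookup table are assumptions, and the card identifiers "G1" and "G2" are made up for the example. Only the tag layout <CC: TXT: ...  ID: ...> and the ID 1134 with its URL come from this document.

```python
import re
from collections import defaultdict

# Assumed reading of the tag layout <CC: TXT: visible text  ID: atlas-id>
CC_TAG = re.compile(r"<CC:\s*TXT:\s*(?P<txt>.*?)\s*ID:\s*(?P<id>\w+)\s*>")


def extract_links(card_text):
    """Return (visible text, target Atlas ID) pairs for every CC tag in a card."""
    return [(m.group("txt"), m.group("id")) for m in CC_TAG.finditer(card_text)]


def to_html(card_text, url_by_id):
    """Replace each CC tag by an HTML anchor; a tag with an unknown ID keeps only its text."""
    def replace(match):
        url = url_by_id.get(match.group("id"))
        txt = match.group("txt")
        return f'<a href="{url}">{txt}</a>' if url else txt
    return CC_TAG.sub(replace, card_text)


def reverse_links(cards):
    """Map each target ID to the set of card identifiers that link to it."""
    incoming = defaultdict(set)
    for source_id, text in cards.items():
        for _txt, target_id in extract_links(text):
            incoming[target_id].add(source_id)
    return dict(incoming)


card = "Involved in <CC: TXT: MDS with isolated del (5q)  ID: 1134>."
urls = {"1134": "http://atlasgeneticsoncology.org/Anomalies/del5qSoleID1134.html"}
print(to_html(card, urls))
print(reverse_links({"G1": card, "G2": card}))
```

Inverting the forward links in this way gives, for each Leukemia, Solid tumor or Cancer-prone card, the list of cards pointing to it, which is the information the AnomLinks and TumorsLinks files are described as carrying.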

III-6. Generation of chromosomal bands
script index_byband2  (in cytatlas/Scripts)
The generation of chromosomal band pages (2 sections, Anomalies and Genes) requires prior processing of different data sources (Mitelman, COSMIC, FusionDB, TICdb, ChimerDB).
See: http://atlasgeneticsoncology.org/Bands/1p36.html#REFERENCES
All sources are preprocessed into the same format.
These pages are updated each time a new version of Mitelman (3 per year) or COSMIC (every 3 months) is released.
1. Processing of the Mitelman database
2. Processing of the COSMIC database
3. Integration with other sources

III-7. Other integrations
ICD-O: Topographical Classification (WHO/OMS)
ICD-O: Morphological Classification (WHO/OMS)
Drugs and Therapies
International Cancer Programs
Genomic Data Commons
ICGC Program
TCGA program
IntoGen Portal
OASIS Portal
COSMIC studies
Tumour cell lines

III-8. Atlas website structure statistics (http://atlasgeneticsoncology.org/stat_atlas.html ).

 
 

IV- Perspective and Evolution
The Atlas needs to evolve toward a real database with 2 goals:

1. An editorial management for interactive submission of documents (+++): authors would fill an application form directly formatted in the database and submitted to the Editor and/or the Section Editor (see http://atlasgeneticsoncology.org/BackpageAbout.html#EDITORIAL ), who would validate/ask for modifications/reject these "ready to use" cards/papers.
2. A structured database for all cards and documents.

This database should use open source free software.

To be more integrated in the new era of cancer genomics, one needs the development of new tools, in particular graphical interfaces, and new integrated data in the domain of cancer cytogenomics, in relation with personalized medicine programs.