Chromosomes in Cancer - Atlas website structure

The Atlas website structure
http://atlasgeneticsoncology.org
Philippe Dessen (Database Director)
Jean Loup Huret (Editor)
May 2018

I- Main page:

II-1.

Foreword 1: There are various types of items developed in the Atlas:
1- Genes (http://atlasgeneticsoncology.org/Genes/XXX )
1-1. Annotated genes (papers/cards written by authors) URLs: http://atlasgeneticsoncology.org/Genes/[name-of-gene]ID[number]ch[location].html (1,493 papers/cards, e.g. http://atlasgeneticsoncology.org/Genes/PGRID41700ch11q22.html); and
1-2. Automated cards on genes (more or less like GeneCards); URLs: http://atlasgeneticsoncology.org/Genes/GC_[name-of-gene].html (28,377 cards);
2- Leukemias 681 annotated papers/cards : http://atlasgeneticsoncology.org/Anomalies/XXX , and 540 "Other Leukemias" (automated cards) : http://atlasgeneticsoncology.org/Anomalies/TL_XXX .
3- Solid tumors 217 annotated papers/cards : http://atlasgeneticsoncology.org/Tumors/XXX , and 2,968 "Other Tumors" (automated cards) : http://atlasgeneticsoncology.org/Tumors/TT_XXX .
4- Cancer-prone diseases 114 annotated papers/cards : http://atlasgeneticsoncology.org/Kprones/XXX .
5- Case reports in hematology (http://atlasgeneticsoncology.org/Reports/XXX ) (88 papers/cards)

All these cards are structured from templates (e.g. Submission form for GENES: http://atlasgeneticsoncology.org/Forms/Gene_Form_for_submission.doc) with the addition of:
- a HEADER with tags or tracking devices allowing the form to be indexed in different parts of the database (e.g. TRI_PAR_CHROMOSOME -> to which chromosome page (red arrow)? CATEGORY-> to which Cell Biology page (red arrow)?); see also: http://atlasgeneticsoncology.org/Collab/catalog and http://chromosomesincancer.org/en/template-for-cards-papers.html ;
- and EXTERNAL LINKS (bottom of each paper/card).
There are also
- Deep Insights (traditional papers) http://atlasgeneticsoncology.org/Deep/XXX (113 Deep)
- Chromosome pages (http://atlasgeneticsoncology.org/Indexbychrom/idxa_[chromosome-number].html e.g. http://atlasgeneticsoncology.org/Indexbychrom/idxa_11.html ) and
- Chromosome band pages (http://atlasgeneticsoncology.org/Bands/[band].html e.g. http://atlasgeneticsoncology.org/Bands/19p13.html ),
- Cell biology pages (http://atlasgeneticsoncology.org/Categories/[category-name] e.g. http://atlasgeneticsoncology.org/Categories/Cell_cycle.html )
- ICD-O pages (International Classification of Diseases - Oncology WHO/OMS) e.g. http://atlasgeneticsoncology.org/Tumors/Solid_Nosology.html and http://atlasgeneticsoncology.org/ICD/icd_2016_topo.html
- Atlas status (thesaurus of the Atlas: http://atlasgeneticsoncology.org/Status/Status.html and sub-pages)
- and various other pages: Backpage ( http://atlasgeneticsoncology.org/BackpageAbout.html ), Recent papers ( http://atlasgeneticsoncology.org/Recent.html ), Educational items ( http://atlasgeneticsoncology.org/GeneticEng.html ), Genes partners, International cancer programs, etc. (see others on the Main page)

A. Architecture of data:
A1. Text files for cards
Genes0
Anomalies (Leukemias)
Tumors
Kprones (Cancer-prone hereditary diseases)
Reports (Case Reports)
Deep (Deep Insight)
Educ (Educational Items)
The last two are each defined by a pair of files (.meta + .htm)
A2. Chromosomal location
at the chromosome level
at the chromosomal band level
A3. Functional categories (Cell Biology)
A4. Count/Census of Atlas Items
Statistics on atlas files, see: http://atlasgeneticsoncology.org/Status/Status.html , and http://atlasgeneticsoncology.org/stat_atlas.html
A5. Catalogs and indexes
see in http://atlasgeneticsoncology.org/Collab/
catalog_full.txt (tabulated file with the main information from the HEADER and IDENTITY blocks)
ObjDB0.txt ObjDB2.txt ObjDB4.txt ObjDB6.txt ObjDB.txt
ObjDB1.txt ObjDB3.txt ObjDB5.txt ObjDB7.txt: different files for indexing (might be simplified in a new structure)
A6. External resources

B. Modules for management and development
Module 1: Description of the templates of cards
see: http://atlasgeneticsoncology.org/Collab/Formes.xlsx ,
which correspond to the various submission forms for the authors:
http://atlasgeneticsoncology.org/Forms/Gene_Form_for_submission.doc
http://atlasgeneticsoncology.org/Forms/Leukemia_Form_for_submission.doc
http://atlasgeneticsoncology.org/Forms/Solid_Tumor_Form_for_submission.doc
http://atlasgeneticsoncology.org/Forms/Cancer-Prone_Disease_Form_for_submission.doc

Each card has a unique Atlas ID (present in the filename)
Genes: 1 --> 999 + 40000 --> 80000
Anomalies: 1001 --> 4999
Tumors: 5000 --> 9999
Kprones: 10000 --> 19999
Deep: 20000 --> 29999
Educ: 30000 --> 39999
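
The ranges above can be turned into a small lookup. The following is only a sketch, not one of the Atlas scripts: the function name atlas_class is hypothetical, the Educ range is assumed to run from 30000 to 39999, and IDs outside every listed range (for instance 1000, or the Case Reports, for which no range is given here) are reported as "unassigned".

```shell
# Sketch: map an Atlas ID to its class of card, using the ID ranges listed above.
# Assumption: the Educ range is read as 30000-39999.
atlas_class() {
    id=$1
    # Reject anything that is not a plain non-negative integer.
    case $id in
        ''|*[!0-9]*) echo "invalid"; return 1 ;;
    esac
    if   [ "$id" -ge 1 ]     && [ "$id" -le 999 ];   then echo "Genes"
    elif [ "$id" -ge 1001 ]  && [ "$id" -le 4999 ];  then echo "Anomalies"
    elif [ "$id" -ge 5000 ]  && [ "$id" -le 9999 ];  then echo "Tumors"
    elif [ "$id" -ge 10000 ] && [ "$id" -le 19999 ]; then echo "Kprones"
    elif [ "$id" -ge 20000 ] && [ "$id" -le 29999 ]; then echo "Deep"
    elif [ "$id" -ge 30000 ] && [ "$id" -le 39999 ]; then echo "Educ"
    elif [ "$id" -ge 40000 ] && [ "$id" -le 80000 ]; then echo "Genes"
    else echo "unassigned"; return 1
    fi
}
```

For example, "atlas_class 1134" prints Anomalies and "atlas_class 41700" prints Genes.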

Organisation of directories
All data is organized in two main directories
1. "cytatlas" (for management)
2. "chromcancer" (with internet access)

1. cytatlas
./Genes0 (for expert txt)
./Genes (after txt processing)
./Anomalies
./Tumors
./Kprones
./Reports
./Deep
./Educ
Each directory has some other subdirectories for Images, xxLinks …
./Scripts (all bash and Perl scripts + reference data)

2. chromcancer
./Genes
./Anomalies
./Tumors
./Kprones
./Reports
./Deep
./Educ
(each of the above with subdirectories for Images ..)
./Categories
./Indexbychrom (Chromosomes pages)
./Indexbyalpha
./Bands
./Status
./ICD
./ISCN

Scripts are generally run from ./cytatlas/Scripts
The installation is configured by a general shell script,
Cytatlas.sh, which defines all the environment variables.

#!/usr/bin/bash
## cytatlas script for Cygwin, drive D
umask 002
CYT_DIR='/cygdrive/d/ATLAS/cytatlas'
CYTW_DIR='/cygdrive/d/ATLAS/chromcancer'
CYTHTML_DIR='http://genome.igr.fr/chromcancer'
# Double quotes, so that $CYT_DIR is expanded when the variable is set
SCRIPT_DIR="$CYT_DIR/Scripts"
PATH=$PATH:$CYT_DIR/Scripts
export CYT_DIR CYTW_DIR CYTHTML_DIR SCRIPT_DIR PATH

GENE_DIR=$CYT_DIR/Genes
ANOM_DIR=$CYT_DIR/Anomalies
TUMOR_DIR=$CYT_DIR/Tumors
KPRON_DIR=$CYT_DIR/Kprones
REPORT_DIR=$CYT_DIR/Reports
STUDY_DIR=$CYT_DIR/StudyGroup
DEEP_DIR=$CYT_DIR/Deep
EDUC_DIR=$HOME/DATA_DIR/Educ
WGENE_DIR=$CYTW_DIR/Genes
WANOM_DIR=$CYTW_DIR/Anomalies
WTUMOR_DIR=$CYTW_DIR/Tumors
WKPRON_DIR=$CYTW_DIR/Kprones
WREPORT_DIR=$CYTW_DIR/Reports
WDEEP_DIR=$CYTW_DIR/Deep
WEDUC_DIR=$CYTW_DIR/Educ
WCOLLAB_DIR=$CYTW_DIR/Collab
export GENE_DIR ANOM_DIR TUMOR_DIR KPRON_DIR
export REPORT_DIR STUDY_DIR DEEP_DIR EDUC_DIR
export WGENE_DIR WANOM_DIR WTUMOR_DIR WKPRON_DIR
export WREPORT_DIR WDEEP_DIR WEDUC_DIR WCOLLAB_DIR

alias cyt='cd $CYT_DIR'
alias cytw='cd $CYTW_DIR'
alias cyti='cd $CYT_DIR/Scripts'
# Double quotes, so that the paths (not the variable names) are printed
echo "cyt : $CYT_DIR"
echo "cyti : $CYT_DIR/Scripts"
echo "cytw : $CYTW_DIR"
echo "www : $CYTHTML_DIR"

Module 2: Editorial management of cards
Foreword 2: Editorial process:
This is an important part, as the editorial database processing must take it into account. See "Editorial workflow in the Atlas": http://chromosomesincancer.org/en/editorial-workflow.html .
In particular, and critically, tables are used 1- to identify each relevant item (Table 1 below); 2- to communicate with authors (Table 2).
These are the tables currently used by the editor.
Examples:

NAME	STATUS	AUTHORS	ID Atlas	ICD-O3_MORPH	ICD-O3_TOPO
05;00§ tri 5/NHL or chronic Lympho	FOR SALE				C421,C424
05;00§ MDS with isolated del (5q)	DONE	XX	1134	9986/3	C421,C424
05;00§ del(5)(q32q33) TNIP1/PDGFRB	Reserved	XXXX	1773		C421,C424
… about 1,000 items/lines
99;99§ Extraosseous plasmacytoma	Reserved	XXXXXX	1718	9734/3	C421,C424
99;99§ Florid follicular hyperplasia PTLD	Reserved	XXX	1788		C421,C424

AUTHOR	e-mail	"Translocation"	DEADLINE	COMMENTS
XXX	XXX@xx	Florid follicular hyperplasia PTLD	DONE	3rd paper (leuk.) + 1 paper (gene)
XXXX	XXXX@xxxx	del(5)(q32q33) TNIP1/PDGFRB	?????	Reminder 2017/06/26; 2017/03/21"Yes, will have this to you shortly"; Reminder 2016/11/19; Reminder 2016/06/17; 2016/01/17 no deadline ("soon"); 2015/10/14: OK
XXXXX	XXXXX@xxxx	del(X)(p22p22) (P2RY8/CRLF2)	16/03/2017	Spontaneous proposal

Note: Tables used to identify each relevant item must map one-to-one (bijective relation) to cards/papers; e.g. 05;00§ MDS with isolated del (5q) / ID 1134 <--> http://atlasgeneticsoncology.org/Anomalies/del5qSoleID1134.html

These tables need not exist per se. They would be integrated in the database of the Atlas (as are, so far, some files such as the catalog, authors lists … on the INIST server) and automatically generated. But the right way will be to use ONLY screens developed ad hoc for editorial management.

Finally, we also have to format the cards/papers into Word for the "scientific journal" version of the Atlas (see http://documents.irevues.inist.fr/handle/2042/15655 ; e.g. http://documents.irevues.inist.fr/bitstream/handle/2042/62324/10-2014-HSPD1ID40888ch2q33.pdf , equivalent of http://atlasgeneticsoncology.org//Genes/HSPD1ID40888ch2q33.html ).
A module for the production of the scientific journal is being finalized: it is a Web application developed with PHPWord/MySQL/Symfony Framework.

2.1 Current management process:
• Reception of a doc file (structured in the ad hoc template).
• Editing into a text file with the correct fields (checking that a field is present at the beginning of each line, possibly with the same field occurring several times)
• Addition of internal links (an expert task)

• Complementary procedure (by scripts)
- Correction of special characters (incompatibility between Office Word and hypertext)
- Validation of the bibliography by downloading the short PubMed description for each PMID.
- Reordering of the bibliography in alphabetical order
- Check that each line carries a valid field (in the correct block)
- Check of links (existing Atlas ID).
- Check of the logical structure of each block (BEGIN.. / END)
- Transformation into hypertext documents (see further)
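
As an illustration of the check of the logical structure (BEGIN.. / END), here is a minimal sketch. It is not the Atlas script: the function name check_blocks is hypothetical, and it assumes that block delimiters are written at the start of a line as BEGIN_<NAME> and END_<NAME> (e.g. BEGIN_HEADER / END_HEADER) and that blocks are not nested. The real tag syntax may differ.

```shell
# Sketch: verify that every BEGIN_<NAME> line is closed by the matching END_<NAME>.
# Assumptions (hypothetical): tags start the line, blocks are never nested.
check_blocks() {
    # Reads one card on stdin; prints problems and returns non-zero if any is found.
    awk '
        /^BEGIN_/ {
            if (cur != "") {
                printf "line %d: %s opened inside %s\n", NR, $1, cur
                bad = 1
            }
            cur = substr($1, 7)      # text after "BEGIN_"
            next
        }
        /^END_/ {
            name = substr($1, 5)     # text after "END_"
            if (name != cur) {
                printf "line %d: unexpected END_%s\n", NR, name
                bad = 1
            }
            cur = ""
            next
        }
        END {
            if (cur != "") {
                printf "unclosed block: %s\n", cur
                bad = 1
            }
            exit bad + 0
        }
    '
}
```

It would be used as "check_blocks < card.txt", where card.txt stands for any card text file.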

2.2 To be developed for a new management:
- On-line templates must be available to the authors, who will write their paper directly (e.g. http://atlasgeneticsoncology.org/Forms/Gene_Form_for_submission.doc , corresponding to http://chromosomesincancer.org/en/atlas-templates-for-cards.html ). These templates must be able to evolve when the Editor(s) in Chief wish to add, delete or modify a tag or even a "BEGIN_END". A password would allow the author to interrupt his writing, save, and come back later until a final validation.
- A macro-instruction assisting the recognition of an internal hyperlink that the editorial staff member must add before publishing (e.g. ABL: a list of "NAMES" and aliases must help to recognize and propose the hyperlink to ABL1; the editorial member says OK, and the hyperlink to http://atlasgeneticsoncology.org/Genes/ABLID1.html comes automatically): --> need of a thesaurus (e.g. http://atlasgeneticsoncology.org/Tumors/Solid_Nosology.html )
- A PMID number alone must suffice to produce the full reference
- Special characters are automatically transcribed (e.g. https://text-symbols.com/html/entities-and-descriptions , http://www.symbole-clavier.com , http://alexandre.alapetite.fr/doc-alex/alx_special.html ).
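
The alias-to-hyperlink proposal described above (ABL -> ABL1 -> ABLID1.html) could be prototyped as a simple lookup. This is a sketch under stated assumptions only: the function name propose_link and the thesaurus layout (tab-separated: alias, official symbol, card filename) are hypothetical and are not an existing Atlas file format.

```shell
# Sketch: propose the gene card URL for a name or alias.
# Assumption (hypothetical): the thesaurus is a tab-separated file with the columns
#   alias <TAB> official symbol <TAB> card filename
propose_link() {
    # $1: name or alias found in the text; $2: thesaurus file
    awk -F '\t' -v alias="$1" '
        toupper($1) == toupper(alias) {
            print "http://atlasgeneticsoncology.org/Genes/" $3
            found = 1
            exit                     # first match wins; the END rule still runs
        }
        END { exit !found }          # non-zero status when nothing was found
    ' "$2"
}
```

With a thesaurus line "ABL<TAB>ABL1<TAB>ABLID1.html", the call "propose_link ABL thesaurus.txt" prints http://atlasgeneticsoncology.org/Genes/ABLID1.html, which the editorial staff member can then accept or reject.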

Module 3: Indexations
All the files of Atlas are considered as "objects" and are defined by an Atlas ID.
The standard of the filename of the objects is the following:
- Genes: 2 forms:
<String>ID<number><location>.txt Ex: ADCYAP1ID43656ch18p11.txt
<String>ID<number>.txt Ex: AF6ID6.txt
A generic form is built for easy linking. The filename is defined as:
GC_<symbol>.txt e.g.: GC_ZFP90.txt
- Anomalies (Leukemias)
- Tumors
- Kprones
The filename has the form: <string>ID<number>.txt (e.g. t3q21q26TreatRelLeukID1236.txt)
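
The filename standard above can be split back into its parts with a single sed expression. This is a sketch only (the function name parse_card_name and the colon-separated output are choices made for the illustration, not part of the Atlas scripts). Files of the generic GC_<symbol>.txt form carry no Atlas ID and are rejected.

```shell
# Sketch: split a card filename into name, Atlas ID and (optional) location.
# Prints "name:id:location"; location is empty for the <String>ID<number>.txt form.
parse_card_name() {
    base=${1##*/}    # drop any leading directory
    out=$(printf '%s\n' "$base" |
        sed -n 's/^\(.*\)ID\([0-9][0-9]*\)\(.*\)\.txt$/\1:\2:\3/p')
    # Succeed only when the filename matched the standard.
    [ -n "$out" ] && printf '%s\n' "$out"
}
```

For example, "parse_card_name ADCYAP1ID43656ch18p11.txt" prints ADCYAP1:43656:ch18p11 and "parse_card_name AF6ID6.txt" prints AF6:6: (empty location).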
Creation (or Updating) of index or cards
As described in the indexation.sh script (the main indexation, run after any modification)
Indexation of Cards 1: script indexation.sh (in cytatlas/Scripts)
Used for re-indexation after new files or new data
http://atlasgeneticsoncology.org/Collab/Scripts/indexation.sh
1. Generation of all automatic genes
2. Generation of the main index file for all documents (ObjDB.txt)
3. Generation of some other indexes (ObjDBxx.txt ) (see in http://atlasgeneticsoncology.org/Collab/)
- ObjDB.txt
- ObjDB0.txt
- ObjDB1.html
- ObjDB1.txt
- ObjDB2.txt
- ObjDB3.txt
- ObjDB4.txt
- ObjDB5.txt
- ObjDB6.txt
- ObjDB7.txt
4. Generation of a catalog (text file with the information from the HEADER, see: http://atlasgeneticsoncology.org/Collab/catalog)
5. Transformation of the catalog (and "for sale" - "to be written" files) into tables, concatenated in a catalog_full.txt file
6. Indexations of Genes (Geneliste.html), Leukemias (Anomliste.html), etc.
7. Indexation by chromosomes
8. Indexation by authors (different IndxAuthxx.txt / html in Collab) (IndxAuth3.txt is the main index for authors and affiliations)
9. Generation of Categories (several files are maintained in parallel beforehand) for Cell Biology items
10. Generation of status (Genes .. Authors . etc.): http://atlasgeneticsoncology.org/Status/Status.html
11. Generation of Recent (last 2 years documents): http://atlasgeneticsoncology.org/Recent.html
12. Generation of COSMIC projects and TCGA/ICGC projects
13. Statistics (http://atlasgeneticsoncology.org/stat_atlas.html)
MySQL indexation is possible for some items (query in the home page)

Module 4: Internal cross links between classes of cards
This section is important: the numerous links added between cards are a strength of the Atlas. These links (Gene -> tumors, Leukemia <- genes …) enrich the quality and the expertise of the Atlas.
An automatic procedure parses all the cards and indexes the location of links (defined as a standard by the pattern <: TXT: text ID: AtlasID).
http://atlasgeneticsoncology.org/Collab/Scripts/maj_full.sh
This script has 2 goals:
- definition of all internal links, and
- generation of all hypertext files (readable on the net)
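
To illustrate the parsing step, the sketch below collects the Atlas IDs referenced in one card. It is not maj_full.sh: the function name extract_link_ids is hypothetical, and the regular expression is only a guess at the link syntax, assuming that each link contains "TXT:", some text without "<", then "ID:" followed by the numeric Atlas ID.

```shell
# Sketch: list the Atlas IDs that a card links to (one per line, sorted, no duplicates).
# Assumption: a link looks like "TXT: <text> ID: <number>"; the exact delimiters
# used in the real cards may differ.
extract_link_ids() {
    # Reads the card text on stdin.
    awk '
        {
            line = $0
            # Consume the line link by link.
            while (match(line, /TXT:[^<]*ID: *[0-9]+/)) {
                hit = substr(line, RSTART, RLENGTH)
                sub(/^.*ID: */, "", hit)     # keep only the number
                print hit
                line = substr(line, RSTART + RLENGTH)
            }
        }
    ' | sort -n -u
}
```

Running this on every card and recording the pairs (source ID, target ID) gives the raw material for the cross links in both directions (Gene -> Leukemia and Leukemia <- Gene).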

Map of one set towards another: injectivity/surjectivity:

Item	Internal hyperlink toward
1 Gene	n1 Leukemias
	n2 Solid tumors
	n3 Cancer-prone

1 Leukemia	n4 Genes
	n5 Cancer-prone

1 Solid tumor	n6 Genes
	n7 Cancer-prone

1 Cancer-prone	n8 Genes
	n9 Leukemias
	n10 Solid tumors

Examples:

Item		Hyperlinks toward
Gene	NUP214	Leukemia	t(6;9)(p23;q34) DEK/NUP214
		Leukemia	t(9;9)(q34;q34) SET/NUP214
		Leukemia	T cell ALL
		Solid Tumor	Lung Adenocar. t(9;9)(q34;q34) PRRC2B/NUP214

Gene	KIT	Leukemia	trisomy 4
		Solid Tumor	Melanoma
		Cancer Prone	Piebaldism

Leukemia	t(6;9)(p23;q34) DEK/NUP214	Gene	NUP214
		Gene	DEK

Cancer Prone	Tuberous sclerosis	Gene	TSC1
		Gene	TSC2
		Solid Tumor	Renal carcinoma
		Solid Tumor	Ependymomas

Module 5: Management of external links
Files for annotation of genes:
Several databases are regularly managed in parallel (from NCBI, UCSC, UniProt, Ensembl, HGNC, COSMIC, Mitelman (NCI) …)
All genes defined in the Atlas are based on the updated list of human gene symbols from Entrez Gene. A large part of the annotations is also taken from the associated FTP files (ftp.ncbi.nih.gov/gene/DATA/ and /gene/GeneRIF). They are processed by scripts into 2 tabulated files (genes_gc.txt and genes_gn.txt): the first one for "cancer genes" (selected by certain words in the description and/or GeneRIF in Entrez Gene),
the second for the other genes of Entrez Gene (NCBI). The genes used are limited to those with a genome location (hg38) defined in http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/refGene.txt.gz .
For a definition of all the fields kept updated and used in the External links of the cards on Genes, see: External_annotations_genes
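
The restriction to genes with an hg38 location can be sketched as a join between a gene list and refGene.txt. This is only an illustration, not the Atlas script: the function name filter_located_genes is hypothetical, the gene file is assumed to be tab-separated with the gene symbol in its first column (the real layout of genes_gc.txt and genes_gn.txt is not described here), and the gene symbol is taken from column 13 (name2) of the uncompressed UCSC refGene.txt.

```shell
# Sketch: keep only the genes whose symbol has a location in refGene.txt.
# Assumptions: $1 is the uncompressed UCSC refGene.txt (symbol in column 13, "name2");
#              $2 is a tab-separated gene file with the symbol in column 1 (hypothetical).
filter_located_genes() {
    awk -F '\t' '
        # First file: remember every gene symbol that has a location.
        FILENAME == ARGV[1] { if (NF >= 13) located[$13] = 1; next }
        # Second file: print the lines whose symbol was seen.
        ($1 in located)
    ' "$1" "$2"
}
```

A call such as "filter_located_genes refGene.txt genes.txt > genes_located.txt" (file names chosen for the example) would write the retained lines unchanged.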

Module 6: Statistics
- Topics/Items for the Atlas (genes, leukemias, solid tumors and others):
- Need of Tables (e.g. http://atlasgeneticsoncology.org/Collab/ID-TRANSLOC.txt ) to know what is done, what is done but old, what is reserved to a given author, what needs an author to be found. Allows the addition of new items.
