Documentation

I. Introduction


A. What is Plant MicroRNA Encyclopedia (PmiREN) Database?

PmiREN (Plant miRNA ENcyclopedia) is a comprehensive functional plant miRNA database. For species with whole-genome sequences and deeply sequenced small RNA libraries, we applied a uniform suite of methods, centred on miRDeep-P2 (a widely used tool for miRNA identification) combined with the newly updated plant miRNA criteria, to identify miRNAs. We collected extensive knowledge on miRNA genes, including fundamental annotation of sequences, expression, target genes and gene clusters, and created brand-new data on plant miRNA target validation, synteny and conservation. We also sorted miRNA candidates into different confidence levels according to the newly updated plant miRNA criteria. The current version, 1.2, registers 88 species ranging phylogenetically from chlorophytes to angiosperms, 20338 miRNA genes belonging to 5757 families, 1365 clusters, 1668 syntenic blocks and 141327 predicted miRNA-target pairs. In addition, 1587 deeply sequenced small RNA libraries were used to quantify miRNA expression patterns, and 116 PARE-Seq libraries were used to validate predicted miRNA-target pairs. Unique features, such as annotating miRNAs with a uniform method, solid and extensive information on miRNA synteny, conservation, expression and references, convenient data access, and powerful search and download engines, make PmiREN a truly functional miRNA knowledgebase. We believe PmiREN will provide novel insights into plant miRNA research and benefit the whole research community.


II. Datasets and Workflow


A. Datasets

The original sRNA-Seq and PARE-Seq datasets were obtained from NCBI, and genome sequences and annotation files (transcript, GFF3, cDNA) were downloaded from NCBI or JGI. In total, 88 plant reference genomes with related annotation, 1587 sRNA-Seq datasets and 116 PARE-Seq datasets were downloaded. From these, 20338 miRNAs belonging to 5757 families, 1365 clusters, 1668 syntenic blocks and 141327 predicted miRNA-target pairs were obtained through a suite of uniform methods (see Workflow below for details).


B. Workflow

Data pre-processing

The sRNA-Seq and PARE-Seq datasets from NCBI GEO DataSets do not share a uniform format. Most of these datasets keep the original Fastq format, with or without adapter sequences, while a small number have been converted to Fasta format. The Linux version of SRA Toolkit (version 2.8.2) was used to convert the original compressed files into Fastq format, and Trim Galore was used to trim adapters. To format the sRNA-Seq datasets, an in-house Perl script then collapsed identical reads and converted the trimmed Fastq files to Fasta format.
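The collapsing step can be illustrated with a minimal Python sketch. This is not the in-house Perl script, which is not shown here. The function names and the `read<N>_x<count>` header layout are assumptions made for illustration only.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple


def read_fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line (2nd of every 4 lines) of each FASTQ record."""
    for index, line in enumerate(lines):
        if index % 4 == 1:
            yield line.strip().upper()


def collapse_reads(sequences: Iterable[str]) -> List[Tuple[str, int]]:
    """Count identical reads; return (sequence, count), most abundant first."""
    counts = Counter(seq for seq in sequences if seq)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def to_collapsed_fasta(collapsed: List[Tuple[str, int]]) -> str:
    """Render collapsed reads as FASTA. The header layout is hypothetical."""
    records = [
        f">read{number}_x{count}\n{seq}"
        for number, (seq, count) in enumerate(collapsed)
    ]
    return "\n".join(records) + "\n"


# Three already-trimmed toy reads, two of them identical.
reads = ["TGACAGAAGAGAGTGAGCAC", "TTGACAGAAGATAGAGAGCAC", "TGACAGAAGAGAGTGAGCAC"]
example_fastq = "".join(
    f"@r{i}\n{seq}\n+\n{'I' * len(seq)}\n" for i, seq in enumerate(reads)
)

collapsed = collapse_reads(read_fastq_sequences(example_fastq.splitlines()))
fasta_text = to_collapsed_fasta(collapsed)
```

Storing the read count in the header keeps the Fasta file compact while preserving the abundance information needed later for expression analysis.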


Annotation of miRNA loci

A standardized process centred on miRDeep-P2 (the upgraded version of miRDeep-P, a widely used miRNA prediction tool in plants), combined with the newly updated plant miRNA criteria, was employed to identify miRNAs in the 88 species. For each species, the whole-genome sequence and sRNA-Seq libraries served as input for miRNA prediction. All miRNA items retrieved by this standard process were manually checked to ensure that they meet the new plant miRNA criteria, and were designated as high-confidence miRNA candidates with 3 stars. Of these 88 species, 3966 have counterparts in miRBase. The miRBase items for these species were examined as follows: 1) because the genome versions of many species in miRBase are out of date, we updated the genomic loci of these miRNAs, and those overlapping items predicted by miRDeep-P2 were designated as high-confidence with 3 stars; 2) items that did not overlap but whose expression was supported by sRNA-Seq data (RPM ≥ 5) were designated as medium-confidence with 2 stars; 3) all remaining items were also included in PmiREN at the low confidence level, with 1 star.
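The three-level star scheme for miRBase items can be sketched as a small decision function. This is an illustrative sketch only. The `Locus` fields, the overlap rule (same chromosome and strand, at least 1 nt shared) and the use of a single RPM value per item are assumptions, since the text does not specify them.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Locus:
    """A genomic interval; field names are illustrative, not PmiREN's schema."""
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    strand: str


def overlaps(a: Locus, b: Locus) -> bool:
    """Assumed rule: same chromosome and strand, sharing at least 1 nt."""
    return (
        a.chrom == b.chrom
        and a.strand == b.strand
        and a.start <= b.end
        and b.start <= a.end
    )


def confidence_stars(
    mirbase_locus: Locus,
    predicted_loci: Iterable[Locus],
    rpm: float,
    rpm_cutoff: float = 5.0,
) -> int:
    """Return 3, 2 or 1 stars for a miRBase item, following the three rules."""
    if any(overlaps(mirbase_locus, predicted) for predicted in predicted_loci):
        return 3  # overlaps a miRDeep-P2 prediction
    if rpm >= rpm_cutoff:
        return 2  # no overlap, but expression supported by sRNA-Seq
    return 1      # kept at the low confidence level


# Toy data: one predicted locus and three miRBase items.
predicted = [Locus("Chr1", 100, 200, "+")]
stars_overlap = confidence_stars(Locus("Chr1", 150, 250, "+"), predicted, rpm=0.0)
stars_expressed = confidence_stars(Locus("Chr2", 150, 250, "+"), predicted, rpm=5.0)
stars_low = confidence_stars(Locus("Chr1", 300, 400, "+"), predicted, rpm=4.9)
```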


Target prediction and validation

Two plant miRNA target prediction pipelines, centred on psRNATarget and RNAhybrid, were used to predict miRNA targets. RNAhybrid was used to predict energetically plausible miRNA:mRNA duplexes under plant-specific constraints, with a cut-off value of 0.70 for minimum free energy/minimum duplex energy. CleaveLand4 was used with default parameters to process the PARE-Seq data. The default outputs of CleaveLand4 (Addo-Quaye, et al., 2009 Bioinformatics) were grouped into 5 categories based on its scoring system (from 0 to 4), with category ‘0’ indicating the highest confidence. To ensure that candidates are well supported, only items of categories 0, 1 and 2 are shown on the target information page. For users’ reference, all 5 categories are included in the analysis results, which can be downloaded from the ‘PARE-Seq data’ option of PmiREN.

The CleaveLand4 categories are defined as follows: category 0: >1 read, equal to the maximum on the transcript, when there is just 1 position at the maximum value; category 1: >1 read, equal to the maximum on the transcript, when there is >1 position at the maximum value; category 2: >1 read, above the average depth, but not the maximum on the transcript; category 3: >1 read, but below or equal to the average depth of coverage on the transcript; category 4: just one read at that position.
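The category definitions above translate directly into a short decision function. The following is a sketch of those rules, not CleaveLand4's own code. It assumes that `transcript_depths` lists the read count at every covered position on the transcript (including the candidate site) and that the average depth is the mean over those positions.

```python
from statistics import mean
from typing import Sequence


def cleaveland_category(site_reads: int, transcript_depths: Sequence[int]) -> int:
    """Classify a cleavage site into categories 0 to 4 as defined above."""
    if site_reads < 1:
        raise ValueError("a candidate site needs at least one read")
    if site_reads == 1:
        return 4  # just one read at that position
    max_depth = max(transcript_depths)
    if site_reads == max_depth:
        # Category 0 if the maximum is unique, category 1 if it is shared.
        return 0 if list(transcript_depths).count(max_depth) == 1 else 1
    if site_reads > mean(transcript_depths):
        return 2  # above average depth, but not the maximum
    return 3      # below or equal to the average depth


def shown_on_target_page(category: int) -> bool:
    """Only categories 0, 1 and 2 are kept on the target information page."""
    return category in (0, 1, 2)
```

For example, with depths `[1, 2, 6, 10]` the mean is 4.75, so a site with 6 reads falls into category 2 and would be shown, while a site with 2 reads falls into category 3 and would appear only in the downloadable results.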


Expression analysis

The expression values of mature and star miRNAs were normalized as RPM (Reads Per Million). For each sRNA-Seq library, reads that mapped to pri-miRNAs (no mismatches allowed) and were located at the position of the mature miRNA (a shift of 1-2 nt allowed) were counted as reads of that mature miRNA. The total number of these reads was used to calculate the RPM value of the mature miRNA. The same method was applied to obtain RPM values of star miRNAs. If multiple sRNA-Seq libraries came from the same tissue, the mean of their RPM values is presented in PmiREN.
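The normalization can be sketched as follows. This is a minimal illustration under stated assumptions: the denominator `library_size` (the total read count used for scaling) is not defined in the text, and the position check assumes the allowed shift of up to 2 nt applies to both ends of the read. All function names are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple


def rpm(read_count: int, library_size: int) -> float:
    """Reads Per Million: read_count scaled to a library of one million reads."""
    if library_size <= 0:
        raise ValueError("library_size must be positive")
    return read_count * 1_000_000 / library_size


def matches_mature_position(
    read_start: int,
    read_end: int,
    mature_start: int,
    mature_end: int,
    max_shift: int = 2,
) -> bool:
    """Assumed rule: both read ends lie within max_shift nt of the mature miRNA."""
    return (
        abs(read_start - mature_start) <= max_shift
        and abs(read_end - mature_end) <= max_shift
    )


def tissue_mean_rpm(
    libraries: Dict[str, List[Tuple[int, int]]]
) -> Dict[str, float]:
    """Average RPM over libraries of the same tissue.

    `libraries` maps a tissue name to (read_count, library_size) pairs.
    """
    return {
        tissue: mean([rpm(count, size) for count, size in pairs])
        for tissue, pairs in libraries.items()
    }


# Toy example: two leaf libraries and one root library.
example = {"leaf": [(50, 10_000_000), (30, 2_000_000)], "root": [(8, 4_000_000)]}
per_tissue = tissue_mean_rpm(example)
```

In the toy example, the two leaf libraries give 5 and 15 RPM, so the value reported for leaf would be their mean, 10 RPM.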


Syntenic analysis

Synteny analysis of miRNAs was carried out with MCScanX. First, a GFF file containing all genes and miRNAs and a FASTA file containing all CDS and miRNA stem-loop sequences were generated. Next, local BLAST was used to compare all sequences of each species, with an E-value threshold of less than 1e-10. The GFF file and the BLAST output for all genes and miRNAs were then imported into MCScanX to scan for collinear miRNA pairs. Finally, Circos was used to display the collinear gene pairs.
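As an illustration of the E-value cut-off, the sketch below filters BLAST hits in tabular format, where the E-value is the 11th column. It assumes the BLAST results were written in that 12-column tabular layout. The function name and the toy identifiers are hypothetical, and in practice the threshold could equally be applied when running BLAST itself.

```python
from typing import Iterable, Iterator, List

EVALUE_COLUMN = 10  # 0-based index of the E-value in 12-column tabular output


def filter_blast_hits(
    lines: Iterable[str], max_evalue: float = 1e-10
) -> Iterator[List[str]]:
    """Yield the fields of hits whose E-value is strictly below max_evalue."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        if float(fields[EVALUE_COLUMN]) < max_evalue:
            yield fields


# Toy hits: query, subject, identity, length, mismatches, gap opens,
# query start/end, subject start/end, E-value, bit score.
example_hits = [
    "geneA\tMIR156a\t95.0\t100\t5\t0\t1\t100\t1\t100\t1e-30\t180",
    "geneA\tgeneB\t80.0\t50\t10\t0\t1\t50\t1\t50\t1e-5\t60",
    "geneC\tgeneD\t90.0\t80\t8\t0\t1\t80\t1\t80\t1e-10\t120",
]
kept_pairs = [(hit[0], hit[1]) for hit in filter_blast_hits(example_hits)]
```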


Conservation analysis

The conservation of miRNAs was assigned based on the annotation results. For each annotated miRNA family, the distribution of its members among all species was collected. The species in which each miRNA family is present are then highlighted in the species phylogenetic tree on the miRNA detailed information page.
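Collecting the distribution of each family across species amounts to grouping annotations by family. The sketch below shows one way to do this. The input layout of (species, family) pairs and the toy records are assumptions for illustration, not PmiREN data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def family_distribution(
    annotations: Iterable[Tuple[str, str]]
) -> Dict[str, Set[str]]:
    """Map each miRNA family to the set of species with at least one member."""
    distribution: Dict[str, Set[str]] = defaultdict(set)
    for species, family in annotations:
        distribution[family].add(species)
    return dict(distribution)


# Toy annotation records: (species, miRNA family).
example_annotations = [
    ("Arabidopsis thaliana", "MIR156"),
    ("Oryza sativa", "MIR156"),
    ("Arabidopsis thaliana", "MIR156"),
    ("Arabidopsis thaliana", "MIR163"),
]
presence = family_distribution(example_annotations)
```

The resulting species sets are what would be highlighted on the phylogenetic tree: a family present in many distant species is broadly conserved, while one found in a single species is lineage-specific.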


III. FAQ


A. What information does PmiREN provide for plant miRNAs?

PmiREN contains 88 species with sequenced whole genomes and publicly available small RNA-Seq data. You can find the highest-confidence miRNAs according to the 2018 criteria in our database (see "Annotation of miRNA loci" in the Workflow above). Basic information on miRNAs, including miRNA families, genome location, sequences and secondary structure, is shown on the miRNA detailed information page. PmiREN also contains expression information for miRNAs in different tissues, based on small RNA-Seq data with standard normalization (RPM). To help understand the dynamic evolution of miRNAs, PmiREN provides information on clusters, conservation and syntenic blocks. To further present the function of miRNAs, PmiREN hosts miRNA-target pairs predicted by two tools, psRNATarget and RNAhybrid. In addition, miRNA-target pairs were validated with 116 PARE-Seq datasets in 21 species.


B. How can I get detailed miRNA information for specific miRNA of interest?

PmiREN provides a quick search in the top-right corner of the web page. Enter a miRNA locus ID, miRNA locus accession, miRNA family or species name (e.g. miR156a, miR156, Arabidopsis thaliana), and the search engine will return all hits with summary statistics in a table. Then click on the miRNA locus or miRNA locus accession to see detailed information on the miRNA you are interested in.

Alternatively, you can search for miRNAs on the Search Page, which offers more convenient and specific search engines. Eight search engines are deployed, including search by keywords, sequence (using blastn), genome location, clusters, expression, syntenic block and target genes. Click "Search" and you will quickly receive a summary of results matching your query.


C. How can I get miRNAs for any given species?

We recommend two ways to get the miRNAome of a species.

First, go to the Browse Page, where the 88 species available in our database are ordered by taxon. Once you have chosen a species (by single-clicking its scientific name), a popup shows summary information, including genome information and the numbers of MIR families, MIR loci, clusters, syntenic blocks, sRNA-Seq datasets, RNA-Seq datasets and PARE-Seq datasets. Clicking on MIR families or MIR loci opens a new page with a table of the miRNAs belonging to this species. You can tick the checkboxes to download information on the miRNAs you are interested in, or download everything by clicking the 'Download all' button.

Second, the data files for each PmiREN organism are available via the Download Page. A link to the FTP is provided in the top-left of the Download Page, and the data files are stored in a folder for each organism. For convenience, we also provide a user-defined download, which lets you download detailed miRNA information in a customized manner.


D. How can I download the data in PmiREN?

All data in PmiREN can be downloaded from the Download Page.

The data in PmiREN can be downloaded from the FTP server in a variety of formats. To facilitate storage and download, all data for a species are stored in one folder.

User-defined download is a convenient way to download the data in a customized manner. Downloaded files are compressed into a zip file.

You can also download all data from SourceForge (https://sourceforge.net/projects/pmiren/files/FTP_download/).


IV. Support


Submit your data to PmiREN!

If you have a group of novel plant miRNAs and would like to submit them to PmiREN, please go to the Submit Page.


V. Citation

Data files contained in PmiREN are free of all copyright restrictions and are made fully and freely available for non-commercial use. Users of the data should cite the following article:

PmiREN: a comprehensive encyclopedia of plant miRNAs. Zhonglong Guo, Zheng Kuang, Ying Wang, Yongxin Zhao, Chen Cheng, Zhonglin Wang, Yihan Tao, Jing Yang, Xiayang Lu, Xiaoyan Cao, Lei Li, and Xiaozeng Yang.


VI. Miscellaneous


Figures of Species

All hand-drawn figures of organisms were downloaded from Plantillustrations.org.