What is the Plant MicroRNA Encyclopedia (PmiREN) Database?
PmiREN (Plant miRNA ENcyclopedia) is a comprehensive functional plant miRNA database. Version 2.0 contains 38,186 miRNA loci (MIR) belonging to 7,838 families, 1,668 syntenic blocks and 141,327 predicted miRNA-target pairs in 179 species, phylogenetically ranging from chlorophytes to angiosperms. In addition, 2,331 deeply sequenced small RNA libraries were used to quantify miRNA expression patterns, and 116 PARE-Seq libraries were used to validate predicted miRNA-target pairs. Compared with version 1.0, PmiREN2.0 retains all existing functions and adds 11 new tools for in-depth data mining and experimental practice, promoting the transition from a data collection to a fully functional knowledgebase. Furthermore, a PmiREN forum was established for sharing resources and new discoveries, research communication and announcements. We believe PmiREN2.0 will continue to provide novel insights into miRNA research.
II. Datasets and Workflow
PmiREN2.0 consists of miRNA entries from 179 plants, including 2 chlorophytes, 1 moss, 27 ferns or lycophytes, 2 gymnosperms, 1 basal angiosperm, 1 magnoliidae, 24 monocotyledons, and 121 eudicotyledons. Whole genome references and gene annotations were downloaded from NCBI Assembly (https://www.ncbi.nlm.nih.gov/assembly/) and/or Phytozome V12.1 (https://phytozome.jgi.doe.gov/pz/portal.html). A total of 2,331 sRNA-Seq and 116 PARE-Seq (Parallel Analysis of RNA Ends sequencing) datasets were obtained from NCBI GEO DataSets (https://www.ncbi.nlm.nih.gov/geo/).
Data analysis pipelines
Data pre-processing. The sRNA-Seq and PARE-Seq datasets from GEO DataSets are not in a uniform format. The Linux version 2.8.2 of the SRA Toolkit (https://www.ncbi.nlm.nih.gov/sra/docs/toolkitsoft/) was used to convert the original compressed files into FASTQ format. Trim Galore (version 0.5.0) (http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/) was used to trim adapter sequences. To format the sRNA-Seq datasets, trimmed FASTQ files were further compacted into FASTA format, with identical reads collapsed by an in-house Perl script.
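The read-collapsing step can be illustrated with a minimal Python sketch. It is not the in-house Perl script mentioned above, which is not shown here; the function names and the ">read<N>_x<count>" header layout are assumptions made for illustration only.

```python
import io
from collections import Counter


def read_fastq_sequences(handle):
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(handle):
        if i % 4 == 1:  # line 2 of every record holds the bases
            seq = line.strip()
            if seq:
                yield seq


def collapse_reads(sequences):
    """Return (sequence, count) pairs, most abundant first."""
    counts = Counter(sequences)
    # Sort by descending count, then by sequence, for deterministic output.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def write_collapsed_fasta(collapsed, out, prefix="read"):
    """Write one FASTA record per unique read; the header format is hypothetical."""
    for idx, (seq, n) in enumerate(collapsed, start=1):
        out.write(f">{prefix}{idx}_x{n}\n{seq}\n")


if __name__ == "__main__":
    fastq = "@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\nIIII\n@r3\nTTTT\n+\nIIII\n"
    buffer = io.StringIO()
    write_collapsed_fasta(collapse_reads(read_fastq_sequences(io.StringIO(fastq))), buffer)
    print(buffer.getvalue(), end="")
```

Storing each unique sequence once, with its read count in the header, keeps the file small while preserving the abundance information needed later for expression estimates.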
Annotation of miRNAs. A standardized process centred on miRDeep-P2 and newly updated plant miRNA criteria was employed to identify miRNAs. For each species, whole genome reference and sRNA-Seq datasets were used as input files while the new plant miRNA criteria were added as a filter. All miRNA candidates (including those having counterparts in miRBase) retrieved by this process were designated as high confidence entries with three stars. Candidates in miRBase that were not retrieved were re-examined and annotated. Candidates with low expression supported by sRNA-Seq datasets (RPM (reads per million) cut-off value <10) and meeting the structure criteria of miRNA precursors were designated intermediate confidence with two stars. Candidates only meeting structure criteria of miRNA precursors and without any expression support by sRNA-Seq datasets were considered low confidence with one star. Other entries in miRBase were discarded.
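The star-rating rules described above can be written as a small decision function. This is a sketch only: the Candidate fields and function name are hypothetical, and it assumes that "low expression" means an RPM above 0 and below 10. A miRBase entry that was not retrieved by the pipeline and fits neither the two-star nor the one-star description is treated as discarded, following the final sentence of the paragraph.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical summary of one miRNA candidate."""
    retrieved_by_pipeline: bool  # found by the miRDeep-P2 based process
    passes_structure: bool       # meets the precursor structure criteria
    rpm: float                   # best expression support from sRNA-Seq


def confidence_stars(c: Candidate, low_rpm_cutoff: float = 10.0) -> int:
    """Return 3, 2 or 1 stars, or 0 for a discarded entry."""
    if c.retrieved_by_pipeline:
        return 3  # high confidence
    if c.passes_structure and 0 < c.rpm < low_rpm_cutoff:
        return 2  # intermediate: low expression plus valid structure
    if c.passes_structure and c.rpm == 0:
        return 1  # low: structure only, no expression support
    return 0      # assumption: everything else is discarded


if __name__ == "__main__":
    print(confidence_stars(Candidate(False, True, 4.2)))
```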
Target prediction and validation. Two plant miRNA target prediction tools, psRNATarget and RNAhybrid, were used to predict miRNA targets. The mature miRNA sequences and mRNA transcripts of the corresponding species were uploaded to the psRNATarget web server. The latest default parameters of Schema V2 (2017 release) were used, except that the default expectation threshold of 5 was reduced to a stricter value of 3. RNAhybrid was used to predict energetically plausible miRNA:mRNA duplexes with plant-specific constraints as previously described. A cut-off value of 0.70 was used for the minimum free energy/minimum duplex energy ratio. CleaveLand4 was used to process PARE-Seq datasets, and only category 0 and 1 results were kept to reduce false positives.
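The three thresholds named above can be expressed as simple predicates. The sketch below does not call psRNATarget, RNAhybrid or CleaveLand4; it only shows how already-parsed values might be checked. The function names are hypothetical, and the energy ratio is assumed to be the duplex minimum free energy divided by the energy of a perfectly complementary duplex, which is a common reading but is not spelled out in the text.

```python
def passes_psrnatarget(expectation: float, max_expectation: float = 3.0) -> bool:
    """Keep psRNATarget hits at or below the expectation threshold."""
    return expectation <= max_expectation


def passes_rnahybrid(duplex_mfe: float, perfect_match_mfe: float,
                     min_ratio: float = 0.70) -> bool:
    """Keep duplexes whose energy ratio reaches the cut-off.

    Both energies are expected to be negative (kcal/mol). The ratio
    definition is an assumption, see the lead-in paragraph.
    """
    if perfect_match_mfe >= 0:
        raise ValueError("perfect-match energy must be negative")
    return duplex_mfe / perfect_match_mfe >= min_ratio


def pare_validated(category: int) -> bool:
    """Keep only CleaveLand4 category 0 and 1 results."""
    return category in (0, 1)


if __name__ == "__main__":
    print(passes_psrnatarget(2.5), passes_rnahybrid(-30.0, -40.0), pare_validated(1))
```

The predicates are kept separate because the text does not state whether predictions from the two tools are combined as a union or an intersection.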
Expression analysis. The expression values of mature and star miRNAs were normalized as RPM as previously described. For each sRNA-Seq dataset, reads that mapped to pri-miRNAs (no mismatches allowed) and were located at the position of the mature miRNA (a 1-2 nt shift allowed) were deemed to correspond to the mature miRNA. These reads were counted to calculate the RPM value for the mature miRNA. The same method was applied to calculate RPM values for star miRNAs. Where multiple sRNA-Seq datasets from the same tissue were available, the mean RPM value was used.
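The normalization and per-tissue averaging can be sketched as follows. The function names and input layout are hypothetical, and the denominator is assumed to be the total read count of the library; the text does not say whether total or mapped reads were used.

```python
from collections import defaultdict
from statistics import mean


def rpm(mirna_reads: int, library_total_reads: int) -> float:
    """Reads per million: miRNA read count scaled to a library of one million reads."""
    if library_total_reads <= 0:
        raise ValueError("library_total_reads must be positive")
    return mirna_reads * 1_000_000 / library_total_reads


def mean_rpm_by_tissue(libraries):
    """Average RPM over libraries of the same tissue.

    libraries: iterable of (tissue, mirna_reads, library_total_reads).
    """
    per_tissue = defaultdict(list)
    for tissue, reads, total in libraries:
        per_tissue[tissue].append(rpm(reads, total))
    return {tissue: mean(values) for tissue, values in per_tissue.items()}


if __name__ == "__main__":
    libs = [("leaf", 50, 2_000_000), ("leaf", 150, 2_000_000), ("root", 10, 1_000_000)]
    print(mean_rpm_by_tissue(libs))
```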
Syntenic analysis. The synteny analysis of miRNAs was carried out using MCScanX. First, GFF (general feature format) files including all protein-coding genes and pre-miRNAs, and FASTA files including all coding sequences and pre-miRNAs, were generated. Subsequently, the DNA sequences of protein-coding genes and pre-miRNAs were searched against themselves using BLASTN with an E-value cut-off of 1e-10. The GFF files and BLAST output files of all genes and pre-miRNAs were imported into MCScanX to scan for collinear pairs. Circos was used to display the results.
Conservation analysis. The conservation of miRNAs was assigned based on similarity of the mature sequence. For each annotated miRNA family, the distribution of all members among all species was determined. The correspondence between miRNA families and the species containing them is highlighted in the species phylogenetic tree on each miRNA locus information page.
Prediction of cis-acting regulatory elements. To detect cis-acting regulatory elements that could affect transcription of MIR genes, the 3,000 bp sequences upstream of miRNA hairpins were treated as promoters. Two commonly used programs, PlantCare and PlantRegMap, were run with default parameters. The PlantRegMap results were further filtered with a P-value threshold of 1e-3.
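Extracting the upstream window requires attention to strand, since "upstream" lies at lower coordinates on the plus strand and at higher coordinates on the minus strand. The sketch below computes the window in 1-based inclusive coordinates. The function name, and the choice to clip the window at chromosome ends, are assumptions rather than details from the pipeline.

```python
def promoter_window(start: int, end: int, strand: str,
                    chrom_length: int, upstream: int = 3000):
    """Return (start, end) of the upstream region of a hairpin, or None.

    Coordinates are 1-based and inclusive, as in GFF. The window is
    clipped at the chromosome boundaries (an assumption).
    """
    if strand == "+":
        p_start, p_end = max(1, start - upstream), start - 1
    elif strand == "-":
        p_start, p_end = end + 1, min(chrom_length, end + upstream)
    else:
        raise ValueError("strand must be '+' or '-'")
    if p_start > p_end:
        return None  # hairpin sits at the very end of the chromosome
    return p_start, p_end


if __name__ == "__main__":
    print(promoter_window(5000, 5100, "+", 100_000))
    print(promoter_window(5000, 5100, "-", 100_000))
```

For a minus-strand hairpin, the sequence taken from this window would also need to be reverse-complemented before motif scanning.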
Annotation of target genes. We ran InterProScan on the target gene sequences to obtain sequence-, structure- and function-level annotation, with the output formats set to TSV and HTML. For each miRNA target gene, a KEGG Orthology (KO) (https://www.genome.jp/kegg/) assignment was generated by BlastKOALA with default parameters.
Polymorphism on MIR genes. Genome polymorphism data were retrieved from Phytozome12 in variant call format (VCF). According to the secondary structure of miRNA hairpins, we divided each MIR gene into five parts: mature sequence, star sequence, 5’ arm, 3’ arm and internal loop. Polymorphisms in each part of the MIR genes were extracted by in-house Perl scripts.
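Assigning a variant to one of the five parts can be sketched as below. This is not the in-house script. It assumes hairpin-relative 1-based positions in 5'-to-3' orientation, non-overlapping mature and star intervals, and a simplified partition in which everything before the first small RNA is the 5' arm, everything after the second is the 3' arm, and everything between them is the loop. The exact boundaries used by PmiREN are not given in the text.

```python
def classify_position(pos: int, hairpin_len: int,
                      mature: tuple, star: tuple) -> str:
    """Name the hairpin part that contains position pos.

    mature and star are (start, end) intervals, 1-based and inclusive,
    relative to the hairpin. The partition rule is an assumption.
    """
    if not 1 <= pos <= hairpin_len:
        raise ValueError("position lies outside the hairpin")
    if mature[0] <= pos <= mature[1]:
        return "mature"
    if star[0] <= pos <= star[1]:
        return "star"
    first, second = sorted([mature, star])  # order the two small RNAs 5' to 3'
    if pos < first[0]:
        return "5' arm"
    if pos > second[1]:
        return "3' arm"
    return "loop"


if __name__ == "__main__":
    for p in (5, 20, 50, 80, 95):
        print(p, classify_position(p, 100, mature=(10, 30), star=(70, 90)))
```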
Phylogenetic tree of miRNA family. We selected the 28 most conserved miRNA families to construct phylogenetic trees. miRNA stem-loop sequences with 20 bp of flanking sequence were aligned using MAFFT (v7.310) with default parameters. IQ-TREE (v1.6.12) was used to construct phylogenetic trees with parameters set as ‘-m MFP -bb 1000 -bnni’.
8 tools for miRNA regulatory network or functional study
miRNA Regulatory Network
Prediction of cis-acting regulatory elements by PlantRegMap
Prediction of cis-acting regulatory elements by PlantCare
InterProScan annotation for individual miRNA
KEGG annotation for individual miRNA
Regulatory network of transcription factor-miRNA-targets
Variant browser for miRNAs
Phylogenetic tree of specific miRNA family
Literature about plant miRNAs in the last decade
3 tools for molecular experimental design
Design primers for overexpressing miRNA
Design artificial miRNAs to silence specific miRNAs
Design the fragment of target mimic for miRNA repression
Community and collection of resources about miRNA
An online community for plant miRNA researchers
Collection of commonly used miRNA databases
Collection of commonly used tools for miRNA annotation
Collection of commonly used tools for miRNA target prediction
A. What information does PmiREN provide for plant miRNAs?
PmiREN covers 179 species with sequenced whole genomes and publicly available small RNA-Seq data. The database provides high-confidence miRNAs annotated according to the 2018 criteria (see "Annotation of miRNAs" in the workflow above). Basic information on each miRNA, including miRNA family, genome location, sequences, secondary structure, etc., is given on the miRNA detailed information page. PmiREN also provides miRNA expression in different tissues, based on small RNA-Seq data normalized by RPM. To help understand the dynamic evolution of miRNAs, PmiREN provides information on clusters, conservation and syntenic blocks. To further present miRNA function, PmiREN holds miRNA-target pairs predicted by two tools, psRNATarget and RNAhybrid. In addition, miRNA-target pairs were validated with 116 PARE-Seq datasets in 21 species.
B. How can I get detailed miRNA information for specific miRNA of interest?
PmiREN provides a quick search box in the top-right corner of the web page. Enter a miRNA locus ID, miRNA locus accession, miRNA family or species name (e.g. miR156a, miR156, Arabidopsis thaliana), and the search engine will return all hits, with summary statistics, in a table. Click on the miRNA locus or miRNA locus accession to see detailed information on the miRNA you are interested in.
Alternatively, you can search for miRNAs on the Search Page, which offers more specific search engines. Eight search engines are deployed; for example, you can search by keywords, sequence (using BLASTN), genome location, clusters, expression, syntenic block and target genes. Click "Search" to receive a summary of results matching your query.
C. How can I get miRNAs for any given species?
We recommend two ways to get the miRNAome of a species.
Firstly, you can open the Browse Page, where the 179 species available in our database are ordered by taxon. Once you have chosen a species (by single-clicking its scientific name), a popup shows summary information, including genome information and the numbers of MIR families, MIR loci, clusters, syntenic blocks, sRNA-Seq datasets, RNA-Seq datasets and PARE-Seq datasets. Clicking on MIR families or MIR loci opens a new page containing a table of the miRNAs belonging to this species. You can use the checkboxes to download information for the miRNAs of interest, or download everything by clicking the 'Download all' button.
Secondly, the data files for each PmiREN organism are available via the Download Page. A link to the FTP server is provided at the top-left of the Download Page, and the data files are stored in one folder per organism. For convenience, we also provide a user-defined download, which lets you download detailed miRNA information in a customized manner.
D. How to download the data in PmiREN?
All data in PmiREN can be downloaded from the Download Page.
The data in PmiREN can be downloaded from the FTP server in a variety of formats. To facilitate storage and download, all data for a species are stored in one folder.
User-defined download is a convenient way to download data in a customized manner. Downloaded files are compressed into a zip file.
You can also download all data from SourceForge (https://sourceforge.net/projects/pmiren/files/FTP_download/).
E. How to contact us?
If you run into any trouble or find any bugs when visiting PmiREN, please email Contact@PmiREN.com, post a request in the PmiREN Community, or contact us by:
Address: No.9 zhonglu shuguanghuayuan, Haidian District, Beijing, China, 100097
Submit your data on PmiREN!
If you have a group of novel plant miRNAs and would like to submit them to PmiREN, please go to the Submit Page.
Data files contained in the PmiREN are free of all copyright restrictions and made fully and freely available for non-commercial use. Users of the data should cite the following articles:
Zhonglong Guo, Zheng Kuang, Ying Wang, Yongxin Zhao, Yihan Tao, Chen Cheng, Jing Yang, Xiayang Lu, Chen Hao, Tianxin Wang, Xiaoyan Cao, Jianhua Wei, Lei Li, Xiaozeng Yang, PmiREN: a comprehensive encyclopedia of plant miRNAs, Nucleic Acids Research, Volume 48, Issue D1, 08 January 2020, Pages D1114–D1121, https://doi.org/10.1093/nar/gkz894
Figures of Species
All hand-drawn figures of organisms were downloaded from Plantillustrations.org.