This file contains instructions for running an R function (in an R package), miRComp, that combines the prediction results of three miR target prediction algorithms: miRanda, TargetScan and PicTar.

****************************************************
***********Download and Run miRComp******************
****************************************************

Download and unzip the files into your working directory, for example, C:\miRComp. This directory will contain miRComp.R (source R code), mR-miR-375.txt, TS-miR-375.txt, PT-miR-375.txt (sample data files) and EnsemblHugoRefSeq-Sep08.txt (default input dataset).

Then start R and, at the R prompt (">" as usual), type (without the >)

>setwd("C:/miRComp")
>source("miRComp.R")

This will cause R to read and parse the source file.

There are several ways to call the function miRComp and pass the input files. For example, you can type

>miRComp("mR-miR-375.txt","TS-miR-375.txt","PT-miR-375.txt")

to combine three input files, or you can do a two-file combination

>miRComp("mR-miR-375.txt","TS-miR-375.txt")
>miRComp("mR-miR-375.txt", ,"PT-miR-375.txt")

or you can name the input files and ignore the order

>miRComp(TargetScanfile="TS-miR-375.txt",PicTarfile="PT-miR-375.txt",miRandafile="mR-miR-375.txt")

An error is reported if there is only one input file

>miRComp("mR-miR-375.txt")
Error in miRComp("mR-miR-375.txt") : There must be at least two input files

The above commands call the function miRComp to combine the input files and write the results to the same directory.

****************************************************
*******************Input Files**********************
****************************************************

The miRComp function takes the output from three miR target prediction algorithms as its input. In the above examples, mR-miR-375.txt is the output of miRanda, TS-miR-375.txt is the output of TargetScan and PT-miR-375.txt is the output of PicTar.
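
Because miRComp stops when fewer than two input files are supplied, it can be useful to check beforehand which files are actually present in the working directory. The following is a minimal sketch in base R; check_inputs is a hypothetical helper written for this illustration and is not part of miRComp.

```r
# Hypothetical helper (not part of miRComp): report which input files are
# present in a directory and whether at least two of them are available.
check_inputs <- function(files, dir = getwd()) {
  present <- file.exists(file.path(dir, files))
  names(present) <- files
  list(present = present, enough = sum(present) >= 2)
}

# Example using the sample file names from this README
res <- check_inputs(c("mR-miR-375.txt", "TS-miR-375.txt", "PT-miR-375.txt"))
print(res$present)
if (!res$enough) {
  message("Fewer than two input files found; miRComp would stop with an error.")
}
```

The check only looks at file names; it does not verify that the contents are valid prediction output.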
EnsemblHugoRefSeq-Sep08.txt was downloaded from Ensembl BioMart (www.ensembl.org/Multi/martview) and is used as a default input. It provides a mapping (or conversion) between different gene identifiers (Ensembl gene ids, Ensembl transcript ids, HUGO ids and RefSeq ids). However, not every gene identifier in this dataset has a corresponding entry in the other identifier systems. In these cases, information from the target prediction files is used to fill the gaps.

Each of these three miR target prediction algorithms has a user-friendly web interface
(miRanda: http://cbio.mskcc.org/cgi-bin/mirnaviewer/mirnaviewer.pl,
TargetScan: http://genes.mit.edu/targetscan.test/ucsc.html,
PicTar: http://pictar.bio.nyu.edu/cgi-bin/PicTar_vertebrate.cgi).
To obtain target prediction output, enter the miR name (miRanda) or select it from the drop-down menu (TargetScan and PicTar). The output is given in Excel format (miRanda) or HTML format (TargetScan and PicTar). It is recommended that you save the Excel file as a tab-delimited text file (miRanda), or copy and paste the HTML content into Excel and save it as a tab-delimited text file (TargetScan and PicTar). No column names are needed. For TargetScan output, the last column contains long text, so it is recommended that you delete this column to avoid possible reading errors.

****************************************************
**********************Output Files*******************
*****************************************************

------------------------------
result-statistics
------------------------------
result-statistics records some simple facts about the combination results.
A typical result-statistics file looks as follows

-----------------------------------------------------------------------------------
34 lines read from file 'mR-miR-375.txt' ( 34 Ensembl Transcript ids )
161 lines read from file 'TS-miR-375.txt' ( 161 HUGO ids )
miRanda predicted 23 genes ( HUGO ids, including 4 genes without HUGO ids )
TargetScan predicted 161 genes ( HUGO ids )
miRanda and TargetScan predicted 3 common genes ( HUGO ids )
miRanda and TargetScan predicted a combination of 181 genes ( HUGO ids )
-------------------------------------------------------------------------------------

------------------------------
combinedtargets
------------------------------
combinedtargets is the main output file. The column names and their meanings are

1. microRNA: the name of the microRNA of interest.
2. HUGO: HUGO ids used as matching identifiers for different gene names. As the matching identifier, it is unique, but it can be blank since some Ensembl ids do not correspond to any HUGO id.
3. miRanda: Ensembl gene ids (ENSG) denoting targets predicted by the miRanda algorithm. NA means not predicted by miRanda.
4. ENST.id: Ensembl transcript ids (ENST) denoting targets predicted by the miRanda algorithm. Several Ensembl transcript ids may correspond to one Ensembl gene id. NA means not predicted by miRanda.
5. rank.mr: ranks of target genes predicted by miRanda. A gene with a higher rank (for example, 1) is more likely to be a target of the microRNA of interest. An unpredicted gene is assigned the lowest rank among all predicted genes (the largest rank number) plus 1.
6. pvalue.mr: a "pvalue" computed from rank.mr. The pvalue of an unpredicted gene is set to 1.
7. TargetScan: HUGO ids denoting targets predicted by the TargetScan algorithm. NA means not predicted by TargetScan.
8. rank.ts: ranks of target genes predicted by TargetScan.
9. pvalue.ts: "Estimated false discovery rate" according to TargetScan.
10. PicTar: RefSeq ids denoting targets predicted by the PicTar algorithm.
NA means not predicted by PicTar.
11. rank.pt: ranks of target genes predicted by PicTar.
12. pvalue.pt: a "pvalue" computed from rank.pt.
13. mR: Is this gene predicted by miRanda? (1: Yes; 0: No)
14. TS: Is this gene predicted by TargetScan? (1: Yes; 0: No)
15. PT: Is this gene predicted by PicTar? (1: Yes; 0: No)
16. mRmTS: Is this gene predicted by both miRanda and TargetScan? (1: Yes; 0: No)
17. mRmPT: Is this gene predicted by both miRanda and PicTar? (1: Yes; 0: No)
18. TSmPT: Is this gene predicted by both TargetScan and PicTar? (1: Yes; 0: No)
19. match: How many algorithms predict this target? (1, 2, or 3)
20. avepvalue: This gene's "average" pvalue, combining the pvalues from the three algorithms.
21. averank: This gene's "average" rank, averaged over the ranks from the three algorithms.

-------------------------------
plot.png
-------------------------------
plot.png contains three plots: a ratio plot, a common targets plot and a correlation plot. These plots provide a measure of the consistency of the different algorithms. If two algorithms each predict n targets according to their respective ranks, how many are common targets? What is the ratio of the number of common targets to n? What is the correlation of the ranks? Checking how these numbers vary with n gives a way to "shortlist" the targets.

------------------------------------------------------------------
mRHugo TSHugo PTHugo mRmTSHugo mRmPTHugo TSmPTHugo AllthreeHugo
------------------------------------------------------------------
These files contain the HUGO ids predicted by each algorithm (or combination of algorithms) and can be used as input files for further analysis.

****************************************************
*********************References**********************
*****************************************************

Jin Zhou, Shili Lin, Vince Melfi, Joe Verducci (2006). Composite MicroRNA Target Predictions and Comparisons of Several Prediction Algorithms. MBI Technical Report No. (http://mbi.osu.edu/publications/pub2006.html)
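
****************************************************
**************Example: Top-n Overlap*****************
****************************************************

The common-targets count and ratio described under plot.png can be illustrated with a short R sketch. It assumes that "predicted n targets according to their respective ranks" means taking the top n genes from each algorithm's ranked list; this reading, the helper name topn_overlap and the gene symbols are all assumptions made for illustration and are not taken from miRComp itself. The rank correlation is not computed here.

```r
# Hypothetical helper (not part of miRComp): for each n, count the targets
# shared by the top-n genes of two ranked lists, and the ratio of that
# count to n. Each input vector is ordered from best rank (1) downwards.
topn_overlap <- function(ranked_a, ranked_b, n_values) {
  common <- sapply(n_values, function(n) {
    length(intersect(head(ranked_a, n), head(ranked_b, n)))
  })
  data.frame(n = n_values, common = common, ratio = common / n_values)
}

# Made-up gene symbols standing in for two algorithms' ranked predictions
a <- c("G1", "G2", "G3", "G4", "G5")
b <- c("G2", "G9", "G1", "G5", "G7")
print(topn_overlap(a, b, c(1, 3, 5)))
```

With these made-up lists the common counts are 0, 2 and 3 for n = 1, 3 and 5. A value of n beyond which the ratio stops improving is one possible cut-off for a shortlist.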