Transposable Element Benchmarking
The field of transposable element (TE) annotation is in dire need of a standard set of widely adopted benchmarks.
TEs comprise a large fraction of the DNA of most complex eukaryotes. Accurate identification of TEs in genomic sequence is critical, both because TEs can confound many types of genomic studies and because TEs are increasingly being shown to play important roles in genome evolution and function.
Accurately identifying TEs is a difficult and complex problem. Not only are TEs abundant, they are also highly diverse, both within and between species. To address the problem, a large number of tools have been developed, using diverse approaches and algorithms. However, there is currently no standard way to measure a tool’s accuracy. Instead, each toolmaker (or prospective user) must either use relatively well-annotated genome assemblies or create their own benchmark annotations. Typically, sophisticated toolmakers maintain sets of in-house standards that include some combination of well-annotated regions, manually curated regions, reversed or fragmented genomes, simulated (evolved) sequence, and so on. In-house standards may be useful, but they are often not publicly available, making them difficult to assess. Furthermore, generating such references is overly burdensome for small toolmakers, creating an unnecessary obstacle to further innovation.
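One of the in-house techniques mentioned above, a reversed genome, can serve as a negative control: reversing a sequence (without complementing it) preserves its length and base composition but destroys real TE structure, so any TE calls on the decoy approximate a tool's false-positive behavior. A minimal sketch of the idea follows; the function names are illustrative, not part of any existing tool.

```python
def reverse_sequence(seq):
    """Reverse a sequence WITHOUT complementing it. Base composition
    and length are preserved, but intact TEs should not survive,
    making the result a decoy for false-positive estimation."""
    return seq[::-1]

def false_positive_bases(decoy_annotation):
    """Total bases a tool annotated as TE on the decoy, given a list of
    (start, end) half-open intervals. On a reversed genome these are
    presumed false positives."""
    return sum(end - start for start, end in decoy_annotation)
```

For example, `reverse_sequence("AACG")` yields `"GCAA"`, which differs from the reverse complement (`"CGTT"`): the decoy stays in the same alphabet and composition as the input but is biologically scrambled.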
Beyond the lack of reference datasets, there is no standard way to measure a TE annotation against a benchmark. Again, each toolmaker or user must devise their own comparison. As a result, few apples-to-apples comparisons of tools are available, making it nearly impossible to choose the best tool for a specific system, let alone make a confident judgement about the accuracy of a set of annotations. Largely because of this lack of standard references and a standard measurement system, many tools remain underused.
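To make the measurement problem concrete, one simple (and common) way to score a query annotation against a reference is base-level precision, recall, and F1 over annotated intervals. The sketch below is illustrative only, assuming annotations are lists of 0-based, end-exclusive (start, end) intervals on a single sequence; it is not the comparison tool described here.

```python
def annotation_metrics(query, reference):
    """Base-level precision, recall, and F1 between two annotations,
    each a list of (start, end) half-open intervals.
    (Position sets are fine for a sketch; a real tool would merge
    intervals to scale to whole genomes.)"""
    q = {p for start, end in query for p in range(start, end)}
    r = {p for start, end in reference for p in range(start, end)}
    tp = len(q & r)            # bases annotated by both
    fp = len(q - r)            # bases only in the query
    fn = len(r - q)            # bases only in the reference
    precision = tp / (tp + fp) if q else 0.0
    recall = tp / (tp + fn) if r else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

For instance, a query of [(0, 10), (20, 30)] scored against a reference of [(5, 15), (20, 25)] overlaps on 10 of its 20 annotated bases (precision 0.5) and recovers 10 of the reference's 15 bases (recall about 0.67). Base-level scoring is only one choice; element-level or family-aware scoring would give different numbers, which is exactly why a community standard is needed.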
The issues outlined above were the subject of debate and discussion at a recent meeting of TE researchers. There, a process was agreed upon to address this urgent need for standard benchmarks: teams or individuals will submit proposals for benchmark datasets. Everyone in the TE community is invited to submit proposals, not just meeting attendees. Some benchmarks may already exist in-house, while others may need to be modified or generated. Submitted proposals will be reviewed and eventually published in a highly visible journal. Please visit the benchmark proposals page for additional details and to submit or review proposals.
In parallel, the groups of Mathieu Blanchette and Thomas Bureau are developing a tool to compare query TE annotations against the standard benchmarks (or against any reference annotation), to be hosted along with the benchmark annotation sets on a publicly available server. (Additional information and a link to the system will be posted here when available.)