We are seeking a highly motivated bioinformatician to join Ensembl, a world-leading provider of genomics data resources and bioinformatics software tools.
The Darwin Tree of Life (DToL) project plans to sequence, assemble and annotate all 66,000 eukaryotic species in the UK. This will offer unprecedented molecular-level insights into evolution and biodiversity. For the first time, we will be able to ask questions at the genomic level about whole ecosystems. Ensembl plans to lead the way in annotating the diverse range of species that the DToL project will encompass.
As part of Ensembl Genebuild, you will join a team of bioinformaticians and developers who are experts in gene structure annotation. Your role will be to help expand our annotation pipelines to produce the highest-quality annotation possible across the diverse range of species covered by DToL. Certain taxonomic groups, such as plants and invertebrate metazoa, pose specific challenges for gene annotation, including very large evolutionary distances between species, a general lack of well-characterised proteins and a sparsity of well-annotated reference species.
To address these issues, we will be developing new methods for genome annotation that extract maximum utility from the available data. In particular, we will be deploying and building upon the latest software for mapping and correcting long- and short-read transcriptome data sets. We will add hint-guided ab initio methods to our pipelines to help annotate data-sparse genomes. We will also examine the limits of cross-species mapping of both nucleotide and protein data.
The Ensembl Genebuild team has domain expertise in software development, large-scale compute, big data, pipeline workflows and automation. We collaborate with consortia and communities from all over the world to annotate new genomes. Current and future projects for the team include: scaling up gene annotation pipelines, implementing machine learning techniques to improve gene structure annotation, and working with long-read (PacBio IsoSeq, Nanopore) transcriptomic data to extend existing annotations and identify new splice variants.
Your main responsibility will be the development and deployment of our large-scale annotation system to produce high-quality gene annotation. More specifically, you will:
You will be able to write, understand and maintain complex code. You will also have domain experience in some of the following:
You will have good communication and interpersonal skills, and be a self-starter who can manage their own time to meet the needs of several projects. The key attributes sought are the ability to work in a team, excellent attention to detail, solid problem-solving skills, and the desire to learn and improve. Furthermore, you should demonstrate an ability to communicate both biological and computational ideas (orally and in writing), effective time management to meet deadlines, and a desire to work in an international environment.
Previous experience of processing large biological data sets in a production environment would be advantageous, including an understanding of compute clusters, pipeline workflows, software design and automation. Evidence of working in a dynamic, team-based environment or contributing to a large, shared code-base is desirable.