EMBL-EBI - European Bioinformatics Institute

Hinxton, United Kingdom

We are seeking a highly motivated bioinformatician to join Ensembl, a world-leading provider of genomics data resources and bioinformatics software tools.

The Darwin Tree of Life (DToL) project plans to sequence, assemble and annotate all 66,000 eukaryotic species in the UK. This will offer unprecedented molecular-level insights into evolution and biodiversity. For the first time, we will be able to ask questions at the genomic level about whole ecosystems. Ensembl plans to lead the way in annotating the diverse range of species that the DToL project will encompass.

As part of Ensembl Genebuild, you will join a team of bioinformaticians and developers who are experts in gene structure annotation. Your role will be to help expand our annotation pipelines to produce the highest-quality annotation possible across the diverse range of species covered by DToL. Certain taxonomic groups, such as plants and invertebrate metazoa, pose specific challenges for gene annotation, including very large evolutionary distances between species, a general lack of well-characterised proteins and a scarcity of well-annotated reference species.

To address these issues, we will be developing new methods for genome annotation that extract maximum utility from the available data. In particular, we will be deploying and building upon the latest software for mapping and correcting long- and short-read transcriptome data sets. We will add hint-guided ab initio methods to our pipelines to help annotate data-sparse genomes. We will also examine the limits of cross-species mapping of both nucleotide and protein data.

The Ensembl Genebuild team has domain expertise in software development, large-scale compute, big data, pipeline workflows and automation. We collaborate with consortia and communities from all over the world to annotate new genomes. Current and future projects for the team include scaling up gene annotation pipelines, implementing machine learning techniques to improve gene structure annotation, and working with long-read (PacBio IsoSeq, Nanopore) transcriptomic data to extend existing annotations and identify new splice variants.

Your role

Your main responsibility will be the development and deployment of our large-scale annotation system to produce high-quality gene annotation. More specifically, you will:

  • Produce high-quality, evidence-based gene sets for species across the eukaryotic tree-of-life, including protein-coding genes, noncoding RNA genes and pseudogenes;
  • Contribute significantly to the design of new annotation methods for non-vertebrate genomes;
  • Work in a release-based environment and coordinate with other teams;
  • Collaborate with international partners on genome projects;
  • Participate in training users in our annotation methods and workflows;
  • Work with state-of-the-art primary data to help improve gene structure annotation.

You have

  • An MSc, PhD or equivalent experience in Computer Science, Bioinformatics, Genetics or a related field.

You will be able to write, understand and maintain complex code. You will also have domain experience in some of the following:

  • Genome annotation;
  • Methods for DNA/RNA sequencing and sequence alignment;
  • Relational databases;
  • Scaling and optimisation;
  • Machine learning;
  • Non-vertebrate biology.

You will have good communication and interpersonal skills, and be a self-starter who can manage their own time to meet the needs of several projects. The key attributes sought are the ability to work in a team, excellent attention to detail, solid problem-solving skills, and the desire to learn and improve. Furthermore, you should demonstrate the ability to communicate both biological and computational ideas (orally and in writing), the ability to manage your time to meet deadlines, and a desire to work in an international environment.

You might also have

Previous experience of processing large biological data sets in a production environment would be advantageous, including an understanding of compute clusters, pipeline workflows, software design and automation. Evidence of working in a dynamic, team-based environment or contributing to a large, shared code-base is desirable.

Don't forget to mention EuroScienceJobs when applying.

© EuroJobsites 2019

EuroJobsites is a registered company number: 4694396 VAT number: GB 880 9055 04

Registered address: EuroJobsites Ltd, Unit 8, Kingsmill Business Park, Kingston Upon Thames, London, KT1 3GZ, United Kingdom
