Principal Data Architect

Chemify

IT
Posted on Feb 15, 2026

Python | Cloud | Distributed Systems | AI/ML Infrastructure

Location: Glasgow or London (King’s Cross)

Workstyle: Hybrid

Reports to: CTO

About Chemify:

Chemify is revolutionising chemistry. We are creating a future where the synthesis of previously unimaginable molecules, drugs, and materials is instantly accessible. By combining AI, robotics, and the world’s largest continually expanding database of chemical programs, we are accelerating chemical discovery to improve quality of life and extend the reach of humanity.

Our newly opened Chemifarm facility in Glasgow operates a growing fleet of advanced robotic systems that automate synthesis, optimisation, and library generation. As we scale up to globally distributed facilities, we are undertaking a foundational transformation of how data integrates and scales across our platform.

The Role

We are looking for a Principal Data Architect to design and lead the evolution of Chemify’s data architecture into a performant, distributed, well-governed, and enterprise-ready data ecosystem.

Your mission is to define and implement how data flows across our platforms and how it is stored, synchronized, governed, and shared, both within Chemify and with external partners, while complying with contractual, regulatory, and security constraints.

This is a foundational role: your decisions will shape how Chemify scales from a single Chemifarm to a global network of automated laboratories that can safely collaborate with enterprise customers and research partners.

If you enjoy solving complex technical challenges that blend system architecture, data engineering, and distributed systems, are a natural communicator, and are energized by working closely with scientists using cutting-edge technologies, then we’d love to welcome you to our team.

Key Responsibilities

AI-Native Data Strategy

  • Define the enterprise data architecture for scientific and operational data to ensure it is "ML-ready" from the moment of ingestion.
  • Establish a Data Lakehouse architecture on AWS to manage the massive scale of raw, unstructured "dark data" from robotic sensors (spectra, video, logs, etc.).
  • Lead the strategic design and implementation of a unified chemical data fabric that integrates molecular structures, retrosynthetic reaction networks, and high-frequency robotic telemetry. You will be responsible for architecting a versioned Feature Store that standardizes chemical ontologies and pre-computed molecular descriptors, ensuring a seamless, high-fidelity data loop between robotic laboratory execution and AI-driven discovery engines.

Advanced Relational & Semantic Modeling

  • Architect Graph Data Models representing complex chemical reaction networks, optimized for synthesis AI and automated manufacturing planning.
  • Lead the development of Foundational Semantic Ontologies that allow AI models to reason across disparate chemical data types.
  • Design Vector Database integrations (e.g., pgvector, Pinecone) to facilitate similarity searches across billions of chemical entities.

Industrial Telemetry & Edge Synchronization

  • Architect the ingestion of high-frequency robot and sensor telemetry using MQTT/Streaming patterns, ensuring zero loss of "negative data" (failed experiments) critical for model training.
  • Design a globally distributed data system that synchronizes local lab "Edge" data with global AI training clusters while maintaining consistency guarantees.

Governance & Enterprise Readiness

  • Own the data governance framework, specifically defining Data Tenancy and Partitioning models for Fortune 500 clients to ensure strict IP isolation.
  • Architect secure, compliant Data Sharing patterns for external research partners, translating legal/contractual constraints into technical controls.
  • Drive the data architecture roadmap toward SOC 2 and ISO 27001 readiness, focusing on auditability and access control for training data.

About You

You are an experienced Architect (e.g., TOGAF, AWS Certified Solutions Architect, or equivalent) with strong Python expertise in production data systems. You have a natural curiosity for complex scientific domains and thrive on creating lasting value through building modern data engineering solutions.

We expect you to bring:

  • BSc or equivalent experience, preferably in a Data Engineering-related field.
  • 8+ years of commercial Data Architect and Python experience.
  • Deep experience with PostgreSQL, ideally in AWS RDS.
  • Proven experience designing high-throughput telemetry / IoT / industrial data systems generating very large volumes of time-series data.
  • Hands-on understanding of stream ingestion patterns (MQTT).
  • Experience with graph or vector databases (e.g., Neo4j, Pinecone, pgvector) and modelling complex, highly relational domains.
  • Proven experience designing distributed data systems across multiple services, teams, or locations.
  • Demonstrable experience building impactful solutions with:
      • Data governance frameworks
      • Data tenancy and segregation models
      • Data consistency and replication patterns
      • Secure data sharing between organizations

Beneficial Skills

  • Prior involvement in SOC 2 or ISO 27001 compliance programmes, particularly from a data architecture perspective.
  • Exposure to scientific, chemical, or manufacturing data environments.
  • Familiarity with modern data stack components (e.g., data lakes, streaming, or batch/real-time pipelines).
  • Chemistry or AI Drug Discovery domain knowledge is a real differentiator for us.

Why Join Chemify?

Impact:

You will help build the infrastructure that enables digital chemistry at scale — accelerating discovery, improving reproducibility, and unlocking new possibilities in science and medicine.

Autonomy:

Reporting directly to the CTO, you will have meaningful influence over the technical direction and data strategy of a Series B deep-tech rocket ship.

Ambition:

We are scaling rapidly, investing in world-class infrastructure, and tackling problems that sit at the frontier of robotics, AI, and chemistry. You will have the resources and mandate to build the right foundations for the future.