Publications

Virtual Parts Repository 2: Model-Driven Design of Genetic Regulatory Circuits

Engineering genetic regulatory circuits is key to the creation of biological applications that are responsive to environmental changes. Computational models can assist in understanding especially large and complex circuits for which manual analysis is infeasible, permitting a model-driven design process. However, there are still few tools that offer the ability to simulate the system under design. One of the reasons for this is the lack of accessible model repositories or libraries that cater to the modular composition of models of synthetic systems. Here, we present the second version of the Virtual Parts Repository, a framework to facilitate the model-driven design of genetic regulatory circuits, which provides reusable, modular, and composable models. The new framework is service-oriented, easier to use in computational workflows, and provides several new features and access methods. New features include supporting hierarchical designs via a graph-based repository or compatible remote repositories, enriching existing designs, and using designs provided in Synthetic Biology Open Language documents to derive system-scale and hierarchical Systems Biology Markup Language models. We also present a reaction-based modeling abstraction inspired by rule-based modeling techniques to facilitate scalable and modular modeling of complex and large designs. This modeling abstraction enhances the modeling capability of the framework, for example, to incorporate design patterns such as roadblocking, distributed deployment of genetic circuits using plasmids, and cellular resource dependency. The framework and the modeling abstraction presented in this paper allow computational design tools to take advantage of computational simulations and ultimately help facilitate more predictable applications.

Synthetic Biology Open Language Visual (SBOL Visual) Version 3.0

People who engineer biological organisms often find it useful to draw diagrams in order to communicate both the structure of the nucleic acid sequences that they are engineering and the functional relationships between sequence features and other molecular species. Some typical practices and conventions have begun to emerge for such diagrams. SBOL Visual aims to organize and systematize such conventions in order to produce a coherent language for expressing the structure and function of genetic designs. This document details version 3.0 of SBOL Visual, a new major revision of the standard. The major difference between SBOL Visual 3 and SBOL Visual 2 is that diagrams and glyphs are defined with respect to the SBOL 3 data model rather than the SBOL 2 data model. A byproduct of this change is that the use of dashed undirected lines for subsystem mappings has been removed, pending future determination on how to represent general SBOL 3 constraints; in the interim, this notation can still be used as an annotation. Finally, deprecated material has been removed from the collection of glyphs: the deprecated "insulator" glyph and "macromolecule" alternative glyphs have been removed, as have the deprecated BioPAX alternatives to SBO terms.

Curation Principles Derived from the Analysis of the SBOL iGEM Data Set

As an engineering endeavor, synthetic biology requires effective sharing of genetic design information that can be reused in the construction of new designs. While there are a number of large community repositories of design information, curation of this information has been limited. This in turn limits the ways in which design information can be put to use. The aim of this work was to improve this situation by creating a curated library of parts from the International Genetically Engineered Machines (iGEM) registry data set. To this end, an analysis of the Synthetic Biology Open Language (SBOL) version of the iGEM registry was carried out using four different approaches (simple statistics, SnapGene autoannotation, SYNBICT autoannotation, and expert analysis), the results of which are presented herein. Key challenges encountered include the use of free text, insufficient part provenance, part duplication, lack of part removal, and insufficient continuous curation. On the basis of these analyses, the focus has shifted from the creation of a curated iGEM part library to the extraction of a set of lessons, which are presented here. These lessons can be exploited to facilitate the creation and curation of other part libraries using a simpler and less labor-intensive process.

BioParts—A Biological Parts Search Portal and Updates to the ICE Parts Registry Software Platform

Capturing, storing, and sharing biological DNA parts data are integral parts of synthetic biology research. Here, we detail updates to the ICE biological parts registry software platform that enable these processes, describe our implementation of the Web of Registries concept using ICE, and establish Bioparts, a search portal for biological parts available in the public domain. The Web of Registries enables standalone ICE installations to securely connect and form a distributed parts database. This distributed database allows users from one registry to query and access plasmid, strain, (DNA) part, plant seed, and protein entry types in other connected registries. Users can also transfer entries from one ICE registry to another or make them publicly accessible. Bioparts, the new search portal, combines the ease and convenience of modern web search engines with the capabilities of bioinformatics search tools such as BLAST. This portal, available at bioparts.org, allows anyone to search for publicly accessible biological part information (e.g., NCBI, iGEM, SynBioHub, Addgene), including parts publicly accessible through ICE Registries. Additionally, the portal offers a REST API that enables third-party applications and tools to access the portal’s functionality programmatically.

Toward Full-Stack In Silico Synthetic Biology: Integrating Model Specification, Simulation, Verification, and Biological Compilation

We present the Infobiotics Workbench (IBW), a user-friendly, scalable, and integrated computational environment for the computer-aided design of synthetic biological systems. It supports an iterative workflow that begins with specification of the desired synthetic system, followed by simulation and verification of the system in high-performance environments and ending with the eventual compilation of the system specification into suitable genetic constructs. IBW integrates modeling, simulation, verification, and biocompilation features into a single software suite. This integration is achieved through a new domain-specific biological programming language, the Infobiotics Language (IBL), which tightly combines these different aspects of in silico synthetic biology into a full-stack integrated development environment. Unlike existing synthetic biology modeling or specification languages, IBL uniquely blends modeling, verification, and biocompilation statements into a single file. This allows biologists to incorporate design constraints within the specification file rather than using decoupled and independent formalisms for different in silico analyses. This novel approach offers seamless interoperability across different tools as well as compatibility with SBOL and SBML frameworks and removes the burden of doing manual translations for standalone applications. We demonstrate the features, usability, and effectiveness of IBW and IBL using well-established synthetic biological circuits.

Synthetic Biology Open Language Visual (SBOL Visual) Version 2.3

People who are engineering biological organisms often find it useful to communicate in diagrams, both about the structure of the nucleic acid sequences that they are engineering and about the functional relationships between sequence features and other molecular species. Some typical practices and conventions have begun to emerge for such diagrams. The Synthetic Biology Open Language Visual (SBOL Visual) has been developed as a standard for organizing and systematizing such conventions in order to produce a coherent language for expressing the structure and function of genetic designs. This document details version 2.3 of SBOL Visual, which builds on the prior SBOL Visual 2.2 in several ways. First, the specification now includes higher-level "interactions with interactions," such as an inducer molecule stimulating a repression interaction. Second, binding with a nucleic acid backbone can be shown by overlapping glyphs, as with other molecular complexes. Finally, a new "unspecified interaction" glyph is added for visualizing interactions whose nature is unknown, the "insulator" glyph is deprecated in favor of a new "inert DNA spacer" glyph, and the polypeptide region glyph is recommended for showing 2A sequences.

paraSBOLv: A Foundation for Standard-Compliant Genetic Design Visualization Tools

Diagrams constructed from standardized glyphs are central to communicating complex design information in many engineering fields. For example, circuit diagrams are commonplace in electronics and allow for a suitable abstraction of the physical system that helps support the design process. With the development of the Synthetic Biology Open Language Visual (SBOLv), bioengineers are now positioned to better describe and share their biological designs visually. However, the development of computational tools to support the creation of these diagrams is currently hampered by an excessive burden in maintenance due to the large and expanding number of glyphs present in the standard. Here, we present a Python package called paraSBOLv that enables access to the full suite of SBOLv glyphs through the use of machine-readable parametric glyph definitions. These greatly simplify the rendering process while allowing extensive customization of the resulting diagrams. We demonstrate how the adoption of paraSBOLv can accelerate the development of highly specialized biodesign visualization tools or even form the basis for more complex software by removing the burden of maintaining glyph-specific rendering code. Looking forward, we suggest that incorporation of machine-readable parametric glyph definitions into the SBOLv standard could further simplify the development of tools to produce standard-compliant diagrams and the integration of visual standards across fields.
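To make the idea of a machine-readable parametric glyph definition concrete, the following is a minimal sketch in plain Python. It is not paraSBOLv's API or its definition format: the `ARROW_GLYPH` dictionary, its `defaults` and `path` keys, and the `render` function are all hypothetical names invented for illustration. The point is only that one generic renderer can serve any glyph whose geometry is stored as data rather than as glyph-specific drawing code.

```python
# Hypothetical parametric definition (not the paraSBOLv format): each point is a
# pair of (x, y) terms, and each term is a multiple of a named parameter.
ARROW_GLYPH = {
    "defaults": {"width": 10.0, "height": 6.0},
    "path": [
        ((0.0, "width"), (0.0, "height")),  # base of the stem
        ((0.0, "width"), (1.0, "height")),  # top of the stem
        ((1.0, "width"), (1.0, "height")),  # tip of the arrow
    ],
}


def render(glyph, origin=(0.0, 0.0), **overrides):
    """Turn a parametric glyph definition into absolute (x, y) coordinates."""
    unknown = set(overrides) - set(glyph["defaults"])
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    # Caller-supplied values replace the defaults stored with the glyph.
    params = {**glyph["defaults"], **overrides}
    ox, oy = origin
    return [
        (ox + fx * params[px], oy + fy * params[py])
        for (fx, px), (fy, py) in glyph["path"]
    ]


print(render(ARROW_GLYPH))
print(render(ARROW_GLYPH, origin=(5.0, 0.0), width=20.0))
```

Customizing a diagram then amounts to overriding parameters, as in the second call, and adding a new glyph means adding a new data record while `render` stays unchanged.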

Flapjack: Data Management and Analysis for Genetic Circuit Characterization

Characterization is fundamental to the design, build, test, learn (DBTL) cycle for engineering synthetic genetic circuits. Components must be described in such a way as to account for their behavior in a range of contexts. Measurements and associated metadata, including part composition, constitute the test phase of the DBTL cycle. These data may consist of measurements of thousands of circuits, measured in hundreds of conditions, in multiple assays potentially performed in different laboratories and using different techniques. In order to inform the learn phase, this large volume of data must be filtered, collated, and analyzed. Characterization consists of using these data to parametrize models of component function in different contexts, and combining them to predict behaviors of novel circuits. Tools to store, organize, share, and analyze large volumes of measurement and metadata are therefore essential to linking the test phase to the build and learn phases, closing the loop of the DBTL cycle. Here we present such a system, implemented as a web app with a backend data registry and analysis engine. An interactive frontend provides powerful querying, plotting, and analysis tools, and we provide a REST API and Python package for full integration with external build and learn software. All measurements are associated with circuit part composition via SBOL (Synthetic Biology Open Language). We demonstrate our tool by characterizing a range of genetic components and circuits according to composition and context.

Genetic Circuit Dynamics: Hazard and Glitch Analysis

Multiple input changes can cause unwanted switching variations, or glitches, in the output of genetic combinational circuits. These glitches can have drastic effects if the output of the circuit causes irreversible changes within or with other cells such as a cascade of responses, apoptosis, or the release of a pharmaceutical in an off-target tissue. Therefore, avoiding unwanted variation of a circuit’s output can be crucial for the safe operation of a genetic circuit. This paper investigates what causes unwanted switching variations in combinational genetic circuits using hazard analysis and a new dynamic model generator. The analysis is done in previously built and modeled genetic circuits with known glitching behavior. The dynamic models generated not only predict the same steady states as previous models but can also predict the unwanted switching variations that have been observed experimentally. Multiple input changes may cause glitches due to propagation delays within the circuit. Modifying the circuit’s layout to alter these delays may change the likelihood of certain glitches, but it cannot eliminate the possibility that the glitch may occur. In other words, function hazards cannot be eliminated. Instead, they must be avoided by restricting the allowed input changes to the system. Logic hazards, on the other hand, can be avoided using hazard-free logic synthesis. This paper demonstrates this by showing how a circuit designed using a popular genetic design automation tool can be redesigned to eliminate logic hazards.
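The notion of a function hazard can be illustrated with a short sketch. This is not the paper's dynamic model generator; it is only the textbook Boolean check, and the helper names `transition_cube`, `has_static_function_hazard`, and `two_input_and` are assumptions made for this example. A static function hazard exists for a transition when the output is the same before and after, yet some input vector that can occur while the changing inputs switch in an arbitrary order produces the opposite output.

```python
from itertools import product


def transition_cube(start, end):
    """Every input vector that can occur while the changing inputs switch in any order."""
    choices = [(s,) if s == e else (0, 1) for s, e in zip(start, end)]
    return list(product(*choices))


def has_static_function_hazard(f, start, end):
    """True if the output should stay constant but some intermediate vector flips it."""
    if f(*start) != f(*end):
        return False  # the output is meant to change, so this is not a static transition
    return any(f(*v) != f(*start) for v in transition_cube(start, end))


def two_input_and(a, b):
    return a & b


# Both inputs change at once: (0, 1) -> (1, 0). The output is 0 before and after,
# but the intermediate vector (1, 1) drives it to 1, so a glitch is possible.
print(has_static_function_hazard(two_input_and, (0, 1), (1, 0)))

# A single input change has no intermediate vectors, so no function hazard exists.
print(has_static_function_hazard(two_input_and, (0, 1), (0, 0)))
```

The first call prints `True` and the second prints `False`. Because the check depends only on the truth table and not on how the circuit is laid out, it is consistent with the statement above that such hazards can only be avoided by restricting which input changes are allowed.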

Synthetic Biology Open Language Visual (SBOL Visual) Version 2.2

People who are engineering biological organisms often find it useful to communicate in diagrams, both about the structure of the nucleic acid sequences that they are engineering and about the functional relationships between sequence features and other molecular species. Some typical practices and conventions have begun to emerge for such diagrams. The Synthetic Biology Open Language Visual (SBOL Visual) has been developed as a standard for organizing and systematizing such conventions in order to produce a coherent language for expressing the structure and function of genetic designs. This document details version 2.2 of SBOL Visual, which builds on the prior SBOL Visual 2.1 in several ways. First, the grounding of molecular species glyphs is changed from BioPAX to SBO, aligning with the use of SBO terms for interaction glyphs. Second, new glyphs are added for proteins, introns, and polypeptide regions (e.g., protein domains), the prior recommended macromolecule glyph is deprecated in favor of its alternative, and small polygons are introduced as alternative glyphs for simple chemicals.

Synthetic Biology Open Language (SBOL) Version 3.0.0

Synthetic biology builds upon genetics, molecular biology, and metabolic engineering by applying engineering principles to the design of biological systems. When designing a synthetic system, synthetic biologists need to exchange information about multiple types of molecules, the intended behavior of the system, and actual experimental measurements. The Synthetic Biology Open Language (SBOL) has been developed as a standard to support the specification and exchange of biological design information in synthetic biology, following an open community process involving both wet bench scientists and dry scientific modelers and software developers, across academia, industry, and other institutions. This document describes SBOL 3.0.0, which condenses and simplifies previous versions of SBOL based on experiences in deployment across a variety of scientific and industrial settings. In particular, SBOL 3.0.0 (1) separates sequence features from part/sub-part relationships, (2) renames Component Definition/Component to Component/Sub-Component, (3) merges Component and Module classes, (4) ensures consistency between data model and ontology terms, (5) extends the means to define and reference Sub-Components, (6) refines requirements on object URIs, (7) enables graph-based serialization, (8) moves to the Systems Biology Ontology (SBO) for Component types, (9) makes all sequence associations explicit, (10) makes interfaces explicit, (11) generalizes Sequence Constraints into a general structural Constraint class, and (12) expands the set of allowed constraints.

Communicating Structure and Function in Synthetic Biology Diagrams

Biological engineers often find it useful to communicate using diagrams. These diagrams can include information both about the structure of the nucleic acid sequences they are engineering and about the functional relationships between features of these sequences and/or other molecular species. A number of conventions and practices have begun to emerge within synthetic biology for creating such diagrams, and the Synthetic Biology Open Language Visual (SBOL Visual) has been developed as a standard to organize, systematize, and extend such conventions in order to produce a coherent visual language. Here, we describe SBOL Visual version 2, which expands previous diagram standards to include new functional interactions, categories of molecular species, support for families of glyph variants, and the ability to indicate modular structure and mappings between elements of a system. SBOL Visual 2 also clarifies a number of requirements and best practices, significantly expands the collection of glyphs available to describe genetic features, and can be readily applied using a wide variety of software tools, both general and bespoke.

SBOL-OWL: An Ontological Approach for Formal and Semantic Representation of Synthetic Biology Information

Standard representation of data is key for the reproducibility of designs in synthetic biology. The Synthetic Biology Open Language (SBOL) has already emerged as a data standard to represent information about genetic circuits, and it is based on capturing data using graphs. The language provides the syntax using a free text document that is accessible to humans only. This paper describes SBOL-OWL, an ontology for a machine-understandable definition of SBOL. This ontology acts as a semantic layer for genetic circuit designs. As a result, computational tools can understand the meaning of design entities in addition to parsing structured SBOL data. SBOL-OWL not only describes how genetic circuits can be constructed computationally but also facilitates the use of several existing Semantic Web tools for synthetic biology. This paper demonstrates some of these features, for example, to validate designs and check for inconsistencies. Through the use of SBOL-OWL, queries can be simplified and become more intuitive. Moreover, existing reasoners can be used to infer information about genetic circuit designs that cannot be directly retrieved using existing querying mechanisms. This ontological representation of the SBOL standard provides a new perspective to the verification, representation, and querying of information about genetic circuits and is important to incorporate complex design information via the integration of biological ontologies.
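As a rough illustration of how a reasoner can surface facts that a direct query would miss, the sketch below applies a single subclass rule to a handful of triples in plain Python. It does not use SBOL-OWL or any Semantic Web library; the triples, the class names, and the `infer_types` function are hypothetical and chosen only to show the mechanism.

```python
# Hypothetical triples; the class and instance names are illustrative only
# and are not taken from SBOL-OWL.
TRIPLES = {
    ("Promoter", "subClassOf", "DNAPart"),
    ("DNAPart", "subClassOf", "DesignEntity"),
    ("pTet", "type", "Promoter"),
}


def infer_types(triples):
    """Apply (x type C) and (C subClassOf D) => (x type D) until no new facts appear."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        # Iterate over snapshots so that new facts can be added during the pass.
        for (x, p, c) in list(facts):
            if p != "type":
                continue
            for (c2, q, d) in list(facts):
                if q == "subClassOf" and c2 == c and (x, "type", d) not in facts:
                    facts.add((x, "type", d))
                    changed = True
    return facts


inferred = infer_types(TRIPLES)
# The stated data never says that pTet is a DesignEntity; the rule derives it.
print(("pTet", "type", "DesignEntity") in inferred)
```

A query for all `DesignEntity` instances over the stated triples alone would return nothing, whereas the same query over `inferred` returns `pTet`. Production reasoners support far richer rule sets than this one rule.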

A Computational Workflow for the Automated Generation of Models of Genetic Designs

Computational models are essential to engineer predictable biological systems and to scale up this process for complex systems. Computational modeling often requires expert knowledge and data to build models. Clearly, manual creation of models is not scalable for large designs. Despite several automated model construction approaches, computational methodologies to bridge knowledge in design repositories and the process of creating computational models have still not been established. This paper describes a workflow for automatic generation of computational models of genetic circuits from data stored in design repositories using existing standards. This workflow leverages the software tool SBOLDesigner to build structural models that are then enriched by the Virtual Parts Repository API using Synthetic Biology Open Language (SBOL) data fetched from the SynBioHub design repository. The iBioSim software tool is then utilized to convert this SBOL description into a computational model encoded using the Systems Biology Markup Language (SBML). Finally, this SBML model can be simulated using a variety of methods. This workflow provides synthetic biologists with easy-to-use tools to create predictable biological systems, hiding away the complexity of building computational models. This approach can further be incorporated into other computational workflows for design automation.

Synthetic Biology Open Language Visual (SBOL Visual) Version 2.1

People who are engineering biological organisms often find it useful to communicate in diagrams, both about the structure of the nucleic acid sequences that they are engineering and about the functional relationships between sequence features and other molecular species. Some typical practices and conventions have begun to emerge for such diagrams. The Synthetic Biology Open Language Visual (SBOL Visual) has been developed as a standard for organizing and systematizing such conventions in order to produce a coherent language for expressing the structure and function of genetic designs. This document details version 2.1 of SBOL Visual, which builds on the prior SBOL Visual 2.0 standard by expanding diagram syntax to include methods for showing modular structure and mappings between elements of a system, adding interaction arrows that can split or join (with the glyph at the split or join indicating either superposition or a chemical process), and adding new glyphs for indicating genomic context (e.g., integration into a plasmid or genome) and for stop codons.

Synthetic Biology Open Language (SBOL) Version 2.3

Synthetic biology builds upon the techniques and successes of genetics, molecular biology, and metabolic engineering by applying engineering principles to the design of biological systems. The field still faces substantial challenges, including long development times, high rates of failure, and poor reproducibility. One method to ameliorate these problems is to improve the exchange of information about designed systems between laboratories. The synthetic biology open language (SBOL) has been developed as a standard to support the specification and exchange of biological design information in synthetic biology, filling a need not satisfied by other pre-existing standards. This document details version 2.3.0 of SBOL, which builds upon version 2.2.0 published in last year’s JIB Standards in Systems Biology special issue. In particular, SBOL 2.3.0 includes means of succinctly representing sequence modifications, such as insertion, deletion, and replacement, an extension to support organization and attachment of experimental data derived from designs, and an extension for describing numerical parameters of design elements. The new version also includes specifying types of synthetic biology activities, unambiguous locations for sequences with multiple encodings, refinement of a number of validation rules, improved figures and examples, and clarification on a number of issues related to the use of external ontology terms.

SynBioHub: A Standards-Enabled Design Repository for Synthetic Biology

The SynBioHub repository (https://synbiohub.org) is an open-source software project that facilitates the sharing of information about engineered biological systems. SynBioHub provides computational access for software and data integration, and a graphical user interface that enables users to search for and share designs in a Web browser. By connecting to relevant repositories (e.g., the iGEM repository, JBEI ICE, and other instances of SynBioHub), the software allows users to browse, upload, and download data in various standard formats, regardless of their location or representation. SynBioHub also provides a central reference point for other resources to link to, delivering design information in a standardized format using the Synthetic Biology Open Language (SBOL). The adoption and use of SynBioHub, a community-driven effort, has the potential to overcome the reproducibility challenge across laboratories by helping to address the current lack of information about published designs.

Data Integration and Mining for Synthetic Biology Design

One aim of synthetic biologists is to create novel and predictable biological systems from simpler modular parts. This approach is currently hampered by a lack of well-defined and characterized parts and devices. However, there is a wealth of existing biological information, which can be used to identify and characterize biological parts, and their design constraints in the literature and numerous biological databases. However, this information is spread among these databases in many different formats. New computational approaches are required to make this information available in an integrated format that is more amenable to data mining. A tried and tested approach to this problem is to map disparate data sources into a single data set, with common syntax and semantics, to produce a data warehouse or knowledge base. Ontologies have been used extensively in the life sciences, providing this common syntax and semantics as a model for a given biological domain, in a fashion that is amenable to computational analysis and reasoning. Here, we present an ontology for applications in synthetic biology design, SyBiOnt, which facilitates the modeling of information about biological parts and their relationships. SyBiOnt was used to create the SyBiOntKB knowledge base, incorporating and building upon existing life sciences ontologies and standards. The reasoning capabilities of ontologies were then applied to automate the mining of biological parts from this knowledge base. We propose that this approach will be useful to speed up synthetic biology design and ultimately help facilitate the automation of the biological engineering life cycle.

Synthetic Biology Open Language (SBOL) Version 2.1.0

Synthetic biology builds upon the techniques and successes of genetics, molecular biology, and metabolic engineering by applying engineering principles to the design of biological systems. The field still faces substantial challenges, including long development times, high rates of failure, and poor reproducibility. One method to ameliorate these problems would be to improve the exchange of information about designed systems between laboratories. The Synthetic Biology Open Language (SBOL) has been developed as a standard to support the specification and exchange of biological design information in synthetic biology, filling a need not satisfied by other pre-existing standards. This document details version 2.1 of SBOL that builds upon version 2.0 published in last year’s JIB special issue. In particular, SBOL 2.1 includes improved rules for what constitutes a valid SBOL document, new role fields to simplify the expression of sequence features and how components are used in context, and new best practices descriptions to improve the exchange of basic sequence topology information and the description of genetic design provenance, as well as miscellaneous other minor improvements.

The SBOL Stack: A Platform for Storing, Publishing, and Sharing Synthetic Biology Designs

Recently, synthetic biologists have developed the Synthetic Biology Open Language (SBOL), a data exchange standard for descriptions of genetic parts, devices, modules, and systems. The goals of this standard are to allow scientists to exchange designs of biological parts and systems, to facilitate the storage of genetic designs in repositories, and to facilitate the description of genetic designs in publications. In order to achieve these goals, the development of an infrastructure to store, retrieve, and exchange SBOL data is necessary. To address this problem, we have developed the SBOL Stack, a Resource Description Framework (RDF) database specifically designed for the storage, integration, and publication of SBOL data. This database allows users to define a library of synthetic parts and designs as a service, to share SBOL data with collaborators, and to store designs of biological systems locally. The database also allows external data sources to be integrated by mapping them to the SBOL data model. The SBOL Stack includes two Web interfaces: the SBOL Stack API and SynBioHub. While the former is designed for developers, the latter allows users to upload new SBOL biological designs, download SBOL documents, search by keyword, and visualize SBOL data. Since the SBOL Stack is based on semantic Web technology, the inherent distributed querying functionality of RDF databases can be used to allow different SBOL stack databases to be queried simultaneously, and therefore, data can be shared between different institutes, centers, or other users.

A Methodology to Annotate Systems Biology Markup Language Models with the Synthetic Biology Open Language

Recently, we have begun to witness the potential of synthetic biology, noted here in the form of bacteria and yeast that have been genetically engineered to produce biofuels, manufacture drug precursors, and even invade tumor cells. The success of these projects, however, has often failed in translation and application to new projects, a problem exacerbated by a lack of engineering standards that combine descriptions of the structure and function of DNA. To address this need, this paper describes a methodology to connect the systems biology markup language (SBML) to the synthetic biology open language (SBOL), existing standards that describe biochemical models and DNA components, respectively. Our methodology involves first annotating SBML model elements such as species and reactions with SBOL DNA components. A graph is then constructed from the model, with vertices corresponding to elements within the model and edges corresponding to the cause-and-effect relationships between these elements. Lastly, the graph is traversed to assemble the annotating DNA components into a composite DNA component, which is used to annotate the model itself and can be referenced by other composite models and DNA components. In this way, our methodology can be used to build up a hierarchical library of models annotated with DNA components. Such a library is a useful input to any future genetic technology mapping algorithm that would automate the process of composing DNA components to satisfy a behavioral specification. Our methodology for SBML-to-SBOL annotation is implemented in the latest version of our genetic design automation (GDA) software tool, iBioSim.
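The annotate, build-graph, and traverse steps described in this abstract can be illustrated with a minimal Python sketch. It is not the iBioSim implementation: the element names, the DNA component names, and the choice of a depth-first traversal starting from elements with no incoming edges are all assumptions made for illustration.

```python
# Hypothetical annotations: model element -> DNA component attached to it.
annotations = {
    "P_lac": "promoter_lac",
    "tetR_production": "cds_tetR",
    "P_tet": "promoter_tet",
    "gfp_production": "cds_gfp",
}

# Hypothetical cause-and-effect edges between model elements.
edges = {
    "P_lac": ["tetR_production"],
    "tetR_production": ["P_tet"],
    "P_tet": ["gfp_production"],
    "gfp_production": [],
}


def assemble_composite(annotations, edges):
    """Collect annotating DNA components in depth-first (pre-order) order.

    Traversal starts from every element that has no incoming edge
    (an assumption of this sketch, not a rule taken from the paper).
    """
    targets = {t for successors in edges.values() for t in successors}
    roots = [v for v in edges if v not in targets]
    visited = set()
    composite = []

    def visit(vertex):
        if vertex in visited:
            return
        visited.add(vertex)
        if vertex in annotations:
            composite.append(annotations[vertex])
        for successor in edges.get(vertex, []):
            visit(successor)

    for root in roots:
        visit(root)
    return composite


composite = assemble_composite(annotations, edges)
print(composite)
```

In this toy model the resulting list plays the role of the composite DNA component, which could in turn annotate the model as a whole.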

An End-to-End Workflow for Engineering of Biological Networks from High-Level Specifications

We present a workflow for the design and production of biological networks from high-level program specifications. The workflow is based on a sequence of intermediate models that incrementally translate high-level specifications into DNA samples that implement them. We identify algorithms for translating between adjacent models and implement them as a set of software tools, organized into a four-stage toolchain: Specification, Compilation, Part Assignment, and Assembly. The specification stage begins with a Boolean logic computation specified in the Proto programming language. The compilation stage uses a library of network motifs and cellular platforms, also specified in Proto, to transform the program into an optimized Abstract Genetic Regulatory Network (AGRN) that implements the programmed behavior. The part assignment stage assigns DNA parts to the AGRN, drawing the parts from a database for the target cellular platform, to create a DNA sequence implementing the AGRN. Finally, the assembly stage computes an optimized assembly plan to create the DNA sequence from available part samples, yielding a protocol for producing a sample of engineered plasmids with robotics assistance. Our workflow is the first to automate the production of biological networks from a high-level program specification. Furthermore, the workflow’s modular design allows the same program to be realized on different cellular platforms simply by swapping workflow configurations. We validated our workflow by specifying a small-molecule sensor-reporter program and verifying the resulting plasmids in both HEK 293 mammalian cells and in E. coli bacterial cells.
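As a rough illustration of the part assignment stage, and of how one abstract network might be realized on different cellular platforms by swapping configuration, consider the following Python sketch. The abstract element labels, the platform keys, and every part name are hypothetical placeholders; the actual toolchain and its part databases are not reproduced here.

```python
# Hypothetical abstract network: an ordered list of abstract elements.
AGRN = ["promoter:inducible", "cds:reporter", "terminator"]

# Hypothetical per-platform part catalogs (placeholder part names).
PART_DB = {
    "ecoli": {
        "promoter:inducible": "ec_promoter_1",
        "cds:reporter": "ec_reporter_1",
        "terminator": "ec_terminator_1",
    },
    "hek293": {
        "promoter:inducible": "hek_promoter_1",
        "cds:reporter": "hek_reporter_1",
        "terminator": "hek_terminator_1",
    },
}


def assign_parts(agrn, platform, part_db):
    """Map each abstract element to a concrete part for the given platform."""
    catalog = part_db[platform]
    missing = [element for element in agrn if element not in catalog]
    if missing:
        raise KeyError(f"no parts for {missing} on platform {platform!r}")
    return [catalog[element] for element in agrn]


# The same abstract network, realized for two platforms.
ecoli_design = assign_parts(AGRN, "ecoli", PART_DB)
hek_design = assign_parts(AGRN, "hek293", PART_DB)
print(ecoli_design)
print(hek_design)
```

Only the platform argument changes between the two calls, which mirrors the idea of swapping workflow configurations while keeping the program fixed.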

Refactoring the Nitrogen Fixation Gene Cluster from Klebsiella oxytoca

Bacterial genes associated with a single trait are often grouped in a contiguous unit of the genome known as a gene cluster. It is difficult to genetically manipulate many gene clusters because of complex, redundant, and integrated host regulation. We have developed a systematic approach to completely specify the genetics of a gene cluster by rebuilding it from the bottom up using only synthetic, well-characterized parts. This process removes all native regulation, including that which is undiscovered. First, all noncoding DNA, regulatory proteins, and nonessential genes are removed. The codons of essential genes are changed to create a DNA sequence as divergent as possible from the wild-type (WT) gene. Recoded genes are computationally scanned to eliminate internal regulation. They are organized into operons and placed under the control of synthetic parts (promoters, ribosome binding sites, and terminators) that are functionally separated by spacer parts. Finally, a controller consisting of genetic sensors and circuits regulates the conditions and dynamics of gene expression. We applied this approach to an agriculturally relevant gene cluster from Klebsiella oxytoca encoding the nitrogen fixation pathway for converting atmospheric N2 to ammonia. The native gene cluster consists of 20 genes in seven operons and is encoded in 23.5 kb of DNA. We constructed a “refactored” gene cluster that shares little DNA sequence identity with WT and for which the function of every genetic part is defined. This work demonstrates the potential for synthetic biology tools to rewrite the genetics encoding complex biological functions to facilitate access, engineering, and transferability.

Eugene – A Domain Specific Language for Specifying and Constraining Synthetic Biological Parts, Devices, and Systems

Background: Synthetic biological systems are currently created by an ad-hoc, iterative process of specification, design, and assembly. These systems would greatly benefit from a more formalized and rigorous specification of the desired system components as well as constraints on their composition. Therefore, the creation of robust and efficient design flows and tools is imperative. We present a human readable language (Eugene) that allows for the specification of synthetic biological designs based on biological parts, as well as provides a very expressive constraint system to drive the automatic creation of composite Parts (Devices) from a collection of individual Parts. Results: We illustrate Eugene’s capabilities in three different areas: Device specification, design space exploration, and assembly and simulation integration. These results highlight Eugene’s ability to create combinatorial design spaces and prune these spaces for simulation or physical assembly. Eugene creates functional designs quickly and cost-effectively. Conclusions: Eugene is intended for forward engineering of DNA-based devices, and through its data types and execution semantics, reflects the desired abstraction hierarchy in synthetic biology. Eugene provides a powerful constraint system which can be used to drive the creation of new devices at runtime. It accomplishes all of this while being part of a larger tool chain which includes support for design, simulation, and physical device assembly.
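The idea of creating a combinatorial design space and pruning it with constraints can be sketched in a few lines of Python. This is not Eugene syntax or Eugene's rule system: the part library, the device template, and the `not_together` rule below are hypothetical stand-ins chosen only to show the enumerate-then-prune pattern.

```python
from itertools import product

# Hypothetical part library, grouped by part type (names are illustrative only).
PARTS = {
    "promoter": ["pA", "pB"],
    "rbs": ["r1", "r2"],
    "cds": ["gfp", "rfp"],
    "terminator": ["t1"],
}

# Device template: the order in which part types appear in a device.
TEMPLATE = ["promoter", "rbs", "cds", "terminator"]


def not_together(a, b):
    """Build a rule that rejects devices containing both part a and part b."""
    return lambda device: not (a in device and b in device)


def enumerate_devices(parts, template, rules):
    """Yield every device in the combinatorial space that satisfies all rules."""
    for device in product(*(parts[part_type] for part_type in template)):
        if all(rule(device) for rule in rules):
            yield device


rules = [not_together("pA", "r2"), not_together("pB", "rfp")]
devices = list(enumerate_devices(PARTS, TEMPLATE, rules))
for device in devices:
    print(" - ".join(device))
```

The unconstrained space here has 2 x 2 x 2 x 1 = 8 candidate devices; each of the two rules removes two of them, leaving four for simulation or assembly.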
