2022 IRG-2: Capturing sequence-dependent behavior in biomolecular materials using chemically informed machine learning models

Sequence-encoded biomolecules are promising programmable building blocks for materials, but the complexity of these biomaterials makes them difficult to design. To enable the design of programmable biomolecular building blocks, the UCI MRSEC team has developed a machine learning model that learns from limited experimental data sets to map biomolecule sequence onto materials properties.

  • Developed for proof-of-principle material: DNA-stabilized silver nanoclusters (AgN-DNAs) with sequence-encoded fluorescence.
  • High-throughput experiments connect DNA sequence to AgN-DNA color (Fig. 1), and the known atomic sizes of “magic-colored” AgN-DNAs inform the ML classification problem.
  • Chemically informed ML classifiers combine known structural properties with training data to learn to distinguish DNA sequences classified by AgN-DNA color.
  • These models increase the success rate of designing AgN-DNAs with target colors by up to 10-fold (Fig. 2a). Moreover, the models are interpretable, providing insights into the sequence-color “code” for AgN-DNAs (Fig. 2b).
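
To make the sequence-to-color classification idea concrete, the following is a minimal sketch and not the team's model. It assumes a simple stand-in technique: each DNA sequence is turned into normalized counts of 2-base motifs, and a new sequence is assigned to the color class whose average motif profile is closest (a nearest-centroid classifier). The function names, the toy sequences, and their color labels are all invented for illustration and are not experimental data.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"

def kmer_features(seq, k=2):
    """Return normalized counts of every length-k motif in a DNA sequence."""
    kmers = ["".join(p) for p in product(BASES, repeat=k)]
    n_windows = max(len(seq) - k + 1, 1)
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts[m] / n_windows for m in kmers]

def fit_centroids(sequences, labels, k=2):
    """Average the motif feature vectors of each color class."""
    sums = {}
    n_per_class = Counter(labels)
    for seq, lab in zip(sequences, labels):
        vec = kmer_features(seq, k)
        if lab not in sums:
            sums[lab] = [0.0] * len(vec)
        sums[lab] = [a + b for a, b in zip(sums[lab], vec)]
    return {lab: [v / n_per_class[lab] for v in s] for lab, s in sums.items()}

def predict(seq, centroids, k=2):
    """Assign a sequence to the class with the nearest centroid."""
    vec = kmer_features(seq, k)

    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(vec, center))

    return min(centroids, key=lambda lab: sq_dist(centroids[lab]))

# Hypothetical toy training set: sequences and labels are made up.
train_seqs = ["GGGGCCGGGG", "GGCGGGGCGG", "AATTAATTAA", "TTAATTTAAT"]
train_labels = ["green", "green", "red", "red"]

centroids = fit_centroids(train_seqs, train_labels)
print(predict("GGGGGGCGGG", centroids))  # prints "green" for this toy data
```

In a sketch like this, comparing the class centroids motif by motif shows which short motifs separate the classes, which is one simple sense in which such a model can be interpretable. A chemically informed model would additionally use known structural properties of the nanoclusters when defining the classes and features.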

P Mastracco, A Gonzàlez-Rosell, SM Copp (University of California, Irvine)
J Evans (Chaffey Community College, CA) / P Bogdanov (University at Albany, SUNY, NY)