Our guiding philosophy is that twenty-first century biological research requires both “traditional” wet-lab and computational/bioinformatics approaches to studying the same set of problems. Hypotheses and ideas generated using one set of methods are tested and explored using the other, and lab members are encouraged to become conversant with both the test tube and the computer. The powerful combination of in silico and in vivo approaches enables us not only to make predictions but also to validate them within specific biological contexts.
The driving biological question in our laboratory is how genetic regulatory circuitry determines cell fates during development. We focus on two key aspects, intercellular signaling and transcriptional regulation, with the majority of our effort on the latter. We work primarily in the fruit fly Drosophila melanogaster because of its extremely well-annotated genome and amenability to experimental manipulation. However, we expect our conclusions to relate directly to mammalian (including human) gene regulation. Recently, we have also branched out into other insect species of both medical (e.g., mosquitoes) and agricultural (e.g., honeybees) importance. We are successfully pioneering methods for studying the regulatory genomics of these diverse species.
Gene expression is controlled by the binding of transcription factors to specific cis-regulatory elements. In higher eukaryotes, these elements can lie 5' to, 3' to, or within introns of a gene; in some cases, they can even be found within protein-coding sequences! Spatial and temporal aspects of gene expression are often controlled in a modular fashion, with individual cis-regulatory elements (termed "modules" or "enhancers") regulating expression in a particular time and place. An emerging theme is that a specific combination of transcription factors activated as a result of intercellular signaling binds a regulatory module in conjunction with tissue-specific transcription factors ("selectors"), forming a "transcriptional code" that regulates the expression of a given gene (see Figure).
Together, the signaling and transcriptional events form a network of interactions in which signaling induces gene transcription, which can in turn lead to further signaling events, which then induce additional gene expression, and so on. Cascades of transcription can also occur, whereby transcription factors induce the expression of other transcription factors, which can in turn regulate still other transcription factors. These developmental regulatory networks are often complex, with multiple levels of cross-talk between different signaling pathways and both positive and negative feedback loops (see Figure 1). Our ultimate goal is to be able to describe all of the regulatory interactions involved in embryonic development. We are also interested in understanding how regulatory networks evolve.
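As a purely illustrative aside, the network structure described above can be sketched as a directed graph in which each edge points from a regulator to its target, so that a feedback loop appears as a cycle. The node names below (signal_A, tf_1, and so on) are hypothetical placeholders, not real Drosophila genes, and the sketch does not distinguish positive from negative regulation or represent any tool used in the lab.

```python
from typing import Dict, List

# Hypothetical toy network: each key regulates the nodes in its list.
network: Dict[str, List[str]] = {
    "signal_A": ["tf_1"],              # signaling induces a transcription factor
    "tf_1": ["tf_2", "signal_B"],      # transcriptional cascade plus further signaling
    "tf_2": ["target_gene"],
    "signal_B": ["tf_1"],              # feedback onto tf_1
    "target_gene": [],
}


def find_feedback_loops(graph: Dict[str, List[str]]) -> List[List[str]]:
    """Return every simple cycle once, listed from its alphabetically first node."""
    loops: List[List[str]] = []

    def dfs(start: str, node: str, path: List[str]) -> None:
        for nxt in graph.get(node, []):
            if nxt == start:
                # Returned to the starting node: the current path is a cycle.
                loops.append(path[:])
            elif nxt not in path and nxt > start:
                # Only visit nodes that sort after `start`, so each cycle
                # is reported exactly once.
                dfs(start, nxt, path + [nxt])

    for start in sorted(graph):
        dfs(start, start, [start])
    return loops


if __name__ == "__main__":
    for loop in find_feedback_loops(network):
        print(" -> ".join(loop + [loop[0]]))
```

In this toy example the only cycle is the one between signal_B and tf_1; a realistic developmental network would contain many more nodes, signed edges, and overlapping loops.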
cis-Regulatory modules (CRMs) are critical nodes in developmental regulatory networks, as it is here that signaling pathways and transcription factors are integrated to give rise to changes in the expression of specific genes. Mutations within CRMs have been implicated in a growing number of diseases, underscoring the importance of being able to identify and characterize them. We are using computational approaches to locate the cis-regulatory modules responsible for directing specific patterns of gene expression in a rapid and comprehensive fashion. All of our predictions are extensively tested in vivo using reporter gene assays in the fly embryo so that we can definitively assess our success rate and refine our approach to achieve better performance. SCRMshaw, our Supervised CRM discovery approach, has been developed in collaboration with Dr. Saurabh Sinha at the University of Illinois Urbana-Champaign and has proved to be highly effective for CRM discovery in both insects and mammals.
Much can be learned from studying already-known CRMs using bioinformatics approaches. For this reason, we have constructed the REDfly database of published Drosophila CRMs. This database contains more than 5500 CRMs associated with over 500 genes, along with their sequences and the expression patterns for which they are responsible. REDfly is an internationally accessed resource that serves as a source of raw data for analysis, hypothesis generation, assessment and validation, and empirical research. As the only comprehensive, unbiased regulatory element database for any metazoan, it is widely used by researchers regardless of whether Drosophila is their model system of choice, and it has played a significant role in developing and validating methods that can be applied to vertebrate systems. REDfly data have been important for studies of CRM biology, for interpretation of data obtained from genome-scale experiments, for facilitating both computational and empirical CRM discovery, for developing gene regulatory network models, and for studying CRM and regulatory network evolution.
CRMs work in concert with a gene's promoter. We undertook a systematic investigation of Drosophila promoters, which revealed an unexpected complexity in the overall genomic organization of these important elements (Zhu and Halfon, 2009). The data we have collected in the REDfly database, along with promoter characterizations from that study, are allowing us to undertake both wet-lab and computational explorations of how specific CRM-promoter interactions come about.
This research is motivated in part by our conviction that current methods for investigating promoter-CRM compatibility (essentially, standard reporter gene assays) miss many important aspects of the promoter-CRM relationship. This is because the standard assays fail to take into account the genomic context: proximity of other promoters, distance between promoters and CRMs, insulator elements, and the like. We discuss this viewpoint in a review article (Atkinson and Halfon, 2014) and have been generating computational data in support of it. We have developed a flexible and efficient experimental assay to screen for promoter-CRM interactions and investigate the underlying molecular mechanisms in a more context-aware manner, both in vivo and in vitro, and hope to embark on these experiments soon.
Regulatory networks consist not only of cis-regulatory control elements but also of trans-acting factors, many of which are induced by intercellular signaling events. Thus, understanding intercellular signaling is an important component of defining the gene regulatory networks responsible for cell fate determination. We have been particularly interested in the various ways in which a class of signaling molecules known as receptor tyrosine kinases (RTKs), mutations in which contribute to many birth defects and which are a major focus of targeted cancer therapeutics, produce unique downstream transcriptional responses. These receptors are often believed to act via identical downstream signaling cascades, but we have identified substantial points of divergence within the pathways (Leatherbarrow and Halfon, 2009). We are investigating the mechanisms behind this divergent signaling with a particular focus on the Alk RTK and its role in visceral mesoderm specification. These studies will contribute to our knowledge of RTK pathway regulation as well as identify additional genes important for mesoderm development.