I realize that my last post was a bit obscure; I find that kind of thing amusing when I'm punchy. But for anyone who wants to know more about what I actually do, read on.
One of the basic problems in bioinformatics these days (IMO; others might disagree) is that the low-hanging fruit has been picked. We've done some great things with sequence alignment, microarray expression analysis, etc. There is still much to be learned from these approaches and much work to do in improving them, but they're becoming standard tools for the specific experiments needed to confirm specific biological hypotheses, rather than the frontiers they used to be.
What is still a frontier is combining these multiple sources of information. Most bioinformatics data is, to put it mildly, noisy. Looking at expression data, say, for a stretch of the genome of any significant size is like trying to reconstruct a theater-quality movie from a snowy image on an old black-and-white TV set. The advantage we have is that we know that all the signals are pretty much coming from the same place -- they're all trying to tell us about the same thing -- and if we can look at the genome from multiple angles, we can figure out more of the underlying truth. Stretching the cinematic metaphor until it screams, suppose that we have not only the fuzzy TV broadcast, but also parts of the script and a few of the props, as well as a giant back-lot warehouse that might ... just might ... contain a high-quality master reel, if we can find it under all the junk.
Specifically, the data I'm working with is microarray expression, transcription factor binding, and sequence conservation, all of which cover the entire genome to a greater or lesser degree. The first is specific to actual genes, the second comes from more-or-less evenly spaced probes across the genome, and the third has base-pair-by-base-pair coverage. But they all feed into an understanding of the same biological processes, whatever those processes may be.
In the particular case of the data I'm working with right now, it's the development of wing shape in D. melanogaster, but it can be anything, in any organism. Specifically, it can be disease processes in human beings, which of course is kind of the ultimate point of the exercise. But flies are a lot easier to breed than people, and ethics boards tend to frown on things like deliberately mutating experimental groups of human subjects, anyway.
So we look at a bunch of flies, some of which have straight wings and some of which have curly wings (I will not descend here into the inevitable flamewar over the politics of insectile racism), and we gather expression and binding data from them, as well as conservation data from melanogaster and various related fly species. The first two types of data are phenotype-specific, i.e., they have to do with the specific phenomenon under study. The third is of more general biological significance: highly conserved areas of the genome tend to be those involved in processes that are absolutely necessary for survival, and for a fly, wing development clearly falls into that category.
Expression and binding give us hints as to what parts of the genome are involved. Conservation acts more as a filter. Say a certain gene appears to be more highly expressed in the curly-winged flies than in the straight-winged ones. Is this meaningful? Maybe it is, or maybe it's noise, a pattern of snow on the TV screen that just happens to look like a well-known actor. (And may have more acting talent, too.) If there's lots of transcription factor binding in the area too, that lends support to the hypothesis that something real is going on. And if the region in which this occurs is highly conserved, then we have a solid argument. Otherwise, it's probably time to move on to another region of the genome ... and there's a lot of the genome to consider.
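For readers who think better in code, here is a toy sketch of the reasoning above: expression as the hint, binding as support, conservation as the filter. It is only an illustration, not the actual method I'm using. The `Region` record, its fields, and all three thresholds are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """A genomic region with one summary value per data type (hypothetical fields)."""
    name: str
    expression_log2_fc: float  # curly-winged vs. straight-winged expression, log2 fold change
    binding_score: float       # summary of transcription factor binding over the region
    conservation: float        # fraction of bases judged conserved across fly species, 0..1


def classify(region: Region,
             fc_threshold: float = 1.0,           # hypothetical cutoffs, for illustration only
             binding_threshold: float = 2.0,
             conservation_threshold: float = 0.5) -> str:
    """Stack the three lines of evidence in the order the reasoning uses them."""
    if abs(region.expression_log2_fc) < fc_threshold:
        return "move on"        # no expression difference worth chasing
    if region.binding_score < binding_threshold:
        return "maybe noise"    # an expression hint with no binding support
    if region.conservation < conservation_threshold:
        return "supported"      # expression plus binding, but not conserved
    return "solid"              # all three lines of evidence agree


regions = [
    Region("regionA", 1.8, 3.5, 0.9),
    Region("regionB", 1.8, 0.4, 0.9),
    Region("regionC", 0.2, 3.5, 0.9),
]
for r in regions:
    print(r.name, classify(r))  # regionA solid / regionB maybe noise / regionC move on
```

In practice each of those hard cutoffs would more likely be a probabilistic score, but the ordering of the checks is the point here.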
What had me excited last night was this. We have a list of target genes, genes that are thought for various reasons, including direct experimental evidence, to be significant in wing shape development. We also, of course, have thousands of genes that may or may not have anything to do with wing shape development, but probably don't. To have some indication that our method of combing the data actually means something, we want to see the areas of the genome around our target genes identified as significant, along with a very few (but not zero!) of the other genes in the data set. Confirm what we think we know, and then find things we don't know: that, in a nutshell (IMO, YMMV, etc.), is how science works.
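The sanity check described above can be sketched the same way: given the genes a method flags and the list of known targets, see how many targets come back and how many other genes come along with them. The gene names are placeholders, not real results, and the `min_recall` and `max_new` cutoffs are arbitrary choices for illustration.

```python
def validate(flagged: set[str], targets: set[str], all_genes: set[str]) -> dict[str, float]:
    """Summarize how well a set of flagged genes recovers the known targets."""
    non_targets = all_genes - targets
    hits = flagged & targets        # known targets the method found
    extras = flagged & non_targets  # everything else it flagged: candidate new findings
    return {
        "target_recall": len(hits) / len(targets) if targets else 0.0,
        "non_target_rate": len(extras) / len(non_targets) if non_targets else 0.0,
        "new_candidates": len(extras),
    }


def looks_right(summary: dict[str, float], min_recall: float = 0.8, max_new: int = 50) -> bool:
    """Most targets recovered, plus a very few (but not zero!) other genes."""
    return summary["target_recall"] >= min_recall and 0 < summary["new_candidates"] <= max_new


all_genes = {f"gene{i}" for i in range(1000)}                # placeholder genome
targets = {"gene1", "gene2", "gene3", "gene4"}               # placeholder target list
flagged = targets | {"gene500", "gene731"}                   # placeholder method output
summary = validate(flagged, targets, all_genes)
print(summary["target_recall"], summary["new_candidates"])  # 1.0 2
print(looks_right(summary))                                  # True
```

The `0 <` half of `looks_right` encodes the "but not zero!" part. A method that flags only the genes we already knew about hasn't told us anything new.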
And that is what happened last night. I'll be presenting preliminary results of this work at a conference in a couple of weeks; it's good to know I'll be going in prepared. And in the longer term ... this is stuff that matters. I have often regretted leaving patient care for research, or at least looked back on my days of directly saving lives and relieving human suffering with a great deal of nostalgia. But if it turns out that over the course of my entire research career there is one major human disease which we are able to puzzle out, in part, by using methods I've developed ... well, then, I will have done more than I could ever have done as a medic or a physician. And that's pretty much why I made the move in the first place.