Because people are stupid

So I see LJ has a new interface that looks like a mash-up of FB/G+/Ello/WTF and the pointless "portal" sites that never seem to go away. So far we can still use the old interface; I wonder how long that will last. If they try to force everyone onto the new style, I suspect there will be nothing keeping us holdouts here at all. [sigh]

Bioinformatics: sequences and stuff

Pinning down exactly where bioinformatics got its start is tricky business--you could make a good argument that it goes back a century or so, to Fisher's pioneering work in population genetics--but in the modern sense, it goes back mainly to gene sequencing, which started in the 1970s and has been getting faster ever since. Think about how much faster the computer you're reading this on is than its equivalent from forty years ago, and then consider that our ability to read DNA sequences has improved even faster than that.

For a long time, from that early work to the time I started studying the subject around the turn of the century (get off my lawn!), the focus was on DNA sequences: first of specific genes, then of entire genomes. The genome can be defined as the complete set of genes an organism has, although that definition gets blurry sometimes, as I'll discuss later. Phrases like "the human genome" refer to the consensus sequence for the species as a whole. You have your own genome, and I have mine, and mostly they're identical; in the places where they differ, they do so in ways that can be categorized and statistically described. The original Human Genome Project obtained the genomes of five people and created a consensus sequence from those; the 1000 Genomes Project, as the name implies, is considerably more ambitious (and they're well over that original thousand now). The more samples we have to build a consensus sequence from, the more we know about what individual sequence variations mean.
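Building a real consensus sequence involves assembling and aligning enormous amounts of data, but the core idea, taking the most common base at each position and noting where individuals differ, fits in a few lines. Here's a toy Python sketch that assumes the sequences have already been aligned to the same length (the hard part, in practice). The function names and the sample sequences are made up for illustration, and ties simply go to whichever base shows up first.

```python
from collections import Counter


def consensus(aligned: list[str]) -> str:
    """Majority-vote consensus of pre-aligned, equal-length sequences."""
    if not aligned:
        raise ValueError("need at least one sequence")
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    # zip(*aligned) walks the alignment one column (position) at a time;
    # most_common(1) picks the most frequent base in that column.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*aligned))


def variable_sites(aligned: list[str]) -> list[int]:
    """Positions where at least one sequence differs from the others."""
    return [i for i, column in enumerate(zip(*aligned)) if len(set(column)) > 1]


samples = ["ACGTAC", "ACGTTC", "ACCTAC", "ACGTAC", "TCGTAC"]
print(consensus(samples))       # ACGTAC
print(variable_sites(samples))  # [0, 2, 4]
```

The variable sites are where the interesting statistics start: each one is a place where individual genomes depart from the consensus.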

Applications of this technology include "DNA fingerprinting," since everyone except identical twins has their own unique sequence; evolutionary biology, since we share most of our DNA sequence not only with our fellow humans but also (in decreasing amounts) with monkeys, mice, ravens*, salamanders, fruit flies, and brewer's yeast, and a good chunk of it with mushrooms and bananas and our own gut bacteria; and understanding the genetic basis of heritable traits, most obviously diseases like diabetes and schizophrenia, and susceptibility to various cancers. This is by no means an exhaustive list! The more genomic data we have, the more things we find to do with it.

In my own field of research, we look for alleles (which are, remember, sequence variations in certain genes) that occur frequently in populations that have lived at high altitude for a long time in the Andes, the Himalayas, and the Ethiopian highlands, and compare their frequency there to their frequency in populations that live closer to sea level (which is almost everyone else in the world--even if you live at high altitude, your ancestors probably didn't). We're looking for sequence variations that confer resistance against hypoxia ... and that's just the start.
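To make that comparison a little more concrete, here's a minimal Python sketch. It assumes we already have genotype counts at a single site with two alleles in each population; the function names and the example counts are made up for illustration, not real data. The differentiation measure is a simple textbook Fst-style statistic built from expected heterozygosity, one of several in use, and not a description of our actual analysis pipeline.

```python
def alt_allele_frequency(hom_ref: int, het: int, hom_alt: int) -> float:
    """Frequency of the alternate allele from diploid genotype counts."""
    individuals = hom_ref + het + hom_alt
    if individuals == 0:
        raise ValueError("no genotyped individuals")
    # Each person carries two copies: alt homozygotes contribute 2, heterozygotes 1.
    return (2 * hom_alt + het) / (2 * individuals)


def fst_two_populations(p1: float, p2: float) -> float:
    """Simple Fst = (H_T - H_S) / H_T for two equally weighted populations."""
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)                         # heterozygosity if pooled
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2    # mean within populations
    if h_total == 0:
        return 0.0  # both populations fixed for the same allele
    return (h_total - h_within) / h_total


# Hypothetical counts at one site (hom_ref, het, hom_alt), for illustration only.
highland = alt_allele_frequency(30, 40, 30)
lowland = alt_allele_frequency(80, 20, 0)
print(f"highland {highland:.2f}, lowland {lowland:.2f}, "
      f"Fst {fst_two_populations(highland, lowland):.3f}")
# highland 0.50, lowland 0.10, Fst 0.190
```

An allele that's common at altitude and rare at sea level gives a large frequency difference and a high Fst; in a real study that would be a candidate worth a much closer look, not a conclusion in itself.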

But the DNA sequence itself, the familiar combination of adenine (A), cytosine (C), guanine (G), and thymine (T), only explains part of the variation we see in heritable traits. A "gene" is more than just a stretch of DNA that codes for proteins. It's maybe best seen as a unit of inheritance, which includes the sequence of the coding portion, the sequences outside the coding portion that regulate transcription of the gene, the proteins (histones) that the DNA strand coils around like thread around a spool, and chemical modifications to the individual DNA bases that can affect their function. The parts of that list that aren't sequence at all get lumped under the general category of epigenetics: things that are "on top of" the genome (the Greek prefix epi- means "upon"). Really, though, we're learning that we need to expand our definition of genes and genomes to encompass everything we inherit from our parents and grandparents and great-grandparents ... and that we have all inherited from our common ancestors, the common threads that make up the tapestry of life.

Genes are pretty nifty things even when they're just sitting there, but when they start actually doing stuff, they get much more interesting. Next time I'll talk more about regulation, which is pretty much at the heart of my research.

*You knew I'd find a way to talk about dinosaurs here somehow, right? Of course you did.

Other entries in the series here.

Bioinformatics: what is all this, anyway?

At a few people's requests, I'm going to try to start writing semi-regularly about what I do. I can't promise how often I'll post, but I will do my best. Questions are welcome.

Okay. My capsule definition of bioinformatics is "the computational analysis of biological data." The Wikipedia article is also a pretty good place to start. But both of these are very general. So what is it that I do? (Because, you know, that's the really important part ...)

Let's start with a little biology. Our genome is made up of chromosomes, and each of these chromosomes is a tightly coiled strand of DNA. Almost all the cells in our bodies have a complete set of our chromosomes (there are exceptions, but I'm not going to get into them right now), and each of those chromosomes carries the sequences for hundreds or thousands of genes. A "gene" has classically been defined as a stretch of DNA that codes for a particular protein, but the more we learn about how all this stuff works, the more we realize that there are genes that code for all kinds of different things, including RNAs that are never translated into protein at all.

It all starts with the transcription of DNA to RNA. There are proteins that "walk" along the DNA strands and copy the sequence of a particular gene into RNA. Sometimes these RNA strands just stick around and perform various functions in the cell. More often, they're handed off to specialized molecular machines (ribosomes) that translate the RNA sequence into a chain of amino acids, and those chains fold up into (still more) proteins. Proteins are the workhorses of biology. DNA does things, RNA does things, various other chemicals do all kinds of things, but proteins do most of the day-to-day work of keeping our bodies (and all living things) going.
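The two steps can be sketched in a few lines of Python. This toy version takes the coding strand of a gene (so the RNA has the same sequence, with U in place of T), uses the standard genetic code, and translates from the first AUG start codon to the first stop codon. Real cells also splice out introns and do plenty of other processing that the sketch ignores, and the function names here are just illustrative.

```python
BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ...; "*" = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(coding_strand: str) -> str:
    """mRNA for a gene, given its coding strand: same sequence, U instead of T."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read codons from the first AUG until a stop codon (one-letter amino acids)."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for pos in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
print(translate(transcribe(dna)))  # MAIVMGR
```

Every three RNA bases pick out one amino acid, which is why a single-letter change in the DNA can (sometimes) change the protein.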

Now, even though most of our cells contain copies of all the genetic information we have, they can't all be transcribing all the genes and translating the RNA into proteins all the time. Think about the digestive enzymes produced in your stomach and intestines. Your brain has the genes that code for these enzymes too ... but if it starts producing them all of a sudden, you're in big trouble! The functioning of pretty much every cell, tissue, and organ depends on the precise regulation of these genes, turning them on and off at exactly the right times.

There's also the question of the gene sequences themselves. Most genes come in several different variants, called alleles, with slightly different sequences, which therefore code for slightly different RNAs and proteins. It's the combination of these alleles that defines our genetic makeup. My hair-color genes have red alleles, for example. The inheritance of a combination of alleles from our parents is why certain traits--everything from relatively trivial things like hair color to serious genetic diseases--tend to run in families.
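Here's a toy Python sketch of how a combination of alleles gets passed down: a Punnett square, assuming one gene with two alleles per parent and treating red hair as a simple recessive trait (a simplification, since real hair color involves several genes). The allele letters and the function name are made up for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_genotypes(parent1: str, parent2: str) -> dict[str, Fraction]:
    """Punnett-square probabilities for one gene; each parent is a 2-letter genotype."""
    # Each parent passes on one of its two alleles with equal probability;
    # sorting the letters makes "rR" and "Rr" count as the same genotype.
    combos = Counter("".join(sorted(a + b)) for a, b in product(parent1, parent2))
    total = sum(combos.values())
    return {genotype: Fraction(n, total) for genotype, n in combos.items()}


# "R" = non-red allele, "r" = red allele (labels chosen for illustration).
for genotype, prob in sorted(offspring_genotypes("Rr", "Rr").items()):
    print(genotype, prob)
# RR 1/4
# Rr 1/2
# rr 1/4
```

Two parents who each carry one red allele but don't have red hair themselves get the familiar one-in-four chance of a red-haired child, which is how a trait can skip a generation and still "run in the family."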

Okay, that's the background. Next installment, I'll talk about how heroic, square-jawed bioinformaticists delve fearlessly into all this messy biological goo to unravel the secrets of life itself.

Other entries in the series here.

The exciting life of the bioinformaticist.

Babysitting my computer while it crunches lots of numbers to try to find the one datum that's giving it fits may qualify as SCIENCE, but it sure doesn't feel like it. :/ I really want to get back to doing something with the results of all the number-crunching, you know?

Current Mood: tired

Because some people are just rock stupid.

"Andrew Kilian, wow, way to miss the point. The point, since you're obviously unable to grasp it, is this:

"Jim Crow was wrong. Sexual harassment is wrong. In both cases, the way to deal with the wrong is not to say 'the reality is otherwise,' but to challenge the people who make it a reality until they stop doing so.

"Now do you understand, or did I use too many words with more than one syllable?" -- Daniel Dvorkin

Current Mood: annoyed

Sometimes the snark just gets the better of me.

"Well, I read all the way through your missive. I have to admit I don't have a lot to say, because although you may have some points in there, the word salad makes it hard to figure out what they are. If you want to write something a little more coherent, I might decide you're worth the time it would take to form a more detailed reply. BTW, from what little I can glean, your view of the relevant history has only the barest connection to reality, so you might want to address that too." -- Daniel Dvorkin

Current Mood: predatory