Thursday, September 20, 2007

Are These Dappers?

 
Is the first figure a dapper? How about the second figure? What about the third figure, isn't he dapper in his nice suit?

Check out Ryan Gregory's definition of this new term: Dog's Ass Plots (DAPs).





8 comments :

T Ryan Gregory said...

The third one is certainly a dapper fellow, but I don't think he'd be a DAPper.

The first is interesting, because I am not sure what the point is -- again, no label on the x-axis. I do know that "genome size" is being used incorrectly and that this would not be the order in terms of genome sizes defined properly. C. elegans is 100Mb, D. melanogaster is 175Mb, human is 3.2Gb, Arabidopsis is 157Mb, rice is 0.5Gb, and maize is ~2.5Gb.

The second is not totally ridiculous in very (*very*) rough outline, in that it's probably true that the first genomes were simple. HOWEVER, most genomes are still bacterial. Note, too, that the arrow extends into the future indefinitely. Vive la orthogenesis!

Conclusion: two DAPper duds and a dapper stud.

Anonymous said...

I am not sure why everybody thinks the first figure is bad. Not all scientific diagrams have to be x-y plots (with metric axes). I consider bar graphs as perfectly legal diagrams: the y-axis is metric, while the x-axis is not and all the x-categories are shown in a random order - or a non-random one.
When creating bar graphs with no obvious ordering of the x-axis, it is common practice to arrange them in a way that the y-axis values have a steady trend.

Here is an example from a business setting: you want to create a bar chart showing the annual sales volume of 10 salespeople. There is no obvious way of ordering the people (you could do it by alphabet or any other silly sequence). Thus, what most people will do is arrange the 10 people by their sales figures, resulting in a steady slope. I don't see anything wrong with that.

I see the first figure (and probably also the figure in Ryan's post) belonging to this category. The x-axis doesn't say anything about increasing complexity. The species are just ordered by an increasing number of genes.

This is not to say that those diagrams are particularly intelligent, and maybe they are intended to illustrate some wrong idea. I just wouldn't condemn any bar graph that doesn't have a metric x-axis.

T Ryan Gregory said...

Kay says "I see the first figure (and probably also the figure in Ryan's post) belonging to this category. The x-axis doesn't say anything about increasing complexity. The species are just ordered by an increasing number of genes."

Sorry, but this is not correct. If that were the case, then the figure would be completely meaningless and it would not have appeared in the article. Moreover, the legend reads: "Among eukaryotes, as their complexity increases, generally so, too, does the proportion of their DNA that does not code for protein." My point is that this is clearly assumed to be what the axis is, but not labeling it gets you off the hook for having to actually provide some metric -- and hence justification -- for ranking complexity.

The issue with the figure here, as I noted, is that it uses "genome size" incorrectly. I also said that I didn't know what the point was -- maybe it's even trying to show that gene number is not related to "complexity". It could be just a list as you describe, which is fine, but that's not at all the case with the figure we were dealing with earlier.

T Ryan Gregory said...

Y'know Kay, the more I think about it the more I disagree with you. :-) I think using bar graphs with no X-axis units is almost always going to be problematic because the human brain will start to make up units. In your example, I bet most people would be plugging in "sales ability", when there could easily be other factors (territory size, by way of example). A table is what you really want in that situation. Obviously this is open for discussion, but I think those graphs will always be misleading in some way. I do know one thing: if the X-axis is intended to imply a scale of increasing complexity (or anything, really) but is not labeled or supported, then it's a Dapper.

Larry Moran said...

Kay asks,

I am not sure why everybody thinks the first figure is bad.

It's not as bad as some but the way the bars are arranged—in increasing size along the x-axis—strongly implies that there's something being measured. That something looks suspiciously like complexity or importance.

It's very hard to make this data neutral with respect to phylogeny. Look at Ryan Gregory's chart to see how it should be done. Bar graphs are definitely not the way to go.

Anonymous said...

I like the first because we are not at the right border. However, the number of genes given for us is at the upper end of the range of estimates, and the legend is crap.

The second... ugh. Vive l'orthogenèse, indeed.

d said...

It seems to me that the most important bit about a DAP is that it leads the reader to an incorrect conclusion. The eponymous Scientific American bar chart appears to place humans on a pedestal of unrivaled genomic complexity, which is particularly insidious because this reinforces the pervasive anthropocentric bias. The first chart here, however, while it may be inaccurate in conflating gene number with genetic complexity, at least gets across the message that the human genome is not the alpha and omega.

Anonymous said...

It's especially stupid how the second figure puts "Cambrian Explosion" before "Multicellular animals and plants", as if those had originated in the "Cambrian Explosion".