Dropbox Recursive Downloader

I'm working on some analyses for the Genetic Analysis Workshop #19, which has placed its data on Dropbox. Unfortunately, Dropbox doesn't allow downloading zip archives larger than 1GB, and the data was made available as an unpacked directory structure with more than a hundred files. Some searching indicated that no one had written a recursive downloader for Dropbox, so after 30 minutes of hacking with WWW::Mechanize, I had a simple recursive downloader for Dropbox.

Two hours later, all of the files had downloaded.
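
The downloader itself is Perl built on WWW::Mechanize; the traversal behind it can be sketched independently of Dropbox. The following is a minimal sketch in Python, not the actual script: walk_shared_folder and the list_entries callback are hypothetical names, and a real version would have list_entries fetch a folder page and extract its links.

```python
from collections import deque
from typing import Callable, Iterable, List, Tuple


def walk_shared_folder(
    root: str,
    list_entries: Callable[[str], Iterable[Tuple[str, bool]]],
) -> List[str]:
    """Return every file URL reachable from root.

    list_entries(url) is a hypothetical callback yielding (url, is_folder)
    pairs for one folder page.
    """
    files: List[str] = []
    seen = {root}
    queue = deque([root])
    while queue:
        folder = queue.popleft()
        for url, is_folder in list_entries(folder):
            if url in seen:
                continue  # guard against folders that link back to a parent
            seen.add(url)
            if is_folder:
                queue.append(url)  # visit this folder later
            else:
                files.append(url)  # a file to download
    return files


# Usage with an in-memory stand-in for the folder listing.
fake_tree = {
    "/data": [("/data/a.txt", False), ("/data/sub", True)],
    "/data/sub": [("/data/sub/b.txt", False), ("/data", True)],
}
found = walk_shared_folder("/data", lambda url: fake_tree.get(url, []))
print(found)
```

Keeping the page fetching behind a callback means the traversal can be tested without touching the network.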

Debian Booth at SCALE 12x

I spent the weekend at SCALE 12x running the Debian booth. SCALE is one of the best conferences that I get to attend every year; it has a great mix of commercial exhibitors and community groups, and routinely gets great speakers. As I've done for quite some time, I organized a Debian booth there, and talked to lots of people about Debian.

If you're in the Southern California area, you should attend SCALE 13x, and if you have a chance to give a talk there, you should do so! Thanks again to Matt Kraai and Paul Hardy for helping out in the Debian booth all weekend!

Plotting Embedded Bitmap in Vector Plot in R

Recently, one of my collaborators complained that one of my plots took forever to render on his machine. The particular plot in question had a few thousand points, many of which were overlapping. Ideally, R would be able to simplify the vector image which was drawn to avoid drawing points which were occluded by other points, but this is difficult to do properly, and R currently doesn't do it.

However, R is able to plot to a bitmap, and bitmap images handle this automatically: an occluded point is simply painted over, so the file size and rendering time no longer depend on the number of points. Furthermore, raster images have recently been made far less clunky in R, so it's pretty easy to shove an arbitrary bitmap image anywhere. With dev.capture (part of grDevices, and supported by the devices that the Cairo package opens) coupled with grid.raster in grid, we have everything we need to solve this problem:

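library(grid)
library(Cairo)

## Open an off-screen Cairo device shaped like the current viewport.
## width and height are in pixels; give either one, or neither.
start.rasterplot <- function(width=NULL,height=NULL) {
    ## aspect ratio of the current viewport
    x.y.ratio <- convertX(unit(1,"npc"),"mm",valueOnly=TRUE)/
        convertY(unit(1,"npc"),"mm",valueOnly=TRUE)
    ## viewport width in points, and the number of points per inch
    width.points <- as.numeric(convertX(unit(1,"npc"),"points"))
    dpi <- as.numeric(convertX(unit(1,"inch"),"point"))
    if (is.null(width) && is.null(height)) {
        width <- 1024
    }
    if (is.null(width)) {
        width <- height*x.y.ratio
    }
    if (is.null(height)) {
        height <- width/x.y.ratio
    }
    ## pixels per inch: use the requested width rather than a fixed 1024
    Cairo(width=width,height=height,dpi=width/width.points*dpi,file="/dev/null")
}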

stop.rasterplot <- function(plot=TRUE) {
    raster.image <- dev.capture(native=TRUE)
    dev.off()
    if (plot) {
        grid.raster(raster.image,width=unit(1,"npc"),height=unit(1,"npc"))
        return(invisible())
    } else {
        return(raster.image)
    }
}

Now we can do the following:

library(lattice) # provides xyplot

pdf(file="raster.pdf")
start.rasterplot()
print(xyplot(y~x,
      data=data.frame(y=rnorm(1E8),x=rnorm(1E8))))
stop.rasterplot()
dev.off()

Our PDF will now contain a raster image, and will load in seconds instead of taking forever to render.
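
The sizing arithmetic in start.rasterplot can be checked on its own. Here is a small sketch in Python under two assumptions: grid's "points" unit is 1/72.27 inch, and the resolution should scale with the requested pixel width. raster_size is a hypothetical helper for illustration, not part of grid or Cairo.

```python
POINTS_PER_INCH = 72.27  # grid's "points" unit is 1/72.27 inch


def raster_size(viewport_w_pt, viewport_h_pt, width=None, height=None):
    """Return (width_px, height_px, dpi) for a bitmap that fills the viewport.

    Mirrors the defaulting logic of start.rasterplot: with neither dimension
    given the width is 1024 pixels, and a missing dimension is derived from
    the viewport's aspect ratio.
    """
    ratio = viewport_w_pt / viewport_h_pt
    if width is None and height is None:
        width = 1024
    if width is None:
        width = height * ratio
    if height is None:
        height = width / ratio
    # pixels per inch = pixels across / inches across
    dpi = width / viewport_w_pt * POINTS_PER_INCH
    return width, height, dpi


# A default 7in x 7in PDF page, expressed in points.
page = 7 * POINTS_PER_INCH
print(raster_size(page, page))
print(raster_size(page, page, width=2048))
```

For the 7 inch page, the default 1024 pixel width works out to roughly 146 dpi, and doubling the pixel width doubles the resolution.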

Working with Org-mode: Committing Changes Everywhere

I'm a huge fan of Org-mode, and I keep all of my org-mode files in git repositories which are under myrepos control.

However, because I often make lots of changes to my agenda and notes, I hate having to manually visit each individual project and make changes to it. [And it's also annoying when I forget to commit a specific change and then have to try to get my laptop and desktop back into sync.]

Luckily, myrepos can easily run a command in parallel in all of the repositories! The following "update_org_files" command will update all of my repositories that contain org files in parallel:

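#!/bin/bash

# grep patterns that identify org files (dots escaped so they match literally)
ORG_GREP='-e \.org$ -e \.org_archive$ -e \.org_done$'

if [ "x$1" == "xdoit" ]; then
    # Run by mr inside each repository: commit and push modified org files
    if git status --porcelain -z | grep -z '^ M' | grep -zq $ORG_GREP; then
        # strip the " M " status prefix, including the space before the path
        git status --porcelain -z | grep -z '^ M' | grep -z $ORG_GREP | \
            sed -z 's/^ M //' | \
            xargs -0 git commit -m'update org files'
        git push
    fi
else
    # Save all open org buffers, then run this script in every repository
    emacsclient -n -e '(org-save-all-org-buffers)' >/dev/null 2>&1
    mr -d ~ -j5 run update_org_files doit
fi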

An updated version of this lives in my git repository.
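
The filtering that the grep and sed pipeline performs can also be written as a small parser, which makes the record format explicit. This is a sketch in Python, assuming the version 1 porcelain format where each NUL-terminated record is two status characters, a space, then the path; modified_org_files is a hypothetical helper, not part of the script above.

```python
ORG_SUFFIXES = (".org", ".org_archive", ".org_done")


def modified_org_files(porcelain_z: str) -> list:
    """Pick unstaged-modified org files out of `git status --porcelain -z` output."""
    paths = []
    records = porcelain_z.split("\0")
    i = 0
    while i < len(records):
        record = records[i]
        i += 1
        if len(record) < 4:
            continue  # empty trailing record or malformed entry
        status, path = record[:2], record[3:]
        if status[0] in "RC":
            i += 1  # renames and copies carry the original path as an extra record
        if status == " M" and path.endswith(ORG_SUFFIXES):
            paths.append(path)
    return paths


sample = (
    " M notes.org\0"
    "M  staged.org\0"
    " M src/main.c\0"
    "R  new.org\0old.org\0"
    " M archive/2013.org_archive\0"
    "?? scratch.org\0"
)
print(modified_org_files(sample))
```

Only entries whose status is exactly " M" (modified in the working tree, nothing staged) are selected, which matches the '^ M' pattern in the shell version.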

Biblatex format for Genes & Immunity

Here's a biblatex format for Genes & Immunity (a Nature imprint), which I needed recently.

The following code in the preamble does almost all of the heavy lifting:

\usepackage[backend=biber,hyperref=true,doi=true,url=false,isbn=false,maxbibnames=6,minbibnames=6,sorting=none,firstinits=true,terseinits=true,autocite=inline,style=numeric-comp]{biblatex}
\renewbibmacro{in:}{%
  \ifentrytype{article}{}{%
  \printtext{\bibstring{in}\intitlepunct}}}
% from http://tex.stackexchange.com/questions/12806/guidelines-for-customizing-biblatex-styles
\DeclareCiteCommand{\parencite}[\mkbibbrackets]
  {\usebibmacro{cite:init}%
   \usebibmacro{prenote}}
  {\usebibmacro{citeindex}%
   \usebibmacro{cite:comp}}
  {}
  {\usebibmacro{cite:dump}%
   \usebibmacro{postnote}}
\DeclareMultiCiteCommand{\parencites}[\mkbibbrackets]{\parencite}{\multicitedelim}
\DeclareFieldFormat
  [article,inbook,incollection,inproceedings,patent,thesis,unpublished,manual]
  {title}{#1\isdot}
\DeclareFieldFormat{journaltitle}{\mkbibemph{#1}}
\DeclareFieldFormat[article,periodical]{volume}{\mkbibbold{#1}\addcolon}
\DeclareFieldFormat{year}{#1}
\DeclareNameAlias{default}{last-first}
\DeclareFieldFormat{pages}{#1}
% from http://tex.stackexchange.com/questions/17583/biblatex-remove-commas-between-last-and-first-names-in-bibliography
% remove commas between authors and first inits
\renewcommand*{\revsdnamepunct}{}
% from http://tex.stackexchange.com/questions/40798/how-do-i-get-et-al-to-appear-in-italics-when-using-textcite-or-citeauthor-w
% make et al. /et al./
\renewbibmacro*{name:andothers}{% Based on name:andothers from biblatex.def
  \ifboolexpr{
    test {\ifnumequal{\value{listcount}}{\value{liststop}}}
    and
    test \ifmorenames
  }
    {\ifnumgreater{\value{liststop}}{1}
       {\finalandcomma}
       {}%
     \andothersdelim\bibstring[\emph]{andothers}}
    {}}
% \renewbibmacro{journal}{#1}%
% from http://tex.stackexchange.com/questions/6743/biblatex-changing-the-order-of-entries
\renewbibmacro*{journal+issuetitle}{%
  \setunit*{\addspace}%
  \usebibmacro{journal}%
  \setunit*{\addspace}%
  \printfield{year}%
  \setunit*{\addspace}%
  \iffieldundef{series}
    {}
    {\newunit
     \printfield{series}%
     \setunit{\addspace}}%
  \newunit{\addsemicolon\space}%
  \printfield{volume}%
  \setunit*{\addspace}%
  \newunit
}
\renewbibmacro*{title}{%
  \newunit
  \ifboolexpr{
    test {\iffieldundef{title}}
    and
    test {\iffieldundef{subtitle}}
  }
    {}
    {\printtext[title]{%
       \printfield[titlecase]{title}%
       \setunit{\subtitlepunct}%
       \printfield[titlecase]{subtitle}}%
     \newunit}%
  \printfield{titleaddon}}
\renewbibmacro*{publisher+location+date}{%
  \setunit*{\addspace}%
  \printtext[parens]{\printlist{location}%
  \iflistundef{publisher}
    {\setunit*{\addcomma\space}}
    {\setunit*{\addcolon\space}}%
  \printlist{publisher}%
  }%
  \newunit}

Dealing with Greenhouse Whiteflies

This weekend, I finally got around to dealing with the greenhouse whitefly infestation we've had on the basil and mint in the kitchen window. We've previously tried pyrethrin-based insecticides, but eventually the whiteflies came back.

After a bit of research, it appears that insecticide resistance is fairly common in whiteflies, and alternative approaches are needed to manage the infestation. In our case, since the plants are relatively small, I opted for submerging the plants in water for a few moments, and then putting yellow sticky traps out. Hopefully this will at least let the basil come back and keep the whiteflies under control.

Using Mutt with Org Mode (with refile)

I use org mode extensively, and had added Zack's workflow for integrating mutt with org mode to my ~/.emacs some time ago.

However, I've been annoyed that refiling closes the org-capture frame before refiling finishes. The following trivial modification to Zack's code (which I previously modified to work with org-mode >= 0.8) waits to close the frame until you've finished refiling.

(require 'org-protocol)
(add-hook 'org-capture-mode-hook 'delete-other-windows)
(setq my-org-protocol-flag nil)
(defadvice org-capture-finalize (after delete-frame-at-end activate)
  "Delete frame at remember finalization"
  (progn (if my-org-protocol-flag (delete-frame))
         (setq my-org-protocol-flag nil)))
(defadvice org-capture-refile (around delete-frame-after-refile activate)
  "Delete frame at remember refile"
  (if my-org-protocol-flag
      (progn
        (setq my-org-protocol-flag nil)
        ad-do-it
        (delete-frame))
    ad-do-it))
(defadvice org-capture-kill (after delete-frame-at-end activate)
  "Delete frame at remember abort"
  (progn (if my-org-protocol-flag (delete-frame))
         (setq my-org-protocol-flag nil)))
(defadvice org-protocol-capture (before set-org-protocol-flag activate)
  (setq my-org-protocol-flag t))

Now the capture frame closes automatically once refiling has finished, keeping my refile.org clean.

Biblatex format for AJHG

I'm working on a paper on the genetic basis of lupus, which I'm submitting to the American Journal of Human Genetics. Since I've recently switched from the standard bibtex to the wonderful biblatex and biber, I've had to figure out how to customize the bibliography and citation format to fit the standards of the journal. Luckily, there are lots of good examples on the TeX Stack Exchange, which enabled me to figure out how to do all of this.

The following code in the preamble does almost all of the heavy lifting:

\usepackage[backend=biber,hyperref=true,doi=false,url=false,isbn=false,maxbibnames=10,minbibnames=10,sorting=none,firstinits=true,autocite=superscript,style=numeric-comp]{biblatex}
\renewbibmacro{in:}{%
  \ifentrytype{article}{}{%
  \printtext{\bibstring{in}\intitlepunct}}}
% from http://tex.stackexchange.com/questions/12806/guidelines-for-customizing-biblatex-styles
\DeclareFieldFormat
  [article,inbook,incollection,inproceedings,patent,thesis,unpublished,manual]
  {title}{#1\isdot}
\DeclareFieldFormat{journaltitle}{#1}
\DeclareFieldFormat[article,periodical]{volume}{\mkbibemph{#1}}
\DeclareFieldFormat{year}{(#1)}
\DeclareNameAlias{default}{last-first}
\DeclareFieldFormat{pages}{#1}
%\renewbibmacro{journal}{#1}%
% from http://tex.stackexchange.com/questions/6743/biblatex-changing-the-order-of-entries
\renewbibmacro*{journal+issuetitle}{%
%   \setunit*{\addspace}%
%   (\printfield{year})%
%  \usebibmacro{date}%
  \setunit*{\addspace}%
  \usebibmacro{journal}%
  \setunit*{\addspace}%
  \iffieldundef{series}
    {}
    {\newunit
     \printfield{series}%
     \setunit{\addspace}}%
   \newunit%
  \printfield{volume}%
%  \setunit{\addspace}% DELETED
%  \usebibmacro{issue+date}% DELETED
%  \setunit{\addcolon\space}% DELETED
%  \usebibmacro{issue}% DELETED
  \newunit}
\renewbibmacro*{title}{%
  \printfield{year}%
  \setunit*{\addspace}%
  \newunit
  \ifboolexpr{
    test {\iffieldundef{title}}
    and
    test {\iffieldundef{subtitle}}
  }
    {}
    {\printtext[title]{%
       \printfield[titlecase]{title}%
       \setunit{\subtitlepunct}%
       \printfield[titlecase]{subtitle}}%
     \newunit}%
  \printfield{titleaddon}}
\renewbibmacro*{publisher+location+date}{%
  \setunit*{\addspace}%
  \printtext[parens]{\printlist{location}%
  \iflistundef{publisher}
    {\setunit*{\addcomma\space}}
    {\setunit*{\addcolon\space}}%
  \printlist{publisher}%
%  \setunit*{\addcomma\space}%
%  \usebibmacro{date}%
  }%
  \newunit}

This, coupled with:

\newcommand{\citep}[1]{\autocite{#1}}
\newcommand{\citet}[1]{\citeauthor{#1}\autocite{#1}}

enables my standard natbib workflow of \citep and \citet to work properly too. Eventually I'll move to just using \autocite everywhere, but for now, that's good enough.

libravatar for the BTS (and boring encoding fixes)

While fixing a few encoding problems that I managed to introduce to the BTS almost half a year ago, I took a coding detour and added libravatar support to the BTS. Every e-mail now has an avatar to the right which should correspond to the sender. Libravatar is a federated service, which means that if you control your domain, you can serve your own icons. It also automatically falls back to gravatar, so if you're using that service, things should "just work". Hopefully this will be primarily amusing, and people won't abuse it.

More importantly, but much less fun, the double encoding problems (where mails would get double-encoded if any of the headers contained non-us-ascii text) and the mojibake in the wontfix icon (☹) should be fixed now. If you see any additional cases of this, please report them to owner@bugs.debian.org.
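
For anyone unfamiliar with double encoding, here is a small illustration in Python of one common way it arises: UTF-8 bytes are mistaken for Latin-1 text and encoded a second time. This is only an illustration of the general failure, not the BTS code, and I'm not claiming it is the exact path the BTS bug took; both function names are made up for the example.

```python
def double_encode(text: str) -> bytes:
    """Encode text as UTF-8, misread the bytes as Latin-1, and encode again."""
    utf8_bytes = text.encode("utf-8")
    misread = utf8_bytes.decode("latin-1")  # the faulty step
    return misread.encode("utf-8")


def undo_double_encoding(data: bytes) -> str:
    """Reverse double_encode by peeling off the extra encoding layer."""
    return data.decode("utf-8").encode("latin-1").decode("utf-8")


original = "☹"
garbled = double_encode(original)
print(len(original.encode("utf-8")), len(garbled))  # 3 6
print(undo_double_encoding(garbled) == original)    # True
```

The three UTF-8 bytes of the icon each turn into a two-byte sequence, which is why double-encoded text shows up as a run of accented Latin characters.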

Bug Reporting Rate in Debian

Christian's most recent blog post got me wondering if the decline in the bug reporting rate in Debian was something new, or something which often happened during releases. So, let's try to figure that out. In the BTS, when a bug report is filed, the report is written to a file called bugnum.report, and then not touched from then on. Let's look at the modification date on that file to see when each bug was filed; and since we're going to plot this, let's only look at bugs ending in 00:

stat -c '%n %Y' /srv/bugs.debian.org/spool/{archive,db-h}/00/*.report > ~/reporting_rate.txt

Now, let's get the data into R and plot it. [For clarity, I'm not showing the R code, but it's available in the source code for this post.]

From the plot (Bugs reported per second over time with a red loess fit line), it looks like we do see a decline during certain periods. However, there's an even more alarming trend of a decrease in bug reporting in Debian which has been happening since 2006. (Note that I've truncated the y scale significantly; there are periods in Debian where the bug rate is astronomically high, usually corresponding to mass bug filings; I've also limited the plot to data from 2003 on, as I have to clean up that data significantly before I can plot it like this.)

Not sure exactly what that means, but it is troubling.
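
For a quick look at the same data without the R code, here is a rough sketch in Python that bins the stat output by year. It is not the analysis behind the plot, which shows bugs per second with a loess fit. It assumes each line is a path followed by an epoch mtime, as produced by the stat command above, and that sampling only bugs ending in 00 means each file stands in for about 100 reports; reports_per_year and SAMPLE_FACTOR are names made up for the example.

```python
from collections import Counter
from datetime import datetime, timezone

SAMPLE_FACTOR = 100  # assumption: only bugs whose number ends in 00 were sampled


def reports_per_year(stat_lines):
    """Estimate total bug reports per year from `stat -c '%n %Y'` lines."""
    counts = Counter()
    for line in stat_lines:
        line = line.strip()
        if not line:
            continue
        # the mtime is the last space-separated field; the path comes first
        _path, mtime = line.rsplit(" ", 1)
        year = datetime.fromtimestamp(int(mtime), tz=timezone.utc).year
        counts[year] += SAMPLE_FACTOR
    return dict(sorted(counts.items()))


# Made-up sample lines in the same shape as ~/reporting_rate.txt.
sample_lines = [
    "/srv/bugs.debian.org/spool/archive/00/370000.report 1150000000",
    "/srv/bugs.debian.org/spool/archive/00/390000.report 1160000000",
    "/srv/bugs.debian.org/spool/db-h/00/425000.report 1180000000",
]
print(reports_per_year(sample_lines))
```

Yearly bins hide the spikes from mass bug filings, so this is only useful for eyeballing the long-term trend.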
