I routinely generate very large PDFs from R with hundreds (or thousands) of pages, and navigating them can be very difficult. Unfortunately, neither R's pdf() nor its cairo_pdf() driver supports creating a table of contents (or index) while plots are being written out. In the case of cairo, the underlying library doesn't support it either, so this isn't something that can easily be added to R directly. For months I had been thinking about sitting down and writing the support into cairo and R's Cairo package, but real life kept getting in the way.
Fast forward to a week ago, when I realized that pdftk does support dumping the table of contents and updating the table of contents, using dump_data_utf8 and update_info_utf8! Armed with that knowledge and a bit of hackery, we can save an index, and then update the PDF once it's been closed.
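To make the record format concrete, here is a minimal sketch that parses the bookmark fields pdftk prints for dump_data_utf8. The helper name parse.bookmarks and the sample lines are hypothetical, made up for illustration. Only the Bookmark* field names come from pdftk, and the sketch assumes every record carries all three fields.

```r
# Hypothetical helper: turn the lines printed by
#   pdftk some.pdf dump_data_utf8
# into a data frame with one row per bookmark.
parse.bookmarks <- function(lines) {
  # Pull out the values of every "Key: value" line for one key.
  field <- function(key) {
    pattern <- paste0("^", key, ": ")
    sub(pattern, "", grep(pattern, lines, value = TRUE))
  }
  data.frame(text  = field("BookmarkTitle"),
             level = as.integer(field("BookmarkLevel")),
             page  = as.integer(field("BookmarkPageNumber")),
             stringsAsFactors = FALSE)
}

# Illustrative sample of a dump (not taken from a real file).
sample.dump <- c("NumberOfPages: 2",
                 "BookmarkBegin",
                 "BookmarkTitle: First plot",
                 "BookmarkLevel: 1",
                 "BookmarkPageNumber: 1",
                 "BookmarkBegin",
                 "BookmarkTitle: Second plot",
                 "BookmarkLevel: 1",
                 "BookmarkPageNumber: 2")
existing <- parse.bookmarks(sample.dump)
print(existing)

# With pdftk installed, the lines could come from the tool itself:
# lines <- system2("pdftk", c("testing.pdf", "dump_data_utf8"), stdout = TRUE)
```

Writing bookmarks is the reverse: produce records in this format and hand them to update_info_utf8.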
The R code then looks like the following:
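    library(Cairo)  # provides Cairo.onSave(); the hook only works on Cairo-package devices

    # Global state shared with the onSave hook. It is never reset, so as written
    # it tracks only the first device that save.bookmark() is used with.
    ..device.set.up <- FALSE
    ..current.page <- 0

    # Append a bookmark to the list and return the extended list. Unless a page
    # is given, the bookmark points at the page that is about to be drawn.
    save.bookmark <- function(text, bookmarks=list(), level=1, page=NULL) {
      if (!..device.set.up) {
        # Have Cairo report the page number each time it finishes a page.
        Cairo.onSave(device = dev.cur(),
                     onSave = function(device, page) {
                       ..current.page <<- page
                     })
        ..device.set.up <<- TRUE
      }
      if (is.null(page)) {
        page <- ..current.page + 1
      }
      bookmarks[[length(bookmarks)+1]] <-
        list(text=text,
             level=level,
             page=page)
      return(bookmarks)
    }

    # Write the bookmarks into an already closed PDF using pdftk.
    write.bookmarks <- function(pdf.file, bookmarks=list()) {
      # Build one pdftk bookmark record per entry.
      pdf.bookmarks <- ""
      for (bookmark in seq_along(bookmarks)) {  # seq_along() copes with an empty list
        pdf.bookmarks <-
          paste0(pdf.bookmarks,
                 "BookmarkBegin\n",
                 "BookmarkTitle: ", bookmarks[[bookmark]]$text, "\n",
                 "BookmarkLevel: ", bookmarks[[bookmark]]$level, "\n",
                 "BookmarkPageNumber: ", bookmarks[[bookmark]]$page, "\n")
      }
      temp.pdf <- tempfile(pattern=basename(pdf.file))
      temp.pdf.info <- tempfile(pattern=paste0(basename(pdf.file), "info_utf8"))
      cat(file=temp.pdf.info, pdf.bookmarks)
      # system2() does not quote its arguments, so quote the paths ourselves.
      system2("pdftk", c(shQuote(pdf.file), 'update_info_utf8', shQuote(temp.pdf.info),
                         'output', shQuote(temp.pdf)))
      # Copy rather than rename: tempdir() may sit on a different filesystem.
      if (file.exists(temp.pdf) &&
          file.copy(temp.pdf, pdf.file, overwrite=TRUE)) {
        unlink(c(temp.pdf, temp.pdf.info))
      } else {
        warning("unable to properly create bookmarks")
      }
    }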
and can be used like so:
    # CairoPDF() is the Cairo package's PDF device, which Cairo.onSave() needs.
    CairoPDF(file="testing.pdf")
    bookmarks <- list()
    bookmarks <- save.bookmark("First plot", bookmarks)
    plot(1:5, 6:10)
    bookmarks <- save.bookmark("Second plot", bookmarks)
    plot(6:10, 1:5)
    dev.off()
    write.bookmarks("testing.pdf", bookmarks)
Et voilà: bookmarks and a table of contents for PDFs.
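The example leaves level at its default of 1. pdftk nests a bookmark under the most recent bookmark of a lower level, so a two-level outline should only need different level values. The sketch below is device-free so it can run without Cairo or pdftk: add.bookmark and bookmarks.to.pdftk are hypothetical stand-ins (with explicit page numbers) for the functions above, and the titles are invented.

```r
# Hypothetical stand-in for save.bookmark(): the page is passed explicitly,
# so no graphics device or onSave hook is involved.
add.bookmark <- function(bookmarks, text, level, page) {
  c(bookmarks, list(list(text = text, level = level, page = page)))
}

# Render a bookmark list in the record format read by update_info_utf8.
bookmarks.to.pdftk <- function(bookmarks) {
  records <- vapply(bookmarks, function(b) {
    paste0("BookmarkBegin\n",
           "BookmarkTitle: ", b$text, "\n",
           "BookmarkLevel: ", b$level, "\n",
           "BookmarkPageNumber: ", b$page, "\n")
  }, character(1))
  paste(records, collapse = "")
}

# Two sections (level 1), the first with two sub-entries (level 2).
bookmarks <- list()
bookmarks <- add.bookmark(bookmarks, "Quality control", level = 1, page = 1)
bookmarks <- add.bookmark(bookmarks, "Read depth",      level = 2, page = 1)
bookmarks <- add.bookmark(bookmarks, "Duplication",     level = 2, page = 2)
bookmarks <- add.bookmark(bookmarks, "Results",         level = 1, page = 3)

outline <- bookmarks.to.pdftk(bookmarks)
cat(outline)
```

A section bookmark and its first sub-entry can point at the same page, as "Quality control" and "Read depth" do here.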
This basic methodology can be extended to any language which writes PDFs and does not have a built-in method for generating a table of contents. Currently, the usage of Cairo.onSave is a horrible hack, and may conflict with anything else which uses the onSave hook, but hopefully R will report the current page number from Cairo in the future.
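Until then, one possible way to avoid the onSave hook is to count pages yourself. This is not what the code above does. It is a hypothetical alternative (make.bookmarker and its new.page and get members are invented names), and it is only correct under the assumption that every bookmarked call starts exactly one new page and that no pages are produced in between.

```r
# Hypothetical alternative to Cairo.onSave: keep the page counter and the
# bookmark list inside a closure instead of in global variables.
make.bookmarker <- function() {
  page <- 0
  bookmarks <- list()
  list(
    # Call immediately before each plotting call that opens a new page.
    new.page = function(text, level = 1) {
      page <<- page + 1
      bookmarks[[length(bookmarks) + 1]] <<-
        list(text = text, level = level, page = page)
      invisible(page)
    },
    # Return the bookmarks collected so far.
    get = function() bookmarks
  )
}

bm <- make.bookmarker()
bm$new.page("First plot")    # followed by: plot(1:5, 6:10)
bm$new.page("Second plot")   # followed by: plot(6:10, 1:5)
str(bm$get())
```

Because each bookmarker carries its own state, a second PDF in the same session would simply get a fresh make.bookmarker(), and the resulting list has the same shape as the one write.bookmarks expects. The price is that plots spanning several pages, or pages drawn without a new.page call, throw the count off.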