While setting up my personal page with Hakyll, I discovered that there
is no simple copy-pastable solution for generating a list of
publications from a .bib file. In this post I show a way to manually
transform the contents of a bibliography file into nicely formatted
Markdown. You can copy my code and quickly adapt it to your needs. The
final result is available on GitHub.
When it comes to references, Pandoc does have built-in citation
processing machinery, and it can be used with Hakyll to cite works in
blog posts (see, for example, this guide). As for obtaining a plain
publication list, the main suggestion seems to be to use an otherwise
empty file containing a \nocite{*} command to make Pandoc list the
references.
However, I have found this method very hard to customize. My basic requirements are reverse chronological sorting of references and no “unique names”, which means that repeated author combinations should be written out in full. With biblatex, this can be solved with a combination of settings:
\usepackage[backend=biber, style=numeric, sorting=ydnt, firstinits=true, uniquename=false]{biblatex}
With Pandoc, this problem seems to require a manual approach. I will show the basic code to process a personal bibliography.
Setup
Make sure that your project includes the following dependencies from
Hackage: text, pandoc, parsec, bibtex. The bibtex package provides
unsophisticated parsers of .bib files, which is exactly what we need.
Code
Let’s begin with some necessary imports:
module Bib (publicationList) where
import Control.Applicative ( (<|>) )
import Data.List ( sortOn )
import qualified Data.Text as T
-- contains the type of a bibtex entry
import Text.BibTeX.Entry ( T(..) )
-- contains bibtex parsers
import qualified Text.BibTeX.Parse as P
import Text.Parsec.String ( parseFromFile )
import Text.Pandoc ( runPure, readLaTeX, writeMarkdown, def )
Now, let’s write the main logic of the module: the function to read the file, run the parser, and call a proper formatter on each entry.
publicationList :: FilePath -> IO [String]
publicationList filename = do
  -- read and parse bibtex entries
  pubs <- readPubs filename
  -- descending sort by year, then format and return
  return $ map format $ reverse $ sortOn (`field` "year") pubs

-- runs a parser P.file on a given filename
-- panics when there is a parsing error
readPubs :: FilePath -> IO [T]
readPubs filename = do
  res <- parseFromFile P.file filename
  case res of
    (Left e) ->
      error $ "Parsec error in parsing .bib file "
        <> filename <> ":\n" <> show e
    (Right pubs) -> return pubs

-- reads the entry type
-- and then calls the corresponding formatter
format :: T -> String
format pub = case entryType pub of
  "article" -> formatArticle pub
  "misc" -> formatMisc pub
  "unpublished" -> formatUnpublished pub
  fmt -> error $ "unsupported .bib entry format: " <> fmt
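A note on the sorting step: sortOn is stable, so applying reverse afterwards also reverses the order of entries that share a year. If that matters, one alternative is to sort on Down instead. The sketch below is self-contained and uses a hypothetical Pub pair in place of the real entry type; it also assumes that the year field holds a plain integer, which may not be true for every bibliography.

```haskell
import Data.List (sortOn)
import Data.Ord (Down (..))
import Text.Read (readMaybe)

-- Hypothetical stand-in for a parsed entry: (identifier, year field).
type Pub = (String, String)

-- Sort newest first. Entries whose year does not parse as an Int go last.
-- sortOn is stable, so entries from the same year keep their file order.
newestFirst :: [Pub] -> [Pub]
newestFirst = sortOn (Down . yearOf)
  where
    yearOf :: Pub -> Maybe Int
    yearOf (_, y) = readMaybe y
```

With the real entry type, the same idea would amount to replacing yearOf with a lookup of the "year" field.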
Finally, let’s write some formatters! For my page, I have settled on roughly the following format: the authors come first, then the year, then the title in quotation marks, and then, optionally, journal information and a URL. Of course, this is mostly arbitrary.
We need several convenience functions first. The fields of the entry
are parsed into an association list, so we can adapt lookup
to get their values.
-- for mandatory fields:
-- throw an error if not present
field :: T -> String -> String
field pub a =
  case lookup a (fields pub) of
    Nothing ->
      error $ "bibliography error: cannot find field "
        <> a <> " in entry " <> identifier pub
    Just v -> v

-- for optional fields
maybeField :: T -> String -> Maybe String
maybeField pub a = lookup a (fields pub)
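To see the two access patterns in isolation, here is a small sketch on a hand-written association list. The names sampleFields, optionalField and requiredField are made up for this illustration and the field values are placeholders. Note that lookup compares keys exactly, so a field stored as "Author" would not be found under "author".

```haskell
-- Hypothetical fields of one entry, in the shape of an association list.
sampleFields :: [(String, String)]
sampleFields =
  [ ("author", "A. Author")
  , ("year", "2021")
  , ("title", "An Example Title")
  ]

-- Optional access: Nothing when the key is absent.
optionalField :: String -> Maybe String
optionalField key = lookup key sampleFields

-- Mandatory access: fail loudly, naming the missing key.
requiredField :: String -> String
requiredField key =
  case lookup key sampleFields of
    Nothing -> error ("missing field: " <> key)
    Just v -> v
```

Here optionalField "doi" is Nothing, while requiredField "doi" would stop the build with an error, which is the behaviour I want for fields that every entry must have.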
Then, some extra embellishment functions. The most important is
texToMarkdown, which converts any LaTeX syntax to Markdown by passing
the string through Pandoc. I use it to render the “note” field for some
entries, but it can potentially be used for any field to get rid of
diacritics or other syntax.
texToMarkdown :: String -> String
texToMarkdown s =
  let result = runPure $ do
        x <- readLaTeX def (T.pack s)
        writeMarkdown def x
   in case result of
        Left e ->
          error $ "error reading latex commands in the string "
            <> show s <> ":\n" <> show e
        (Right s') -> T.unpack (T.strip s') <> ". "

-- wraps a DOI into a Markdown link
makelink :: String -> String
makelink s = "[" <> s <> "](https://doi.org/" <> s <> "). "

italicize :: String -> String
italicize s = "_" <> s <> "._ "

-- renders a missing optional fragment as an empty string
maybeToStr :: Maybe String -> String
maybeToStr (Just s) = s
maybeToStr Nothing = ""
Finally, here is how we can implement the formatters:
formatArticle :: T -> String
formatArticle pub = field pub "author"
  <> ". ("
  <> field pub "year"
  <> ") \""
  <> field pub "title"
  <> ".\" "
  <> italicize (field pub "journal")
  <> maybeToStr (do
       vol <- maybeField pub "volume"
       pages <- maybeField pub "pages"
       return $ vol <> ": " <> pages <> ". ")
  <> makelink (field pub "doi")

formatMisc :: T -> String
formatMisc pub = field pub "author"
  <> ". ("
  <> field pub "year"
  <> ") \""
  <> field pub "title"
  <> ".\" "
  <> maybeToStr (fmap italicize (maybeField pub "journal" <|>
                                 maybeField pub "publisher"))
  <> makelink (field pub "doi")

formatUnpublished :: T -> String
formatUnpublished pub = field pub "author"
  <> ". ("
  <> field pub "year"
  <> ") \""
  <> field pub "title"
  <> ".\" "
  <> maybeToStr (fmap texToMarkdown (maybeField pub "note"))
  <> maybeToStr (fmap makelink (maybeField pub "doi"))
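To make the output format concrete, here is a simplified, self-contained sketch of the article layout. It works on a plain association list instead of a parsed entry, skips the optional volume and pages, and uses placeholder values throughout (the DOI below is not a real one). The names sample, get and rendered are hypothetical.

```haskell
import Data.Maybe (fromMaybe)

-- Hypothetical entry fields; not taken from a real bibliography.
sample :: [(String, String)]
sample =
  [ ("author", "A. Author and B. Author")
  , ("year", "2021")
  , ("title", "An Example Title")
  , ("journal", "Journal of Examples")
  , ("doi", "10.0000/example")
  ]

-- Look up a field, falling back to an empty string in this sketch.
get :: String -> String
get key = fromMaybe "" (lookup key sample)

-- Same layout as the article formatter, minus volume and pages.
rendered :: String
rendered =
  get "author" <> ". (" <> get "year" <> ") \"" <> get "title" <> ".\" "
    <> "_" <> get "journal" <> "._ "
    <> "[" <> get "doi" <> "](https://doi.org/" <> get "doi" <> "). "
```

The resulting Markdown string is: A. Author and B. Author. (2021) "An Example Title." _Journal of Examples._ [10.0000/example](https://doi.org/10.0000/example). Note that the author field is printed as it appears in the .bib file, including the "and" separators.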
I have tried to deal with optional fields in a “monadic” way, by using
fmap and <|> operators, and I think it looks acceptable in the end.
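Stripped of the bibliography details, the handling of optional fields comes down to three small patterns on Maybe. The sketch below isolates them; the function names volumePages, venue and render are made up for this illustration, and render plays the role of maybeToStr (it is just fromMaybe "" from Data.Maybe).

```haskell
import Control.Applicative ((<|>))
import Data.Maybe (fromMaybe)

-- Both parts must be present, otherwise the whole fragment is dropped.
volumePages :: Maybe String -> Maybe String -> Maybe String
volumePages mvol mpages = do
  vol <- mvol
  pages <- mpages
  return (vol <> ": " <> pages <> ". ")

-- Take the first value that is present, preferring the journal.
venue :: Maybe String -> Maybe String -> Maybe String
venue journal publisher = journal <|> publisher

-- Collapse a missing fragment to the empty string.
render :: Maybe String -> String
render = fromMaybe ""
```

For example, render (volumePages (Just "12") Nothing) is the empty string, so an entry with a volume but no page range simply prints neither.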
Use it yourself
The source file for this post is available here. The result of running this script can be seen on my homepage (source).
To use it with Hakyll, you can use a listField to pass a list of
references into a template. In the main file, you can do something like
this:
main = do
  bibliography <- publicationList "pubs.bib"
  hakyllWith config $ do
    ...
    match "index.md" $ do
      route $ setExtension "html"
      compile $ do
        ...
        let bibliography' = mapM makeItem bibliography
        let bibCtx = field "pub" (return . itemBody)
        let ctx =
              listField "bibliography" bibCtx bibliography' `mappend`
              ...
        ...
        getResourceBody
          >>= applyAsTemplate indexCtx
          >>= renderPandoc
          ...
          >>= loadAndApplyTemplate "templates/default.html" ctx
          >>= relativizeUrls
Please reach out to me with comments and suggestions!