Philip Eichinski



The 2016 IEEE International Conference on eScience brought researchers from around the world to Baltimore to share work on how computer science is being used to support and enhance traditional science.

On the agenda were topics such as big data, parallel computing, automation of computational workflows, reproducibility and provenance, machine learning applications, and remote sensing of environmental data. A nice addition to this year’s conference was having authors supplement their presentations with poster sessions, which encouraged longer discussions about the research than were possible during the question time after each presentation.

Particularly interesting for those of us working in remote bird monitoring was the awesome work by Travis Desell and his students Marshall Mattingly and Connor Bowley. This team has deployed many video cameras to monitor various species of birds, and is using a combination of citizen science, crowdsourced computation, machine learning, and image processing to turn these raw images into ecologically meaningful data.

The keynote talks were all impressive. Tony Hey gave an engrossing keynote on the convergence of data-intensive and compute-intensive infrastructure, in particular the work being done at the Rutherford Appleton Lab of the UK Science and Technology Facilities Council, including using the Diamond synchrotron for imaging all sorts of things from fossils to molecules. And of course he talked about the computing resources and expertise required to handle and analyse the large amounts of data generated by these kinds of experiments. Anastasia Ailamaki explained her team’s approach to the growing challenge of querying increasingly heterogeneous data. Finally, Jonathan Pevsner gave a mind-boggling crash course in genomics and how it is being used to understand disease, and notably gave an example of how discoveries can be made through publicly available data and outsourced cloud computing, which was a great illustration of data-intensive scientific discovery in action.

My presentation, "Datatrack: An R package for managing data in a multi-stage experimental workflow" (paper link:, source code:), was well received, and I got some great feedback from some very experienced people in the area of data provenance, especially Daniel Garijo from the University of Southern California, who worked through some ideas with me about making the provenance graph compatible with the PROV specification.

All in all, the conference was a very enjoyable and valuable experience. I also squeezed in a bit of sightseeing, most of which was at various museums such as the National Gallery, the Smithsonian National Air and Space Museum, the Museum of Natural History, and the wonderful Baltimore aquarium.

