[SE04-HUM] Data science and machine learning for development and humanitarian action

DAY 1 – Wednesday 27 June – 1:30pm-3:00pm
Swiss Tech | Room 1A | Level Garden

Session Leaders

Robert West
DLAB, EPFL, Switzerland

Bob West is a tenure-track assistant professor of Computer Science at EPFL, where he heads the Data Science Lab. His research aims to make sense of large amounts of data. Frequently, the data he analyzes is collected on the Web, e.g., using server logs, social media, wikis, online news, online games, etc. He distills large quantities of raw data into meaningful insights by developing and applying algorithms and techniques in areas including social and information network analysis, machine learning, computational social science, data mining, natural language processing, and human computation.

Skyler Speakman
IBM Research, Africa

Skyler Speakman is a Research Scientist at IBM Research — Africa. His projects use data science to impact the lives of millions of people on the continent. He believes that data collected through phones and drones will fundamentally change service delivery and African development in the next decade. Skyler completed a Ph.D. in Information Systems at Carnegie Mellon University as well as an M.S. in Machine Learning. He also holds master's degrees in Mathematics, Statistics, and Public Policy. He lives in Nairobi, Kenya with his wife and two young sons.



13:30-13:55: Opening keynote: Stefano Ermon (Stanford University): “Artificial Intelligence for Sustainability”

13:55-14:05: Poster presentation: Wesley van der Heijden (The Netherlands Red Cross): “Combining Open Data and Machine Learning to Predict Food Security in Ethiopia”

14:05-14:15: Poster presentation: Leonardo Milano (Internal Displacement Monitoring Centre): “Monitoring Migration and Internal Displacement: Filling the Data Gaps with Innovation”

14:15-14:40: Closing keynote: Maria De-Arteaga (Carnegie Mellon University): “Machine Learning for the Developing World”

14:40-15:00: Podium discussion with all 4 speakers, moderated by Skyler Speakman (IBM Nairobi)


It is widely anticipated that data science in general, and machine learning in particular, will revolutionize our society as a whole. Thanks to ever larger and more fine-grained datasets, as well as advances in computing hardware and learning algorithms, we are bound to see a whole new world of opportunities to bring about ground-breaking changes, which could expedite the development of low- and middle-income countries. This session will look into promising applications of data science in development and humanitarian action.


Monitoring Migration and Internal Displacement – Filling the Data Gaps with Innovation

Justin Ginnetti1, Leonardo Milano1

1Internal Displacement Monitoring Centre, Switzerland

Email address: justin.ginnetti@idmc.ch

Biography of Presenting Author: Justin Ginnetti is IDMC's Head of Data and Analysis. He joined IDMC in 2012 after having served as a policy officer at the UN Office for Disaster Risk Reduction (UNISDR). He served as a chapter scientist and contributing author of the IPCC's Special Report on Extreme Events and Disasters (SREX).

Justin holds a master’s degree in law and diplomacy from the Fletcher School at Tufts University, where he studied climate change-induced displacement and forced migration of agro-pastoralists in the Horn of Africa.


The global picture on internal displacement is currently incomplete, and the resulting underestimates pose a significant challenge for achieving progress toward several global policy targets. In September 2017, IDMC began using the Internal Displacement Event Tagging and Clustering Tool (IDETECT). IDETECT mines huge news datasets, such as The GDELT Project, the European Media Monitor, and social media platforms. Using natural language processing and machine learning algorithms, IDETECT classifies reports by type of displacement and extracts from those source documents information about location and the number of people displaced – in real time. IDMC has only just begun to use IDETECT and will exploit it to its fullest potential over the coming months and years.
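To make the extraction step concrete, the sketch below shows a deliberately simplified, hypothetical version of one subtask IDETECT performs: pulling a displacement figure out of a news snippet with a regular expression. IDETECT's actual pipeline uses trained NLP and machine learning models and is far more robust; the pattern and function names here are illustrative assumptions, not IDMC's implementation.

```python
import re
from typing import Optional

# Hypothetical pattern for phrases like "12,500 people displaced".
# A production system would use trained NER/relation-extraction models
# rather than a single regex.
FIGURE_PATTERN = re.compile(
    r"(?P<count>[\d,]+)\s+(?:people|persons|families)\s+"
    r"(?:displaced|evacuated|fled)",
    re.IGNORECASE,
)

def extract_displacement_count(text: str) -> Optional[int]:
    """Return the first displacement figure found in `text`, or None."""
    match = FIGURE_PATTERN.search(text)
    if match is None:
        return None
    return int(match.group("count").replace(",", ""))

snippet = "Floods in the region left 12,500 people displaced on Monday."
print(extract_displacement_count(snippet))  # 12500
```

Real news text is, of course, messier than this: figures are often ranges, estimates, or cumulative totals, which is precisely why IDETECT combines such extraction with classification and clustering across many sources.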

Combining Open Data and Machine Learning to Predict Food Security in Ethiopia

Wesley van der Heijden1,2,3, Marc van den Homberg1, Martijn Marijnis2, Marijke de Graaff2, Hennie Daniels3

1Netherlands Red Cross, The Netherlands; 2ICCO; 3Tilburg University, The Netherlands

Email address: mvandenhomberg@redcross.nl

Biography of Presenting Author: Applied researcher with a focus on improving preparedness and response to natural hazards through the smart use of technology and (big) data. For Practical Action, he investigates how technology can improve climate risk management for the poor and vulnerable. Previously, Marc founded TNO's ICT4D team, where he worked on the data-driven development of an early warning system to close the last-mile information gap in Bangladesh. He provided input to the Asia Regional Plan for Implementation of the Sendai Framework. Marc holds IFRC's disaster management certificate, an MBA, and a PhD in physics.


Food security is commonly measured by means of surveys, which require substantial time and budget. Open data could serve as a cost-effective alternative for predicting food security. This paper proposes a method that uses open data related to drivers of food insecurity to predict food security in Ethiopia at the subnational level. The method is based on an ordinal classification approach with a random forest as the underlying algorithm. The model achieved an accuracy of approximately 90%. Although the ordinal approach increases overall performance, a negative side effect is that the model struggled to predict records labeled 'stressed'. This effect lies in how probabilities are calculated for classes ranked in the middle of the ordinal scale. Further research on adding open data sources for other drivers and on fine-tuning the model's hyperparameters is advised before implementing machine learning to predict food security.
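The difficulty with middle-ranked classes can be illustrated with a standard ordinal decomposition (in the style of Frank & Hall, which the paper may or may not use exactly): K ordered classes are handled by K−1 binary classifiers estimating P(y > k), and each class probability is the difference of two adjacent estimates. The class names and probability values below are purely illustrative.

```python
from typing import List

def ordinal_class_probs(p_greater: List[float]) -> List[float]:
    """Convert K-1 binary estimates P(y > k) into K class probabilities.

    p_greater[k] is the estimated probability that the true label exceeds
    class k, for ordered classes k = 0 .. K-2.
    """
    # Pad with the trivial boundaries P(y > -1) = 1 and P(y > K-1) = 0,
    # then take successive differences: P(y = k) = P(y > k-1) - P(y > k).
    boundaries = [1.0] + list(p_greater) + [0.0]
    probs = [boundaries[k] - boundaries[k + 1]
             for k in range(len(boundaries) - 1)]
    # Independently trained binary classifiers can disagree and produce
    # small negative differences; clip them to zero.
    return [max(p, 0.0) for p in probs]

# Three ordered classes, e.g. 'minimal' < 'stressed' < 'crisis':
# P(y > minimal) = 0.75, P(y > stressed) = 0.625
print(ordinal_class_probs([0.75, 0.625]))  # [0.25, 0.125, 0.625]
```

Because a middle class like 'stressed' is the *difference* between two boundary estimates, it can only receive high probability when both binary classifiers agree closely, which is one plausible reading of why such records were hard to predict.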

Artificial Intelligence for Sustainability

Stefano Ermon

Stanford University

Email address: ermon@cs.stanford.edu

Biography: Stefano Ermon is an Assistant Professor of Computer Science in the CS Department at Stanford University, where he is affiliated with the Artificial Intelligence Laboratory, and a fellow of the Woods Institute for the Environment. His research is centered on techniques for probabilistic modeling of data, inference, and optimization, and is motivated by a range of applications, in particular ones in the emerging field of computational sustainability. He has won several awards, including four Best Paper Awards (AAAI, UAI and CP), a NSF Career Award, an ONR Young Investigator Award, a Sony Faculty Innovation Award, an AWS Machine Learning Award, a Hellman Faculty Fellowship, and the IJCAI Computers and Thought Award. Stefano earned his Ph.D. in Computer Science at Cornell University in 2015.


Recent technological developments are creating new spatio-temporal data streams that contain a wealth of information relevant to sustainable development goals. Modern AI techniques have the potential to yield accurate, inexpensive, and highly scalable models to inform research and policy. A key challenge, however, is the lack of the large quantities of labeled data that often characterize successful machine learning applications. In this talk, I will present new approaches for learning useful spatio-temporal models in contexts where labeled training data is scarce or not available at all. I will show applications to predict and map poverty in developing countries, monitor agricultural productivity and food security outcomes, and map infrastructure access in Africa. Our methods can reliably predict economic well-being using only high-resolution satellite imagery. Because images are passively collected in every corner of the world, our methods can provide timely and accurate measurements in a scalable and economical way, and could revolutionize efforts towards global poverty eradication.

Machine Learning for the Developing World

Maria De-Arteaga

Carnegie Mellon University

Email address: mdeartea@andrew.cmu.edu

Biography: Maria De-Arteaga is a fourth year PhD candidate in the joint Machine Learning and Public Policy program at Carnegie Mellon University, where she is advised by Artur Dubrawski. She is passionate about creating novel machine learning algorithms that are motivated by existing societal challenges, improving fairness qualities of machine learning systems, and understanding how machine learning can better help us achieve global development goals. During her PhD, Maria has applied her work to counter-human trafficking and sexual violence initiatives, as well as various projects in the healthcare domain. In 2017, she co-organized the NIPS Workshop on Machine Learning for the Developing World.


Researchers from across the social and computer sciences are increasingly using machine learning to study and address global development challenges. In this talk, I examine the burgeoning field of machine learning for the developing world (ML4D). I present best practices drawn from the literature for ensuring that ML4D projects are relevant to the advancement of development objectives, followed by a discussion of how developing-world challenges can motivate the design of novel machine learning methodologies. The talk proposes a roadmap for bridging development gaps through ML, accompanied by examples from the literature. It also discusses how the technical complications of ML4D can be treated as novel research questions, how ML4D can motivate new research directions, and where machine learning can be most useful.