2018 FALL - Data-readiness in a World of AI

DATE, LOCATION & HOST

The 2018 fall PRISME Forum Tech Meeting was held Thursday, the 15th of November, 2018, and hosted by Takeda Pharmaceuticals at 1 Takeda Pkwy, Deerfield, IL 60015.

PRISME Forum Technical Meeting Chair: Christian Baber, Shire

PRISME Forum Chair: Dan Chapman, UCB

Fall Business Meeting of the PRISME Forum 2018

Tuesday and Wednesday, the 13th and 14th of November, 2018

Meeting hosted by Takeda Pharmaceuticals

Data-readiness in a World of AI

One of the key points of discussion at the last two PRISME Forum Technical Meetings on the topic of AI was that the limiting factor for AI/ML is not computing power, nor indeed algorithms, but the availability of high-quality, fit-for-purpose structured data sets labeled with both appropriate metadata and endpoints. The scarcity of data for training machine learning models is a fundamental feature of AI in the life science industry. Living systems are complex and noisy and as such require a significant amount of data to model accurately. While substantial amounts of in vitro experimental data exist, in vivo data is much more difficult to collect and, in the case of human data, use is limited by informed consent, privacy regulations and ethical considerations.

The idea that ‘data is more important than algorithms’ has been gaining support since 2001, when Banko and Brill published their paper “Scaling to Very Very Large Corpora for Natural Language Disambiguation”ⁱ, which demonstrated that several very different machine learning algorithms performed almost identically well on the complex problem of natural language disambiguation once they were given enough data. The idea was taken up more recently in the 2009 article “The Unreasonable Effectiveness of Data”ⁱⁱ by Peter Norvig et al., which showed (Figure 1) that it can be relatively easy to reach around 50% accuracy using a variety of algorithms but to improve further, the need for data grows logarithmically.

For AI to be effective, a sufficient amount of high-quality data needs to be readily available. The biopharmaceutical and healthcare industry as a whole has a great deal of data. However, this data is rarely in a form that can be used to train AI/ML methods without substantial cleanup and labeling with metadata and endpoints. Additionally, the data is generally widely dispersed, both within individual companies and between companies. This makes it difficult to gain access to the data and, given the diversity of data formats, to read and understand it. Any individual biopharmaceutical company self-evidently has less data than the industry as a whole on which to train AI/ML systems to produce robust and generalizable results. If there were cross-company collaboration to merge data sets, then much larger, more diverse and more effective training data sets could be made available. Despite this, the industry is cautious about sharing its data, not least because companies fear they will compromise or lose their IP. Alternatives that address the issue include methods that mitigate data shortage and overfitting, such as transfer learning, multi-task learning and the generation of synthetic data.

This PRISME Forum Technical Meeting will set out to explore opportunities for the biopharmaceutical industry to improve timely access to sufficient, high-quality data on which AI systems can be trained (both within and beyond individual companies) and to make the best use of the available data in the age of AI. A focus will be on practical examples that have been implemented at pharmaceutical companies, along with efforts that were attempted but failed and the associated lessons learned. Topics addressed include:

The implementation and use of the FAIR data principles (Findable, Accessible, Interoperable, Reusable)ⁱⁱⁱ in industry.
Current tools and methods for metadata capture, end-state labeling and automated data preparation, both at the point of creation and at the time of use.
Practical storage, management and access to data from every stage of the R&D process and examples of data re-use & models constructed with data federated across multiple domains.
Examples of the use of methods such as transfer learning to reduce the amount of directly relevant data required to build models for specific tasks.
Methods that would allow companies to share their data, including the use of “guest-algorithms” that can train on data sets without exposing the IP.

The PRISME Forum Technical Meeting Advisory Committee:

Christian Baber (Chair), Head of R&D IT, Shire
Nick Brown, Head of Technology Incubation Lab, AstraZeneca
Dan Chapman, Head of IT New Med. Information Management, UCB
David Christie, Vice President, Enterprise Applications Group, CSL Behring
Lars Greiffenberg, Director – R&D IT and Translational Informatics, Abbvie
Carol Rohl, Executive Director, Scientific Information Management, Merck
Martin Romacker, Principal Scientist – Data and Information Architecture, Roche
Nico Stanculescu, Logistics, PRISME Forum
Jason Tetrault, Global Head Data Engineering and Emerging Technologies, Takeda
Jianchao (JC) Yao, Associate Principal Scientist, Merck

ⁱ https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/acl2001.pdf
ⁱⁱ https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/35179.pdf
ⁱⁱⁱ https://www.nature.com/articles/sdata201618

PRISME Forum Fall 2018 Tech Mtg Book Ver 11-14

Fall 2018 Tech Slides - 11-09

Baber

PRISME_Plenary_Google_Brain

Romacker

Musen PRISME

2018.11.15_PathAI_PRISME

Jason

Outlier - PRISME 2018

HealthVerity November 2018 FINAL

Riffyn Merck PRISME poster 20181115 v3

HOTEL

The hotel for this meeting is the Hyatt Regency Deerfield, located at 1750 Lake Cook Rd, Deerfield, IL 60015. The discounted room rate is $169 per night plus tax.

Rates are valid ONLY through October 13, 2018.

Reservations can be made online at https://book.passkey.com/go/PRISMEForum

When reserving a room, please remember to use “PRISME” for the above rate and appropriate allocation to our room block.

DISTANCE TO MEETING VENUE FROM THE AIRPORT

O’Hare International Airport is a 20-minute ride (14.4 mi/24 km) from the meeting venue or conference hotel.

CAR SERVICES

Uber and Lyft remain reliable sources for the transfer between O’Hare and the meeting venue/hotel.

Additional car services will be posted soon!

MEETING AND SOCIAL EVENT VENUE TRANSFERS

Morning and afternoon transfers will be offered between the hotel, the meeting venue and the social/networking events (per program outline).