By some time next year, predicts Raghu Ramakrishnan, there will be more than 20 billion connected electronic devices in the world.
“The amount of data exchanged by sensors will be more than the amount exchanged by human beings,” said the CTO for Data and a Technical Fellow at Microsoft, speaking at the third annual Rice Data Science Conference. The Oct. 14-15 event was hosted by the Ken Kennedy Institute for Information Technology and drew some 340 leaders from industry and academia.
“Data science is a field in which scientists work with domain experts to address complex problems beyond what was thought possible even five years ago,” said Lydia Kavraki, director of the Ken Kennedy Institute and Noah Harding Professor of Computer Science at Rice.
“Houston, with its universities, the Texas Medical Center, NASA, the energy industry and its rapidly evolving entrepreneurial landscape, is the ideal place for such a conference.”
Speaking on “Data in the Cloud,” Ramakrishnan said: “Soon we will see kiosks that will bring the power of the Cloud to Mom and Pop stores. Data emanates from our tools, transportation, homes, even our bodies. Everything in the world is becoming observable.”
Dr. Joshua C. Denny, professor of biomedical informatics and medicine at Vanderbilt University, leads the Data and Research Center of the All of Us program, administered by the National Institutes of Health. Created in 2015 by President Barack Obama, All of Us hopes to collect genetic and health data from one million volunteers in the United States. The goal is to tailor medical care to individual patients.
“This hasn’t been robustly attempted before. It’s an ambitious project. We have already collected biospecimens from more than 276,000 participants, at a rate of about 3,100 a week. More than 80 percent of them are from groups that have been historically underrepresented in biomedical research,” Denny said.
All of Us currently enrolls participants who are at least 18 years old at 350 recruitment sites around the country. In early 2020, the data collected from volunteers will start becoming available to researchers. “We intend to make precision medicine a reality,” Denny said.
“We strive for real-time access to knowledge, a real-time continuous platform devoted to individual patients. We want all the data of what you are,” said Dr. David Jaffray, who this year became the first-ever chief technology and digital officer at the University of Texas MD Anderson Cancer Center in Houston. Jaffray also holds a faculty appointment as professor of radiation physics and a joint appointment in imaging physics.
Jaffray said he joined MD Anderson to oversee the design and implementation of a new technology infrastructure that will “safeguard the integrity and availability of the institution’s systems and intellectual property assets,” with machine learning at its core.
“A rich semantics will allow machines to understand meaning. There’s a growing realization that we need an alternative approach to accelerate progress against cancer. We need to move beyond the promise of Big Data and focus on building the machinery to collect, curate, analyze and learn continuously,” Jaffray said.
GQ Zhang, vice president and chief data scientist at UTHealth in Houston, participates in the National Sleep Research Resource and the Center for Sudden Unexpected Death in Epilepsy Research, the largest and most comprehensive clinical data sets in those fields.
“There has not been enough innovation in our efforts to collect large data sets in medicine. There has not been enough disruption. That is beginning to change. It must change,” said Zhang, who displayed a seemingly random table of numbers and asked the audience if it constituted data.
“These are numbers, not data,” he said. As he added more context, the numbers emerged as a chart illustrating correlations of height and weight, which physicians use to calculate the BMI (body-mass index).
“We start with data, which can become information, which can become knowledge, which can become wisdom,” Zhang said.
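Zhang's point about context can be illustrated with the BMI calculation he alluded to. The sketch below uses the standard definition of body-mass index (weight in kilograms divided by the square of height in meters). The function name and the sample values are hypothetical and were not part of his presentation.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body-mass index: weight in kilograms divided by height in meters squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


# Hypothetical (height in meters, weight in kilograms) pairs, not conference data.
# Without the units and column meanings, these would be "numbers, not data."
rows = [(1.60, 55.0), (1.75, 70.0), (1.90, 95.0)]

for height_m, weight_kg in rows:
    print(f"{height_m:.2f} m, {weight_kg:.1f} kg -> BMI {bmi(weight_kg, height_m):.1f}")
```

Once the columns are labeled and the units are known, the same bare numbers become something a physician can act on.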
Genevera Allen, associate professor of electrical and computer engineering and founder and faculty director of the Center for Transforming Data to Knowledge (D2K Lab), and one of her students, Cara Tan, a senior in computer science and statistics, described how the D2K Lab creates experiential learning opportunities for students, who work on real data provided by industry, not-for-profits and internal Rice projects. “We are preparing students who can transform data to knowledge,” Allen said.
In the D2K Lab’s consulting clinic, students work directly with companies, academic labs, government agencies and nonprofits, helping them to translate raw data into actionable ideas. Earlier this year, the faculty senate at Rice adopted data science as an interdisciplinary minor.
Eitan Anzenberg is director of data science at Bill.com, a cloud-based payment management platform based in Palo Alto, Calif. The company has more than 3 million clients, reads 60 million bills and invoices each year, and annually processes $70 billion in payments.
“Our company reads bills, even hand-written bills that are messy and difficult to interpret. Using machine learning, we find patterns in documents. Rather than spending a month figuring out an unsupervised machine-learning problem, we use a character-level classification system that is probabilistic, with a high confidence score per character,” Anzenberg said.
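The general idea of a probabilistic, character-level classifier with a confidence score per character can be sketched as follows. This is an illustration only and does not represent Bill.com's system: the function name, the 0.9 threshold and the sample scores are all assumptions made for the example.

```python
from typing import Dict, List, Tuple


def decode_characters(
    char_probs: List[Dict[str, float]], threshold: float = 0.9
) -> Tuple[str, List[int]]:
    """Pick the most probable character at each position and flag weak ones.

    char_probs holds one {candidate character: probability} mapping per
    position, as a hypothetical classifier might emit. Returns the decoded
    text and the positions whose top probability is below the threshold.
    """
    text = []
    uncertain = []
    for position, probs in enumerate(char_probs):
        best_char, confidence = max(probs.items(), key=lambda item: item[1])
        text.append(best_char)
        if confidence < threshold:
            uncertain.append(position)
    return "".join(text), uncertain


# Hypothetical scores for a handwritten amount; the last digit is ambiguous.
scores = [
    {"$": 0.99, "S": 0.01},
    {"4": 0.97, "9": 0.03},
    {"2": 0.55, "7": 0.45},
]

text, uncertain = decode_characters(scores)
print(text, uncertain)
```

Keeping a score for each character lets a system accept the confident parts of a messy document automatically and route only the doubtful positions to a person for review.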
Among the other plenary speakers were Kjersti Engan, professor of electrical engineering and computer science at the University of Stavanger in Norway, and Alena Crivello, a digital fluency adviser at Chevron, who earned her Ph.D. in statistics from Rice in 2006. The latter observed that every two days more biomedical data accumulates in the world than through all of human history prior to 2003. That volume grows at a rate of 50 percent annually.
The conference featured four plenary presentations, four parallel tracks (methods and algorithms, energy, healthcare, public good) and the presentation of more than 50 student posters.
In addition to Jan Odegard, executive director of the Ken Kennedy Institute, the conference organizing committee included Natalie Berestovsky, Occidental Petroleum; Keith Cooper, the L. John and Ann H. Doerr Professor in Computational Engineering at Rice; Alena Crivello, Chevron; Max Grossman, NAG; Giewee Hammond, Aramco Services; Roy Keyes, Houston Data Science Group; Scott Ferguson, HEDS Meetup; Scott Morton, Rice; Lilia Reddy, Chevron; Craig Rusin, Baylor College of Medicine; Julianna Toms, BP; Jim Ward, Two Sigma; and Yan Xu, Houston Machine Learning Meetup.