Data scientists call for greater representation in field

Thursday, Feb. 6, students at the College of William and Mary gathered in the Integrated Science Center for a talk by Emory University professor Lauren Klein, a scholar specializing in the intersection of identity and data science.

Klein was introduced by College professor and gender, sexuality and women’s studies program director Elizabeth Losh, who emphasized Klein’s role in explaining the implications data science holds for women and racial minorities. She then detailed Klein’s current research endeavors.

“She is currently at work on two major projects: Data by Design, which offers an interactive history of data visualization from the 18th century to the present, and Vectors of Freedom, which explores how quantitative methods can help to identify new actors and new pathways of influence in the archive of the abolitionist movement of the 19th-century United States,” Losh said.

Klein began her talk by providing a brief overview of her new book “Data Feminism.” The book, which has been open for peer review since November 2018, seeks to integrate feminist perspectives into data science. Klein explained that human, environmental and moral costs are often ignored throughout the data collection and analysis process, making data science remarkably similar to other extractive industries.

“In today’s world, data is power,” Klein said. “… Data is the new oil.”

According to Klein, ordinary individuals have become disempowered and overburdened by big data. Contemporary data science has developed along the same structural fault lines as other modern institutions, where certain groups are disproportionately likely to suffer from power imbalances and a lack of representation in the field.

“These power imbalances are nothing new … for women, for black people, for queer people, for trans people, immigrants, all sorts of marginalized groups … it’s just the same old oppression,” Klein said. “… Data science needs feminism, and intersectional feminism in particular, if we are to ever have hope of overturning these power imbalances.”

Working with her co-author Catherine D’Ignazio, Klein brainstormed the ways in which intersectional feminism could best be integrated into data science and gauged which principles were most important to include in their work. Their primary goal was to craft a feminist framework for data scientists, one that would ensure women’s perspectives were actively considered by scholars in the field. The authors were less interested in the reverse relationship, seeing little incentive to incorporate data science principles into feminism, and focused on the former instead.

Klein then transitioned into explaining data science’s current failures by detailing two news stories from the previous year. One described an effort by Amazon to build a machine-learning system capable of screening job applications and filtering out applicants before the interview stage. According to Klein, since the system was trained on the profiles of existing Amazon employees — who are disproportionately white and male — it was biased against women and minorities.

“The system identified features that would be grounds to rank someone lower in the applicant pool,” Klein said. “… For instance, having gone to a women’s college, because very few Amazon employees were women and therefore had not gone to a women’s college.”

According to Klein, the characteristics that the system prioritized were indicative of systemic bias, both in gender and socioeconomic status.

“… When they identified the features that did highly correlate with a successful screening score, it was being named Jared and playing high school lacrosse,” Klein said.

Klein also referenced facial recognition technology as a problematic application of data science, given its tendency to reliably recognize only white faces. Since these facial recognition systems emerged from predominantly white and male research teams, their successful use is limited to a narrow demographic of the global population.

She then illustrated the importance of context in understanding data by describing research conducted by D’Ignazio’s former students, who were interested in visualizing discrepancies in sexual assault reporting at two New England universities. Klein said that her colleague’s students uncovered an unexpectedly high number of sexual assault reports at Williams College, a small liberal arts institution in rural Massachusetts, and surprisingly low reporting rates at Boston University, an urban campus in downtown Boston.

Klein said this discovery demonstrated the vital role context plays in data science.

“Then they stopped to think about it some more, and they realized there were reasons why Williams College essentially had a higher number of reported instances of sexual assault and BU didn’t, and it all had to do with the context around the data collection,” Klein said.

Since various mechanisms discouraged students at Boston University from reporting sexual assault, the university appeared to have lower assault rates than Williams College, an inference that would be grossly incomplete.

“Not knowing the context surrounding the data can lead to vastly inaccurate and inappropriate conclusions,” Klein said.

Klein’s discussions on data visualization and the roles women and minorities can play in the field drew students of different backgrounds to the talk, including Gujie Shen ’23.

“I was always interested in feminism and I wanted to see how it could be combined with data science,” Shen said. “I wanted to hear what are some of the practical effects of digital humanities projects, and how it can be used, as she mentioned, to mobilize.”

News Editor Leslie Davis ’21 contributed reporting to this article.