Thanks for attending my talk at Holy Cross on November 19, 2024!

Link to slides (view only PDF): https://filedn.eu/lldOHjCIRMjfewo3JirFYqh/website-documents/Rothschild_Holy-Cross_Research-Talk_11-19-24.pdf

  • Slide 4:
  • Slide 5:
    • Gender Shades: Joy Buolamwini and Timnit Gebru. 2018. Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification. In Proceedings of the 1st Conference on Fairness, Accountability and Transparency, 77–91. https://proceedings.mlr.press/v81/buolamwini18a.html
  • Slide 6:
  • Slide 9:
    • ImageNet: J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li and L. Fei-Fei, ImageNet: A Large-Scale Hierarchical Image Database. IEEE Computer Vision and Pattern Recognition (CVPR), 2009.
  • Slide 10:
  • Slide 11:
    • Impact on development of computer vision field:
      • Image credit: Google Scholar snapshot
      • (Raji et al., 2021): Deborah Raji, Emily Denton, Emily M. Bender, Alex Hanna, and Amandalynne Paullada. 2021. AI and the Everything in the Whole Wide World Benchmark. Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1. https://datasets-benchmarks-proceedings.neurips.cc/paper/2021/hash/084b6fbb10729ed4da8c3d3f5a3ae7c9-Abstract-round2.html
    • Example of key benchmarking dataset:
      • Image credit: paper screenshot, citation immediately follows
      • (Northcutt et al., 2021): Curtis G. Northcutt, Anish Athalye, and Jonas Mueller. 2021. Pervasive Label Errors in Test Sets Destabilize Machine Learning Benchmarks. Retrieved November 14, 2022 from http://arxiv.org/abs/2103.14749
    • Assembled by crowdworkers:
      • Image credit: screenshot from Dr. Fei-Fei Li’s slides (Fei-Fei Li. 2010. crowdsourcing, benchmarking & other cool things. Retrieved from https://image-net.org/static_files/papers/ImageNet_2010.pdf)
      • (Tsipras et al., 2020): Dimitris Tsipras, Shibani Santurkar, Logan Engstrom, Andrew Ilyas, and Aleksander Madry. 2020. From ImageNet to Image Classification: Contextualizing Progress on Benchmarks. In Proceedings of the 37th International Conference on Machine Learning, 9625–9635. Retrieved September 1, 2022 from https://proceedings.mlr.press/v119/tsipras20a.html
  • Slide 12:
    • Images reproduced from https://labelerrors.com/
    • (Youbi Idrissi et al., 2022): Badr Youbi Idrissi, Diane Bouchacourt, Randall Balestriero, Ivan Evtimov, Caner Hazirbas, Nicolas Ballas, Pascal Vincent, Michal Drozdzal, David Lopez-Paz, and Mark Ibrahim. 2022. ImageNet-X: Understanding Model Mistakes with Factor of Variation Annotations. https://doi.org/10.48550/arXiv.2211.01866
    • (Vasudevan et al., 2022): Vijay Vasudevan, Benjamin Caine, Raphael Gontijo-Lopes, Sara Fridovich-Keil, and Rebecca Roelofs. 2024. When does dough become a bagel? analyzing the remaining mistakes on ImageNet. In Proceedings of the 36th International Conference on Neural Information Processing Systems (NIPS ’22), 6720–6734.
  • Slide 14:
  • Slide 15: see slide 14 for image credit
  • Slide 17:
    • Images from Time 100 People in AI list: The 100 Most Influential People in AI 2024. TIME. Retrieved November 17, 2024 from https://time.com/collection/time100-ai-2024/
    • (D’Alessandro et al., 2017): B. D’Alessandro, C. O’Neil, T. LaGatta, Conscientious classification: A data scientist’s guide to discrimination-aware classification, Big Data 5 (2017) 120–134.
  • Slide 17: see slide 6
  • Slide 18:
  • Slide 20:
    • Image credit: Tom Simonite. 2021. Google launches a new medical app—outside the United States. Ars Technica. Retrieved November 17, 2024 from https://www.wired.com/story/google-launches-medical-app-outside-us/
    • Citation: Lara Schenck, Dana Priest, Gabe Dubose, Zajerria Godfrey, Annabel Rothschild, Ben Rydal Shapiro, and Betsy DiSalvo. 2025. “A Window into Data Apprenticeship: Developing an Integrated Work-Training Curriculum for Novice Adults”. In SIGCSE TS 2025 (ACM Special Interest Group on Computer Science Education).
  • Slide 21:
    • AAL terminology explanation: *Nicholas Deas, Jessi Grieser, Shana Kleiner, Desmond Patton, Elsbeth Turcan, and Kathleen McKeown. 2023. Evaluation of African American Language Bias in Natural Language Generation. https://doi.org/10.48550/arXiv.2305.14291
    • Citation: Carl DiSalvo, Annabel Rothschild, Lara L. Schenck, Ben Shapiro, and Betsy DiSalvo. 2024. “When Workers Want to Say No: A View into Critical Consciousness and Workplace Democracy in Data Work”. Proc. ACM Hum.-Comput. Interact. 8, CSCW1, Article 156 (April 2024).
  • Slide 22:
  • Slide 24:
    • Citizen science citation: Ashley Boone, Annabel Rothschild, Xander Koo, Grace Pfohl, Alyssa Sheehan, Betsy DiSalvo, Christopher Le Dantec, and Carl DiSalvo. 2024. “Reimagining Meaningful Data Work through Citizen Science”. Proc. ACM Hum.-Comput. Interact. January 2024.
  • Slide 26:
    • Paper citation: Annabel Rothschild, Ding Wang, Niveditha Jayakumar, Lauren Wilcox, Carl DiSalvo and Betsy DiSalvo. 2024. “The Problems with Proxies: Making Data Work Visible through Requester Practices”. AIES (Conference on Artificial Intelligence, Ethics, and Society). https://ojs.aaai.org/index.php/AIES/article/view/31721
  • Slides 26 – 29: author’s own photos
  • Slide 32:
    • (Jacobs & Wallach, 2021): Jacobs, A. Z., & Wallach, H. (2021). Measurement and Fairness. Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 375–385. https://doi.org/10.1145/3442188.3445901
  • Slide 35:
    • (Trace & Hodges, 2024): Ciaran B. Trace and James A. Hodges. 2024. The Role of Paradata in Algorithmic Accountability. In Perspectives on Paradata: Research and Practice of Documenting Process Knowledge, Isto Huvila, Lisa Andersson and Olle Sköld (eds.). Springer International Publishing, Cham, 197–213. https://doi.org/10.1007/978-3-031-53946-6_11
    • (Gebru et al., 2021): Timnit Gebru, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman Vaughan, Hanna Wallach, Hal Daumé III, and Kate Crawford. 2021. Datasheets for datasets. Communications of the ACM 64, 12: 86–92. https://doi.org/10.1145/3458723
  • Slide 36:
    • Paper citation: Annabel Rothschild, Amanda Meng, Carl DiSalvo, Britney Johnson, Ben Rydal Shapiro, and Betsy DiSalvo. 2022. “Interrogating Data Work as a Community of Practice”. Proceedings of the ACM on Human-Computer Interaction 6, Article 307 (November 2022), 29.
    • Link to City of Atlanta heat sheet: https://drive.google.com/file/d/1XRTlCfyP1j2wyhvov3b_RCn5pULhDN2_/view
  • Slides 36 – 48: more information about Datum Fieldnotes (including trying out the tool for yourself): https://dataworkforce.gatech.edu/datum-fieldnotes/
  • Slide 39:
    • Nault, K., Ruhi, U., & Livvarcin, O. (n.d.). Exploring the Applications & Challenges of Data Analytics in Non-Profit Organizations. AMCIS 2020, Session 8.
    • Shapiro, S. J., & Oystrick, V. (2018). Three Steps Toward Sustainability: Spreadsheets as a Data-Analysis System for Non-Profit Organizations. Canadian Journal of Program Evaluation, 33(2), 247–257. https://doi.org/10.3138/cjpe.31157
    • Harmon, E., Bopp, C., & Voida, A. (2017). The Design Fictions of Philanthropic IT: Stuck Between an Imperfect Present and an Impossible Future. Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, 7015–7028. https://doi.org/10.1145/3025453.3025650
    • Benjamin, L. M., Voida, A., & Bopp, C. (2018). Policy fields, data systems, and the performance of nonprofit human service organizations. Human Service Organizations: Management, Leadership & Governance, 42(2), 185–204. https://doi.org/10.1080/23303131.2017.1422072
    • Voida, A., Harmon, E., & Al-Ani, B. (2011). Homebrew databases: Complexities of everyday information management in nonprofit organizations. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 915–924. https://doi.org/10.1145/1978942.1979078
    • Erete, S., Ryou, E., Smith, G., Fassett, K. M., & Duda, S. (2016). Storytelling with Data: Examining the Use of Data by Non-Profit Organizations. Proceedings of the 19th ACM Conference on Computer-Supported Cooperative Work & Social Computing, 1273–1283. https://doi.org/10.1145/2818048.2820068
    • Davies, T., & Frank, M. (2013). ‘There’s no such thing as raw data’. Exploring the socio-technical life of a government dataset. Proceedings of the 5th Annual ACM Web Science Conference. https://www.academia.edu/70581019/_There_s_no_such_thing_as_raw_data_Exploring_the_socio_technical_life_of_a_government_dataset
    • Darian, S., Chauhan, A., Marton, R., Ruppert, J., Anderson, K., Clune, R., Cupchak, M., Gannett, M., Holton, J., Kamas, E., Kibozi-Yocka, J., Mauro-Gallegos, D., Naylor, S., O’Malley, M., Patel, M., Sandberg, J., Siegler, T., Tate, R., Temtim, A., … Voida, A. (2023). Enacting Data Feminism in Advocacy Data Work. Proc. ACM Hum.-Comput. Interact., 7(CSCW1), 47:1-47:28. https://doi.org/10.1145/3579480
  • Slide 45:
    • Sands, A., Borgman, C. L., Wynholds, L., & Traweek, S. (2012). Follow the data: How astronomers use and reuse data. Proceedings of the American Society for Information Science and Technology, 49(1), 1–3. https://doi.org/10.1002/meet.14504901341
    • Timnit Gebru, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman Vaughan, Hanna Wallach, Hal Daumé III, and Kate Crawford. 2021. Datasheets for datasets. Communications of the ACM 64, 12: 86–92. https://doi.org/10.1145/3458723
  • Slide 51:
  • Slide 52:
    • DataWorks, Amazon Mechanical Turk, ChatGPT logos — copyright of respective organizations
    • Paper citation: Grace Kim, Annabel Rothschild, Carl DiSalvo, and Betsy DiSalvo. 2024. “What’s Your Stake in Sustainability of AI?: An Informed Insider’s Guide”. AIES (Conference on Artificial Intelligence, Ethics, and Society). https://ojs.aaai.org/index.php/AIES/issue/view/609