-
doi Lodestar: Supporting Rapid Prototyping of Data Science Workflows through Data-Driven Analysis Recommendations
Deepthi Raghunandan
Keeping abreast of current trends, technologies, and best practices in visualization and data analysis is becoming increasingly difficult, especially for fledgling data scientists. In this paper, we propose Lodestar, an interactive computational notebook that allows users to quickly explore and construct new data science workflows by selecting from a list of automated analysis recommendations. We derive our recommendations from directed graphs of known analysis states, with two input sources: one manually curated from online data science tutorials, and another extracted through semi-automatic analysis of a corpus of over 6,000 Jupyter notebooks. We validated Lodestar through three separate user studies: a formative evaluation in which novices learned data science with the tool, whose feedback guided improvements to Lodestar; a summative study with both new and returning participants from the formative evaluation to test the efficacy of those improvements; and an expert review in which professional data scientists assessed the utility of the different recommendations. Overall, our results suggest that both novice and professional users find Lodestar useful for rapidly creating data science workflows.
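The abstract describes deriving recommendations from directed graphs of known analysis states. The sketch below is a minimal, hypothetical illustration of that general idea, not Lodestar's actual implementation: workflows mined from tutorials or notebooks are recorded as weighted edges between analysis states, and recommendations are the most frequent successors of the current state. All state names and function names here are invented for the example.

```python
# Hypothetical sketch: a directed graph of analysis states, where edge weights
# count how often one state followed another in the source corpus.
from collections import defaultdict

transitions = defaultdict(lambda: defaultdict(int))

def add_workflow(states):
    """Record one observed workflow (e.g. mined from a tutorial or notebook)."""
    for current, nxt in zip(states, states[1:]):
        transitions[current][nxt] += 1

def recommend(current_state, k=3):
    """Return the k most frequent successor states for the current state."""
    successors = transitions[current_state]
    return sorted(successors, key=successors.get, reverse=True)[:k]

# Toy corpus: three short workflows feed the graph, then we ask what
# usually follows "drop_missing_values".
add_workflow(["load_csv", "drop_missing_values", "plot_histogram"])
add_workflow(["load_csv", "drop_missing_values", "plot_histogram"])
add_workflow(["load_csv", "drop_missing_values", "describe"])
print(recommend("drop_missing_values"))  # ['plot_histogram', 'describe']
```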
-
pdf Code Code Evolution: Understanding How People Change Data Science Notebooks Over Time
Sensemaking is the iterative process of identifying, extracting, and explaining insights from data, where each iteration is referred to as the “sensemaking loop.” However, little is known about how sensemaking behavior evolves between exploration and explanation during this process. This gap limits our ability to understand the full scope of sensemaking, which in turn inhibits the design of tools that support the process. We contribute the first mixed-method study to characterize how sensemaking evolves within computational notebooks. We study 2,574 Jupyter notebooks mined from GitHub by identifying data science notebooks that have undergone significant iteration, presenting a regression model that automatically characterizes sensemaking activity, and using this regression model to calculate and analyze shifts in activity across GitHub versions. Our results show that notebook authors engage in various sensemaking tasks over time, such as annotation, branching analysis, and documentation. We use our insights to recommend extensions to current notebook environments.
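The abstract mentions a regression model that scores sensemaking activity and is then applied across GitHub versions of a notebook. The sketch below is an illustrative stand-in, assuming simple notebook features (markdown-cell ratio, code-cell count, plot-call count) and a hand-assigned exploration-to-explanation target; the paper's actual features, labels, and model are not reproduced here.

```python
# Hedged sketch: score each notebook version on an exploration-vs-explanation
# axis with a regression model, then measure how the score shifts between
# consecutive GitHub versions. Features and labels are illustrative only.
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy training data: [markdown-cell ratio, code-cell count, plot-call count]
# per notebook version, with a hand-assigned "explanation" score as target.
X_train = np.array([
    [0.05, 40, 2],   # mostly code, few plots  -> exploratory
    [0.10, 35, 5],
    [0.40, 20, 8],   # heavy annotation/plots  -> explanatory
    [0.55, 15, 10],
])
y_train = np.array([0.1, 0.2, 0.7, 0.9])

model = LinearRegression().fit(X_train, y_train)

def activity_shift(versions):
    """Score each version and return the shift between consecutive scores."""
    scores = model.predict(np.array(versions))
    return np.diff(scores)

# Two successive versions of one notebook: annotation increases over time,
# so the shift should be positive (toward explanation).
print(activity_shift([[0.08, 38, 3], [0.35, 22, 7]]))
```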
-
pdf Lodestar: Supporting Independent Learning and Rapid Experimentation Through Data-Driven Analysis Recommendations
Deepthi Raghunandan
Keeping abreast of current trends, technologies, and best practices in visualization and data analysis is becoming increasingly difficult, especially for fledgling data scientists. In this paper, we propose Lodestar, an interactive computational notebook that allows users to quickly explore and construct new data science workflows by selecting from a list of automated analysis recommendations. We derive our recommendations from directed graphs of known analysis states, with two input sources: one manually curated from online data science tutorials, and another extracted through semi-automatic analysis of a corpus of over 6,000 Jupyter notebooks. We evaluate Lodestar in a formative study whose findings guide our next set of improvements to the tool. Our results suggest that users find Lodestar useful for rapidly creating data science workflows.