articles+ search results
101 articles+ results
1 - 20
1. CORAL: COde RepresentAtion learning with weakly-supervised transformers for analyzing data analysis [2022]
-
Ge Zhang, Mike A. Merrill, Yang Liu, Jeffrey Heer, and Tim Althoff
- EPJ Data Science, Vol 11, Iss 1, Pp 1-26 (2022)
- Subjects
-
Data science, Meta science, Representation learning, Computer applications to medicine. Medical informatics, and R858-859.7
- Abstract
-
Large-scale analysis of source code, and in particular scientific source code, holds the promise of better understanding the data science process, identifying analytical best practices, and providing insights to the builders of scientific toolkits. However, large corpora have remained unanalyzed in depth, as descriptive labels are absent and require expert domain knowledge to generate. We propose a novel weakly supervised transformer-based architecture for computing joint representations of code from both abstract syntax trees and surrounding natural language comments. We then evaluate the model on a new classification task for labeling computational notebook cells as stages in the data analysis process from data import to wrangling, exploration, modeling, and evaluation. We show that our model, leveraging only easily-available weak supervision, achieves a 38% increase in accuracy over expert-supplied heuristics and outperforms a suite of baselines. Our model enables us to examine a set of 118,000 Jupyter Notebooks to uncover common data analysis patterns. Focusing on notebooks with relationships to academic articles, we conduct the largest study of scientific code to date and find that notebooks which devote a higher fraction of code to the typically labor-intensive process of wrangling data in expectation exhibit decreased citation counts for corresponding papers. We also show significant differences between academic and non-academic notebooks, including that academic notebooks devote substantially more code to wrangling and exploring data, and less to modeling.
2. Exploratory Analysis and Its Malcontents [2021]
-
Jeffrey Heer
- Harvard Data Science Review (2021)
- Subjects
-
Electronic computers. Computer science and QA75.5-76.95
-
Jon E. Froehlich, Siddhant Patil, Devanshi Chauhan, Manaswi Saha, Jeffrey Heer, and Rachel Kangas
- Proceedings of the ACM on Human-Computer Interaction. 4:1-26
- Subjects
-
Computer Networks and Communications, Human-Computer Interaction, Social Sciences (miscellaneous), Politics, Environmental planning, Multi stakeholder, and Political science
- Abstract
-
Traditionally, urban accessibility is defined as the ease of reaching destinations. Studies on urban accessibility for pedestrians with mobility disabilities (e.g., wheelchair users) have primarily focused on understanding the challenges that the built environment imposes and how these pedestrians overcome them. In this paper, we move beyond physical barriers and focus on socio-political challenges in the civic ecosystem that impede accessible infrastructure development. Using a multi-stakeholder approach, we interviewed five primary stakeholder groups (N=25): (1) people with mobility disabilities, (2) caregivers, (3) accessibility advocates, (4) department officials, and (5) policymakers. We discussed their current accessibility assessment and decision-making practices. We identified the key needs and desires of each group, how they differed, and how they interacted with each other in the civic ecosystem to bring about change. We found that people, politics, and money were intrinsically tied to underfunded accessibility improvement projects: "without continued support from the public and the political leadership, existing funding may also disappear." Using the insights from these interviews, we explore how technology may enhance our stakeholders' decision-making processes and facilitate accessible infrastructure development.
-
Matthew Conlen, Megan Vo, Alan Tan, and Jeffrey Heer
- The 34th Annual ACM Symposium on User Interface Software and Technology.
-
Jeffrey Heer and Younghoon Kim
- IEEE Transactions on Visualization and Computer Graphics. 27:485-494
- Subjects
-
Computer Graphics and Computer-Aided Design, Computer Vision and Pattern Recognition, Signal Processing, Software, Data visualization, Grammar, Summative assessment, Human–computer interaction, Computer science, Animation, Statistical graphics, Visualization, Formative assessment, and Recommender system
- Abstract
-
Animated transitions help viewers follow changes between related visualizations. Specifying effective animations demands significant effort: authors must select the elements and properties to animate, provide transition parameters, and coordinate the timing of stages. To facilitate this process, we present Gemini, a declarative grammar and recommendation system for animated transitions between single-view statistical graphics. Gemini specifications define transition “steps” in terms of high-level visual components (marks, axes, legends) and composition rules to synchronize and concatenate steps. With this grammar, Gemini can recommend animation designs to augment and accelerate designers' work. Gemini enumerates staged animation designs for given start and end states, and ranks those designs using a cost function informed by prior perceptual studies. To evaluate Gemini, we conduct both a formative study on Mechanical Turk to assess and tune our ranking function, and a summative study in which 8 experienced visualization developers implement animations in D3 that we then compare to Gemini's suggestions. We find that most designs (9/11) are exactly replicable in Gemini, with many (8/11) achievable via edits to suggestions, and that Gemini suggestions avoid multiple participant errors.
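As a rough illustration of the enumerate-then-rank idea in the Gemini abstract, the Python sketch below lists every way to split a set of visual components into sequential stages (components sharing a stage animate together) and sorts the resulting designs by a cost function. The function names and `toy_cost` are hypothetical: the cost is a made-up stand-in, not Gemini's perceptually informed cost function, and Gemini's actual grammar and recommender are not reproduced here.

```python
from typing import List, Sequence, Tuple

Stage = Tuple[str, ...]   # components that animate concurrently
Design = List[Stage]      # stages played one after another


def ordered_partitions(items: Sequence[str]) -> List[Design]:
    """Enumerate every split of `items` into an ordered list of non-empty stages."""
    if not items:
        return [[]]
    first, rest = items[0], items[1:]
    designs: List[Design] = []
    for design in ordered_partitions(rest):
        # Option 1: animate `first` together with an existing stage.
        for k in range(len(design)):
            new = list(design)
            new[k] = new[k] + (first,)
            designs.append(new)
        # Option 2: give `first` its own stage at any position in the sequence.
        for k in range(len(design) + 1):
            designs.append(design[:k] + [(first,)] + design[k:])
    return designs


def toy_cost(design: Design, stage_duration: float = 1.0) -> float:
    """HYPOTHETICAL cost: each stage adds time, and crowding many components
    into one stage adds a quadratic penalty. Not Gemini's cost function."""
    return sum(stage_duration + 0.5 * (len(stage) - 1) ** 2 for stage in design)


def rank_designs(components: Sequence[str]) -> List[Design]:
    """Return all staged designs, cheapest first (ties keep enumeration order)."""
    return sorted(ordered_partitions(components), key=toy_cost)


if __name__ == "__main__":
    for design in rank_designs(["marks", "axes", "legends"])[:3]:
        print(toy_cost(design), design)
```

For the three component types named in the abstract (marks, axes, legends) this enumerates 13 candidate designs; the count grows quickly with more components, which is one reason a ranking step is needed at all.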
-
Xiaoying Pu, Matthew Kay, Steven M. Drucker, Jeffrey Heer, Dominik Moritz, and Arvind Satyanarayan
- Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems.
-
Francis Nguyen, Xiaoli Qiao, Jeffrey Heer, and Jessica Hullman
- Computer Graphics Forum. 39:33-48
- Subjects
-
Computer Graphics and Computer-Aided Design, Computer science, Generalization, User studies, Human–computer interaction, Visualization, and Visual analytics
-
Jeffrey Heer, Dominik Moritz, Chanwut Kittivorawong, and Kanit Wongsuphasawat
- 2020 IEEE Visualization Conference (VIS).
- Subjects
-
Artificial intelligence, Occupancy, Bitmap, Computer science, Chart, and Pattern recognition
-
Martin Schweinsberg, Michael Feldman, Nicola Staub, Olmo R. van den Akker, Robbie C.M. van Aert, Marcel A.L.M. van Assen, Yang Liu, Tim Althoff, Jeffrey Heer, Alex Kale, Zainab Mohamed, Hashem Amireh, Vaishali Venkatesh Prasad, Abraham Bernstein, Emily Robinson, Kaisa Snellman, S. Amy Sommer, Sarah M.G. Otner, David Robinson, Nikhil Madan, Raphael Silberzahn, Pavel Goldstein, Warren Tierney, Toshio Murase, Benjamin Mandl, Domenico Viganola, Carolin Strobl, Catherine B.C. Schaumans, Stijn Kelchtermans, Chan Naseeb, S. Mason Garrison, Tal Yarkoni, C.S. Richard Chan, Prestone Adie, Paulius Alaburda, Casper Albers, Sara Alspaugh, Jeff Alstott, Andrew A. Nelson, Eduardo Ariño de la Rubia, Adbi Arzi, Štěpán Bahník, Jason Baik, Laura Winther Balling, Sachin Banker, David AA Baranger, Dale J. Barr, Brenda Barros-Rivera, Matt Bauer, Enuh Blaise, Lisa Boelen, Katerina Bohle Carbonell, Robert A. Briers, Oliver Burkhard, Miguel-Angel Canela, Laura Castrillo, Timothy Catlett, Olivia Chen, Michael Clark, Brent Cohn, Alex Coppock, Natàlia Cugueró-Escofet, Paul G. Curran, Wilson Cyrus-Lai, David Dai, Giulio Valentino Dalla Riva, Henrik Danielsson, Rosaria de F.S.M. Russo, Niko de Silva, Curdin Derungs, Frank Dondelinger, Carolina Duarte de Souza, B. Tyson Dube, Marina Dubova, Ben Mark Dunn, Peter Adriaan Edelsbrunner, Sara Finley, Nick Fox, Timo Gnambs, Yuanyuan Gong, Erin Grand, Brandon Greenawalt, Dan Han, Paul H.P. Hanel, Antony B. Hong, David Hood, Justin Hsueh, Lilian Huang, Kent N. Hui, Keith A. Hultman, Azka Javaid, Lily Ji Jiang, Jonathan Jong, Jash Kamdar, David Kane, Gregor Kappler, Erikson Kaszubowski, Christopher M. Kavanagh, Madian Khabsa, Bennett Kleinberg, Jens Kouros, Heather Krause, Angelos-Miltiadis Krypotos, Dejan Lavbič, Rui Ling Lee, Timothy Leffel, Wei Yang Lim, Silvia Liverani, Bianca Loh, Dorte Lønsmann, Jia Wei Low, Alton Lu, Kyle MacDonald, Christopher R. Madan, Lasse Hjorth Madsen, Christina Maimone, Alexandra Mangold, Adrienne Marshall, Helena Ester Matskewich, Kimia Mavon, Katherine L. McLain, Amelia A. McNamara, Mhairi McNeill, Ulf Mertens, David Miller, Ben Moore, Andrew Moore, Eric Nantz, Ziauddin Nasrullah, Valentina Nejkovic, Colleen S Nell, Andrew Arthur Nelson, Gustav Nilsonne, Rory Nolan, Christopher E. O'Brien, Patrick O'Neill, Kieran O'Shea, Toto Olita, Jahna Otterbacher, Diana Palsetia, Bianca Pereira, Ivan Pozdniakov, John Protzko, Jean-Nicolas Reyt, Travis Riddle, Amal (Akmal) Ridhwan Omar Ali, Ivan Ropovik, Joshua M. Rosenberg, Stephane Rothen, Michael Schulte-Mecklenbeck, Nirek Sharma, Gordon Shotwell, Martin Skarzynski, William Stedden, Victoria Stodden, Martin A. Stoffel, Scott Stoltzman, Subashini Subbaiah, Rachael Tatman, Paul H. Thibodeau, Sabina Tomkins, Ana Valdivia, Gerrieke B. Druijff-van de Woestijne, Laura Viana, Florence Villesèche, W. Duncan Wadsworth, Florian Wanders, Krista Watts, Jason D Wells, Christopher E. Whelpley, Andy Won, Lawrence Wu, Arthur Yip, Casey Youngflesh, Ju-Chi Yu, Arash Zandian, Leilei Zhang, Chava Zibman, and Eric Luis Uhlmann
- Same data, different conclusions: Radical dispersion in empirical results when independent analysts operationalize and test the same hypothesis (2021)
- Organizational Behavior and Human Decision Processes. 165:228-249. https://doi.org/10.1016/j.obhdp.2021.02.003
- Subjects
-
Crowdsourcing data analysis, Scientific transparency, Research reliability, Scientific robustness, Researcher degrees of freedom, Analysis-contingent results, Psychology (excluding Applied Psychology), 000 Computer science, knowledge & systems, Organizational Behavior and Human Resource Management, Applied Psychology, 650 Management & public relations, and business
- Abstract
-
The project was funded by a research grant from INSEAD and was also supported by the Swiss National Science Foundation under grant number 143411.
In this crowdsourced initiative, independent analysts used the same dataset to test two hypotheses regarding the effects of scientists’ gender and professional status on verbosity during group meetings. Not only the analytic approach but also the operationalizations of key variables were left unconstrained and up to individual analysts. For instance, analysts could choose to operationalize status as job title, institutional ranking, citation counts, or some combination. To maximize transparency regarding the process by which analytic choices are made, the analysts used a platform we developed called DataExplained to justify both preferred and rejected analytic paths in real time. Analyses that lacked sufficient detail or reproducible code, or that contained statistical errors, were excluded, resulting in 29 analyses in the final sample. Researchers reported radically different analyses and dispersed empirical outcomes, in a number of cases obtaining significant effects in opposite directions for the same research question. A Boba multiverse analysis demonstrates that decisions about how to operationalize variables explain variability in outcomes above and beyond statistical choices (e.g., covariates). Subjective researcher decisions play a critical role in driving the reported empirical results, underscoring the need for open data, systematic robustness checks, and transparency regarding both analytic paths taken and not taken. Implications for organizations and leaders, whose decision making relies in part on scientific findings, consulting reports, and internal analyses by data scientists, are discussed.
-
Tongshuang Wu, Marco Tulio Ribeiro, Jeffrey Heer, and Daniel Weld
- Subjects
-
Computer Science - Computation and Language
- Abstract
-
While counterfactual examples are useful for analysis and training of NLP models, current generation methods either rely on manual labor to create very few counterfactuals, or only instantiate limited types of perturbations such as paraphrases or word substitutions. We present Polyjuice, a general-purpose counterfactual generator that allows for control over perturbation types and locations, trained by finetuning GPT-2 on multiple datasets of paired sentences. We show that Polyjuice produces diverse sets of realistic counterfactuals, which in turn are useful in various distinct applications: improving training and evaluation on three different tasks (with around 70% less annotation effort than manual generation), augmenting state-of-the-art explanation techniques, and supporting systematic counterfactual error analysis by revealing behaviors easily missed by human experts.
Comment: ACL 2021, main conference, long paper
-
Jeffrey Heer, Daniel S. Weld, and Tongshuang Wu
- ACM Transactions on Computer-Human Interaction. 26:1-27
- Subjects
-
Human-Computer Interaction, Computer science, Artificial intelligence, Feature selection, Sentiment analysis, Overfitting, Machine learning, Training set, and End user
- Abstract
-
Tools for Interactive Machine Learning (IML) enable end users to update models in a “rapid, focused, and incremental,” yet local, manner. In this work, we study the question of local decision making in an IML context around feature selection for a sentiment classification task. Specifically, we characterize the utility of interactive feature selection through a combination of human-subjects experiments and computational simulations. We find that, in expectation, interactive modification fails to improve model performance and may hamper generalization due to overfitting. We examine how these trends are affected by the dataset, learning algorithm, and the training set size. Across these factors we observe consistent generalization issues. Our results suggest that rapid iterations with IML systems can be dangerous if they encourage local actions divorced from global context, degrading overall model performance. We conclude by discussing the implications of our feature selection results for the broader area of IML systems and research.
-
Michael Correll, Younghoon Kim, and Jeffrey Heer
- Computer Graphics Forum. 38:541-551
- Subjects
-
Computer Graphics and Computer-Aided Design, Human–computer interaction, Computer science, and Information visualization
13. Communicating with Interactive Articles [2020]
-
Matthew Conlen, Jeffrey Heer, Fred Hohman, and Duen Horng Chau
- Distill. 5
- Subjects
-
World Wide Web and Computer science
-
Tim Althoff, Yang Liu, Jeffrey Heer, and Alex Kale
- Subjects
-
Computer Science - Human-Computer Interaction, Computer Graphics and Computer-Aided Design, Computer Vision and Pattern Recognition, Signal Processing, Software, Data science, Compiler, Transparency (graphic), Computer science, Domain-specific language, Robustness (computer science), Sampling (statistics), Scripting language, Multiverse, and Data modeling
- Abstract
-
Multiverse analysis is an approach to data analysis in which all "reasonable" analytic decisions are evaluated in parallel and interpreted collectively, in order to foster robustness and transparency. However, specifying a multiverse is demanding because analysts must manage myriad variants from a cross-product of analytic decisions, and the results require nuanced interpretation. We contribute Boba: an integrated domain-specific language (DSL) and visual analysis system for authoring and reviewing multiverse analyses. With the Boba DSL, analysts write the shared portion of analysis code only once, alongside local variations defining alternative decisions, from which the compiler generates a multiplex of scripts representing all possible analysis paths. The Boba Visualizer provides linked views of model results and the multiverse decision space to enable rapid, systematic assessment of consequential decisions and robustness, including sampling uncertainty and model fit. We demonstrate Boba's utility through two data analysis case studies, and reflect on challenges and design opportunities for multiverse analysis software.
Comment: submitted to IEEE Transactions on Visualization and Computer Graphics (Proc. VAST)
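A minimal Python sketch of the multiverse idea described in the Boba abstract, under a toy setup: the shared analysis code is written once with a placeholder for each decision, and one script is generated per element of the cross-product of options. The placeholder syntax is Python's `string.Template`, not Boba's DSL, and the decision names, options, and the `load`/`fit` calls inside the generated text are hypothetical; the scripts are produced only as strings and never executed.

```python
from itertools import product
from string import Template
from typing import Dict, List, Tuple

# Shared analysis code, written once; $-placeholders mark decision points.
# (Python string.Template syntax, not Boba's actual DSL syntax.)
SHARED = Template(
    "df = load('data.csv')\n"
    "df = $filter\n"
    "model = fit('$formula', df)\n"
)

# Hypothetical decision points, each with its alternative options.
DECISIONS: Dict[str, List[str]] = {
    "filter": ["df", "df[df.rt < 2000]"],
    "formula": ["y ~ x", "y ~ x + age"],
}


def compile_multiverse(template: Template,
                       decisions: Dict[str, List[str]]) -> List[Tuple[Dict[str, str], str]]:
    """Generate one (choices, script) pair per path in the decision cross-product."""
    names = list(decisions)
    universes = []
    for combo in product(*(decisions[name] for name in names)):
        choices = dict(zip(names, combo))
        universes.append((choices, template.substitute(choices)))
    return universes


if __name__ == "__main__":
    for choices, script in compile_multiverse(SHARED, DECISIONS):
        print(choices)
        print(script)
```

With two decisions of two options each this yields 2 × 2 = 4 scripts; every added decision multiplies the count, which is why the abstract emphasizes tooling for managing and visually reviewing the resulting variants.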
-
Younghoon Kim and Jeffrey Heer
- Computer Graphics Forum. 37:157-167
- Subjects
-
Computer Graphics and Computer-Aided Design, User interface, Human–computer interaction, and Computer science
-
Alan Borning, Jane Hoffswell, and Jeffrey Heer
- Computer Graphics Forum. 37:537-548
- Subjects
-
Computer Graphics and Computer-Aided Design, Graph Layout, Computer science, and Theoretical computer science
17. Dziban: Balancing Agency & Automation in Visualization Design via Anchored Recommendations [2020]
-
Halden Lin, Dominik Moritz, and Jeffrey Heer
- CHI
- Subjects
-
Human–computer interaction, Recommender system, User intent, Computer science, Visualization, ggplot2, Knowledge base, Automation, Chart, and Predictability
- Abstract
-
Visualization recommender systems attempt to automate design decisions spanning choices of selected data, transformations, and visual encodings. However, across invocations such recommenders may lack the context of prior results, producing unstable outputs that override earlier design choices. To better balance automated suggestions with user intent, we contribute Dziban, a visualization API that supports both ambiguous specification and a novel anchoring mechanism for conveying desired context. Dziban uses the Draco knowledge base to automatically complete partial specifications and suggest appropriate visualizations. In addition, it extends Draco with chart similarity logic, enabling recommendations that also remain perceptually similar to a provided "anchor" chart. Existing APIs for exploratory visualization, such as ggplot2 and Vega-Lite, require fully specified chart definitions. In contrast, Dziban provides a more concise and flexible authoring experience through automated design, while preserving predictability and control through anchored recommendations.
18. iSeqL [2020]
-
Akshat Shrivastava and Jeffrey Heer
- IUI
- Subjects
-
Sequence labeling, Sequence learning, Artificial intelligence, Natural language processing, Annotation, Computer science, Active learning, Architecture, Text mining, Exploratory data analysis, and Transfer of learning
- Abstract
-
Exploratory analysis of unstructured text is a difficult task, particularly when defining and extracting domain-specific concepts. We present iSeqL, an interactive tool for the rapid construction of customized text mining models through sequence labeling. With iSeqL, analysts engage in an active learning loop, labeling text instances and iteratively assessing trained models by viewing model predictions in the context of both individual text instances and task-specific visualizations of the full dataset. To build suitable models with limited training data, iSeqL leverages transfer learning and pre-trained contextual word embeddings within a recurrent neural architecture. Through case studies and an online experiment, we demonstrate the use of iSeqL to quickly bootstrap models sufficiently accurate to perform in-depth exploratory analysis. With less than an hour of annotation effort, iSeqL users are able to generate stable outputs over custom extracted entities, including context-sensitive discovery of phrases that were never manually labeled.
-
Yang Liu, Jeffrey Heer, and Tim Althoff
- CHI
- Subjects
-
Computer Science - Human-Computer Interaction, Decision process, Interpretability, End-to-end principle, Interview study, Data collection, Research studies, Computer science, Decision points, and Data science
- Abstract
-
Drawing reliable inferences from data involves many, sometimes arbitrary, decisions across phases of data collection, wrangling, and modeling. As different choices can lead to diverging conclusions, understanding how researchers make analytic decisions is important for supporting robust and replicable analysis. In this study, we pore over nine published research studies and conduct semi-structured interviews with their authors. We observe that researchers often base their decisions on methodological or theoretical concerns, but are subject to constraints arising from the data, expertise, or perceived interpretability. We confirm that researchers may experiment with choices in search of desirable results, but also identify other reasons why researchers explore alternatives yet omit findings. In concert with our interviews, we also contribute visualizations for communicating decision processes throughout an analysis. Based on our results, we identify design opportunities for strengthening end-to-end analysis, for instance via tracking and meta-analysis of multiple decision paths.
20. Falcon: Balancing Interactive Latency and Resolution Sensitivity for Scalable Linked Visualizations [2019]
-
Dominik Moritz, Bill Howe, and Jeffrey Heer
- Abstract
-
We contribute user-centered prefetching and indexing methods that provide low-latency interactions across linked visualizations, enabling cold-start exploration of billion-record datasets. We implement our methods in Falcon, a web-based system that makes principled trade-offs between latency and resolution to optimize brushing and view switching times. To optimize latency-sensitive brushing actions, Falcon reindexes data upon changes to the active view a user is brushing in. To limit view switching times, Falcon initially loads reduced interactive resolutions, then progressively improves them. Benchmarks show that Falcon sustains real-time interactivity of 50fps for pixel-level brushing and linking across multiple visualizations with no costly precomputation. We show constant brushing performance regardless of data size on datasets ranging from millions of records in the browser to billions when connected to a backing database system.
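The constant-time brushing described in the Falcon abstract can be illustrated with a cumulative-count (prefix-sum) index. The Python sketch below is a simplified assumption, not Falcon's implementation: for one binned active view and one binned passive view, it precomputes cumulative passive-view counts per active bin, so any brush over a range of active bins yields the passive histogram by subtracting two rows. The cost of that step depends on the number of passive bins rather than the number of records. Rebuilding the index when the user starts brushing a different view corresponds loosely to the reindexing step the abstract mentions; the function names are hypothetical.

```python
from typing import List, Sequence


def build_index(active: Sequence[int], passive: Sequence[int],
                n_active: int, n_passive: int) -> List[List[int]]:
    """cum[i][j] = number of records whose active-view bin is < i and
    whose passive-view bin is j (one full pass over the records)."""
    cum = [[0] * n_passive for _ in range(n_active + 1)]
    for a, p in zip(active, passive):
        cum[a + 1][p] += 1
    # Prefix-sum along the active dimension.
    for i in range(1, n_active + 1):
        for j in range(n_passive):
            cum[i][j] += cum[i - 1][j]
    return cum


def brush(cum: List[List[int]], lo: int, hi: int) -> List[int]:
    """Passive-view histogram for records with lo <= active bin < hi.
    Two row lookups and a subtraction: O(n_passive), independent of record count."""
    return [h - l for h, l in zip(cum[hi], cum[lo])]
```

For example, with `cum = build_index([0, 1, 1, 2, 3], [0, 0, 1, 1, 1], n_active=4, n_passive=2)`, the call `brush(cum, 1, 3)` returns `[1, 2]`: the three records in active bins 1 and 2 fall once into passive bin 0 and twice into passive bin 1.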