User Study Protocols
Study Information
Hypotheses
Main study hypotheses:
- H1: Users can interpret the model's bias more clearly through conversational explanations.
- H2: LLM chatbot-based model steering is more effective than manual model steering for reducing model bias.
- H3: Conversational explanations support better causal reasoning about model bias than visual explanations.
Design Plan
Study type
Experiment - A researcher randomly assigns UIs to study participants.
Study Design
We propose to conduct a between-subject user study with 20 participants to compare manual and LLM-based interactions with the model. Each group will have 10 participants. Participants will be recruited through social media, and the study will be conducted through online surveys.
We plan to provide detailed instructions about the objective of this study and about the roles, responsibilities, and rights of the study participants. We will then collect their informed consent and demographic information. Next, we plan to introduce our system through a demo, which describes the usage scenario and explains the roles of the prediction model, the explanation dashboard, and the configuration mechanism.
Participants will then explore the system on their own and complete data-configuration tasks. They may reconfigure the training data multiple times, with the goal of minimizing the model's bias.
Finally, the participants will complete a post-task questionnaire, which we will use to compare the two data configuration mechanisms. To analyse their objective understanding of the system, we plan to follow the method of Kulesza et al. and formulate a mental model questionnaire.
Randomization
Participants will be randomly assigned to either the manual configuration group or the LLM-based configuration group for our between-subject user study.
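Because participants join the online study one at a time, one way to guarantee exactly 10 participants per group is to pre-generate a shuffled allocation list and assign the i-th enrolled participant to the i-th entry. The sketch below assumes this approach; the function name `allocation_sequence` and the group labels are hypothetical, not part of the registered protocol.

```python
import random


def allocation_sequence(n_per_group=10, seed=None):
    """Hypothetical helper: build a balanced, randomly ordered allocation list.

    The i-th enrolled participant receives the i-th label, so both groups
    end up with exactly n_per_group participants.
    """
    rng = random.Random(seed)  # seeded generator so the list can be audited later
    sequence = ["manual"] * n_per_group + ["llm"] * n_per_group
    rng.shuffle(sequence)  # random order, fixed group sizes
    return sequence
```

Fixing the seed and storing the list before recruitment starts keeps the assignment reproducible and prevents researchers from influencing who lands in which group.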
Sampling Plan
Data collection procedures
Our mixed-methods study aims to collect the following data to compare the manual and the LLM-based configurations:
- Scores to evaluate objective understanding of users on a 5-point Likert scale.
- Perceived understandability responses on a 5-point Likert scale.
- Perceived workload responses on a 10-point Likert scale.
The online study is expected to take 45 minutes to 1 hour for each participant. Participants have to be at least 18 years old. The study will be conducted in April.
Sample size
We plan to recruit 20 participants. Because the study uses a between-subject design with two groups, we aim for at least 10 participants per group.
Variables
Manipulated Variables
Only the data configuration approach is varied in our experiments, giving two treatment groups:
- Manual configuration group
- LLM-based configuration group
Measured variables
We will record the following measures from our online study:
- Scores to evaluate objective understanding of users on a 5-point Likert scale.
- Perceived understandability responses on a 5-point Likert scale.
- Perceived workload responses on a 10-point Likert scale.
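To catch out-of-range answers at entry time, each participant's measures could be stored in a record that checks the scale bounds listed above (1 to 5 for the two understanding measures, 1 to 10 for workload). This is a sketch under the assumption that each measure is reduced to a single integer per participant; the `Response` class and its field names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical scale bounds taken from the measured variables above.
SCALES = {
    "objective_understanding": (1, 5),
    "perceived_understandability": (1, 5),
    "perceived_workload": (1, 10),
}


@dataclass
class Response:
    participant_id: str
    group: str  # "manual" or "llm"
    objective_understanding: int
    perceived_understandability: int
    perceived_workload: int

    def __post_init__(self):
        # Reject unknown treatment groups.
        if self.group not in ("manual", "llm"):
            raise ValueError(f"unknown group: {self.group!r}")
        # Reject any answer outside its Likert range.
        for name, (low, high) in SCALES.items():
            value = getattr(self, name)
            if not low <= value <= high:
                raise ValueError(f"{name}={value} outside {low}-{high}")
```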
Analysis Plan
Statistical Models
Inference Criteria
We will use the standard significance level of α = 0.05: a result will be considered statistically significant if p < 0.05.
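The protocol does not name a specific statistical test. Purely as an illustration of applying the α = 0.05 criterion to two small independent groups, the sketch below uses a two-sided Monte Carlo permutation test on the difference in group means. The choice of test, the number of permutations, and the function names are assumptions, not the registered analysis.

```python
import random

ALPHA = 0.05  # significance level stated in the protocol


def _mean(values):
    return sum(values) / len(values)


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided Monte Carlo permutation test on the difference in means (hypothetical helper)."""
    rng = random.Random(seed)
    observed = abs(_mean(group_a) - _mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly relabel participants
        diff = abs(_mean(pooled[:n_a]) - _mean(pooled[n_a:]))
        if diff >= observed - 1e-12:  # tolerance for floating-point ties
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (extreme + 1) / (n_permutations + 1)


def is_significant(p_value, alpha=ALPHA):
    """Apply the inference criterion: significant only if p < alpha."""
    return p_value < alpha
```

A permutation test is shown here because it makes no normality assumption, which may suit ordinal Likert data with 10 participants per group; a rank-based test such as Mann-Whitney U would be a common alternative.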
Data Exclusion
Only fully completed responses will be included in the data analysis. Incomplete responses, and responses from participants who fail to complete the data configuration task, will be excluded.
Missing Data
We have added input validation to prevent missing records. If missing data is nevertheless observed, the affected records will be dropped.
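The exclusion and missing-data rules can be applied as a single filtering step before analysis. This sketch assumes each response is exported as a dictionary with boolean completion flags; the field names `survey_completed` and `configuration_task_completed` and the function `apply_exclusions` are hypothetical.

```python
# Hypothetical names for the measured variables in each exported record.
REQUIRED_FIELDS = (
    "objective_understanding",
    "perceived_understandability",
    "perceived_workload",
)


def apply_exclusions(records):
    """Keep only complete responses from participants who finished the configuration task."""
    kept = []
    for record in records:
        if not record.get("survey_completed", False):
            continue  # incomplete survey: excluded
        if not record.get("configuration_task_completed", False):
            continue  # data configuration task not finished: excluded
        if any(record.get(field) is None for field in REQUIRED_FIELDS):
            continue  # missing value in a measured variable: dropped
        kept.append(record)
    return kept
```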