User Study Protocols

Study Information
1. Hypotheses
Design Plan
Sampling Plan
1. Data collection procedures
2. Sample size
Variables
1. Manipulated Variables
2. Measured variables
Analysis Plan

Study Information

Hypotheses

Main study hypotheses:

H1: The user can more clearly interpret the model’s bias through conversational explanations.
H2: LLM chatbot-based model steering is better than manual model steering approaches for reducing bias issues.
H3: Conversational explanations provide better causal reasoning of model bias than visual explanations.

Design Plan

Study type

Experiment - A researcher randomly assigns UIs to study participants.

Study Design

We propose to conduct a between-subject user study with 20 participants to compare manual and LLM-based interactions with the model. Each group will have 10 participants. The participants will be recruited through social media and the study will be conducted through online surveys.

We plan to provide detailed instructions about the objective of this study and the roles, responsibilities and rights of the study participants. Next, we plan to collect their informed consent and demographic information. We then plan to introduce our system through a demo. The demo aims to describe the usage scenario and explain the role of the prediction model, explanation dashboard and the configuration mechanism.

Next, the participants will self-explore the system. They will have to complete data-configuration tasks. They will be allowed to configure the training data multiple times with the goal of minimizing the bias of the model.

Next, the participants will be given a post-task questionnaire, which will let us compare the different types of data configuration mechanisms. To analyse their objective understanding of the system, we plan to adopt the method followed by Kulesza et al. and formulate a mental model questionnaire.

Randomization

Participants will be randomly assigned to either the manual configuration group or the LLM-based configuration group for our between-subject user study.
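The assignment described above could be carried out along the following lines. This is only a minimal sketch: the participant identifiers, the group labels "manual" and "llm", the seed value and the helper name assign_groups are illustrative assumptions and not part of the protocol.

```python
import random


def assign_groups(participant_ids, seed=None):
    """Randomly split participants into two equal-sized groups.

    Hypothetical helper for illustration; the protocol only states that
    assignment is random, not how it is implemented.
    """
    ids = list(participant_ids)
    if len(ids) % 2 != 0:
        raise ValueError("expected an even number of participants")
    rng = random.Random(seed)  # a fixed seed makes the assignment reproducible
    rng.shuffle(ids)
    half = len(ids) // 2
    return {"manual": ids[:half], "llm": ids[half:]}


# Placeholder identifiers P01..P20 for the 20 planned participants.
participants = [f"P{i:02d}" for i in range(1, 21)]
groups = assign_groups(participants, seed=2024)
```

Shuffling the full list and splitting it in half guarantees 10 participants per group, which independent coin flips per participant would not.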

Sampling Plan

Data collection procedures

Our mixed-methods study aims to collect the following data to compare the manual and the automated configurations:

Scores to evaluate objective understanding of users on a 5-point Likert scale.
Perceived understandability responses on a 5-point Likert scale.
Perceived workload responses on a 10-point Likert scale.

The online study is expected to take 45 minutes to 1 hour for each participant. Participants have to be at least 18 years old. The study will be conducted in April.

Sample size

We plan to recruit 20 participants for our study. Since it is a between-subject study with two groups, we aim to get at least 10 participants in each group.

Variables

Manipulated Variables

Only the type of the data configuration approach is varied in our experiments. So, we have two treatment groups:

Manual configuration group
LLM-based configuration group

Measured variables

We will record the following measures from our online study:

Scores to evaluate objective understanding of users on a 5-point Likert scale.
Perceived understandability responses on a 5-point Likert scale.
Perceived workload responses on a 10-point Likert scale.

Analysis Plan

Statistical Models

Inference Criteria

We will use the standard significance level of 0.05 (α = 0.05): a result will be considered statistically significant if its p-value is below 0.05.
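As an illustration of how this criterion could be applied to one measure, the sketch below compares two groups with a two-sided permutation test on the difference in group means. The choice of test is an assumption made purely for illustration, since the protocol does not name a statistical model; the function names and default parameters are likewise hypothetical.

```python
import random

ALPHA = 0.05  # significance level stated in the protocol


def permutation_p_value(group_a, group_b, n_permutations=10000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Assumed test for illustration only; not the protocol's stated method.
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    n_b = len(group_b)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)
    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # relabel participants at random
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed - 1e-12:  # tolerance for floating-point ties
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)


def is_significant(p_value, alpha=ALPHA):
    """Apply the inference criterion: significant if p < alpha."""
    return p_value < alpha
```

Each measure's scores from the manual group and the LLM-based group would be passed as the two lists, and the resulting p-value checked with is_significant.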

Data Exclusion

Only fully completed responses will be included in the data analysis. Incomplete answers and responses from participants who fail to complete the data configuration task will be excluded.
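The exclusion rule above could be applied as in the following sketch. The record layout (an "answers" mapping with None for unanswered items and a "task_completed" flag) is a hypothetical structure assumed for illustration, not the study's actual data format.

```python
def apply_exclusions(responses):
    """Keep only responses that satisfy both inclusion conditions.

    Assumed record layout (hypothetical):
      {"id": str, "answers": {question: value or None}, "task_completed": bool}
    """
    kept = []
    for response in responses:
        answers = response["answers"]
        # A response is complete when it has answers and none are missing.
        fully_answered = bool(answers) and all(
            value is not None for value in answers.values()
        )
        if fully_answered and response["task_completed"]:
            kept.append(response)
    return kept
```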

Missing Data

We have added validations to ensure that there are no missing records. However, if missing data is observed we will drop those records.