Daniel Lyons' Notes

UX for Language User Interfaces

Description

Large language models unlock new user interaction design patterns based on language user interfaces (LUIs). But though these patterns are new, they are not immune to the principles of user experience design.

In this talk, Charles Frye walks through principles for, emerging patterns and anti-patterns of, and case studies in the design of user experiences supported via a natural language interface.

Join BuzzRobot AI Slack community to connect with fellow data scientists, ML engineers and researchers: https://buzzrobot.slack.com/join/shared_invite/zt-1zsh7k8pd-iMu_M8bUxIK3pOJgqJgCRQ#/shared-invite/email

My Notes

Here is a markdown outline of the tech talk transcript:

00:00 Introduction: Talk Overview and Thesis

  • 00:11 Q&A Structure: Text chat during talk, audio chat at the end.
  • 00:19 TLDR: UX for LUIs (Language User Interfaces).
  • 00:23 Key Thesis: LLMs unlock LUIs, which need good design.
  • 00:41 Design Anti-patterns are emerging first.
  • 00:50 Looking at GitHub Copilot design process as a master class.
  • 01:05 Future problem: LUI design for robots (3-5 year timescale).

01:22 LLMs Unlock Language User Interfaces (LUIs)

  • 01:24 LUI: A new style of user interface.
  • 01:33 History of User Interfaces:
    • 01:35 1970s: Terminal User Interface (TUI) - text-based, required specialization.
    • 02:08 1980s: Graphical User Interfaces (GUI) - image-based, spatial interaction, made PCs popular.
    • 02:41 1990s: World Wide Web - GUI in browser, focused on links.
    • 02:59 2000s: Mobile User Interfaces (MUI) - physical interaction (touch, accelerometers, position), graphical.
  • 03:26 2020s: Language User Interface (LUI) - natural language description of desired outcome.
    • 03:49 Example: Adept - typing natural language to achieve tasks like finding a house.
  • 04:08 Naming the new interface: Sam Altman liked the term "Language Interface."
  • 04:28 Previous attempts at Language Interfaces (back to the 1960s):
    • 04:41 SHRDLU from MIT (Terry Winograd's virtual robot following text commands).
    • 04:56 Early search engines.
  • 05:01 Why LUIs are possible now: Large Language Models (LLMs) can understand language by predicting the next word.
    • 05:46 LLMs have learned about various domains (Python, chemistry, etc.) enabling LUIs.

06:01 Language User Interfaces Need Good Design

  • 06:02 Design is a critical piece for effective LUIs.
  • 06:18 "Dancing Bear Phase": Current LUI applications are like surprising but not yet good performances (Alan Cooper's analogy).
  • 07:00 Moving past the Dancing Bear Phase: Need human-centered design.
    • 07:11 Recommended Book: "The Design of Everyday Things" by Don Norman.

07:25 Key Design Principles for LUIs

  • 07:28 Align Signifiers and Affordances: What the system appears to do vs. what it actually does.
    • 07:58 Example: Push/Pull doors - misaligned signifiers and affordances cause frustration.
    • 08:36 Convention as Signifiers: Green button (go), Red button (stop), Blue text (link) in web design.
    • 09:18 Design patterns help communication without reliance on documentation.

09:40 LUI Design Anti-Patterns

  • 09:45 Focusing on anti-patterns because they are emerging first.
  • 09:57 Example: Interaction with a chatbot therapist (Eliza).
    • 10:43 Eliza (1960s chatbot) designed to mimic Rogerian Psychotherapy by saying as little as possible.
    • 11:02 Humans fill in the missing intelligence/humanity when interacting with simple systems.
  • 11:35 Pareidolia: Seeing patterns (like faces) where they don't exist, applied here to perceived cognition in AI.
  • 12:04 ⭐ Avoid triggering pareidolia in LUI design: suggest machine-like qualities instead.
    • 12:24 Example: Copilot's ghostly text resembles phone autocomplete, signifying a machine.
    • 12:45 Example: Chatbot Alice trying to pretend to be a person (blinking, facial movements) signifies humanity it doesn't possess.
    • Key insight: Set expectations. Don't overpromise by personifying your product.
  • 13:13 Concrete suggestions for signaling machine-ness:
    • 13:26 Choose a machine-like name (ChatGPT vs. Claude).
    • 13:42 System referring to itself as a system or "it" rather than human pronouns.
    • 14:12 More textual, less voice-like interfaces suggest machine.
    • 14:28 Use monospace fonts to signal a code/machine identity.
    • 14:48 If using voice, leaning into synthetic/computer-like voice rather than voice cloning suggests machine.
  • 15:24 Frustration with Systems like Alexa/Siri: Human-seeming voices but lacking human intelligence.
  • 15:53 Importance of setting expectations with users carefully (Eugene Wei).

16:24 Good LUI Design: GitHub Copilot Case Study

  • 16:31 GitHub Copilot: Successful LLM-powered product integrated with VS Code.
  • 16:55 Creators shared insights into their design process.
    • 17:09 Nat Friedman's talk at Scale AI recommended.
  • 17:50 Initial Product Ideas (MVPs built in a weekend):
    • 17:58 PR bot (suggesting improvements).
    • 18:22 Stack Overflow in your IDE (answering code questions in context).
    • 18:43 Better autocomplete (smarter text completion).
  • 19:18 Different Requirements for Each Idea:
    • 19:22 PR bot: High accuracy, less demanding on speed.
    • 19:40 Autocomplete: High speed (faster than typing), can be wrong sometimes.
  • 20:00 User Research Revealed Primary Challenge: Model accuracy was hard to get high enough for PR bot or chat.
  • 20:20 Long Tail of Value: Occasional huge time savers, often mediocre or time wasters.
  • 20:40 Doubled Down on Autocomplete: Easier to meet latency requirements with engineering. This led to Copilot.
  • 21:03 Second Phase: Months of painstaking UX research.
    • 21:16 A/B Testing Features.
    • 21:24 Picking Good Metrics:
      • 21:35 Acceptance of completions (measures usefulness/speed).
      • 22:00 Product stickiness and 30-day retention (harder metric, shows long-term value).
  • 22:38 Key Finding: Importance of being in the background and easy to dismiss.
    • 23:00 Users have "aha" moments (saving 20 minutes) that drive interest in learning to use it better.
  • 23:21 Importance of A/B testing with the right metrics and allowing for easy dismissal.
  • 23:39 Question on Models Used (Niraj): OpenAI models, early access helped.
  • 23:57 AI Researcher Bias towards Accuracy: Found that larger, slower models with higher accuracy (95% acceptance) had negative impacts on other metrics.
  • 24:30 Latency was More Critical: Fast latency allows more "shots on goal," leading to higher total value delivered even with less intelligent models.
  • 24:58 Result: Highly successful product (Copilot) built via careful design process.
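
The latency-versus-accuracy finding above can be made concrete with a back-of-envelope model. All numbers below (pause frequency, latency budget, acceptance rates, minutes saved) are illustrative assumptions, not figures from the talk; the point is only that more "shots on goal" can outweigh per-suggestion accuracy.

```python
# Illustrative sketch (all parameters are assumptions, not data from
# the talk): total value delivered scales with how many suggestions a
# user actually sees in time, not just how often each one is right.

def value_per_hour(latency_s, acceptance_rate, minutes_saved_per_accept):
    """Expected minutes saved per hour of coding.

    Assumes ~120 typing pauses per hour, and that a suggestion only
    helps if it arrives within an assumed 1-second budget before the
    user resumes typing; slower models miss most of those windows.
    """
    pauses_per_hour = 120
    shown = pauses_per_hour if latency_s <= 1.0 else pauses_per_hour * 0.2
    return shown * acceptance_rate * minutes_saved_per_accept

fast_mediocre = value_per_hour(latency_s=0.3, acceptance_rate=0.25,
                               minutes_saved_per_accept=0.1)
slow_accurate = value_per_hour(latency_s=3.0, acceptance_rate=0.95,
                               minutes_saved_per_accept=0.1)

print(f"fast, mediocre model: {fast_mediocre:.2f} min/hour saved")
print(f"slow, accurate model: {slow_accurate:.2f} min/hour saved")
```

Under these assumed numbers the fast, less accurate model delivers more total value per hour, matching the talk's claim that latency mattered more than raw accuracy.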

25:26 Future Horizon: LUI Design for Robots

  • 25:36 Current Focus: Developer tooling, web applications (easier).
  • 25:48 More Interesting Problems: Long-horizon applications like robotics.
  • 26:03 Problem for Useful Robots: Usability in general situations.
  • 26:22 Getting Humans to Transfer Knowledge to Robots (Pulkit Agrawal).
  • 26:35 Current Robot Interfaces: Text interface, traditional application interface (e.g., Spot, self-driving cars, Roomba) - not flexible or easy to use.
  • 27:06 Research on Language-Directed Robots: Saying "bring me a drink and a snack" and the robot figures it out.
  • 27:50 Challenges for Robot LUIs: Multimodality, on-device/network inference, safety, reliability.
  • 28:27 2-year timeframe: Designing a good interactive language interface for robots is a key problem.

28:40 Wrap-up

  • 28:45 Summary:
    • LUIs are a new way to interact with computers unlocked by LLMs.
    • LUIs badly need design.
    • Anti-patterns are emerging quickly.
    • Learn design process from examples like Copilot.
    • Opportunities exist in the near and long term (next decade).
  • 29:20 Talk originally given at Full Stack Large Language Models Bootcamp.
  • 29:44 Joint talk with Sergey Karayev (influenced thinking on design).
  • 30:04 Bootcamp covers engineering, operations, project building, etc., not just design.

30:19 Q&A

30:32 Q: Is there an ontology of LUI types/patterns?

  • 30:44 A: Ontology is still emerging, not well-defined like web UI patterns.
  • 31:00 Displaying structured JSON output is a particularly challenging UI problem.
  • 31:46 Patterns identified (cut for time from original talk):
    • 31:57 Click-to-complete (deprecated, e.g., OpenAI Playground).
    • 32:35 Autocomplete (Copilot style).
    • 32:48 Command Palette (Replit, Notion).
    • 33:00 One-on-one Chat (most common now, inspired by ChatGPT).
  • 33:30 Distinction: LLM as the product vs. LLM enhancing a product. Most of the talk focused on the latter.

34:06 Q: Are there Apple/Google-like design guidelines for LUIs?

  • 34:19 A: Yes, but not as developed as for web/mobile.
  • 34:50 Microsoft and Google may have written about this.

35:20 Q: Does prior user experience work in voice interfaces (Siri, Alexa) shape LUI design?

  • 35:42 A: Voice interfaces were less successful partly due to lack of intelligence to handle language. LLMs could help.
  • 36:10 Blockers for Voice LUIs:
    • 36:13 Super fast inference needed for real-time conversation.
    • 36:55 Running large LLMs on devices like Alexa is resource-intensive (may change with time).
    • 37:29 Difficulty turning human speech audio into high-quality text input, especially in noisy environments.

38:15 Q: Are autonomous agents the next big thing?

  • 38:32 A: Definition of an Agent: Can pursue goals in an environment, often involving tool use, reasoning, planning, and long-term memory.
  • 39:25 Challenges for Agents: Tool use, memory, and planning are fiddly, leading to low reliability, especially over multiple turns.
  • 40:02 Tool use has improved recently (GPT-4, 3.5). Retrieval/memory and planning are being worked on.
  • 40:50 Short term: May see a "cresting wave" where people get disillusioned with agents.
  • 41:09 Longer term: Expect reliability to improve for driving agents with LLMs.
  • 41:38 Follow-up: Can one company develop all agent components or will it be modular? Probably modular, but core providers can integrate via pre-training/fine-tuning.
  • 43:10 Follow-up: Possible to train LLM agents? Yes, but it's a hard training problem (RL policies, rollouts).

43:36 Q: How much attention is paid to the model side (inputs, transformation, model awareness of interface) when designing LUIs?

  • 43:59 A: Currently, models are text-only, so the model and user sides are somewhat split.
  • 44:24 Multimodal models (seeing images) could change this, allowing the model to see what the user sees.
  • 45:06 Latency challenges with very low-latency data like eye-tracking. Sharing the screen with a multimodal model is more likely in the near future.
  • 46:00 Thinking about LLMs: Text prediction vs. Universal simulator framework.

47:03 Q: Where does an onboarding chat service fit on the latency and accuracy scale?

  • 47:29 A: Latency: Not super fast like autocomplete, but not slow (competing with human response times). Similar to Stack Overflow in IDE.
  • 48:03 Accuracy: Needs to be fairly high, as initial impressions are crucial for user retention.
  • 48:36 Ways to improve accuracy: Narrow the scope of the chat, extensive user testing to fix bugs in the defined scope.

49:33 Q: Regarding pareidolia and frustration: Is it okay to aim for human-like behavior if the model is good enough?

  • 49:57 A: If human-like affordances can be delivered, human-like signifiers are fine.
  • 50:20 Challenge: Human affordances are very difficult to provide perfectly (e.g., ChatGPT forgetting context causes frustration).
  • 50:50 Final Resolution: Develop a new design vocabulary for LUIs that matches their actual cognition, rather than classical machine or human signifiers. This will take time.
  • 51:31 In the meantime: Safer choice is to lean towards machine-style signification and be defensive in UI construction.
  • 52:00 Signifying humanity brings "baggage" and "poorly maintained dependencies."

52:28 Q: Copilot: Users preferred faster, mediocre help over slower, excellent help?

  • 52:48 A: Several factors:
    • 52:55 Background integration makes errors less frustrating.
    • 53:14 Fast latency allows more "shots on goal" and response to minor user actions.
    • 54:00 "Lottery ticket phenomenon": Frequent small interactions with occasional large rewards (huge time saves) are appealing.
    • 54:34 Possible with Law of Large Numbers (lots of interactions) and deemphasizing failures (sliding into background).
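
The "lottery ticket phenomenon" above can be sketched numerically. The reward distribution below is an assumption for illustration (1% of accepted completions save ~20 minutes, the rest a few seconds), not data from the talk; it shows how rare big wins can dominate the total value delivered across many small interactions.

```python
import random

random.seed(0)

# Assumed "lottery ticket" reward model: a small fraction of accepted
# completions deliver a large time save, the rest a tiny one.
P_BIG, BIG_MIN, SMALL_MIN = 0.01, 20.0, 0.05  # probability, minutes saved

draws = [BIG_MIN if random.random() < P_BIG else SMALL_MIN
         for _ in range(100_000)]

mean_value = sum(draws) / len(draws)
big_share = sum(d for d in draws if d == BIG_MIN) / sum(draws)

print(f"mean minutes saved per interaction: {mean_value:.3f}")
print(f"share of total value from rare big wins: {big_share:.0%}")
```

Under these assumptions, roughly 80% of the total value comes from the rare big wins, which is why frequent low-friction interactions (the Law of Large Numbers) plus easy dismissal of failures make the product feel valuable.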
