title: UX for Language User Interfaces
sources: https://www.youtube.com/watch?v=mf-95Df2-A8
media_link: https://www.youtube.com/watch?v=mf-95Df2-A8
contentPublished: 2023-10-03
noteCreated: 2025-06-23
description: Large language models unlock new user interaction design patterns based on language user interfaces (LUIs). But though these patterns are new, they are not immune to the principles of user experience design.
tags:
  - clippings
  - video
takeaways:
subjects:
  - "[[LUI]]"
Status:
publish: true
Youtube_Duration: 55:39
Description
Large language models unlock new user interaction design patterns based on language user interfaces (LUIs). But though these patterns are new, they are not immune to the principles of user experience design.
In this talk, Charles Frye walks through principles for, emerging patterns and anti-patterns of, and case studies in the design of user experiences supported by a natural language interface.
22:38 Key Finding: Importance of being in the background and easy to dismiss.
23:00 Users have "aha" moments (saving 20 minutes) that drive interest in learning to use it better.
23:21 Importance of A/B testing with the right metrics and allowing for easy dismissal.
23:39 Question on Models Used (Niraj): OpenAI models, early access helped.
23:57 AI Researcher Bias towards Accuracy: Found that larger, slower models with higher accuracy (95% acceptance) had negative impacts on other metrics.
24:30 Latency was More Critical: Fast latency allows more "shots on goal," leading to higher total value delivered even with less intelligent models.
24:58 Result: Highly successful product (Copilot) built via careful design process.
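The latency-versus-accuracy tradeoff above can be sketched as a back-of-the-envelope calculation: total value delivered is roughly the number of suggestions shown ("shots on goal") times the acceptance rate. All numbers below except the 95% acceptance figure from the talk are illustrative assumptions, not data from the Copilot team.

```python
# Back-of-the-envelope sketch: a faster model gets more "shots on goal"
# in the same session, so it can deliver more accepted suggestions in
# total despite a lower per-suggestion acceptance rate.
# Latencies, session length, and the 30% rate are assumed for illustration.

def accepted_suggestions(session_seconds: float,
                         latency_seconds: float,
                         acceptance_rate: float) -> float:
    """Expected accepted suggestions, assuming one suggestion per model call."""
    shots = session_seconds / latency_seconds
    return shots * acceptance_rate

SESSION = 600  # a 10-minute coding session (assumed)

slow_accurate = accepted_suggestions(SESSION, latency_seconds=5.0, acceptance_rate=0.95)
fast_mediocre = accepted_suggestions(SESSION, latency_seconds=0.5, acceptance_rate=0.30)

print(f"slow + accurate: {slow_accurate:.0f} accepted")  # 120 shots x 0.95 = 114
print(f"fast + mediocre: {fast_mediocre:.0f} accepted")  # 1200 shots x 0.30 = 360
```

Under these assumed numbers the less accurate model delivers roughly three times the total accepted suggestions, which matches the talk's conclusion that latency mattered more than accuracy for overall value.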
25:36 Current Focus: Developer tooling, web applications (easier).
25:48 More Interesting Problems: Long-horizon applications like robotics.
26:03 Problem for Useful Robots: Usability in general situations.
26:22 Getting Humans to Transfer Knowledge to Robots (Polka Agarwal).
26:35 Current Robot Interfaces: Text interface, traditional application interface (e.g., Spot, self-driving cars, Roomba) - not flexible or easy to use.
27:06 Research on Language-Directed Robots: Saying "bring me a drink and a snack" and the robot figures it out.
27:50 Challenges for Robot LUIs: Multimodality, on-device/network inference, safety, reliability.
28:27 2-year timeframe: Designing a good interactive language interface for robots is a key problem.
36:13 Super fast inference needed for real-time conversation.
36:55 Running large LLMs on devices like Alexa is resource-intensive (may change with time).
37:29 Difficulty turning human speech audio into high-quality text input, especially in noisy environments.
38:15 Q: Are autonomous agents the next big thing?
38:32 A: Definition of an Agent: Can pursue goals in an environment, often involving tool use, reasoning, planning, and long-term memory.
39:25 Challenges for Agents: Tool use, memory, and planning are fiddly, leading to low reliability, especially over multiple turns.
40:02 Tool use has improved recently (GPT-4, 3.5). Retrieval/memory and planning are being worked on.
40:50 Short term: May see a "cresting wave" where people get disillusioned with agents.
41:09 Longer term: Expect reliability to improve for driving agents with LLMs.
41:38 Follow-up: Can one company develop all agent components or will it be modular? Probably modular, but core providers can integrate via pre-training/fine-tuning.
43:10 Follow-up: Possible to train LLM agents? Yes, but it's a hard training problem (RL policies, rollouts).
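The agent definition above (goal pursuit in an environment via tool use, reasoning, planning, and memory) can be sketched as a minimal loop. All names and the stubbed "LLM" below are illustrative assumptions, not anything described in the talk; a real system would call a model API and has exactly the multi-turn reliability problems noted above.

```python
# Minimal sketch of an agent loop: reason, optionally call a tool,
# record the observation in memory, repeat until an answer or a turn limit.
from typing import Callable

def run_agent(goal: str,
              llm: Callable[[str], str],
              tools: dict[str, Callable[[str], str]],
              max_turns: int = 5) -> str:
    memory: list[str] = [f"goal: {goal}"]   # memory (here, a simple transcript)
    for _ in range(max_turns):              # bounded planning horizon
        prompt = "\n".join(memory)
        action = llm(prompt)                # reasoning step
        if action.startswith("TOOL:"):      # tool use
            name, _, arg = action[5:].partition(" ")
            result = tools.get(name, lambda a: "unknown tool")(arg)
            memory.append(f"observation: {result}")
        else:
            return action                   # final answer
    return "gave up"                        # reliability degrades over many turns

# Toy run: a fake "llm" that calls a calculator tool once, then answers.
def toy_llm(prompt: str) -> str:
    if "observation:" not in prompt:
        return "TOOL:calc 2+2"
    return "the answer is 4"

print(run_agent("add 2 and 2", toy_llm, {"calc": lambda a: str(eval(a))}))
```

Each component the talk calls "fiddly" maps to one line here: the tool dictionary, the memory list, and the turn loop; errors in any one compound across turns, which is why multi-turn reliability is the hard part.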
43:36 Q: How much attention is paid to the model side (inputs, transformation, model awareness of interface) when designing LUIs?
43:59 A: Currently, models are text-only, so the model and user sides are somewhat split.
44:24 Multimodal models (seeing images) could change this, allowing the model to see what the user sees.
45:06 Latency challenges with very low-latency data like eye-tracking. Sharing the screen with a multimodal model is more likely in the near future.
46:00 Thinking about LLMs: Text prediction vs. Universal simulator framework.
47:03 Q: Where does an onboarding chat service fit on the latency and accuracy scale?
47:29 A: Latency: Not super fast like autocomplete, but not slow either (it is competing with human response times). Comparable to searching Stack Overflow from within the IDE.
48:03 Accuracy: Needs to be fairly high, as initial impressions are crucial for user retention.
48:36 Ways to improve accuracy: Narrow the scope of the chat, extensive user testing to fix bugs in the defined scope.
49:33 Q: Regarding pareidolia and frustration: Is it okay to aim for human-like behavior if the model is good enough?
49:57 A: If human-like affordances can be delivered, human-like signifiers are fine.
50:20 Challenge: Human affordances are very difficult to provide perfectly (e.g., ChatGPT forgetting context causes frustration).
50:50 Final Resolution: Develop a new design vocabulary for LUIs that matches their actual cognition, rather than classical machine or human signifiers. This will take time.
51:31 In the meantime: Safer choice is to lean towards machine-style signification and be defensive in UI construction.
52:00 Signifying humanity brings "baggage" and "poorly maintained dependencies."
52:28 Q: Copilot: Users preferred faster, mediocre help over slower, excellent help?