As I mentioned in my last post on Substack, I had the great good fortune to spend my first few formative years in digital product design and development around the turn of the century (yes, I’m old… er, very experienced!) at the seminal consultancy Cooper, founded and led by the brilliant Alan Cooper. Alan taught us to center our work on serving the goals of personas: the actual, targeted users. The target personas for most software usually don’t have the same mindset and skills as a software developer or an interaction designer. So, we learned to design solutions that would bring the software’s behavior closer to the mental model of the actual user, instead of embodying the implementation model of the underlying, engineered system.
In those days, and until quite recently, software needed to provide a graphical user interface (UI) to its users. UIs offer a set of graphical inputs and outputs, along with content. You know the drill: a dialog box that displays a set of settings to configure, a drop-down menu with pre-defined choices, a checkbox, and a button labeled “Submit”. Over time, of course, UIs have become more dynamic in their visual pattern languages, with improvements like animated state transitions. We have also seen a rise in alternative input mechanisms such as gesture and voice, particularly in mobile contexts and services.
Natural language inputs are awesome
From the start of my career as an interaction designer at Cooper, we frequently sought to provide natural language inputs to better serve people’s mental models, even though we still had to work within the constrained set of capabilities built into the software and its databases. We often specified natural-language tools for search interfaces. These usually took the form of a smart set of inputs: logically linked drop-down menus and free-text entry fields that, when combined, read as a complete sentence. To achieve this, we also had to actively advocate for engineers to build relational databases on the backend that could support this innovative, user-centered UI pattern.
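To make the pattern concrete, here is a minimal sketch, using an entirely hypothetical schema and field names, of how the selections from such a “sentence builder” UI might be translated into a parameterized query against a relational backend. This is only an illustration of the idea, not a reconstruction of anything we actually shipped.

```python
# A minimal sketch (hypothetical schema and field names) of how a
# "sentence builder" search UI might turn its inputs into a
# parameterized SQL query against a relational backend.
import sqlite3

def build_query(subject: str, condition: str, free_text: str):
    """Turn 'Find <subject> whose <condition> contains <free_text>'
    into SQL plus bound parameters."""
    # Whitelisted mappings keep the dropdown choices safely decoupled
    # from raw SQL (the UI never sends table or column names directly).
    subjects = {"customers": "customers", "orders": "orders"}
    conditions = {"name": "name", "notes": "notes"}
    table = subjects[subject]
    column = conditions[condition]
    sql = f"SELECT * FROM {table} WHERE {column} LIKE ?"
    return sql, (f"%{free_text}%",)

# Example: the UI sentence "Find customers whose notes contain 'renewal'"
sql, params = build_query("customers", "notes", "renewal")
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, notes TEXT)")
conn.execute("INSERT INTO customers VALUES ('Acme', 'due for renewal in May')")
print(conn.execute(sql, params).fetchall())
```

The point of the whitelisting is that the user-facing “sentence” stays in the user’s vocabulary while the engineering stays safely structured underneath, which is exactly the negotiation we had with backend teams.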
Given my deep roots in Cooper’s human-oriented design principles, I can hardly overstate my excitement at the explosion of generative AI tools (you know, LLM/ML systems such as ChatGPT, Claude, Pi, Perplexity, etc.) that can accept highly unstructured natural language inputs — and then respond with natural language outputs that make the system appear capable of conducting a meaningful conversation. (Of course, many of us have been using natural language text strings with Google’s search box for ages, but we still got back a structured list of things that required detailed parsing.)
Being humans, we’re highly skilled at having conversations. After all, we’re trained from birth to interact with other people who share enough of our language (whether spoken, written, or gestural) to make sense through dialog. Conversations are a powerful way for the entities involved to pass along information, develop shared understanding, clarify the root of matters, and make plans. If those conversations are recorded and shared, whether as video or audio, they can also provide similar value for those who watch or listen to them. Witness the popularity of podcasts!
This new mode of interaction will continue to transform access to information for people the world over. I truly believe we’ll look back at the systems that emerged starting in late 2022 as an inflection point for software design and development. Those old-school dialog boxes, where the software displays a constrained set of options for the user to submit their choices, are going to become “dialog spaces” where users can freely express their needs, preferences, and choices in their own words. This shift is already happening in the mainstream, for example in the customer service bot space. Conceived and applied more broadly, AI technology will be a quantum leap forward in giving people the ability to interact effectively with software-driven systems.
But how well do conversations support knowledge creation & capture?
What’s less clear to me is how the information created and embodied within such conversations can serve as a useful, meaningful repository of knowledge that a person can re-access and utilize efficiently. Will we always have to go through a back-and-forth conversation to elicit information via prompts? I certainly hope not!
Take, for example, the huge potential for AI/LLMs in healthcare to democratize access to our personal health records. The complex domain of medical care is relatively impenetrable to non-clinicians, not least because so much information is siloed in disparate EHRs.
Now imagine you’re experiencing acute back pain and turn to your personal AI, which, in an ideal world, has access to all of your personal health records. You would be able to ask it: “When did I last have that back problem? I think it started with an ‘s’…” to which it responds: “You were treated for sciatica, a problem that manifests as pain radiating into the legs, about 3 years ago.”
And then… what? Does it offer to display that specific medical record from your past? Does it offer a review of the previously prescribed treatment? Does the system have any inkling that this question might mean the health problem is now recurring? A human interlocutor could leap to any of these ideas as a next step, but they still need to be designed and built into the software. We don’t want the burden of the next step to fall on us, forcing us to drill into that information like a miner extracting a gem from the surrounding rock; after all, we’re dealing with a real-life health problem.
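As a thought experiment, here is a minimal sketch, with an entirely hypothetical record format and lookup, of how an assistant might pair its conversational answer with proactive next-step options rather than waiting for the user to drive the conversation forward with another prompt.

```python
# A minimal sketch (entirely hypothetical data model) of pairing a
# conversational answer with proactive next-step options, instead of
# waiting for the user to drill in with another prompt.
from dataclasses import dataclass
from datetime import date

@dataclass
class HealthRecord:
    condition: str
    diagnosed: date
    treatment: str

RECORDS = [
    HealthRecord("sciatica", date(2021, 8, 12), "physical therapy, 6 weeks"),
]

def answer_with_next_steps(query_hint: str):
    # Naive prefix lookup standing in for whatever retrieval the assistant uses.
    matches = [r for r in RECORDS if r.condition.startswith(query_hint.lower())]
    if not matches:
        return "I couldn't find a matching condition.", []
    record = matches[0]
    answer = (f"You were treated for {record.condition} "
              f"in {record.diagnosed:%B %Y}.")
    # Surface likely follow-ups so the burden of the next step isn't on the user.
    next_steps = [
        f"Show the full record from {record.diagnosed:%B %Y}",
        f"Review the previous treatment ({record.treatment})",
        "Flag this as a possible recurrence and suggest next steps",
    ]
    return answer, next_steps

answer, steps = answer_with_next_steps("s")
print(answer)
for step in steps:
    print("-", step)
```

The interesting design choice isn’t the lookup; it’s that the response is an answer plus a small set of offered actions, so the system carries some of the conversational momentum instead of leaving it all to the person in pain.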
Furthermore, while a linear, back-and-forth conversation can produce valuable results, it’s not an efficient way to store large bodies of interrelated information, nor is it a particularly efficient way to specify highly complex or contextual inputs.
Let’s not find ourselves forced to conduct linear conversations, driving them forward ourselves, as the interaction model for every task and need!
We must build on this potential to create novel AI-enabled conversational experiences that transform linear conversations into more actionable and accessible information spaces. Even as we leverage this new and welcome input mechanism for many needs, product leaders and designers must keep innovating on outputs and next steps so that the software we build behaves in ever more contextually appropriate, human-centered ways.
AI is barely there when it comes to longitudinal awareness of ourselves as humans across contexts.¹
Stay tuned for a follow-up post examining various ways the current leading AI systems handle their end of the dialog.
Oh, and meanwhile: strap on your life vest for Election Day 2024 in America! 😬
Lizz at Devise is a publication exploring the space of design and product management through the lens of heart-centered values and teamwork. Thanks for being here! Lizz leads Devise Consulting, where we do cool stuff involving Product Management, UX Design, Strategy, Customer Research, Team Development, and Product Operations. We’re eager to work with teams exploring AI opportunities in healthcare. If you’d ever like to chat, please schedule some time for us to discuss what’s on your mind: chat with Lizz.
¹ Full disclosure: this post was originally published on my Medium channel in mid-2023, so Pi may have become more aware of human inputs across channels by now.