People of ACM - Shota Yamanaka
September 23, 2025
Why did LY Corporation decide to start a human-computer interaction team? What is the team working on now?
Previously, our corporation did not have a dedicated research team specializing in human-computer interaction (HCI). Instead, several researchers applied their expertise individually to solve various challenges across our departments. Since many challenges could be addressed with HCI methodologies, these individual researchers collaborated directly with different teams, such as the development team for our weather forecast application. As these collaborations resulted in multiple published papers and the number of researchers with an HCI background grew, the corporation decided to formally establish a team to explicitly invest in and strengthen our HCI research capabilities.
Today, the members of our HCI team are advancing research by leveraging their unique skills in collaboration with several universities. Current projects include analyzing users’ eye movements during PC operation and developing smartphone accessibility tools using 3D printers. We also actively collaborate with other teams within LY Research. For example, we are working with the security team to investigate user authentication methods that are both secure and convenient, and we are co-authoring papers with big-data researchers and engineers by analyzing large-scale datasets accumulated by the corporation over many years.
Within the field of graphical user interfaces, you are known for your work in modeling motor performance. What is an example of how modeling human motor performance can improve graphical interfaces? What is a significant recent innovation in this area?
Classically, Fitts’ law has been used to estimate the time required to point at targets like icons and hyperlinks. This enables the design of graphical user interfaces (GUIs) that minimize task time. In recent years, the focus has broadened to include not just speed but also how accurately users can perform tasks. Much of my work has addressed this, leading to several published papers on models for estimating the success rate of tapping targets on smartphone screens. Furthermore, based on one of these models, I have publicly released design-facilitation tools that can estimate tap success rates for web pages and application interfaces.
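As a rough illustration, the sketch below computes the Shannon formulation of Fitts’ law that is commonly used in HCI. The intercept and slope values are placeholders for illustration only, not parameters from Yamanaka’s studies; in practice they are fit to data from a pointing experiment.

```python
import math

def fitts_law_mt(distance, width, a=0.1, b=0.15):
    """Estimate pointing movement time (seconds) with Fitts' law,
    Shannon formulation: MT = a + b * log2(D / W + 1).
    The coefficients a and b are illustrative placeholders."""
    return a + b * math.log2(distance / width + 1)

# A wide, nearby target is predicted to be reached faster
# than a small, distant one.
print(fitts_law_mt(distance=200, width=60))  # large, close target
print(fitts_law_mt(distance=800, width=20))  # small, far target
```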
A significant recent innovation I am watching closely is the application of more advanced machine learning methods, such as Bayesian hierarchical modeling and reinforcement learning. From an industrial perspective, I have particularly high expectations for reinforcement learning, because it allows us to create agents that can operate a GUI and perform a multitude of tasks at scale. For example, evaluating a pre-release app’s interface currently requires costly user testing. If we could instead deploy multiple agents with different characteristics, such as agents that simulate users of different ages, and have them operate the interface to discover areas for improvement, it would become an incredibly powerful tool for companies.
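A minimal, hypothetical sketch of this kind of agent-based evaluation is shown below. The GuiEnvironment class, its methods, and the miss rates are invented for illustration; a real system would wrap an actual app, emulator, or browser, and an RL agent would learn which actions complete a given task rather than acting randomly.

```python
import random

class GuiEnvironment:
    """Toy stand-in for an instrumented app under test (hypothetical)."""

    def observe(self):
        # Return a simplified screen state (here, just visible widgets).
        return {"widgets": ["ok", "cancel", "settings"]}

    def tap(self, widget):
        # Pretend some widgets are occasionally missed, as a stand-in
        # for a tap success-rate model.
        miss_rate = {"ok": 0.05, "cancel": 0.10, "settings": 0.30}
        return random.random() > miss_rate[widget]

def evaluate_interface(env, steps=200):
    """Operate the GUI and count failed taps per widget.
    A learned policy would replace the random choice below,
    but the evaluation loop would look the same."""
    failures = {}
    for _ in range(steps):
        widget = random.choice(env.observe()["widgets"])
        if not env.tap(widget):
            failures[widget] = failures.get(widget, 0) + 1
    return failures

print(evaluate_interface(GuiEnvironment()))  # widgets with frequent failures
```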
In a paper published this year, “Examination of User Classification Using Pre-Tasks in Web-Based Experiments,” you (along with co-authors Takaya Miyama and Satoshi Nakamura) explored how the quality of experimental data can be compromised when using crowdsourcing. Will you explain how graphical user interfaces play a role here?
In recent studies on GUIs, I collected data from over 100 crowdworkers and often encountered a common issue: some of the data appeared to come from participants who were not performing the tasks diligently. This is a well-known problem in other crowdsourced tasks like data labeling or short-text writing. To address this, we investigated whether a simple, interactive GUI task could help identify diligent participants. Specifically, we asked workers to adjust an image to a specific size (e.g., 5 centimeters on a side) using a pinch gesture.
Our key finding was that the workers who performed this preliminary task accurately also followed instructions more precisely in the subsequent main task. In the HCI field, while the importance of securing a sufficient sample size is now widely recognized, improving data quality is equally critical. My belief is that properly designed GUI-based tasks make this possible, and it is a challenge that more researchers should address.
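A hypothetical filtering step inspired by this pre-task might look like the sketch below. The 5 cm target comes from the interview, but the tolerance, the worker data, and the pass/fail criterion are invented for illustration and are not taken from the paper.

```python
TARGET_SIZE_CM = 5.0
TOLERANCE_CM = 0.5  # assumed acceptable deviation from the target size

def is_diligent(adjusted_size_cm):
    """Classify a worker by how closely they matched the target size."""
    return abs(adjusted_size_cm - TARGET_SIZE_CM) <= TOLERANCE_CM

# Invented pre-task results: final image size each worker produced.
pre_task_results = {"worker_a": 5.1, "worker_b": 7.8, "worker_c": 4.7}
kept = [w for w, size in pre_task_results.items() if is_diligent(size)]
print(kept)  # only workers who performed the pre-task accurately are retained
```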
One trend in user interfaces (UI) and user experience (UX) is the use of air gestures, which let people interact without touching screens. Do you see this technology growing?
I predict that air gestures will spread not as a replacement for existing interaction methods, but as an alternative for achieving specific purposes. This is because, although it has been over a decade since smartphones and tablets became widespread, mice and touchpads are still the primary tools for office work. Similarly, even as air gesture recognition becomes more accurate and the necessary sensors become commonplace, I believe they will be used alongside our current input methods.
In the context of HCI research, studies on air gestures have often emphasized benefits like speed and reduced physical burden. From my perspective, however, performing fine-grained operations with air gestures is still difficult. This means the importance of how accurately users can perform tasks will likely be even greater than it is for touch or mouse interactions. Conversely, this suggests that applications where a coarse operation is sufficient to achieve a goal (e.g., swiping left or right in the air above a smartphone to navigate web pages) might be a very appropriate use case.
What is another trend in graphical user interfaces that will be especially impactful in the near future?
That would undoubtedly be the emergence of large language models (LLMs). I recently read several papers that propose using multimodal LLMs for designing GUIs. Furthermore, LLMs that can understand and operate GUIs are also appearing, such as Claude’s Computer Use. Beyond just creating mockups, many models and libraries are being released that can significantly improve GUI layouts and color schemes.
While the field has not yet reached the point of fully automating GUI creation with LLMs, I believe it will become possible to implement truly usable GUIs with minimal human intervention by combining them with the reinforcement learning agent-based testing I mentioned earlier. For industry, I expect LLMs to be useful in many processes, including GUI design and implementation, pre-release testing and revision, and post-release updates based on user feedback. This should benefit both sides: companies will see cost reductions, and users will get easier-to-use web pages and apps.
Shota Yamanaka is a Senior Chief Researcher at LY Corporation Research (formerly Yahoo Japan Corporation). He founded LY Corporation Research’s Human-Computer Interaction team and still serves as its director. Yamanaka’s research interests include human-computer interaction, graphical user interfaces, and human-performance modeling.
This year, Yamanaka received the IPSJ/ACM Award for Early Career Contributions to Global Research. He was recognized for theoretical and empirical modeling for understanding human motor behaviors in graphical user interfaces. His papers have been presented at numerous conferences, including CHI, UIST, UbiComp, ISS, and DIS, as well as the IPSJ Interaction Symposium.