PhD seminar on Human-centered AI, Robotics, and Interactive Technologies

A discussion involving 15 PhD students, organized 7-8 Nov 2022 at KTH

The Creative-AI (AI and the Artistic Imaginary, WASP-HS) project team is organizing this PhD seminar, which takes place 7-8 November 2022 at KTH. The idea is to provide an opportunity for PhD students to present their planned, ongoing, and publishable work. The seminar offers a chance to get feedback from each other and from the workshop discussants. This workshop also provides a platform for developing novel ideas or turning these ideas into research sketches.


Monday 7 November (location: EECS library, Lindstedtsvägen 3, floor 4) 

  • 12:50 Welcome 
  • 13:00 Sjoerd Henricus Arnoldus Hendriks (Chalmers) 
  • 13:30 Silvia A. Carretta (Uppsala University) 
  • 14:00 Anna-Kaisa Kaila (KTH) 
  • 14:30 Coffee 
  • 15:00 Laura Cros Vila (KTH) 
  • 15:30 Vincenzo Madaghiele (KTH) & Luca Falzea (Politecnico di Torino) 
  • 16:00 Georgios Diapoulis (Chalmers) 
  • 16:30 Conclusion of Day 1 
  • 17:00 After work: meeting in a pub.  

Tuesday 8 November (location: Fantum, Lindstedtsvägen 24, floor 5) 

  • 09:30 Leila Methnani (Umeå University) 
  • 10:00 Yuchong Zhang (Chalmers)  
  • 10:30 Coffee 
  • 11:00 Joris Grouwels (KTH) 
  • 11:30 Kelsey Cotton (Chalmers) 
  • 12:00 Lunch break: self-organized, but a location will be suggested!  
  • 13:30 Nicolas Jonason (KTH) 
  • 14:00 Hans Lindetorp (KTH, KMH) 
  • 14:30 Ambika Kirkland (KTH) 
  • 15:00 Coffee 
  • 15:30 Petra Jääskeläinen (KTH) 
  • 16:00 Siyang Wang (KTH) 
  • 16:30 Ziming Wang (Chalmers)
  • 16:30 Conclusion 

Virtual Location:


Silvia A. Carretta

“The autonomous concept of accountability for artificial intelligence. A legal study of AI uses for online content moderation”

The text presents the efforts made to create a legal definition of accountability for AI and introduces a four-factor analysis. These factors are: i) human involvement in the design of AI solutions (humans-in-the-loop); ii) evolution through the AI life cycle; iii) the importance of being subject to clearly defined legislation and of establishing who is in charge of ensuring compliance (rule of law); and iv) domain-related issues connected with the interplay of intellectual property rights, users’ freedom to consume and share lawful content online, and the liability of intermediaries (use of AI algorithms for content moderation). The study of this legal definition is a necessary first step for my research project, which then seeks to establish which actors are to be held accountable at different moments of the AI life cycle.

Anna-Kaisa Kaila

In my PhD project, I set out to untangle some of the ethical complexities of AI technologies in the cultural and creative sectors. What are the broad implications of creative AI applications on the field and its actors? What affordances, limitations and ethical susceptibilities do such technologies pose to artistic work, and how do the current practices intertwine with the existing ethical frameworks and legal guidelines? What could ethical creative AI look like, and how could we support developers in adopting caring approaches in their design work? I will approach these questions from an interdisciplinary perspective that draws inspiration from STS and critical studies, care ethics and legal studies.

In the seminar, I will present some preliminary observations from literature as well as from workshops and interviews conducted with researchers of music AI and Nordic AI-artists, to lay the ground for a research project on the frameworks of ethical creative AI.

Laura Cros Vila

My main objective is to develop a framework for understanding music-listening machines and evaluating how closely they resemble the way humans listen to music, thereby facilitating AI partnerships for music.

Vincenzo Madaghiele & Luca Falzea

Our project focuses on the aesthetics of AI-generated music. Our analysis is located within the sociology of aesthetics and media studies, and focuses on how cultural hegemony can be expressed through AI-generated art and media. Our initial experiment is an auto-ethnographic exploration of the commercial music-generation software for the creative industries Amper, AIVA, and Ecrett, analyzing the affordances and the boundaries within which users must move when using these tools, in order to understand how cultural hegemony can be manifested in such tools through technological constraints and design choices. Some preliminary reflections include an analysis of the affordances of these platforms and of the aesthetic traits specific to the music generated with these systems. Our research might evolve in several different directions, and we are looking for alternative or complementary analytical frameworks and methods for our research objectives.

Georgios Diapoulis

A preliminary framework for musical agents in live coding

Musical live coding is a practice in which the human agent is necessarily involved. The code and the generated sounds are two essential components the performer interacts with during a performance. Interactive AI algorithms can be applied to both aspects, either at the symbolic and subsymbolic levels of the written code or in the audio domain of the generated sounds. Such applications of musical agents can blur the boundaries of who is acting to perform the music, the human or the software agent. This presentation focuses on a preliminary framework for musical agents used in live coding and examines various performance systems. The framework includes the systems’ affordances and temporal constraints, which are discussed in relation to a technical dimension of liveness. The study suggests that live coding systems whose software agents operate on both domains of code and audio may facilitate expressive interactions.

Leila Methnani

This work presents a gaming platform for the development of hybrid agents in Multi-Agent settings. We build on top of our previous work in developing the AWKWARD cognitive architecture. AWKWARD agents can have their plans re-configured in real time to align with social role requirements under changing environmental and social circumstances. The hybrid architecture makes use of symbolic- and behaviour-based techniques to achieve the real-time adjustment of agent plans for evolving social roles, while providing the additional benefit of transparency into the interactions that drive this behavioural change in individual agents. The gaming platform under development is inspired by Massive Online Battle Arena games (e.g. DOTA2), where success is heavily dependent on social interactions. The platform will serve as a sandbox for sample implementations and subsequent evaluations of agents built using the AWKWARD architecture.

Yuchong Zhang

Augmented reality (AR), a cutting-edge technique in which virtual objects are superimposed on the real world, has been demonstrated and applied in numerous fields thanks to its capability of providing interactive interfaces for visualized digital content. Moreover, AR can provide functional tools that support users undertaking domain-related tasks, in particular facilitating data visualization and interaction because of its ability to jointly augment the physical space and the user’s perception. How to fully exploit the advantages of AR, especially its capacity to augment human vision so as to help users perform different domain tasks, is the central part of my PhD research.

Joris Grouwels

Intrinsically motivated Artificial agents learning (vocal) music

AI research about intrinsically motivated agents goes back at least as far as the 90s. Like in reinforcement learning, it deals with agents that sense and act in an environment. Intrinsic drives (novelty, curiosity, empowerment etc.) can structure an agent’s learning process even in the absence of extrinsic rewards and in environments that are too big to be fully explored. It opens up the possibility for developing subjective agents, whose behavior depends on their structure and their trajectories.

Because this setting resembles the human (vocal) performer’s condition more closely than the currently ubiquitous dataset- and task-driven machine learning paradigm, it motivates modeling musical, mental, bodily, aesthetic, and social aspects of music practice that are typically neglected in today’s applications of AI to music. Furthermore, since such an agent is not optimizing for an overarching task, it can be used to guide divergent searches in the environment, leading to the discovery of novelty that is not constrained by a dataset.

Over the last months I have been exploring this field. I would like to discuss my findings and get input from the audience.

Kelsey Cotton

Technological advancements in machine learning models and the rising accessibility of prosumer computers have driven rapid innovation in the creative use and appropriation of machine learning and AI systems in artistic contexts. Such use within artistic applications and across a range of artistic practices offers a rich and wide space for evaluating the human-agent interaction loop. Through thoughtful and critical evaluation of how artistic practitioners and communities use and engage with such tools, we can provide impactful insights and approaches to engagement and interaction modes with AI agents and tools, both within artistic communities and for the general HCI community. Critically, there is an unfolding suspicion of coded gender bias within these models, which necessitates an urgent exploration into developing intersectional and care-centric feminist frameworks for interacting with AI agents.

Nicolas Jonason

Applications of artificial intelligence to music production. My work thus far is mostly centered on sound synthesis and related problems.

Hans Lindetorp

My PhD focuses primarily on technical solutions for implementing music in interactive applications. The studies have shown limitations in current technologies (including my own) with regard to the process of composing and recording the music. Available systems often rely on looped and synchronised audio files, which leads to a fragmented creation of musical blocks that are dynamically put together by the system. These systems often rely on metric grids for controlling looping and synchronisation, which leads to grid-based, quantized music. This in turn tends to favour digitally generated sound over acoustic instruments.

I want to explore how music information retrieval (MIR) could be used to challenge those limitations. If the study is successful, it could contribute to finding a new workflow for composing, recording and implementing music in interactive applications.

Ambika Kirkland

Most previous research investigating how prosody contributes to the perception of speaker attitudes and characteristics has relied on either the use of read speech or artificial manipulation of recorded audio. Our novel method of prosody control in synthesized spontaneous speech offers a powerful tool for studying speech perception which can provide insights into the effects of prosodic features on perception while also paving the way for conversational systems which are more effectively able to engage in and respond to social behaviors. We have so far used this method to examine the perception of smiled speech, as well as the interplay between filled pause position and prosodic features in the perception of speaker confidence. We are currently developing a female neural TTS voice trained on spontaneous speech, and in the near future plan to investigate how prosodic features and perceived gender influence a listener’s impression of the speaker’s knowledgeability and competence.

Petra Jääskeläinen

I would like to present the stage I have reached in my PhD research, with a few examples of recent studies and papers. The presented studies concern the sustainability, imaginaries, and ethics of Creative-AI. I will most likely focus on the values that surround and get embedded into the design of Creative-AI technologies, and how they relate to sustainability and ethics.

Siyang Wang

I want to briefly introduce our published work on synthesizing speech with fillers (‘um’, ‘uh’) and on jointly synthesizing speech and gesture. The main focus of the talk will be on evaluation and why evaluating speech and gesture synthesis is more difficult than it appears. I hope to start a conversation with attendees about how speech and gesture synthesis can be evaluated and, more interestingly, how the synthesis systems can be used in novel applications.

The focus of my PhD is synthesizing speech and gesture.