Since its first introduction, the concept of cybernetics has spread widely through many branches of academia and soon percolated into everyday life. Even now, it continues to shape our social and cultural life. Here, we trace the impact of cybernetics on electronic art, and show how this impact resonates with the 21st century's online social networks and metaverses through the ideas of participation, co-creation, and constant flux.


In 1968, the time was finally ripe for an exhibition where robots chased the audience and changed the lighting in response to environmental sounds such as clapping hands, where one could encounter computers writing poems and machines drawing geometric figures that played magical tricks on the visual system. Today, more than forty years later, it has acquired the status of a myth among the cognoscenti of computer arts. In this paper, we trace the links between the metaverse and electronic art back to those first years, and to the impact of cybernetics.

From cybernetics to the fundamentals of electronic art
The term ‘cybernetics’ was first used by Norbert Wiener in the title of his famous book “Cybernetics or Control and Communication in the Animal and the Machine”. Cybernetics grew partly out of Shannon’s information theory, and its etymology goes back to the ancient Greek word kybernetes, meaning steersman or governor. The title of Wiener’s book includes an appropriate definition of the concept, which is effectively a theory of control, i.e. of the principles that govern the behavior of adaptive systems (e.g. animals and machines) in dynamic environments.
Two concepts were particularly important to cybernetics. The first one is teleology, by which Wiener denoted the ‘purpose’ that guided the behavior of an adaptive system. This concept relates to planning and autonomy, issues that are still important aspects of robotics. The second important concept was self-replication, which is a natural property of living systems. In short, cybernetics sought the principles behind mechanisms of replication and reproduction that were equally applicable to artificial and conceptual systems. Katherine Hayles charts the transformations of the concept as it diffused into the cultural space, by examining the equally influential information theory in the broadest sense, and by taking into consideration the bidirectional flow between the cultural/social circumstances of the times and the scientific agenda. [1] Here we would like to focus on the initial activities that transferred ideas from cybernetics into the arts.
Abraham Moles and Max Bense were the first to apply information theory to the arts at a theoretical level, when they tried to capture the essentials of aesthetics with the use of cybernetic thinking. [2] [3] On the level of applications, however, the British artist Roy Ascott should be named as the pioneer. As early as 1961, Ascott was teaching a curriculum at the Ealing School of Art that fused cybernetic thinking with art education. In 1964, he displayed pieces based on these ideas in an exhibition entitled Diagram Boxes and Analogue Structures. Later, he published the philosophical aspects of his work in the journal Cybernetica in a two-part article, “Behaviorist Art and the Cybernetic Vision.” [4] In this article, Ascott describes a cybernetically driven art theory called the Cybernetic Art Matrix, or CAM for short. CAM’s prerequisite is an environment that calls for user participation in creating an art object. This environment is set up to prompt the audience, or in Ascott’s terminology the participant, to give feedback, through which the participant engages in decision making about the art object. The end result is the joint creation of the object by the artist and the participant. Ideally, this object would be an open project, in constant flow and never ceasing to take on new aspects. With every new participant, the creation process would restart or expand, and this circulation would continue until some physical limit (e.g. the end of the exhibition) brings the process to a halt.
Apter notes that Ascott’s idea of ‘art as a process’ had great appeal for artists, as it formulated art as a dynamic system that comes into existence only through the feedback loop between the artist and the audience. [5] As Ascott details in his papers, this line of thinking is continuous with modern art’s “behaviorist” tendencies. In contrast to the traditional understanding of an art object with a well-defined body, established ways of construction (such as painting and sculpture), and specific spaces for dissemination (i.e. museums, galleries, fairs), cybernetic art opened the doors to a new way of making, experiencing, sharing and displaying art. Franke sums up the aesthetics of this new type of art object as follows: “The conditions of optimum aesthetic communication can be obtained from a determination of the reactivity of viewers of works of art. Art then is a part of a process of regulation (in a cybernetic sense) in which an artist seeks to achieve the maximum of receptivity.” [6] The actual impact of cybernetics on the arts manifested itself in the new meanings attached to art, in the understanding of what makes an art object, and in the ways art production changed. The concepts of feedback, interaction, information sharing and ‘art as a process’ led first to Telematic Art, then to Telepresence Art, both of which eventually fell under the heading of New Media Art, as Electronic Art is called today. [7]

From interaction to open-ended play
In their “Book for the Electronic Arts,” Mulder and Post subdivide modern art practice into stable and unstable art. [8] With stable art they denote the culture of “high art,” driven by the materiality and secularity of art objects. Unstable art, in contrast, is more volatile: it is participatory, performative and in constant flux, and is based on (shared) experience. Stable art is serious; unstable art is playful. In modern games and playful interaction, the principles of unstable art are more alive than ever.

In the last decade, digital games have introduced new concepts into the context of playing: a virtual game space containing an interaction space allows gamers to communicate, decide and create. These actions are all in line with the ideas expressed in the previous section, namely that being part of the process inherently follows the principles of cybernetics and opens up a performative space. In this sense, some artworks resemble games, and vice versa. A famous example is the computer artwork Daisies, by Theodore Watson. In this interactive installation, daisies are projected onto a floor, creating an immersive game experience in which the user is central: you walk over the daisies and they die under your feet, only to grow back a few seconds later.
In the 90s, based on these concepts, designers and artists created interactive environments, mainly supported by video images and interactive sound. In this context, Marinka Copier’s definition of play becomes crucial. She describes games as a system of communication and continuous negotiation of (role) players with a socio-cultural network of human and non-human actors. [9] Copier formulates a comprehensive description of (role) play that does not focus on single actors such as rules, goals, objects, or environments, but instead investigates the relations between all actors. Role-players actively negotiate with the game mechanics, with socio-cultural mechanics, and with individual-personal ones. From these negotiations a play experience emerges. The play experience and the activities related to it are in a constant state of flux. It is in this continuous change that the characteristic of play can be found, often defined as open-ended play.
Instead of designing for goal-directed behavior, as is assumed by, for example, Norman’s action cycle, the definition of open-ended play assumes that players do not structure their activity beforehand, but that activity grows as interaction occurs in the context of use. People are opportunistic as they interact with the world. These ideas are inspired by theories about situated action [10] [11] [12] and above all by work on emergent behavior in decentralized systems, [13] which relates to the cybernetic notion of regulatory systems. According to Resnick, nature provides various examples where local behavior leads to global patterns. For example, individual birds in a flock follow only simple local rules relating to nearby birds, which lead to organized flock patterns. Programs in his parallel programming environment StarLogo have shown that by giving objects or agents local rules, overall patterns can emerge in simulated environments (or micro-worlds). Most importantly, local rules are shaped by players’ participation and actions, and the pattern of the overall game emerges through these interactions, or in other words, through the wisdom of the crowd.
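Resnick's point that simple local rules produce global patterns can be sketched in a few lines of code. The following is a minimal one-dimensional illustration, not StarLogo itself and not code from any of the works discussed: every agent knows only the positions of nearby neighbours and drifts slightly toward their average, yet the population as a whole organizes into a flock. All names and parameter values here are illustrative assumptions.

```python
import random

def step(positions, radius=20.0, pull=0.1):
    """One update of a purely local rule: each agent moves a small
    step toward the average position of the neighbours it can 'see'
    within `radius`. No agent knows the global pattern."""
    new = []
    for x in positions:
        neighbours = [y for y in positions if abs(y - x) < radius]
        centre = sum(neighbours) / len(neighbours)  # includes self
        new.append(x + pull * (centre - x))
    return new

random.seed(1)
agents = [random.uniform(0, 100) for _ in range(30)]
spread_before = max(agents) - min(agents)
for _ in range(200):
    agents = step(agents)
spread_after = max(agents) - min(agents)
# The scattered agents contract into a flock: a global pattern
# emerging from local interactions only.
```

The point of the sketch is that no line of code mentions the flock; the pattern exists only at the level of the whole system, which is what makes the analogy to crowd-shaped play attractive.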

Play & Fun in Metaverse & Social Networks
Games in social networks like Facebook become more and more popular, as they can be played everywhere and anytime. They enable expression through role-play, interactive attributes, measures and other (nonverbal) communications. In modern identity construction, (instant) meaningfulness is of increased significance. [14] This (instant) meaningfulness can, for example, be established by playing the same games in social networks (MafiaWars (Zynga 2008), PetSociety (Playfish 2009), or RestaurantCity (Electronic Arts 2009)), by other activities like chat, MSN (Microsoft 1999) or Skype (Heinla, Kasesalu, and Tallinn 2003), or by belonging to the same interest groups. In social games like FarmVille (Zynga 2009), identities are reshaped through collaborations around certain thematic activities. Within these online games a friend’s value corresponds to his or her instant meaningfulness in the game. To be a friend in FarmVille means to be of value. A friend transforms into a sort of commodity, since friends are assets for playing the game. This ties in directly with the social rules of social networks, in which someone’s popularity and ‘value’ are measured by his or her number of friends.
For us, the most important point here is that the boundaries between ‘play time’ and other activities cease to exist: the social sphere of virtual games can be accessed via handheld devices, mobile phones and computers while working, eating, and even playing other games. The second factor we would like to emphasize is the erasure of roles/identities: a dear friend becomes a commodity during the play experience, but with a switch from, for example, the play window of FarmVille back to the Facebook home window, the everyday ‘identity/role’ of the friend is restored.
In modern play design, games and playful interaction are situated in real life as part of everyday activity: a playful approach in which games can be called upon when necessary as part of existing applications in learning, social networks, etc. [15] This requires social intelligence in game design and will lead to games that are embedded in systems of social meaning, fluid and negotiated between us and the people around us. In this way, game design focuses on interactive products as creators, facilitators and mediators of experiences, as well as on the creation of opportunities.
Damer makes a distinction between ‘game-play virtual worlds’ and ‘social virtual worlds’, emphasizing that the latter differ from the former primarily in the freedom given to players to build the virtual world itself, as well as the social atmosphere and the game space within it. [16] Game-play worlds, in contrast, come with predefined rules and scenarios. We can state that social virtual worlds embody the idea of open-ended play.
Worth noting here is the artistic dimension of these worlds, and the question of the creativity and artistic expression experienced by their users. The general impression is that most artistic practices in these spaces are still confined to existing forms of art creation and dissemination (Lester et al. 2009). It is to be expected that in time, when virtual reality loses the novelty of offering a new experience, the potential it generates will be explored thoroughly, and new forms of art will be born out of these explorations. There are already many fruitful virtual exhibitions hinting at this next step.
However, we believe that these virtual worlds and social networks will have a much bigger impact on the understanding of art. A simple Google search for the most popular virtual worlds, such as World of Warcraft and Second Life, shows that their popularity extends to social network sites as well. Here, the most interesting social sites for us are the ones devoted to art (deviantArt, Flickr) and media (YouTube, MySpace). In deviantArt, for instance, there are ample groups around these cult spaces, and many users not only upload screenshots of their experiences or their avatars, but also share tutorials and textures that teach other members how to create in virtual worlds. In other words, some players first experiment with how to create ‘art’ themselves, and then share their knowledge so that other members can join the experience.

The ubiquity of virtual social platforms and the effects of an overabundance of media have led some critics to question the role of the artist in current society. For some, spaces like the metaverse offer, and indeed force, the artist to go beyond traditional artistic goals such as capturing or questioning reality, and to become a scientist/technician who redefines or creates reality. For others, art as such no longer has a role to play at all. In this paper, we have tried to counter these extreme postulations about art in the metaverse by pointing out the potential that the social spheres of networks and the metaverse have for the dissemination, and hence the definition, of art.
Mulder and Post trace the transition of electronic art from machine to media, from there to interface, and lastly to networks. [8] We would like to conclude our paper by asking: what is next? We hope that the next step in the evolution of electronic art will be the realization that expertise has lost its importance. Only then will art be detached from its high pedestal and its materiality by becoming the toy of the layman. Everyone who uploads a picture, designs an avatar, creates a space in Second Life, or comments on someone else’s pictures on Flickr or deviantArt will be entitled to be called an ‘artist’, if they care to take on this title.

References and Notes: 

  1. N. Hayles, “Boundary Disputes,” in Configurations 2, no. 3 (1994): 441–467.
  2. A. Moles, Information Theory and Esthetic Perception (Champaign, IL: University of Illinois Press, 1966).
  3. M. Bense, Aesthetica (Baden-Baden: Agis-Verlag, 1965).
  4. R. Ascott, “Behaviourist Art and the Cybernetic Vision,” in Cybernetica 9, no. 4 (1966): 247-264.
  5. M. J. Apter, “Cybernetics and Art,” in Leonardo 2, no. 3 (1969): 257-265.
  6. H. W. Franke, “Some Remarks on Visual Fine Arts in the Age of Advanced Technology,” in Visual Art, Mathematics and Computers, ed. F. J. Malina, 3-5 (Oxford: Pergamon Press, 1979).
  7. A. A. Akdag Salah, “Discontents of Computer Art: A Discourse Analysis on the Intersection of Arts, Sciences and Technology” (PhD diss., UCLA, 2008).
  8. A. Mulder and M. Post, Book for the Electronic Arts (Amsterdam: De Baille, 2000).
  9. M. Copier, “Beyond the Magic Circle: A Network Perspective on Role-Play in Online Games” (PhD diss., Utrecht University, 2007).
  10. J. Lave, Cognition in Practice (Cambridge: Cambridge University Press, 1988).
  11. B. Nardi, “Studying Context,” in Context and Consciousness, ed. B. Nardi, 35-52 (Cambridge, MA: The MIT Press, 1997).
  12. L. Suchman, Plans and Situated Actions (Cambridge: Cambridge University Press, 1987).
  13. M. Resnick, Turtles, Termites and Traffic Jams Explorations in Massively Parallel Microworlds (Cambridge, MA: The MIT Press, 1997).
  14. B. A. M. Schouten, T. Bekker, and M. Deen, “Playful Identity in Game Design and Open-Ended Play,” in Playful Identities (Utrecht: Utrecht University Press, 2011).
  15. R. Tieben, T. Bekker, J. Sturm, and B. A. M. Schouten, “Eliciting Casual Activity through Playful Exploration, Communication, Personalisation and Expression” (conference, Chi Sparks, Arnhem, June 23, 2011).
  16. B. B. Damer, “Meeting in the Ether,” in Journal of Virtual Worlds Research 1, no. 1 (2008): 1-17.

T/Act - social empowerment through interaction with media artworks

This paper presents results from research conducted through a collaborative design process with selected individuals with severe physical disabilities. The work encourages and enables creative expression by the participants beyond everyday norms. Can a disruption of institutionalized conditioning according to class, education, gender and physical abilities be orchestrated by careful design and presentation of interactive artworks?


Our current lifestyle is focused on and reliant upon media technologies. Our lives are organised through and by technology, such that we can easily forget the importance of physical social interactions over those mediated by online social networks. Instead of being empowered by technology, humans are enslaved to its seductive powers. Is it possible to move away from this focus on the technological and instead discuss the act of using the interface, the product of that action, and the content? Does access to media technology in itself empower the participant, particularly if that person is herself on the margins of society? The EyeWriter project is a superb example of open source media technologies being used to empower a specific individual (Tempt One) and others with a similarly debilitating disease (ALS). [1] As Tempt One himself states:

“Art is a tool of empowerment and social change, and I consider myself blessed to be able to create and use my work to promote health reform, bring awareness about ALS and help others.”

It is clear that empowerment for Tempt One comes through a combination of access to the technology, the ability to once again create graffiti art, and the possibility of having a presence in the public city environment through large-scale urban projections of his tags. Each element is very specific to the individual in question. In the research described in this paper the author attempts a broader investigation: can the use of media technologies enhance the possibilities for people with disabilities to express themselves creatively on equal terms with able-bodied people?
This paper presents ongoing research into the effects of physical interaction with audiovisual systems through a discussion of the results and observations from collaborative design workshops organised for a group of people with disabilities. The author, as a media artist, had not considered working with people with disabilities until a visit by a group of students from Beaumont special school to the Lanternhouse International arts centre in the north of England, where he was undertaking a residency. As these students with severe cerebral palsy were encouraged to touch and interact with the installation on display for them, it became apparent that the colour, form, sound and overall interactive environment they were confronted with provided a powerful and provocative stimulus, causing emotional reactions which surprised their carers. A follow-up visit to the college showed that although it was well equipped with musical instruments, media and audio software, most solutions were generalized rather than individually tailored to each student’s needs. This approach may work for able-bodied people, who all have approximately the same physical abilities, but for a person with disabilities it can be totally inappropriate and very frustrating for all involved. Together with musician Alan Fitzgerald, the author proposed to develop bespoke electronic interfaces for a small group of students. In particular it was hoped to examine the following question: If a unique interface is created specifically for a particular individual, can an examination of the use of this interface lead us to answer questions regarding interface design in general? Unfortunately at the time it was not possible to carry out this project in England, but since the beginning of 2011 the author has been investigating similar themes through participatory design workshops with people with disabilities belonging to the Taika Dance group in Turku, Finland.
The majority of the participants are electric wheelchair users and have severely limited use and control of their physical bodies, while some have more mobility. They have their own social networks, yet as a whole they can be regarded as on the margins of society with little voice or visibility. Does access to media technology and the ability to create visual and audio performance lead to a wider social empowerment in society for people like these with disabilities? Does the same effect happen for the wider public at large when they are able to interact deeply with a media art work?

Through a participatory design process, the aim of the workshop sessions has been to develop personal interfaces which might be thought of as bespoke electronic musical instruments made for each individual. Due to the practical difficulties involved with all aspects of the collaboration – logistics, communication, and basic bodily needs – progress has been slow, but fruitful. As this group of people had no prior opportunity to make sound or music, the process started with getting to know each other via “off the shelf” solutions. A MIDI keyboard and controller were used to provide an immediate experience of actually creating different sounds. Using Max/MSP and Reason software, samples and sound parameters could easily be modified. Sounds were also recorded from the participants’ own voices and mobile phones to use as samples. Even at this simple level, the experience of hearing one’s own voice played back and modified to create interesting or weird sounds was stimulating for the group. Participants soon felt confident to contribute their own ideas and suggestions for the sounds.

The next level of interaction involved gradually introducing different types of electronic sensors and interfaces, allowing the participants to experiment and play with sound in ways that were totally new to them. The author is familiar with using analogue sensors for data collection, interfaced through the Arduino microcontroller to PCs. Here it was necessary to develop methods of using the electronics so that they would not restrict the users’ limited physical movements. Fortunately there are many small-footprint solutions readily available on the market. The selected solution was to use short-range radios to send the data to remote PCs. The XBee radio together with an Arduino Fio has so far proven to be the best solution, as the radios can be networked to send data simultaneously to one PC. The sensors used range from simple flex and pressure sensors, accelerometers, and compass modules to perhaps the most useful, the 9 DOF Razor IMU, which provides orientation data in all directions. [2] The emphasis in hardware development has been on the novel use of existing electronic components rather than the development of new technology per se, although this does include the creation of custom sensors and switches using soft circuitry, for example. The use of small wireless devices means that the usual restrictions caused by signal wires are removed, and any impediments to the physical body are minimized. The approach is to concentrate on the movements that the participants are able to make, rather than design an interface that they would have to adapt to.
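The principle of adapting the interface to the player, rather than the other way around, can be illustrated with a short sketch. The function below is a hypothetical example, not code from the project itself (which used Arduino hardware with Max/MSP): it maps one orientation angle streamed from an IMU into a MIDI pitch, clamped and scaled to a personal comfort range so that a participant's small movements still cover the full musical range. The range bounds and pitch limits are illustrative assumptions.

```python
def angle_to_pitch(angle_deg, lo=-20.0, hi=20.0,
                   pitch_min=48, pitch_max=72):
    """Map an orientation angle (degrees) from an IMU into a MIDI
    pitch, scaled to the participant's own comfortable movement
    range [lo, hi] rather than the sensor's full range."""
    angle = max(lo, min(hi, angle_deg))   # clamp to personal range
    frac = (angle - lo) / (hi - lo)       # normalise to 0.0 .. 1.0
    return int(round(pitch_min + frac * (pitch_max - pitch_min)))

# A tilt in the middle of the personal range lands mid-scale, while
# tilts beyond the range saturate instead of producing wild jumps.
```

Calibrating `lo` and `hi` for each participant is what lets the same mapping serve very different bodies, which mirrors the approach described above.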

The focus is on ABILITY rather than DIS-ability. Participants play according to their own abilities, and can focus on developing that skill. The aim is to discover appropriate forms of interface and sound according to each person’s physical abilities and musical interests. The dynamics of social interaction between the members of the group are also mediated by the technology; an eagerness to be the one performing can be observed. At the current stage of the project only one or two people have been able to use the interfaces simultaneously. Now that the physical abilities of each member have been understood, appropriate personalised interfaces are under development.

As much as possible, the motivation for the design of these interfaces comes from the participants themselves as they experiment with the prototypes. One example is a control interface made as a cushion for a wheelchair user: she can control media and play sounds by shifting her weight on the chair. Made with Arduino and openFrameworks, the interface is very sensitive, intuitive and fun to use. It can be thought of as a dance mat for wheelchair users, yet it is equally usable by the able-bodied. This is at the core of the research: through the development of new media interfaces for a small group of very particular people, to gain insight into empowerment through human interaction with audiovisual systems in general. Even though the participants have sensory systems different from those of the general population, the goal is to make this difference invisible through the medium of the art performance. With the Taika Dance group the aim is to perform publicly at the end of 2011.
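As a sketch of how such a cushion might derive a control signal, the hypothetical function below (the actual interface was built with Arduino and openFrameworks; this Python version is purely illustrative, and the sensor layout is an assumption) reduces four corner pressure readings to a normalised centre of weight that a sound engine could then map to, say, stereo pan and filter cutoff.

```python
def centre_of_pressure(fl, fr, bl, br):
    """Given four corner pressure readings from the cushion
    (front-left, front-right, back-left, back-right), return the
    normalised centre of weight as (x, y) in [-1, 1] x [-1, 1]."""
    total = fl + fr + bl + br
    if total == 0:
        return (0.0, 0.0)                 # nobody seated: neutral
    x = ((fr + br) - (fl + bl)) / total   # lean right  -> +x
    y = ((fl + fr) - (bl + br)) / total   # lean forward -> +y
    return (x, y)
```

Because the output is a continuous position rather than discrete buttons, even very small weight shifts produce a usable signal, which fits the sensitivity described above.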

The use of computer-mediated technologies opens up further possibilities for social interaction. Networked technologies, such as video, audio and telematic control of devices, allow these physically challenged participants to interact with others over large distances (such as Finland-UK). There is the potential to enable people with disabilities to collaborate remotely and perform highly advanced works to a geographically dispersed public audience. The use of telematic and virtual spaces allows flexibility in developing personal navigable space for each participant. Finding the comfort zone for each individual is extremely important: they may not feel comfortable exposing their physical self to a live audience, but a tele-mediated performance may be an exciting and liberating alternative. The author can foresee other groups of users/participants, such as older people, making use of these same systems to create their own networked performative works, mixing the security of their personal space with the empowerment of performing to a virtual audience online.
Collaborative performance shifts interaction and participatory behaviour onto a social level. The research aims to develop a methodology for observing the changing roles of creator, interactor and viewer, and the effects on the social interaction of the participants. How do narrative structure and a shared sense of social space lead towards the development of a temporary community? In the case of the Taika Dance group, the participants already know each other, but through the performative act they are able to transform their own self-image and their perceived role in society. They become activators of their own destiny for that moment in time: they are no longer abject objects on the margins of society but proud performers in their own right. These works enable investigation of enactive engagement in collaborative activity with playful, participatory artworks, environments and performances. This includes accessibility and ease of use: control interfaces that give inexperienced users control over creative acts and allow them to explore artistic experience through their natural body movements and perceptually guided actions.
The dialectical method facilitates the benchmarking of the generalist approach with that of the highly defined individually focused approach. By focusing on people with special needs (brain damage, physical handicap) in this case, the research adds to the discussion of reactions to interaction stimuli and control in the average adult human. Just as the blind person’s sense of hearing is amplified, so it may be that someone with severely limited movement can actually have an acute sense of control over a range far too limited for the normal person to perceive. Work by Saranjit Birdi with special needs patients in the UK supports this proposition. [3] The bespoke device or environment designed for the individual also acts as a window into their world, as we are able to experience the physical or virtual world through their interface, their experience. In particular Merleau-Ponty’s discussion of the body schema illustrates how examination of a unique individual helps us to understand the wider landscape. [4]

As alluded to in the title of this paper, the motivation for the research is to understand if and how social empowerment can be orchestrated through interaction with media artworks. Can a disruption or disturbance of institutionalized conditioning according to class, education, gender and physical abilities be effected by careful design and presentation of the interactive artwork? It is vital that the interactive experience invites and encourages social interaction between the participants themselves, as it is only through social activity that the self-image can be positively developed. Can the artwork create a community of presence, an opportunity for living in the moment, leading to unpredictable (inter)activity within the social group? The artistic TAZ (Temporary Autonomous Zone) acts as a revealing agent within society, using the tools of poetic terrorism to disrupt the status quo. [5] Hakim Bey’s concept of the Temporary Autonomous Zone has been proposed by Geert Lovink as a model for network-based communities of interest. [6] Having worked extensively with 3D virtual communities in the past, the author can say that the behaviours observed in physically interactive environments can be identical to those seen in the TAZ of virtual communities. The physical artwork (environment, installation) becomes a point of focus for social interaction AND empowerment, as the normal rules of engagement within the public (museum) space are temporarily ignored in favour of those created by the participants themselves. We are forced to reappraise the traditional models of spectator vs. artist, as new tools and technologies allow the barriers to interaction to become transparent. The role of the artist or designer changes to that of a facilitator or producer for a larger group of participants. In fact, the artist creates the situation and the possibilities for others to bring to life, and accordingly the role of the artist as author becomes less significant.
Curator and theorist Nicolas Bourriaud argues that we have passed into a new “altermodern” era where artistic production is concerned with the weaving of “relationships” between people and things, where the artist “viatorises” objects to build narratives through “post-production” techniques: the re-use of artefacts, sampling, a mixing of cultures and signs. [7] The discourse, the social activity, becomes the work itself.

By contrasting the generic with the specific, this research has set out to uncover new information about the benefits, desire and motivation to interact with complex technologically driven systems, as well as proposals for rules and methods for the creation of artistic communities of presence. The work together with Taika Dance encourages and enables creative expression by the participants beyond their everyday norms. The eventual goal is to have an understanding of how to enable deep audience participation in live performative events and interactive environments through their interaction and control of audiovisual and robotic systems.

References and Notes: 

1. The EyeWriter Project website. Free Art and Technology (FAT), OpenFrameworks and the Graffiti Research Lab: Tempt1, Evan Roth, Chris Sugrue, Zach Lieberman, Theo Watson and James Powderly. (accessed June 28, 2011).
2. An inertial measurement unit, or IMU, is an electronic device that measures and reports on a craft's velocity, orientation, and gravitational forces, using a combination of accelerometers and gyroscopes. (accessed June 29, 2011).
3. Saranjit Birdi, Thisability (2010) Online documentation and artist statement, (accessed July 4, 2011).
4. Maurice Merleau-Ponty, “The Spatiality of One’s Own Body and Motility,” in Phenomenology of Perception (Abingdon and New York: Routledge Classics, 2008; first published 1945, English trans. 1962), 112-177.
5.  Hakim Bey, "The Temporary Autonomous Zone, Ontological Anarchy, Poetic Terrorism" (Autonomedia 1985, 1991) (accessed July 4, 2011).
6. Geert Lovink, “The Data Dandy and Sovereign Media, An Introduction to the Media Theory of ADILKNO,” Lecture for the Fifth International Symposium on Electronic Art, Helsinki, 24 August 1994.
7. Nicolas Bourriaud, Relational Aesthetics (Paris: Les Presses du Réel, 1998; eng 2002); Altermodern (London: Tate Triennial catalogue, 2009).


Robots as social actors: audience perception of agency, emotion and intentionality in robotic performers

This paper looks at the different ways audiences perceive and respond to anthropomorphic and bio-mimetic qualities in robotic characters, specifically their perceptions of agency, emotion and intentionality. The author argues that it is audience perception rather than the innate qualities of the robot that determines successful robot-audience interactions.


Analyzing Robotic Performance

This paper analyzes robots as performative entities that create themselves in the moment of their performance and also looks at how audiences perceive and interpret those performances through observation and interaction. Interactions between humans and robots take place in a variety of different contexts. Some of these contexts are explicitly performative or theatrical, including Honda’s ASIMO conducting the Detroit Symphony Orchestra, Hiroshi Ishiguro’s female android Geminoid-F acting in the Japanese play Sayonara and Louis-Philippe Demers’s robotic performers in Australian Dance Theatre’s (ADT) Devolution. These performances are all tightly scripted and rehearsed. Other human-robot interactions take place in more open environments, such as art galleries and museums where audiences can interact with robots in unscripted interactive encounters. Nevertheless, I would argue that there is a theatrical performative element to all public displays of robots. All robots are in essence performers: they are designed to act and interact in the world and are programmed (scripted) to perform in particular ways.

How then can we best analyze the performances of robots across both theatrical and non-theatrical environments? Moreover, how do audiences respond to these robotic performances? While there are a growing number of studies analyzing robots as performers, particularly from the domain of performance studies, [1] [2] [3] [4] it is the work of sociologist Erving Goffman that proves to be particularly useful in analyzing robotic performances and interactions with humans across both theatrical and non-theatrical contexts, such as art galleries and museums.

In The Presentation of Self in Everyday Life, Goffman views all human social interaction as a type of acting. We don’t have to be on a literal theatrical stage to act: we are all actors who craft and perform different versions of ourselves in our everyday lives, depending on which social situations we are in and who we are interacting with. Goffman uses the metaphor of the theater to describe how we move between back stage and front stage arenas using various techniques of “impression management” such as selecting different modes of dress, speech and behavior to perform these different presentations of self to our different audiences. [5]

Using Goffman’s theatrical framework, we can analyze the physical appearance and behavior of the robot along with its staging and theatrical mise-en-scène to see how these all play a part in framing the robotic performance and how it is perceived and interpreted by audiences. The back stage preparation of the robot’s appearance and behavior includes its design, fabrication and assembly, as well as more conventional types of costuming and dressing up. How the robot is then presented to an audience, whether this is in a theater, gallery, museum or trade show, also contributes to the overall impression the robot will make.

We can break down these aspects as follows:

  • Appearance (robot morphology, for example machinic, biomorphic, zoomorphic, anthropomorphic, and costuming)
  • Behavior (the robot’s movement and actions including its interaction with its environment and with other actors)
  • Context (this includes the environment within which the performance takes place and aspects of theatrical mise-en-scène such as setting, props and lighting)

Goffman’s description of back stage and front stage arenas, and the team efforts frequently involved in these everyday presentations of self, maps very well onto the production context of robotic performance, which typically includes the artist as well as literal teams of technologists, assistants and handlers who work behind the scenes in the presentation of the robotic artwork. In this team effort, the agency of the performance may be distributed in a variety of different ways between the members of the team and the robot itself. The robot may perform completely autonomously and have its own emergent agency and behaviors (albeit programmed by the artist/technical team) or it may be controlled in more direct ways through automated performance scripts or teleoperation.

Some Case Studies

Wade Marynowsky, The Discreet Charm Of The Bourgeoisie Robot (2008)

There is something of a camp aesthetic evident in Wade Marynowsky’s cross-dressing robot Boris in The Discreet Charm Of The Bourgeoisie Robot. Although Boris playfully references human attributes in his voice, clothing and behavior, he is still clearly a robot; he is not trying to pass as human. The robot is dressed in an old-fashioned Victorian black dress trimmed with lace, but his glass-domed head with its camera eye clearly proclaims his identity as a robot — a robot playing dress-ups. As gallery visitors enter the space, Boris whirls in circles and engages them in conversation. Marynowsky’s robot is reminiscent of the robot in Lost in Space, the Daleks in Doctor Who and Robbie the Robot in Forbidden Planet, but its historical lineage also includes the famous chess-playing Turk, an automaton built by Wolfgang von Kempelen in the late 18th century. Von Kempelen’s automaton astounded its audiences with its uncanny chess-playing ability until it was revealed that the Turk’s prowess was in fact attributable to unseen human operators hiding in the stand that housed its mechanism. Marynowsky’s robot is controlled by similar sleight of hand — in this case it is an unseen human operator (the artist) who remotely observes the actions of gallery participants and directs Boris’s movements and speech via the Internet.

The mise-en-scène of the performance — the lace-trimmed black dress and the old-fashioned gramophone horns lining the gallery walls — combined with the robot’s uncanny whirling when visitors enter his space evokes the feeling of a Victorian séance; especially combined with the spirit possession inherent in his channeling of his master’s voice through the Internet.

Simon Penny, Petit Mal (1989-2006)

There is nothing human-like in the appearance of Simon Penny’s Petit Mal. The robot is completely machinic in appearance. It sits on two bicycle wheels joined by an axle, with an upright pole supporting three ultrasonic sensors and three pyroelectric (body-heat) sensors at the front and a fourth ultrasonic sensor at the back. However, although not ostensibly anthropomorphic or zoomorphic in appearance, the constellation of sensors nevertheless acts as a sort of ‘head.’ A colorful vinyl print covers some of the metal tubing, acting as a counterpoint to the utilitarian machinic appearance of the robot and giving it a more playful and frivolous appearance.

The robot moves around the gallery performance space, generally avoiding walls but sometimes lightly glancing off them. It rocks back and forth on its base as it pursues and reacts to people in its performance environment. It will approach audience members who are directly in front of it to within a distance of about 60 cm and try to maintain this front-facing position and distance as its audience interactor moves. If the person comes closer than around 60 cm, Petit Mal will retreat. However, the robot’s behavior can become confused if there are multiple people in the performance area or if it gets cornered. The appearance and gently erratic movement and behavior of the robot contribute to its playful demeanor. The robot’s name derives from a neurological term that describes a momentary loss of control or consciousness. The naming of the robot provides its behavior with a psychological frame. Is this robot out of control? Is it psychologically disturbed?
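The distance-keeping behavior described above can be pictured as a simple control loop. The sketch below is a hypothetical reconstruction for illustration only, not Simon Penny's actual code: the roughly 60 cm standoff comes from the text, while the deadband value and the command names are assumptions.

```python
# Hypothetical sketch of Petit Mal's distance-keeping behavior:
# approach a person detected in front, hold a standoff distance of
# roughly 60 cm, and retreat if they come closer. Illustration only,
# not Penny's implementation.
from typing import Optional

STANDOFF_CM = 60.0   # preferred distance to the interactor (from the text)
DEADBAND_CM = 5.0    # tolerance band to avoid jittering in place (assumed)

def drive_command(front_range_cm: Optional[float]) -> str:
    """Map a front ultrasonic reading to a simple motion command."""
    if front_range_cm is None:
        return "wander"        # nobody detected: roam the space
    error = front_range_cm - STANDOFF_CM
    if error > DEADBAND_CM:
        return "approach"      # person is too far away: move toward them
    if error < -DEADBAND_CM:
        return "retreat"       # person is closer than ~60 cm: back away
    return "hold"              # inside the band: maintain position
```

A real controller would also have to handle the multi-person confusion and cornering that the text describes; this sketch captures only the single-interactor case.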

Petit Mal has appeared in many gallery performance environments, sometimes in an open gallery space and sometimes in specially constructed enclosures. When exhibited at Transmediale 2006 in Berlin, the robot performed in a rectangular arena enclosed on all sides by hip-high white walls. This performance area was reminiscent of a zoo enclosure, with the audience standing behind the wall to watch the actions of this strange creature. The robot was contained in this space with no other objects or props, but audience members were able to enter the space to interact with it.

Audience perception of robotic performers

We can conduct a rigorous semiotic analysis of a robot’s appearance and behavior and the staging of its presentation, as I have done above, but this is only part of the equation. The key question remains: how do humans understand and interpret the performance of robots?

In his analysis of the everyday presentation of self, Goffman also places particular emphasis on the role of the audience in receiving and judging the performance. A successful performance is one where the audience views the actor as he or she wants to be viewed. We all test and judge each other’s performances. If robots successfully perform the behavioral signifiers of animacy, agency, emotion and intelligence, audiences will respond to those cues. However, the intention of the performer and the intended meaning of the performance are not necessarily what will be received by the audience. Both human and robotic performers are subject to performance mistakes and unintended behaviors. These gestures and behaviors (for example, the jerky movement of a robot or responses that are too fast or too slow), even if they are not an intentional part of the performance, will be interpreted as meaningful by the audience and become part of the performance effect.

As Byron Reeves and Clifford Nass [6] have shown, human responses to computers and virtual characters are informed by deeply ingrained physiological and behavioral tendencies and habits. These instinctive physiological responses (such as reacting to facial expressions, body language and movement) and social responses (such as a tendency to be polite) are carried over from the physical world into our interaction with robots.

When robots display machinic, bio-mimetic or anthropomorphic characteristics, these performative signifiers (sign-systems) are measured against the audience’s own experience of other similar entities (human, animal, insect, machine, art) that they are familiar with. The robot’s movement and behavior are just as important as, perhaps even more important than, its physical appearance in this regard. What the robot does, how it does it, and how it responds to its environment and other entities including audience members are key factors in how it is perceived.

Behaviors that look too controlled and automated can appear machinic and unexpressive. Unpredictable behaviors by the robot in response to its environment and to other objects/people in that environment give an appearance of agency, personality and even emotion. Hesitations, frailties and inconsistencies make the robot appear more like a living organism than a programmed machine. The active interpretive role of the audience is a key factor here. It is the audience's projection of their own meanings onto the performance that generates much of the expressiveness of the robotic performance. This, after all, is how audiences read and respond to the performances of human actors. We interpret each other's performances including perceived intentions and emotions through reference to our own experience and emotions.

In this scenario, whether the robotic performer is intelligent and has emotions is not the key issue; it is whether we can tell the difference. Human perception and emotional and cognitive responses are more important than epistemological ontologies when it comes to robotic performance. The successful performance of the robot, judged from the audience’s point of view, is determined by what the audience can directly perceive in the robot’s appearance and behavior rather than by the intrinsic qualities and abilities of the robot (for example, whether the robot is ‘truly’ aware, intelligent and socially responsive).

As Sherry Turkle comments in her book Alone Together, “Computers 'understand' as little as ever about human experience […] They do, however, perform understanding better than ever.” [7] Robots may not be truly alive, but according to Turkle, they are becoming “alive enough” for humans to have relationships with.

The intrinsic qualities of the robot, including the sophistication of its manufacture, its sensing systems and Artificial Intelligence (AI) programming, are only relevant to the audience to the extent that they impact on the robot’s observable behavior and performance. These factors may be highly relevant to scientific robotic research and robotic development, but in terms of audience response, careful staging, programming and even trickery may be just as important in achieving an effective performance for the audience. Robotic performances may be completely autonomous or assisted by human operators. From the audience’s point of view, it may be difficult to tell the difference. Creative staging and showmanship along with elements of deception and trickery have a long history in machine performance, as in Von Kempelen’s chess-playing automaton. Wade Marynowsky’s Boris has automated sequences and is also teleoperated by the artist and other guest operators, making the robot appear to be much more intelligent and aware of its audience. This hi-tech puppetry and remote operation of robotic performers is also the case with Hiroshi Ishiguro’s teleoperated Geminoid robots, which are controlled by the humans operating them rather than acting as autonomous performers. In this process, agency and social intelligence are transferred and delegated from the artist/operator to the robot, even though from the audience’s point of view, the intelligence and awareness appear to be coming from the robot performer itself.

Successful acting is all about simulation and making what is unreal appear real. For a robot, this is the ability to persuasively simulate or pass as human, or alive, or intelligent. Alan Turing’s famous test used to determine machine intelligence and social performance is essentially an acting test. It measures not whether a computer is intelligent or can think like a human, but whether it can perform as if it is human, or at least whether it can perform well enough to fool a human audience. Turing set out this test for machine intelligence in his influential 1950 essay “Computing Machinery and Intelligence,” [8] where he describes the scenario for an ‘imitation game’ to test whether a computer can successfully imitate a human being. Turing based his test on an earlier game where an interrogator tries to guess the gender of two participants (one male and one female) by asking them questions and assessing their typewritten replies. In Turing’s version of the game, he replaces one of the human participants with a computer and suggests that if the interrogator cannot tell the difference between the human and the computer purely from their answers, then the computer can be said to be intelligent. In this way, intelligence becomes a functional attribute achieved through persuasive simulation or ‘passing’ rather than an inherent attribute.

‘Passing’ or successful simulation means getting it ‘just right,’ but over-performance and under-performance are more common features of machine performance. Over-performance and under-performance may be perceived in a variety of different ways and can have both entertaining and unsettling effects on audiences. Exaggerated appearance and behavior, including over-emphasized facial features, expressions, gestures and movement are common features of cartoon animation and animated films, where these techniques are successfully used for comic effect and to enhance emotion and drama. More unsettling are the uncanny responses evoked by robots and digitally animated characters that are ‘almost but not quite’ human in their appearance and behavior; these responses have been described by Japanese roboticist Masahiro Mori as the ‘uncanny valley’ phenomenon. [9] [10] These unsettling effects occur when the mimetic aspiration of the work falls just short of achieving a perfect simulation. While audiences generally find lifelike or human-like characteristics in a more abstracted form appealing and empathetic, when these characteristics become more realistic (but not quite right), audiences tend to focus more on the disparities and what is not working about the simulation. The human brain perceives these imperfect simulations as defective versions of the real thing.

As we have seen, audiences judge robotic performances in the same way as they judge any other type of performance interaction, whether it occurs in everyday social settings or in more staged theatrical environments. The success of the robotic performance depends on two key factors: the intended performance (the robot’s appearance and its ability to enact or simulate behavior, movement and interactive responses to its environment and other entities/actors) and the perceived performance (the audience’s perception and interpretation of the robot’s appearance, behavior and interactive responses).

References and Notes: 

  1. Philip Auslander, “Humanoid Boogie: Reflections on Robotic Performance,” in Staging Philosophy, eds. D. Krasner and D. Saltz, 87–103 (Ann Arbor: University of Michigan Press, 2006).
  2. Louis-Philippe Demers, “Machine Performers: Neither Agentic nor Automatic” (paper presented at the 5th ACM/IEEE International Conference on Human-Robot Interaction, Osaka, March 2-5,  2010).
  3. Steve Dixon, Digital Performance (Cambridge, MA: The MIT Press, 2007).
  4. Yuji Sone, “Realism of the Unreal: The Japanese Robot and the Performance of Representation,” in Visual Communication 7, no. 3 (2008): 345–62.
  5. Erving Goffman, The Presentation of Self in Everyday Life (New York: Doubleday Publishing, 1959).
  6. Byron Reeves and Clifford Nass, The Media Equation: How People Treat Computers, Television, and New Media Like Real People and Places (New York: Cambridge University Press, 1996).
  7. Sherry Turkle, Alone Together: Why We Expect More from Technology and Less from Each Other (New York: Basic Books, 2010), 26.
  8. Alan Turing, “Computing Machinery and Intelligence,” in Mind 59, no. 236 (1950): 433–60.
  9. Masahiro Mori, “Bukimi No Tani [The Uncanny Valley],” in Energy 7, no. 4 (1970): 33–35.
  10. Kathy Cleland, “Not Quite Human: Traversing the Uncanny Valley,” in What was the Human?, eds. L.E. Semler, B. Hodge and P. Kelly (Melbourne: Australian Scholarly Publishing, forthcoming).

Designing a way to visualize the invisible through the dynamic illustration of conversations

This paper discusses the creation of a multi-modal, data-driven prototype application called the Conversation Viewer. Designed to visually represent the evolution of a conversation through a dynamic touch-based graphical interface, it illustrates various elements of participants’ email, text and voice messages as they seek to reach a mutual agreement on a meeting date.



The rise of social media and the sophisticated development of third-party applications, in combination with an ever-increasing range of mobile devices, makes possible the immediate communication of ideas between individuals, different communities of practice and the wider community. While we have the ability to send and receive vast amounts of information, status updates and social commentary about different events, there is a need to capture dynamic systems of communication in a way that illustrates the less visible relations between the participants of a conversation. This is significant in situations where the consequences of numerous interactions can change the perspective and direction of different courses of action; for example, it is relevant to participants in distributed environments involved in the collaborative development of long-term projects. Since digital communications commonly involve direct exchanges between the individuals responsible for sending and receiving information, knowledge resulting from group discussions or informal corridor conversations is not as easily shared amongst participants in distributed environments via email, project documents and websites.

Rather than seeking to integrate multiple systems of communication made available through aggregation tools such as Netvibes [1] or Hootsuite, [2] this research focuses on assisting others to contextually build relations between their different communications. Here the intent is to supply individuals with a comprehensive understanding of the circumstances surrounding the progressive development of different thoughts and actions that lead to mutual agreements. This approach positions human participation, negotiation and understanding at the heart of dynamic communication systems and looks beyond the notion of interaction as the mere sending, receiving or aggregation of disparate messages. For the purposes of this research, the term 'dynamic situation' refers to a set of circumstances in which one has an awareness of their surrounding environment. In particular, this can be understood as a situation that enables individuals to understand not only their contribution to an ongoing conversation but also that of other participants involved in the same discussion.

In light of this, Bell Labs France researchers specializing in hybrid and social communication, intuitive collaboration and applications design worked together to develop a prototype application called the Conversation Viewer. Its role is to make visible the abstract relations that exist between the participants of a conversation. A touch-based graphical interface is used to interact with and visualize the progressive development of conversations drawn from participants’ email, text and voice messages. The data-driven prototype application was initially designed as an iPhone application that was later developed for the iPad (see Fig. 1).

The impact of functionality on form

The goal in designing and developing the Conversation Viewer was to create a way to deal with the often-fragmented experience of understanding individual and group intentions expressed by voice and text-based data. In doing so, the Conversation Viewer aims to provide greater awareness surrounding the overall actions and interactions of participants engaged in evolving conversations. This is where time as a factor of interaction in the past, present and future can be used not only as a reference to illustrate the history of a conversation but also to study the value of interactions.   

The key function of the Conversation Viewer is in the utility it offers participants to interact and develop conversations. Interaction forms the basis of conversations in which elements of dynamic situations are negotiated between participants to develop a desirable outcome. The impact of functionality on the form of conversations is in the approach taken by participants during the course of interaction. Importantly, understandings are not simply communicated; they are built collaboratively through conversation, where participants derive meaning from their interpretation of a discussion. A newly formed understanding is then offered to participants for further interpretation and comparison with the original, which eventuates in mutual understanding and agreement. [3] In this way, it is possible to understand how the collaborative development of a dynamic situation facilitates the collective learning of different objectives between participants through a shared process of negotiation, understanding and agreement. Hence, the Conversation Viewer offers a means for individuals to inquire about the evolution of dynamic situations and exchange information through a series of interactions to reveal and resolve contradictory ideas.

Fundamental to this is the function of the observer as an accepted participant in the act of observing, which allows understanding to be derived from such actions. [4] During the development of interaction, individuals are accepted as mutual participants in the act of knowledge creation. In doing so, different individuals, considered as both observers and participants, become necessary elements in the development of dynamic situations. This in turn enables them to act subjectively. By interacting with the various participants involved in a conversation, understanding is created through the exchange of ideas that leads to mutual agreement. This involvement is interactive and productive, so that individuals affect and are affected by the interactions in which they participate. The interaction should represent the culmination of the participants’ interpretations. Significant to this is that, “The language of the conversation must bridge the logical gap between past and future, but in doing so it should not limit the variety of possible futures that are discussed nor should it force the choice of a future that is unfree.” [5] This is important because the form of a system directly influences its usefulness as a tool to support the evolution and understanding of dynamic situations.

The useful and visible functions of a system

The expression of individual intentions and the broad circumstances associated with various events and actions that take place within dynamic situations underpin the useful function of the Conversation Viewer. Creating a tool to achieve this becomes challenging, since there are no established interaction design patterns or use cases for the design of dynamic communication systems, nor is it possible to define them in advance. The Conversation Viewer, therefore, seeks to provide end-users of the system with a contextual timeline of events that visually flows together with the collaborative evolution of conversations over time. Building the relations between these activities means that individuals communicating with and through the Conversation Viewer are not required to search through numerous email threads, text messages and voice mails to gain a quick overview of the current state of a conversation. Instead, the data-driven system illustrates conversations at both general and detailed levels of information in the form of a visual narrative. The Conversation Viewer also opens up the potential to express the character of individual actions or those of a group. To achieve this, a technical component called a sentiment analyzer interprets the mood of an individual’s communication and illustrates it in the design of the ambient interface. While applications such as the iPhone Tracker [6] provide a contextual and historical narrative of an individual’s physical movements from location-based data captured by the iPhone, the Conversation Viewer represents a visual narrative of (1) participants’ relationships towards reaching an agreement, (2) general and detailed information about the terms of agreement, as well as (3) the emotional disposition of participants during a conversation, as an organized, integrated whole.

To give further context to the application’s use, the following scenario briefly describes the potential interactions that could take place during the discussion, understanding and confirmation of an agreement. For the purposes of this example, the terms of agreement are to find an appropriate meeting date. Fig. 2a represents the first interaction in a conversation. It illustrates a message from John in brief format that has been reduced in detail for the purposes of simplicity, as a function of the system. When touched, the message expands to show its detailed contents if so desired. At present, the agreement point is unclear; it will become more focused when a precise meeting date is proposed. Laura and Steve are represented in grey while John is represented in black, as he is the first to participate in the conversation. The participants of the conversation are positioned as neutral with respect to the agreement point. This is indicated by the background location rings in the design of the interface, which help visualize each participant’s relationship to the terms of agreement. As each participant of the conversation interacts with the others, their visual appearance transforms from grey to black. Simple emotions are also represented by the system’s interface, based on semantic information found in each communication.

Halfway through the discussions, in Fig. 2b, a specific meeting date has been suggested. This is visually confirmed by the clarity of the center point at the core of the agreement. Here both John and Steve are closer to the agreement, as shown by their repositioning with respect to the location rings. We can also see that Steve is happy about this arrangement while John remains neutral to the current proposal. Finally, after much discussion, the meeting is confirmed (see Fig. 2c). This is visually represented by the participants’ close proximity to the agreement point and its solid appearance. By presenting conversations in this way, it is envisaged that the participants of a conversation can gain an understanding of its development more easily than by reading or listening to each individual email and voice message that forms the basis of its representation. Furthermore, with the added functionality of (1) a global slider in a customized version of the Conversation Viewer shown in Fig. 3 or (2) the gestural activity of swiping along the vertical list of communications on the left-hand side of the iPad application in Fig. 1, an animated view of the conversation evolves as each moment of the discussion is revealed.
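The interface state that this scenario walks through can be sketched as a small data model: participants start grey and neutral, turn black when they first contribute, move toward the agreement point as they accept a proposal, and carry a simple mood. All names, step sizes and the keyword rule standing in for the sentiment analyzer below are assumptions for illustration only; the paper does not publish the Conversation Viewer's implementation.

```python
# Illustrative model of the Conversation Viewer's per-participant state,
# as described in the scenario. Rules and values are assumed, not the
# actual Bell Labs implementation.
from dataclasses import dataclass

@dataclass
class Participant:
    name: str
    proximity: float = 0.0   # 0.0 = neutral outer ring, 1.0 = at the agreement point
    mood: str = "neutral"    # stands in for the sentiment analyzer's output
    active: bool = False     # rendered grey until the first contribution, then black

def apply_message(p: Participant, text: str, accepts_proposal: bool) -> None:
    """Update one participant's visual state from a new message."""
    p.active = True
    lowered = text.lower()
    # Toy keyword rule standing in for the real sentiment analyzer.
    if any(w in lowered for w in ("great", "happy", "perfect")):
        p.mood = "happy"
    elif any(w in lowered for w in ("sorry", "can't", "cannot")):
        p.mood = "unhappy"
    if accepts_proposal:
        # Step toward the center point, as the icons do in Figs. 2b and 2c.
        p.proximity = min(1.0, p.proximity + 0.5)

steve = Participant("Steve")
apply_message(steve, "Great, Tuesday works for me!", accepts_proposal=True)
apply_message(steve, "Confirmed, see you then.", accepts_proposal=True)
# steve is now active (black), happy, and at the agreement point
```

Rendering this state over a timeline of messages, with a slider or swipe selecting the moment to display, would reproduce the animated view the text describes.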


The overall purpose of the Conversation Viewer is to facilitate different ways of observing and participating in conversations and to offer end-users of the system greater contextual understanding of the evolution of dynamic situations. This is achieved through conversation, where multiple viewpoints are expressed, visualized and internalized by those engaging in the discussion. As a result, a shared understanding of what is known from that which was previously unknown is created. Essential to this communication is that participants enter into the conversation with different perspectives and individual understandings that are distinct from any others. Without difference, there is no basis for exchange or discussion among participants that leads to the mutual understanding of something new. [7] Communication ceases to be productive without a context of difference or conflict to initiate change. [8]

It is envisaged that the manner in which dynamic situations can be experienced through the active engagement and manipulation of changing circumstances in the Conversation Viewer will assist the diverse consideration and development of mutual agreements. Here, interaction is the product of an individual’s capacity to communicate and to develop an understanding of their actions with respect to the thoughts and knowledge of other participants regarding the terms of agreement. Interaction is not considered in the mechanistic sense visualized by the repositioning of graphical icons that gravitate toward the point of agreement in the Conversation Viewer’s interface; that is merely the material outcome of interaction.

The notion of interaction in the context of this research is centered upon the actions required to reach a mutual goal. This in turn drives the purpose of communicating with different individuals through the Conversation Viewer, especially where the development of a conversation results from a comprehensive understanding of a dynamic situation. The significance of interaction is therefore accomplished through the act of doing, which enables an individual to cultivate a shared understanding of the current terms of an agreement.

Ideally, interactive products or services should offer the participants in their system of communication the freedom to choose how to express and fulfill their goals, rather than forcing them to accomplish a task through a limited system of interaction. The idea of participating and communicating through a product or service is much greater than its physical manifestation and transcends the materiality of the product. Here the notion of an organized, integrated whole that interconnects people with their environment becomes important. Interaction in this sense is largely concerned with the interconnectedness of all the elements of the design situation: graphic signs and symbols, material objects, activities, services, organizations, environments, or systems. What is significant is the active participation of these elements with one another as an organized, integrated whole; when all the parts of a design solution are connected, everything is in harmony. This holds significance for the dynamic treatment of a product’s content with respect to the form of design outcomes, particularly in the design of products or services like the Conversation Viewer that seek to support the changing conditions of dynamic communication systems.

References and Notes: 
  1. Netvibes Web Site, “Netvibes. Dashboard Everything,” May 2011, (accessed January 10, 2009).
  2. Hootsuite Web Site, “Social Media Dashboard,” 2008-2011, (accessed May 17, 2010).
  3. Natalie Ebenreuter, “Dynamics of Design,” Kybernetes: The International Journal of Systems & Cybernetics 36, no. 9/10 (2007): 1318–1328.
  4. Ranulph Glanville, “Thinking second-order cybernetics” (paper presented at the Cybernetics Society 27th Annual Conference, London, UK, September 14, 2002).
  5. John Christopher Jones, Design Methods (2nd edition. New York: Van Nostrand Reinhold, 1992), 407.
  6. Pete Warden’s Web Site, “iPhone Tracker,” 2011, (accessed April 27, 2011).
  7. Ranulph Glanville, “And he was magic,” Kybernetes: The International Journal of Systems and Cybernetics 30, no. 5/6 (2001): 652–673.
  8. Dean C. Barnlund, “Communication: The context of change,”  Basic Readings in Communication Theory, ed. C. David Mortensen, 6–26 (2nd edition. New York: Harper and Row, 1979).

Hybrid Art Forms: The Way of Seeing Music

This paper focuses on the relationship between music and the visual arts through the idea of hybrid art forms. Within this interdisciplinary approach, it considers scientific and technological developments and their effects on art and perception. In this context, several examples of hybrid art forms are analyzed and finally compared with recent multimedia works.


From Aristotle to Schopenhauer, and from Pythagoras to Newton, the common aspects of vision and hearing have long been regarded as an interesting field of research. In his treatise De Sensu, Aristotle remarked on the correlation between sound and color, encompassing both physical and perceptual matters: “We may regard all these colors (all those based on numerical ratios) as analogous to the sounds that enter into music, and suppose that those involving simple numerical ratios, like the concords in music.” [1] This analogy between the physical relations of sound and vision was itself grounded in the ideas of Pythagoras, who had related musical sensory qualities to mathematical ratios.

At the end of the 18th century, the physical correlations of harmonics could be displayed with various newly invented tools. Ernst Chladni, for example, produced his patterns with a simple system in which sand was scattered onto a square plate; the patterns emerged when the plate was bowed at certain notes. [2] During the 19th century, interest in such instruments gradually increased. The French mathematician Jules Lissajous fixed small mirrors to the sides of small steel instruments, and Sir Charles Wheatstone reflected a light beam off his Kaleidophone, in order to produce the shapes of harmonic vibrations. Among these inventions the harmonograph is one of the best-known devices, creating figures from the harmonic movement of pendulums.

Besides these physical correspondences between visual and acoustic harmonies, there are also remarkable interpretations from differing standpoints. Goethe’s color theory and Newton’s observations on the physical similarity between the musical scale and the seven prismatic rays of light provoked both scientific and artistic research. In one of the early color-transmission instruments, the ‘Clavecin Oculaire’, Louis-Bertrand Castel (1688-1757) modified the distribution of Newton’s visible spectrum and implemented these colors in his own color scheme. In the second half of the 19th century, new theories and applications were adapted to such instruments: Frederick Kastner’s invention, the ‘Pyrophone’, was a type of gas organ, while Bainbridge Bishop’s device, a small display placed on top of a classical organ, allowed a sound and its related color to be played together. The best-known color organ, however, was Alexander Rimington’s instrument, which was performed in New York at the premiere of Scriabin’s ‘Prometheus, Poem of Fire’ in 1915.

Synaesthesia constitutes another common ground between visual art and music. From the late 19th century until the end of the second quarter of the 20th, visual artists and musicians became deeply involved with this phenomenon, which is basically defined as a crossing of the senses: it was experienced mostly as seeing music or hearing the colors of sounds. It embodies subjectivity and evokes personal associations. Like Scriabin, his contemporary in the visual arts, Kandinsky, was also inspired by this inner sense.

Artistic interpretations

The mathematical nature of harmony in music has always had a great influence on other art forms, such as the visual arts, poetry, and even architecture, and this influence can be discussed from a wide range of interdisciplinary perspectives. First of all, the sense of order, of disorder, and of the contrast between them alters our perceptual and psychological response. [3] In music, the pleasure of order is established by tonal properties; its analogue in the visual arts is perspective, which represents the vision of nature in harmony. When the tonal system was broken down by Schoenberg’s twelve-tone technique, the harmonic hierarchy was dismissed, a development analogous to the flattening of the pictorial plane in painting. Indeed, Donald Mitchell draws a parallel between Cubism and the new architecture, on the one hand, and Schoenberg’s method on the other, claiming that the abandonment of tonality in music and the subsequent development of the serial method were well-nigh simultaneous with the abandonment of perspective. [4]

Intensive interrelations between music and the visual arts began to shape art in response to the aesthetic needs of the 20th-century worldview. Painters like Kandinsky, Klee, and Kupka devoted their art to the idea of music, which has the power of expression without the help of representation. Abstractness, considered an essential property of music, became a great achievement for the visual arts. Later, this was also affirmed by Clement Greenberg, who believed in the independence and purity of modernist art; for Greenberg, all the arts could pursue the sensuous and physical properties of music. [5] Thus music became a model for the other arts – but significantly, it should be considered a method rather than an effect.

Greenberg’s modernism is based on distilling art forms; referring to Lessing and Babbitt, he revealed a long-running idea: ‘medium specificity.’ It is interesting, on the other hand, that new art forms and innovative experiments based on the synthesis of different mediums were developing at the same time.

Medium syntheses had previously been seen in stage compositions such as musical dramas and operas, and later, in the modernist period, they gained new inspirations. In the ‘Yellow Sound’ (1909-14), an experimental piece, Kandinsky embraced all perceptual effects and blended different art forms. He turned increasingly to music, working on experimental stage compositions with Thomas von Hartmann and using music, painting, dance, and lighting in his stage works. At the same time his colleague Schoenberg intensified his painting practice and started to work on small operas: his short drama ‘Erwartung’ (1909) and the opera ‘Die Glückliche Hand’ (1910-13) reveal the same synthetic manner. Contrary to Greenberg’s discriminative sense, these synthetic relations between music and the visual arts can be seen as influenced by Wagnerian aesthetics: the ‘Gesamtkunstwerk’, in which all the arts are united under the banner of music.

Whether in Greenberg’s exclusive sense of modernism or in Wagner’s unified art form, music is seen as a paradigm, a model for defining the aspired-to ‘modern’ art. As Simon Shaw-Miller has pointed out in his essay ‘Modernist Music’, the modernist conception has two streams developing at the birth of modernism, “which are generated from music.” [6] This bilateral condition of modernism consists on the one hand of ‘abstract formal techniques’ and on the other of a ‘multi-sensory’ or ‘multidimensional model’; the two may be called ‘formal modernism’ and ‘contextual modernism’, or ‘pure’ and ‘hybrid’.

Within this period, among these stage experimenters, Thomas Wilfred committed himself to an exceptional aesthetic notion. He proposed a new art form, Lumia, the Art of Light, with light alone as an independent aesthetic medium. With his instrument, named the Clavilux, he performed silent compositions – which were also given opus numbers, as in music – controlled by a keyboard. He stated the aesthetic concept of Lumia as “The use of light as an independent art-medium through the silent visual treatment of form, color and motion in dark space with the object of conveying an aesthetical experience to a spectator.” [7] In the 1920s, when Wilfred began his experiments with the Clavilux, other artists saw the advantages of film as a medium for expressing their abstract concerns. Oskar Fischinger, Walter Ruttmann, Hans Richter, and Viking Eggeling all became aware of the potentials of the fourth dimension. Their formalist tendency in abstract experiment broadened the essential meaning of composition, rhythm, color, and form.

It stands to reason that not all art forms are pure. Jerrold Levinson defines hybrid art forms as “art forms arising from the actual combination or interpenetration of earlier (existing) art forms.” [8] He goes on to categorize hybrids as juxtapositional (additive), synthetic (fusional), and finally transformational, the category to which visual music – in the form of abstract color film – is assigned. But because the transformation of (Western classical) music into abstract film is not structurally and thematically possible, Levinson claims that visual music is a nonexistent art form. Nevertheless, he allows that such nonexistent art forms could someday exist “by appeal to radically new means and media that technological advance will make available.” [9]

References and Notes: 

  1. Aristotle, “De Sensu,” (accessed May 20, 2011).
  2. Anthony Ashton, Harmonograph (New York: Wooden Books, 2003), 46.
  3. E. H. Gombrich, The Sense of Order (New York: Phaidon Press, 2006), 6.
  4. Donald Mitchell, The Language of Modern Music (New York: St Martin’s Press, 1966), 77.
  5. Clement Greenberg, “Towards a Newer Laocoon,” in Art in Theory 1900-2000, ed. Charles Harrison and Paul Wood, 565 (Blackwell Publishing, 2002).
  6. Simon Shaw-Miller, “Modernist Music,” in The Oxford Handbook of Modernisms, ed. P. Brooker, A. Gasiorek, D. Longworth and A. Thacker, 599-617 (New York: Oxford University Press, 2010).
  7. Thomas Wilfred, “Light and the Artist,” The Journal of Aesthetics & Art Criticism 5, no. 4 (1947): 252.
  8. Jerrold Levinson, “Hybrid Art Forms,” Journal of Aesthetic Education 18, no. 4 (1984): 6.
  9. Jerrold Levinson, Contemplating Art: Essays in Aesthetics (New York: Oxford University Press, 2006), 118.

Towards a Taxonomy of Interactivity

In this paper I take a critical look at the notion of interactivity, an inadequately defined term. It has very deep roots within the world, both as it self-organises and as we construct it. I offer an analysis of interaction based on the notion of “relations” as a general term for the interconnections through which all interactions occur, and I examine the degrees of relations that operate so that information flow between entities is enabled.


When one looks critically at a great deal of the contemporary new media art that is described as interactive, one finds a huge variance in its formal manifestations such as

  1. the form of the artwork and its technical constructs; i.e., whether it is an installation or a performance, a sculpture or a software application
  2. the location and accessibility of the artwork; whether it is standalone or networked, situated in public space or gallery space
  3. the kinds of interactors; whether human or machine, audience or individual, or even computer to computer.

These kinds of categories constitute the taxonomy being developed by Katja Kwastek and visualised by Evelyn Munster for Ars Electronica [1] and no doubt others. However this kind of taxonomy does not investigate the actual processes of interaction itself, i.e., it does not explore what happens in terms of the flows of information and signs between entities, human or machinic, when engaged in these kinds of interactions. It is the intention of this paper to develop, in a preliminary form, a taxonomy of the kinds of relations through which information and signs flow in the actual activity of interaction, where relations constitute the connections or linkages between entities.

Interaction – and its co-relative, participation – have a very wide range of structures. Interactive artworks occupy a wide range of levels between the potential fully conversational robot and the video replay that simply switches on when the spectator enters the gallery. These run the gamut from the 'interaction' in the mind of a viewer in their active mental interpretation with an entirely passive artwork (a painting or sculpture) through to detailed and creative conversation between individual people and possibly between people and machines. It is the notion of the conversation that for me constitutes the full concept of an interaction.

In order to come to grips with an understanding of interaction in contemporary art and its range from the entirely one-sided to the fully conversational, I want to assemble an understanding that is as general as possible, so that it is not restricted in its application simply to human-machine interaction, but is operational over the full range of processes that can be described as interactions, be these the exchange of chemical products between micro-biological entities or the possibility of having a truly conversational relationship with a robot in the way that you and I might interact when we are having a conversation. This generality should then allow us to go even further, to the point where we might engineer truly conversational relationships between machines. I use the term 'conversational' because it entails a notion of inventiveness, which we might think of as the capacity to generate new behavioural repertoire and by which we might be able to produce a true artificial or machine intelligence, i.e., a creative machine, one that can not only trick us into believing that it has passed the Turing Test, but can truly generate new and creative ideas.

Interaction implies reciprocal actions or influences of two (or more) entities upon each other, where an entity is some kind of organised object of multiple components that has some degree of autonomy and agency. Thus, interaction takes place between entities that possess the capacity to act for themselves. It also requires that these actions alter the internal (cognitive) structure of each. At the most basic level interaction is fundamental to life, since it is the means by which an organism deals with and adapts to its environment. [2]

While this paper examines the interaction between an artwork and its spectators, in general the entities that engage in this reciprocal behaviour may be biological, social or machinic. Of the biological, at the lowest possible level of entity are single-celled organisms, at the highest possible level are whole societies or even whole ecosystems, though I will discuss human organisms for the most part. Machinic entities are artificial or constructed, e.g., technical and computer driven installation art, robots, and other forms of potential artificial intelligence. These devices must be, in some sense, adaptive, i.e., able to change their state to accommodate changes in their environment. This is a necessary first condition that has to be possessed by any entity that will undergo interaction.

There seem to be two general terms that apply. One, participation, though not usually thought of as involving relations with some kind of machine or constructed object, may be characterised as one-to-many, and involves engagement with a group of others in an event of some sort. To participate is to place oneself in the context of some process and to engage with whatever it offers that allows some kind of entry into the overall event. One becomes part of some larger thing or event that is the participation, per se. These can be happenings, theatrical events, and events in which the spectator has to supply at the very least their presence so as to complete the work. There will be multiple processes of connection developing over time producing a wide spectrum of activities.

The other, interaction, involves engagement with, usually, a single other entity (person or machine), and is commonly one-to-one. Again the work is not complete without the interaction, but here the focus is on reciprocal relationships and their development over time. To interact is usually considered to involve engaging with devices of varying kinds through the exercise of controls or sensors or other data gathering attachments that provide information as to changes in local conditions, and thus permitting the spectator, as user, to participate in the process of some 'thing' so that some kind of reciprocal relationship develops with it.

A distinction is often drawn between participation and interaction. This has an historical basis in that the word participation applied to early (1960s) forms of happenings and other event based art, whereas an interaction is usually thought of as being between a computer-driven or other machinic device and a person. In English usage, one 'participates in' but 'interacts with' some thing. However, in both situations it is the coherence of some larger process – a product of all the entities involved and greater than each entity when each is seen as an individual – that one experiences. As Pask has noted:

“an observer who comes to know the system must be a participant in the system. The boundaries of the system, far from being pre-fabricated, are created by the activities of the system. This is a prescient notion of autopoiesis, or organizational closure.” [3, 353]

The use of the term 'participation' arose in the period of the happenings of Allan Kaprow or, for example, Nam June Paik's Participation TV (c.1963). Apart from the obvious person-to-person interaction required in a happening, I suspect that this distinction has lately been drawn through a need for a formal distinction between works produced through the use of analogue systems and works in which the computer is the locus of the choice-point selection process that is seen as interactivity in much recent contemporary art. Thus, participation and interaction can be shown to have a very similar set of characteristics whatever the technological means.

If participation was analogue and interaction is digital, then, given that both are means by which one develops some kind of relationship with an other – be they animal, human, significant other, analogue or digital machine (computer), environment, or any combination of these – are they not two words for the same thing? The key is that some sort of communication transpires: a reciprocal exchange of (generally ‘meaningful’) information that endures because of that meaningfulness and its reciprocation. Thus I argue that the separation between participation and interaction is meaningless, artificial, and misdirecting.

But what exactly are the processes of interaction? What are its characteristics? Firstly, whether the process is direct – through the exchange of chemistry (e.g., in biology), or mediated – through language or any of the extensions of our capacities that are embodied as analogue or digital technologies; a body, or some material functional object the states of which are alterable, must be involved. Thus, interaction must be embodied. Interactivity, being medium independent, needs some sort of physical channel through which information transfer can occur. These channels function as relations to the other and could as easily be speech as they could be a camera sensing people walking into the gallery. Ultimately what counts is what is recognised in the sensing and that, like it or not, is analogue.

Interaction is the relational dynamics that occurs between an entity – an organism or device – and its environment.

An organism is any coherent biological entity that metabolises energy in order to maintain that coherence (its organisation) within an environment, to gather and process information about its environment, and to permit its reproduction. At the machinic level are devices that are in some sense adaptive, i.e., that can change their state to accommodate changes in their environment. The need to be adaptive is a necessary first condition that has to be possessed by any entity that will undergo interaction.

An environment is all other organisms and the physical, social, cultural and machinic context that constitutes the experiential space of an organism for any duration. Every thing that is in some sense other to (i.e., not) an organism is its environment. Only the most sterile of environments are entirely passive or neutral; thus interaction and its corollary, adaptability, are necessary for any entity that has to survive in an environment.

To any organism its environment is 'active' when other organisms interact with it by competing with it for resources or by generating outputs (signals) into the environment which may or may not be useful to it. Its capacity to adapt to changes in its environment is essential to its continued coherence and its reproduction. This adaptive capacity is tested by its capacity to use the resources in its DNA, or program, and its stored experience to handle day-to-day changes. But to 'know', in any sense, about those changes it must be able to sense its environment and effect internal changes that accommodate those sensed external changes. These processes are structurally fundamental to interaction, and they constitute the primary level of the process of communication. [2] Further, they require the two orthogonally related conditions of 'autonomy' and 'agency'.

Autonomy implies that an entity can stand alone in some sense, making decisions based on its own knowledge of its situation. Its etymology is from the Greek auto for 'self' and nomos for 'law', i.e., self-driven or self-governing and, thus, self-regulating. Based on this we might think of something like a static autonomy, for example an object such as a painting or a sculpture that is complete in itself, through to an active mobile autonomy best represented by a living organism that is capable of moving, feeding, sensing and, overall, making decisions for itself. The notion that a static object, something that just sits there and does nothing is autonomous seems trivial but it stands as the lower bounding case of autonomy. We normally think of autonomy as applying to an entity that is in some sense self-sustaining; that has the capacity to sense its environment, operate on it, and thereby make decisions for itself, and thus we start now to see a merging with the notion of agency.

Agency is that property of an autonomous entity that is its capacity to act in or upon the world. That is, having made a decision it has the capacity to carry out (or execute) that decision.

The kinds of entities that have both autonomy and agency will be both biological (living) and artificial (constructed), e.g., robots and other attempts at artificial intelligence, and our chief interest here, installation artworks.

Adaptation by an entity to its environment both requires and supports its autonomy, allowing the organism to behave independently of other organisms, survive on its own and enact its own decisions. An organism's autonomy requires internal feedback relations in which aspects of its internal system can enact the regulation of its local environment in intentional ways. When these relations spread beyond the organism's boundaries you get social environments in which organisms communicate, sense and have intentionality and from this comes interaction. [4]

There are degrees of relations that may develop when a spectator encounters an artwork. These may manifest in several possible ways.

Degree 0: The artwork may be entirely passive and the only interaction is that process of the interpretation of an artwork that the viewer has to make to be able to see it and render it meaningful to them. Such action takes place entirely within the viewer, and although it is dynamic it has no impact on the artwork, which is itself entirely passive.

Degree 1: It may be triggered to start some playback of a pre-programmed sequence. Obviously interpretation on the part of the spectator is involved, but now the work becomes, in a very limited way, active. However there is no further impact on the artwork beyond the commencement of its pre-programmed trajectory.

Degree 2: It may respond with an action of some sort which will in turn draw further behaviour from the spectator. The artwork can now be said to be interactive. This is the common 'interaction' that occurs when the actions of a spectator elicit some sound, movement, visual or other event from the work that, crucially, causes the spectator to make further moves that are sensible to the artwork, thereby eliciting different sound or visual events from it. It is the kind of interaction that a musician has with an instrument, and the spectator may in fact be able to develop some skill with the artwork so as to be able to play it like an instrument. However here, the artwork's responses are all preprogrammed in the sense that a particular movement or action will elicit one particular response from the object, or may force a selection from several possible responses depending on, say, a contingent branch in the program flow.

Degree 3: If further movements of similar type are produced, a changing range of responses (e.g., new sounds) may be produced, since, having made one response the machine may then 'know' to make a different, albeit preprogrammed, response when given a similar stimulus.

The above classification is not dissimilar to that of Cornock and Edmonds. [5] For them interactions could be:

  1. static: allowing no opportunity for interaction
  2. dynamic-passive: change in response to environment, but not influenced by users
  3. dynamic-interactive: “generate outputs that correspond directly to input from audiences.”

and more recently Edmonds, Turner & Candy’s addition of the class dynamic-interactive (varying), which “distinguish[es] articles that change over time, either through automated learning or through updates from the artist.” [6]

However they do not go far enough, thus:

Degree 4: By any measure the peak class of interaction is conversation – an ongoing, inherently stable, multi-sided, adaptive process of information transfer, that consists in alternating, reciprocal production and transmission of information and response to that information, through consideration, recognition (of signs), understanding (of their meanings), development or extension of 'ideas' embodied in the messages and the production of further transmissions. Conversation must involve understanding which is a function of a mutually agreed, or learned, set of signs (language) that convey the meaning.
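The five degrees above can be summarized as a simple classification that walks down the taxonomy from its most demanding class to its least. This is only an illustrative sketch; the enum names and predicate parameters are my own shorthand, not terms from the paper.

```python
from enum import IntEnum

class InteractionDegree(IntEnum):
    """Degrees of relation between spectator and artwork, per the taxonomy above."""
    PASSIVE = 0         # interpretation only; the artwork is unaffected
    TRIGGERED = 1       # playback of a pre-programmed sequence begins
    RESPONSIVE = 2      # actions elicit responses that draw further behaviour
    ADAPTIVE = 3        # responses vary with the history of prior stimuli
    CONVERSATIONAL = 4  # reciprocal, sign-mediated exchange with understanding

def classify(reacts_to_spectator, response_varies_with_input,
             response_varies_with_history, exchanges_understood_signs):
    """Return the highest degree whose condition the artwork satisfies."""
    if exchanges_understood_signs:
        return InteractionDegree.CONVERSATIONAL
    if response_varies_with_history:
        return InteractionDegree.ADAPTIVE
    if response_varies_with_input:
        return InteractionDegree.RESPONSIVE
    if reacts_to_spectator:
        return InteractionDegree.TRIGGERED
    return InteractionDegree.PASSIVE

# A motion-triggered video replay reacts to the spectator, but its trajectory is fixed:
assert classify(True, False, False, False) is InteractionDegree.TRIGGERED
```

Because the classes are ordered, Cornock and Edmonds’ static, dynamic-passive and dynamic-interactive categories map roughly onto degrees 0 through 2, with their later dynamic-interactive (varying) class corresponding to degree 3.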

Conversation necessarily involves feedback; the closing of the loop through the response by the second party, which is in turn considered and responded to by the first. Thus a continuing cycle of feedback undergoes temporal development as the conversation continues, and each party is, at least, able to utilise its existing repertoire of behaviours – ranging from language and gesture to the demonstration of objects and processes or the operation of machines. This cybernetic feedback relation, though largely neglected in contemporary art over recent decades, provides a framework of immense value in understanding how interactive systems can work, and it is the circular feedback system that renders the conversation something greater than what exists within each party, such that its coherence gives it a mutually embodied autopoietic presence.

More interestingly, in any intelligent entity (living or artificial) the learning of a set of signs to convey meaning will require the development of new behavioural repertoire through a process of adaptation. The interactive context (the environment) will make demands on each entity and place constraints as to the effectiveness of any behaviours, moulding the development of any new repertoire.

In sum, while none of this is strictly 'new,' little of it has been spelt out and incorporated into contemporary art practice. Pask's work would be the only counter example. Regarding conversation, and paraphrasing him [3, 358–360]:

  1. Conversation between individuals occurs over time and alters the mental contents of each individual over that time.
  2. Conversations have a start and a finish and unfold over time, although they may run in parallel, supposing more than two individuals are engaged.
  3. The conversation is in the union of the minds of the individuals engaged. That is, it exists as a superstructure that is not contained exclusively in either mind but necessarily is a function of the activity of both.
  4. There is a process of feedback that gives conversation its unifying character.
  5. There is a “transfer of tokens” (language, signs) between each individual within the conversation.
  6. These 'tokens' must be mutually understandable. However, the interpretations of the conversation are nevertheless a function of each individual mind.

While many interactive artworks demonstrate some level of “reciprocal production,” I know of none that have achieved a truly conversational level of interaction. This, I suppose, is due to the intractable problems of building a true artificial intelligence. One of the nearest approaches to this status is Stelarc's Prosthetic Head, [7] which has also been the locus of a great deal of work intended to produce various aspects of this capability under the framework of the Thinking Head project. [8]

Finally, I list the component sequences (i.e., the dynamic relations) of the process. What interaction needs is:

  • A potentially dynamic system in some environment.
  • The entry of an interlocutor and a stimulus generated by that interlocutor.
  • A response to that first stimulus, functioning itself as a first return stimulus.
  • A further response to that first return stimulus followed by a further response on the part of the second party.

This must develop into an ongoing loop of stimulus-response sequences. Ideally it should follow a coherent line of development, and it might stimulate the production of a new behavioural repertoire. Thus begins the process of developing a creative machine, in the way that we are creative.
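The loop just described can be sketched minimally in code. This is only an illustration, not an implementation of Pask's conversation theory: the class, the token format and the adaptation rule are all hypothetical. It shows the two essentials named above, that each response becomes the next party's stimulus, and that an unfamiliar stimulus extends a party's behavioural repertoire.

```python
# Minimal sketch of a two-party stimulus-response feedback loop.
# All names and the "re:" token scheme are hypothetical.

class Party:
    def __init__(self, name, repertoire):
        self.name = name
        self.repertoire = set(repertoire)  # known behaviours/tokens

    def respond(self, stimulus):
        # Adaptation: an unfamiliar stimulus is absorbed into the
        # repertoire (a crude stand-in for developing new behaviour).
        if stimulus not in self.repertoire:
            self.repertoire.add(stimulus)
        # The response is itself a token derived from the stimulus.
        return "re:" + stimulus

def converse(a, b, opening, turns):
    """Run the loop: each response is the next party's stimulus."""
    history = [opening]
    stimulus = opening
    for i in range(turns):
        speaker = a if i % 2 == 0 else b  # parties alternate
        stimulus = speaker.respond(stimulus)
        history.append(stimulus)
    return history

log = converse(Party("A", {"hello"}), Party("B", {"hello"}), "hello", 4)
print(log)  # each turn feeds back into the next
```

The conversation's "superstructure" lives in `history`, which belongs to neither party alone, while each party's repertoire grows only through what the other sends it.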

References and Notes: 
  1. K. Kwastek and E. Munster, “Theme Landscape: Interactive Art,” (accessed September 2011).
  2. S. Jones, “Interaction Theory and the Artwork,” in Proceedings of Interaction: Systems, Practice and Theory Conference, ed. E. Edmonds and R. Gibson, 283–303 (Sydney: UTS, 16–19 November 2004).
  3. G. Pask, “Heinz von Foerster's Self Organization, the Progenitor of Conversation and Interaction Theories,” in Systems Research vol.13, no.3 (1996): 351–364.
  4. Stephen Jones, "Sensing, Communication and Intentionality in Artificial Life," in Proceedings of the 5th International Symposium on Artificial Life and Robotics, ed. M. Sugisaka and H. Tanaka (Oita, Japan: AROB, 2000), 26–29.
  5. S. Cornock and E. Edmonds, “The Creative Process where the Artist is Amplified or Superseded by the Computer,” Leonardo 6, no.1 (1973): 11–16.
  6. E. Edmonds, G. Turner and L. Candy, “Approaches to interactive art systems,” Graphite '04, Proc. 2nd international conference on Computer graphics and Interactive techniques in Australasia and South East Asia (New York: ACM, 2004), 113–117.
  7. M. Smith, Stelarc: The Monograph (Cambridge, MA: The MIT Press, 2005), 68–79.
  8. C. Kroos, D. C. Herath and Stelarc, "The Articulated Head Pays Attention," 5th ACM/IEEE International Conference on Human-Robot Interaction (Osaka: ACM/IEEE, 2010).

Enhancing Spatial Experiences through Digitally Augmented Spaces:

This article discusses Augmented Reality through a trans-disciplinary approach, examining the possibility of enhancing the experience of space through our bodies by augmenting it digitally in a 3D setting. It asks how the experience of space could be enhanced digitally in terms of body-space interaction, and what the consequences of this change might be.




It has been two decades since Weiser put forward the vision of Ubiquitous Computing, in which he foresaw the shift of computing away from a desktop-centred state towards pervasive computing with smaller mobile devices distributed throughout the space (Weiser 1991). Today this vision is being realized with the emergence and pervasive use of tablet devices and smartphones connected wirelessly to public or private networks and to each other. Supporting this vision, wearable computers and head-mounted displays (HMDs) have become much more accessible for end users in terms of price and ease of use.

The 1990s were the era of virtual reality, and most of the discussions in economics, politics, architecture and urban design theory focused on the notion of the cyberspace experience. However, among the emerging new media and digital interaction technologies, Augmented Reality (AR), which enables digital information to be overlaid on physical space, is turning the discussion back to physical space again; but this time a physical space which is augmented with dynamic digital media.

Cybertectonic Space

The term Augmented Reality is broadly used for the Computer Vision technology that enables overlaying 3D-registered dynamic digital media on physical space (Azuma 1997). On the other hand, from the wider perspective of spatial experience, any situation enhancing, transforming or manipulating our experience of space may be understood within the context of Augmented Reality. Emphasizing this point of view, Manovich brings up the concept of Augmented Space, re-conceptualizing augmentation as an idea and a cultural and aesthetic practice rather than as a technology (Manovich 2006).

In this perspective, Augmented Space may be considered a new space, a transformation of the physical space by digitally overlaid data, and even a new realm providing a place for Being-in-the-world (Heidegger 1978). In other terms, Augmented Space is a new realm containing virtual elements in a real physical space. Here reality and virtuality are not considered opposite concepts; rather, they lie at the opposite ends of the reality-virtuality continuum (Milgram et al. 1994), with Augmented Reality located somewhere in-between.

Regarding McLuhan’s perspective on media as an extension of the human body (McLuhan 1995), Augmented Reality can also be discussed as a new medium extending our bodies and therefore providing new possibilities of spatial experience. From now on, to pull attention back from the computer vision technology to the architectural domain, the new in-between spatial experience made possible by Augmented Reality will be called cybertectonic experience, coined from the words cybernetics/cyberspace and architectonics. The new space produced via cybertectonic experience will be called cybertectonic space.

Within this context it might be concluded that Augmented Reality is not just a Computer Vision technology providing possibilities for overlaying digital media on physical space. More than that, Augmented Reality is a concept strictly related to spatial experience and bodily perception; from an ontological perspective, it provides a new in-between place in the continuum of the real and the virtual, and thus opens a new problematic domain in Architecture that should be carefully considered. This new problematic domain might be a ground for understanding the structure and qualities of cybertectonic experience and how cybertectonic space is produced.


The Library+ project is a pilot study conducted with undergraduate architecture students of Istanbul Kultur University Department of Architecture at the main campus library. The main goal of the project was to create an Augmented Reality scene in the conventional setting of a library by overlaying digital information on the physical space, providing a new realm for cybertectonic experience, and to observe, discuss and evaluate the possibilities of cybertectonic space from an architectural point of view.

The hardware used in the project was a backpack system consisting of a laptop computer and a Head-Mounted Display; the open-source, marker-tracking-based augmented and mixed reality authoring tool AMIRE was used as software. Marker-based augmented reality authoring software makes it possible to set scenes in the physical space much more precisely than GPS-based systems, and it is easier to install and set up than RFID-tag-based systems, since the only thing needed to register a digital object in the physical space is a black-and-white pattern printed on paper, called a marker. AMIRE not only makes it possible to overlay digital data on physical space, it also allows you to design interactive scenes according to parameters such as the distance of the body from the digital object, the distance between two objects or markers, and logical operators like if/then. With these parameters, much more interactive cybertectonic spaces can be created, and this interaction is not just pressing buttons and triggering events: most importantly, the scene is interactively constructed based on bodily gestures and movement in space, such as turning around objects and bending over.
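The distance-plus-rule logic described above can be illustrated in a few lines. This is not AMIRE's actual interface (AMIRE scenes are authored graphically); the function names, positions and threshold are hypothetical, and the sketch shows only the kind of if/then rule that makes a registered digital object react to the viewer's position in space.

```python
# Illustrative sketch of distance-driven scene logic of the kind the
# authoring tool supports. Positions are 3D coordinates in metres;
# names and the 1.0 m threshold are hypothetical.
import math

def distance(p, q):
    """Euclidean distance between two 3D positions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def scene_state(viewer_pos, object_pos, near=1.0):
    # if/then rule: reveal detail when the body approaches the
    # marker-registered object, otherwise show the overview.
    if distance(viewer_pos, object_pos) < near:
        return "show_detail"
    return "show_overview"

print(scene_state((0.0, 0.0, 0.5), (0.0, 0.0, 0.0)))  # show_detail
print(scene_state((3.0, 0.0, 0.0), (0.0, 0.0, 0.0)))  # show_overview
```

Chaining several such rules on body position and marker-to-marker distances is what turns movement in space, rather than button presses, into the interaction.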

Therefore the cybertectonic library space is not only a physical space ornamented with digital media; it is a space which is open to interaction, a space that encourages bodily perception (Merleau-Ponty 1996) of the digital, which is intensively connected to, or extends, the physical.

Concluding Observations

We have found that Library+ offers a completely new library experience, as opposed to searching the catalogues on computers, finding the books on the appropriate shelves and eventually reading them. The space encourages one to discover the library and reveal the hidden information between shelves or books, leading him or her to possibly unexpected results or experiences. The cybertectonic space of Library+ changes the linear, habitual process of research or reading in a library, and we also observed that the movement of the body changes dramatically in the cybertectonic library experience, as expected.

Supporting the cybrid principles that Anders has described (Anders 2008), we have found that reciprocity between the physical and the virtual elements of cybertectonic space is essential and should be taken as a key principle when creating cybertectonic experiences. Otherwise, a weak relationship between physical space and digital content may prevent the creation of meaningful cybertectonic experiences when aiming to enhance the experience of space.

In conclusion, it can be said that Augmented Reality technology brings a new problematic domain into Architecture that should be discussed in terms of the experience of space. Cybertectonic space is capable of changing the conventional experience of space; as an extension of our bodies, it lets us have new experiences of space, leading us to a new ontological state of Being. With the emergence of smartphones and tablets, the concept of ubiquitous computing has the potential to create ubiquitous cybertectonic experiences, and we believe that further work in the architectural domain is essential to understand and interpret the cybertectonics of space.

References and Notes: 

L. Manovich, “The Poetics of Augmented Space: Learning from Prada,” Visual Communication 5, no. 2 (2006): 219–240.
M. McLuhan, Understanding Media: The Extensions of Man (Cambridge, Mass., 1995).
M. Merleau-Ponty, Phenomenology of Perception (Motilal Banarsidass Publisher, 1996).
M. Weiser, “The Computer for the 21st Century,” Scientific American 265, no. 3 (1991): 94–104.
M. Heidegger, Being and Time (Wiley-Blackwell, 1978).
P. Milgram et al., “Augmented Reality: A Class of Displays on the Reality-Virtuality Continuum,” in Proceedings of Telemanipulator and Telepresence Technologies, vol. 2351 (1994): 282–292.
P. Anders, “Designing Mixed Reality: Perception, Projects and Practice,” Technoetic Arts: A Journal of Speculative Research 6, no. 1 (2008): 19–29.
R. T. Azuma, “A Survey of Augmented Reality,” Presence: Teleoperators and Virtual Environments 6, no. 4 (1997): 355–385.


Embodied Schemas for Cross-modal Mapping in the Design of Gestural Controllers

A conceptual framework for the design of intuitive gestural controllers for timbre manipulation from an Embodied Cognition perspective is proposed. This framework incorporates corporeal mimesis, affective/kinaesthetic dynamics, image schematic and conceptual metaphoric organization in order to speculate on possibilities in the design of mapping schemes between interface and synthesis algorithm.



In contrast to traditional musical instruments, digital interfaces introduce an “arbitrary” component by encoding physical gesture as an electronic signal. This arbitrary component interrupts the natural, ‘felt’ sense of the interaction between performer and instrument and necessitates the introduction of control metaphors and mapping schemes that are not necessarily readily intelligible for non-technical users. 

This paper focuses on the design of interfaces for the control of synthesized timbre. Several mapping strategies for control of synthesis have been proposed. However, it has been noted that empirical investigation of the “natural tendencies for multimodal mapping” is required in order to elaborate a generalizable set of cross-modal mappings. [1] We propose that the design of transparent, intuitive interfaces for timbre manipulation can be grounded in empirical analysis and subsequent artistic practice-based research focussed on cross-modal correlations between the user's embodied experience and a timbral perceptual domain. Gestural control is suggested as an appropriate paradigm in this context because it offers the possibility of affording intuitive control of sound via an intermediate ‘embodied’ mapping scheme based on ‘embodied’ characterizations of timbre and gesture.

Embodied Cognition offers a promising conceptual basis for this research. We suggest that structures common to cognition, multimodal perception and physical gesture can be identified. These correlations may then form the basis of a generalizable set of cross-modal mappings to be incorporated in the design of a gestural controller for timbral manipulation. This paper is a speculative outline of some possibilities for the application of embodied cognition in the design of gestural controllers for timbre manipulation. 

Embodied Cognition


Embodied Cognition is an emerging research strand in a number of disciplines that posits embodiment as profoundly constitutive of cognition. In general, Embodied Cognition analyses support the view of cognitive processes as creative, figurative acts structured by the teleological interactions of embodied agents with their environments and cultures, rather than as largely passive processes of representation and calculation. [2] A central claim of Embodied Cognition is that cognition and action are inextricably linked in lived experience. [3] Significantly, studies suggest that the specific structures of an organism's perceptual apparatus contribute to a structuring of all cognitive processes, not just those specifically recruited in perception.

Embodied Cognition analyses reveal a number of common primal structures and processes in cognition. These are learned via the common human experience of having a physical body and interacting in physical and cultural environments. Affective/kinaesthetic dynamics, Sensorimotor Mimesis, Image Schemata and Conceptual Blending have been identified for empirical analysis in this study. Essentially, the research question here is, for example, are Image Schemata activated by judgments of timbral quality and by gesture production and recognition? Can such schema be reliably demonstrated empirically? If so, what correlations exist, if any, between schema activated by timbre and those activated by gesture?

Image Schemata

Image schemata are basic features in cognitive processing acquired in early childhood development. They are based on basic dynamic embodied patterns and they give structure to more abstract conceptual processes. They are non-propositional, pre-conceptual structures that have a role in organising mental representations into coherent, meaningful units. They have a basic logic and are implicated in the formation of new concepts. Significantly for this study, they are focussed on perceptual interactions, and bodily and linguistic experience in social and historical contexts and they are inherently meaningful because of this embodied grounding. They are neurally encoded. [4]

The following are three simple examples of Image Schemata. The Link Schema is a simple physical or metaphorical structure whereby one object is linked to, and constrained by, another according to a basic symmetrical logic. Examples are the concept of a causal “connection”, or the physical act of connecting a laptop to its power source. The Container Schema has an Interior-Boundary-Exterior structure. It is based on the bodily experience of being a container and of being contained by something. The basic logic is that of the mutual exclusivity of interior and exterior and the necessity of being in one state or the other. The Source-Path-Goal Schema is based on bodily experiences such as throwing a ball. Other examples of Image Schemata as suggested by Johnson [5] are part-whole, centre-periphery, link, cycle, interaction, contact, pushing, balance, near-far, front-back and above-below.

An analysis of concepts of physical relation in the sentence ‘The book is on the desk’ [3] illustrates the general form in which Image Schemata are implicated in the construction of meaning in linguistic expressions. Meaning here emerges by virtue of the fact that the Image Schemata activated by the linguistic expression are derived from a composite of neurally encoded embodied experiences such as being above, being in contact with, supporting, and so on. These experiences activate specific schemata which give meaning to the word ‘on’ in this expression. The experience of being ‘above’ yields an orientational schema; ‘contact’ yields a topological schema; and ‘support’, a force-dynamic schema. An important aspect of this process is that meaning does not arise through some kind of correlation between concept and objective 'fact', or by correlation of symbol and referent, but through a composite of activated schemata.

It is significant that several of these schemata have a structural identity with gestures or gestural components (e.g. source-path-goal schema, centre-periphery, above-below, pushing etc.). Johnson also presents analysis of image schemata across visual, kinesthetic and auditory modalities. [5] Image Schemata, therefore, may offer a means of organizing gestural analysis in terms of the embodied structures of perception in other modalities.

Conceptual Blending

Lakoff and Johnson [6] argue that there are no literal definitions, only trans-domain mappings via metaphor. New concepts or categories are formed via the conceptual blending or mapping of a set of schemata from one domain to a target domain. For example, conceptual blending based on the Source-Path-Goal Schema yields complex causal patterns, such as that of ice changing state as it melts. Metaphorical conceptualization is, therefore, significantly constitutive of all thought. It is systematic, fundamental to language and thought and embodied. Such conceptual blends are implicated in cross-modal correlations such as judgments of timbre as being, for example, rounded, harsh, warm, heavy etc. It is important to note that gesture is fully characterizable within the framework of conceptual metaphoric mappings across modalities. [7]

Sensorimotor Mimesis

Sensorimotor Mimesis is any of several means by which humans imitate, consciously or unconsciously, covertly or overtly, an environmental or social stimulus. It is a crucial process in the psychological development of infants. It is often co-activated with other behaviors such as speaking, listening to music and working out logical problems. Mimesis is particularly significant because it appears to be implicated in cognitive tasks such as understanding and in affective responses to stimuli. In particular, vocal mimesis seems to be an important part of the music listening experience. People have a tendency to move in sympathy with music and, it appears, this behavior is fundamentally constitutive of our understanding of, and affective response to, music.

Affective/Kinaesthetic Dynamics

There is a formal congruency between motion and emotion. The felt quality of emotion is grounded in demonstrable dynamic patterns of physical expression. This is a reciprocal dynamic: emotions are shaped by motor attitudes, just as physical movement expresses emotion. Empirical studies by Bull [8] demonstrate the inability of subjects to experience particular emotions while adopting postural attitudes considered antithetical. Emotional expression is a kinetic phenomenon that has spatiality, temporality, intensity, and manner of execution. Phenomenological bracketing has been used to elucidate the dynamic structure underlying these forms. Tensional quality is mediated by the felt effort of a postural attitude. Linear quality is the directional contour of movement. Amplitudinal quality is the extensiveness or constrictiveness of a posture. Projectional quality is the manner in which energy is released.

Sensorimotor Mimesis, considered along with Affective/Kinaesthetic Dynamics, gives us a theoretical framework within which to analyse corporeal responses to stimuli. Mimetic responses to music are, seemingly, of a piece with our affective, aesthetic response. Affective/kinaesthetic dynamics offer a structure whereby affective response can be assessed.

Hypothesis: Embodied Schemas for Gestural Manipulation of Timbre 

In light of the above, we suggest that responses to timbre stimuli may be characterized as mediated by corporeal mimetic engagement according to affective/kinaesthetic dynamic structures. Timbres ‘feel’ a certain way due to this corporeal engagement. We further suggest that judgments of difference or similarity between timbres are mediated by an Image Schematic/Metaphoric structuring of differing corporeal attitudes. We suggest that such mapping processes mediate meaningful ‘navigation’ of timbre space via physical gesture. This view is supported by the observation that natural linguistic descriptions of timbre tend to emphasize embodied, cross-modal mappings through the use of multi-modal, embodied descriptors.

We suggest that empirical analysis may show that identical Image Schemata are activated by motor, visual and auditory stimulation. Substantial trends in cross-modal association have already been demonstrated. [9] We speculate that common cognitive structures may be shown to account for these correlations and for the tendency to describe sound in terms of weight, force, speed, intensity, emotion, spatial position and orientation, containedness, gravity, density, amplitude, colour, order, chaos etc. Essentially, we maintain that identical Image Schemata structure perception across modalities, thus allowing for perceptions of close correlation between, for example, certain gestures or shapes and certain timbres.

Such correlations would present a grounded basis for mapping schemes in gestural interfaces for sound synthesis. We propose that an empirical study of gesture performance and timbre perception that attempts to find the basis for cross-modal mappings between each is the first step in the design of these mappings for synthesis interfaces.

Design Approach

We propose empirical analysis and subsequent artistic practice-based research in order to elucidate cross-modal patterns. Study 1 will take the form of a preliminary subjective study designed to assess Image Schema activation in listeners in response to audio presentations of synthesized timbres, and also to assess the methodology with a view to later tests. The intent here is to assess the viability of a methodology which assumes that the perception of timbre difference is structured according to Image Schematic principles. Some of the following schemata will form the test set: Path, Source-Path-Goal, Center-Periphery, Compulsion, Attraction, Link, Scale, Equilibrium, Full-Empty, Near-Far, Mass-Count, Iteration, Above-Below, Vertical Orientation, Length (extended trajector), Rough-Smooth/Bumpy-Smooth.

Subjects will be presented with a set of synthesized timbres. These will be designed to include examples of pitched tones, percussive tones, instrument models and novel timbres. Each timbre example will be presented in pairs, once in an unmodified fashion then with one modification. Timbres will be modified in terms of some of the well-established factors known to be involved in timbre e.g. spectral envelope, harmonicity, spectral centroid, attack and decay envelopes.

A number of methods for assessing Image Schema activation are currently being considered. Subjects may be asked to supply linguistic descriptions of the presentations. The descriptions will be constrained to a limited set of options or to a defined metric and chosen in an onscreen interface. Subjects may also be asked to select the most appropriate graphical representation of an Image Schematic structure that describes the relationship between pairs of presented timbres. Synthesis will be implemented in Max/MSP, as will the user interface and response data collection system. Data will be subjected to statistical analysis in order to identify trends, if any, in subject responses.
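The trend-identification step could look something like the following sketch. The data format and schema labels are invented for illustration (the actual collection system is to be built in Max/MSP); the sketch simply tallies which schema subjects selected for each timbre pair and reports the modal choice with its agreement rate.

```python
# Hypothetical analysis sketch: tally schema selections per timbre pair.
from collections import Counter

# (timbre_pair_id, selected_schema) tuples, as the response-collection
# interface might log them. Values are invented for illustration.
responses = [
    (1, "Source-Path-Goal"), (1, "Source-Path-Goal"), (1, "Scale"),
    (2, "Rough-Smooth"), (2, "Rough-Smooth"), (2, "Rough-Smooth"),
]

def modal_schema(responses, pair_id):
    """Return the most frequently chosen schema for a pair,
    together with the fraction of subjects who agreed on it."""
    counts = Counter(s for p, s in responses if p == pair_id)
    schema, n = counts.most_common(1)[0]
    return schema, n / sum(counts.values())

print(modal_schema(responses, 1))  # modal schema for pair 1, ~0.67 agreement
print(modal_schema(responses, 2))  # unanimous choice for pair 2
```

A high agreement rate across subjects for a pair would be the kind of trend that supports the hypothesis of shared schematic structuring; a flat distribution would count against it.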

Study 2 will incorporate gesture capture via inertial sensors and a pair of sensor-enabled gloves. These sensors will enable tracking of the relative position of the arms and the amount of contraction/expansion in the arms. Subjects will be asked to adopt whatever gesture or gestures they feel are appropriate responses to the timbre stimuli presented in the pilot study. Subjects will also be asked to perform gestures in response to linguistic and pictorial presentations of Image Schemata as detailed above. This test will yield some measure of the corporeal mimetic response to timbre presentations in that it will provide data on the Linear, Amplitudinal and Projectional qualities of the affective/kinaesthetic response. It will also establish whether it is plausible to assess schema activation in gesture production.

Study 3 will be a test of prototype mappings. If Studies 1 and 2 yield clear data, then there may be correlations between kinaesthetic and schematic activations in the case of both timbre and gesture that can form the basis of the design of systems in a phase of artistic practice-based tests focussing on art installation and music performance contexts.


Embodied Cognition is an area of cognitive science that focusses on perception, cognition and action as profoundly shaped by the human experience of having a body and living in a physical environment and in a culture. It suggests an alternative approach that complements current initiatives in the design of interactive technologies and gestural controllers. Embodied Cognition offers a novel way to analyse the complex interactions between user and technology in terms of the fundamental categories of embodied existence. These categories are involved in organizing perceptual phenomena into coherent, meaningful units and are implicated in the formation of new concepts. In turn, the design of meaningful controllers depends upon empirical knowledge of the fundamental categories whereby human beings interact with and understand their environment. In using these structures as a conceptual basis, it may be possible to identify correlations between gesture and phenomena, such as timbre, in other modalities. Such correlations are the necessary building blocks of a grounded cross-modal mapping schema on which to base the design of controllers that allow for meaningful gestural interaction with sound.

References and Notes: 
  1. P. Maes, M. Leman, M. Lesaffre, M. Demey and D. Moelants, “From expressive gesture to sound: The development of an embodied mapping trajectory inside a musical interface,” Journal on Multimodal User Interfaces 3, nos. 1–2 (2010): 67–78.
  2. M. Imaz and D. Benyon, Designing with blends (Cambridge, MA: The MIT Press, 2007).
  3. G. Lakoff and R. Núñez, Where Mathematics Comes From: How the Embodied Mind Brings Mathematics into Being (New York: Basic Books, 2000).
  4. T. Rohrer, “Image Schemata in the Brain,” in From Perception to Meaning: Image Schemas in Cognitive Linguistics, ed. B. Hampe, 165–196 (Berlin: Mouton de Gruyter, 2006).
  5. M. Johnson, The Body in the Mind: the bodily basis of meaning, imagination, and reason (Chicago, IL: Chicago University Press, 1987)
  6. G. Lakoff and M. Johnson, Philosophy In The Flesh: The Embodied Mind and its Challenge to Western Thought (New York: Basic Books, 1999).
  7. R. Núñez, “A Fresh Look at the Foundations of Mathematics: Gesture and the Psychological Reality of Conceptual Metaphor,” in Gesture and Metaphor, eds. A. Cienki & C. Müller, 93–114 (Amsterdam: John Benjamins, 2008). 
  8. N. Bull, The attitude theory of emotion (New York: Nervous and mental disease monographs, no. 81, 1951).
  9. J. Simner, C. Cuskley and S. Kirby, “What sound does that taste? Cross-modal mappings across gustation and audition,” Perception 39 (2010): 553–569.

intraInter socialite: Emoticon Jacket for Social Interaction

intraInter socialite is a wearable computing experience that investigates the loss of intonation and body language that occurs at the intersection of computers and textual communication. Emoticons, an attempt to express emotional nuance in the virtual realm, are used to subtitle interaction that occurs in the physical realm. This is achieved through force sensors that display emoticons on a jacket with an LCD screen.


Emoticons constitute a variable language meant to convey emotions and physical and mental states in a textual context. An emoticon consists of various punctuation marks and letters from the Latin alphabet, combined to create perpendicularly oriented graphics. Although it creates a graphical language that can be universally understood, it still allows for interpretation by users and virtual services. Some services (IM and email clients) convert the text to the graphic it is perceived to represent. These graphics are an inconsistent interpretation and vary according to the client.

Two or more users in the same conversation can use different messaging clients and get different graphics for the same text. Users can also include more or fewer characters than the service recognizes, which will affect whether the client converts the text to a graphical representation. For example, :) would in most cases be perceived graphically as a smile, connoting a happy emotion; however, :-) may be the text the client recognizes to produce a graphic. Textual representations and their varied graphical outcomes can change the interpretation of the message and the emotion. Some users may feel that :-) is a more effective smile, whereas others would argue that :) or the graphics produced are more effective.
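The client-dependent conversion just described can be sketched as two hypothetical lookup tables. The table contents and the bracketed graphic labels are invented; the point is only that the same text yields a graphic in one client and passes through as plain punctuation in another.

```python
# Two hypothetical emoticon tables: CLIENT_B does not recognise ":)".
CLIENT_A = {":)": "[smile]", ":-)": "[smile]", ":(": "[frown]"}
CLIENT_B = {":-)": "[grin]"}

def render(text, table):
    # Replace each recognised emoticon token with its graphic;
    # unrecognised tokens pass through as plain text.
    return " ".join(table.get(tok, tok) for tok in text.split())

msg = "nice to see you :)"
print(render(msg, CLIENT_A))  # nice to see you [smile]
print(render(msg, CLIENT_B))  # nice to see you :)
```

The same message thus carries a rendered emotional cue for one participant and bare punctuation for the other, which is exactly where the interpretive drift begins.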

The inconsistencies in the representation of emotions can lead to confusion. Another way in which emoticons can lead to confusion is when an emoticon is not supported by a client or is not common in a user’s repertoire. When an emoticon emerges, the service often lags in converting its use to an agreed-upon graphical interpretation. Additionally, many different emoticons are being created to fill users’ need for emotional expression. If both users are not proficient in a particular emoticon’s connotation, this may lead to an emotional or contextual disconnect in the conversation.

Just as a person’s physical cues can be misinterpreted by those interacting, emoticons and their inconsistencies can lead to misinterpretation and confusion. They can also contribute to the lexical direction and enhance a conversation. [1] Emoticons provide non-verbal indicators of emotional cues that can be lost in text-based interaction, but they can also reinforce physical indicators if introduced into a face-to-face social exchange. When introduced into any social situation, virtual or physical, emoticons can be used to reinforce or subvert the verbal/textual message. They can change the message’s intent/content in as few as two keystrokes. [2] In the virtual realm, emoticons often subtitle text and are often treated as a way to interpret the tone of the message. When bodily or verbal intonation implies one message and an emoticon is introduced that implies another, or when both physical and textual cues are given, which is to be used as the interpretation? Do physical indications, or textual cues that are deliberately displayed, reveal the desired intonation?

intraInter socialite is an emoticon jacket with an LCD screen (Fig. 1). The focus of the jacket is to create subtexts for interpersonal human interaction: the wearer uses force sensors to create computer-generated textual subtitles for physical interaction. My investigation into wearable computing with this project is an inquiry into the loss of intonation and body language that occurs at the intersection of computers and textual communication, as is evident in today's instantaneous, technology-centric communication culture.

This project explores emotional content and expression in multiple ways. The jacket:

  • acts as a non-verbal, non-corporeal intermediary to a bodily and potentially verbal social interaction to create another plane of emotional meaning.
  • potentially contradicts or detracts from that which is physically and verbally expressed.
  • expresses, reinforces and clarifies that which is physically and verbally expressed.
  • is a physical computing experience of a virtual communicative convention.
  • expresses the development of an emotional and graphical mode of expression utilizing textual punctuation.
  • explores whether a barrier or channel is created for emotional content through technology and physical computing.
The piece, intraInter socialite, raises several questions about the interaction between the user and the jacket, and about the social experience of wearing it. Does the expression of content through electronic means become a prosthesis and/or hindrance for inter-human interaction? If it is a prosthesis, can an emotional-language intermediary assist those with autism or empathic disorders? Is meaning lost, and are its prosthetic capabilities diminished, when precision is taken away? Emotional content is critical for daily communication and message interpretation. [3] How does the role/character of the interactor serve as a truth-teller, and how does it help the user reinforce false emotional reactions? [4] In this case, the natural method of conveying emotion through applied pressure leads to an imprecise emotional connection and brings the emotion being conveyed into consciousness for both parties.

In this application, the effort put into replacing the nuances of personal communication with punctuation and textual cues in the virtual realm creates a subtitle for the conversation and interaction that occurs in the physical realm. It creates a range of implied emotion from the wearer. This also introduces an imprecise control over the emoticon displayed and the perception of the emoticon in the context of the interaction. The user has the ability to change the experience of the conversation when they attempt to control the level of emoticon displayed.
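The imprecise pressure-to-emoticon mapping might be sketched as a thresholded lookup. The sensor range, thresholds and emoticon choices below are all invented for illustration, not the jacket's actual calibration:

```python
# Hypothetical mapping from a force-sensor reading (0-1023, as from a
# typical 10-bit analog-to-digital converter) to a displayed emoticon.
# Thresholds are assumptions; the jacket's real firmware is not documented here.
LEVELS = [
    (200, ":|"),   # light touch: neutral face
    (500, ":)"),   # moderate pressure: smile
    (800, ":D"),   # firm pressure: grin
]

def emoticon_for(reading: int) -> str:
    """Return the emoticon for the first threshold the reading falls under."""
    for threshold, face in LEVELS:
        if reading < threshold:
            return face
    return "XD"  # maximum pressure
```

Because small differences in applied force cross a threshold and change the displayed face, the wearer's control over the subtitle is approximate, which mirrors the imprecision described above.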

The techno-centric geek and the socially inept can express themselves more freely and create a powerful online or electronic identity through, and behind, the computer screen. Here the electronic veil is lifted through a forced vocabulary and a forced interaction in the human world. The wearer has only the jacket to hide behind: no computer screen and no alternate identity, pictures or avatar. With only a jacket in the middle of the interaction, the focus of the conversation may shift dramatically to the screen and what is being displayed. How does this create a “veil” even though the interactors can see each other in real-time physical space? Experimentally, adding the language of the virtual world to a physical interaction may lead the users to focus on the jacket instead of the interaction. [5]

Emoticons are an important, non-standardized aspect of communication in virtual space that helps convey emotion and additional meaning. They were created to fill an emotional void left by short, text-based communication. Through emoticons, textual punctuation has become its own graphically and internationally interpreted language. intraInter socialite attempts to study the effects of adding the emoticon to physical computing and interaction. It also calls attention to the use of textual elements to add graphical emotional elements to a virtual interaction. It allows the user to subtitle the interaction and create additional emotional content for it, whether true or false.

References and Notes: 
  1. Daantje Derks, Arjan E. R. Bos and Jasper von Grumbkow, “Emoticons in Computer-Mediated Communication: Social Motives and Social Context,” in CyberPsychology and Behavior 11, no. 1 (2008): 99-101.
  2. Alecia Wolf, “Emotional Expression Online: Gender Differences in Emoticon Use,” in CyberPsychology and Behavior 3, no. 5 (2000): 827-833.
  3. Shao-Kang Lo, “Nonverbal Communication Functions of Emoticons in Computer-Mediated Communication,” in CyberPsychology and Behavior 11, no. 5 (2008): 595-597.
  4. Ibid.
  5. Joan Gajadhar and John Green, “The Importance of Nonverbal Elements in Online Chat,” in Educause Quarterly 28, no. 4 (2005), (accessed May 3, 2011).

Touch Interfaces – Between Hyperrealism and Invisibility

In this paper, current trends in mainstream multitouch interface design are analysed. Based on a critical review of Apple's iOS Human Interface Guidelines and on experience from teaching several multitouch design seminars, recommendations for design practice and some forward-looking statements concerning hyperrealistic interface metaphors are derived.



Multitouch technology has existed for several years now. [1] While large multitouch tables have mostly been found in public places such as exhibitions, small-screen devices like multitouch smartphones have become an everyday phenomenon. In both cases the context of use differs from that of a desktop computer. Multitouch table systems are often designed for specific content, an individual location and a fixed context of use. In contrast, smartphone applications, being mobile, must work in any context. With the emergence of medium-sized multitouch devices like the iPad, more and more digital products known from a work-related desktop context are being redesigned for multitouch use. But just as the invention of the computer mouse was a prerequisite for and an activator of graphical user interfaces and new software genres, [2] multitouch interaction is a prerequisite for and an activator of novel interfaces and of media formats and applications specific and typical to medium-sized multitouch devices.

Today two trends in multitouch interface design are already apparent: photorealistic real-life metaphors such as wooden bookshelves on the one hand, and direct, touch-based interaction with content such as maps, without any visible buttons or handles (or even purely sensor-based interaction), on the other.

Realism and Learnability

The real-world metaphor approach already has a tradition in human-computer interaction. The very first graphical user interfaces of the late 1970s were based on a visible real-world metaphor. But due to technical limitations the visual style and the iconography of the desktop interface were quite abstract: black-and-white pixels only, in low resolution. This relatively high level of abstraction helped users forget the original meaning of these metaphors once the learning phase was over and the meaning of interface elements had been internalised. When seeing "menus" in a software application today, no one thinks of a restaurant's list of dishes. The idea of a restaurant menu was helpful in the early years of the GUI, but today's computer users would be distracted or even confused by a photorealistic imitation of a restaurant menu card with "cut", "copy" and "paste" listed on it.

A second wave of more realistic real-world metaphors hit the interface design discipline in the early 1990s when "interactive multimedia" became popular. Abstract and text-based interface elements like menus, buttons and drop-down lists were replaced by the display of everyday objects in everyday environments. These were again real-life metaphors, now with a higher level of detail and greater similarity between the real-world objects and their visual representations. In spite of significant usability problems, the naive realism of these interfaces was a success in so-called "edutainment" CD-ROM applications. Attempts to transfer this approach to standard software at the time (like Microsoft Bob) failed completely. [3]

Hyperrealism in Multitouch Interfaces

Today, applications on an Apple iPad again look like real objects. E-books look like real books; software calendar apps mimic paper sheets, leather covers and even chrome-plated spiral binding. Compared to 1990s multimedia, the level of photorealism and the aesthetic quality are obviously superior, but the concept is the very same. And the theory behind real-world metaphors is still the same: they should help the user understand and learn how to use virtual artefacts by transferring knowledge from real-world interaction to the computer world. In their Human Interface Guidelines for iPhone and iPad, Apple therefore recommend the use of real-world metaphors as standard practice: "When virtual objects and actions in an application are metaphors for objects and actions in the real world, users quickly grasp how to use the app." [4] Addressing possible limitations of such an approach, Apple worry only about shortcomings of the real-world original's functionality: "The most appropriate metaphors suggest a usage or experience without enforcing the limitations of the real-world object or action on which they’re based. For example, people can fill software folders with much more content than would fit in a physical folder." [4] This is a rather one-sided view, focussing only on the limitations of the real-world object and disregarding the limitations of the virtual object. When we see a book in the real world, we know exactly what we can do with it, how to handle and navigate it, and also what we cannot do with it. With a photorealistic representation of a book on a screen, this is different. Of course the resemblance to a book gives the user some clues as to how the interface might be operated, but the user can only interact in ways that are anticipated and implemented by the creator of the software. Probably it is possible to "flip pages".
But there are several ways to flip a real book's pages: where do you have to touch the page, and what kind of movement is expected? Can users write annotations? How? Can pages be marked with dog-ears, and if not, why not? Is it possible to rip out pages?

Based on everyday experience we know how we can interact with our environment and what we can do with the objects surrounding us. Because of this everyday experience we are even able to anticipate possible uses of, and interactions with, artefacts we have never seen or touched before, just by looking at them. [5] We immediately know how our body relates to an object, for instance whether we can sit on it or where we can put our fingers. And we successfully anticipate the possible handling and mechanical constraints of objects. Well-designed artefacts stimulate these expectations through indices (visual cues communicating their handling) and thereby make a product self-explanatory and easy to use. "Which parts move, which are fixed? Where should the object be grasped, what part is to be manipulated? […] What kind of movement is possible: pushing, pulling, turning, rotating, touching, stroking?" [6] Needless to say, the induced expectations should be met in the end: elements that look moveable should be moveable in the expected way.

Can Interfaces Be Natural?

Most of this kind of everyday knowledge today is not "natural" but deeply rooted in a technology-driven culture. Interaction with light switches, bicycles or books may feel natural to us, but it is artificial. In any case there is not much difference between figuring out how to climb a tree (natural) and how to use a knife (artificial). Both are based on experience, which implies that they have to be learned in the first place, whether natural or artificial.

The same is true for virtual interfaces. We make assumptions about how they can be operated and controlled based on experience. Today this is primarily experience with other virtual interfaces, and only secondarily knowledge acquired while interacting with physical everyday objects. When test users of a gestural interface were asked what kind of gesture they would expect for accessing a selected item, the majority proposed pointing at it twice: a double tap in the air. [7] This is clearly not a natural gesture, but it has been internalised through years of performing double clicks in standard desktop interfaces. With more and more people growing up with digital media, the distinction between knowledge from the analogue world and knowledge from the digital domain seems antiquated and obsolete. For so-called "digital natives" a double click is more familiar, and feels more natural, than cracking a nut or peeling an orange.

When everyday objects are used as interface metaphors, some interaction techniques will be anticipated and expected, but the intersecting set of possible interactions shared by real and virtual artefacts is actually rather small and is determined entirely by the software design. So there are two gulfs to bridge in order to use such an interface effectively. One is the difference between what the real object allows or affords and what the virtual one does not. The second is the difference between what the virtual interface allows or affords and what the real thing does not (see figure 1).

Invisibility and Intuition

Actually the problem is not the difference between the two sets of interaction possibilities, but the lack of knowledge about it. In interfaces with a hyperrealistic reproduction of everyday objects, this lack of knowledge is mainly caused by a lack of visibility: the interface lacks visual cues indicating what is operable and what is not.
Despite this conflict between real-life metaphors and visibility, Apple also recommend paying attention to readily identifiable interactive elements: "Controls should look tappable. iOS controls, such as buttons, pickers, and sliders, have contours and gradients that invite touches." [4] Even a superficial analysis of iOS applications shows that this works fine in abstract interfaces, where clickable elements are clearly discernible by visibility and by convention. But real-life metaphors often lead to inconsistencies. The shape and materiality of virtual objects "invite touches" where touching has no effect; conversely, clickable and movable objects cannot be identified by eye: paper pages do not look scrollable, telephone numbers do not look clickable.
For decades, mobile device interaction lagged behind desktop software, mainly due to hardware limitations. Since the introduction of the iPhone in 2007 it has been the other way round: the interaction techniques of mobile devices drive innovation in standard desktop interaction. Apple continue to bring multitouch gestures, which were developed for mobile touchscreens, to classic input devices like the trackpad and the "Magic Mouse", a mouse with a multitouch area on its upper surface. In the tradition of the "direct manipulation" interaction paradigm, this is said to make interaction more intuitive: "New Multi-Touch gestures […] let you interact directly with content on the screen for a more intuitive way to use your Mac." [8] Several different definitions of intuition exist in philosophy and psychology. It is probably easier to agree on what intuition is not: it is not a discursive or conscious process of reasoning. It is rather a way of judging and making decisions without analytical reflection, based mainly on tacit knowledge. Tacit knowledge is indeed unconscious. But like knowledge in general it is based on experience, which means it has to be learned. For instance, there is no "natural" way of interacting with a map, because using maps is itself a cultural technique. Once we have learned how to work with real maps, this knowledge can help us work with digital maps as well. Touching and moving maps around does work intuitively, but Apple offer more: "New gestures include momentum scrolling, tapping or pinching your fingers to zoom in on a web page or image, and swiping left or right to turn a page or switch between full screen apps." [8]
The popular two-finger "pinch" gesture for zooming maps, images and websites is not intuitive at all: the interface shows no sign that would indicate "pinchability", nor does the idea of a real map or photograph suggest "zoomability". Again the problem is that the virtual artefact does not actively communicate what kinds of interaction are possible beyond our tacit knowledge of the real world. The pinch gesture is successful not because it is intuitive (it simply isn't), but because it is easy to learn and easy to remember. It is not even based on a real-life metaphor: in the physical world it is hard to find any example of an object being scaled by simply moving two fingers. Still, it is learned and remembered easily because of the simple analogy it is based on: the change in distance between the two fingertips is proportional to the change in size of the touched object. Accompanied by direct visual feedback, the logic of this interaction method is understood immediately.
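The analogy behind the pinch gesture can be written down directly: the zoom factor is the ratio of the fingertip distance at the end of the gesture to the distance at the start. A minimal sketch, with hypothetical touch points given as pixel coordinates:

```python
import math

def pinch_scale(p1_start, p2_start, p1_end, p2_end):
    """Zoom factor implied by a pinch gesture: the ratio of the distance
    between the two fingertips at the end to their distance at the start."""
    d_start = math.dist(p1_start, p2_start)
    d_end = math.dist(p1_end, p2_end)
    return d_end / d_start

# Fingers spread from 100 px apart to 200 px apart: content doubles in size.
print(pinch_scale((0, 0), (100, 0), (0, 0), (200, 0)))  # 2.0
```

A real touch framework would apply this factor continuously, frame by frame, which is what produces the direct visual feedback the text describes.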

But without knowing that one can "pinch" a map or a photograph, hardly anyone would try. This simple fact does not attract much attention because seeing the gesture just once, in one of Apple's TV commercials for the iPhone or the iPad, suffices for it to be understood and remembered. This leads to the conclusion that interaction does not need to be intuitive, but has to be learnable.


Merely copying reality does not necessarily lead to understandable interfaces. When using real-life metaphors, designers have to be very conscious of the interaction disparities between real and virtual objects.
Much more important than intuition is a good balance of learnability and effectiveness. What counts as a good balance depends strongly on the type of user and the context. Especially in professional software, intuitive use and learnability need not be the top priority; in the long run, ease of use and effectiveness are crucial. For software that is used on a daily basis for years, some learning effort for the sake of effectiveness is worthwhile.

The terms "simple", "easy" and "intuitive" seem to work perfectly as marketing phrases; as general and universal goals in interaction design, they should be rejected. Easy-to-use artefacts often do not have much potential or power. Just compare a violin and a triangle (the percussion instrument) in terms of their learnability and their potential: probably not everything in life should be about ease.

References and Notes: 
  1. Bill Buxton, "Multi-Touch Systems that I Have Known and Loved," March 2011, (accessed June 26, 2011).
  2. Bill Moggridge, Designing Interactions (Cambridge, MA: The MIT Press, 2007), 27-29.
  3. Knight-Ridder/Tribune, "Microsoft Bob Still Lives, At Least In Certain Spirit," Chicago Tribune, February 2, 1997 (accessed June 26, 2011).
  4. Apple Computer Inc., ed., iOS Human Interface Guidelines (San Francisco, CA: Apple Computer Inc., 2011).
  5. James Jerome Gibson, "The Theory of Affordances," in Perceiving, Acting, and Knowing: Toward an Ecological Psychology, ed. R. Shaw and J. Bransford, 67-82 (Mahwah, NJ: Lawrence Erlbaum, 1977). 
  6. Donald Norman, The Psychology of Everyday Things (Jackson, TN: Basic Books, 1988), 99, 197.
  7. Dema El Masri, "New Dimensions for compArt" (master's thesis, Bremen University, 2011).
  8. Apple Computer Inc., "Mac OS X Lion With 250 New Features Available in July From Mac App Store," Apple's official Web Site, June 6, 2011, (accessed June 26, 2011).