LeanMES: Human-Machine Interaction Review


LeanMES: Human-Machine Interaction Review
Theory and Technologies

Eeva Järvenpää & Changizi Alireza, Tampere University of Technology
Date:

Document Information

Document number: T3.3.
Document title: Human-Machine Interaction Theory and Technologies
Delivery date: M22
Main Author(s): Eeva Järvenpää & Changizi Alireza, Tampere University of Technology
Participants: Minna Lanz & Ville Toivonen, Tampere University of Technology
Main Task: T3.3: Novel Human-Machine Interaction
Task Leader: Harri Nieminen, Fastems
Publicity level: PU = Public
Version: V1.2

Revision History
- V, Eeva Järvenpää & Changizi Alireza, TUT: Version for LeanMES consortium commenting
- V, Eeva Järvenpää, TUT: FINAL

Executive Summary

The transformation towards digital manufacturing is under way. Manufacturing IT systems allow real-time data to be collected from the factory floor and displayed to those who need it, when they need it. However, the human factor plays an important role in manufacturing, as the involvement of humans introduces uncertainty into the process. Therefore, specific attention should be paid to human-friendly user interfaces, to improve productivity and the reliability of data, and to make workplaces more attractive to future generations.

The purpose of this report is twofold. First, it gives an introduction to the human aspects that affect the design of technical systems and especially of their user interfaces (UIs) (Section 2), and provides guidelines for user-centric, human-friendly interface design (Section 3). This theoretical part of the report is not targeted at any specific user interface; it is general and can be applied to any type of UI, for example when designing the UIs of Manufacturing Operations Management Systems (MOMS). Secondly, the report reviews different existing and emerging human-machine interaction technologies and gives examples of their applications in industrial contexts (Section 4). The categories of discussed technologies include:

1) Direct and indirect input devices, which are used to transfer user commands to the machine;
2) Mobile interfaces and remote sensors, such as tablets, smart phones, smart watches, and sensors used to collect data from user activities;
3) Virtual and augmented reality, which refers to mixing the virtual and real worlds together;
4) Gesture and speech control, which are used to control the system by body motions and voice commands.

From the human perspective, whether a system can be described as usable or not depends on four factors, namely anthropometrics, behavior, cognition and social factors. Anthropometrics refers to the physical characteristics, such as body type and size, of the intended users. Behavior refers to the perceptual and motivational characteristics of users, looking at what people can perceive and why they do what they do. Behavioral characteristics mostly relate to sensation with the basic senses (sight, hearing, touch, smell and taste) and to the interpretation of the sensed stimuli. Cognitive factors include learning, attention, memory and other aspects of cognition that influence how users think, what they know and what knowledge they can acquire. Social factors consider how groups of users behave, and how to support them through design. (Ritter et al. 2014)

The usability of a user interface always depends on three aspects: 1) the specific user and his or her characteristics; 2) the task that is being done with the designed HMI; and 3) the context and environment of use of the designed interface. Therefore, no universal rules for user-centric design can be given. However, several authors have given guidelines and heuristic principles for designing user interfaces with good usability. The most relevant guidelines, collected from Nielsen (1995), Ritter et al. (2014) and Hedge (2003), are listed below:

- Usage of terms and language: The system should speak the user's language and use words the users already know and which are relevant to their context. The interface should exhibit consistency and standards so that the same terms always mean the same thing. Consistent use of words strengthens the chances of later successfully retrieving these words from memory.

- Use recognition rather than recall: Systems that allow users to recognize the actions they want to perform are initially easier to use than those that require users to recall a command.
- Favour words over icons: Instead of displaying icons, words may be better, because retrieving names from memory is faster than naming objects.
- Information reliability and quality: The user should not be provided with false, misleading, or incomplete information at any time.
- Show only the information that is needed: The system should be esthetic and follow a minimalist design, i.e. do not clutter the interface with irrelevant information.
- Provide feedback for the user: The current system status should always be readily visible to the user.
- Make available actions visible: Make the actions the user can (and should) perform easier to see and to do.
- Allow flexibility for different users: The system should offer flexibility and efficiency of use across a range of users, e.g. through keyboard shortcuts for advanced users.
- Ensure that critical system conditions are recoverable: The user should have the control and freedom to undo and redo functions that they perform by mistake.

When designing user interfaces, three selections need to be made: 1) selection of the modality, which refers to the sensory channel that the human uses to send and receive a message (e.g. auditory, visual, touch); 2) selection of the medium, which refers to how the message is conveyed to the human (e.g. picture, diagram, video, alarm sound); and 3) selection of the technology used to deliver the message (e.g. a smart phone or AR glasses). Multimodal interfaces, which use multiple different modalities (and also media and technologies), are emerging. For example, augmented reality interfaces usually utilize multiple modalities, such as vision, speech and touch, and are built by combining multiple technologies, such as different visual displays, speech recognition and haptic devices.

Even though the most common UIs, at least in Finnish manufacturing environments, are still pen and paper, it is believed that the transformation towards digitalization, for example the implementation of MES systems, will open doors for the adoption of novel user interfaces on the factory floor. Adopting new technologies in the manufacturing industry is usually quite slow, but recent years show signs that the emerging UI technologies are finding their way onto factory floors. This report introduces the existing and emerging UI technologies that could be used on factory floors in the future. By first discussing the human characteristics important for design, and by giving general guidelines for interfaces with good usability, the aim is to emphasize that when selecting and designing the media and technologies, human behavior and cognitive capabilities always need to be considered. The user, the task and the context of use will affect the optimal technology selection.

Table of contents

Executive Summary
Table of contents
1. Introduction
2. Human aspects in user-interface design
  2.1. Introduction to user-centric design
  2.2. ABCS Framework for user-centric design
    2.2.1. Anthropometrics
    2.2.2. Behavior
    2.2.3. Cognition
    2.2.4. Social cognition and teamwork
  2.3. Human actions
  2.4. Input and output modalities of user interfaces
  2.5. Multimodal interfaces
    2.5.1. Theoretical principles of user-computer multimodal interaction
    2.5.2. Adaptive system interfaces
3. Guidelines for designing user-centric, human-friendly interfaces
  3.1. User characteristics relevant for system design
  3.2. Task analysis
  3.3. Heuristic principles for designing interfaces with good usability
  3.4. System characteristics and cognitive dimensions
  3.5. Design of multimodal interfaces
  3.6. Design for Errors
  3.7. Display Designs
    3.7.1. Thirteen principles of display design
    3.7.2. Visual design principles for good design
4. Human-machine interaction technologies
  4.1. Direct and indirect input devices
  4.2. Mobile Interfaces and Remote Sensors
    4.2.1. Mobile Device and Remote Sensor Technologies
    4.2.2. Mobile Devices and Remote Sensors Applications
  4.3. Virtual and Augmented Reality
    4.3.1. Technologies for Augmented Reality
    4.3.2. AR application examples
  4.4. Gesture and Speech Control
    4.4.1. Technologies for Gesture and Speech control
    4.4.2. Gesture and Speech Control Application Examples
5. Conclusions
References

1. Introduction

Human factors play a crucial role in the production environment. The desire for more agile and responsive manufacturing requires that real-time information on the production status is always visible to those who might need it. This, in turn, requires that the information is, on the one hand, collected from the production processes and, on the other hand, displayed to the workers in a human-friendly way.

As noticed during the interviews conducted in the 1st period of the LeanMES project (Järvenpää et al. 2015), the contribution of humans introduces uncertainty into the process. This problem was especially visible in information inputting and searching. The current manual practices in information inputting, e.g. re-typing information from paper documents into IT systems, allow neither real-time transparency of the operations nor reliable data. As the transformation towards digital manufacturing is finally starting in many companies, the information previously provided by paper documents to the factory-floor operator (e.g. job lists and work instructions) could now be displayed by a multitude of different UI technologies in a digital, easily editable format. The same applies to information collection from the factory floor.

In order to mitigate the problems relating to human perceptual and cognitive capabilities, as well as behavior, special attention should be paid to the design and selection of good and intuitive user interfaces and interaction technologies. The novel ways of working on the factory floor should not only improve the efficiency and quality of operations, but also be pleasurable for the workers. To attract future operators, the manufacturing sector should target social sustainability and adopt new UI technologies in order to be more appealing and accessible to youngsters who have grown up in a digital world.

The purpose of this report is twofold. First, it gives an introduction to the human aspects that affect the design of technical systems and especially their user interfaces (Section 2), and provides guidelines for user-centric, human-friendly interface design (Section 3). Secondly, the report reviews different existing and emerging human-machine interaction technologies and gives examples of their application in industrial contexts (Section 4).

2. Human aspects in user-interface design

2.1. Introduction to user-centric design

When one reads a book or research article about user-centric (or human-friendly) design, it is usually highlighted that no generic rules for user-centric design can be written, because the characteristics of good design depend on the task, context and users of the designed technology (e.g. Ritter et al. 2014; Smith et al. 2012; Courage et al. 2012). For instance, Ritter et al. (2014) state: "User-centered design is about considering particular people doing particular tasks in a particular context." Watzman and Re (2012) have a similar viewpoint: "The most important principle to remember, when thinking about design, is that there are no rules, only guidelines. Everything is context sensitive. Always consider and respect the user."

Based on Courage et al. (2012), the users should be analysed by answering questions such as: Who are they? What characteristics relevant to the design do they have? What do they know about the technology? What do they know about the domain? How motivated are they? What mental models do they have of the activities the designed product covers? For understanding the task the user is trying to accomplish, the following questions can be considered: What is the goal of the user? What steps are involved in achieving the goal? How is the task currently done, in which sequence and by which methods? The analysis of the users' environments or context should clarify the physical situation in which the tasks occur, the technology available to the users, as well as social, cultural and language considerations. (Courage et al. 2012)

Two terms often used when discussing user-centric design are usability and user experience. These terms are sometimes mixed up, even though their meanings are different. As stated by Ritter et al. (2014), usability focuses on the task-related aspects and getting the job done. User experience, on the other hand, focuses on the user's feelings, emotions, values, and their immediate and delayed responses. Three factors influence usability and user experience: the system itself; the user and their characteristics; and the context of use of the technology or system. From the user's perspective, whether a system can be described as usable or not depends on (Ritter et al. 2014):

- Shape and size of the users (anthropometric factors)
- External body functioning and simple sensory-motor concerns, and motivation (behavioral factors)
- Internal mental functioning (cognitive factors)
- External mental functioning (social and organizational factors)

As the usability of a system is an inherent requirement for good user experience, this section mainly focuses on the aspects that directly affect the usability of a UI.

2.2. ABCS Framework for user-centric design

Ritter et al. (2014) presented the ABCS framework, in which the design-relevant human characteristics are divided into four categories:

- Anthropometrics (A): The shape of the body and how it influences what is designed; consideration of the physical characteristics of intended users, such as what size they are, what muscle strength they have, and so on.
- Behavior (B): Perceptual and motivational characteristics, looking at what people can perceive and why they do what they do.
- Cognition (C): Learning, attention, memory, and other aspects of cognition, and how these processes influence design; users defined by how they think, what they know and what knowledge they can acquire.
- Social factors (S): How groups of users behave, and how to support them through design; users defined by where they are, their context broadly defined, including their relationships to other people.

In the following sections, these four categories are discussed in more detail.

2.2.1. Anthropometrics

The physical attributes of the user will affect how they use a particular artifact. The physical aspects of interaction relate to the posture and load bearing of the human body. Regarding the physical aspects, the designer has to consider whether the human can reach the controls, operate the lever, push the buttons, and so on. Supporting correct posture will affect the well-being of the user. Load bearing is especially important to consider when using portable or wearable devices (e.g. phones, tablets and head-mounted displays). The human has to support the weight of the interface during the interaction, but normally also during the whole day. (Ritter et al. 2014)

The perception of touch is divided into three types of tactual perception: tactile, kinesthetic and haptic perception. Tactile perception is solely mediated by changes in cutaneous stimulation, i.e. when the skin is stimulated. Kinesthetic perception is mediated by variations in kinesthetic stimulation, i.e. the awareness of static and dynamic body posture based on information coming from muscles and joints. Haptic perception uses information from both the cutaneous sense and kinesthesis to understand and interpret objects and events in the environment. Haptics is the most common type of tactual perception. For instance, most common input technologies, e.g. physical keyboards, touch screens and pointing devices (mouse, trackpad, trackballs, etc.), use some sort of haptic feedback to inform the user about the performed actions. (Ritter et al. 2014)

2.2.2. Behavior

Behavioral characteristics are mostly related to sensation and perception. People have five basic senses: sight, hearing, touch, smell and taste. Sensation occurs when the sense organs are stimulated and they generate some form of coding of the stimuli. Perception occurs when this coded information is further interpreted using knowledge of the current context (physical, physiological, psychological, and so on) to add meaning.

The process of perception is subjective. This implies that simply presenting designed stimuli in such a way that they will be sensed accurately does not necessarily mean that they will be perceived in the way that the designer intended. (Ritter et al. 2014)

Most user interfaces use vision as the major sense. One of the most useful applications of vision to interface design is to take advantage of how the eye searches. Certain stimuli pop out from other stimuli and can therefore be used to draw attention to important things. Ritter et al. (2014) state that, for example, highlighting, using a different color, or making an object move or blink makes it pop out from the others. Colors should be used to emphasize things that are important. However, as advised by Ritter et al. (2014), in order to help people with red-green color vision deficiency, redundant information should be used. It is often important to consider how the different sensory modalities can be used together to provide further information for the user (e.g. in difficult conditions, such as lack of light, or for persons with impaired vision or hearing). Also, if some elements on a display that are visually similar (such as the same shape but a slightly different color) should be processed differently, they should be made distinct by separating one or more dimensions of their appearance by several JNDs (just noticeable differences), e.g. several shades of color. (Ritter et al. 2014) Further details about the design of visual displays are discussed in Section 3.7.

As discussed above, vision plays an important role in most user interfaces. Welsh et al. (2012) state that people are more accurate and less variable under conditions in which they have vision of the environment than when they do not. Furthermore, ballistic actions, such as a keypress, do not require a continual source of visual target information and feedback during execution, because online corrections cannot be made. For aiming movements, on the other hand, such as pointing at a certain icon on the display, a continual and stable source of visual information about the effector and the target is needed for efficient feedback-based corrections and movement accuracy. (Welsh et al. 2012)

Fitts's law (Fitts 1954), relating to perceptual-motor interaction, is often used as a predictive model of the time to engage a target. The law indicates that the time to point to an object grows with the distance to the object and is inversely related to the size of the object. The law implies that larger objects lead to faster pointing times than smaller objects, and shorter distances lead to faster movement times. (Ritter et al. 2014; Welsh et al. 2012)
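To make the relationship concrete, the law is commonly written in the following form, where MT is the movement time, D the distance to the target, W the target width, and a and b empirically fitted, device-dependent constants (the exact notation varies across sources):

$$ MT = a + b \log_2\!\left(\frac{2D}{W}\right) $$

As a worked illustration with invented constants a = 0.1 s and b = 0.15 s/bit: pointing at a 2 cm wide button from 16 cm away takes roughly MT = 0.1 + 0.15 x log2(32/2) = 0.1 + 0.15 x 4 = 0.7 s, while doubling the button width to 4 cm drops the estimate to 0.1 + 0.15 x 3 = 0.55 s. This is why large touch targets feel faster on the factory floor.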

In addition to perceptual capabilities, motivation also affects the behavior of a human. Szalma et al. (2012) list three organismic elements that are essential for facilitating intrinsic motivation for task activity. These needs are competence, autonomy (personal agency, not independence per se) and relatedness. Ritter et al. (2014) name these elements mastery, autonomy and purpose. Based on Szalma et al. (2012), three factors support autonomy: 1) meaningful rationales for doing a task; 2) acknowledgement that the task might not be interesting; and 3) an emphasis on choice rather than control.

2.2.3. Cognition

Cognition refers to the mental capabilities of users relating to memory, attention and learning. As stated by Ritter et al. (2014), users' cognition is limited; for example, the working memory and attentional resources are limited, which affects how much information a human can process at a time.

Memory

The way people use a system will be greatly influenced by how well they can retrieve commands and locations of objects from memory. There are different types of memory that are used for different purposes (Ritter et al. 2014):

- Short-term memory: Often used to store lists or sets of items to work with. For unrelated objects, users can remember around seven meaningful items (+/- 2).
- Long-term memory: Information that is meaningful, and whose meaning is processed at encoding time, is easier to remember.
- Declarative memory: Facts and statements about the world.
- Procedural memory: Includes acts, or sequences of steps, that describe how to do particular tasks.
- Implicit memory: Cannot be reported. Most procedural information is implicit in that the precise details are not reportable. Information gets put into implicit memory when the user works without a domain theory and learns through trial and error.
- Explicit memory: Can be reported. Most declarative information is explicit in that it can be reported. Users can perform tasks more robustly and, because they can describe how to do the task, they can help others more readily. Users can be encouraged to store information in explicit memory by helping them develop a mental model of a task, and by providing them with time to reflect on their learning.

Ritter et al. (2014) highlight a few mnemonics and aids to memory. For instance, recognition is a useful aid to recall: recognition memory is more robust than recall memory. This implies that it is easier to recognize something that you have previously seen than to recall what it was you saw. Many interfaces take advantage of recognition memory by putting objects or actions in a place where they can be recognized, instead of requiring the user to recall them. In addition, anomalous or interesting things are more easily retrieved from memory than something that did not draw the user's attention in the first place. (Ritter et al. 2014)

In the case of lists, certain things affect how well the information on the lists can be retrieved (Ritter et al. 2014):

- Primacy: Items appearing at the start of a list are more easily retrieved from memory.
- Distinctive items in a list are better retrieved.
- Items in a list that make sense (e.g. MES, ERP) are better retrieved than items that do not have associations for everybody.
- Recency: Items appearing last in the list are better retrieved.

Attention

According to Ritter et al. (2014), attention refers to the selective aspects of perception, which function so that at any instant a user focuses on particular features of the

environment to the relative (but not complete) exclusion of others. Welsh et al. (2012) list three important characteristics of attention: 1) attention is selective and allows only a specific subset of information to enter the limited processing system; 2) the focus of attention can be shifted from one source of information to another; and 3) attention can be divided such that, within certain limitations, one may selectively attend to more than one source of information at a time.

As discussed by Welsh et al. (2012), shifts of attention that are driven by stimuli are known as exogenous, or bottom-up, shifts of attention. They are considered to be automatic in nature and thus, for the most part, are outside of cognitive influences. Exogenous shifts of attention are typically caused by a dynamic change in the environment, such as the sudden, abrupt appearance (onset) or disappearance (offset) of a stimulus, a change in the luminance or color of a stimulus, or the abrupt onset of object motion. Performer-driven, or endogenous, shifts of attention are under complete voluntary control. This type of attention shift can be guided by a wide variety of stimuli, such as symbolic cues like arrows, numbers or words. In this way, users can be cued to locations or objects in the scene with more subtle or permanent information than the dynamic changes that are required for exogenous shifts. However, the act of interpreting the cue requires a portion of the limited information-processing capacity. Furthermore, as stated by Welsh et al. (2012), automatic attentional capture seems to depend on the expectations of the user. Therefore, the designer of the interface has to consider the perceptual expectations of the user. (Welsh et al. 2012)

Proctor and Vu (2012) state that many studies have shown that it is easier to perform two tasks together when they use different stimulus or response modalities than when they use the same modalities. Performance is also better when one task is verbal and the other visuospatial than when both are of the same type. According to multiple resource models, different attentional resources exist for different sensory-motor modalities and coding domains. (Proctor & Vu 2012) Therefore, dual tasks that use different perceptual buffers will interfere less with each other. For instance, people can learn to drive and talk at the same time in normal weather conditions, because driving does not use a lot of audio cues. (Ritter et al. 2014)

Mental models and learning

Mental models are used to understand systems and to interact with them. When the user's mental models are inaccurate, systems are hard to use. The model the user brings to the task will influence how they use the system, what strategies they will most likely employ, and what errors they are likely to make. It is therefore important to design the system in such a way that the user can develop an accurate mental model of it. (Ritter et al. 2014)

A mental model can be considered a representation of some part of the world that can include the structures of the world (the ontology of the relevant objects), how they interact, and how the user can interact with them (Ritter et al. 2014). Payne (2012) simplified the meaning of mental models into what users know and believe about the systems they use. If the user's mental model accurately matches the system, the user can better use the mental model to perform their task, to troubleshoot the system, and to teach others about the task or system (Ritter et al. 2014).

The designer of the system must have an accurate mental model of how people will use it. This requires understanding how people will use it, the tasks they will perform using the system, and their normal working context. Making the system compliant with the user's mental model will almost certainly help reduce the time it takes to perform a task, reduce learning time, and improve the acceptability of the system. Good interfaces will help users to develop appropriate levels of confidence in their representations and decisions. Often this means providing information to support learning, including feedback on task performance, and also providing information to build a mental model. It is important to keep the human in the loop. This means keeping the users aware of what the computer is doing, by providing them with feedback about the system's state. They can use this feedback to detect errors, to update their own mental model of how the system is working, and to anticipate when they need to take an action. If users do not get feedback, their calibration of how well they are doing will be poor to non-existent. When it is not clear to the user what to do next, problem solving is used. Problem solving uses mental models and forms a basis for learning. (Ritter et al. 2014)

One important concept, which aids in building the correct mental model of the system and therefore eases its usage, is stimulus-response (S-R) compatibility. This means that there should be clear and appropriate mappings between the task/action and the response. It is typically seen as having the physical aspects of an interface (e.g. buttons) and displays match the world that they represent. For example, the button for calling an elevator to go up should be above the one for calling it to go down. (Welsh et al. 2012; Ritter et al. 2014)

2.2.4. Social cognition and teamwork

Social processes, i.e. how people interact with each other, are important because they affect how systems and interfaces are used. Workplace systems are socio-technical systems, meaning technical systems that are designed for and shaped by people operating in social contexts. Two especially important social responsibility effects, presented by Ritter et al. (2014), should be considered: diffusion of social responsibility and pluralistic ignorance. The diffusion of social responsibility indicates that a person is less likely to take responsibility for an action or inaction when they think someone else will take the action. For instance, this can happen when an email is sent to many recipients and nobody takes responsibility. Pluralistic ignorance refers to the fact that people, especially inexperienced ones, often base their interpretation of a situation on how other people interpret it. For example, if other people do not react to an alarm sound, the rest will interpret it as not important as well. (Ritter et al. 2014)

2.3. Human actions

Based on Welsh et al. (2012), three basic processes can be distinguished in human information processing: stimulus identification, which is associated with the processes responsible for the perception of information; response selection, which pertains to the translation between stimuli and responses; and response programming, which is associated with the organization of the final output. (Welsh et al. 2012)

When a human takes an action, it includes several stages. Norman (1988) defined seven stages of user activities. The process of these stages should be seen as a cyclic rather than a linear sequence of activities:

- Establish the goal
- Form the intention to take some action
- Specify the action sequence
- Execute the action
- Perceive the system state
- Interpret the system state
- Evaluate the system state with respect to the goals and intentions

Ritter et al. (2014) discuss the gulfs of evaluation and execution, originally defined by Norman (1988). In the evaluation and execution phases, the user has to make mappings between psychological and physical concepts. In the evaluation phase this means the following: when the user perceives the state of the system, it will be in terms of physical concepts (usually variables and values) that the user has to translate into a form compatible with their mental model of how the system operates. The gap between the physical concepts and the psychological concepts is called the gulf of evaluation. In the execution phase, the goals and intentions of the user (psychological concepts) need to be translated into physical concepts, which are usually actions that can be executed in the system. The gap between the goals and intentions and the physical actions is called the gulf of execution. There are many examples of interfaces where the details of, and feedback on, the state of the system are difficult to interpret, and where it is difficult to work out what actions are available and how to execute them. In these cases the gulfs of evaluation and execution are large. (Ritter et al. 2014)

The above-mentioned gulfs lead to the following implications for design (Ritter et al. 2014): good design involves making sure that the information that is crucial to task evaluation and performance is made clearly visible to the user. What counts as appropriate information will vary across tasks, sometimes across users, and even across contexts of use. Appropriate consideration should be given to:

- Feedback: helps to reduce the gulf of evaluation because it shows the effect of performing a particular task.
- Consistency: helps users to help themselves (e.g. by applying knowledge of other systems, such as the placement of buttons).
- Mental models: design should facilitate the development of appropriate mental models, and support the use of those models by making the appropriate information visible to users at the right time and in the right place.
- Critical systems should not be too easy to use: users must pay attention to what they are doing.

2.4. Input and output modalities of user interfaces

Sutcliffe (2012) describes the difference between medium and modality: a message is conveyed by a medium and received through a modality. A modality is the sensory channel that the human uses to send and receive messages to and from the world, essentially the senses. The two principal modalities used in human-computer communication are vision and hearing. (Sutcliffe 2012) As the vision modality has been widely covered in other sections of this report, this section concentrates mainly on

hearing, namely the speech and non-speech auditory modalities. Touch is also covered briefly. Smell and taste are not discussed here, as their use in UIs is not yet common.

Non-speech auditory output refers to auditory stimuli which are not spoken language, e.g. alarm or warning sounds. Hoggan and Brewster (2012) list the advantages of non-speech feedback (including feedback other than auditory, such as touch):

- Vision and hearing are interdependent; they work well together (e.g. our ears tell our eyes where to look).
- Hearing and touch have amodal properties, which relate to space and time and involve points along a continuum (e.g. location), intervals within a continuum (e.g. duration), patterns of intervals (e.g. rhythm), rates of patterns (e.g. tempo), or changes of rate (e.g. texture gradients).
- Sound has superior temporal resolution.
- Sound and touch reduce the overload from large displays.
- Sound and touch reduce the amount of information needed on the screen.
- Sound reduces demands on visual attention.
- Sound is attention grabbing.
- Touch is subtle and private.
- The spatial resolution of tactile stimuli is high.
- Auditory or tactile form makes computers more usable for visually disabled people.

On the other hand, Hoggan and Brewster (2012) (originally Kramer (1994)) bring out some disadvantages of non-speech feedback:

- Sound has low resolution. Using sound volume or tactile amplitude, only very few different values can be unambiguously presented.
- Presenting absolute data is difficult.
- There is a lack of orthogonality: changing one attribute of a sound or tactile cue may affect the others.
- Auditory feedback (or input) may annoy other persons nearby.

Hoggan and Brewster (2012) highlight that non-speech auditory or tactile feedback is useful in mobile devices. As the devices are small, there is a very limited amount of screen space for displaying information. Also, if users are performing their tasks on the move, e.g. while walking or driving, they cannot devote all of their visual attention to the mobile device. (Hoggan & Brewster 2012)

Speech is characterised by its transient nature, while graphics are persistent. While a graphical interface typically stays on the screen until the user performs some action, a message carried by speech is gone immediately after it has been said. Listening to speech taxes the user's short-term memory, and if the message is long, something may be forgotten. Therefore, in general, transience means that speech is not a good medium for delivering large amounts of information. However, as people can look and listen at the same time, speech may be good for grabbing attention or for providing an alternate mechanism for feedback. (Karat et al. 2012)

Speech is also invisible. The lack of visibility makes it difficult to communicate the functional boundaries of an application to the user. Because there is no visible menu or other screen elements, it is much more challenging to indicate to the users what actions

they may perform and what words and phrases they must say to perform those actions. It is also problematic when the speaker is not in a private environment, or when there are other voices in the background that might interfere with the speech recognition. (Karat et al. 2012)

In the future, multimodal interfaces are expected to become more common, and these interfaces will also use other modalities, such as haptics (sense of touch), kinesthetics (sense of body posture and balance), gustation (taste) and olfaction (smell) (Oviatt 2012). Multimodal interfaces are discussed in the next section.

2.5. Multimodal interfaces

Multimodal interfaces are becoming more common in human-machine interaction. According to Dumas et al. (2009), multimodal systems are computer systems endowed with multimodal capabilities for human/machine interaction and able to interpret information from various sensory and communication channels. Multimodal interfaces process two or more combined user input modes, such as speech, pen, touch, manual gesture, gaze or body movements, in a coordinated manner with multimedia system output. Compared to unimodal interfaces, multimodal interfaces aim to provide a more human way to interact with computers, using richer and more natural ways of communication, such as speech, gestures and other modalities, and more generally all five senses. It has to be noted, however, that the terms "natural interaction" and "natural UI" are often used when talking about new UIs. Hinckley and Wigdor (2012) give an operational definition for a natural UI: the experience of using the system matches expectations such that it is always clear to the user how to proceed, and only a few steps (with a minimum of physical and cognitive effort) are required to complete common tasks. Therefore, one cannot state that one interaction technology is more natural than another; it always depends on the task that is to be performed with the technology.

Oviatt (1997) showed that, compared to unimodal interfaces, multimodal interfaces can improve error handling and reliability, provide greater expressive power, and provide improved support for users' preferred interaction styles. Multimodal interfaces can support a broad range of users and contexts of use, since the availability of multiple modalities supports flexibility. For example, the same user may benefit from speech input in quiet conditions when their hands are occupied, while in a noisy environment e.g. touch input may be more efficient. Flexible personalization of the interaction mode, based on the user and the context, is especially useful for people with impaired vision, hearing or motor abilities. (Dumas et al. 2009)

According to Dumas et al. (2009), findings in cognitive psychology indicate that humans are able to process modalities partially independently and, thus, presenting information through multiple modalities increases effective human working memory. Therefore, increasing effective working memory by presenting information in a dual-mode form, rather than a purely visual one, could expand human processing capabilities.

2.5.1. Theoretical principles of user-computer multimodal interaction

When a human interacts with a machine, his or her communication can be divided into four different states (see Figure 1): decision, action, perception and interpretation. The machine goes through four similar states. In the decision state, the content of the communication message is prepared, consciously for an intention, or unconsciously for attentional content or emotions. After that, in the action state, the communication means to transmit the message (e.g. speech or gesture) are selected. When the human communicates his or her message, the machine, in the perception state, uses one or multiple sensors to capture as much information as possible from the user. During the interpretation state, the system tries to give meaning to the different pieces of information collected in the previous state. In the computational state, action is taken following the business logic and dialogue manager rules defined by the developer. In the action state, the machine generates the answer based on the meaning extracted in the interpretation state. A fission engine determines the most relevant modalities for returning the message, depending on the context of use and the profile of the user. (Dumas et al. 2009)

Figure 1. A representation of the multimodal man-machine interaction loop (Dumas et al. 2009).
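To illustrate the last step of the loop, the sketch below shows what a fission engine's modality choice could look like in code. It is a minimal illustration written for this review, not part of the Dumas et al. (2009) architecture; the context attributes (ambient noise level, occupied hands, visual impairment) and the thresholds are invented for the example.

```python
from dataclasses import dataclass

@dataclass
class Context:
    """Snapshot of the usage context and user profile (illustrative attributes)."""
    ambient_noise_db: float   # measured noise level on the factory floor
    hands_occupied: bool      # e.g. the operator is carrying a part
    visually_impaired: bool   # from the user profile

def select_output_modalities(context: Context) -> list[str]:
    """Toy fission engine: pick the modalities used to return a message.

    Mirrors the idea that the fission engine weighs the context of use
    and the user profile; a real engine also considers message content.
    """
    modalities = []
    if not context.visually_impaired:
        modalities.append("visual")       # default channel on a display
    if context.ambient_noise_db < 80:     # speech is useless in heavy noise
        modalities.append("speech")
    if not context.hands_occupied:
        modalities.append("haptic")       # vibration on a handheld device
    return modalities or ["haptic"]       # always return at least one channel

# Example: noisy shop floor, operator carrying a part
print(select_output_modalities(Context(95.0, True, False)))  # -> ['visual']
```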

Oviatt (2012) highlights that commercially available multimodal interfaces have primarily been developed for mobile use, including cell phones, small PDA handhelds, and new digital pens. The commercial solutions have avoided co-processing and interpreting the linguistic meaning of two or more natural input streams. In this regard, they lag substantially behind the far more powerful research-level prototypes, and have yet to reach their most valuable commercial potential. In some cases, these systems have simply emphasized the capture and reuse of synchronized human communication signals (e.g. verbatim speech, pen ink), rather than any interpretation and processing of linguistic meaning. (Oviatt 2012)

As stated by Oviatt (2012), there is growing interest in designing multimodal interfaces that incorporate vision-based technologies, such as the interpretation of gaze, facial expression, head nodding, gesturing and large body movements. These technologies unobtrusively or passively monitor user behavior and do not require an explicit user command to the computer. This contrasts with active input modes, such as speech or pens, which the user deploys intentionally as commands issued to the system. Although passive modes may be attentive and less obtrusive, active modes are generally more reliable indicators of user intent. As vision-based technologies mature, one important future direction will be the development of blended multimodal interfaces that combine both passive and active modes. (Oviatt 2012)

2.5.2. Adaptive system interfaces

Jameson and Gajos (2012) define a user-adaptive system as an interactive system that adapts its behavior to individual users on the basis of processes of user model acquisition and application that involve some form of learning, inference, or decision making. User-adaptive systems differ from adaptable systems, which offer the user an opportunity to configure or otherwise influence the system's longer-term behavior, e.g. by choosing options that determine the appearance of the user interface. Jameson and Gajos (2012) state that often a carefully chosen combination of adaptation and adaptability works best.

Jameson and Gajos (2012) discuss suitable functions for adaptive systems:

Supporting system use:
- Offering help adaptively, e.g. by suggesting to the user the commands he or she could use next.
- Taking over parts of routine tasks, e.g. sorting or filtering, and scheduling appointments and meetings. Systems of this sort can actually take over two types of work from the user: 1) choosing what particular action is to be performed (e.g. which folder a file should be saved in); and 2) performing the mechanical steps necessary to execute that action.
- Adapting the interface to the individual task and usage, i.e. adapting the presentation and organization of the interface so that it fits better with the user's tasks and usage patterns.
- Adapting the interface to individual abilities. This is useful not only for people with impairments: environmental factors also matter, e.g. temperature may temporarily impair a person's dexterity, a low level of illumination will impact reading speed, and ambient noise will affect hearing ability. Especially with mobile devices, it would be good to adapt to the momentary effective abilities of users.

Supporting information acquisition:
- Helping users to find information, including support for browsing, query-based search and the spontaneous provision of information. The system can e.g. suggest news articles based on the user's previous clicks on other articles.
- Recommending products.
- Tailoring information presentation. The properties of users that may be taken into account in the tailoring of documents include: the user's degree of interest in particular topics; the user's preference or need for particular forms of information presentation; and the display capabilities of the user's computing device.
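As a concrete illustration of the first function above (offering help adaptively), the sketch below implements a trivial user model that counts which command tends to follow which, and suggests the most frequent successors of the current command. This is a made-up minimal example, not an API from Jameson and Gajos (2012); the command names are hypothetical, and a production user model would involve proper learning and inference.

```python
from collections import Counter, defaultdict

class CommandSuggester:
    """Trivial user model: learns which command tends to follow which."""

    def __init__(self):
        # For each command, count the commands the user issued next.
        self.followers = defaultdict(Counter)
        self.last_command = None

    def observe(self, command):
        """Update the model with each command the user issues (model acquisition)."""
        if self.last_command is not None:
            self.followers[self.last_command][command] += 1
        self.last_command = command

    def suggest(self, current, n=3):
        """Suggest the n commands most often issued after `current` (model application)."""
        return [cmd for cmd, _ in self.followers[current].most_common(n)]

suggester = CommandSuggester()
for cmd in ["open_job", "report_quantity", "open_job", "report_quantity",
            "open_job", "view_instructions"]:
    suggester.observe(cmd)
print(suggester.suggest("open_job"))  # -> ['report_quantity', 'view_instructions']
```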

3. Guidelines for designing user-centric, human-friendly interfaces

As stated at the beginning of the report, there are no universal rules for good user-centric design. However, as the previous sections showed, human behavior and cognitive capabilities are, in general, not totally unpredictable, and therefore some guidelines for good user interface design can be given. The guidelines given in this chapter are general and can be applied to any user interface design, including the planner and operator interfaces of manufacturing operations management systems.

Watzman and Re (2012) list audit questions for usable interfaces (see Table 1). The audit questions A are meant for figuring out the purpose and context of use of the interface, while the audit questions B are targeted more towards finding the most efficient way to perform the task that is to be carried out with the designed interface.

Table 1. Audit questions for designing usable interfaces (Watzman & Re 2012).

Audit questions A:
- Who are the product users?
- How will this product be used?
- When will this product be used?
- Why will this product be used?
- Where will this product be used?
- How will the process evolve to support this product as it evolves?

Audit questions B:
- What is the most efficient, effective way for a user to accomplish a set of tasks and move on to the next set of tasks?
- How can the information required for product ease of use be presented most efficiently and effectively?
- How can the design of this product support ease of use and the transition from task to task as a seamless, transparent and even pleasurable experience?
- What are the technical and organizational limits and constraints?

3.1. User characteristics relevant for system design

The human characteristics relevant for design were covered thoroughly, on a general level, in Section 2. Here, a few relevant characteristics relating to the specific person who will be using the system, from Ritter et al. (2014), are summarized:

- Physical characteristics, limitations and disabilities
- Perceptual abilities, strengths, and weaknesses
- Frequency of product use

- Past experience with the same or a similar product
- Activity mental set (the attitude towards, and level of motivation for, the activity)
- Tolerance for error
- Patience and motivation for learning
- Culture/language/population expectations and norms

3.2. Task analysis

Task analysis provides a way to describe the user's tasks and subtasks, the structure and hierarchy of these tasks, and the knowledge users already have or need to acquire to perform the tasks. Prescriptive analyses show how the user should carry out the task (associated with normative behavior). Descriptive analyses, in contrast, show how users really carry out the task, and are hence associated with actual behavior. (Ritter et al. 2014)

Courage et al. (2012) highlight that task analysis requires watching, listening to and talking with users. Other people, such as managers and supervisors, and other information sources, such as print or online documentation, are only secondary sources for a task analysis; relying on them may lead to a false understanding. In addition to analyzing the users, their characteristics, expectations and level of experience, it is crucial to also consider the context and environment where the system is used. Sutcliffe (2012) states that it is important to gather information on the location of use (office, factory floor, public/private space, and hazardous locations), pertinent environmental variables (ambient light, noise levels, and temperature), usage conditions (single user, shared use, broadcast), and the expected range of locations (countries, languages and cultures).

Different task analysis methods include (Courage et al. 2012; Ritter et al. 2014):

- Hierarchical task analysis (HTA)
- Task Analysis Grammar (TAG)
- Cognitive task analysis
- GOMS (Goals, Operators, Methods, and Selection rules)
- The keystroke-level model (a worked example follows below)

As stated by Courage et al. (2012), the efficiency-oriented, detailed task analyses, such as TAG and GOMS, have a place especially in evaluating products for which efficiency on the order of seconds saved is important. Courage et al. (2012) list different granularity levels for task analysis:

- Analysis of a person's typical day or week
- Job analysis: all the goals and tasks that someone does in a specific role daily, monthly, or over longer periods
- Workflow analysis: process analysis, cross-user analysis, how work moves from person to person
- High-level task analysis: the work needed to accomplish a large goal, broken down into sub-goals and major tasks
- Procedural analysis: the specific steps and decisions the user takes to accomplish a task
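As a worked example of the keystroke-level model referenced above, the sketch below estimates the time to acknowledge a job in a hypothetical MES dialog. The task breakdown and the interface are invented for illustration; the operator times are the commonly cited textbook averages (K = keystroke, P = point with a mouse, H = home hands between devices, M = mental preparation), and a real analysis would calibrate them to the user and device.

```python
# Keystroke-level model (KLM): sum per-operator time estimates for a task.
# Times in seconds; commonly cited averages for a skilled user.
OPERATOR_TIMES = {
    "K": 0.20,   # press a key or button
    "P": 1.10,   # point at a target with a mouse
    "H": 0.40,   # home hands between keyboard and mouse
    "M": 1.35,   # mental preparation for an action
}

def klm_estimate(operators: str) -> float:
    """Estimate task time from a string of KLM operators, e.g. 'MPK'."""
    return sum(OPERATOR_TIMES[op] for op in operators)

# Hypothetical task: acknowledge a job in an MES dialog.
#   M  decide which job to acknowledge
#   P  point at the job row,            K  click it
#   P  point at the Acknowledge button, K  click it
print(f"{klm_estimate('MPKPK'):.2f} s")  # -> 3.95 s
```

Such estimates make it possible to compare alternative dialog designs (e.g. one click versus three) in seconds saved before anything is built.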

For presenting the data of a task analysis, several methods can be applied, such as affinity diagrams, artifacts, flow diagrams, personas, scenarios, sequence diagrams, user-need tables and user/task matrices. The user/task matrix becomes a major input to a communication plan, answering the question of which tasks to include in the documentation for people in different roles. (Courage et al. 2012)

As a result of task analysis, function allocation can be carried out. Function allocation is done to identify the list of functions that the system (including both the human and the machine) has to perform. These functions can then be allocated to either the human or the machine, e.g. based on Fitts's list, which is also referred to as the MABA-MABA ("Men are better at, Machines are better at") approach. However, as Ritter et al. (2014) bring out, designers often allocate to the technology all the tasks that they know how to automate, and leave the human to carry out all the others. This may not lead to a task allocation that optimizes the capability utilization of both the human and the machine. Also, if it is important for the user to learn the task, for example in order to be able to take control of it in case of machine failure, it may not be wise to automate the task completely, as complete automation does not facilitate learning.

3.3. Heuristic principles for designing interfaces with good usability

This section discusses heuristic principles for good UI design presented by multiple authors. Nielsen (1995) listed 10 general usability principles, or heuristics, for user interface design, summarized in the following list:

1. The current system status should always be readily visible to the user.
2. There should be a match between the system and the user's world: the system should speak the users' language.
3. The user should have the control and freedom to undo and redo functions that they mistakenly perform.
4. The interface should exhibit consistency and standards so that the same terms always mean the same thing.
5. Errors should be prevented where possible.
6. Use recognition rather than recall in order to minimize the mental workload of the users.
7. The system should have flexibility and efficiency of use across a range of users, e.g. through keyboard shortcuts for advanced users.
8. The system should be esthetic and follow a minimalist design, i.e. do not clutter the interface with irrelevant information.
9. Users should be helped to manage errors: not all errors can be prevented, so make it easy for users to recognize, diagnose and recover from them.
10. Help and documentation should be readily available and structured for ease of use.

Grice's (1975) maxims of conversation are often used as a guideline for evaluating what kind of information should be displayed to the user:

- Maxim of quantity: The message should be made as informative as required, and should not be more informative than is required.

- Maxim of quality: Information that is believed to be false, or for which there is no adequate evidence, should not be displayed.
- Maxim of relevance: Only relevant information should be displayed.
- Maxim of manner: Obscurity of expression and ambiguity should be avoided. The message should be brief (avoiding unnecessary prolixity) and orderly.

Implications of human memory for system design (Ritter et al. 2014):

- Use words that the users know. Use the words consistently to strengthen the chances of later successfully retrieving these words from memory.
- Instead of displaying icons, words may be better, because retrieving names from memory is faster than naming objects.
- Systems that allow users to recognize the actions they want to perform will initially be easier to use than those that require users to recall a command. There is a trade-off, however, once the users become experts.
- Once something has been learned and stored in long-term memory, it takes some time to un-learn it. Therefore, the user should not be allowed to learn incorrect knowledge; correcting such an error takes a long time.

Principles of design for avoiding exasperated users (Hedge 2003):

- Clearly define the system goals and identify potential undesirable system states
- Provide the user with appropriate procedural information at all times
- Do not provide the user with false, misleading, or incomplete information at any time
- Know the user
- Build redundancy into the system
- Ensure that critical system conditions are recoverable
- Provide multiple possibilities for workarounds
- Ensure that critical systems personnel are fully trained
- Provide system users with all of the necessary tools

The gulfs of evaluation and execution were discussed in Section 2.3. The following list discusses design principles for making these gulfs narrower (Norman 1988; Ritter et al. 2014):

1. Use both the knowledge in the world and the knowledge in the head. Provide information in the environment to help the user determine the system state and perform actions, such as explicit displays of the system state and affordances on the system controls.
2. Simplify the structure of tasks. Require less of the user by automating sub-tasks, using displays that describe information without being asked, or providing common actions more directly. However, do not reduce tasks below their natural level of abstraction.
3. Make the relevant objects and the feedback on actions visible. Bridge the gulf of evaluation: make the state of the system easier to interpret.
4. Make the available actions visible. Bridge the gulf of execution: make the actions the user can (and should) perform easier to see and to do.

5. Get the mappings correct from objects to actions. Make the actions that the user can apply natural.
6. Exploit the power of constraints, both natural and artificial, to support bridging each gulf. Make interpretations of the state and of the possible actions easier by removing actions that are not possible in the current state, and by reducing the complexity of the display for objects that are not active or available.
7. Design for error. Users will make errors, so you should expect them and be aware of their effects. Where errors cannot be prevented, try to mitigate their effects. Help the users see errors and provide support for correcting them.
8. When all else fails, standardize. If the user does not know what to do, allow them to apply their knowledge of existing standards and interfaces.

3.4. System characteristics and cognitive dimensions

Systems are often evaluated based on seven characteristics (Ritter et al. 2014): functionality, usability, learnability, efficiency, reliability, maintainability, and utility/usefulness. Usability has been the focus of this report. Another important characteristic is efficiency. When designing user interfaces, it is important to remember that maximal efficiency is not always desired. As stated by Ritter et al. (2014), efficiency must be calculated in terms of technical efficiency that matches the user's efficiency expectations for the task at hand. For instance, one-click payments in e-markets, without asking the user to review the order before payment, may be too efficient.

Ritter et al. (2014) present 14 cognitive dimensions. The goal of the dimensions is to provide a fairly small, representative set of labeled dimensions that describe critical ways in which interfaces, systems and environments can vary from the perspective of usability. The cognitive dimensions help to discuss and compare alternative designs. These dimensions focus on the cognitive aspects of interfaces and do not address design trade-offs related to the other aspects of users: the anthropometric, behavioral and social aspects. The cognitive dimensions from Ritter et al. (2014) are listed below:

1. Hidden dependencies: How visible are the relationships between components?
2. Viscosity: How easy is it to change objects in the interface?
3. Role-expressiveness: How clear are the mappings of the objects to their functions?
4. Premature commitment: How soon does the user have to decide about something?
5. Hard mental operations: How hard are the mental operations needed to use the interface?
6. Secondary notation: The ability to add extra semantics.
7. Abstraction: How abstract are the operations and systems?
8. Error-proneness: How easy is it to err?
9. Consistency: How uniform is the system (in various ways, including action mapping)?
10. Visibility: Whether the required information is accessible without work by the user.
11. Progressive evaluation: Whether the user can stop in the middle of creating some notation and check what has been done so far.
12. Provisionality: Whether the user can sketch out ideas without being too exact.
13. Diffuseness: How verbose is the language?

14. Closeness of mapping: How close is the representation in the interface (also called notation) to the end result being described?

Hidden dependencies are common, for instance, in spreadsheets, which show the user formulae in one direction only; that is, they show which cells are used to compute the value in a cell, but not which cells use a given cell's value. Another example is that applications other than the one that created them may depend on certain files, e.g. graphics in reports. Usually these dependencies are not visible, and deleting such files may be hazardous. Therefore, all dependencies that may be relevant to the user's task should be represented. (Ritter et al. 2014)

A viscous system is resistant to change. Even small changes can require substantial effort, for example changing the numbering of every picture (and the text referencing them) manually in a Word document. Sometimes viscosity can be beneficial, e.g. it encourages reflective action and explicit learning. When it is easy to make changes, many small, unnecessary changes may be made. Viscosity is especially important in safety-critical applications or in applications where an incorrect action is expensive in time or money. Viscosity can be implemented e.g. by asking the user to confirm the action ("Do you really want to do this action?"). (Ritter et al. 2014)

Role-expressiveness describes the extent to which a system reveals the goals of the designer to the user. The purpose of each component of the system should be understandable to the user; e.g. buttons of the interface should be clearly recognizable as buttons that can be pressed. A classic problem occurs when two similar-looking features achieve different functions, or when two different-looking features achieve similar effects. (Ritter et al. 2014)

Some mental operations are harder than others, for instance operations that contradict normal mental models: having to mentally change the size of an object (which is normally considered a relatively constant aspect of an object) is more difficult than applying simple rules of behavior. Also, mentally rotating objects is slower with large objects than with small ones. Hard mental operations are easy to implement computationally, but troublesome for users. They can be addressed at several levels, either by avoiding the problem through understanding the relative difficulty of operations, or by providing tools to assist in these operations. (Ritter et al. 2014)

3.5. Design of multimodal interfaces

Human cognitive capacity is limited. Sometimes the limited resources may lead to multimedia usability problems, discussed by Sutcliffe (2012):
- Capacity overflow may happen when too much information is presented in a short period, swamping the user's limited working memory and the cognitive processor's capability to comprehend, chunk, and then memorize or use the information. The implication is to give users control over the pace of information delivery.
- Integration problems arise when the message on two media is different, making integration in working memory difficult; this leads to the thematic congruence principle.

- Contention problems are caused by conflicting attention between dynamic media, and when two inputs compete for the same cognitive resources. For example, speech and text both require language understanding.
- Comprehension is related to congruence; we understand the world by making sense of it with our existing long-term memory. Consequently, if multimedia content is unfamiliar, we cannot make sense of it.
- Multitasking makes further demands on our cognitive processes, so we will experience difficulty in attending to multimedia input while performing output tasks.

In task-driven applications, the information requirements are derived from the task model. In information-provision applications, such as websites with an informative role, information analysis involves categorization, and the architecture generally follows a hierarchical model. In the third class of explanatory or thematic applications, analysis is concerned with the story or argument, that is, how the information should be explained or delivered. (Sutcliffe, 2012)

Sutcliffe (2012) presented the following classification for information components:
- Physical items relating to tangible observable aspects of the world
- Spatial items relating to geography and location in the world
- Conceptual-abstract information, facts, and concepts related to language
- Static information which does not change: objects, entities, relationships, states, and attributes
- Dynamic, or time-varying, information: events, actions, activities, procedures, and movements
- Descriptive information, attributes of objects and entities
- Values and numbers
- Causal explanations

Sutcliffe (2012) suggested the following heuristics, collected from multiple sources, for appropriate media selection:
- To convey detail, use static media, for example text for language-based content, diagrams for models, or still images for physical detail of objects.
- To engage the user and draw attention, use dynamic media, e.g. video, animation, or speech.
- For spatial information, use diagrams or maps, with photographic images to illustrate detail and animations to indicate pathways.
- For values and quantitative information, use charts and graphs for overviews and trends, supplemented by tables for detail.
- Abstract concepts, relationships, and models should be illustrated with diagrams explained by text captions, with speech giving supplementary information.
- Complex actions and procedures should be illustrated as a slideshow of images for each step, followed by a video of the whole sequence to integrate the steps. Text captions on the still images and a speech commentary provide supplementary information, and text and bullet points can summarize the steps at the end. Media choice trade-offs may, however, be constrained by cost and quality considerations.

- To explain causality, still and moving image media need to be combined with text.

Payne (2012) referred to other research on multimedia instructions (Mayer & Moreno 2002), from which the following principles were summarized:
- The multiple presentation principle states that explanations in words and pictures will be more effective than explanations that use only words. When only words are presented, learners may find it difficult to construct an appropriate mental image, and this difficulty may block effective learning. Studies have offered support for the general idea that learners will acquire richer knowledge from narration and animation than from narration alone.
- The contiguity principle is the claim that simultaneous, as opposed to successive, presentation of visual and verbal materials is preferred.
- The chunking principle refers to a situation in which visual and verbal information must be presented successively, or alternately (against the contiguity principle). It states that learners will demonstrate better learning when such alternation takes place in short rather than long segments. The reasoning is straightforward, given the assumptions of the framework: working memory may become overloaded by having to hold large chunks before connections can be formed.

3.6. Design for Errors

Errors often arise as a combination of factors at the anthropometric, behavioral, cognitive and social levels of the ABCS framework. Each of the components (people, technology, context) can give rise to errors. There are different types of errors, such as slips, which occur when someone knows the right thing to do but accidentally does something different, e.g. pressing the wrong button while typing; and mistakes, which occur when an action is taken on the basis of an incorrect plan. One specific type of error is the post-completion error. These arise when the main goal of the task has been completed, but the goals for some of the subtasks have not. A good example of such a situation is getting money from an ATM but forgetting the card in the machine.

Good interface design can help to reduce the errors that may happen while interacting with the interface. The first step in designing for error is to identify the situations that can lead to erroneous performance. Secondly, appropriate mechanisms must be put in place to either prevent the errors or at least mitigate the adverse consequences arising from them. For example, in order to avoid post-completion errors, the system should discourage the user from believing that they have completed the task until all the important sub-parts are done, and should put the most important goal last, where the technology and the situation permit. Good design can help provide more feedback on performance, and could also provide education along the way about how to correct problems. (Ritter et al. 2014)
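To make the last point concrete, the following minimal sketch shows how a task flow can be ordered so that the user's primary goal always comes last, which removes the opportunity for a post-completion error. The TaskFlow class and the ATM step names are hypothetical illustrations, not from Ritter et al. (2014).

    # A minimal sketch (hypothetical) of ordering task steps so that the
    # user's primary goal comes last, preventing post-completion errors.
    class TaskFlow:
        def __init__(self, steps, primary_goal):
            # Keep the step that satisfies the user's main goal at the end,
            # so no forgettable sub-task remains after the goal is reached.
            self.steps = [s for s in steps if s is not primary_goal] + [primary_goal]
            self.completed = []

        def run(self):
            for step in self.steps:
                step()
                self.completed.append(step.__name__)

        def is_complete(self):
            # Completion is reported only when every sub-task is done.
            return len(self.completed) == len(self.steps)

    def return_card():
        print("Please take your card.")

    def dispense_cash():
        print("Please take your cash.")

    atm = TaskFlow([dispense_cash, return_card], primary_goal=dispense_cash)
    atm.run()  # prompts for the card first, then for the cash

Because the cash (the user's goal) is handed out only after the card has been returned, no forgettable subtask remains once the goal is achieved.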

3.7. Display Designs

Displays are human-made artefacts designed to support the perception of relevant system variables and to facilitate further processing of that information. A user must be able to process whatever information a system generates and displays; therefore, the information must be displayed in a manner that supports perception, situation awareness, and understanding. The term display doesn't refer only to visual displays, but includes all media that are used to provide information to the users (e.g. audio and haptic devices). (Wickens et al. 2004)

Thirteen principles of display design

Wickens et al. (2004) defined 13 principles of display design. These principles of human perception and information processing can be utilized to create an effective display design. The potential benefits of applying these principles are expected to be, for instance, a reduction in errors, a reduction in required training time, an increase in efficiency, and an increase in user satisfaction. It has to be noted that not all the principles are applicable to all displays or situations, and some may even seem to conflict. The principles may need to be tailored to the specific situation.

Perceptual principles
1. Make displays legible (or audible). A display's legibility is critical and necessary for designing a usable display. If the characters or objects being displayed cannot be discerned, the operator cannot make effective use of them.
2. Avoid absolute judgment limits. Do not ask the user to determine the level of a variable on the basis of a single sensory variable (e.g. color, size, loudness), as these sensory variables can contain many possible levels.
3. Top-down processing. Signals are likely to be perceived and interpreted in accordance with what is expected based on the user's past experience. If a signal is presented contrary to the user's expectation, more physical evidence of that signal may need to be presented to ensure that it is understood correctly.
4. Redundancy gain. If a signal is presented more than once, it is more likely to be understood correctly. This can be done by presenting the signal in alternative physical forms (e.g. color and shape, voice and print, etc.), as redundancy does not imply repetition. A traffic light is a good example of redundancy, as color and position are redundant.
5. Similarity causes confusion: use discriminable elements. Signals that appear similar will likely be confused; the ratio of similar to different features determines how similar signals appear. For example, A423B9 is more similar to A423B8 than 92 is to 93. Unnecessary similar features should be removed and dissimilar features should be highlighted.

Mental model principles
6. Principle of pictorial realism. A display should look like the variable that it represents (e.g. high temperature on a thermometer shown as a higher vertical level). If there are multiple elements, they can be configured in a manner that looks like they would in the represented environment.

7. Principle of the moving part. Moving elements should move in a pattern and direction compatible with the user's mental model of how they actually move in the system. For example, the moving element on an altimeter should move upward with increasing altitude.

Principles based on attention
8. Minimizing information access cost. When the user's attention is diverted from one location to another to access necessary information, there is an associated cost in time or effort. A display design should minimize this cost by allowing frequently accessed sources to be located at the nearest possible position. However, adequate legibility should not be sacrificed to reduce this cost.
9. Proximity compatibility principle. Divided attention between two information sources may be necessary for the completion of one task. These sources must be mentally integrated and are defined to have close mental proximity. Information access costs should be low, which can be achieved in many ways (e.g. proximity, linkage by common colors, patterns, shapes, etc.). However, close display proximity can be harmful by causing too much clutter.
10. Principle of multiple resources. A user can more easily process information across different resources. For example, visual and auditory information can be presented simultaneously rather than presenting all visual or all auditory information.

Memory principles
11. Replace memory with visual information: knowledge in the world. A user should not need to retain important information solely in working memory or retrieve it from long-term memory. A menu, checklist, or another display can aid the user by easing the use of their memory. However, the use of memory may sometimes benefit the user by eliminating the need to reference some type of knowledge in the world (e.g. an expert computer operator would rather use direct commands from memory than refer to a manual). The use of knowledge in a user's head and knowledge in the world must be balanced for an effective design.
12. Principle of predictive aiding. Proactive actions are usually more effective than reactive actions. A display should attempt to eliminate resource-demanding cognitive tasks and replace them with simpler perceptual tasks to reduce the use of the user's mental resources. This allows the user not only to focus on current conditions, but also to think about possible future conditions. An example of a predictive aid is a road sign displaying the distance to a certain destination.
13. Principle of consistency. Old habits from other displays will easily transfer to support processing of new displays if they are designed in a consistent manner. A user's long-term memory will trigger actions that are expected to be appropriate. A design must accept this fact and utilize consistency among different displays.
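The redundancy gain principle (and the related advice on avoiding absolute judgments) can be illustrated with a small sketch. The coding table and function below are hypothetical examples, not from Wickens et al. (2004):

    # A minimal sketch (hypothetical) of redundancy gain: each alarm level
    # is encoded by color, shape and text at the same time, so no single
    # sensory variable has to carry the message alone.
    ALARM_CODING = {
        "ok":       ("green",  "circle",   "RUNNING"),
        "warning":  ("yellow", "triangle", "CHECK MACHINE"),
        "critical": ("red",    "octagon",  "STOP - FAULT"),
    }

    def render_alarm(level):
        color, shape, label = ALARM_CODING[level]
        # The textual label also serves color-deficient users,
        # since color is never the only cue.
        return f"[{shape} icon, {color}] {label}"

    print(render_alarm("warning"))  # [triangle icon, yellow] CHECK MACHINE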

Visual design principles for good design

The universal principles of visual communication and organization are (Watzman & Re, 2012):
Harmony - Refers to the grouping of related parts, so that all the elements combine logically to make a unified whole. In interface design this is achieved when all design elements work in unity.
Balance - Offers equilibrium or rest. Provides the equivalent of a center of gravity that grounds the page. Without balance, the page collapses, all elements are seen as dispersed, and content is lost. Balance can be achieved by using symmetry or asymmetry.
Simplicity - Is the embodiment of clarity, elegance and economy. Involves distillation: every element is indispensable, and if an element is removed, the composition falls apart. Two common guidelines to achieve simplicity are "less is more" and "when in doubt, leave it out".

Several things have to be considered when designing visual communications, such as web pages, visual displays or dashboards. These include aspects such as typography, color, field of vision, page layout design, graphs and charts, and the amount of information on display.

Typography

Typographic choice affects legibility and readability, meaning the ability to easily see and understand what is on a page. Legibility, the speed at which letters and the words built from them can be recognized, refers to perception. Readability, the facility and ease with which text can be read, refers to comprehension. Regardless of the media, legibility and readability depend on many variables, such as point size, letter pairing, word spacing, line length and leading, resolution, color, and organizational strategies such as text clustering. Type size is also dependent on the resolution offered by output and viewing devices, color usage, context, and other design issues. In choosing a typeface, its style, size, spacing and leading, the designer should think about the final output medium and examine this technology's effect on legibility. Low-quality monitors and poor lighting have a major impact: serifs sometimes disappear, letters in small bold type fill in, and colored type may disappear altogether. Line spacing matters because, when there is more space between the words than between the lines, the reader's eye naturally falls to the closest word, which may be below instead of across the line. White on black (or light on a dark background) is generally regarded as less legible and much more difficult to read over large areas, compared to the colors being the other way around. (Watzman & Re, 2012)

Color

The appropriate use of color can make it easier for users to absorb large amounts of information and differentiate information types and hierarchies. Color is often used to: show qualitative differences; act as a guide through information; attract attention or highlight key data; indicate quantitative changes; and depict physical objects accurately. For color to be effective, it should be used as an integral part of the design program, to reinforce meaning and not simply as decoration. One important thing to remember is that at least 9% of the population, mostly male, is color-deficient to some degree, so color shouldn't be used as the only cue.

This is especially important in critical situations, such as warnings. Therefore color should be used as a redundant cue when possible. (Watzman & Re, 2012)

Field of vision

Field of vision refers to what a user can see on a page with little or no eye movement. A good design places key elements in the primary field of vision, reflecting and reinforcing the information hierarchy. Size, contrast, grouping, relationships, and movement are tools that create and reinforce the field of vision. The user first sees what is visually strongest, not necessarily what is largest or highest. Animated cues, such as blinking cursors, and other implied structural elements, like handles around selected areas, become powerful navigational tools if intuitively understood and predictably applied. (Watzman & Re, 2012)

Page design

Two important functions of page design are motivation and accessibility. A well-designed page is inviting, drawing the eye into the information. Motivation and accessibility are accomplished by providing the reader with ways to quickly understand the information hierarchy. At a glance, the page design should reveal easy navigation and clear, intuitive paths to discovering additional details and information. This is called visual mapping. A visually mapped product has:
- An underlying visual structure or grid, organizational landmarks, graphic cues and other reader aids
- Distinctly differentiated information types
- Clearly structured examples, procedures, and reference tools
- Well-captioned and annotated diagrams, matrices, charts, and graphics

A grid enables a user to navigate a page quickly and easily, as it specifies the placement of all visual elements; the user can anticipate where a button will appear or how help is accessed. A well-designed page should give a hint of all topics contained in the site, provide high-level information about these topics, and suggest easy paths to access this information. Consistent use of type, page structure, and graphic and navigational elements creates a visual language that decreases the amount of effort it takes to read and understand a communication piece. (Watzman & Re, 2012)

The Gestalt principles, illustrated in Figure 2, show how different objects should be placed on the display if they are to be regarded as a group by the user (Ritter et al. 2014).

Figure 2. Gestalt principles of visual grouping (Ritter et al. 2014).

Charts, diagrams, graphics and icons

People don't have time to read. Therefore, in general, users prefer well-designed charts, diagrams, and illustrations that quickly and clearly communicate complex ideas and information. It is very difficult to create an icon that, without explanation, communicates a concept across cultures. If an icon must be labeled, it is really an illustration, and the icon's value as visual shorthand is lost. It is better to use a word or short phrase rather than a word and an image when screen space is at a minimum. (Watzman & Re, 2012) On the other hand, Ritter et al. (2014) suggested that the use of icons can be eased by text that appears on top of the icon when the mouse moves close to it.

A photograph can easily represent an existing object, but issues relating to resolution and cross-media publishing can make it unintelligible. Illustrations make it possible to present abstract concepts or objects that do not exist, and they can help to focus the viewer's attention on a certain detail. Graphics are invaluable tools for promoting additional learning and action, because they reinforce the message, increase information retention and shorten comprehension time. Different people learn through different cognitive modes or styles. Therefore it may be wise to use various modes, such as text, charts and photos, or to allow the mode to be customized. (Watzman & Re, 2012)

Amount of information

Hand-held devices, especially, have very limited space for presenting information. When evaluating how much information should be presented on the screen, the demands from the cognitive and visual perspectives may be contradictory. Schlick et al. (2012) stated that presenting little information on the screen at a time helps to avoid visibility problems resulting from high information density.

On the other hand, presenting as much information on the screen as possible allows users to have maximum foresight (cognitive preview) of other functions on the menu, which should benefit information access from a cognitive point of view and minimize disorientation. (Schlick et al. 2012)

4. Human-machine interaction technologies

As discussed by Danielis (2014), after industry has already undergone three revolutions in the form of mechanization, electrification, and informatization, the Internet of Things and Services is predicted to find its way into the factory as the fourth industrial revolution. For this development the term Industry 4.0 has been created, e.g., in Germany. The vision is the so-called Smart Factory with a novel production logic: the products are intelligent and can be identified clearly, constantly located, and are aware of their current state. These embedded production systems shall be interconnected with economic processes vertically and combined into a distributed real-time (RT) capable network horizontally. (Danielis 2014)

An important role will be played by the paradigm shift in human-technology and human-environment interaction brought about by Industrie 4.0, with novel forms of collaborative factory work that can be performed outside of the factory in virtual, mobile workplaces. Employees will be supported in their work by smart assistance systems with multimodal, user-friendly user interfaces. (INDUSTRIE 4.0)

The ongoing transformation towards digital manufacturing paves the way for the adoption of novel user interfaces for factory floor operators. While many of the technologies, for instance for augmented reality, have existed for quite some time, their use in an industrial context has been rare to date (Nee et al. 2012). Adoption of manufacturing IT-systems, such as MES (Manufacturing Execution System), will support real-time data collection from the manufacturing operations in a digital format. This data, earlier nonexistent, can then be used throughout the organization for better and more synchronized management and control of the operations. Such digitalization will also allow the relevant real-time information to be displayed to the factory workers through a multitude of different user interface technologies.

This chapter will introduce some of the available and emerging human-machine interaction technologies and show some examples of their applications. It will start by discussing direct and indirect input devices in general, after which it introduces specific technologies, such as mobile devices, augmented reality, and speech and gesture recognition, in more detail. Each technology will be evaluated based on its technology readiness level (TRL), as commonly used in the European Commission's (EC) Horizon 2020 program. The evaluation is done based on the material available on the technologies. The focus in this report is mainly on commercial and prototype technologies. The technology readiness levels are, as defined by the EC, the following:
TRL 0: Idea. Unproven concept, no testing has been performed.
TRL 1: Basic research. Principles postulated and observed but no experimental proof available.
TRL 2: Technology formulation. Concept and application have been formulated.
TRL 3: Applied research. First laboratory tests completed; proof of concept.
TRL 4: Small scale prototype built in a laboratory environment ("ugly" prototype).
TRL 5: Large scale prototype tested in intended environment.
TRL 6: Prototype system tested in intended environment close to expected performance.
TRL 7: Demonstration system operating in operational environment at pre-commercial scale.
TRL 8: First of a kind commercial system. Manufacturing issues solved.
TRL 9: Full commercial application, technology available for consumers.

4.1. Direct and indirect input devices

In human-computer interaction, the human has to be able to give commands and input information to the computer in some way. The input devices can be either direct or indirect. This section will give examples of devices belonging to these two categories and introduce the characteristics of direct and indirect input devices.

A direct input device has a unified input and display surface. An indirect input device does not provide input in the same physical space as the output. Examples of direct input devices are touch screens and display tablets operated with a pen (or other stylus). In contrast, the mouse is an indirect input device, because the user must move the mouse on a surface (the desk) to indicate a point on another surface (the screen). (Hinckley & Wigdor, 2012)

Welsh et al. (2012) stated that even though mouse, keyboard and joystick devices will continue to dominate for the near future, embodied, gestural and tangible interfaces, where individuals use their body to directly manipulate information objects, are rapidly changing the computing landscape. An example is the touchscreen, which allows the user, instead of pointing and clicking with a mouse, to directly pull, push, grab, pinch, squeeze, crush and throw virtual objects. The user doesn't need to use dissociated (mouse) and/or arbitrary (keyboard and joystick) sensorimotor mappings to achieve his/her goals. These new modes of interaction allow a more direct mapping of the user's movements onto the workspace. (Welsh et al. 2012)

The touch screen is the most common example of a direct input device. Touch screens are used for instance in tablet devices, mobile phones, laptop screens and large wall-mounted displays. There exist different touch screen types (Hinckley & Wigdor 2012; Schlick et al. 2012):
Resistive touch screens - React to pressure generated by a finger or stylus. Require pressure and may be fatiguing to use, but can be used by operators wearing gloves.
Capacitive touch screens - A human touch on the screen's surface results in an alteration of the human body's electrostatic field, which is measured as a change in capacitance. Require contact from bare fingers in order for the touch to be sensed; however, a soft touch is enough.
Surface acoustic wave touch screens - Use the ultrasonic wave created by a fingertip on the surface.
Optical touch screens - Use several optical sensors around the corners of the screen to identify the location of the movement or touch.
Dispersive signal touch screens - Detect the mechanical load created by a touch.
Strain gauge touch screens - Also known as force panel technology. Are spring-mounted at every corner, and locate a touch by identifying the corresponding deflection when the screen is touched.

Touch screens can also be divided into single-touch and multi-touch screens. Single-touch interfaces are able to detect only one touch point at a time. They resemble the mouse and are good for pointing.

Multi-touch interfaces are able to detect multiple fingers (i.e. touch points) simultaneously and can thus be used e.g. for pinch-to-zoom. Capacitive screens, optical (infrared) screens, and most recently resistive screens can be used for multi-touch purposes. (Schlick et al. 2012)

Hinckley and Wigdor (2012) highlighted that direct input on wall-mounted displays is commonplace, but the constant physical movement required can become burdensome. Interacting with portions of the display that are out of view or beyond arm's length may also raise challenges. As stated by Hinckley & Wigdor (2012), indirect input scales better to large interaction surfaces, because it requires less body movement and also allows interaction at a distance from the display.

Input technologies which use gestures and other body input are also categorized as direct input devices. Gestures are considered a natural way of interacting with machines. However, in gesture-based interaction, the main challenge is to correctly identify when a gesture, as opposed to an identical ordinary hand movement, starts and stops. It is not always clear when the user is actually trying to interact with the machine and when not. A similar challenge exists with speech interfaces. Gesture and other body input may also cause fatigue, e.g. if one's arms have to be extended for long periods of time.

Indirect input devices can be divided into absolute and relative input devices. An absolute input device senses the position of an input and passes this message to the operating system. Relative devices sense only changes in position. Absolute mode is generally preferable for tasks such as drawing, handwriting and tracing, whereas relative mode may be preferable for traditional desktop graphical user interaction tasks, such as selecting icons or navigating through menus. (Hinckley & Wigdor, 2012)

Common indirect input devices, in addition to mice and keyboards, are touchpads, trackballs and joysticks (Hinckley & Wigdor, 2012):
Touchpads - Small, touch-sensitive tablets, often used in laptop computers. Usually they use relative mode for cursor control, because they are too small to map to an entire screen. The small size of the touchpad necessitates frequent clutching, and it may require two hands to hold down a button.
Trackballs - Sense the relative motion of a partially exposed ball in two degrees of freedom. They may require frequent clutching movements, because users must lift and reposition their hand after rolling the ball through a short distance.
Joysticks - An isometric joystick is a force-sensing joystick that returns to the center when released. Isotonic joysticks sense the angle of deflection.

Keyboards are either indirect or direct input devices; the graphical keyboards of touch screens are direct input devices. Many factors influence typing performance with keyboards, including key size, key shape, activation force, key travel distance, and the tactile and auditory feedback provided by striking the keys. Graphical keyboards on touch screens require significant visual attention, because the user must look at the screen to press the correct key. The quality of tactile feedback is poor when compared with a physical keyboard, because the user cannot feel the key boundaries. A graphical keyboard (as well as the user's hand) also occludes a significant portion of a device's screen, resulting in less space for the document itself.
Furthermore, because the user typically cannot rest his/her fingers in contact with the display (as one can with mechanical keys), and because the user must carefully keep other fingers pulled back so as not to accidentally touch keys other than the intended ones, extended use of touch-screen keyboards can be fatiguing. (Hinckley & Wigdor, 2012)

4.2. Mobile Interfaces and Remote Sensors

Basic consumers of mobile devices, such as smart phones and tablets, use them for media consumption, picture/video capture, social collaboration, web browsing, communication, games, mapping and route planning. Recently, industry has found mobile devices useful as well; however, the change is happening slowly. Future manufacturing operator tools will be based on mobile communication, decision support and IT, enhancing operator capability. The Operator of the Future project 1 in Sweden has developed and tested concepts relying on mobile technologies, such as adaptive work instructions, dynamic checklists, logbook, reporting, localization, remote support, decision support, statistics, and remote monitoring and control.

The global market requires that decisions are taken as quickly as possible, even if the people responsible for them are outside the physical limits of their companies. Therefore the possibility to access critical information anywhere and anytime, with mobile devices, is indispensable. For example, as stated by Moran (2013), with MES mobile applications data is made available on demand regardless of physical location, providing real-time insight into operational and business performance. In manufacturing, abnormal operating events that require action can occur at any time, and it is important that the right resources are aware of these events as near to real time as possible to minimize the impact on profitability. (Moran 2013)

Researchers have recently been interested in using mobile devices for remote access to HMIs. Using web services is one way to achieve this integration. Cavalcanti (2009) described the architecture of a system that provides access to factory floor information from cell phones, which can be called remote monitoring. The system uses communication technologies like OPC and Web Services, enabling critical information like setpoints, alarms and thresholds to be viewed on the cell phone from anywhere. (Cavalcanti 2009)

Moreover, the use of wireless technologies aims to make the interfaces more flexible, simplify installation, and improve cost effectiveness. The connections must be highly reliable even in severe environments, and the wireless systems have to work with upper-layer systems as well as with other sensors. Therefore, communication protocols that are able to connect to the system are necessary. Previous research has aimed at developing the core technologies needed to implement wireless technology in industrial use. The researched aspects include: 1) Creating reliable low-power mesh networks; the focus is on how the mesh nodes decide to connect and how to minimize the number of routing tables required in an IP network, and reliability also requires time synchronization of data transmission. 2) Providing redundant routes between the wireless system and the upper layers, and avoiding congestion at the gateways. 3) Creating seamless communication, which has been made possible by applying IPv6 technologies on the nodes. 4) Making the modules ultra-low-power. In this way the interface will be fast, reliable and easy to access via the surrounding network. (Yamaji 2008)

1 Operator of the Future by Chalmers [Available in:
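The remote-monitoring idea can be made concrete with a small, dependency-free sketch. This is not Cavalcanti's actual architecture; the endpoint, port and variable names below are hypothetical. A web service exposes the current setpoints and alarms as JSON, so any mobile browser or app can poll it:

    # A minimal sketch (hypothetical) of a remote-monitoring web service:
    # factory-floor state is served as JSON over HTTP for mobile clients.
    import json
    from http.server import BaseHTTPRequestHandler, HTTPServer

    # In a real system this state would be read from the control layer (e.g. OPC).
    FACTORY_STATE = {
        "oven_setpoint_c": 210.0,
        "oven_actual_c": 214.8,
        "alarms": [{"id": 17, "text": "Temperature above threshold"}],
    }

    class MonitorHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path == "/state":
                body = json.dumps(FACTORY_STATE).encode()
                self.send_response(200)
                self.send_header("Content-Type", "application/json")
                self.end_headers()
                self.wfile.write(body)
            else:
                self.send_response(404)
                self.end_headers()

    if __name__ == "__main__":
        HTTPServer(("", 8080), MonitorHandler).serve_forever()

A production implementation would, at minimum, add authentication and push notifications for alarms instead of relying on polling.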

Mobile Device and Remote Sensor Technologies

Tablet and smart phone devices

Tablets are familiar devices from home use, and they are slowly finding their way into industrial use as well. Most such devices have a touch screen display, which is operated with fingers or a stylus. One example of a tablet device is the Motorola ET1 Enterprise 2 (TRL 9), released in 2011 (Figure 3). It is designed especially for use in manufacturing companies. Double user log-in, an integrated optical barcode scanner, and swappable battery packs with a multi-touch panel are some features of this tablet. The device is equipped with WLAN and GPS and uses Android as its operating system; the same operating environment may be reached through several mobile Motorola devices running Android, Windows, or Windows CE.

Figure 3. Motorola ET1 Enterprise Tablet. (Figure from Motorola America ET1 Enterprise page 2015.)

Smart Watches

A smart watch is a watch that has more capabilities than timekeeping alone. Modern smart watches have an operating system similar to, or sometimes even the same as, that of a mobile phone.

2 Motorola ET1 Enterprise Tablet [Available in: EN/Business+Product+and+Services/Tablets/ET1+Enterprise+Tablet]

Such devices can have features like a camera, accelerometer, thermometer, altimeter, barometer, compass, cell phone, touch screen, GPS navigation, speaker, map display, watch, a mass storage device recognizable by a computer, and a rechargeable battery. Companies such as Samsung, LG, Asus, Sony, Motorola, Apple, Pebble, Qualcomm, and Exetech have made their own smartwatch products. (Melon 2012; Trew 2013)

Much has been written about smartwatches lately; however, valuable use cases are still unclear. The independent research company Smartwatch Group has done an in-depth analysis of what will be the most relevant application areas for smartwatches in 2020. These are listed in Table 2.

Table 2. Smartwatch Group ranking for applications of smartwatches in 2020 (Smartwatch Group 2015).

Application | Key Benefits | 2020 Ranking
Personal assistance | Highly efficient, context-aware management of calendar, tasks, and information needs | 1
Medical/health | Basis for huge improvements in therapy for various patient groups; tool to manage medical records | 2
Wellness | Higher body awareness, more movement, better nutrition, less stress, improved sleep | 3
Personal Safety | Prevention of emergencies; auto-detection and fast support in case it happens | 4
Corporate Solutions | Simpler, more efficient, safer and cheaper business processes | 5

Other wireless interfaces

3Dconnexion SpaceMouse Wireless 3

3Dconnexion presented the SpaceMouse Wireless (TRL 9), which is a wireless 3D mouse and a new solution for industrial integration (Figure 4). The 3D mouse is designed as an input device that helps the engineer to navigate in a 3D CAD environment in 6 degrees of freedom. The SpaceMouse Module addresses the joystick market and is designed as an alternative to a conventional joystick for use in industrial environments. The components are provided in an open housing with a standard metric screw and slimline mount for easy integration. It is available with a serial or USB interface.

KUKA uses the 3Dconnexion industry module in a robot programming controller, where each robot is taught how to move its arm (Figure 4). The conventional way would be to program each axis separately, but with the integration of the industry module in the KUKA SMARTPAD, it is possible to move the arm freely in 6 degrees of freedom. This movement is recorded and can easily be implemented in the robot's program.

3 CadRelations Youtube Channel Video: HMI 2014: 3Dconnexion - programming industry robots gets easier. [Available in:

Figure 4. 3Dconnexion SpaceMouse Wireless used in a KUKA robot controller panel. (Screenshot from HMI 2014: 3Dconnexion 2014.)

Electronic Paper

Electronic paper is a display technology that aims to look like ordinary paper. In contrast to backlit displays, electronic paper reflects ambient light, like normal paper. Use cases for electronic paper are wrist watches, ebooks, newspapers, displays embedded in smart cards, status displays, mobile phones, and electronic shelf labels. Moreover, electronic paper can also be used in a production environment as easily updateable Kanban cards. (Dilip 2010)

An electronic shelf label (ESL), used for labeling the price or quantity of a product, is an interesting case for warehouses and the shop floor (Figure 5). A communication network allows the display to be automatically updated whenever a product price or the amount in the warehouse changes. This communication network is the true differentiation and what really makes ESL a viable solution. The wireless communication must support reasonable range, speed, battery life, and reliability. The means of wireless communication can be based on radio, infrared or even visible light communication. Currently, the ESL market leans heavily towards radio frequency based ESL solutions. Automated ESL systems reduce the labor costs of pricing management, improve pricing accuracy and allow dynamic pricing. Dynamic pricing is the concept in which retailers can adjust pricing to match demand, online competition, inventory levels and the shelf-life of items, and to create promotions. (Dilip 2010)
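The essential update mechanism of an ESL network can be sketched in a few lines; the classes and field names below are hypothetical illustrations, not from Dilip (2010):

    # A minimal sketch (hypothetical) of ESL updating: a price or quantity
    # change in the back-end is pushed over the label network to the
    # addressed label, which then redraws its e-paper display.
    class ShelfLabel:
        def __init__(self, label_id):
            self.label_id = label_id
            self.shown = {}

        def refresh(self, fields):
            # A real ESL would redraw the e-paper display here.
            self.shown.update(fields)
            print(f"label {self.label_id} now shows {self.shown}")

    class LabelNetwork:
        """Stands in for the radio/infrared network described above."""
        def __init__(self):
            self.labels = {}

        def register(self, label):
            self.labels[label.label_id] = label

        def push_update(self, label_id, fields):
            self.labels[label_id].refresh(fields)

    net = LabelNetwork()
    net.register(ShelfLabel("A-12"))
    net.push_update("A-12", {"price": "4.90", "quantity": 38})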

Figure 5. An electronic shelf label (ESL). (Screenshot of a smart tag from the vmsd online page, September 2013.)

Remote sensors

Example: Irisys People Counting System 4

InfraRed Integrated Systems Ltd. has made an infrared system called Irisys People Counting (TRL 9), whose sensors are designed to detect the heat emitted by people passing underneath as infrared radiation (Figure 6). The units contain imaging optics, sensors, signal processing and interfacing electronics, all within a discretely designed moulded housing. Up to eight virtual counting lines are defined by an operator using a portable PC setup tool, and people are counted as they pass each line in a defined direction. Mounting heights of between 2.2 m and 4.8 m can be accommodated with the standard lens. Other lens options are available to cover higher mounting heights.

4 Irisys People Counting [Available in:
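The virtual counting line idea can be expressed compactly. The following minimal sketch (hypothetical, not the Irisys implementation) counts a tracked person each time their successive positions cross a line in the configured direction:

    # A minimal sketch (hypothetical) of a virtual counting line: a person
    # is counted when their tracked position crosses the line in the
    # configured direction.
    def count_crossings(track, line_y, direction="down"):
        """track: successive (x, y) positions of one tracked person."""
        count = 0
        for (x0, y0), (x1, y1) in zip(track, track[1:]):
            crossed_down = y0 < line_y <= y1
            crossed_up = y1 < line_y <= y0
            if (direction == "down" and crossed_down) or (direction == "up" and crossed_up):
                count += 1
        return count

    # A person walking downward across the line at y=5 is counted once.
    print(count_crossings([(0, 3), (0, 4), (0, 6)], line_y=5))  # 1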

Figure 6. Irisys People Counter. (Screenshot from Irisys People Counting 2015.)

Mobile Devices and Remote Sensors Applications

The main industrial application environments reported for mobile devices are warehouses, the military, emergency services, and construction work. On the factory floor and on construction sites, tablets have to be ruggedized and protected from water and dust ingress. While such tablets and protection are available, adoption has still been slow. One reason could be the restriction of supporting all applications of a company on one mobile device. However, armies have been able to use smartphones and develop modified versions of various platforms that allow access to e-mail, documents, and a partitioned ecosystem of apps and other enterprise apps at the high level of security necessary. (IQMS 2011) In the following, a couple of application examples for mobile devices and remote sensors are shown.

Running Enterprise Resource Planning (ERP) on Mobile Devices

Innowera presented an application to run SAP on mobile devices using the Innowera Web and Mobile Server 5. The application has built-in offline capabilities and offers device management, user management, and back office integration capabilities. It can be installed on iOS and Android without the need for writing a new app for each platform, and it can be hosted on Microsoft Azure, AWS, or HP Cloud. The InnoweraApp can be downloaded from Apple iTunes or Google Play. Afterwards the user needs to connect to the Innowera Web and Mobile Server (IWMS). If required, one can change published processes using any HTML5 editor.

5 Innowera Mobile [Available in:

IQMS EnterpriseIQ mobile technology 6 (TRL 9) extends manufacturing ERP functionality with real-time manufacturing, MES and ERP information on the go via smart phones, PDAs, and tablets. The IQMS ERP software allows checking the production process in real time and recording data, and aims to provide full integration with the ERP system. Strong data encryption, as well as user-defined security roles, ensures data is secure while taking advantage of options such as CRM, document control, lot number changes, production and reject reporting, quick inspections, and real-time work center monitoring.

Pro-face Remote HMI 7

Pro-face is software for developing human-machine interfaces (HMIs). Pro-face Remote (TRL 9) is an HMI prepared and designed for implementation on tablets and smartphones. Systems integrators on the factory floor may use it for checking I/Os, reviewing what has happened on the system, or following the machine's steps and movements (Figure 7). The system status monitoring may be synchronous or asynchronous. System alarms may be viewed on the mobile device, and in critical cases it is easy to reach the right person's contact information for taking proper action. Snapouts and remote monitoring are other features to be used.

Figure 7. Checking the machine movement with a tablet device using Pro-face Remote HMI. (Screenshot from Pro-face Remote HMI intro video 2013.)

Tablets on factory floor and warehouse

Companies such as Cheer Packs North America 8 use the Microsoft Surface Pro tablet (TRL 9) for office staff, warehouse and quality management. In Figure 8, a quality specialist is auditing on the factory floor, inputting information and taking pictures into the quality management software by using a Surface Pro device. The user can capture evidence of possible problems, send it to someone else, or save it for a future process. Based on employee feedback, the device has improved time efficiency, as it saves operators, supervisors and quality inspectors time previously spent walking between different screens for monitoring and information input.

6 IQMS Mobile ERP Apps for Manufacturing Companies [Available in:
7 Pro-face Remote HMI [Available in:
8 Surface Pro Youtube Channel Video: Cheer Pack North America gains efficiency with Surface on the factory floor. [Available in:

Figure 8. Quality specialist taking a picture with a Surface Pro. (Screenshot from Surface Pro intro on the factory floor, Cheer Packs North America 2014.)

QueVision System for Traffic Control 9

QueVision combines infrared sensors over store doors and cash registers, predictive analytics, and real-time data feeds from point-of-sale systems for a faster checkout initiative (Figure 9). Kroger's QueVision technology is powered by the Irisys intelligent Queue Management solution. It uses infrared sensors and predictive analytics to arm store front-end managers with real-time data to make sure registers are open when customers need them. The solution, across the Kroger family of stores, has reduced the time a customer waits in line to check out, on average, from four minutes before QueVision to less than 30 seconds today.

Figure 9. The Kroger Traffic Control System aims to provide customers with a faster checkout. (Figure from Kroger mobile innovations 2014.)

9 Kroger Co's QueVision for Traffic Control [Available in:
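Kroger's predictive analytics are proprietary, but the basic idea of turning people-counter data into staffing decisions can be sketched with a simple offered-load calculation; the function, numbers and the 0.8 utilization threshold below are illustrative assumptions only:

    # A minimal sketch (hypothetical) of predicting how many registers must
    # be open, given the arrival rate measured by the door sensors.
    import math

    def registers_needed(arrivals_per_min, avg_service_min, max_utilization=0.8):
        # Offered load: register-minutes of checkout work arriving per minute.
        load = arrivals_per_min * avg_service_min
        # Keep register utilization below a threshold so queues stay short.
        return max(1, math.ceil(load / max_utilization))

    # 6 customers/min arriving and a 2-minute average checkout
    # suggest keeping 15 registers open.
    print(registers_needed(6, 2))  # 15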

4.3. Virtual and Augmented Reality

Immersive Virtual Reality (VR) is a technology that enables users to enter computer-generated 3D environments and interact with them. In VR technologies, the human body movements are monitored by using different tracking devices. This enables intuitive participation with and within the virtual world. Head-mounted displays (HMDs), using a closed-view, non-see-through mode, are a commonly used display device for VR. (Schlick et al. 2012)

Augmented reality (AR) is characterized by the visual fusion of 3D virtual objects into a 3D real environment in real time. Compared to VR, AR supplements reality rather than replacing it. With AR, developers create various virtual models in such a way that users can interact with them while distinguishing between the virtual and the real world. An AR system includes a processor, sensors, a display and input devices. The display system can be a monitor or screen mounted in the workplace, a head-mounted display, or eyeglasses. (Graham 2012)

Even though AR technologies have existed for some years already, their implementation in real industrial environments has been rare (Nee et al. 2012). It is expected that the emergence of manufacturing IT-solutions, which can collect and manage the manufacturing information, will pave the way for more AR implementations. Furthermore, as stated by Schlick et al. (2012), the recent advances in wearable computer displays, which incorporate miniature TFT LCDs directly into conventional eyeglasses or helmets, should simplify ergonomic design and further reduce the weight of VR and AR technologies. The most common usage contexts for AR have been reported to be conceptual product design, education and training, visual tracking and navigation, work instructions, and remote help centers. (Nee et al. 2012; Graham 2012)

In the following sections the technologies and application examples of augmented reality will be discussed. The focus will be on head-mounted displays, as the other technologies, such as mobile devices, gesture control and speech recognition, are discussed in other sections of this report.

Technologies for Augmented Reality

Head-mounted displays (HMDs) are a common technology for overlaying the real world with virtual information in augmented reality applications. The overlaying can be done in two ways: either by using an HMD in see-through mode, or by using an HMD in non-see-through mode, called video-see-through. The latter approach optically isolates the user completely from the surrounding environment, and the system must use video cameras to obtain a view of the real world. In an optical see-through HMD the user sees the real scene through optical combiners, and no video channel is needed. (Schlick et al. 2012)

HMDs can generally be divided into the following categories (Schlick et al. 2012):
Monocular - A single display source, which provides the image to one eye only.
Binocular (2D) - Two displays with separate screens and optical paths, enabling both eyes to see the same image simultaneously.

Binocular (3D) - Allows stereoscopic viewing with 3D depth perception. This is produced by presenting two spatially slightly incongruent images to the left and right eyes.

As discussed by Welsh et al. (2012), an HMD can assist with target detection, because it overlays critical cue information on the actual environment, reducing the scanning time required to sample and attend to both the display and the environment. In the following, a few existing HMD products are introduced.

Google Glass 10

Google Glass is a smart wearable glass developed by Google (TRL 7). Sales of the Google Glass beta version have been stopped; however, development is still proceeding, and the goal is to release a refined version of the glasses. Google Glass projects the rendered image through a lens onto the retina. Figure 10 shows the projector and a prism working together.

Figure 10. A projector and a prism working together in Google Glass. (Figure from techlife 2013.)

The result is that the user perceives a small translucent screen hovering at about arm's length distance, extending up and outward from the right eye. Since the colors cycle very quickly, the user perceives a full-color video stream. The touchpad installed on the side of the glasses makes it possible to navigate the menus and browse past and current events; tapping on it opens an application. The camera can also take photos and record 720p video. (Glass Help 2015) Figure 11 shows different parts of the glasses and Figure 12 a user wearing Google Glass.

10 Glass Help [Available in:

Figure 11. Google Glass structure, including the list of sensors and the location of the processor. (Figure from elsevier-promo online page 2015.)

Figure 12. Google Glass image preview. (Figure from Cult of Android online page, October 2013.)

EyeTap: The eye itself as display and camera 11

EyeTap (TRL 7) is a device which allows, in a sense, the eye itself to function as both a display and a camera. EyeTap is at once the eyepiece that displays computer information to the user and a device which allows the computer to process and possibly alter what the user sees. That which the user looks at is processed by the EyeTap. This allows the EyeTap, under computer control, to augment, diminish, or otherwise alter a user's visual perception of their environment, which creates a Computer-Mediated Reality. Furthermore, ideally, EyeTap displays computer-generated information at the appropriate focal distance and tonal range. Figure 13 depicts and describes the basic functional principle of EyeTap. Note from the diagram that the rays of light from the environment are collinear with the rays of light entering the eye (denoted by the dotted lines), which are generated by a device known as the aremac.

11 Eyetap research project. [Available in:

"Aremac" is the word camera spelled backwards; it is the device which generates a synthetic ray of light collinear with an incoming ray of light. Ideally, the aremac will generate rays of light to form an image which appears spatially aligned and at the same focus as the real-world scene. (EyeTap Research Project Page 2015)

Figure 13. Basic functional principle of EyeTap. (Figure from eyetap online page 2015.)

Canon Mixed Reality headset 12

Canon's Mixed Reality (MREAL) headset (TRL 9) delivers augmented reality. It is pitched as a high-end tool for product designers in the automotive, construction, manufacturing, or research fields. The system works differently from Google Glass. MREAL's bulky-looking headset positions two cameras in front of the eyes, which display a combination of video from the surroundings and computer-generated graphics (Figure 14). Canon created MREAL to allow designers to interact with simple designs of their products, which will look like highly detailed objects through the glasses thanks to the headset's computer-powered augmented reality. Basically, it allows designers to interact with intricate, computer-generated versions of their ideas in a 3D environment. The head-mounted display is linked to a controller, which is connected to a computer generating the video of the user's surroundings.

12 Canon Mixed Reality (MREAL) headset [Available in:

Figure 14. Canon Mixed Reality (MREAL) headset system architecture using augmented reality. (Figure from Canon Mixed Reality headset online page 2015.)

Microsoft HoloLens 13

Microsoft's HoloLens (TRL 5) wraps around the user's head and does not isolate the user from the world. It has an Intel SoC and a custom Holographic Processing Unit built in. This allows the user to see the digital world projected not in isolation but on top of the real world. The user can see and talk to the person standing next to them and avoid walking into walls and chairs, as well as look at a computer screen, because HoloLens detects the screen's edges and does not project over it, so there is no need to keep taking the device on and off during work. One can still take notes or type answers on a computer with a keyboard or a pen instead of being forced to use gestures and gaze. The projected HoloLens screen moves as the user moves their head. The user can control the apps either with voice commands or with the "air tap", the equivalent of a mouse click. Making a Skype call from HoloLens is a good way to try out the voice and gesture commands: it is possible to search for the person to call in the address book and then air tap to connect. The other party does not require a HoloLens and is able to see in Skype what the HoloLens user is looking at, and can, for example, draw diagrams on the video that appear in the user's view (Figure 15).

13 Microsoft HoloLens. [Available in:

Figure 15. HoloLens example application for customer service purposes. (Figure from Microsoft HoloLens online page, teach and learn, 2015.)

AR application examples

Many companies and research groups have recently started to create and develop methods to use AR, and this section aims to introduce some industrial and non-industrial application examples.

Google Glass Applications 14

Google has designed basic applications for Glassware, such as taking photos, recording a video, finding directions, or searching Google. However, it takes time to get used to wearing the glasses. There are also applications available in the Glass app store and from third parties. For instance, the Tesco grocery Glassware lets the user browse, view nutritional information, and add items to the shopping basket hands-free. Another example is Magnify, which lets users zoom in on objects located in front of them. Users with limited vision are able to zoom in and out in order to see objects at a closer range with a voice command. Magnify runs for 30 seconds, and users have the option to extend the time. Currently, IFTTT 15 (If This Then That) is also available for Google Glass. This service exists to automate the tasks users regularly perform across a wide range of popular apps and services.

Augmented reality applications from SAP 16

SAP is working with the smart eyewear company Vuzix to bring augmented reality and smart glasses into industrial environments. The applications are targeted especially at field technicians and warehouse workers, for whom hands-free computing can aid in data collection and operations. The two applications launched are the SAP AR Warehouse Picker and the SAP AR Service Technician (TRL 9).

14 Glassware Apps Online Page [Available in:
15 About IFTTT. [Available in:
16 SAP. Augmented Reality Apps. [Available in:

SAP AR Warehouse Picker 17

SAP AR Warehouse Picker (Figure 16) instructs the warehouse worker in picking operations and collects information on the picked items. With the application, users can scan barcodes and QR codes for handling units, locations, products, stations and any other required items. It is also possible to give voice input for quantity confirmation. The use of smart glasses and AR technology eliminates the need for hand-held scanners, which have made picking operations difficult by occupying one hand, and reduces the time workers must spend interacting with handheld devices. To get started, workers connect the smart glasses to the organization's back-end or gateway system and load warehouse picking tasks. Pickers are then guided through the tasks according to the steps required for each item to be picked. Voice-recognition and visualization functionality drive task completion and accuracy with prompts and step-by-step directions. Operators can navigate through software options and enter data (e.g. completion of tasks) by voice command. The smart glasses include speakers for the audio prompts, as well as built-in scanning functionality. For example, the application can prompt workers to scan a particular item with the smart glasses, pick an item up off the shelf, or enter an item quantity. Users are authenticated by scanning a unique QR code through the smart glasses.

Figure 16. SAP AR Warehouse Picker application guiding the worker in the picking operations (screenshot from SAP Enterprise Mobile 2013).

17 SAP Enterprise Mobile Video: SAP & Vuzix Bring you Augmented Reality Solutions for the Enterprise. [Available in:
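The guided picking flow described above can be condensed into a short sketch. This is a minimal illustration only: the helper functions speak(), scan_barcode() and listen_for_number() stand in for the glasses' audio output, built-in scanner and voice recognition, and the task data is invented; the actual SAP application is proprietary and works differently in detail.

    # Minimal sketch of a voice-guided picking loop (illustrative only).
    # speak(), scan_barcode() and listen_for_number() are console stubs
    # standing in for the smart-glasses hardware; task data is invented.

    def speak(text):
        print("AUDIO PROMPT:", text)

    def scan_barcode():
        return input("scan> ").strip()        # stands in for the built-in scanner

    def listen_for_number():
        return int(input("say quantity> "))   # stands in for voice recognition

    picking_tasks = [
        {"location": "A-03-17", "product": "4711-B", "quantity": 2},
        {"location": "B-11-02", "product": "0815-C", "quantity": 5},
    ]

    for task in picking_tasks:
        speak(f"Go to location {task['location']} and scan the shelf label.")
        while scan_barcode() != task["location"]:
            speak("Wrong location, please try again.")
        speak(f"Pick {task['quantity']} pieces of product {task['product']}.")
        while listen_for_number() != task["quantity"]:
            speak("Quantity mismatch, please check the pick.")
        speak("Pick confirmed.")  # completion would be reported to the back-end

The essential design point is that every confirmation (location, quantity) happens by scanning or voice, so the worker's hands stay free for the pick itself.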

SAP AR Service Technician 18

SAP AR Service Technician (Figure 17) instructs the technician in service operations. With the application, users have access to 3D visual enterprise models of their workplace and to an expert calling feature, which allows a remote expert to give directions to a colleague while a video feed streams from the headset. The application supports voice-activated commands and audio-note functionality. The hands-free operation allows the operator to concentrate on skilled and precise hand tasks. To get started, technicians sync the smart glasses with a tablet or laptop to retrieve all necessary data and any new voice notes from SAP Work Manager, left by other workers and stakeholders. They can then scan the QR code of the equipment and select from a list of procedures. Once the information for the current job is loaded into the smart glasses, workers can navigate the software with voice-activated commands. They can browse 3D visualizations and information including instructions, operational steps, and parts lists. They can drill into details for more information on a specific part, listen to equipment voice notes, and record new voice notes. Browsing through procedure steps happens with commands such as "Next", "Previous", and "Step". For each step, the 3D model of the part or item animates, and audio and textual instructions are provided if available in the visual enterprise model. To get over-the-shoulder expert assistance, the field technician can use voice commands to select from a list of available experts and make the call. The expert sees in real time what the technician sees through the camera in the smart glasses, and the technician sees the expert in the smart glasses.

Figure 17. SAP AR Service Technician application guiding the service operator (screenshot from SAP 2014).

18 SAP Video: SAP and Vuzix bring you the future of Field Service. [Available in:
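The voice-driven step navigation can be sketched as a small command dispatcher. The procedure steps and the exact command handling below are invented for illustration; the real application's vocabulary and behaviour may differ.

    # Minimal sketch of voice-driven procedure navigation, assuming a
    # recognizer that yields one command at a time ("next", "previous",
    # "step <n>"). The maintenance steps are invented for illustration.

    procedure = [
        "Isolate the machine from the power supply.",
        "Remove the spindle cover.",
        "Replace the worn drive belt.",
        "Reattach the cover and restore power.",
    ]

    def navigate(commands, steps):
        index = 0
        print("STEP 1:", steps[0])
        for command in commands:
            word = command.strip().lower()
            if word == "next" and index < len(steps) - 1:
                index += 1
            elif word == "previous" and index > 0:
                index -= 1
            elif word.startswith("step"):
                # "step 3" jumps directly to a numbered step
                number = int(word.split()[1])
                index = min(max(number - 1, 0), len(steps) - 1)
            print(f"STEP {index + 1}:", steps[index])

    # Simulated recognizer output; in the real system these would come
    # from the headset's speech recognition.
    navigate(["next", "next", "previous", "step 4"], procedure)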

AstroVAR 19

AstroVAR is a projected augmented reality system from Delta Sygni Labs (TRL 9). It enables visual communication between a remote expert and the on-site personnel. Experts can see the situation from the office and help by using a laser pointer that shows visual instructions directly on workpieces and devices. With the expert's knowledge delivered straight to the point, the on-site personnel can fix the problem: the equipment is back in service and an on-site visit is avoided. Notable features include wireless operation, no need for glasses, and ease of use (Figure 18).

Figure 18. Delta Sygni Labs AstroVAR product for technical support (Delta Sygni Labs online page 2014).

Simulo Engineering AR help platform 20

Assembly tasks, disassembly, diagnosis routines and pre-assembly operations are examples of AR use cases supported by Simulo Engineering (TRL 9). The AR work instruction for an arm loader is a fine example of teaching inexperienced workers to do new tasks (Figure 19).

Figure 19. The assembly process for a manipulator using AR guides on a screen installed in the environment. (Screenshot from Simulo Engineering industrial application of AR, 2012).

19 Delta Sygni Labs AstroVAR product [Available in:
20 Simulo Engineering. AR industrial Applications. [Available in:

4.4. Gesture and Speech Control

Gesture and speech control are often used in augmented reality applications, and they are becoming more common with the emergence of multimodal interfaces. As highlighted by Karat et al. (2012), speech technology, like the other recognition technologies, lacks 100% accuracy. This is because individuals speak differently from each other, and because recognition accuracy depends on an audio signal that can be distorted by many factors. Accuracy depends on the choice of the underlying speech technology, and on making the best match between the technology, the task, the users, and the context of use. Automatic speech recognition can use explicitly defined rule-based grammars, or it can use statistical grammars such as a language model. Usually a transactional system uses explicitly defined grammars, while dictation systems and natural language understanding (NLU) systems use statistical models. (Karat et al. 2012) In general, speech applications are effective in situations where speech enables a task to be done more efficiently, for instance when the user's hands and eyes are busy with another task (Karat et al. 2012).

The dialog styles in speech recognition systems include: Directed dialog (system-initiated), in which the user is instructed or directed what to say at each prompt; User-initiated, in which the system is passive and the user is not prompted for specific information; and Mixed initiative, in which the system and the user take turns initiating the communication depending on the flow of the conversation and the status of the task. (Karat et al. 2012)

Hinckley & Wigdor (2012) brought out some limitations of speech recognition. First of all, it can only succeed for a limited vocabulary. Error rates increase as the vocabulary grows and the complexity of the grammar increases, if the quality of the audio signal from the microphone is not good enough, or if users employ out-of-vocabulary words. Speech is inherently non-private in public situations and can also be distracting for persons nearby. Spatial locations are not easily referred to by speech, which means that speech cannot eliminate the need for pointing. (Hinckley & Wigdor 2012)

In recent years, the robustness of speech recognition in noisy environments has been improved by speech/lip movement integration. This work has included classification of human lip movements (visemes) and the viseme-phoneme mappings that occur during articulated speech. (Dumas et al. 2009)

As stated by Hinckley & Wigdor (2012), for computers to embed themselves naturally within the flow of human activities, they must be able to sense and reason about people and their intentions, e.g. to know when the user is trying to interact with the system, and when he/she is talking or interacting (e.g. waving) with other people. This issue applies to both gesture and speech control.
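The idea of an explicitly defined rule-based grammar, as used in transactional systems, can be illustrated with a minimal sketch. The command vocabulary below is invented; real systems typically define such rules in a dedicated grammar format (e.g. SRGS) rather than in application code.

    # Minimal sketch of an explicitly defined rule-based command grammar.
    # The vocabulary and slots are invented for illustration.
    import re

    # Each rule: a fixed carrier phrase with a constrained slot.
    RULES = [
        (re.compile(r"^pick (one|two|three|four|five) pieces$"), "PICK"),
        (re.compile(r"^go to station ([a-d])$"), "GOTO"),
        (re.compile(r"^(next|previous)$"), "NAVIGATE"),
    ]

    def parse(utterance):
        """Return (intent, slot) if the utterance is in the grammar, else None.

        Out-of-vocabulary input is simply rejected: exactly the limitation
        Hinckley & Wigdor describe, since the grammar succeeds only for
        the vocabulary it was written for.
        """
        text = utterance.lower().strip()
        for pattern, intent in RULES:
            match = pattern.match(text)
            if match:
                return intent, match.group(1)
        return None

    print(parse("Pick three pieces"))      # ('PICK', 'three')
    print(parse("Grab a few of those"))    # None: out of grammar

A statistical grammar, by contrast, would assign probabilities to word sequences instead of accepting or rejecting them outright, which is why dictation and NLU systems scale to large vocabularies at the cost of occasional misrecognition.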

4.4.1. Technologies for Gesture and Speech Control

Kinect

Kinect (codenamed Project Natal during development, currently TRL 9) is a line of motion sensing input devices by Microsoft for video game consoles and Windows PCs. Based around a webcam-style add-on peripheral, it enables users to control and interact with their console/computer without the need for a game controller, through a natural user interface using gestures and spoken commands (Project Natal 2009). Body position is estimated in two steps: first the device builds a depth map using structured light, and then it finds the body position by machine learning.

Inside the sensor case 21, a Kinect for Windows sensor (Figure 20) contains, firstly, an RGB camera that stores three-channel data at a 1280x960 resolution, making it possible to capture a color image. It also contains an infrared (IR) emitter and an IR depth sensor. The emitter emits infrared light beams and the depth sensor reads the beams reflected back to the sensor. The reflected beams are converted into depth information measuring the distance between an object and the sensor, which makes it possible to capture a depth image. Third is a multi-array microphone containing four microphones for capturing sound. Because there are four microphones, it is possible not only to record audio but also to find the location of the sound source and the direction of the audio wave. Finally, the sensor includes a 3-axis accelerometer configured for a 2G range, where G is the acceleration due to gravity. The accelerometer can be used to determine the current orientation of the Kinect.

Figure 20. Kinect sensor components. (Figure from Kinect for Windows Sensor Components and Specifications 2015).

The first-generation Kinect was introduced in November 2010 in an attempt to broaden the Xbox 360's audience beyond its typical gamer base. Microsoft released the Kinect software development kit for Windows 7 on June 16, 2011 (Knies 2011). This SDK allows developers to write Kinect-enabled apps in C++/CLI, C#, or Visual Basic.NET (Stevens 2015).

21 Kinect for Windows Sensor Components and Specifications [Available in:
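As a small illustration of the last point, the sensor's tilt can be recovered from the gravity vector reported by the 3-axis accelerometer. The sketch below shows the standard calculation; it is not tied to any particular Kinect SDK call, and the reading and axis convention are assumptions.

    # Minimal sketch: estimating pitch and roll from the gravity vector
    # measured by a 3-axis accelerometer, as the Kinect's orientation
    # can be determined. The reading below is invented; an SDK would
    # supply the actual (x, y, z) values in units of g.
    import math

    def tilt_from_accelerometer(x, y, z):
        """Return (pitch, roll) in degrees from a static reading.

        When the sensor is stationary, the accelerometer measures only
        gravity, so the direction of (x, y, z) gives the tilt.
        """
        pitch = math.degrees(math.atan2(x, math.sqrt(y * y + z * z)))
        roll = math.degrees(math.atan2(y, z))
        return pitch, roll

    # Example: sensor tilted slightly forward (invented reading).
    print(tilt_from_accelerometer(0.17, 0.02, 0.98))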

SHADOW Motion Capture 22

The SHADOW motion capture system (TRL 9) uses inertial measurement units sealed in neoprene fabric (Figure 21). The flexible sensors are small, lightweight, and comfortable to wear. Inertial sensors measure rotation, not position; Shadow therefore includes software that estimates position based on the skeletal pose, pressure sensor data, and a kinematic simulation. The position estimate updates in real time and streams to the viewing and recording systems together with the current pose. Shadow skeleton data is viewable in real time and compatible with most 3D digital content creation applications. The software exports to the industry-standard FBX, BVH, and C3D animation and mocap file formats. The Software Development Kit (SDK) supports network-based streaming of all synchronized pose data; it is open source and available in many popular programming languages. In 2013, a release of the Shadow full-body inertial motion capture system was presented, which builds on and extends the existing hardware and software platform. The Motion Shadow software requires a computer with Wi-Fi. The Motion Viewer and Monitor applications are available only on the Windows platform, but Motion Monitor also runs on a Wi-Fi enabled mobile device. Shadow also operates in standalone mode, with a Wi-Fi enabled mobile device acting as a remote control. The Motion User Interface supports the following systems with no software or app required: Apple iOS (iPhone, iPad, iPod Touch), Android, and Windows Phone.

Figure 21. Shadow, a full-body wearable sensor network for motion capture. (Figure from Motion Node Channel 2013).

22 SHADOW motion capture system online page. [Available in:
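Because inertial sensors report rotations rather than positions, positions must be reconstructed by chaining the rotations along the skeleton. The sketch below shows this forward-kinematics idea for a two-bone planar arm; it is a simplified illustration under assumed bone lengths and angles, not Shadow's actual algorithm.

    # Minimal sketch of forward kinematics: reconstructing joint
    # positions from rotations, as an inertial mocap system must do.
    # A 2D two-bone chain is used for clarity; all values are invented.
    import math

    def forward_kinematics(bone_lengths, joint_angles):
        """Chain rotations along the skeleton to get joint positions.

        joint_angles are absolute orientations (radians) of each bone,
        as an IMU strapped to each segment would report.
        """
        positions = [(0.0, 0.0)]           # root of the chain
        x = y = 0.0
        for length, angle in zip(bone_lengths, joint_angles):
            x += length * math.cos(angle)
            y += length * math.sin(angle)
            positions.append((x, y))
        return positions

    # Upper arm 30 cm at 90 deg, forearm 25 cm at 45 deg (invented pose).
    print(forward_kinematics([0.30, 0.25], [math.pi / 2, math.pi / 4]))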

Thalmic Labs MYO armband 23

The MYO armband (TRL 7) senses muscle movements for Minority Report-style motion control. MYO is an armband that translates the muscles' electrical activity into motion controls (Figure 22). The sensor inside the armband is sensitive enough to pick up individual finger movements, and developers can program for the controller as well. To prevent accidental input, users must activate motion control with a unique gesture that is unlikely to occur normally. The armband is intended to be one size fits all, and it uses Bluetooth 4.0. While MYO is built for Windows and Mac, developers can also integrate the device with their Android and iOS apps.

Figure 22. The MYO armband has sensors that output the hand gesture. Developers can define controls based on preferred common hand gestures. (Figure from Thalmic Labs MYO gesture control armband 2014).

23 Thalmic Labs MYO gesture control armband [Available in:

Haptic Interfaces

Haptic devices (or haptic interfaces) are mechanical devices that mediate communication between the user and the computer. They allow users to touch, feel and manipulate three-dimensional objects in virtual environments and teleoperated systems. Most common computer interface devices, such as basic mice and joysticks, are input-only devices: they track a user's physical manipulations but provide no manual feedback. As a result, information flows in only one direction, from the peripheral to the computer. Haptic devices are input-output devices: they track a user's physical manipulations (input) and provide realistic touch sensations coordinated with on-screen events (output). Examples of haptic devices include consumer peripherals equipped with special motors and sensors (e.g. force feedback joysticks and steering wheels) and more sophisticated devices designed for industrial, medical or scientific applications (e.g. the PHANTOM device). (Mimic Technologies Inc. 2003)

Haptic interfaces are relatively sophisticated devices. As a user manipulates the end effector, grip or handle of a haptic device, encoder output is transmitted to an interface controller at very high rates. There the information is processed to determine the position of the end effector, which is then sent to the host computer running a supporting software application. If the supporting software determines that a reaction force is required, the host computer sends feedback forces to the device. Actuators (motors within the device) apply these forces based on mathematical models that simulate the desired sensations. For example, when simulating the feel of a rigid wall with a force feedback joystick, motors within the joystick apply forces that simulate the feel of encountering the wall. As the user moves the joystick to penetrate the wall, the motors apply a force that resists the penetration: the farther the user penetrates the wall, the harder the motors push back to force the joystick back to the wall surface. The end result is a sensation that feels like a physical encounter with an obstacle. (Mimic Technologies Inc. 2003) Figure 23 shows an example of a haptic glove.

Figure 23. A haptic glove gives the user the ability to touch virtual objects. (Figure from Digital Trends, October 2014).
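The "farther the penetration, the harder the push-back" behaviour described above is classically modelled as a virtual spring, often with a damping term for stable contact. The sketch below shows this penalty-based force computation; the stiffness and damping constants are assumed, illustrative values, and a real device would evaluate this in a control loop running at roughly 1 kHz.

    # Minimal sketch of penalty-based haptic rendering of a rigid wall:
    # force proportional to penetration depth (virtual spring), plus a
    # damper to stabilise contact. All constants are illustrative.

    WALL_X = 0.0        # wall surface position (m)
    STIFFNESS = 2000.0  # N/m, virtual spring constant (assumed value)
    DAMPING = 5.0       # N*s/m, stabilising damper (assumed value)

    def wall_force(position, velocity):
        """Return the 1-D feedback force the actuators should apply.

        Inside the wall (position < WALL_X), push back proportionally
        to penetration depth; in free space, apply no force.
        """
        penetration = WALL_X - position
        if penetration <= 0.0:
            return 0.0                    # not touching the wall
        return STIFFNESS * penetration - DAMPING * velocity

    # Deeper penetration -> larger restoring force, as described above.
    for x in (0.002, -0.001, -0.005):     # device positions in metres
        print(f"x = {x:+.3f} m -> force = {wall_force(x, 0.0):6.1f} N")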

Speaker separation HARK 24

HARK, developed by Kyoto University, was introduced in 2010 for sound source separation on robots. The test demo available online shows its capability to distinguish the voices of four different talkers (Figure 24).

Figure 24. HARK by Kyoto University. (Screenshot from Willow Garage ROS video 2010).

24 Audition for Robots with Kyoto University (HARK). [Available in:
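Microphone-array systems such as HARK, like the Kinect's four-microphone array above, exploit the fact that sound reaches each microphone at slightly different times. A minimal illustration of the underlying cue, estimating the direction of arrival from that time delay, is sketched below; HARK's actual separation pipeline is far more sophisticated.

    # Minimal sketch: direction-of-arrival estimation from the time
    # delay between two microphones, the basic cue behind microphone-
    # array localisation. Geometry and delay are invented values.
    import math

    SPEED_OF_SOUND = 343.0   # m/s at room temperature
    MIC_SPACING = 0.20       # m between the two microphones (assumed)

    def direction_of_arrival(delay_seconds):
        """Angle of the source relative to the array axis (degrees).

        A plane wave arriving at angle theta reaches the far microphone
        delay = d * cos(theta) / c later than the near one.
        """
        cos_theta = SPEED_OF_SOUND * delay_seconds / MIC_SPACING
        cos_theta = max(-1.0, min(1.0, cos_theta))   # clamp noise
        return math.degrees(math.acos(cos_theta))

    # A 0.29 ms inter-microphone delay corresponds to roughly 60 degrees.
    print(direction_of_arrival(0.00029))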

4.4.2. Gesture and Speech Control Application Examples

Robotic control by gesture recognition

A research example using Kinect cameras for gesture control of industrial robots was carried out at the Department of Information Technology & System Management at FH Salzburg 25. The task was set up for positioning and picking different parts by following the user's hand and applying gesture control (Figure 25).

Figure 25. Controlling an industrial robot by hand using gesture control. (Screenshot from the gesture control for industrial manipulator introduction, Department of Information Technology & System Management, FH Salzburg 2014).

Material handling by gesture recognition

Many flows of materials and goods at factories and workshops are handled manually. A mobile machine that is controlled by natural gestures, relieves the workers of heavy loads, and transports the loads independently can therefore be useful. The FiFi assistance system of the Karlsruhe Institute of Technology (KIT) aims at this purpose (Phys-engineering 2014). FiFi is an assistance system developed to support people in their direct environment with contact-free control (Figure 26). The mobile platform, equipped with a camera system, is particularly suited for dynamic material flows at factories and workshops. These flows require high flexibility and are usually executed by people; typical examples are high-bay warehouses for car spare parts, consumer products of big online traders, or deliveries of goods between departments of big companies. Via the camera system, the machine acquires the user's gestures in three dimensions and executes the corresponding commands. No contact is required for moving the platform or switching between modes of operation. It follows the user and may approach to within an arm's length for loading. When the user points to a line on the floor, it independently moves along the line to the next station, where it is unloaded by the next user. A safety laser scanner prevents it from colliding with objects or people and allows for safe operation. The lifting system can be adjusted to various working heights by a gesture.

Figure 26. Mobile machine using gesture control for load carrying. (Figure from Phys engineering, August 2014).

25 fhsits Youtube Channel Video: Control an Industrial Robot by Hand! - Gesture Control. [Available in:
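The hand-following control in the FH Salzburg example can be illustrated with a short sketch: the tracked hand position is mapped into the robot workspace and the tool is stepped toward it, with clamping and a dead-band against tracking jitter. All coordinate ranges, gains and the send_target() stub are invented, and a real robot cell would of course require certified safety functions on top of this.

    # Minimal sketch of hand-following robot control: map a tracked
    # hand position (e.g. from a Kinect skeleton) into the workspace
    # and step the tool toward it. All values and stubs are invented.

    WORKSPACE = {"x": (0.2, 0.8), "y": (-0.4, 0.4), "z": (0.1, 0.6)}  # m
    DEAD_BAND = 0.01   # m; ignore jitter smaller than this
    GAIN = 0.5         # fraction of the remaining distance per cycle

    def clamp(value, low, high):
        return max(low, min(high, value))

    def follow_hand(hand_xyz, tool_xyz, send_target):
        """One control cycle: move the tool a step toward the hand."""
        target = tuple(
            clamp(h, *WORKSPACE[axis])
            for h, axis in zip(hand_xyz, ("x", "y", "z"))
        )
        step = [GAIN * (t - p) for t, p in zip(target, tool_xyz)]
        if all(abs(s) < DEAD_BAND for s in step):
            return tool_xyz               # hand is steady: hold position
        new_pose = tuple(p + s for p, s in zip(tool_xyz, step))
        send_target(new_pose)             # stub for the robot interface
        return new_pose

    pose = (0.5, 0.0, 0.3)
    pose = follow_hand((0.9, 0.1, 0.2), pose,
                       lambda p: print("move to", p))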

Jennifer by Lucas Systems 26

Jennifer (available since 2012) is a voice picking system for mobile work in warehouses (TRL 9). Workers can use a handheld scanner to check barcodes and receive voice information, as well as give voice commands about a specific product (Figure 27). The worker then knows whether they are at the right location and how many pieces of the product to pick into the basket. Workers may also give voice commands to check whether the chosen place for a moved product is correct, and the system can inform the user about other product details such as the expiration date.

Figure 27. Jennifer (available since 2012) is a voice picking system for mobile work in warehouses. (Screenshot from Introduction to Voice Picking with Jennifer 2012).

26 Jennifer voice picking by Lucas Systems. [Available in:

Hotel staffed by robots

A hotel staffed by robots will open in July 2015 in Huis Ten Bosch, a Japanese theme park. The two-story, 72-room Henn-na Hotel, slated to open July 17, will be staffed by ten robots that greet guests, carry their luggage and clean their rooms. According to The Telegraph (Bridge 2015), the robots, created by the robotics company Kokoro, will be an especially humanoid model known as an "actroid". Actroid robots (Figure 28) are generally modelled on young Japanese women; they can speak fluent Japanese, Chinese, Korean and English, as well as mimic body language and human behaviors such as blinking and hand gestures. Three actroids will staff the front desk, dealing with customers as they check in to the hotel. Four will act as porters, carrying guests' luggage, while another group will focus on cleaning the hotel. The hotel itself will also feature high-tech amenities (Kaplan 2015), such as facial recognition software that allows guests to enter locked rooms without a key, and room temperatures monitored by a panel that detects a guest's body heat.

Figure 28. Robots to serve guests in a Japanese hotel. (Screenshot from Washington Post 2015).
