Grouve: Proximity Based Ad-Hoc Group Formation with Smartwatches using Sounds in a Corporate Environment


Hasselt University
Master thesis

Grouve: Proximity Based Ad-Hoc Group Formation with Smartwatches using Sounds in a Corporate Environment

Author: Debbie Gijsbrechts
Promotor: Prof. dr. Johannes Schöning

Thesis proposed to achieve the degree of master in computer science: Human-Computer Interaction
Academic year

Acknowledgements

First and foremost, I would like to thank my promotor Prof. dr. Johannes Schöning for the feedback and input that he provided. Without him, I couldn't have brought this to a good end. I would also like to thank the whole Internet of Things research team at Bell Labs Antwerp for their hospitality (I loved the lunches) and input. Special thanks to Fahim Kawsar, Geert Vanderhulst, and Marc Van den Broeck for their guidance during the evaluation and user studies. I would also like to thank my friends and family, who supported me through this stressful period and accepted my (sometimes) erratic behavior as a result of it. Last but certainly not least, I would like to thank my boyfriend Vincent van Veghel, who has always supported me and made me see the best things in life.

Abstract

More and more devices are present in our daily lives. From smart fridges to wearables, the internet of things is expanding. Connecting to nearby devices usually relies on Wi-Fi, Bluetooth, and NFC connections. Room-based proximity detection with these technologies, however, is still hard to accomplish, because radio waves are hard to contain within a space. In this thesis, we propose a technique for automatic room-based group detection based on sound signals, called Grouve. It uses the speakers and microphones already present on devices for group formation. This thesis focuses on mobile devices such as smartphones and smartwatches, but the technique could also be applied to other devices. We highlight several possible use cases and provide a proof-of-concept application that can create a meeting with everyone present in the room according to their private schedules. A user study and a technical study were performed to test the Grouve concept. The results were encouraging: Grouve offers a user-friendly alternative that demands less mental and physical effort than the current manual group formation methods. Grouve, however, needs further research to make it usable on all types of devices, due to differences in hardware and the effect these may have on sound generation and detection. The current maximum distance between sending and receiving devices (80 cm) is not yet acceptable either. Further research is needed to see whether multi-hopping and digital signal processing techniques other than those currently used could improve this.

Dutch summary

The following summary, written in Dutch under the title "Samenwerken is luisteren" ("Collaborating is listening"), will be submitted to the Vlaamse Scriptieprijs. Since it is aimed at a general audience, technical details are omitted. It is given here in English translation.

Introduction

Forming groups to collaborate is as old as mankind itself: from gathering food by hunting together to playing Pokémon Go together in order to catch more Pokémon in a shorter time. In these situations we often want to share information to reach our goal faster. That sharing can happen verbally or with the help of technology. Imagine sitting together in one room and wanting to share something with everyone present that cannot be communicated verbally. With the current possibilities, you have to add every person in the room to a group by hand. What if we could automatically create a group, in which we can share files and other information, with a single press of a button?

The magical aspect of sound

Grouve (Group wave) is a method for forming groups that uses sound. It relies on speakers and microphones that are already present, such as those on a smartwatch or smartphone. When a group needs to be created, the device of the person creating the group emits, for example after a button press, a sound that represents the new group. When the devices of people in the same room pick up this sound, they know that they can register to become part of the group. Afterwards, items such as text, images, and calendars can be shared among all members of the group. The advantage of using sound is that it is easy to keep within one room, in contrast to the radio waves used by, for example, NFC, Bluetooth, and Wi-Fi. Think of Bluetooth devices in one room that can be connected to from another (nearby) room; this could cause errors in the group creation and possibly even leak information.

Automatically scheduling a meeting

To demonstrate how Grouve can be used, an app was built for Android devices. It works as follows: the user gives a command to create a meeting within a given time period. The application forms a group using Grouve and searches for a time slot in which everyone is available. All group members can then choose whether they want to attend at the proposed time.

Time to test...

Grouve was tested in two different studies. First, a number of users (9 participants) were asked which kind of group formation they preferred: the automatic method (Grouve) or a manual one. The majority (6 participants) preferred Grouve over the manual method. A closer look at the results showed that the reason this proportion was not higher was that the participants were unfamiliar with the method and with the smartwatch that was used. The physical and mental load on the participants was much lower for Grouve than for manual group formation; this lower load is Grouve's biggest advantage. We also observed that manual group formation led to errors caused by misspellings.

A second study measured the range of Grouve and its performance in environments with background noise. Up to 80 centimeters between the receiving and sending device, 100% accuracy was observed, both with and without added background noise. When the distance was increased to 1.5 meters, groups were detected with 77% accuracy in an environment without added background noise; when background noise was added, no group was recognized at all. This shows that using sound to form groups is plausible, but since the signal weakens due to background noise and the propagation of the sound waves, the implementation of Grouve still needs tuning.

Conclusion

Grouve is a technique for automatically forming groups with the people present in a room, using mobile devices such as smartphones and wearables such as smartwatches. Since it uses speakers and microphones that are already present, no extra investment is needed. Testing showed that sound waves are a plausible way to provide communication and group formation between devices, and the testers experienced this method of group formation as simpler and less demanding. However, since the distance between devices achievable with the current implementation is still limited, further research is needed to increase this range.

Contents

Acknowledgements
Abstract
Dutch summary
Contents
List of Figures
List of Tables
Listings
Abbreviations

1 Introduction
1.1 Problems
1.2 Goals
1.3 Thesis structure

2 Related work
2.1 Wearables
2.1.1 General research themes
2.1.2 Relevant papers
2.2 Data over sound
2.2.1 What is sound?
2.2.2 Audible vs inaudible sound
2.2.3 Modulation and demodulation
2.2.4 Existing SDKs and research regarding data over sound
2.2.5 Challenges
2.3 Technology in corporate environments
2.4 CSCW and group formation

3 Concept
3.1 Use cases
3.1.1 Creating a meeting
3.1.2 Sharing documents
3.1.3 Discovery of nearby stationary devices
3.1.4 Using Grouve to log and quantify group data
3.2 Chosen use case
3.3 Design space analysis
3.4 Paper prototype
3.5 Flow of the meeting creation application

4 Implementation
4.1 Assumptions made for implementation
4.2 Choosing the operating system
4.3 Images in the application
4.4 Speech recognition
4.4.1 Recognizing commands and dates
4.5 Google Calendar and free busy
4.6 Clock watch face
4.7 Database structure
4.8 Data over sound
4.8.1 Frequencies able to detect
4.8.2 Audio programming in Android
4.8.3 Creating frequencies
4.8.4 Detecting the frequencies of the created soundwaves
4.9 Checking for internet connection

5 Results and evaluation
5.1 Speech recognition libraries technical study
5.1.1 Setup and data use
5.1.2 Difficulties while testing
5.1.3 Results
5.2 User study
5.2.1 Participants
5.2.2 Apparatus
5.2.3 Procedure
5.2.4 Results
5.2.5 Conclusion
5.3 Technical study
5.3.1 User base
5.3.2 Robustness and range
5.3.3 Conclusion

6 Conclusion and future work
6.1 Future work
6.1.1 Studying high frequency effects on humans and animals
6.1.2 A larger range of sound transmission
6.1.3 Enlarge the amount of data being sent
6.1.4 CPU usage and the DSP
6.1.5 Speech recognition on the device
6.1.6 Device calibration
6.1.7 Heterogeneity
6.1.8 Evaluation of real life use of Grouve
6.1.9 Changing the group
6.2 Conclusion

A Results of the technical study for speech recognition APIs
A.1 Table with the results of the Microsoft Cognitive Services speech recognition API
A.2 Table with the results of the Google speech recognizer API
A.3 Table with the results of the Watson IBM speech to text API

B Code snippets
B.1 The creation of a sine wave
B.2 Google service account usage
B.2.1 Service account to gain access to calendars

Bibliography

List of Figures

1.1 Evolution of collaborative technology
1.2 An overview of wearables with different screen sizes
1.3 A schematic view of triangulation
2.1 Acceptability and preference rankings in BodyScape
2.2 System overview of MoLe
2.3 Visual representation of a sound wave
2.4 Visual representation of audio frequency ranges
2.5 Results hearing thresholds Lee et al.[22]
2.6 UI of Chirp
2.7 Different group creation patterns
3.1 Grouve concept
3.2 A cluttered user interface in Google Calendar
3.3 DSA accepting or denying
3.4 DSA how to detect end of meeting creation command
3.5 DSA how to receive feedback on command given
3.6 DSA for the maximum size of a group
3.7 DSA for when to listen for commands
3.8 DSA for the range of the command
3.9 DSA for the notification of sending data
3.10 DSA for notifying that a command was received
3.11 Pencil sketches of the program flow
3.12 Paper prototypes that were created
4.1 Audio spectrums using Audio Spectrum Analyzer
5.1 Technical study results speech APIs
5.2 Speaker location Samsung Galaxy Gear
5.3 RTLX and SUS scores of the usability study

List of Tables

4.1 The group table of the database
4.2 The member table of the database
4.3 The members table of the database
4.4 The task table of the database
5.1 Technical specifications of the devices used in the small scale technical study regarding different speech recognition APIs
5.2 Harvard sentences used during testing (list 2)
5.3 RTLX scores mean and S.D. for every category
5.4 SUS scores for every participant
5.5 Recall when using smartwatch as the sending device
5.6 F score when using smartwatch as the sending device
5.7 Recall when using smartphone as the sending device
5.8 F score when using smartphone as the sending device
A.1 Results technical study word error rate (WER) Microsoft Cognitive Services speech recognition API
A.2 Results technical study word error rate (WER) Google speech recognizer API
A.3 Results technical study word error rate (WER) Watson IBM speech to text API

Listings

Code/regex.java
Code/getGoogleAccount.java
Code/createRecord.java
Code/recordAudio.java
Code/createSineWave.java
Code/createServiceAccount.php

Abbreviations

A      Amplitude
AM     Amplitude Modulation
API    Application Programming Interface
ASR    Automatic Speech Recognition
BLE    Bluetooth Low Energy
BYOD   Bring Your Own Device
CSCW   Computer Supported Cooperative Work
dBA    decibel A-weighting
dBFS   decibels relative to Full Scale
DSA    Design Space Analysis
DSP    Digital Signal Processing or Digital Signal Processor
f      frequency
FFT    Fast Fourier Transform
FM     Frequency Modulation
Hz     Hertz
HWD    Head-Worn Displays
IoT    Internet of Things
NFC    Near Field Communication
NLP    Natural Language Processing
NUI    Natural User Interface
OS     Operating System
T      period
PM     Phase Modulation
POC    Proof Of Concept
QOC    Questions Options Criteria
RTLX   Raw Task Load Index
SNR    Signal to Noise Ratio
SR     Sampling Rate
SW     SmartWatches
UI     User Interface
UX     User Experience
WER    Word Error Rate
ZCR    Zero Crossing Rate

Chapter 1
Introduction

Since the dawn of time, humans have collaborated to ensure their survival. It is commonly known that great ideas take form through collaboration. Take the discovery of the structure of DNA, for example: it only took form because James Watson and Francis Crick shared their thoughts and combined their knowledge into a structural model. A big part of collaboration nowadays is the existence of collaborative technology tools (their evolution is depicted in figure 1.1). There is even a community called Computer-Supported Cooperative Work (CSCW)1 that focuses on the study of technologies for these collaborative activities. It consists of behavioral researchers as well as system builders who look at the intersection of collaborative behaviors and technology. They address how tools can facilitate, impair, or change these cooperative activities. The instruments examined and used are often divided into one of three categories: information sharing, communication, and coordination.

However, to be able to collaborate, there is one important thing that we need to do: create a group. Without a formed group, whether it is formed formally or informally, we do not know with whom to share, communicate, or coordinate. In all sorts of collaborative tools such as DropBox, Slack, and Google Docs, we need to choose with whom to share our documents and conversations.

1 For a more elaborate explanation and exploration of CSCW, take a look at book/the-encyclopedia-of-human-computer-interaction-2nd-ed/computer-supported-cooperative-work

Figure 1.1: A schematic view of the evolution of collaborative technology over time. Source: digital-collaboration-goes-deeper-gets-lightweight-and-intelligent/ by Dion Hinchcliffe.

At the moment, this can only be done by hand, either by sharing a designated link or by creating a group. This manual approach comes in handy when sharing data with remote users, but what if the users are all sitting together in one room? We could improve the speed by using proximity-based technologies to create a (temporary) group. The business environment in particular could benefit from such an approach: if time and effort could be saved by automatically creating groups when needed, workers would be less frustrated and more efficient, which is ultimately reflected in the company's results.

1.1 Problems

The creation of a group based on proximity within a confined space is only possible when we track the position of people with the help of devices. Because wearables are worn on the body, they are perfect for locating people, especially since they contain a diverse range of sensors and connectivity options specialized for data storage and data transmission. By using a stored account on the wearable linked to the person using the device, we can create a device group that represents a particular group of individuals.

The creation of a group is, however, restricted in design and implementation by the hardware that is available in wearables today. These restrictions include:

Battery size
Due to the small size of wearables, batteries are small as well. Small batteries mean that less power is available, so we need to be careful with the amount of battery consumption. The battery size also has implications for other hardware elements, like the screen size and CPU capabilities, which we will touch upon later. Not only the amount of battery consumption is important here, but also how intensive battery usage is. Since wearables are worn on the body, they can become uncomfortable due to increasing heat, and a faulty implementation could even result in burns on the skin. Intel, for example, recalled the Basis Peak smartwatch due to overheating that led to blistering and burns on the wrist, as well as melting charging cradles5.

Screen size
Since wearables are worn on the body, most of them are small and light in weight. Because of this, in combination with the available battery power, screens are often small and sometimes even absent (see figure 1.2). This creates challenges for the design and implementation of applications: we either have to find alternatives to visual feedback (for example, because of the lack of a screen) or a customized representation to ensure that the content is clearly presented on a small screen. The screen size also limits the interaction possibilities.

Figure 1.2: An overview of what screens can look like on wearables, ranging from small screens to no screens. Source: Amit Diwan.

5 Article regarding the overheating issues of the Basis Peak: 65e bd-11e6-8d05-4eaa66292c32.

Connectivity
In wearables we generally have two means of connecting devices: Bluetooth (classic or BLE) and Wi-Fi. If we have to choose between these two options, Bluetooth has the main advantage of lower power consumption. Using Bluetooth, however, means that we need to connect the wearable to another device (such as a smartphone) to access functions that need an internet connection. There are some devices available that connect directly to the web through 3G or 4G; still, the power consumption of these techniques calls for improved scheduling of data transfers.

CPU capabilities and memory
The size of the battery and the device itself also limit the CPU and memory capabilities of wearables. Therefore we need to be careful about the amount of CPU usage as well as how we handle memory on the wearable.

Besides the restrictions of wearables, we are also restricted in the way that we want to pair devices. Since groups are created according to the location of people and their devices, we could use localization sensing techniques. In general, there are three large groups of localization sensing techniques[17]: triangulation, scene analysis, and proximity.

Triangulation makes use of geometric properties to compute object locations (see figure 1.3). It does this through lateration (distance measurements) or angulation (distance combined with angular measurements). The techniques used to measure these distances can be direct, time-of-flight, or attenuation measurements.

Figure 1.3: A schematic view of how triangulation is used to calculate the position of a smartwatch. We use the signal strength of the smartwatch with three access points. The signal strengths are represented here by circles of different sizes. When the signal strengths are known, we can calculate the intersection point of these three circles. The intersection represents the location of the smartwatch.
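To make the caption's intersection computation concrete, the following sketch solves the lateration equations for three access points. It is illustrative only: the coordinates and distances are made-up values, and in practice distances estimated from signal strength are so noisy that an exact intersection rarely exists.

// Illustrative lateration: estimate a device position from three access
// points at (x[i], y[i]) with estimated distances r[i]. Subtracting the
// circle equations (x - x[i])^2 + (y - y[i])^2 = r[i]^2 pairwise yields a
// 2x2 linear system that can be solved directly.
public final class Trilateration {

    public static double[] locate(double[] x, double[] y, double[] r) {
        double a1 = 2 * (x[1] - x[0]);
        double b1 = 2 * (y[1] - y[0]);
        double c1 = r[0]*r[0] - r[1]*r[1] - x[0]*x[0] + x[1]*x[1] - y[0]*y[0] + y[1]*y[1];
        double a2 = 2 * (x[2] - x[1]);
        double b2 = 2 * (y[2] - y[1]);
        double c2 = r[1]*r[1] - r[2]*r[2] - x[1]*x[1] + x[2]*x[2] - y[1]*y[1] + y[2]*y[2];
        double det = a1 * b2 - a2 * b1; // zero when the access points are collinear
        return new double[] { (c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det };
    }

    public static void main(String[] args) {
        // A device at (3, 4) seen from access points at (0,0), (10,0) and (0,10).
        double[] pos = locate(new double[] {0, 10, 0}, new double[] {0, 0, 10},
                              new double[] {5, Math.sqrt(65), Math.sqrt(45)});
        System.out.printf("x = %.2f, y = %.2f%n", pos[0], pos[1]); // x = 3.00, y = 4.00
    }
}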

Scene analysis uses features of a scene observed from a particular point to draw conclusions about the location of the observer or of objects in the scene. It does this through static (predefined dataset) or differential (the difference between successive scenes) scene analysis. Issues can arise, however, since the environment can change rapidly, which can produce a faulty analysis. There also needs to be access to the features of the environment against which the scene will be observed.

Proximity can be detected through direct physical contact, monitoring wireless access points, or observing automatic ID systems. All of these detection techniques need a method for identification. In other words: if direct physical contact or the monitoring of wireless access points is used, we need to attach an ID to the devices involved.

These techniques, however, are overkill for our group formation. To create a group with people in a confined space, we only need to pair user devices instead of truly localizing them. Only the position of individuals relative to each other is meaningful, not their exact position. Therefore we only need pairing techniques for the user devices present in the confined space.

At the moment, radio waves are often used to pair two or more devices. Examples of techniques that use radio waves are Near Field Communication (NFC), Bluetooth, and Wi-Fi. Radio waves support large bandwidths but can also easily penetrate boundaries such as the ones set by a confined space. This can lead to data leaking outside the set perimeter, even when the transmission power is adjusted (lowered). The data leakage creates two types of problems: a security risk due to the leaked data, and inconveniences for device detection, since devices in a nearby space might also be detected. For the formation of a group in a confined space it would therefore be better to use another type of wave: audio waves. Since these waves share characteristics with the human voice, they have a more predictable distribution pattern than radio waves and are better contained by boundaries such as walls. Another advantage of using sound waves for group formation is that we can build on elements already present in the devices: the speaker and the microphone.

1.2 Goals

In this thesis, we want to create temporary groups of user devices to enhance collaboration in a corporate environment. Here a group is defined as the people who are present in a confined space. The term confined space can refer to a conference room, a hallway, or any other room that has a boundary.

As a proof of concept (POC), we developed an Android application for mobile devices that is visually optimized for smartwatches and uses sound waves as its communication base. The devices primarily used for the group creation are smartwatches, since they can be linked to a person and are worn on the body. Because they are worn on the body, smartwatches are assumed to always be present during group formation. To demonstrate a use case for the automatic group creation, we implemented an application that can create a meeting for the people present in the room.

Before we began developing the application, we thought about different use cases for the group formation. We then chose one particular use case, creating a meeting, and started the development process. A small design space analysis (DSA) was performed to reason about the design and to create a visual representation so that others could understand why we made certain design choices. The DSA resulted in a paper prototype that we tested during a small scale usability test. The prototype and design were then slightly altered to accommodate the remarks made during usability testing.

Before the implementation started, data was collected with the different devices used during development (smartwatch and smartphone). The first goal of the data collection was detecting the noise level, so we could define thresholds for decoding the sound waves. The second goal consisted of establishing which frequencies could be sent and received by the standard microphones and speakers present on the devices. After this phase, we implemented a prototype of the meeting creation application. To test the feasibility of the application, we then performed a user study with a limited version of the meeting creation app. A technical test was also conducted to verify the user base, range, and robustness of the application.

The core contribution of this thesis is our sound-based group formation approach called Grouve, along with an application that provides meeting creation based on an ad-hoc created group. We support our approach with a technical evaluation and a user study that discuss its viability and usability.

1.3 Thesis structure

The thesis is structured as follows. In chapter 2 we discuss related work regarding wearables, data over sound, technology in corporate environments, and techniques for group formation. This research forms the base of our work and is used to motivate some of our decisions regarding the concept and implementation of Grouve. Chapter 3 presents the concept of Grouve and the meeting creation application, along with the DSA, paper prototype, and small usability test that all influenced how we implemented the concept. Here we explain the design choices we made in the context of usability. Chapter 4 discusses the implementation itself: we go into detail about the technologies that we used and how we constructed the code, and we also address implementation problems. Chapter 5 presents the evaluations that we performed: a small technical study regarding speech recognition libraries, a usability test of a limited version of the meeting creation application, and a technical evaluation of robustness, user base, and range, which together discuss the viability and usability of Grouve. Finally, we close with chapter 6, drawing our conclusions and discussing how we could improve Grouve in future work.

We do need to state that a patent application is pending for the Grouve concept. There are also two papers in preparation that discuss some of the contents of this thesis; we are planning to submit these papers to PerCom 2017 and CHI 2017.

Chapter 2
Related work

In this chapter we discuss themes relevant to this thesis. First we cover wearables and their general research themes, then the use of sound for data transmission, followed by technology usage in corporate environments. Finally, we discuss the techniques currently available for ad-hoc group formation with mobile devices.

2.1 Wearables

Steve Mann, who has been referred to as the father of wearable computing [26], defines wearable computers or wearables as follows:

A wearable computer is a computer that is subsumed into the personal space of the user, controlled by the user, and has both operational and interactional constancy, i.e. is always on and always accessible. [25] (Steve Mann)

Since wearables are worn on the body, they are excellent candidates for our group formation.

2.1.1 General research themes

Wearables come in different types and sizes. Possible types are fitness trackers, smart glasses, smartwatches, smart clothing, and smart jewelry. Each of these types has its own characteristics, but there is one thing they all have in common: they need a battery. Ideally, battery capacity would be high; due to the small size of wearables, however, this often isn't possible. This size restriction leads to batteries that need frequent charging. When creating applications for wearables, we need to take this into account.

Wearables have four input possibilities: touch input, voice input, physical buttons, and gestures. Every possibility has its negative and positive aspects.

Touch input can become cumbersome because of the fat finger problem. This issue arises from using a touchscreen with small (touchable) elements such as buttons, input fields, and hyperlinks. Fingers can be too broad for these little elements, which can result in the wrong elements being touched or selected. Occlusion of particular elements may also be a result of small touchscreens. Touch input, however, is still an excellent interaction method because of its good tactile feedback and easy nature.

Voice input is one of the most intuitive ways to communicate, and in the last few years this input approach has been widely used (Siri, Google Now, S Voice,...). Problems that can emerge from the use of voice input are ambiguity, high computation costs, and errors in speech registration. These problems often have a profound connection with the complexity of natural language: some words can have different meanings, in which context is a major factor. The program has to be able to derive the right meaning from this context, which can in itself create high response times. Errors in the registered input can arise because of background noise and variations in pronunciation (dialects, intonation, and differences in emotional state).

Gestures can also be used as input. They are an intuitive and quick way of communicating with wearable devices. Gestures, however, are error-prone and can cause false positives, which results in faulty communication. Tactile feedback is also unavailable, which makes it hard to recognize why a movement was not correctly registered.

Physical keys have tactile feedback. When applied to wearables, these keys are often small and not as easy to handle; they also lack the intuitive nature of the other interaction methods.

Reachability is an important aspect when it comes to wearables. Wearables themselves are often created because the information that they collect or provide is not within reach on existing devices. Smartwatches, for example, are often used for displaying notifications from the smartphone. This way we have easy access to this information, and we can continue our activities or interact with our smartphone or smartwatch according to the importance of the notification.

With the rise of wearables, many privacy issues came up. Wearables such as smart glasses have cameras attached to the device. Many people consider this an intrusion into their privacy, because they could be filmed while walking down the street. Security, however, may be an even greater source of concern: the data collected from wearables such as activity trackers could, when hacked, easily be used to track every single movement of the user.

As with many technologies, social acceptance can be a breaking point for the success of wearables. The location of the wearable on the body is an important aspect of social acceptance: if the device is invisible, people are more likely to accept it. If interaction with the device is necessary, the location of the interaction on the body is also important for acceptability. A touchpad on the inner leg, for example, won't be as acceptable as one on the wrist. Acceptability also has a strong connection with culture. Depending on the culture of the location where the wearable is used, as well as the culture of the people using and interacting with it, acceptance can be higher or lower. Chinese people, especially in China itself, for example, are well known to adapt to and accept new technologies.

2.1.2 Relevant papers

Min et al.[27] studied the practices of battery use and management of smartwatches. They examined users' (primarily male, between 20 and 30 years old) use patterns and battery usage through an online survey. The questions asked were directed at understanding primary usage, concerns about battery life, and recharging patterns. These patterns diverged strongly from the registered smartphone usage. The survey suggested that

concerns about battery life were not as high as with smartphones, primarily because the main usage of the smartwatch is smartphone-dependent. They stated that this concern would possibly increase if special features were available to use the smartwatch as a standalone device. When interacting with smartwatches, participants interacted for a short time (13 seconds on average) but frequently, while smartphones were often used for much longer periods. This difference in usage times could mean that users think smartwatches are ideal for short-term interactions while smartphones are more suited for long-term interactions.

Wigdor et al.[39] describe LucidTouch, a mobile device that addresses the limitation of touch input on a small screen, namely the occlusion of graphical elements by the user's fingers. The device uses pseudo-transparency, an overlay of the user's hands onto the screen, to control an application by touching the back of the device. The user tests indicated that users preferred touching the back instead of the front, citing higher precision, multi-finger input, and reduced occlusion. Baudisch et al.[2] also illustrate the uses of back-of-device interaction. They state that back-of-device interaction works practically independent of device size; the interaction that they used (shift interaction) only failed for screen sizes under one inch.

Zhang et al.[41] compared GUI input on a mobile device with speech input. They tested both input methods for AT&T's Visualizer management service. Their goal was to take advantage of natural speech input to improve the user experience (UX) when accessing web applications on mobile devices. They expected this to be reflected in accurate and efficient responses to queries, improved user performance in the mobile environment, and a decreased cognitive load. To support their hypotheses, an exploratory field study was conducted to measure efficiency, effectiveness, and user preference. Effectiveness proved to be the same for both input methods; speech input, however, was more efficient and was preferred by the users for performing the tasks.

Wagner et al.[37] present a design space called BodyScape that classifies body-centric interaction techniques on multiple surfaces according to input and output location relative to the user. A controlled experiment studying the acceptability (see figure 2.1) and performance of mid-air pointing and touching 18 on-body targets was conducted, testing both techniques combined and individually. The study showed

that participants were least efficient with targets on the lower body and on the dominant arm (especially when combined). The highest efficiency was registered for targets on the torso, and the users themselves preferred targets on the upper body. The study also resulted in three guidelines that need to be taken into consideration when designing on-body interaction: task difficulty, body balance, and interaction effects.

Figure 2.1: Median preference and acceptability rankings of on-body targets (from green = acceptable to red = unacceptable). Source: Wagner et al.[37].

Wang et al.[38] describe research in which the motion sensors of a smartwatch were used to reveal what was typed. They processed accelerometer and gyroscope signals while tracking the micro-motions of the wrist. By combining this data with the structure of valid English words, guesses can be made about the typed words. MoLe has a 30% chance of narrowing the typed word down to 5 possibilities and a 50% chance of narrowing it down to 24 possibilities. Figure 2.2 shows a system overview of MoLe.

Figure 2.2: Data that was typed by the user is pre-processed through gravity removal and timing analysis blocks, superimposed on the refitted typing templates, and passed through a Bayesian inference model. The Bayesian inference model leverages the patterns and structures in English words so it can ultimately decode the typed words. Source: Wang et al.[38].

2.2 Data over sound

As stated before, we want to use sound to detect devices in a confined space and to pair them. Therefore we will only focus on sound waves in this section; we will not discuss radio waves any further, since our choice for sound waves was already motivated in the introduction chapter.

2.2.1 What is sound?

Sound waves are mechanical waves[12]. The energy in sound waves propagates through media such as solids, liquids, and gases by using the particles present in the medium. The propagation is possible because particles oscillate, moving back and forth, around a central neutral point called the equilibrium position. If no particles are present, such as in a vacuum, sound cannot propagate.

Figure 2.3: The visual representation of a sound wave and some of its variables.

We can depict a sound wave in graph form, as in figure 2.3, where we represent it as pressure over time. We can create a sound wave containing a single frequency by generating a basic waveform1. Since we are only interested in the purest waveform and don't want to introduce harmonics2, we only look at the formation of a sine wave. Before we describe how a sine wave produces a particular frequency, we have to describe some essential elements regarding sound waves:

Amplitude (A): the magnitude of the maximum displacement from the equilibrium position. Since it is a magnitude, it can never be negative. In short, it represents the intensity of the sound.

1 More information about the four basic types of waveforms: jkrug/MUS364/audio/Waveforms.htm
2 Harmonics are waves whose frequency is a multiple of the reference wave.

Equilibrium position: the position of a particle when it is at rest.

Period (T): the time that a particle needs to complete an entire vibrational cycle.

Frequency (f): the number of complete vibrational cycles in a certain amount of time. The frequency is often expressed in Hertz (Hz), the number of complete vibrational cycles per second. Humans perceive this as pitch.

Wavelength: the distance between repetitions of a shape.

Phase (ϕ): the position of a point in time on a waveform cycle. Phase is measured in degrees, and one waveform cycle is 360 degrees.

Sampling rate (SR): the number of samples taken per second. When representing audio as data, this is a crucial variable. The SR also sets the Nyquist frequency (half of the SR), which represents the maximum frequency (in Hz) representable with the chosen SR.

We can describe a sine wave as a function of time (t):

y(t) = A sin(2πft + ϕ) = A sin(ωt + ϕ)    (2.1)

where A is the amplitude, f is the frequency that needs to be represented, t is the time, and ϕ is the phase. Since the sampling period is the inverse of the SR, the time step between successive samples is

Δt = 1/SR    (2.2)

A characteristic of sound that can have implications when using sound as a data transfer method is the Doppler effect[12]. When the registering person or device is moving, or the sound source is moving, the frequency can be perceived as lower or higher than it actually is. If one or both elements move toward each other, the frequency is perceived as higher, since the peaks in the sound waves are picked up more quickly by the listening person or device. The opposite effect occurs when one or both elements move away from each other. Gupta et al.[14] used the Doppler effect to detect gestures using commodity computing hardware.
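To make equation 2.1 concrete: the sketch below samples it at t = n/SR to fill a 16-bit PCM buffer and plays it through Android's AudioTrack, in the spirit of the sine wave listing in appendix B.1. The class and method names are our own, and a production version would fade the tone in and out to avoid audible clicks at its edges.

import android.media.AudioFormat;
import android.media.AudioManager;
import android.media.AudioTrack;

// Sketch: generate and play a pure tone by sampling equation 2.1 at t = n/SR.
public final class ToneSketch {
    static final int SR = 44100; // sampling rate; the Nyquist limit is SR/2 = 22050 Hz

    static short[] sineWave(double freq, double seconds) {
        short[] buf = new short[(int) (SR * seconds)];
        for (int n = 0; n < buf.length; n++) {
            // y(n) = A * sin(2*pi*f*n/SR), with A scaled to the 16-bit range
            buf[n] = (short) (Short.MAX_VALUE * Math.sin(2 * Math.PI * freq * n / SR));
        }
        return buf;
    }

    static void play(short[] samples) {
        AudioTrack track = new AudioTrack(AudioManager.STREAM_MUSIC, SR,
                AudioFormat.CHANNEL_OUT_MONO, AudioFormat.ENCODING_PCM_16BIT,
                samples.length * 2, AudioTrack.MODE_STATIC);
        track.write(samples, 0, samples.length);
        track.play();
    }
}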

2.2.2 Audible vs inaudible sound

In figure 2.4 we can see a division of the audio frequencies according to their audibility for humans[12]. The first band (blue) represents the frequencies from 0 to 20 Hz, which are called infrasonic since they lie below the audible frequencies. The second band (yellow) accounts for the frequencies (roughly 20 to 20,000 Hz) that are audible to humans. The third band represents all audio frequencies above the audible range (20,000 Hz and up), which are called ultrasonic. It is important to note that although the range of audible frequencies for humans is nominally set at 20 to 20,000 Hz, it varies from person to person.

Figure 2.4: A visual representation of the audio frequency ranges.

The upper limit of the audible frequencies for humans decreases as age progresses. Multiple studies have tried to register these hearing thresholds according to age and lifestyle[1, 3, 22, 30]. Lee et al.[22] studied human hearing thresholds between 125 and 20,000 Hz. 352 subjects, ranging in age from 10 to 65 years, were evaluated during the study and grouped into five age groups. The hearing of all individual subjects was plotted as a function of age at different frequencies. The plots show that as age increases, the dB threshold goes up, which means that more dB are needed for the higher frequencies to be perceived. As the results in figure 2.5 show, a tone near the top of this frequency range is not as easily noticed by a 60-year-old as it is by a 10-year-old.

Since frequencies near the upper end of the audible range become inaudible as people age, they could be used for data communication without disrupting the environment. Filonenko et al.[9] tested whether smartphones were able to generate low ultrasonic frequencies, just above the limit of human hearing, for use in an indoor positioning system. They noticed that although all tested smartphones were able to output these frequencies, most of the smartphones produced noise if the volume of the device was set too high. These results support that mobile devices can output low ultrasonic (inaudible) sound, but the outcome depends on the exact hardware of the apparatus.

Figure 2.5: The hearing thresholds of individual subjects plotted as a function of age. Data recorded at different frequencies are displayed in individual panels. The solid lines represent a one- or two-line linear regression fit to the data. The dashed lines represent a similar fit after eliminating all non-responses. Source: Lee et al.[22].

2.2.3 Modulation and demodulation

To enable data transmission through sound, we can use one of three modulations: amplitude, phase, or frequency modulation. Each type of modulation adjusts something in the carrier sound so that data can be incorporated. In amplitude modulation (AM), the amplitude of a signal is varied to transmit data; it is, however, very vulnerable to noise compared to the other two modulation methods. Frequency modulation (FM) makes use of a change in frequency: the amplitude stays the same but the frequency changes. Phase modulation (PM) varies the phase of the signal: the amplitude stays the same but the phase shifts.

We will focus on FM in the remainder of this thesis, since it is simple to implement and more robust than AM. In our case, however, we cannot use the term FM in its strict sense, since we do not use a carrier wave. For the sake of simplicity, since we do use a change in frequency to represent data, we will still call the used technique FM throughout this thesis.
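Because no carrier wave is used, the scheme effectively amounts to frequency-shift keying: every symbol is assigned its own tone. A minimal sketch of such a mapping follows; the frequency band, tone spacing, and symbol size are illustrative assumptions, not the values Grouve uses.

// Sketch of an FSK-style mapping: each 4-bit symbol gets its own tone.
// BASE_HZ, STEP_HZ and TONE_SEC are assumed values for illustration only.
public final class FskMapping {
    static final double BASE_HZ = 18000.0; // assumed start of the band
    static final double STEP_HZ = 100.0;   // assumed spacing between symbol tones
    static final double TONE_SEC = 0.1;    // assumed duration of one tone

    // Encode a byte array as a sequence of frequencies, one per 4-bit symbol.
    static double[] toFrequencies(byte[] data) {
        double[] freqs = new double[data.length * 2];
        for (int i = 0; i < data.length; i++) {
            freqs[2 * i]     = BASE_HZ + ((data[i] >> 4) & 0xF) * STEP_HZ;
            freqs[2 * i + 1] = BASE_HZ + (data[i] & 0xF) * STEP_HZ;
        }
        return freqs;
    }
}

Each frequency in the result can then be rendered with a tone generator like the one sketched in the previous section and concatenated into a single output buffer.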

When using FM, we want to estimate which frequency is being sent at a certain time. This estimation of the fundamental frequency (f0), sometimes referred to as pitch detection, has been a popular research topic for years. Many researchers have attempted to create f0 estimators and succeeded in context-specific attempts; few are appropriate for more than one domain. Most research revolves around detection in the speech and music domains[8, 19, 31]. Our goal, however, is an f0 estimator for recognizing the high frequencies used by Grouve. There are three general domains of f0 estimation algorithms:

Time-domain methods: the problem is approached by attempting to detect the f0 from the waveform that represents the signal as the change in air pressure over time.

Frequency-domain methods: sinusoidal peaks are located in the frequency transform of the input signal.

Time-domain methods in combination with frequency-domain methods.

Gerard David[11] gives a great overview of the history and current techniques available for pitch extraction and fundamental frequency estimation. Since none of the techniques described is specialized in the high frequency range we target, we use a general technique adjusted to work with our chosen frequency range. The used technique will be discussed in the implementation chapter.
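When the possible tones form a small, known set, a full FFT is not strictly required: the Goertzel algorithm computes the signal energy at one target frequency and can simply be run once per candidate tone. The sketch below shows this common frequency-domain approach; it is an illustration, not necessarily the exact technique described in the implementation chapter.

// Sketch: Goertzel algorithm, returning the relative energy of one
// target frequency in a block of PCM samples.
public final class GoertzelSketch {

    static double energyAt(short[] samples, double targetHz, int sampleRate) {
        int n = samples.length;
        int k = (int) Math.round(n * targetHz / sampleRate); // nearest DFT bin
        double omega = 2 * Math.PI * k / n;
        double coeff = 2 * Math.cos(omega);
        double s1 = 0, s2 = 0;
        for (short sample : samples) {
            double s = sample + coeff * s1 - s2;
            s2 = s1;
            s1 = s;
        }
        // Squared magnitude of the DFT bin closest to targetHz.
        return s1 * s1 + s2 * s2 - coeff * s1 * s2;
    }

    // Pick the candidate tone with the most energy in this block.
    static double detect(short[] block, double[] candidates, int sampleRate) {
        double best = candidates[0], bestEnergy = -1;
        for (double f : candidates) {
            double e = energyAt(block, f, sampleRate);
            if (e > bestEnergy) { bestEnergy = e; best = f; }
        }
        return best; // a real decoder would also require bestEnergy above a noise threshold
    }
}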

2.2.4 Existing SDKs and research regarding data over sound

Chirp.io uses sound to transfer an identifier sequence. It has a 32-character alphabet mapped to 32 pitches that are a semitone (a relative distance between two notes) apart, ranging from 1760 Hz to 10,500 Hz. The entire identifier sequence consists of 20 pure tones (87.2 ms each), divided as follows: the first two indicate that the following tones are a chirp shortcode; the ten tones that follow represent the 10-character payload; the last eight tones are Reed-Solomon error correction characters. The error correction makes sure that the code can be reconstituted even when over 25% of it was missing or misheard. The id that is perceived leads to a picture, text, or link stored on the Chirp server; in other words, no group is formed. The sending of data can be anonymous or identified (Facebook or Twitter account) according to the sender's liking. Due to the range of frequencies used to create the chirp, humans can hear the sound emitted. When the emitted sound is recorded and played back, we can receive the data as if a device had created the chirp itself. An essential element to mention is that when music is playing in the background, this method often does not work, due to the low frequencies used for mapping the alphabet characters.

Figure 2.6: The user interface (UI) of the Chirp application when a chirp is emitted to send a picture.

Nearbytes uses sound to transfer the data itself and not just a link like Chirp.io. It uses high frequencies to encode the data; these frequencies, however, are still audible. The creators also state that for the program to work efficiently, both devices (listening and sending) need to be within 10 cm of each other.

LISNR also uses sound to communicate between devices. They use inaudible sound waves that they call Smart Tones, transmitted in a high, near-ultrasonic frequency range, and provide the means for customizable data transmission with a detection time of approximately 0.6 to 3 seconds, depending on usage. Each Smart Tone consists of three parts: a preamble that announces the presence of a signal, a header that contains metadata about the payload (a checksum for error correction and how much data will follow), and the payload (the actual data). They state that

the range at which Smart Tones are perceivable depends on the volume of the sending device: the higher the broadcast volume, the farther the tone can be picked up.

We suspect that more SDKs for data-over-sound transmission exist, but since all of them seem to build upon the same techniques, we will not discuss further SDKs.

Nittala et al.[28] created PhoneEar, an approach that incorporates listening for sound-encoded data in high frequencies (17 to 20 kHz) on mobile devices. They used frequency-shift keying (FSK) modulation to encode information. The main goal was to incorporate data in existing audio such as audio commercials and music, for context-based recommendations, enhanced notifications, enhanced directions (shutting off your phone on a plane, for example), and showing complementary information.

Sun et al.[34] implemented Spartacus: a mobile system that enables spatially-aware neighboring device interactions without any prior configuration. Users initiate interaction by using a pointing gesture at the target device. They used built-in microphones and speakers along with an acoustic technique based on the Doppler effect. Spartacus achieved an average 90% device selection accuracy in most of their interaction scenarios within a three-meter distance. Their continuous audio-based low-power listening mechanism, which triggered the gesture detection service, achieves about four times lower energy consumption than Wi-Fi Direct and 5.5 times lower than the latest Bluetooth 4.0 protocols.

2.2.5 Challenges

There are a few challenges when implementing data transmission over sound.

Noise can interfere with the data sent. Noise can be introduced by the electronic components of the sending device, by the listening device, or by other sounds (speaking, clapping, and so on). If the signal-to-noise ratio (SNR) is low, it can become difficult to detect the fundamental frequency. Although many algorithms exist that try to resolve this, there is no solution for all frequencies; most research dedicates itself to fundamental frequency detection in speech and music[13, 15, 40].

Microphone sensitivity is an important element that we need to take into account. If the microphone on a device is not sensitive enough to register the frequencies

used in the data modulation, the data cannot be received. Some microphones can even introduce extra noise.

The speaker is also an essential element in the data exchange. If the sound quality is low, the listening device may not receive the data correctly. The number of decibels that a speaker can output at specific frequencies (its frequency response) has a significant influence on the audio quality. These levels need to be high enough that the sound carries far enough, while not being so high that the inaudible sounds become perceptible to human ears. Another influence on the audio quality is total harmonic distortion (THD). THD[9, 21] represents how faithfully the original audio is reproduced by the speaker; the lower the THD, the better.

Limited CPU and memory are a major challenge when it comes to data transmission via sound. Real-time modulation and demodulation techniques can be heavy on memory and CPU usage, so we must take this into account.

2.3 Technology in corporate environments

Bring your own device (BYOD) is becoming a greater part of company culture. In BYOD, the same device is used for personal as well as business purposes. This dual usage creates many security issues that must be taken into consideration: risks ranging from company espionage and viruses getting into the business IT infrastructure to losing devices that have access to company data. Containerization techniques can be used to ensure the separation of enterprise content and personal data[29]. This separation of content, however, is only one of the security measures needed when mobile devices are at play, next to access control, next-generation firewalls, BYOD control mechanisms, and BYOD management and policy. Vorakulpipat et al.[36] provide an overview of the research and issues regarding mobile device security with a focus on critical infrastructure. Oluwatimi et al.[29] focus on post-installation application restriction policies with their approach called DroidARM, which ensures complete separation of personal and enterprise data. These policies restrict the capabilities of mobile applications at run-time. DroidARM uses data shadowing to protect data and system resources on the enterprise side of the device.

Not only BYOD is a security risk; wearables and mobile devices in general can become a liability, since they can be used for company espionage. Not only could a rogue employee use the device to leak company information via its cameras or microphones, but unwitting employees could also be exploited to extract company information, from password detection using a smartwatch[38] to a device takeover to take pictures and record audio.

When introducing new technology in a company, we need to take acceptability into account. The technology acceptance model (TAM) of Davis et al.[7] set the base for research regarding technology adoption and use. Davis suggested in his original research that the user's motivation can be explained by three factors: attitude toward using the system, perceived usefulness8, and perceived ease of use9. So when introducing technology, we need to pay attention to the users' perceived ease of use and perceived usefulness. If users do not think that their job will be easier using the new technology, they will not invest time in using it.

8 The degree to which an individual believes that using a particular system would enhance his or her job performance[7].
9 The degree to which an individual believes that using a particular system would be free of physical and mental effort[7].

Privacy is another important factor. Evolving technology makes it easy to track and log everything that an employee does, from keystroke logging to position tracking with activity trackers and even using sensors to detect what was typed[38]. The information received from these Big Brother devices and software packages can be beneficial for the company, but what about privacy? Employees may not like the fact that employers are tracking their every move. Where does the tracking stop? Can the employer track the employee at home and register their night's rest? Can this information be used to increase productivity, or to punish and even fire employees? These are all questions that need to be answered and documented when bringing new technologies into the business environment. Sadly, no guidelines are available yet.

2.4 CSCW and group formation

The CSCW community focuses on the study of technologies for collaborative activities. Here we will discuss existing methods to form mobile device groups for collaborative interactions. These techniques are all based on device binding.

(a) EasyGroups group creation pattern. (b) Ring variant group creation pattern. (c) A FlexiGroups group creation pattern. (d) A FlexiGroups group creation pattern.

Figure 2.7: Different group creation patterns. With FlexiGroups, the arrows indicate touch actions between participants. The numbers in the white circles indicate the order of the touch interactions. Source: Jokela et al.[18].

Jokela et al.[18] started with a study to test whether people are willing to share their mobile devices to engage in collaborative interaction. They explored mobile collocated interactions to encourage people to share their devices to reach a common goal or create a collective experience. Sharing a device was problematic for most people when the person with whom the device needed to be shared was a complete stranger; with family members and close friends, this was less of a problem. Some concerns remained regarding potential damage to devices, although users stated that the benefits outweighed the potential risk of damage. As a result of these observations, EasyGroups was implemented. It is a binding method that relies on devices touching each other: a leader adds a member to the group by physically letting both devices touch, and continues this action to add new members until the full group is created. An alternative called Ring was also created, in which the role of the leader is passed on to the newly added group member. Finally, FlexiGroups was created, a version of EasyGroups that provides more freedom in the group creation. Figure 2.7 shows examples of the different group creation patterns possible with the applications mentioned.

Chung and Mujibiya[6] created Shuriken: a user grouping and data transfer method based on inter-device relative positioning. Shuriken uses BLE in combination with built-in sensors that are typically present in smart devices. Devices are linked by pointing them towards each other, after which a BLE connection is formed. The received signal strength (RSS) and digital compass readings are then obtained to see whether the devices need to be linked: if the RSS falls within the set thresholds and the compass readings are opposite to each other, the devices are linked. Once the devices are connected, data can be sent via the BLE connection. Collaborative shopping and offline business meetings are mentioned as use cases of this data transfer method.

Chong et al.[5] identified four general categories of interactive device association techniques:

Guidance: techniques where users need to act in the real world to connect devices. The user needs to perform a physical action on the devices to guide them to find each other. Physical contact, pointing, visual alignment, collocation, proximity, and physical extension are all part of the guidance category.

Input: techniques based on the input that a user gives on the device. Character input, button pressing, shaking, impact, and cross-device gestures can all be seen as techniques in the input category.

Enrolment: an identity is set on the device and then shared with the devices to which it wants to connect. Biometric elements, rhythm tapping, and identifier entry can all be seen as enrolment.

Matching: users compare the output of the involved devices to confirm or reject a connection. Text comparison, pattern matching, and spatial validation are all examples of the matching category.

All of these interactive device association techniques could, in theory, be used to create an ad-hoc room-restricted group.

Chapter 3

Concept

In this chapter, we will discuss the concept of automatic group creation called Grouve. First, we will look at some use cases for ad-hoc room-based group formation based on sound. Then we will choose the most plausible one and discuss the design space analysis (DSA), the prototype, and the small scale usability study performed for this use case.

3.1 Use cases

The overall goal of the application is automatic group creation in a confined space using user devices (preferably wearables). By automatically creating the group, we want to stimulate opportunistic collaboration, particularly in a business environment. We will use sound to communicate between user devices. The main idea for automatic group creation is as follows: when a group needs to be formed in a confined space, one user will use their device as a master device that communicates with all surrounding devices in the room. This master device will serve as the group creator. All other devices are seen as regular members of the group and will be added to the group when a particular sound is detected. To take on the master role, a device needs speakers that can emit the sound produced for group creation. To be able to perceive the sound waves, listening devices need a microphone able to detect the sound. If a device contains both a speaker and a microphone, it can serve as both a master and a listener. Figure 3.1 shows a visual representation of the Grouve concept.

Figure 3.1: A visual representation of the Grouve concept. When we want to create a group with the 3 people present in the room, one person (the group creator) sends out a sound representing a group id in order to create the group. The two other people present in the room register the group id and add themselves to the group by accessing the group server.

3.1.1 Creating a meeting

Meetings are a huge part of collaboration in every sector. The planning of these meetings is often done when casually running into someone in the hallway or at the end of another meeting. Due to busy schedules, it can be hard to find an open slot for all the people involved while keeping the bigger picture in mind, especially when many people are involved. Sometimes the tools used to plan these meetings, like pen and paper, a laptop, or a smartphone, aren't even within reach. Therefore a smartwatch, which is always on the wrist, could be used. At the end of a meeting, the team leader could give a voice command to plan a meeting with everyone present in the room. The smartwatch then passes the command on to the company server, which instructs all company smartwatches to listen for the next few minutes. Within those minutes, the team leader's smartwatch will emit a sound that cannot easily be reproduced. All the smartwatches in the proximity of the team leader's watch will then pick up the signal. The smartwatches that picked up the signal will be part of the team that needs to be included in the calendar search. When the people in the room are recognized, the server will search for an open slot that is available in every calendar. When everyone agrees, the calendars are all adjusted. Scheduling the meeting could also be accompanied by booking a room big enough for the employees present.

3.1.2 Sharing documents

To enable opportunistic collaboration, people need the opportunity to share documents and other files effortlessly. The devices used to share these files, however, such as computers and smartphones, are often out of reach during coffee runs and small talk. To enable the effortless sharing of these files without losing mobility or constantly carrying devices around, a smartwatch could be used. Due to its small size, voice command possibilities, and near-hand placement (on the wrist), the smartwatch is ideal for these opportunistic collaborations. To share documents easily with a smartwatch, they should be stored together with keywords that describe the content and context of each document. These keywords could be extracted automatically from the documents or entered by hand. When two people are interacting, one could give a voice command to the smartwatch to search for one or multiple keywords. The smartwatch could then pass on the command and retrieve all the documents connected to these keywords. The watch itself could visualize the documents. Gestures or voice commands, according to the user's preferences, can then be used to navigate through the documents and share them with the people in the room. The sharing of the documents would rely on the same method as the creation of the appointments: the use of sound.

3.1.3 Discovery of nearby stationary devices

When stationary devices such as a TV or a printer are equipped with a microphone, a group could be formed between a user device and this stationary device. The group formation technique based on sound could then be used as a discovery method for suitable devices within reach. Jobs like screen sharing and printing would need less effort to be executed in this manner. This discovery of nearby devices could also be used to detect where people are. Imagine the situation in which colleagues need to discuss something. One of the employees knows that the other one is in the building, but he does not know exactly where his colleague is. We could use the group formation technique to find the exact room where a colleague is. We could find the missing employee by letting stationary devices (such as TVs) emit a sound based on a group id and an id from the sending device.

When a listening device hears the sound, and the account id installed on the device matches the person sought, the server can add the id of the emitting device to the group. When a database is assembled of all the emitting devices and the rooms they are in, we can match the person to an emitting device and thus to a room.

3.1.4 Using Grouve to log and quantify group data

The Grouve concept could be used to log and quantify face-to-face meetings, as Vanderhulst et al. [35] did with Wi-Fi radio signals. The problem with using Wi-Fi to detect and quantify these human encounters is that there is no locality aspect as there is with sound waves; radio waves are hard to contain inside a room. The constant detection and sending of sounds could therefore be used to detect other people present in the room. We could then log the number of people present and how long they were present in the room at the same time. These types of data could be used to track the behavior of workers, and even of children with autism, to provide insight into their social development.

3.2 Chosen use case

Nowadays much time is spent on planning meetings in the working environment, especially when many people are involved. Applications that are currently available for planning with multiple agendas require many steps before a date and time are set (for example Google Calendar and Outlook Calendar). When a time slot is appointed, all members need to accept or reject this time slot. This process can cost a considerable amount of time, since response times can be high. The UI of applications that show a multitude of calendars, such as Daylite (a calendar application for Mac, iPhone and iPad with which you can share calendars with your team) and Google Calendar, often gets cluttered, as can be seen in figure 3.2.

For this thesis, we picked the meeting creation use case, since it is a subject that lives in the working environment. Genee, for example, an automated calendaring service based on natural language processing, was recently acquired by Microsoft.

Also, Google Calendar for work recently released a feature that automatically helps to find the best time for a meeting when scheduling it with other people. Our concept tackles the previously named elements and is able to plan a meeting in a short amount of time with a short response time by using smartwatches. By using a simple voice command to automatically find an open slot for everyone present in the room, we avoid the cluttered user interface and the huge number of steps usually needed.

Figure 3.2: A cluttered user interface with multiple and shared calendars in Google Calendar. Source: screenshot from the video by Jennifer Bagley.

3.3 Design space analysis

Before the prototype construction, a small DSA [24] was performed. A DSA consists of three elements: questions, options, and criteria (QOC). These are the three most basic concepts of DSA. Questions are the key issues that we need to resolve. Options are possible solutions to the questions asked. Criteria capture the reasoning for why certain options could be used to address the question. These three components are represented in the DSA scheme by the letters Q, O, and C before the textual representation. When a negative assessment is made between a criterion and an option, we draw a dotted line.

With a positive evaluation between both elements, we draw a solid line. A border is drawn around the chosen option and the criteria that have the highest priority. The DSA was used to discuss the input methods, display methods and interaction methods of the application. It also defined some variables: the range of the group detection and the group size. Since we think the schemas are clear enough, we will not provide further explanation of the DSA.

Figure 3.3: DSA that proposes the reasoning for user interaction when accepting or denying an action.
Figure 3.4: DSA that proposes the reasoning for detecting the end of a command given.
Figure 3.5: DSA that proposes the reasoning for getting feedback when a command was given.

Figure 3.6: DSA that proposes the reasoning for the maximum size of a group.
Figure 3.7: DSA that proposes the reasoning for when devices should start listening for commands.
Figure 3.8: DSA that proposes the reasoning for the range that should be covered when giving a command.
Figure 3.9: DSA that proposes the reasoning for what type of feedback should be used when sending data.
Figure 3.10: DSA that proposes the reasoning for what type of feedback should be used when notifying the user that a command was registered.

3.4 Paper prototype

From the DSA and the concept in general, a paper prototype was formed. The flow of the program was initially created in the form of sketches (see figure 3.11) to have a quick visual representation of the program flow. If the sketches were deemed to meet our needs, they were recreated as more polished vector images in Inkscape. These drawings came in handy during the implementation, but more on that later.

The initial flow of the program consisted of the meeting creator activating the smartwatch with a gesture and then giving a voice command to create a meeting. After the meeting creator had approved the meeting creation, a sound was emitted to create a group. When participants were detected, a slot was calculated that was available for all participants. After a participant accepted or denied a slot, the application waited for the reactions of the other group members. If more than one participant accepted the meeting, the meeting was set.

We tested the final paper prototype (see figure 3.12) in a small user test with 4 participants. We recruited the participants from the research facility of Bell Labs Antwerp (two female and two male). Initially, the participants were asked to fill in their demographics and device experience. Next, we gave the participants some tasks to perform. During the execution of the tasks, we asked the participants to say out loud what they were doing and/or thinking. Based on what was said, laddering was used to extract more information regarding the UX.

All participants had used speech interaction before and planned meetings quite often. They all stated that speech interaction was not their preferred way of interacting with small devices such as smartwatches, due to the inaccuracy of the speech recognition. Privacy and habits of touch interaction also played an important part in not wanting to use speech interaction. One user stated that they did not want others to know what they were doing in a work or private environment when interacting with a device. Some users (N=3) stated that they would not feel comfortable using speech recognition. There was a difference, though, in the acceptability of speech recognition according to the type of environment. In public places, most users (N=3) did not accept the use of voice interaction due to privacy and social awkwardness.

Figure 3.11: Some sketches of the original program flow. During development and creation of the paper prototype, some of these sketches were altered.

Figure 3.12: The program flow for the meeting creator (on the right) and the meeting attendee (on the left) in the form of a paper prototype. The task represented here is the creation of a meeting for next week with everyone present in the room.

In the workplace, privacy was the most important factor for not using speech interaction (N=2). Most of the participants (N=3) thought that in a home environment both factors, privacy and social awkwardness, wouldn't be a big deal.

There was a profound difference between the UX of people that had used smartwatches before (N=2) and people that had not (N=2). When asked if they found buttons acceptable to accept or deny something, the smartwatch users stated that swiping could also be used and would be more acceptable to them. The other users thought that the buttons were acceptable and that no other interaction method should substitute them. Another remark was that some symbols, like those for the sound emission and the retrieval of a date, were not clear enough. All participants stated that a textual representation of what was going on should be used along with the symbolic representations. Since all participants made this remark, the alteration was made during the implementation of the application.

If no open slot was found, the participants opted for forcing a meeting when most people were available (N=2) or for being offered another option outside the given time frame (N=2). If no one could be detected to create a group with, most participants (N=3) stated that they would simply like to send out the signal once more. One participant said that he would lose confidence in the application and that having to try again would just create more frustration. We think that giving a person the choice to re-emit the signal is the best option. Since the user can choose to re-emit the signal and is not obligated to, frustration can be lowered both for users who might have lost confidence in the application and for those frustrated by having to repeat the command. Therefore this option is present in the implementation, and the program flow was adjusted.

A scroll action was proposed to display large texts on the smartwatch. All participants agreed that this was an appropriate interaction for showing large texts, as long as only the upper half of the last sentence on the screen was displayed. In that case, the user would be aware that there is more text to come. The participants deemed the scrolling movement natural, since they had all used a smartphone and computer before.

Along with remarks regarding the UX, participants stated which functionalities they would like to see in the application:

1. Add or remove other members from the group (N=2).
2. Give priority to members (N=1).
3. Remove oneself from a group (N=1).
4. Voice interaction and touch interaction throughout the whole application (N=1).
5. Showing the schedules of the group members to provide context.
6. Booking a room while creating a meeting (N=3).

We did not implement number 4, since we thought that this option would not add value to the UX. It could even add more security risks to the application. If we use speech recognition for the original command, and another user gives that command, we still need to accept it by touch ourselves. If speech interaction were used for the acceptance as well, others in the room could accept our meeting. It would also clash with other people accepting their own appointments. Using voice interaction for acceptance means that voice recognition would have to be performed to enforce security. Due to the extra calculations this requires, in combination with the limited amount of computational power and storage available on the device, another connection would need to be made with a server. All of these extra calculations take up time and battery power. Therefore we do not think that the benefits of implementing speech recognition throughout the program would justify the added power and time consumption.

Items 1 and 3 were also not implemented, since we propose our group detection method for collocated group formation within a confined space. Every person that should be inside the group needs to be inside the enclosed space. It is not intended for adding people that are not present. If people are not supposed to be included in the group, they should leave the confined space before group formation commences.

It would be a good idea to provide a priority for the team members (item 2). In that way, if no open slot was found for all participants, we could exclude people that do not have a high priority and search for an open timeslot again. This prioritizing of people, however, puts enormous pressure on the team creator. He or she needs to decide the priority of the group members. If discussed among the members, this could result in emotional distress for the people being voted as "not important enough to attend". Since this element could introduce more research questions and issues than it would resolve, we opted not to implement it in our application for the moment.

Items 5 and 6 are relevant to our application, but due to time restrictions we did not implement these features. To provide context for our calendar application (item 5), we could show the group members their schedules for that week on a public display available in the room. Showing personal schedules would, however, mean that we need an extra screen. We cannot use the smartwatch screen for these types of visualizations, since the screen is too small to provide useful context. Showing personal calendars also raises privacy concerns; not every group member may be comfortable with everyone knowing their calendar details. The privacy concern could be resolved by only showing time blocks with no further information. With this solution, some context is lost to gain privacy.

To book a room at the same moment (item 6), we could give every room its own calendar. A database would be needed to store the calendar id of the room, the room name, and the room capacity. In that way, we could search every available time slot for a room with the same or a bigger capacity than the size of the group that was assembled. If the first time slot does not have an available room, the next available time slot is taken into consideration.

The application itself was deemed useful by half of the participants (N=2), primarily because it would save them time in agreeing on an appropriate meeting slot, and because the response time would be lower than with standard applications like Outlook Calendar. The primary reason given by the other participants (N=2) for not wanting to use the application was the lack of situations in which meetings would be set up with people present in their proximity. If the situation did present itself, they stated, a normal conversation would suffice to plan a meeting.

3.5 Flow of the meeting creation application

At first, the application presents the user with a watch face that he can tap to start the speech recognition. Once the user taps the watch face, a microphone representation indicates that the user can start giving the voice command. When the user gives the voice command, the application presents the user with the result of the voice command. According to the recognized command, one of the following screens is shown:

- An accept and a deny button to go through with the group formation for the given scheduling date, along with a textual representation of the command.
- A button to return and a textual representation of the recognized command, stating that no acceptable date was found.
- A button to return and a textual representation of the recognized command, indicating that the right command format must be used.

All possibilities return to the original watch face, except for the accept button. The accept button leads to the sound being emitted to form a group. A symbolic representation along with a textual one is shown to make clear that the application is detecting others to create a group. If no one was detected, a symbolic representation along with a small text is shown, followed by the choice to retry emitting the sound for the group creation or to stop the application altogether. If a group was detected, a symbolic representation along with a small text states that the team has been assembled. Following this screen, a screen indicates that the server is searching for a common available date. Both screens are shown on the team leader's device as well as on the team members' devices. When the server has found a date, a screen shows the found date with the choice to accept or deny it. During this phase the team members can discuss whether this date suits them and decide on the next step to be taken: everyone accepts the meeting, no one accepts the meeting, or some people accept the meeting. In the end, a textual representation states whether the meeting was booked or not.

We chose high frequencies in the audible range and low ultrasound frequencies to communicate between the devices. The reasoning is that high frequencies are more resistant to everyday noise (fewer harmonics from speech frequencies), and they can hardly be heard by humans, which makes them less disruptive in a working environment.

Chapter 4

Implementation

In this chapter, the implementation of the meeting creation application using Grouve will be discussed.

4.1 Assumptions made for implementation

Before we discuss the implementation itself, we need to address some assumptions we made during implementation. The first assumption is that every device used in this system has a Google account registered on it. The primary calendar associated with this Google account is the calendar used in the application for meeting scheduling. With this assumption, we also assume that there is only one device present with the same Google account at a time. Multiple devices with the same Google account and the application installed should not be present in the room!

Our second assumption is that the Google service account has access to the calendar used by the Google account on the device. We will explain the reasoning for this later on.

The third assumption concerns the internet connection. For our group formation to work, an internet connection needs to be present. The application will return feedback if this is not the case, but the group formation aspect will not function without a connection to the web.

Our fourth assumption is that all the devices using the application have both a speaker and a microphone sensitive enough to register and send the frequencies used.

The fifth assumption is that, when creating a group, the whole group is present in the room at the moment of group creation. We do not provide any means for adding or deleting group members.

The sixth and final assumption regards the blocks of time in the calendar. We assume that every meeting lasts 30 minutes and that the work day starts at 09:00 and ends at 17:00.

4.2 Choosing the operating system

Before the implementation of our meeting creation system that uses Grouve, we had to make a choice: for which operating system (OS) should we develop the application? When choosing the OS, we also needed to make sure that smartwatches available for this OS would support speakers, microphones, and an internet connection, since these are critical for Grouve. Many OSs exist for smartwatches; we therefore focused only on the four primary OSs used on smartwatches:

- Android Wear: a version of the Android OS designed for smartwatches and wearables in general. This OS needs to pair with a smartphone running Android version 4.3 or higher, or iOS version 8.2 or newer, although functions are then more limited.
- watchOS (original, 2 and 3): an OS that is used only for the Apple Watch. It is based on iOS and therefore has similar features. The Apple Watch is limited in functionality if not paired with an iPhone 5 or a later iPhone.
- Pebble OS: an OS used for the Pebble smartwatches. As with all smartwatches available at the moment, it functions best (with more features) when paired with a smartphone. As long as the smartphone runs native iOS (iOS 8 or higher) or Android (4.3 or higher), it will work for the Pebble Time series. The original Pebble works with earlier iOS and Android versions.

- Tizen: an OS based on the Linux kernel and GNU C Library that implements the Linux API. Apart from smartwatches, the OS is also used for other devices, ranging from smartphones to smart home appliances. None of the apps that run on Tizen are native; they are all created in HTML5, which means that apart from running on Tizen, they can also run on the web. Unlike with Android Wear, Pebble OS and watchOS, however, we are not sure with which smartphones Tizen will work. In the early days of Tizen, the smartwatches could only be paired with certain Samsung devices.

Android Wear was not a good option for us, since most of the smartwatches using Android Wear do not have a speaker on board. In fact, speakers were only supported from Android Wear 1.4 upwards, which resulted in dormant speakers present in devices (Huawei Watch and ASUS ZenWatch 2) being activated by the Android Wear update. Using Android Wear, we would have been restricted in the number of devices for which we could implement the application. Therefore another OS was chosen.

watchOS, as stated before, needs to pair with an iPhone to have all the possible features available. With watchOS we are therefore restricted to using only a smartwatch and smartphone created by Apple. This exclusivity places massive restrictions on the number of devices for which we could implement our application, so we did not choose this OS for the implementation.

Pebble OS wasn't an option either, since none of the smartwatches using this OS have a speaker.

Tizen was not chosen because only a limited number of smartphones can pair with Tizen smartwatches. Due to this select nature, we opted not to implement the application for Tizen.

Since none of the OSs named here were suitable for the implementation of our app, we flashed a Samsung Galaxy Gear smartwatch with an Android null ROM. By doing this, we were able to create an application for Android devices in general. Using Android created more options for evaluating and testing the Grouve concept.

4.3 Images in the application

We reused the images created for the paper prototype in the actual implementation. We passed the SVG files created in Inkscape through an online converter called Android SVG to VectorDrawable that converted the SVG files to XML. By doing this, we can use the resulting XML images for all screen densities.

4.4 Speech recognition

As a result of the technical study (see section 5.1), the speech recognition part was implemented using the IBM Watson speech-to-text SDK. To accommodate the SDK to our specific needs, however, values inside the speech-android wrapper were changed as follows:

- To make the application as seamless as possible, the end of a voice command is automatically registered when a pause (half a second of silence) is detected. To enable this, the continuous parameter inside sendSpeechHeader (WebSocketUploader class) was set to false.
- Since only the end result of the speech recognition matters for the application, the interim_results parameter inside sendSpeechHeader (WebSocketUploader class) was also set to false. The retrieval of intermediate results would serve no useful function in the application and would only lead to more data traffic.
- To ensure privacy, the X-Watson-Learning-Opt-Out header parameter, with its value set to true, was added inside the recognize method of SpeechToText.java. This prevents the IBM Watson speech service from saving speech audio data and recognized results, in contrast to other built-in speech recognition services such as Siri and OK Google. The reasoning behind adding this functionality comes from the privacy concerns that users expressed while performing the small scale usability test with the paper prototype.

Since the speech recognition could interfere with the sound detection part of our application, we decided that speech recognition should only be activated when the watch face is pressed. In this way, we can switch the microphone usage from the group id detection to the speech recognition and back again. The speech recognition itself runs in the background of the application once the watch face is touched. The availability of the speech recognition is shown in the UI with a microphone (available) or a microphone with a red stripe crossing it (unavailable).
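The following minimal sketch summarizes the two kinds of changes from the list above. The parameter and header names come from the wrapper itself, but the exact internals differ per SDK version, so the surrounding structure is illustrative rather than the SDK's literal API:

    public class WatsonConfigSketch {
        // The websocket "start" message: stop on the first half-second pause
        // and skip interim results, as described in the list above.
        static String startMessage() {
            return "{\"action\": \"start\", \"continuous\": false, \"interim_results\": false}";
        }

        // The extra HTTP header that opts out of IBM's data collection.
        static final String OPT_OUT_HEADER = "X-Watson-Learning-Opt-Out";
        static final String OPT_OUT_VALUE = "true";
    }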

Due to the use of the IBM speech recognition SDK, an internet connection is needed. In an ideal case, the speech recognition could be done on the device itself without the use of any internet connection. The hardware of smartwatches, however, isn't ready for this kind of application yet. More on this subject and on developments regarding speech recognition possibilities later on.

4.5 Recognizing commands and dates

The result of the speech recognition, a string, needs to be handled so that the actual command can be recognized and verified. The ideal solution would be to use natural language processing (NLP). By using NLP, we can derive the meaning of the text as well as extract the specific contents needed to create the meeting, such as the start time for the lookup and the duration of the meeting (information extraction). Stanford CoreNLP is a suite of core NLP tools that could be used to perform NLP. Using rules optimized for our application, a command could be detected as correct or incorrect. This NLP tool suite, however, is resource intensive, and since smartwatches have limited hardware, using it on the smartwatch itself was not an option. We could still use Stanford CoreNLP, but then we would have to implement it as a web service. Since we wanted to limit the number of internet connections, and the extra battery usage that goes along with them, we opted for regular expressions. Although these are not as powerful as the NLP core suite, they suffice for our proof of concept. For our use case two regular expressions were used:

    String patternStringOne = "(?:book|create|schedule)(?: a| an)(?: appointment| meeting)(?: with everyone(?: present| here| in the room))?(?:(.*)?\\ (?:called|named)?(?<=called|named)\\ (.*)?)?";

    String patternStringTwo = "(?:book|create|schedule)(?: a| an)(?: appointment| meeting)(?: with everyone(?: present| here| in the room))?(.*)?";
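As an illustration of how such a pattern is applied, here is a minimal, self-contained sketch using java.util.regex; the example command and variable names are ours:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class CommandMatcherSketch {
        public static void main(String[] args) {
            String patternStringTwo = "(?:book|create|schedule)(?: a| an)(?: appointment| meeting)"
                    + "(?: with everyone(?: present| here| in the room))?(.*)?";
            Matcher m = Pattern.compile(patternStringTwo, Pattern.CASE_INSENSITIVE)
                    .matcher("schedule a meeting with everyone here for next week");
            if (m.matches()) {
                // Group 1 holds the free-form tail (" for next week"), which is
                // later handed to the date parser.
                System.out.println("command ok, tail:" + m.group(1));
            }
        }
    }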

These two regular expressions accept commands such as "create a meeting", "create an appointment called new meeting", "schedule a meeting for next week called staff meeting", etc. Although these regular expressions aren't perfect (anything could be used instead of "for next week"), they do the trick. If a command was not recognized, a screen is shown telling the user that he or she should use the right command format: "Create a meeting (with everyone here) for [start time] called [meeting name]". If the start time does not contain a correct time expression, such as "next week" or "as soon as possible", the application returns a textual representation and tells the user that the command contained incorrect timing information. In both situations, wrong timing or a wrong command in general, the user is shown the text that was recognized. This is necessary since the occasion may arise that the IBM speech recognition API returns a wrongly recognized string. In this way, the user is informed that it was the application's fault and not a wrong command.

The start time can contain a natural language date such as "next week" or "in three months". To extract the exact date, we used Natty, a natural language date parser written in Java. This date is then used to set the search span for a meeting date.
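The use of Natty boils down to a few lines; the following is our own minimal illustration, not the thesis code:

    import com.joestelmach.natty.DateGroup;
    import com.joestelmach.natty.Parser;
    import java.util.Date;
    import java.util.List;

    public class NattySketch {
        public static void main(String[] args) {
            Parser parser = new Parser();
            List<DateGroup> groups = parser.parse("next week");
            if (!groups.isEmpty()) {
                // First interpretation of the expression: the start of the search span.
                Date start = groups.get(0).getDates().get(0);
                System.out.println("Search for free slots from: " + start);
            }
        }
    }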

4.6 Google Calendar and free/busy information

As stated before, we assume that every device using the application has a Google account on it. The primary account used on the device represents the account used for the calendar scheduling. To get the first Google account on the device, we use the following code:

    private String getAccount() {
        Account account = null;
        AccountManager manager = (AccountManager) getSystemService(ACCOUNT_SERVICE);
        Account[] accounts = manager.getAccountsByType("com.google");
        if (accounts.length > 0) {
            account = accounts[0];
            return account.name;
        } else {
            Toast.makeText(this, "You can't use this application since you don't have a Google account registered on your device", Toast.LENGTH_LONG).show();
            finish();
        }
        return null;
    }

This account is then used to create a group on the server. We can only create a group on the group formation server if the account was registered on the server. In our case this can only be done by the company itself, so we assume that all the accounts that need access have already been added to the database.

Once a group is formed using the Grouve automatic group formation technique, we use a service account that has access to all the company accounts to create new events in the group members' calendars and to check their busy information. We use a service account because, for the application to work seamlessly, we should omit the authentication step in which the user permits the server to access his or her calendar. It is reasonable to assume that a service account would have this type of access, since in a corporate environment this is often the case. If the calendars used for planning meetings are the ones linked to corporate Google accounts, domain-wide authority can easily be given to the service account. This domain-wide authority provides instantaneous access to all the workers' calendars.

When a group is formed, an array is created consisting of 30-minute blocks starting from the default start time (09:00) and ending at the default end time (17:00). Following this, we create a query for the busy information of each group member's calendar. There is a query for each day present in the given date range; if it is a Saturday or Sunday, the program does not take that day into account. When the busy information is received, an array is created consisting of bit-arrays: one for each group member. The length of each of these arrays is equal to the number of 30-minute time blocks. Using the busy information, we fill in the bit-arrays: 0 if busy, 1 if free in the corresponding time block.
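A minimal sketch of the slot search described here and in the next paragraph, with the bit-arrays modeled as boolean arrays (all names are ours, not the thesis code):

    public class SlotSearchSketch {
        // Returns the index of the first 30-minute block that is free for every
        // member, or -1 if no common slot exists on this day. free[m][b] is true
        // when member m is free in block b.
        static int firstCommonFreeBlock(boolean[][] free, int blocksPerDay) {
            for (int block = 0; block < blocksPerDay; block++) {
                boolean all = true;                 // the bitwise AND over all members
                for (boolean[] member : free) {
                    all &= member[block];
                }
                if (all) {
                    return block;                   // first block where the AND is still 1
                }
            }
            return -1;                              // continue with the next day in the range
        }
    }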

When all of the arrays are filled in, we compute the busy and free times with the bitwise AND operator. If a time block still contains a 1 at the end, this time block is free, and the search stops. If no time block was found, the search continues until a result is found or we reach the end of the date range.

Because the calculation of the free timeslots can take a while, especially if the date range is big, we set the maximum group size to 6. This number corresponds to the average group size registered during the user study (see 5.2.4) as well as to the maximum group size for Genee. Genee states that if the group is larger than six people, it is best to give a distinct time for the meeting, since otherwise the search for a date would take too long.

4.7 Clock watch face

Since we use Android on a smartwatch, we do not have access to a standard watch face. Therefore we used a widget to get the same form factor. At first, we wanted to use the AnalogClock Android class for this. This class, however, did not support full customization for showing the microphone, due to possible security issues. To resolve this, we placed an ImageView on the widget and filled it with an analog clock view. For this to work, we implemented the clock as a Service with an AppWidgetProvider. The AppWidgetProvider makes sure the Service updates the ImageView, and the Service creates and changes the view.

4.8 Database structure

We use a relational database (MySQL) to represent the data used during the group formation. We chose a relational database since the data needed for the group formation is well structured (it always uses the same fields) and the tables have strong relationships with each other. The database consists of 4 tables: group (table 4.1), member (table 4.2), members (table 4.3) and task (table 4.4). The members table contains two foreign keys (group_id and member_id), and the group table contains one foreign key (task_id). These foreign keys express the relations between the different tables.

    Column        Type       Null  Default            Extra
    group_id      int(4)     No                       auto_increment
    task_id       int(11)    No
    time_created  timestamp  Yes   CURRENT_TIMESTAMP

Table 4.1: Elements of the group table.

    Column     Type          Null  Extra
    member_id  int(11)       No    auto_increment
    firstname  varchar(35)   No
    lastname   varchar(35)   No
    email      varchar(255)  No

Table 4.2: Elements of the member table.

    Column        Type        Null
    member_id     int(11)     No
    group_id      int(4)      No
    has_accepted  tinyint(1)  Yes

Table 4.3: Elements of the members table.

    Column      Type          Null  Default  Extra
    task_id     int(11)       No             auto_increment
    title       varchar(35)   No
    start_date  datetime      No
    end_date    datetime      No
    duration    int(4)        No
    event_id    varchar(255)  Yes   NULL
    cal_id      varchar(255)  No

Table 4.4: Elements of the task table.

If someone wants to create a group, the group creation server sends a message to the database. The database then constructs a group with its group id, together with a task and its task id. The group id in numeric form is then sent back to the group creator, so he or she can send out a sound signal using a hashed version of this group id. More on this sound creation later on.

Normally, the timestamp in the group table would be used to delete groups from the table after a certain time has passed, using a cron job. The hosting party used to host the website (one.com), however, did not have this capability, so the timestamp was not used in this implementation. This is also the reason why the group ids are created using auto-increment. It would be better to use a random function to create a 4-digit id, but because of the lack of cron jobs, we cannot remove groups.

This, in turn, means that we could quickly reach a point at which finding a random number that has not yet been put into the database is hard and takes much time. When Grouve is used in the real world, it would be better to implement this functionality.

4.9 Data over sound

4.9.1 Frequencies able to detect

At first, we used the String getProperty(String key) method of the AudioManager class in Android. The keys were set to PROPERTY_SUPPORT_MIC_NEAR_ULTRASOUND and PROPERTY_SUPPORT_SPEAKER_NEAR_ULTRASOUND to check whether near-ultrasound frequencies could be sent and detected. Testing these properties on both devices resulted in null values. Therefore Audio Spectrum Analyzer along with an online frequency generator were used instead. We sent out low- and high-frequency sounds from a MacBook Pro using the online tone generator and captured the audio spectrum on the Samsung Galaxy Gear and Samsung Galaxy S6 using Audio Spectrum Analyzer. We performed all of these actions in an ordinary room with as little noise as possible. During these tests, we noticed that noise could be present that overpowers frequencies in the low-frequency region (see figure 4.1b). In the higher frequencies of the audible spectrum (18 600 Hz and above), this is not the case. Therefore high frequencies, which are more resistant to noise, are a better choice (see figure 4.1a). We also performed these informal tests with the volume set to 100% on the smartwatch (Samsung Galaxy Gear) and the smartphone (Samsung Galaxy S6). Both devices could register the high frequencies sent by the other device. These results thus support the assumptions made in chapter 3. Because of this, frequencies in the range of 18 600 Hz to 22 050 Hz were chosen to encode the data. We stopped at 22 050 Hz (the Nyquist frequency), since this is the highest frequency that we can detect when using a sampling rate of 44 100 Hz. This sampling rate was chosen because, according to the AudioTrack class description on the Android developer website, 44 100 Hz is the only sampling rate that is guaranteed to work on all Android devices.

The frequencies implemented for the data encoding were, however, adjusted later on, since further testing revealed that we could not accurately detect frequencies close to the Nyquist frequency with the method that we suggest.

(a) Audio spectrum when an 18 700 Hz sine wave is emitted. (b) Audio spectrum when a 300 Hz sine wave is emitted.
Figure 4.1: The audio spectra on a Samsung S6 using Audio Spectrum Analyzer. The registered signals were emitted through the online tone generator on a MacBook Pro (Retina, 13 inch, mid-2014, 2.6 GHz). In both cases, the volume was set to 25% capacity and the distance between the MacBook Pro and the S6 was 60 cm. The red ellipses depict the amplitude of the loudest frequency detected, while the underlined text depicts the estimated frequency of the loudest signal detected.

4.9.2 Audio programming in Android

We had two options for audio programming in Android: stick with the SDK (Java with the AudioTrack class) or use the Native Development Kit (NDK) (C with OpenSL). We chose to stick with the SDK. The NDK probably would have been a better fit for the signal processing tasks, since they are CPU intensive, but it would have added additional complexity to the development process, so we decided to use the SDK instead.

To record audio, we created an AudioRecord and started recording as follows:

    int mBuffSizeMinIn = AudioRecord.getMinBufferSize(44100,
            AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT);
    mRecord = new AudioRecord(MediaRecorder.AudioSource.MIC, 44100,
            AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT,
            mBuffSizeMinIn);
    mRecord.startRecording();

- MediaRecorder.AudioSource.MIC represents the audio source for the recording. We chose it because the other microphone sources we tried (UNPROCESSED, VOICE_COMMUNICATION, and VOICE_RECOGNITION) did not return audio that was useful for our implementation.
- 44100 represents the sampling rate of the audio. As mentioned earlier, we chose it because of the Nyquist frequency and because it is the only sampling rate guaranteed to work on all Android devices.
- AudioFormat.CHANNEL_IN_MONO states that we only use one single channel (signal) to record the audio. It was selected since this configuration of audio channels is supported by all devices according to Google.
- AudioFormat.ENCODING_PCM_16BIT states the format in which the data is returned. This value was also chosen since it is the only encoding supported by all devices.
- mBuffSizeMinIn represents the size of the buffer to which audio data is written during the recording.

To send out audio, we created the AudioTrack as follows:

    // Gets the minimum buffer size of an AudioTrack
    int minBuffSize = AudioTrack.getMinBufferSize(44100,
            AudioFormat.CHANNEL_OUT_MONO, AudioFormat.ENCODING_PCM_16BIT);

    // Creates an AudioTrack object
    mAudioTrack = new AudioTrack(AudioManager.STREAM_MUSIC, 44100,
            AudioFormat.CHANNEL_OUT_MONO, AudioFormat.ENCODING_PCM_16BIT,
            minBuffSize, AudioTrack.MODE_STREAM);

All the values were chosen for the same reasons as with recording audio, except:

- AudioFormat.CHANNEL_OUT_MONO was chosen for the same reasons as the in-channel value, only this time we needed the channel-out version.
- AudioTrack.MODE_STREAM is the creation mode in which audio data is streamed from Java to the native layer while the audio is playing. Since we are playing audio with a high sampling rate (44 100 Hz), we use the streaming mode. Using this mode creates overhead, but it is necessary due to the high frequencies used.
- AudioManager.STREAM_MUSIC represents the audio stream for music playback that is used when playing a sound.

4.9.3 Creating frequencies

After establishing that the devices detect frequencies in the chosen range, we implemented a method for creating sounds with those frequencies. Each frequency was synthesized as a sine wave, so that we would have a pure wave with no harmonics introduced in the original signal. Since a sine wave is represented by formula (2.1) and the amplitude in our code equals the maximum value of a short (Short.MAX_VALUE), we can rewrite formula (2.1), using formula (2.2) and Short.MAX_VALUE, as:

    Short.MAX_VALUE * sin(2π * f * n/SR + φ)    (4.1)

with n the sample index and SR the sampling rate. We used this exact formula to implement the sound creation. Do note that we replaced SR with the letter N in the actual implementation. The result, however, was not as expected: when we played different frequencies one after another, a click occurred at the beginning and end of each frequency played. We partially resolved this clicking by making sure that the angle used to calculate the sine was a value between −2π and 2π (π normalization).

We implemented the π normalization with the following code:

    angle = (angle + increment) % (2.0f * (float) Math.PI);

The speakers of the devices still could not handle the sudden change in voltage. Therefore, to stop the clicking completely, a 10 ms fade-in and fade-out was applied to every frequency representation. To fade the signal in and out, we applied a Hanning window[32] to the samples. The code for the creation of a sine wave (frequency audio) can be found in B.1. We do need to state that when we used this code on other devices that were not used during implementation, some devices could still not cope with the change in voltage.

To incorporate data into these sound waves, we mapped frequencies in the chosen high-frequency range to an alphabet. This alphabet consists of a frequency that marks the beginning of a message, one that marks the end of a message, and 17 frequencies that each represent a letter from a to q. There is exactly 100 Hz between consecutive frequencies. We convert the id received from the database when someone wants to create a group to a string using the hashids library. By using a salt, we can encode the strings so that someone who does not have this salt, but does know the alphabet used, still can't decode the transmitted group id. This use of hashids and a salt creates an extra security barrier. While using hashids, we make sure that each signal sent out has a length of four characters, plus a beginning and an end sequence. Each of these six components takes 200 ms to transmit, so in total we have a signal of 1.2 seconds that needs to be transmitted.

A producer-consumer pattern handles the signal creation and signal emission. The producer produces the signal in a certain order and pushes each element onto a LinkedBlockingQueue shared with the consumer. While the producer is alive and the LinkedBlockingQueue has elements, the consumer emits the sounds put into the LinkedBlockingQueue. We used a LinkedBlockingQueue since it is a thread-safe way to share data; its First-In-First-Out (FIFO) behavior also matches our requirements.
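To make this concrete, here is a compact, self-contained sketch of the pattern; the frequency mapping, base frequency and all names are illustrative (the real synthesis code is in appendix B.1):

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    public class ToneEmissionSketch {
        // Shared FIFO queue: one entry per message component (begin, 4 chars, end).
        private final BlockingQueue<Double> queue = new LinkedBlockingQueue<>();

        // Producer: map the hashids-encoded group id onto the tone alphabet.
        void produce(String encodedId, double beginFreq, double endFreq, double baseFreq)
                throws InterruptedException {
            queue.put(beginFreq);                              // beginning-of-message tone
            for (char c : encodedId.toCharArray()) {
                queue.put(baseFreq + (c - 'a') * 100.0);       // letters a..q, 100 Hz apart
            }
            queue.put(endFreq);                                // end-of-message tone
        }

        // Consumer: synthesize each tone for 200 ms while tones are queued.
        void consume(int sampleRate) throws InterruptedException {
            while (true) {
                short[] samples = synthesize(queue.take(), 200, sampleRate);
                // write the samples to the AudioTrack here
            }
        }

        // Sine synthesis with a 10 ms fade-in/out (a Hanning-style ramp) against clicks.
        static short[] synthesize(double frequency, int millis, int sampleRate) {
            int n = sampleRate * millis / 1000;
            int fadeLen = sampleRate / 100;                    // 10 ms
            short[] samples = new short[n];
            double increment = 2 * Math.PI * frequency / sampleRate, angle = 0;
            for (int i = 0; i < n; i++) {
                double fade = 1.0;
                if (i < fadeLen) fade = 0.5 * (1 - Math.cos(Math.PI * i / fadeLen));
                else if (i >= n - fadeLen) fade = 0.5 * (1 - Math.cos(Math.PI * (n - 1 - i) / (double) fadeLen));
                samples[i] = (short) (fade * Short.MAX_VALUE * Math.sin(angle));
                angle = (angle + increment) % (2 * Math.PI);   // the π normalization above
            }
            return samples;
        }
    }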

4.9.4 Detecting the frequencies of the created sound waves

Detection of the sine waves is done using digital signal processing (DSP) techniques. We chose to implement two ways of detecting in the frequency domain: the Goertzel algorithm and the Fast Fourier Transform (FFT), both combined with a peak detection in the frequency ranges used for the data encoding. The peak detection checks the power (magnitude) of the frequencies.

The Discrete Fourier Transform (DFT) is a form of the Fourier transform meant for converting a discrete signal of finite extent in the time domain to a discrete and finite signal in the frequency domain. The signal that we record is discrete due to sampling, and it is finite. Therefore the DFT is used to convert our signal from the time domain to the frequency domain. Both Goertzel and the FFT are based upon the DFT.

Goertzel provides an efficient evaluation of individual terms of the DFT. The Goertzel algorithm analyses only one target frequency of a signal. It does this by applying a single real-valued coefficient at each iteration. For our application, we would need 19 Goertzel filters to detect all 19 frequencies. When using Goertzel, the following elements can be pre-calculated:

    k = (int)(0.5 + (N * targetFrequency) / SR)    (4.2)
    ω = (2π / N) * k                               (4.3)
    sine = sin(ω)                                  (4.4)
    cosine = cos(ω)                                (4.5)
    coefficient = 2 * cosine                       (4.6)

Per processed sample, we need three other variables; let's say V1, V2 and V3, where V2 represents the previous value of V1 and V3 the previous value of V2. At the beginning of each block of samples, V2 and V3 need to be reset to 0. For every sample we run the following equations:

    V1 = coefficient * V2 − V3 + sampleValue       (4.7)
    V3 = V2                                        (4.8)
    V2 = V1                                        (4.9)

After we perform these calculations for a block of samples, we can calculate the magnitude squared for the chosen frequency (optimized Goertzel) with the formula:

    magnitude² = V2² + V3² − V2 * V3 * coefficient    (4.10)

The FFT is used to compute the DFT efficiently for the whole spectrum. For the implementation, we used the FFT library ca.uol.aig.fftpack, a Java library created by the Astronomical Instrument Group of the University of Lethbridge on the basis of an original package of Fortran subprograms.

In both Goertzel and the FFT, there is a trade-off between the frequency resolution and the acquisition time. Choosing the size of the blocks is an essential element in this trade-off: the block size (N) controls the frequency resolution, sometimes called the bin width. If the sampling rate is 44 100 Hz, for example, and we have N = 1720, then the bin width is:

    binWidth = SR / N = 44100 / 1720 ≈ 25.64 Hz    (4.11)

The algorithms then update after each N samples, in:

    timeCalculated = N / SR = 1720 / 44100 ≈ 0.039 seconds = 39 milliseconds    (4.12)

So every 39 ms we calculate the frequencies. If we use a smaller N, we can detect the peak frequency in a smaller amount of time; the frequency resolution, however, will be worse (the trade-off). In the ideal case, the target frequencies should be in the middle of the bins. This means that the target frequencies are best chosen as integer multiples of SR/N. During testing, however, we noticed that for our devices this did not influence the frequency detection. Therefore we chose to stick with N = 1720 for our implementation, since its acquisition time keeps the time needed for sending and receiving the signal low.

When we compare Goertzel and the FFT with each other, Goertzel has a higher order of complexity than the FFT for the analysis of the full spectrum; running separate Goertzel filters only pays off if the following inequality applies:

    M < log₂ N    (4.13)

with N being the block size of the FFT and M the number of frequencies to detect. In our application M = 19 and N = 1720; since

    19 > log₂ 1720 ≈ 10.75    (4.14)

the theory states that we are better off using the FFT than 19 Goertzel filters. We tested both algorithms to see whether the theory, that 19 Goertzel filters would be less resource efficient than the FFT, was correct. The theory proved correct: CPU usage with Goertzel was approximately 1.5% higher on the Samsung Galaxy S6 than with the FFT. We do need to state that Goertzel still uses far less memory than the FFT approach, but for our application Goertzel is not accurate enough.

So the detection of our frequencies goes as follows: we buffer the incoming time-domain signal from the microphone, sample it (N = 1720), and compute the FFT of this 39 ms signal. With the results of the FFT, we search for the highest amplitude from bin 1443 to bin 1638 (about 18 500 to 21 000 Hz). Then we perform some additional calculations, since the actual frequency sent and the one received can deviate two bins up or down (the reason why there is a 100 Hz distance between the frequencies). Whenever we adjust N, up or down, the actually sent frequency is still spread among the surrounding bins. Since touching the listening device can lead to the same high frequencies being detected up to three times in a row, we opted to send out each signal five times; we then need to detect a frequency five times before it is registered as a character. Once a beginning, four characters and an end are detected, we decode the string using the hashids library and the appropriate salt. The resulting integer is the id of the group.

To determine which frequency a bin represents (in the FFT implementation), we use the formula:

    targetFrequency = binNumber * SR / (2 * N)    (4.15)

The bin number representing a certain frequency is in turn given by:

    binNumber = targetFrequency * 2 * N / SR    (4.16)
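As a concrete rendering of equations (4.2) to (4.10), here is a minimal, self-contained Goertzel filter; this is our own illustration, not the thesis code:

    public class GoertzelSketch {
        // Magnitude squared of one target frequency in a block of samples.
        static double magnitudeSquared(short[] samples, double targetFrequency, int sampleRate) {
            int n = samples.length;
            int k = (int) (0.5 + (n * targetFrequency) / sampleRate);   // eq. (4.2)
            double omega = (2.0 * Math.PI / n) * k;                     // eq. (4.3)
            double coefficient = 2.0 * Math.cos(omega);                 // eq. (4.6)
            double v1, v2 = 0.0, v3 = 0.0;                              // reset per block
            for (short sample : samples) {
                v1 = coefficient * v2 - v3 + sample;                    // eq. (4.7)
                v3 = v2;                                                // eq. (4.8)
                v2 = v1;                                                // eq. (4.9)
            }
            return v2 * v2 + v3 * v3 - v2 * v3 * coefficient;           // eq. (4.10)
        }
    }

Running 19 of these filters per 1720-sample block is what the comparison above measures against a single FFT.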

To ensure continuous listening, we implemented the frequency detection as a service. The service is started by a broadcast received when the device is booted, so the service starts up when the device is started. A service, however, can be stopped in Android when the user stops it forcefully or in low-memory situations. Therefore another action was added to the intent filter of the broadcast receiver. This action is broadcast when the onDestroy function of the service is called (i.e., when the service is stopped). The situation could still arise that part of or the whole id is missed due to a restart of the device; to date there is no real solution for this problem.

To record audio and detect the frequencies, we also used a producer-consumer pattern with a shared LinkedBlockingQueue. The producer records the audio while the consumer does the group-id detection and the decoding of the received id string.

4.10 Checking for internet connection

Since we need an internet connection to use the application, we need to check whether one is available. Because the Samsung Galaxy Gear does not have a Wi-Fi chip, Bluetooth tethering was used to provide the smartwatch with an internet connection. The detection of a network can easily be done using the ConnectivityManager in Android. This method, however, does not suffice, since the device can be connected to a network without an actual internet connection (when the server hangs, for example). Therefore we ping to test if a web connection is available. Pinging can cost a lot of time, especially on the smartwatch, since it is connected to the internet through Bluetooth tethering.
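The following minimal sketch shows such a two-step-style check; the target host and timeout are illustrative, and on Android this must run off the main thread:

    import java.io.IOException;
    import java.net.InetAddress;

    public class ConnectivitySketch {
        // True only if a host outside the local network answers, which distinguishes
        // "connected to a network" from "actually on the internet".
        static boolean internetAvailable() {
            try {
                // isReachable uses an ICMP echo when permitted, otherwise a TCP echo.
                return InetAddress.getByName("8.8.8.8").isReachable(2000);
            } catch (IOException e) {
                return false;
            }
        }
    }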

Chapter 5

Results and evaluation

In this chapter the evaluation and results of Grouve will be discussed. We begin with a small scale technical study of well-known speech recognition libraries. After this, we present the results of a user study performed with a minimized version of our meeting creation application. Finally, a technical study of the group formation itself is presented, discussing the user base, range, robustness and performance of Grouve.

5.1 Speech recognition libraries technical study

5.1.1 Setup and data use

For the small scale technical study we used the Samsung Galaxy S6 in combination with the Samsung Gear Live smartwatch. Table 5.1 shows the specifications of the devices used. Although the Gear Live did not have Wi-Fi support from the start, the device contains a combined Bluetooth/Wi-Fi antenna; since the Android Wear 1.3 update, the Gear Live does have Wi-Fi support. This Wi-Fi support means that when the Bluetooth connection fails or the phone is out of reach, we can still receive notifications and give voice commands. It can, however, consume a lot of battery when both functions are on, since Bluetooth is the preferred method of connection for the watch. When on the edge of the Bluetooth range, the Gear Live will constantly switch from Bluetooth to Wi-Fi. This constant switching between wireless communication methods causes the high power consumption.

high power consumption. We used the Gear Live paired with the Samsung S6 through Bluetooth while performing the tests described in this section. The reasoning for this is that voice commands often fail when using Wi-Fi, and Bluetooth is the preferred method of communication for Android Wear when both options are available.

While testing we register: which API we used; the sentences tested; the sentences that were recognized; the word error rate (WER)²; and the distance between the smartwatch and the played sentence. To be as correct as possible, the sentences that we test were recorded with the Galaxy S6 and played back accordingly. These sentences are part of the Harvard Sentences³; for this test we picked list 2 (see table 5.2). We recorded all the sentences in a home environment with an average amount of background noise. The APIs that we test are:

Microsoft Cognitive Services speech recognition (formerly known as Project Oxford)
Google speech recognizer
IBM Watson speech to text

Bear in mind that the results depend highly on the quality of the available network; the microphone used on the smartwatch; the quality of the recorded sentence; the quality of the speaker used to play the recording; and the pronunciation and the physical and emotional state of the person whose voice we recorded. Since the reaction time could be bottlenecked by bandwidth, we do not register the time taken to receive a response.

Difficulties while testing

When testing began, there were some issues with the initial setup. When using previously recorded sentences to test the speech recognition APIs, almost none of them returned results. We tested this setup with different distances and different volumes,

2 The WER is often used to report the results of speech recognition tests. It calculates the error rate over recognized words through the formula WER = (S + D + I) / N, where S is the number of substitutions, D the number of deletions, I the number of insertions, and N the number of reference words.
3 The Harvard Sentences are a collection of sample phrases often used for testing and research in telecommunications, speech, and acoustics. Each of the sentences is phonetically balanced and uses particular phonemes at the same frequency as they appear in the English language.
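As a concrete reading of the WER formula in footnote 2, the following sketch computes the minimal number of substitutions, deletions, and insertions with a standard word-level edit-distance dynamic program and divides by the number of reference words. This is a generic textbook implementation, not code from the thesis.

```java
public final class WordErrorRate {

    // WER = (S + D + I) / N, computed as word-level edit distance over N reference words.
    static double wer(String reference, String hypothesis) {
        String[] ref = reference.toLowerCase().split("\\s+");
        String[] hyp = hypothesis.toLowerCase().split("\\s+");
        int[][] dp = new int[ref.length + 1][hyp.length + 1];
        for (int i = 0; i <= ref.length; i++) dp[i][0] = i; // all deletions
        for (int j = 0; j <= hyp.length; j++) dp[0][j] = j; // all insertions
        for (int i = 1; i <= ref.length; i++) {
            for (int j = 1; j <= hyp.length; j++) {
                int sub = dp[i - 1][j - 1] + (ref[i - 1].equals(hyp[j - 1]) ? 0 : 1);
                dp[i][j] = Math.min(sub, Math.min(dp[i - 1][j] + 1, dp[i][j - 1] + 1));
            }
        }
        // dp[ref][hyp] is the minimal S + D + I over all alignments.
        return (double) dp[ref.length][hyp.length] / ref.length;
    }

    public static void main(String[] args) {
        String ref = "the boy was there when the sun rose";
        String hyp = "the boy was there when son rose";
        // One deletion plus one substitution over 8 reference words: WER = 0.25.
        System.out.printf("WER = %.2f%n", wer(ref, hyp));
    }
}
```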

Table 5.1: Technical specifications of the devices used in the small-scale technical study regarding different speech recognition APIs.

                 Samsung Galaxy S6                      Samsung Gear Live
Android version  Android Marshmallow                    Android Wear 1.4
Bluetooth        4.1 Low Energy (LE)                    4.0 Low Energy (LE)
Memory           64 GB, 3 GB RAM                        4 GB internal, 512 MB RAM
Microphone       unknown                                ICS
Battery          Li-Ion 2550 mAh                        Li-Ion 300 mAh (full day of typical use)
Display          5.1 inch Super AMOLED                  1.63 inch Super AMOLED
Sensors          Fingerprint, accelerometer, gyro,      Accelerometer, gyroscope,
                 proximity, compass, barometer,         compass, heart rate monitor
                 heart rate, SpO2
Wi-Fi            yes                                    yes

Table 5.2: Harvard sentences used during testing (list 2).

 1  The boy was there when the sun rose.
 2  A rod is used to catch pink salmon.
 3  The source of the huge river is the clear spring.
 4  Kick the ball straight and follow through.
 5  Help the woman get back to her feet.
 6  A pot of tea helps to pass the evening.
 7  Smoky fires lack flame and heat.
 8  The soft cushion broke the man's fall.
 9  The salt breeze came across from the sea.
10  The girl at the booth sold fifty bonds.

but the issue persisted. That no results were received probably had to do with the frequencies of the recording being filtered out during signal processing. Because of this, we adjusted the setup to use a live speaker as input. With this adjustment, the speech recognition APIs did return correct results. To obtain results that were as uniform as possible, we tried to maintain the same volume and pronunciation; the same person (a 24-year-old female student) therefore spoke all sentences in a regular home environment. We measured the distance between the smartwatch and the mouth of the speaking individual to map the WER to a distance.

Results

In mid-2015 Google stated that the error rate of its speech recognition was around 8 percent⁴. Our specific setup registered a higher overall WER (table A.2). Although the results for Google were acceptable at 15 cm distance, the WER rose steeply at 30 cm and 60 cm. Beyond 70 cm, speech was hardly recognized at all; the API mostly returned "Didn't catch that." We also need to note that during testing, Google often failed to return any result, especially when the distance was greater than 15 cm.

The results of the Microsoft Cognitive Services speech recognition API were overall unacceptable (table A.1). The reason could be the quality of the microphone used for registering the speech, since casual testing on the Samsung S6 itself returned better results. For a smartwatch application in this particular setup, the Microsoft Cognitive Services speech recognition API underperforms.

The IBM Watson speech to text API had an overall acceptable WER at 15 cm (table A.3). When the distance grew, results worsened. Although the WER is overall greater than for the Google API, Watson did not fail once when we tested the speech recognition, and it also held up better when the distance became greater. When we looked at the word alternatives for falsely recognized words, the actual reference words were often positioned high (second position) in the alternatives list.

The conclusion of this technical test is that speech recognition can be used to interact with a smartwatch. The results, however, depend highly on the API used, the distance between the microphone and the speaking individual, and the quality of the microphone itself. Speech recognition on the smartwatch is meant for a short distance (around 15 cm) between the person speaking and the microphone. If used otherwise, results will be dramatically worse or even nonexistent. For the distance measurements and the particular setup used, Watson proved to be the better API.

4 Google says its speech recognition technology now has only an 8 percent WER: google-says-its-speech-recognition-technology-now-has-only-an-8-word-error-rate/

Figure 5.1: Results for the three speech APIs tested on the ten chosen Harvard sentences (panels (a)-(j), one graph per sentence).

User study

We conducted a user study to investigate which type of group formation users prefer: automatic or manual group formation. We try to find a correlation between system usability, task load, and the preferred group formation. Our hypothesis is that the automatic group creation will be deemed easier to use and quicker than the manual alternative. We do suspect, however, that people will choose the manual version over the automatic version, since they will probably have had previous experience with the manual version.

Participants

We recruited a group of 9 people (four female, five male) from the research facility at Bell Labs Antwerp and the University of Hasselt. For recruiting, we used stratified sampling combined with snowball sampling. All participants owned a smartphone. Participants' ages ranged from 21 to 55 years (µ = 34.8, σ = 11.2), with education levels ranging from high school to a Ph.D. Only one of the participants owned a smartwatch and used it regularly for notifications as well as timing purposes. A little over half of the participants (N = 5) had never used speech recognition before. The remaining participants were divided into those who had tried speech recognition (N = 2) and those who often used speech recognition in the car (N = 2). Most participants often created meetings (N = 6), some created meetings from time to time (N = 2), and one participant almost never planned meetings (N = 1).

Apparatus

For the manual group formation, we used the Google Calendar application on a computer (MacBook Pro Retina, 13 inch, mid-2014). The automatic group formation was presented in the form of an adjusted version of the meeting creation application, with the Samsung Galaxy Gear representing the user and the Samsung Galaxy S6 representing another group member. Because the application works the same way with two team members as with three, we judged that the use of only two devices would not affect the results of the user study. Since we wanted to

focus on the kind of group creation and not the meeting creation itself, we adjusted the meeting creation application by pre-setting the parameters of the meeting that we wanted to plan. In this way we avoided the use of speech recognition, so that it could not influence our test results regarding the group formation.

All study participants performed one task with both group formation approaches: create a meeting for next week with all the members present in the room.

Procedure

Initially, the experimenter read a script explaining the purpose of the user study and what we expected of the participants. The experimenter then asked the participants to fill in their demographics and their device and meeting creation experience. Next, the experimenter gave the participants a demo of each device and a task description. The participants were asked to "Create a meeting for next week with everyone present in the room." Everyone in the room meant three people (one person was imaginary): Debbie Gijsbrechts, Kylie S, and Kyle b (all Google accounts) with the e-mail addresses debbiegijsbrechts@gmail.com, thesistesttwo@gmail.com, and thesitestone@gmail.com respectively. Kylie S represented the participant himself. For the automatic group formation, we used only two accounts (Debbie Gijsbrechts and Kylie S), since we only had two working devices at the time. Because the interaction and time span needed to create the group are the same for any number of people, this was not an issue. We added the e-mail address of Kyle b to the contacts of Kylie S (the participant) before the experiment. Since the participants already had contact with Debbie Gijsbrechts through e-mail, we did not add her contact information (e-mail address) to the contacts of Kylie S. This was done to simulate a situation in which a meeting needs to be created with someone whose name is known but who has not been added to the contacts yet.

Each test was performed inside a meeting room with little to no background noise. At first, we wanted to time the task completion with the automatic and the manual group formation, but since the internet connection was variable during task completion, we did not measure the time taken to complete the given task.

After each device condition, the experimenter asked the participant to complete a SUS questionnaire[4] followed by a NASA TLX questionnaire[16]. Since the second part

of the NASA TLX questionnaire was deemed "too hard to fill in" by all participants, we only used the NASA TLX rating scales, sometimes called Raw TLX (RTLX), in which the weighting process is eliminated. After completing both device conditions, the experimenter conducted a semi-structured interview with the participants and asked them what their preferred group creation method was. Later we analyzed the SUS, RTLX, and interview data to find the reasoning behind the preferred group formation technique.

To avoid the order effect⁵, we counterbalanced the participants[23]. This means that we divided all participants into groups and arranged the test conditions (the type of group formation) in a different order for each group. Each interview was video recorded for later analysis. The total time each participant took was about 50 minutes.

Results

We observed the participants and noticed some problems regarding the automatic group formation. Due to the location of the speaker on the smartwatch (on the band, see figure 5.2), sounds sent out for the group formation were sometimes not received by the other device. The reason was that the table on which the participants rested their arm obstructed the speaker of the smartwatch. This particular problem shows that when an object obstructs the speaker in close proximity, the automatic group formation will not work. One participant (U = 5) also stated that he needed additional feedback on whom the application added to a group.

Figure 5.2: The location of the speaker on the Samsung Galaxy Gear. The speaker is located on the band of the smartwatch, more specifically the smartlock. Source image: samsung-galaxy-gear-review-malaysia-price.html.

5 Participants' performance may either improve or worsen due to the order of the conditions tested.
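For reference regarding the two questionnaires used in this study, the sketch below shows how their scores are typically computed: the standard SUS scoring rule (odd items contribute score − 1, even items contribute 5 − score, the sum scaled by 2.5) and the RTLX score as the unweighted mean of the six TLX subscales. This is the textbook computation, not the analysis code used for the thesis.

```java
public final class QuestionnaireScores {

    // Standard SUS scoring: ten items rated 1-5, yielding a 0-100 score.
    static double sus(int[] responses) {
        if (responses.length != 10) throw new IllegalArgumentException("SUS has 10 items");
        int sum = 0;
        for (int i = 0; i < 10; i++) {
            // Odd-numbered items (index 0, 2, ...) contribute (score - 1),
            // even-numbered items (index 1, 3, ...) contribute (5 - score).
            sum += (i % 2 == 0) ? responses[i] - 1 : 5 - responses[i];
        }
        return sum * 2.5; // scales the 0-40 raw sum to 0-100
    }

    // Raw TLX: unweighted mean of the six subscale ratings (0-100 each).
    static double rtlx(double[] scales) {
        if (scales.length != 6) throw new IllegalArgumentException("TLX has 6 scales");
        double sum = 0;
        for (double s : scales) sum += s;
        return sum / 6.0;
    }

    public static void main(String[] args) {
        int[] susResponses = {4, 2, 5, 1, 4, 2, 5, 1, 4, 2}; // example ratings only
        double[] tlxScales = {30, 20, 25, 70, 35, 15};       // MD, PD, TD, Perf, Effort, Frustration
        System.out.printf("SUS = %.1f, RTLX = %.1f%n", sus(susResponses), rtlx(tlxScales));
    }
}
```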
