BookWyrm

KathyReid reviewed The Martian Contingency by Mary Robinette Kowal (Lady Astronaut, #4)

Mary Robinette Kowal: The Martian Contingency (english language, 2025)

5 stars

Another thoughtful, incisive, empathetic, culturally rich, character-driven instalment of the Lady Astronaut series, in which Elma York draws on her hard-won confidence and self-assuredness not just to steer the second Mars expedition, but to forge a path for the future of humanity on Mars - Martianity?

I'm constantly in awe of the level of technical detail, the painstaking industrial research and plausibility of Mary Robinette Kowal's alternate history, and The Martian Contingency does not disappoint. She has created a form of research method with this series that I'm terming "speculative ethnography", where her imagination, coupled with a meticulous grasp of astro-geology, astro-audionics and astro-mathematics, yields rich, nuanced, deeply immersive world-building.

A case in point is the way planetary temporality is contrasted between Earth and Mars - the differing lengths of day and new names for months provide a distancing mechanism from an Earthen identity and the adolescence of a Martian one - reinforced by the creation of new rituals and ways of being - and deliberate choices about what to leave behind on a dying planet - and what to intentionally carry forward to a new one.

Throughout this series, Kowal has played with Elma York's ambiguous relationship with motherhood and non-motherhood; a deliberate choice not to have children with Nathaniel provides a key plot point in this book. This is nimbly set against a broader view of Elma York as a consummate mother figure; guiding, growing, nurturing those in her command and care, and parenting a habitat, a civilisation, a planet?

I remember tears streaming down my face when reading the very first Lady Astronaut short story, where an aged Elma is caring for a dying Nathaniel and faces a heart-wrenching choice: Kowal spends time deftly foreshadowing the degenerative illness that eventually weakens Nathaniel, all the while demonstrating their tenderly tensile tethering.

Five stars.

KathyReid reviewed How Confidence Works by Ian Robertson

Ian Robertson: How Confidence Works (2021, Transworld Publishers Limited)

5 stars

This deeply-researched, authoritative, broad-reaching, insightful and compelling book unpacks the concept of confidence across multiple frames - the pillars of confidence, gender-based differences and contrasts in national confidence between countries.

Casting confidence as a resource that pays compounding dividends, Robertson shows how it is a foundational component of success in career, relationships and other aspects of life. Robertson sees action as the key fuel for confidence - moving from a deliberative to an implemental mindset - breaking down intent into concrete, specific, do-able actions.

A strong recommend for people exploring the theoretical, research-grounded basis of confidence - and those who want to apply those findings to their everyday life.

KathyReid rated Recovering from Emotionally Immature Parents: 5 stars

Recovering from Emotionally Immature Parents by Lindsay C. Gibson

KathyReid reviewed The Dictionary of Lost Words

The Dictionary of Lost Words (Paperback, 2022)

In 1901, the word ‘Bondmaid’ was discovered missing from the Oxford English Dictionary. This is …

3 stars

As a work of historical fiction, The Dictionary of Lost Words provides a gentle introduction to socio-linguistics - that is, why do some words mean different things when said by different people in different contexts?

The historical setting of turn-of-the-century England provides a fertile landscape for exploring not only how social class is marked by dialect and by accent, but how language also serves to construct social divides by reinforcing assumptions, taboos and things "that should not be named". At the same time, I was struck by how the absence of language breeds ignorance; you cannot express a sentiment, a desire, an anger, if you do not possess the words that accurately capture its tenor, its flavour, its characteristics, its nuance. A world of fewer words is the poorer because we cannot capture its complexity and are forced instead to describe it in broad, generalising strokes.

While primarily concerned with how language intersects gender and class, the development of the character of Esme also serves to show how language stratifies people across life stages - the child, the adolescent, the mother, the widow, the maiden aunt - and how words take on new meaning as their context of use shifts.

I desperately wanted Esme, as a character, to recognise the power of the words that were gifted by the characters she interacted with - not just to scribe them or to record their definition verbatim, but somehow to weaponise them, use them as bullets in the world war and in the war of equality in which she was embroiled. Giving women's words ink, and solidifying them in print, may have taken them from the mouths of speakers on the street, but this act did nothing to erode the inequality which bore them, or to upend the indignities of the people who spoke them.

KathyReid rated The Body Keeps the Score: Mind, Brain and Body in Transformation of Trauma: 4 stars

Bessel A. Van Der Kolk: The Body Keeps the Score: Mind, Brain and Body in Transformation of Trauma (Paperback, 2019, Penguin/Iota Publishing Limited)

The Body Keeps the Score: Mind, Brain and Body in Transformation of Trauma by Bessel A. Van Der Kolk

Trauma is a fact of life. Veterans and their families deal with the painful aftermath of combat; one in five …

KathyReid rated Body Keeps the Score - Summarized for Busy People : Brain, Mind, and Body in the Healing of Trauma: 4 stars

Body Keeps the Score - Summarized for Busy People : Brain, Mind, and Body in the Healing of Trauma by Goldmine Reads

KathyReid reviewed Crux: How Leaders Become Strategists by Richard Rumelt

5 stars

Picking up where Good Strategy Bad Strategy left off, Richard P. Rumelt again pens an accessible, illustrative and easily practised guide to developing - and enacting - organisational strategy.

At the heart of both Good Strategy Bad Strategy: The Difference and Why It Matters and The Crux: How Leaders Become Strategists is the assertion that strategy is firstly about problem-solving: deeply understanding the environment, the organisation and the challenges it faces, the resources and advantages it has. This introspection yields the titular "crux" - the key problem that the organisation has to solve.

Secondly, strategy is about harnessing the organisation's resources to a singular pursuit of a path that solves that key problem; for Rumelt, diversification of efforts is dilution of strategy.

Recommended both for early-career managers, and for seasoned leaders who are weary of vague and ill-grounded strategy documents that become shelf-ware.

KathyReid reviewed Sound of the Future by Karl Weber

Karl Weber, Tobias Dengel: Sound of the Future (2023, PublicAffairs)

4 stars

The summary

Written for a business audience, this book has two distinct sections. The first provides a gentle, integrated primer on voice technologies, such as automatic speech recognition (ASR), speech to text (STT), text to speech (TTS) or voice cloning, and natural language processing (NLP), and links these to the human needs fulfilled by voice technology. The second is essentially an extended pitch deck. Unabashedly techno-optimist in outlook, it seeks to grow the market for voice technologies by encouraging the reader to examine their own organisation’s operations for voice technology use cases, and provides a detailed guide to the user research and interface design steps needed to implement a voice technology program.

This is unsurprising, given Tobias Dengel is the CEO of WillowTree, an AI and digital product consulting company recently acquired by TELUS International for USD $1.2 billion – which focuses on gathering training data for AI applications. His expertise in human-computer interaction (HCI) and user-centred design (UCD) is evident in the first half of the book, where voice technologies are continually grounded in user tasks and experiences. In the second, his experience is shown in the methods advocated for exploring voice use cases, with a focus on human-centred design (HCD) methods such as journey mapping. Co-author Karl Weber is an editor; his collaboration with Dengel makes the text readily approachable and succinct: terms unfamiliar to the lay reader are well described, and the use of acronyms is minimal.

The book draws heavily on examples from industry to highlight key claims; however, some of these are now dated. Stanford Open Voice Assistant Lab (OVAL)’s Almond assistant was re-named Genie in 2021, but has not had any active development for over two years, and the research group has pivoted to working primarily in the large language model (LLM) space. The Open Voice Network’s initiatives on trustworthy voice assistants have now been folded into the umbrella of the Linux Foundation. This is perhaps unavoidable in such a fast-moving space.

Part One – Aligning the use of voice technology to the human need for communication

Each chapter in the first half of the book details a particular human need that is met by voice technology.

The Prologue paints a picture of the transformative power of voice tech, showing how it was used to help those physically impaired to be able to communicate again – using speech – the most natural form of communication.

The Introduction makes a bolder claim – that voice is a technological revolution – akin to the internet or to the mobile phone: nascent, latent, reaching a tipping point of “ubiquity and popularity” that we should all be prepared for lest it catch us unawares. While acknowledging that voice tech is currently limited in application, and harbours a panoply of challenges, the authors hand-wave these away, pointing to the rapid advances being made across the vibrant voice tech ecosystem – inhabited by companies such as ReadSpeaker, SoundHound, Cerence and others. The sizeable investments made in voice are given as evidence for the technological revolution, but differentiated from over-hyped failures such as blockchain and the metaverse in that voice “fulfills basic human needs”, which are articulated in subsequent chapters.

Speed makes the case for “even marginal improvements in speed/efficiency” when designing user interfaces, highlighting examples such as search engines and online shopping websites to reinforce the point that speaking to machines is often quicker than typing to them. It imagines a world where the keyboard is eschewed in favour of the microphone as the primary mode of data input, because this is faster – and time is money. The physical toll of such a change – can you imagine speaking for the same amount of time you type? – is left unexamined. I wonder what Mica Endsley or other human factors scholars would make of this claim.

The next chapter demonstrates how voice technology meets the need of Safety – by being available to assist when the user is physically incapacitated. One claim made in this chapter is particularly contentious: that having a voice assistant in the cockpit would “prevent crashes and save lives”. While plane <-> tower communication is definitely a contributory factor to many incidents, there is no discussion here of the complexity introduced by voice assistants. Imagine, for example, the utterance “engine one out!” being mis-transcribed as “engine won naught!”. Sure, the language model can be weighted for cockpit utterances, but mis-transcription is still rife, even in state of the art systems (Whisper, for example, has a 9.3% Word Error Rate as tested on Common Voice 15).
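To make the Word Error Rate figure concrete, here is a minimal sketch of how WER is commonly computed: the word-level edit distance (substitutions, deletions and insertions) divided by the number of words in the reference transcript. The function name `word_error_rate` and the simple lowercase-and-split tokenisation are my own assumptions for illustration, not how any particular benchmark normalises text.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumes naive tokenisation: lowercase, split on whitespace.
    Real evaluations usually apply more careful text normalisation.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# The cockpit example: two of the three words are substituted.
print(word_error_rate("engine one out", "engine won naught"))
```

On this measure the cockpit mis-transcription scores a WER of roughly 67%, even though it sounds nearly identical to the original; the metric counts words, not consequences.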

Knowledge makes the case for voice technology as an interface to the world’s information. Rather than having data at your fingertips, it’s now available on the tip of your tongue – overcoming the limitations of screen real estate. Dengel and Weber also make the case here for voice where users are not computer literate: you don’t have to know how to use a computer to ask a question of a voice assistant. What is not well explained here is that access to knowledge is mediated through millions of APIs – and to curate or synthesise them requires additional capabilities. The potential for commercialisation to skew results in a particular way (such as booking sites preferencing those providers that pay them the most) is left unaddressed. This chapter also touches on voice technology as one of many anticipatory systems – having predictive capabilities through audio feature detection to infer an event is about to happen, and respond. What isn’t covered is the downside of this form of machine surveillance, covered well by researchers Joel Stern, Sean Dockray and James Parker in their Machine Listening: Exposed collaboration.

In the chapter on Inclusion, the authors make the case for voice technology building “a more inclusive society”, pointing to advancements in screen readers, speech to text and smart hearing aids as mechanisms that help in “…liberating and empowering individuals who have too long been excluded from mainstream society…”. The challenge of machine translation for the world’s 7100 spoken languages is also addressed, and inequities in the availability of tooling for under-resourced languages and the existing Anglo-centrism of the tech sector are, quite rightly, highlighted. Kathleen Siminyu’s work with Common Voice’s East Africa project, which is providing speech data and tools for the Kiswahili project, gets a mention, which delighted me, although when I chatted with her, she was unaware of being featured. Absent was any argument for addressing the lack of investment in low-yield languages – languages whose speakers are not “profitable”. This is likely to remain the purview of NGOs and governments for the foreseeable future, lamentably.

Engagement makes the case for voice technology making life “more creative, entertaining and enjoyable”, using radio and television as previously emerging technologies that were fun to use, which drove adoption. Dengel and Weber speculate about what might happen to voice actors in a time of synthesised voices, seeing both the economic reality of the cost of live narration, and, counter-intuitively, the increasing value of human voices in a soundscape saturated by synthetic speech. They go on to link voice tech to the metaverse and to virtual reality, showing how it is a necessary building block in “multi-modal” experiences. Again, there was no concomitant discussion of the ethics of synthetic speech – and importantly, how “synthetification” – the growing movement to synthetic media – shapes power relations, labour relations and who profits.

The chapter on Transformation ties voice technology to “fundamental changes to business models”, through mechanisms such as voice identification through biometrics, and the aggregation of services to provide a streamlined, personal offering. It covers the move from click-through rate (CTR) in screen advertising to say-through rate (STR) for voice-enabled advertising; again however, it does not explore the ethical or societal issues such changes might bring. I’m reminded here of Joseph Turow's excellent The Voice Catchers: How Marketers Listen In to Exploit Your Feelings, Your Privacy, and Your Wallet – and how voice is being used as a mechanism to target advertising. The chapter goes one step further, exploring the use of vocal biomarkers in health – but again, without the attendant discussion of unintended consequences. Who stands to benefit if a disease can be diagnosed simply through speaking?

Part Two – A program of work for implementing voice technology use cases within the enterprise

Part Two of The Sound of the Future moves from explicating use cases for voice technology to encouraging the reader to implement them, with attendant advice on strategies for doing so.

The chapter on Falling Barriers traces the recent history of voice assistants like Siri and Alexa, positing that what people really want is something more akin to an “all-purpose valet”. This leads into a discussion on technology breakthroughs, and the factors which incentivise them, and uses the COVID-19 pandemic as a case in point – where hands-free, remote interaction provided by voice-enabled devices helped practitioners avoid infection. Here, I would have enjoyed more grounding in the various innovation theories; however, this book is clearly aimed at a business, rather than academic, audience. The chapter goes on to outline the key layers of the voice technology stack, such as automatic speech recognition (ASR), natural language processing (NLP) and conversational AI, providing a precis of the current state of the art of each, and remaining barriers. The paradigm of “multi-modal interaction” is then introduced, situating voice technologies alongside haptics and visual interfaces as a constellation of interfaces that collectively are shifting how we sense and respond to our cyber-physical world. User trust in voice technologies is then introduced as another barrier which must be overcome to ease widespread adoption, in particular citing the Trustmark Initiative from the Open Voice Network as a signal that this barrier is falling. The chapter concludes with an overview of how Dengel sees trajectories of development in voice technology, from automation to business process redesign, to transformation of business models.

Making voice an integral part of your existing business systems encourages the reader to “seize the opportunities” voice technologies present, by first identifying places where voice technology could be integrated into existing business systems. The authors provide a helpful list of six principles for assessing whether an interaction is well suited to voice integration, and go on to use examples from industry to highlight how these principles are applied.

In the Training voice tools to understand your world chapter, the authors cover a problem that has long faced voice technology practitioners – the domain specific nature of spoken language. The utterance (spoken phrase) “twelve fifty” has very different meanings in different contexts – it could mean twelve pounds fifty, 12.50pm, 1250g and so on. The advice here is for organisations to identify the “friction points” their customers face, using tools such as journey mapping to better understand those contexts. The chapter goes on to advocate for prototyping of voice technology tools, using UX methods to elicit feedback to guide iterative development, and ensure that the intent of the user – the task the user wants to perform – is matched by the system. The concepts of error flow handling and conversational repair mechanisms are covered here too – essentially serving as a primer on voice user interface design.
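The “twelve fifty” problem can be sketched in a few lines. This is purely illustrative and not something the book provides: the context names (`price`, `time`, `weight`), the tiny `NUMBER_WORDS` lookup and the `interpret` function are all hypothetical, and a real system would infer the context from the conversation rather than be handed it.

```python
# Hypothetical lookup covering only the words in the example utterance.
NUMBER_WORDS = {"twelve": 12, "fifty": 50}

# Hypothetical domain contexts: each maps the same two spoken numbers
# to a different written meaning.
INTERPRETATIONS = {
    "price": lambda first, second: f"£{first}.{second:02d}",
    "time": lambda first, second: f"{first}:{second:02d}",
    "weight": lambda first, second: f"{first * 100 + second}g",
}


def interpret(utterance: str, context: str) -> str:
    """Resolve a two-word number utterance within a given domain context.

    Raises KeyError for unknown words or contexts, and ValueError if the
    utterance is not exactly two words long.
    """
    first, second = (NUMBER_WORDS[word] for word in utterance.lower().split())
    return INTERPRETATIONS[context](first, second)


for context in ("price", "time", "weight"):
    print(context, "->", interpret("twelve fifty", context))
```

The same transcript yields three different values depending on where the user is in their journey, which is why the chapter leans on journey mapping: the transcription can be perfect and the interpretation still wrong.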

Designing and redesigning the multimodal user experience makes the case for voice technologies as part of an omni-channel digital user experience, highlighting voice’s place in an overall brand experience. It discusses how voice can be used to augment and reinforce other digital channels, such as text-based chatbots or graphical user interfaces. Thankfully, there is little hype about the metaverse, which, given its current white elephant status in industry, would detract from the argument. The claim here is not that voice technology is transformative on its own, but that its transformative power emerges in concert with other technology. The chapter includes advice on how to plan an iterative voice UX (VUX) experience design process, and, also pleasingly, highlights the need for inter-disciplinary teams and executive support.

The concluding paragraphs reiterate the argument that “successful new technology is about meeting basic human needs”, and that to be successful, companies must adopt voice – or face defeat in the marketplace.

The verdict

This book is helpful for businesses that are making their first forays into voice assistants, voice user experience (VUX) or conversational AI, in particular those coming to it from a product management or business analysis background. The use cases for voice are expansively surveyed, and applicable to many industries. However, the technical detail is too light for those needing a deeper guide to the pitfalls of voice technologies, such as accent bias in speech recognition, ambiguous named entity recognition in natural language processing or the privacy dangers of voice cloning.

Moreover, while Dengel and Weber correctly identify that many threads of innovation underpin the current state of the art in voice technology – hardware improvements, advances in deep learning and neural networks, and the availability of more speech data upon which machine learning models may be trained – they gloss over the many challenges in the space: the trust people need to have in assistants, the poor performance of voice technology for accented or disordered speech, the privacy and ethical challenges of requiring user data to be effective, and above all, the question of who profits from speech data gathered from individual people.

Gesturing to previous technological path dependencies, they hold that:

“… this is the story of any new technical wave. It takes years for entrepreneurs, designers, and engineers to shift their thinking to take full advantage of any new technological paradigm.” - Introduction

When I think of the precursors to today’s voice and speech technologies – from Audrey and the Shoebox, through Harpy and Tangora, to Dragon NaturallySpeaking, and even as far back as Christian Gottlieb Kratzenstein’s work on synthesising human speech with the “vowel organ” – I can’t help but wonder: are voice technologies really a new technical wave? And in taking full advantage of this new technological paradigm, who is it that is taken advantage of?

If voice is the sound of the future, then we must have other conversations about what that future sounds like – and whose voices are heard.

KathyReid reviewed The Relentless Moon by Mary Robinette Kowal (Lady Astronaut, #3)

Mary Robinette Kowal: The Relentless Moon (EBook, 2020, Tom Doherty Associates)

The Earth is coming to the boiling point as the climate disaster of the Meteor …

5 stars

Mary Robinette Kowal masterfully weaves a rich tapestry of endearing and nuanced characters, skilful and meticulously researched science, and an alternative history that is poignantly plausible.

Written in first-person perspective, her choice of a female protagonist, one whose interpersonal skills are as sharp as her aeronautical ones, gives us a multi-layered insight into the politics of space.

The Relentless Moon may be set in the past, but it serves as a parable for the future - what does it take to lead humanity to save itself?

KathyReid reviewed Sound of the Future by Karl Weber

Review of 'Sound of the Future' on 'Goodreads'

4 stars

The summary

Written for a business audience, this book has two distinct sections. The first provides a gentle, integrated primer on voice technologies, such as automatic speech recognition (ASR), speech to text (STT), text to speech (TTS) or voice cloning, and natural language processing (NLP), and links these to the human needs fulfilled by voice technology. The second is essentially an extended pitch deck. Unabashedly techno-optimist in outlook, it seeks to grow the market for voice technologies by encouraging the reader to examine their own organisation’s operations for voice technology use cases, and provides a detailed guide to the user research and interface design steps needed to implement a voice technology program.

This is unsurprising, given [a:Tobias Dengel|27212062|Tobias Dengel|https://s.gr-assets.com/assets/nophoto/user/u_50x66-632230dc9882b4352d753eedf9396530.png] is the CEO of WillowTree, an AI and digital product consulting company recently acquired by TELUS international for $USD 1.2 billion – which focuses on gathering training data for AI applications. …

The summary

Written for a business audience, this book has two distinct sections. The first provides a gentle, integrated primer on voice technologies, such as automatic speech recognition (ASR), speech to text (STT), text to speech (TTS) or voice cloning, and natural language processing (NLP), and links these to the human needs fulfilled by voice technology. The second is essentially an extended pitch deck. Unabashedly techno-optimist in outlook, it seeks to grow the market for voice technologies by encouraging the reader to examine their own organisation’s operations for voice technology use cases, and provides a detailed guide to the user research and interface design steps needed to implement a voice technology program.

This is unsurprising, given [a:Tobias Dengel|27212062|Tobias Dengel|https://s.gr-assets.com/assets/nophoto/user/u_50x66-632230dc9882b4352d753eedf9396530.png] is the CEO of WillowTree, an AI and digital product consulting company recently acquired by TELUS international for $USD 1.2 billion – which focuses on gathering training data for AI applications. His expertise in human-computer interaction (HCI) and user-centred design (UCD) is evident in the first half of the book, where voice technologies are continually grounded in user tasks and experiences. In the second, his experience is shown in the methods advocated for exploring voice use cases, with a focus on HCD methods such as journey mapping. Co-author [a:Karl Weber|213951|Karl Weber|https://images.gr-assets.com/authors/1337042384p2/213951.jpg] is an editor; his collaboration with Dengel makes the text readily approachable and succinct; terms unfamiliar to the lay reader are well described, and the use of acronyms is minimal.

The book draws heavily on examples from industry to highlight key claims, however some of these are now dated. Stanford Open Voice Assistant Lab (OVAL)’s Almond assistant was re-named Genie in 2021, however has not had any active development for over two years, and the research group has pivoted to working primarily in the large language model (LLM) space. The Open Voice Network’s initiatives on trustworthy voice assistants have now been folded into the umbrella of the Linux Foundation. This is perhaps unavoidable in such a fast-moving space.

Part One – Aligning the use of voice technology to the human need for communication

Each chapter in the first half of the book details a particular human need that is met by voice technology.

The Prologue paints a picture of the transformative power of voice tech, showing how it was used to help those physically impaired to be able to communicate again – using speech – the most natural form of communication.

The Introduction makes a bolder claim – that voice is a technological revolution – akin to the internet or to the mobile phone: nascent, latent, reaching a tipping point of “ubiquity and popularity” that we should all be prepared for lest it catch us unawares. While acknowledging that voice tech is currently limited in application, and harbours a panoply of challenges, the authors hand-wave these away, pointing to the rapid advances being made across the vibrant voice tech ecosystem – inhabited by companies such as ReadSpeaker, SoundHound, Cerence and others. The sizeable investments made in voice are given as evidence for the technological revolution, but differentiated from over-hyped failures such as blockchain and the metaverse in that voice “fulfills basic human needs”, which are articulated in subsequent chapters.

Speed makes the case for “even marginal improvements in speed/efficiency” when designing user interfaces, highlighting examples such as search engines and online shopping websites to reinforce the point that speaking to machines is often quicker than typing to them. It imagines a world where the keyboard is eschewed in favour of the the microphone as the primary mode of data input, because this is faster – and time is money. The physical toll of such a change – can you imagine speaking for the same amount of time you type? – is left unexamined. I wonder what Mica Endsley or other human factors scholars would make of this claim.

The next chapter demonstrates how voice technology meets the need of Safety – by being available to assist when the user is physically incapacitated. There is a claim made in this chapter that was particularly contentious: that having a voice assistant in the cockpit would “prevent crashes and save lives”. While plane <-> tower communication is definitely a contributory factor to many incidents, there is no discussion here of the complexity introduced by voice assistants. Imagine, for example, the utterance engine one out! being mis-transcribed as engine won naught!. Sure, the language model can be weighted for cockpit utterances, but mis-transcription is still rife, even in state of the art systems (Whisper, for example, has a 9.3% Word Error Rate as tested on Common Voice 15).

Knowledge makes the case for voice technology as an interface to the world’s information. Rather than having data at your finger tips, it’s now available on the tip of your tongue – overcoming the limitations of screen real estate. Dengel and Weber also make the case here for voice where users are not computer literate: you don’t have to know how to use a computer to ask a question of a voice assistant. What is not well explained here is that access to knowledge is mediated through millions of APIs – and to curate or synthesise them requires additional capabilities. The potential for commercialisation to skew results in a particular way (such as booking sites preferencing those providers that pay them the most) is left unaddressed. This chapter also touches on voice technology as one of many anticipatory systems – having predictive capabilities through audio feature detection to infer an event is about to happen, and respond. What isn’t covered is the downside of this form of machine surveillance, covered well by researchers Joel Stern, Sean Dockray and James Parker in their Machine Listening: Exposed collaboration.

In the chapter on Inclusion, the authors make the case for voice technology building “a more inclusive society”, pointing to advancements in screen readers, speech to text and smart hearing aids as mechanisms that help in “…liberating and empowering individuals who have too long been excluded from mainstream society…”. The challenge of machine translation for the world’s 7100 spoken languages is also addressed, and inequities in the availability of tooling for under-resourced languages and the existing Anglo-centrism of the tech sector, quite rightly, highlighted. Kathleen Siminyu’s work with Common Voice’s East Africa project, which is providing speech data and tools for the Kiswahili project, gets a mention, which delighted me, however when chatting with her, she was unaware of being featured. Absent was any argument for addressing the lack of investment in low-yield languages – languages whose speakers are not “profitable”. This is likely to remain the purview of NGOs and governments for the foreseeable future, lamentably.

Engagement makes the case for voice technology making life “more creative, entertaining and enjoyable”, using radio and television as previously emerging technologies that were fun to use, which drove adoption. Dengel and Weber speculate about what might happen to voice actors in a time of synthesised voices, seeing both the economic reality of the cost of live narration, and, counter-intuitively, the increasing value of human voices in a soundscape saturated by synthetic speech. They go on to link voice tech to the metaverse and to virtual reality, showing how it is a necessary building block in “multi-modal” experiences. Again, there was no concomitant discussion of the ethics of synthetic speech – and importantly, how “synthetification” – the growing movement to synthetic media – shapes power relations, labour relations and who profits.

The chapter on Transformation ties voice technology to “fundamental changes to business models”, through mechanisms such as voice identification through biometrics, and the aggregation of services to provide a streamlined, personal offering. It covers the move from click-through rate (CTR) in screen advertising to say-through rate (STR) for voice-enabled advertising; again however, it does not explore the ethical or societal issues such changes might bring. I’m reminded here of [a:Joseph Turow|41401|Joseph Turow|https://s.gr-assets.com/assets/nophoto/user/u_50x66-632230dc9882b4352d753eedf9396530.png]'s excellent [b:The Voice Catchers: How Marketers Listen In to Exploit Your Feelings, Your Privacy, and Your Wallet|55457694|The Voice Catchers How Marketers Listen In to Exploit Your Feelings, Your Privacy, and Your Wallet|Joseph Turow|https://i.gr-assets.com/images/S/compressed.photo.goodreads.com/books/1605358687l/55457694.SY75.jpg|86483898] – and how voice is being used as a mechanism to target advertising. The chapter goes one step further, exploring the use of vocal biomarkers in health – but again, without the attendant discussion of unintended consequences. Who stands to benefit if a disease can be diagnosed simply through speaking?

Part Two – A program of work for implementing voice technology use cases within the enterprise

Part Two of The Sound of the Future moves from explicating use cases for voice technology to encouraging the reader to implement them, with attendant advice on strategies for doing so.

The chapter on Falling Barriers traces the recent history of voice assistants like Siri and Alexa, positing that what people really want is something more akin to an “all-purpose valet”. This leads into a discussion on technology breakthroughs, and the factors which incentivise them, and uses the COVID-19 pandemic as a case in point – where hands-free, remote interaction provided by voice-enabled devices helped practitioners avoid infection. Here, I would have enjoyed more grounding on the various innovation theories, however this book is clearly aimed at a business, rather than academic, audience. The chapter goes on to outline the key layers of the voice technology stack, such as automatic speech recognition (ASR), natural language processing (NLP) and conversational AI, providing a precis of the current state of the art of each, and remaining barriers. The paradigm of “multi-modal interaction” is then introduced, situating voice technologies alongside haptics and visual interfaces as a constellation of interfaces that collectively are shifting how we sense and respond to our cyber-physical world. User trust in voice technologies is then introduced as another barrier which must be overcome to ease widespread adoption, in particular citing the Trustmark Initiative from the Open Voice Network as a signal that this barrier is falling. The chapter concludes with an overview of how Dengel sees trajectories of development in voice technology, from automation to business process redesign, to transformation of business models.

Making voice an integral part of your existing business systems encourages the reader to “seize the opportunities” voice technologies present, by first identifying places where voice technology could be integrated into existing business systems. The authors provide a helpful list of six principles for assessing whether an interaction is well suited to voice integration, and go on to use examples from industry to highlight how these principles are applied.

In the Training voice tools to understand your world chapter, the authors cover a problem that has long faced voice technology practitioners – the domain-specific nature of spoken language. The utterance (spoken phrase) “twelve fifty” has very different meanings in different contexts – it could mean twelve pounds fifty, 12.50pm, 1250g and so on. The advice here is for organisations to identify the “friction points” their customers face, using tools such as journey mapping to better understand those contexts. The chapter goes on to advocate for prototyping of voice technology tools, using UX methods to elicit feedback to guide iterative development, and to ensure that the intent of the user – the task the user wants to perform – is matched by the system. The concepts of error flow handling and conversational repair mechanisms are covered here too – essentially serving as a primer on voice user interface design.
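
To make the “twelve fifty” ambiguity concrete, here is a minimal sketch of my own (it is not taken from the book). The function name, the context labels and the lookup table are all hypothetical, and a real system would infer context from the dialogue rather than take it as an argument.

```python
# Hypothetical illustration only: the same utterance resolves differently
# depending on the domain context the system believes it is in.
def interpret_twelve_fifty(context: str) -> str:
    """Return one plausible reading of the utterance "twelve fifty"."""
    readings = {
        "payment": "GBP 12.50",  # twelve pounds fifty
        "time": "12.50pm",       # a time of day
        "weight": "1250g",       # a quantity in grams
    }
    # With no known context, a conversational repair step is needed:
    # the system should ask rather than guess.
    return readings.get(context, "ambiguous: ask the user to clarify")


if __name__ == "__main__":
    for ctx in ("payment", "time", "weight", "unknown"):
        print(ctx, "->", interpret_twelve_fifty(ctx))
```

The fallback branch stands in for the conversational repair mechanisms the chapter describes: when the intent cannot be matched, the system asks rather than assumes.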

Designing and redesigning the multimodal user experience makes the case for voice technologies as part of an omni-channel digital user experience, highlighting voice’s place in an overall brand experience. It discusses how voice can be used to augment and reinforce other digital channels, such as text-based chatbots or graphical user interfaces. Thankfully, there is little hype about the metaverse, which, given its current white elephant status in industry, would detract from the argument that voice technology on its own is transformative. The argument here is that its transformative power emerges in concert with other technology. The chapter includes advice on how to plan an iterative voice UX (VUX) experience design process, and, also pleasingly, highlights the need for inter-disciplinary teams and executive support.

The concluding paragraphs reiterate the argument that “successful new technology is about meeting basic human needs”, and that to be successful, companies must adopt voice – or face defeat in the marketplace.

The verdict

This book is helpful for businesses who are making their first forays into voice assistants, voice user experience (VUX) or conversational AI, in particular those coming to it from a product management or business analysis background. The use cases for voice are expansively surveyed, and applicable to many industries. However, the technical detail is too light for those needing a deeper guide to the pitfalls of voice technologies, such as accent bias in speech recognition, ambiguous named entity recognition in natural language processing or the privacy dangers of voice cloning.

Moreover, while Dengel and Weber correctly identify that many threads of innovation underpin the current state of the art in voice technology – hardware improvements, advances in deep learning and neural networks, and the availability of more speech data upon which machine learning models may be trained – they gloss over the many challenges in the space: the trust people need to have in assistants, the poor performance of voice technology for accented or disordered speech, the privacy and ethical challenges of requiring user data to be effective, and above all, the question of who profits from speech data gathered from individual people.

Pointing to previous technological path dependencies, they hold that

“… this is the story of any new technical wave. It takes years for entrepreneurs, designers, and engineers to shift their thinking to take full advantage of any new technological paradigm.” - Introduction

When I think of the precursors to today’s voice and speech technologies – from Audrey and the Shoebox, through Harpy and Tangora, to Dragon NaturallySpeaking, and even as far back as Christian Gottlieb Kratzenstein’s work on synthesising human speech with the “vowel organ” – I can’t help but wonder: are voice technologies really a new technical wave? And in taking full advantage of this new technological paradigm, who is it that is taken advantage of?

If voice is the sound of the future, then we must have other conversations about what that future sounds like – and whose voices are heard.
