The following Tutorials have been accepted at IEEE MLSP 2023:

Spatial Audio Signal Processing – Overview And Machine Learning Challenges

Spatial audio is a current and rapidly expanding topic in signal processing. It is directly related to virtual and augmented reality applications, which have attracted considerable attention from the research community and industry in recent years. Spatial audio poses many challenges, some of which overlap with common challenges in audio signal processing, such as speech enhancement and speech separation. However, spatial audio processing generalizes these tasks to multichannel audio, and the output must also be multichannel so that it preserves the spatial information in the signal and can be used for sound reproduction, for example over headphones. While standard signal processing methods, such as array processing and spatial filtering, time-frequency masking, and minimum-variance designs, have been widely used in spatial audio processing, recent progress in machine and deep learning presents new opportunities. It has recently become evident that data-driven approaches improve performance in many spatial audio processing tasks, such as speech enhancement, speech separation, and binaural reproduction.
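To illustrate why the output must stay multichannel, here is a minimal sketch of time-frequency masking applied to a two-channel (binaural) signal. It assumes the STFT has already been computed; the toy bin values and the mask are hypothetical and chosen only for illustration, and this is not presented as the tutorial's method. Because both bins of each left/right pair are scaled by the same real gain, the interaural level and phase differences of every bin with a nonzero gain are unchanged, so the spatial cues of the retained sources survive the enhancement step.

```python
import cmath
import math


def apply_mask(channels, mask):
    """Scale every channel's STFT bins by the same real-valued time-frequency mask.

    channels: list of channels; each channel is a list of frames of complex bins.
    mask:     list of frames of real gains, with the same frame/bin layout.
    """
    return [
        [[gain * bin_ for gain, bin_ in zip(mask_frame, frame)]
         for mask_frame, frame in zip(mask, channel)]
        for channel in channels
    ]


def interaural_cues(left_bin, right_bin):
    """Level difference (dB) and phase difference (rad) between two complex bins."""
    ratio = left_bin / right_bin
    return 20 * math.log10(abs(ratio)), cmath.phase(ratio)


if __name__ == "__main__":
    # Hypothetical single-frame, two-bin binaural STFT (illustrative values only).
    left = [[1 + 1j, 0.5 + 0j]]
    right = [[0.5 + 0.5j, 0.25j]]
    mask = [[0.9, 0.1]]  # keep bin 0, strongly attenuate bin 1
    out_left, out_right = apply_mask([left, right], mask)
    for k in range(2):
        # Cues before and after masking should match for every nonzero gain.
        print(interaural_cues(left[0][k], right[0][k]),
              interaural_cues(out_left[0][k], out_right[0][k]))
```

A gain of exactly zero removes a bin entirely, in which case its interaural cues are no longer defined; this is why the check only concerns bins with nonzero gain.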

A challenge of practical importance in this field can be described by the following chain: recording sound with only a few microphones mounted on a device; removing undesired noise and interfering speakers while preserving the spatial information of the desired sources; and finally reproducing the signals over headphones, giving a human listener a realistic auditory experience. This goal is still far from achieved, and machine and deep learning will play an important role in bridging the gap in the coming years.

Tutorial Outline:

  • Introducing spatial audio through applications
  • Overview of the spatial audio signal processing chain
  • Binaural recording – the simplest approach
  • Ambisonics – the standard format for spatial audio
  • Beamforming based spatial audio – signal processing and extension to wearable arrays
  • Parametric audio – structured data-based approach
  • Machine and deep learning approaches for spatial audio
  • Current challenges and opportunities for machine and deep learning in spatial audio
  • Audio demo for participants – spatial audio over headphones

Speakers: Boaz Rafaely, Hanan Beit-On, Or Berbi (Ben-Gurion University of the Negev, Israel)

Duration: 2 hours

Bi-Level Optimization In Signal Processing And Machine Learning: Foundations And Emerging Applications

This tutorial aims to offer the audience a holistic understanding of bi-level optimization (BLO), an emerging topic that is gaining popularity because of its applicability to many modern signal processing (SP) and machine learning (ML) problems. BLO can be used to tackle a wide variety of SP and ML challenges, such as dynamic resource allocation, robust AI, efficient AI, and generalizable AI. Our tutorial will systematically review various aspects of BLO, including theoretical foundations, algorithm development, and practical applications. In addition to technical presentations, we will offer a carefully designed Demo Expo showcasing practical numerical implementations of BLO methods and their applications.
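For readers new to the topic, a bi-level problem nests one optimization inside another: an upper-level objective f is minimized over x, where y must itself solve a lower-level problem defined by g. The notation below is ours, not the tutorial's. The second line is the standard implicit-function (hypergradient) expression, which holds under the assumption that g is twice differentiable and strongly convex in y, so that y*(x) is unique; all derivatives are evaluated at (x, y*(x)).

```latex
\begin{aligned}
&\min_{\mathbf{x}} \; F(\mathbf{x}) := f\big(\mathbf{x}, \mathbf{y}^*(\mathbf{x})\big)
\quad \text{s.t.} \quad
\mathbf{y}^*(\mathbf{x}) = \arg\min_{\mathbf{y}} \; g(\mathbf{x}, \mathbf{y}), \\
&\nabla F(\mathbf{x}) = \nabla_{\mathbf{x}} f
- \nabla^2_{\mathbf{x}\mathbf{y}} g \,\big[\nabla^2_{\mathbf{y}\mathbf{y}} g\big]^{-1} \nabla_{\mathbf{y}} f .
\end{aligned}
```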

We develop this overarching goal along several directions, each grounded in contemporary and classical works, including our own prior efforts. The outline below covers recent theoretical developments in BLO, its applications in SP and ML, and its efficient implementation.
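As a concrete, hypothetical illustration (our own toy problem, not taken from the tutorial), consider a scalar upper-level objective f(x, y) = 0.5(y - t)^2 + 0.5·lam·x^2 and a lower-level objective g(x, y) = 0.5·a·y^2 - b·x·y, whose minimizer is y*(x) = b·x/a. The sketch below alternates a few warm-started gradient steps on g, which approximate y*(x), with one outer step along the implicit-function hypergradient. This is one of several common BLO strategies. The closed-form optimum is used only as a check, and all parameter values are assumptions chosen for illustration.

```python
def bilevel_toy(a=2.0, b=1.0, t=3.0, lam=0.5,
                outer_lr=0.5, inner_lr=0.2, inner_steps=20, outer_steps=200):
    """Approximate-implicit-gradient BLO on a scalar toy problem (hypothetical example).

    Upper level: f(x, y) = 0.5 * (y - t)**2 + 0.5 * lam * x**2
    Lower level: g(x, y) = 0.5 * a * y**2 - b * x * y, so y*(x) = b * x / a
    """
    x, y = 0.0, 0.0
    for _ in range(outer_steps):
        # Inner loop: warm-started gradient descent on g approximates y*(x).
        for _ in range(inner_steps):
            y -= inner_lr * (a * y - b * x)
        # Hypergradient from the implicit function theorem:
        # dF/dx = df/dx - (d2g/dxdy) * (d2g/dy2)**-1 * df/dy
        df_dx = lam * x
        df_dy = y - t
        d2g_dxdy = -b
        d2g_dy2 = a
        x -= outer_lr * (df_dx - d2g_dxdy / d2g_dy2 * df_dy)
    return x, y


def closed_form_x(a=2.0, b=1.0, t=3.0, lam=0.5):
    """Exact upper-level minimizer of F(x) = f(x, b * x / a), used only as a check."""
    r = b / a
    return r * t / (r * r + lam)


if __name__ == "__main__":
    x_hat, y_hat = bilevel_toy()
    print(x_hat, y_hat, closed_form_x())
```

In realistic problems the lower-level Hessian cannot be inverted explicitly, and the inverse-Hessian-vector product is usually approximated iteratively (for example with conjugate gradient or a truncated Neumann series).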

Tutorial Outline:

  • Introduction to BLO (20 minutes; Liu & Hong)
  • Theoretical foundations of BLO (40 minutes + 15 minutes Q&A and break; Khanduri & Hong)
  • BLO at scale (35 minutes + 5 minutes Q&A; Liu & Zhang)
  • Demo Expo (I): Demonstration of BLO algorithms implementation (30 minutes + 15 minutes Q&A and break; Zhang & Khanduri)
  • BLO-oriented SP and ML applications (35 minutes + 5 minutes Q&A; Hong & Liu)
  • Demo Expo (II): BLO toolbox and use cases (30 minutes; Zhang & Khanduri)
  • Conclusion and Q&A (10 minutes; Liu & Hong)

Speakers: Yihua Zhang (Michigan State University, MI, USA), Prashant Khanduri (Wayne State University, MI, USA), Ioannis Tsaknakis (University of Minnesota, MN, USA), Mingyi Hong (University of Minnesota, MN, USA), Sijia Liu (Michigan State University, MI, USA)

Duration: 4 hours

Model-Based Deep Learning 

Recent years have witnessed dramatically growing interest in machine learning (ML). Data-driven trainable ML structures have demonstrated unprecedented success in various applications, including computer vision and speech processing. ML-driven techniques offer two main benefits over traditional model-based approaches. First, ML methods are independent of the underlying stochastic model, and can thus operate efficiently in scenarios where this model is unknown or its parameters cannot be accurately estimated. Second, when the underlying model is extremely complex, ML can extract meaningful information from the observed data. Nonetheless, not every problem should be solved using deep neural networks (DNNs). In scenarios where model-based algorithms exist and are feasible, these analytical methods are typically preferable to ML schemes owing to their performance guarantees and, in some cases, proven optimality. Notable areas where model-based schemes are typically preferable, and whose characteristics differ fundamentally from those of conventional deep learning applications, include communications, coding, and signal processing. In this tutorial, we present methods for combining DNNs with model-based algorithms. We will show hybrid model-based/data-driven implementations that arise from classical methods in communications and signal processing, and demonstrate how fundamental classical techniques can be implemented without knowledge of the underlying statistical model while achieving improved robustness to uncertainty.
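One well-known way of combining a model-based algorithm with learning is deep unfolding. In this approach, a fixed number of iterations of an iterative solver become the layers of a network, and each layer has its own trainable parameters. The sketch below is our illustration and not necessarily a method covered in the tutorial. It shows the forward pass of the iterative soft-thresholding algorithm (ISTA) for sparse recovery, unfolded in a simplified form in the spirit of LISTA, where only the per-layer step sizes and thresholds are treated as trainable. Training is omitted, and the matrix, measurement, and parameter values in the usage block are hypothetical.

```python
def soft_threshold(v, theta):
    """Elementwise soft-thresholding, the proximal operator of theta * ||v||_1."""
    return [max(abs(u) - theta, 0.0) * (1.0 if u >= 0 else -1.0) for u in v]


def matvec(A, x):
    """Compute A x for a matrix stored as a list of rows."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def matvec_t(A, r):
    """Compute A^T r without forming the transpose."""
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]


def unfolded_ista(A, y, step_sizes, thresholds):
    """Forward pass of ISTA unfolded into len(step_sizes) layers.

    Layer k uses its own step size and threshold; in a model-based network these
    per-layer values are the parameters that would be learned from data.
    """
    x = [0.0] * len(A[0])
    for mu, theta in zip(step_sizes, thresholds):
        residual = [y_i - ax_i for y_i, ax_i in zip(y, matvec(A, x))]
        grad_step = [x_j + mu * g_j for x_j, g_j in zip(x, matvec_t(A, residual))]
        x = soft_threshold(grad_step, theta)
    return x


if __name__ == "__main__":
    # Hypothetical 2x3 sensing matrix, measurement, and per-layer parameters.
    A = [[1.0, 0.2, 0.0], [0.0, 0.5, 1.0]]
    y = [1.0, 0.1]
    print(unfolded_ista(A, y, step_sizes=[0.8, 0.8, 0.8], thresholds=[0.1, 0.05, 0.02]))
```

The network keeps the structure of the classical iteration (gradient step, then shrinkage), which is what makes it interpretable and data-efficient, while the learned parameters let a few layers reach an accuracy that plain ISTA would need many more iterations to match.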

Tutorial Outline:

  • Introduction and motivation
  • What is model-based deep learning
  • Model-aided networks
  • DNN-aided inference
  • Summary

Speakers: Nir Shlezinger (Ben-Gurion University of the Negev, Israel)

Duration: 1.5 hours

Important Information

Tutorials will be held on the first day of the MLSP Workshop, September 17, 2023.

Tutorials are free for students, life members, and attendees from low-income countries who complete early registration by August 3rd. Further information on tutorial fees can be found here.

Please email any inquiry on tutorials to: ieeemlsp-scientific@listserv.ieee.org