Tetsuro Kitahara Research Overview

Motivation and Aim

One of the major functions lacking in current computing technology is recognition of the real world. In everyday life, humans use various kinds of information obtained from the real world through their eyes and ears to judge situations and choose appropriate behavior. Computers' capability to recognize auditory and visual scenes is, however, strictly limited. In particular, there have been relatively few attempts to investigate sound recognition beyond speech recognition. Techniques for recognizing a variety of sounds, not limited to speech, will be important for realizing sophisticated computers that make extensive use of real-world information.

One major reason why it is difficult for computers to recognize auditory scenes is that auditory scenes in the real world usually contain multiple simultaneous sound sources. Because conventional speech recognition studies have assumed that the input is a voice spoken by a single speaker, they have not dealt with situations where multiple sources produce sound simultaneously. Although there have been a number of attempts to recognize speech in noisy environments, the number of sources to be recognized is always one; the other sound sources are regarded as noise.

We focus on polyphonic music as the target domain of our research into auditory scene recognition. The key concept in developing a computational model of auditory scene recognition is predictability. It is often said that music is enjoyable because we can predict how it unfolds to some extent but cannot predict it perfectly. Our goal is to develop, using probabilistic models such as Bayesian networks, a computer system that listens to music by predicting it.

Three Issues

To achieve this predictive music listening model, we have to resolve three issues.

The first issue is signal processing for mixtures of multiple simultaneous sounds. We plan to develop signal processing techniques by extending our Instrogram-based method for recognizing musical instruments in polyphonic music. An instrogram is a probabilistic representation of which instrument sounds at which time at which pitch; it can be calculated with hidden Markov models prepared for each semitone.
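
As a minimal illustration of the data structure itself (not the actual implementation), an instrogram can be pictured as a probability array indexed by instrument, pitch, and time; the instrument set, ranges, and values below are placeholders:

```python
# Minimal sketch of the Instrogram idea: an instrogram assigns, to each
# (instrument, pitch, time) triple, the probability that the instrument
# is sounding at that pitch and time.
import numpy as np

INSTRUMENTS = ["piano", "violin", "flute"]  # hypothetical instrument set
N_PITCHES = 88                              # e.g. semitones of the piano range
N_FRAMES = 1000                             # analysis frames

# Probabilities per (instrument, pitch, frame); random values stand in for
# the HMM-based estimates here.
rng = np.random.default_rng(0)
instrogram = rng.random((len(INSTRUMENTS), N_PITCHES, N_FRAMES))

def p_instrument_at(instrument: str, pitch: int, frame: int) -> float:
    """Probability that `instrument` sounds at `pitch` during `frame`."""
    return float(instrogram[INSTRUMENTS.index(instrument), pitch, frame])

print(p_instrument_at("violin", 40, 500))
```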

The second issue is prediction models for music representations at various levels of abstraction. Music is represented at various abstraction levels: frequency components, notes, chord symbols, and global music structures. Prediction at each of these levels should be performed in parallel because the levels may be mutually dependent.
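
The following toy sketch, entirely our own illustration rather than the actual model, shows how two abstraction levels can constrain each other: a chord-level belief shapes note-level prediction, and each observed note updates the chord-level belief in turn. All tables are hypothetical:

```python
# P(next_note | chord): hypothetical distributions over a few pitch classes.
P_NOTE_GIVEN_CHORD = {
    "C":  {"C": 0.5, "E": 0.3, "G": 0.2},
    "G7": {"G": 0.4, "B": 0.3, "F": 0.3},
}

chord_belief = {"C": 0.5, "G7": 0.5}  # prior over chord symbols

def update_chord_belief(observed_note: str) -> None:
    """Bayes update of the chord-level belief from one observed note."""
    for chord in chord_belief:
        chord_belief[chord] *= P_NOTE_GIVEN_CHORD[chord].get(observed_note, 0.01)
    z = sum(chord_belief.values())
    for chord in chord_belief:
        chord_belief[chord] /= z

def predict_next_note() -> dict:
    """Note-level prediction marginalized over the chord-level belief."""
    pred = {}
    for chord, p_chord in chord_belief.items():
        for note, p_note in P_NOTE_GIVEN_CHORD[chord].items():
            pred[note] = pred.get(note, 0.0) + p_chord * p_note
    return pred

update_chord_belief("G")   # hearing a G makes G7 more likely...
print(predict_next_note()) # ...which changes the note-level prediction
```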

The third issue is a framework for implementing such signal processing techniques and prediction models. This framework should enable us to easily reuse existing processing modules and integrate them, in order to efficiently develop complicated systems. We have therefore been developing CrestMuseXML, an extensible framework for XML-based music description, and the CrestMuseXML Toolkit, an open-source library that provides common APIs for accessing various music descriptions.
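
The sketch below only illustrates the general idea of a common accessor API; the class and method names are hypothetical and are not the actual CrestMuseXML Toolkit API:

```python
# Illustrative sketch of a common accessor interface (hypothetical names,
# not the CrestMuseXML Toolkit API): modules written against the common
# interface work with any underlying music description format.
from abc import ABC, abstractmethod

class MusicDescription(ABC):
    """Common interface implemented by every music description type."""
    @abstractmethod
    def notes(self):
        """Return (onset, midi_pitch, duration) triples, regardless of
        the underlying description format."""

class ScoreDescription(MusicDescription):
    def __init__(self, raw_notes):
        self._raw = raw_notes
    def notes(self):
        return list(self._raw)

def pitch_range(desc: MusicDescription):
    """A reusable module: depends only on the common interface."""
    pitches = [p for (_, p, _) in desc.notes()]
    return min(pitches), max(pitches)

print(pitch_range(ScoreDescription([(0, 60, 1), (1, 64, 1), (2, 67, 2)])))
```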

Furthermore, we are engaged in research on music information retrieval and musical performance support.



CrestMuseXML: A Framework for Developing Music Information Processing Systems

Although many music information processing systems have been developed, integrating them is not easy because they use different representations of music data. To solve this problem, we are developing CrestMuseXML, a unified framework for describing music data, together with its open-source toolkit.

References

  1. 北原 鉄朗, 片寄 晴弘: "CrestMuseXML (CMX) Toolkit ver.0.40について", 情報処理学会 音楽情報科学研究報告, 2008-MUS-75-17, Vol.2008, No.50, pp.95--100, May 2008.
  2. 北原 鉄朗, 橋田 光代, 片寄 晴弘: "音楽情報科学研究のための共通データフォーマットの確立を目指して", 情報処理学会 音楽情報科学研究報告, 2006-MUS-66-12, Vol.2007, No.81, pp.149--154, August 2007.

Instrument Recognition in Polyphonic Music and Its Application to Content-based Music Information Retrieval

We often have different impressions when listening to the same musical piece played on different musical instruments. This implies the importance of instrumentation (which instruments a musical piece is played on) as a factor in music information retrieval (MIR). We have developed an MIR system that searches for musical pieces whose instrumentation is similar to that of a piece specified by the user. The system is built on a musical instrument recognizer based on Instrogram, a time-frequency representation of instrument existence probabilities. We are also engaged in applying instrograms to music visualization and music entertainment.
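
As a minimal sketch of instrumentation-based similarity search, assuming instrograms are given as NumPy arrays, one could summarize each piece as an average instrument-existence vector and rank pieces by cosine similarity; the similarity measure actually used in the system may differ:

```python
import numpy as np

def instrumentation_vector(instrogram: np.ndarray) -> np.ndarray:
    """Average the (instrument, pitch, frame) probabilities over pitch
    and time, leaving one existence value per instrument."""
    return instrogram.mean(axis=(1, 2))

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def rank_by_instrumentation(query: np.ndarray, database: dict) -> list:
    """Return database piece names, most similar instrumentation first."""
    q = instrumentation_vector(query)
    scores = {name: cosine(q, instrumentation_vector(g))
              for name, g in database.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy database of random instrograms (3 instruments, 88 pitches, 200 frames).
rng = np.random.default_rng(1)
db = {f"piece{i}": rng.random((3, 88, 200)) for i in range(5)}
print(rank_by_instrumentation(rng.random((3, 88, 200)), db))
```

Averaging over pitch and time is the simplest possible summary; richer summaries (e.g., time-varying ones) would fit the same retrieval pipeline.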

References

  1. Tetsuro Kitahara, Masataka Goto, Kazunori Komatani, Tetsuya Ogata, and Hiroshi G. Okuno: "Instrogram: Probabilistic Representation of Instrument Existence for Polyphonic Music", IPSJ Journal, Vol.48, No.1, pp.214--226, January 2007. (Received the 3rd IPSJ Digital Courier Funai Young Researcher Encouragement Award; also published in IPSJ Digital Courier, Vol.3, No.1, pp.1--13.)

Bassline Feature Extraction and Its Application to Content-based Music Information Retrieval

Spectral and cepstral features, which are commonly used in content-based music information retrieval (MIR), are useful and powerful, but they do not sufficiently capture the content of music because all they directly represent are frequency characteristics. To further improve MIR, we have to develop features that directly represent various aspects of music. To design such features, we have focused on the bass part, which plays important roles in both rhythm and harmony, and have been engaged in designing bass-line features and applying them to content-based MIR.
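
As a simplified illustration of what such features might look like (our own example, not the features proposed in the paper), the sketch below computes a few descriptors directly from a bass part given as a note sequence:

```python
# Toy bass-line features: unlike spectral features, these directly describe
# the melodic and rhythmic character of the bass part.
import numpy as np

def bassline_features(bass_notes):
    """bass_notes: list of (onset_in_beats, midi_pitch) for the bass part."""
    pitches = np.array([p for _, p in bass_notes], dtype=float)
    onsets = np.array([t for t, _ in bass_notes], dtype=float)
    intervals = np.diff(pitches)  # melodic motion in semitones
    iois = np.diff(onsets)        # inter-onset intervals in beats
    return {
        "mean_pitch": pitches.mean(),
        "pitch_std": pitches.std(),
        "mean_abs_interval": np.abs(intervals).mean(),
        "note_density": len(bass_notes) / (onsets[-1] - onsets[0] + 1e-9),
        "ioi_std": iois.std(),    # low value = rhythmically regular bass
    }

# A steadily walking toy bass line, one note per beat.
walking_bass = [(i * 1.0, 36 + (i * 2) % 12) for i in range(16)]
print(bassline_features(walking_bass))
```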

References

  1. Yusuke Tsuchihashi, Tetsuro Kitahara, and Haruhiro Katayose: "Using Bass-line Features for Content-based MIR", Proceedings of the 9th International Conference on Music Information Retrieval (ISMIR 2008), pp.620--625, September 2008.


Music Information Processing Architecture based on Bayesian Networks

Music information processing tasks such as automatic composition, automatic arrangement, music interpretation, and automatic accompaniment can all be explained with the same model if they are viewed as inferring unknown nodes from known nodes in a hierarchical network of music representations. Aiming at a unified architecture that performs these various music information processing tasks, we are currently focusing on chord voicing and predictive automatic accompaniment to identify the issues specific to each task.
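
The following toy example illustrates the "infer unknown nodes from known nodes" view with a two-node network in which the chord symbol is observed and the voicing is unknown; the probability tables are hypothetical placeholders, whereas the actual system would learn them from data:

```python
# A two-node Bayesian network: chord symbol -> voicing.
P_CHORD = {"C": 0.6, "G7": 0.4}          # prior over chord symbols
P_VOICING_GIVEN_CHORD = {                 # CPT: P(voicing | chord)
    "C":  {"(C3,E4,G4,C5)": 0.7, "(C3,G4,C5,E5)": 0.3},
    "G7": {"(G2,F4,B4,D5)": 0.8, "(G2,B3,F4,D5)": 0.2},
}

def most_probable_voicing(observed_chord: str) -> str:
    """With the chord node observed, the voicing node's posterior is just
    its conditional table; pick the maximum-probability voicing."""
    cpt = P_VOICING_GIVEN_CHORD[observed_chord]
    return max(cpt, key=cpt.get)

print(most_probable_voicing("G7"))
```

Arrangement, interpretation, and accompaniment tasks differ only in which nodes of the network are observed and which are queried, which is what makes a unified architecture plausible.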

References

  1. 勝占 真規子, 北原 鉄朗, 片寄 晴弘, 長田 典子: "ベイジアンネットワークを用いたコード・ヴォイシング推定システム", 情報処理学会 音楽情報科学/音声言語情報処理 研究報告, 2008-MUS-74-29, 2008-MUS-SLP-70-29, Vol.2008, No.12, pp.163--168, February 2008.


N-gram-based Melody Appropriateness Determination and Its Application to Improvisation Supporting System

We have developed a new performance support system that monitors the user's improvisation, judges whether each note is musically inappropriate, and, if so, corrects it to another note. Although there have been various studies on performance support, few have dealt with improvisation, and it has been difficult to lower the barrier for people who have experience playing an instrument but find improvisation intimidating, and to let them experience the joy of improvising. With this system, even if the user plays a somewhat musically unnatural melody, a musically natural melody is output from the speakers, so even beginners can enjoy improvisation without hesitation.
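
The sketch below illustrates the general approach with a bigram model over a toy pitch vocabulary; the actual system's model, vocabulary, and threshold differ, and all values here are placeholders:

```python
# N-gram-based appropriateness check with correction (bigram toy version):
# each played note is scored given the previous one, and notes under a
# threshold are replaced by the most probable continuation.
BIGRAM = {  # hypothetical P(next | prev) over a toy C-major vocabulary
    "C": {"D": 0.3, "E": 0.3, "G": 0.3, "B": 0.1},
    "E": {"F": 0.3, "G": 0.4, "D": 0.2, "C": 0.1},
    "G": {"A": 0.3, "E": 0.3, "C": 0.3, "F": 0.1},
}
THRESHOLD = 0.15

def correct_melody(melody):
    """Replace each note whose bigram probability falls below THRESHOLD
    with the most probable continuation of the previous output note."""
    out = [melody[0]]
    for note in melody[1:]:
        probs = BIGRAM.get(out[-1])
        if probs and probs.get(note, 0.0) < THRESHOLD:
            note = max(probs, key=probs.get)  # corrected note
        # if the previous note is unknown to the model, keep the input note
        out.append(note)
    return out

print(correct_melody(["C", "B", "G", "F"]))  # low-probability notes replaced
```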

References

  1. Katsuhisa Ishida, Tetsuro Kitahara, and Masayuki Takeda: "ism: Improvisation Supporting System based on Melody Correction", Proceedings of the International Conference on New Interfaces for Musical Expression (NIME 2004), pp.177--180, June 2004.
  2. Tetsuro Kitahara, Katsuhisa Ishida, and Masayuki Takeda: "ism: Improvisation Supporting Systems with Melody Correction and Key Vibration", Entertainment Computing: Proceedings of the 4th International Conference on Entertainment Computing (ICEC 2005), Lecture Notes in Computer Science 3711, F. Kishino, Y. Kitamura, H. Kato and N. Nagata (Eds.), pp.315--327, September 2005.