Tetsuro Kitahara Research Overview
Motivation and Aim
One of the major functions lacking in current computing technology
is recognition of the real world.
In everyday life, humans use various information obtained from the
real world through their eyes and ears to judge situations and
choose appropriate behavior.
Computers' capability to recognize auditory and visual scenes
is, however, severely limited.
In particular, apart from speech recognition, there have been
relatively few attempts to investigate sound recognition.
Techniques for recognizing a variety of sounds, not limited to speech,
will be important for realizing sophisticated computers that
make extensive use of real-world information.
One major reason why it is difficult for computers to recognize
auditory scenes is that the auditory scenes in the real world usually contain
multiple simultaneous sources of sound.
Because conventional speech recognition studies have assumed that the input
to be recognized is the voice of a single speaker,
they have not dealt with situations in which multiple sources
produce sound simultaneously.
Although there have been a number of attempts
to recognize speech
in noisy environments,
the number of sources to be recognized is always one;
the other sound sources are treated as noise.
We focus on polyphonic music as a target domain of our research into auditory scene recognition. The key point in developing a computational model of auditory scene recognition is predictability. It is often said that music is enjoyable because we can predict how it unfolds to some extent but cannot predict it perfectly. Our goal is to develop, using probabilistic models such as Bayesian networks, a computer system that listens to music by predicting it.
Three Issues
We have to resolve three issues to achieve this predictive music listening model.
The first issue is signal processing for mixtures of multiple simultaneous sounds. We plan to develop signal processing techniques by extending our Instrogram-based musical instrument recognition method for polyphonic music. An instrogram is a probabilistic representation of which instrument is sounding at which time and pitch. It can be calculated with hidden Markov models prepared for each semitone.
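As a minimal sketch of the idea (not the actual Instrogram implementation), a two-state hidden Markov model can be filtered frame by frame to yield an "existence probability" over time; all parameters and observations below are made up for illustration:

```python
# Toy sketch (not the actual Instrogram algorithm): a two-state HMM
# per pitch whose filtered state probabilities play the role of
# "instrument existence" probabilities over time.
# All numbers here are invented for illustration.

STATES = ("sounding", "silent")

# Hypothetical transition and emission parameters.
TRANS = {
    "sounding": {"sounding": 0.9, "silent": 0.1},
    "silent":   {"sounding": 0.2, "silent": 0.8},
}
# Emissions: probability of observing strong/weak spectral energy.
EMIT = {
    "sounding": {"strong": 0.8, "weak": 0.2},
    "silent":   {"strong": 0.3, "weak": 0.7},
}
INIT = {"sounding": 0.5, "silent": 0.5}  # prior before the first frame

def forward_probabilities(observations):
    """Return P(state | observations so far) for each frame (forward filtering)."""
    results = []
    prev = dict(INIT)
    for obs in observations:
        unnorm = {}
        for s in STATES:
            unnorm[s] = EMIT[s][obs] * sum(prev[p] * TRANS[p][s] for p in STATES)
        total = sum(unnorm.values())
        prev = {s: unnorm[s] / total for s in STATES}
        results.append(prev)
    return results

probs = forward_probabilities(["strong", "strong", "weak"])
for t, p in enumerate(probs):
    print(t, round(p["sounding"], 3))
```

Repeating this filtering for every semitone-spaced pitch would give a time-pitch grid of existence probabilities, which is the shape of an instrogram.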
The second issue is prediction models for various abstraction levels of music representation. Music is represented at various levels of abstraction: frequency components, notes, chord symbols, and global musical structures. Prediction at each of these abstraction levels should be performed in parallel because the levels may be mutually dependent.
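As a toy illustration of a note-level prediction model (not our actual system), a first-order Markov model over pitch names can be trained simply by counting transitions:

```python
# Toy sketch of a note-level prediction model: a first-order Markov
# (bigram) model over pitch names, trained by counting transitions.
from collections import defaultdict

def train_bigram(sequence):
    counts = defaultdict(lambda: defaultdict(int))
    for a, b in zip(sequence, sequence[1:]):
        counts[a][b] += 1
    # Normalize counts into conditional probabilities P(next | current).
    model = {}
    for a, nxt in counts.items():
        total = sum(nxt.values())
        model[a] = {b: c / total for b, c in nxt.items()}
    return model

melody = ["C", "D", "E", "C", "D", "G", "C", "D", "E"]
model = train_bigram(melody)
# After "D", the melody continued with E twice and G once.
print(model["D"])
```

A real system would need such predictors at every abstraction level (signal, note, chord, structure) exchanging probabilities, rather than a single isolated note model.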
The third issue is a framework for implementing such signal processing techniques and prediction models. This framework should enable us to easily reuse existing processing modules and integrate them in order to efficiently develop a complicated system. We have therefore been developing CrestMuseXML, an extensible framework for XML-based music description, and the CrestMuseXML Toolkit, an open-source library that provides common APIs for accessing various music descriptions.
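For illustration only, accessing an XML-based music description with standard tools might look like the following; the element and attribute names here are invented for this sketch and are not the actual CrestMuseXML schema:

```python
# Illustrative only: parsing an XML music description with Python's
# standard library. The element and attribute names below are invented
# and are NOT the actual CrestMuseXML schema.
import xml.etree.ElementTree as ET

doc = """
<score>
  <note pitch="C4" onset="0" duration="480"/>
  <note pitch="E4" onset="480" duration="480"/>
  <note pitch="G4" onset="960" duration="960"/>
</score>
"""

root = ET.fromstring(doc)
notes = [(n.get("pitch"), int(n.get("onset"))) for n in root.findall("note")]
print(notes)
```

The point of a common framework is that modules agree on one such description format, so a chord-analysis module can consume what a transcription module produces without ad-hoc conversion.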
Furthermore, we are engaged in music information retrieval and musical performance support.

Although many music information processing systems have been developed, it is not easy to integrate them because their representations of music data differ. To solve this problem, we are developing CrestMuseXML, a unified framework for describing music data, and its open-source toolkit.
References
- 北原 鉄朗, 片寄 晴弘: "CrestMuseXML (CMX) Toolkit ver.0.40について", 情報処理学会 音楽情報科学 研究報告, 2008-MUS-75-17, Vol.2008, No.50, pp.95--100, May 2008. [Paper in pdf]
- 北原 鉄朗, 橋田 光代, 片寄 晴弘: "音楽情報科学研究のための共通データフォーマットの確立を目指して", 情報処理学会 音楽情報科学 研究報告, 2006-MUS-66-12, Vol.2007, No.81, pp.149--154, August 2007. [External Link]

We often have different impressions when listening to the same musical piece played on different musical instruments. This implies the importance of instrumentation (which instruments a piece is played on) as a factor in music information retrieval (MIR). We have developed an MIR system that searches for musical pieces whose instrumentation is similar to that of a piece specified by the user.
This system is built on a musical instrument recognizer based on Instrogram, a time-frequency representation of instrument existence probabilities. We are also engaged in applying instrograms to music visualization and music entertainment.
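A minimal sketch of how instrumentation similarity could be computed from instrograms (an illustration, not the published system): summarize each piece by its time-averaged instrument probabilities and compare the summaries by cosine similarity. All probability values are made up.

```python
# Toy sketch of instrumentation-based similarity (not the published
# system): each piece's instrogram is reduced to a time-averaged
# instrument probability vector, and pieces are compared by cosine
# similarity of those vectors. All numbers are invented.
import math

INSTRUMENTS = ["piano", "violin", "flute"]

def summarize(instrogram):
    """instrogram: list of per-frame probability dicts -> mean vector."""
    n = len(instrogram)
    return [sum(frame[i] for frame in instrogram) / n for i in INSTRUMENTS]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

piece_a = [{"piano": 0.9, "violin": 0.05, "flute": 0.05},
           {"piano": 0.8, "violin": 0.1, "flute": 0.1}]
piece_b = [{"piano": 0.85, "violin": 0.1, "flute": 0.05}]   # piano-heavy too
piece_c = [{"piano": 0.1, "violin": 0.8, "flute": 0.1}]     # violin-heavy

sim_ab = cosine(summarize(piece_a), summarize(piece_b))
sim_ac = cosine(summarize(piece_a), summarize(piece_c))
print(round(sim_ab, 3), round(sim_ac, 3))
```

Under this toy measure the two piano-heavy pieces come out more similar to each other than to the violin-heavy one, which is the behavior an instrumentation-based retrieval system aims for.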
References
- Tetsuro Kitahara, Masataka Goto, Kazunori Komatani, Tetsuya Ogata, and Hiroshi G. Okuno: "Instrogram: Probabilistic Representation of Instrument Existence for Polyphonic Music", IPSJ Journal, Vol.48, No.1, pp.214--226, January 2007. [Paper in pdf] (3rd IPSJ Digital Courier Funai Young Researcher Encouragement Award; also published in IPSJ Digital Courier, Vol.3, No.1, pp.1--13) [External Link]

Spectral and cepstral features, which are commonly used in content-based music information retrieval (MIR), are useful and powerful, but they do not sufficiently capture the content of music because all they directly represent is a frequency characteristic. To further improve MIR, we have to develop features that directly represent various aspects of music. To design such features, we have focused on the bass part, which plays important roles in both rhythm and harmony, and have been engaged in designing bass-line features and applying them to content-based MIR.
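As a hypothetical illustration (not the published feature set), simple bass-line features can be derived from a sequence of bass notes given as MIDI pitch numbers:

```python
# Toy sketch (not the published feature set): simple descriptors of a
# bass line given as MIDI pitch numbers. These features directly
# describe melodic behavior rather than a frequency characteristic.
def bass_features(bass_line):
    intervals = [abs(b - a) for a, b in zip(bass_line, bass_line[1:])]
    return {
        "range": max(bass_line) - min(bass_line),          # pitch span
        "mean_interval": sum(intervals) / len(intervals),  # average motion
        "repeat_ratio": intervals.count(0) / len(intervals),  # repeated notes
    }

walking_bass = [40, 43, 45, 47, 48, 47, 45, 43]  # stepwise, always moving
pedal_bass = [36, 36, 36, 36, 43, 36, 36, 36]    # mostly one repeated note
print(bass_features(walking_bass))
print(bass_features(pedal_bass))
```

The two made-up bass lines get clearly different feature values, illustrating how such features could separate musical styles that spectral averages would blur together.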
References
- Yusuke Tsuchihashi, Tetsuro Kitahara, and Haruhiro Katayose: "Using Bass-line Features for Content-based MIR", Proceedings of the 9th International Conference on Music Information Retrieval (ISMIR 2008), pp.620--625, September 2008. [Paper in pdf]
Music information processing tasks such as automatic composition, automatic arrangement, music interpretation, and automatic accompaniment can all be explained by the same model if we regard each of them as inference of unknown nodes from known nodes in a hierarchical music-representation network. Aiming at a unified architecture that performs these various tasks, we are currently working on chord voicing and predictive automatic accompaniment in order to identify the issues specific to each task.
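A toy sketch of this "infer unknown nodes from known nodes" view (not the published Bayesian-network system): given a chord symbol as the known node, a voicing is inferred by combining a chord-conditional prior with a simple voice-leading likelihood. All probabilities and note numbers below are made up.

```python
# Toy sketch (not the published system): infer an unknown node
# (a voicing) from known nodes (chord symbol, previous top note).
# All probabilities and note numbers are invented for illustration,
# and octaves are ignored for simplicity.

NOTE_NUM = {"C": 60, "D": 62, "E": 64, "F": 65, "G": 67, "B": 71}

# Hypothetical conditional probabilities P(voicing | chord).
P_VOICING_GIVEN_CHORD = {
    "C":  {"C-E-G": 0.6, "E-G-C": 0.3, "G-C-E": 0.1},
    "G7": {"G-B-D-F": 0.5, "B-D-F-G": 0.35, "F-G-B-D": 0.15},
}

def choose_voicing(chord, prev_top):
    """Pick the voicing maximizing prior * closeness of its top note
    to the previous top note (a crude voice-leading likelihood)."""
    dist = P_VOICING_GIVEN_CHORD[chord]
    def score(v):
        top = NOTE_NUM[v.split("-")[-1]]
        return dist[v] / (1 + abs(top - prev_top))
    return max(dist, key=score)

print(choose_voicing("C", prev_top=67))
print(choose_voicing("G7", prev_top=67))
```

Even this crude model shows the key property of the network view: the chosen voicing depends both on the chord (vertical context) and on the preceding notes (horizontal context), and other tasks such as accompaniment can be phrased as the same kind of inference over different nodes.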
References
- 勝占 真規子, 北原 鉄朗, 片寄 晴弘, 長田 典子: "ベイジアンネットワークを用いたコード・ヴォイシング推定システム", 情報処理学会 音楽情報科学/音声言語情報処理 研究報告, 2008-MUS-74-29, 2008-MUS-SLP-70-29, Vol.2008, No.12, pp.163--168, February 2008.
We have developed a new performance support system that monitors the user's improvisation, judges whether each note is musically inappropriate, and, if so, corrects it to another note. Although there have been various studies on performance support, few have dealt with improvisation, so it has been difficult to lower the barrier for people who have experience playing an instrument but feel that improvisation is beyond them, and to let them experience its enjoyment. With this system, even if the user plays a somewhat musically unnatural melody, a musically natural melody comes out of the speakers, so even beginners can enjoy improvising without hesitation.
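A minimal sketch of the melody-correction idea (an illustration, not the actual ism algorithm): notes outside the current scale are snapped to the nearest scale tone.

```python
# Toy sketch of melody correction (not the actual "ism" algorithm):
# a note outside the current scale is snapped to the nearest
# in-scale pitch (ties resolved upward).
C_MAJOR = {0, 2, 4, 5, 7, 9, 11}  # pitch classes of the C major scale

def correct_note(midi_pitch, scale=C_MAJOR):
    if midi_pitch % 12 in scale:
        return midi_pitch  # already in the scale: leave it alone
    for d in (1, -1, 2, -2):  # search outward for the closest scale tone
        if (midi_pitch + d) % 12 in scale:
            return midi_pitch + d
    return midi_pitch

melody = [60, 61, 64, 66, 67]  # C, C#, E, F#, G
print([correct_note(n) for n in melody])
```

A real system must also judge "inappropriate" in context (key, chord, and melodic direction) rather than by scale membership alone, but this shows the basic monitor-and-correct loop applied note by note.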
References
- Katsuhisa Ishida, Tetsuro Kitahara, and Masayuki Takeda: "ism: Improvisation Supporting System based on Melody Correction", Proceedings of the International Conference on New Interfaces for Musical Expression (NIME 2004), pp.177--180, June 2004. [Paper in pdf]
- Tetsuro Kitahara, Katsuhisa Ishida, and Masayuki Takeda: "ism: Improvisation Supporting Systems with Melody Correction and Key Vibration", Entertainment Computing: Proceedings of the 4th International Conference on Entertainment Computing (ICEC 2005), Lecture Notes in Computer Science 3711, F. Kishino, Y. Kitamura, H. Kato and N. Nagata (Eds.), pp.315--327, September 2005. [Paper in pdf] [External Link]