Feb 06, 2024
PRESS RELEASE

Study results accepted by AAAI-24, a top international conference on AI
Rikkyo team develops new image recognition method using Fourier transform

Objective

A Rikkyo University team has developed a new image recognition method using the Fourier transform. The team comprises Yuki Tatsunami, a second-year doctoral student at the Graduate School of Artificial Intelligence and Science who is also an engineer at AnyTech Co., Ltd., and Associate Professor Masato Taki. Their study has been accepted by the 38th Annual AAAI Conference on Artificial Intelligence (AAAI-24), one of the top international conferences in the field of artificial intelligence.

The AAAI Conference on Artificial Intelligence is organized by the Association for the Advancement of Artificial Intelligence (AAAI), which is headquartered in the United States. This year, 2,342 of the 9,862 peer-reviewed papers (23.75%) were accepted. The results of this study will be presented at AAAI-24, which will be held in Vancouver, Canada, from February 20 to 27, 2024.

Background of research

In deep learning, the attention mechanism is a technique well suited to learning long-distance dependencies, that is, relations between elements located far apart. It allows a model to focus selectively on the important information within a wide range of data. In computer vision, models are believed to recognize objects and patterns more accurately when they focus on the relevant regions of an image. The mechanism, however, has the drawback of requiring a large amount of memory: it calculates individual weights for all input elements and stores the results in memory. This can cause serious problems, especially in computer vision models that handle high-resolution images. As the resolution of an image increases, the number of input elements increases, which requires a huge amount of memory and dramatically increases computation time. Processing high-resolution images therefore demands expensive hardware, which can create an economic burden.
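The growth in memory can be made concrete with a rough calculation. The sketch below is illustrative only: it assumes a standard attention layer that stores one weight for every pair of input elements, and the 16-pixel patch size, 4-byte weights, and single attention map are hypothetical choices that do not come from the study.

```python
def attention_weight_bytes(height, width, patch=16, bytes_per_weight=4):
    """Estimate the memory of one attention weight map for an image.

    Assumptions (not from the study): the image is cut into square patches,
    each patch is one input element, and one weight is stored per pair.
    """
    tokens = (height // patch) * (width // patch)  # number of input elements
    return tokens * tokens * bytes_per_weight      # one weight per pair


for side in (224, 448, 896):
    mib = attention_weight_bytes(side, side) / 2**20
    print(f"{side}x{side} image: {mib:.2f} MiB per attention map")
```

Under these assumptions, doubling the side length of the image quadruples the number of input elements and multiplies the stored weights by sixteen.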

To avoid this problem, the global filter, a mechanism based on the fast Fourier transform, has been proposed in recent years as an alternative to the attention mechanism. Like the attention mechanism, the global filter can learn long-distance spatial dependencies. It consists of three steps: a fast Fourier transform, element-wise multiplication in the frequency domain, and an inverse fast Fourier transform. In contrast to the attention mechanism, this simple design does not require a large amount of memory, and its computational cost increases only moderately as resolution increases. In practice, however, the global filter has not yet achieved state-of-the-art performance.
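The three steps of the global filter can be sketched in a few lines of NumPy. This is a minimal illustration rather than the published implementation: the function name, the tensor layout (height, width, channels), and the use of the real-input FFT are assumptions made for the example.

```python
import numpy as np


def global_filter(x, filt):
    """Filter a feature map in the frequency domain (illustrative sketch).

    x:    (H, W, C) real feature map
    filt: (H, W // 2 + 1, C) complex filter; in a trained model these values
          would be parameters, here they are simply passed in
    """
    freq = np.fft.rfft2(x, axes=(0, 1))   # step 1: fast Fourier transform
    freq = freq * filt                    # step 2: element-wise multiplication
    # step 3: inverse fast Fourier transform back to the spatial domain
    return np.fft.irfft2(freq, s=x.shape[:2], axes=(0, 1))


rng = np.random.default_rng(0)
x = rng.standard_normal((14, 14, 8))
identity = np.ones((14, 14 // 2 + 1, 8), dtype=complex)
y = global_filter(x, identity)
print(np.allclose(y, x))  # a filter of all ones leaves the input unchanged
```

Because every frequency component depends on every pixel, a single multiplication in the frequency domain mixes information across the whole image, which is how the filter captures long-distance spatial dependencies.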

Results

This study focused on the gaps between the global filter and the attention mechanism and proposed a mechanism called “dynamic filter” that fills the gaps. The global filter is a data-independent operation because it multiplies the data by parameters that stay the same regardless of the input. In contrast, the attention mechanism is a data-dependent operation because it calculates individual weights for each data component. The two mechanisms therefore differ in whether or not they depend on the data.

To close this gap, we adopted the “dynamic filter,” which dynamically generates the filter to be applied to the data. The dynamic filter calculates weights from the data and generates a filter from those weights and a small number of base filters. This method achieves data dependency, as in the attention mechanism, while retaining the advantages of the global filter.
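The idea of generating a filter from data-dependent weights and a few base filters can be sketched as follows. This is a simplified illustration under stated assumptions, not the architecture from the paper: averaging over positions, the projection matrix `proj`, and the softmax weighting are hypothetical choices made only to show the principle.

```python
import numpy as np


def softmax(z):
    """Turn scores into positive weights that sum to 1."""
    e = np.exp(z - z.max())
    return e / e.sum()


def dynamic_filter(x, base_filters, proj):
    """Apply a data-dependent frequency filter (illustrative sketch).

    x:            (H, W, C) real feature map
    base_filters: (K, H, W // 2 + 1, C) complex base filters
    proj:         (C, K) hypothetical projection that scores each base filter
    """
    summary = x.mean(axis=(0, 1))        # assumption: average over positions
    weights = softmax(summary @ proj)    # weights are computed from the data
    # The filter is a weighted combination of the base filters.
    filt = np.tensordot(weights, base_filters, axes=1)
    freq = np.fft.rfft2(x, axes=(0, 1))
    out = np.fft.irfft2(freq * filt, s=x.shape[:2], axes=(0, 1))
    return out, weights


rng = np.random.default_rng(0)
H, W, C, K = 14, 14, 8, 4
base = rng.standard_normal((K, H, W // 2 + 1, C)) + 0j
proj = rng.standard_normal((C, K))
x1 = rng.standard_normal((H, W, C))
x2 = x1 + 1.0
out1, w1 = dynamic_filter(x1, base, proj)
out2, w2 = dynamic_filter(x2, base, proj)
print(w1.round(3), w2.round(3))  # different inputs yield different weights
```

In this sketch the two inputs produce different mixing weights and therefore different filters, whereas a global filter would apply the same filter to both.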

Next, we proposed two novel image recognition models: DFFormer, which incorporates the dynamic filter, and CDFFormer, which combines the dynamic filter with a convolutional neural network. The gaps between the global filter and the attention mechanism lay not only in the mechanisms themselves but also in the macro architectures that employ them. To fill these gaps, we incorporated dynamic filters into the architectures that had achieved the highest levels of accuracy and verified the usefulness of the dynamic filters in a fair comparison. These models achieve accuracy comparable to that of state-of-the-art image recognition models that do not use the attention mechanism. Their accuracy also comes closer than before to that of state-of-the-art models that use the attention mechanism, while avoiding the previously mentioned problems of that mechanism. In other words, for high-resolution image recognition, our proposed method, like the global filter, requires relatively little memory and computation time.

Dynamic filter diagram

Future prospects

In this study, we developed DFFormer and CDFFormer. The success of these models may lead to a positive reevaluation of fast Fourier transform-based methods. In recent years, there has been a tendency for only large models using the attention mechanism to attract attention. Our study provides an opportunity to place importance on approaches, such as the dynamic filter, that impose less of an economic burden in training and inference. In addition, we expect dynamic filters to encourage studies that deepen the understanding of the attention mechanism.

Keywords

AAAI (AAAI Conference on Artificial Intelligence): A top international conference on artificial intelligence
Computer vision: A field concerned with image recognition and processing by computers
Attention mechanism: A mechanism by which a deep learning model determines for itself which information is important and focuses on it to understand the data. It mimics the cognitive system by which an organism recognizes a target by focusing only on important information.
Convolution: A method of extracting information from images and similar data by integrating local information. It is a kind of filtering.
Convolutional neural network: A neural network that uses the convolution operation and is used mainly for images
Fourier transform: A mathematical technique for decomposing signals such as voices and images into different frequency components
Fast Fourier transform: An algorithm that efficiently computes the Fourier transform
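
As a small illustration of the last two keywords, the sketch below uses NumPy's fast Fourier transform to decompose a signal into its frequency components. The test signal, made of two sine waves with 5 and 12 cycles, is a hypothetical example chosen for clarity.

```python
import numpy as np

n = 64                      # number of samples
t = np.arange(n) / n        # sample positions over one unit interval
# Hypothetical signal: a strong 5-cycle wave plus a weaker 12-cycle wave.
signal = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 12 * t)

# The FFT returns one value per frequency; dividing the magnitude by n / 2
# recovers the amplitude of each sine component.
spectrum = np.abs(np.fft.rfft(signal)) / (n / 2)
peaks = sorted(int(k) for k in np.argsort(spectrum)[-2:])
print(peaks)  # the two strongest frequency components
```

The two largest entries of the spectrum sit at frequencies 5 and 12, with amplitudes matching the two sine waves that were mixed together.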

Article information

Title: FFT-based Dynamic Token Mixer for Vision
Authors: Yuki Tatsunami, Masato Taki
Link: https://arxiv.org/abs/2303.03932

Taki Laboratory headed by Masato Taki at the Graduate School of Artificial Intelligence and Science

Taki Laboratory conducts extensive research on deep learning, from fundamentals to applications, that will support future applications of artificial intelligence. In addition to the study results discussed in this article, the laboratory has produced various study results, including a paper accepted by NeurIPS 2022, an international conference in the field of machine learning.
