The Extended Bag of Words Model for Visual Recognition and Categorization

Miaomiao Liu; Xinde Li; Xiaobin Jin; Xiao Zhang

doi:10.15377/2409-9694.2014.01.02.4

Vol. 1 No. 2 (2014)

DOI: https://doi.org/10.15377/2409-9694.2014.01.02.4
Submitted: March 10, 2014
Published: 01.07.2021

Abstract

With the development of science and technology, a growing number of images need to be recognized and categorized. Although the classical Bag of Words (BoW) model has been widely used, it still has many limitations, e.g. low precision and accuracy and high computational complexity. This paper improves and extends the model in four ways. First, the features that remain after background filtering are sampled, to reduce the influence of background noise. Second, the spatial relationships among features are integrated with the classical BoW vector to improve the accuracy of recognition and categorization. Third, a vocabulary tree is constructed with hierarchical k-means, to obtain a more reasonable vocabulary and greatly reduce clustering time. Fourth, a weighted visual word histogram is used to highlight the essential differences among images. Finally, experiments are conducted to show the advantages of the proposed method.
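The third and fourth steps can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no parameters, so the branching factor, tree depth, synthetic 8-dimensional descriptors, and the choice of TF-IDF as the weighting scheme are all assumptions made for illustration, and every function name is hypothetical.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Squared distance from every descriptor to every center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

def build_tree(X, branch=3, depth=2):
    """Hierarchical k-means: split recursively; leaves act as visual words."""
    if depth == 0 or len(X) < branch:
        return {"children": None}
    centers, labels = kmeans(X, branch)
    children = [build_tree(X[labels == j], branch, depth - 1)
                for j in range(branch)]
    return {"centers": centers, "children": children}

def index_leaves(node, next_id=0):
    """Give each leaf a consecutive word id; returns the vocabulary size."""
    if node["children"] is None:
        node["word"] = next_id
        return next_id + 1
    for child in node["children"]:
        next_id = index_leaves(child, next_id)
    return next_id

def quantize(node, x):
    """Descend the tree, following the nearest center at each level."""
    while node["children"] is not None:
        j = ((node["centers"] - x) ** 2).sum(axis=1).argmin()
        node = node["children"][j]
    return node["word"]

def bow_histogram(tree, descriptors, n_words):
    """Count how often each visual word occurs in one image."""
    words = np.asarray([quantize(tree, x) for x in descriptors], dtype=int)
    return np.bincount(words, minlength=n_words).astype(float)

def tfidf_weight(hists):
    """Assumed weighting: term frequency times inverse document frequency."""
    hists = np.asarray(hists, dtype=float)
    tf = hists / np.maximum(hists.sum(axis=1, keepdims=True), 1.0)
    df = (hists > 0).sum(axis=0)  # number of images containing each word
    idf = np.log(len(hists) / np.maximum(df, 1))
    return tf * idf

# Synthetic stand-ins for local descriptors: 5 images, 40 descriptors each.
rng = np.random.default_rng(0)
images = [rng.normal(size=(40, 8)) for _ in range(5)]
tree = build_tree(np.vstack(images), branch=3, depth=2)
n_words = index_leaves(tree)
hists = [bow_histogram(tree, d, n_words) for d in images]
weighted = tfidf_weight(hists)
```

Quantizing a descriptor compares it with only `branch` centers per level rather than with every word in a flat vocabulary, which is one reason a tree can reduce clustering and lookup time. Under the assumed TF-IDF weighting, a word that occurs in every image receives weight zero, so the histogram emphasizes words that distinguish images from one another.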

Keywords

BoW
spatial relationship
visual recognition
visual categorization

How to Cite

Miaomiao Liu, Xinde Li, Xiaobin Jin, & Xiao Zhang. (2021). The Extended Bag of Words Model for Visual Recognition and Categorization. International Journal of Robotics and Automation Technology, 1(2), 76–86. https://doi.org/10.15377/2409-9694.2014.01.02.4
