With the development of science and technology, a growing number of images need to be recognized and categorized. Although the classical Bag of Words (BoW) model has been widely used, it still has many limitations, e.g. low precision and accuracy and high computational complexity. In this paper, the model is improved and extended in four ways. First, features are sampled only after background features have been filtered out, which reduces the influence of background noise. Second, the spatial relationships among features are integrated with the classical BoW vector to improve the accuracy of recognition and categorization. Third, a vocabulary tree is constructed by hierarchical K-means clustering, in order to obtain a more reasonable vocabulary and greatly reduce the clustering time. Fourth, a weighted visual word histogram is used to highlight the essential differences among images. Finally, experiments are conducted to demonstrate the advantages of the proposed method.
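As an illustration of the weighted visual word histogram, the following minimal sketch assumes a TF-IDF style weighting, a common choice for BoW models; the weighting scheme actually used in this paper is not specified in the abstract and may differ. The function name `weighted_histograms` and its parameters are hypothetical. Each image is assumed to be already quantized into a list of visual word indices.

```python
import math
from collections import Counter


def weighted_histograms(images, vocab_size):
    """Build TF-IDF weighted visual word histograms (assumed weighting).

    images: list of lists of visual word indices, one list per image.
    vocab_size: number of visual words in the vocabulary.
    """
    n_images = len(images)

    # Document frequency: number of images containing each word at least once.
    df = [0] * vocab_size
    for words in images:
        for w in set(words):
            df[w] += 1

    # Inverse document frequency; words that occur in no image get weight 0.
    idf = [math.log(n_images / d) if d > 0 else 0.0 for d in df]

    histograms = []
    for words in images:
        counts = Counter(words)
        total = len(words)
        hist = [0.0] * vocab_size
        if total:
            for w, c in counts.items():
                # Term frequency scaled by how rare the word is across images.
                hist[w] = (c / total) * idf[w]
        histograms.append(hist)
    return histograms


# Three toy images over a vocabulary of three visual words.
example = weighted_histograms([[0, 0, 1], [0, 2], [0, 2, 2]], vocab_size=3)
```

Under this assumed weighting, a visual word that occurs in every image (word 0 in the toy data) receives zero weight, so the histogram emphasizes the words that distinguish one image from another.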