Visual Information Retrieval Machine (VIRM)

Content-Based Image Retrieval on the World Wide Web
Lola Deutmeyer, Waka Waktola
Dr. David Eichmann, Project Director
Library and Information Science, University of Iowa
Iowa City, IA, USA
July 31, 1998


This paper presents background information on visual information retrieval (VIR) from the World Wide Web and on the Visual Information Retrieval Machine (VIRM), designed specifically to operate as an image retrieval system.  This paper discusses the graphical user interface implemented by VIRM, the process of fetching images from the Web and clustering them (i.e., grouping similar images), and the obstacles facing a visual information retrieval machine.  It also proposes possibilities for the future development and improvement of VIRM.

1 Introduction

The Visual Information Retrieval Machine (VIRM) research project evolved from Web users' need to acquire specific information from the vast and continually changing World Wide Web.  Many search engines and web crawlers now exist for the purpose of document retrieval based on one or more specified keywords.  However, there is very little assistance available for Web users who wish to search for specific images on the Web, and with the rapid growth of multimedia use on the Internet, image retrieval engines are becoming necessary.  The purpose of our project was to develop an extension search engine that makes use of the capabilities of existing search engines to allow Internet users to perform image inquiries directly on the World Wide Web.  We were particularly interested in designing and implementing a graphical user interface for the VIRM project that would allow users to search through numerous retrieved images with relative ease.

2 What is VIRM?

The Visual Information Retrieval Machine (VIRM) is a Java-based image retrieval system for the World Wide Web.  VIRM is a branch of the Spider project.  It is a meta-search engine that interrogates other search engines for HTML documents based on a given keyword.  Upon receiving the related documents, VIRM crawls them for image URLs of the form "*.jpg", "*.jpeg", and "*.gif".  Our part of the project was to develop the component of VIRM responsible for taking these image URLs, retrieving the associated images, performing similarity calculations on the retrieved images, clustering them according to our similarity measures, and displaying the resultant groups in a workable graphical user interface.

2.1 What is Spider?

Spider is the parent of the VIRM project.  It is a Java program that creates and manipulates a relational database of the Web graph, traversing links matching the patterns "*.html" or "http://*".  Spider retrieves URLs from the World Wide Web and displays them through the use of the Sulla program.

2.1.2 What is Sulla?

Sulla is a personal user agent for the World Wide Web (that currently runs on the Spider package) with the following characteristics:
  • an ability to acquire, retain, and act upon an interest profile
  • autonomous action on goals posed to it by its user
  • user progress appraisal and notification
  • ability to access a variety of information sources
  • ethical behavior, as characterized in Ethical Web Agents

2.2 Architecture of Spider

Figure 1 depicts the architecture of the Spider program.  When a user enters a search keyword, a lexer takes the "http://*" documents from the Web, breaks them into their component words, and generates a list of the URLs contained in each document.  The crawler is instantiated by the SearchEngine file, and new URLs returned from the web search engines are fed to the crawler.  The crawler returns related documents, which are sent both to the clusterer, to be grouped with other documents, and to an archiver, which stores the documents in a database.  Once sent to the clusterer, a document generates cluster threads, which are then given to a grapher.  The grapher shows the clusters of similar documents and the edges connecting the various items in each cluster.  These are displayed in the user's window.  Spider is a text retrieval program, and its architecture was the basis for the architecture of VIRM, adapted for image retrieval.
Search Engine -----> Crawler <----- new URLs
                        |
              +---------+---------+
              |                   |
          Clusterer           Archiver
              |                   |
       Cluster Threads        Database

Figure 1.  Architecture tree of Spider

3 Existing data on Visual Information Retrieval (VIR)

We began our project by gathering publications, references, and locations of image retrieval engines on the Web to find background information on visual information retrieval and on graphical user interfaces for VIR methods.  We then characterized the different GUIs we found in order to determine which interface would work best for our project.  We obtained valuable information from IBM's QBIC (Query By Image Content), UC Berkeley's Digital Library Project, and Virage's VIR (Visual Information Retrieval) Engine.  These groups have done related research and had demonstration image retrieval engines available on their web sites.  Experimentation with these demonstrations prompted us to keep the following points in mind when developing our Visual Information Retrieval Machine (VIRM):
1. Precision of the various image colors
2. Memory limits of image retrieval
3. Use of color, texture, structure, histogram values, and threshold measurements of the images to determine similarity

4 Development of VIRM image retrieval and display

This section describes the steps we followed to build on the Spider program to create VIRM once we had finished our background research.  We started by acquainting ourselves with the Java methods used to acquire images from given image URLs and by learning to extract the information about the images required to cluster them.  Once this was completed, we were able to focus our efforts on the graphical user interface for VIRM, an important part of the project that gave us a means to verify that the image clustering algorithm was truly working as it should.

4.1 Fetching images from the World Wide Web

Since the first phase of VIRM, sifting HTML documents for image URLs, was already completed, our mission in this project was to build a better image search engine based on the prototypes set by IBM, UC Berkeley, and Virage.  We created a simple Java program that extends the Frame class, fetches images from specified URL addresses, and displays the acquired images in a new frame.  We then modified the code to display the status of each image, the type of the pixel values in the image (i.e., whether they were bytes or integers), and the image's width and height.  We used the ImageJ 0.70 package (a Java toolkit for image manipulation and analysis) and incorporated a clone of its histogram class in our code in order to view the histogram graph of the images we fetched.  After some more modifications, we were able to additionally display the red, green, and blue values of each pixel to verify that the color values were correct.  Each pixel starts out as a 32-bit value: bits 0-7 contain the blue component, bits 8-15 the green component, bits 16-23 the red component, and bits 24-31 the alpha value, i.e., the level of transparency of the pixel.  The program we created stored the red, green, and blue components of each pixel in an image in separate arrays.  These components were then compressed down to three bits each using a five-bit shift to the right and stored in a new array.  At this point, we were able to eliminate the ImageJ histogram code from our program and create our own histogram algorithm.  Our histogram was a simple array record of the number of pixels of each color.  Taking the compressed RGB component values, we could combine the separate values into a single 9-bit integer which served as the index of the array.  For the integer construction, we simply used the equation:

new pixel color value = compressed red component * 64 + compressed green component * 8 + compressed blue component

The new 9-bit color values were integers ranging from 0 to 511, so we used these values as indices into the histogram array.  The value of histogram[index] was simply incremented every time another pixel with a color value equal to index was counted.  With this histogram, we were then able to do image similarity comparisons and clustering.
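
The quantization and histogram-building steps above can be sketched in Java as follows; the class and method names are illustrative, not VIRM's actual code:

```java
public class ColorHistogram {
    // Compress each 8-bit RGB component to 3 bits (a five-bit right shift)
    // and pack the three results into a single 9-bit index, 0-511.
    public static int quantize(int pixel) {
        int r = (pixel >> 16) & 0xFF;
        int g = (pixel >> 8) & 0xFF;
        int b = pixel & 0xFF;
        return ((r >> 5) * 64) + ((g >> 5) * 8) + (b >> 5);
    }

    // Build the 512-bin color histogram over an array of packed RGB pixels.
    public static int[] histogram(int[] pixels) {
        int[] hist = new int[512];
        for (int p : pixels) {
            hist[quantize(p)]++;   // count one more pixel of this color
        }
        return hist;
    }
}
```

A pure white pixel compresses to index 511 (7 * 64 + 7 * 8 + 7), the top of the 512-entry histogram, and pure black to index 0.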

4.2 Clustering the images

In our part of the VIRM project, we were not directly involved in creating the image clustering algorithm and its working code.  For more information on the clustering algorithm used in the VIRM project, refer to Dr. David Eichmann's work on Ontology-based Information Fusion.

4.3 Creating the graphical user interface for VIRM

Once the VIRM clustering code was completed, we began creating a new graphical user interface from scratch.  This user interface (a.k.a. the image search interface) is intended to operate solely as an image retrieval engine, not for text searching.  The image search interface is being designed in the hope of making VIRM more user-friendly.  Although it currently resembles Sulla's control panel interface (shown in Figure 2), VIRM's GUI is built with more components, such as menu bars with options for opening multiple windows, that give the user the flexibility to run numerous image search windows simultaneously.  The control panel interface shown in Figure 2 is fine for developing VIRM, but it is not suitable for a general web user to operate.  Thus, we have worked on developing a new interface for VIRM better suited to general image searching.  The VIRM GUI also includes a close-window menu option, a quit menu option, a help menu option, and buttons (start, cancel, and exit).  When the "start" button in the new VIRM GUI is clicked, a new window appears.  This new window contains the thumbnail images, displayed in a Panel within an array of scrollbars, as shown in Figure 3.

The image search interface is still in its first stages of construction.  At this time, the image search program is able to fetch images from specified URL addresses sifted from documents relating to a given keyword.  These images are then clustered, and the control panel interface allows the user/developer to adjust the threshold values of the clusters and thus change the conditions for grouping.  For instance, the "membership threshold" is a number between 0 and 1 that determines how similar images in a given cluster must be: zero means they need not be similar at all, and one means they must be identical.  The higher the membership threshold, the more similar the images in a particular cluster are, but also the more clusters are generated.  The "cluster threshold" is similar; it determines whether, in the graph of all the existing clusters, an edge should be drawn between two clusters to indicate that they fulfill the cluster threshold's measure of similarity.  Experimentation with the threshold values is needed to determine which values provide the best similarity groupings without generating an impractical number of clusters.

Figure 2.  Sulla control panel interface
Figure 3.  The VIRM image display interface
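
As a rough illustration of how a membership threshold could govern grouping, the following sketch clusters color histograms greedily, using normalized histogram intersection as an assumed similarity measure; VIRM's actual clustering algorithm (see Section 4.2) may differ:

```java
import java.util.ArrayList;
import java.util.List;

public class ThresholdClusterer {
    // Normalized histogram intersection: 1.0 for identical histograms,
    // 0.0 for histograms with no colors in common.
    public static double similarity(int[] h1, int[] h2) {
        long overlap = 0, total = 0;
        for (int i = 0; i < h1.length; i++) {
            overlap += Math.min(h1[i], h2[i]);
            total += h1[i];
        }
        return total == 0 ? 0.0 : (double) overlap / total;
    }

    // Greedy grouping: each histogram joins the first cluster whose seed it
    // matches at or above the membership threshold, otherwise starts a new one.
    public static List<List<int[]>> cluster(List<int[]> histograms,
                                            double membershipThreshold) {
        List<List<int[]>> clusters = new ArrayList<>();
        for (int[] h : histograms) {
            boolean placed = false;
            for (List<int[]> c : clusters) {
                if (similarity(c.get(0), h) >= membershipThreshold) {
                    c.add(h);
                    placed = true;
                    break;
                }
            }
            if (!placed) {
                List<int[]> c = new ArrayList<>();
                c.add(h);
                clusters.add(c);
            }
        }
        return clusters;
    }
}
```

With a threshold of 0 every image falls into one cluster; with a threshold near 1, nearly identical histograms are required and many small clusters result, matching the trade-off described above.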

5 Obstacles encountered

As with any pioneering project, we encountered many obstacles in the development of VIRM.  Most of our problems stemmed from the details of image loading and processing.  We were able to resolve some of these difficulties, but many still need refinement.

5.1 Image compression

When we worked on compressing the pixel RGB component values from eight bits to three, there was some concern about the loss of so many color bits in the compression.  This is reasonable, considering that the images start out with 24 color bits, over 16 million possible colors, and are subsequently cut down to 9 color bits, with only 512 possible colors.  Upon reconstructing the degraded pixels, though, we discovered that there was not actually a significant difference between the original image and the degraded one (Figure 4).  When we tried increasing the compression of the RGB components to two-bit color values, the difference between the original image and the reconstructed image was more marked (Figure 5).  Reducing each RGB component to 2 bits cuts the color range down to only 64 colors.  However, this may still be sufficient to compute image similarity and cluster images while increasing the real-time speed of the program, which, at this time, is still rather slow.  Experimentation is still needed to determine the ideal compression range for image comparison.
Figure 4.  Compression of RGB components to 3 bits (L: degraded image, R: original image)
Figure 5.  Compression of RGB components to 2 bits (L: degraded image, R: original image)
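
The degradation shown in Figures 4 and 5 amounts to zeroing the low-order bits of each RGB component.  A sketch of that step, parameterized by the number of bits kept (our own illustrative code, not VIRM's):

```java
public class ColorDegrade {
    // Keep only the top `bits` bits of each RGB component, zeroing the rest,
    // to approximate the reconstructed "degraded" image.
    public static int degrade(int pixel, int bits) {
        int mask = (0xFF >> (8 - bits)) << (8 - bits); // bits=3 -> 0xE0, bits=2 -> 0xC0
        int r = (pixel >> 16) & mask;
        int g = (pixel >> 8) & mask;
        int b = pixel & mask;
        // Preserve the alpha byte, reassemble the degraded components.
        return (pixel & 0xFF000000) | (r << 16) | (g << 8) | b;
    }
}
```

At 3 bits per component, white (0x00FFFFFF) reconstructs as 0x00E0E0E0, a barely perceptible shift; at 2 bits it becomes 0x00C0C0C0, which is visibly darker, consistent with the more marked difference in Figure 5.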

5.2 Memory shortage

Another significant problem with visual information retrieval in general is the incredible amount of memory images require to process.  We encountered numerous "java.lang.OutOfMemoryError" exceptions during execution of VIRM.  We eliminated many unnecessary array-creation instances in our Java code, which significantly helped the memory situation.  Despite these measures, though, much still needs to be done to save memory space, such as improving garbage collection behavior and avoiding other unnecessary data structure instantiations.
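
One illustrative way to eliminate repeated array creation is to reuse a single pixel buffer across images, growing it only when a larger image arrives.  The class below is a sketch of that idea, not VIRM's actual code:

```java
public class PixelBuffer {
    private int[] buffer = new int[0];

    // Return a pixel array big enough for width * height pixels, reusing the
    // previous allocation whenever possible instead of allocating per image.
    public int[] acquire(int width, int height) {
        int needed = width * height;
        if (buffer.length < needed) {
            buffer = new int[needed];  // grow only when strictly necessary
        }
        return buffer;
    }
}
```

Since most fetched images are small, one buffer sized for the largest image seen so far serves the whole run, cutting garbage collection pressure considerably.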

5.3 Transparent images

Transparent images posed a problem at VIRM's run time.  Images containing transparent pixels skewed the image similarity computations, because their transparent pixels could not be registered as any particular color.  Moreover, pixels have varying levels of transparency and opacity, so even though a certain pixel may be mostly transparent and appear very light, it is registered in the color histogram as having full color brightness.  Thus, images with striking color differences would end up clustered together as "similar".

5.4 Unsuitable images

We discovered that while the thumbnail images were displayed, VIRM grabbed not only the user's desired images within the HTML documents but also unwanted buttons and logos (Figure 6).  Since VIRM simply sifts through HTML documents for image URLs indiscriminately, it retrieves navigational buttons, bullets, bars, and other icons in addition to images of the type actually being searched for.  These so-called "functional" images, which increase the functionality of a page and aid in site navigation, are most likely not the subject of an image search, so their presence in the image search results is generally undesired.
Figure 6.  Example functional images: navigational buttons, paragraph separators/bars, and other icons

6 What is next for VIRM?

The ultimate goal of VIRM is to provide the general public with a suitable image retrieval system for the World Wide Web.  At this stage, VIRM is able to perform image retrieval and display rather crudely.  There are still many problems with the project that need to be resolved, and there is much room for improvement of existing capabilities.  This section will discuss a few of the possible areas for future work and propose ways of building VIRM to its highest potential.

6.1 Eliminating functional images

The next step for VIRM would be to give it the ability to distinguish between the desired, searched-for images in an HTML document and undesired functional images such as navigational buttons, bullets, paragraph separators, and other icons.  Eliminating these numerous unwanted images would save much in the way of computation time and memory, in addition to improving the quality of the search results presented by VIRM.  V. Harmandas, M. Sanderson, and M.D. Dunlop propose in their paper "Image retrieval by hypertext links" that such functional images can be distinguished from non-functional images by the higher number of links they share with other pages.  Since functional images are usually navigational or readability-enhancement images, they tend to be linked with more pages than other images.  Harmandas et al. demonstrate that over 90% of non-functional images are linked to by only one page, whereas functional images are always linked to by more than two pages, with almost half of them linked to by more than five.  Reasonably, then, the possibility of functional images being retrieved can be reduced by weighting them inversely to the number of pages linking to them.  A second way to remove functional images would be to check each image's color histogram to make sure that there are more than, say, fifty distinct colors.  Since most functional images have very few distinct colors, this could help cut down on the number of functional images that appear in a VIRM run.
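
The distinct-color heuristic suggested above is straightforward to sketch; the fifty-color cutoff is the tentative value from the text, not a tuned parameter:

```java
public class FunctionalImageFilter {
    // Heuristic: treat an image as "functional" (button, bullet, separator)
    // if its color histogram has fewer than minColors nonempty bins.
    public static boolean isLikelyFunctional(int[] histogram, int minColors) {
        int distinct = 0;
        for (int count : histogram) {
            if (count > 0) distinct++;
        }
        return distinct < minColors;
    }
}
```

Because the histogram already exists for similarity computation, this check adds essentially no cost to a VIRM run.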

6.2 User download option

At this point in VIRM's development, images retrieved from a search can only be viewed as thumbnails (scaled-down form).  This is not of much use to a user who needs to download or manipulate these images.  Future work needs to include Java code that gives the user access to the original image.  Two possibilities are to allow a user to download the image directly from the VIRM thumbnail, or to direct the user to the location of the original image.  If the image is to be downloaded from the thumbnail, perhaps a full-scale view of the image could be displayed in a separate frame by a single mouse click on the thumbnail, and the full image could then be downloaded by another mouse click on the full-scale image.

6.3 Dealing with transparent images

The similarity of the images VIRM clusters together is not particularly strong.  Some of this may be due to transparent images.  Images that are transparent or that have transparent sections cannot have their transparent pixels meaningfully registered in the color histogram, since those pixels are not visible.  Since VIRM is written in Java, it may be helpful for the similarity determination algorithm to use the getAlpha() method, which returns how transparent each pixel in an image is.  The similarity calculations could then be weighted according to the transparency or opacity of each pixel.  Pixels that are completely transparent have an alpha value of zero, so they would not be counted in the color histogram at all, while completely opaque pixels have an alpha value of 255 and would be given full weight in the histogram.  Intermediate alpha values would be weighted accordingly.  This will hopefully improve the similarity between retrieved images in a particular cluster.
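
A sketch of the proposed alpha weighting, reading the alpha value from bits 24-31 of each ARGB pixel and reusing the 3-bit-per-component quantization from Section 4.1 (illustrative code, not VIRM's):

```java
public class AlphaWeightedHistogram {
    // Weight each pixel's histogram contribution by its alpha: 0 = fully
    // transparent (no contribution), 255 = fully opaque (full contribution).
    public static double[] histogram(int[] argbPixels) {
        double[] hist = new double[512];
        for (int p : argbPixels) {
            int alpha = (p >>> 24) & 0xFF;          // transparency weight
            int index = (((p >> 16) & 0xFF) >> 5) * 64
                      + (((p >> 8) & 0xFF) >> 5) * 8
                      + ((p & 0xFF) >> 5);          // 9-bit color index, 0-511
            hist[index] += alpha / 255.0;           // weight in [0.0, 1.0]
        }
        return hist;
    }
}
```

The histogram bins become fractional counts, so the downstream similarity code would need to compare doubles rather than integers, but the indexing scheme is unchanged.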

6.4 Edge detection

Another area of future improvement for image similarity in VIRM would be applying edge detection capabilities.  In order to avoid grouping images with strong color similarity but obvious structural differences (e.g., a red rose on a black background and a dark volcano with red molten lava), an edge detection scheme is important.  Not only the similarity of colors, but also the similarity of the placement of those colors needs to be observed.  The edge detection or color placement scheme could be applied in the image clustering code after the affirmation of color resemblance.
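
As one possible realization of such a scheme, a simple Sobel operator over a grayscale version of the image yields a gradient-magnitude map whose edge structure could then be compared between images.  This is an illustrative sketch, not part of VIRM:

```java
public class EdgeDetect {
    // Sobel gradient magnitude over a grayscale image (values 0-255).
    // Border pixels are left at zero for simplicity.
    public static double[][] sobel(int[][] gray) {
        int h = gray.length, w = gray[0].length;
        double[][] mag = new double[h][w];
        for (int y = 1; y < h - 1; y++) {
            for (int x = 1; x < w - 1; x++) {
                // Horizontal and vertical Sobel kernels.
                int gx = -gray[y-1][x-1] - 2*gray[y][x-1] - gray[y+1][x-1]
                       +  gray[y-1][x+1] + 2*gray[y][x+1] + gray[y+1][x+1];
                int gy = -gray[y-1][x-1] - 2*gray[y-1][x] - gray[y-1][x+1]
                       +  gray[y+1][x-1] + 2*gray[y+1][x] + gray[y+1][x+1];
                mag[y][x] = Math.sqrt((double) (gx * gx + gy * gy));
            }
        }
        return mag;
    }
}
```

A uniform region produces zero magnitude everywhere, while a sharp color boundary (such as the rose against its black background) produces a strong response, giving the clusterer structural information that a color histogram alone cannot.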

6.5 Improving the GUI

VIRM's graphical user interface also needs further development to make it flexible and to give it a better display of windows and images.  VIRM's new user interface is currently under construction, and the next phase will be to instantiate the necessary files in the Spider package so that the image search interface can operate on Spider, particularly for acquiring images.  One question that still needs to be answered regarding the display of the thumbnails is whether they should be scaled proportionally or produced at a fixed size.  Proportional scaling makes it much easier to tell what an image is supposed to be, but fixed-size thumbnails are much easier to view in a scroll pane, since all scaled images will be the same size.  Development and improvement of the graphical user interface is the next big phase of the VIRM project.