CS 47300: Web Information Search and Management

Assignment 1: Types of Information Retrieval

Due 11:59pm EDT Friday, 4 September, 2020

Please turn in a PDF through Gradescope. The answer to each question should begin on a new page. Gradescope is pretty self-explanatory, but if you'd like, you can watch the ITaP video instructions. Note that the first time you access Gradescope, you'll need to do so through Brightspace (this registers you for the course in Gradescope). Make sure that you mark the start and end of each question in Gradescope; assignments will be graded based on what you mark. Please typeset your answers. (You won't be writing responses by hand during a Google phone interview, so why would you do so for class?)

1. Ad-Hoc Retrieval

Describe briefly an example of an ad-hoc information retrieval system you have used that is not web search. Include in your description:

2. Filtering as Information Retrieval

Give an example of an information retrieval system you have used that relies on filtering rather than ad-hoc retrieval. Try to answer:

3. Full Text Indexing

Give an example of an Information Retrieval system (it need not be ad-hoc information retrieval) where full text indexing would be expected to work better than controlled vocabulary indexing. Briefly describe why:

4. Stemming and Stopwords

Perform stemming and stopword removal on the following body of text:

"The quick brown fox jumped over the lazy dog."

For each modification you make to the phrase, include either the rule you used for stemming or whether the word is on the stopword list.
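To illustrate the kind of annotated output this question asks for, here is a minimal sketch run on a different sentence. The stopword list and suffix rules below are hypothetical and deliberately tiny; they are not the Porter stemmer or any standard list, so use whichever rules and stopword list you can justify in your own answer.

```python
import re

# Hypothetical, deliberately small stopword list (an assumption for this sketch).
STOPWORDS = {"a", "an", "the", "is", "are", "in", "of", "to", "and"}

# Toy suffix rules as (suffix, replacement), tried in order.
# These are illustrative only and are NOT the full Porter rule set.
SUFFIX_RULES = [("sses", "ss"), ("ies", "i"), ("ing", ""), ("ed", ""), ("s", "")]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def stem(word):
    """Apply the first matching suffix rule; return (stem, rule description)."""
    for suffix, replacement in SUFFIX_RULES:
        # Only strip when at least three characters of the word remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stemmed = word[: len(word) - len(suffix)] + replacement
            return stemmed, f"-{suffix} -> -{replacement}"
    return word, "no rule"


def process(text):
    """Return (token, result, reason) triples; result is None for stopwords."""
    log = []
    for token in tokenize(text):
        if token in STOPWORDS:
            log.append((token, None, "stopword"))
        else:
            stemmed, rule = stem(token)
            log.append((token, stemmed, rule))
    return log


result = process("The cats are walking in the gardens")
for token, stemmed, reason in result:
    print(token, stemmed, reason)
```

Each output row pairs a token with either the rule that changed it or a note that it was dropped as a stopword, which is the level of justification expected above.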

5. Inverted Lists

Create an inverted list from the following 3 documents:

D1: The quick brown fox jumped over the lazy dog.

D2: But why did the fox jump over the dog?

D3: Calling the dog lazy is not very nice.

When creating this inverted list, assume that the ad-hoc system generating it treats words independently and assigns greater importance to less common words.
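As a rough sketch of the data structure involved, the following builds an inverted list over three made-up documents (not D1 to D3) and attaches a weight to each term. Using inverse document frequency, log(N / df), as the "less common words matter more" weight is an assumption of this sketch; the question does not prescribe a particular formula, and the function names here are hypothetical.

```python
import math
import re
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to a posting dict {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        # Treat words independently: lowercase, alphabetic tokens only.
        for term in re.findall(r"[a-z]+", text.lower()):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return dict(index)


def idf_weights(index, num_docs):
    """Assumed weighting: log(N / df), so rarer terms get larger weights."""
    return {
        term: math.log(num_docs / len(postings))
        for term, postings in index.items()
    }


# Made-up example documents, chosen so they differ from the assignment's.
docs = {
    "A": "red apples and green apples",
    "B": "green pears",
    "C": "red wine and green tea",
}

index = build_inverted_index(docs)
weights = idf_weights(index, len(docs))

for term in sorted(index):
    print(term, index[term], round(weights[term], 3))
```

In this toy collection a term that appears in every document receives weight 0, while a term that appears in only one document receives the largest weight.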

6. Relevant Real-World Applications

The infamous YouTube Recommended Algorithm has been the subject of many a YouTube content creator's woes, and of many a viewer's comment ("Like this comment if this video was in your recommended 9 years later"). Where does the YouTube Recommended Algorithm fit into the IR field? Is it most likely ad-hoc retrieval or filtering? And what kind of ad-hoc or filtering algorithm is it? Provide reasoning for all assertions.

7. Right and Wrong in Information Retrieval

If you were to search for "Vaccinations Dangerous", what would you expect to be the subject matter of the majority of the resulting documents? Now, if you were to search for "Vaccinations Not Dangerous", how would that change your results? What does this indicate about the correctness of information retrieval systems?

