Please turn in a PDF through Gradescope. The answer to each question should begin on a new page. Gradescope is pretty self-explanatory, but if you'd like, you can watch the ITaP video instructions. Note that the first time you access Gradescope, you'll need to do so through Brightspace (this registers you for the course in Gradescope). Make sure that you mark the start and end of each question in Gradescope; assignments will be graded based on what you mark. Please typeset your answers (you won't be handwriting responses during a Google phone interview, so why would you do so for class?).
Describe briefly an example of an ad-hoc information retrieval system you have used that is not web search. Include in your description:
Give an example of an information retrieval system you have used that performs filtering rather than ad-hoc retrieval. Try to answer:
query?
Give an example of an Information Retrieval system (it need not be ad-hoc information retrieval) where full text indexing would be expected to work better than controlled vocabulary indexing. Briefly describe why:
Perform stemming and stopword removal on the following body of text:
"The quick brown fox jumped over the lazy dog."
For each modification you make to the phrase, include either the rule you used for stemming or whether the word is on the stopword list.
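As a reminder of the mechanics only (not a solution to this question), the sketch below runs stopword removal and suffix stripping on a different, made-up sentence and records which rule or list fired for each change. The stopword list `STOPWORDS`, the rule table `SUFFIX_RULES`, and the function names are all assumptions invented for illustration; in particular, these toy rules are not the Porter stemmer, so your own answer should cite the rules of whichever stemmer you actually use.

```python
import re

# Hypothetical, deliberately tiny stopword list; real lists are much longer.
STOPWORDS = {"the", "a", "an", "is", "are", "of", "in", "on", "and", "to"}

# Toy suffix rules as (suffix, replacement), tried in order.
# These are illustrative assumptions, NOT the Porter stemmer's rules.
SUFFIX_RULES = [("sses", "ss"), ("ies", "i"), ("ing", ""), ("ed", ""), ("s", "")]


def stem(word):
    """Return (stemmed_word, rule_description), or (word, None) if no rule fires."""
    for suffix, replacement in SUFFIX_RULES:
        # Only strip when at least 3 characters of stem would remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stemmed = word[: len(word) - len(suffix)] + replacement
            return stemmed, f"{suffix} -> {replacement or '(nothing)'}"
    return word, None


def preprocess(text):
    """Lowercase, tokenize, drop stopwords, stem; also log every modification."""
    kept, log = [], []
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in STOPWORDS:
            log.append((token, None, "stopword"))
            continue
        stemmed, rule = stem(token)
        kept.append(stemmed)
        if rule is not None:
            log.append((token, stemmed, rule))
    return kept, log


kept, log = preprocess("The cats are chasing painted birds in the garden")
print(kept)  # ['cat', 'chas', 'paint', 'bird', 'garden']
for original, result, reason in log:
    print(original, "->", result, "because", reason)
```

Note that crude rules produce non-words such as "chas"; this is normal for stemming, since the goal is a consistent index term rather than a dictionary form.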
Create an inverted list from the following 3 documents:
D1: The quick brown fox jumped over the lazy dog.
D2: But why did the fox jump over the dog?
D3: Calling the dog lazy is not very nice.
When creating this inverted list, assume the ad-hoc system generating it simply treats words independently and assigns greater importance to less common words.
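For orientation, here is a minimal sketch of one way such a structure could be built, shown on a made-up two-document corpus rather than on D1 to D3. It assumes that "greater importance to less common words" is modeled with inverse document frequency, idf = log(N / df), which is one common choice and not necessarily the weighting intended here; the function name `build_inverted_index` and the output layout are likewise hypothetical. No stemming or stopword removal is applied in this sketch.

```python
import math
import re
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to its idf weight and its postings (doc_id -> term frequency).

    docs: dict of doc_id -> raw text. Words are treated independently
    (bag of words); idf = log(N / df) is an assumed weighting scheme.
    """
    postings = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in re.findall(r"[a-z]+", text.lower()):
            postings[term][doc_id] = postings[term].get(doc_id, 0) + 1

    n_docs = len(docs)
    index = {}
    for term in sorted(postings):
        df = len(postings[term])  # number of documents containing the term
        index[term] = {
            "idf": math.log(n_docs / df),  # rarer terms get larger weights
            "postings": dict(postings[term]),
        }
    return index


# Hypothetical toy corpus, unrelated to D1-D3 above.
example_docs = {"A": "red apple red", "B": "green apple"}
for term, entry in build_inverted_index(example_docs).items():
    print(term, round(entry["idf"], 3), entry["postings"])
```

In this toy corpus "apple" appears in both documents and therefore receives weight 0, while "red" and "green" each appear in one document and receive log 2.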
The infamous YouTube Recommended Algorithm has been the subject of many a YouTube content creator's woes, and the subject of many a viewer's comment ("Like this comment if this video was in your recommended 9 years later").
Where does the YouTube Recommended Algorithm fit into the IR field? Is it
most likely Ad-Hoc or Filtering? And what kind of ad-hoc/filtering algorithm
is it? Provide reasoning for all assertions.
"Right" and "Wrong" in Information Retrieval
If one were to look up "Vaccinations Dangerous", what would one expect to be
the subject matter of the majority of the resulting documents? Now, if one
were to look up "Vaccinations Not Dangerous", how would that change your
results? What does this indicate about the "correctness" of information
retrieval systems?