Messages
When I listed the curriculum I originally wrote:
All chapters from the textbook that have been covered by the lectures: Chapters 1-9, 14,15, 19-21.
But I forgot to list Chapter 13, which has indeed been covered by a lecture and is also the basis for one of the obligatory exercises! So this should be corrected to:
All chapters from the textbook that have been covered by the lectures: Chapters 1-9, 13-15, 19-21.
It will be possible to reach me during the final exam ("digital trøsterunde") if you have any clarification questions regarding the problem sets on the exam: I will attend a Zoom meeting (details below) exactly one hour after the start of the exam. Zoom's waiting room function will be used so that only one student is admitted to the meeting at a time.
Aleksander Øhrn is inviting you to a scheduled Zoom meeting.
Topic: IN3120/IN4120: Eksamensrunde
Time: Dec 18, 2020 04:00 PM Amsterdam, Berlin, Rome, Stockholm, Vienna
Join Zoom Meeting https://uio.zoom.us/j/69789890042
Meeting ID: 697 8989 0042
Documentation on how to use Zoom can be found here: /english/services/it/phone-chat-videoconf/zoom/
One tap mobile
+46844682488,,69789890042# Sweden
+46850500828,,69789890042# Sweden
Dial by your location
+46 8 4468 2488 Sweden
+46 8 5050 0828 Sweden
+45 32 70 12 06 Den...
The following is considered to be part of the course curriculum:
- All chapters from the textbook that have been covered by the lectures: Chapters 1-9, 14,15, 19-21.
- All slides used as part of the lectures.
- All supplementary papers discussed in the lectures. You will not be expected to know technical minutiae from these papers, but should be able to retell the gist of what those papers are about and their basic ideas.
- All the obligatory exercises.
The science fair is not considered as part of the curriculum, i.e., topics presented at the science fair that are not covered by the items above will not appear on the exam.
The exam will be an open-book exam, i.e., all aids (textbook, online resources, notes, and so on) are allowed. It is strictly forbidden to collaborate or communicate with others about the exam during the exam. You may be randomly selected for a conversation to check ownership of your answer [https://www.mn.uio.no/om/hms/koronavirus/kontrollsamtale/]. This conversation does not affect the grading or grade, but may result in IFI opting to pursue a case for cheating. You can read more about what is considered cheating on UiO’s website [/om/regelverk/studier/studier-eksamener/fuskesaker/]. Moreover, the information on the website about exams at MN in the autumn of 2020 applies [https://www.mn.uio.no/om/hms/koronavirus/eksamen-2020.html].
To complement the slides I used during the lecture on October 29th, here are some additional slides on web search and link analysis. They cover the same topics and chapters from the textbook that I covered during the lecture, but might supply additional context and clarifications.
In today's topic medley lecture I went through the three slide decks found here, here, and here. A couple of papers were also referenced, which you can deep-dive into for additional depth and color:
- An extension of Bloom filters is cuckoo filters, which allows for deletions....
An email with exam information stating the wrong time was sent out today. A new email with the correct time has been sent out.
The exam in IN3120/IN4120 will take place on December 18, 15:00 - 19:00.
Sorry for the mistake!
Best regards,
Study administration, IFI.
Due to a scheduling conflict I will have to delay the lecture this coming Thursday from 10:15-12:00 to 11:15-13:00. That is, this coming Thursday the lecture will commence one hour later than originally scheduled.
The IN4120 science fair will take place on November 12th. Looking forward to it! Some practical information:
- Each group should spend no more than 10 minutes on presenting their topic. I'll be a time cop.
- Please mail me all your presentations by November 11th, i.e., the day before the science fair. I will then upload them to the course GitHub repository, so that everyone has access to them.
- When mailing me your presentations, please send me a PDF file named science-fair-n.pdf, where n is your group number.
- Realistically, given the 10-minute limit, when presenting your topic you will not be able to go through more than, say, 3 slides or so. Your PDF file above may be longer and contain more details/examples than what you have time to present, if you want.
- There are no fewer than 14 groups, and we only have a 10:15-12:00 slot s...
Tomorrow's group session has been moved and will run from 17:15 to 19:00 instead of 12:15 to 14:00.
-Markus S. H.
To supplement today's lecture on "learning to rank" (i.e., looking at document ranking as an ML problem) here is a good overview paper which has been added to the course's GitHub repository.
The solution to assignment C is published on GitHub in the "solutions" folder.
The solution to assignment B is published in the GitHub repo in the "solutions" folder.
I will be going through assignment D during the group lesson tomorrow. If you have not started on assignment D or have any questions, consider attending the group lesson :)
Haiyue Chen
The solution to assignment A is published in the GitHub repo in the "solutions" folder.
The old version of the tests for assignment C tested code from assignment D. Please pull the newest version from GitHub before you start working on assignment C.
Next Thursday I'll cover the topic of index compression. Applying good compression techniques to an inverted index yields a number of performance benefits in practice. Index compression is covered by this chapter in the textbook, and I'll additionally use this deck to illustrate two integer compression techniques not mentioned in the textbook: Simple9 and PFOR-DELTA. This paper is very implementation-oriented, but its Related Work section gives a good summary of several families of compression algorithms, if you are interested.
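To make the Simple9 idea concrete, here is a minimal sketch, not taken from the deck or the textbook. It assumes the commonly described Simple9 layout: each 32-bit word holds a 4-bit selector plus 28 data bits, and the selector picks one of nine ways to split those 28 bits into equal-width slots. The function names (`to_gaps`, `simple9_encode`, `simple9_decode`) are hypothetical, and the encoder is a simple greedy one that only emits fully packed words.

```python
from itertools import accumulate

# (count, bits) layouts for the 28 data bits of a 32-bit word. The 4-bit
# selector stored in the top bits of the word is the index into this table.
LAYOUTS = [(28, 1), (14, 2), (9, 3), (7, 4), (5, 5), (4, 7), (3, 9), (2, 14), (1, 28)]


def to_gaps(postings):
    """Turn a sorted postings list into gaps (first value kept as-is)."""
    return [b - a for a, b in zip([0] + postings[:-1], postings)]


def from_gaps(gaps):
    """Inverse of to_gaps: running sums restore the document IDs."""
    return list(accumulate(gaps))


def simple9_encode(numbers):
    """Greedily pack non-negative integers below 2**28 into 32-bit words."""
    words = []
    i = 0
    while i < len(numbers):
        # Try the densest layout first; fall back to wider slots.
        for selector, (count, bits) in enumerate(LAYOUTS):
            chunk = numbers[i:i + count]
            if len(chunk) == count and all(0 <= n < (1 << bits) for n in chunk):
                word = selector << 28
                for j, n in enumerate(chunk):
                    word |= n << (j * bits)
                words.append(word)
                i += count
                break
        else:
            raise ValueError("values must be in the range [0, 2**28)")
    return words


def simple9_decode(words):
    """Unpack words produced by simple9_encode."""
    numbers = []
    for word in words:
        count, bits = LAYOUTS[word >> 28]
        mask = (1 << bits) - 1
        for j in range(count):
            numbers.append((word >> (j * bits)) & mask)
    return numbers


postings = [3, 7, 8, 15, 1000, 1001, 1002]
words = simple9_encode(to_gaps(postings))
restored = from_gaps(simple9_decode(words))
```

Encoding gaps rather than raw document IDs keeps the numbers small, which is what lets many of them share a single word.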
In today's lecture I mentioned MapReduce while discussing distributed indexing. Although the textbook covers this already, two supplementary papers can be found below. You can read them if you're interested and want additional detail and depth beyond what's already covered by the textbook.
- This paper is the original MapReduce paper, as originally published by Google.
- This paper describes a similar system, as used internally in Microsoft.
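As a rough illustration of how MapReduce applies to index construction, the sketch below runs the map, shuffle, and reduce steps in a single process. It is a toy under stated assumptions, not the approach of either paper: the function names (`map_phase`, `shuffle`, `reduce_phase`, `build_index`) are hypothetical, tokenization is a plain whitespace split, and a real system would distribute each phase across many machines.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Emit one (term, doc_id) pair per token occurrence."""
    for term in text.lower().split():
        yield term, doc_id


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(term, doc_ids):
    """Collapse the values for one term into a sorted, duplicate-free postings list."""
    return term, sorted(set(doc_ids))


def build_index(docs):
    """Run map, shuffle, and reduce over a {doc_id: text} mapping."""
    pairs = (pair for doc_id, text in docs.items() for pair in map_phase(doc_id, text))
    return dict(reduce_phase(term, ids) for term, ids in shuffle(pairs).items())


docs = {1: "Caesar came", 2: "Caesar conquered", 3: "came came"}
index = build_index(docs)
```

Because each map call only looks at one document and each reduce call only looks at one term, both phases can in principle be split across workers without coordination.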
I've been asked which of the supplementary papers in this deck, presented last week, you should read. I would recommend reading all of them, but I will not ask about deep technical details from any of the papers on the exam. If you understand the takeaway summaries in the deck then you're good to go. I.e., my expectation is only that you're able to clearly articulate what the main ideas presented in the papers are.
That said, some further comments about the papers mentioned in that deck:
- This paper and...
On Thursday this week I'll be talking about the topics covered in Chapter 3 of the textbook, which addresses some string processing algorithms for tolerant retrieval and how to represent sets of strings. I will also deviate a bit from the book and pull in additional topics mentioned in some of the supplementary papers that have been distributed: In particular, I will discuss what's mentioned in this slide deck.
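One string algorithm commonly used for tolerant retrieval is Levenshtein edit distance, for example to rank spelling-correction candidates. The sketch below is a standard dynamic-programming version with unit costs for insert, delete, and substitute. Whether the lecture uses exactly this formulation is an assumption, and the function name `edit_distance` is hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b, using two table rows."""
    # previous[j] is the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, free on a match
            ))
        previous = current
    return previous[-1]


distance = edit_distance("kitten", "sitting")
```

Keeping only two rows of the table uses memory proportional to the length of `b`, while the running time stays proportional to the product of the two lengths.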
Miscellaneous supplementary links related to today's lecture:
- This paper gives a more in-depth description of skip lists, for those interested in going beyond what's mentioned in the textbook.
- You can find some source code here, if you're interested in learning more about the details of the Porter stemmer and its heuristics.
- See here for more information about Snowball stemmers for various languages, including online demos.
- I mentioned Double Metaphone as an example of a phonetic algorithm, besides Soundex. You can find some example Java source code...
The Padlet (a replacement for Piazza) for this course is available here: Link
Just a quick reminder: both group sessions are digital and hosted at the same address, which can be found here (the second Zoom invite, below "Gruppetimer").
- Markus S. H.
As mentioned today, please let me know if you are unable to follow the lectures unless they are in English. Thanks!
All lectures will be recorded and made available from here. The recording of today's lecture is now available.
If you for some reason object to or do not consent to being included in a recording, then you can choose to not speak during the lecture and instead submit questions via the chat or offline.