Strings, strings, strings! (cont.)
I've been asked which of the supplementary papers presented in last week's deck you should read. I would recommend reading all of them, but I will not ask you on an exam about deep technical details from any of the papers. If you understand the takeaway summaries in the deck then you're good to go. I.e., my expectation is only that you're able to clearly articulate the main ideas presented in the papers.
That said, some further comments about the papers mentioned in that deck:
- This paper and this paper are directly relevant to solving Assignment B, so you'll need to understand what they're about in order to do that obligatory assignment.
- The purpose of this paper and this paper is to show how a large dictionary (a set of strings) is typically represented and worked with efficiently in practice: it gets compiled down to a finite state automaton and packed into a contiguous byte buffer.
- The textbook talks about how to compute the edit distance between two strings, but not how to efficiently compute the edit distance between a query string and a large dictionary to find the closest matches. You do not want to loop over every possible dictionary entry. This paper shows how you can approach this, given that you have a trie representation of the dictionary.
- This paper is more informational than the others and intended only for those interested. You will not get asked about this paper on an exam, but might find it interesting if fast string algorithms are what floats your boat. It outlines how, assuming unit edit costs, the edit distance between two strings can be computed through an orgy of bit-vector operations. This gives us a speed-up proportional to the machine word size compared to the traditional edit table representation.
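To make the dictionary-as-automaton idea above a bit more concrete, here is a minimal sketch of my own (not code from the papers): a small dictionary is compiled into a trie, and the trie is then packed breadth-first into one contiguous array of fixed-size node records. The record layout `(label, is_final, first_child, child_count)` is an assumption for illustration; real implementations use the same idea with actual byte-level records, binary search over child runs, and automaton minimization to share suffixes.

```python
from collections import deque

def build_trie(words):
    """Build a plain dict-of-dicts trie; None marks end-of-word."""
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node[None] = True
    return root

def pack(root):
    """Pack the trie breadth-first so that each node's children occupy a
    contiguous run of records; a node then only needs the index of its
    first child plus a child count. Record: [label, is_final, first_child, child_count]."""
    records = [['', None in root, 0, 0]]  # record 0 is the root
    queue = deque([(0, root)])
    while queue:
        idx, node = queue.popleft()
        children = sorted((ch, sub) for ch, sub in node.items() if ch is not None)
        records[idx][2] = len(records)
        records[idx][3] = len(children)
        for ch, sub in children:
            queue.append((len(records), sub))
            records.append([ch, None in sub, 0, 0])
    return records

def lookup(records, word):
    """Membership test against the packed representation only."""
    idx = 0
    for ch in word:
        first, count = records[idx][2], records[idx][3]
        for j in range(first, first + count):  # linear scan; real packs binary-search
            if records[j][0] == ch:
                idx = j
                break
        else:
            return False
    return records[idx][1]
```

After packing, the original pointer-chasing trie can be thrown away; everything needed for lookups lives in one flat buffer, which is exactly what makes the automaton cheap to store on disk and memory-map.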
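For the trie-based fuzzy matching mentioned above, one standard approach (a sketch under my own naming, not necessarily the paper's exact algorithm) is to walk the trie depth-first while carrying one row of the edit-distance table per node, and to prune any branch whose best possible score already exceeds the distance bound. Shared prefixes in the dictionary then share table rows, and whole subtrees get skipped:

```python
def build_trie(words):
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node[None] = True  # end-of-word marker
    return root

def fuzzy_search(trie, query, max_dist):
    """Return (word, distance) pairs for dictionary words within max_dist of query."""
    results = []
    first_row = list(range(len(query) + 1))  # distances against the empty prefix
    for ch, child in trie.items():
        if ch is not None:
            _walk(child, ch, query, first_row, ch, max_dist, results)
    return results

def _walk(node, ch, query, prev_row, prefix, max_dist, results):
    # Extend the edit table by one row for the trie edge labeled `ch`.
    row = [prev_row[0] + 1]
    for i in range(1, len(query) + 1):
        cost = 0 if query[i - 1] == ch else 1
        row.append(min(row[i - 1] + 1,           # insertion
                       prev_row[i] + 1,          # deletion
                       prev_row[i - 1] + cost))  # match / substitution
    if None in node and row[-1] <= max_dist:
        results.append((prefix, row[-1]))
    if min(row) <= max_dist:  # otherwise no extension can get back under the bound
        for c, child in node.items():
            if c is not None:
                _walk(child, c, query, row, prefix + c, max_dist, results)
```

For example, `fuzzy_search(build_trie(["cat", "cart", "care", "dog"]), "cat", 1)` finds `("cat", 0)` and `("cart", 1)` while pruning the `dog` subtree after two characters.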
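And for those curious about the last bullet, the bit-vector trick can be sketched roughly as follows. This is a Myers-style bit-parallel Levenshtein computation with unit costs, written as an illustration rather than copied from the paper: the vertical +1/-1 deltas of an edit-table column are kept in two bit vectors, and each text character updates the whole column with a handful of word-wide operations. Python's unbounded integers hide the word-size limit here; a C implementation would block the pattern into 64-bit words to get the advertised speed-up.

```python
def bitvector_edit_distance(p, t):
    """Bit-parallel Levenshtein distance between p and t (unit edit costs)."""
    m = len(p)
    if m == 0:
        return len(t)
    mask = (1 << m) - 1   # m low bits: one bit per pattern position
    high = 1 << (m - 1)   # the bit for the last pattern position
    peq = {}              # per-character match masks over the pattern
    for i, c in enumerate(p):
        peq[c] = peq.get(c, 0) | (1 << i)
    pv, mv, score = mask, 0, m  # vertical +1 / -1 deltas; score = last cell
    for c in t:
        eq = peq.get(c, 0)
        xv = eq | mv
        xh = (((eq & pv) + pv) ^ pv) | eq
        ph = mv | (~(xh | pv) & mask)  # horizontal +1 deltas
        mh = pv & xh                   # horizontal -1 deltas
        if ph & high:                  # track the bottom cell of the column
            score += 1
        elif mh & high:
            score -= 1
        ph = ((ph << 1) | 1) & mask    # shift horizontal deltas down one row
        mh = (mh << 1) & mask
        pv = mh | (~(xv | ph) & mask)  # new vertical deltas for the next column
        mv = ph & xv
    return score
```

Each text character costs a constant number of bitwise operations regardless of the pattern length (as long as it fits in a word), versus the O(m) work per character of the plain table-filling algorithm.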
Published Sep. 7, 2020 22:06
Last modified Sep. 7, 2020 22:07