Computers and Technology
Computers and Technology, 03.03.2020 01:28, 196336

In word segmentation, you are given as input a string of alphabetical characters ([a-z]) without whitespace, and your goal is to insert spaces into this string such that the result is the most fluent according to the language model.

a. Consider the following greedy algorithm: Begin at the front of the string. Find the ending position for the next word that minimizes the language model cost. Repeat, beginning at the end of this chosen segment.
Show that this greedy search is suboptimal. In particular, provide an example input string on which the greedy approach would fail to find the lowest-cost segmentation of the input.
In creating this example, you are free to design the n-gram cost function (both the choice of n and the cost of any n-gram sequences) but costs must be positive and lower cost should indicate better fluency. Note that the cost function doesn't need to be explicitly defined. You can just point out the relative cost of different word sequences that are relevant to the example you provide. And your example should be based on a realistic English word sequence — don't simply use abstract symbols with designated costs.

b. Implement an algorithm that, unlike greedy, finds the optimal word segmentation of an input character sequence. Your algorithm will consider costs based simply on a unigram cost function.
Before jumping into code, you should think about how to frame this problem as a state-space search problem. How would you represent a state? What are the successors of a state? What are the state transition costs? (You don't need to answer these questions in your writeup.)
Fill in the member functions of the SegmentationProblem class and the segmentWords function. The argument unigramCost is a function that takes in a single string representing a word and outputs its unigram cost. You can assume that all the inputs would be in lower case. The function segmentWords should return the segmented sentence with spaces as delimiters, i. e. ' '.join(words).
For convenience, you can actually run python submission. py to enter a console in which you can type character sequences that will be segmented by your implementation of segmentWords. To request a segmentation, type seg mystring into the prompt. For example:

>> seg
Query (seg):
this is not my beautiful house

Console commands other than seg — namely ins and both — will be used for the upcoming parts of the assignment. Other commands that might help with debugging can be found by typing help at the prompt.
Hint: You are encouraged to refer to NumberLineSearchProblem and GridSearchProblem implemented in util. py for reference. They don't contribute to testing your submitted code but only serve as a guideline for what your code should look like.
Hint: the final actions that ucs (a UniformCostSearch object) takes can be accessed through ucs. actions.

answer
Answers: 2

Other questions on the subject: Computers and Technology

image
Computers and Technology, 22.06.2019 19:30, ibrahimuskalel
Avariable definition defines the name of a variable that will be used in a program, as well as
Answers: 3
image
Computers and Technology, 23.06.2019 03:00, jarteria0
State 7 common key's for every keyboard
Answers: 1
image
Computers and Technology, 23.06.2019 09:50, tatumleigh04
Allison and her group have completed the data entry for their spreadsheet project. they are in the process of formatting the data to make it easier to read and understand. the title is located in cell a5. the group has decided to merge cells a3: a7 to attempt to center the title over the data. after the merge, allison points out that it is not centered and looks bad. where would the title appear if allison unmerged the cells in an attempt to fix the title problem?
Answers: 2
image
Computers and Technology, 23.06.2019 12:10, jefersina16
2. fabulously fit offers memberships for$35 per month plus a $50 enrollmentfee. the fitness studio offersmemberships for $40 per month plus a$35 enrollment fee. in how many monthswill the fitness clubs cost the same? what will the cost be?
Answers: 1
Do you know the correct answer?
In word segmentation, you are given as input a string of alphabetical characters ([a-z]) without whi...

Questions in other subjects:

Konu
Biology, 29.08.2020 17:01