Computers and Technology

Yates' big file sorter

for this assignment, you will design and implement a sort program that still works correctly, even when there is not enough memory to store an entire input file. this will give you practice with collections, files, exceptions, and sorting algorithms.

program description

at runtime, your program should read the name of a "large" input file to sort, and the name to give the sorted output file.

the program will sort the lines in the input file, and save the sorted lines in the output file. your sort algorithm will be a hybrid of the mergesort algorithm and java's built-in sorting algorithm as a two-phase process, as described below.

phase 1: breaking the input file into manageable chunks

in the first phase, the sort algorithm will read in a fixed number of lines from the input file, and store them into an array. it should pick a manageable number that will fit into memory. next, it will sort that array using java's built-in arrays. sort method. then it will store the sorted array in a temporary file called "temp_0_0.txt".

the algorithm will repeatedly read in chunks of lines until it has read in the entire contents of the input file. each time it reads in a chunk of lines from the input file, it stores that chunk in an array, sorts the array, and saves the array in another temporary file ("temp_0_1.txt", "temp_0_2.txt", "temp_0_n. txt"). notice that it is reading in an amount that will fit into memory each time, so that it does not run out of memory.

after this first phase is complete, each of the "temp_0_i. txt" files is in sorted order, and together they contain all of the stuff that was in the input file. the second phase must merge the files together, while making sure that the merged files remain in sorted order.

phase 2: merging the chunks together

remember from class that the mergesort algorithm repeatedly merges two small sorted arrays into a larger sorted array. here, you will do the same, but instead repeatedly merge two small sorted files into a larger sorted file. take care that you only read in one line of each file into memory at a time don't read the entire files into memory, or you will run out of memory!

your sort algorithm should begin phase 2 by merging "temp_0_0.txt" and "temp_0_1.txt", and saving it to a new file called "temp_1_0.txt". next, it will merge "temp_0_2.txt" with "temp_0_3.txt", and save the merged file as "temp_1_1.txt". this will repeat until there are no more "temp_0_i. txt" files left. if there are an odd number of these files, the last one will have nothing to merge with. that's ok, it can be merged in later iterations. just copy it into the next available "temp_1_i. txt".

after merging pairs of the "temp_0_i. txt" files, the sort algorithm needs to begin merging pairs of the "temp_1_i. txt" files. it will begin by merging "temp_1_0.txt" and "temp_1_1.txt", and saving the result to "temp_2_0.txt". then, it will merge "temp_1_2.txt" and "temp_1_3.txt", and save the result to "temp_2_1.txt", and so on.

each time through the set of temporary files, the merging process cuts the number of temporary files roughly in half, because it merges two files into one. (i say "roughly" because there may be an odd number of files, in which case the last file does not get merged with anything.) this phase of the sorting must keep repeating, until there are only two temporary files left. when that happens, it should merge those temporary files, but instead of saving the result to another temporary file, it should save it to the output file. then the sort is finished.

guidelines for testing your program

you may find a reasonably large text file (~7 mb) here, which you may use to test your program. it contains aesop's fables, the complete works of shakespeare, mary shelley's frankenstein, and mark twain's the adventures of huckleberry finn.

the file is certainly small enough to fit into memory i didn't want to you to have to deal with an enormous file. however, you should choose a setting so that your program reads in less than the whole file at a time. at first, try reading in around 1/20 of the lines at a time into memory during the first phase. you should test your program with several different settings.

unit tests

write a junit test to make sure that your merge() function works correctly. be sure to test that it merges arrays that are the same size and two arrays that are not the same size.

answer
Answers: 1

Other questions on the subject: Computers and Technology

image
Computers and Technology, 22.06.2019 06:20, Masielovebug
In what kind of attack can attackers make use of millions of computers under their control in an attack against a single server or network availability confidentiality integrity identity automated attack software? those who wrongfully disclose individually identifiable health information can be fined up to what amount per calendar year? single most expensive malicious attack hipaa what are script kiddies? advanced persistent threat security manager security engineer what level of security access should a computer user have to do their job what process describes using technology as a basis for controlling the access and usage of sensitive data? cybercriminal
Answers: 1
image
Computers and Technology, 23.06.2019 10:50, whyidkmyself
Your friend kayla is starting her own business and asks you whether she should set it up as a p2p network or as a client-server network. list three questions you might ask to kayla decide which network to use and how her answers to those questions would affect your recommendation.
Answers: 2
image
Computers and Technology, 23.06.2019 22:30, delawdermia27
The output voltage of a power supply is assumed to be normally distributed. sixteen observations are taken at random on voltage are as follows: 10.35, 9.30, 10.00, 9.96, 11.65, 12.00, 11.25, 9.58, 11.54, 9.95, 10.28, 8.37, 10.44, 9.25, 9.38, and 10.85
Answers: 1
image
Computers and Technology, 24.06.2019 00:40, ndurairajownkpq
What social factors affect your health
Answers: 3
Do you know the correct answer?
Yates' big file sorter

for this assignment, you will design and implement a sort program...

Questions in other subjects: