Java Collections Framework and threads project
In this project, you will be working with file input and output, classes in the Java Collections Framework, and threads. You are expected to investigate the Java library and use the classes and methods in the Collections library as much as possible. You will also need to look into the File class to see how to work with folders and files. I would encourage you to use the Scanner class to read in data from the files, as it is very easy to use. However, a BufferedReader combined with a StringTokenizer will usually give you better runtimes for very large files.
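As a rough sketch of the File, BufferedReader and StringTokenizer approach, the code below lists the .txt files in a folder in name order and reads each file's whitespace-delimited words. The class and method names (WordReaderSketch, txtFiles, readWords) are hypothetical. Note that StringTokenizer's default delimiters (space, tab, newline, carriage return and form feed) are a slightly narrower set than Scanner's default whitespace pattern, so results could differ on unusual whitespace characters.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

class WordReaderSketch {

    // Returns the .txt files in a folder, sorted by pathname.
    static File[] txtFiles(File folder) {
        File[] files = folder.listFiles((dir, name) -> name.endsWith(".txt"));
        if (files == null) {
            return new File[0]; // not a directory, or the listing failed
        }
        Arrays.sort(files); // File implements Comparable (pathname order)
        return files;
    }

    // Reads a file line by line and splits each line into words.
    static List<String> readWords(File file) throws IOException {
        List<String> words = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new FileReader(file))) {
            String line;
            while ((line = in.readLine()) != null) {
                // Default delimiters: " \t\n\r\f"
                StringTokenizer st = new StringTokenizer(line);
                while (st.hasMoreTokens()) {
                    words.add(st.nextToken());
                }
            }
        }
        return words;
    }
}
```

Punctuation stays attached to the tokens (for example, cat: remains a separate word from cat), which matches the rule that symbols are not filtered out.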
In short, you are creating a word index in a few different ways. An index helps you find information in your files faster. An index can also be very useful when comparing how similar two documents are to one another. One way of determining how similar two documents are is to compare the number of uncommon words they share. This might be useful in a recommender type system. For example, if I really like a book that often contains the words California, surfing, sunshine and beach, then odds are good I will like another book that contains a similar number of those keywords.
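The similarity idea can be illustrated with plain set operations. This is a hypothetical sketch that is not part of the required project: the set of common words to ignore is an assumption supplied by the caller, words are split on whitespace, and comparison is case insensitive.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class SharedWordsSketch {

    // Counts distinct words two documents share, ignoring caller-supplied common words
    // (assumed to be given in lower case).
    static int sharedUncommonWords(String docA, String docB, Set<String> commonWords) {
        Set<String> shared = distinctWords(docA);
        shared.retainAll(distinctWords(docB)); // set intersection
        shared.removeAll(commonWords);          // drop words like "the" and "and"
        return shared.size();
    }

    // Lower-cased distinct words of a text, split on runs of whitespace.
    static Set<String> distinctWords(String text) {
        Set<String> words = new HashSet<>();
        for (String w : text.trim().split("\\s+")) {
            if (!w.isEmpty()) {
                words.add(w.toLowerCase(Locale.ROOT));
            }
        }
        return words;
    }
}
```

For example, "The beach and the sunshine" and "Sunshine on the BEACH" share 2 uncommon words (beach and sunshine) when the, and and on are treated as common.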
You must implement the following steps:
- You will write a program that goes through a text file and creates a word index of every word in the file. The index will be the “page” that a particular word is found on. Since a text file only contains text and does not contain any metadata, “pages” will be created depending on the number of characters read in so far (not including delimiters). The number of characters that define a page will be specified at runtime.
- The user will specify 3 command-line arguments. The first argument is the folder where all of the text files are saved, the second argument is the output folder where the output file(s) will be stored, and the third argument is the number of characters that represent a page.
- Assuming the Java file is called Index.java, the following command:
java Index myFolder outputFolder 100
would indicate that all input files are stored in a folder called myFolder, located in the same folder as your Java class file. Every file you need to process has a .txt extension, and only *.txt files will be in the input folder. The second argument, outputFolder, is the folder you will store your output files in. You can assume this empty output folder has already been created.
The third argument is the number of characters on a page; call that number K. Each page holds up to, but not more than, K actual characters. For instance, if K is 100, you have read in 98 actual characters so far and the next word is badger, then badger becomes the first word on the next page (do not split a word across two pages). To make this a bit easier, delimiter characters do not count toward the number of characters on a page. You can assume that no single word will be longer than the number of characters on a page; in this example, no word will be more than 100 characters.
- Create a word index of each file and store that index into an output file. If your input file is called a.txt, then your output file must be called a_output.txt and your output will be stored in the output folder that was specified. You will create an output file for each input file. Your word index should be created in the following way:
- Read in all words. A word is any consecutive sequence of letters, numbers, apostrophes, special symbols, etc. The only things that delimit words are spaces, tabs and newlines (i.e., whitespace). In my sample files, I am using the default delimiters specified by the Scanner class. If you use something different, you may get different results. You should store the words in one of the following Collections: TreeSet, HashSet, TreeMap or HashMap. If you don’t, your code will likely run very slowly. All words are case insensitive. For example, the words cat, Cat, CAT and cAT are all considered the same word. Punctuation and other symbols are not to be filtered out. Therefore, cat: and cat are two separate words (note the colon after the first cat).
- Along with reading in all of the words, remember which “page” the word was on. We will start counting from page 1.
- For each file, after you have read in all of the words, write out your word index to that file's output file. Write out each word that appears in the file, and for each word, also write the page(s) that word appears on. Words must be written in alphabetical order, 1 word per line. If the word cat appears on pages 4, 10 and 16, your output line should be formatted as follows: cat 4, 10, 16
In other words, it is the word, followed by a space, followed by the page(s) that word appeared on where each page is separated by a comma (see sample input/output). Page numbers must appear in ascending order. Words that appear multiple times on the same page should not show up multiple times in the final output. For example, if the word cat appears on page 4 a total of 3 times, page 4 should only show up once in the output.
- You must solve this project in 3 different ways:
- 4a (10 pts): The first way is without using threads. You must call the file that does not use threads Index.java. You must time your code to determine how long it takes to solve the problem without threads. Your program will create the appropriate files and then print out 1 thing to the terminal window: the amount of time it took to execute in milliseconds.
- 4b (20 pts): Once you have your solution to 4a, modify it so that it uses threads in some fashion. The most natural way to use threads is to create a new thread for each file you read in. If you are testing this on a machine with multiple cores, you should notice a significant decrease in time (assuming you are using large enough files). You must call your file that uses threads IndexRunner.java. This does not mean you are limited to 1 file: your main method must be in IndexRunner.java, but you can create as many files as you want. Your program will create the appropriate files and then print out 1 thing to the terminal window: the amount of time it took to execute in milliseconds.
- 4c (20 pts): Here you are creating a global word index. This differs from 4b in that you need to send your results back to the master thread so that the master thread can produce the output. Only the master thread is allowed to produce any output for this part. You must name this master file GlobalRunner.java. Your main method must be in GlobalRunner.java, but you can create as many Java files as you want. Create a word index for each file and store all of the indexes in a single output file called output.txt in the specified output folder. The output will be a comma-separated file. The first line of your output file must be this heading: Word, first.txt, second.txt, third.txt, fourth.txt, etc. In other words, the first entry should be Word, followed by the names of the input files in alphabetical order.
The per-file indexes are combined into this single file. Words are case insensitive and appear in alphabetical order. For example, assume the word cat is in a.txt on pages 2, 4, 6, in b.txt on pages 3, 5, 7 and in c.txt on pages 2, 5, 7. Your output line for cat would be: cat, 2:4:6, 3:5:7, 2:5:7 Since commas are used to delimit the files, colons are used to delimit page numbers. If the word dog is in a.txt on pages 2 and 5, is not in b.txt, but is in c.txt on page 8, then your output line for dog would be: dog, 2:5, , 8
You are welcome to use whatever callback or synchronization solution you want but you have to make sure that only the master thread creates the output. No worker thread is allowed to print anything. Your program will create the appropriate output file and then print out 1 thing to the terminal window: the amount of time it took to execute in milliseconds.
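One way to meet the 4c rule that only the master thread produces output is sketched below. It is a minimal sketch under stated assumptions: the names GlobalIndexSketch, indexText and globalIndex are hypothetical, file contents are passed in as strings rather than read from disk, and words are split with StringTokenizer's default delimiters rather than Scanner's. Each file is indexed by its own pool thread, the master collects the results with Future.get(), and only the master builds the output lines.

```java
import java.util.*;
import java.util.concurrent.*;

class GlobalIndexSketch {

    // Worker task: word -> sorted pages for one document. A word that would push
    // the page past k non-whitespace characters starts the next page.
    static TreeMap<String, TreeSet<Integer>> indexText(String text, int k) {
        TreeMap<String, TreeSet<Integer>> index = new TreeMap<>();
        StringTokenizer st = new StringTokenizer(text);
        int page = 1;
        int used = 0;
        while (st.hasMoreTokens()) {
            String word = st.nextToken();
            if (used + word.length() > k) {
                page++;
                used = 0;
            }
            used += word.length();
            index.computeIfAbsent(word.toLowerCase(Locale.ROOT), w -> new TreeSet<>()).add(page);
        }
        return index;
    }

    // Master thread: one task per file, then merge and format everything here.
    static List<String> globalIndex(SortedMap<String, String> files, int k)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, files.size()));
        try {
            List<String> names = new ArrayList<>(files.keySet()); // sorted file names
            List<Future<TreeMap<String, TreeSet<Integer>>>> futures = new ArrayList<>();
            for (String name : names) {
                String text = files.get(name);
                futures.add(pool.submit(() -> indexText(text, k)));
            }
            // word -> one "p:q:r" entry per file (null when the word is absent)
            TreeMap<String, String[]> merged = new TreeMap<>();
            for (int i = 0; i < futures.size(); i++) {
                for (Map.Entry<String, TreeSet<Integer>> e : futures.get(i).get().entrySet()) {
                    StringJoiner pages = new StringJoiner(":");
                    for (int p : e.getValue()) {
                        pages.add(Integer.toString(p));
                    }
                    merged.computeIfAbsent(e.getKey(), w -> new String[names.size()])[i] = pages.toString();
                }
            }
            List<String> lines = new ArrayList<>();
            lines.add("Word, " + String.join(", ", names));
            for (Map.Entry<String, String[]> e : merged.entrySet()) {
                StringBuilder sb = new StringBuilder(e.getKey());
                for (String pages : e.getValue()) {
                    sb.append(", ").append(pages == null ? "" : pages);
                }
                lines.add(sb.toString());
            }
            return lines;
        } finally {
            pool.shutdown();
        }
    }
}
```

The master would then write the returned lines to output.txt; a missing word produces an empty field, as in dog, 2:5, , 8. Elapsed time for the terminal output can be measured with System.currentTimeMillis() before and after the work. Running the same per-file task in separate threads, with each thread writing its own a_output.txt, is one possible approach to 4b.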
A few things to be aware of:
- Make sure you are doing this on an EC2 Ubuntu server. Solving this problem on a Windows machine or some other OS may give you answers that are not consistent with my answers.
- When your code is tested, we will only have your code, an arbitrary input folder and an arbitrary output folder created. Do not hardcode specific paths.
- You must choose appropriate data structures (e.g., TreeMap, HashSet) for each problem. If you don’t, your code may take a long time to run. My solutions take a few seconds to run on an EC2 instance. For each problem, using the input files moby.txt, wap.txt and huckFinn.txt, your code must complete in under 30 seconds or it will be considered incorrect.
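The data-structure choice for a single file (4a) can be sketched as follows. This assumes the words have already been tokenized (for example, with Scanner); PageIndexSketch, buildIndex and formatLines are hypothetical names. A TreeMap keeps the words in alphabetical order and a TreeSet per word drops repeated pages while keeping them ascending, so no separate sort is needed before writing.

```java
import java.util.*;

class PageIndexSketch {

    // word (lower-cased) -> sorted distinct pages. A page holds at most k
    // non-whitespace characters; a word that would overflow it starts the next page.
    static TreeMap<String, TreeSet<Integer>> buildIndex(List<String> words, int k) {
        TreeMap<String, TreeSet<Integer>> index = new TreeMap<>(); // keys stay sorted
        int page = 1;
        int used = 0; // characters already on the current page
        for (String raw : words) {
            if (used + raw.length() > k) {
                page++;
                used = 0;
            }
            used += raw.length();
            // TreeSet ignores a page already present and keeps pages ascending
            index.computeIfAbsent(raw.toLowerCase(Locale.ROOT), w -> new TreeSet<>()).add(page);
        }
        return index;
    }

    // One output line per word, e.g. "cat 4, 10, 16".
    static List<String> formatLines(TreeMap<String, TreeSet<Integer>> index) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, TreeSet<Integer>> e : index.entrySet()) {
            StringJoiner pages = new StringJoiner(", ");
            for (int p : e.getValue()) {
                pages.add(Integer.toString(p));
            }
            lines.add(e.getKey() + " " + pages);
        }
        return lines;
    }
}
```

For example, with K = 6, the words Cat dog CAT badger cat produce the three lines "badger 3", "cat 1, 2, 4" and "dog 1".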
Sample Input/Output
Sample input and output files are available at the following locations (see the README files for how to run them):
For problems 4a and 4b:
http://www.uwosh.edu/faculty_staff/krohne/ds730/JavaProj.zip
For problem 4c:
http://www.uwosh.edu/faculty_staff/krohne/ds730/MoreJavaProj.zip
Producing identical output to the posted output does not necessarily mean everything is correct. For example, if you ignore the multithreading requirements of 4b or ignore the printing requirements of 4c, your program may produce the correct output but still be incorrect.
What to Submit
Submit all versions of your solution (i.e., steps 4a, 4b and 4c) as a single zipped file called p4.zip that contains your Java files, and upload it to the dropbox.