OSDev.org https://forum.osdev.org/ |
The Most Common Subject Words In This Forum https://forum.osdev.org/viewtopic.php?f=6&t=13730 |
Page 1 of 1 |
Author: | Kevin McGuire [ Fri Apr 13, 2007 9:06 pm ] |
Post subject: | The Most Common Subject Words In This Forum |
Over 8900 thread titles were extracted from the entire development forum. Each title is split into words at spaces. Only alphanumeric characters are kept in each word, and all uppercase letters are converted to lowercase. Each word's count starts at zero rather than one, so every count shown to the right of a word is really one higher.
1. to * 718
2. in * 649
3. os * 616
4. and * 614
5. a * 562
6. kernel * 491
7. the * 449
8. with * 442
9. problem * 398
10. help * 391
11. c * 367
12. how * 364
13. memory * 318
14. mode * 301
15. i * 281
16. for * 267
17. question * 236
18. of * 235
19. on * 213
20. bochs * 195
21. my * 191
22. floppy * 188
23. paging * 184
24. driver * 181
25. pmode * 177
26. about * 176
27. is * 175
28. code * 168
29. system * 166
30. grub * 164
31. from * 160
32. what * 159
33. problems * 152
34. an * 141
35. file * 141
36. keyboard * 133
37. not * 130
38. need * 127
39. gcc * 124
40. new * 120
41. do * 118
42. interrupt * 116
43. can * 116
44. questions * 115
45. idt * 113
46. boot * 113
47. error * 112
48. stack * 107
49. multitasking * 107 |
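For anyone who wants to try the same thing, the counting described above could look roughly like this in Python. This is only a sketch, not the tool used for the list: the function names and the sample titles are made up for illustration, and "alphanumerical" is assumed to mean ASCII letters and digits.

```python
from collections import Counter


def normalize(word):
    """Keep only ASCII letters and digits, then lowercase (assumed rule)."""
    return "".join(ch for ch in word if ch.isascii() and ch.isalnum()).lower()


def count_subject_words(titles):
    """Count how often each normalized word appears across all titles."""
    counts = Counter()
    for title in titles:
        for raw in title.split(" "):  # words are broken out by spaces
            word = normalize(raw)
            if word:  # skip pieces that were only punctuation or empty
                counts[word] += 1
    return counts


# Hypothetical sample titles, not taken from the forum.
titles = ["Problem with paging", "Kernel paging problem!", "GRUB and my kernel"]
counts = count_subject_words(titles)

# Print in the same "rank. word * count" layout as the posted list.
for rank, (word, n) in enumerate(counts.most_common(5), start=1):
    print(f"{rank}. {word} * {n}")
```

Note that `Counter` starts counting at one, so if the posted figures really do start from zero, this sketch would report totals one higher than the list above.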
Author: | AndrewAPrice [ Fri Apr 13, 2007 10:07 pm ] |
Post subject: | |
Cool! |
Author: | nick8325 [ Mon Apr 16, 2007 3:34 pm ] |
Post subject: | |
I like that "kernel" is more common than "the" |
Author: | Alboin [ Mon Apr 16, 2007 3:42 pm ] |
Post subject: | |
nick8325 wrote: I like that "kernel" is more common than "the"
Well, at least we know we have our linguistic priorities straight. |
Author: | Kevin McGuire [ Mon Apr 16, 2007 4:26 pm ] |
Post subject: | |
Do you guys have any ideas about what we could do with data extracted from the forums? I got bored and did it, but I figure there could be a useful idea in it somewhere. |
Author: | chase [ Mon Apr 16, 2007 5:08 pm ] |
Post subject: | |
Filter with a list of the most common English words and get a list of the most frequent OS development subjects. Could be used to figure out where wiki articles should be expanded or created. |
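A rough Python sketch of that filtering idea, using the top ten entries from the posted list as input. The stop-word set here is hand-picked for illustration and is an assumption; a real run would want a proper list of common English words.

```python
# Hand-picked stop words (assumption); a fuller English list would be better.
STOP_WORDS = {
    "to", "in", "and", "a", "the", "with", "how", "i", "for", "of", "on",
    "my", "about", "is", "from", "what", "an", "not", "do", "can",
}


def filter_common_words(counts, stop_words=STOP_WORDS):
    """Return (word, count) pairs, highest count first, minus stop words."""
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return [(word, n) for word, n in ranked if word not in stop_words]


# Top ten entries from the posted list.
posted = {
    "to": 718, "in": 649, "os": 616, "and": 614, "a": 562,
    "kernel": 491, "the": 449, "with": 442, "problem": 398, "help": 391,
}

for word, n in filter_common_words(posted):
    print(f"{word} * {n}")
```

With that input only os, kernel, problem, and help survive, which already looks more like a list of topics than a list of English glue words.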
Author: | Kevin McGuire [ Tue Apr 17, 2007 8:00 pm ] |
Post subject: | forumdown |
I will give it a try. It actually seems a little more complicated than you would think at first, but I have confidence that it is possible. I have an initial tool written: a program, forumdown, which will download an entire sub forum and store the linked list structures of threads and posts in a local data file that can be loaded. I did a little thinking. I came to the conclusion that I can use a website that provides a dictionary, thesaurus, and encyclopedia to allow some degree of spell checking and mapping of similar words, such as IDT and Interrupt Descriptor Table, and to allow some sort of primitive comprehension of sentences to get an idea of exactly what people are talking about in the posts. I will try to use this site to provide the English word database, and add a cache to prevent it from taking an excessive amount of time. http://www.reference.com/browse/ http://kmcguire.jouleos.galekus.com/dok ... orum_tools Let's see if I can get the other part working. |
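To make the "IDT and Interrupt Descriptor Table" mapping concrete, here is a minimal Python sketch. It is not how forumdown works: the alias table is hypothetical and written by hand, where the post proposes getting the data from a dictionary site, and `lru_cache` merely stands in for the cache mentioned above.

```python
from functools import lru_cache

# Hypothetical alias table (assumption); the post suggests deriving this
# from an online dictionary or encyclopedia instead.
ALIASES = {
    "interrupt descriptor table": "idt",
    "global descriptor table": "gdt",
    "protected mode": "pmode",
}


@lru_cache(maxsize=None)
def canonical_term(phrase):
    """Stand-in for a slow remote lookup; the cache avoids repeat queries."""
    return ALIASES.get(phrase, phrase)


def canonicalize(title, max_words=3):
    """Lowercase a title and replace known phrases with their short form."""
    words = title.lower().split()
    out = []
    i = 0
    while i < len(words):
        # Try the longest phrase first so multi-word terms win.
        for size in range(min(max_words, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + size])
            term = canonical_term(phrase)
            if term != phrase or size == 1:
                out.append(term)
                i += size
                break
    return out


print(canonicalize("Loading the Interrupt Descriptor Table in protected mode"))
```

Running the titles through something like this before counting would let "Interrupt Descriptor Table" and "IDT" land in the same bucket.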
Author: | Kevin McGuire [ Sat Apr 21, 2007 10:25 pm ] |
Post subject: | |
A sprocket, two gears, and some strange gooey gel came out of my head. I think I was thinking too hard. This might be more than I asked for. I've got to get this kernel finished. |
All times are UTC - 6 hours |