Classification of Thai Messages on Online Media into Positive and Negative Polarity Using a Rude-Word Corpus

Natasiri Chowprasith

Please use this identifier to cite or link to this item: https://rsuir-library.rsu.ac.th/handle/123456789/1055

Title:	การจำแนกประเภทข้อความภาษาไทยบนสื่อออนไลน์ในเชิงบวกและลบด้วยคลังคำหยาบ
Other Titles:	Thai Message Classification for Online Media in Positive and Negative Polarity Using Rude Words Corpus
Authors:	Natasiri Chowprasith (ณัฐาศิริ เชาว์ประสิทธิ์)
Advisor:	สมชาย เล็กเจริญ
Keywords:	Online media; Thai language -- Rude words; Thai language -- Usage -- Social aspects
Issue Date:	2017 (B.E. 2560)
Publisher:	Rangsit University
Abstract:	The use of Thai rude words on online media is a sensitive matter that affects emotions and feelings. It can damage the reputation of people who see the messages or who are accused in them, to the point of numerous lawsuits, and in very severe cases it may lead to murder or suicide. This is a major problem for website administrators. This research aims to study the classification of Thai messages into positive and negative polarity using a rude-word corpus, by developing a model that extracts (detects) rude words and by refining the rude-word corpus so that detection achieves the highest precision. Keyword frequencies were computed with the TFICF (Term Frequency Inverse Class Frequency) technique, and rude-word extraction models were compared across six classification algorithms: Decision Tree, K-Nearest Neighbors, Naive Bayes, Logistic Regression, Support Vector Machines, and Neural Network. The experiments used messages from discussion boards on online media. The results show that the rude-word extraction (detection) model built by reducing the vocabulary in the rude-word corpus gave the highest precision. Among the models that extract rude words through message classification, Logistic Regression gave the highest accuracy and the lowest mean error, so it can serve as an accurate model for extracting rude words from Thai messages on online media. Its text analysis is also easier to understand than that of the other techniques.
Other Abstract:	The use of Thai rude words on online media is a delicate issue that affects the emotions and feelings of those who see the messages or are accused in them; it can lead to lawsuits and, in severe cases, to suicide. This is one of the most important problems for website administrators. This research aimed to study the classification of Thai messages into positive and negative polarity using a rude-word corpus, by developing a rude-word extraction (detection) model and improving the rude-word corpus to obtain the highest accuracy. Keyword frequencies were determined using the Term Frequency Inverse Class Frequency (TFICF) technique. The rude-word extraction models were compared across six algorithms: Decision Tree, K-Nearest Neighbors (K-NN), Naive Bayes, Logistic Regression, Support Vector Machines, and Neural Network. The models were tested with messages from web boards on online media. The results show that the rude-word extraction (detection) model built by reducing the rude words in the dictionary gave the highest accuracy. Among the message-classification models, Logistic Regression gave the highest accuracy with the lowest average error, so it can serve as an accurate model for extracting Thai rude words from online media. It is also simpler to operate than the other techniques.
Description:	Thesis (M.Sc. (Information Technology)) -- Rangsit University, 2017 (B.E. 2560)
Degree Name:	Master of Science
Degree Level:	Master's degree
Degree Discipline:	Information Technology
URI:	https://rsuir-library.rsu.ac.th/handle/123456789/1055
Type:	Thesis
Appears in Collections:	ICT-IT-M-Thesis
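
The abstract names TFICF (Term Frequency Inverse Class Frequency) as the keyword-weighting step but does not give its formula. The following is a minimal sketch of one common formulation, weight(t, c) = tf(t, c) * log(C / cf(t)), where C is the number of classes and cf(t) is the number of classes in which term t occurs. The exact variant used in the thesis may differ; the function name `tficf`, the toy messages, and the token `rudeword` are hypothetical placeholders, and the input is assumed to be already word-segmented (Thai text needs a separate tokenizer).

```python
import math
from collections import Counter, defaultdict


def tficf(docs):
    """Compute TF-ICF weights from (tokens, label) pairs.

    Returns {label: {term: weight}} with
    weight = tf(term, label) * log(n_classes / cf(term)).
    Assumed formulation; not necessarily the thesis's exact variant.
    """
    tf = defaultdict(Counter)  # term counts aggregated per class
    for tokens, label in docs:
        tf[label].update(tokens)

    n_classes = len(tf)
    cf = Counter()  # number of classes in which each term appears
    for counts in tf.values():
        cf.update(counts.keys())

    return {
        label: {t: n * math.log(n_classes / cf[t]) for t, n in counts.items()}
        for label, counts in tf.items()
    }


# Hypothetical, pre-tokenized toy messages (placeholders, not thesis data)
docs = [
    (["good", "service", "thanks"], "positive"),
    (["thanks", "good"], "positive"),
    (["rudeword", "service"], "negative"),
    (["rudeword", "rudeword", "bad"], "negative"),
]
weights = tficf(docs)
```

In this toy input, a term that occurs in every class (`service`) gets weight zero, while a term confined to one class (`rudeword`) keeps a weight proportional to its frequency, which is the property that makes the measure useful for selecting class-specific keywords.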

Files in This Item:

File	Description	Size	Format
NATASIRI CHOWPRASITH.pdf		5.46 MB	Adobe PDF
