Keywords: data mining

Data Mining Between Classical and Modern Applications: A Review

Ammar Thaher Yaseen Abd Alazeez

AL-Rafidain Journal of Computer Sciences and Mathematics, 2021, Volume 15, Issue 2, Pages 171-191
DOI: 10.33899/csmj.2021.170020

Data mining (DM) is a powerful technology with great potential to help organizations focus on the most important information in the data they have collected about the behaviour of their customers and potential customers. It discovers information within the data that queries and reports cannot effectively reveal. In general, DM (also called knowledge discovery) is the process of analysing data from different perspectives and summarizing it into useful information - information that can be used to increase revenue, cut costs, or both. DM software is one of a number of analytical tools for analysing data. It allows users to examine data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, DM is the process of finding correlations or patterns among many fields in large relational databases. The techniques used in the DM process come from a combination of computational disciplines including Artificial Intelligence (AI), Statistics, Machine Learning (ML), and Database (DB) Systems. Apart from the core techniques used to perform the analysis, the DM process can involve various pre-processing steps before the mining method is executed. In addition, a post-processing stage is usually employed to visualize the results of the analysis (e.g. discovered patterns or retrieved information) in an intuitive and easy-to-communicate way. From a broad perspective, there are two major classes of methods: prediction and knowledge discovery. These comprise four sub-groups: a) Classification, Prediction and Regression, b) Clustering, c) Association Rule and Sequence Pattern Mining, and d) Outlier and Anomaly Detection. Moreover, there are some relatively new and exciting areas of data analysis, such as spatial DM and graph DM, that have been made possible through the building blocks of DM techniques.
This survey not only helps researchers to develop sound research topics and identify gaps in the research areas, but also assists practitioners in data mining and Big Data (BD) software framework development.
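As a concrete illustration of the association-rule sub-group named above, the following is a minimal sketch of the two standard rule measures, support and confidence, computed over a toy transaction database (the items and transactions are hypothetical):

```python
# Toy transaction database: each transaction is the set of items
# bought together (hypothetical market-basket data).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "butter"},
]

def support(itemset, transactions):
    """Fraction of transactions containing every item in the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) over the transaction set."""
    joint = set(antecedent) | set(consequent)
    return support(joint, transactions) / support(antecedent, transactions)

print(support({"bread", "milk"}, transactions))       # 0.5 (2 of 4 transactions)
print(confidence({"bread"}, {"milk"}, transactions))  # 2/3 of bread-buyers also buy milk
```

An association-rule miner such as Apriori enumerates itemsets whose support exceeds a user threshold and keeps rules whose confidence is high enough; the two functions above are the measures that drive that search.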

Study the Relationship between the University Student and Teacher using the Principal Component Analysis and Genetic Algorithms

Sahar E. Mahmood

AL-Rafidain Journal of Computer Sciences and Mathematics, 2021, Volume 15, Issue 1, Pages 75-100
DOI: 10.33899/csmj.2021.168262

Multivariate data analysis is one of the popular techniques, and among its methods is Principal Component Analysis (PCA), a dimensionality-reduction method that converts a large number of related variables into a smaller number of unrelated factors while retaining most of the information in the large set. Any phenomenon that consists of a large group of variables is difficult to treat in its initial form, and interpreting these variables becomes a complex process, so reducing them to a smaller set that is easier to deal with is the aspiration of every researcher working in the field of principal component analysis. In this research, multivariate data were collected on the nature of education and the relationship between the university student and the teacher, then studied and analysed with a principal component analysis model, a technique used to summarize and condense data, using the statistical software SPSS (2020).
Thus, this research falls under the concept of Data Mining. It is then realized using a genetic algorithm procedure implemented in MATLAB 2019b, applying genetic algorithms together with the multiple linear regression equation method.
A multiple linear regression procedure is used to find the arrangement of the independent variables within each of the obtained factors by calculating the weight (Beta) of each independent variable. Overall results were obtained for the eigenvalues of the correlation matrix, and the study applied the statistical (PCA) method to reduce the number of variables without losing much information about the original variables. The goal is to simplify their understanding and to reveal their structure and interpretation, in addition to reaching a set of conclusions that are discussed in detail, along with important recommendations.
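The PCA step described above can be sketched as follows, assuming a small synthetic data matrix in place of the actual survey data (the paper's own analysis used SPSS; this illustration uses NumPy and the common eigenvalue-greater-than-one retention rule):

```python
import numpy as np

# Synthetic stand-in for the survey data: rows = respondents,
# columns = observed variables; two variables are made correlated.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=100)

# Standardize, then eigendecompose the correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
R = np.corrcoef(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]          # largest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Kaiser-style rule: retain components whose eigenvalue exceeds 1.
retained = eigvals > 1.0
print("eigenvalues:", np.round(eigvals, 3))
print("components retained:", retained.sum())

# Scores of each respondent on the retained components.
scores = Z @ eigvecs[:, retained]
```

The eigenvalues of a correlation matrix always sum to the number of variables, so each eigenvalue can be read as the share of total variance a component explains - the basis for dropping the weak components while keeping most of the information.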

Comparative Studying for Opinion Mining and Sentiment Analysis Algorithms and Applications

Rana Z. Alobaidy; Ghaydaa A.A. Al-Talib

AL-Rafidain Journal of Computer Sciences and Mathematics, 2018, Volume 12, Issue 2, Pages 13-23
DOI: 10.33899/csmj.2018.163578

The amount of available data is increasing beyond our ability to analyze and understand it. The internet revolution has added billions of customer reviews to its repositories, which has generated interest in sentiment analysis and opinion mining in recent years. People have to depend on machines to classify and process the data, as there are terabytes of review data in stock for a single product. Predicting customer sentiment from reviews is therefore very important: it not only helps increase profits but also goes a long way toward improving and bringing out better products. In this paper, we present a survey of the currently available techniques and applications in the field of opinion mining, such as economy, security, marketing, spam detection, decision making, and election prediction.
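A minimal lexicon-based sketch of the sentiment classification task the survey covers (the word lists here are hypothetical toy lexicons; the surveyed systems use much larger lexicons or learned classifiers):

```python
# Toy polarity lexicons (hypothetical; real lexicons contain thousands
# of scored words and handle negation, intensifiers, etc.).
POSITIVE = {"good", "great", "excellent", "love", "best"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "worst"}

def classify(review: str) -> str:
    """Label a review by counting positive vs. negative lexicon hits."""
    words = review.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(classify("great product, best purchase"))   # positive
print(classify("terrible battery, poor build"))   # negative
```

Even this crude counter shows why machine classification scales where manual reading cannot: the same function runs unchanged over millions of reviews.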

Hiding Sensitive Frequent Itemsets over Privacy Preserving Distributed Data

Alaa Jumaa; Sufyan T. F. Al-Janabi; Nazar A. Ali

AL-Rafidain Journal of Computer Sciences and Mathematics, 2013, Volume 10, Issue 1, Pages 91-105
DOI: 10.33899/csmj.2013.163427

Data mining is the process of extracting hidden patterns from data. One of the most important activities in data mining is association rule mining, and a new direction for the data mining research area is the privacy of mining. Privacy-preserving data mining is a new research trend concerned with data privacy in data mining and statistical databases. Data mining can be applied on centralized or distributed databases. Most efficient approaches for mining distributed databases suppose that all of the data at each site can be shared. Privacy concerns may prevent the sites from directly sharing the data, and even some types of information about the data. Privacy Preserving Data Mining (PPDM) has become increasingly popular because it allows sharing of privacy-sensitive data for analysis purposes.
In this paper, the problem of privacy-preserving association rule mining in a horizontally distributed database is addressed by proposing a system to compute global frequent itemsets or association rules from different sites without disclosing individual transactions. In addition, a new algorithm is proposed to hide sensitive frequent itemsets or sensitive association rules from the global frequent itemsets by hiding them at each site individually. This is done by modifying the original database at each site in order to decrease the support of each sensitive itemset or association rule. Experimental results show that the proposed algorithm hides rules in a distributed system with good execution time and limited side effects. Also, the proposed system has the capability to calculate the global frequent itemsets from different sites while preserving the privacy of each site.
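The support-lowering idea can be sketched as follows, assuming the simplest possible sanitization strategy (delete one item of the sensitive itemset from supporting transactions until its support count falls below the mining threshold); the paper's actual algorithm works per site over a horizontally distributed database and chooses its modifications to limit side effects:

```python
def support_count(itemset, db):
    """Number of transactions in db that contain every item of itemset."""
    return sum(itemset <= t for t in db)

def hide_itemset(db, sensitive, min_support_count):
    """Return a sanitized copy of db in which `sensitive` is infrequent."""
    db = [set(t) for t in db]            # work on a copy of the database
    victim = next(iter(sensitive))       # item to delete (naive choice;
                                         # real algorithms pick the victim
                                         # to minimize side effects)
    for t in db:
        if support_count(sensitive, db) < min_support_count:
            break                        # already below the threshold
        if sensitive <= t:
            t.discard(victim)            # t no longer supports the itemset
    return db

db = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "d"}, {"b", "c"}]
sanitized = hide_itemset(db, {"a", "b"}, min_support_count=2)
print(support_count({"a", "b"}, sanitized))   # 1, below the threshold of 2
```

After sanitization, a standard frequent-itemset miner run with the same threshold can no longer report the sensitive itemset, which is exactly the hiding effect the paper measures.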

Evaluation of Clustering Validity

Rudhwan Yousif Sideek; Ghaydaa A.A. Al-Talib

AL-Rafidain Journal of Computer Sciences and Mathematics, 2008, Volume 5, Issue 2, Pages 79-97
DOI: 10.33899/csmj.2008.163987

Clustering is a mostly unsupervised procedure, and the majority of clustering algorithms depend on certain assumptions in order to define the subgroups present in a data set. As a consequence, in most applications the resulting clustering scheme requires some sort of evaluation of its validity.
In this paper, we present a clustering validity procedure which evaluates the results of clustering algorithms on data sets. We define two validity indexes, S_Dbw and SD, based on well-defined clustering criteria, enabling the selection of the optimal input parameter values for a clustering algorithm that result in the best partitioning of a data set.
We evaluate the reliability of our indexes experimentally, applying the K-Means clustering algorithm to real data sets.
Our approach performs favorably in finding the correct number of clusters fitting a data set.
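The idea behind such indexes can be sketched with a simplified stand-in for the SD and S_Dbw formulas (this ratio is illustrative, not the exact published definitions): average intra-cluster scatter divided by the smallest inter-centroid distance, so lower values indicate compact, well-separated partitions:

```python
import numpy as np

def validity_index(X, labels):
    """Simplified scatter/separation ratio: lower is a better partitioning."""
    ks = np.unique(labels)
    centers = np.array([X[labels == k].mean(axis=0) for k in ks])
    scatter = np.mean([np.linalg.norm(X[labels == k] - c, axis=1).mean()
                       for k, c in zip(ks, centers)])
    separation = min(np.linalg.norm(a - b)
                     for i, a in enumerate(centers) for b in centers[i + 1:])
    return scatter / separation

# Two tight, well-separated groups (toy data).
X = np.array([[0, 0], [0, 1], [1, 0], [10, 0], [10, 1], [11, 0]], float)
good = np.array([0, 0, 0, 1, 1, 1])   # partition matching the true groups
bad = np.array([0, 1, 0, 1, 0, 1])    # partition that mixes the groups

print(validity_index(X, good))   # small: compact, well-separated clusters
print(validity_index(X, bad))    # much larger: the clusters overlap
```

In the validity procedure described above, such an index would be computed for each candidate parameter setting (e.g. each number of clusters given to K-Means), and the setting minimizing the index would be selected as the best partitioning.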