Association Rules Mining and Statistic Test over Multiple Datasets on TCM Drug Pairs

Shang E; Duan J; Fan X; Tang Y; Ye L

doi:10.4172/2090-4924.1000126

Abstract

Objective: A traditional Chinese medicine (TCM) drug pair consists of exactly two drugs and is the smallest drug group that follows specific drug compatibility regulations. Formula compatibility regulations are among the most important problems in TCM clinical practice and modern research, but they remain largely unresolved. TCM drug pairs are therefore well suited to uncovering the complicated formula compatibility regulations. This paper applied association rules mining to study the structural characteristics of TCM drug pairs and to find special relationships between the drugs. The study may help research on formula compatibility regulations.

Methods: We present an enhanced association rules mining method to find property associations between the two drugs in TCM drug pairs, and introduce a binomial statistical test to assess the statistical significance of the mined rules. Property data from 625 drug pairs containing 347 drugs were collected and analyzed. Because most association rules mining runs on a single database only, the new method was proposed to find rules over multiple databases (two in this paper, one for each drug in a TCM drug pair), building on an initial Apriori mining pass. The statistical test was then applied to filter out insignificant rules.

Results: Both the Apriori algorithm and the new method were applied to mine association rules on TCM drug pairs for comparison. The rules found by the Apriori method showed falsely high support, part of which came from property associations within one drug rather than between the two drugs of a pair. The Apriori method also could not find associations of a replicated property, such as liver - liver rules. The new method obtained only the associations between drugs, including replicated-property rules. Some associations were mined with high support and significance.

Conclusion: This paper proposed an enhanced method to perform association rules mining over multiple databases. In comparison with the Apriori algorithm, the new method obtained only those associations in which each item came from a different database. The method was confirmed to be well suited to mining over multiple databases. The statistical test was also necessary to exclude false association rules.
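The idea of mining rules whose two items come from different datasets, and then filtering them with a binomial test, can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the function names, the toy property data, and the thresholds are all hypothetical, and the null probability used in the test (the product of the two marginal property frequencies, i.e. independence between the two drugs) is an assumption, since the abstract does not state how the test is parameterized.

```python
from collections import Counter
from itertools import product
from math import comb


def binomial_upper_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def mine_cross_rules(pairs, min_support=0.2, alpha=0.05):
    """Find rules a -> b where a is a property of the first drug and b a
    property of the second drug of the same pair (hypothetical helper).

    pairs: list of (set_of_properties_drug_1, set_of_properties_drug_2).
    """
    n = len(pairs)
    left, right, joint = Counter(), Counter(), Counter()
    for props_a, props_b in pairs:
        left.update(props_a)    # frequency of each property in dataset 1
        right.update(props_b)   # frequency of each property in dataset 2
        # Only cross-dataset combinations are counted, so associations
        # inside a single drug never contribute to support, and a
        # replicated property such as liver -> liver is a valid rule.
        joint.update(product(props_a, props_b))

    rules = []
    for (a, b), k in joint.items():
        support = k / n
        if support < min_support:
            continue
        # Assumed null hypothesis: the two properties occur independently.
        p0 = (left[a] / n) * (right[b] / n)
        p_value = binomial_upper_tail(k, n, p0)
        if p_value <= alpha:
            rules.append({
                "lhs": a,
                "rhs": b,
                "support": support,
                "confidence": k / left[a],
                "p_value": p_value,
            })
    return sorted(rules, key=lambda r: r["p_value"])


# Invented toy data for illustration only (not taken from the paper).
toy_pairs = [
    ({"liver", "warm"}, {"liver", "cold"}),
    ({"liver"}, {"liver"}),
    ({"liver", "warm"}, {"liver"}),
    ({"spleen"}, {"cold"}),
    ({"spleen"}, {"lung"}),
    ({"spleen", "warm"}, {"cold"}),
]

if __name__ == "__main__":
    # alpha=1.0 disables the significance filter for this tiny sample.
    for rule in mine_cross_rules(toy_pairs, min_support=0.3, alpha=1.0):
        print(rule)
```

With only six toy pairs no rule is likely to pass a strict significance threshold, which is why the usage lines relax `alpha`; on a dataset of realistic size the filter would be applied at a conventional level. Keeping the two property sets separate, rather than merging them into one transaction, is what lets a same-property rule be counted at all.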

International Journal of Biomedical Data Mining
Open Access
