Dorota Bielińska-Wąż, Piotr Wąż* and Damian Panas
The aim of this study is to show that graphical bioinformatics methods are effective tools for describing the genome sequences of viruses. A new approach to the identification of unknown virus strains is proposed.
Methods: Biological sequences are represented graphically using the 2D and 3D-Dynamic Representations of DNA/RNA Sequences, theoretical methods that we developed earlier. These approaches bring concepts from classical dynamics into bioinformatics. Each sequence is represented by a set of material points in 2D or 3D space, and the spatial distribution of these points is characteristic of the sequence. The numerical parameters (descriptors) that characterize a sequence correspond to quantities typical of classical dynamics.
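As an illustration of the idea described in the Methods, the following minimal sketch builds a 2D set of material points from a sequence and computes two dynamics-style descriptors: the center of mass and the principal moments of inertia. It rests on assumptions not stated in this abstract: the base-to-step assignment in `STEPS`, the rule that each visit adds a unit mass to a point, and the choice of descriptors. The names `STEPS`, `dynamic_graph_2d` and `descriptors` are hypothetical, and the sketch is not the authors' implementation.

```python
import math
from collections import Counter

# ASSUMPTION: illustrative base-to-step assignment, not taken from this abstract.
STEPS = {"A": (-1, 0), "G": (1, 0), "C": (0, 1), "T": (0, -1), "U": (0, -1)}


def dynamic_graph_2d(sequence):
    """Walk through the sequence from the origin; each visit adds a unit mass."""
    x = y = 0
    masses = Counter()
    for base in sequence.upper():
        if base not in STEPS:
            continue  # skip ambiguous symbols such as N
        dx, dy = STEPS[base]
        x, y = x + dx, y + dy
        masses[(x, y)] += 1  # ASSUMPTION: repeated visits accumulate mass
    return masses


def descriptors(masses):
    """Return the center of mass and the principal moments of inertia."""
    total = sum(masses.values())
    if total == 0:
        raise ValueError("sequence contains no recognized bases")
    cx = sum(m * x for (x, _), m in masses.items()) / total
    cy = sum(m * y for (_, y), m in masses.items()) / total
    # Planar inertia tensor taken about the center of mass.
    ixx = sum(m * (y - cy) ** 2 for (_, y), m in masses.items())
    iyy = sum(m * (x - cx) ** 2 for (x, _), m in masses.items())
    ixy = -sum(m * (x - cx) * (y - cy) for (x, y), m in masses.items())
    # Eigenvalues of the symmetric 2x2 tensor [[ixx, ixy], [ixy, iyy]].
    mean = (ixx + iyy) / 2
    delta = math.sqrt(((ixx - iyy) / 2) ** 2 + ixy ** 2)
    return {
        "center_of_mass": (cx, cy),
        "principal_moments": (mean - delta, mean + delta),
    }


if __name__ == "__main__":
    graph = dynamic_graph_2d("ATGCGATACG")
    print(descriptors(graph))
```

Descriptors of this kind form a fixed-length numerical vector per sequence, which is the sort of input a classifier such as a random forest can work with.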
Results: Some applications of the theoretical methods have been reviewed briefly. 2D-dynamic graphs representing the complete genome sequences of SARS-CoV-2 are shown.
Conclusion: It is shown that the 3D-Dynamic Representation of DNA/RNA Sequences, coupled with the random forest algorithm, successfully classifies the subtypes of influenza A virus strains.
Keywords: Graphical bioinformatics; 2D and 3D-Dynamic Representations of DNA/RNA Sequences; supervised learning; machine learning; Random Forest; Boruta algorithm.
Department of Radiological Informatics and Statistics, Medical University of Gdańsk, 80-210 Gdańsk; Department of Nuclear Medicine, Medical University of Gdańsk, 80-210 Gdańsk; Department of Radiological Informatics and Statistics, Medical University of Gdańsk, 80-210 Gdańsk