A Metric to Assist in Detecting International Phishing or Ransomware Cyberattacks
Authors: Wayne Patterson, Jeremy Blackstone
Abstract: Over the past decade, the number of cyberattacks such as ransomware, phishing, and other forms of malware has increased significantly, as has the danger to innocent users. The ability to launch such devastating attacks is no longer limited to well-funded, highly structured organizations, including government agencies whose missions may well include cyberattacks. The focus of our study is threats to an individual that come not from such highly organized institutions but from less organized cybercriminal organizations with limited resources. The Internet provides ample opportunities for such criminal organizations to launch cyberattacks at minimal cost. One tool available to such lower-level criminal organizations is Google Translate (GT), which may be needed to launch a cyberattack on a user in a relatively advantaged country such as the United States, United Kingdom, or Canada. It has been observed that many such attacks may originate in a less developed country (LDC) whose local language is not the language common in the target countries, for example English. It is a reasonable assumption that informal cyberattackers may not have a command of English, and that to use English for an online attack they may require a mechanism such as the no-cost GT. In previous work, a number of authors have attempted to develop an index to measure the efficiency of what might be called an ABA translation. This involves beginning with a test document in language A, using GT to translate it into language B, and then translating it back again to A. The original text is then compared with the resulting back-translation, using a modified Levenshtein distance computation on the two A versions. This paper analyzes the process of determining an index to detect whether a text has been translated from an original language and location, assuming the attack document was written in one language and translated using GT into the language of the person attacked.
The steps involved in this analysis include:
a) Consistency: to determine consistency in the use of the ABA/GT process, the primary test selection is compared with random samples from the test media;
b) Expanded selection of languages for translation: prior work has established use of the technique for 12 language pairs; the current work extends the analysis to a wider set of languages, including those reported as having the highest levels of cyberattacks;
c) Back translation of selected languages: used to extend the assessment of the quality of the translations made;
d) New language pairs: by analyzing the countries, and the indigenous languages of those countries, with the highest levels of cyberattack and the highest levels of cyberdefense, additional language pairs are added to this analysis;
e) Comparison to prior results: results found in this paper are used for a proposed network for all language pairs considered in this analysis.
The end product is a metric giving a probability of determining the original source language of the cyberattack, as compared with the translation into the victim's language, with the expectation that this will increase the likelihood of identifying the attackers.
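As a rough illustration of the ABA comparison described in the abstract, the following minimal sketch computes a length-normalized edit distance between a text and its round-trip translation. It uses the classic (unmodified) Levenshtein distance, not the authors' modified computation, which is not specified here. The `translate` callable, the `aba_score` name, and the normalization by the longer string are assumptions made for illustration; no real Google Translate API is called.

```python
from typing import Callable

# Hypothetical translator signature: (text, source_lang, target_lang) -> text.
Translator = Callable[[str, str, str], str]


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: insertions, deletions, substitutions cost 1 each."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def aba_score(text: str, translate: Translator, lang_a: str, lang_b: str) -> float:
    """Translate A -> B -> A and return a distance in [0, 1].

    0.0 means the round trip reproduced the text exactly. Dividing by the
    longer string length is an assumed normalization, not the paper's.
    """
    forward = translate(text, lang_a, lang_b)
    back = translate(forward, lang_b, lang_a)
    longest = max(len(text), len(back))
    if longest == 0:
        return 0.0
    return levenshtein(text, back) / longest
```

In use, `translate` would wrap whatever translation service is available; passing it in as a parameter keeps the distance computation independent of any particular service and makes the sketch testable with a stub.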
Keywords: Cyberattack, linguistic analysis, Levenshtein distance