I have an idea for a small research paper, and I wonder whether you or one of your research group members would be interested in helping with and supporting this idea at any of the research centers that you collaborate with.


The idea is basically about hash functions and file indexing for search engines, AI, media recognition, etc.


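Hash functions such as SHA and MD5 give you the hash of a file as it was created, that is, of its stored bytes. For example, if you have files with identical content that were created at different times or saved in different formats, the existing hash functions will give you two different hashes, because the stored bytes (for example, embedded metadata) differ.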


The idea is to read the bytes according to each file format, extract the content, and compute the hash of that content. If we have two different files with the same content, we will end up with the same hash.
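
To make the idea concrete, here is a minimal sketch in Python. It assumes the text has already been extracted from each file (the extraction step for PDF, DOCX, or images is not shown), and the normalization rules and the names normalize_text and content_hash are only illustrative assumptions, not a fixed design.

```python
import hashlib
import unicodedata


def normalize_text(text: str) -> str:
    """Reduce extracted text to a canonical form (hypothetical rules)."""
    # NFKC folds compatibility characters (e.g. ligatures) into plain forms.
    text = unicodedata.normalize("NFKC", text)
    # Collapse runs of whitespace so line wrapping does not matter,
    # and casefold so letter case does not matter.
    return " ".join(text.split()).casefold()


def content_hash(text: str) -> str:
    """Hash the normalized content instead of the container file's bytes."""
    return hashlib.sha256(normalize_text(text).encode("utf-8")).hexdigest()


# Stand-ins for text extracted from a PDF and a DOCX of the same paragraph.
from_pdf = "I want this paragraph\nto be found."
from_docx = "I  want this paragraph to be  found."
same = content_hash(from_pdf) == content_hash(from_docx)
```

Here the two extracted strings differ in spacing and line breaks, yet they hash to the same value because only the normalized content is hashed. SHA-256 is used only as an example; any hash function could take its place.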


This will make it easier to compare and contrast files during searches. It is basically like adding a special case to the search to enhance accuracy. It is similar to the different implementations of the quicksort algorithm: if the collection has fewer items than a threshold, a non-recursive algorithm such as insertion, selection, or bubble sort is used. Even though newer implementations use a dual-pivot algorithm, and merge sort as an improvement for parallelism, the non-recursive algorithm is still preferable for collections with few items.
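
The quicksort analogy can be sketched as follows. This is only an illustration of the special-case pattern, not any particular library's implementation; the threshold value CUTOFF and the function names are hypothetical, and a single middle pivot is used instead of the dual-pivot scheme for brevity.

```python
import random

CUTOFF = 16  # hypothetical threshold; real libraries tune this value


def insertion_sort(items, lo, hi):
    """Sort items[lo..hi] in place without recursion."""
    for i in range(lo + 1, hi + 1):
        key = items[i]
        j = i - 1
        while j >= lo and items[j] > key:
            items[j + 1] = items[j]
            j -= 1
        items[j + 1] = key


def hybrid_quicksort(items, lo=0, hi=None):
    """Quicksort that hands small ranges to insertion sort."""
    if hi is None:
        hi = len(items) - 1
    # Special case: few items, so use the simple non-recursive algorithm.
    if hi - lo + 1 <= CUTOFF:
        insertion_sort(items, lo, hi)
        return
    # General case: partition around the middle element, then recurse.
    pivot = items[(lo + hi) // 2]
    i, j = lo, hi
    while i <= j:
        while items[i] < pivot:
            i += 1
        while items[j] > pivot:
            j -= 1
        if i <= j:
            items[i], items[j] = items[j], items[i]
            i += 1
            j -= 1
    hybrid_quicksort(items, lo, j)
    hybrid_quicksort(items, i, hi)


sample = [random.randint(0, 999) for _ in range(200)]
hybrid_quicksort(sample)
```

The proposed search works in the same spirit: the cheap special case (an exact hash match) is tried first, and the general method is used only when it does not apply.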

I want this paragraph to be found in cyberspace no matter which application it was written with or which file format it has (PDF, DOCX, JPG, etc.).


We need to extract the paragraph from
http://malkahtani.com/par.psd http://malkahtani.com/par.jpg http://malkahtani.com/par.docx http://malkahtani.com/par.pdf


If we hash the files, we get:

c59eec56a7811901bedada3529527b27, E:\par.docx


68a438f0cb16b28d3393e213560ee899, E:\par.jpg


c4b3c418ff0ec35ab4a5c40ec2a664b9, E:\par.pdf


ab7a1755b7813ef61a18de33a9d39fe0, E:\par.psd


Thus, we need to extract the paragraph or data and hash it. Then, when you search for the same file, we hash its content and look up that hash first. With a hash map data structure, this lookup takes O(1) time on average.
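
The lookup step could look like the sketch below. It assumes the content has already been extracted from each file; the class ContentIndex, its methods, and the stand-in paragraph are hypothetical names chosen for illustration, and the Python dict plays the role of the hash map.

```python
import hashlib
from collections import defaultdict


def content_hash(text: str) -> str:
    """Hash whitespace-normalized, casefolded content (assumed rules)."""
    canonical = " ".join(text.split()).casefold()
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ContentIndex:
    """Maps a content hash to every location holding that content."""

    def __init__(self):
        self._by_hash = defaultdict(list)

    def add(self, location: str, extracted_text: str) -> None:
        self._by_hash[content_hash(extracted_text)].append(location)

    def find(self, query_text: str) -> list:
        # A dict lookup is O(1) on average, independent of index size.
        return list(self._by_hash.get(content_hash(query_text), []))


index = ContentIndex()
paragraph = "I want this paragraph to be found."  # stand-in extracted text
for name in ("par.docx", "par.pdf", "par.jpg", "par.psd"):
    index.add(name, paragraph)

matches = index.find("I want this  paragraph\nto be found.")
```

With this index, all four files are returned by a single lookup, even though their file-level hashes are all different.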


To get an idea of what I mean, check this photo ME or this one ME.


One of them is the same as the one in my LinkedIn profile: http://linkedin.com/in/mohammad-alkahtani-856973185


I uploaded the photo, but it was reformatted to conform to LinkedIn's style. When you search for the same photo with a reverse image search, you will get many images. If we used the hash for both images, however, you would get the exact image, and the search engine would then run AI (artificial intelligence) and ML (machine learning) to search for other images that have similarities.




Sincerely,
Mohammad Alkahtani