5. Next, it will call the FontMapper module with the regional-language raw text string as its argument.
6. Now, it will store the font-mapped regional-language raw texts in text documents and save them into the directory structure specified by OutputFolderPath.
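Steps 5 and 6 can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the authors' code: `font_mapper` and `save_mapped_text` are hypothetical names, `font_mapper` is an identity placeholder because the FontMapper mapping is not described in this section, and the choice to mirror each page's relative path with a `.txt` extension under OutputFolderPath is an assumption.

```python
import os


def font_mapper(raw_text: str) -> str:
    """Hypothetical placeholder for the FontMapper module.

    The real module converts font-encoded regional text; its mapping is not
    described here, so this sketch returns the text unchanged.
    """
    return raw_text


def save_mapped_text(raw_text: str, relative_path: str, output_folder_path: str) -> str:
    """Font-map raw_text and save it as a text document under output_folder_path.

    Assumption: the document mirrors the web page's relative path, e.g.
    "news/a.html" becomes "<output_folder_path>/news/a.txt".
    Returns the path of the written document.
    """
    # Step 5: call the (placeholder) FontMapper module on the raw text string.
    mapped_text = font_mapper(raw_text)
    # Step 6: build the target path and create any missing folders.
    stem, _ = os.path.splitext(relative_path)
    target = os.path.join(output_folder_path, stem + ".txt")
    os.makedirs(os.path.dirname(target), exist_ok=True)
    with open(target, "w", encoding="utf-8") as out:
        out.write(mapped_text)
    return target
```

Writing with an explicit UTF-8 encoding keeps regional-script characters intact regardless of the platform's default encoding.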
3.3 RecursiveTraverse (FilePath)
This function calls itself recursively whenever it encounters a folder entry; otherwise, it calls the WebPageVerifier module of the algorithm.
Arguments: This function takes the FilePath as its input and returns only Web-pages to its invoker module.
1. Store the list of all files and subfolders in an array.
2. Store the total number of elements in the array.
3. For each element of the array, repeat Steps 4 to 6.
4. Check whether the element is a directory entry; if it is, the function calls itself with the newly encountered directory entry as its argument.
5. If it is not a directory entry, call the WebPageVerifier module with the encountered file entry as its argument.
6. If the WebPageVerifier module returns True, return the file entry to the invoker module, i.e., the Process module.
7. Return.
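The traversal steps above can be sketched in Python as a generator. This is an illustrative sketch only: `recursive_traverse`, `web_page_verifier` and `WEB_EXTENSIONS` are hypothetical names, the extension set is an assumption, and the sorted listing is added just to make the order deterministic. Yielding each page hands it back to the invoker (the Process module) one at a time, which is one way to read "return the File entry" inside a loop.

```python
import os
from typing import Iterator

# Assumed set of web-page extensions (the text names .htm and .html).
WEB_EXTENSIONS = {".htm", ".html"}


def web_page_verifier(file_path: str) -> bool:
    """Simplified WebPageVerifier: True if the file has a web extension."""
    return os.path.splitext(file_path)[1].lower() in WEB_EXTENSIONS


def recursive_traverse(file_path: str) -> Iterator[str]:
    """Yield every web page found under file_path, descending into subfolders."""
    # Step 1: store the list of files and subfolders in an array (a Python list).
    entries = sorted(os.listdir(file_path))
    # Step 2: store the total number of elements.
    total = len(entries)
    # Step 3: loop over every element.
    for index in range(total):
        entry = os.path.join(file_path, entries[index])
        if os.path.isdir(entry):
            # Step 4: a directory entry, so the function calls itself on it.
            yield from recursive_traverse(entry)
        elif web_page_verifier(entry):
            # Steps 5-6: a file entry that passes the check goes back to the caller.
            yield entry
```

A caller would consume it as `for page in recursive_traverse(input_folder): ...`, where `input_folder` stands for whatever root path the Process module supplies.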
3.4 WebPageVerifier (FilePath)
This function verifies whether a file entry has one of the specified extensions. It is therefore used to separate Web-pages from the other associated site documents, such as image files, script files and style sheets.
Arguments: This function takes a file entry as its argument and returns True or False depending on its file extension.
1. It finds the position of the last occurrence of '.' in the FilePath.
2. The substring from this position to the end of the FilePath is the file extension.
3. It checks whether the extracted extension equals .htm, .html or another web file extension.
4. If the extension matches one of the web extensions, it returns True; otherwise, it returns False.
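The four steps map directly onto string operations, as in this Python sketch. `web_page_verifier` and `WEB_EXTENSIONS` are hypothetical names; the extensions beyond .htm and .html, and the case-insensitive comparison, are assumptions rather than part of the original description.

```python
# Assumed list of web-page extensions; the text names .htm and .html and
# leaves the others open.
WEB_EXTENSIONS = (".htm", ".html", ".shtml", ".xhtml")


def web_page_verifier(file_path: str) -> bool:
    """Return True if file_path ends in one of WEB_EXTENSIONS."""
    # Step 1: position of the last '.' in the path (-1 if there is none).
    dot = file_path.rfind(".")
    if dot == -1:
        return False  # no '.' at all, so there is no extension to check
    # Step 2: from that position to the end of the path is the extension.
    extension = file_path[dot:].lower()  # case-folding is an added assumption
    # Steps 3-4: compare against the web extensions.
    return extension in WEB_EXTENSIONS
```

Because the extension is taken from the last '.', a path such as `docs.v2/README` produces the candidate `.v2/readme`, which matches nothing, so a dot in a folder name does not cause a false positive.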
3.5 RawTextExtractor (WebPage)
This function extracts the regional-language raw text from a Web-page file, using a regular expression to identify that text and removing all HTML tags.
Arguments: This function takes the Web-page file as its argument and returns the extracted raw text to its invoker module as a string.
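A regex-based extractor along these lines might look like the Python sketch below. The actual regular expression is not given in this section, so everything here is an assumption: the regional language is taken to be written in Devanagari (U+0900 to U+097F), `raw_text_extractor` and the pattern names are hypothetical, and stripping tags with regular expressions is a simplification rather than full HTML parsing.

```python
import html
import re

# Assumption: Devanagari script; swap the range for another script,
# e.g. Bengali U+0980-U+09FF.
REGIONAL_TEXT = re.compile(r"[\u0900-\u097F]+(?:\s+[\u0900-\u097F]+)*")
# <script> and <style> bodies are not page text, so drop them whole.
SCRIPT_STYLE = re.compile(r"<(script|style)\b.*?</\1\s*>", re.IGNORECASE | re.DOTALL)
# Any remaining tag.
TAG = re.compile(r"<[^>]+>")


def raw_text_extractor(web_page: str) -> str:
    """Return the regional-language runs of text found in the web page file."""
    with open(web_page, encoding="utf-8", errors="replace") as fh:
        markup = fh.read()
    markup = SCRIPT_STYLE.sub(" ", markup)
    # Remove all HTML tags, then decode entities such as &amp;.
    text = html.unescape(TAG.sub(" ", markup))
    # Keep only the runs of regional-script words, one run per line.
    return "\n".join(REGIONAL_TEXT.findall(text))
```

Replacing each tag with a space, rather than deleting it, keeps words from adjacent elements from being glued together.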