
This is the third article in the author's series "Learning Python in practice".
Here is the problem: you have a "backup" (well, we know what it really is) of Google Chrome cookies in txt format, but to import them back into the browser you need to convert the file to json. There is a PHP script on the net, but it is inconvenient to use: it needs a running web server, local or remote. On top of that, the script only works with one file at a time.
There is also a web service for the conversion, but I don't really want to entrust my own cookies to someone unknown. The task is not difficult, just painstaking. Let's get to the solution.
This is what a line of the cookie backup looks like in the txt file:
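The original sample is hidden behind the forum's reply-to-view gate. For reference, a line in the classic Netscape cookie.txt layout, which matches the tab-separated parsing described later in the post, looks roughly like this (the values are made up):

```
.example.com	TRUE	/	FALSE	1735689600	sessionid	abc123
```

The fields, separated by tab characters, are: domain, include-subdomains flag, path, secure flag, expiry time (unix seconds), cookie name, and cookie value.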
[hidden content: the sample cookie line and the script's opening code; reply on the forum to view]
On the next line, files = find_files(work_dir), we call the find_files function, passing the path as an argument, and assign the result to the files variable.
Then we call handle_files(files), the function that actually drives the rest of the program.
Consider the find_files function:
[hidden content: the find_files listing; reply on the forum to view]
Next come four lines with pattern in their names: string variables that store the search settings.
- list_of_files - initializes the list that will collect the files matching the search criteria.
- list_of_fs_objects = glob.glob(work_dir + files_pattern, recursive=True) - the glob module finds all paths matching the given pattern; here it is used to search for files according to the criteria set in the four variables at the beginning of the function.
- work_dir - specifies the working directory.
- files_pattern - the * sign means that all files, with any extension, will be matched. The recursive=True parameter makes the search descend recursively through all subfolders. The result is a list of all directories and files found in the working folder, with their paths.
- if dir_pattern in object - checks whether the current list entry (the path to the object) contains the string 'Cookies'.
- and not os.path.isdir(object) - os.path.isdir checks whether the object is a directory; we do not need directories, so the check is negated with not, leaving only files.
- and extension_pattern in object - checks that the file path contains the '.log' extension.
- and chrome_pattern in object - checks the file path for the keyword 'Chrome', so that cookies from another browser, which have a different format, are not picked up by accident.
The function returns the list of found files that match all the selection criteria.
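Since the original listing is hidden behind the forum gate, here is a minimal sketch of find_files reconstructed from the walkthrough above; the exact pattern strings and the use of os.path.join are my assumptions:

```python
import glob
import os

def find_files(work_dir):
    dir_pattern = 'Cookies'        # the path must contain the Cookies folder
    files_pattern = '**'           # match every object under work_dir
    extension_pattern = '.log'     # only the .log backup files
    chrome_pattern = 'Chrome'      # skip cookies from other browsers
    list_of_files = []
    # recursive=True descends into all subfolders; the result mixes
    # directories and files, so both are filtered below
    list_of_fs_objects = glob.glob(os.path.join(work_dir, files_pattern),
                                   recursive=True)
    for fs_object in list_of_fs_objects:
        if (dir_pattern in fs_object
                and not os.path.isdir(fs_object)   # drop directories
                and extension_pattern in fs_object
                and chrome_pattern in fs_object):
            list_of_files.append(fs_object)
    return list_of_files
```
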
Control then passes to handle_files, the function that processes the found files.
[hidden content: the handle_files listing; reply on the forum to view]
We create a counter of processed files: files_counter = 0.
On the next line the processing loop over the files begins: for file in list_of_files.
file_name = os.path.splitext(file)[0] strips the extension from the file path, leaving the base name.
We get the list of the file's lines, list_of_lines = read_file(file), by calling the read_file function.
[hidden content: the read_file listing; reply on the forum to view]
The line with codecs.open(filename, 'r', encoding='utf-8', errors='ignore') as file uses the with context manager, which removes the need to manually close the file opened for reading. The codecs module, imported at the beginning of the file, lets us correctly handle any characters in utf-8 encoding, while errors='ignore' skips bytes that cannot be decoded. file.readlines() reads all the lines of the file into a list, and return file_data returns the result. The except IOError branch runs only if an exception occurs while reading the file.
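The read_file listing itself is hidden, so here is a sketch that follows the description above (the error message is my wording):

```python
import codecs

def read_file(filename):
    file_data = []
    try:
        # `with` closes the file for us; codecs plus errors='ignore'
        # lets any byte sequence be read without a decode error
        with codecs.open(filename, 'r', encoding='utf-8',
                         errors='ignore') as file:
            file_data = file.readlines()
    except IOError:
        print('Could not read file:', filename)
    return file_data
```
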
Execution then returns to the handle_files function.
An empty list, list_of_dic = [], is created to store the dictionaries: look at the beginning of the post, we need to end up with a json file of the form [{"key":"value", "key":"value"...}, {"key":"value", "key":"value"...}].
After the file is read, the counter of processed files is incremented: files_counter = files_counter + 1.
Execution is still inside the outer loop that iterates over the files, but we need the individual lines.
So a nested loop begins, for item in list_of_lines, in which each line from the read file is processed.
We check that the line actually holds data: if len(item) > 10. We split the line into substrings, list_flags = item.split('\t'), with the tab character acting as the separator. Each substring is then assigned to its own variable; the cookie specification defines the order in which the values must be placed into the dictionaries.
In dic we create a dictionary from the substring values and append it to the list: list_of_dic.append(dic).
At the end of the nested loop we have a list of dictionaries. Let's convert it to json format: list_dump = json.dumps(list_of_dic). At the beginning of the file we include the library for working with json via import json. The last operation converts the json to a string, since next we will write the processed data to a text file, which only supports writing strings.
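The original listing of the loop body is hidden, so here is a minimal sketch for a single line; the field order follows the classic Netscape cookie.txt layout, and the dictionary key names are my assumption, not the author's:

```python
import json

list_of_dic = []
# a made-up tab-separated line in the backup format
item = '.example.com\tTRUE\t/\tFALSE\t1735689600\tsessionid\tabc123\n'
if len(item) > 10:                          # skip empty or truncated lines
    list_flags = item.strip().split('\t')   # fields are separated by tabs
    # field order of the classic Netscape cookie.txt format
    domain, flag, path, secure, expiry, name, value = list_flags
    dic = {                                 # key names are an assumption
        'domain': domain,
        'expirationDate': float(expiry),
        'name': name,
        'path': path,
        'secure': secure == 'TRUE',
        'value': value,
    }
    list_of_dic.append(dic)

list_dump = json.dumps(list_of_dic)
string_of_dump = str(list_dump)   # json.dumps already returns a str
```
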
json_file_name = file_name + '.json' appends the appropriate extension to the source file name. After the program runs, the folder will contain the source file and, next to it, a file with the same name and the .json extension: this is the ready-made cookie file for import.
We write the processed information to the file: write_file(json_file_name, string_of_dump).
The write_file function is identical to read_file except for the line responsible for writing, file.write(data). Here filename is the name of the file and data is the resulting list of dictionaries, converted to a json string, which arrives here as an argument of write_file(filename, data).
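Following the description above, write_file can be sketched as the mirror of read_file, with the same hedges (the listing itself is hidden and the error message is my wording):

```python
import codecs

def write_file(filename, data):
    try:
        # open for writing; utf-8 keeps any cookie characters intact
        with codecs.open(filename, 'w', encoding='utf-8') as file:
            file.write(data)            # data is the json string
    except IOError:
        print('Could not write file:', filename)
```
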
Let's run the script:
[hidden content: the script's run output; reply on the forum to view]
<----------------------------------------------------------------------------------------->
I am also dropping the full source code:
[hidden content: the full source code; reply on the forum to view]