
Convert Google Chrome cookies from txt to json (Python).

K4NITEL

Administrator
Staff member
Admin
Joined
Jun 18, 2022
Messages
43
Hellcoins
♆498
525099ddff691787090a8.png


This is the third article in the author's series "Learning Python in practice".

Here is the problem: you have a "backup" (we all know what that really means) of Google Chrome cookies in txt format, but to import them back into the browser you need the file converted to json. There is a PHP script floating around the net, but it is inconvenient to use: it needs a running web server, local or remote. On top of that, the script only handles one file at a time.

There is also a web conversion service, but I don't really want to hand my own cookies over to some unknown third party. The task is not difficult, just painstaking. Let's get to the solution.

This is what a cookie backup line looks like in txt:
You must reply before you can see the hidden data contained here.
This is what the json file should look like:
You must reply before you can see the hidden data contained here.
In other words, the input file is structured like classic csv with records separated by tabs, and at the output, per the requirements of the standard, we should get a list of dictionaries of this kind:
You must reply before you can see the hidden data contained here.
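Since the examples above are hidden, here is a minimal sketch of the transformation for a single line. The sample line and the json key names are my own illustration based on the Netscape cookie format, not the hidden data:

```python
import json

# A typical line from a Netscape-format cookie backup (tab-separated).
# Field order: domain, subdomain flag, path, secure flag, expiry, name, value.
line = ".example.com\tTRUE\t/\tFALSE\t1735689600\tsession_id\tabc123"

fields = line.split('\t')  # split on tabs, like a csv with '\t' as delimiter
cookie = {
    "domain": fields[0],
    "hostOnly": fields[1] != 'TRUE',     # TRUE means subdomains are included
    "path": fields[2],
    "secure": fields[3] == 'TRUE',
    "expirationDate": int(fields[4]),
    "name": fields[5],
    "value": fields[6],
}

# the final output is a list of such dictionaries
print(json.dumps([cookie]))
```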
Let's start from the beginning, namely from the entry point to the program:
You must reply before you can see the hidden data contained here.
Here the path to the working directory is taken from a command-line parameter, i.e. the run command will look like this:
You must reply before you can see the hidden data contained here.
work_dir = sys.argv[1] takes the second argument (the first, argv[0], is reserved and returns the name of the script itself) and assigns it to work_dir.

On the next line, files = find_files(work_dir), we call the find_files function, passing the path as an argument, and assign the result to the files variable.

Then we call handle_files(files), which controls the rest of the program's work.
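Putting the three steps together, the entry point can be sketched like this (the function bodies are stubbed here; the article walks through the real ones further on):

```python
import sys

def find_files(work_dir):
    # stub: the real search logic is described below
    return []

def handle_files(files):
    # stub: returns how many files were processed
    return len(files)

def main(argv):
    # argv[0] is reserved for the script name itself,
    # so the working directory comes from argv[1]
    work_dir = argv[1]
    files = find_files(work_dir)
    return handle_files(files)

if __name__ == '__main__':
    main(sys.argv)
```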

Consider the find_files function:
You must reply before you can see the hidden data contained here.
The function takes one mandatory argument, work_dir - the working folder in which the files will be searched.

Next come four string variables whose names contain the word pattern; they store the search settings.

  1. list_of_files - initializes the list that will collect the found files matching the search criteria.
  2. list_of_fs_objects = glob.glob(work_dir + files_pattern, recursive=True) - the glob module finds all paths that match the given pattern; here it searches for files according to the criteria stored in the four pattern variables at the beginning of the function. work_dir specifies the working directory; in files_pattern the * sign means that all files, with any extension, will be matched. The recursive=True parameter makes the search descend through all subfolders. The result is a list of all directories and files found in the working folder, with their paths.
Next, the for object in list_of_fs_objects loop processes each found object. In effect, the loop filters list_of_fs_objects by the given criteria. The filter condition itself combines several checks; the "\" sign marks a line continuation (a long line is inconvenient to read) and is ignored by the interpreter.

  • if dir_pattern in object - checks whether the current list entry (the path to the object) contains the string 'Cookies'.
  • and (not os.path.isdir(object)) - os.path.isdir checks whether the object is a directory; we don't need directories, so the not negation leaves only files.
  • and extension_pattern in object - checks whether the file path contains the '.log' extension.
  • and chrome_pattern in object - checks the file path for the 'Chrome' keyword, so that cookies from another browser, which have a different format, are not picked up by accident.
The and operators mean that the object is processed further only if it meets all the conditions at once.

The function returns the list of found files that match all the selection criteria.
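A possible shape of find_files, assuming the pattern values described above ('Cookies', a wildcard, '.log', 'Chrome' - the exact strings in the hidden code may differ):

```python
import glob
import os

def find_files(work_dir):
    # search settings; the values are assumptions based on the text above
    dir_pattern = 'Cookies'
    files_pattern = '/**/*'      # any file, any extension
    extension_pattern = '.log'
    chrome_pattern = 'Chrome'

    list_of_files = []
    # recursive=True makes '**' descend into every subfolder
    list_of_fs_objects = glob.glob(work_dir + files_pattern, recursive=True)
    for object in list_of_fs_objects:
        # keep only non-directory entries whose path matches all criteria
        if dir_pattern in object \
                and (not os.path.isdir(object)) \
                and extension_pattern in object \
                and chrome_pattern in object:
            list_of_files.append(object)
    return list_of_files
```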

Control then passes to the function that processes the found files - handle_files.
You must reply before you can see the hidden data contained here.
The required parameter of handle_files is the list of files to process - list_of_files.

We create a counter of processed files: files_counter = 0.

On the next line, the per-file processing loop begins: for file in list_of_files.

The file name without its extension is extracted from the file path: file_name = os.path.splitext(file)[0].

We get the list of the file's lines, list_of_lines = read_file(file), by calling the read function read_file.
You must reply before you can see the hidden data contained here.
Inside the function, all the reading code is wrapped in a try-except statement. This handles the exceptional cases when the file cannot be read: no filesystem permissions, the file's media unexpectedly detached, a filesystem failure, etc.

The line with codecs.open(filename, 'r', encoding='utf-8', errors='ignore') as file uses the with context manager, which removes the need to manually close the file opened for reading. The codecs module, imported at the top of the file, correctly handles any characters in utf-8 encoding. file.readlines() reads all lines of the file into a list, and return file_data returns the result. The except IOError branch executes only when an exception occurs while reading the file.
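A sketch of read_file along those lines (the error message wording is my own):

```python
import codecs

def read_file(filename):
    # try/except guards against unreadable files: missing permissions,
    # detached media, filesystem failures, etc.
    try:
        # 'with' closes the file automatically; errors='ignore' skips
        # bytes that are not valid utf-8 instead of raising an exception
        with codecs.open(filename, 'r', encoding='utf-8', errors='ignore') as file:
            file_data = file.readlines()  # all lines of the file, as a list
        return file_data
    except IOError:
        print('Cannot read file:', filename)
        return []
```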

Control then returns to the handle_files function.

A list_of_dic = [] is created, in which the dictionaries will be stored - look at the beginning of the post: we need to end up with a json file like [{"key":"value", "key":"value"...}, {"key":"value", "key":"value"...}].

After the file is read, the processed-file counter is incremented: files_counter = files_counter + 1.

Execution is still inside the first loop, which iterates over files. But we need the lines.

Therefore a new loop begins, for item in list_of_lines, which processes each line of the file that was read.

We check that the line is not empty with if len(item) > 10. We split the line into substrings, list_flags = item.split('\t'), with the tab character as the separator. Each substring is then assigned to its own variable; the cookie specification defines the order in which the fields appear.

In dic we build a dictionary from the substring values and append it to the list: list_of_dic.append(dic).
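The inner loop might look like this. The field order follows the Netscape cookie format, and the dictionary key names are an assumption about what a browser importer expects:

```python
def lines_to_cookies(list_of_lines):
    list_of_dic = []
    for item in list_of_lines:
        if len(item) > 10:  # skip empty and too-short lines
            # tab-separated fields, per the Netscape cookie format
            list_flags = item.rstrip('\n').split('\t')
            domain, flag, path, secure, expiration, name, value = list_flags
            dic = {
                'domain': domain,
                'hostOnly': flag != 'TRUE',
                'path': path,
                'secure': secure == 'TRUE',
                'expirationDate': int(expiration),
                'name': name,
                'value': value,
            }
            list_of_dic.append(dic)
    return list_of_dic
```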

When the nested loop finishes, we have the list of dictionaries. We convert it to json format with list_dump = json.dumps(list_of_dic); the json library is included at the top of the file with import json. The last operation converts the json to a string, since we will write the processed data out as text, which only supports writing strings.

json_file_name = file_name + '.json' appends the appropriate extension to the source file name. After the program runs, the folder will contain, next to each source file, a file with the same name and the json extension - this is the ready-made cookie file for import.

We write the processed information to the file: write_file(json_file_name, string_of_dump).

The write_file function is identical to read_file in everything except the line responsible for writing, file.write(data). Here filename is the name of the file and data is the resulting list of dictionaries, already serialized to a string, which arrive as arguments: write_file(filename, data).
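A sketch of write_file mirroring read_file, together with the serialization step that produces the string to write (the error message wording is my own):

```python
import codecs
import json

def write_file(filename, data):
    # same structure as read_file, but opened for writing
    try:
        with codecs.open(filename, 'w', encoding='utf-8') as file:
            file.write(data)
    except IOError:
        print('Cannot write file:', filename)

# usage sketch: serialize the list of dictionaries and write it out
list_of_dic = [{'name': 'session_id', 'value': 'abc123'}]
string_of_dump = json.dumps(list_of_dic)
write_file('cookies.json', string_of_dump)
```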

Let's run the script:
You must reply before you can see the hidden data contained here.
where backup is the folder with the cookies in text format. When done, the program reports how many files were found and processed.
<----------------------------------------------------------------------------------------->
I'm also posting the full source code:
You must reply before you can see the hidden data contained here.
 