Data are Everywhere: Where are My Routes, Dude?

Data. Data are everywhere. Data were before us. Data will be after us. This story started on one of many usual boring evenings of a usual boring business trip, but it unexpectedly turned into a short yet quite exciting bit of research and a not-at-all-short but interesting walk called the Brussels Comic Route. After it happened, the story was almost forgotten, but then I suddenly recalled it when Google violated the GDPR, and I could not keep it to myself anymore.

Disclaimer

A small piece of SEO: maps.me, KML, awesome maps.me routes, download Brussels Comic Route in KML, export Google Maps bookmarks and place lists free and without SMS. Just to reduce the number of these words in the rest of the post. And like everything in this blog, this post can be absolutely cruel and contain terrible things. Some of them may seem to be done in an absolutely incorrect and irrational way and may even melt your brain. If you have a different opinion or know a more logical way to do something relevant to the topic, I will be very glad if you express your thoughts in comments or pull requests.

Data is Everywhere!

Business trips are good because you get an overall picture of different places, but unfortunately almost all of them share a few problems:

  1. There is too much free time between work stuff and your flight to spend it all at the airport.

  2. There is too little free time between work stuff and your flight to have a good rest before the awesome work, or to take a tour and visit some cool attractions.

  3. Both of these problems usually happen simultaneously.

And that business trip had all of them... and it was even worse: I had only 4 hours, and what's more, I could not spend this time as I wanted because it was the museums' day off, so cool places like the Tejo Electricity Museum were closed 😞.

It could have been a day of long reddit-at-the-airport reading, but I accidentally found the Brussels Comic Route, which was a perfect match for my free time and, moreover, looked quite interesting. There were only a few small issues: a list of addresses is useless for tourists with a complete lack of a sense of direction... like myself, and an online map is useless for greedy tourists without internet roaming... like myself. I was at the intersection of almost all sets of tourists' problems, but I still had a few free hours which I could spend on the issue and write some code. Of course, it was "time for sleep" the night before my flight, but who thinks about such things when the prospects look so attractive?

First of all, open Developer Tools and try to understand what is behind the map. Find the map element by enabling "Select an element in the page to inspect it", clicking on a map sprite and going up the DOM tree:

first-overview.png

The good news was that it was just an iframe from OpenData Brussels and it still works as an independent page:

map-iframe.png

It is awesome because it significantly reduces the amount of information we need to review. Now switch to the "Network" tab and look for a script or a JSON file which contains the route points. We are in luck: there is just one JSON file and it contains all points of the route:

json-raw.png

There is only a small fly in the ointment: it is a plain list of coordinates without names 😞. Of course, it is much better than nothing, as I can always use my imagination to make up the names of the art objects... But the URL parts /opendata.../api/... give a small hope that a little more information can be squeezed out of this service. After some simplification and a few experiments with the request parameters, all the necessary information is there:

json-full.png

Click "Save as..." and place that data in some comfortable location. From this point, I want to emphasize one important moment: any data investigation can be done with Python or with pain. As it was my "rest" I preferred to use the first one. Create a file stub for a conversion script comic_convert.py:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import json

def main():
    pass

if __name__ == "__main__":
    main()

And a special wrapper wrapper.py to make the investigation easier:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from importlib import reload
from comic_convert import *
import comic_convert as cc

def rr():
    reload(cc)

And run it on the interactive console:

> python -i wrapper.py

It allows me to write all the necessary functions in the main script file, access them as cc.<function>(), and quickly reload the file with rr() while keeping the current state in memory. It could be done much more easily with the awesome Jupyter Notebook, but I still don't feel as comfortable with it as with the plain old Python console. Let's try to understand how the names of the drawings can be generated:

> with open('comic_route.json') as input_file:
>    input_data = json.load(input_file)
> print(input_data[0])
{'datasetid': 'comic-book-route', 'recordid': '51d99fd812daf5960e6a81669d134798e89e6ffb', 'fields': {'auteur_s': 'Zep', 'photo': {'mimetype': 'image/jpeg', 'format': 'JPEG', 'color_summary': ['rgba(185, 188, 196, 1.00)', 'rgba(156, 151, 148, 1.00)', 'rgba(63, 59, 41, 1.00)'], 'filename': 'titeuf.avenue_bockstael.1151.jpg', 'width': 300, 'id': '736d97639cf0a1ffa5a7fbc22c92bde8', 'height': 450, 'thumbnail': True}, 'personnage_s': 'Titeuf', 'coordonnees_geographiques': [50.8700844057, 4.34397697449], 'annee': '2006'}, 'geometry': {'type': 'Point', 'coordinates': [4.34397697449, 50.8700844057]}, 'record_timestamp': '2019-12-11T10:14:16.854Z'}

> print(input_data[1])
{'datasetid': 'comic-book-route', 'recordid': '76178a52138075239f8a77152e37faa05f78a6f3', 'fields': {'auteur_s': 'Roba', 'photo': {'mimetype': 'image/jpeg', 'format': 'JPEG', 'color_summary': ['rgba(138, 147, 153, 1.00)', 'rgba(160, 142, 118, 1.00)', 'rgba(114, 100, 86, 1.00)'], 'filename': '24-boule&bill-rue du chevreuil (0).jpg', 'width': 300, 'id': '701e2ec34bc4b35761b7a3d54224427e', 'height': 200, 'thumbnail': True}, 'personnage_s': 'Boule & Bill - Bollie & Billie', 'coordonnees_geographiques': [50.8376411641, 4.3455862999], 'annee': '1992'}, 'geometry': {'type': 'Point', 'coordinates': [4.3455862999, 50.8376411641]}, 'record_timestamp': '2019-12-11T10:14:16.854Z'}

I don't know French, but auteur_s and personnage_s look pretty suspicious, so most probably the first function should look like this:

def get_name(entry):
    fields = entry.get('fields', {})
    return '{} - {}'.format(fields.get('auteur_s', ''), fields.get('personnage_s', ''))

And let's try it:

> rr()
> print('\n'.join(cc.get_name(x) for x in input_data))
Pratt - Corto Maltese
Roba - Boule & Bill - Bollie & Billie
Zep - Titeuf
...

Brussels is located quite close to the Greenwich meridian, so my hypothesis is that the first value in coordinates is the longitude and the second one is the latitude... Still, I tend not to trust myself, so I checked it against an external source: https://www.latlong.net/Show-Latitude-Longitude.html. Can I be wrong? Of course not, I'm an expert (no) 😂. Continue writing the code:

def get_longitude(entry):
    geometry = entry.get('geometry', {})
    return geometry.get('coordinates', [0, 0])[0]

def get_latitude(entry):
    geometry = entry.get('geometry', {})
    return geometry.get('coordinates', [0, 0])[1]

And check all pieces together:

> rr()
> print('\n'.join('{:50.50}{:13.10}{:13.10}'.format(cc.get_name(x), cc.get_longitude(x), cc.get_latitude(x)) for x in input_data))
Pratt - Corto Maltese                                 4.3485253   50.8605342
Roba - Boule & Bill - Bollie & Billie                 4.3455863  50.83764116
Zep - Titeuf                                        4.343976974  50.87008441
...
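
Before celebrating, one more quick cross-check of the two coordinate fields doesn't hurt. This is just a console sanity sketch, assuming coordonnees_geographiques really is stored as [latitude, longitude], which is what the records above suggest:

> input_data[0]['fields']['coordonnees_geographiques']
[50.8700844057, 4.34397697449]
> input_data[0]['geometry']['coordinates']
[4.34397697449, 50.8700844057]

So geometry really is [longitude, latitude], reversed relative to coordonnees_geographiques, and the getters above pick the right elements.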

Bingo! I could already use this and enter all the coordinates by hand into a GPS navigator! But is it possible to make my life even easier?

Let's Tame Maps.ME

Maps.ME is an awesome piece of software. Leaving aside the fact that it is much better suited for greedy tourists without internet roaming, it has a few additional features:

  1. It is open source and supported by the community. For some people that means a lot.

  2. The pedestrian maps are awesome. Sometimes they can even be a little scary, because the maps are so good that you can walk along private garages for a few hundred meters towards what really looks like a dead end with a small closed door... which turns out to be open, and you can easily walk right through:

    mapsme-ways.jpg
  3. It works offline. As a matter of fact, this isn't only about greedy tourists. Are you sure you cannot get stuck at the top of a mountain without cell coverage because you didn't take into account the cable car's shortened working hours on holidays and now somehow have to find another way down? I'm not anymore.

  4. You can easily mark visited places. Offline. And it requires only two clicks instead of a bunch of clicks and menus in Google Maps.

Ok, I have a list of coordinates and names, so what should I do next? Maps.ME is awesome here too: it easily lets you create a new bookmark list, add some points, and export them. An exported list has the .kmz extension and can be easily googled or opened as text:

> head -c 16 Test.kmz
PK^C^D^T^@^@^@^H^@M-b}8PM...

Maybe it isn't a good habit to determine the type of a file by eyeballing its magic signature in text form, but this looks exactly like a plain zip archive containing a single file, test.kml.
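
If you don't quite trust your eyes, the same check takes a couple of lines in the Python console (just a small sanity sketch; Test.kmz is the bookmark list exported above):

> import zipfile
> zipfile.is_zipfile('Test.kmz')
True
> zipfile.ZipFile('Test.kmz').namelist()
['test.kml']

That test.kml can also be easily googled or simply opened in a text editor: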

<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://earth.google.com/kml/2.2">
<Document>
  <Style id="placemark-red">
    <IconStyle>
      <Icon>
        <href>http://maps.me/placemarks/placemark-red.png</href>
      </Icon>
    </IconStyle>
  </Style>
  ...
  <name>Test</name>
  <visibility>1</visibility>
  <ExtendedData xmlns:mwm="https://maps.me">
     ...
  </ExtendedData>
  ...
  <Placemark>
    <name>Hello</name>
    <description></description>
    <TimeStamp><when>2020-01-24T12:46:15Z</when></TimeStamp>
    <styleUrl>#placemark-red</styleUrl>
    <Point><coordinates>-0.614,51.4684904</coordinates></Point>
    <ExtendedData xmlns:mwm="https://maps.me">
       ...
    </ExtendedData>
  </Placemark>
  <Placemark>
    <name>World</name>
    <description></description>
    <TimeStamp><when>2020-01-24T12:46:37Z</when></TimeStamp>
    <styleUrl>#placemark-red</styleUrl>
    <Point><coordinates>-0.613,51.4684904</coordinates></Point>
    <ExtendedData xmlns:mwm="https://maps.me">
       ...
    </ExtendedData>
  </Placemark>
</Document>
</kml>

Maybe it's the right time to open the wiki and read that KML is the Koogle Markup Language (well, Keyhole Markup Language, before Keyhole was bought by Google) or even to open the specification. But screw that: just remove all the uninteresting sections and try to import the result back as test.kml:

<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://earth.google.com/kml/2.2">
<Document>
  <Style id="placemark-red">
    <IconStyle>
      <Icon>
        <href>http://maps.me/placemarks/placemark-red.png</href>
      </Icon>
    </IconStyle>
  </Style>
  <name>Test</name>
  <visibility>1</visibility>
  <Placemark>
    <name>Hello</name>
    <description></description>
    <TimeStamp><when>2020-01-24T12:46:15Z</when></TimeStamp>
    <styleUrl>#placemark-red</styleUrl>
    <Point><coordinates>-0.614,51.4684904</coordinates></Point>
  </Placemark>
  <Placemark>
    <name>World</name>
    <description></description>
    <TimeStamp><when>2020-01-24T12:46:37Z</when></TimeStamp>
    <styleUrl>#placemark-red</styleUrl>
    <Point><coordinates>-0.613,51.4684904</coordinates></Point>
  </Placemark>
</Document>
</kml>

Awesome! Maps.me doesn't even force us to put it back into a zip archive and happily works without all this sh... unnecessary trash!

mapsme-experiment.png

So the task boils down to a much simpler one: plain text generation:

import datetime
from xml.sax.saxutils import escape

...

HEAD = """\
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://earth.google.com/kml/2.2">
<Document>
  <Style id="placemark-red">
    <IconStyle>
      <Icon>
        <href>http://maps.me/placemarks/placemark-red.png</href>
      </Icon>
    </IconStyle>
  </Style>
  <name>{name}</name>
  <visibility>1</visibility>"""

PLACEMARK = """\
  <Placemark>
    <name>{name}</name>
    <description>{description}</description>
    <TimeStamp><when>{date}</when></TimeStamp>
    <styleUrl>#placemark-red</styleUrl>
    <Point><coordinates>{longitude},{latitude}</coordinates></Point>
  </Placemark>"""

END = """\
</Document>
</kml>"""

...

def generate_kml(data):
    output = [HEAD.format(name="comic_route")]
    for point in data:
        output.append(PLACEMARK.format(
            name=escape(get_name(point)),
            description=escape(get_name(point)),
            date=datetime.datetime.now().replace(microsecond=0).isoformat(),
            longitude=get_longitude(point),
            latitude=get_latitude(point)))
    output.append(END)
    return '\n'.join(output)

And then finally generate the necessary file:

> rr()
> kml = cc.generate_kml(input_data)
> open('comic_route.kml', 'w').write(kml)

And check that everything works fine on Maps.me:

mapsme-finale.png

Bingo! It's time to walk and make cool photos:

comic-route.jpg

And as promised, you can download the Brussels Comic Route in KML as well as the whole script described above.

Google Against Us, We Against Google

Of course, Maps.me is an awesome piece of software, but like any software it has its own downsides. On one hand, Maps.me and the underlying OpenStreetMap database have a very poor catalog of sights, and what is more, even a simple attempt to find the nearest grocery store can turn into a "fascinating" adventure involving a sprint or even a fight. On the other hand, there are quite good catalogs such as TripAdvisor or Google Maps, and our trips usually rely on them. So, is there a way to combine them?

There are a few ways to export the main list of bookmarks, through Google Bookmarks or even through Google Takeout. That solves the problem, but not completely, because a much more comfortable way to plan a trip is Google Maps place lists. They allow organizing places in a much more structured way and do not force you to delete old points to keep the bookmarks in some semblance of order, but sharing lists of places is a mess and there is no straightforward way to download them.

But Google Maps still needs to get this data from somewhere, so can we use the same information? I think it is obvious what we should do now! Open Developer Tools, open Google Maps, click "Your Places" -> "Saved", click "Clear" in the Network tab to reduce the amount of noise, and then click on the list we are interested in.

googlemaps-list.png

There are a lot of responses in the network requests list and zero *.json among them 😞, but one of the items (https://www.google.com/search?tbm=map&authuser=0&hl=en&...) looks rather suspicious:

{"c":0,"d":")]}'\n[[\"*\",[[null,null,null,null,null,null,null,null,\"EfU6XproFKS...
...2d31594\",\"Rich Family\",null,[\"Hypermarket\",\"Childrens store\"]\n,null,null,null,null,\"Rich Family, Chistopolskaya, 11, Kazan,...
...580921998250.1"}/*""*/
  1. It contains some data from the displayed list.

  2. It contains a lot of other data which are not displayed in a list.

  1. It contains some data from the displayed list.

  4. Inside, it contains... another JSON 😕. But we should remember that Google is a big company, and some "over-enterprising" with packing already packed data can easily be found in big enterprise systems.

  5. The internal JSON starts with )]}'\n, which is another trick to prevent XSSI... you can never have too much security.

  6. It looks like Google optimizes the server side and reduces serialization logic by dumping all the data in a very fast JSON-compatible format and moving all the data transformation logic to the client side (a small unwrapping sketch follows this list).
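
Putting observations 3-6 together, peeling both anti-XSSI wrappers takes only a couple of lines. This is just a sketch, assuming the suffix and prefix are exactly the ones seen above:

XSSI_SUFFIX = '/*""*/'
XSSI_PREFIX = ")]}'\n"

def unwrap_google_response(raw):
    # Outer layer: a JSON object followed by the /*""*/ guard comment.
    outer = json.loads(raw[:-len(XSSI_SUFFIX)])
    # Inner layer: the 'd' field holds another JSON string guarded by )]}'\n.
    return json.loads(outer['d'][len(XSSI_PREFIX):])

The code below keeps the equivalent magic slices inline, but the idea is exactly this.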

Save it as test.google, and before we go further, it is a good opportunity to make our comic_convert.py a little more universal:

  1. The most important change: rename the file to place_convert.py and change the associated import in wrapper.py (the updated wrapper is shown right after the code below).

  2. And make a few minor changes in code:

import argparse
import os

...

class Point:

    def __init__(self, name, description, longitude, latitude):
        self.name = name
        self.description = description
        self.longitude = longitude
        self.latitude = latitude

class ComicSource:

    def __init__(self, filename):
        with open(filename) as input_file:
            self.input_data = json.load(input_file)

    def title(self):
        return "comic_route"

    def enumerate(self):
        for point in self.input_data:
            yield Point(ComicSource.get_name(point),
                        ComicSource.get_name(point),
                        ComicSource.get_longitude(point),
                        ComicSource.get_latitude(point))

    @staticmethod
    def get_name(entry):
        fields = entry.get('fields', {})
        return '{} - {}'.format(fields.get('auteur_s', ''), fields.get('personnage_s', ''))

    @staticmethod
    def get_longitude(entry):
        geometry = entry.get('geometry', {})
        return geometry.get('coordinates', [0, 0])[0]

    @staticmethod
    def get_latitude(entry):
        geometry = entry.get('geometry', {})
        return geometry.get('coordinates', [0, 0])[1]

def generate_kml(filename, source):
    output = [HEAD.format(name=source.title())]
    for point in source.enumerate():
        output.append(PLACEMARK.format(
            name=escape(point.name),
            description=escape(point.description),
            date=datetime.datetime.now().replace(microsecond=0).isoformat(),
            longitude=point.longitude,
            latitude=point.latitude))
    output.append(END)

    with open(filename, 'wb') as output_file:
        output_file.write('\n'.join(output).encode('utf-8'))
    print("{} points written to {}".format(len(output) - 2, filename))

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('filename')
    args = parser.parse_args()

    if args.filename == 'comic_route.json':
        source = ComicSource(args.filename)

    generate_kml(os.path.splitext(args.filename)[0] + '.kml', source)
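
And, as promised in the first step, the updated wrapper.py is simply the old one with the import renamed:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from importlib import reload
from place_convert import *
import place_convert as cc

def rr():
    reload(cc)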

The refactored converter looks pretty similar but has a few advantages:

  1. It forces me to write longer commands like print('\n'.join(cc.ComicSource.get_name(x) for x in input_data)), which trains my touch typing skills.

  2. It allows adding new sources in the future.

After that, our research can continue:

# unpack external container
> with open('test.google', 'rb') as input_file:
>    outer_data = json.loads(input_file.read().decode('utf-8')[:-6])

# unpack internal container
> input_data = json.loads(outer_data['d'][5:])
> input_data

[['*', [[None, None, None, None, None, None, None, None, 'hgo7..., ['hgo7..., ['Chistopolskaya, 11', 'Kazan', ...], ['8 (843) 212-43-70'], [None, None, None, ['https://www.google.com/search?q=Rich+Family,...', '1,721 reviews', None, '0ahUK...'], None, None, None, ..., 'Rich Family', None, ['Hypermarket', 'Childrens store'], None, None, None, None, 'Rich Family, ...

Fine, this is JSON... but it is not really JSON. JSON is a self-descriptive format, and this data is definitely not. On the other hand, JSON is a key-value format, but this data looks like a dump of a rather big table where the key is the column index and unused values are simply null. And it isn't even a flat table, it is a complex hierarchy of tables inside tables inside tables 😱. So let's add one more function and try to understand the underlying logic:

class GoogleSource:

    @staticmethod
    def traverse(data, path):
        # Walk a nested list/dict structure along a path of indices;
        # return None if anything along the way is missing.
        try:
            el = data
            for i in path:
                el = el[i]
            return el
        except (IndexError, KeyError, TypeError):
            return None

And try it:

> rr()
> cc.GoogleSource.traverse(input_data, [0])
['*', [[None, None, None, None, None, None, None, None, ...
> cc.GoogleSource.traverse(input_data, [0, 1])
[[None, None, None, None, None, None, None, None, 'hgo7...
> cc.GoogleSource.traverse(input_data, [0, 1, 0])
[None, None, None, None, None, None, None, None, 'hgo7..., 'Rich Family', ...
> cc.GoogleSource.traverse(input_data, [0, 1, 1])
[None, None, None, None, None, None, None, None, 'hgo7..., 'Kazan Wedding Palace', ...

The element input_data[0][1] clearly contains the array of places. This index magic looks pretty solid, so we can continue our research:

> cc.GoogleSource.traverse(input_data, [0, 1, 1, 14, 11])
'Kazan Wedding Palace'
> cc.GoogleSource.traverse(input_data, [0, 1, 1, 14, 25, 15, 0, 2])
'BLA-BLA' # Yeah, it is my test commentary...
> cc.GoogleSource.traverse(input_data, [0, 1, 1, 14, 9, 2])
55.8128757
> cc.GoogleSource.traverse(input_data, [0, 1, 1, 14, 9, 3])
49.1080924
# And even the name of the place list
> cc.GoogleSource.traverse(input_data, [32, 1])
'Test'

Ok, that wasn't so difficult. I don't even hold a grudge against Google for the violation of my rights. Now all the missing parts are discovered and GoogleSource can be written out completely:

class GoogleSource:

    def __init__(self, filename):
        with open(filename, 'rb') as input_file:
            outer = json.loads(input_file.read().decode('utf-8')[:-6])
            self.input_data = json.loads(outer['d'][4:])

    def title(self):
        return GoogleSource.traverse(self.input_data, [32, 1])

    def enumerate(self):
        for point in GoogleSource.traverse(self.input_data, [0, 1]):
            yield Point(GoogleSource.traverse(point, [14, 11]),
                        GoogleSource.traverse(point, [14, 25, 15, 0, 2]) or '',
                        GoogleSource.traverse(point, [14, 9, 2]),
                        GoogleSource.traverse(point, [14, 9, 3]))

...

    if args.filename == 'comic_route.json':
        source = ComicSource(args.filename)
    elif args.filename.endswith('.google'):
        source = GoogleSource(args.filename)

...

And run it from bash:

> ./place_convert.py test.google
20 points written to test.kml

Bing... paging, thou art a heartless bitch! The list I wanted to export is definitely longer than 20 points! Reversing the request's paging parameters would be a good challenge... but I prefer to leave it for the future. Right now it is more than enough to use a simple keyword search as a filter in the Network tab, scroll down the whole list of places, save all the responses into one file, and make a few changes in GoogleSource:

class GoogleSource:

    def __init__(self, filename):
        self.input_data = []
        with open(filename, 'rb') as input_file:
            for line in input_file.read().decode('utf-8').splitlines():
                outer = json.loads(line[:-6])
                self.input_data.append(json.loads(outer['d'][4:]))

    def title(self):
        return GoogleSource.traverse(self.input_data, [0, 32, 1])

    def enumerate(self):
        for response in self.input_data:
            for point in GoogleSource.traverse(response, [0, 1]):
                yield Point(GoogleSource.traverse(point, [14, 11]),
                            GoogleSource.traverse(point, [14, 25, 15, 0, 2]) or '',
                            GoogleSource.traverse(point, [14, 9, 2]),
                            GoogleSource.traverse(point, [14, 9, 3]))

Last piece:

> ./place_convert.py test.google
45 points written to test.kml

The whole script can also be downloaded.

Conclusion

There are so many ways this research could be continued that I almost hate myself for starting it:

  • Read through the KML specification or experiment with MAPS.ME, because a list of points is very basic functionality and maps.me at least also supports paths. Although I don't really believe in paths: usually our trips are not "We will follow your ideal Hamiltonian path" but "Look at that blinking thing over there, we should go and have a look!", so a plain bunch of points suits me better 😅.

  • Give another chance to Jupyter Notebook.

  • Get through the Google Maps paging logic. It looks like some kind of serialization is used there, and understanding it would be an interesting task, although the profit is very small: it would only remove the extra step of scrolling through the whole list.

  • Get through the Google Maps data scheme logic. Most probably this is a format based on one of Google's APIs, like the Google Bigtable API, but I haven't found anything similar to it. I think this is the most interesting continuation because it could give some new ideas to implement in real future projects.

Anyway, it is already a reasonably finished piece of work which shows that interesting tasks can pop up in quite unpredictable places. And now it is a good moment to unsubscribe from the blog updates, leave a comment that the whole post is bullshit, or at least downvote the post. In any case, happy hacking!
