Wiker: feeding its database from web data

Context

Wiker is the first local network to connect all the actors of medium-sized and rural towns through their economic, cultural and social information. Its aim is to promote local life, the local economy and short supply chains, and to support the sustainable development of town centers and territories.

To do this, Wiker collects relevant information about the territories from numerous internet sources (websites and open data), some of which are accessible by API, and aggregates it into a single service. But this collection phase is time-consuming and error-prone. For example, each employee spends an average of half an hour a day checking the monitored websites, retrieving interesting information and labeling it ("event" or "article", for example). Because several employees share this work, the same kind of content can end up labeled inconsistently.

Examples of source pages

Solution

Scenario used

1. Scraping worker and API connectors: retrieve information from the web
2. ETL worker: cleanses and formats the data
3. Prediction worker: labels the records
4. INSEE worker: adds the INSEE codes to the municipalities
5. Database worker: saves the records to the database
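
To make the chain of workers more concrete, here is a minimal Python sketch of steps 2 to 5 applied to a couple of already-collected records. It is an illustration under stated assumptions, not the actual SmartMyData or Wiker implementation: the sample records, the function names, the two-town INSEE lookup table and the SQLite storage are all hypothetical, and a simple keyword rule stands in for the Machine Learning module used for labeling.

```python
import re
import sqlite3

# Hypothetical sample of what step 1 (scraping / API connectors) might return.
RAW_RECORDS = [
    {"title": "  Concert   at the town hall ", "town": "nantes"},
    {"title": "New bakery opens downtown", "town": "Lyon "},
]

# Tiny illustrative lookup; a real worker would load the official INSEE reference.
INSEE_CODES = {"nantes": "44109", "lyon": "69123"}

# Keyword rule standing in for the Machine Learning model (assumption).
EVENT_KEYWORDS = {"concert", "festival", "market", "exhibition"}


def clean(record):
    """Step 2 (ETL): collapse whitespace and normalise the town name."""
    title = re.sub(r"\s+", " ", record["title"]).strip()
    town = record["town"].strip().lower()
    return {"title": title, "town": town}


def label(record):
    """Step 3: assign 'event' or 'article' (placeholder for the prediction)."""
    words = set(re.findall(r"[a-z]+", record["title"].lower()))
    return {**record, "label": "event" if words & EVENT_KEYWORDS else "article"}


def add_insee(record):
    """Step 4: attach the INSEE code, or None when the town is unknown."""
    return {**record, "insee": INSEE_CODES.get(record["town"])}


def save(records, conn):
    """Step 5: persist the enriched records (SQLite chosen for the sketch)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records "
        "(title TEXT, town TEXT, label TEXT, insee TEXT)"
    )
    conn.executemany(
        "INSERT INTO records VALUES (:title, :town, :label, :insee)", records
    )
    conn.commit()


def run_pipeline(raw_records, conn):
    """Run steps 2 to 5 in order and return the enriched records."""
    enriched = [add_insee(label(clean(r))) for r in raw_records]
    save(enriched, conn)
    return enriched
```

Calling `run_pipeline(RAW_RECORDS, sqlite3.connect(":memory:"))` would clean, label and enrich both sample records and write them to an in-memory table. Keeping each step as a separate function mirrors the one-worker-per-task layout of the scenario, so a step such as the labeling rule could be swapped for a trained model without touching the others.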

SmartMyData - Example of final results with labeling and INSEE codes

(Screenshot from a Wiker table)

Return on investment

Automated collection of information from a wide variety of sources

Automatic labeling of records via a Machine Learning module

Reduced risks and errors linked to human intervention

Reallocation of employees to more rewarding tasks