@sejm_watch – what are the sparrows in the Polish parliament chirping about
Politics is an area full of emotions and misconceptions. How about looking at it from a data perspective and translating parliamentary work and the emotions of public discourse into numbers and charts? Let me share my latest project – @sejm_watch – a bot which scrapes the Sejm website and tweets infographics.
*Sejm – the larger and more powerful lower house of the Polish parliament.
Many activities take place in the Sejm, so I had to choose the key data points about deputies for further analysis:
- party membership, age, etc.,
- speeches during Sejm sessions – to see how they speak and what they talk about,
- voting – how they vote,
- interpellations & questions – what’s important to them, what they care about.
All the data comes from the official Sejm website; speeches and votes in particular come from the session transcripts. The bot crawls the website daily for updates.
The other important area is social media. I chose Twitter, mainly because so many deputies are present there. Its accessible, friendly API was another reason to use data from this platform. Again, the bot collects tweets, mentions and other stats on a daily basis.
With all the data in place, it’s time for the analysis and presentation layer, which should be eye-catching and hopefully viral. @sejm_watch produces 5 infographics (1 daily, 1 every Monday, 3 on an event basis).
Deputy birthday dashboards
Two to three boards are generated as a gift for a deputy’s birthday. The first one is purely about Sejm activity and shows:
- age compared with the age distribution of all deputies, party membership, and the Sejm committees the deputy works on,
- voting attendance (%) and voting compliance with the majority of the deputy’s own party,
- the number of words spoken in the Sejm over time.
The tweet that accompanies the dashboard gives the number of interpellations and questions raised during the last 12 months, so the tweet and the board together show how hard the deputy worked in parliament.
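The attendance and party-compliance figures on the first board can be illustrated with a minimal sketch. It assumes each vote is available as a mapping from deputy name to their choice, with `None` meaning the deputy was absent; the function names, the data layout and the choice labels are all hypothetical and not taken from the bot itself.

```python
from collections import Counter

ABSENT = None  # assumption: an absent deputy is recorded as None


def party_majority(ballot, party_of, party):
    """Return the most common choice among present members of `party`."""
    counts = Counter(
        choice
        for name, choice in ballot.items()
        if party_of.get(name) == party and choice is not ABSENT
    )
    if not counts:
        return None
    # On a tie, Counter.most_common keeps the choice that was seen first.
    return counts.most_common(1)[0][0]


def attendance_and_compliance(ballots, deputy, party_of):
    """Return (attendance %, compliance %) for one deputy.

    Attendance is the share of votes the deputy took part in.
    Compliance is the share of those votes in which the deputy
    voted the same way as the majority of their own party.
    """
    present = 0
    compliant = 0
    for ballot in ballots:
        choice = ballot.get(deputy, ABSENT)
        if choice is ABSENT:
            continue
        present += 1
        if choice == party_majority(ballot, party_of, party_of[deputy]):
            compliant += 1
    attendance = 100.0 * present / len(ballots) if ballots else 0.0
    compliance = 100.0 * compliant / present if present else 0.0
    return attendance, compliance
```

Note that in this sketch the deputy’s own vote counts towards the party majority, and compliance is measured only over the votes the deputy attended; both are design choices made for the example.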
The next board covers the deputy’s message and agenda as shown through speeches in the Sejm and questions put to ministries. Both are presented as word clouds of the most common bigrams. The Twitter word clouds show the hashtags and mentions the deputy uses most often in their feed, and the popular hashtags used by other people mentioning the deputy. As a result, it gives you a flavor of what deputies care about, who they communicate with and what others write about them.
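Counting the most common bigrams that feed such a word cloud might look roughly like the sketch below. The tokenization (a simple regular expression), the optional stopword filter and the function name `top_bigrams` are assumptions made for illustration; the actual preprocessing used by the bot is not described here.

```python
import re
from collections import Counter


def top_bigrams(texts, stopwords=frozenset(), n=10):
    """Return the n most frequent pairs of adjacent words across all texts."""
    counts = Counter()
    for text in texts:
        # \w matches Unicode letters, so Polish diacritics are kept intact.
        tokens = re.findall(r"\w+", text.lower())
        for first, second in zip(tokens, tokens[1:]):
            # Skip any pair that contains a stopword.
            if first in stopwords or second in stopwords:
                continue
            counts[(first, second)] += 1
    return counts.most_common(n)
```

The resulting `(bigram, count)` pairs can then be passed as frequencies to whichever word cloud renderer is used.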
The third and last birthday dashboard is generated only for deputies who have a Twitter account. It brings out a few fun facts such as speech speed in the Sejm, the number of rounds of applause received or time spent on Twitter. More importantly, it indicates whether the deputy’s message is positive or negative and whether they act confrontationally during parliament sessions.
To assess sentiment I used a list of verbs labelled with a positive or negative connotation (LCM-PL, listed under Materials and inspirations). All the words are first lemmatized using the Morfeusz morphological analysis library. Then the following rules apply:
- Positive connotation/sentiment is recognized only when the verb in the original text is in the first person singular/plural (we liked, I love, etc.).
- Negative connotation/sentiment is recognized only when the verb in the original text is in the second or third person singular/plural (you destroyed, he annoys, etc.).
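The two rules above can be sketched as follows. The sketch assumes the text has already been run through a morphological analyser, so that every token carries its lemma and, for verbs, its grammatical person. The `Token` class, the `connotation` dictionary and the English lemmas are hypothetical stand-ins for the Morfeusz output and the Polish verb list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Token:
    lemma: str              # base form of the word
    person: Optional[int]   # 1, 2 or 3 for verbs; None for other words


def count_sentiment(tokens, connotation):
    """Return (positive, negative) verb counts under the two rules.

    `connotation` maps a verb lemma to "positive" or "negative".
    """
    positive = 0
    negative = 0
    for token in tokens:
        label = connotation.get(token.lemma)
        if label == "positive" and token.person == 1:
            # Rule 1: positive verbs count only in the first person.
            positive += 1
        elif label == "negative" and token.person in (2, 3):
            # Rule 2: negative verbs count only in the second or third person.
            negative += 1
    return positive, negative
```

Grammatical number is deliberately ignored here, since both rules apply to singular and plural forms alike.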
Twitter daily & weekly dashboards
Apart from the summaries prepared for individual deputies, I was curious about the whole group of Sejm politicians and their presence on Twitter. What are they tweeting about each day? Who’s gaining popularity? Is there a consistent party-driven agenda being pushed on Twitter?
The first summary is posted daily and shows yesterday’s most popular trends. Two major parties hold over 83% of the seats in the Sejm, so the message of smaller parties is less visible, but deputies from these parties sometimes make it into the most liked or retweeted. The board is pretty useful for seeing what’s currently on top within the group.
The last one is posted weekly and focuses on the party-driven social media agenda and how party representatives in the Sejm follow it. It shows the topic of the day/week and who’s best at maintaining media discipline. The rest of the dashboard presents gains and losses in Twitter followers.
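The follower part of the weekly board boils down to comparing two snapshots of follower counts. Here is a minimal sketch, assuming the bot stores one `{account: followers}` mapping per collection date; the function name and the data layout are hypothetical.

```python
def follower_changes(previous, current, top=3):
    """Return (biggest gains, biggest losses) between two snapshots.

    Only accounts present in both snapshots are compared.
    """
    deltas = {
        name: current[name] - previous[name]
        for name in current.keys() & previous.keys()
    }
    ranked = sorted(deltas.items(), key=lambda item: item[1], reverse=True)
    gains = [(name, delta) for name, delta in ranked[:top] if delta > 0]
    losses = [(name, delta) for name, delta in ranked[::-1][:top] if delta < 0]
    return gains, losses
```

Accounts that appear in only one snapshot, for example a deputy who has just joined Twitter, are left out rather than treated as a gain from zero.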
@sejm_watch – what’s next?
@sejm_watch will just continue crawling the web and tweeting on its own. In the meantime I’ll probably share some ad-hoc analysis based on the data it gathers.
Follow @sejm_watch on Twitter and please share your ideas about which analyses, summaries or additional data sources could be added to make it more informative, useful or interesting!
Materials and inspirations
- @sejm_watch – Twitter profile
- SmarterPoland.pl and MamPrawoWiedziec.pl – they did some similar analyses, which inspired @sejm_watch speech vs. interrupt,
- AnalitykaSuwerena – an interesting read that inspired the @sejm_watch verb sentiment analysis,
- The Linguistic Category Model in Polish (LCM-PL) – list of Polish verbs with positive/negative connotation assigned
- Morfeusz – really useful for any NLP, lemmatization or text mining in Polish