As some of you may know already, I’m delighted to now be working with the Open Spending community as community manager. I really look forward to meeting you all at our April Community Hangout!
This is a post by Stefan Wehrmeyer and Anders Pedersen.
Last year OpenSpending engaged in a partnership with FarmSubsidy.org to publish the data on payments to recipients of EU farm subsidies, officially known as the EU Common Agricultural Policy. EUR 50 bn a year is paid in farm subsidies, most of it directly to farmers and the companies behind agricultural products. The fight to open these payments to the public at a granular level and in machine-readable format has been ongoing since 2005, when transparency advocates and journalists across the EU began demanding that the data be made free. While farm subsidy spending is on the decline, farm subsidies will continue to account for 38 per cent of the EU budget until 2020. Farmsubsidy.org is the only public database in Europe documenting who gets the money.
The farm subsidy data is released annually, and this year the OpenSpending community will head up the collection of farm subsidy payment data, which must be published by each of the 28 EU member states by the end of April. In most member states the data will not be available as a bulk download, but will need to be scraped from government websites. We need your help to make the data open and accessible to the public!
Help us track and scrape the farm subsidy data!
We are now starting to collect farm subsidy data from across the 28 member states. Stefan Wehrmeyer from Open Knowledge Foundation Germany has set up a GitHub repository with an issue for each member state:
Check and add yourself to the farm subsidy country tracker by contributing a scraper for your country (or any other).
To uncover the EU farm subsidies we will need help from across the OpenSpending community and beyond! Documentation on the output format and ways of integration is still forthcoming, but if you are interested, post a comment on the Github issue or to the Openspending Developer list. The upcoming Open Data Day on 22 February could be a great opportunity to tackle this in your country!
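Since most member states publish these payments as HTML pages rather than bulk downloads, a scraper usually boils down to parsing table rows into records. Here is a minimal, stdlib-only sketch of that step; the HTML fragment and column layout are invented for illustration, as each country's actual pages differ and the official output format is still being documented:

```python
from html.parser import HTMLParser

class SubsidyTableParser(HTMLParser):
    """Collects the text of each <td> cell, grouped by <tr> row."""
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._in_td = [], [], False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []          # start a fresh row
        elif tag == "td":
            self._in_td = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False
        elif tag == "tr" and self._row:
            self.rows.append(self._row)  # keep completed, non-empty rows

    def handle_data(self, data):
        if self._in_td:
            self._row.append(data.strip())

# A made-up fragment standing in for a member state's payment listing.
html = """
<table>
  <tr><td>Example Farm GmbH</td><td>12345.67</td></tr>
  <tr><td>Another Farm Ltd</td><td>890.00</td></tr>
</table>
"""
parser = SubsidyTableParser()
parser.feed(html)
print(parser.rows)  # [['Example Farm GmbH', '12345.67'], ['Another Farm Ltd', '890.00']]
```

In a real scraper the parsed rows would then be written out as CSV for the country tracker; for messier government sites a library like lxml or BeautifulSoup would be a more robust choice than the stdlib parser.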
Open Data Day is coming up on February 22nd with events happening in cities across the world. Are you interested in the recent budget passed in your city council or curious about the expenditures of your local school board? Then be sure to get involved and make use of the OpenSpending platform and community at Open Data Day.
Help open up your city’s finances by publishing them on OpenSpending and producing elegant visualizations that show where money goes. OpenSpending makes it super-easy to turn an Excel (or even PDF) file of local city finances into something browsable and searchable. Community members have already added more than 100 city budgets to OpenSpending and plotted them on this map. It also enables you to make clear and beautiful visualisations of your local city’s finances in seconds, making them understandable to everyone.
Just follow these 3 simple steps:
1. Find the data – locate your local city or municipal data
2. Prepare the data – turn it into a single clean spreadsheet in Excel, Libreoffice or Google Docs (and then export as CSV – just “save as”)
3. Upload to OpenSpending and visualize
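Step 2 is usually the fiddly part. As a rough illustration of the kind of cleaning involved before the "save as CSV" step (the toy data and filtering rule below are invented, not a real city's export), you might strip header junk, blank lines and thousands separators like this:

```python
import csv
import io

# A toy messy export: a title row, a blank row, then real data with
# quoted thousands separators.
raw = """City budget 2014,,
,,
Department,Item,Amount
Education,Schools,"1,200,000"
Transport,Roads,"750,000"
"""

def clean(raw_text):
    """Keep only real data rows and normalise amounts to plain numbers."""
    rows = []
    for row in csv.reader(io.StringIO(raw_text)):
        # A "real" row here has three cells and a numeric amount.
        if len(row) == 3 and row[2].replace(",", "").isdigit():
            rows.append([row[0], row[1], row[2].replace(",", "")])
    return rows

print(clean(raw))  # [['Education', 'Schools', '1200000'], ['Transport', 'Roads', '750000']]
```

The same cleanup can of course be done by hand in Excel, LibreOffice or Google Docs, as the guide describes; a script just makes it repeatable when the city publishes next year's figures.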
If you need more information, we’ve got the full step-by-step guide for you, available in six different languages.
With a bit of data cleaning and a quick upload you will for example get this neat visualization of the Moscow city budget for 2014:
Will you be working on budgets or spending at the Open Data Day?
Let the OpenSpending community know by sharing the news on the mailing list and connect with us on the IRC freenode channel #openspending.
If you need inspiration for your event check this list of exciting Open Data Day activities, which have already been planned.
The Big Lottery Fund has been making great strides with open data, publishing all of its historical grants data as open data at: http://www.biglotteryfund.org.uk/research/open-data
This seemed perfect for OpenSpending and so with a few minutes of data wrangling we’ve got the data into OpenSpending (only 2012/2013 data so far):
You can already get some nice overviews, for example
By local authority then grantee
Note: this is just 2013 data (not 2012 + 2013)
By region then local authority then grantee
Create your own
You can create your own at: https://openspending.org/big-lottery-fund-grants/views/new
The News Editors’ and Data Wranglers’ Teams are catching up tonight in a joint Hangout. In the interest of transparency we are running this live-blog as an experiment.
Refresh at your convenience.
7.02pm (GMT) We are wrapping up, as our to-do lists are growing. Final points include the allocation of more administrative tasks and an awesome project proposed by Steve: mapping all monies spent on energy management, specifically with regard to global warming and the funds budgeted there.
6.49pm (GMT) Berlin-based Michael is sharing his project of opening up budgets across German cities, including this one and this one. In partnership with Miriam Ruhenstroth they aim to get the data up on Open Spending.
6.42pm (GMT) Another insight from the call is that our guide, which aims to help communities get started and has been translated (by volunteers) in four languages, may yet again be in need of a refresh. Steve is going to go through it with a pair of fresh eyes and help update it. If you want to help too, please let us know. Getting communities started as seamlessly as possible with Open Spending is very important to us.
6.32pm (GMT) We’re talking data cleaning now. Big Data, that is. Elaine’s data is a 1GB file of over 1m lines. She is currently attempting the task with MySQL and OpenRefine. Do you have any other suggestions?
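One stdlib-only alternative for a file that size is to stream it row by row rather than loading it whole, so memory use stays flat regardless of file size. A sketch of the shape of that approach (the filtering rule and tiny demo file are invented stand-ins for the real structural-funds data):

```python
import csv
import tempfile

def stream_clean(path, out_path):
    """Stream a large CSV row by row so the whole file never sits in memory."""
    with open(path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader, writer = csv.reader(src), csv.writer(dst)
        for row in reader:
            # Example rule: drop rows whose amount column is empty,
            # and strip stray whitespace from every cell.
            if len(row) >= 2 and row[1].strip():
                writer.writerow([c.strip() for c in row])

# Demo on a tiny stand-in for the 1 GB file.
src = tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="")
src.write("Project A, 100\nProject B,\nProject C,250\n")
src.close()
out = src.name + ".clean"
stream_clean(src.name, out)
print(open(out).read())
```

This keeps only "Project A" and "Project C"; the row with a missing amount is dropped. The same pattern works with more elaborate per-row rules, and pairs well with OpenRefine for the exploratory phase.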
6.27pm (GMT) One of the data wranglers, Elaine Ayo, is working on cleaning the data on EU structural funds for Italy, which she claimed here. “It’s quite large and so I am still working on trying to figure out how to deal with it,” she told us. Can you help?
6.22pm (GMT) Turns out we have Berlin and Toronto represented in the call too.
6.17pm (GMT) We are going around the call to reinforce who we are and what we’re in this game for. George Adcock (aka Steve) underlined the need for an infographic map showing the relationships between all the tools we use here at OpenSpending, whether to produce our work, keep up with it or share it. Great insight!
6.10pm (GMT) Welcome! Hooking up on a call from across the world takes a few minutes. We are joined from DC, London and Louisiana, to name but a few places. On the call tonight we have Anders Pedersen, George Adcock, Elaine Ayo, Michael Horz, Neil Ashton and myself.
At OpenSpending, our restless community drives our work and so our volunteers are key to our activities. Today we are welcoming those joining the News Editors’ and Data Wranglers’ teams. Find out more about them and what they stand for in the short bios below. Many have already been at work, and you can follow their involvement on our Trello boards.
We will kick off tonight (lunchtime or breakfast depending on your timezone) with an experiment: a live-blog of the first joint Hangout between the two teams. Tune in at 6pm GMT (or thereabouts).
These guys run the blog and manage our social media presence. We are still recruiting News Editor volunteers so if you’d like to join the team apply here.
Burite Joseph, @BuriteJoseph
Independent media practitioner and entrepreneur with over five years of journalism and research experience, Burite runs ZHENOBIA, a media integration and multimedia content aggregation company. She also consults for SMS Media Uganda, Ultimate Media Uganda, East African Business Week and Daily Monitor.
Working with data is my new passion. I am a quick learner and teamwork is my steroid.
Anna Flagg, www.annaflagg.com
Data journalist at the Center for Responsive Politics, Anna has a background in computer science, data visualization, design and data-storytelling.
I like working on projects that create awareness of issues important to the public. I’m excited to work with and learn from the Open Spending community.
Laura S. García,@laura_s_garcia
An experienced journalist, Laura has worked for more than ten years as a multi-media journalist in Spain. She has also taught Geography and History to high-school students. Laura speaks Spanish, Galician, English and a little Swedish.
I’m looking to improve my knowledge of open data, as I’ve always thought this to be the best way to offer a good journalism and a good education as well.
Karen Brzezinska, @westofwarsaw
Also a professional journalist, Karen (Kati) worked for international news services specialising in equity, commodity and currency markets. Her background is in PoliSci (East European studies), and, while originally from midwestern US, her life experience lists Italy, Hungary (1989-1992) and The Netherlands (since 1992) as home-countries. Kati is fluent in English (US) and Dutch.
I’m interested in learning how open data can be used to enhance governance and education.
Dominic Kornu, @qaphui
An IT and Maths tutor from Ghana, with an interest in web and social media technologies, Dominic blogs at Qaphui’s Cafe and volunteers in his free time.
I am interested in learning how open data can be used to enhance governance and education.
Mehmet Koksal, @mehmetkoksal
Freelance journalist based in Brussels (Belgium) and conference interpreter, Mehmet also works as a fixer for the international press, including the French weekly Courrier int.. In his free time he volunteers for AJP and acts as a campaign manager for the EFJ.
Teodora Beleaga, @t30d0ra
I joined the Open Spending project to share my data analysis skills and expand my understanding of fiscal transparency and government spending.
A freelance journalist and editor from Orlando (Florida), Maria works for Pearson Education and theDailySource.org. A graduate of the University of Florida’s College of Journalism and Communications as well as the College of Music, Maria’s work focuses primarily on feature writing, editing and music.
I am continually trying to broaden my knowledge in this sector, more specifically at this juncture in finance and online education cultivation.
A Science and technology freelance journalist based in Berlin (Germany), Miriam has a background in biological sciences. In 2011 she attended a summer school for data journalism (organized by Initiative Wissenschaftsjournalismus).
I found the field of data storytelling thrilling and joined OpenSpending to learn more about it and participate for good.
The Data Wranglers work to add, clean and visualise data in OpenSpending. They help community members who need assistance. Some data wranglers focus on cleaning and analysing data whereas others work to visualise data using the OpenSpending API. We are still recruiting Data Wrangler volunteers so if you’d like to join the team apply here.
Concha Catalan, @conchacatalan
An English teacher and freelance journalist based in Barcelona (Spain), Concha is currently working on a project to open the autonomous government of Catalonia (opengov.cat). She also blogs at http://barcelonalittleshell.blogspot.com.es.
I would like to add the data set of the autonomous government of Catalonia budget to OpenSpending. I am coming to terms with lots of new concepts.
Prakash Neupane, @nprkshn
OKFN Ambassador in Nepal and FOSS enthusiast, Prakash works in social development, empowering individuals and communities through technology. He is an Open Data Researcher and Nepali Wikimedian, responsible for the Wikimedia Education Program in Nepal. Find out more about him here.
Pierre Chrzanowski, @piezanowski
A member of the French OKFN working in the field of Open Government Data, Pierre says he is really interested in working on Tax Havens, Public Procurement and Aid Data.
I want to learn more about tools to analyse the data sets and how best to do storytelling.
Samuel S. Lee, @OpenNotion
Currently based in Washington DC, Samuel is a member of the World Bank Group Open Finances team. He loves data, innovation, transparency, photography and college football.
I am passionate about “open” and its potential to transform civic engagement, international development, and the world. I am particularly interested in realizing the potential of open financial information.
A data journalism student with a passion for open culture, Adriana is a member of the Society for Open Information Technologies.
A Hungarian journalist working for an Internet news portal in Romania, Sipos specializes in investigative reporting. His background includes philosophy, sociology and public policies. Sipos has experience working with data, filing FOI requests, and tackling spreadsheets.
I am trying to learn as much as I can about data journalism through online groups, MOOCs and books purchased from Amazon. My ultimate goal is to set up a small investigative / data journalism start-up in Romania.
I want to mobilize action (citizens, elected officials and policymakers) for better process, better clarity, better formats, and more transparency around city budgets.
Elaine Ayo, @eieayo
A statistics student based in Washington, DC, Elaine spent the last three years in Seoul, South Korea as a copy editor for an English news wire. Prior to that, Elaine reported for her hometown paper, the San Antonio Express-News, in Texas.
A PhD student in Journalism and Mass Communication at the University of Westminster, London, Alessandro shares an interest in Data-Driven Journalism. He has previously worked in South America (Brazil and Argentina) for a couple of years for the communication unit of the United Nations, UNPD, as a journalist and documentaries writer. He says it was a landmark experience.
I started this new pathway in January/February and soon I started to keep myself busy trying to understand the new journalistic practice in which all of us are engaged: Data-Driven Journalism.
An IT and telecom freelance journalist based in Belgium, Hans studied sociology and has a passion for statistics.
I have started to learn to program and study R but without big results up till now.
A freelance journalist based in the UK, Rochelle is currently working for a B2B pharmaceutical publication. With a background in law, she previously worked with the Centre for Investigative Journalism, where she first discovered data cleaning.
I would like to learn more about data wrangling in order to better my knowledge of its use for investigative journalism.
In our ongoing debate about the pros and cons of XBRL and CSV, we are pleased that we are able to post this response from Charles Hoffman, who is widely credited as one of the main accountants behind XBRL. If this is the first post you read in the series do not miss the earlier entries in this debate. First Marc Joffe argued for the use of XBRL in municipal reporting. Then Friedrich Lindenberg responded that financial reporting should instead look to transport data in order to reduce complexity. Below follows the response from Charles Hoffman:
XBRL has unfortunately earned the reputation it has because of (a) flaws in the way some regulators implement XBRL and (b) misunderstandings among the business people promoting XBRL. This is very consistent with what Gartner calls the “hype cycle”.
The following are the realities and truths which should be considered, summarised as succinctly as possible. You can see the details here on my blog.
Point 1: Achieving meaningful exchange
“The only way a meaningful exchange of information can occur is the prior existence of agreed upon semantics, syntax, and workflow/process rules.” This video made available by HL7 explains this in more detail.
Point 2: Formality
If you consider point 1, the “rules” can be somewhat of a bottomless pit. A balance needs to be achieved between practicality (something actually works) and “formality” (spending so much time creating rules and making things so complex that no one could ever use the system). A practical balance needs to be achieved.
Point 3: Expressiveness
While it is true that CSV has been around a long time, is easy to use, and has lots of support, CSV is not very expressive. CSV is a “flat” tabular structure: two-dimensional. Information is “n”-dimensional (it can have many dimensions). An OWL ontology is WAY, WAY more expressive in terms of creating rules to make sure the information is correct (i.e. Point 1), but it is much more complicated because of that expressiveness.
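To make the dimensionality point concrete, here is a small sketch (with invented figures) of an “n”-dimensional fact set flattened into two-dimensional CSV. Each dimension becomes just another column, and any cross-row rule, such as regional amounts having to sum to a national total, has to live outside the file, since CSV itself cannot express it:

```python
import csv
import io

# An n-dimensional fact set: spending indexed by (year, region, category).
facts = {
    (2012, "North", "Roads"): 100,
    (2012, "South", "Roads"): 80,
    (2013, "North", "Roads"): 120,
}

# Flattened to CSV: every dimension is demoted to a plain column.
buf = io.StringIO()
w = csv.writer(buf)
w.writerow(["year", "region", "category", "amount"])
for (year, region, category), amount in facts.items():
    w.writerow([year, region, category, amount])
print(buf.getvalue())
```

The data survives the flattening; the semantics (which columns are dimensions, which are measures, and which totals must reconcile) do not, which is exactly the gap that richer formats try to fill.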
Point 4: Complexity
While “complexity” can never be removed from a system, the complexity CAN be moved. What I mean by this is that while it is hard to create something like an OWL ontology, computer software can shield business users from the complexity in many, many different ways. One example is the use of “patterns”. Another is using “application profiles”. Another is using the 80/20 rule in terms of creating business rules to assure information quality. I could go on and on about this and show you many, many examples. Fundamentally this all boils down to this one fact: “XBRL software vendors” are building the wrong software; they have built XBRL technical syntax editors instead of “digital financial reporting” or “digital business reporting” applications. This problem is understood by some software vendors, who are now building the correct software; others are coming to understand it, and everyone will be forced to move in this direction due to market pressure.
Point 5: Guidance-based, semantic-oriented, model-driven, business report authoring enabled by “semantic web” technologies
Authoring business reports in the future will be as different as the difference between creating a photograph in a darkroom filled with smelly chemicals and creating one in “Photoshop”. What you can do with a business report will also be as different as what you can do with a photograph printed on a piece of paper versus a photograph expressed digitally. The key is “metadata” and applications which understand and therefore leverage that metadata. For example, Microsoft Word knows ZERO about creating a financial report. Nothing. Guidance-based, semantic-oriented, model-driven financial report authoring tools (think TurboTax) will have:
• Knowledge baked in
• New knowledge that can be inferred/added
• Agility to adapt to ever-changing conditions
• Semi-automated data integration
• Machine intelligence
You may not be able to imagine these applications, or maybe you can. But when you see an application working correctly, leveraging a rich set of metadata (which you cannot even express using CSV files), it will be very, very easy to grasp these ideas. Read the documents linked to on this blog post.
XBRL is only part of a much, much broader trend of digital business reporting and digital financial reporting. That is part of an even bigger trend: “digital”. Electronic medical records are an example of the much broader trend. Electronic medical records have many of the same issues as what the U.S. Securities and Exchange Commission (SEC) is trying to address with XBRL-based financial filings. The accounting profession and the SEC are much, much further down the path than electronic medical records, from what I can see. Electronic medical records (EMR) are not yet “interoperable” or exchangeable between systems (XBRL is). There is no international standard for EMR (there is for financial reporting: XBRL).
Generally, people are having the wrong discussion! They are discussing syntax (i.e. CSV, JSON, XML, etc.) when they should be discussing “how the heck are we going to articulate and manage semantics”. That is the discussion which needs to occur. This is very, very useful stuff. This is not about saying that CSV is bad and XBRL is good. They are two different tools for different problems. Using the wrong tool to solve a problem is as bad as using a tool inappropriately!
The goal as I see this is success. Success means (for business people) cost effective, easy to use, effective, robust, reliable, repeatable, predictable, scalable, secure (when necessary), auditable (when necessary), practical, business information exchange by business users between business systems.
Below you will find a short video where Charles Hoffman explains XBRL:
This is an update with news from members across the OpenSpending community. We list some of the many ways you can get involved, and we give a status on how registrations are coming along for the City Spending Data Party on July 19-21. Hint, we have some great cities participating including Lagos, Minsk and Kathmandu!
On the blog
Patrick Nsukami (Dakar Linux User Group), Pierre Chrzanowski (Open Knowledge Foundation France) and Tangui Morlier (Regards Citoyens) wrote up a detailed account of how they liberated Senegalese procurements from PDF and published the data on OpenSpending. Félix Ontañón wrote a post about how he opened up university budgets in Spain. Earlier this month we called for a discussion about the continued growth of the OS project and community based on this proposal. We are still eager to hear your thoughts and comments about this.
Fresh data on OpenSpending
Several fresh datasets have been added from Bosnia-Herzegovina, Japan, Brazil and Uruguay in this round of additions to OpenSpending.
川口市平成25年度一般会計予算 (Kawaguchi City general account budget, fiscal 2013) July 9, 2013
Bosnia and Herzegovina July 8, 2013
Programa FDI Uruguay July 8, 2013
Inesc July 8, 2013
Execução do Orçamento Federal do Brasil (Brazilian federal budget execution) – 2000-2013, 8 July 2013
We have got an abundance of activities and development going on. Let us know how you wish to be involved.
You can help add spending data by heading over to our Progress Page, where several datasets are waiting to be cleaned and uploaded.
Do you code? Hal Seki from OKF Japan is heading up a community sprint to add a few improvements to OS before and during the Spending Data Party. Get in touch if you want to help out!
Upcoming events and activities
Thursday July 11 19:00 CET / 18:00 GMT / 13:00 EDT: The weekly Data Clinic, where we offer community support on how to work with budget and spending data – via Google Hangout and IRC freenode channel #openspending. Bring your own spending data – or have a look at our Progress page to find one of the spending data sets we’re currently working on. Register for the Data Clinic here or drop in at IRC
Thursday July 18 19:00 CET / 18:00 GMT / 13:00 EDT: The Community Hangout will include updates from across the community and preparations for the City Spending Data Party
Friday July 19 to Sunday 21 City Spending Data Party: Groups from across the OS community will get together online during this weekend to map spending in their cities. So far community members from Minsk (Belarus), Lagos (Nigeria), San Francisco, Oakland, Kathmandu (Nepal), Kampala (Uganda), Kota Tangerang (Indonesia) and cities across Japan have registered for the event. Find out all the details about the event and how to register in our announce post.
The summer is here, so shouldn’t we be outside? Well, in a way we have been. The bulk of the work this month has been on things outside the core OpenSpending platform, although there have also been some changes to the platform itself.
First up, and really important to us: List of Contributors!
We added a CONTRIBUTORS file to the root of the openspending repository, just like you would expect in an open source project like openspending.
If you think your name should be there, we didn’t intentionally leave it out. We let git list our contributors but unfortunately that’s only code contributors, there are plenty of other contributors and we want them in our list as well. Let us know if you are missing from the list and we’ll add you!
This might seem like a small change, but it’s an important one. Our installation docs were slightly wrong, which might have caused users some problems. They also referred to an outdated version of solr.
That’s been fixed now, so we’ve reduced some of the frustration involved in getting an instance of OpenSpending up and running. If you notice any places in our docs where improvements are needed, let us know, or better yet help us by contributing improvements.
This month we also released some much needed documentation regarding our development process, to help new contributors (and older ones) to know their way around project contributions. We created a short Howto hack on OpenSpending as well as a more detailed documentation about the development process. For code reviewers we also documented some guidelines for our code review process.
Our satellite template exists to make creation of satellite sites easier. It was created last month (and based on Where Does My Money Go?) but it has already been used to create (or initiate) new satellite sites. As always when a fresh piece of software gets used, unforeseen use cases turn up.
We were notified of those problems and were therefore able to put effort into adapting the satellite template to these new use cases. Some of these changes have been added into the template but as more satellite sites pop up, more changes will most definitely get added.
We encourage you to use the satellite template if you are in the process of creating a satellite site, or want to create one. Also, let us know if you think we can improve the template.
Preparations for Inflations
We have been working hard on making it possible to do fair historical comparisons with OpenSpending by adjusting for inflation. This is quite a large undertaking but nothing we can’t handle. This month we’ve been laying the foundations for these inflation adjustments.
First we had to collect some data. We decided to start small, looking at Consumer Price Indices (CPI) only. These give a pretty good indication of inflation and are frequently used in economics. The data was collected, stored in standardised form as a data package, and made available on http://data.okfn.org/.
We went through a lot of CPI data available online and chose the best open data resource we could find. If you know of better data we could use, please help us improve the data, because better data results in better inflation adjustments on OpenSpending.
Then we created a small module to read in data packages and called it datapackage. We made the module fairly general so that it could be reused in other areas of OpenSpending or in other projects. This module, which we implemented in Python and made available in the Python Package Index, almost instantly sparked off an equivalent module in Java. Then later we received an improvement to our Python module. All in the scope of one month. By the looks of it, we succeeded in creating a generalised, reusable module. Well done!
Using our datapackage module we then proceeded to build a first version of an economic transformation toolkit, which we dubbed economics (yes, we are very creative when it comes to naming our Python modules). At the moment it can do basic inflation computations using the CPI data, and we made that available in the Python Package Index as well. You can add more economic methods and computations if you like!
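The core computation behind CPI-based inflation adjustment can be sketched in a few lines. The CPI figures below are illustrative placeholders, not the real values from the data package, and the `adjust` function is a simplified stand-in rather than the economics module's actual API:

```python
# Illustrative CPI values; the real figures live in the CPI data package
# published on data.okfn.org.
CPI = {2010: 100.0, 2011: 103.0, 2012: 106.1, 2013: 109.3}

def adjust(amount, from_year, to_year, cpi=CPI):
    """Express `amount` (spent in from_year) in to_year prices."""
    return amount * cpi[to_year] / cpi[from_year]

# 1,000 spent in 2010 is worth about 1,093 in 2013 prices.
print(round(adjust(1000, 2010, 2013), 2))  # 1093.0
```

This is what makes historical comparisons fair: two budget lines from different years are only comparable once both are expressed in the same year's prices.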
Now we’re set to start implementing inflation adjustment in the core OpenSpending platform. A huge thanks to all the economists, developers, data wranglers, and advisors for their help. July will be an exciting month!
Another big change coming in July is a more standard blogging platform for OpenSpending. We have been using Jekyll to generate our blogs statically and serving those static pages via the OpenSpending platform. We decided to move our blog back to WordPress. This will make blog contributions even simpler since many people are more comfortable with WordPress than markdown.
We’re not quite there yet (look for this change in July), but we called upon our task force to help us migrate the content and get the blog up to speed. We launched an IRC hack session where we collaborated on a script to migrate Jekyll content to WordPress. The content has been migrated, but there are some UI/UX tweaks we want to make before we launch our main blog as a WordPress blog.
If you know your way around WordPress and want to help, let us know and we’ll fill you in on what needs to be done.
In case you hadn’t noticed, we also launched (and then relaunched) the FarmSubsidy project as part of OpenSpending. We initially launched an improved version of the project (after it was adopted by OpenSpending) on our own servers, but quickly noticed that we needed a dedicated server for the project.
So we took Farmsubsidy down for a couple of days while we sorted out the server issues and moved it to a dedicated server. After reloading the data onto the new server we were able to relaunch Farmsubsidy so that the user experience should be better than it was in the first few days and now you can really start investigating how the European Union subsidises farms.
As always there were loads of other changes and happenings in and around the OpenSpending project. We would love to get some help to achieve even more in the coming months and like before we want to give a shout-out to all those who helped us in June.
Thanks to Michael Bauer, Gunnlaugur Thor Briem, Lucy Chambers, Velichka Dimitrova, Martin Keegan, Dan Lemon, Andy Lulham, Tom Morris, Prakash Neupane, OpenRotterdam, Florian Oswald, Daniel O’Huiginn, Anders Pedersen, Rufus Pollock, Niels Erik Kaaber Rasmussen, Joel Rebello, Todd D. Robbins, Nils Toedtmann, Stefan Wehrmeyer, and Guo Xu for their contributions this month (there are probably a lot more who’ve contributed somehow to this month’s features so sorry in advance if you’re missing from the list).
Over the last few years OpenSpending has seen rapid growth in terms of technology, datasets, and community. As with many projects when they achieve a certain threshold of success and activity, the time has come to bring a bit more structure to the growth of the community and development of the project in order to empower more explicitly and more formally the growing array of stakeholders in the project.
OpenSpending has always been a community project. This proposal seeks to reflect this more formally in the governance and organization. The main proposed action is to establish a steering group to oversee the project and represent the growing number of stakeholders. In addition, there is a plan to establish specific “teams” to look after particular areas: in particular, a “technical (code) team” and a “data team”.
We emphasize that the project will continue to have a home, legally and infrastructurally, at the Open Knowledge Foundation and the Foundation will continue to be strongly committed to the project.