Views and Stored Procedures in PostgreSQL - What are they and when to use them for increasing user and database efficiency with examples
Looking for an edge delivering more data insights? This article is about easy ways to empower users by implementing database views and stored procedures, offering insights into how and why, their respective advantages, plus real life examples to get even greater efficiency for your users. Demonstrating use cases for views and stored procedures, with comparison of features. Based on PostgreSQL, it's applicable to any database with support for views and procedures.
Time is an illusion. Lunchtime doubly so.
Douglas Adams in The Hitchhiker's Guide to the Galaxy
Before diving in, let's define what we mean by efficiency
Efficient adj: Making, Causing to be; Productive of effects; Adequately skilled and active.
In this article, efficiency can mean:
- Efficiency of people's time - fast and responsive;
- Efficiency of people's time - ease of use and re-use;
- Efficiency of machine resources;
- Scalability.
Worthy of note
"Efficiency of people's time" has two meanings, one that produces results quickly, "responsive", and one that although often slower, is easier to learn, use and re-use.
"Efficiency of resources", especially when tempered with "User Experience" often leads to significant "Efficiency of people".
"Scalability" is one step further in ensuring that what works for a few, can be made to work for the many.
There is a natural tension between all these meanings of efficiency, so any technical decision is best made in conjunction with clearly defining and balancing business objectives.
Business context
To find that edge, your staff need easy access to data! Making data accessible via Views or Procedures, offers easier ways to discover more about your users, products and services and reduce the time it takes to surface new opportunities.
With an imperative to continually ask questions and reduce time to market, it's essential to make access to information easy and intuitive.
Individual transactions either for a sales force or website, responses must be sub second.
Ad-hock reports, when predefined / packaged can benefit end users with familiarity and ease of use with, for example, Microsoft Access, Excel or LibreOffice Base, Calc.
Reports can be prepared ahead of time such as "How many products did we sell yesterday?", or immediate "What's my commission on that last sale?".
Making core or complex information easily available, reduces both barriers to research and time to market delivering that competitive edge.
Technical choices
With ample material elsewhere on optimising and organising data, that aspect is outside the scope of this article.
Views are a way of packaging up complex queries into something more manageable. There are many uses for views, this article focuses on benefits including performance, avoiding re-inventing common queries, aiding best practice and increasing productivity through re-use and understanding. Views can be used with other tables and views and are usually intended to be filtered and combined in a flexible manner.
Procedures, sometimes called stored procedures are highly optimised and can include complex business logic. They are often used as discrete components, though can be combined with other procedures, queries and views.
Although clean separation of responsibilities can suggest data be stored on a database and all processing done in an application, there are clear performance and ease of use advantages with database side processing. This article demonstrates and offers insight into those advantages.
Database views
As their name suggests, they are a packaged view into data. Often combined with other views and tables, they are flexible and can filter, join, manipulate and sort data. Great for generic or specific queries, perfect for packaging up common queries, with reduced complexity, making use and re-use easy. The following example is taken from a schema used in an open source project to manage DMARC feedback reports.
DMARC is a standard offering additional security for emails. Django-DMARC helps with implementation by capturing and reporting on feedback reports. The tables are optimised for the data they contain and consist of a reporter, a DMARC feedback record, and a SPF and DKIM results table aliased here to spf_dmarc_result and dkim_dmarc_result.
The majority of reports use just one query to filter and present data, it makes sense to package to a view for ease of use and re-use.
The following is the code to create the view. As can be seen, the query is complex and some columns have been re-named to improve user experience.
SQL to create a view
CREATE VIEW view_dmarc_report AS SELECT dmarc_reporter.org_name, dmarc_reporter.email, dmarc_report.date_begin, dmarc_report.date_end, dmarc_report.policy_domain, dmarc_report.policy_adkim, dmarc_report.policy_aspf, dmarc_report.policy_p, dmarc_report.policy_sp, dmarc_report.policy_pct, dmarc_report.report_id, dmarc_record.source_ip, dmarc_record.recordcount, dmarc_record.policyevaluated_disposition, dmarc_record.policyevaluated_dkim, dmarc_record.policyevaluated_spf, dmarc_record.policyevaluated_reasontype, dmarc_record.policyevaluated_reasoncomment, dmarc_record.identifier_headerfrom, spf_dmarc_result.record_type AS spf_record_type, spf_dmarc_result.domain AS spf_domain, spf_dmarc_result.result AS spf_result, dkim_dmarc_result.record_type AS dkim_record_type, dkim_dmarc_result.domain AS dkim_domain, dkim_dmarc_result.result AS dkim_result FROM dmarc_reporter INNER JOIN dmarc_report ON dmarc_report.reporter_id = dmarc_reporter.id INNER JOIN dmarc_record ON dmarc_record.report_id = dmarc_report.id LEFT OUTER JOIN dmarc_result AS spf_dmarc_result ON spf_dmarc_result.record_id = dmarc_record.id AND spf_dmarc_result.record_type = 'spf' LEFT OUTER JOIN dmarc_result AS dkim_dmarc_result ON dkim_dmarc_result.record_id = dmarc_record.id AND dkim_dmarc_result.record_type = 'dkim';
Using a view
Using a view is incredible simple, for example to obtain a summary of dispositions in January, simply use the following sql:
SELECT policyevaluated_disposition, count(*) FROM view_dmarc_report WHERE date_begin >= '2017-01-01' AND date_end < '2017-02-01' GROUP BY policyevaluated_disposition ORDER BY policyevaluated_disposition;
The complexity remains out of the way in the view, any unused columns and tables are further optimised by the database to return just the information required.
By packaging up complex queries into a view they are easier to use. There are dozens of features and an infinite number of reasons to use them, see Resources to explore further.
Comments
Highly efficient query creating a flat representation of the dmarc feedback record.
As an end user request, any extra tables, views, joins and filters are sent to the database for processing where it will be optimised, returning only the requested rows and columns.
View still too slow/complex? PostgreSQL offers "Materialised views" with flexible data caching providing significantly faster access.
End users can use, re-use, filter, slice and dice the data as required using familiar tools such as Microsoft Excel and Access, and Libreoffice Calc and Base etc.
Cost of developing this view was minimal. It is almost re-creating and directly representing the feedback record.
Procedures and complex functions in PostgreSQL
When requirements are complex or need additional processing, procedures can be ideal. A complete programming language in their own right, and in this article native PostgreSQL PL/pgSQL. Most major databases include choices for programming languages so you can use whichever best fits your requirements; for a list/comparison see resources. With very close access to data they are incredibly fast. Closely tied to data, it can make sense investing in optimisation to process information through procedures, presenting output in a more easily digested format.
Highly optimised, stored procedures are usually used as discrete components, though they can be combined with other tables and views if required. Often used to generate specific reports or data sets, they are frequently used in single data panels or as a focused aspect to a report combined with other data. They can be easily consumed by Microsoft Excel, Access, Libreoffice Calc, Base and similar tools.
In this article's example, bookings for a room, apartment or facility are calculated to provide utilisation analysis by month for each unit. The output is presented as follows.
Unit | J | F | M | A | M | J | J | A | S | O | N | D |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Unit 1 | 22 | 18 | 24 | 24 | 26 | 28 | 24 | 20 | 18 | 22 | 24 | 24 |
Unit 2 | 24 | 26 | 22 | 26 | 24 | 26 | 18 | 20 | 22 | 24 | 26 | 28 |
Unit 3 | 27 | 22 | 24 | 29 | 23 | 18 | 15 | 17 | 16 | 18 | 16 | 14 |
The database schema is straightforward with two tables, one for units, and another for bookings.
Although conceptually simple, getting at the data requires some calculations that are easier to program into a client side app or in this example a server side stored procedure. The desired outcome is to tabulate the number of booked days by month. As can be seen from the example stored procedure or PostgreSQL function that follows, it takes a year as a parameter, defines what will be returned, creates a few variables, a temporary table, then populates the temporary table with data for each month, then returns the final output table joined with unit information, in this case just a title.
Comments
Highly efficient stored procedure/function creating a tabulated representation of unit utilisation by month for a year.
As all the processing is close to the data, it is very fast. Unlike a view, all the data will be processed regardless of any WHERE filters, though the returned data can be filtered.
This function was designed for output to a dashboard, end users can re-use, filter, slice and dice the output as required using common data imports from Microsoft Excel, Access, Libreoffice Calc and Base etc.
Cost of designing this procedure is greater due to the complexity and additional testing.
SQL to create a PostgreSQL function (stored procedure)
CREATE OR REPLACE FUNCTION public.sp_utilisation_unit(year integer) RETURNS TABLE( id integer, title character varying, days_01 integer, days_02 integer, days_03 integer, days_04 integer, days_05 integer, days_06 integer, days_07 integer, days_08 integer, days_09 integer, days_10 integer, days_11 integer, days_12 integer ) AS $BODY$ DECLARE d_start date := '1970-01-01'; d_end date := '1970-01-01'; d_end_calc date := '1970-01-01'; i_month integer := 0; BEGIN BEGIN -- Create temp table to hold utilisation DROP TABLE IF EXISTS temp_utilisations; CREATE TEMP TABLE temp_utilisations ( id integer, month integer, days integer ) ON COMMIT DROP; END; BEGIN FOR i_month IN 1..12 LOOP -- Calculate dates -- d_start = Beginning of the month SELECT make_date(year, i_month, 1) INTO d_start; -- d_end = End of the month SELECT make_date(year, i_month, 1) + interval '1 month' - interval '1 day' INTO d_end; -- d_end_calc = Beginning of the next month -- so that checkout - checkin works SELECT make_date(year, i_month, 1) + interval '1 month' INTO d_end_calc; -- Select and save monthly data INSERT INTO temp_utilisations ( id, month, days) SELECT b.unit_id, i_month, sum(CASE WHEN b.checkout > d_end THEN d_end_calc ELSE b.checkout END - CASE WHEN b.checkin < d_start THEN d_start ELSE b.checkin END) AS days FROM my_booking AS b WHERE b.checkin BETWEEN d_start AND d_end OR b.checkout BETWEEN d_start AND d_end GROUP BY b.unit_id; END LOOP; END; RETURN QUERY SELECT a.id, a.title, CAST(sum(u1.days) AS integer) AS days_01, CAST(sum(u2.days) AS integer) AS days_02, CAST(sum(u3.days) AS integer) AS days_03, CAST(sum(u4.days) AS integer) AS days_04, CAST(sum(u5.days) AS integer) AS days_05, CAST(sum(u6.days) AS integer) AS days_06, CAST(sum(u7.days) AS integer) AS days_07, CAST(sum(u8.days) AS integer) AS days_08, CAST(sum(u9.days) AS integer) AS days_09, CAST(sum(u10.days) AS integer) AS days_10, CAST(sum(u11.days) AS integer) AS days_11, CAST(sum(u12.days) AS integer) AS days_12 FROM my_units AS u LEFT JOIN temp_utilisations as u1 ON u1.id = u.id AND u1.month = 1 LEFT JOIN temp_utilisations as u2 ON u2.id = u.id AND u2.month = 2 LEFT JOIN temp_utilisations as u3 ON u3.id = u.id AND u3.month = 3 LEFT JOIN temp_utilisations as u4 ON u4.id = u.id AND u4.month = 4 LEFT JOIN temp_utilisations as u5 ON u5.id = u.id AND u5.month = 5 LEFT JOIN temp_utilisations as u6 ON u6.id = u.id AND u6.month = 6 LEFT JOIN temp_utilisations as u7 ON u7.id = u.id AND u7.month = 7 LEFT JOIN temp_utilisations as u8 ON u8.id = u.id AND u8.month = 8 LEFT JOIN temp_utilisations as u9 ON u9.id = u.id AND u9.month = 9 LEFT JOIN temp_utilisations as u10 ON u10.id = u.id AND u10.month = 10 LEFT JOIN temp_utilisations as u11 ON u11.id = u.id AND u11.month = 11 LEFT JOIN temp_utilisations as u12 ON u12.id = u.id AND u12.month = 12 GROUP BY u.title, u.id ORDER BY u.title; END;$BODY$ LANGUAGE plpgsql VOLATILE COST 100 ROWS 1000;
Using a PostgreSQL procedure in Django
Consuming this data is very easy.
The following Django function consumes the stored procedure, then calls a template for presentation
def utilisation_unit_year(request, year=None): """Show apartment utilisation""" if year is None: year = date.year # Go fetch utilisation sql = 'SELECT * FROM sp_utilisation_unit(%s);' cursor = connection.cursor() cursor.execute(sql, [year]) data = cursor.fetchall() context = { "data": data, } return render(request, 'myapp/utilisation.html', context)
The following template snippet from utilisation.html is all that is required for presentation.
<table> {% for d in data %} <tr> {% for c in d %} <td>{% if c %}{{ c }}{% else %}-{% endif %}</td> {% endfor %} </tr> {% endfor %} </table>
Summary
It's essential to clearly define requirements and objectives.
Making core or complex information easily available, reduces both barriers to research and time to market.
Optimising frequently used or time specific information can make a difference to sales feedback and incentives with significant results.
Views tend to be used for packaging core or complex queries into something easily more easily used and re-used than original queries. These views can be both generic and optimised making them ideal for both regular and ad-hock requests.
Procedures are, by their nature more complicated to produce, they are a great way to process complex data incredibly fast at source, and making end presentation easier.
Views and Procedures add simplicity and ability to extend databases making them faster and easier to use and re-use than ever. There are many technical and business uses, and the reader is encouraged to further explore the listed resources.
Feature comparison | Views | Procedures |
---|---|---|
Ease of use | Easy | Easy – moderate |
Efficiency | High | High |
Development cost | Moderate | High |
Task specific | Moderate | High |
Scope for optimisation | Possible | Limited |
Use with views | Yes | Yes |
Use with procedures | Yes | Yes |
This feature comparison is intended as a guide, actual results depend on implementation. If in doubt, measure and test!
PostgreSQL is a freely available open source object relational database management system. Using SQL the structured query language used by most databases to create views and the many procedural languages to create user defined functions, it's possible to achieve almost anything. Other databases such as Oracle, IBM DB2, Microsoft SQL Server, Sybase and MySQL are just some of the many databases making it easier to surface information in your data.