Questions › SQL Query to find rows that didn't occur this month
Abs Asked

I am trying to find the number of sellers that made a sale last month but didn't make a sale this month.

I have a query that works but I don't think its efficient and I haven't figured out how to do this for all months.

SELECT count(distinct user_id) as users
FROM transactions
WHERE MONTH(date) = 12
AND YEAR(date) = 2015
AND transactions.status = 'COMPLETED'
AND transactions.amount > 0
AND transactions.user_id NOT IN 
(
    SELECT distinct user_id
    FROM transactions
    WHERE MONTH(date) = 1
    AND YEAR(date) = 2016
    AND transactions.status = 'COMPLETED'
    AND transactions.amount > 0
)

The structure of the table is:

+---------+------------+-------------+--------+
| user_id |    date    |   status    | amount |
+---------+------------+-------------+--------+
|       1 | 2016-01-01 | 'COMPLETED' | 1.00   |
|       2 | 2015-12-01 | 'COMPLETED' | 1.00   |
|       3 | 2015-12-01 | 'COMPLETED' | 2.00   |
|       1 | 2015-12-01 | 'COMPLETED' | 3.00   |
+---------+------------+-------------+--------+

So in this case, users with ID 2 and 3, didn't make a sale this month.

Comments :
AdamMc331 replied

What makes you think it's inefficient? Is it running slowly?

AdamMc331 replied

As far as during this for the current date and month, you can use the CURDATE() function and manipulate that value as necessary, or feed in one date using a variable and having the query calculate the previous month.

Strawberry replied

@McAdam331 It's not efficient in part because functions cannot use indexes


4 Answers :
dnoeth answered

Use conditional aggregation:

SELECT count(*) as users
FROM
 (
    SELECT user_id
    FROM transactions
                       -- 1st of previous month
    WHERE date BETWEEN SUBDATE(SUBDATE(CURRENT_DATE, DAYOFMONTH(CURRENT_DATE)-1), interval 1 month) 
                       -- end of current month
                   AND LAST_DAY(CURRENT_DATE)
    AND transactions.status = 'COMPLETED'
    AND transactions.amount > 0
    GROUP BY user_id
           -- any row from previous month
    HAVING MAX(CASE WHEN date < SUBDATE(CURRENT_DATE, DAYOFMONTH(CURRENT_DATE)-1)
                    THEN date 
               END) IS NOT NULL
           -- no row in current month
       AND MAX(CASE WHEN date >= SUBDATE(CURRENT_DATE, DAYOFMONTH(CURRENT_DATE)-1)
                    THEN date 
               END) IS NULL           
 ) AS dt

SUBDATE(CURRENT_DATE, DAYOFMONTH(CURRENT_DATE)-1) = first day of current month

SUBDATE(first day of current month, interval 1 month) = first day of previous month

LAST_DAY(CURRENT_DATE) = end of current month

Abs replied
Brilliant, this is way faster than my query, thank you! I've now simply looped through it with my app code and simply changed the current date in each loop.;
Nir Levy answered

if you want to generify it, you can use curdate() to get current month, and DATE_SUB(curdate(), INTERVAL 1 MONTH) to get last month (you will need to do some if clause for January/December though):

SELECT count(distinct user_id) as users
FROM transactions
WHERE MONTH(date) = MONTH(DATE_SUB(curdate(), INTERVAL 1 MONTH))
AND transactions.status = 'COMPLETED'
AND transactions.amount > 0
AND transactions.user_id NOT IN 
(
    SELECT distinct user_id
    FROM transactions
    WHERE MONTH(date) = MONTH(curdate())
    AND transactions.status = 'COMPLETED'
    AND transactions.amount > 0
)

as far as efficiency goes, I don't see a problem with this one

Strawberry replied
It suffers from the same flaw as the original, insofar as functions are unable to employ indexes. If you can rewrite as a range query then this can you an index and will be much faster.;
Strawberry answered

The following should be pretty efficient. In order to make it even more so, you would need to provide the table definition and and the EXPLAIN.

SELECT COUNT(DISTINCT user_id) users
  FROM transactions t
  LEFT 
  JOIN transactions x
    ON x.user_id = t.user_id
   AND x.date BETWEEN '2016-01-01' AND '2016-01-31'
   AND x.status = 'COMPLETED'
   AND x.amount > 0
 WHERE t.date BETWEEN '2015-12-01' AND '2015-12-31'
   AND t.status = 'COMPLETED'
   AND t.amount > 0
   AND x.user_id IS NULL;

Just some input for thought:
You could create aggregated lists of user-IDs per month, representing all the unique buyers in that month. In your application, you would then simply have to subtract the two months in question in order to get all user-IDs that have only made a sale in one of the two months.
See below for query- and post-processing-examples.

In order to make your query efficient, I would recommend at least a 2-column index for table transactions on [status, amount]. However, in order to prevent the query from having to look up data in the actual table, you could even create a 4-column index [status, amount, date, user_id], which should further improve the performance of your query.

Postgres (v9.0+, tested)

SELECT   (DATE_PART('year', t.date) || '-' || DATE_PART('month', t.date)) AS d,
         STRING_AGG( DISTINCT t.user_id::TEXT, ',' ) AS buyers
FROM     transactions t
WHERE    t.status = 'COMPLETED'
AND      t.amount > 0
GROUP BY DATE_PART('year', t.date),
         DATE_PART('month', t.date)
ORDER BY DATE_PART('year', t.date),
         DATE_PART('month', t.date)
;

MySQL (not tested)

SELECT   (YEAR(t.date) || '-' || MONTH(t.date)) AS d,
         GROUP_CONCAT( DISTINCT t.user_id ) AS buyers
FROM     transactions t
WHERE    t.status = 'COMPLETED'
AND      t.amount > 0
GROUP BY YEAR(t.date), MONTH(t.date)
ORDER BY YEAR(t.date), MONTH(t.date)
;

Ruby (example for post-processing)

db_result                   = ActiveRecord::Base.connection_pool.with_connection { |con| con.execute( db_query ) }
unique_buyers               = db_result.map{|e|[e['d'],e['buyers'].split(',')]}.to_h

buyers_dec15_but_not_jan16  = unique_buyers['2015-12'] - unique_buyers['2016-1']
buyers_nov15_but_not_dec16  = unique_buyers['2015-11']||[] - unique_buyers['2015-12']
...(and so on)...