Help with Redshift queries for page views, sessions and engagement


#1

Hi Team,

I want the data for every urlHost with metrics like Sessions,Page Views, Bounces and Sum of Page Pings. The query I am writing is giving me the wrong data(cartesian product I think) can you guys please validate my query.

SELECT     t1.page_urlhost, 
           Count (1)              AS "number_of_events" , 
           Sum(domain_sessionidx)    sessions , 
           tbl1.sumbounces        AS bounce, 
           ( 
                  SELECT Count(event) 
                  FROM   atomic.events AS ent 
                  WHERE  ent.event='page_view' 
                  AND    t1.page_urlhost=ent.page_urlhost 
                  AND    collector_tstamp > CURRENT_DATE - integer '7' ) AS page_view, 
           ( 
                  SELECT count(event) 
                  FROM   atomic.events AS ent 
                  WHERE  ent.event='page_view' 
                  AND    t1.page_urlhost=ent.page_urlhost 
                  AND    collector_tstamp > CURRENT_DATE - integer '7' ) AS page_ping 
FROM       atomic.events t1 
INNER JOIN 
           ( 
                    SELECT   page_urlhost , 
                             sum(bounces) AS sumbounces 
                    FROM     ( 
                                      SELECT   page_urlhost, 
                                               domain_userid, 
                                               CASE 
                                                        WHEN count (domain_userid)>1 THEN '0' 
                                                        ELSE '1' 
                                               END AS bounces 
                                      FROM     atomic.events 
                                      WHERE    collector_tstamp > CURRENT_DATE - integer '7' 
                                      GROUP BY page_urlhost, 
                                               domain_userid) 
                    GROUP BY page_urlhost ) AS tbl1 
ON         tbl1.page_urlhost=t1.page_urlhost 
GROUP BY   t1.page_urlhost, 
           tbl1.sumbounces limit 1000;

Appreciate your help.

Thanks!


#2

Hi,

should be

count(distinct domain_sessionid)    as sessions

because the session_idx is just the counter of sessions per domain_userid

should be

SELECT count(event_id) 
                  FROM   atomic.events AS ent 
                  WHERE  ent.event='page_ping' 
                  AND    t1.page_urlhost=ent.page_urlhost 
                  AND    collector_tstamp > CURRENT_DATE - integer '7' ) AS page_ping 

and for bounces it depends on your definition of a bounce, i think the most common is: only one pageview per session.


#3

Hi @ecoron

Thanks for all your help here. Now, from last week I have one problem i.e. I want to create an network graph which looks at sites(page_urlhost) which are connected through a common user(3rd party cookie/ network_userid).

To make it simple I have break it down into below steps:
 Get a list of user who have seen more than 1 page_urlhost during 2016
 Take first user, get list of page_urlhost visited
 For every user 2:n
For user x get a list of sites visited
For common sites between user 1 and user x increment Site A <-> Site B
 Repeat for every user

Can you please help me with this query. Appreciate your help.

Thanks!
Deepak Bhatt


#4

Hi Team,

Does it make sense to you guys?

Regards
D B