Efficiently Summarizing Web Browsing Activity – SANS DFIR Summit 2018

(serene music) (audience applause) – When I was in high
school, I took a shop class. And the very first
day of the class, the teacher handed
out an assignment. It was just one page and it
had 20 items written on it. He handed out this white piece
of paper, gave it to us all, hands down, or face
down, excuse me, and he said, “This
is an assignment, “but it’s also a little
bit of a race, right? “So, finish this, make
sure you do it right, “but the first person
who finishes wins.” This is in high school, I
don’t remember what we won, but I remember thinking,
“All right, let’s do this, “I’m gonna win this thing. “This is gonna be good.” And so the teacher says go,
and I flip over the paper. The very first thing
on the paper says, read all the instructions
on this piece of paper before you move on to step two. And I’m thinking,
“Cool, it’s a race. “No, I’m not gonna do that. “I’m gonna keep going down
through this piece of paper.” So the second question or
instruction is something like, write down every odd
number from one to 30. It’s pretty benign,
but it’s pretty random. So I’m like, “Okay, whatever.” And then the questions start getting
progressively less benign, I guess, for me personally. I think they start
off with like, maybe clap your hands
three times, right, this is a quiet classroom, so clap, that’s
pretty disruptive. It’s like okay, weird. It gets down to like
shout your name out. And I’m like, “All right, “I’m the first
person to do this? “All right, I’ll
shout my name out. “This feels very strange.” And I get down to
like towards the end, it’s like take off
your left shoe, and put it on the
table next to you. I’m just like, “Okay, cool.” Question 20. Thank you for reading all these
instructions ahead of time, write your name
in the top corner and sit there quietly, you
finished the exercise right? So I’m not feeling (chuckles)
very good at this point. That’s kind of a problem. But the whole point of
this exercise, I think, from the teacher’s perspective,
is that this is shop class, there are things in
here that can hurt you. The lesson is it’s
very important to read all the instructions
carefully and follow them. So that’s all I remember
from that class, so I guess it was a
pretty effective lesson. But I’m not talking about
shop class, obviously, I’m talking about
web browser stuff. Because like Chad says, that’s
one of my side passions, I have other jobs in forensics, but my side go-to project has always
been stuff on web browsers. So it’s kind of
like my go-to thing. And this is probably
the least catchy title for a talk I’ve ever thought of: Efficiently Summarizing
Web Browsing Activity. But basically, I kind of want
to talk about some things I learned as far as how
you can use a summary and an overview to make
things more efficient. So an overview can save time before an in-depth
investigation. This is kind of like my premise; it’s what I’m trying
to convince you guys of during this presentation. So like back to my
shop class example. If I had read all
the instructions it probably would have
taken me 30 seconds, I would have finished
the exercise instantly and skipped all that work
and the embarrassment, but I didn’t. So, that’s like a lesson
to take to heart, I guess. And so I also wanted
to kind of take a, before I start talking
about this overview thing, and why I think
it’s a good idea, why would you wanna save time? I mean that’s kind of
like not a silly question, but it’s something
that, in my experience, I’ve had a range
of forensic jobs. I started off as an intern
at the FBI in an RCFL, so that’s kind of
one experience. I moved on to working in small consulting
forensic shops. I worked in a SOC at a
large healthcare organization and I worked for
other consultants like doing IR and
forensics, things like that. So I’ve kind of seen the gamut, but it still is
just my perspective. But from my perspective, I’ve been getting more and more
devices that come in, right, and the amount of time to
do the analysis is still fixed. So anything you can do to
make yourself more efficient I really latch on to that,
because it really helps me. And anecdotally I’ve kind
of heard the same thing: if you’re like in
law enforcement, these labs are getting
more and more devices, the backlogs are growing. And now, if you’re doing
IR as like a consultant, you might be called on to a site to investigate 50,000
endpoints for something. You just can’t do things
the same way you used to. You need to be more efficient. And so that’s why I wanna,
this is all my perspective, my kind of experience, but I wanna gather somewhere
evidence around this and ask people what
their experience is, and get a broader
understanding of this. So I put out a survey about
investigating web browsers. I don’t know if any of
you guys saw that, some of you responded to
it, so thank you very much, I very much appreciate that. But the survey was kind
of targeted to this room. Actually, I kind of promoted
it on Twitter on DFIR, which I think a lot of
people here in the summit, are active on. And I also put out a link on one of the SANS
alumni emailing lists. So I think it should be
kind of representative of the people that are here. And so I’m gonna be referencing
some of the questions and answers from the survey,
throughout my presentation. I just think it’s kind
of giving some background and it broadens the
experience a little bit. So first question, how many devices do you
investigate each month? I took two takes on this. The first part was
how many devices did you investigate
for any reason? And the next one
was how many devices do you investigate
specifically for browsers? So this was kind of
to get an idea of, like what percentage
of computers you look at web browsers on. All investigations, the
weighted average, was around 14. And for looking at web
browsers, it was around 9. So around 2/3 of cases, people look at information
on web browsers. I think that’s a good thing. I think web browsers
are clutch in so many investigations. There’s so many
useful artifacts, and so often web browsers are the way that people
access everything else in their digital life. So I think web browsers are
really a gold mine of information. So also on that, is how many
devices does each person have? Like when I started
off my career, I think it’d be very typical: you’d have one person,
if you’re doing a case, you have one server or
one computer, one laptop and you do a dead-drive
image using forensics and that was kind of it, right? That’s almost never
the case, nowadays. At least again in my experience. So I wanna see what
other people thought. Not surprising, that
the most common answer is one computer and one mobile. So again everybody that seems
pretty representative, right? Everybody has a
cell phone nowadays, everybody has a computer. If you’re looking at just
getting someone’s actions, you’re not only
looking at one thing. It’s gonna very quickly multiply the scope of your investigation, and can exponentially increase
the amount of work you have, with
hard drive sizes that are kind of
going up as well. I thought it was interesting that no one answered
one mobile device. Like if you’re investigating
mobile devices, you’re gonna have a
truckload of them. This kind of reminds me of
Lee’s presentation, like: we expected the one
laptop, and then what did you get? Like, all those other things. That happens in lots of cases. So anything you can
do to be effective and triage, and get answers
quickly, or disqualify devices, I think can really help you
accelerate your investigations. And then, getting back to
putting a number on this stuff: how much time do you spend reviewing browser
activity per device? So for this, I wanted
to know how much time you are spending on the analysis. Not how much time your
computer’s cranking away at processing and whatever
spreadsheets you’re putting out. But how much time are
you actually looking and trying to use your
brain to figure out and draw some correlation
from what happened? And again, not surprisingly, there are pretty few sub
15-minute investigations. I think that’d be awesome, but that’s really a short time to get a really substantive
answer on anything. The most common one was in this 15 minutes
to one-hour bucket and the weighted average
was around 90 minutes. So you look at your computer, look at your browser history
for around 90 minutes. I think, again, that
kind of jibes with my experience. And so, if you look
at how many devices, how many people you
investigate each month, how many devices you look at, how many devices each person
has, how much time for this? This can turn out to be a large amount of
time you’re spending. And so even if you have
some small kind of savings, I think it can
help you out a lot. So this wasn’t a
survey question, but I was basically combining
the results to the questions and trying to find
some insights. So this is the
investigation time, how much time you spent
looking at the evidence, versus the number of devices
that you look at per month. And so I expected there to
be almost like a line. So you’d have a lot of devices
in the top right corner, for you guys. Like with very few devices, you spend lots of
time digging on it. If you’re more like
in a SOC scenario, you’ve got lots of devices,
you do a quick triage, you’re looking for
one particular thing. So they’re very, very small. It didn’t quite pan out
that way to the extremes, but I think in general
it kind of follows that. And so it doesn’t really
matter where you are on this spectrum, I
think anything we can do, to help you move
slightly to the left, in this case, where you’re using
slightly less time per device, I think that’ll help you
in the cumulative savings. That’s enough about
setting the stage and all the background stuff. So, back to it: an overview can save time before
an in-depth investigation. So what should you
put in this overview? The first thing that
I think you should do is utilize visualizations to
reveal trends and patterns. So, visualizations:
we’re really living in almost a golden
age of visualizations. For a while now, as
you guys are all aware, more and more devices
and apps and whatnot, are collecting more and
more information about you. And now people are starting to want to get access
to that information. There’s all these
personal dashboards, all this kind of things
that are kind of popping up, as far as how to make
really cool visualizations. Like the New York Times puts out these really good
interactive visualizations on things to make their
stories more catchy. So there’s lots of
really good examples of visualizations out there to help people get a good
handle on a big data set. And that’s something where the visualizations
are very strong. Because they’re kind
of, by definition, they kind of compress the
data you’re looking at. You have to summarize things
to fit it in a visualization. You can have a
million-row spreadsheet fit in a three by
three image if you want to. You’re obviously gonna
lose some information, but if you’re trying to get an
overview of the whole thing, or kind of getting your
hands around the data, I think it’s a really
good way to do it. There’s also this
guy, like I guess, he’s like the grandfather
of visualization. His name is Edward Tufte and he has written
a number of books and they’re really awesome
as far as visualizations. I kind of like visualizations
for lots of reasons. But I’ll just kind
of flip through them, not even reading them sometimes, and just kind of
go for inspiration, for different ways to
process information. So I think visualizations
are awesome. So with that, this is
an open-source tool, that I’m releasing today. So it’s called Synopsis. This is basically a proof
of concept kind of tool that just shows you
some visualizations from browser
history information. This is from Exabeam,
Exabeam Labs, this is our first
open-source kind of tool. It’s basically cut in two parts, there’s a small Python script that just processes some
Chrome browser information that you point it to, and then there’s a
standalone web page where you load the JSON file
the Python script generates, and then it shows
these little graphs. So there’s not,
and I’ll be honest, there’s not a lot of things
in this Synopsis program that are super novel
or groundbreaking, it’s just showing you, how you can use these
visualization techniques to look at browser history. The way synopsis works is, there’s all these little cards
that are in this web page. So I’m gonna drop some of
these cards in the presentation to show you some examples, and I’m also gonna
pull in examples from other commercial
or open source tools that do the same kind of thing, because this by no means is the be-all and end-all of
visualizations for forensics. And there’s lots of other open, there’s just other
tools out there where you can take the output
and process it some way to make your own visualizations. So I’ll talk about
that just a little bit. So this is more to get
your ideas going, as far as how you can
use visualizations. Back to the survey. How do you find things
of interest, right? I mean, I shouldn’t really
be surprised, I guess. The most popular answer
was a timeline, right? Forensic people love timelines. We’re gonna timeline everything,
timeline all the things. But cool, but since we’re
doing that, we’re timelining,
we have millions and millions of rows in these timelines. And it’s really hard to
find that first spot. If you have a timestamp or some kind of known
start of an incident, you can use that to begin
with, cool, that makes sense. And if you don’t, how
do you get a handle, like get your arms around, this million lines of logs or
whatever you’re looking at? Timeline. You can look at it kind
of in a macro view. This is from Synopsis, but again this is definitely
not something novel. Overview of how many events
that happen each day, you can very quickly see
spikes, you can see valleys, you can see periods
of inactivity, you can almost kind of
spot the weekends in there, where it drops down. This is from a work computer. So it just, it makes sense. Oh this is from a
commercial tool. Overview of the
time information. I’m actually not sure how
old the screenshot is, might look different, but
yeah there’s a big timeline. You click on it, you zoom
into a smaller timeline and then at the very bottom, there’s actually
this information that shows you the records. So there’s a lot of
different ways to approach using these kind of
overview on timelines to make sure you don’t
get lost in the details. So how often do you perform
each type of review? This is another survey question. And these slides will be posted, so you can kind of dig into them later if you’re interested
in that kind of thing. But the first two questions
on here are basically, if you’re looking at a
small number of websites, the first one is solely
looking at one website, and the second question is, you’re looking at
a handful of websites. So for the first example say
it’s a malware investigation, or an IR kind of thing, and
you have a (mumbles) URL. So you just want to see how
many devices accessed this URL. It’s a very simple query, you wanna know the
answer yes or no, and if it’s no, then you
don’t care about the device. So knowing that one thing
from the browser history, can be extremely helpful. For the second one, if
you wanna know activity, on a handful of websites, maybe you only care about
stuff on Facebook and Twitter, maybe only care about
social media kind of things. You just wanna see those
small bits of information. And then just in
the bottom there, the least common type of review was comprehensive line by line. That’s awesome, I would love
to have time to do that, but you almost never have time. There’s some cases that warrant super super in-depth
investigation, but realistically it just
doesn’t really happen. So there’s this thing, on
those first two examples, we’re talking about one or a
handful of websites, right? And so for those, the big overview
doesn’t really work, because it compresses
information. The summary collapses all these
variables down to one thing, just how many events per day. So why don’t you just look
at the same kind of overview, but it’s for a single
domain, a single website, a single, whatever
you’re interested in. So Tufte, that guy
I was talking about, he has this idea
called sparklines. So sparklines are
these little graphs that you can embed almost like
in a paragraph if you wanted. The point is that
they’re one-dimensional. They show one thing, so
you just kinda get a feel for what’s going on. Like you might take a
paragraph to describe all those high-activity and
low-activity periods here, or if you just draw
a little line, just a squiggle, that shows it, you kind of get the same
answer much more quickly. So I’m calling this domain
sparks instead of sparklines. So it’s just, you’ve got
one domain, (mumbles). This happens to be
the website that I use for my RSS feeds. Apparently for some reason
there’s some gap there, I didn’t look at it at all. If this was an investigation, and I cared about this
website on this time period, if I see there’s
a gap right here and that’s when I
knew it happened, I could immediately,
not immediately, but I could potentially
discard this device and not look at it further. If it’s the only device I had, it could be a very big clue that I don’t have all
the devices I need that are relevant
for investigations. So go out and find more devices. The other thing that Tufte talks about is another way of
looking at visualizations, and it’s called small multiples. So small multiples
are where you have the same style of
graph just shrunk down, you do the same
thing multiple times, but with a different
series of data. So you see in graphs
a lot of times, you’ll have all the different
series overlaid on each other. So that’s good for
some situations, but I also find it useful
if you separate them in these small multiples and you can very easily kind
of compare and contrast, what’s going on. So you can’t read
the domains on this, but you can see it’s
very easy to see relative periods
of activity or inactivity across the different sites. So these are all
cards from Synopsis. The way it works in Synopsis, is the top three domains
you access by default populate these
cards, these sparks, but you can add them in
that website box right there and delete them if you want by clicking the little
X on the side. So this is again, if you have like some, you know,
bad.com you wanna look for, type it in there. If there’s no activity,
then (groans), this didn’t happen from
this browser on this device. So timelines are great. But what if you don’t wanna
look at your data linearly? What if you’re
trying to get a feel for kind of that pattern of life? Like, well, how this user
typically does stuff: when they go to work,
when they leave, when they’re
looking at websites, what kind of breaks they take. The straight-down, million-line timeline doesn’t really work.
timeline doesn’t really work. So this is a heat map. And so this is kind of
like a cyclical view of it. So over here on the columns
we have the hours of the day, so zero through 23. And then the rows are each
of the days of the week. So again, the
intensity of the color in the heat map shows more
actions at one particular grid coordinate
than the others. If you look on, anybody
uses Google Analytics, if you log on to your
Google Analytics dashboard, there is a very
similar looking graph, where it shows you how
people visit your website. Like this kind of is
useful for lots of things. And back to that, the
question I was asking, how long you spend
on investigations, when I was doing investigations, I would like to spend as
much time as possible, just ’cause I kinda
like web browsers. But I spend a lot of
time trying to get a feel of what the person was doing,
their kind of patterns. And that was more of the art, there’s always art versus
science in forensics. And doing things like
this, where we can quantify and actually put data to things, moves us from the
art to the science, which I think is
where we need to go to scale and be effective. And again, heat maps are not
unique to this by any means. This is from timesketch. This is from their online
freely accessible demo. You can view a timeline,
or you can click a button and change it to view different
graphs and visualizations. So this one has a heat map. So this one, you can
very quickly see, Tuesday at 11:00 there’s, if you care about browser stuff, this is where you should look. And all these summary
techniques I’m talking about, are specific to web browsers in this presentation, but by no means are they only
relevant for web browsers. So in the top here
in timesketch, my search is on source
short, web history: anything that involves
web browsing records. You could easily change that to be failed
logins, or something like that. The heat map would
be just as effective at finding when things
normally happen, when they don’t happen, when
there’s outliers and whatnot. The last visualization I wanna
talk about is word clouds. I really like word clouds. You guys have probably seen
word clouds on blogs, right? Tag clouds, those are
pretty popular things. The size of the word is
proportional to its frequency, so the bigger the word
is the more it appears. My very first SANS
talk, I talked about, I showed an example of
making a word cloud, from everything in a
person’s browser history. It’s just kind of an
interesting thing. And I found it really, really
useful ever since then. This is an example made from
the Python word cloud library. You just put
in a block of text and it generates a thing. This is actually even just
the default colors and values, so whatever. But this was taken from
a user’s URL history. It’s every URL they visited, it chops it up on the
different words and bam! So you can kind of
quickly get a feel for like what is
this person looking at, like maybe they’re
gonna quit their job, maybe they’re thinking
about resigning, maybe they want to write
a resignation letter. So it’s just, I think
it’s really good to get a high-level
overview about what’s going on in some kind of set of evidence. So this is from URLs. But you can use the word cloud with any kind of subset
of data you want, in any kind of data. This one is from
autofill records. So if you type
something in Chrome, it tries to help you remember the form field values
that you put in. This is actually from
one of the SANS images, I guess not the most recent one because Lee’s talking
about his new one. But I think it’s kind of funny, because you can actually see
he talked about some group. All they did is try to find out that probably made something. In autofill, you can actually
see, also about Donald Blake, but you can also
see Lee in there, which I think is kind of funny. And then this one is
search engine queries. Everybody loves
search engine queries. If you pull out all
the search terms, you can kind of get a good feel for what a person was
thinking and looking at. This is actually me from doing research
on building this thing and other stuff in
my general work life. So you can tell I’m doing some
coding, looking at pandas, diagrams, dash, plotly
things like that. It’s just a good way to
get a high-level overview. Next survey question was, how do you find these
types of information? Or how often do you find
these types of information, excuse me. So again, everybody wants
to find that smoking gun, the explicit artifact
of proving their case. And unsurprisingly, that doesn’t
happen every single time. You also might find
some bad activity, but not what we’re looking for. Again it’s pretty common. And then the very last one is
only non-relevant data, nothing useful; that, thankfully, for this survey was the
least common answer. But the third one there
is supporting data. This isn’t conclusive by itself, but it’s really useful
supporting information. This was the most commonly found out of any of these
types of information. Which makes sense, you can
find lots of really useful, relevant stuff in
web browsing history, related to who a person is,
what they do, where they are, where they’re going, things
they buy, their interest, it’s just, really
really helpful. So supporting data is
in web browsers a lot. And the nature of
supporting data is that it’s pretty
well understood if you know what
you’re looking for. So if you have common questions
you’re asking all the time in every single investigation, why not automate the way
to answer it for you? This will vary on
a case-by-case basis, depending on what you
find most interesting for you. But there’s lots of, I think,
things that are pretty common, that people look for in
a variety of investigations, regardless of the type. So what types of
things do you look for? Again, first results,
searches and queries. Again, not a surprise. And I’m very happy to see that. Search engine
queries are amazing. There is so much information
embedded in them. We had a good talk from
Phil last year about that. And search engine queries
are pretty well known. We don’t know what
every single value does and how to decode
every single thing, but we know a lot of them. And we definitely
know the basics. What’s the search term, here,
browser forensics, right, very simple. So it’s very easy. We understand how this is structured, so we can very easily
automate the extraction of this kind of information. So in Synopsis,
and in general life, I find pairing
a word cloud and a table very helpful. You can look at the word cloud and kind of explore
with your eyes, look where things
look interesting, try to find the big
trends, the small ones, and then you can use
the table next to it, to search and filter and find the timestamp
when a search happened and the actual whole
text of the query. I just find these two
pair very well together. So these are included in the default Synopsis
dashboard overview. Back to that question. The next three answers are
email accounts, cloud storage activity
and social media. They all kind of share in common the fact that you’re
looking for accounts that people have on services. In lots of cases I had, back
when I was doing consulting, you’d be interviewing
some subject and they give you the two
Gmail accounts they own, and then you wanna
follow up and say, “Okay, is there a
hidden third one?” One they’re not
telling us about, that they used to send off some
company IP, or whatever. So getting information about
accounts that a user has and they might have forgotten, I think it’s a pretty common
thing, at least in my career. So again, we know
how to do this. This is a thing
that happens often. And so this is
specific to Chrome. So there’s three
places that I go to, it’s kind of like a go-to
for extracting information about accounts people have. The first one for
Chrome, is pretty easy. There’s a SQL database, that’s expressly made for
saving login credentials. So if you have
accounts to things, it saves your username,
your password, the domain. It’s super helpful. So it can parse that out and
kind of put it in this card. The second is autofill records. So autofill records record any type of information
you enter in forms, at a very high level. And so it doesn’t
tell you what website you logged
into, or anything. Because the purpose of
autofill is to tie the value
with the form name, so it can suggest
the same value to you on a new website with
a similarly named form. We can use this
to extract things that are like email addresses. We know what email
addresses look like. We have very good ways
to automatically parse and tell this is an
email address or not. So if you find somebody typing a
Gmail address a bunch of times in autofill, there’s a
pretty high likelihood that they own
that Gmail address. They had access to it somehow. So you use autofill for that. And the third one is
you can automatically, you can extract accounts
from browsing history. So there’s some
things like Gmail. If you log into Gmail, it
has your email address, your Gmail address, then
a dash, then Google Mail. So we know what that looks like. So if you’re looking
at browser history, and you see the title of a
page that’s on mail.google.com, you can extract
the email address that they used to log in. So we can do those kind of
basic extractions automatically, so we don’t have to
do it manually later. And so the last thing I’ll
talk about on this one at the very bottom there, it’s evidence of
other owned devices. So that doesn’t get a lot of, it’s not a frequent
investigation target, but I really would encourage you to give that a second
thought and look at different evidence
about having other devices. So back to kind
of Lee’s example, and talking about even
in the survey question, it’s very very common for
people to have multiple devices. And app vendors
know this, right? They wanna make things as
easy to use as possible, so they help sync things
back and forth all the time. So, even if you
don’t have the device that the user performed
the activity on, you might find evidence that was synced to it
from another device. So having all of the devices you can get your hands on from
a user is really important. And there’s some
things you would think that you could expect to
find in browser history about other devices. And there’s some stuff that
I did not expect to find, which I thought was interesting. And I’ve kind of kept this
presentation very high-level. My other ones are more
technical than browser stuff, but I’m gonna take
a little tangent, because I found something
cool and I wanna show you. (chuckles) So the first
one, we have Synced devices. This is from the resources
synced to Chrome. If you log into Chrome
with your Google account, or whatever your account, you can sync devices
back and forth. There’s a SyncData database. You can find lots of
cool information there, but one of the easiest
things you’re going to find is the names of devices
that you have synced to. So here in the bottom, we
have Ryan’s iPad, exaRyan and desktop. So that’s an iPad, a Mac,
and a Windows computer. So you can tell I synced across these
different kinds of devices. That’s not breaking news, we’ve known about that
stuff for a while. The other one was
new, at least to me. It’s called discovered devices. So this is a
Chrome-specific artifact. So I was looking at this and I found some
cool things in there. Like there’s a Chromecast
for our CFO’s office at work. The IP address, and
the names and networks. I didn’t know he had that,
I’ve never been in his office. And there’s a couple of
other devices on there. Google Home. What is this? So there’s this thing called
the media router extension, built into Chrome by default. And it’s hidden, in the
sense that, if you open up the extensions, you’re not gonna see
media router in there, but it is there. And that very long string,
starting with pked, is the app ID for
the media router. And like a lot
of Chrome extensions, the media router stores some
information in local storage. So you look in local
storage for that app ID, you’ll find a lot
of information. One of the more useful ones
I found is called SyncMap, I think. And so, I guess back a minute, the reason this media
router extension exists is so you can
helpfully chromecast, or you can cast anything
you want from Chrome, to any cast-capable device. So in the screenshot on the left, I’m trying to cast
this website to my TV. So cast to, it found the
TV icon, is populating it. So, cool. I actually found I have a TV
that could support casting. And over on the right, is this
record from local storage. It has the IP address of
my TV, the name of my TV, the model of the TV, and something you can’t
see from this data, but the the key above it, is the name of the
network that we’re on. So you actually
find local devices that are on the networks
where the computer was, from your web browsing history, regardless of whether
you’ve cast to it or not. Like, I’ve never cast anything
to the CFO’s Chromecast at my work, or to
that Google Home. I don’t know whose that is. But you can still see
evidence of these devices, even their IP address
from your browser history, so that was pretty cool. I had never, never
seen that before. All right, so done with
a technical tangent. So this is at the
end of my survey. I want to also get a feel for who you’re telling
about your investigation. Nobody does these
investigations for fun, I guess that’s not
true, I do it for fun. But you know what I
mean, like your job, someone’s hiring you, you’re
doing this for a reason. So what is the technical
level of your audience? It doesn’t matter how
good your analysis is. If the person you’re
explaining it to, it goes right over their
head, they have no idea, you might as well not
even have done it. If they didn’t understand it, then it was basically
wasted effort. So sadly, but not surprisingly, the least common
audience is someone that’s like us. Like a DFIR professional
that understands. You can do shorthand, you
just drop some crazy acronym and people know what it means, that’s you know not very common. So you really need to
tailor your message to the kind of
audience that you have. In order to actually
make an impact and have the effects of
your investigation be felt. So pairing with this, is
how do you tell people? The method. The two most common
answers on this, are either an internal
or external report, written report, right? So people don’t like writing, but I’m sorry writing is
an essential skill (laughs) for our industry,
because if you don’t, if you can’t write effectively, you can’t communicate
to your audience. Again, you might as
well not have done it. But okay, why am I
talking about this? Back to the overview. So this overview
isn’t just this thing that you should do
at the beginning to save time and help you, and then
throw away after you start
your investigation. I think you can also use an
overview as a starting point for actually
writing your report. These same images that you used to help find your first
thread in your investigation, or get an overview of a trend, can be helpful for explaining
that to another person. And I think pictures
and visualizations are a more universal language. You can show a picture
and a word cloud, and people can look at it,
even if they’re not technical, and get an idea about, “Oh,
they’re interested in this, “and less interested in this.” It’s something that people
can very easily grasp. I think it makes your
report more accessible. It also breaks up
your report, right? So it’s not just entire
paragraphs of text. You can have some catchy
images to draw people in and not make them go to sleep. So I think the overview, you can even include
some of the information from like the
extracted accounts. If that’s all you
wanted to know, like your overview is
your entire report, you just want to give me all the Gmail addresses this
person owns, check done. You saved a bunch of time. So there’s a lot of really
good reports out there and visualizations
are becoming more and more important in them. There’s a lot of really
good vendor reports, like the DBIR. They have whole teams of people, it seems like, that put these
really cool graphics together, like in M-Trends. Looks really great. But you don’t need to have
a whole professional team to make a visualization
that helps you in a report. There is this report that… So I was excited about
this for a few reasons. First off, not a lot of forensic
reports are made public. Some are, but most
of them are private ’cause, for the person
that commissioned them, it’s sensitive, and they’re
going to keep it to themselves. So this is a publicly
released forensic report. I’m not really gonna try
and say any names in them, because this is from Norway, and I would fail at saying them. But the gist of
this investigation was a Norwegian newspaper
got a hold of some log files from a streaming service and they went to a
university, NTNU, and said, “Can you help
us look at this? “This looks odd. “We think someone’s monkeying
with the streaming records.” And the short answer
is yes, they were. But this report had a lot of
math in it, a lot of stats. It could have easily
been very, very dry and hard to understand, but I thought they made very effective use
of visualizations. If you like that kind of stuff, I would encourage you to
read the whole report. It’s 70 pages, but 50 of it
is tables which you can skip. Like tables of, yeah. So this is an example
of a visualization that is not very complicated. They got log sources
from two separate date ranges. And they could have just said, and in fact they did
say in a paragraph, we got it from this date
range and this date range. But they also included
the visualization. So you can very easily see where
the data is and where there’s a gap. It’s just something that
breaks up the report and it’s very simple to do. There actually
was a fifth month, but I truncated it so you
can see the picture better. Again, with the investigation, I swear I’ll give you the
really short overview, they thought that some
track plays were inflated to basically pay some people increased royalties at
the expense of others, for whatever unknown reason. And it was, I think,
Kanye and Beyonce, were these two albums that they thought they were
inflating the tracks on. And this is from
one particular day on the track counts for I
think one of Kanye’s albums. And again, they explained
all the stuff in text, but you can very easily see
there’s something going on here. It’s like every other track
record just drops to zero. It doesn’t even exist anymore. And you can easily see that
from the kind of records. They got their point across
with this visualization. The other thing is,
if you look at this, do you recognize this? I know if you look
at Excel a lot, but I’m pretty sure
this is from Excel. You don’t need some
fancy expensive software, and some design team. You can do it with
whatever you already have. It doesn’t have to be flashy as long as you have
the substance right in the visualization. This is actually my
favorite one from it. They got into the details of
how the manipulation was done and they found that
across all these tracks, there is this big
multiple-of-three pattern. People played
certain numbers of tracks more often, like three
tracks and six tracks and nine tracks,
more than others. Then it went into all
these stats laws about how things decay
following this curve. And then they show
you this graph. And so you see each
color is a day. And you can see there’s
only certain days where there’s manipulation
and it’s blindingly obvious. You can see if things
look artificial and not natural at all. And so I thought this is a
very effective visualization. Again, looks like it’s
made in Excel. All right, so the last
thing is writing guides. I said writing is
very important. Visualizations are a good
start for your report, but I don’t think
you can get away with just turning in a
visualization as your answer. If you can, cool,
more power to you, but you’ll probably
have to write something. One of my jobs, they
issued this book to everyone that started. The Elements of Style. It’s this tiny book, and
it’s a really good reference for how to write
things using active voice, all this kind of stuff. So, if you’re
interested in writing and making your writing better, that’s really kind
of a deep dive on it. There’s also another guide from one of the SANS
instructors, Lenny. This is the entire
guide, it’s one page. If you want the
cliff notes version, check this out, download it and kind of keep your
writing a little better. So yeah. So an overview can save time
before an investigation. I would encourage you
to use visualizations and kind of extract the things that you’re always
gonna ask for. And you can also use
it as a starting point for your final report. And I’d encourage
you to use this for next time you’re
doing a web investigation. Just look at this,
pull up the records, see what the
visualizations tell you. I think, a lot of times it’s a really good place to
start pulling on a thread and give you a new
idea in an investigation that you wouldn’t have
thought of initially. All right, that’s it,
does anyone have any questions? All right. Well, thank you. (audience applause) (dramatic music)
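
As a rough illustration of the discovered-devices record described earlier in the talk (device name, IP address, and model, stored under a key that is the network name), here is a minimal Python sketch. The JSON layout, the field names, the network names, and the IP addresses below are all assumptions invented for illustration; the actual value the media router extension writes to local storage may look quite different, so treat this as a starting point for your own parsing rather than a description of Chrome's format.

```python
import json

# Hypothetical record layout, invented for illustration only. The real
# local storage value written by the media router extension may differ.
SAMPLE_RECORD = json.dumps({
    "HomeWiFi": [
        {"name": "Living Room TV", "ip": "192.168.1.20", "model": "ExampleTV 55"},
    ],
    "OfficeNet": [
        {"name": "CFO Office Chromecast", "ip": "10.0.5.17", "model": "Chromecast"},
        {"name": "Google Home", "ip": "10.0.5.42", "model": "Google Home"},
    ],
})


def summarize_devices(raw_value):
    """Flatten a network -> devices mapping into sorted (network, name, ip, model) rows."""
    rows = []
    for network, devices in json.loads(raw_value).items():
        for device in devices:
            rows.append((
                network,
                device.get("name", ""),
                device.get("ip", ""),
                device.get("model", ""),
            ))
    return sorted(rows)


def format_rows(rows):
    """Turn the rows into one plain-text line per device, ready for a report."""
    return [f"{network}: {name} ({model}) at {ip}" for network, name, ip, model in rows]


if __name__ == "__main__":
    for line in format_rows(summarize_devices(SAMPLE_RECORD)):
        print(line)
```

Grouping by network name first keeps the overview readable for a non-technical audience: each line says which network the computer was on and which cast-capable devices were visible there, whether or not anything was ever cast to them.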
