#archiveteam-bs 2016-09-20,Tue

Time Nickname Message
00:44 🔗 bsmith093 i'm trying to spider archiveofourown.org for urls to grab, and i can't seem to get past the index page. i've tried every user agent i can think of, nothing works!!
00:45 🔗 bsmith093 here's what i'm using
00:45 🔗 bsmith093 wget --spider -U "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.116 Safari/537.36" -m www.archiveofourown.org > urls.txt 2>&1
00:58 🔗 JesseW has joined #archiveteam-bs
01:11 🔗 godane i'm at 872k items now
01:14 🔗 ivan bsmith093: browsers send more request headers that they might be checking for
01:15 🔗 robink has quit IRC (Ping timeout: 506 seconds)
01:15 🔗 ivan chrome and firefox developer tools have a "copy as curl" feature in the network tab that you can use to construct an identical request
01:18 🔗 bsmith093 ivan: i have no idea where that option is, i'm in the dev tab
01:19 🔗 ivan right-click a network request
01:19 🔗 ivan you have to reload the page to see the request for the page itself
01:20 🔗 bsmith093 i've done that. 18 requests 18 kb, right click is doing nothing special
01:21 🔗 ivan Copy -> Copy as curl
01:22 🔗 ivan (that's in Chrome)
01:25 🔗 bsmith093 i've only used the dev console once. i have very little idea what i'm doing. i see elements console sources network and timeline
01:26 🔗 ivan network tab
01:26 🔗 bsmith093 i reloaded the page, there's html everywhere, now what?
01:26 🔗 ivan reload the page
01:26 🔗 ivan stay on the network tab, right-click a request
01:27 🔗 bsmith093 k got it now, there's a massive cookie file, should i tell wget to use that?
01:27 🔗 Cameron_D has quit IRC (Ping timeout: 370 seconds)
01:28 🔗 bsmith093 actually here
01:29 🔗 bsmith093 that was a pm with the curl blob in it.
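A minimal sketch of ivan's suggestion above, in Python rather than wget: build a request that carries the same kind of headers a browser sends, not only a User-Agent. The header values and the helper name `build_browser_request` are assumptions for illustration; the real values should be copied from the browser's "Copy as cURL" output, and nothing in the log confirms which headers the site actually checks.

```python
from urllib.request import Request

# Hypothetical browser-like header set. These values are placeholders;
# copy the real ones from the browser's "Copy as cURL" output.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/53.0.2785.116 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
}


def build_browser_request(url, extra_headers=None):
    """Return a urllib Request for `url` carrying browser-like headers.

    `extra_headers` (for example a Cookie header taken from the copied
    curl command) is merged over the defaults.
    """
    headers = dict(BROWSER_HEADERS)
    if extra_headers:
        headers.update(extra_headers)
    return Request(url, headers=headers)
```

To send the request, pass the returned object to `urllib.request.urlopen`. Any cookie copied from a logged-in browser session is a credential, so it should not be pasted into a public channel.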
01:41 🔗 Cameron_D has joined #archiveteam-bs
01:45 🔗 robink has joined #archiveteam-bs
02:29 🔗 Sk1d has quit IRC (Ping timeout: 194 seconds)
02:29 🔗 Sk1d has joined #archiveteam-bs
02:29 🔗 Sk1d has quit IRC (Connection closed)
02:32 🔗 Sk1d has joined #archiveteam-bs
02:39 🔗 Sk1d has quit IRC (Ping timeout: 194 seconds)
02:56 🔗 Sk1d has joined #archiveteam-bs
03:09 🔗 Sk1d has quit IRC (Ping timeout: 194 seconds)
03:15 🔗 Sk1d has joined #archiveteam-bs
03:16 🔗 Swizzle has joined #archiveteam-bs
03:29 🔗 dashcloud has quit IRC (Read error: Connection reset by peer)
03:30 🔗 tomwsmf_ has quit IRC (Ping timeout: 255 seconds)
03:31 🔗 dashcloud has joined #archiveteam-bs
03:51 🔗 mutoso has quit IRC (Read error: Connection reset by peer)
03:51 🔗 Atros has joined #archiveteam-bs
03:51 🔗 mutoso_ has joined #archiveteam-bs
03:52 🔗 robink has quit IRC (ircd.choopa.net hub.efnet.us)
03:52 🔗 Frogging has quit IRC (ircd.choopa.net hub.efnet.us)
03:52 🔗 balrog has quit IRC (ircd.choopa.net hub.efnet.us)
03:52 🔗 Mayonaise has quit IRC (ircd.choopa.net hub.efnet.us)
03:52 🔗 acridAxid has quit IRC (ircd.choopa.net hub.efnet.us)
03:52 🔗 jspiros has quit IRC (ircd.choopa.net hub.efnet.us)
03:52 🔗 coretx has quit IRC (ircd.choopa.net hub.efnet.us)
03:52 🔗 remsen1 has quit IRC (ircd.choopa.net hub.efnet.us)
03:52 🔗 ranma has quit IRC (ircd.choopa.net hub.efnet.us)
03:52 🔗 ivan has quit IRC (ircd.choopa.net hub.efnet.us)
03:52 🔗 chfoo has quit IRC (ircd.choopa.net hub.efnet.us)
03:52 🔗 SadDM has quit IRC (ircd.choopa.net hub.efnet.us)
03:52 🔗 yakfish has quit IRC (ircd.choopa.net hub.efnet.us)
03:52 🔗 Stiletto has quit IRC (ircd.choopa.net hub.efnet.us)
03:52 🔗 trs80 has quit IRC (ircd.choopa.net hub.efnet.us)
03:52 🔗 superkuh has quit IRC (Excess Flood)
03:53 🔗 superkuh has joined #archiveteam-bs
03:54 🔗 atrocity has quit IRC (Ping timeout: 633 seconds)
03:58 🔗 robink has joined #archiveteam-bs
03:58 🔗 Stiletto has joined #archiveteam-bs
03:58 🔗 Frogging has joined #archiveteam-bs
03:58 🔗 balrog has joined #archiveteam-bs
03:58 🔗 Mayonaise has joined #archiveteam-bs
03:58 🔗 acridAxid has joined #archiveteam-bs
03:58 🔗 jspiros has joined #archiveteam-bs
03:58 🔗 coretx has joined #archiveteam-bs
03:58 🔗 remsen1 has joined #archiveteam-bs
03:58 🔗 ranma has joined #archiveteam-bs
03:58 🔗 ivan has joined #archiveteam-bs
03:58 🔗 chfoo has joined #archiveteam-bs
03:58 🔗 SadDM has joined #archiveteam-bs
03:58 🔗 yakfish has joined #archiveteam-bs
03:58 🔗 trs80 has joined #archiveteam-bs
03:58 🔗 hub.efnet.us sets mode: +ooo balrog chfoo SadDM
03:58 🔗 swebb sets mode: +o balrog
03:58 🔗 swebb sets mode: +o SadDM
04:07 🔗 ndiddy has quit IRC (Read error: Operation timed out)
04:36 🔗 Meroje has quit IRC (Quit: bye!)
04:36 🔗 Meroje has joined #archiveteam-bs
04:50 🔗 Sk1d has quit IRC (Ping timeout: 194 seconds)
04:55 🔗 Aranje has quit IRC (Quit: Three sheets to the wind)
04:57 🔗 Sk1d has joined #archiveteam-bs
05:08 🔗 superkuh has quit IRC (Read error: Operation timed out)
05:08 🔗 Petri152 has quit IRC (Ping timeout: 633 seconds)
05:09 🔗 superkuh has joined #archiveteam-bs
05:14 🔗 fie_ has joined #archiveteam-bs
05:15 🔗 phuzion has quit IRC (Ping timeout: 633 seconds)
05:15 🔗 phuzion has joined #archiveteam-bs
05:20 🔗 Petri152 has joined #archiveteam-bs
05:22 🔗 phuzion has quit IRC (Read error: Connection reset by peer)
05:22 🔗 fie has quit IRC (Read error: Operation timed out)
05:28 🔗 phuzion has joined #archiveteam-bs
05:33 🔗 dashcloud has quit IRC (Read error: Operation timed out)
05:37 🔗 dashcloud has joined #archiveteam-bs
05:44 🔗 JesseW has quit IRC (Ping timeout: 370 seconds)
05:57 🔗 RichardG_ has joined #archiveteam-bs
05:57 🔗 RichardG has quit IRC (Read error: Connection reset by peer)
06:07 🔗 dashcloud has quit IRC (Read error: Operation timed out)
06:07 🔗 JesseW has joined #archiveteam-bs
06:11 🔗 dashcloud has joined #archiveteam-bs
06:28 🔗 JesseW has quit IRC (Read error: Operation timed out)
06:45 🔗 BlueMaxim has joined #archiveteam-bs
07:46 🔗 logchfoo2 starts logging #archiveteam-bs at Tue Sep 20 07:46:55 2016
07:46 🔗 logchfoo2 has joined #archiveteam-bs
07:52 🔗 Petri152 has quit IRC (ny.us.hub ircd.choopa.net)
07:52 🔗 bwn has quit IRC (ny.us.hub ircd.choopa.net)
07:52 🔗 fusl has quit IRC (ny.us.hub ircd.choopa.net)
07:58 🔗 bwn_ has joined #archiveteam-bs
08:04 🔗 GE has joined #archiveteam-bs
08:08 🔗 bwn_ is now known as bwn
08:14 🔗 SmileyG has quit IRC (Read error: Operation timed out)
09:01 🔗 Petri152 has joined #archiveteam-bs
09:01 🔗 fusl has joined #archiveteam-bs
09:05 🔗 Smiley has joined #archiveteam-bs
10:20 🔗 GE has quit IRC (Quit: zzz)
10:25 🔗 godane has quit IRC (Read error: Operation timed out)
10:28 🔗 godane has joined #archiveteam-bs
11:51 🔗 GE has joined #archiveteam-bs
12:23 🔗 dashcloud has quit IRC (Read error: Operation timed out)
12:42 🔗 dashcloud has joined #archiveteam-bs
13:16 🔗 phuzion has joined #archiveteam-bs
13:20 🔗 RichardG_ is now known as RichardG
13:23 🔗 VADemon has joined #archiveteam-bs
13:36 🔗 useretail has quit IRC (Ping timeout: 244 seconds)
13:37 🔗 dashcloud has quit IRC (Read error: Operation timed out)
13:37 🔗 dashcloud has joined #archiveteam-bs
14:02 🔗 BlueMaxim has quit IRC (Quit: Leaving)
14:22 🔗 Start has quit IRC (Quit: Disconnected.)
14:32 🔗 useretail has joined #archiveteam-bs
14:51 🔗 JesseW has joined #archiveteam-bs
14:57 🔗 godane i'm up to 873k items
14:58 🔗 godane i'm also at 9400ish items in my godaneinbox
15:15 🔗 BartoCH has quit IRC (Ping timeout: 260 seconds)
15:15 🔗 joepie91 http://dailycaller.com/2016/09/19/computer-tech-who-asked-how-to-strip-out-email-addresses-may-have-worked-for-hillary/
15:15 🔗 joepie91 apparently, Hillary email server tech asked for help on Reddit
15:16 🔗 joepie91 in what seems like falsifying evidence?
15:16 🔗 BartoCH has joined #archiveteam-bs
15:16 🔗 Kaz perfect, they've found a scapegoat
15:19 🔗 joepie91 Kaz: not necessarily. post claimed that he was asked to do so
15:19 🔗 joepie91 by $employer
15:19 🔗 joepie91 so... :P
15:24 🔗 Kaz yeah, true
15:25 🔗 Kaz wonder how this one will be spun
15:29 🔗 sep332 has joined #archiveteam-bs
15:31 🔗 joepie91 a full index of every North Korean domain in existence: https://github.com/mandatoryprogrammer/NorthKoreaDNSLeak
15:31 🔗 joepie91 (source: zone transfer misconfiguration)
15:32 🔗 joepie91 hm, maybe not
15:32 🔗 joepie91 oh yeah no, it is, it's just a very small zone
15:32 🔗 joepie91 :p
15:40 🔗 JesseW has quit IRC (Ping timeout: 370 seconds)
15:42 🔗 metalcamp has joined #archiveteam-bs
15:44 🔗 JesseW has joined #archiveteam-bs
15:49 🔗 JesseW has quit IRC (Read error: Operation timed out)
16:04 🔗 GLaDOS has quit IRC (Quit: Oh crap, I died.)
16:05 🔗 GLaDOS has joined #archiveteam-bs
16:10 🔗 VADemon has quit IRC (Ping timeout: 255 seconds)
16:32 🔗 GE I expected nothing less from a country that doesn't think the cold war is over
16:45 🔗 schbirid has joined #archiveteam-bs
16:55 🔗 dashcloud has quit IRC (Read error: Operation timed out)
16:59 🔗 dashcloud has joined #archiveteam-bs
18:10 🔗 SketchCow HEY HI HELLO
18:10 🔗 SketchCow So, I'm in Australia for a couple weeks starting this weekend.
18:11 🔗 SketchCow I assume none of you live there (I'll be in Melbourne this time) but I'll also be semi-spotty during that time. (Lots of walking)
18:15 🔗 schbirid don't you mean sketchy?
18:34 🔗 Aranje has joined #archiveteam-bs
18:47 🔗 Aranje has quit IRC (Quit: Three sheets to the wind)
18:51 🔗 Aranje has joined #archiveteam-bs
19:47 🔗 ndiddy has joined #archiveteam-bs
19:58 🔗 metalcamp has quit IRC (Quit: Bye)
20:02 🔗 JW_work1 has joined #archiveteam-bs
20:05 🔗 JW_work has quit IRC (Read error: Operation timed out)
20:19 🔗 Stiletto has quit IRC (Ping timeout: 190 seconds)
20:20 🔗 Stiletto has joined #archiveteam-bs
20:25 🔗 dashcloud has quit IRC (Ping timeout: 244 seconds)
20:26 🔗 dashcloud has joined #archiveteam-bs
20:26 🔗 hook54321 have we ever considered trying to scrape logs.omegle.com ?
20:37 🔗 Kaz plenty of three letter agencies already have a copy
20:38 🔗 schbirid has quit IRC (Quit: Leaving)
20:46 🔗 tomwsmf_ has joined #archiveteam-bs
20:46 🔗 yipdw hook54321: seriously, dude, that's gross
20:47 🔗 yipdw I *know* the URLs have no access control, that is not the point
20:47 🔗 yipdw there is a tremendous difference between saving homepages and saving omegle logs, and the difference is intent to publish
20:48 🔗 xmc ^ seconded
20:48 🔗 yipdw no, there is no HTTP header or browser extension to delineate this
20:48 🔗 yipdw that doesn't fucking mean it doesn't exist
20:50 🔗 yipdw it is also quite possible that Omegle is used by private citizens and we treat their correspondence differently than (say) public figures
20:51 🔗 xmc archiveteam isn't just saving. in essence, we republish. so if something wasn't public and we've made it public, that's on us.
20:52 🔗 xmc publishing new things can ruin lives completely by accident. so we have *always* drawn the line at things that were chosen to be published by a person, and then another person decided to delete.
20:53 🔗 xmc likewise, if someone decides to unpublish their work, that is a thing which we must err on the side of respecting.
20:54 🔗 yipdw well
20:54 🔗 yipdw that's a tricky one but yes
20:54 🔗 xmc it is tricky.
20:54 🔗 xmc we should skew towards respecting the author's wishes
20:55 🔗 xmc we try to not make editorial decisions
20:55 🔗 xmc this means a few things
20:55 🔗 SketchCow ?
20:55 🔗 xmc 1. decision to archive is made based on risk and impact, not approval-of-content or perceived value
20:56 🔗 xmc uh, i forget what i was going to say for 2 and 3
20:56 🔗 SketchCow My attitude is that if there's stuff that makes people quake in their dishwashing soap about saving as an Archiveteam thing, someone who disagrees can use the tools and do the work.
20:56 🔗 xmc yes
20:56 🔗 SketchCow Doesn't mean we need to be doing it. Especially something minor in terms of processing power.
20:56 🔗 xmc but under the archiveteam banner we shouldn't be sucking out semi-private things that weren't publicly displayed
20:57 🔗 SketchCow There's no reason to
20:57 🔗 xmc like, should we archive the list of imeis that weev sucked out of the at&t website hole
20:57 🔗 xmc if he posts it, maybe. but we don't need to go do the same shit.
20:57 🔗 SketchCow No reason to.
20:57 🔗 xmc right
20:57 🔗 SketchCow Someone else can hand over a blackbox "crap hackerz got" thing to the archive, it'll go dark, or not, or whatever
20:57 🔗 xmc exactly what i'm saying
20:58 🔗 xmc we have a reputation, which we've earned by not being dickheads
20:58 🔗 xmc and also being effective
20:58 🔗 ndizzle has joined #archiveteam-bs
20:58 🔗 SketchCow ...we're kind of dickheads
20:58 🔗 SketchCow But we're not ridiculously sociopathic dickheads
20:59 🔗 xmc RIGHT
20:59 🔗 SketchCow Talking like whatever that exiled nutter is who talks in third person
20:59 🔗 JW_work has joined #archiveteam-bs
20:59 🔗 xmc you and me and yipdw all are in 100% concordance here
21:01 🔗 SketchCow Always good to revisit premises
21:01 🔗 SketchCow Remember why we got into it
21:01 🔗 SketchCow FOR A DECADE
21:01 🔗 xmc holy shit you're right
21:01 🔗 xmc but yeah
21:02 🔗 JW_work all this makes sense to me, FWIW
21:02 🔗 xmc original purpose of archiveteam: individual humans decide to publish something, corporations decide to delete it.
21:02 🔗 ndiddy has quit IRC (Read error: Operation timed out)
21:02 🔗 JW_work and there's an important difference between *privately* making a copy of something (and even passing it on to IA as a dark archive) and publishing it for free download
21:06 🔗 hook54321 It technically is public, but I agree that we should try to not damage our reputation.
21:07 🔗 JW_work1 has quit IRC (Ping timeout: 633 seconds)
21:07 🔗 Frogging It's public but not *published*
21:08 🔗 Frogging it's like someone leaving their house unlocked
21:08 🔗 SketchCow I think the thing is, we naturally have acquired/encouraged a general fog of nerds really into saving shit.
21:08 🔗 SketchCow Some of that saving and working isn't what we're into, but it's out there and people have skills, etc.
21:09 🔗 xmc hook54321: if you want to save it, sure. but if you publish it you're being really rude. and don't do it under the archiveteam name.
21:10 🔗 xmc make sense?
21:10 🔗 dashcloud has quit IRC (Read error: Operation timed out)
21:11 🔗 SketchCow It's technically public
21:13 🔗 xmc and you can see the tax return on my desk if you have a drone with a camera
21:13 🔗 dashcloud has joined #archiveteam-bs
21:14 🔗 xmc sometimes the line between published and unintentionally public is vague. but i think this example is fairly straightforward.
21:15 🔗 hook54321 searching for "site:logs.omegle.com" gives you lots of logs, google isn't exactly a drone. I see what you mean though, I think the logs themselves are images for some reason, so they shouldn't be OCR-ed.
21:16 🔗 hook54321 I see how this could potentially damage our reputation though.
21:16 🔗 Frogging surprised there's no robots.txt
21:17 🔗 hook54321 Me too.
21:17 🔗 yipdw never ascribe to malice what can be explained by incompetence or indifference
21:18 🔗 xmc are you really that dense. it's not about our reputation, it's about being decent members of society.
21:22 🔗 hook54321 I know, but there are uses for the logs other than what some assume they are used for. For example, Yik Yak has given researchers access to users' posts.
21:23 🔗 Frogging did omegle do that?
21:24 🔗 yipdw yeah, YikYak did that. it's also an action that's hard to reconcile with YikYak initially being sold as a geographically isolated, anonymous messenger
21:26 🔗 hook54321 idk if omegle has done it or not. There are tons of things about Yik Yak that are hard to reconcile.
21:29 🔗 JW_work has quit IRC (Quit: Leaving.)
21:29 🔗 yipdw anyway, whatever yikyak did or didn't do is tangential. the point I wanted to make is that Omegle isn't a publishing platform, and collating those logs and making them more conveniently available isn't just "I did a bunch of GET requests and shoved them all into this file"
21:31 🔗 yipdw two people using (text-mode) Omegle are deidentified. this doesn't, however, mean that they suddenly have perfect operational security. if you make a bunch of logs more accessible you run the risk of making it possible to identify Omegle users and subject them to the full range of social badness that humans can deliver
21:32 🔗 yipdw that doesn't seem like a particularly productive thing to do
21:32 🔗 JW_work has joined #archiveteam-bs
21:32 🔗 yipdw but I've exceeded my text quota so I'm done here
21:34 🔗 tomwsmf_ has quit IRC (Read error: Operation timed out)
21:34 🔗 yipdw (sidenote: even if they had perfect opsec, I'd still feel like it was wrong; the eavesdropper is an adversary. that's probably not the position you want to be in)
21:50 🔗 tomwsmf_ has joined #archiveteam-bs
22:14 🔗 kyounko has joined #archiveteam-bs
22:14 🔗 kyounko has left
22:56 🔗 dashcloud has quit IRC (Read error: Operation timed out)
23:00 🔗 dashcloud has joined #archiveteam-bs
23:17 🔗 Start has joined #archiveteam-bs
23:22 🔗 GE has quit IRC (Quit: zzz)
23:29 🔗 kristian_ has joined #archiveteam-bs
23:31 🔗 kristian_ hi all
23:31 🔗 kristian_ did you see the news on the North Korean "Internet" being leaked?
23:31 🔗 Frogging yep, we're grabbing them
23:33 🔗 kristian_ cool!
23:34 🔗 kristian_ btw, I was thinking of something ... could a project perhaps be made where people DL their own Facebook?
23:36 🔗 kristian_ not just their posts, but the experience somehow
23:36 🔗 kristian_ if that makes sense ...
23:44 🔗 zhongfu has quit IRC (Remote host closed the connection)
23:45 🔗 zhongfu has joined #archiveteam-bs
23:47 🔗 hook54321 kristian_: which parts of the experience?
23:47 🔗 kristian_ that of the user, hook54321
23:48 🔗 kristian_ this is something that will be lost when FB is gone
23:48 🔗 hook54321 that could potentially be all of facebook
23:48 🔗 kristian_ yeah
23:48 🔗 hook54321 also, part of the experience is interacting with other people :P
23:48 🔗 kristian_ but if you could save something that would let people click around for half an hour or so
23:49 🔗 Frogging I think he means being able to browse it like it was the website
23:49 🔗 kristian_ well ... we all know what happens to people, hook54321
23:49 🔗 kristian_ that's one way of putting it, Frogging
23:50 🔗 hook54321 archive.is kinda has better support for archiving pages on facebook
23:50 🔗 hook54321 Only public stuff though
23:56 🔗 JesseW has joined #archiveteam-bs
