#archiveteam-bs 2017-06-26,Mon

Time Nickname Message
00:02 🔗 nyany quick question
00:03 🔗 nyany I can't remember if this was us or not, but a while back there was an archiving tool to basically scrape images from certain image hosting websites, one such was I believe, prntscr
00:04 🔗 nyany have we given any thought to archiving sites like imgur?
00:44 🔗 icedice has quit IRC (Ping timeout: 740 seconds)
00:54 🔗 j08nY has quit IRC (Quit: Leaving)
00:58 🔗 BlueMaxim has joined #archiveteam-bs
01:36 🔗 schbirid2 has joined #archiveteam-bs
01:40 🔗 schbirid has quit IRC (Read error: Operation timed out)
02:05 🔗 fie has quit IRC (Read error: Operation timed out)
02:19 🔗 fie has joined #archiveteam-bs
02:28 🔗 kristian_ has quit IRC (Quit: Leaving)
02:57 🔗 wacky_ has quit IRC (Read error: Operation timed out)
02:57 🔗 wacky_ has joined #archiveteam-bs
02:57 🔗 HUBI has quit IRC (Read error: Operation timed out)
02:57 🔗 balrog has quit IRC (Read error: Operation timed out)
02:57 🔗 HUBI has joined #archiveteam-bs
02:57 🔗 sep332 has joined #archiveteam-bs
02:58 🔗 midas1 has quit IRC (Read error: Operation timed out)
02:58 🔗 midas1 has joined #archiveteam-bs
02:58 🔗 dashcloud has quit IRC (Read error: Operation timed out)
02:58 🔗 kanzure has quit IRC (Read error: Operation timed out)
02:58 🔗 chazchaz_ has quit IRC (Read error: Operation timed out)
02:58 🔗 kurt has quit IRC (Read error: Operation timed out)
02:58 🔗 whydomain has quit IRC (Read error: Operation timed out)
02:58 🔗 closure has quit IRC (Read error: Operation timed out)
02:58 🔗 sep332_ has quit IRC (Read error: Operation timed out)
02:58 🔗 kanzure has joined #archiveteam-bs
02:59 🔗 closure has joined #archiveteam-bs
02:59 🔗 whydomain has joined #archiveteam-bs
02:59 🔗 Yurume has quit IRC (Read error: Operation timed out)
02:59 🔗 jerrystie has quit IRC (Read error: Operation timed out)
03:00 🔗 balrog has joined #archiveteam-bs
03:00 🔗 swebb sets mode: +o balrog
03:00 🔗 Yurume has joined #archiveteam-bs
03:01 🔗 godane SketchCow: i'm watching your twitch stream
03:02 🔗 dashcloud has joined #archiveteam-bs
03:02 🔗 c4rc4s has quit IRC (Ping timeout: 506 seconds)
03:04 🔗 c4rc4s has joined #archiveteam-bs
03:06 🔗 jrwr godane: He has a twitch?
03:06 🔗 jrwr also SketchCow https://ubidestates.hibid.com/catalog/103245/radioshack-auction--1/
03:06 🔗 jrwr All kinds of old radio shack crap up for sale, including lots of old books
03:06 🔗 godane https://www.twitch.tv/textfilesdotcom
03:12 🔗 JerryStie has joined #archiveteam-bs
03:13 🔗 kurt has joined #archiveteam-bs
03:55 🔗 qw3rty2 has joined #archiveteam-bs
03:56 🔗 pizzaiolo has quit IRC (Quit: pizzaiolo)
04:02 🔗 qw3rty has quit IRC (Read error: Operation timed out)
04:30 🔗 kristian_ has joined #archiveteam-bs
04:37 🔗 Sk1d has quit IRC (Ping timeout: 250 seconds)
04:44 🔗 Sk1d has joined #archiveteam-bs
04:56 🔗 Stiletto has quit IRC (Read error: Operation timed out)
04:57 🔗 BlueMaxim has quit IRC (Read error: Operation timed out)
04:57 🔗 BlueMaxim has joined #archiveteam-bs
05:01 🔗 Stilett0 has joined #archiveteam-bs
06:19 🔗 godane has quit IRC (Ping timeout: 245 seconds)
06:38 🔗 bwn has quit IRC (Read error: Connection reset by peer)
06:48 🔗 bwn has joined #archiveteam-bs
07:14 🔗 bwn has quit IRC (Read error: Operation timed out)
07:22 🔗 bwn has joined #archiveteam-bs
08:17 🔗 godane has joined #archiveteam-bs
09:10 🔗 SHODAN_UI has joined #archiveteam-bs
09:13 🔗 kristian_ has quit IRC (Quit: Leaving)
09:24 🔗 BlueMaxim has quit IRC (Read error: Operation timed out)
09:38 🔗 SHODAN_UI has quit IRC (Remote host closed the connection)
09:46 🔗 j08nY has joined #archiveteam-bs
10:29 🔗 JAA Whoa, my Tilt API grab has retrieved about 360k URLs (about 1 GB as .warc.gz) and discovered over 150k additional users and about 4600 campaigns already. Currently, there are about 2.2 million URLs in the queue, and it's running at about 30k URLs per hour. Well, this will take a few days.
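The "few days" estimate follows from the figures JAA quotes. A minimal sketch of the arithmetic, assuming the queue stops growing and the rate stays constant (neither is guaranteed, since the grab keeps discovering new users and campaigns); the function name is hypothetical:

```python
def hours_to_drain(queued_urls, urls_per_hour):
    """Time to empty the queue at a constant rate, ignoring newly discovered URLs."""
    return queued_urls / urls_per_hour

# Figures quoted in the message above: ~2.2 million queued, ~30k URLs per hour.
hours = hours_to_drain(2_200_000, 30_000)
days = hours / 24
print(f"about {hours:.0f} hours, or roughly {days:.1f} days")
```

This is a lower bound: every newly discovered URL pushes the finish time out further.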
10:40 🔗 Jonison has joined #archiveteam-bs
11:31 🔗 pizzaiolo has joined #archiveteam-bs
11:44 🔗 SHODAN_UI has joined #archiveteam-bs
12:33 🔗 thuban3 has joined #archiveteam-bs
12:35 🔗 thuban2 has quit IRC (Read error: Operation timed out)
13:20 🔗 thuban4 has joined #archiveteam-bs
13:25 🔗 thuban3 has quit IRC (Read error: Operation timed out)
13:33 🔗 SHODAN_UI has quit IRC (Remote host closed the connection)
13:47 🔗 pizzaiolo has quit IRC (Ping timeout: 260 seconds)
13:54 🔗 pnJay has joined #archiveteam-bs
14:05 🔗 SHODAN_UI has joined #archiveteam-bs
14:44 🔗 pizzaiolo has joined #archiveteam-bs
14:53 🔗 thuban has joined #archiveteam-bs
14:58 🔗 thuban4 has quit IRC (Read error: Operation timed out)
15:02 🔗 dashcloud has quit IRC (Read error: Operation timed out)
15:03 🔗 dashcloud has joined #archiveteam-bs
15:14 🔗 username1 has joined #archiveteam-bs
15:16 🔗 schbirid2 has quit IRC (Read error: Operation timed out)
15:55 🔗 schbirid2 has joined #archiveteam-bs
15:58 🔗 username1 has quit IRC (Read error: Operation timed out)
16:01 🔗 schbirid2 has quit IRC (Read error: Operation timed out)
16:05 🔗 TheLovina has joined #archiveteam-bs
16:08 🔗 schbirid2 has joined #archiveteam-bs
16:13 🔗 sep332_ has joined #archiveteam-bs
16:14 🔗 sep332 has quit IRC (Read error: Operation timed out)
16:19 🔗 mgrytbak has quit IRC (Read error: Operation timed out)
16:19 🔗 useretail has quit IRC (Read error: Operation timed out)
16:19 🔗 mgrytbak has joined #archiveteam-bs
16:20 🔗 sep332_ has quit IRC (Read error: Operation timed out)
16:20 🔗 useretail has joined #archiveteam-bs
16:30 🔗 SHODAN_UI has quit IRC (Remote host closed the connection)
17:08 🔗 odemg_ has joined #archiveteam-bs
17:11 🔗 odemg has quit IRC (Read error: Operation timed out)
17:34 🔗 bitBaron has joined #archiveteam-bs
17:37 🔗 bitBaron has quit IRC (Client Quit)
17:38 🔗 bitBaron has joined #archiveteam-bs
17:38 🔗 bitBaron has quit IRC (Read error: Connection reset by peer)
17:38 🔗 bitBaron has joined #archiveteam-bs
17:53 🔗 ivan HCross2: yeah I just saw the grab-site thing you mentioned in the other channel, crazy stuff
17:56 🔗 ivan HCross2: if you have time and you have evidence that it's making thousands of useless DNS lookups this might be a good bug to file on crbug
17:56 🔗 ivan I think it would make sense for chromium to give up on predicting at some point
17:56 🔗 HCross2 ivan: I've got crawl logs and packet captures which match up
17:58 🔗 ivan huh https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/X-DNS-Prefetch-Control
17:58 🔗 ivan https://www.chromium.org/developers/design-documents/dns-prefetching has it too
18:00 🔗 timmc has left
18:02 🔗 ivan HCross2: can you add <meta http-equiv="x-dns-prefetch-control" content="off"> after the other meta tags in your installed libgrabsite/dashboard.html and tell me if that stops the lookups?
18:06 🔗 ivan JAA: re: queue slowness wpull might be doing an fsync frequently, there's some PRAGMA in grab-site that turns it off, you might try that
18:07 🔗 ivan libgrabsite/plugin.py NoFsyncSQLTable
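For context, SQLite exposes this behaviour through `PRAGMA synchronous`. The sketch below is not grab-site's `NoFsyncSQLTable` code; it only illustrates, with Python's standard `sqlite3` module, what turning the setting off looks like. The function name, table name, and file path are made up for the example.

```python
import os
import sqlite3
import tempfile

def open_without_fsync(path):
    """Open a SQLite database and ask it not to wait for fsync on commit."""
    conn = sqlite3.connect(path)
    # synchronous=OFF hands writes to the OS and returns immediately.
    # Faster, but an OS crash or power loss can corrupt the database.
    conn.execute("PRAGMA synchronous=OFF")
    return conn

# Hypothetical queue database in a throwaway directory.
db_path = os.path.join(tempfile.mkdtemp(), "queue.db")
conn = open_without_fsync(db_path)
conn.execute("CREATE TABLE IF NOT EXISTS queue (url TEXT PRIMARY KEY)")
conn.execute("INSERT INTO queue VALUES (?)", ("http://example.com/",))
conn.commit()

# Reading the pragma back returns 0 when synchronous is OFF.
sync_level = conn.execute("PRAGMA synchronous").fetchone()[0]
row_count = conn.execute("SELECT COUNT(*) FROM queue").fetchone()[0]
```

The trade-off is durability for speed, which is usually acceptable for a crawl queue that can be rebuilt or resumed.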
18:09 🔗 ivan HCross2: heh easy to test with chrome://net-internals/#dns I am seeing what you saw now
18:10 🔗 HCross2 ivan: I saw my home DNS server go ballistic
18:12 🔗 ivan meta tag seems to work
18:18 🔗 JAA ivan: Thanks, I'll have a look.
18:19 🔗 ivan HCross2: alright fixed in grab-site 1.2.3 thanks for finding this
18:19 🔗 SHODAN_UI has joined #archiveteam-bs
18:29 🔗 arkiver this should probably be fixed on the archivebot dashboard too
18:30 🔗 ivan HCross2: can I credit you somehow? github username?
18:33 🔗 ivan hah found you
18:35 🔗 HCross2 HarryC145
18:36 🔗 MrRadar How many of you are there??? :P
18:38 🔗 bitBaron_ has joined #archiveteam-bs
18:39 🔗 bitBaron has quit IRC (Ping timeout: 250 seconds)
18:44 🔗 bitBaron has joined #archiveteam-bs
18:44 🔗 bitBaron_ has quit IRC (Read error: Connection reset by peer)
18:46 🔗 Boppen has quit IRC (Ping timeout: 194 seconds)
18:46 🔗 Boppen has joined #archiveteam-bs
19:18 🔗 HCross2 ivan: also credit madyoda in github please, he helped me initially diagnose it
19:47 🔗 ivan done
19:55 🔗 kristian_ has joined #archiveteam-bs
20:21 🔗 kristian_ has quit IRC (Quit: Leaving)
20:39 🔗 HCross2 phantomjs + Magento web shops = terrible idea
21:00 🔗 pnJay has quit IRC (Quit: Leaving)
21:00 🔗 schbirid2 has quit IRC (Quit: Leaving)
21:24 🔗 SHODAN_UI has quit IRC (Remote host closed the connection)
21:48 🔗 Asparagir has quit IRC (Asparagir)
22:04 🔗 bitBaron has quit IRC (Read error: Operation timed out)
22:04 🔗 powerArch has quit IRC (Remote host closed the connection)
22:04 🔗 bitBaron has joined #archiveteam-bs
22:05 🔗 powerArch has joined #archiveteam-bs
22:40 🔗 Soni has quit IRC (Ping timeout: 272 seconds)
22:42 🔗 Jonison has quit IRC (Read error: Connection reset by peer)
22:43 🔗 JAA Just over 24 hours into my Tilt API grab: 735k URLs retrieved for a .warc.gz of about 2 GiB, 3.1M URLs queued, 270k users and 8k campaigns newly discovered (seeded with 30k users and 53k campaigns).
22:46 🔗 JAA Something I forgot to mention before: I also extract all URLs from the JSON so I can grab those later. They're mostly images and shortened links like http://til.tt/yeWm . 670k such URLs found so far.
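JAA does not say how the URLs are pulled out of the JSON. One plausible approach, sketched here as an assumption rather than the actual method, is to walk the decoded structure and run a simple URL pattern over every string value. The regex, the function name, and the sample payload are all hypothetical; only the til.tt link comes from the log.

```python
import json
import re

# Deliberately loose pattern: stops at whitespace, quotes, and angle brackets.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")

def extract_urls(node):
    """Recursively collect URLs from every string value in decoded JSON."""
    found = set()
    if isinstance(node, dict):
        for value in node.values():
            found |= extract_urls(value)
    elif isinstance(node, list):
        for item in node:
            found |= extract_urls(item)
    elif isinstance(node, str):
        found.update(URL_RE.findall(node))
    return found

# Made-up API response, for illustration only.
sample = json.loads(
    '{"user": {"img": "https://example.com/a.png",'
    ' "bio": "see http://til.tt/yeWm for more"},'
    ' "tags": ["x", 1, null]}'
)
urls = extract_urls(sample)
```

Collecting into a set deduplicates links that appear in several responses, which matters when the results are queued for a later grab.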
22:48 🔗 Soni has joined #archiveteam-bs
23:42 🔗 andai_ has joined #archiveteam-bs
23:42 🔗 andai has quit IRC (Read error: Connection reset by peer)
23:44 🔗 Crusher_ has joined #archiveteam-bs
23:45 🔗 andai_ has quit IRC (Read error: Connection reset by peer)
23:45 🔗 andai has joined #archiveteam-bs
23:45 🔗 Crusher_ Any word on when the Vine project is going to continue?
23:49 🔗 Crusher_ has quit IRC (Read error: Connection reset by peer)
