#archiveteam-bs 2017-06-26,Mon


Who | What | When
<nyany> quick question
I can't remember if this was us or not, but a while back there was an archiving tool to basically scrape images from certain image hosting websites, one such was I believe, prntscr
have we given any thought to archiving sites like imgur?
[00:02]
......... (idle for 40mn)
*** icedice has quit IRC (Ping timeout: 740 seconds) [00:44]
j08nY has quit IRC (Quit: Leaving)
BlueMaxim has joined #archiveteam-bs
[00:54]
........ (idle for 38mn)
schbirid2 has joined #archiveteam-bs
schbirid has quit IRC (Read error: Operation timed out)
[01:36]
...... (idle for 25mn)
fie has quit IRC (Read error: Operation timed out) [02:05]
fie has joined #archiveteam-bs [02:19]
kristian_ has quit IRC (Quit: Leaving) [02:28]
...... (idle for 29mn)
wacky_ has quit IRC (Read error: Operation timed out)
wacky_ has joined #archiveteam-bs
HUBI has quit IRC (Read error: Operation timed out)
balrog has quit IRC (Read error: Operation timed out)
HUBI has joined #archiveteam-bs
sep332 has joined #archiveteam-bs
midas1 has quit IRC (Read error: Operation timed out)
midas1 has joined #archiveteam-bs
dashcloud has quit IRC (Read error: Operation timed out)
kanzure has quit IRC (Read error: Operation timed out)
chazchaz_ has quit IRC (Read error: Operation timed out)
kurt has quit IRC (Read error: Operation timed out)
whydomain has quit IRC (Read error: Operation timed out)
closure has quit IRC (Read error: Operation timed out)
sep332_ has quit IRC (Read error: Operation timed out)
kanzure has joined #archiveteam-bs
closure has joined #archiveteam-bs
whydomain has joined #archiveteam-bs
Yurume has quit IRC (Read error: Operation timed out)
jerrystie has quit IRC (Read error: Operation timed out)
balrog has joined #archiveteam-bs
swebb sets mode: +o balrog
Yurume has joined #archiveteam-bs
[02:57]
<godane> SketchCow: i'm watching your twitch stream [03:01]
*** dashcloud has joined #archiveteam-bs
c4rc4s has quit IRC (Ping timeout: 506 seconds)
c4rc4s has joined #archiveteam-bs
[03:02]
<jrwr> godane: He has a twitch?
also SketchCow https://ubidestates.hibid.com/catalog/103245/radioshack-auction--1/
All kinds of old radio shack crap up for sale, including lots of old books
[03:06]
<godane> https://www.twitch.tv/textfilesdotcom [03:06]
*** JerryStie has joined #archiveteam-bs
kurt has joined #archiveteam-bs
[03:12]
......... (idle for 42mn)
qw3rty2 has joined #archiveteam-bs
pizzaiolo has quit IRC (Quit: pizzaiolo)
[03:55]
qw3rty has quit IRC (Read error: Operation timed out) [04:02]
...... (idle for 28mn)
kristian_ has joined #archiveteam-bs [04:30]
Sk1d has quit IRC (Ping timeout: 250 seconds) [04:37]
Sk1d has joined #archiveteam-bs [04:44]
Stiletto has quit IRC (Read error: Operation timed out)
BlueMaxim has quit IRC (Read error: Operation timed out)
BlueMaxim has joined #archiveteam-bs
Stilett0 has joined #archiveteam-bs
[04:56]
................ (idle for 1h18mn)
godane has quit IRC (Ping timeout: 245 seconds) [06:19]
.... (idle for 19mn)
bwn has quit IRC (Read error: Connection reset by peer) [06:38]
bwn has joined #archiveteam-bs [06:48]
...... (idle for 26mn)
bwn has quit IRC (Read error: Operation timed out) [07:14]
bwn has joined #archiveteam-bs [07:22]
............ (idle for 55mn)
godane has joined #archiveteam-bs [08:17]
........... (idle for 53mn)
SHODAN_UI has joined #archiveteam-bs
kristian_ has quit IRC (Quit: Leaving)
[09:10]
BlueMaxim has quit IRC (Read error: Operation timed out) [09:24]
SHODAN_UI has quit IRC (Remote host closed the connection) [09:38]
j08nY has joined #archiveteam-bs [09:46]
......... (idle for 43mn)
<JAA> Whoa, my Tilt API grab has retrieved about 360k URLs (about 1 GB as .warc.gz) and discovered over 150k additional users and about 4600 campaigns already. Currently, there are about 2.2 million URLs in the queue, and it's running at about 30k URLs per hour. Well, this will take a few days. [10:29]
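The "few days" estimate follows from the two figures quoted in the message. A minimal sketch of the arithmetic, assuming the queue stays at its current size (it will not, since the grab keeps discovering users and campaigns) and that throughput holds at the stated rate; `hours_to_drain` is a hypothetical helper, not part of any tool mentioned here:

```python
def hours_to_drain(queued_urls: int, urls_per_hour: int) -> float:
    """Hours needed to empty a queue at a constant fetch rate."""
    return queued_urls / urls_per_hour


# Figures quoted in the log: ~2.2 million queued URLs at ~30k URLs per hour.
hours = hours_to_drain(2_200_000, 30_000)
days = hours / 24
print(f"{hours:.1f} hours, roughly {days:.1f} days")
```

This is only a lower bound: every newly discovered user or campaign adds URLs to the queue while it drains.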
*** Jonison has joined #archiveteam-bs [10:40]
........... (idle for 51mn)
pizzaiolo has joined #archiveteam-bs [11:31]
SHODAN_UI has joined #archiveteam-bs [11:44]
.......... (idle for 49mn)
thuban3 has joined #archiveteam-bs
thuban2 has quit IRC (Read error: Operation timed out)
[12:33]
.......... (idle for 45mn)
thuban4 has joined #archiveteam-bs [13:20]
thuban3 has quit IRC (Read error: Operation timed out) [13:25]
SHODAN_UI has quit IRC (Remote host closed the connection) [13:33]
pizzaiolo has quit IRC (Ping timeout: 260 seconds) [13:47]
pnJay has joined #archiveteam-bs [13:54]
SHODAN_UI has joined #archiveteam-bs [14:05]
........ (idle for 39mn)
pizzaiolo has joined #archiveteam-bs [14:44]
thuban has joined #archiveteam-bs [14:53]
thuban4 has quit IRC (Read error: Operation timed out)
dashcloud has quit IRC (Read error: Operation timed out)
dashcloud has joined #archiveteam-bs
[14:58]
username1 has joined #archiveteam-bs
schbirid2 has quit IRC (Read error: Operation timed out)
[15:14]
........ (idle for 39mn)
schbirid2 has joined #archiveteam-bs
username1 has quit IRC (Read error: Operation timed out)
schbirid2 has quit IRC (Read error: Operation timed out)
TheLovina has joined #archiveteam-bs
schbirid2 has joined #archiveteam-bs
[15:55]
sep332_ has joined #archiveteam-bs
sep332 has quit IRC (Read error: Operation timed out)
[16:13]
mgrytbak has quit IRC (Read error: Operation timed out)
useretail has quit IRC (Read error: Operation timed out)
mgrytbak has joined #archiveteam-bs
sep332_ has quit IRC (Read error: Operation timed out)
useretail has joined #archiveteam-bs
[16:19]
SHODAN_UI has quit IRC (Remote host closed the connection) [16:30]
........ (idle for 38mn)
odemg_ has joined #archiveteam-bs
odemg has quit IRC (Read error: Operation timed out)
[17:08]
..... (idle for 23mn)
bitBaron has joined #archiveteam-bs
bitBaron has quit IRC (Client Quit)
bitBaron has joined #archiveteam-bs
bitBaron has quit IRC (Read error: Connection reset by peer)
bitBaron has joined #archiveteam-bs
[17:34]
.... (idle for 15mn)
<ivan> HCross2: yeah I just saw the grab-site thing you mentioned in the other channel, crazy stuff
HCross2: if you have time and you have evidence that it's making thousands of useless DNS lookups this might be a good bug to file on crbug
I think it would make sense for chromium to give up on predicting at some point
[17:53]
<HCross2> ivan: I've got crawl logs and packet captures which match up [17:56]
<ivan> huh https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/X-DNS-Prefetch-Control
https://www.chromium.org/developers/design-documents/dns-prefetching has it too
[17:58]
*** timmc has left [18:00]
<ivan> HCross2: can you add <meta http-equiv="x-dns-prefetch-control" content="off"> after the other meta tags in your installed libgrabsite/dashboard.html and tell me if that stops the lookups?
JAA: re: queue slowness wpull might be doing an fsync frequently, there's some PRAGMA in grab-site that turns it off, you might try that
libgrabsite/plugin.py NoFsyncSQLTable
HCross2: heh easy to test with chrome://net-internals/#dns I am seeing what you saw now
[18:02]
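The fsync suggestion above can be illustrated with plain SQLite. This is a sketch, not grab-site's actual `NoFsyncSQLTable` code: the log only says "some PRAGMA", and the sketch assumes it is `PRAGMA synchronous = OFF`, the standard SQLite setting that stops the database from waiting on fsync. The `open_without_fsync` helper and the `queued_urls` table are hypothetical.

```python
import os
import sqlite3
import tempfile


def open_without_fsync(path: str) -> sqlite3.Connection:
    """Open a SQLite database and stop it from syncing to disk on commit.

    Assumption: the PRAGMA mentioned in the log is `synchronous`.
    With it OFF, commits are faster but a power loss or OS crash
    can corrupt the database.
    """
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA synchronous = OFF")
    return conn


with tempfile.TemporaryDirectory() as tmp:
    conn = open_without_fsync(os.path.join(tmp, "queue.db"))
    # Hypothetical queue table, only here to exercise a write and a commit.
    conn.execute("CREATE TABLE queued_urls (url TEXT PRIMARY KEY, status TEXT)")
    conn.execute(
        "INSERT INTO queued_urls VALUES (?, ?)",
        ("https://example.com/", "todo"),
    )
    conn.commit()
    # 0 means OFF; a file-backed database normally defaults to 2 (FULL).
    sync_level = conn.execute("PRAGMA synchronous").fetchone()[0]
    row_count = conn.execute("SELECT COUNT(*) FROM queued_urls").fetchone()[0]
    conn.close()

print(sync_level, row_count)
```

The trade-off is durability for speed, which is usually acceptable for a crawl queue that can be rebuilt or resumed.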
<HCross2> ivan: I saw my home DNS server go ballistic [18:10]
<ivan> meta tag seems to work [18:12]
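For anyone applying the same workaround by script rather than by hand, the edit ivan describes (placing the tag after the other meta tags) might look like the sketch below. `disable_dns_prefetch` and the sample markup are hypothetical; the real `libgrabsite/dashboard.html` is not reproduced here, and the actual fix shipped in grab-site 1.2.3 may differ.

```python
import re

# The tag quoted in the log.
PREFETCH_OFF = '<meta http-equiv="x-dns-prefetch-control" content="off">'


def disable_dns_prefetch(html: str) -> str:
    """Insert the prefetch-control tag after the last existing <meta> tag.

    Falls back to inserting right after <head> if there are no meta tags,
    and returns the input unchanged if the tag is already present or no
    <head> element is found.
    """
    if "x-dns-prefetch-control" in html.lower():
        return html
    metas = list(re.finditer(r"<meta\b[^>]*>", html, flags=re.IGNORECASE))
    anchor = metas[-1] if metas else re.search(
        r"<head\b[^>]*>", html, flags=re.IGNORECASE
    )
    if anchor is None:
        return html
    pos = anchor.end()
    return html[:pos] + "\n" + PREFETCH_OFF + html[pos:]


# Hypothetical stand-in for a dashboard page.
sample = (
    "<html><head>\n"
    '<meta charset="utf-8">\n'
    "<title>dashboard</title>\n"
    "</head><body></body></html>"
)
patched = disable_dns_prefetch(sample)
print(patched)
```

Running the function twice leaves the page unchanged the second time, so it is safe to reapply after an upgrade.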
<JAA> ivan: Thanks, I'll have a look. [18:18]
<ivan> HCross2: alright fixed in grab-site 1.2.3 thanks for finding this [18:19]
*** SHODAN_UI has joined #archiveteam-bs [18:19]
<arkiver> this should probably be fixed on the archivebot dashboard too [18:29]
<ivan> HCross2: can I credit you somehow? github username?
hah found you
[18:30]
<HCross2> HarryC145 [18:35]
<MrRadar> How many of you are there??? :P [18:36]
*** bitBaron_ has joined #archiveteam-bs
bitBaron has quit IRC (Ping timeout: 250 seconds)
[18:38]
bitBaron has joined #archiveteam-bs
bitBaron_ has quit IRC (Read error: Connection reset by peer)
Boppen has quit IRC (Ping timeout: 194 seconds)
Boppen has joined #archiveteam-bs
[18:44]
....... (idle for 32mn)
<HCross2> ivan: also credit madyoda in github please, he helped me initially diagnose it [19:18]
...... (idle for 29mn)
<ivan> done [19:47]
*** kristian_ has joined #archiveteam-bs [19:55]
...... (idle for 26mn)
kristian_ has quit IRC (Quit: Leaving) [20:21]
.... (idle for 18mn)
<HCross2> phantomjs + Magento web shops = terrible idea [20:39]
..... (idle for 21mn)
*** pnJay has quit IRC (Quit: Leaving)
schbirid2 has quit IRC (Quit: Leaving)
[21:00]
..... (idle for 24mn)
SHODAN_UI has quit IRC (Remote host closed the connection) [21:24]
..... (idle for 24mn)
Asparagir has quit IRC (Asparagir) [21:48]
.... (idle for 16mn)
bitBaron has quit IRC (Read error: Operation timed out)
powerArch has quit IRC (Remote host closed the connection)
bitBaron has joined #archiveteam-bs
powerArch has joined #archiveteam-bs
[22:04]
........ (idle for 35mn)
Soni has quit IRC (Ping timeout: 272 seconds)
Jonison has quit IRC (Read error: Connection reset by peer)
[22:40]
<JAA> Just over 24 hours into my Tilt API grab: 735k URLs retrieved for a .warc.gz of about 2 GiB, 3.1M urls queued, 270k users and 8k campaigns newly discovered (seeded with 30k users and 53k campaigns).
Something I forgot to mention before: I also extract all URLs from the JSON so I can grab those later. They're mostly images and shortened links like http://til.tt/yeWm . 670k such URLs found so far.
[22:43]
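Pulling every URL out of API responses, as JAA describes, can be sketched as a recursive walk over the parsed JSON. This is an illustration only, not JAA's actual code: the `extract_urls` helper, the regex and the sample payload (including its field names) are all assumptions, since the Tilt API's response format is not shown in the log.

```python
import json
import re

# Deliberately loose pattern: anything starting with http(s):// up to
# whitespace, a quote or an angle bracket.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def extract_urls(node) -> set:
    """Collect URLs from every string value in a parsed JSON document."""
    found = set()
    if isinstance(node, dict):
        for value in node.values():
            found |= extract_urls(value)
    elif isinstance(node, list):
        for item in node:
            found |= extract_urls(item)
    elif isinstance(node, str):
        found.update(URL_RE.findall(node))
    return found


# Hypothetical response; only the til.tt link comes from the log.
sample = json.loads(
    """
    {"campaign": {
        "title": "demo",
        "img": "https://example.com/a.png",
        "updates": [{"body": "see http://til.tt/yeWm for details"}]
    }}
    """
)
urls = extract_urls(sample)
print(sorted(urls))
```

Collecting into a set deduplicates links that recur across responses. Object keys are not scanned, only values, which is enough when URLs appear as field contents or inside free text.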
*** Soni has joined #archiveteam-bs [22:48]
........... (idle for 54mn)
andai_ has joined #archiveteam-bs
andai has quit IRC (Read error: Connection reset by peer)
Crusher_ has joined #archiveteam-bs
andai_ has quit IRC (Read error: Connection reset by peer)
andai has joined #archiveteam-bs
[23:42]
<Crusher_> Any word on when the Vine project is going to continue? [23:45]
*** Crusher_ has quit IRC (Read error: Connection reset by peer) [23:49]
