#archiveteam-bs 2017-05-22,Mon

Time Nickname Message
00:17 🔗 ndiddy has joined #archiveteam-bs
00:33 🔗 powerKitt has quit IRC (Ping timeout: 268 seconds)
00:37 🔗 GLaDOS has joined #archiveteam-bs
01:02 🔗 ndiddy has quit IRC ()
01:06 🔗 Sk1d has quit IRC (Ping timeout: 194 seconds)
01:10 🔗 sheaf has quit IRC (Quit: sheaf)
01:11 🔗 Sk1d has joined #archiveteam-bs
01:36 🔗 schbirid2 has joined #archiveteam-bs
01:39 🔗 schbirid has quit IRC (Read error: Operation timed out)
02:04 🔗 Asparagir has joined #archiveteam-bs
02:09 🔗 Odd0002 has joined #archiveteam-bs
03:43 🔗 Lord_Nigh Somebody2: that email i sent to info@archive i never received a response to yet, but that was friday so maybe they'll answer it on monday
03:44 🔗 xmc yes, it is a job, not a lifestyle
03:44 🔗 xmc well
03:44 🔗 xmc you know what i mean
03:49 🔗 Somebody2 Lord_Nigh: It may be longer than that, if it isn't a simple fix.
03:50 🔗 Lord_Nigh i'm guessing its a regression in the robots.txt parser and its a simple/stupid bug, heck the source code to it is probably available, maybe i can fix it...
03:51 🔗 Somebody2 Heh, I'm not sure where the source code for the new version of the Wayback Machine is.
03:51 🔗 Somebody2 If you do come up with a patch, that might be likely to get a response sooner
04:04 🔗 bsmith093 the latest dump of fanfiction.net, 16gb compressed, 54gb uncompressed. 745K stories, https://archive.org/details/Fanfictiondotnet1011dump
04:20 🔗 Lord_Nigh Somebody2: i'm not sure either
04:20 🔗 Lord_Nigh where the source is
07:27 🔗 BlueMaxim has quit IRC (Read error: Operation timed out)
08:38 🔗 Honno has joined #archiveteam-bs
08:48 🔗 Sanqui I noticed WaybackMachine added an "About this capture" thingy
08:48 🔗 Sanqui So you can now identify ArchiveBot crawls
08:55 🔗 GE has joined #archiveteam-bs
09:14 🔗 Honno_ has joined #archiveteam-bs
09:15 🔗 Honno__ has joined #archiveteam-bs
09:19 🔗 Honno has quit IRC (Ping timeout: 370 seconds)
09:20 🔗 Honno_ has quit IRC (Ping timeout: 370 seconds)
09:23 🔗 Honno_ has joined #archiveteam-bs
09:28 🔗 Honno__ has quit IRC (Ping timeout: 370 seconds)
10:30 🔗 GE has quit IRC (Remote host closed the connection)
11:30 🔗 Aoede bsmith093: nice. how do you generate metadata.sqlite?
12:10 🔗 Jonison has joined #archiveteam-bs
12:10 🔗 Jonison has quit IRC (Read error: Connection reset by peer)
12:47 🔗 GE has joined #archiveteam-bs
13:22 🔗 sheaf has joined #archiveteam-bs
15:00 🔗 Fletcher has joined #archiveteam-bs
15:00 🔗 kurt has joined #archiveteam-bs
15:00 🔗 kvieta has joined #archiveteam-bs
15:00 🔗 espes__ has joined #archiveteam-bs
15:00 🔗 SilSte has joined #archiveteam-bs
15:00 🔗 Kenshin has joined #archiveteam-bs
15:00 🔗 w0rp has joined #archiveteam-bs
15:00 🔗 dashcloud has joined #archiveteam-bs
15:00 🔗 HP has joined #archiveteam-bs
15:00 🔗 antonizoo has joined #archiveteam-bs
15:00 🔗 tapedrive has joined #archiveteam-bs
15:00 🔗 eprillios has joined #archiveteam-bs
15:00 🔗 chfoo has joined #archiveteam-bs
15:00 🔗 cf has joined #archiveteam-bs
15:00 🔗 joepie91 has joined #archiveteam-bs
15:00 🔗 brayden has joined #archiveteam-bs
15:00 🔗 hub.dk sets mode: +oo Fletcher brayden
15:00 🔗 swebb sets mode: +o brayden
15:00 🔗 jmtd has joined #archiveteam-bs
15:01 🔗 Smiley has joined #archiveteam-bs
15:13 🔗 Asparagir has quit IRC (Asparagir)
15:40 🔗 RichardG has joined #archiveteam-bs
16:07 🔗 RichardG has quit IRC (Read error: Operation timed out)
16:07 🔗 RichardG has joined #archiveteam-bs
16:31 🔗 powerArch has quit IRC (Remote host closed the connection)
16:33 🔗 RedType_ has quit IRC (Read error: Operation timed out)
16:34 🔗 icedice has joined #archiveteam-bs
17:02 🔗 RichardG has quit IRC (Read error: Operation timed out)
17:02 🔗 RichardG has joined #archiveteam-bs
17:29 🔗 RichardG has quit IRC (Read error: Operation timed out)
17:30 🔗 RichardG has joined #archiveteam-bs
17:33 🔗 ndiddy has joined #archiveteam-bs
17:37 🔗 Asparagir has joined #archiveteam-bs
17:38 🔗 Honno__ has joined #archiveteam-bs
17:39 🔗 GE has quit IRC (Remote host closed the connection)
17:43 🔗 Honno_ has quit IRC (Ping timeout: 370 seconds)
18:12 🔗 RichardG has quit IRC (Read error: Operation timed out)
18:12 🔗 RichardG has joined #archiveteam-bs
18:59 🔗 RichardG has quit IRC (Read error: Operation timed out)
18:59 🔗 RichardG has joined #archiveteam-bs
19:13 🔗 BartoCH has quit IRC (Ping timeout: 260 seconds)
19:36 🔗 GE has joined #archiveteam-bs
19:40 🔗 C4K3_ is now known as C4K3
19:59 🔗 BartoCH has joined #archiveteam-bs
20:04 🔗 BartoCH has quit IRC (Ping timeout: 260 seconds)
20:08 🔗 BartoCH has joined #archiveteam-bs
20:19 🔗 BartoCH has quit IRC (Ping timeout: 260 seconds)
20:34 🔗 BartoCH has joined #archiveteam-bs
20:44 🔗 powerArch has joined #archiveteam-bs
21:11 🔗 RichardG has quit IRC (Read error: Operation timed out)
21:11 🔗 RichardG has joined #archiveteam-bs
21:37 🔗 RedType has joined #archiveteam-bs
21:37 🔗 RichardG has quit IRC (Read error: Operation timed out)
21:37 🔗 RichardG has joined #archiveteam-bs
21:54 🔗 icedice has quit IRC (Ping timeout: 250 seconds)
21:58 🔗 Asparagir im in yr internet archive archiving yr internets
21:58 🔗 Asparagir no seriously, I'm working at Funston today, come say hi if you're around
21:58 🔗 xmc ohai!
21:59 🔗 xmc sets mode: +o Asparagir
21:59 🔗 Asparagir Now I'm all super-powerful, thanks!
21:59 🔗 Asparagir Step two, get me one of those orbs
21:59 🔗 Asparagir Step three, profit.
22:00 🔗 Asparagir The WiFi here is about 165 Mbps. :-O
22:14 🔗 GE has quit IRC (Remote host closed the connection)
22:20 🔗 JAA Alright, my setup for Razer Arena with wpull and PhantomJS seems to work in principle. The main problems are that it still doesn't capture everything (wpull doesn't seem to extract links from the DOM generated by PhantomJS) and that the grab will be quite large due to duplication (each page grabs all the JavaScript, imagery, etc. again through PhantomJS).
22:22 🔗 MrRadar I think arkiver has a script to dedup WARCs
22:24 🔗 JAA Yeah, I guess that shouldn't be too difficult. I'm more concerned about the "doesn't capture everything" part.
22:44 🔗 dashcloud has quit IRC (Remote host closed the connection)
22:46 🔗 dashcloud has joined #archiveteam-bs
23:00 🔗 JAA If anyone has any ideas, please let me know. For the record, I'm using wpull 1.2.3 with PhantomJS 2.1.1 and the options --phantomjs --phantomjs-exe /path/to/phantomjs --no-phantomjs-snapshot .
23:01 🔗 JAA Otherwise, I'll just grab the actual data through the API and ignore the interface.
23:22 🔗 Ravenloft has joined #archiveteam-bs
23:35 🔗 BlueMaxim has joined #archiveteam-bs
23:56 🔗 Odd0002 hmm, is there anywhere other than archive.org that I could go to look for or upload old, late 90's/early 2000's PC games? Archive doesn't seem to have them, and there's almost no information on the internet about these games
23:57 🔗 xmc archive.org is a good place to upload
23:57 🔗 xmc i don't know where a good place to find is though
