[00:33] *** godane has quit IRC (Quit: Leaving.)
[00:33] *** godane has joined #archiveteam-bs
[00:43] *** Fusl has quit IRC (hub.efnet.us irc.efnet.nl)
[00:43] *** nyaomi has quit IRC (hub.efnet.us irc.efnet.nl)
[00:43] *** underscor has quit IRC (hub.efnet.us irc.efnet.nl)
[00:43] *** SmileyG has quit IRC (hub.efnet.us irc.efnet.nl)
[00:43] *** MrRadar2 has quit IRC (hub.efnet.us irc.efnet.nl)
[00:43] *** tsr has quit IRC (hub.efnet.us irc.efnet.nl)
[00:43] *** BnAboyZ has quit IRC (hub.efnet.us irc.efnet.nl)
[00:43] *** Sue has quit IRC (hub.efnet.us irc.efnet.nl)
[00:43] *** w0rp has quit IRC (hub.efnet.us irc.efnet.nl)
[00:43] *** Spydar007 has quit IRC (hub.efnet.us irc.efnet.nl)
[00:43] *** BnARobin has quit IRC (hub.efnet.us irc.efnet.nl)
[00:43] *** bsmith093 has quit IRC (hub.efnet.us irc.efnet.nl)
[00:43] *** Jonimus has quit IRC (hub.efnet.us irc.efnet.nl)
[00:43] *** hook54321 has quit IRC (hub.efnet.us irc.efnet.nl)
[00:43] *** Fusl has joined #archiveteam-bs
[00:43] *** nyaomi has joined #archiveteam-bs
[00:43] *** underscor has joined #archiveteam-bs
[00:43] *** SmileyG has joined #archiveteam-bs
[00:43] *** MrRadar2 has joined #archiveteam-bs
[00:43] *** tsr has joined #archiveteam-bs
[00:43] *** BnAboyZ has joined #archiveteam-bs
[00:43] *** Sue has joined #archiveteam-bs
[00:43] *** w0rp has joined #archiveteam-bs
[00:43] *** Spydar007 has joined #archiveteam-bs
[00:43] *** BnARobin has joined #archiveteam-bs
[00:43] *** bsmith093 has joined #archiveteam-bs
[00:43] *** Jonimus has joined #archiveteam-bs
[00:43] *** hook54321 has joined #archiveteam-bs
[00:43] *** irc.efnet.nl sets mode: +oo MrRadar2 hook54321
[00:56] *** dashcloud has quit IRC (Read error: Operation timed out)
[00:59] *** Valentine has quit IRC (Read error: Operation timed out)
[01:00] *** m007a83_ is now known as m007a83
[01:02] *** Valentine has joined #archiveteam-bs
[01:22] *** MrRadar_ is now known as MrRadar
[01:25] *** dashcloud has joined #archiveteam-bs
[01:27] ***
MrRadar has quit IRC (Read error: Operation timed out)
[01:29] *** godane has quit IRC (Read error: Operation timed out)
[01:30] *** godane has joined #archiveteam-bs
[01:48] *** dashcloud has quit IRC (Read error: Operation timed out)
[01:49] *** Mateon1 has quit IRC (Read error: Operation timed out)
[01:49] *** Mateon1 has joined #archiveteam-bs
[01:54] *** dashcloud has joined #archiveteam-bs
[01:56] *** MrRadar has joined #archiveteam-bs
[01:58] *** MrRadar2 sets mode: +o MrRadar
[02:00] *** dashcloud has quit IRC (Read error: Operation timed out)
[02:02] *** dashcloud has joined #archiveteam-bs
[02:04] *** atrocity has quit IRC (Read error: Connection reset by peer)
[02:04] *** atrocity has joined #archiveteam-bs
[03:49] *** qw3rty116 has joined #archiveteam-bs
[03:55] *** qw3rty115 has quit IRC (Read error: Operation timed out)
[04:04] *** dashcloud has quit IRC (Read error: Operation timed out)
[04:07] *** dashcloud has joined #archiveteam-bs
[04:08] *** odemg has quit IRC (Read error: Operation timed out)
[04:09] *** beardicus has quit IRC (Read error: Operation timed out)
[04:14] *** odemg has joined #archiveteam-bs
[04:25] *** dashcloud has quit IRC (Read error: Operation timed out)
[04:30] *** dashcloud has joined #archiveteam-bs
[05:12] *** beardicus has joined #archiveteam-bs
[05:39] *** dashcloud has quit IRC (Read error: Connection reset by peer)
[05:40] *** dashcloud has joined #archiveteam-bs
[05:51] *** Pixi has quit IRC (Quit: Pixi)
[05:51] *** Pixi has joined #archiveteam-bs
[06:30] *** wp494_ has joined #archiveteam-bs
[06:35] *** wp494 has quit IRC (Read error: Operation timed out)
[06:45] *** MrRadar_ has joined #archiveteam-bs
[06:46] *** dashcloud has quit IRC (Read error: Operation timed out)
[06:46] *** MrRadar has quit IRC (Read error: Operation timed out)
[06:51] *** dashcloud has joined #archiveteam-bs
[07:00] *** dashcloud has quit IRC (Read error: Operation timed out)
[07:06] *** dashcloud has joined #archiveteam-bs
[07:06] ***
schbirid has joined #archiveteam-bs
[07:36] *** dashcloud has quit IRC (Read error: Operation timed out)
[07:40] *** schbirid has quit IRC (Quit: Leaving)
[07:40] *** dashcloud has joined #archiveteam-bs
[08:24] My GIGA grab is still going strong. 582k thread IDs in a bit over 13 hours. There were some connection errors for a while, but no problems otherwise.
[08:46] *** BlueMax has quit IRC (Leaving)
[09:02] *** Mayonaise has quit IRC (Read error: Operation timed out)
[09:02] *** m007a83_ has joined #archiveteam-bs
[09:03] *** REiN^ has quit IRC (Read error: Operation timed out)
[09:03] *** Mayonaise has joined #archiveteam-bs
[09:03] *** rektide has quit IRC (Read error: Operation timed out)
[09:03] *** twigfoot has quit IRC (Write error: Broken pipe)
[09:03] *** squires has quit IRC (Write error: Broken pipe)
[09:03] *** bmcginty has quit IRC (Read error: Operation timed out)
[09:03] *** beardicus has quit IRC (Read error: Operation timed out)
[09:03] *** Odd0002 has quit IRC (Read error: Operation timed out)
[09:03] *** pikhq has quit IRC (Read error: Operation timed out)
[09:03] *** FireFly has quit IRC (Read error: Operation timed out)
[09:04] *** bwn has quit IRC (Read error: Operation timed out)
[09:04] *** pikhq has joined #archiveteam-bs
[09:04] *** bmcginty has joined #archiveteam-bs
[09:04] *** twigfoot has joined #archiveteam-bs
[09:04] *** rektide has joined #archiveteam-bs
[09:04] *** Mateon1 has quit IRC (Read error: Operation timed out)
[09:04] *** Mateon1 has joined #archiveteam-bs
[09:04] *** C4K3 has quit IRC (Read error: Operation timed out)
[09:05] *** sep332 has quit IRC (Read error: Operation timed out)
[09:05] *** PotcFdk has quit IRC (Read error: Operation timed out)
[09:06] *** unlobito has quit IRC (Read error: Operation timed out)
[09:06] *** unlobito has joined #archiveteam-bs
[09:06] *** Odd0002 has joined #archiveteam-bs
[09:06] *** m007a83 has quit IRC (Read error: Operation timed out)
[09:06] *** Dimtree has quit IRC (Read error:
Operation timed out)
[09:07] *** qw3rty116 has quit IRC (Read error: Operation timed out)
[09:10] *** MrRadar has joined #archiveteam-bs
[09:11] *** MrRadar_ has quit IRC (Read error: Operation timed out)
[09:36] *** bwn has joined #archiveteam-bs
[10:02] *** Dimtree has joined #archiveteam-bs
[10:39] *** bad_faith has joined #archiveteam-bs
[10:41] Hi I'd like to BS about WARC 1.1 and Wayback 2.0 if/when anyone who has feelings about these things is available
[10:43] since archive.org generally does not make their full (w)arcs publicly accessible, discovery and access are (for now) limited to the CDX/availability APIs and various played-back "Mementos" derived from these warcs
[10:45] Still, enough information about the original capture event can be inferred from CDX records and headers provided by the playback/wayback server. I'd like to be able to analyze these records with the same tools being developed for "first-hand" warc files e.g. Archives Unleashed/ArchiveSpark.
[10:46] one of these days I'm going to have to think about backing up ninlive.com, and that's a daunting prospect;
[10:46] It's also plausible to me that down the road - maybe way way down the road - the original warc files in archive.org's crawl collections will become public
[10:48] "big data" analysis of web archives is where it seems like this whole niche is heading. is anyone trying to do that without any kind of privileged access like I am?
[10:57] I find the technical aspects of this more frustrating than interesting and my real hope is I can just conform to whatever solution someone else has come up with to this problem because I would like to stop thinking about it and forget all of it.
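Editor's sketch, not part of the original log: the CDX-based workflow described in the 10:43 and 10:45 messages might look roughly like the following. It assumes the public Wayback CDX server at web.archive.org and its default space-separated, seven-field output; the function names and the sample record are made up for illustration, and no request is actually sent.

```python
from urllib.parse import urlencode

# Public Wayback CDX endpoint and the field order of its default output.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"
CDX_FIELDS = ["urlkey", "timestamp", "original", "mimetype",
              "statuscode", "digest", "length"]


def cdx_query_url(url, limit=None):
    """Build a CDX query URL for `url` (hypothetical helper; no request is made)."""
    params = {"url": url}
    if limit is not None:
        params["limit"] = str(limit)
    return CDX_ENDPOINT + "?" + urlencode(params)


def parse_cdx_line(line):
    """Split one space-separated CDX line into a dict keyed by field name."""
    values = line.split()
    if len(values) != len(CDX_FIELDS):
        raise ValueError("expected %d fields, got %d" % (len(CDX_FIELDS), len(values)))
    return dict(zip(CDX_FIELDS, values))


def raw_playback_url(record):
    """Playback URL with the `id_` modifier, which asks for the unrewritten capture."""
    return "https://web.archive.org/web/{timestamp}id_/{original}".format(**record)


# Illustrative record only; the digest and length are invented.
sample = ("com,example)/ 20020120142510 http://example.com:80/ "
          "text/html 200 HT2DYGA5UKZCPBSFVCV3JOBXGW2G5UUA 1792")
record = parse_cdx_line(sample)
print(cdx_query_url("example.com", limit=5))
print(record["timestamp"], record["statuscode"], raw_playback_url(record))
```

Each parsed record carries the capture time, status, MIME type, and payload digest, which is most of what can be recovered about the capture event without access to the WARC itself.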
[11:50] *** BlueMax has joined #archiveteam-bs
[12:02] *** qw3rty116 has joined #archiveteam-bs
[12:02] *** beardicus has joined #archiveteam-bs
[12:03] *** REiN^ has joined #archiveteam-bs
[12:06] *** C4K3 has joined #archiveteam-bs
[12:08] *** sep332 has joined #archiveteam-bs
[12:16] *** FireFly has joined #archiveteam-bs
[12:19] *** PotcFdk has joined #archiveteam-bs
[12:47] *** zhongfu has quit IRC (Read error: Connection reset by peer)
[12:47] *** zhongfu has joined #archiveteam-bs
[12:50] *** zino has quit IRC (Remote host closed the connection)
[13:07] *** BlueMax has quit IRC (Quit: Leaving)
[15:15] *** Atom has quit IRC (Read error: Operation timed out)
[16:02] *** lindalap has quit IRC (Quit: Restarting for IRC ident)
[16:02] *** lindalap has joined #archiveteam-bs
[16:17] *** schbirid has joined #archiveteam-bs
[17:21] *** Darkstar has quit IRC (Ping timeout: 246 seconds)
[17:23] *** Darkstar has joined #archiveteam-bs
[17:29] *** Atom has joined #archiveteam-bs
[17:31] *** keith20 has joined #archiveteam-bs
[17:53] Almost 1 million thread IDs scanned on GIGA in less than 24 hours. :-) So it should finish with the initial URLs sometime this night and start with the other pages of the existing threads, avatars, etc.
[17:53] And only 14.3 GiB of WARCs so far.
[18:04] <3
[18:49] Who was interested in PureVolume?
[19:17] *** VerifiedJ has joined #archiveteam-bs
[19:39] HCross: meeeee
[19:40] arkiver: got 120GB of it so far, in 1.3mil requests, another 3.1mil requests to go
[19:40] WARCs?
[19:40] yep
[19:40] awesome!
[19:41] so I've been working on this new decentralized crawler
[19:41] I'd like to test it soon
[19:41] who would like to help out with putting it on some servers and trying it out :)
[19:41] ?
[19:42] arkiver: do you intend for it to replace archivebot?
[19:43] not at the moment, but who knows what might happen
[19:43] I'd like to start using it for large websites that we cannot really split up into parts
[19:44] arkiver: it's not the quickest crawl at the moment, because im running a couple of others that are in the millions of URLs
[19:44] and for possibly other projects later as it becomes more stable and the network larger
[19:45] *** noirscape has joined #archiveteam-bs
[19:47] arkiver: happy to throw some power at it
[19:47] awesome, thanks Kaz
[19:48] Still working on some parts of it, but it should be ready for some testing soon on a larger scale than I have locally been testing it
[20:19] *** lindalap_ has joined #archiveteam-bs
[20:19] *** lindalap has quit IRC (Write error: Connection reset by peer)
[20:20] *** lindalap_ is now known as WubTheCap
[20:20] *** WubTheCap is now known as lindalap
[20:26] *** keith20 has quit IRC (byeee)
[20:46] *** jschwart has joined #archiveteam-bs
[20:52] SketchCow: any news from Mank?
[21:08] *** noirscape has quit IRC (Ping timeout: 268 seconds)
[21:14] *** noirscape has joined #archiveteam-bs
[21:14] *** dashcloud has quit IRC (Quit: No Ping reply in 180 seconds.)
[21:16] *** dashcloud has joined #archiveteam-bs
[21:23] *** jschwart has quit IRC (Quit: Konversation terminated!)
[21:50] *** schbirid has quit IRC (Quit: Leaving)
[22:38] *** BlueMax has joined #archiveteam-bs
[22:43] *** wp494_ is now known as wp494
[22:51] *** VerifiedJ has left
[23:52] *** Mateon1 has quit IRC (Read error: Operation timed out)
[23:53] *** Mateon1 has joined #archiveteam-bs