[00:07] i'm uploading The Place For No Story as 'The Place For No Story 1973 Timecode'
[00:08] cause there is timecode burn into the video
[00:14] JAA; Of course we're interested, why would you even ask. :-)
[00:16] *** Asparagir has quit IRC (Asparagir)
[00:32] *** vitzli has joined #archiveteam-bs
[01:06] *** Baljem has quit IRC (Read error: Operation timed out)
[01:06] i'm capturing the 36th tape from Laughing Squid
[01:08] SketchCow: at this rate i may have all tapes digitize in about week
[01:21] *** Odd0002_ has joined #archiveteam-bs
[01:21] *** Odd0002 has quit IRC (Ping timeout: 600 seconds)
[01:21] *** Odd0002_ is now known as Odd0002
[01:26] @Stiletto Amazingly, I was able to find the article again: http://www.vintagecomputing.com/index.php/archives/1063/bringing-prodigy-back-from-the-dead
[01:30] thanks so much :D
[01:31] I can't wait to see all of the cool stuff you've found
[01:34] *** username1 has joined #archiveteam-bs
[01:37] *** schbirid2 has quit IRC (Read error: Operation timed out)
[02:22] *** pizzaiolo has quit IRC (Quit: pizzaiolo)
[04:04] *** vitzli has quit IRC (Quit: Leaving)
[04:23] http://techreport.com/news/32659/pour-one-out-for-aol-instant-messenger
[04:24] *** Sk1d has quit IRC (Ping timeout: 250 seconds)
[04:31] *** Sk1d has joined #archiveteam-bs
[04:53] Yeah. I tried logging on to AIM just now for kicks. SSL error on login.
[04:53] Can't seem to get online.
[04:53] ICQ still works fine though. We'll always have ICQ.
[05:03] hopefully
[05:25] zino, godane : I have a torrent site for "home-recordings" and " odd stuff like travel tapes"
[05:27] superkuh, yeah you need an up-to-date client
[05:28] Makes sense. I'm using Pidgin 2.6.6 which is pretty old.
[05:29] I was afraid they were only going to allow official aim client but new pidgin works
[05:30] gf just said they are shutting down now? wtf
[05:31] damn you facebook
[05:32] Why can't mozilla take it over or something
[05:34] someone named Mental Elf messaged me...
[05:34] nobody on my buddy list is ever signed on
[05:47] fie: is societyglitch?
[05:47] i have a account there
[06:34] *** Stiletto has quit IRC ()
[06:47] godane, yes
[07:18] Just don't know where I would source home movies and odd stuff... probably not ebay.
[07:34] *** Stilett0- has joined #archiveteam-bs
[08:23] *** TheLovina has quit IRC (Ping timeout: 370 seconds)
[09:42] *** brayden has quit IRC (Ping timeout: 255 seconds)
[09:43] *** brayden has joined #archiveteam-bs
[09:43] *** swebb sets mode: +o brayden
[11:14] Alright, so about wordpress.com: they have a link shortener, wp.me. The shortcode can have various different formats for linking to specific pages of a blog (e.g. directly to a post or an image attached to a post etc.). The format of main interest in this context, however, is simply the blog ID encoded in base62 ([0-9a-zA-Z]).
[11:15] This shortening is provided by Jetpack, a Wordpress plugin installed and activated by default on all wordpress.com blogs (including free ones).
[11:18] It seems that the maximum blog ID is currently somewhere just below 9g000, i.e. on the order of 135M shortcodes need to be scanned (9 * 62^4 + 16 * 62^3).
[11:19] That's also the order of magnitude of how many blogs there are.
[11:20] We could do this through URLTeam and then figure out what to do with it later.
[11:47] *** dd0a13f37 has joined #archiveteam-bs
[11:47] Where do I report security issues for archive.org?
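The shortcode arithmetic from the 11:14 to 11:18 messages can be sketched as follows. This is a minimal illustration, not anything taken from wp.me or URLTeam: the digit order 0-9, a-z, A-Z is assumed from "[0-9a-zA-Z]" and from the "16 * 62^3" term for "g", and `decode_base62` is a hypothetical helper name.

```python
import string

# Assumed digit order: 0-9, then a-z, then A-Z, as written in "[0-9a-zA-Z]".
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def decode_base62(code: str) -> int:
    """Convert a base62 shortcode to the integer it encodes."""
    value = 0
    for ch in code:
        # Shift the running value one base62 place and add the next digit.
        value = value * 62 + ALPHABET.index(ch)
    return value


# Upper bound quoted in the log: 9 * 62^4 + 16 * 62^3.
upper_bound = decode_base62("9g000")
print(upper_bound)  # 136800272
```

Under that assumed digit order, "9g000" decodes to roughly 137 million, which is consistent with the "on the order of 135M" figure in the log.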
[11:59] info@archive.org
[11:59] *** dashcloud has quit IRC (Read error: Connection reset by peer)
[12:00] they can either forward it for you or give you direct contact
[12:00] alright, thanks
[12:00] *** dashcloud has joined #archiveteam-bs
[12:01] thank YOU
[12:07] Not sure it's anything major, but better safe than sorry I guess
[12:09] SketchCow: your getting a showtime airing of Road To Wellville cause that in the case of tapes
[12:10] plus side is it got the most out of having 10000k setting being at 6.4gb
[12:11] based on the preview of Outer Limits preview it aired on the week of 1996-04-05
[12:12] it was a preview for the episode called "The Refuge" with actor M. Emmet Walsh
[12:21] *** icedice has joined #archiveteam-bs
[12:34] JAA: It's very close, more than an order of magnitude. Converting the latest shortlink to decimal and using it together with https://wordpress.com/activity/ to get an estimate
[12:34] for posts/blog gives a result close to https://en.blog.wordpress.com/2015/01/06/2014-in-review/
[12:35] Or the other way around, estimate number of blogs from 2014 posts/blog stats and stats, convert to b64, note that it's close
[12:35] b62*
[12:35] Yeah, I ran a test with the two-character codes and almost all of them existed.
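The decimal/base62 conversions mentioned at 12:34 to 12:35 can be checked with a short sketch like the one below. It works on numeric digit values (0 to 61) rather than characters, matching the two-digit groups used in the 12:36 and 12:44 messages; the function names are hypothetical.

```python
from typing import List


def to_base62_digits(n: int) -> List[int]:
    """Return the base62 digit values of n, most significant first."""
    if n < 0:
        raise ValueError("n must be non-negative")
    digits = []
    while True:
        n, rem = divmod(n, 62)
        digits.append(rem)
        if n == 0:
            break
    return digits[::-1]


def from_base62_digits(digits: List[int]) -> int:
    """Inverse of to_base62_digits."""
    value = 0
    for d in digits:
        value = value * 62 + d
    return value


# The two blog counts quoted in the log.
print(to_base62_digits(131913987))  # [8, 57, 30, 52, 59]
print(to_base62_digits(125452778))  # [8, 30, 23, 61, 56]
```

Both results agree with the digit groups given in the log (08 57 30 52 59 and 08 30 23 61 56).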
[12:36] According to that, there should be (base62) 08 57 30 52 59 blogs (131913987)
[12:36] Which is close to 9g000
[12:37] Yup
[12:44] Although it's not exact - if you manipulate the POST request from the stats page you can get a chart for the number of blogs which gives 125452778 (08 30 23 61 56) as total
[12:45] Or maybe they subtract deleted blogs, in which case it makes perfect sense
[12:50] 4 billion posts, that's actually not a whole lot
[12:56] *** K4k has joined #archiveteam-bs
[12:57] *** BlueMaxim has quit IRC (Read error: Connection reset by peer)
[12:59] *** Mateon1 has quit IRC (Ping timeout: 255 seconds)
[13:00] *** Mateon1 has joined #archiveteam-bs
[13:51] I don't think any of the libgen collections on IA are complete unless the logs have been tampered with. Should I upload it again?
[15:02] *** dashcloud has quit IRC (Read error: Operation timed out)
[15:15] *** icedice2 has joined #archiveteam-bs
[15:17] *** icedice has quit IRC (Ping timeout: 260 seconds)
[15:17] *** icedice2 has quit IRC (Client Quit)
[15:17] *** icedice has joined #archiveteam-bs
[15:28] Just to confirm: is gawker.com archived, and where can the archives be found? I saw several mentions of it in the logs, but I can't find it on IA. (Via: https://www.reddit.com/r/Archiveteam/comments/73xszd/has_gawker_been_fully_archived/ )
[15:31] *** dashcloud has joined #archiveteam-bs
[15:37] nvm, i found the real collection, up to r_2092000 is archived
[15:51] *** username1 is now known as schbirid
[15:52] anyone know how to strip all formatting from a $msg in irssi perl scripting?
[15:57] *** Rai-chan has joined #archiveteam-bs
[16:08] *** RichardG_ has joined #archiveteam-bs
[16:08] *** RichardG has quit IRC (Read error: Connection reset by peer)
[16:18] Sci-mag is archived up to 64099999, foreignfiction up to 1600000
[16:19] Foreignfiction goes up to 1890000, sci-mag torrents are down so not sure exactly how far they go
[16:20] are they surely fully archived? or just 50% stalled torrents?
[16:23] The torrents are seeded, so I think they're archived. The ones that I checked were at least
[16:25] i meant to grab all of scimag but about iirc 25% of the ones i tried were not fully seeded :(
[16:34] Ask on forums for reseed then
[16:42] *** loadup has joined #archiveteam-bs
[16:54] *** icedice2 has joined #archiveteam-bs
[16:56] *** icedice has quit IRC (Ping timeout: 250 seconds)
[16:59] *** kepler45 has quit IRC (Quit: Leaving)
[17:20] *** pizzaiolo has joined #archiveteam-bs
[17:27] *** Asparagir has joined #archiveteam-bs
[17:28] *** svchfoo1 sets mode: +o Asparagir
[17:28] *** icedice2 has quit IRC (Quit: Leaving)
[17:29] *** icedice has joined #archiveteam-bs
[17:33] *** TC01 has quit IRC (Remote host closed the connection)
[17:38] *** icedice2 has joined #archiveteam-bs
[17:44] *** icedice has quit IRC (Read error: Operation timed out)
[18:00] *** Asparagir has quit IRC (Asparagir)
[18:31] *** icedice2 has quit IRC (Ping timeout: 255 seconds)
[18:31] For the record, we're now grabbing wp.me in URLTeam. :-)
[18:33] *** icedice has joined #archiveteam-bs
[18:46] So will you archive the whole of WP?
[18:46] dd0a13f37: just the URLs, not their contents (at this point, at least)
[18:47] In URLteam, yes, but for the WP project
[18:47] Are they having any problems?
[18:47] Not that I know of.
[18:47] Not that I know of, but it's good to have a backup
[18:47] But I figured, why the hell not?
[18:47] heh, jinx
[18:48] 4 bil posts, 1/4 have images, 1bil images, 1m each, 1pb
[18:48] Large endeavour
[18:53] *** dd0a13f37 has quit IRC (Ping timeout: 268 seconds)
[18:58] *** icedice2 has joined #archiveteam-bs
[18:59] *** dd0a13f37 has joined #archiveteam-bs
[19:03] *** icedice has quit IRC (Ping timeout: 506 seconds)
[19:14] *** icedice has joined #archiveteam-bs
[19:16] *** dd0a13f37 has quit IRC (Ping timeout: 268 seconds)
[19:16] *** dd0a has joined #archiveteam-bs
[19:16] *** dd0a is now known as dd0a13f37
[19:21] *** icedice2 has quit IRC (Ping timeout: 506 seconds)
[19:27] *** icedice2 has joined #archiveteam-bs
[19:29] *** icedice has quit IRC (Ping timeout: 245 seconds)
[19:30] *** icedice has joined #archiveteam-bs
[19:32] *** dd0a13f37 has quit IRC (Ping timeout: 268 seconds)
[19:36] *** ajshell1 has quit IRC (Quit: Leaving)
[19:36] *** icedice2 has quit IRC (Read error: Operation timed out)
[19:38] *** atrocity has joined #archiveteam-bs
[19:39] *** Atros has quit IRC (Ping timeout: 246 seconds)
[19:44] *** icedice2 has joined #archiveteam-bs
[19:50] *** icedice has quit IRC (Read error: Operation timed out)
[19:56] *** dd0a13f37 has joined #archiveteam-bs
[19:57] It sure is some improvement over proxy+webirc
[20:04] *** ajshell1 has joined #archiveteam-bs
[20:15] *** atrocity has quit IRC (Read error: Connection reset by peer)
[20:16] *** atrocity has joined #archiveteam-bs
[20:35] *** ajshell1 has quit IRC (Quit: Leaving)
[20:45] so i have this on tape from the box: https://en.wikipedia.org/wiki/Heat_and_Sunlight
[20:45] digitize it now
[20:48] *** ajshell1 has joined #archiveteam-bs
[20:48] *** ajshell1 has quit IRC (Client Quit)
[20:56] *** ajshell1 has joined #archiveteam-bs
[20:57] *** Stilett0- has quit IRC (Ping timeout: 260 seconds)
[21:02] *** dashcloud has quit IRC (Remote host closed the connection)
[21:03] *** dashcloud has joined #archiveteam-bs
[21:13] *** TC01 has joined #archiveteam-bs
[21:17] *** ajshell1 has quit IRC (Quit: Leaving)
[21:32] *** ajshell1 has joined #archiveteam-bs
[21:45] *** ajshell1 has quit IRC (Quit: Leaving)
[21:51] *** kepler45 has joined #archiveteam-bs
[22:13] *** ajshell1 has joined #archiveteam-bs
[22:19] *** ajshell1 has quit IRC (Quit: Leaving)
[22:25] *** kepler45 has quit IRC (Quit: Leaving)
[22:25] *** odemg has quit IRC (Read error: Operation timed out)
[22:32] *** odemg has joined #archiveteam-bs
[22:38] A stripped-down version of archivebot for !ao, now that would be something. You could make it run much much faster if you can disregard certain constraints
[22:39] You can't really ignore that much though. You still need to process images, stylesheets, scripts, etc.
[22:41] An internet where everyone conforms to standards so we don't have to use parsers which are slowed down by all kinds of odd special cases, now that would be something.
[22:41] Not always. And you could use another parser, like myhtml
[22:41] Myhtml is fast, but there are no python binding
[22:43] !ao jobs don't see to be much of a bottleneck
[22:43] Indeed, we rarely have a queue of !ao jobs.
[22:44] And that's with only one !ao-only pipeline...
[22:45] JAA: I don't think they parse inline scripts
[22:45] dd0a13f37: wpull does not actually parse scripts, but it does process it and tries to extract links from it.
[22:45] s/it/them/
[22:46] Same with CSS, I believe.
[22:46] Only HTML is parsed properly.
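The distinction drawn at 22:45 to 22:46, between parsing HTML properly and merely scanning script text for links, can be illustrated with a rough sketch. This is not wpull's code and makes no claim about how wpull implements it: it uses Python's standard `html.parser` for the HTML side and a deliberately crude regular expression for URLs inside scripts. `LinkCollector` and `URL_PATTERN` are hypothetical names, and the sample page is invented.

```python
import re
from html.parser import HTMLParser

# Crude pattern for absolute http(s) URLs embedded in arbitrary text.
# An assumption for illustration; a real crawler needs far more care.
URL_PATTERN = re.compile(r"""https?://[^\s'"<>)\\]+""")


class LinkCollector(HTMLParser):
    """Collect links from parsed HTML attributes and from raw script text."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
        # HTML is parsed: links come from well-defined attributes.
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script bodies are not parsed as JavaScript, only pattern-matched.
        if self._in_script:
            self.links.extend(URL_PATTERN.findall(data))


page = '<a href="/about">About</a><script>var img = "http://example.com/a.png";</script>'
collector = LinkCollector()
collector.feed(page)
collector.close()
print(collector.links)  # ['/about', 'http://example.com/a.png']
```

The pattern-matching side will miss URLs that a script builds at runtime and can pick up strings that are not really links, which is the trade-off of not parsing scripts.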
[23:00] *** zino has quit IRC (Read error: Connection reset by peer)
[23:01] *** zino has joined #archiveteam-bs
[23:08] *** ajshell1 has joined #archiveteam-bs
[23:12] *** ajshell1 has quit IRC (Client Quit)
[23:18] *** icedice has joined #archiveteam-bs
[23:20] *** icedice2 has quit IRC (Ping timeout: 260 seconds)
[23:27] *** ajshell1 has joined #archiveteam-bs
[23:41] *** icedice2 has joined #archiveteam-bs
[23:43] *** icedice has quit IRC (Ping timeout: 260 seconds)
[23:45] *** dashcloud has quit IRC (Read error: Operation timed out)
[23:48] *** dashcloud has joined #archiveteam-bs
[23:58] *** icedice has joined #archiveteam-bs