[00:55] http://www.bitcointrezor.com/
[00:55] this is kinda awesome
[02:10] i grabbed this website today: https://archive.org/details/iiipercent.blogspot.com-20130622
[02:10] i did so cause he was on the glenn beck show yesterday
[05:28] So I am going to do a podcast episode on ArchiveTeam. The question is what should I talk about? SketchCow has already done great work explaining what AT is and the origin story. I was thinking of tackling the more technical issues
[05:36] sure
[05:39] How about the usual process that happens whenever we hear of a shutdown, and the possible ways that we have for archiving sites (warrior, manual scripts, etc.)
[05:39] I could literally spend more than one one-hour episode just on crawling
[05:40] Also, how we deal with the resistance that we sometimes come up against
[05:40] xmc: did you get my text I sent on Wednesday?
[05:40] I have no SMSes from you ... ?
[05:41] ..must have an outdated number then.
[05:41] what're the last few digits of the number you sent to?
[05:41] 4499
[05:41] that's a landline :P
[05:41] hurr!
[05:41] :P
[05:42] Well, that's what I get for not knowing how the American phone system assigns numbers.
[05:42] there's no way to tell, actually
[05:42] ah
[05:43] yeah and you can switch a number from being a landline to a cell and back again
[05:43] Well that's handy.
[05:43] indeed
[05:44] In 'straya, if a number starts with 04, it's a mobile.
[05:46] omf_: could spend a small amount of time doing an overview, but still a technical overview, of each stage in the archiving process
[05:55] The only problem with an overview is the details I might miss, because some of the information is still only in a few brains instead of also being on the wiki
[06:02] omf_: iirc the best way to go about building a comprehensive thing is doing an outline, then filling in each part as detailed as possible
[06:06] Alternatively, http://pad.archivingyoursh.it/p/atpodcast
[06:22] GLaDOS and I are fucking banging it out
[06:22] WOO IDEAS
[06:22] you got deets we need to know, speak now
[06:32] i-motherfucking-deas
[06:33] they speak english in 'what'
[06:36] that's a good outline
[07:35] DFJustin: by the way, thanks for your additions to the In The Media page :)
[07:37] any AT wiki admins here?
[07:45] Sup
[07:45] winr4r
[07:47] GLaDOS: would you link the words "press and discussion" on the main page to [[In The Media]]? :)
[07:50] Done
[07:50] I think we should possibly update the quote.
[07:51] "It sounds like you're holding hands with your userbase on the beach and walking with them into the sunset, when in fact you're choking them to death in the ocean" how about?
[07:51] "Google is a library or an archive like a supermarket is a food museum."
[07:51] ^ our jason
[07:53] Hm..
[08:00] hahah
[08:00] that quote is frame-worthy
[08:02] jason is very quotable
[08:15] indeed
[08:30] was just opening a "xul" file in gvim and whispered to myself "only zuul", but then right at the top it says " xmlns="http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul">
[08:31] The first javascript debugger/ide was called venkman
[08:32] haha neat
[08:34] about:robots in firefox makes an Asimov and Battlestar Galactica reference
[08:37] wow and a futurama reference
[08:37] also the day after tomorrow.
iirc there was some about:authors thing in fx for a while
[08:40] just check about:about
[08:47] i decided to not mirror the mp3s of www.boilingfrogpost.com
[08:47] only so i can get the html dump first cause that would be smaller
[08:47] then i can sed the hell out of it for a mp3 list and stuff
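A minimal bash sketch of the "sed the hell out of it" approach described above; the mirror directory and file names are illustrative, not from the log:

    # Pull every mp3 URL out of a mirrored HTML dump into a de-duplicated
    # download list. Directory and output names are hypothetical.
    grep -rhoE 'https?://[^"'\'' >]+\.mp3' ./www.boilingfrogpost.com/ \
        | sort -u > mp3-list.txt

    # The list can then be handed to wget for the actual grab.
    wget --continue --input-file=mp3-list.txt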
[08:48] http://neocities.org/ heh
[08:55] Cameron_D: should preemptively maintain a crawl of that i'd think
[08:55] yeah
[08:55] Cameron_D: yeah, that can only end well
[08:55] it should be somewhat easy, as it lists all the accounts
[08:56] Copenhagen Suborbitals Sapphire Launch (rocket launch) somewhere between 6min and like 25min http://copenhagensuborbitals.com/ http://www.youtube.com/watch?v=ADq5sOfq-lI (live stream)
[08:58] arrith1: stream is frozen for me
[08:58] weird
[08:58] winr4r: just froze for me too. cuts in and out
[08:58] their budget is odd
[09:01] okay, so it just went through T-00.00.00 and then looped around
[09:01] haha yeah, last i heard was reset to '18min'
[09:03] so am i seeing the same thing as you?
[09:03] guys on a boat
[09:03] not sure how that's going into orbit
[09:03] https://www.youtube.com/watch?v=Kjf4RogusJc
[09:03] completely relevant
[09:10] hm the navy must love that song
[09:21] Copenhagen Suborbitals - The Open Source, non-profit, and volunteer based space initiative
[10:00] 2min 30sec
[10:02] reset to 10min, nvm
[10:41] hmmm
[10:41] 2 min until what?
[10:41] well, 10 min until what?
[10:42] Smiley: rocket launch :)
[10:42] Copenhagen Suborbitals Sapphire Launch (rocket launch) somewhere between 6min and like 25min http://copenhagensuborbitals.com/ http://www.youtube.com/watch?v=ADq5sOfq-lI (live stream)
[10:42] Nice
[10:42] goin now
[10:42] This the one which you can have it display a photo?
[10:42] 1min
[10:42] Smiley: not sure
[10:42] 40s
[10:42] youtube livestream
[10:42] Launching from a ship?
[10:42] audio is a bit ahead
[10:42] yey launch \o/
[10:43] yeah, British sea
[10:43] uhoh the timer overflowed :D
[10:45] anymore to see?
[10:46] Smiley: i'm not sure.. maybe not
[10:46] was a test thing, i guess it wasn't going to space
[10:47] not sure if cameras are on the rocket, no livestream from them if there are ;/
[10:48] still very cool
[10:49] yeah :)
[11:49] so
[11:49] funfair here
[11:49] has poptstations
[11:49] lol
[11:49] popstations *
[11:50] http://imgur.com/a/j88jz
[12:06] Podcast sounds interesting!
[14:17] if anyone wants an x@archivingyoursh.it address, ask me.
[15:04] uploading _all_ the funfair videos!
[15:04] :P
[15:05] GLaDOS: hey awesome
[15:06] https://www.youtube.com/playlist?list=PLT324XogjOA9rpuZBrX37SC3xwDybH-BT
[15:09] sup joepie
[15:37] hai
[16:09] this is pretty cool: http://www.vesalia.de/e_indivisionagamk2.htm
[16:26] https://www.youtube.com/watch?v=w2BssnWcUlQ&list=PLT324XogjOA9rpuZBrX37SC3xwDybH-BT&index=15
[16:26] ^ the fun part starts at 1:25
[19:37] -buttsmiley
[19:39] :D
[19:52] why is that not an option
[19:52] `wget-warc --buttsmiley` should just do everything automatically
[19:56] @:D
[19:56] :D
[19:56] nighty
[19:56] I liked the at-hat better
[19:56] (almost said ass-hat)
[19:57] arrrfuck we seriously need to be so much faster with xanga D:
[19:57] tomorrow I might try building an EC2 instance again
[19:57] I just don't get how you pass the variables in D:
[20:08] what variables
[20:40] now i'm paranoid
[20:40] one of the files uploaded at 100% but there is no history of the item at all
[20:41] those files are going to get lost if i think 100% means uploaded but it turns out archive.org didn't take it for some reason
[20:46] Were you using your custom scripts to do the upload?
[20:50] Someone on HN said Archive Team is not serious
[20:50] lol, yeah cause we don't do shit around here
[20:52] lol taking anything HN says even 1/4 seriously
[20:53] I wrote a rant that got good coverage, and there were probably 3 good comments
[20:53] not positive, but ones that added to a discussion
[20:54] omf_: i sadly always will have to use custom scripts
[20:55] mostly for the g4 videos cause i have to sed the desc and stuff to it
[20:56] i have thought of a way to find out what is missing
[20:56] make an index.txt file of all the urls i uploaded and download the html
[20:56] then sed for what is not uploaded
[21:06] Collecting and preparing metadata has nothing to do with actually uploading the file and verifying the upload
[21:06] Your upload part of the process appears to be failing, so fix it or replace it
[21:06] just give me the line of code then?
[21:07] cause i don't know it
[21:07] i use curl to upload my stuff
[21:12] the only way to verify an uploaded file is to check for the meta txt file
[21:12] well there is your first mistake. A programmer for the Internet Archive already wrote an uploader script that handles a lot of the use cases for bulk file uploading, so that no one would have to mess with curl on the command line. ias3upload is the program name, and quite a few people in here use it on a regular basis because it has a simple interface
[21:13] wrong, you can check the response code on the s3 query after the upload finishes
[21:13] So there are at least 2 ways, also you could hit the json api and check it exists, and the file size
[21:14] or the md5sum or the sha1sum
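The two checks mentioned here can be sketched in bash. This is illustrative only — the item name, file name, and credential variables are placeholders, not from the log:

    # Check 1: capture the HTTP status of the S3 upload instead of ignoring it.
    # Item/file names and $IA_ACCESS_KEY/$IA_SECRET_KEY are hypothetical.
    status=$(curl --silent --output /dev/null --write-out '%{http_code}' \
        --header "authorization: LOW $IA_ACCESS_KEY:$IA_SECRET_KEY" \
        --upload-file video.mp4 \
        "http://s3.us.archive.org/my-test-item/video.mp4")
    [ "$status" = "200" ] || echo "upload failed with HTTP $status" >&2

    # Check 2: afterwards, query the read-only metadata API, which lists each
    # file with its size, md5, and sha1. A crude existence check without a
    # JSON parser:
    curl --silent "http://archive.org/metadata/my-test-item" | grep '"video.mp4"'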
[21:14] but you have to make a meta.xml file for the upload script
[21:15] i'm doing very bulk uploading and i don't see how to work t
[21:15] *it
[21:15] no, you make a metadata.csv for ias3upload so normal people can use a spreadsheet to set it up
[21:15] I have uploaded hundreds of gigabytes and thousands of files using this method because it is the one recommended for bulk uploading on the Internet Archive's Wiki
[21:16] still not getting it
[21:17] also i only understand bash, not perl
[21:17] you do not write any code to use it
[21:17] that is how simple it is
[21:17] The repo includes full docs and an example metadata.csv for getting started https://github.com/kimmel/ias3upload
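For reference, a metadata.csv for ias3upload looks roughly like the following. The column set and values here are illustrative; the example file in the repo linked above is the authoritative format:

    item,file,title,mediatype,collection,description
    g4-video-example-001,episode1.mp4,Example Episode 1,movies,opensource_movies,First test video
    g4-video-example-001,episode2.mp4,Example Episode 2,movies,opensource_movies,Second test video

One row describes one file to upload; the tool reads the spreadsheet and handles item creation and uploading, which is why no Perl (or any other) code has to be written.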
[21:22] Take Smiley for example, he knows bash and does not know Perl. He has uploaded 700+ gb of content using ias3upload and has never bothered with the code
[21:22] yes but my data is in warc.gz
[21:22] so
[21:22] how do i sed that data and put it into meta
[21:22] i have to sed it to get the video file the right info
[21:23] bash i understand
[21:23] okay I can explain this to you. First tell me which meta fields you put in your file
[21:23] this i do not
[21:23] its automated
[21:23] i search for the videokey when a new one needs to be uploaded
[21:23] You missed the point. What meta fields are generated by your automated script?
[21:24] yes
[21:24] That wasn't a yes or no question.
[21:24] oh
[21:25] the title and desc are in the metadata
[21:25] So the standard data needed to make an item?
[21:27] Did you build your scripts off the examples in http://archive.org/help/abouts3.txt ?
[21:27] here is my script: http://pastebin.com/N6AASNSB
[21:27] yes
[21:27] but again, very custom to my needs
[21:28] i don't want to have to look at my scripts when uploading stuff
[21:30] I am just blown away by overly complicated that whole script is
[21:30] all i want is for this to have error codes of status
[21:30] how overly complicated that script is. Just using the ias3upload script gets rid of lines 20 through 37
[21:31] and builds in numerous levels of error checking and correction that this script doesn't have
[21:31] but you don't understand
[21:32] i may not even have some of the right perl stuff to use it
[21:32] and i'm still on slitaz for right now
[21:32] godane, I have 27 years experience as a programmer. I understand the entire problem just by looking at this script. You scrape urls for videos from warcs using command line tools and brittle regex
[21:32] and I already knew you use slitaz
[21:33] then download the video and upload it to IA
[21:34] its still not going to fix past failed videos
[21:35] i will upload what i can now, then do a warc.gz sed of all the archive.org posts to see what failed when i think i'm done
[21:35] One problem at a time, you just realized that things were failing because you didn't build in any kind of error checking.
[21:35] if you have a video list in a file you can use ia-dirdiff to check the items on IA
[21:37] i will look at it later
[21:37] failed uploads are a known problem and I already wrote a script to solve it. Been using it for months. https://github.com/kimmel/ia-tools
[21:39] There are many of us constantly updating the tools we use to make things easier and faster, because any problem that has to be solved more than once should be and is automated away.
[21:44] looks like i can't use the script
[21:45] i have no utf8/all.pm file
[21:46] you have to install the dependencies, the command is in the docs: cpan autodie utf8::all HTTP::Tiny JSON::XS
[21:48] i'm going to bed for now
[21:48] bbl
[22:21] To everyone, is it crazy to want us all to use the same open source tools so we can have multiple knowledgeable people to help each other out?
[22:21] A lot of the problems I keep hearing about with archiving are solvable
[22:23] no need to reinvent the wheel, usually
[22:24] Reinventing the wheel has two valid use cases from my experience. One is if you want to learn how it works end to end. Two is if you know there is a design flaw with an existing wheel and the new one is going to fix that
[22:28] aye, and there's no shortage of brilliant people around to do that if necessary
[22:31] I think Archive Team could easily use 4-5 more programmers who are going to stick around for the long term
[23:19] omf_: if you're referring to the comment by "chid" here https://news.ycombinator.com/item?id=5925970 i don't read it as a slight against AT, but more commenting that Snapjoy's data is private rather than public, so it's quite hard for AT to do a more traditional grab like we do
[23:24] Aranje: try site:snapjoy.com in google
[23:24] also having the tools everyone is supposed to use front-and-center on a short 'how to emergency download a site' page
[23:25] urg :(
[23:25] arrith1: *
[23:26] Aranje: sorry! :(
[23:26] balrog: ah, the comments i'm seeing are calling things private. i guess that's quite incorrect. thanks for the correction
[23:27] most things appear to be private.
[23:27] some users did have public sets.
[23:48] ping omf_, re: cuil data for Google Reader archiving
[23:50] I can provide the domain prefixes and some Python script to parse out what I need, let me know if I can make it even more convenient
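The Python script itself isn't shown in the log; as a rough illustration of what "parse out the domain prefixes" from a list of crawled URLs could look like in bash (the file names are hypothetical):

    # Strip the scheme and path from each URL, leaving unique host prefixes.
    sed -E 's|^[a-z]+://||; s|/.*$||' cuil-urls.txt | sort -u > domain-prefixes.txt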