[00:31] balrog: nobody knows
[00:31] arkiver: nobody knows
[00:31] :/
[00:32] i met with ex-sfbg staff today they said they'd try and get it up
[00:32] ah ok
[00:32] I see only the www is up
[00:32] all other domains are still down
[00:32] there's some drupal login here
[00:32] http://www.sfbg.com/user
[00:32] gunna try my hand at it a bit later see if there's still a way to access the db
[00:32] let us know wat you find out
[00:33] thx re: archivebot
[00:33] will that get everything or do i need to pitch in
[00:33] it will get everything that is linked on the site
[00:34] there are some things it doesn't get
[00:34] nod
[00:34] like links hidden behind html stuff
[00:34] oops
[00:34] javascript stuff
[00:34] nod it's drupal so i dont think there's a ton of that? probably not, seems to go fine for now
[00:35] I saw there is no sitemap
[00:35] yah just the archives- but also there's a ton of content (blogs) that isnt in the print issues
[00:35] do you have an example for me?
[00:35] still expect to be receiving some 15 years worth of word files
[00:35] everything frm
[00:35] http://www.sfbg.com/blog
[00:35] eg
[00:35] feed://www.sfbg.com/blog/rss.xml%20
[00:36] interesting they rm'd the politics blogs
[00:36] http://www.sfbg.com/rss-feeds
[00:36] maybe not sorry
[00:36] those blogs are gine
[00:36] fine
[00:36] just no rss feed for it
[00:36] cool
[00:36] they will be crawled
[00:36] eg http://www.sfbg.com/politics?page=8
[00:36] dope!
[00:37] where do they get dumped
[00:37] marc: word files?
[00:37] the original files that is?
[00:37] yes um it looks like website only has content back to 2006? at least frm that issues archive
[00:37] yeah, goes back to 2006
[00:37] so the ex-sfbg staff today told me they had 15 years of word files theat they used to make the pdf
[00:37] or whatever final publishing format
[00:37] apparently their publishing process was email the word file to the editor then the layout duders collate it
[00:37] awesome, so the original format
[00:38] yah
[00:38] not sure if they're sending them to me but i will ping them again to make sure they have it
[00:38] they have some internal IT duderino who is going to try and dump the drupal db, we'll see if that works
[00:38] how are you going to put them on IA?
[00:38] come in here and ask how? :)
[00:38] the issues, word docs I mean
[00:38] ah
[00:38] if i get them
[00:39] as far as I know IA doesn't convert word documents, so we'd have to put the original word docs online and create a pdf from those ourself to put online
[00:39] maybe godane would like to do that
[00:39] ah okay i think i have some word conversion stuff around here somewhere frm a past life
[00:39] (had to reverse word format in late 90s)
[00:40] please be careful when using some random software
[00:40] k
[00:40] where are u dumping the website crawl?
[00:40] likely there will be no quality loss in the images converted from word to pdf, but I'd check to be sure first
[00:40] oic
[00:41] or ask godane if he's interested in converting and uploading
[00:41] k
[00:41] gunna ping my drupal dev buddy and see if he has any login 0day :)
[00:41] the website crawl will be dumped in one of the archivebot packs: https://archive.org/details/archivebot
[00:41] cool
[00:41] and it will then go into the wayback machine
[00:41] FOREVER
[00:42] marc: there was a drupal bug just announced in 7 :P
[00:42] nice
[00:42] ohhh really
[00:42] sql injection though
[00:42] cool will take a look thx for tip
[00:42] haha
[00:42] good luck
[00:42] [20:38:24] they have some internal IT duderino who is going to try and dump the drupal db, we'll see if that works
[00:43] if you have them helping try not to fuck around too much :P
[00:43] point taken
[00:43] good luck marc, and thanks for this!
[00:43] * arkiver is off to bed
[00:43] * arkiver says byebye
[00:44] `thanks yalls!!!!!!
[00:45] i mean literally the sfbg going away = 15k votes disappearing
[00:45] since their election endorsement guide is what so many people print out and take to the polls on Nov 4
[00:45] website down == political ramifications
[00:46] http://www.sfbg.com/2005/10/31/testing-opinion-title
[00:46] looks like that is the earliest article
[00:47] right that's prolly them moving over to drupal
[00:47] off now
[00:48] http://www.sfbg.com/36/18/news_kimiko_burton.html
[00:48] i think
[00:48] they have older stuff it's just in the older format
[00:48] sfbg.com/$VOLUME/$ISSUE/
[00:48] that's frm 2002
[00:48] lemme see if i can find index for that
[00:49] http://www.sfbg.com/38/49/x_trail_mix.html
[00:49] lotts crap like this
[00:50] http://www.sfbg.com/36/18/index.html
[00:51] http://www.sfbg.com/37/01/index.html
[00:51] heh nice annalee newitz article top of masthead here
[00:51] http://www.sfbg.com/36/12/index.html
[00:51] that seems to be oldest article in the $VV/$II url format - back to 2002
[00:54] how do i submit to archivebot? :)
[00:59] The first steps are to join the #archivebot channel and read the documentation.
[01:04] thx
[01:17] Are there any direct links from the new site to the old one on sfbg? If not the warrior may be required as archivebot wouldn't find it.
[01:19] nope
[01:19] there's a google search window
[01:19] i wrote a quick wget script to grab 36..40 volumes
[01:19] so i typed in old politician names
[01:19] and found old links to the pre-drupal webpage format
[03:05] okay i wget 2002, 2003, 2004
[03:18] Archive.org prefers WARC format.
[03:18] But wgets are good for the moment too.
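The pre-Drupal pages follow the sfbg.com/$VOLUME/$ISSUE/ pattern quoted above, so a list of candidate issue index URLs for volumes 36..40 can be generated rather than found by searching. What follows is a minimal sketch, not the wget script mentioned in the log. The function name `candidate_index_urls`, the limit of 52 issues per volume, and the two-digit zero padding (guessed from /37/01/index.html) are all assumptions for illustration.

```python
# Sketch: build candidate index URLs for the pre-Drupal issue archive.
# Assumptions (not confirmed by the log): at most 52 issues per volume,
# and issue numbers zero-padded to two digits as in /37/01/index.html.

BASE = "http://www.sfbg.com"


def candidate_index_urls(volumes=range(36, 41), max_issue=52):
    """Return one candidate index.html URL per (volume, issue) pair."""
    return [
        f"{BASE}/{volume}/{issue:02d}/index.html"
        for volume in volumes
        for issue in range(1, max_issue + 1)
    ]


if __name__ == "__main__":
    # One URL per line, suitable for saving as a plain-text URL list.
    for url in candidate_index_urls():
        print(url)
```

Not every candidate will exist, so the list would need checking against the live site before it is handed to a crawler.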
[03:19] You can give a bunch of URLs for archivebot to crawl and grab (the ones that aren't linked)
[03:45] cool i have 130 indexes of ld issues here
[03:45] http://lucidfusionlabs.com/~marc/old-issues.txt
[03:45] old*
[03:45] don't have access to submit to archivebot
[03:47] fuq
[03:51] loks like all of 39 volume is index.php har
[03:54] 38.37->40.26 is index.php
[03:54] shit or get off the CMS, SFBG frm the past
[09:37] Expanded http://archiveteam.org/index.php?title=Quora
[10:28] would anybody object to me adding "IS FUCKING AWFUL" in to the Quora wiki article
[10:28] :P
[10:31] Nope.
[10:35] quora? more like ebola
[11:21] schbirid: but luckily not as virulent
[11:22] I wonder if Yahoo! Answers can get any worse. Perhaps Quora knows
[15:43] No idea if the archive team is still active, but pianofiles.com is going to drop offline
[15:43] I like "I am not sure if they're still active"
[15:43] Guess we're not making enough noise.
[15:43] Pianofiles.com, let's see
[15:47] I just spent a little time on it.
[15:47] Basically, it's a sheet music trading site with no files.
[15:48] ironic
[15:48] Set off a archivebot. It will just be good for having a list.
[15:48] Not ironic - cowardly
[15:48] http://www.icmp-ciem.org/node/487
[15:48] yeah, archivebot can do this one
[15:49] http://swappano.com/ is basically the same thing and even has importing.
[15:49] I can't find a way to download the sheets?
[15:49] The sheets are not up
[15:49] You talk with people and get them
[15:49] except you clogged up the queue with dutch bankruptcies :P
[15:49] DFJusting: heh, sorry ;) But they usually don't too long
[15:50] SketchCow: ah, I see
[15:50] DFJustin: arkiver: just specify a pipeline
[15:50] it's my understanding that that bypasses the queue
[15:51] what? I don't have that as my understanding? I missed something?
[15:52] arkiver: afaik, if you explicitly specify a pipeline for a job, it will ignore the queue and just send it directly to the pipeline in question
[16:03] yes
[16:04] which has its own queue
[16:05] ah.
[16:05] yipdw: I guess that if you target a pipeline that's running a lot of small jobs, it'd still be a lot faster
[16:05] :P
[16:05] not if arkiver filled it up with bankruptcies
[16:06] yipdw: it sends tasks to pipelines immediately? not a central queue?
[16:59] the bittorrent site pianosheets died recently
[16:59] the admin vanished :(
[16:59] It's been discussed above
[17:00] pianosheets, not pianofiles
[17:00] Oh, sorry.
[17:00] you can still log in and view the torrents
[17:00] but the forums are closed and the tracker is dead
[19:37] --------------------------------------
[19:37] OH SHIT SON
[19:37] TWITPIC IS SHUTTING DOWN
[19:37] ALL HANDS ON DECK, ALL TRACKERS ON FULL
[19:38] --------------------------------------
[19:39] RKenshin: Dump it into archive.org, I don't care how, take it all.
[19:39] Wat. First we're shutting down, oh nevermind, we're being bought, err actually we are indeed shutting down next week..
[19:40] He probably opened his mouth during the due diligence phase. But I had my suspicions when they withdrew the trademark application.
[19:43] ahh shit really
[19:44] http://blog.twitpic.com/2014/09/twitpic-is-shutting-down/
[19:53] woo less than 10 days
[19:53] :/
[19:59] should archiveteam's choice be changed for warriors to twitpic?
[20:00] Yes, as soon as yipdw and RKenshin look over the situation.
[20:00] Here's me being diplomatic: https://twitter.com/thevowel/status/522839182129893376
[20:04] hzhaahaha
[20:50] is twitpic-grab ready to use?
[20:51] I wonder if they unbanned my IPs
[20:55] how do I change the port the web interface listens on?
[20:56] for run-pipeline, looks ike --port=1234
[20:56] or --disable-web-server is useful too
[20:56] thanks
[20:57] running twitpic-grab now
[20:59] Oh, I am so angry
[21:03] I got here as soon as I heard the "Fuck Noah Everett" batsignal. What happened?
[21:03] * antomatic reads back
[21:03] Oh, I see.
[21:03] "actually..."
[21:04] https://www.youtube.com/watch?v=UDekhoeEoCc
[21:06] This calls for a new wiki password.
[21:07] Never again must the responsibility for so much shared popular culture rest in the hands of one individual or organisation. Therefore, fuck noah everett.
[21:10] did you post that on facebook
[21:11] me? no
[21:17] yahoo has still done more
[21:28] true, but yahoo is a big faceless entity. Twitpic and Noah put a face on the destruction.
[22:29] It's crazy - FOS has been gathering ancestry.com and swipnet items into megawarcs for WEEKS.
[22:29] In the case of ancestry and swipnet, it's actually been just MOVING files for a week
[22:29] Just the process of moving all the items out of a hopper directory into 25gb chunks, is days and days.
[22:31] uffda
[22:44] Where should I ask about errors I'm seeing in the warrior?
[22:46] jus431: if they are project specific, please ask them in the project channel
[22:46] is there a channel for the twitpic phase 2 project?
[22:47] #quitpic
[22:47] jus341 ^
[22:56] What's really crazy is that just DELETING THE DIRECTORY AFTER IT'S DONE can take 2-3 hours
[22:58] SketchCow: you will have a full collection of Nations Business later tonight
[22:58] Great
[22:59] also The Social Hour and The Totally Rad Show are all uploaded
[22:59] i also uploaded i think all eric pdfs for the 38xxxx area