#archiveteam 2014-10-16,Thu

Time Nickname Message
00:31 🔗 marc balrog: nobody knows
00:31 🔗 marc arkiver: nobody knows
00:31 🔗 balrog :/
00:32 🔗 marc i met with ex-sfbg staff today they said they'd try and get it up
00:32 🔗 arkiver ah ok
00:32 🔗 arkiver I see only the www is up
00:32 🔗 arkiver all other domains are still down
00:32 🔗 marc there's some drupal login here
00:32 🔗 marc http://www.sfbg.com/user
00:32 🔗 marc gunna try my hand at it a bit later see if there's still a way to access the db
00:32 🔗 arkiver let us know wat you find out
00:33 🔗 marc thx re: archivebot
00:33 🔗 marc will that get everything or do i need to pitch in
00:33 🔗 arkiver it will get everything that is linked on the site
00:34 🔗 arkiver there are some things it doesn't get
00:34 🔗 marc nod
00:34 🔗 arkiver like links hidden behind html stuff
00:34 🔗 arkiver oops
00:34 🔗 arkiver javascript stuff
00:34 🔗 marc nod it's drupal so i dont think there's a ton of that? </pulled-out-of-ass
00:34 🔗 arkiver probably not, seems to go fine for now
00:35 🔗 arkiver I saw there is no sitemap
00:35 🔗 marc yah just the archives- but also there's a ton of content (blogs) that isnt in the print issues
00:35 🔗 arkiver do you have an example for me?
00:35 🔗 marc still expect to be receiving some 15 years worth of word files
00:35 🔗 marc everything frm
00:35 🔗 marc http://www.sfbg.com/blog
00:35 🔗 marc eg
00:35 🔗 marc feed://www.sfbg.com/blog/rss.xml%20
00:36 🔗 marc interesting they rm'd the politics blogs
00:36 🔗 marc http://www.sfbg.com/rss-feeds
00:36 🔗 marc maybe not sorry
00:36 🔗 arkiver those blogs are gine
00:36 🔗 arkiver fine
00:36 🔗 marc just no rss feed for it
00:36 🔗 marc cool
00:36 🔗 arkiver they will be crawled
00:36 🔗 marc eg http://www.sfbg.com/politics?page=8
00:36 🔗 marc dope!
00:37 🔗 marc where do they get dumped
00:37 🔗 arkiver marc: word files?
00:37 🔗 arkiver the original files that is?
00:37 🔗 marc yes um it looks like website only has content back to 2006? at least frm that issues archive
00:37 🔗 arkiver yeah, goes back to 2006
00:37 🔗 marc so the ex-sfbg staff today told me they had 15 years of word files that they used to make the pdf
00:37 🔗 marc or whatever final publishing format
00:37 🔗 marc apparently their publishing process was email the word file to the editor then the layout duders collate it
00:37 🔗 arkiver awesome, so the original format
00:38 🔗 marc yah
00:38 🔗 marc not sure if they're sending them to me but i will ping them again to make sure they have it
00:38 🔗 marc they have some internal IT duderino who is going to try and dump the drupal db, we'll see if that works
00:38 🔗 arkiver how are you going to put them on IA?
00:38 🔗 marc come in here and ask how? :)
00:38 🔗 arkiver the issues, word docs I mean
00:38 🔗 arkiver ah
00:38 🔗 marc if i get them
00:39 🔗 arkiver as far as I know IA doesn't convert word documents, so we'd have to put the original word docs online and create a pdf from those ourselves to put online
00:39 🔗 arkiver maybe godane would like to do that
00:39 🔗 marc ah okay i think i have some word conversion stuff around here somewhere frm a past life
00:39 🔗 marc (had to reverse word format in late 90s)
00:40 🔗 arkiver please be careful when using some random software
00:40 🔗 marc k
00:40 🔗 marc where are u dumping the website crawl?
00:40 🔗 arkiver likely there will be no quality loss in the images converted from word to pdf, but I'd check to be sure first
00:40 🔗 marc oic
00:41 🔗 arkiver or ask godane if he's interested in converting and uploading
00:41 🔗 marc k
00:41 🔗 marc gunna ping my drupal dev buddy and see if he has any login 0day :)
00:41 🔗 arkiver the website crawl will be dumped in one of the archivebot packs: https://archive.org/details/archivebot
00:41 🔗 marc cool
00:41 🔗 arkiver and it will then go into the wayback machine
00:41 🔗 arkiver FOREVER
00:42 🔗 balrog marc: there was a drupal bug just announced in 7 :P
00:42 🔗 marc nice
00:42 🔗 marc ohhh really
00:42 🔗 balrog sql injection though
00:42 🔗 marc cool will take a look thx for tip
00:42 🔗 arkiver haha
00:42 🔗 arkiver good luck
00:42 🔗 balrog [20:38:24] <marc> they have some internal IT duderino who is going to try and dump the drupal db, we'll see if that works
00:43 🔗 balrog if you have them helping try not to fuck around too much :P
00:43 🔗 marc point taken
00:43 🔗 arkiver good luck marc, and thanks for this!
00:43 🔗 * arkiver is off to bed
00:43 🔗 * arkiver says byebye
00:44 🔗 marc thanks yalls!!!!!!
00:45 🔗 marc i mean literally the sfbg going away = 15k votes disappearing
00:45 🔗 marc since their election endorsement guide is what so many people print out and take to the polls on Nov 4
00:45 🔗 marc website down == political ramifications
00:46 🔗 arkiver http://www.sfbg.com/2005/10/31/testing-opinion-title
00:46 🔗 arkiver looks like that is the earliest article
00:47 🔗 marc right that's prolly them moving over to drupal
00:47 🔗 arkiver off now
00:48 🔗 marc http://www.sfbg.com/36/18/news_kimiko_burton.html
00:48 🔗 marc i think
00:48 🔗 marc they have older stuff it's just in the older format
00:48 🔗 marc sfbg.com/$VOLUME/$ISSUE/
00:48 🔗 marc that's frm 2002
00:48 🔗 marc lemme see if i can find index for that
00:49 🔗 marc http://www.sfbg.com/38/49/x_trail_mix.html
00:49 🔗 marc lotts crap like this
00:50 🔗 marc http://www.sfbg.com/36/18/index.html
00:51 🔗 marc http://www.sfbg.com/37/01/index.html
00:51 🔗 marc heh nice annalee newitz article top of masthead here
00:51 🔗 marc http://www.sfbg.com/36/12/index.html
00:51 🔗 marc that seems to be oldest article in the $VV/$II url format - back to 2002
00:54 🔗 marc how do i submit to archivebot? :)
00:59 🔗 aaaaaaaaa The first steps are to join the #archivebot channel and read the documentation.
01:04 🔗 marc thx
01:17 🔗 aaaaaaaaa Are there any direct links from the new site to the old one on sfbg? If not the warrior may be required as archivebot wouldn't find it.
01:19 🔗 marc nope
01:19 🔗 marc there's a google search window
01:19 🔗 marc i wrote a quick wget script to grab 36..40 volumes
01:19 🔗 marc so i typed in old politician names
01:19 🔗 marc and found old links to the pre-drupal webpage format
03:05 🔗 marc okay i wget 2002, 2003, 2004
03:18 🔗 SketchCow Archive.org prefers WARC format.
03:18 🔗 SketchCow But wgets are good for the moment too.
03:19 🔗 SketchCow You can give a bunch of URLs for archivebot to crawl and grab (the ones that aren't linked)
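
[Editor's sketch] SketchCow's point about WARC can be made concrete: GNU wget can write a WARC alongside a normal fetch. Below is a minimal sketch, assuming wget 1.14 or newer is installed and that the URLs sit one per line in a text file. The file name `old-issues.txt` mirrors the list marc posts later in the log; the WARC name `sfbg-old-issues` and the helper function are hypothetical. The script only builds and prints the command, it does not run it.

```python
from typing import List


def build_wget_warc_command(url_list_path: str, warc_name: str) -> List[str]:
    """Build a wget command line that fetches every URL in a list file
    and records the traffic in a WARC file.

    All flags used here are standard GNU wget options:
      --input-file       read URLs from a file, one per line
      --warc-file        write a WARC named <warc_name>.warc.gz
      --page-requisites  also fetch images/CSS needed to render each page
      --wait             pause between requests (seconds)
    """
    return [
        "wget",
        f"--input-file={url_list_path}",
        f"--warc-file={warc_name}",
        "--page-requisites",
        "--wait=1",
    ]


if __name__ == "__main__":
    # Hypothetical names; print the command instead of executing it.
    print(" ".join(build_wget_warc_command("old-issues.txt", "sfbg-old-issues")))
```

To actually run it, the list could be handed to `subprocess.run(...)`. Printing first makes it easy to check the command before hitting a site that may be fragile.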
03:45 🔗 marc cool i have 130 indexes of ld issues here
03:45 🔗 marc http://lucidfusionlabs.com/~marc/old-issues.txt
03:45 🔗 marc old*
03:45 🔗 marc don't have access to submit to archivebot
03:47 🔗 marc fuq
03:51 🔗 marc looks like all of 39 volume is index.php har
03:54 🔗 marc 38.37->40.26 is index.php
03:54 🔗 marc shit or get off the CMS, SFBG frm the past
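
[Editor's sketch] The pre-Drupal URL scheme marc describes (sfbg.com/$VOLUME/$ISSUE/, with index.php instead of index.html from 38.37 through 40.26) can be enumerated mechanically. This is a rough sketch, not marc's actual wget script. The two-digit issue padding follows the example URLs in the log; the figure of 52 issues per volume and the exact path of the index.php pages are assumptions, so some generated URLs will 404.

```python
BASE = "http://www.sfbg.com"

# Range reported in the log as serving index.php instead of index.html:
# volume 38 issue 37 through volume 40 issue 26 (inclusive).
PHP_START = (38, 37)
PHP_END = (40, 26)


def index_url(volume: int, issue: int) -> str:
    """Return the guessed index URL for one issue of the old site."""
    # Tuple comparison checks (volume, issue) against the index.php range.
    in_php_range = PHP_START <= (volume, issue) <= PHP_END
    page = "index.php" if in_php_range else "index.html"
    return f"{BASE}/{volume}/{issue:02d}/{page}"


def candidate_index_urls(volumes=range(36, 41), issues=range(1, 53)):
    """List candidate index URLs for volumes 36..40.

    52 issues per volume is an assumption (weekly paper), not from the log.
    """
    return [index_url(v, i) for v in volumes for i in issues]


if __name__ == "__main__":
    for url in candidate_index_urls():
        print(url)
```

The output is a plain URL list, one per line, which is the shape SketchCow says archivebot can take for pages that are not linked from the live site.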
09:37 🔗 Nemo_bis Expanded http://archiveteam.org/index.php?title=Quora
10:28 🔗 joepie91 would anybody object to me adding "IS FUCKING AWFUL" in <h1> to the Quora wiki article
10:28 🔗 joepie91 :P
10:31 🔗 ersi Nope.
10:35 🔗 schbirid quora? more like ebola
11:21 🔗 Nemo_bis schbirid: but luckily not as virulent
11:22 🔗 Nemo_bis I wonder if Yahoo! Answers can get any worse. Perhaps Quora knows
15:43 🔗 SketchCow No idea if the archive team is still active, but pianofiles.com is going to drop offline
15:43 🔗 SketchCow I like "I am not sure if they're still active"
15:43 🔗 SketchCow Guess we're not making enough noise.
15:43 🔗 arkiver Pianofiles.com, let's see
15:47 🔗 SketchCow I just spent a little time on it.
15:47 🔗 SketchCow Basically, it's a sheet music trading site with no files.
15:48 🔗 DFJustin ironic
15:48 🔗 SketchCow Set off an archivebot. It will just be good for having a list.
15:48 🔗 SketchCow Not ironic - cowardly
15:48 🔗 SketchCow http://www.icmp-ciem.org/node/487
15:48 🔗 arkiver yeah, archivebot can do this one
15:49 🔗 SketchCow http://swappano.com/ is basically the same thing and even has importing.
15:49 🔗 arkiver I can't find a way to download the sheets?
15:49 🔗 SketchCow The sheets are not up
15:49 🔗 SketchCow You talk with people and get them
15:49 🔗 DFJustin except you clogged up the queue with dutch bankruptcies :P
15:49 🔗 arkiver DFJustin: heh, sorry ;) But they usually don't take too long
15:50 🔗 arkiver SketchCow: ah, I see
15:50 🔗 joepie91 DFJustin: arkiver: just specify a pipeline
15:50 🔗 joepie91 it's my understanding that that bypasses the queue
15:51 🔗 arkiver what? I don't have that as my understanding? I missed something?
15:52 🔗 joepie91 arkiver: afaik, if you explicitly specify a pipeline for a job, it will ignore the queue and just send it directly to the pipeline in question
16:03 🔗 yipdw yes
16:04 🔗 yipdw which has its own queue
16:05 🔗 joepie91 ah.
16:05 🔗 joepie91 yipdw: I guess that if you target a pipeline that's running a lot of small jobs, it'd still be a lot faster
16:05 🔗 joepie91 :P
16:05 🔗 yipdw not if arkiver filled it up with bankruptcies
16:06 🔗 joepie91 yipdw: it sends tasks to pipelines immediately? not a central queue?
16:59 🔗 balrog the bittorrent site pianosheets died recently
16:59 🔗 balrog the admin vanished :(
16:59 🔗 ersi It's been discussed above
17:00 🔗 balrog pianosheets, not pianofiles
17:00 🔗 ersi Oh, sorry.
17:00 🔗 balrog you can still log in and view the torrents
17:00 🔗 balrog but the forums are closed and the tracker is dead
19:37 🔗 SketchCow --------------------------------------
19:37 🔗 SketchCow OH SHIT SON
19:37 🔗 SketchCow TWITPIC IS SHUTTING DOWN
19:37 🔗 SketchCow ALL HANDS ON DECK, ALL TRACKERS ON FULL
19:38 🔗 SketchCow --------------------------------------
19:39 🔗 SketchCow RKenshin: Dump it into archive.org, I don't care how, take it all.
19:39 🔗 Elegance Wat. First we're shutting down, oh nevermind, we're being bought, err actually we are indeed shutting down next week..
19:40 🔗 aaaaaaaaa He probably opened his mouth during the due diligence phase. But I had my suspicions when they withdrew the trademark application.
19:43 🔗 Jonimus ahh shit really
19:44 🔗 aaaaaaaaa http://blog.twitpic.com/2014/09/twitpic-is-shutting-down/
19:53 🔗 Jonimus woo less than 10 days
19:53 🔗 Jonimus :/
19:59 🔗 avuserow should archiveteam's choice be changed for warriors to twitpic?
20:00 🔗 SketchCow Yes, as soon as yipdw and RKenshin look over the situation.
20:00 🔗 SketchCow Here's me being diplomatic: https://twitter.com/thevowel/status/522839182129893376
20:04 🔗 xmc hzhaahaha
20:50 🔗 dserodio is twitpic-grab ready to use?
20:51 🔗 Elegance I wonder if they unbanned my IPs
20:55 🔗 dserodio how do I change the port the web interface listens on?
20:56 🔗 avuserow for run-pipeline, looks like --port=1234
20:56 🔗 avuserow or --disable-web-server is useful too
20:56 🔗 dserodio thanks
20:57 🔗 dserodio running twitpic-grab now
20:59 🔗 SketchCow Oh, I am so angry
21:03 🔗 antomatic I got here as soon as I heard the "Fuck Noah Everett" batsignal. What happened?
21:03 🔗 * antomatic reads back
21:03 🔗 antomatic Oh, I see.
21:03 🔗 xmc <noaheverett> "actually..."
21:04 🔗 SketchCow https://www.youtube.com/watch?v=UDekhoeEoCc
21:06 🔗 antomatic This calls for a new wiki password.
21:07 🔗 antomatic Never again must the responsibility for so much shared popular culture rest in the hands of one individual or organisation. Therefore, fuck noah everett.
21:10 🔗 yipdw did you post that on facebook
21:11 🔗 antomatic me? no
21:17 🔗 xmc yahoo has still done more
21:28 🔗 antomatic true, but yahoo is a big faceless entity. Twitpic and Noah put a face on the destruction.
22:29 🔗 SketchCow It's crazy - FOS has been gathering ancestry.com and swipnet items into megawarcs for WEEKS.
22:29 🔗 SketchCow In the case of ancestry and swipnet, it's actually been just MOVING files for a week
22:29 🔗 SketchCow Just the process of moving all the items out of a hopper directory into 25gb chunks, is days and days.
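
[Editor's sketch] For readers unfamiliar with the packing step SketchCow mentions, grouping items into roughly 25 GB chunks can be pictured as a simple greedy pass. This is an illustration only, not the actual megawarc tooling on FOS. The function name, the (name, size) input shape, and the decision to treat 25 GB as 25 * 1000**3 bytes are all assumptions.

```python
def chunk_items(items, limit_bytes=25 * 1000**3):
    """Greedily group (name, size_in_bytes) pairs into chunks.

    Items are taken in order; a new chunk starts whenever adding the
    next item would push the current chunk over limit_bytes. An item
    larger than the limit still gets a chunk of its own.
    """
    chunks = []
    current, current_size = [], 0
    for name, size in items:
        if current and current_size + size > limit_bytes:
            chunks.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    # Tiny made-up sizes with a small limit, just to show the grouping.
    demo = [("a.warc.gz", 10), ("b.warc.gz", 10), ("c.warc.gz", 10)]
    print(chunk_items(demo, limit_bytes=25))
```

The grouping itself is cheap. The days of wall-clock time described in the log come from physically moving and deleting very large numbers of files on disk.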
22:31 🔗 xmc uffda
22:44 🔗 jus341 Where should I ask about errors I'm seeing in the warrior?
22:46 🔗 arkiver jus341: if they are project specific, please ask them in the project channel
22:46 🔗 jus341 is there a channel for the twitpic phase 2 project?
22:47 🔗 aaaaaaaaa #quitpic
22:47 🔗 aaaaaaaaa jus341 ^
22:56 🔗 SketchCow What's really crazy is that just DELETING THE DIRECTORY AFTER IT'S DONE can take 2-3 hours
22:58 🔗 godane SketchCow: you will have a full collection of Nations Business later tonight
22:58 🔗 SketchCow Great
22:59 🔗 godane also The Social Hour and The Totally Rad Show are all uploaded
22:59 🔗 godane i also uploaded i think all eric pdfs for the 38xxxx area
