Time | Nickname | Message |
00:31 | marc | balrog: nobody knows |
00:31 | marc | arkiver: nobody knows |
00:31 | balrog | :/ |
00:32 | marc | i met with ex-sfbg staff today they said they'd try and get it up |
00:32 | arkiver | ah ok |
00:32 | arkiver | I see only the www is up |
00:32 | arkiver | all other domains are still down |
00:32 | marc | there's some drupal login here |
00:32 | marc | http://www.sfbg.com/user |
00:32 | marc | gunna try my hand at it a bit later see if there's still a way to access the db |
00:32 | arkiver | let us know what you find out |
00:33 | marc | thx re: archivebot |
00:33 | marc | will that get everything or do i need to pitch in |
00:33 | arkiver | it will get everything that is linked on the site |
00:34 | arkiver | there are some things it doesn't get |
00:34 | marc | nod |
00:34 | arkiver | like links hidden behind html stuff |
00:34 | arkiver | oops |
00:34 | arkiver | javascript stuff |
00:34 | marc | nod it's drupal so i dont think there's a ton of that? </pulled-out-of-ass |
00:34 | arkiver | probably not, seems to go fine for now |
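arkiver's caveat is that a crawler which follows links found in the fetched markup can miss URLs that only exist once a script runs. A minimal Python sketch of that difference, using the standard-library `html.parser` as a purely static link extractor; this is not ArchiveBot's crawler (which may apply its own heuristics), and the page below is made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href/src attribute values from static markup; no JavaScript is executed."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def static_links(html: str) -> list:
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page: one plain link, and one URL that a script assembles at runtime.
PAGE = """
<a href="/blog">Blog</a>
<script>
  // This URL never appears as an attribute, so a static extractor cannot see it.
  var page = 8;
  var next = "/politics?page=" + page;
</script>
"""

if __name__ == "__main__":
    print(static_links(PAGE))
```

Only `/blog` is found; the `/politics?page=8` URL would have to be discovered some other way (for example via another page that links it directly).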
00:35 | arkiver | I saw there is no sitemap |
00:35 | marc | yah just the archives- but also there's a ton of content (blogs) that isnt in the print issues |
00:35 | arkiver | do you have an example for me? |
00:35 | marc | still expect to be receiving some 15 years worth of word files |
00:35 | marc | everything frm |
00:35 | marc | http://www.sfbg.com/blog |
00:35 | marc | eg |
00:35 | marc | feed://www.sfbg.com/blog/rss.xml%20 |
00:36 | marc | interesting they rm'd the politics blogs |
00:36 | marc | http://www.sfbg.com/rss-feeds |
00:36 | marc | maybe not sorry |
00:36 | arkiver | those blogs are gine |
00:36 | arkiver | fine |
00:36 | marc | just no rss feed for it |
00:36 | marc | cool |
00:36 | arkiver | they will be crawled |
00:36 | marc | eg http://www.sfbg.com/politics?page=8 |
00:36 | marc | dope! |
00:37 | marc | where do they get dumped |
00:37 | arkiver | marc: word files? |
00:37 | arkiver | the original files that is? |
00:37 | marc | yes um it looks like website only has content back to 2006? at least frm that issues archive |
00:37 | arkiver | yeah, goes back to 2006 |
00:37 | marc | so the ex-sfbg staff today told me they had 15 years of word files that they used to make the pdf |
00:37 | marc | or whatever final publishing format |
00:37 | marc | apparently their publishing process was email the word file to the editor then the layout duders collate it |
00:37 | arkiver | awesome, so the original format |
00:38 | marc | yah |
00:38 | marc | not sure if they're sending them to me but i will ping them again to make sure they have it |
00:38 | marc | they have some internal IT duderino who is going to try and dump the drupal db, we'll see if that works |
00:38 | arkiver | how are you going to put them on IA? |
00:38 | marc | come in here and ask how? :) |
00:38 | arkiver | the issues, word docs I mean |
00:38 | arkiver | ah |
00:38 | marc | if i get them |
00:39 | arkiver | as far as I know IA doesn't convert word documents, so we'd have to put the original word docs online and create a pdf from those ourselves to put online |
00:39 | arkiver | maybe godane would like to do that |
00:39 | marc | ah okay i think i have some word conversion stuff around here somewhere frm a past life |
00:39 | marc | (had to reverse word format in late 90s) |
00:40 | arkiver | please be careful when using some random software |
00:40 | marc | k |
00:40 | marc | where are u dumping the website crawl? |
00:40 | arkiver | likely there will be no quality loss in the images converted from word to pdf, but I'd check to be sure first |
00:40 | marc | oic |
00:41 | arkiver | or ask godane if he's interested in converting and uploading |
00:41 | marc | k |
00:41 | marc | gunna ping my drupal dev buddy and see if he has any login 0day :) |
00:41 | arkiver | the website crawl will be dumped in one of the archivebot packs: https://archive.org/details/archivebot |
00:41 | marc | cool |
00:41 | arkiver | and it will then go into the wayback machine |
00:41 | arkiver | FOREVER |
00:42 | balrog | marc: there was a drupal bug just announced in 7 :P |
00:42 | marc | nice |
00:42 | marc | ohhh really |
00:42 | balrog | sql injection though |
00:42 | marc | cool will take a look thx for tip |
00:42 | arkiver | haha |
00:42 | arkiver | good luck |
00:42 | balrog | [20:38:24] <marc> they have some internal IT duderino who is going to try and dump the drupal db, we'll see if that works |
00:43 | balrog | if you have them helping try not to fuck around too much :P |
00:43 | marc | point taken |
00:43 | arkiver | good luck marc, and thanks for this! |
00:43 | * | arkiver is off to bed |
00:43 | * | arkiver says byebye |
00:44 | marc | thanks yalls!!!!!! |
00:45 | marc | i mean literally the sfbg going away = 15k votes disappearing |
00:45 | marc | since their election endorsement guide is what so many people print out and take to the polls on Nov 4 |
00:45 | marc | website down == political ramifications |
00:46 | arkiver | http://www.sfbg.com/2005/10/31/testing-opinion-title |
00:46 | arkiver | looks like that is the earliest article |
00:47 | marc | right that's prolly them moving over to drupal |
00:47 | arkiver | off now |
00:48 | marc | http://www.sfbg.com/36/18/news_kimiko_burton.html |
00:48 | marc | i think |
00:48 | marc | they have older stuff it's just in the older format |
00:48 | marc | sfbg.com/$VOLUME/$ISSUE/ |
00:48 | marc | that's frm 2002 |
00:48 | marc | lemme see if i can find index for that |
00:49 | marc | http://www.sfbg.com/38/49/x_trail_mix.html |
00:49 | marc | lots of crap like this |
00:50 | marc | http://www.sfbg.com/36/18/index.html |
00:51 | marc | http://www.sfbg.com/37/01/index.html |
00:51 | marc | heh nice annalee newitz article top of masthead here |
00:51 | marc | http://www.sfbg.com/36/12/index.html |
00:51 | marc | that seems to be oldest article in the $VV/$II url format - back to 2002 |
00:54 | marc | how do i submit to archivebot? :) |
00:59 | aaaaaaaaa | The first steps are to join the #archivebot channel and read the documentation. |
01:04 | marc | thx |
01:17 | aaaaaaaaa | Are there any direct links from the new site to the old one on sfbg? If not the warrior may be required as archivebot wouldn't find it. |
01:19 | marc | nope |
01:19 | marc | there's a google search window |
01:19 | marc | i wrote a quick wget script to grab 36..40 volumes |
01:19 | marc | so i typed in old politician names |
01:19 | marc | and found old links to the pre-drupal webpage format |
03:05 | marc | okay i wget 2002, 2003, 2004 |
03:18 | SketchCow | Archive.org prefers WARC format. |
03:18 | SketchCow | But wgets are good for the moment too. |
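A plain wget mirror keeps only the files, while a WARC stores each HTTP response together with its headers and capture metadata (wget 1.14 and later can write one directly with `--warc-file`). To show what that looks like, here is a minimal standard-library Python sketch of a single uncompressed WARC/1.0 `response` record. The function name, URL and payload are illustrative assumptions; real tools also write `warcinfo` and `request` records and usually gzip each record.

```python
import uuid
from datetime import datetime, timezone


def warc_response_record(target_uri: str, http_bytes: bytes, when: datetime) -> bytes:
    """Wrap a raw HTTP response (status line + headers + body) in one WARC/1.0 record."""
    headers = [
        ("WARC-Type", "response"),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("WARC-Date", when.strftime("%Y-%m-%dT%H:%M:%SZ")),  # UTC, ISO 8601
        ("WARC-Target-URI", target_uri),
        ("Content-Type", "application/http; msgtype=response"),
        ("Content-Length", str(len(http_bytes))),  # length of the block only
    ]
    head = "WARC/1.0\r\n" + "".join(f"{k}: {v}\r\n" for k, v in headers) + "\r\n"
    # Each record ends with two CRLFs before the next record starts.
    return head.encode("utf-8") + http_bytes + b"\r\n\r\n"


if __name__ == "__main__":
    body = b"<html>old issue index</html>"
    http = (
        b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n"
        + f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
        + body
    )
    record = warc_response_record(
        "http://www.sfbg.com/36/12/index.html", http, datetime.now(timezone.utc)
    )
    print(record.decode("utf-8"))
```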
03:19 | SketchCow | You can give a bunch of URLs for archivebot to crawl and grab (the ones that aren't linked) |
03:45 | marc | cool i have 130 indexes of ld issues here |
03:45 | marc | http://lucidfusionlabs.com/~marc/old-issues.txt |
03:45 | marc | old* |
03:45 | marc | don't have access to submit to archivebot |
03:47 | marc | fuq |
03:51 | marc | looks like all of volume 39 is index.php har |
03:54 | marc | 38.37->40.26 is index.php |
03:54 | marc | shit or get off the CMS, SFBG frm the past |
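marc's notes give the old URL layout (`sfbg.com/$VOLUME/$ISSUE/`, two-digit zero-padded issue numbers, oldest index at 36/12) and say that 38/37 through 40/26 use `index.php` rather than `index.html`. A Python sketch that expands those notes into a one-URL-per-line list, the kind of list SketchCow suggests handing to archivebot for pages nothing links to. The end point (40/26) and the 52-issues-per-volume rollover are assumptions, so some generated URLs may simply 404.

```python
def issue_index_urls(first=(36, 12), last=(40, 26), issues_per_volume=52,
                     php_range=((38, 37), (40, 26))):
    """Yield old-format issue index URLs in order, from `first` to `last` inclusive.

    Assumptions: every volume has `issues_per_volume` issues, and issues inside
    `php_range` use index.php while all others use index.html.
    """
    php_start, php_end = php_range
    vol, issue = first
    # Tuples compare by volume first, then issue number.
    while (vol, issue) <= last:
        ext = "php" if php_start <= (vol, issue) <= php_end else "html"
        yield f"http://www.sfbg.com/{vol}/{issue:02d}/index.{ext}"
        issue += 1
        if issue > issues_per_volume:
            vol, issue = vol + 1, 1


if __name__ == "__main__":
    for url in issue_index_urls():
        print(url)
```

Redirecting the output to a text file gives a plain URL list; keeping the volume/issue bounds as parameters makes it easy to adjust once the real first and last issues are confirmed.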
09:37 | Nemo_bis | Expanded http://archiveteam.org/index.php?title=Quora |
10:28 | joepie91 | would anybody object to me adding "IS FUCKING AWFUL" in <h1> to the Quora wiki article |
10:28 | joepie91 | :P |
10:31 | ersi | Nope. |
10:35 | schbirid | quora? more like ebola |
11:21 | Nemo_bis | schbirid: but luckily not as virulent |
11:22 | Nemo_bis | I wonder if Yahoo! Answers can get any worse. Perhaps Quora knows |
15:43 | SketchCow | No idea if the archive team is still active, but pianofiles.com is going to drop offline |
15:43 | SketchCow | I like "I am not sure if they're still active" |
15:43 | SketchCow | Guess we're not making enough noise. |
15:43 | arkiver | Pianofiles.com, let's see |
15:47 | SketchCow | I just spent a little time on it. |
15:47 | SketchCow | Basically, it's a sheet music trading site with no files. |
15:48 | DFJustin | ironic |
15:48 | SketchCow | Set off an archivebot. It will just be good for having a list. |
15:48 | SketchCow | Not ironic - cowardly |
15:48 | SketchCow | http://www.icmp-ciem.org/node/487 |
15:48 | arkiver | yeah, archivebot can do this one |
15:49 | SketchCow | http://swappano.com/ is basically the same thing and even has importing. |
15:49 | arkiver | I can't find a way to download the sheets? |
15:49 | SketchCow | The sheets are not up |
15:49 | SketchCow | You talk with people and get them |
15:49 | DFJustin | except you clogged up the queue with dutch bankruptcies :P |
15:49 | arkiver | DFJustin: heh, sorry ;) But they usually don't take too long |
15:50 | arkiver | SketchCow: ah, I see |
15:50 | joepie91 | DFJustin: arkiver: just specify a pipeline |
15:50 | joepie91 | it's my understanding that that bypasses the queue |
15:51 | arkiver | what? I don't have that as my understanding? I missed something? |
15:52 | joepie91 | arkiver: afaik, if you explicitly specify a pipeline for a job, it will ignore the queue and just send it directly to the pipeline in question |
16:03 | yipdw | yes |
16:04 | yipdw | which has its own queue |
16:05 | joepie91 | ah. |
16:05 | joepie91 | yipdw: I guess that if you target a pipeline that's running a lot of small jobs, it'd still be a lot faster |
16:05 | joepie91 | :P |
16:05 | yipdw | not if arkiver filled it up with bankruptcies |
16:06 | joepie91 | yipdw: it sends tasks to pipelines immediately? not a central queue? |
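The exchange above describes two levels of queueing: unpinned jobs wait in a shared queue, while a job pinned to a pipeline skips that queue but still waits in the pipeline's own queue. A toy Python model of that description, with hypothetical class and method names; it is not ArchiveBot's real scheduler, and the rule that a pipeline drains its own queue before the shared one is an assumption made for the sketch.

```python
from collections import deque


class ToyDispatcher:
    """Toy model: one shared FIFO queue plus one FIFO queue per pipeline (hypothetical)."""

    def __init__(self, pipeline_names):
        self.shared = deque()
        self.per_pipeline = {name: deque() for name in pipeline_names}

    def submit(self, job, pipeline=None):
        if pipeline is None:
            self.shared.append(job)  # unpinned: wait in the shared queue
        else:
            self.per_pipeline[pipeline].append(job)  # pinned: skip the shared queue

    def next_job(self, pipeline):
        # Assumed policy: a pipeline works through its own queue first,
        # so a pinned job still waits behind anything already pinned there.
        own = self.per_pipeline[pipeline]
        if own:
            return own.popleft()
        if self.shared:
            return self.shared.popleft()
        return None


if __name__ == "__main__":
    d = ToyDispatcher(["p1", "p2"])
    for n in range(3):
        d.submit(f"bankruptcy-{n}", pipeline="p1")
    d.submit("pianofiles.com")  # unpinned
    d.submit("sfbg.com", pipeline="p1")  # pinned, but p1 is already busy
    print([d.next_job("p1") for _ in range(4)])
    print(d.next_job("p2"))
```

In this model the pinned `sfbg.com` job only runs after the three jobs already pinned to `p1`, while the unpinned job is picked up by the idle `p2`, which matches yipdw's point that pinning does not help if the target pipeline is already full.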
16:59 | balrog | the bittorrent site pianosheets died recently |
16:59 | balrog | the admin vanished :( |
16:59 | ersi | It's been discussed above |
17:00 | balrog | pianosheets, not pianofiles |
17:00 | ersi | Oh, sorry. |
17:00 | balrog | you can still log in and view the torrents |
17:00 | balrog | but the forums are closed and the tracker is dead |
19:37 | SketchCow | -------------------------------------- |
19:37 | SketchCow | OH SHIT SON |
19:37 | SketchCow | TWITPIC IS SHUTTING DOWN |
19:37 | SketchCow | ALL HANDS ON DECK, ALL TRACKERS ON FULL |
19:38 | SketchCow | -------------------------------------- |
19:39 | SketchCow | RKenshin: Dump it into archive.org, I don't care how, take it all. |
19:39 | Elegance | Wat. First we're shutting down, oh nevermind, we're being bought, err actually we are indeed shutting down next week.. |
19:40 | aaaaaaaaa | He probably opened his mouth during the due diligence phase. But I had my suspicions when they withdrew the trademark application. |
19:43 | Jonimus | ahh shit really |
19:44 | aaaaaaaaa | http://blog.twitpic.com/2014/09/twitpic-is-shutting-down/ |
19:53 | Jonimus | woo less than 10 days |
19:53 | Jonimus | :/ |
19:59 | avuserow | should archiveteam's choice be changed for warriors to twitpic? |
20:00 | SketchCow | Yes, as soon as yipdw and RKenshin look over the situation. |
20:00 | SketchCow | Here's me being diplomatic: https://twitter.com/thevowel/status/522839182129893376 |
20:04 | xmc | hzhaahaha |
20:50 | dserodio | is twitpic-grab ready to use? |
20:51 | Elegance | I wonder if they unbanned my IPs |
20:55 | dserodio | how do I change the port the web interface listens on? |
20:56 | avuserow | for run-pipeline, looks like --port=1234 |
20:56 | avuserow | or --disable-web-server is useful too |
20:56 | dserodio | thanks |
20:57 | dserodio | running twitpic-grab now |
20:59 | SketchCow | Oh, I am so angry |
21:03 | antomatic | I got here as soon as I heard the "Fuck Noah Everett" batsignal. What happened? |
21:03 | * | antomatic reads back |
21:03 | antomatic | Oh, I see. |
21:03 | xmc | <noaheverett> "actually..." |
21:04 | SketchCow | https://www.youtube.com/watch?v=UDekhoeEoCc |
21:06 | antomatic | This calls for a new wiki password. |
21:07 | antomatic | Never again must the responsibility for so much shared popular culture rest in the hands of one individual or organisation. Therefore, fuck noah everett. |
21:10 | yipdw | did you post that on facebook |
21:11 | antomatic | me? no |
21:17 | xmc | yahoo has still done more |
21:28 | antomatic | true, but yahoo is a big faceless entity. Twitpic and Noah put a face on the destruction. |
22:29 | SketchCow | It's crazy - FOS has been gathering ancestry.com and swipnet items into megawarcs for WEEKS. |
22:29 | SketchCow | In the case of ancestry and swipnet, it's actually been just MOVING files for a week |
22:29 | SketchCow | Just the process of moving all the items out of a hopper directory into 25gb chunks, is days and days. |
22:31 | xmc | uffda |
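SketchCow describes FOS moving finished items out of a hopper directory into 25gb chunks for megawarcs. A minimal Python sketch of only the grouping decision, assuming a simple sequential fill (start a new chunk when the next item would push the current one past the limit) and decimal gigabytes; the function name and hopper path are hypothetical, this is not the actual megawarc tooling, and it does not model the file copying that SketchCow says takes days.

```python
import os
from typing import Iterable, List, Tuple

CHUNK_LIMIT = 25 * 10**9  # "25gb", assumed to mean decimal gigabytes


def plan_chunks(items: Iterable[Tuple[str, int]], limit: int = CHUNK_LIMIT) -> List[List[str]]:
    """Group (name, size_in_bytes) items, in the given order, into chunks of at most `limit` bytes.

    An item larger than `limit` ends up alone in its own chunk.
    """
    chunks: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for name, size in items:
        if current and current_size + size > limit:
            chunks.append(current)  # close the full chunk and start a new one
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    hopper = "."  # hypothetical hopper directory
    entries = sorted((e.name, e.stat().st_size) for e in os.scandir(hopper) if e.is_file())
    for i, chunk in enumerate(plan_chunks(entries)):
        print(i, len(chunk), "files")
```

A sequential fill keeps items in directory order, which is simple but can leave chunks well under the limit; a size-sorted bin-packing pass would fill chunks more tightly at the cost of reordering items.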
22:44 | jus341 | Where should I ask about errors I'm seeing in the warrior? |
22:46 | arkiver | jus341: if they are project specific, please ask them in the project channel |
22:46 | jus341 | is there a channel for the twitpic phase 2 project? |
22:47 | aaaaaaaaa | #quitpic |
22:47 | aaaaaaaaa | jus341 ^ |
22:56 | SketchCow | What's really crazy is that just DELETING THE DIRECTORY AFTER IT'S DONE can take 2-3 hours |
22:58 | godane | SketchCow: you will have a full collection of Nation's Business later tonight |
22:58 | SketchCow | Great |
22:59 | godane | also The Social Hour and The Totally Rad Show are all uploaded |
22:59 | godane | i also uploaded i think all eric pdfs for the 38xxxx area |