Time |
Nickname |
Message |
12:26
🔗
|
schbirid |
someone archive https://twitter.com/ProbablyOnion2 |
12:26
🔗
|
schbirid |
http://krebsonsecurity.com/2014/05/teen-arrested-for-30-swattings-bomb-threats/ |
12:59
🔗
|
midas1 |
schbirid: done, archivebot |
13:05
🔗
|
schbirid |
yay |
13:11
🔗
|
midas1 |
harddrive will arrive this week |
13:11
🔗
|
midas1 |
and ill ship it to you |
13:16
🔗
|
schbirid |
yaaay |
13:16
🔗
|
schbirid |
! |
14:25
🔗
|
Nemo_bis |
midas: can you also fire archivebot on http://forums.sugarcrm.com/ ? |
14:26
🔗
|
midas |
sure |
14:27
🔗
|
midas |
running |
16:43
🔗
|
SketchCow |
Hey, maniacs. |
16:43
🔗
|
SketchCow |
So, we are uploading 82gb of radio crap into the archive. |
16:43
🔗
|
SketchCow |
It's really in weird shape. |
16:43
🔗
|
SketchCow |
If anyone wants access to do metadata, that'd be welcome. |
16:44
🔗
|
SketchCow |
I also realize nobody has time for metadata |
16:44
🔗
|
SketchCow |
I think a project for this summer for me is coming up with some way to do metadata automatically that's vaguely useful. |
16:51
🔗
|
rocode |
ivan`: I am not sure if I read that correctly. Has the website owner disallowed members from logging in to archive the site, or do we just need login information? |
17:07
🔗
|
Smiley |
yeah, enjoy that upload.... at least the file names are _Semi_ useful. |
18:14
🔗
|
SadDM |
You like RPG fanzines? Let's be honest... probably not. Well too bad, because I've got another 200 uploading as we speak. |
18:14
🔗
|
exmic |
dag |
18:25
🔗
|
SketchCow |
Hey heyyyyyyyyyyyyyyyyyyy |
18:25
🔗
|
SketchCow |
http://rubyforge.org/ |
18:25
🔗
|
SketchCow |
What's the story - did we grab this thing? |
18:27
🔗
|
SadDM |
I don't know, but I'll throw archivebot on it and get what we can |
18:32
🔗
|
SketchCow |
I just did |
18:33
🔗
|
SadDM |
yup... saw that |
18:36
🔗
|
rocode |
SketchCow: https://twitter.com/SirTerryWrist/status/412908014316707840 heh |
18:36
🔗
|
SadDM |
SketchCow: Can I bother you later to move a few hundred items into folkscanomy_games... once my current upload is done? |
19:08
🔗
|
SketchCow |
Sure. |
19:08
🔗
|
SketchCow |
Also, I got fed up with us uploading a bunch of crazy shit |
19:08
🔗
|
SketchCow |
So I wrote something that pulls keywords. |
19:08
🔗
|
SketchCow |
it does.... OK. It does better than a kid I'm hiring for $9.95/hr to do it |
19:10
🔗
|
SketchCow |
I should note this kid does not exist and I have not just destroyed a young person's dreams |
19:11
🔗
|
SketchCow |
https://archive.org/details/1981-03-compute-magazine |
19:11
🔗
|
SketchCow |
See? The "Subject", i.e. keywords. |
19:15
🔗
|
SadDM |
That's awesome... now you need to share. |
19:17
🔗
|
exmic |
I think he just did share |
19:18
🔗
|
SadDM |
Also, my upload is now done. Everything in here can probably go: https://archive.org/search.php?query=uploader%3A%22aeakett%40gmail.com%22%20AND%20subject%3A%22roleplaying%20game%22%20AND%20NOT%20collection%3A%22folkscanomy_games%22%20AND%20NOT%20subject%3A%22podcast%22%20AND%20NOT%20collection%3A%22archiveteam%22&page=1 |
19:19
🔗
|
SadDM |
exmic: nah, he just shared the output... I wanna see inside of the black box. |
19:20
🔗
|
exmic |
o |
19:21
🔗
|
SketchCow |
Obviously, I'm finding edge cases are exploding. |
19:21
🔗
|
SketchCow |
Well, the writing of the subjects into the item is not a big black box. |
19:21
🔗
|
SketchCow |
That is, I just use the internetarchive python interface and do a ia metadata --modify="subject:SUBJECTTEXT" itemname |
19:21
🔗
|
SketchCow |
So, that saves me time. |
19:22
🔗
|
SketchCow |
But I'm using a keyword generator that I found on git |
19:22
🔗
|
* |
SadDM waits with bated breath... |
19:23
🔗
|
SketchCow |
Shhh |
19:23
🔗
|
SketchCow |
https://github.com/ox-it/spindle-code |
19:23
🔗
|
SketchCow |
Is that enough of the black box for you? |
19:24
🔗
|
SadDM |
lol... probably, yeah. |
19:30
🔗
|
SketchCow |
Now set so if there's subjects set it won't overwrite. |
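A minimal sketch of the "don't overwrite existing subjects" check described above, not SketchCow's actual script. It assumes an item object that exposes a `metadata` dict and a `modify_metadata(dict)` method, as objects returned by `internetarchive.get_item` do. The function name `set_subjects_if_missing` and the semicolon-joined subject string are illustrative assumptions.

```python
def set_subjects_if_missing(item, keywords):
    """Write keywords to the item's `subject` field unless one is already set.

    `item` is assumed to expose a `metadata` dict and a
    `modify_metadata(dict)` method (hypothetical stand-in for an
    internetarchive Item). Returns True if metadata was written,
    False if the item was left alone.
    """
    existing = item.metadata.get("subject")
    if existing:
        # A non-empty string or list means subjects were already set.
        return False

    # Drop blank keywords and surrounding whitespace.
    cleaned = [k.strip() for k in keywords if k.strip()]
    if not cleaned:
        return False

    # Assumption: subjects are stored as one semicolon-delimited string.
    item.modify_metadata({"subject": ";".join(cleaned)})
    return True
```

With the real library this would be called as `set_subjects_if_missing(internetarchive.get_item("itemname"), keywords)`, where the keywords come from whatever generator is in use.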
19:30
🔗
|
SketchCow |
Now I will run it against an entire run of magazines. |
19:31
🔗
|
SketchCow |
If this works vaguely well, it will be especially good for the items that have never and will never have love. |
19:32
🔗
|
SketchCow |
It'll never be perfect. |
19:32
🔗
|
SketchCow |
https://archive.org/details/computer-power-user-magazine-v13i12 but that's a nice set. |
19:33
🔗
|
SadDM |
it's pretty neat though... definitely a good start on stuff that I don't have time to actually read. |
19:35
🔗
|
SketchCow |
root@teamarchive0:/0/keywords# for each in `cat rammer.txt` |
19:35
🔗
|
SketchCow |
> do |
19:35
🔗
|
SketchCow |
> bash keyblart "$each" |
19:35
🔗
|
SketchCow |
> done |
19:36
🔗
|
SketchCow |
The fact that THAT will generate a "reasonable" collection of keywords from the items, put them in, have them eventually end up as a keyword index for that collection? |
19:36
🔗
|
SketchCow |
That works for me. |
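The shell loop pasted at 19:35 runs one script per identifier listed in `rammer.txt`. A rough Python equivalent, as a sketch only: `read_identifiers`, `run_batch`, and the `process` callback are hypothetical names, and the callback stands in for whatever `keyblart` does. Unlike the backtick `cat` form, this reads one identifier per line, not per whitespace-separated word, and it records failures without stopping the run.

```python
def read_identifiers(path):
    """Return the non-empty, stripped lines of a text file."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def run_batch(path, process):
    """Call `process(identifier)` for every identifier listed in `path`.

    Returns a list of (identifier, error message) pairs for the items
    that raised, so one bad item does not end the whole run.
    """
    failures = []
    for identifier in read_identifiers(path):
        try:
            process(identifier)
        except Exception as exc:
            failures.append((identifier, str(exc)))
    return failures
```

A call such as `run_batch("rammer.txt", tag_item)` would then mirror the loop, with `tag_item` being a user-supplied function that generates and writes keywords for one item.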
19:37
🔗
|
SketchCow |
https://archive.org/details/computer-power-user-magazine-v13i11 |
19:37
🔗
|
SketchCow |
Keyword: "Moulin Rouge" |
19:37
🔗
|
SketchCow |
<face> |
19:37
🔗
|
SketchCow |
-_- |
19:38
🔗
|
rocode |
>hard hat |
19:44
🔗
|
SketchCow |
https://archive.org/search.php?query=collection%3Acomputer_power_user&sort=-publicdate |
19:44
🔗
|
SketchCow |
There it is populating. |
19:44
🔗
|
SketchCow |
Not bad. |
22:22
🔗
|
ivan` |
rocode: I don't know anything about it |
22:32
🔗
|
rocode |
ivan`: Thanks. I will see if I can find out more. |