00:24 <godane> Arkiver2: the only subtitles i found are in dutch
03:42 <dx> hey #archiveteam! bukkit, a minecraft server / modding platform, is dead/dying due to a licensing conflict. the conflicting software is all gone due to DMCA, but their websites might still have plenty of data useful for the rest of the modding community
03:44 <yipdw> dx: we got a copy of the bukkit wiki via archivebot
03:44 <yipdw> if you've got other sites, please feel free to drop by #archivebot
03:44 <dx> neat
03:44 <dx> what about this? http://dev.bukkit.org/bukkit-plugins/
03:45 <dx> pages in there could be archived through archivebot, but what about plugin .jars?
03:45 <dx> pages would be useful because there's plenty of documentation in that site
03:45 <yipdw> one sec, checking
03:46 <yipdw> if the jars are linked, we can get them
03:47 <dx> hmm, they seem like direct links http://dev.bukkit.org/bukkit-plugins/nametags/files/
03:49 <balrog> the forums
03:49 <balrog> the plugins would be useful
03:49 <balrog> I have clones of the DMCA'd repos
03:49 <balrog> but I don't know what the legality on that is :p
03:50 <dx> as illegal as it always was
03:50 <dx> it's only down now because they are acting on it
07:35 <dx> so uhhh, there's an api for dev.bukkit.org, i scraped all the plugin download links with it
07:36 <dx> so now i've got this huge json file with some metadata and a bunch of direct download links, sample of the first 100: http://dump.dequis.org/mF7xj.txt
07:37 <dx> no filesize info or uploaded timestamps, could grab that with HEAD requests maybe.
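dx's idea of filling in the missing size/timestamp metadata with HEAD requests could be sketched like this. This is not dx's actual script; `head_info` and `parse_head` are hypothetical helper names, and the header parsing is split out so it works on any mapping of headers:

```python
# Sketch: recover file size and upload time for a download URL via a
# HEAD request, since the dev.bukkit.org API reportedly omits both.
import urllib.request

def parse_head(headers):
    """Extract (size_bytes, last_modified) from HTTP response headers."""
    size = headers.get("Content-Length")
    return (int(size) if size is not None else None,
            headers.get("Last-Modified"))

def head_info(url, timeout=30):
    """Issue a HEAD request and return (size_bytes, last_modified)."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_head(resp.headers)
```

Running `head_info` over 75494 URLs would want throttling and retry logic on top; the parsing half can be checked without touching the network.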
07:37 <dx> full dump http://dequis.org/bukkitdev-releases.json.gz - 3.4mb gzipped, 27mb uncompressed
07:41 <dx> all file urls one per line, no filtering at all, http://dequis.org/bukkitdev_all_file_urls.txt - 4.5mb, 75494 lines
07:42 <dx> it's an absurd amount of files, most of them not worth saving
07:43 <dx> how do i reduce this? for some projects it's not as trivial as grabbing the latest release, they have several parallel releases for a single version
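One plausible reduction pass for the problem dx raises: keep the newest file per (project, game version) pair, so parallel release lines each survive rather than only the single latest upload. The field names (`project`, `game_version`, `uploaded`) are assumptions, not the dump's real schema:

```python
# Sketch: reduce a huge file list to one newest file per release line.
def latest_per_line(files):
    """Keep only the most recently uploaded file for each
    (project, game_version) pair."""
    best = {}
    for f in files:
        key = (f["project"], f["game_version"])
        if key not in best or f["uploaded"] > best[key]["uploaded"]:
            best[key] = f
    return list(best.values())

# Hypothetical records: two parallel uploads for 1.7, one for 1.6.
files = [
    {"project": "nametags", "game_version": "1.7", "uploaded": 5},
    {"project": "nametags", "game_version": "1.7", "uploaded": 9},
    {"project": "nametags", "game_version": "1.6", "uploaded": 7},
]
kept = latest_per_line(files)
```

This keeps one file per release line (here 2 of 3) instead of one per project, which is exactly the distinction dx is worried about.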
07:46 <Litus1960> Q: just installed Virtualbox and want to import a fine (appliance) how do I get the files?
07:46 <Litus1960> a file
08:47 <midas> Litus1960: http://tracker.archiveteam.org/ click the download link
08:55 <Litus1960> @Midas I got the machine how do I import the files?
08:58 <Rotab> http://archiveteam.org/index.php?title=Warrior
09:00 <Litus1960> In VirtualBox, click File > Import Appliance and open the file. -> what file do I open?
09:00 <Rotab> ....?
09:00 <Rotab> the file you just downloaded
09:01 <Litus1960> I import the file that I just installed the virtual machine with?
09:03 <Rotab> you import the .ova
09:04 <midas> Litus1960: http://archive.org/download/archiveteam-warrior/archiveteam-warrior-v2-20121008.ova
09:04 <midas> then in virtualbox > file > import
09:07 <Litus1960> ok did that now how do I run a job?
09:07 <midas> it shows an IP address in the console
09:08 <midas> it explains it here: http://tracker.archiveteam.org/
09:11 <Litus1960> got it
09:11 <schbirid> Litus1960: welcome and thanks for helping out :)
09:15 <Litus1960> sorry bud, thank you for helping out :D was so busy trying to comprehend the poop not a nerdy person ;-)
09:18 <schbirid> http://www.classiccomputing.com/CC/Blog/Entries/2014/9/7_Computer_History_Nostalgia_Podcasts.html
09:23 <Litus1960> followed with @FlexMind
09:26 <Litus1960> bye for now
09:50 <danneh_> y'all have probably seen this, but: http://blog.twitpic.com/2014/09/twitpic-is-shutting-down/
09:51 <ersi> danneh_: Yeah. There's a project channel at #quitpic
09:52 <danneh_> ersi: Ah, fair enough. Thanks for being so quick and awesome!
09:53 <ersi> np!
09:53 <Muad-Dib> is anyone going to those Dutch meetups about digital preservation tomorrow? http://www.dezwijger.nl/115739/nl/tegenlicht-meet-up-23-digitaal-geheugenverlies
09:54 <Muad-Dib> joepie91, midas ^
09:58 <midas> I wish i could, but work and job interviews
10:03 <Muad-Dib> I might be able to go, but I'm a bit reluctant about walking around there and going "Hey guys, I'm with Archive Team!"
10:04 <Muad-Dib> I don't want to be mistaken for a representative, as I'm not really a good PR person
10:05 <midas> simple answer, don't :p
10:07 <Muad-Dib> but I also want to stir up discussion about digital preservation, and if people ask "how are you connected to this stuff?"
10:09 <Muad-Dib> Shall I stick with Jason's "Teenagers and crazy people" description for bands of rogue archivists like AT? https://decorrespondent.nl/1695/Waarom-deze-man-het-hele-internet-downloadt/56475705-f10825bc
10:12 <antomatic> I believe he said "maniacs" :)
10:12 <Muad-Dib> antomatic: figures, it was translated into dutch in that interview
10:13 <Muad-Dib> haven't watched the entire documentary yet
14:36 <Arkiver2> Guys, I'm going to the meetup tomorrow evening in Amsterdam
14:37 <Arkiver2> ^Muad-Dib
14:39 <Muad-Dib> If I can free the time, I'll be there too
14:41 <midas> i'd like to be there, but the day after that i have a job interview abroad.. so i'll pass
14:42 <midas> but do remind them that we grabbed the announcement already Muad-Dib ;)
14:43 <Arkiver2> midas: I'd like to tell people about the projects we are currently doing
14:43 <Arkiver2> and the problems the archiveteam is usually experiencing when archiving websites with warrior
14:44 <Arkiver2> However, it would be great if one of you can also be there
14:44 <Arkiver2> :)
14:51 <vantec> I'd like to go but I'm 6.5k km away
14:51 <phuzion> dx: I'm getting the sizes of everything right now, just so you know
14:53 <Arkiver2> dx: There is nothing that is not worth saving!!
14:54 <Arkiver2> If the list of files is below, say, 200 GB we can do it with archivebot
14:58 <Arkiver2> added it to archivebot
14:58 <Arkiver2> should be done in a day
14:59 <dx> whoa!
14:59 <dx> just came home and you're already archiving the whole thing :D
14:59 <Arkiver2> dx: http://archivebot.com/ first one
15:00 <dx> Arkiver2: thanks :D
15:00 <Arkiver2> if you have any other list of files/pages please give it and it will be saved
15:01 <dx> Arkiver2: all the pages under http://dev.bukkit.org/bukkit-plugins/ - mostly documentation of those plugins, they also link to the jars in that list so you'll want to exclude that.
15:02 <Arkiver2> dx: all of dev.bukkit.org is being saved: http://archivebot.com/ (third one)
15:02 <dx> Arkiver2: it won't download jars twice, right?
15:03 <dx> also, :D
15:04 <yipdw> dx: if the JARs have N URLs, they will be downloaded N times
15:05 <dx> yipdw: each url is a different version, but they are being downloaded from both my file list and the dev.bukkit.org recursive task
15:06 <Arkiver2> I'd say just leave it as it's going now, it won't be too big
15:06 <dx> hmm
15:08 <jules> hi
15:08 <Arkiver2> jules: hello
15:08 <dx> yesterday i took a random sample of 50 of them, got average 140kbytes, max 4mbytes, so uhhh, somewhere between 10gb and 300gb
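dx's back-of-the-envelope range checks out. With 75494 files (the line count of the URL list posted earlier), a sampled average of ~140 KB gives the low bound and the sampled maximum of ~4 MB gives the worst case:

```python
# Sanity-check of dx's size estimate for the full file list.
n_files = 75494            # lines in bukkitdev_all_file_urls.txt
avg_bytes = 140 * 1024     # sampled average, ~140 KB
max_bytes = 4 * 1024 * 1024  # sampled maximum, ~4 MB

low = n_files * avg_bytes    # total if the sample average holds: ~10 GiB
high = n_files * max_bytes   # total if every file hit the max: ~295 GiB
```

So "somewhere between 10gb and 300gb" is exactly what a 50-file sample supports, with the caveat that a long tail of large jars could pull the real number above the average-based bound.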
15:08
🔗
|
dx |
but i have no idea what's "too big" for you guys :P |
15:09
🔗
|
Arkiver2 |
nothing is too big or us |
15:09
🔗
|
Arkiver2 |
well, there is a limit kind of |
15:09
🔗
|
dx |
yeah i saw the twitch wiki page |
15:09
🔗
|
Arkiver2 |
MobileMe was more then 200 TB |
15:09
🔗
|
dx |
whoa. |
15:09
🔗
|
Arkiver2 |
so it should be fine ;) |
15:10
🔗
|
dx |
:D |
15:10
🔗
|
dx |
thanks a lot! |
15:10
🔗
|
yipdw |
dx: if that's the case, please ignore all jars from the dev.bukkit.org job |
15:10
🔗
|
dx |
Arkiver2: ^ |
15:10
🔗
|
yipdw |
there's "having lots of space" and "not being brain-dead" and two identical copies are the latter |
15:10
🔗
|
dx |
yup! |
15:10
🔗
|
Arkiver2 |
we can do that too |
15:11
🔗
|
yipdw |
hopefully dev.bukkit.org doesn't use Java applets |
15:11
🔗
|
dx |
haha no |
15:11
🔗
|
yipdw |
that'll make the obvious ignore pattern a bit trickier |
15:11
🔗
|
dx |
it's not a website from the 90s luckily |
15:13
🔗
|
Arkiver2 |
yipdw: maybe ignore everything from servermods.cursecdn.com/ |
15:13
🔗
|
yipdw |
done |
15:13
🔗
|
Arkiver2 |
as all filles are form there |
15:13
🔗
|
Arkiver2 |
ah that's even better |
15:13
🔗
|
Arkiver2 |
thanks |
17:53
🔗
|
pluesch |
i hate sites that are impossible to archive... |
18:01
🔗
|
joepie91 |
nothing is impossible ;) |
18:01
🔗
|
joepie91 |
it's just a challenge! |
18:02
🔗
|
dx |
unless they are already dead |
18:03
🔗
|
joepie91 |
... okay, point taken. |
18:03
🔗
|
joepie91 |
:P |
18:17
🔗
|
pluesch |
joepie91: tumblr.com :D |
18:17
🔗
|
pluesch |
they limit everything |
18:17
🔗
|
pluesch |
for example: notes are limited to 4980 and I don't know why |
18:18
🔗
|
pluesch |
so if there is a post with lets say 10000 notes, it is impossible to find out all likes, reblogs |
18:19
🔗
|
xmc |
sounds like they tried to limit it to 250 pages of 20, but got it wrong |
18:23
🔗
|
joepie91 |
off-by-one, whoo |
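xmc's theory is arithmetically consistent with the observed limit: an intended cap of 250 pages of 20 notes would be 5000, and serving one page too few (249 pages) gives exactly the 4980 pluesch sees:

```python
# The off-by-one reading of tumblr's 4980-note limit.
per_page = 20
intended_pages = 250
intended = intended_pages * per_page   # 5000, the presumed design cap
observed = 4980                        # the limit pluesch actually hits
pages_served = observed // per_page    # 249 pages, one short of 250
```

This is speculation from the channel, not confirmed tumblr behavior, but the numbers line up: the shortfall is exactly one page of notes.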
18:23 <joepie91> pluesch: how is the remainder normally visible?
18:23 <joepie91> surely they don't *completely* hide them
18:40 <pluesch> joepie91: what do you mean?
18:40 <pluesch> it's not visible, the data just gets lost in their database
18:40 <pluesch> uhhh I hate it when data gets lost
18:43 <midas> did someone say mongodb on 32bit?
18:49 <joepie91> pluesch: I suspect it can still be accessed
18:49 <joepie91> just through a different method
18:51 <pluesch> joepie91: I've tried many ways ... api v1, api v2, normal page
18:52 <pluesch> but yeah ... still have to check some things (tumblr android app api calls, "undocumented" api functions)
19:12 <espes__> of general interest: https://github.com/espes/jdget
19:12 <espes__> if anyone ever needs to pull a bunch of links from file lockers
19:13 <espes__> ping me and maybe I'll update / maintain it
19:23 <pluesch> even with tumblr android app the 4980 limit is there....
19:23 <pluesch> -.-
19:54 <joepie91> pluesch: is it possible that the notes displayed differ depending on what reblog you're looking at
19:54 <xmc> I think they don't
19:54 <joepie91> similar to how twitter sometimes shows an entirely different conversation flow on a different reply
19:55 <xmc> tumblr notes are attached to the thread root
19:55 <joepie91> bah
19:55 * joepie91 crumples up paper and aims at circular filing bin
20:05 <DFJustin> espes__: ooooh
20:06 <DFJustin> does it do youtube? jdownloader2 is one of the few tools I've found that will correctly grab 1080p videos
20:06 <pluesch> joepie91: nope, checked
20:06 <pluesch> so
20:06 <pluesch> what should i do now?
20:06 <DFJustin> but I can't really recommend it to people because it's such a heap
20:07 <pluesch> ask tumblr to improve there api? XD
20:07 <pluesch> their*
20:07 <pluesch> DFJustin: youtube-dl can 1080p too?
20:07 <pluesch> bestvideo+bestaudio
20:12 <joepie91> DFJustin: huh? youtube-dl does 1080p fine
20:12 <DFJustin> yeah but it's not the default and you need ffmpeg
20:12 <joepie91> also, so does freerapid afaik
20:12 <joepie91> ...?
20:12 <joepie91> DFJustin: link me a 1080p video?
20:13 <xmc> youtube-dl usually works for me
20:15 <pluesch> tumblr has just blocked my ip :/
20:15 <pluesch> damn.
20:15 <pluesch> I just wanted to mirror 160 blogs
20:16 <RedType> did you try saying you're googlebot?
20:16 <DFJustin> https://www.youtube.com/watch?v=t8YXut6_56c
20:17 <RedType> "oh what's that, you need me to register before viewing or my IP is 2much4u?" -A "Googlebot/2.1 (+http://www.google.com/bot.html)"
20:18 <DFJustin> youtube-dl with default parameters gets 720p (silently!)
20:20 <pluesch> will try that
20:23 <DFJustin> ah --format bestvideo+bestaudio looks like it works, must be new
20:24 <DFJustin> before you had to manually specify the correct numbers
20:25 <danneh_> yeah, that'll get 1080p with youtube-dl, but you need to merge the audio/video streams with some other tool later
20:25 <DFJustin> the dependency on ffmpeg is also problematic; when I tried to get this working with sketchcow before, his system ffmpeg was broken somehow
20:25 <DFJustin> and in my case (windows) it doesn't seem to like unicode filenames
20:26 <danneh_> I believe, last time I checked you had to merge it manually at least
20:26 <DFJustin> it does call out to ffmpeg to merge them
20:26 <danneh_> oh, that's shiny
20:27 <danneh_> pluesch: what're you using to download tumblr blogs?
20:27 <danneh_> I've downloaded a few and haven't really had any issues, just going through and wgetting the thing
20:28 <danneh_> had to modify the wget source code to grab another tag properly though (could probably be done via the lua extensions, but I haven't looked into that yet)
20:29 <pluesch> danneh_: https://github.com/bbolli/tumblr-utils tumblr_backup.py with a few modifications
20:30 <pluesch> will release it soon
20:30 <danneh_> ah, fair enough
20:31 <pluesch> and i have created a script that goes through all notes to create a list of blogs
20:31 <pluesch> ^^
20:31 <pluesch> "all notes" == the 4980 notes per post
20:32 <danneh_> aha, nice
20:32 <danneh_> if you don't hit the api they let you download at full speed in my experience
20:33 <pluesch> it hits the old api (blogname.tumblr.com/api/read) and that's the problem i guess
20:33 <pluesch> media download isn't a problem yeah :)
20:34 <RedType> hmm... is there a tumblr screenreading based api or something?
20:34 <RedType> or just hitting their ajax endpoint?
20:34 <RedType> err what i'm asking is if anyone has written such a thing
20:35 <danneh_> I'll try to clean up my script and throw it online sometime
20:35 <danneh_> also includes a little webserver to host blogs after they've been downloaded for fun
20:57 <raylee> LOL
20:57 <raylee> http://partners.disney.com/throwback