#archiveteam 2012-01-01,Sun

Time Nickname Message
00:26 godane do you guys backup slashdot?
00:31 SketchCow Not well enough
01:05 bsmith093 SketchCow: isn't that specifically against most CC licenses?
01:09 chronomex CC-NC only
02:17 irata hi all, i'm trying to do a little tiny personal archive project, and have a question.
02:18 irata i had an old Lycos HTMLGear guestbook, and want to archive it. Of course Lycos doesn't give any d/l option.
02:20 irata it displays five entries per page, and you navigate between pages by clicking a form submit button.
02:20 irata my question is: what kind of tools could anyone recommend for grabbing data from a site set up this way?
02:35 irata :-\
02:36 Wyatt|Wor Hmm, have you tried wget on it? I'm still pretty low level at this archival stuff, but it's kind of the go-to tool.
02:37 irata yeah, i'm a total newbie in this regard. i figured if anyone could tell me what to do, it'd be archiveteam. :P
02:47 irata ok, so i'm looking through wget's docs right now... but it seems like it only follows links. problem for me is this guestbook doesn't use links to go from page to page. it uses a form submit button
02:47 Wyatt|Wor Uhm, hang on. I know I dealt with something like this a couple weeks ago...
02:48 Schbirid maybe the URLs are easy?
02:48 Schbirid e.g. just some number increments
02:48 Schbirid i should go to bed
02:48 Schbirid ugh
02:48 Schbirid 4am
02:48 Schbirid good night =D
02:48 Schbirid and happy new year
02:48 irata night
02:49 Wyatt|Wor There is that.
02:50 Wyatt|Wor And if not, look into using --post-file or --post-data
02:50 irata since it uses form submits, the url never actually changes
02:53 Wyatt|Wor (I was hacking a module into wgetpaste for our internal pastebin and I learned of those from that)
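
A minimal sketch of the --post-data approach Wyatt|Wor suggests, done here in Python so the pagination loop is explicit. The endpoint URL and the form field names ("guestbook_id", "page") are hypothetical stand-ins; the real names have to be read out of the guestbook form's HTML source:

    # Sketch: archive a form-paginated guestbook by replaying its POST requests.
    import urllib.parse
    import urllib.request

    URL = "http://htmlgear.lycos.com/guest/view"  # hypothetical endpoint

    def fetch_page(page_num):
        # Encode the fields the submit button would have sent.
        data = urllib.parse.urlencode({
            "guestbook_id": "12345",  # hypothetical hidden field
            "page": str(page_num),    # hypothetical page selector
        }).encode("ascii")
        with urllib.request.urlopen(URL, data=data) as resp:
            return resp.read()

    page = 1
    while True:
        html = fetch_page(page)
        with open("guestbook-page-%04d.html" % page, "wb") as f:
            f.write(html)
        if b"Next" not in html:  # crude stop condition; adjust to the real markup
            break
        page += 1

The wget equivalent would be a shell loop over wget --post-data "guestbook_id=12345&page=N", with the same caveat that the field names must come from the actual form.
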
05:16 bsmith093 woop woop woop HAPPY NEW YEAR!!!11!! but seriously, yay one year left till the end
05:18 dnova one can only hope
05:18 godane i think i may have found something for a twitter clone
05:18 godane called bup
05:19 godane https://github.com/apenwarr/bup
05:21 Wyatt|Wor godane: Don't like status.net?
05:22 godane never seen that one
05:22 Wyatt|Wor It's what identi.ca uses, IIRC.
05:23 Wyatt|Wor OH!
05:23 Wyatt|Wor Twitter backup, not twitter-like
05:23 godane yes
05:24 Wyatt|Wor I misinterpreted "Twitter clone"
05:24 godane i was thinking twitter that's like git/bup
05:25 godane where there is no central server
05:26 Wyatt|Wor bup looks pretty neat, I'll give you that!
05:27 godane https://www.youtube.com/watch?v=u_rOi2OVvwU&feature=channel_video_title
07:08 chronomex go out and party, guys
07:10 Wyatt|Wor Can't. Have qmail to battle.
07:11 SketchCow Happppy neeewww yeearhrfkjdfdhf
07:12 Coderjoe mmm
07:12 Coderjoe a tenth of capn 100 proof
07:13 Coderjoe on an empty stomach over 4 hours. (with 1L of coke)
07:13 Coderjoe in other news, I can still spell?
07:15 Wyatt|Wor Seems to be going well for you in that regard.
07:18 Coderjoe well, I had to call in reinforcements to get home
07:24 Coderjoe btw, zip (without zip64 extensions) is limited to 4 GiB, iirc
07:27 Wyatt|Wor But is that compressed or uncompressed size?
07:28 Coderjoe compressed
07:29 Coderjoe well, as long as you try following the central directory at the end of the file, rather than scanning for zip file-record headers
07:30 Coderjoe the file offset field is, iirc, a 32-bit unsigned integer
07:30 Coderjoe i suppose if you haxed up a multi-volume archive you could get around the problem, as long as no part was over 4 GB
07:31 Coderjoe there is a separate field of the central directory for which "disk" the file starts on
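
Coderjoe's limit is visible in the central directory itself: the per-entry offset and size fields are 32-bit, so without zip64 records nothing in the archive can sit past the 4 GiB mark. A small sketch using Python's standard zipfile module to print those fields (the archive path is a placeholder):

    # Print each member's offset and compressed size as recorded in the
    # central directory; without zip64 extensions both are 32-bit fields,
    # which is where the 4 GiB ceiling comes from.
    import zipfile

    with zipfile.ZipFile("example.zip") as zf:  # placeholder path
        for info in zf.infolist():
            print("%12d %12d %s" % (info.header_offset, info.compress_size, info.filename))

Python's zipfile writes zip64 records automatically when an archive needs them (allowZip64 is on by default), which is the modern way around the limit rather than the multi-volume trick.
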
09:30 DFJustin wonder if that zipview.php can be hooked up to the unarchiver http://wakaba.c3.cx/s/apps/unarchiver
09:30 DFJustin would allow browsing the shareware isos and such too
11:40 emijrp SketchCow: http://www.oclc.org/worldcat/newgrow.htm
15:15 ersi 1L of coke, whoa
18:26 gui77 for mobileme, do you guys download only, then upload on
18:26 gui77 *download, then upload, then download, then upload, and just repeat the process? or do you do both at once?
18:38 Nemo_bis gui77, when I asked (for Splinder) I was told that it's better to keep everything on your disk until data is not published on archive.org
18:38 Nemo_bis otherwise, you should be able to do both at the same time
18:38 Nemo_bis if it makes sense because you have enough bandwidth
19:33 gui77 Nemo_bis: do yo umean until it IS published?
19:33 gui77 *you
19:33 Nemo_bis I suppose so, "until" in English always confuses me
19:33 gui77 aha :) what's your native language?
19:34 Nemo_bis Italian
19:34 gui77 i have enough bandwidth (home connection, not very fast but it's not capped) - my main problem is disk space
19:34 Nemo_bis "finché" or "fintantoché" etc. (roughly "until" / "as long as") can be used for both
19:34 gui77 i'm portuguese!
19:34 Nemo_bis so just rsync with delete option while you keep downloading
19:34 Nemo_bis :)
19:35 gui77 i'll run it the second time (to check) with the delete option, good idea
19:35 gui77 bbl
19:42 Coderjoe no, do not just rsync with delete option
19:42 Coderjoe use the upload script, as it skips incompletes
19:43 Coderjoe (which includes profiles you are currently downloading)
20:59 gui77 Coderjoe: ok then
20:59 gui77 but then i have to erase everything manually, assuming i want to download and re-up more, right?
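
What Coderjoe is warning against: a bare rsync --delete mirrors half-downloaded profiles along with finished ones. A sketch of the safer upload-then-erase pattern, assuming (hypothetically) that in-progress profile directories carry a ".incomplete" marker file; the directory layout and rsync target are placeholders, not the actual mobileme script's conventions:

    # Upload only finished profile directories, then delete each one locally
    # after a clean upload. The ".incomplete" marker, data layout, and rsync
    # target below are assumptions for illustration.
    import pathlib
    import shutil
    import subprocess

    DATA_DIR = pathlib.Path("data")                 # hypothetical layout
    DEST = "rsync://archive.example.org/mobileme/"  # placeholder target

    for profile in sorted(DATA_DIR.iterdir()):
        if not profile.is_dir() or (profile / ".incomplete").exists():
            continue  # skip anything still being downloaded
        subprocess.run(["rsync", "-a", str(profile), DEST], check=True)
        shutil.rmtree(profile)  # erase only after the upload succeeds

This also answers gui77's last question: deleting after each successful upload frees the disk without any manual cleanup pass.
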
