Time |
Nickname |
Message |
00:26
|
godane |
do you guys backup slashdot? |
00:31
|
SketchCow |
Not well enough |
01:05
|
bsmith093 |
SketchCow: isnt that specifically against most cc licenses? |
01:09
|
chronomex |
CC-NC only |
02:17
|
irata |
hi all, i'm trying to do a little tiny personal archive project, and have a question. |
02:18
|
irata |
i had an old Lycos HTMLGear guestbook, and want to archive it. Of course Lycos doesn't give any d/l option. |
02:20
|
irata |
it displays five entries per page, and you navigate between pages by clicking a form submit button. |
02:20
|
irata |
my question is: what kind of tools could anyone recommend for grabbing data from a site setup this way? |
02:35
|
irata |
:-\ |
02:36
|
Wyatt|Wor |
Hmm, have you tried wget on it? I'm still pretty low level at this archival stuff, but it's kind of the go-to tool. |
02:37
|
irata |
yeah, i'm a total newbie in this regard. i figured if anyone could tell me what to do, it'd be archiveteam. :P |
02:47
|
irata |
ok, so i'm looking thru wget's docs right now... but it seems like it only follows links. problem for me is this guestbook doesn't use links to go from page to page. it uses a form submit button |
02:47
|
Wyatt|Wor |
Uhm, hang on. I know I dealt with something like this a couple weeks ago... |
02:48
|
Schbirid |
maybe the URLs are easy? |
02:48
|
Schbirid |
eg jkust soome number increments |
02:48
|
Schbirid |
i should to do bed |
02:48
|
Schbirid |
ugh |
02:48
|
Schbirid |
4am |
02:48
|
Schbirid |
good night =D |
02:48
|
Schbirid |
and happy new year |
02:48
|
irata |
night |
02:49
|
Wyatt|Wor |
There is that. |
02:50
|
Wyatt|Wor |
And if not, look into using --post-file or --post-data |
02:50
|
irata |
since it uses form submits, the url never actually changes |
02:53
|
Wyatt|Wor |
(I was hacking a module into wgetpaste for our internal pastebin and I learned of those from that) |
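The form-submit paging discussed above can be sketched in Python. This assumes the guestbook's next-page button sends a POST carrying a single page-number field; the URL and field name below are placeholders, not taken from the actual Lycos form, and the real ones would have to be read out of the page's form markup. With wget the same request would be along the lines of `wget --post-data 'page=3' URL`.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical values: the real endpoint and field name must be copied
# from the guestbook's <form> element.
GUESTBOOK_URL = "http://example.invalid/guestbook"
PAGE_FIELD = "page"


def build_page_request(page, url=GUESTBOOK_URL, field=PAGE_FIELD):
    """Build the POST request a browser would send to show one page."""
    body = urlencode({field: page}).encode("ascii")
    return Request(url, data=body, method="POST")


def fetch_pages(first, last):
    """Yield (page number, raw HTML bytes) for each page in the range."""
    for page in range(first, last + 1):
        with urlopen(build_page_request(page)) as response:
            yield page, response.read()
```

Each yielded page can be written to its own file, which gives a complete copy even though the URL never changes between pages.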
05:16
|
bsmith093 |
woop woop woop HAPPY NEW YEAR!!!11!! but seriously, yay one year left till the end |
05:18
|
dnova |
one can only hope |
05:18
|
godane |
i think i may have found something for twitter clone |
05:18
|
godane |
called bup |
05:19
|
godane |
https://github.com/apenwarr/bup |
05:21
|
Wyatt|Wor |
godane: Don't like status.net? |
05:22
|
godane |
never seen that one |
05:22
|
Wyatt|Wor |
It's what identi.ca uses, IIRC. |
05:23
|
Wyatt|Wor |
OH! |
05:23
|
Wyatt|Wor |
Twitter backup, not twitter-like |
05:23
|
godane |
yes |
05:24
|
Wyatt|Wor |
I misinterpreted "Twitter clone" |
05:24
|
godane |
i was thinking of a twitter that's like git/bup |
05:25
|
godane |
where there is no center server |
05:26
|
Wyatt|Wor |
bup looks pretty neat, I'll give you that! |
05:27
|
godane |
https://www.youtube.com/watch?v=u_rOi2OVvwU&feature=channel_video_title |
07:08
|
chronomex |
go out and party, guys |
07:10
|
Wyatt|Wor |
Can't. Have qmail to battle. |
07:11
|
SketchCow |
Happppy neeewww yeearhrfkjdfdhf |
07:12
|
Coderjoe |
mmm |
07:12
|
Coderjoe |
a tenth of capn 100 proof |
07:13
|
Coderjoe |
on an empty stomach over 4 hours. (with 1L of coke) |
07:13
|
Coderjoe |
in other news, I can still spell? |
07:15
|
Wyatt|Wor |
Seems to be going well for you in that regard. |
07:18
|
Coderjoe |
well, I had to call in reinforcements to get home |
07:24
|
Coderjoe |
btw, zip (without zip64 extensions) is limited to 4 GiB, iirc |
07:27
|
Wyatt|Wor |
But is that compressed or uncompressed size? |
07:28
|
Coderjoe |
compressed |
07:29
|
Coderjoe |
well, as long as you try following the central directory at the end of the file, rather than scanning for zip file-record headers |
07:30
|
Coderjoe |
the file offset field is, iirc, a 32-bit unsigned integer |
07:30
|
Coderjoe |
i suppose if you haxed up a multi-volume archive you could get around the problem, as long as no part was over 4 GB |
07:31
|
Coderjoe |
there is a separate field of the central directory for which "disk" the file starts on |
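The central directory fields mentioned above can be read directly with a short Python sketch. It assumes a plain archive with no trailing comment and no zip64 records, and it builds a tiny in-memory demo archive (the file names and contents are made up) so the parser has something to read. The "disk" number is a 16-bit field and the local header offset is a 32-bit unsigned field, which is where the 4 GiB ceiling comes from.

```python
import io
import struct
import zipfile

EOCD = struct.Struct("<4s4H2LH")     # end of central directory record, 22 bytes
CDFH = struct.Struct("<4s6H3L5H2L")  # central directory file header, 46 bytes
MAX_U32 = 0xFFFFFFFF                 # largest offset a 32-bit field can hold


def central_directory_entries(data):
    """Yield (name, start disk, local header offset) for each member.

    Assumes no archive comment and no zip64 records, so the end of
    central directory record is exactly the last 22 bytes.
    """
    eocd = EOCD.unpack_from(data, len(data) - EOCD.size)
    if eocd[0] != b"PK\x05\x06":
        raise ValueError("end of central directory record not found")
    total_entries, pos = eocd[4], eocd[6]
    for _ in range(total_entries):
        fields = CDFH.unpack_from(data, pos)
        if fields[0] != b"PK\x01\x02":
            raise ValueError("bad central directory header")
        name_len, extra_len, comment_len = fields[10:13]
        disk_start, offset = fields[13], fields[16]
        start = pos + CDFH.size
        # cp437 is the legacy default; UTF-8 names set a flag bit instead.
        name = data[start:start + name_len].decode("cp437")
        yield name, disk_start, offset
        pos = start + name_len + extra_len + comment_len


def demo_archive():
    """Build a small two-file archive in memory (illustrative content)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr("a.txt", b"hello")
        zf.writestr("b.txt", b"world!")
    return buffer.getvalue()


for entry in central_directory_entries(demo_archive()):
    print(entry)
```

Walking the central directory this way, instead of scanning for local file-record headers, is what makes the offset field the limiting factor.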
09:30
|
DFJustin |
wonder if that zipview.php can be hooked up to the unarchiver http://wakaba.c3.cx/s/apps/unarchiver |
09:30
|
DFJustin |
would allow browsing the shareware isos and such too |
11:40
|
emijrp |
SketchCow: http://www.oclc.org/worldcat/newgrow.htm |
15:15
|
ersi |
1L of coke, whoa |
18:26
|
gui77 |
for mobileme, do you guys download only, then upload on |
18:26
|
gui77 |
*downlaod, then upload, then download, then upload, and just repeat the process? or do you do both at once? |
18:38
|
Nemo_bis |
gui77, when I asked (for Splinder) I was told that it's better to keep everything on your disk until data is not published on archive.org |
18:38
|
Nemo_bis |
otherwise, you should be able to do both at the same time |
18:38
|
Nemo_bis |
if it makes sense because you have enough bandwidth |
19:33
|
gui77 |
Nemo_bis: do yo umean until it IS published? |
19:33
|
gui77 |
*you |
19:33
|
Nemo_bis |
I suppose so, "until" in English always confuses me |
19:33
|
gui77 |
aha :) what's your native language? |
19:34
|
Nemo_bis |
Italian |
19:34
|
gui77 |
i have enough bandwidth (home connection, not very fast but it's not capped) - my main problem is disk space |
19:34
|
Nemo_bis |
"finché" or "fintantoché" etc. can be used for both |
19:34
|
gui77 |
i'm portuguese! |
19:34
|
Nemo_bis |
so just rsync with delete option while you keep downloading |
19:34
|
Nemo_bis |
:) |
19:35
|
gui77 |
i'll run it the second time (to check) with the delete option, good idea |
19:35
|
gui77 |
bbl |
19:42
|
Coderjoe |
no, do not just rsync with delete option |
19:42
|
Coderjoe |
use the upload script, as it skips incompletes |
19:43
|
Coderjoe |
(which includes profiles you are currently downloading) |
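The idea of skipping incompletes can be sketched in Python. This is not the project's upload script: it assumes each profile lives in its own directory and that an unfinished download is flagged by a marker file, whose name here (`.incomplete`) is a guess for illustration. Only the directories in the first list would be safe to upload and later delete.

```python
from pathlib import Path

# Hypothetical marker name; the real upload script may detect
# unfinished downloads differently.
INCOMPLETE_MARKER = ".incomplete"


def split_profiles(data_dir, marker=INCOMPLETE_MARKER):
    """Return (complete, incomplete) lists of profile directories."""
    complete, incomplete = [], []
    for profile in sorted(Path(data_dir).iterdir()):
        if not profile.is_dir():
            continue  # ignore stray files next to the profile directories
        if (profile / marker).exists():
            incomplete.append(profile)
        else:
            complete.append(profile)
    return complete, incomplete
```

A plain rsync with a delete option has no such check, so it could ship and remove a profile that is still being downloaded.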
20:59
|
gui77 |
Coderjoe: ok then |
20:59
|
gui77 |
but then i have to erase everything manually, assuming i want to download and re-up more, right? |