Time |
Nickname |
Message |
00:02
🔗
|
|
jschwart has quit IRC (Quit: Konversation terminated!) |
00:03
🔗
|
CoolCanuk |
wiki seems better :O |
00:30
🔗
|
|
jmtd is now known as Jon |
01:10
🔗
|
|
Mateon1 has quit IRC (Ping timeout: 245 seconds) |
01:12
🔗
|
|
Mateon1 has joined #archiveteam-bs |
01:16
🔗
|
|
Stilett0 has joined #archiveteam-bs |
01:50
🔗
|
|
purplebot has quit IRC (Ping timeout: 248 seconds) |
01:50
🔗
|
|
medowar has quit IRC (Ping timeout: 248 seconds) |
01:50
🔗
|
|
i0npulse has quit IRC (Ping timeout: 248 seconds) |
01:50
🔗
|
|
Rai-chan has quit IRC (Ping timeout: 248 seconds) |
01:50
🔗
|
|
dboard2 has quit IRC (Ping timeout: 248 seconds) |
01:50
🔗
|
|
purplebot has joined #archiveteam-bs |
01:50
🔗
|
|
medowar has joined #archiveteam-bs |
01:51
🔗
|
|
medowar has quit IRC (se.hub irc.underworld.no) |
01:51
🔗
|
|
purplebot has quit IRC (se.hub irc.underworld.no) |
01:59
🔗
|
|
dboard2 has joined #archiveteam-bs |
02:22
🔗
|
SketchCow |
Wow, CoolCanuk - that's.... not quite how I'd have uploaded those texts, item name wise |
02:22
🔗
|
CoolCanuk |
Howdy. The title or id? |
02:23
🔗
|
CoolCanuk |
I can reupload them properly if you want. But I'm not sure what the "best way" is to name them. |
02:25
🔗
|
CoolCanuk |
how the heck did it get added to newspapers?! |
02:26
🔗
|
SketchCow |
I added them |
02:26
🔗
|
CoolCanuk |
oh |
02:26
🔗
|
CoolCanuk |
the guides are not newspapers. |
02:26
🔗
|
SketchCow |
They're all called coolcanu* so there's no way to know |
02:27
🔗
|
CoolCanuk |
the title :) |
02:27
🔗
|
SketchCow |
mmmm. |
02:27
🔗
|
SketchCow |
Not how it works in the system |
02:27
🔗
|
SketchCow |
Anyway, do better next time. |
02:28
🔗
|
CoolCanuk |
I can't do better if you can't tell me what they should be named as |
02:28
🔗
|
CoolCanuk |
I put in over 4 days of effort to get and upload those files. I want to make it right but it's hard to get advice |
02:28
🔗
|
SketchCow |
Mmm |
02:28
🔗
|
SketchCow |
Yes, please snap at me |
02:29
🔗
|
SketchCow |
That has always worked |
02:29
🔗
|
SketchCow |
And let's have a "days of effort" competition |
02:30
🔗
|
CoolCanuk |
I didn't intend to come across that way. Would you like me to delete and reupload? |
02:30
🔗
|
SketchCow |
You can't delete. |
02:33
🔗
|
SketchCow |
https://archive.org/details/@starbritescanz |
02:33
🔗
|
SketchCow |
yay |
02:36
🔗
|
SketchCow |
----- ARCHIVEBOT CAUGHT UP ON FOS ----- |
02:36
🔗
|
SketchCow |
Took a while. |
03:24
🔗
|
Somebody2 |
I suspect a better identifier name would be one that clarified the subject more, and likely included the date. |
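The naming advice above (make the subject clear, include the date) can be sketched as a small helper. This is a minimal sketch, not an Internet Archive tool: the function name `make_identifier`, the example subject "Metro Toronto", and the 100-character cap are assumptions for illustration.

```python
import re


def make_identifier(subject, date, max_length=100):
    """Build an identifier such as 'metro-toronto-2017-12-04'.

    subject: free-text description of the item (hypothetical input).
    date: an ISO date string, appended as given and not validated here.
    max_length: assumed upper bound on identifier length.
    """
    # Lowercase, then collapse every run of non-alphanumerics into one dash.
    slug = re.sub(r"[^a-z0-9]+", "-", subject.lower()).strip("-")
    suffix = "-" + date
    # Trim the subject part so the date always survives truncation.
    return slug[: max_length - len(suffix)].rstrip("-") + suffix


print(make_identifier("Metro Toronto", "2017-12-04"))
```

An identifier built this way sorts by subject and then by date, which is more useful in a listing than a run of uploader-name prefixes.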
03:25
🔗
|
CoolCanuk |
Thanks so much! I will use that next time. I really appreciate the help :D |
03:37
🔗
|
Somebody2 |
Eh, I do what I can. |
04:03
🔗
|
|
second_ has joined #archiveteam-bs |
04:04
🔗
|
|
second_ has quit IRC (Client Quit) |
04:04
🔗
|
|
second_ has joined #archiveteam-bs |
04:05
🔗
|
|
second_ is now known as sec0nd |
04:12
🔗
|
jrwr |
Awesome SketchCow |
04:28
🔗
|
|
qw3rty117 has joined #archiveteam-bs |
04:30
🔗
|
CoolCanuk |
SketchCow: can I continue to upload papers (properly) like I've done, or will they likely be removed later (eg: not public domain/cc)? |
04:32
🔗
|
Somebody2 |
Eh, don't rely on them being available for you to download again, but I don't think that's a sufficient reason not to upload things. |
04:33
🔗
|
|
qw3rty116 has quit IRC (Read error: Operation timed out) |
04:34
🔗
|
CoolCanuk |
I am sorry for uploading wrong. I will try harder nex ttime |
04:34
🔗
|
CoolCanuk |
*next |
05:05
🔗
|
CoolCanuk |
do I need to request any type of disclaimer if it shows "sensitive content"? or nah |
05:06
🔗
|
Somebody2 |
No. |
05:07
🔗
|
CoolCanuk |
okay :D |
05:08
🔗
|
Somebody2 |
IA allows random people to upload random things. There's nothing you (or anyone else) *needs* to do; all filtering is post-facto. |
05:09
🔗
|
Somebody2 |
It is *nice* if the uploader adds as much metadata as possible, and uses a descriptive identifier -- but it isn't a "need". |
05:09
🔗
|
CoolCanuk |
it's not beheading or anything crazy. it's nazis, hitler, weapons, mentioning of fallen soldiers, |
05:09
🔗
|
CoolCanuk |
ah ok. so only description is what i can do? |
05:37
🔗
|
|
balrog has quit IRC (Read error: Operation timed out) |
05:43
🔗
|
|
balrog has joined #archiveteam-bs |
05:43
🔗
|
|
swebb sets mode: +o balrog |
05:51
🔗
|
bitspill |
CoolCanuk: what sorts of Nazi, Hitler, etc content do you have? |
05:51
🔗
|
CoolCanuk |
ecks dee |
05:52
🔗
|
CoolCanuk |
https://archive.org/details/RemembranceDay2006TributeSoldiersCry |
05:52
🔗
|
CoolCanuk |
it's like no more than 10 seconds |
05:54
🔗
|
bitspill |
Ok, thanks. Not quite what I was hoping for but I'll save for later. |
05:54
🔗
|
CoolCanuk |
:P |
05:54
🔗
|
bitspill |
I've been seeking out WW2 combat reports and old footage |
05:54
🔗
|
CoolCanuk |
sorry to disappoint |
05:54
🔗
|
CoolCanuk |
it appears to be war footage. Unlikely it was set-up. |
05:54
🔗
|
bitspill |
Wasn't sure if you had casualty reports from the 40's or something |
05:55
🔗
|
CoolCanuk |
but again like 10 seconds :P and the "march" is easily findable |
05:55
🔗
|
CoolCanuk |
sadly no |
05:55
🔗
|
bitspill |
All good. I'll find it elsewhere someday |
05:56
🔗
|
bitspill |
More and more though it seems I'll have to be getting it from NARA in St Louis |
06:02
🔗
|
CoolCanuk |
depressing facebook. https://usercontent.irccloud-cdn.com/file/IjBGVPdY/image.png |
06:08
🔗
|
|
Soni has quit IRC (Ping timeout: 255 seconds) |
06:46
🔗
|
|
fie has quit IRC (Quit: Leaving) |
06:47
🔗
|
|
medowar has joined #archiveteam-bs |
06:47
🔗
|
|
purplebot has joined #archiveteam-bs |
06:48
🔗
|
|
Rai-chan has joined #archiveteam-bs |
06:51
🔗
|
|
i0npulse has joined #archiveteam-bs |
06:57
🔗
|
|
ZexaronS has quit IRC (Quit: Leaving) |
07:43
🔗
|
|
schbirid has joined #archiveteam-bs |
08:22
🔗
|
|
jschwart has joined #archiveteam-bs |
08:24
🔗
|
|
schbirid has quit IRC (Ping timeout: 255 seconds) |
08:37
🔗
|
|
schbirid has joined #archiveteam-bs |
09:04
🔗
|
|
Soni has joined #archiveteam-bs |
09:35
🔗
|
|
CoolCanuk has quit IRC (Quit: Connection closed for inactivity) |
10:04
🔗
|
|
mr_archiv has joined #archiveteam-bs |
10:05
🔗
|
mr_archiv |
Hi, I am new here. |
10:06
🔗
|
mr_archiv |
I have taken in interest in the #webroasting project, I found a few additional websites. |
10:07
🔗
|
mr_archiv |
s/in interest/an interest |
10:09
🔗
|
mr_archiv |
Should I discuss this on #webroasting? I did not see any activity in the logs so that is why I am asking here first. |
10:26
🔗
|
Igloo |
Hi, I'm not sure what the status of that project is |
10:56
🔗
|
|
tobbez has joined #archiveteam-bs |
11:05
🔗
|
Jon |
hmmm wonder if we've got linux journal all sorted |
11:06
🔗
|
Igloo |
It's in archivebot |
11:36
🔗
|
|
BlueMaxim has quit IRC (Read error: Connection reset by peer) |
14:26
🔗
|
|
nyaomi has quit IRC (Read error: Operation timed out) |
14:46
🔗
|
|
K4k has quit IRC (Quit: WeeChat 1.9.1) |
14:48
🔗
|
|
K4k has joined #archiveteam-bs |
15:46
🔗
|
godane |
!ao https://www.washingtonpost.com/graphics/2017/local/las-vegas-teens/ |
15:46
🔗
|
godane |
its in archivebot now |
15:47
🔗
|
SketchCow |
godane: Is there a chance you could photograph the labels of the tapes you were sent? |
15:52
🔗
|
godane |
i don't have a camera |
16:06
🔗
|
sec0nd |
Can I have archive bot permissions? |
16:12
🔗
|
Igloo |
Join the channel |
16:12
🔗
|
Igloo |
If you join sec0nd i'll voice you |
16:15
🔗
|
sec0nd |
Which channel? |
16:16
🔗
|
|
Lord_Nigh has quit IRC (Read error: Operation timed out) |
16:17
🔗
|
Igloo |
#archivebot |
16:21
🔗
|
sec0nd |
Seems there is no way to register nicks with EFnet; how do I make sure I always have my nick? |
16:22
🔗
|
Jonimus |
sec0nd: you don't |
16:22
🔗
|
sec0nd |
seems rather insecure |
16:22
🔗
|
Jonimus |
EFnet is the lawless west |
16:22
🔗
|
sec0nd |
why did #archiveteam not use freenode? |
16:28
🔗
|
Kaz |
freenode sucks |
16:35
🔗
|
sec0nd |
doesn't seem so bad to me |
16:38
🔗
|
|
Stilett0 is now known as Stiletto |
17:01
🔗
|
|
Lord_Nigh has joined #archiveteam-bs |
17:10
🔗
|
|
nyaomi has joined #archiveteam-bs |
17:14
🔗
|
jrwr |
It's come up before, sec0nd; it's mostly ~REASONS~ |
17:24
🔗
|
|
ola_norsk has joined #archiveteam-bs |
17:24
🔗
|
ola_norsk |
hi. Anyone know if waybackmachine prunes captures down to nearest hour? |
17:25
🔗
|
ola_norsk |
if so, there's little use in me doing captures of a twitter hashtag every 3rd minute :/ |
17:27
🔗
|
JAA |
I don't think it prunes them, but it certainly does seem like it only *displays* one capture per hour in the list. |
17:27
🔗
|
ola_norsk |
aye |
17:27
🔗
|
JAA |
... with no way to get a full list, as far as I can see. |
17:27
🔗
|
JAA |
Which is super annoying. |
17:28
🔗
|
JAA |
Also, the arrows in the top bar no longer lead to the previous/next capture as they used to, but to the previous/next *date*. |
17:28
🔗
|
JAA |
So if you want to go through the captures, you probably have to manually modify the URL. |
17:29
🔗
|
JAA |
E.g. decrement in 1-minute steps until you're redirected to the next earlier capture. |
17:29
🔗
|
JAA |
Maybe the API returns all captures, haven't checked. |
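One way to check this guess is the Wayback CDX server, which is understood to list captures individually. The following is a minimal sketch, assuming the public `cdx/search/cdx` endpoint with `output=json` (header row first, one row per capture); the helper names are hypothetical and nothing here was run in the channel.

```python
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(page_url, day):
    """Build a CDX query for all captures of page_url on one day (YYYYMMDD)."""
    params = {"url": page_url, "from": day, "to": day, "output": "json"}
    return CDX_ENDPOINT + "?" + urlencode(params)


def capture_urls(rows):
    """Turn CDX JSON rows (header row first) into Wayback playback URLs."""
    if not rows:
        return []
    header, captures = rows[0], rows[1:]
    # Look the columns up by name instead of assuming a fixed order.
    ts = header.index("timestamp")
    original = header.index("original")
    return [
        "https://web.archive.org/web/%s/%s" % (row[ts], row[original])
        for row in captures
    ]
```

Fetching `cdx_query_url(...)` and passing the decoded JSON to `capture_urls` would give one playback URL per capture, which avoids stepping through timestamps by hand.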
17:29
🔗
|
JAA |
I wanted to write to IA about this months ago but forgot about it. |
17:30
🔗
|
ola_norsk |
yeah, been wondering about making some python script that reads the twitter page, grabs all urls found, checks through the wayback json thingy if each already exists, then requests that url through their /save/ thingy if it does not exist. |
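The script described above could look roughly like this. It is a minimal sketch, assuming the "wayback json thingy" is the availability API at `archive.org/wayback/available` and that captures are requested through `web.archive.org/save/`; the function names are hypothetical, and the regex link extraction is a simplification that would not handle a real Twitter page loaded by JavaScript.

```python
import json
import re
from urllib.parse import quote
from urllib.request import urlopen

AVAILABILITY_API = "https://archive.org/wayback/available?url="
SAVE_ENDPOINT = "https://web.archive.org/save/"


def extract_urls(html):
    """Return the unique http(s) links found in href attributes, in page order."""
    seen = []
    for url in re.findall(r'href="(https?://[^"]+)"', html):
        if url not in seen:
            seen.append(url)
    return seen


def is_archived(availability):
    """True if the parsed availability response reports a usable snapshot."""
    closest = availability.get("archived_snapshots", {}).get("closest")
    return bool(closest and closest.get("available"))


def fetch_json(url):
    """Fetch a URL and decode its body as JSON."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def save_missing(html, lookup=fetch_json, save=urlopen):
    """Request a capture for every link on the page that has no snapshot yet."""
    requested = []
    for url in extract_urls(html):
        if not is_archived(lookup(AVAILABILITY_API + quote(url, safe=""))):
            save(SAVE_ENDPOINT + url)
            requested.append(url)
    return requested
```

The `lookup` and `save` parameters exist so the logic can be exercised without network access; a real run would also want a delay between requests.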
17:30
🔗
|
ola_norsk |
still gonna let it run though, and hope they dont actually prune |
17:31
🔗
|
ola_norsk |
it says currently today 88+ captures, so |
17:31
🔗
|
JAA |
Yeah, exactly. I strongly doubt they do. |
18:00
🔗
|
schbirid |
doesnt ia get a full twitter feed anyways? |
18:03
🔗
|
zino |
The library of Congress gets the full feed. I don't know of anyone else that does. |
18:04
🔗
|
|
Valentine has quit IRC (Read error: Connection reset by peer) |
18:07
🔗
|
|
Valentine has joined #archiveteam-bs |
18:08
🔗
|
zino |
I see it's the time of year we throw some money at IA. |
18:08
🔗
|
Frogging |
I'm fresh out of money |
18:08
🔗
|
ola_norsk |
let me know if it does so |
18:09
🔗
|
ola_norsk |
doesn't library of congress go to IA? |
18:09
🔗
|
zino |
Unlikely |
18:10
🔗
|
zino |
Library of Congress ingests the data but blacks it out until it's historical. |
18:10
🔗
|
JAA |
Frogging: Yeah, Christmas definitely seems like the perfect time to ask for donations, since nobody's spending their money on anything else around this time of the year. |
18:10
🔗
|
ola_norsk |
zino: that sounds like bogus thing to do |
18:11
🔗
|
ola_norsk |
zino: "Let's us decide what is history worthy"-kind of thing :/ |
18:13
🔗
|
zino |
No, it's standard practise in a lot of historical archiving. It's not deciding what is history worthy, it's keeping it blacked out for a set amount of time to avoid complaints and demands to redact personal information. Many institutions handle it that way. |
18:13
🔗
|
ola_norsk |
https://youtu.be/NdZxI3nFVJs |
18:14
🔗
|
ola_norsk |
still bogus i think, since complaints and demands, basically reactions would IMO also be history |
18:14
🔗
|
JAA |
zino: Surely the tweets from many years ago should be considered historical by now (technology evolving fast etc.). But as far as I know, the archives are not accessible anywhere, are they? |
18:15
🔗
|
ola_norsk |
"you can't see this yet"...that's a bit shady IMO |
18:17
🔗
|
ola_norsk |
(i think i just used 'shady' in its correct definition for the first time, ever) :D |
18:18
🔗
|
zino |
JAA: I don't know if LoC even has the infrastructure to search their own twitter database yet. My bets are on no. |
18:19
🔗
|
zino |
https://www.theatlantic.com/technology/archive/2016/08/can-twitter-fit-inside-the-library-of-congress/494339/ |
18:20
🔗
|
zino |
Ah, the embargo is only supposed to be six months. |
18:20
🔗
|
zino |
But deleted posts will not be visible. |
18:20
🔗
|
ola_norsk |
mhm |
18:21
🔗
|
ola_norsk |
zino: would tweets from deleted or banned twitter accounts be there? |
18:22
🔗
|
zino |
No idea. |
18:23
🔗
|
zino |
Oh, you can buy a full feed. |
18:23
🔗
|
ola_norsk |
yeah lol |
18:24
🔗
|
ola_norsk |
how much for every tweet containing "#netneutrality" ? |
18:24
🔗
|
ola_norsk |
:d |
18:24
🔗
|
ola_norsk |
it's how twitter makes a lot of money :) |
18:32
🔗
|
ola_norsk |
there was some research done not that long ago, about "internet harassment"; There is no doubt in my mind that that research institution/department simply said "Hey twitter! give us all the tweets containing any of these words" |
18:33
🔗
|
ola_norsk |
for money, of course |
18:39
🔗
|
ola_norsk |
zino: you know that "if you have a room of monkeys with typewriters for an infinite time; eventually they would reproduce a Shakespeare work" , right? :D |
18:39
🔗
|
ola_norsk |
makes one think of twitter lol :D |
18:41
🔗
|
ola_norsk |
https://en.wikipedia.org/wiki/Infinite_monkey_theorem |
18:46
🔗
|
ola_norsk |
it would require a "total library" though... |
19:01
🔗
|
ola_norsk |
"but for every sensible line or accurate fact there would be millions of meaningless cacophonies, verbal farragoes, and babblings. Everything: but all the generations of mankind could pass before the dizzying shelves—shelves that obliterate the day and on which chaos lies—ever reward them with a tolerable page." |
19:13
🔗
|
|
CoolCanuk has joined #archiveteam-bs |
19:20
🔗
|
ola_norsk |
what could be the closest thing to an Internet Archive 'pitch deck' presentation, other than the entire site itself? |
19:20
🔗
|
ola_norsk |
some sum-up video etc. |
19:25
🔗
|
ola_norsk |
it's a shame i can take a 3 minute walk, and go into my local museum, where they literally moved old buildings and log-cabins into, to preserve them..and yet Archiving TODAY seems very much less prioritized |
19:26
🔗
|
astrid |
things aren't valued until they are old |
19:27
🔗
|
astrid |
part of the social role of an archivist is to safeguard things that are not yet old |
19:27
🔗
|
ola_norsk |
but shit disappears in basically 14 days now-a-days :/ |
19:27
🔗
|
astrid |
yes |
19:28
🔗
|
astrid |
this is why archiveteam exists |
19:28
🔗
|
ola_norsk |
aye |
19:28
🔗
|
astrid |
it feels sisyphean sometimes |
19:28
🔗
|
astrid |
but it is what we do |
19:31
🔗
|
ola_norsk |
the benefit of internet is that it can be copied though. No need to dismantle it and carry it to some other location. |
19:31
🔗
|
ola_norsk |
i like this slogan 'Memories give meaning' http://sunnhordland.museum.no/about/ |
19:32
🔗
|
astrid |
"no need to dismantle and carry" ... i take it you haven't moved a website to different infrastructure :P |
19:32
🔗
|
CoolCanuk |
just fyi, if you try to archive that site, it needs javascript :( ola_norsk |
19:32
🔗
|
ola_norsk |
lol... |
19:33
🔗
|
ola_norsk |
but just imagine, every building seen in these photos stood somewhere else .. http://sunnhordland.museum.no/heim/om-museet/tun-samlingar/sunnhordlandstunet/ |
19:33
🔗
|
ola_norsk |
dismantled from farms etc. , moved, reassembled, and then kept |
19:34
🔗
|
ola_norsk |
that's some wget stuff |
19:35
🔗
|
ola_norsk |
this one isn't from the same freakin island.. http://sunnhordland.museum.no/heim/om-museet/stolmastovo-sunnmus/ |
19:35
🔗
|
ola_norsk |
*even |
19:36
🔗
|
ola_norsk |
and yet, internet archiving..nah, that's something too expensive perhaps.. |
19:37
🔗
|
ola_norsk |
the upkeep of these buildings though.. |
19:41
🔗
|
ola_norsk |
mind, that's a very, VERY, small museum in norway |
19:45
🔗
|
ola_norsk |
for some reason, we norwegians seem to like keeping old shit around. It's why i think it wouldn't take much convincing of government getting a mirror location of IA here. |
19:45
🔗
|
CoolCanuk |
I wonder if the Canadian government could be convincee |
19:45
🔗
|
CoolCanuk |
d |
19:45
🔗
|
CoolCanuk |
(very likely) |
19:45
🔗
|
ola_norsk |
isn't there already a canadian mirror? |
19:46
🔗
|
arkiver |
yes |
19:46
🔗
|
CoolCanuk |
omg I could be a book scanner |
19:46
🔗
|
CoolCanuk |
*drools* |
19:46
🔗
|
astrid |
i have a book scanner, it's a nice machine but gets boring quick :P |
19:47
🔗
|
CoolCanuk |
jobs a job |
19:48
🔗
|
astrid |
yeah |
19:49
🔗
|
astrid |
also i keep taking time off scanning, to hack on the processing pipeline |
19:49
🔗
|
astrid |
so i dont get much scanning done |
19:49
🔗
|
CoolCanuk |
wonder if I could use a DSLR camera to book scan |
19:50
🔗
|
ola_norsk |
humans suck; There's now sex robots and probes on other planets..But nothing to flip a damn page on a book :/ |
19:50
🔗
|
CoolCanuk |
there are mobile apps, which auto detect pages, but the quality wont be that great. A DSLR could work.. |
19:50
🔗
|
CoolCanuk |
some DSLRs let you use them as a "webcam" |
19:51
🔗
|
astrid |
go to youtube and search for "linear bookscanner" |
19:51
🔗
|
CoolCanuk |
O_O |
19:51
🔗
|
astrid |
i chose to get one of the ones where you have to sit, though, because the linear style has a spoil rate of one page per about ten books |
19:51
🔗
|
astrid |
and i have mostly rare books |
19:51
🔗
|
CoolCanuk |
so THATS how they dont fk the book spine up xD |
19:52
🔗
|
CoolCanuk |
I have a book from like 1996.. wonder if I could upload it |
19:52
🔗
|
CoolCanuk |
it's still under copyright :/ |
19:53
🔗
|
|
jschwart has quit IRC (Remote host closed the connection) |
19:55
🔗
|
ola_norsk |
i have a copy of the Snorre's Saga that's older than any person i know. But it's in pretty bad shape. Is there anywhere i could send that SAFELY, and FREE, to get scanned? |
19:55
🔗
|
CoolCanuk |
here's the book :P https://www.amazon.ca/Cybersurfer-Owl-Internet-Guide-Kids/dp/1895688507 |
20:04
🔗
|
ola_norsk |
that's the coolest cover i've seen :d |
20:05
🔗
|
ola_norsk |
stole it! https://web.archive.org/web/20171204200525/https://images-na.ssl-images-amazon.com/images/I/51JWT1RGVHL.jpg# |
20:07
🔗
|
CoolCanuk |
the cover isnt what it looks like in real life |
20:07
🔗
|
CoolCanuk |
it looks ... better in real life.. that has like.. 50% of the color removed or something |
20:08
🔗
|
|
jschwart has joined #archiveteam-bs |
20:09
🔗
|
ola_norsk |
i know, internet is much cooler :D https://youtu.be/Bmz67ErIRa4 |
20:19
🔗
|
ola_norsk |
are there any book-scanning locations to send books to to get them scanned? preferably for free. Talking about a book that should probably belong to national museum. So can't really ship it by fedex etc. :/ |
20:20
🔗
|
astrid |
sooooo if you can't fedex it then how are you going to ship it |
20:21
🔗
|
ola_norsk |
that's why i am asking what locations there are :/ |
20:26
🔗
|
ola_norsk |
recognized, well-known book-scanning facilities in northern-europe or preferably scandinavia; I guess that is what i am asking about |
20:27
🔗
|
ola_norsk |
if any such place previously have scanned for IA |
20:28
🔗
|
|
schbirid has quit IRC (Remote host closed the connection) |
20:32
🔗
|
ola_norsk |
Seems like national library nb.no is the place to contact :D |
20:34
🔗
|
ola_norsk |
btw, is this collection on IA? https://www.nb.no/search |
20:35
🔗
|
ola_norsk |
(i wouldn't be surprised if access from outside of norway is restricted/blocked) |
20:37
🔗
|
ola_norsk |
https://www.nb.no/search?mediatype=bilder |
20:43
🔗
|
ola_norsk |
how can i detect if a file is already present on IA? |
20:43
🔗
|
CoolCanuk |
all I know is search |
20:43
🔗
|
CoolCanuk |
thats the only way I know of |
20:48
🔗
|
ola_norsk |
seems like national library have their own little internet archive going, but in case IA is any part of this/these https://www.nb.no/english , it's not useful to re-upload it |
20:49
🔗
|
ola_norsk |
https://www.nb.no/english/collaboration-projects |
20:56
🔗
|
ola_norsk |
"Pliktavlevering" ..when you're shit is so old the museum just comes along and demands it ;) |
20:56
🔗
|
ola_norsk |
i wonder if any other country has that; duty to give :D |
20:57
🔗
|
|
MrDignity has quit IRC (Remote host closed the connection) |
20:57
🔗
|
|
MrDignity has joined #archiveteam-bs |
20:58
🔗
|
|
jschwart has quit IRC (Remote host closed the connection) |
20:59
🔗
|
ola_norsk |
"pliktavleverening" = duty to hand over .. it even got own adress :D |
21:00
🔗
|
ola_norsk |
CoolCanuk: those 1300+ e magazines, how did you acquire those? |
21:01
🔗
|
CoolCanuk |
I scraped the pdf urls |
21:01
🔗
|
CoolCanuk |
https://www.readmetro.com/ |
21:01
🔗
|
CoolCanuk |
pick a city, choose archives, pick a date and an "open pdf" button shows. |
21:01
🔗
|
CoolCanuk |
urls were collected, I put them into Jdownloader |
21:02
🔗
|
|
jschwart has joined #archiveteam-bs |
21:02
🔗
|
ola_norsk |
ok, it seems like a serious collection, so i had to ask if you manually collected them as they were released :) |
21:03
🔗
|
CoolCanuk |
nope :) |
21:06
🔗
|
ola_norsk |
what did you use to scrape it? |
21:07
🔗
|
CoolCanuk |
https://chrome.google.com/webstore/detail/web-scraper/jnhgnonknehpejjnehehllkliplmbmhn |
21:07
🔗
|
CoolCanuk |
small learning curve, and interface isnt the best but works well |
21:07
🔗
|
ola_norsk |
that saves ALL urls on the page? |
21:08
🔗
|
CoolCanuk |
it saves what you tell it to |
21:08
🔗
|
CoolCanuk |
if you tell it to start at the archive page, click on the image, and gather the pdf url, that's what it will do |
21:08
🔗
|
ola_norsk |
could it scroll a "read more" page? |
21:08
🔗
|
CoolCanuk |
yup |
21:09
🔗
|
|
jschwart has quit IRC (Read error: Connection reset by peer) |
21:10
🔗
|
CoolCanuk |
http://webscraper.io/tutorials |
21:10
🔗
|
|
jschwart has joined #archiveteam-bs |
21:11
🔗
|
CoolCanuk |
you can continue to use chrome while it scrapes. Just don't close the tab |
21:14
🔗
|
ola_norsk |
i'd need a headless browser for that. So many times i close shit by even accident |
21:15
🔗
|
ola_norsk |
but, it has no problem with lets say ; having to automatically scroll down a twitter page of a certain hashtag? |
21:17
🔗
|
CoolCanuk |
shouldnt be |
21:17
🔗
|
ola_norsk |
ty |
21:18
🔗
|
|
Aerochrom has quit IRC (Read error: Connection reset by peer) |
21:23
🔗
|
ola_norsk |
CoolCanuk: this, http://webscraper.io/documentation#element-click-selector could be the damn "This is taking a long time..CLICK TO RETRY" twitter shit, right? :D |
21:23
🔗
|
JAA |
Was about to say, that button is the worst. |
21:24
🔗
|
JAA |
And headless browsers exist. Both Chromium and Firefox have had headless modes for a few months. |
21:24
🔗
|
JAA |
I.e. proper headless browsers, not some frankensteined stuff like PhantomJS. |
21:24
🔗
|
ola_norsk |
any way to simulate scroll down? |
21:25
🔗
|
JAA |
Most likely. I haven't looked at it in detail yet. |
21:26
🔗
|
Harzilein |
JAA: heh, i kind of wondered if that's how firefox would sweeten the deal for devs w/ xul going away |
21:27
🔗
|
Harzilein |
JAA: so embedding might get easier again as well? |
21:27
🔗
|
JAA |
Possible, but more likely that they just realised that something like this is wanted. Chromium shipped their version of it quite a bit earlier. |
21:27
🔗
|
JAA |
Yes, definitely. |
21:28
🔗
|
JAA |
PurpleSym has been working on building a grabber with Chromium headless. |
21:28
🔗
|
Harzilein |
JAA: i always had a sweet spot for the kazehakase browser |
21:31
🔗
|
ola_norsk |
one problem with twitter is their fucking t.co URLs though. Although, they could easily be rewritten to real URLs since the URL they point to is always in the alt= property of the links |
21:31
🔗
|
JAA |
Harzilein: Looks like that project has been dead for years though. :-/ |
21:32
🔗
|
Harzilein |
JAA: yes. that goes together with bitrots in distributions/port collections that used to have it |
21:32
🔗
|
JAA |
ola_norsk: I wish we could throw t.co into URLTeam. But the codes are way too long to do that. |
21:32
🔗
|
ola_norsk |
? |
21:33
🔗
|
JAA |
ola_norsk: Archiving link shorteners because they suck. |
21:33
🔗
|
ola_norsk |
hell yeah they are :D |
21:34
🔗
|
JAA |
If you want to know more: http://archiveteam.org/index.php?title=URLTeam and #urlteam |
21:35
🔗
|
ola_norsk |
can't click on a damn thing on captured twitter, without t.co making it fail. Even if the page shows the correct link at hovering over it |
21:39
🔗
|
ola_norsk |
some naturally short links do, but i guess if a link that surpasses char limit of tweet is pasted by user, it becomes converted to t.co link |
21:40
🔗
|
ola_norsk |
and the actual link is then put in the 'alt=' property of that t.co link |
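The rewrite described above can be sketched with the standard-library HTML parser. This is a minimal sketch under stated assumptions: the chat names `alt` as the attribute holding the real URL, which was not verified here, so the hypothetical `ShortLinkMap` class also tries `data-expanded-url` and `title` as guesses; the class and function names are illustrative only.

```python
from html.parser import HTMLParser


class ShortLinkMap(HTMLParser):
    """Collect {t.co link: expanded URL} pairs from anchor tags."""

    def __init__(self, expanded_attrs=("data-expanded-url", "title", "alt")):
        super().__init__()
        # Candidate attribute names are assumptions, tried in this order.
        self.expanded_attrs = expanded_attrs
        self.links = {}

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href") or ""
        if not href.startswith(("https://t.co/", "http://t.co/")):
            return
        for name in self.expanded_attrs:
            if attributes.get(name):
                self.links[href] = attributes[name]
                break


def expand_short_links(html):
    """Replace each t.co href with the expanded URL found on the same tag."""
    parser = ShortLinkMap()
    parser.feed(html)
    for short, full in parser.links.items():
        html = html.replace('href="%s"' % short, 'href="%s"' % full)
    return html
```

Links without any of the candidate attributes are left untouched, so a capture processed this way is never worse than the original.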
21:45
🔗
|
ola_norsk |
basically looks like a list of internet MOULD http://archiveteam.org/index.php?title=URLTeam#URL_shorteners |
21:59
🔗
|
CoolCanuk |
it should work, yes ola_norsk |
21:59
🔗
|
CoolCanuk |
I have no used it for twitter |
21:59
🔗
|
CoolCanuk |
*not |
22:19
🔗
|
|
ndiddy has joined #archiveteam-bs |
22:23
🔗
|
|
icedice has joined #archiveteam-bs |
22:50
🔗
|
ola_norsk |
CoolCanuk: this illustrates url shorteners i think http://www.magsfrosta.no/images/svartsopp.jpg .. URL shorteners deserve a latin name.. |
22:51
🔗
|
CoolCanuk |
:P |
22:51
🔗
|
ola_norsk |
what's that shit that rots tents if one pack them while moist? |
22:53
🔗
|
ola_norsk |
"hospites putridum" .. that's url shortners |
22:56
🔗
|
CoolCanuk |
can a bureaucrat please delete those pages |
22:57
🔗
|
JAA |
SketchCow: Wiki spam ^ |
22:58
🔗
|
wp494 |
CoolCanuk: use {{Delete}} to flag spam articles for deletion |
22:58
🔗
|
wp494 |
I'll do it for you |
22:59
🔗
|
CoolCanuk |
why? The template is blank |
22:59
🔗
|
wp494 |
It adds the "Deleteme" category to those pages |
22:59
🔗
|
CoolCanuk |
Why not just add it to Deleteme? |
23:00
🔗
|
SketchCow |
Neat |
23:01
🔗
|
jrwr |
https://www.youtube.com/watch?v=cdySvd_HIlQ |
23:01
🔗
|
jrwr |
just watching it now |
23:01
🔗
|
jrwr |
Your crazy ass suite |
23:02
🔗
|
ola_norsk |
finally a video, i was so bored on #norge |
23:03
🔗
|
ola_norsk |
...what if...black light causes cancer.. :/ |
23:03
🔗
|
ola_norsk |
he's so sued! |
23:03
🔗
|
CoolCanuk |
lol |
23:09
🔗
|
CoolCanuk |
(I don't know why the IRC bot reports +1 or -1 but oh well) |
23:09
🔗
|
|
bithippo has joined #archiveteam-bs |
23:10
🔗
|
ola_norsk |
on a serious note, it worries me to learn the state of it... When someone mailed someone at dataforeningen.no regarding a national mirror , the message back was "I like the idea very much! I know WayBackMachine very well! What else can you tell me about what you're thinking." |
23:10
🔗
|
|
ranavalon has quit IRC (Read error: Connection reset by peer) |
23:10
🔗
|
JAA |
That's the article length (bytes or characters) difference due to your edit. |
23:10
🔗
|
wp494 |
CoolCanuk: it's just relaying the amount of bytes that were changed with your edit, so since yours was just 1 character taken out, it would be -1 |
23:10
🔗
|
CoolCanuk |
-1 is hardly worth reporting about |
23:10
🔗
|
wp494 |
(in the case of the -1s) |
23:10
🔗
|
|
ranavalon has joined #archiveteam-bs |
23:10
🔗
|
wp494 |
/shrug |
23:11
🔗
|
wp494 |
it relays everything and it doesn't give any damns about how big or how small edits are |
23:11
🔗
|
|
Mateon1 has quit IRC (Read error: Operation timed out) |
23:11
🔗
|
|
Mateon1 has joined #archiveteam-bs |
23:11
🔗
|
JAA |
ola_norsk: Nice! |
23:12
🔗
|
|
ZexaronS has joined #archiveteam-bs |
23:13
🔗
|
ola_norsk |
a representative of dataforeningen.no saying "I like the idea" is not actually a small deal, it's one of the oldest computer thingy techie associations in Europe (and scandinavia) |
23:14
🔗
|
ola_norsk |
all that was asked was 'What would be your stand on Norway having a IA mirror' |
23:14
🔗
|
CoolCanuk |
i clicked on their site and they retweeted a computer screen photo. NOPE. use the screenshot tool xD |
23:15
🔗
|
|
ranavalon has quit IRC (Read error: Connection reset by peer) |
23:15
🔗
|
JAA |
So I'm supposed to take a screenshot of the photo and distribute that. Got it. |
23:16
🔗
|
|
ranavalon has joined #archiveteam-bs |
23:16
🔗
|
CoolCanuk |
I brought up a privacy concern because my college was asking for SIN numbers over HTTP (not HTTPS). Privacy commissioner claimed colleges are not for profit so they won't help me. Even though the college says they have to follow PIPEDA everywhere. |
23:16
🔗
|
CoolCanuk |
I was so upset lol |
23:16
🔗
|
ola_norsk |
the problem is, i'm just a drunk fuck from an island out west in norway...So a "pitch deck" would certainly help :/ |
23:17
🔗
|
|
SynMonger has left |
23:17
🔗
|
|
jschwart has quit IRC (Konversation terminated!) |
23:19
🔗
|
ola_norsk |
when he asked me; "Have you talked to Brewster Kahle" , i was tempted to write back "who the fuck do you think i am? I just asked about your stance!" |
23:19
🔗
|
|
ranavalon has quit IRC (Read error: Connection reset by peer) |
23:19
🔗
|
|
ranavalon has joined #archiveteam-bs |
23:24
🔗
|
ola_norsk |
anyway, my point is that Den Norske Dataforening are major 'movers-and-shakers' in Norway .. And if their representative says "I really like the idea! What more can you tell me" ..that is not really a small thingy |
23:24
🔗
|
astrid |
:) |
23:24
🔗
|
ola_norsk |
but a drunk fuck like me can't really take it further |
23:27
🔗
|
bithippo |
ola_norsk: I'd be happy to put a pitch deck together with IA's blessing |
23:27
🔗
|
bithippo |
I'd be happy to approach governments for grants on IA's behalf _for free_ actually |
23:27
🔗
|
bithippo |
Not sure if that's something they'd be cool with? Have to check with someone who has authority in such matters. |
23:27
🔗
|
|
bithippo is now known as TooMuchTo |
23:28
🔗
|
|
TooMuchTo is now known as bithippo |
23:29
🔗
|
ola_norsk |
bithippo: aye..that would be most helpful |
23:29
🔗
|
ola_norsk |
here's what i wrote back btw: https://pastebin.com/raw/RWFYhsCd (translate whoever wants to) :D |
23:32
🔗
|
ola_norsk |
google translation https://pastebin.com/raw/nzAZx0RV ..all i did was ask about their stance of the possibility :( suddenly i get questions back. I'm not a boss! :/ |
23:32
🔗
|
|
icedice has quit IRC (Ping timeout: 260 seconds) |
23:33
🔗
|
ola_norsk |
google translate is horrible shit... |
23:34
🔗
|
bithippo |
I would love to get IA EU sponsorship for an EU physical location/mirror |
23:36
🔗
|
ola_norsk |
pfff..EU , that's breaking it apart it seems :D |
23:36
🔗
|
ola_norsk |
but yeah |
23:37
🔗
|
ola_norsk |
(norway is not member of eu btw) |
23:38
🔗
|
sec0nd |
Does the archive have full text search? |
23:38
🔗
|
sec0nd |
For the wayback machine mainly and perhaps also the collections |
23:39
🔗
|
ola_norsk |
sec0nd: here's for https://archive.org/details/texts?and[]=it%20was%20the%20best%20of%20times |
23:40
🔗
|
ola_norsk |
i think WayBackMachine operates on 'keywords' ...whatever that means |
23:40
🔗
|
ola_norsk |
prolly means in urls, i don't know |
23:43
🔗
|
ola_norsk |
sec0nd: https://web.archive.org/web/*/jimmy%20hendrix |
23:44
🔗
|
astrid |
that searches <title> tags, which is better than nothing |
23:44
🔗
|
ola_norsk |
sec0nd: seems like wayback 'keywords' are words in urls |
23:44
🔗
|
ola_norsk |
even better than in just urls then.. |
23:44
🔗
|
sec0nd |
Thank you |
23:45
🔗
|
|
ZexaronS has quit IRC (Read error: Operation timed out) |
23:47
🔗
|
|
ZexaronS has joined #archiveteam-bs |
23:49
🔗
|
ola_norsk |
is this an 'The Onion' clone? http://irishpost.co.uk/irish-villagers-complain-viagra-plant-fumes-men-dogs-walking-around-hard-ons/ |
23:50
🔗
|
ola_norsk |
if that's a legit newspaper, i'd like to be irish |
23:53
🔗
|
ola_norsk |
"The Irish Post is the biggest selling national newspaper to the Irish in Britain." ..wtf |
23:53
🔗
|
CoolCanuk |
i read that too lmao |
23:54
🔗
|
ola_norsk |
no i mean, 'the biggest selling national newspaper to the Irish IN BRITAIN'...something is fushy about that |
23:54
🔗
|
ola_norsk |
fishy* |
23:58
🔗
|
ola_norsk |
CoolCanuk: it must be a 'The Onion' clone.. Does viagra 'plants' even produce "fumes" ?? lol |
23:58
🔗
|
CoolCanuk |
i dont think it would make sense to have fumes |
23:58
🔗
|
ola_norsk |
aye |
23:58
🔗
|
CoolCanuk |
i dont think any factory for medicine would have fumes.. |
23:59
🔗
|
CoolCanuk |
heat exhaust, sure. but shouldnt be fumes unless they're burning trash /bad batches which is NOT permitted |
23:59
🔗
|
ola_norsk |
aye.. And where's the lawsuits.. |
23:59
🔗
|
CoolCanuk |
ya |